Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Lots of easy overlapping changes in the confict
resolutions here.
Signed-off-by: David S. Miller <davem@davemloft.net>
diff --git a/Documentation/bpf/README.rst b/Documentation/bpf/README.rst
new file mode 100644
index 0000000..b9a80c9
--- /dev/null
+++ b/Documentation/bpf/README.rst
@@ -0,0 +1,36 @@
+=================
+BPF documentation
+=================
+
+This directory contains documentation for the BPF (Berkeley Packet
+Filter) facility, with a focus on the extended BPF version (eBPF).
+
+This kernel side documentation is still work in progress. The main
+textual documentation is (for historical reasons) described in
+`Documentation/networking/filter.txt`_, which describe both classical
+and extended BPF instruction-set.
+The Cilium project also maintains a `BPF and XDP Reference Guide`_
+that goes into great technical depth about the BPF Architecture.
+
+The primary info for the bpf syscall is available in the `man-pages`_
+for `bpf(2)`_.
+
+
+
+Frequently asked questions (FAQ)
+================================
+
+Two sets of Questions and Answers (Q&A) are maintained.
+
+* QA for common questions about BPF see: bpf_design_QA_
+
+* QA for developers interacting with BPF subsystem: bpf_devel_QA_
+
+
+.. Links:
+.. _bpf_design_QA: bpf_design_QA.rst
+.. _bpf_devel_QA: bpf_devel_QA.rst
+.. _Documentation/networking/filter.txt: ../networking/filter.txt
+.. _man-pages: https://www.kernel.org/doc/man-pages/
+.. _bpf(2): http://man7.org/linux/man-pages/man2/bpf.2.html
+.. _BPF and XDP Reference Guide: http://cilium.readthedocs.io/en/latest/bpf/
diff --git a/Documentation/bpf/bpf_design_QA.rst b/Documentation/bpf/bpf_design_QA.rst
new file mode 100644
index 0000000..6780a6d
--- /dev/null
+++ b/Documentation/bpf/bpf_design_QA.rst
@@ -0,0 +1,221 @@
+==============
+BPF Design Q&A
+==============
+
+BPF extensibility and applicability to networking, tracing, security
+in the linux kernel and several user space implementations of BPF
+virtual machine led to a number of misunderstanding on what BPF actually is.
+This short QA is an attempt to address that and outline a direction
+of where BPF is heading long term.
+
+.. contents::
+ :local:
+ :depth: 3
+
+Questions and Answers
+=====================
+
+Q: Is BPF a generic instruction set similar to x64 and arm64?
+-------------------------------------------------------------
+A: NO.
+
+Q: Is BPF a generic virtual machine ?
+-------------------------------------
+A: NO.
+
+BPF is generic instruction set *with* C calling convention.
+-----------------------------------------------------------
+
+Q: Why C calling convention was chosen?
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+A: Because BPF programs are designed to run in the linux kernel
+which is written in C, hence BPF defines instruction set compatible
+with two most used architectures x64 and arm64 (and takes into
+consideration important quirks of other architectures) and
+defines calling convention that is compatible with C calling
+convention of the linux kernel on those architectures.
+
+Q: can multiple return values be supported in the future?
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+A: NO. BPF allows only register R0 to be used as return value.
+
+Q: can more than 5 function arguments be supported in the future?
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+A: NO. BPF calling convention only allows registers R1-R5 to be used
+as arguments. BPF is not a standalone instruction set.
+(unlike x64 ISA that allows msft, cdecl and other conventions)
+
+Q: can BPF programs access instruction pointer or return address?
+-----------------------------------------------------------------
+A: NO.
+
+Q: can BPF programs access stack pointer ?
+------------------------------------------
+A: NO.
+
+Only frame pointer (register R10) is accessible.
+From compiler point of view it's necessary to have stack pointer.
+For example LLVM defines register R11 as stack pointer in its
+BPF backend, but it makes sure that generated code never uses it.
+
+Q: Does C-calling convention diminishes possible use cases?
+-----------------------------------------------------------
+A: YES.
+
+BPF design forces addition of major functionality in the form
+of kernel helper functions and kernel objects like BPF maps with
+seamless interoperability between them. It lets kernel call into
+BPF programs and programs call kernel helpers with zero overhead.
+As all of them were native C code. That is particularly the case
+for JITed BPF programs that are indistinguishable from
+native kernel C code.
+
+Q: Does it mean that 'innovative' extensions to BPF code are disallowed?
+------------------------------------------------------------------------
+A: Soft yes.
+
+At least for now until BPF core has support for
+bpf-to-bpf calls, indirect calls, loops, global variables,
+jump tables, read only sections and all other normal constructs
+that C code can produce.
+
+Q: Can loops be supported in a safe way?
+----------------------------------------
+A: It's not clear yet.
+
+BPF developers are trying to find a way to
+support bounded loops where the verifier can guarantee that
+the program terminates in less than 4096 instructions.
+
+Instruction level questions
+---------------------------
+
+Q: LD_ABS and LD_IND instructions vs C code
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Q: How come LD_ABS and LD_IND instruction are present in BPF whereas
+C code cannot express them and has to use builtin intrinsics?
+
+A: This is artifact of compatibility with classic BPF. Modern
+networking code in BPF performs better without them.
+See 'direct packet access'.
+
+Q: BPF instructions mapping not one-to-one to native CPU
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Q: It seems not all BPF instructions are one-to-one to native CPU.
+For example why BPF_JNE and other compare and jumps are not cpu-like?
+
+A: This was necessary to avoid introducing flags into ISA which are
+impossible to make generic and efficient across CPU architectures.
+
+Q: why BPF_DIV instruction doesn't map to x64 div?
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+A: Because if we picked one-to-one relationship to x64 it would have made
+it more complicated to support on arm64 and other archs. Also it
+needs div-by-zero runtime check.
+
+Q: why there is no BPF_SDIV for signed divide operation?
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+A: Because it would be rarely used. llvm errors in such case and
+prints a suggestion to use unsigned divide instead
+
+Q: Why BPF has implicit prologue and epilogue?
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+A: Because architectures like sparc have register windows and in general
+there are enough subtle differences between architectures, so naive
+store return address into stack won't work. Another reason is BPF has
+to be safe from division by zero (and legacy exception path
+of LD_ABS insn). Those instructions need to invoke epilogue and
+return implicitly.
+
+Q: Why BPF_JLT and BPF_JLE instructions were not introduced in the beginning?
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+A: Because classic BPF didn't have them and BPF authors felt that compiler
+workaround would be acceptable. Turned out that programs lose performance
+due to lack of these compare instructions and they were added.
+These two instructions is a perfect example what kind of new BPF
+instructions are acceptable and can be added in the future.
+These two already had equivalent instructions in native CPUs.
+New instructions that don't have one-to-one mapping to HW instructions
+will not be accepted.
+
+Q: BPF 32-bit subregister requirements
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Q: BPF 32-bit subregisters have a requirement to zero upper 32-bits of BPF
+registers which makes BPF inefficient virtual machine for 32-bit
+CPU architectures and 32-bit HW accelerators. Can true 32-bit registers
+be added to BPF in the future?
+
+A: NO. The first thing to improve performance on 32-bit archs is to teach
+LLVM to generate code that uses 32-bit subregisters. Then second step
+is to teach verifier to mark operations where zero-ing upper bits
+is unnecessary. Then JITs can take advantage of those markings and
+drastically reduce size of generated code and improve performance.
+
+Q: Does BPF have a stable ABI?
+------------------------------
+A: YES. BPF instructions, arguments to BPF programs, set of helper
+functions and their arguments, recognized return codes are all part
+of ABI. However when tracing programs are using bpf_probe_read() helper
+to walk kernel internal datastructures and compile with kernel
+internal headers these accesses can and will break with newer
+kernels. The union bpf_attr -> kern_version is checked at load time
+to prevent accidentally loading kprobe-based bpf programs written
+for a different kernel. Networking programs don't do kern_version check.
+
+Q: How much stack space a BPF program uses?
+-------------------------------------------
+A: Currently all program types are limited to 512 bytes of stack
+space, but the verifier computes the actual amount of stack used
+and both interpreter and most JITed code consume necessary amount.
+
+Q: Can BPF be offloaded to HW?
+------------------------------
+A: YES. BPF HW offload is supported by NFP driver.
+
+Q: Does classic BPF interpreter still exist?
+--------------------------------------------
+A: NO. Classic BPF programs are converted into extend BPF instructions.
+
+Q: Can BPF call arbitrary kernel functions?
+-------------------------------------------
+A: NO. BPF programs can only call a set of helper functions which
+is defined for every program type.
+
+Q: Can BPF overwrite arbitrary kernel memory?
+---------------------------------------------
+A: NO.
+
+Tracing bpf programs can *read* arbitrary memory with bpf_probe_read()
+and bpf_probe_read_str() helpers. Networking programs cannot read
+arbitrary memory, since they don't have access to these helpers.
+Programs can never read or write arbitrary memory directly.
+
+Q: Can BPF overwrite arbitrary user memory?
+-------------------------------------------
+A: Sort-of.
+
+Tracing BPF programs can overwrite the user memory
+of the current task with bpf_probe_write_user(). Every time such
+program is loaded the kernel will print warning message, so
+this helper is only useful for experiments and prototypes.
+Tracing BPF programs are root only.
+
+Q: bpf_trace_printk() helper warning
+------------------------------------
+Q: When bpf_trace_printk() helper is used the kernel prints nasty
+warning message. Why is that?
+
+A: This is done to nudge program authors into better interfaces when
+programs need to pass data to user space. Like bpf_perf_event_output()
+can be used to efficiently stream data via perf ring buffer.
+BPF maps can be used for asynchronous data sharing between kernel
+and user space. bpf_trace_printk() should only be used for debugging.
+
+Q: New functionality via kernel modules?
+----------------------------------------
+Q: Can BPF functionality such as new program or map types, new
+helpers, etc be added out of kernel module code?
+
+A: NO.
diff --git a/Documentation/bpf/bpf_design_QA.txt b/Documentation/bpf/bpf_design_QA.txt
deleted file mode 100644
index f3e458a..0000000
--- a/Documentation/bpf/bpf_design_QA.txt
+++ /dev/null
@@ -1,156 +0,0 @@
-BPF extensibility and applicability to networking, tracing, security
-in the linux kernel and several user space implementations of BPF
-virtual machine led to a number of misunderstanding on what BPF actually is.
-This short QA is an attempt to address that and outline a direction
-of where BPF is heading long term.
-
-Q: Is BPF a generic instruction set similar to x64 and arm64?
-A: NO.
-
-Q: Is BPF a generic virtual machine ?
-A: NO.
-
-BPF is generic instruction set _with_ C calling convention.
-
-Q: Why C calling convention was chosen?
-A: Because BPF programs are designed to run in the linux kernel
- which is written in C, hence BPF defines instruction set compatible
- with two most used architectures x64 and arm64 (and takes into
- consideration important quirks of other architectures) and
- defines calling convention that is compatible with C calling
- convention of the linux kernel on those architectures.
-
-Q: can multiple return values be supported in the future?
-A: NO. BPF allows only register R0 to be used as return value.
-
-Q: can more than 5 function arguments be supported in the future?
-A: NO. BPF calling convention only allows registers R1-R5 to be used
- as arguments. BPF is not a standalone instruction set.
- (unlike x64 ISA that allows msft, cdecl and other conventions)
-
-Q: can BPF programs access instruction pointer or return address?
-A: NO.
-
-Q: can BPF programs access stack pointer ?
-A: NO. Only frame pointer (register R10) is accessible.
- From compiler point of view it's necessary to have stack pointer.
- For example LLVM defines register R11 as stack pointer in its
- BPF backend, but it makes sure that generated code never uses it.
-
-Q: Does C-calling convention diminishes possible use cases?
-A: YES. BPF design forces addition of major functionality in the form
- of kernel helper functions and kernel objects like BPF maps with
- seamless interoperability between them. It lets kernel call into
- BPF programs and programs call kernel helpers with zero overhead.
- As all of them were native C code. That is particularly the case
- for JITed BPF programs that are indistinguishable from
- native kernel C code.
-
-Q: Does it mean that 'innovative' extensions to BPF code are disallowed?
-A: Soft yes. At least for now until BPF core has support for
- bpf-to-bpf calls, indirect calls, loops, global variables,
- jump tables, read only sections and all other normal constructs
- that C code can produce.
-
-Q: Can loops be supported in a safe way?
-A: It's not clear yet. BPF developers are trying to find a way to
- support bounded loops where the verifier can guarantee that
- the program terminates in less than 4096 instructions.
-
-Q: How come LD_ABS and LD_IND instruction are present in BPF whereas
- C code cannot express them and has to use builtin intrinsics?
-A: This is artifact of compatibility with classic BPF. Modern
- networking code in BPF performs better without them.
- See 'direct packet access'.
-
-Q: It seems not all BPF instructions are one-to-one to native CPU.
- For example why BPF_JNE and other compare and jumps are not cpu-like?
-A: This was necessary to avoid introducing flags into ISA which are
- impossible to make generic and efficient across CPU architectures.
-
-Q: why BPF_DIV instruction doesn't map to x64 div?
-A: Because if we picked one-to-one relationship to x64 it would have made
- it more complicated to support on arm64 and other archs. Also it
- needs div-by-zero runtime check.
-
-Q: why there is no BPF_SDIV for signed divide operation?
-A: Because it would be rarely used. llvm errors in such case and
- prints a suggestion to use unsigned divide instead
-
-Q: Why BPF has implicit prologue and epilogue?
-A: Because architectures like sparc have register windows and in general
- there are enough subtle differences between architectures, so naive
- store return address into stack won't work. Another reason is BPF has
- to be safe from division by zero (and legacy exception path
- of LD_ABS insn). Those instructions need to invoke epilogue and
- return implicitly.
-
-Q: Why BPF_JLT and BPF_JLE instructions were not introduced in the beginning?
-A: Because classic BPF didn't have them and BPF authors felt that compiler
- workaround would be acceptable. Turned out that programs lose performance
- due to lack of these compare instructions and they were added.
- These two instructions is a perfect example what kind of new BPF
- instructions are acceptable and can be added in the future.
- These two already had equivalent instructions in native CPUs.
- New instructions that don't have one-to-one mapping to HW instructions
- will not be accepted.
-
-Q: BPF 32-bit subregisters have a requirement to zero upper 32-bits of BPF
- registers which makes BPF inefficient virtual machine for 32-bit
- CPU architectures and 32-bit HW accelerators. Can true 32-bit registers
- be added to BPF in the future?
-A: NO. The first thing to improve performance on 32-bit archs is to teach
- LLVM to generate code that uses 32-bit subregisters. Then second step
- is to teach verifier to mark operations where zero-ing upper bits
- is unnecessary. Then JITs can take advantage of those markings and
- drastically reduce size of generated code and improve performance.
-
-Q: Does BPF have a stable ABI?
-A: YES. BPF instructions, arguments to BPF programs, set of helper
- functions and their arguments, recognized return codes are all part
- of ABI. However when tracing programs are using bpf_probe_read() helper
- to walk kernel internal datastructures and compile with kernel
- internal headers these accesses can and will break with newer
- kernels. The union bpf_attr -> kern_version is checked at load time
- to prevent accidentally loading kprobe-based bpf programs written
- for a different kernel. Networking programs don't do kern_version check.
-
-Q: How much stack space a BPF program uses?
-A: Currently all program types are limited to 512 bytes of stack
- space, but the verifier computes the actual amount of stack used
- and both interpreter and most JITed code consume necessary amount.
-
-Q: Can BPF be offloaded to HW?
-A: YES. BPF HW offload is supported by NFP driver.
-
-Q: Does classic BPF interpreter still exist?
-A: NO. Classic BPF programs are converted into extend BPF instructions.
-
-Q: Can BPF call arbitrary kernel functions?
-A: NO. BPF programs can only call a set of helper functions which
- is defined for every program type.
-
-Q: Can BPF overwrite arbitrary kernel memory?
-A: NO. Tracing bpf programs can _read_ arbitrary memory with bpf_probe_read()
- and bpf_probe_read_str() helpers. Networking programs cannot read
- arbitrary memory, since they don't have access to these helpers.
- Programs can never read or write arbitrary memory directly.
-
-Q: Can BPF overwrite arbitrary user memory?
-A: Sort-of. Tracing BPF programs can overwrite the user memory
- of the current task with bpf_probe_write_user(). Every time such
- program is loaded the kernel will print warning message, so
- this helper is only useful for experiments and prototypes.
- Tracing BPF programs are root only.
-
-Q: When bpf_trace_printk() helper is used the kernel prints nasty
- warning message. Why is that?
-A: This is done to nudge program authors into better interfaces when
- programs need to pass data to user space. Like bpf_perf_event_output()
- can be used to efficiently stream data via perf ring buffer.
- BPF maps can be used for asynchronous data sharing between kernel
- and user space. bpf_trace_printk() should only be used for debugging.
-
-Q: Can BPF functionality such as new program or map types, new
- helpers, etc be added out of kernel module code?
-A: NO.
diff --git a/Documentation/bpf/bpf_devel_QA.rst b/Documentation/bpf/bpf_devel_QA.rst
new file mode 100644
index 0000000..0e7c1d9
--- /dev/null
+++ b/Documentation/bpf/bpf_devel_QA.rst
@@ -0,0 +1,640 @@
+=================================
+HOWTO interact with BPF subsystem
+=================================
+
+This document provides information for the BPF subsystem about various
+workflows related to reporting bugs, submitting patches, and queueing
+patches for stable kernels.
+
+For general information about submitting patches, please refer to
+`Documentation/process/`_. This document only describes additional specifics
+related to BPF.
+
+.. contents::
+ :local:
+ :depth: 2
+
+Reporting bugs
+==============
+
+Q: How do I report bugs for BPF kernel code?
+--------------------------------------------
+A: Since all BPF kernel development as well as bpftool and iproute2 BPF
+loader development happens through the netdev kernel mailing list,
+please report any found issues around BPF to the following mailing
+list:
+
+ netdev@vger.kernel.org
+
+This may also include issues related to XDP, BPF tracing, etc.
+
+Given netdev has a high volume of traffic, please also add the BPF
+maintainers to Cc (from kernel MAINTAINERS_ file):
+
+* Alexei Starovoitov <ast@kernel.org>
+* Daniel Borkmann <daniel@iogearbox.net>
+
+In case a buggy commit has already been identified, make sure to keep
+the actual commit authors in Cc as well for the report. They can
+typically be identified through the kernel's git tree.
+
+**Please do NOT report BPF issues to bugzilla.kernel.org since it
+is a guarantee that the reported issue will be overlooked.**
+
+Submitting patches
+==================
+
+Q: To which mailing list do I need to submit my BPF patches?
+------------------------------------------------------------
+A: Please submit your BPF patches to the netdev kernel mailing list:
+
+ netdev@vger.kernel.org
+
+Historically, BPF came out of networking and has always been maintained
+by the kernel networking community. Although these days BPF touches
+many other subsystems as well, the patches are still routed mainly
+through the networking community.
+
+In case your patch has changes in various different subsystems (e.g.
+tracing, security, etc), make sure to Cc the related kernel mailing
+lists and maintainers from there as well, so they are able to review
+the changes and provide their Acked-by's to the patches.
+
+Q: Where can I find patches currently under discussion for BPF subsystem?
+-------------------------------------------------------------------------
+A: All patches that are Cc'ed to netdev are queued for review under netdev
+patchwork project:
+
+ http://patchwork.ozlabs.org/project/netdev/list/
+
+Those patches which target BPF, are assigned to a 'bpf' delegate for
+further processing from BPF maintainers. The current queue with
+patches under review can be found at:
+
+ https://patchwork.ozlabs.org/project/netdev/list/?delegate=77147
+
+Once the patches have been reviewed by the BPF community as a whole
+and approved by the BPF maintainers, their status in patchwork will be
+changed to 'Accepted' and the submitter will be notified by mail. This
+means that the patches look good from a BPF perspective and have been
+applied to one of the two BPF kernel trees.
+
+In case feedback from the community requires a respin of the patches,
+their status in patchwork will be set to 'Changes Requested', and purged
+from the current review queue. Likewise for cases where patches would
+get rejected or are not applicable to the BPF trees (but assigned to
+the 'bpf' delegate).
+
+Q: How do the changes make their way into Linux?
+------------------------------------------------
+A: There are two BPF kernel trees (git repositories). Once patches have
+been accepted by the BPF maintainers, they will be applied to one
+of the two BPF trees:
+
+ * https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf.git/
+ * https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git/
+
+The bpf tree itself is for fixes only, whereas bpf-next for features,
+cleanups or other kind of improvements ("next-like" content). This is
+analogous to net and net-next trees for networking. Both bpf and
+bpf-next will only have a master branch in order to simplify against
+which branch patches should get rebased to.
+
+Accumulated BPF patches in the bpf tree will regularly get pulled
+into the net kernel tree. Likewise, accumulated BPF patches accepted
+into the bpf-next tree will make their way into net-next tree. net and
+net-next are both run by David S. Miller. From there, they will go
+into the kernel mainline tree run by Linus Torvalds. To read up on the
+process of net and net-next being merged into the mainline tree, see
+the `netdev FAQ`_ under:
+
+ `Documentation/networking/netdev-FAQ.txt`_
+
+Occasionally, to prevent merge conflicts, we might send pull requests
+to other trees (e.g. tracing) with a small subset of the patches, but
+net and net-next are always the main trees targeted for integration.
+
+The pull requests will contain a high-level summary of the accumulated
+patches and can be searched on netdev kernel mailing list through the
+following subject lines (``yyyy-mm-dd`` is the date of the pull
+request)::
+
+ pull-request: bpf yyyy-mm-dd
+ pull-request: bpf-next yyyy-mm-dd
+
+Q: How do I indicate which tree (bpf vs. bpf-next) my patch should be applied to?
+---------------------------------------------------------------------------------
+
+A: The process is the very same as described in the `netdev FAQ`_, so
+please read up on it. The subject line must indicate whether the
+patch is a fix or rather "next-like" content in order to let the
+maintainers know whether it is targeted at bpf or bpf-next.
+
+For fixes eventually landing in bpf -> net tree, the subject must
+look like::
+
+ git format-patch --subject-prefix='PATCH bpf' start..finish
+
+For features/improvements/etc that should eventually land in
+bpf-next -> net-next, the subject must look like::
+
+ git format-patch --subject-prefix='PATCH bpf-next' start..finish
+
+If unsure whether the patch or patch series should go into bpf
+or net directly, or bpf-next or net-next directly, it is not a
+problem either if the subject line says net or net-next as target.
+It is eventually up to the maintainers to do the delegation of
+the patches.
+
+If it is clear that patches should go into bpf or bpf-next tree,
+please make sure to rebase the patches against those trees in
+order to reduce potential conflicts.
+
+In case the patch or patch series has to be reworked and sent out
+again in a second or later revision, it is also required to add a
+version number (``v2``, ``v3``, ...) into the subject prefix::
+
+ git format-patch --subject-prefix='PATCH net-next v2' start..finish
+
+When changes have been requested to the patch series, always send the
+whole patch series again with the feedback incorporated (never send
+individual diffs on top of the old series).
+
+Q: What does it mean when a patch gets applied to bpf or bpf-next tree?
+-----------------------------------------------------------------------
+A: It means that the patch looks good for mainline inclusion from
+a BPF point of view.
+
+Be aware that this is not a final verdict that the patch will
+automatically get accepted into net or net-next trees eventually:
+
+On the netdev kernel mailing list reviews can come in at any point
+in time. If discussions around a patch conclude that they cannot
+get included as-is, we will either apply a follow-up fix or drop
+them from the trees entirely. Therefore, we also reserve to rebase
+the trees when deemed necessary. After all, the purpose of the tree
+is to:
+
+i) accumulate and stage BPF patches for integration into trees
+ like net and net-next, and
+
+ii) run extensive BPF test suite and
+ workloads on the patches before they make their way any further.
+
+Once the BPF pull request was accepted by David S. Miller, then
+the patches end up in net or net-next tree, respectively, and
+make their way from there further into mainline. Again, see the
+`netdev FAQ`_ for additional information e.g. on how often they are
+merged to mainline.
+
+Q: How long do I need to wait for feedback on my BPF patches?
+-------------------------------------------------------------
+A: We try to keep the latency low. The usual time to feedback will
+be around 2 or 3 business days. It may vary depending on the
+complexity of changes and current patch load.
+
+Q: How often do you send pull requests to major kernel trees like net or net-next?
+----------------------------------------------------------------------------------
+
+A: Pull requests will be sent out rather often in order to not
+accumulate too many patches in bpf or bpf-next.
+
+As a rule of thumb, expect pull requests for each tree regularly
+at the end of the week. In some cases pull requests could additionally
+come also in the middle of the week depending on the current patch
+load or urgency.
+
+Q: Are patches applied to bpf-next when the merge window is open?
+-----------------------------------------------------------------
+A: For the time when the merge window is open, bpf-next will not be
+processed. This is roughly analogous to net-next patch processing,
+so feel free to read up on the `netdev FAQ`_ about further details.
+
+During those two weeks of merge window, we might ask you to resend
+your patch series once bpf-next is open again. Once Linus released
+a ``v*-rc1`` after the merge window, we continue processing of bpf-next.
+
+For non-subscribers to kernel mailing lists, there is also a status
+page run by David S. Miller on net-next that provides guidance:
+
+ http://vger.kernel.org/~davem/net-next.html
+
+Q: Verifier changes and test cases
+----------------------------------
+Q: I made a BPF verifier change, do I need to add test cases for
+BPF kernel selftests_?
+
+A: If the patch has changes to the behavior of the verifier, then yes,
+it is absolutely necessary to add test cases to the BPF kernel
+selftests_ suite. If they are not present and we think they are
+needed, then we might ask for them before accepting any changes.
+
+In particular, test_verifier.c is tracking a high number of BPF test
+cases, including a lot of corner cases that LLVM BPF back end may
+generate out of the restricted C code. Thus, adding test cases is
+absolutely crucial to make sure future changes do not accidentally
+affect prior use-cases. Thus, treat those test cases as: verifier
+behavior that is not tracked in test_verifier.c could potentially
+be subject to change.
+
+Q: samples/bpf preference vs selftests?
+---------------------------------------
+Q: When should I add code to `samples/bpf/`_ and when to BPF kernel
+selftests_ ?
+
+A: In general, we prefer additions to BPF kernel selftests_ rather than
+`samples/bpf/`_. The rationale is very simple: kernel selftests are
+regularly run by various bots to test for kernel regressions.
+
+The more test cases we add to BPF selftests, the better the coverage
+and the less likely it is that those could accidentally break. It is
+not that BPF kernel selftests cannot demo how a specific feature can
+be used.
+
+That said, `samples/bpf/`_ may be a good place for people to get started,
+so it might be advisable that simple demos of features could go into
+`samples/bpf/`_, but advanced functional and corner-case testing rather
+into kernel selftests.
+
+If your sample looks like a test case, then go for BPF kernel selftests
+instead!
+
+Q: When should I add code to the bpftool?
+-----------------------------------------
+A: The main purpose of bpftool (under tools/bpf/bpftool/) is to provide
+a central user space tool for debugging and introspection of BPF programs
+and maps that are active in the kernel. If UAPI changes related to BPF
+enable for dumping additional information of programs or maps, then
+bpftool should be extended as well to support dumping them.
+
+Q: When should I add code to iproute2's BPF loader?
+---------------------------------------------------
+A: For UAPI changes related to the XDP or tc layer (e.g. ``cls_bpf``),
+the convention is that those control-path related changes are added to
+iproute2's BPF loader as well from user space side. This is not only
+useful to have UAPI changes properly designed to be usable, but also
+to make those changes available to a wider user base of major
+downstream distributions.
+
+Q: Do you accept patches as well for iproute2's BPF loader?
+-----------------------------------------------------------
+A: Patches for the iproute2's BPF loader have to be sent to:
+
+ netdev@vger.kernel.org
+
+While those patches are not processed by the BPF kernel maintainers,
+please keep them in Cc as well, so they can be reviewed.
+
+The official git repository for iproute2 is run by Stephen Hemminger
+and can be found at:
+
+ https://git.kernel.org/pub/scm/linux/kernel/git/shemminger/iproute2.git/
+
+The patches need to have a subject prefix of '``[PATCH iproute2
+master]``' or '``[PATCH iproute2 net-next]``'. '``master``' or
+'``net-next``' describes the target branch where the patch should be
+applied to. Meaning, if kernel changes went into the net-next kernel
+tree, then the related iproute2 changes need to go into the iproute2
+net-next branch, otherwise they can be targeted at master branch. The
+iproute2 net-next branch will get merged into the master branch after
+the current iproute2 version from master has been released.
+
+Like BPF, the patches end up in patchwork under the netdev project and
+are delegated to 'shemminger' for further processing:
+
+ http://patchwork.ozlabs.org/project/netdev/list/?delegate=389
+
+Q: What is the minimum requirement before I submit my BPF patches?
+------------------------------------------------------------------
+A: When submitting patches, always take the time and properly test your
+patches *prior* to submission. Never rush them! If maintainers find
+that your patches have not been properly tested, it is a good way to
+get them grumpy. Testing patch submissions is a hard requirement!
+
+Note, fixes that go to bpf tree *must* have a ``Fixes:`` tag included.
+The same applies to fixes that target bpf-next, where the affected
+commit is in net-next (or in some cases bpf-next). The ``Fixes:`` tag is
+crucial in order to identify follow-up commits and tremendously helps
+for people having to do backporting, so it is a must have!
+
+We also don't accept patches with an empty commit message. Take your
+time and properly write up a high quality commit message, it is
+essential!
+
+Think about it this way: other developers looking at your code a month
+from now need to understand *why* a certain change has been done that
+way, and whether there have been flaws in the analysis or assumptions
+that the original author did. Thus providing a proper rationale and
+describing the use-case for the changes is a must.
+
+Patch submissions with >1 patch must have a cover letter which includes
+a high level description of the series. This high level summary will
+then be placed into the merge commit by the BPF maintainers such that
+it is also accessible from the git log for future reference.
+
+Q: Features changing BPF JIT and/or LLVM
+----------------------------------------
+Q: What do I need to consider when adding a new instruction or feature
+that would require BPF JIT and/or LLVM integration as well?
+
+A: We try hard to keep all BPF JITs up to date such that the same user
+experience can be guaranteed when running BPF programs on different
+architectures without having the program punt to the less efficient
+interpreter in case the in-kernel BPF JIT is enabled.
+
+If you are unable to implement or test the required JIT changes for
+certain architectures, please work together with the related BPF JIT
+developers in order to get the feature implemented in a timely manner.
+Please refer to the git log (``arch/*/net/``) to locate the necessary
+people for helping out.
+
+Also always make sure to add BPF test cases (e.g. test_bpf.c and
+test_verifier.c) for new instructions, so that they can receive
+broad test coverage and help run-time testing the various BPF JITs.
+
+In case of new BPF instructions, once the changes have been accepted
+into the Linux kernel, please implement support into LLVM's BPF back
+end. See LLVM_ section below for further information.
+
+Stable submission
+=================
+
+Q: I need a specific BPF commit in stable kernels. What should I do?
+--------------------------------------------------------------------
+A: In case you need a specific fix in stable kernels, first check whether
+the commit has already been applied in the related ``linux-*.y`` branches:
+
+ https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git/
+
+If not the case, then drop an email to the BPF maintainers with the
+netdev kernel mailing list in Cc and ask for the fix to be queued up:
+
+ netdev@vger.kernel.org
+
+The process in general is the same as on netdev itself, see also the
+`netdev FAQ`_ document.
+
+Q: Do you also backport to kernels not currently maintained as stable?
+----------------------------------------------------------------------
+A: No. If you need a specific BPF commit in kernels that are currently not
+maintained by the stable maintainers, then you are on your own.
+
+The current stable and longterm stable kernels are all listed here:
+
+ https://www.kernel.org/
+
+Q: The BPF patch I am about to submit needs to go to stable as well
+-------------------------------------------------------------------
+What should I do?
+
+A: The same rules apply as with netdev patch submissions in general, see
+`netdev FAQ`_ under:
+
+ `Documentation/networking/netdev-FAQ.txt`_
+
+Never add "``Cc: stable@vger.kernel.org``" to the patch description, but
+ask the BPF maintainers to queue the patches instead. This can be done
+with a note, for example, under the ``---`` part of the patch which does
+not go into the git log. Alternatively, this can be done as a simple
+request by mail instead.
+
+Q: Queue stable patches
+-----------------------
+Q: Where do I find currently queued BPF patches that will be submitted
+to stable?
+
+A: Once patches that fix critical bugs got applied into the bpf tree, they
+are queued up for stable submission under:
+
+ http://patchwork.ozlabs.org/bundle/bpf/stable/?state=*
+
+They will be on hold there at minimum until the related commit made its
+way into the mainline kernel tree.
+
+After having been under broader exposure, the queued patches will be
+submitted by the BPF maintainers to the stable maintainers.
+
+Testing patches
+===============
+
+Q: How to run BPF selftests
+---------------------------
+A: After you have booted into the newly compiled kernel, navigate to
+the BPF selftests_ suite in order to test BPF functionality (current
+working directory points to the root of the cloned git tree)::
+
+ $ cd tools/testing/selftests/bpf/
+ $ make
+
+To run the verifier tests::
+
+ $ sudo ./test_verifier
+
+The verifier tests print out all the current checks being
+performed. The summary at the end of running all tests will dump
+information of test successes and failures::
+
+ Summary: 418 PASSED, 0 FAILED
+
+In order to run through all BPF selftests, the following command is
+needed::
+
+ $ sudo make run_tests
+
+See the kernels selftest `Documentation/dev-tools/kselftest.rst`_
+document for further documentation.
+
+Q: Which BPF kernel selftests version should I run my kernel against?
+---------------------------------------------------------------------
+A: If you run a kernel ``xyz``, then always run the BPF kernel selftests
+from that kernel ``xyz`` as well. Do not expect that the BPF selftest
+from the latest mainline tree will pass all the time.
+
+In particular, test_bpf.c and test_verifier.c have a large number of
+test cases and are constantly updated with new BPF test sequences, or
+existing ones are adapted to verifier changes e.g. due to verifier
+becoming smarter and being able to better track certain things.
+
+LLVM
+====
+
+Q: Where do I find LLVM with BPF support?
+-----------------------------------------
+A: The BPF back end for LLVM is upstream in LLVM since version 3.7.1.
+
+All major distributions these days ship LLVM with BPF back end enabled,
+so for the majority of use-cases it is not required to compile LLVM by
+hand anymore, just install the distribution provided package.
+
+LLVM's static compiler lists the supported targets through
+``llc --version``, make sure BPF targets are listed. Example::
+
+ $ llc --version
+ LLVM (http://llvm.org/):
+ LLVM version 6.0.0svn
+ Optimized build.
+ Default target: x86_64-unknown-linux-gnu
+ Host CPU: skylake
+
+ Registered Targets:
+ bpf - BPF (host endian)
+ bpfeb - BPF (big endian)
+ bpfel - BPF (little endian)
+ x86 - 32-bit X86: Pentium-Pro and above
+ x86-64 - 64-bit X86: EM64T and AMD64
+
+For developers in order to utilize the latest features added to LLVM's
+BPF back end, it is advisable to run the latest LLVM releases. Support
+for new BPF kernel features such as additions to the BPF instruction
+set are often developed together.
+
+All LLVM releases can be found at: http://releases.llvm.org/
+
+Q: Got it, so how do I build LLVM manually anyway?
+--------------------------------------------------
+A: You need cmake and gcc-c++ as build requisites for LLVM. Once you have
+that set up, proceed with building the latest LLVM and clang version
+from the git repositories::
+
+ $ git clone http://llvm.org/git/llvm.git
+ $ cd llvm/tools
+ $ git clone --depth 1 http://llvm.org/git/clang.git
+ $ cd ..; mkdir build; cd build
+ $ cmake .. -DLLVM_TARGETS_TO_BUILD="BPF;X86" \
+ -DBUILD_SHARED_LIBS=OFF \
+ -DCMAKE_BUILD_TYPE=Release \
+ -DLLVM_BUILD_RUNTIME=OFF
+ $ make -j $(getconf _NPROCESSORS_ONLN)
+
+The built binaries can then be found in the build/bin/ directory, where
+you can point the PATH variable to.
+
+Q: Reporting LLVM BPF issues
+----------------------------
+Q: Should I notify BPF kernel maintainers about issues in LLVM's BPF code
+generation back end or about LLVM generated code that the verifier
+refuses to accept?
+
+A: Yes, please do!
+
+LLVM's BPF back end is a key piece of the whole BPF
+infrastructure and it ties deeply into verification of programs from the
+kernel side. Therefore, any issues on either side need to be investigated
+and fixed whenever necessary.
+
+Therefore, please make sure to bring them up at netdev kernel mailing
+list and Cc BPF maintainers for LLVM and kernel bits:
+
+* Yonghong Song <yhs@fb.com>
+* Alexei Starovoitov <ast@kernel.org>
+* Daniel Borkmann <daniel@iogearbox.net>
+
+LLVM also has an issue tracker where BPF related bugs can be found:
+
+ https://bugs.llvm.org/buglist.cgi?quicksearch=bpf
+
+However, it is better to reach out through mailing lists with having
+maintainers in Cc.
+
+Q: New BPF instruction for kernel and LLVM
+------------------------------------------
+Q: I have added a new BPF instruction to the kernel, how can I integrate
+it into LLVM?
+
+A: LLVM has a ``-mcpu`` selector for the BPF back end in order to allow
+the selection of BPF instruction set extensions. By default the
+``generic`` processor target is used, which is the base instruction set
+(v1) of BPF.
+
+LLVM has an option to select ``-mcpu=probe`` where it will probe the host
+kernel for supported BPF instruction set extensions and selects the
+optimal set automatically.
+
+For cross-compilation, a specific version can be select manually as well ::
+
+ $ llc -march bpf -mcpu=help
+ Available CPUs for this target:
+
+ generic - Select the generic processor.
+ probe - Select the probe processor.
+ v1 - Select the v1 processor.
+ v2 - Select the v2 processor.
+ [...]
+
+Newly added BPF instructions to the Linux kernel need to follow the same
+scheme, bump the instruction set version and implement probing for the
+extensions such that ``-mcpu=probe`` users can benefit from the
+optimization transparently when upgrading their kernels.
+
+If you are unable to implement support for the newly added BPF instruction
+please reach out to BPF developers for help.
+
+By the way, the BPF kernel selftests run with ``-mcpu=probe`` for better
+test coverage.
+
+Q: clang flag for target bpf?
+-----------------------------
+Q: In some cases clang flag ``-target bpf`` is used but in other cases the
+default clang target, which matches the underlying architecture, is used.
+What is the difference and when I should use which?
+
+A: Although LLVM IR generation and optimization try to stay architecture
+independent, ``-target <arch>`` still has some impact on generated code:
+
+- BPF program may recursively include header file(s) with file scope
+ inline assembly codes. The default target can handle this well,
+ while ``bpf`` target may fail if bpf backend assembler does not
+ understand these assembly codes, which is true in most cases.
+
+- When compiled without ``-g``, additional elf sections, e.g.,
+ .eh_frame and .rela.eh_frame, may be present in the object file
+ with default target, but not with ``bpf`` target.
+
+- The default target may turn a C switch statement into a switch table
+ lookup and jump operation. Since the switch table is placed
+ in the global readonly section, the bpf program will fail to load.
+ The bpf target does not support switch table optimization.
+ The clang option ``-fno-jump-tables`` can be used to disable
+ switch table generation.
+
+- For clang ``-target bpf``, it is guaranteed that pointer or long /
+ unsigned long types will always have a width of 64 bit, no matter
+ whether underlying clang binary or default target (or kernel) is
+ 32 bit. However, when native clang target is used, then it will
+ compile these types based on the underlying architecture's conventions,
+ meaning in case of 32 bit architecture, pointer or long / unsigned
+ long types e.g. in BPF context structure will have width of 32 bit
+ while the BPF LLVM back end still operates in 64 bit. The native
+ target is mostly needed in tracing for the case of walking ``pt_regs``
+ or other kernel structures where CPU's register width matters.
+ Otherwise, ``clang -target bpf`` is generally recommended.
+
+You should use default target when:
+
+- Your program includes a header file, e.g., ptrace.h, which eventually
+ pulls in some header files containing file scope host assembly codes.
+
+- You can add ``-fno-jump-tables`` to work around the switch table issue.
+
+Otherwise, you can use ``bpf`` target. Additionally, you *must* use bpf target
+when:
+
+- Your program uses data structures with pointer or long / unsigned long
+ types that interface with BPF helpers or context data structures. Access
+ into these structures is verified by the BPF verifier and may result
+ in verification failures if the native architecture is not aligned with
+ the BPF architecture, e.g. 64-bit. An example of this is
+ BPF_PROG_TYPE_SK_MSG require ``-target bpf``
+
+
+.. Links
+.. _Documentation/process/: https://www.kernel.org/doc/html/latest/process/
+.. _MAINTAINERS: ../../MAINTAINERS
+.. _Documentation/networking/netdev-FAQ.txt: ../networking/netdev-FAQ.txt
+.. _netdev FAQ: ../networking/netdev-FAQ.txt
+.. _samples/bpf/: ../../samples/bpf/
+.. _selftests: ../../tools/testing/selftests/bpf/
+.. _Documentation/dev-tools/kselftest.rst:
+ https://www.kernel.org/doc/html/latest/dev-tools/kselftest.html
+
+Happy BPF hacking!
diff --git a/Documentation/bpf/bpf_devel_QA.txt b/Documentation/bpf/bpf_devel_QA.txt
deleted file mode 100644
index da57601..0000000
--- a/Documentation/bpf/bpf_devel_QA.txt
+++ /dev/null
@@ -1,570 +0,0 @@
-This document provides information for the BPF subsystem about various
-workflows related to reporting bugs, submitting patches, and queueing
-patches for stable kernels.
-
-For general information about submitting patches, please refer to
-Documentation/process/. This document only describes additional specifics
-related to BPF.
-
-Reporting bugs:
----------------
-
-Q: How do I report bugs for BPF kernel code?
-
-A: Since all BPF kernel development as well as bpftool and iproute2 BPF
- loader development happens through the netdev kernel mailing list,
- please report any found issues around BPF to the following mailing
- list:
-
- netdev@vger.kernel.org
-
- This may also include issues related to XDP, BPF tracing, etc.
-
- Given netdev has a high volume of traffic, please also add the BPF
- maintainers to Cc (from kernel MAINTAINERS file):
-
- Alexei Starovoitov <ast@kernel.org>
- Daniel Borkmann <daniel@iogearbox.net>
-
- In case a buggy commit has already been identified, make sure to keep
- the actual commit authors in Cc as well for the report. They can
- typically be identified through the kernel's git tree.
-
- Please do *not* report BPF issues to bugzilla.kernel.org since it
- is a guarantee that the reported issue will be overlooked.
-
-Submitting patches:
--------------------
-
-Q: To which mailing list do I need to submit my BPF patches?
-
-A: Please submit your BPF patches to the netdev kernel mailing list:
-
- netdev@vger.kernel.org
-
- Historically, BPF came out of networking and has always been maintained
- by the kernel networking community. Although these days BPF touches
- many other subsystems as well, the patches are still routed mainly
- through the networking community.
-
- In case your patch has changes in various different subsystems (e.g.
- tracing, security, etc), make sure to Cc the related kernel mailing
- lists and maintainers from there as well, so they are able to review
- the changes and provide their Acked-by's to the patches.
-
-Q: Where can I find patches currently under discussion for BPF subsystem?
-
-A: All patches that are Cc'ed to netdev are queued for review under netdev
- patchwork project:
-
- http://patchwork.ozlabs.org/project/netdev/list/
-
- Those patches which target BPF, are assigned to a 'bpf' delegate for
- further processing from BPF maintainers. The current queue with
- patches under review can be found at:
-
- https://patchwork.ozlabs.org/project/netdev/list/?delegate=77147
-
- Once the patches have been reviewed by the BPF community as a whole
- and approved by the BPF maintainers, their status in patchwork will be
- changed to 'Accepted' and the submitter will be notified by mail. This
- means that the patches look good from a BPF perspective and have been
- applied to one of the two BPF kernel trees.
-
- In case feedback from the community requires a respin of the patches,
- their status in patchwork will be set to 'Changes Requested', and purged
- from the current review queue. Likewise for cases where patches would
- get rejected or are not applicable to the BPF trees (but assigned to
- the 'bpf' delegate).
-
-Q: How do the changes make their way into Linux?
-
-A: There are two BPF kernel trees (git repositories). Once patches have
- been accepted by the BPF maintainers, they will be applied to one
- of the two BPF trees:
-
- https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf.git/
- https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git/
-
- The bpf tree itself is for fixes only, whereas bpf-next for features,
- cleanups or other kind of improvements ("next-like" content). This is
- analogous to net and net-next trees for networking. Both bpf and
- bpf-next will only have a master branch in order to simplify against
- which branch patches should get rebased to.
-
- Accumulated BPF patches in the bpf tree will regularly get pulled
- into the net kernel tree. Likewise, accumulated BPF patches accepted
- into the bpf-next tree will make their way into net-next tree. net and
- net-next are both run by David S. Miller. From there, they will go
- into the kernel mainline tree run by Linus Torvalds. To read up on the
- process of net and net-next being merged into the mainline tree, see
- the netdev FAQ under:
-
- Documentation/networking/netdev-FAQ.txt
-
- Occasionally, to prevent merge conflicts, we might send pull requests
- to other trees (e.g. tracing) with a small subset of the patches, but
- net and net-next are always the main trees targeted for integration.
-
- The pull requests will contain a high-level summary of the accumulated
- patches and can be searched on netdev kernel mailing list through the
- following subject lines (yyyy-mm-dd is the date of the pull request):
-
- pull-request: bpf yyyy-mm-dd
- pull-request: bpf-next yyyy-mm-dd
-
-Q: How do I indicate which tree (bpf vs. bpf-next) my patch should be
- applied to?
-
-A: The process is the very same as described in the netdev FAQ, so
- please read up on it. The subject line must indicate whether the
- patch is a fix or rather "next-like" content in order to let the
- maintainers know whether it is targeted at bpf or bpf-next.
-
- For fixes eventually landing in bpf -> net tree, the subject must
- look like:
-
- git format-patch --subject-prefix='PATCH bpf' start..finish
-
- For features/improvements/etc that should eventually land in
- bpf-next -> net-next, the subject must look like:
-
- git format-patch --subject-prefix='PATCH bpf-next' start..finish
-
- If unsure whether the patch or patch series should go into bpf
- or net directly, or bpf-next or net-next directly, it is not a
- problem either if the subject line says net or net-next as target.
- It is eventually up to the maintainers to do the delegation of
- the patches.
-
- If it is clear that patches should go into bpf or bpf-next tree,
- please make sure to rebase the patches against those trees in
- order to reduce potential conflicts.
-
- In case the patch or patch series has to be reworked and sent out
- again in a second or later revision, it is also required to add a
- version number (v2, v3, ...) into the subject prefix:
-
- git format-patch --subject-prefix='PATCH net-next v2' start..finish
-
- When changes have been requested to the patch series, always send the
- whole patch series again with the feedback incorporated (never send
- individual diffs on top of the old series).
-
-Q: What does it mean when a patch gets applied to bpf or bpf-next tree?
-
-A: It means that the patch looks good for mainline inclusion from
- a BPF point of view.
-
- Be aware that this is not a final verdict that the patch will
- automatically get accepted into net or net-next trees eventually:
-
- On the netdev kernel mailing list reviews can come in at any point
- in time. If discussions around a patch conclude that they cannot
- get included as-is, we will either apply a follow-up fix or drop
- them from the trees entirely. Therefore, we also reserve to rebase
- the trees when deemed necessary. After all, the purpose of the tree
- is to i) accumulate and stage BPF patches for integration into trees
- like net and net-next, and ii) run extensive BPF test suite and
- workloads on the patches before they make their way any further.
-
- Once the BPF pull request was accepted by David S. Miller, then
- the patches end up in net or net-next tree, respectively, and
- make their way from there further into mainline. Again, see the
- netdev FAQ for additional information e.g. on how often they are
- merged to mainline.
-
-Q: How long do I need to wait for feedback on my BPF patches?
-
-A: We try to keep the latency low. The usual time to feedback will
- be around 2 or 3 business days. It may vary depending on the
- complexity of changes and current patch load.
-
-Q: How often do you send pull requests to major kernel trees like
- net or net-next?
-
-A: Pull requests will be sent out rather often in order to not
- accumulate too many patches in bpf or bpf-next.
-
- As a rule of thumb, expect pull requests for each tree regularly
- at the end of the week. In some cases pull requests could additionally
- come also in the middle of the week depending on the current patch
- load or urgency.
-
-Q: Are patches applied to bpf-next when the merge window is open?
-
-A: For the time when the merge window is open, bpf-next will not be
- processed. This is roughly analogous to net-next patch processing,
- so feel free to read up on the netdev FAQ about further details.
-
- During those two weeks of merge window, we might ask you to resend
- your patch series once bpf-next is open again. Once Linus released
- a v*-rc1 after the merge window, we continue processing of bpf-next.
-
- For non-subscribers to kernel mailing lists, there is also a status
- page run by David S. Miller on net-next that provides guidance:
-
- http://vger.kernel.org/~davem/net-next.html
-
-Q: I made a BPF verifier change, do I need to add test cases for
- BPF kernel selftests?
-
-A: If the patch has changes to the behavior of the verifier, then yes,
- it is absolutely necessary to add test cases to the BPF kernel
- selftests suite. If they are not present and we think they are
- needed, then we might ask for them before accepting any changes.
-
- In particular, test_verifier.c is tracking a high number of BPF test
- cases, including a lot of corner cases that LLVM BPF back end may
- generate out of the restricted C code. Thus, adding test cases is
- absolutely crucial to make sure future changes do not accidentally
- affect prior use-cases. Thus, treat those test cases as: verifier
- behavior that is not tracked in test_verifier.c could potentially
- be subject to change.
-
-Q: When should I add code to samples/bpf/ and when to BPF kernel
- selftests?
-
-A: In general, we prefer additions to BPF kernel selftests rather than
- samples/bpf/. The rationale is very simple: kernel selftests are
- regularly run by various bots to test for kernel regressions.
-
- The more test cases we add to BPF selftests, the better the coverage
- and the less likely it is that those could accidentally break. It is
- not that BPF kernel selftests cannot demo how a specific feature can
- be used.
-
- That said, samples/bpf/ may be a good place for people to get started,
- so it might be advisable that simple demos of features could go into
- samples/bpf/, but advanced functional and corner-case testing rather
- into kernel selftests.
-
- If your sample looks like a test case, then go for BPF kernel selftests
- instead!
-
-Q: When should I add code to the bpftool?
-
-A: The main purpose of bpftool (under tools/bpf/bpftool/) is to provide
- a central user space tool for debugging and introspection of BPF programs
- and maps that are active in the kernel. If UAPI changes related to BPF
- enable for dumping additional information of programs or maps, then
- bpftool should be extended as well to support dumping them.
-
-Q: When should I add code to iproute2's BPF loader?
-
-A: For UAPI changes related to the XDP or tc layer (e.g. cls_bpf), the
- convention is that those control-path related changes are added to
- iproute2's BPF loader as well from user space side. This is not only
- useful to have UAPI changes properly designed to be usable, but also
- to make those changes available to a wider user base of major
- downstream distributions.
-
-Q: Do you accept patches as well for iproute2's BPF loader?
-
-A: Patches for the iproute2's BPF loader have to be sent to:
-
- netdev@vger.kernel.org
-
- While those patches are not processed by the BPF kernel maintainers,
- please keep them in Cc as well, so they can be reviewed.
-
- The official git repository for iproute2 is run by Stephen Hemminger
- and can be found at:
-
- https://git.kernel.org/pub/scm/linux/kernel/git/shemminger/iproute2.git/
-
- The patches need to have a subject prefix of '[PATCH iproute2 master]'
- or '[PATCH iproute2 net-next]'. 'master' or 'net-next' describes the
- target branch where the patch should be applied to. Meaning, if kernel
- changes went into the net-next kernel tree, then the related iproute2
- changes need to go into the iproute2 net-next branch, otherwise they
- can be targeted at master branch. The iproute2 net-next branch will get
- merged into the master branch after the current iproute2 version from
- master has been released.
-
- Like BPF, the patches end up in patchwork under the netdev project and
- are delegated to 'shemminger' for further processing:
-
- http://patchwork.ozlabs.org/project/netdev/list/?delegate=389
-
-Q: What is the minimum requirement before I submit my BPF patches?
-
-A: When submitting patches, always take the time and properly test your
- patches *prior* to submission. Never rush them! If maintainers find
- that your patches have not been properly tested, it is a good way to
- get them grumpy. Testing patch submissions is a hard requirement!
-
- Note, fixes that go to bpf tree *must* have a Fixes: tag included. The
- same applies to fixes that target bpf-next, where the affected commit
- is in net-next (or in some cases bpf-next). The Fixes: tag is crucial
- in order to identify follow-up commits and tremendously helps for people
- having to do backporting, so it is a must have!
-
- We also don't accept patches with an empty commit message. Take your
- time and properly write up a high quality commit message, it is
- essential!
-
- Think about it this way: other developers looking at your code a month
- from now need to understand *why* a certain change has been done that
- way, and whether there have been flaws in the analysis or assumptions
- that the original author did. Thus providing a proper rationale and
- describing the use-case for the changes is a must.
-
- Patch submissions with >1 patch must have a cover letter which includes
- a high level description of the series. This high level summary will
- then be placed into the merge commit by the BPF maintainers such that
- it is also accessible from the git log for future reference.
-
-Q: What do I need to consider when adding a new instruction or feature
- that would require BPF JIT and/or LLVM integration as well?
-
-A: We try hard to keep all BPF JITs up to date such that the same user
- experience can be guaranteed when running BPF programs on different
- architectures without having the program punt to the less efficient
- interpreter in case the in-kernel BPF JIT is enabled.
-
- If you are unable to implement or test the required JIT changes for
- certain architectures, please work together with the related BPF JIT
- developers in order to get the feature implemented in a timely manner.
- Please refer to the git log (arch/*/net/) to locate the necessary
- people for helping out.
-
- Also always make sure to add BPF test cases (e.g. test_bpf.c and
- test_verifier.c) for new instructions, so that they can receive
- broad test coverage and help run-time testing the various BPF JITs.
-
- In case of new BPF instructions, once the changes have been accepted
- into the Linux kernel, please implement support into LLVM's BPF back
- end. See LLVM section below for further information.
-
-Stable submission:
-------------------
-
-Q: I need a specific BPF commit in stable kernels. What should I do?
-
-A: In case you need a specific fix in stable kernels, first check whether
- the commit has already been applied in the related linux-*.y branches:
-
- https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git/
-
- If not the case, then drop an email to the BPF maintainers with the
- netdev kernel mailing list in Cc and ask for the fix to be queued up:
-
- netdev@vger.kernel.org
-
- The process in general is the same as on netdev itself, see also the
- netdev FAQ document.
-
-Q: Do you also backport to kernels not currently maintained as stable?
-
-A: No. If you need a specific BPF commit in kernels that are currently not
- maintained by the stable maintainers, then you are on your own.
-
- The current stable and longterm stable kernels are all listed here:
-
- https://www.kernel.org/
-
-Q: The BPF patch I am about to submit needs to go to stable as well. What
- should I do?
-
-A: The same rules apply as with netdev patch submissions in general, see
- netdev FAQ under:
-
- Documentation/networking/netdev-FAQ.txt
-
- Never add "Cc: stable@vger.kernel.org" to the patch description, but
- ask the BPF maintainers to queue the patches instead. This can be done
- with a note, for example, under the "---" part of the patch which does
- not go into the git log. Alternatively, this can be done as a simple
- request by mail instead.
-
-Q: Where do I find currently queued BPF patches that will be submitted
- to stable?
-
-A: Once patches that fix critical bugs got applied into the bpf tree, they
- are queued up for stable submission under:
-
- http://patchwork.ozlabs.org/bundle/bpf/stable/?state=*
-
- They will be on hold there at minimum until the related commit made its
- way into the mainline kernel tree.
-
- After having been under broader exposure, the queued patches will be
- submitted by the BPF maintainers to the stable maintainers.
-
-Testing patches:
-----------------
-
-Q: Which BPF kernel selftests version should I run my kernel against?
-
-A: If you run a kernel xyz, then always run the BPF kernel selftests from
- that kernel xyz as well. Do not expect that the BPF selftest from the
- latest mainline tree will pass all the time.
-
- In particular, test_bpf.c and test_verifier.c have a large number of
- test cases and are constantly updated with new BPF test sequences, or
- existing ones are adapted to verifier changes e.g. due to verifier
- becoming smarter and being able to better track certain things.
-
-LLVM:
------
-
-Q: Where do I find LLVM with BPF support?
-
-A: The BPF back end for LLVM is upstream in LLVM since version 3.7.1.
-
- All major distributions these days ship LLVM with BPF back end enabled,
- so for the majority of use-cases it is not required to compile LLVM by
- hand anymore, just install the distribution provided package.
-
- LLVM's static compiler lists the supported targets through 'llc --version',
- make sure BPF targets are listed. Example:
-
- $ llc --version
- LLVM (http://llvm.org/):
- LLVM version 6.0.0svn
- Optimized build.
- Default target: x86_64-unknown-linux-gnu
- Host CPU: skylake
-
- Registered Targets:
- bpf - BPF (host endian)
- bpfeb - BPF (big endian)
- bpfel - BPF (little endian)
- x86 - 32-bit X86: Pentium-Pro and above
- x86-64 - 64-bit X86: EM64T and AMD64
-
- For developers in order to utilize the latest features added to LLVM's
- BPF back end, it is advisable to run the latest LLVM releases. Support
- for new BPF kernel features such as additions to the BPF instruction
- set are often developed together.
-
- All LLVM releases can be found at: http://releases.llvm.org/
-
-Q: Got it, so how do I build LLVM manually anyway?
-
-A: You need cmake and gcc-c++ as build requisites for LLVM. Once you have
- that set up, proceed with building the latest LLVM and clang version
- from the git repositories:
-
- $ git clone http://llvm.org/git/llvm.git
- $ cd llvm/tools
- $ git clone --depth 1 http://llvm.org/git/clang.git
- $ cd ..; mkdir build; cd build
- $ cmake .. -DLLVM_TARGETS_TO_BUILD="BPF;X86" \
- -DBUILD_SHARED_LIBS=OFF \
- -DCMAKE_BUILD_TYPE=Release \
- -DLLVM_BUILD_RUNTIME=OFF
- $ make -j $(getconf _NPROCESSORS_ONLN)
-
- The built binaries can then be found in the build/bin/ directory, where
- you can point the PATH variable to.
-
-Q: Should I notify BPF kernel maintainers about issues in LLVM's BPF code
- generation back end or about LLVM generated code that the verifier
- refuses to accept?
-
-A: Yes, please do! LLVM's BPF back end is a key piece of the whole BPF
- infrastructure and it ties deeply into verification of programs from the
- kernel side. Therefore, any issues on either side need to be investigated
- and fixed whenever necessary.
-
- Therefore, please make sure to bring them up at netdev kernel mailing
- list and Cc BPF maintainers for LLVM and kernel bits:
-
- Yonghong Song <yhs@fb.com>
- Alexei Starovoitov <ast@kernel.org>
- Daniel Borkmann <daniel@iogearbox.net>
-
- LLVM also has an issue tracker where BPF related bugs can be found:
-
- https://bugs.llvm.org/buglist.cgi?quicksearch=bpf
-
- However, it is better to reach out through mailing lists with having
- maintainers in Cc.
-
-Q: I have added a new BPF instruction to the kernel, how can I integrate
- it into LLVM?
-
-A: LLVM has a -mcpu selector for the BPF back end in order to allow the
- selection of BPF instruction set extensions. By default the 'generic'
- processor target is used, which is the base instruction set (v1) of BPF.
-
- LLVM has an option to select -mcpu=probe where it will probe the host
- kernel for supported BPF instruction set extensions and selects the
- optimal set automatically.
-
- For cross-compilation, a specific version can be select manually as well.
-
- $ llc -march bpf -mcpu=help
- Available CPUs for this target:
-
- generic - Select the generic processor.
- probe - Select the probe processor.
- v1 - Select the v1 processor.
- v2 - Select the v2 processor.
- [...]
-
- Newly added BPF instructions to the Linux kernel need to follow the same
- scheme, bump the instruction set version and implement probing for the
- extensions such that -mcpu=probe users can benefit from the optimization
- transparently when upgrading their kernels.
-
- If you are unable to implement support for the newly added BPF instruction
- please reach out to BPF developers for help.
-
- By the way, the BPF kernel selftests run with -mcpu=probe for better
- test coverage.
-
-Q: In some cases clang flag "-target bpf" is used but in other cases the
- default clang target, which matches the underlying architecture, is used.
- What is the difference and when I should use which?
-
-A: Although LLVM IR generation and optimization try to stay architecture
- independent, "-target <arch>" still has some impact on generated code:
-
- - BPF program may recursively include header file(s) with file scope
- inline assembly codes. The default target can handle this well,
- while bpf target may fail if bpf backend assembler does not
- understand these assembly codes, which is true in most cases.
-
- - When compiled without -g, additional elf sections, e.g.,
- .eh_frame and .rela.eh_frame, may be present in the object file
- with default target, but not with bpf target.
-
- - The default target may turn a C switch statement into a switch table
- lookup and jump operation. Since the switch table is placed
- in the global readonly section, the bpf program will fail to load.
- The bpf target does not support switch table optimization.
- The clang option "-fno-jump-tables" can be used to disable
- switch table generation.
-
- - For clang -target bpf, it is guaranteed that pointer or long /
- unsigned long types will always have a width of 64 bit, no matter
- whether underlying clang binary or default target (or kernel) is
- 32 bit. However, when native clang target is used, then it will
- compile these types based on the underlying architecture's conventions,
- meaning in case of 32 bit architecture, pointer or long / unsigned
- long types e.g. in BPF context structure will have width of 32 bit
- while the BPF LLVM back end still operates in 64 bit. The native
- target is mostly needed in tracing for the case of walking pt_regs
- or other kernel structures where CPU's register width matters.
- Otherwise, clang -target bpf is generally recommended.
-
- You should use default target when:
-
- - Your program includes a header file, e.g., ptrace.h, which eventually
- pulls in some header files containing file scope host assembly codes.
- - You can add "-fno-jump-tables" to work around the switch table issue.
-
- Otherwise, you can use bpf target. Additionally, you _must_ use bpf target
- when:
-
- - Your program uses data structures with pointer or long / unsigned long
- types that interface with BPF helpers or context data structures. Access
- into these structures is verified by the BPF verifier and may result
- in verification failures if the native architecture is not aligned with
- the BPF architecture, e.g. 64-bit. An example of this is
- BPF_PROG_TYPE_SK_MSG require '-target bpf'
-
-Happy BPF hacking!
diff --git a/Documentation/devicetree/bindings/net/dsa/dsa.txt b/Documentation/devicetree/bindings/net/dsa/dsa.txt
index cfe8f64..3ceeb8d 100644
--- a/Documentation/devicetree/bindings/net/dsa/dsa.txt
+++ b/Documentation/devicetree/bindings/net/dsa/dsa.txt
@@ -82,8 +82,6 @@
switch0: switch0@0 {
compatible = "marvell,mv88e6085";
- #address-cells = <1>;
- #size-cells = <0>;
reg = <0>;
dsa,member = <0 0>;
@@ -135,8 +133,6 @@
switch1: switch1@0 {
compatible = "marvell,mv88e6085";
- #address-cells = <1>;
- #size-cells = <0>;
reg = <0>;
dsa,member = <0 1>;
@@ -204,8 +200,6 @@
switch2: switch2@0 {
compatible = "marvell,mv88e6085";
- #address-cells = <1>;
- #size-cells = <0>;
reg = <0>;
dsa,member = <0 2>;
diff --git a/Documentation/devicetree/bindings/net/dsa/qca8k.txt b/Documentation/devicetree/bindings/net/dsa/qca8k.txt
index 9c67ee4..bbcb255 100644
--- a/Documentation/devicetree/bindings/net/dsa/qca8k.txt
+++ b/Documentation/devicetree/bindings/net/dsa/qca8k.txt
@@ -2,7 +2,10 @@
Required properties:
-- compatible: should be "qca,qca8337"
+- compatible: should be one of:
+ "qca,qca8334"
+ "qca,qca8337"
+
- #size-cells: must be 0
- #address-cells: must be 1
@@ -14,6 +17,20 @@
referencing the internal PHY connected to it. The CPU port of this switch is
always port 0.
+A CPU port node has the following optional node:
+
+- fixed-link : Fixed-link subnode describing a link to a non-MDIO
+ managed entity. See
+ Documentation/devicetree/bindings/net/fixed-link.txt
+ for details.
+
+For QCA8K the 'fixed-link' sub-node supports only the following properties:
+
+- 'speed' (integer, mandatory), to indicate the link speed. Accepted
+ values are 10, 100 and 1000
+- 'full-duplex' (boolean, optional), to indicate that full duplex is
+ used. When absent, half duplex is assumed.
+
Example:
@@ -53,6 +70,10 @@
label = "cpu";
ethernet = <&gmac1>;
phy-mode = "rgmii";
+ fixed-link {
+ speed = 1000;
+ full-duplex;
+ };
};
port@1 {
diff --git a/Documentation/devicetree/bindings/net/dwmac-sun8i.txt b/Documentation/devicetree/bindings/net/dwmac-sun8i.txt
index 3d6d5fa..cfe7243 100644
--- a/Documentation/devicetree/bindings/net/dwmac-sun8i.txt
+++ b/Documentation/devicetree/bindings/net/dwmac-sun8i.txt
@@ -7,6 +7,7 @@
- compatible: must be one of the following string:
"allwinner,sun8i-a83t-emac"
"allwinner,sun8i-h3-emac"
+ "allwinner,sun8i-r40-gmac"
"allwinner,sun8i-v3s-emac"
"allwinner,sun50i-a64-emac"
- reg: address and length of the register for the device.
@@ -20,18 +21,18 @@
- phy-handle: See ethernet.txt
- #address-cells: shall be 1
- #size-cells: shall be 0
-- syscon: A phandle to the syscon of the SoC with one of the following
- compatible string:
- - allwinner,sun8i-h3-system-controller
- - allwinner,sun8i-v3s-system-controller
- - allwinner,sun50i-a64-system-controller
- - allwinner,sun8i-a83t-system-controller
+- syscon: A phandle to the device containing the EMAC or GMAC clock register
Optional properties:
-- allwinner,tx-delay-ps: TX clock delay chain value in ps. Range value is 0-700. Default is 0)
-- allwinner,rx-delay-ps: RX clock delay chain value in ps. Range value is 0-3100. Default is 0)
-Both delay properties need to be a multiple of 100. They control the delay for
-external PHY.
+- allwinner,tx-delay-ps: TX clock delay chain value in ps.
+ Range is 0-700. Default is 0.
+ Unavailable for allwinner,sun8i-r40-gmac
+- allwinner,rx-delay-ps: RX clock delay chain value in ps.
+ Range is 0-3100. Default is 0.
+ Range is 0-700 for allwinner,sun8i-r40-gmac
+Both delay properties need to be a multiple of 100. They control the
+clock delay for external RGMII PHY. They do not apply to the internal
+PHY or external non-RGMII PHYs.
Optional properties for the following compatibles:
- "allwinner,sun8i-h3-emac",
diff --git a/Documentation/devicetree/bindings/net/meson-dwmac.txt b/Documentation/devicetree/bindings/net/meson-dwmac.txt
index 61cada2..1321bb1 100644
--- a/Documentation/devicetree/bindings/net/meson-dwmac.txt
+++ b/Documentation/devicetree/bindings/net/meson-dwmac.txt
@@ -11,6 +11,7 @@
- "amlogic,meson8b-dwmac"
- "amlogic,meson8m2-dwmac"
- "amlogic,meson-gxbb-dwmac"
+ - "amlogic,meson-axg-dwmac"
Additionally "snps,dwmac" and any applicable more
detailed version number described in net/stmmac.txt
should be used.
diff --git a/Documentation/devicetree/bindings/net/microchip,lan78xx.txt b/Documentation/devicetree/bindings/net/microchip,lan78xx.txt
new file mode 100644
index 0000000..76786a0
--- /dev/null
+++ b/Documentation/devicetree/bindings/net/microchip,lan78xx.txt
@@ -0,0 +1,54 @@
+Microchip LAN78xx Gigabit Ethernet controller
+
+The LAN78XX devices are usually configured by programming their OTP or with
+an external EEPROM, but some platforms (e.g. Raspberry Pi 3 B+) have neither.
+The Device Tree properties, if present, override the OTP and EEPROM.
+
+Required properties:
+- compatible: Should be one of "usb424,7800", "usb424,7801" or "usb424,7850".
+
+Optional properties:
+- local-mac-address: see ethernet.txt
+- mac-address: see ethernet.txt
+
+Optional properties of the embedded PHY:
+- microchip,led-modes: a 0..4 element vector, with each element configuring
+ the operating mode of an LED. Omitted LEDs are turned off. Allowed values
+ are defined in "include/dt-bindings/net/microchip-lan78xx.h".
+
+Example:
+
+/* Based on the configuration for a Raspberry Pi 3 B+ */
+&usb {
+ usb-port@1 {
+ compatible = "usb424,2514";
+ reg = <1>;
+ #address-cells = <1>;
+ #size-cells = <0>;
+
+ usb-port@1 {
+ compatible = "usb424,2514";
+ reg = <1>;
+ #address-cells = <1>;
+ #size-cells = <0>;
+
+ ethernet: ethernet@1 {
+ compatible = "usb424,7800";
+ reg = <1>;
+ local-mac-address = [ 00 11 22 33 44 55 ];
+
+ mdio {
+ #address-cells = <0x1>;
+ #size-cells = <0x0>;
+ eth_phy: ethernet-phy@1 {
+ reg = <1>;
+ microchip,led-modes = <
+ LAN78XX_LINK_1000_ACTIVITY
+ LAN78XX_LINK_10_100_ACTIVITY
+ >;
+ };
+ };
+ };
+ };
+ };
+};
diff --git a/Documentation/devicetree/bindings/net/mscc-miim.txt b/Documentation/devicetree/bindings/net/mscc-miim.txt
new file mode 100644
index 0000000..7104679
--- /dev/null
+++ b/Documentation/devicetree/bindings/net/mscc-miim.txt
@@ -0,0 +1,26 @@
+Microsemi MII Management Controller (MIIM) / MDIO
+=================================================
+
+Properties:
+- compatible: must be "mscc,ocelot-miim"
+- reg: The base address of the MDIO bus controller register bank. Optionally, a
+ second register bank can be defined if there is an associated reset register
+ for internal PHYs
+- #address-cells: Must be <1>.
+- #size-cells: Must be <0>. MDIO addresses have no size component.
+- interrupts: interrupt specifier (refer to the interrupt binding)
+
+Typically an MDIO bus might have several children.
+
+Example:
+ mdio@107009c {
+ #address-cells = <1>;
+ #size-cells = <0>;
+ compatible = "mscc,ocelot-miim";
+ reg = <0x107009c 0x36>, <0x10700f0 0x8>;
+ interrupts = <14>;
+
+ phy0: ethernet-phy@0 {
+ reg = <0>;
+ };
+ };
diff --git a/Documentation/devicetree/bindings/net/mscc-ocelot.txt b/Documentation/devicetree/bindings/net/mscc-ocelot.txt
new file mode 100644
index 0000000..0a84711
--- /dev/null
+++ b/Documentation/devicetree/bindings/net/mscc-ocelot.txt
@@ -0,0 +1,82 @@
+Microsemi Ocelot network Switch
+===============================
+
+The Microsemi Ocelot network switch can be found on Microsemi SoCs (VSC7513,
+VSC7514)
+
+Required properties:
+- compatible: Should be "mscc,vsc7514-switch"
+- reg: Must contain an (offset, length) pair of the register set for each
+ entry in reg-names.
+- reg-names: Must include the following entries:
+ - "sys"
+ - "rew"
+ - "qs"
+ - "hsio"
+ - "qsys"
+ - "ana"
+ - "portX" with X from 0 to the number of last port index available on that
+ switch
+- interrupts: Should contain the switch interrupts for frame extraction and
+ frame injection
+- interrupt-names: should contain the interrupt names: "xtr", "inj"
+- ethernet-ports: A container for child nodes representing switch ports.
+
+The ethernet-ports container has the following properties
+
+Required properties:
+
+- #address-cells: Must be 1
+- #size-cells: Must be 0
+
+Each port node must have the following mandatory properties:
+- reg: Describes the port address in the switch
+
+Port nodes may also contain the following optional standardised
+properties, described in binding documents:
+
+- phy-handle: Phandle to a PHY on an MDIO bus. See
+ Documentation/devicetree/bindings/net/ethernet.txt for details.
+
+Example:
+
+ switch@1010000 {
+ compatible = "mscc,vsc7514-switch";
+ reg = <0x1010000 0x10000>,
+ <0x1030000 0x10000>,
+ <0x1080000 0x100>,
+ <0x10d0000 0x10000>,
+ <0x11e0000 0x100>,
+ <0x11f0000 0x100>,
+ <0x1200000 0x100>,
+ <0x1210000 0x100>,
+ <0x1220000 0x100>,
+ <0x1230000 0x100>,
+ <0x1240000 0x100>,
+ <0x1250000 0x100>,
+ <0x1260000 0x100>,
+ <0x1270000 0x100>,
+ <0x1280000 0x100>,
+ <0x1800000 0x80000>,
+ <0x1880000 0x10000>;
+ reg-names = "sys", "rew", "qs", "hsio", "port0",
+ "port1", "port2", "port3", "port4", "port5",
+ "port6", "port7", "port8", "port9", "port10",
+ "qsys", "ana";
+ interrupts = <21 22>;
+ interrupt-names = "xtr", "inj";
+
+ ethernet-ports {
+ #address-cells = <1>;
+ #size-cells = <0>;
+
+ port0: port@0 {
+ reg = <0>;
+ phy-handle = <&phy0>;
+ };
+ port1: port@1 {
+ reg = <1>;
+ phy-handle = <&phy1>;
+ };
+ };
+ };
diff --git a/Documentation/devicetree/bindings/net/qualcomm-bluetooth.txt b/Documentation/devicetree/bindings/net/qualcomm-bluetooth.txt
new file mode 100644
index 0000000..0ea18a5
--- /dev/null
+++ b/Documentation/devicetree/bindings/net/qualcomm-bluetooth.txt
@@ -0,0 +1,30 @@
+Qualcomm Bluetooth Chips
+---------------------
+
+This documents the binding structure and common properties for serial
+attached Qualcomm devices.
+
+Serial attached Qualcomm devices shall be a child node of the host UART
+device the slave device is attached to.
+
+Required properties:
+ - compatible: should contain one of the following:
+ * "qcom,qca6174-bt"
+
+Optional properties:
+ - enable-gpios: gpio specifier used to enable chip
+ - clocks: clock provided to the controller (SUSCLK_32KHZ)
+
+Example:
+
+serial@7570000 {
+ label = "BT-UART";
+ status = "okay";
+
+ bluetooth {
+ compatible = "qcom,qca6174-bt";
+
+ enable-gpios = <&pm8994_gpios 19 GPIO_ACTIVE_HIGH>;
+ clocks = <&divclk4>;
+ };
+};
diff --git a/Documentation/devicetree/bindings/net/sff,sfp.txt b/Documentation/devicetree/bindings/net/sff,sfp.txt
index 929591d..8321399 100644
--- a/Documentation/devicetree/bindings/net/sff,sfp.txt
+++ b/Documentation/devicetree/bindings/net/sff,sfp.txt
@@ -7,11 +7,11 @@
"sff,sfp" for SFP modules
"sff,sff" for soldered down SFF modules
-Optional Properties:
-
- i2c-bus : phandle of an I2C bus controller for the SFP two wire serial
interface
+Optional Properties:
+
- mod-def0-gpios : GPIO phandle and a specifier of the MOD-DEF0 (AKA Mod_ABS)
module presence input gpio signal, active (module absent) high. Must
not be present for SFF modules
diff --git a/Documentation/devicetree/bindings/net/sh_eth.txt b/Documentation/devicetree/bindings/net/sh_eth.txt
index 5172799..82a4cf2 100644
--- a/Documentation/devicetree/bindings/net/sh_eth.txt
+++ b/Documentation/devicetree/bindings/net/sh_eth.txt
@@ -14,6 +14,7 @@
"renesas,ether-r8a7791" if the device is a part of R8A7791 SoC.
"renesas,ether-r8a7793" if the device is a part of R8A7793 SoC.
"renesas,ether-r8a7794" if the device is a part of R8A7794 SoC.
+ "renesas,gether-r8a77980" if the device is a part of R8A77980 SoC.
"renesas,ether-r7s72100" if the device is a part of R7S72100 SoC.
"renesas,rcar-gen1-ether" for a generic R-Car Gen1 device.
"renesas,rcar-gen2-ether" for a generic R-Car Gen2 or RZ/G1
diff --git a/Documentation/devicetree/bindings/net/socionext,uniphier-ave4.txt b/Documentation/devicetree/bindings/net/socionext,uniphier-ave4.txt
index 96398cc..fc8f017 100644
--- a/Documentation/devicetree/bindings/net/socionext,uniphier-ave4.txt
+++ b/Documentation/devicetree/bindings/net/socionext,uniphier-ave4.txt
@@ -13,13 +13,25 @@
- reg: Address where registers are mapped and size of region.
- interrupts: Should contain the MAC interrupt.
- phy-mode: See ethernet.txt in the same directory. Allow to choose
- "rgmii", "rmii", or "mii" according to the PHY.
+ "rgmii", "rmii", "mii", or "internal" according to the PHY.
+ The acceptable mode is SoC-dependent.
- phy-handle: Should point to the external phy device.
See ethernet.txt file in the same directory.
- clocks: A phandle to the clock for the MAC.
+ For Pro4 SoC, that is "socionext,uniphier-pro4-ave4",
+ another MAC clock, GIO bus clock and PHY clock are also required.
+ - clock-names: Should contain
+ - "ether", "ether-gb", "gio", "ether-phy" for Pro4 SoC
+ - "ether" for others
+ - resets: A phandle to the reset control for the MAC. For Pro4 SoC,
+ GIO bus reset is also required.
+ - reset-names: Should contain
+ - "ether", "gio" for Pro4 SoC
+ - "ether" for others
+ - socionext,syscon-phy-mode: A phandle to syscon with one argument
+ that configures phy mode. The argument is the ID of MAC instance.
Optional properties:
- - resets: A phandle to the reset control for the MAC.
- local-mac-address: See ethernet.txt in the same directory.
Required subnode:
@@ -34,8 +46,11 @@
interrupts = <0 66 4>;
phy-mode = "rgmii";
phy-handle = <ðphy>;
+ clock-names = "ether";
clocks = <&sys_clk 6>;
+ reset-names = "ether";
resets = <&sys_rst 6>;
+ socionext,syscon-phy-mode = <&soc_glue 0>;
local-mac-address = [00 00 00 00 00 00];
mdio {
diff --git a/Documentation/devicetree/bindings/net/wireless/qcom,ath10k.txt b/Documentation/devicetree/bindings/net/wireless/qcom,ath10k.txt
index 3d2a031..7fd4e8c 100644
--- a/Documentation/devicetree/bindings/net/wireless/qcom,ath10k.txt
+++ b/Documentation/devicetree/bindings/net/wireless/qcom,ath10k.txt
@@ -4,6 +4,7 @@
- compatible: Should be one of the following:
* "qcom,ath10k"
* "qcom,ipq4019-wifi"
+ * "qcom,wcn3990-wifi"
PCI based devices uses compatible string "qcom,ath10k" and takes calibration
data along with board specific data via "qcom,ath10k-calibration-data".
@@ -18,8 +19,12 @@
"qcom,ath10k-calibration-data" conflict with each other and only one
can be provided per device.
+SNOC based devices (i.e. wcn3990) uses compatible string "qcom,wcn3990-wifi".
+
Optional properties:
- reg: Address and length of the register set for the device.
+- reg-names: Must include the list of following reg names,
+ "membase"
- resets: Must contain an entry for each entry in reset-names.
See ../reset/reseti.txt for details.
- reset-names: Must include the list of following reset names,
@@ -49,6 +54,8 @@
hw versions.
- qcom,ath10k-pre-calibration-data : pre calibration data as an array,
the length can vary between hw versions.
+- <supply-name>-supply: handle to the regulator device tree node
+ optional "supply-name" is "vdd-0.8-cx-mx".
Example (to supply the calibration data alone):
@@ -119,3 +126,27 @@
qcom,msi_base = <0x40>;
qcom,ath10k-pre-calibration-data = [ 01 02 03 ... ];
};
+
+Example (to supply wcn3990 SoC wifi block details):
+
+wifi@18000000 {
+ compatible = "qcom,wcn3990-wifi";
+ reg = <0x18800000 0x800000>;
+ reg-names = "membase";
+ clocks = <&clock_gcc clk_aggre2_noc_clk>;
+ clock-names = "smmu_aggre2_noc_clk"
+ interrupts =
+ <0 130 0 /* CE0 */ >,
+ <0 131 0 /* CE1 */ >,
+ <0 132 0 /* CE2 */ >,
+ <0 133 0 /* CE3 */ >,
+ <0 134 0 /* CE4 */ >,
+ <0 135 0 /* CE5 */ >,
+ <0 136 0 /* CE6 */ >,
+ <0 137 0 /* CE7 */ >,
+ <0 138 0 /* CE8 */ >,
+ <0 139 0 /* CE9 */ >,
+ <0 140 0 /* CE10 */ >,
+ <0 141 0 /* CE11 */ >;
+ vdd-0.8-cx-mx-supply = <&pm8998_l5>;
+};
diff --git a/Documentation/devicetree/bindings/soc/ti/keystone-navigator-qmss.txt b/Documentation/devicetree/bindings/soc/ti/keystone-navigator-qmss.txt
index 77cd42c..b025770 100644
--- a/Documentation/devicetree/bindings/soc/ti/keystone-navigator-qmss.txt
+++ b/Documentation/devicetree/bindings/soc/ti/keystone-navigator-qmss.txt
@@ -17,7 +17,8 @@
Required properties:
-- compatible : Must be "ti,keystone-navigator-qmss";
+- compatible : Must be "ti,keystone-navigator-qmss".
+ : Must be "ti,66ak2g-navss-qm" for QMSS on K2G SoC.
- clocks : phandle to the reference clock for this device.
- queue-range : <start number> total range of queue numbers for the device.
- linkram0 : <address size> for internal link ram, where size is the total
@@ -39,6 +40,12 @@
- Descriptor memory setup region.
- Queue Management/Queue Proxy region for queue Push.
- Queue Management/Queue Proxy region for queue Pop.
+
+For QMSS on K2G SoC, following QM reg indexes are used in that order
+ - Queue Peek region.
+ - Queue configuration region.
+ - Queue Management/Queue Proxy region for queue Push/Pop.
+
- queue-pools : child node classifying the queue ranges into pools.
Queue ranges are grouped into 3 type of pools:
- qpend : pool of qpend(interruptible) queues
diff --git a/Documentation/filesystems/nfs/nfsroot.txt b/Documentation/filesystems/nfs/nfsroot.txt
index 5efae00..d296312 100644
--- a/Documentation/filesystems/nfs/nfsroot.txt
+++ b/Documentation/filesystems/nfs/nfsroot.txt
@@ -5,6 +5,7 @@
Updated 1997 by Martin Mares <mj@atrey.karlin.mff.cuni.cz>
Updated 2006 by Nico Schottelius <nico-kernel-nfsroot@schottelius.org>
Updated 2006 by Horms <horms@verge.net.au>
+Updated 2018 by Chris Novakovic <chris@chrisn.me.uk>
@@ -79,7 +80,7 @@
ip=<client-ip>:<server-ip>:<gw-ip>:<netmask>:<hostname>:<device>:<autoconf>:
- <dns0-ip>:<dns1-ip>
+ <dns0-ip>:<dns1-ip>:<ntp0-ip>
This parameter tells the kernel how to configure IP addresses of devices
and also how to set up the IP routing table. It was originally called
@@ -110,6 +111,9 @@
will not be triggered if it is missing and NFS root is not
in operation.
+ Value is exported to /proc/net/pnp with the prefix "bootserver "
+ (see below).
+
Default: Determined using autoconfiguration.
The address of the autoconfiguration server is used.
@@ -123,10 +127,13 @@
Default: Determined using autoconfiguration.
- <hostname> Name of the client. May be supplied by autoconfiguration,
- but its absence will not trigger autoconfiguration.
- If specified and DHCP is used, the user provided hostname will
- be carried in the DHCP request to hopefully update DNS record.
+ <hostname> Name of the client. If a '.' character is present, anything
+ before the first '.' is used as the client's hostname, and anything
+ after it is used as its NIS domain name. May be supplied by
+ autoconfiguration, but its absence will not trigger autoconfiguration.
+ If specified and DHCP is used, the user-provided hostname (and NIS
+ domain name, if present) will be carried in the DHCP request; this
+ may cause a DNS record to be created or updated for the client.
Default: Client IP address is used in ASCII notation.
@@ -162,12 +169,55 @@
Default: any
- <dns0-ip> IP address of first nameserver.
- Value gets exported by /proc/net/pnp which is often linked
- on embedded systems by /etc/resolv.conf.
+ <dns0-ip> IP address of primary nameserver.
+ Value is exported to /proc/net/pnp with the prefix "nameserver "
+ (see below).
- <dns1-ip> IP address of second nameserver.
- Same as above.
+ Default: None if not using autoconfiguration; determined
+ automatically if using autoconfiguration.
+
+ <dns1-ip> IP address of secondary nameserver.
+ See <dns0-ip>.
+
+ <ntp0-ip> IP address of a Network Time Protocol (NTP) server.
+ Value is exported to /proc/net/ipconfig/ntp_servers, but is
+ otherwise unused (see below).
+
+ Default: None if not using autoconfiguration; determined
+ automatically if using autoconfiguration.
+
+ After configuration (whether manual or automatic) is complete, two files
+ are created in the following format; lines are omitted if their respective
+ value is empty following configuration:
+
+ - /proc/net/pnp:
+
+ #PROTO: <DHCP|BOOTP|RARP|MANUAL> (depending on configuration method)
+ domain <dns-domain> (if autoconfigured, the DNS domain)
+ nameserver <dns0-ip> (primary name server IP)
+ nameserver <dns1-ip> (secondary name server IP)
+ nameserver <dns2-ip> (tertiary name server IP)
+ bootserver <server-ip> (NFS server IP)
+
+ - /proc/net/ipconfig/ntp_servers:
+
+ <ntp0-ip> (NTP server IP)
+ <ntp1-ip> (NTP server IP)
+ <ntp2-ip> (NTP server IP)
+
+ <dns-domain> and <dns2-ip> (in /proc/net/pnp) and <ntp1-ip> and <ntp2-ip>
+ (in /proc/net/ipconfig/ntp_servers) are requested during autoconfiguration;
+ they cannot be specified as part of the "ip=" kernel command line parameter.
+
+ Because the "domain" and "nameserver" options are recognised by DNS
+ resolvers, /etc/resolv.conf is often linked to /proc/net/pnp on systems
+ that use an NFS root filesystem.
+
+ Note that the kernel will not synchronise the system time with any NTP
+ servers it discovers; this is the responsibility of a user space process
+ (e.g. an initrd/initramfs script that passes the IP addresses listed in
+ /proc/net/ipconfig/ntp_servers to an NTP client before mounting the real
+ root filesystem if it is on NFS).
nfsrootdebug
diff --git a/Documentation/networking/af_xdp.rst b/Documentation/networking/af_xdp.rst
new file mode 100644
index 0000000..91928d9
--- /dev/null
+++ b/Documentation/networking/af_xdp.rst
@@ -0,0 +1,297 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+======
+AF_XDP
+======
+
+Overview
+========
+
+AF_XDP is an address family that is optimized for high performance
+packet processing.
+
+This document assumes that the reader is familiar with BPF and XDP. If
+not, the Cilium project has an excellent reference guide at
+http://cilium.readthedocs.io/en/doc-1.0/bpf/.
+
+Using the XDP_REDIRECT action from an XDP program, the program can
+redirect ingress frames to other XDP enabled netdevs, using the
+bpf_redirect_map() function. AF_XDP sockets enable the possibility for
+XDP programs to redirect frames to a memory buffer in a user-space
+application.
+
+An AF_XDP socket (XSK) is created with the normal socket()
+syscall. Associated with each XSK are two rings: the RX ring and the
+TX ring. A socket can receive packets on the RX ring and it can send
+packets on the TX ring. These rings are registered and sized with the
+setsockopts XDP_RX_RING and XDP_TX_RING, respectively. It is mandatory
+to have at least one of these rings for each socket. An RX or TX
+descriptor ring points to a data buffer in a memory area called a
+UMEM. RX and TX can share the same UMEM so that a packet does not have
+to be copied between RX and TX. Moreover, if a packet needs to be kept
+for a while due to a possible retransmit, the descriptor that points
+to that packet can be changed to point to another and reused right
+away. This again avoids copying data.
+
+The UMEM consists of a number of equally size frames and each frame
+has a unique frame id. A descriptor in one of the rings references a
+frame by referencing its frame id. The user space allocates memory for
+this UMEM using whatever means it feels is most appropriate (malloc,
+mmap, huge pages, etc). This memory area is then registered with the
+kernel using the new setsockopt XDP_UMEM_REG. The UMEM also has two
+rings: the FILL ring and the COMPLETION ring. The fill ring is used by
+the application to send down frame ids for the kernel to fill in with
+RX packet data. References to these frames will then appear in the RX
+ring once each packet has been received. The completion ring, on the
+other hand, contains frame ids that the kernel has transmitted
+completely and can now be used again by user space, for either TX or
+RX. Thus, the frame ids appearing in the completion ring are ids that
+were previously transmitted using the TX ring. In summary, the RX and
+FILL rings are used for the RX path and the TX and COMPLETION rings
+are used for the TX path.
+
+The socket is then finally bound with a bind() call to a device and a
+specific queue id on that device, and it is not until bind is
+completed that traffic starts to flow.
+
+The UMEM can be shared between processes, if desired. If a process
+wants to do this, it simply skips the registration of the UMEM and its
+corresponding two rings, sets the XDP_SHARED_UMEM flag in the bind
+call and submits the XSK of the process it would like to share UMEM
+with as well as its own newly created XSK socket. The new process will
+then receive frame id references in its own RX ring that point to this
+shared UMEM. Note that since the ring structures are single-consumer /
+single-producer (for performance reasons), the new process has to
+create its own socket with associated RX and TX rings, since it cannot
+share this with the other process. This is also the reason that there
+is only one set of FILL and COMPLETION rings per UMEM. It is the
+responsibility of a single process to handle the UMEM.
+
+How is then packets distributed from an XDP program to the XSKs? There
+is a BPF map called XSKMAP (or BPF_MAP_TYPE_XSKMAP in full). The
+user-space application can place an XSK at an arbitrary place in this
+map. The XDP program can then redirect a packet to a specific index in
+this map and at this point XDP validates that the XSK in that map was
+indeed bound to that device and ring number. If not, the packet is
+dropped. If the map is empty at that index, the packet is also
+dropped. This also means that it is currently mandatory to have an XDP
+program loaded (and one XSK in the XSKMAP) to be able to get any
+traffic to user space through the XSK.
+
+AF_XDP can operate in two different modes: XDP_SKB and XDP_DRV. If the
+driver does not have support for XDP, or XDP_SKB is explicitly chosen
+when loading the XDP program, XDP_SKB mode is employed that uses SKBs
+together with the generic XDP support and copies out the data to user
+space. A fallback mode that works for any network device. On the other
+hand, if the driver has support for XDP, it will be used by the AF_XDP
+code to provide better performance, but there is still a copy of the
+data into user space.
+
+Concepts
+========
+
+In order to use an AF_XDP socket, a number of associated objects need
+to be setup.
+
+Jonathan Corbet has also written an excellent article on LWN,
+"Accelerating networking with AF_XDP". It can be found at
+https://lwn.net/Articles/750845/.
+
+UMEM
+----
+
+UMEM is a region of virtual contiguous memory, divided into
+equal-sized frames. An UMEM is associated to a netdev and a specific
+queue id of that netdev. It is created and configured (frame size,
+frame headroom, start address and size) by using the XDP_UMEM_REG
+setsockopt system call. A UMEM is bound to a netdev and queue id, via
+the bind() system call.
+
+An AF_XDP is socket linked to a single UMEM, but one UMEM can have
+multiple AF_XDP sockets. To share an UMEM created via one socket A,
+the next socket B can do this by setting the XDP_SHARED_UMEM flag in
+struct sockaddr_xdp member sxdp_flags, and passing the file descriptor
+of A to struct sockaddr_xdp member sxdp_shared_umem_fd.
+
+The UMEM has two single-producer/single-consumer rings, that are used
+to transfer ownership of UMEM frames between the kernel and the
+user-space application.
+
+Rings
+-----
+
+There are a four different kind of rings: Fill, Completion, RX and
+TX. All rings are single-producer/single-consumer, so the user-space
+application need explicit synchronization of multiple
+processes/threads are reading/writing to them.
+
+The UMEM uses two rings: Fill and Completion. Each socket associated
+with the UMEM must have an RX queue, TX queue or both. Say, that there
+is a setup with four sockets (all doing TX and RX). Then there will be
+one Fill ring, one Completion ring, four TX rings and four RX rings.
+
+The rings are head(producer)/tail(consumer) based rings. A producer
+writes the data ring at the index pointed out by struct xdp_ring
+producer member, and increasing the producer index. A consumer reads
+the data ring at the index pointed out by struct xdp_ring consumer
+member, and increasing the consumer index.
+
+The rings are configured and created via the _RING setsockopt system
+calls and mmapped to user-space using the appropriate offset to mmap()
+(XDP_PGOFF_RX_RING, XDP_PGOFF_TX_RING, XDP_UMEM_PGOFF_FILL_RING and
+XDP_UMEM_PGOFF_COMPLETION_RING).
+
+The size of the rings need to be of size power of two.
+
+UMEM Fill Ring
+~~~~~~~~~~~~~~
+
+The Fill ring is used to transfer ownership of UMEM frames from
+user-space to kernel-space. The UMEM indicies are passed in the
+ring. As an example, if the UMEM is 64k and each frame is 4k, then the
+UMEM has 16 frames and can pass indicies between 0 and 15.
+
+Frames passed to the kernel are used for the ingress path (RX rings).
+
+The user application produces UMEM indicies to this ring.
+
+UMEM Completetion Ring
+~~~~~~~~~~~~~~~~~~~~~~
+
+The Completion Ring is used transfer ownership of UMEM frames from
+kernel-space to user-space. Just like the Fill ring, UMEM indicies are
+used.
+
+Frames passed from the kernel to user-space are frames that has been
+sent (TX ring) and can be used by user-space again.
+
+The user application consumes UMEM indicies from this ring.
+
+
+RX Ring
+~~~~~~~
+
+The RX ring is the receiving side of a socket. Each entry in the ring
+is a struct xdp_desc descriptor. The descriptor contains UMEM index
+(idx), the length of the data (len), the offset into the frame
+(offset).
+
+If no frames have been passed to kernel via the Fill ring, no
+descriptors will (or can) appear on the RX ring.
+
+The user application consumes struct xdp_desc descriptors from this
+ring.
+
+TX Ring
+~~~~~~~
+
+The TX ring is used to send frames. The struct xdp_desc descriptor is
+filled (index, length and offset) and passed into the ring.
+
+To start the transfer a sendmsg() system call is required. This might
+be relaxed in the future.
+
+The user application produces struct xdp_desc descriptors to this
+ring.
+
+XSKMAP / BPF_MAP_TYPE_XSKMAP
+----------------------------
+
+On XDP side there is a BPF map type BPF_MAP_TYPE_XSKMAP (XSKMAP) that
+is used in conjunction with bpf_redirect_map() to pass the ingress
+frame to a socket.
+
+The user application inserts the socket into the map, via the bpf()
+system call.
+
+Note that if an XDP program tries to redirect to a socket that does
+not match the queue configuration and netdev, the frame will be
+dropped. E.g. an AF_XDP socket is bound to netdev eth0 and
+queue 17. Only the XDP program executing for eth0 and queue 17 will
+successfully pass data to the socket. Please refer to the sample
+application (samples/bpf/) in for an example.
+
+Usage
+=====
+
+In order to use AF_XDP sockets there are two parts needed. The
+user-space application and the XDP program. For a complete setup and
+usage example, please refer to the sample application. The user-space
+side is xdpsock_user.c and the XDP side xdpsock_kern.c.
+
+Naive ring dequeue and enqueue could look like this::
+
+ // typedef struct xdp_rxtx_ring RING;
+ // typedef struct xdp_umem_ring RING;
+
+ // typedef struct xdp_desc RING_TYPE;
+ // typedef __u32 RING_TYPE;
+
+ int dequeue_one(RING *ring, RING_TYPE *item)
+ {
+ __u32 entries = ring->ptrs.producer - ring->ptrs.consumer;
+
+ if (entries == 0)
+ return -1;
+
+ // read-barrier!
+
+ *item = ring->desc[ring->ptrs.consumer & (RING_SIZE - 1)];
+ ring->ptrs.consumer++;
+ return 0;
+ }
+
+ int enqueue_one(RING *ring, const RING_TYPE *item)
+ {
+ u32 free_entries = RING_SIZE - (ring->ptrs.producer - ring->ptrs.consumer);
+
+ if (free_entries == 0)
+ return -1;
+
+ ring->desc[ring->ptrs.producer & (RING_SIZE - 1)] = *item;
+
+ // write-barrier!
+
+ ring->ptrs.producer++;
+ return 0;
+ }
+
+
+For a more optimized version, please refer to the sample application.
+
+Sample application
+==================
+
+There is a xdpsock benchmarking/test application included that
+demonstrates how to use AF_XDP sockets with both private and shared
+UMEMs. Say that you would like your UDP traffic from port 4242 to end
+up in queue 16, that we will enable AF_XDP on. Here, we use ethtool
+for this::
+
+ ethtool -N p3p2 rx-flow-hash udp4 fn
+ ethtool -N p3p2 flow-type udp4 src-port 4242 dst-port 4242 \
+ action 16
+
+Running the rxdrop benchmark in XDP_DRV mode can then be done
+using::
+
+ samples/bpf/xdpsock -i p3p2 -q 16 -r -N
+
+For XDP_SKB mode, use the switch "-S" instead of "-N" and all options
+can be displayed with "-h", as usual.
+
+Credits
+=======
+
+- Björn Töpel (AF_XDP core)
+- Magnus Karlsson (AF_XDP core)
+- Alexander Duyck
+- Alexei Starovoitov
+- Daniel Borkmann
+- Jesper Dangaard Brouer
+- John Fastabend
+- Jonathan Corbet (LWN coverage)
+- Michael S. Tsirkin
+- Qi Z Zhang
+- Willem de Bruijn
+
diff --git a/Documentation/networking/bonding.txt b/Documentation/networking/bonding.txt
index 9ba04c0..c13214d 100644
--- a/Documentation/networking/bonding.txt
+++ b/Documentation/networking/bonding.txt
@@ -140,7 +140,7 @@
Module options may be given as command line arguments to the
insmod or modprobe command, but are usually specified in either the
-/etc/modrobe.d/*.conf configuration files, or in a distro-specific
+/etc/modprobe.d/*.conf configuration files, or in a distro-specific
configuration file (some of which are detailed in the next section).
Details on bonding support for sysfs is provided in the
diff --git a/Documentation/networking/filter.txt b/Documentation/networking/filter.txt
index fd55c7d..e6b4ebb 100644
--- a/Documentation/networking/filter.txt
+++ b/Documentation/networking/filter.txt
@@ -483,6 +483,12 @@
[ 3389.935851] JIT code: 00000030: 00 e8 28 94 ff e0 83 f8 01 75 07 b8 ff ff 00 00
[ 3389.935852] JIT code: 00000040: eb 02 31 c0 c9 c3
+When CONFIG_BPF_JIT_ALWAYS_ON is enabled, bpf_jit_enable is permanently set to 1 and
+setting any other value than that will return in failure. This is even the case for
+setting bpf_jit_enable to 2, since dumping the final JIT image into the kernel log
+is discouraged and introspection through bpftool (under tools/bpf/bpftool/) is the
+generally recommended approach instead.
+
In the kernel source tree under tools/bpf/, there's bpf_jit_disasm for
generating disassembly out of the kernel log's hexdump:
@@ -1136,6 +1142,7 @@
the low 8 are unknown - which is represented as the tnum (0x0; 0xff). If we
then OR this with 0x40, we get (0x40; 0xbf), then if we add 1 we get (0x0;
0x1ff), because of potential carries.
+
Besides arithmetic, the register state can also be updated by conditional
branches. For instance, if a SCALAR_VALUE is compared > 8, in the 'true' branch
it will have a umin_value (unsigned minimum value) of 9, whereas in the 'false'
@@ -1144,14 +1151,16 @@
from the signed and unsigned bounds can be combined; for instance if a value is
first tested < 8 and then tested s> 4, the verifier will conclude that the value
is also > 4 and s< 8, since the bounds prevent crossing the sign boundary.
+
PTR_TO_PACKETs with a variable offset part have an 'id', which is common to all
pointers sharing that same variable offset. This is important for packet range
-checks: after adding some variable to a packet pointer, if you then copy it to
-another register and (say) add a constant 4, both registers will share the same
-'id' but one will have a fixed offset of +4. Then if it is bounds-checked and
-found to be less than a PTR_TO_PACKET_END, the other register is now known to
-have a safe range of at least 4 bytes. See 'Direct packet access', below, for
-more on PTR_TO_PACKET ranges.
+checks: after adding a variable to a packet pointer register A, if you then copy
+it to another register B and then add a constant 4 to A, both registers will
+share the same 'id' but the A will have a fixed offset of +4. Then if A is
+bounds-checked and found to be less than a PTR_TO_PACKET_END, the register B is
+now known to have a safe range of at least 4 bytes. See 'Direct packet access',
+below, for more on PTR_TO_PACKET ranges.
+
The 'id' field is also used on PTR_TO_MAP_VALUE_OR_NULL, common to all copies of
the pointer returned from a map lookup. This means that when one copy is
checked and found to be non-NULL, all copies can become PTR_TO_MAP_VALUEs.
diff --git a/Documentation/networking/index.rst b/Documentation/networking/index.rst
index f204eaf..cbd9bdd 100644
--- a/Documentation/networking/index.rst
+++ b/Documentation/networking/index.rst
@@ -6,6 +6,7 @@
.. toctree::
:maxdepth: 2
+ af_xdp
batman-adv
can
dpaa2/index
diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt
index 35ffaa2..924bd51 100644
--- a/Documentation/networking/ip-sysctl.txt
+++ b/Documentation/networking/ip-sysctl.txt
@@ -449,8 +449,10 @@
features.
RACK: 0x1 enables the RACK loss detection for fast detection of lost
- retransmissions and tail drops.
+ retransmissions and tail drops. It also subsumes and disables
+ RFC6675 recovery for SACK connections.
RACK: 0x2 makes RACK's reordering window static (min_rtt/4).
+ RACK: 0x4 disables RACK's DUPACK threshold heuristic
Default: 0x1
@@ -523,6 +525,19 @@
tcp_sack - BOOLEAN
Enable select acknowledgments (SACKS).
+tcp_comp_sack_delay_ns - LONG INTEGER
+ TCP tries to reduce number of SACK sent, using a timer
+ based on 5% of SRTT, capped by this sysctl, in nano seconds.
+ The default is 1ms, based on TSO autosizing period.
+
+ Default : 1,000,000 ns (1 ms)
+
+tcp_comp_sack_nr - INTEGER
+ Max numer of SACK that can be compressed.
+ Using 0 disables SACK compression.
+
+ Detault : 44
+
tcp_slow_start_after_idle - BOOLEAN
If set, provide RFC2861 behavior and time out the congestion
window after an idle period. An idle period is defined at
@@ -1428,6 +1443,19 @@
ip6frag_time - INTEGER
Time in seconds to keep an IPv6 fragment in memory.
+IPv6 Segment Routing:
+
+seg6_flowlabel - INTEGER
+ Controls the behaviour of computing the flowlabel of outer
+ IPv6 header in case of SR T.encaps
+
+ -1 set flowlabel to zero.
+ 0 copy flowlabel from Inner packet in case of Inner IPv6
+ (Set flowlabel to 0 in case IPv4/L2)
+ 1 Compute the flowlabel using seg6_make_flowlabel()
+
+ Default is 0.
+
conf/default/*:
Change the interface-specific default settings.
diff --git a/Documentation/networking/netdev-features.txt b/Documentation/networking/netdev-features.txt
index c77f9d5..c4a54c1 100644
--- a/Documentation/networking/netdev-features.txt
+++ b/Documentation/networking/netdev-features.txt
@@ -113,6 +113,13 @@
NETIF_F_TSO_ECN means that hardware can properly split packets with CWR bit
set, be it TCPv4 (when NETIF_F_TSO is enabled) or TCPv6 (NETIF_F_TSO6).
+ * Transmit UDP segmentation offload
+
+NETIF_F_GSO_UDP_GSO_L4 accepts a single UDP header with a payload that exceeds
+gso_size. On segmentation, it segments the payload on gso_size boundaries and
+replicates the network and UDP headers (fixing up the last one if less than
+gso_size).
+
* Transmit DMA from high memory
On platforms where this is relevant, NETIF_F_HIGHDMA signals that
diff --git a/Documentation/sysctl/net.txt b/Documentation/sysctl/net.txt
index 5992602..9ecde51 100644
--- a/Documentation/sysctl/net.txt
+++ b/Documentation/sysctl/net.txt
@@ -45,6 +45,7 @@
translate these BPF proglets into native CPU instructions. There are
two flavors of JITs, the newer eBPF JIT currently supported on:
- x86_64
+ - x86_32
- arm64
- arm32
- ppc64
diff --git a/MAINTAINERS b/MAINTAINERS
index ca4afd6..f492431 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -2728,7 +2728,6 @@
F: Documentation/bpf/
F: include/linux/bpf*
F: include/linux/filter.h
-F: include/trace/events/bpf.h
F: include/trace/events/xdp.h
F: include/uapi/linux/bpf*
F: include/uapi/linux/filter.h
@@ -5317,7 +5316,6 @@
F: include/linux/of_net.h
F: include/linux/phy.h
F: include/linux/phy_fixed.h
-F: include/linux/platform_data/mdio-gpio.h
F: include/linux/platform_data/mdio-bcm-unimac.h
F: include/trace/events/mdio.h
F: include/uapi/linux/mdio.h
@@ -8472,6 +8470,7 @@
L: netdev@vger.kernel.org
S: Maintained
F: drivers/net/dsa/mv88e6xxx/
+F: linux/platform_data/mv88e6xxx.h
F: Documentation/devicetree/bindings/net/dsa/marvell.txt
MARVELL ARMADA DRM SUPPORT
@@ -9025,12 +9024,14 @@
Q: http://patchwork.ozlabs.org/project/netdev/list/
F: drivers/net/ethernet/mellanox/mlx5/core/en_*
-MELLANOX ETHERNET INNOVA DRIVER
+MELLANOX ETHERNET INNOVA DRIVERS
R: Boris Pismenny <borisp@mellanox.com>
L: netdev@vger.kernel.org
S: Supported
W: http://www.mellanox.com
Q: http://patchwork.ozlabs.org/project/netdev/list/
+F: drivers/net/ethernet/mellanox/mlx5/core/en_accel/*
+F: drivers/net/ethernet/mellanox/mlx5/core/accel/*
F: drivers/net/ethernet/mellanox/mlx5/core/fpga/*
F: include/linux/mlx5/mlx5_ifc_fpga.h
@@ -9290,6 +9291,12 @@
F: include/uapi/linux/cciss*.h
F: Documentation/scsi/smartpqi.txt
+MICROSEMI ETHERNET SWITCH DRIVER
+M: Alexandre Belloni <alexandre.belloni@bootlin.com>
+L: netdev@vger.kernel.org
+S: Supported
+F: drivers/net/ethernet/mscc/
+
MICROSOFT SURFACE PRO 3 BUTTON DRIVER
M: Chen Yu <yu.c.chen@intel.com>
L: platform-driver-x86@vger.kernel.org
@@ -9833,6 +9840,7 @@
F: net/netfilter/xt_SECMARK.c
NETWORKING [TLS]
+M: Boris Pismenny <borisp@mellanox.com>
M: Aviad Yehezkel <aviadye@mellanox.com>
M: Dave Watson <davejwatson@fb.com>
L: netdev@vger.kernel.org
@@ -13401,6 +13409,7 @@
STMMAC ETHERNET DRIVER
M: Giuseppe Cavallaro <peppe.cavallaro@st.com>
M: Alexandre Torgue <alexandre.torgue@st.com>
+M: Jose Abreu <joabreu@synopsys.com>
L: netdev@vger.kernel.org
W: http://www.stlinux.com
S: Supported
@@ -14617,7 +14626,9 @@
M: Microchip Linux Driver Support <UNGLinuxDriver@microchip.com>
L: netdev@vger.kernel.org
S: Maintained
+F: Documentation/devicetree/bindings/net/microchip,lan78xx.txt
F: drivers/net/usb/lan78xx.*
+F: include/dt-bindings/net/microchip-lan78xx.h
USB MASS STORAGE DRIVER
M: Alan Stern <stern@rowland.harvard.edu>
@@ -15405,6 +15416,14 @@
S: Maintained
F: drivers/media/tuners/tuner-xc2028.*
+XDP SOCKETS (AF_XDP)
+M: Björn Töpel <bjorn.topel@intel.com>
+M: Magnus Karlsson <magnus.karlsson@intel.com>
+L: netdev@vger.kernel.org
+S: Maintained
+F: kernel/bpf/xskmap.c
+F: net/xdp/
+
XEN BLOCK SUBSYSTEM
M: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
M: Roger Pau Monné <roger.pau@citrix.com>
diff --git a/arch/arm/net/bpf_jit_32.c b/arch/arm/net/bpf_jit_32.c
index b5030e1..d3ea645 100644
--- a/arch/arm/net/bpf_jit_32.c
+++ b/arch/arm/net/bpf_jit_32.c
@@ -234,18 +234,11 @@
#define SCRATCH_SIZE 80
/* total stack size used in JITed code */
-#define _STACK_SIZE \
- (ctx->prog->aux->stack_depth + \
- + SCRATCH_SIZE + \
- + 4 /* extra for skb_copy_bits buffer */)
-
-#define STACK_SIZE ALIGN(_STACK_SIZE, STACK_ALIGNMENT)
+#define _STACK_SIZE (ctx->prog->aux->stack_depth + SCRATCH_SIZE)
+#define STACK_SIZE ALIGN(_STACK_SIZE, STACK_ALIGNMENT)
/* Get the offset of eBPF REGISTERs stored on scratch space. */
-#define STACK_VAR(off) (STACK_SIZE-off-4)
-
-/* Offset of skb_copy_bits buffer */
-#define SKB_BUFFER STACK_VAR(SCRATCH_SIZE)
+#define STACK_VAR(off) (STACK_SIZE - off)
#if __LINUX_ARM_ARCH__ < 7
@@ -1452,83 +1445,6 @@
emit(ARM_LDR_I(rn, ARM_SP, STACK_VAR(src_lo)), ctx);
emit_ldx_r(dst, rn, dstk, off, ctx, BPF_SIZE(code));
break;
- /* R0 = ntohx(*(size *)(((struct sk_buff *)R6)->data + imm)) */
- case BPF_LD | BPF_ABS | BPF_W:
- case BPF_LD | BPF_ABS | BPF_H:
- case BPF_LD | BPF_ABS | BPF_B:
- /* R0 = ntohx(*(size *)(((struct sk_buff *)R6)->data + src + imm)) */
- case BPF_LD | BPF_IND | BPF_W:
- case BPF_LD | BPF_IND | BPF_H:
- case BPF_LD | BPF_IND | BPF_B:
- {
- const u8 r4 = bpf2a32[BPF_REG_6][1]; /* r4 = ptr to sk_buff */
- const u8 r0 = bpf2a32[BPF_REG_0][1]; /*r0: struct sk_buff *skb*/
- /* rtn value */
- const u8 r1 = bpf2a32[BPF_REG_0][0]; /* r1: int k */
- const u8 r2 = bpf2a32[BPF_REG_1][1]; /* r2: unsigned int size */
- const u8 r3 = bpf2a32[BPF_REG_1][0]; /* r3: void *buffer */
- const u8 r6 = bpf2a32[TMP_REG_1][1]; /* r6: void *(*func)(..) */
- int size;
-
- /* Setting up first argument */
- emit(ARM_MOV_R(r0, r4), ctx);
-
- /* Setting up second argument */
- emit_a32_mov_i(r1, imm, false, ctx);
- if (BPF_MODE(code) == BPF_IND)
- emit_a32_alu_r(r1, src_lo, false, sstk, ctx,
- false, false, BPF_ADD);
-
- /* Setting up third argument */
- switch (BPF_SIZE(code)) {
- case BPF_W:
- size = 4;
- break;
- case BPF_H:
- size = 2;
- break;
- case BPF_B:
- size = 1;
- break;
- default:
- return -EINVAL;
- }
- emit_a32_mov_i(r2, size, false, ctx);
-
- /* Setting up fourth argument */
- emit(ARM_ADD_I(r3, ARM_SP, imm8m(SKB_BUFFER)), ctx);
-
- /* Setting up function pointer to call */
- emit_a32_mov_i(r6, (unsigned int)bpf_load_pointer, false, ctx);
- emit_blx_r(r6, ctx);
-
- emit(ARM_EOR_R(r1, r1, r1), ctx);
- /* Check if return address is NULL or not.
- * if NULL then jump to epilogue
- * else continue to load the value from retn address
- */
- emit(ARM_CMP_I(r0, 0), ctx);
- jmp_offset = epilogue_offset(ctx);
- check_imm24(jmp_offset);
- _emit(ARM_COND_EQ, ARM_B(jmp_offset), ctx);
-
- /* Load value from the address */
- switch (BPF_SIZE(code)) {
- case BPF_W:
- emit(ARM_LDR_I(r0, r0, 0), ctx);
- emit_rev32(r0, r0, ctx);
- break;
- case BPF_H:
- emit(ARM_LDRH_I(r0, r0, 0), ctx);
- emit_rev16(r0, r0, ctx);
- break;
- case BPF_B:
- emit(ARM_LDRB_I(r0, r0, 0), ctx);
- /* No need to reverse */
- break;
- }
- break;
- }
/* ST: *(size *)(dst + off) = imm */
case BPF_ST | BPF_MEM | BPF_W:
case BPF_ST | BPF_MEM | BPF_H:
diff --git a/arch/arm64/boot/dts/qcom/apq8096-db820c-pins.dtsi b/arch/arm64/boot/dts/qcom/apq8096-db820c-pins.dtsi
index 24552f1..6a57387 100644
--- a/arch/arm64/boot/dts/qcom/apq8096-db820c-pins.dtsi
+++ b/arch/arm64/boot/dts/qcom/apq8096-db820c-pins.dtsi
@@ -36,4 +36,30 @@
drive-strength = <2>; /* 2 MA */
};
};
+
+ blsp1_uart1_default: blsp1_uart1_default {
+ mux {
+ pins = "gpio41", "gpio42", "gpio43", "gpio44";
+ function = "blsp_uart2";
+ };
+
+ config {
+ pins = "gpio41", "gpio42", "gpio43", "gpio44";
+ drive-strength = <16>;
+ bias-disable;
+ };
+ };
+
+ blsp1_uart1_sleep: blsp1_uart1_sleep {
+ mux {
+ pins = "gpio41", "gpio42", "gpio43", "gpio44";
+ function = "gpio";
+ };
+
+ config {
+ pins = "gpio41", "gpio42", "gpio43", "gpio44";
+ drive-strength = <2>;
+ bias-disable;
+ };
+ };
};
diff --git a/arch/arm64/boot/dts/qcom/apq8096-db820c-pmic-pins.dtsi b/arch/arm64/boot/dts/qcom/apq8096-db820c-pmic-pins.dtsi
index 59b29ddf..6167af9 100644
--- a/arch/arm64/boot/dts/qcom/apq8096-db820c-pmic-pins.dtsi
+++ b/arch/arm64/boot/dts/qcom/apq8096-db820c-pmic-pins.dtsi
@@ -14,6 +14,28 @@
};
};
+ bt_en_gpios: bt_en_gpios {
+ pinconf {
+ pins = "gpio19";
+ function = PMIC_GPIO_FUNC_NORMAL;
+ output-low;
+ power-source = <PM8994_GPIO_S4>; // 1.8V
+ qcom,drive-strength = <PMIC_GPIO_STRENGTH_LOW>;
+ bias-pull-down;
+ };
+ };
+
+ wlan_en_gpios: wlan_en_gpios {
+ pinconf {
+ pins = "gpio8";
+ function = PMIC_GPIO_FUNC_NORMAL;
+ output-low;
+ power-source = <PM8994_GPIO_S4>; // 1.8V
+ qcom,drive-strength = <PMIC_GPIO_STRENGTH_LOW>;
+ bias-pull-down;
+ };
+ };
+
volume_up_gpio: pm8996_gpio2 {
pinconf {
pins = "gpio2";
@@ -26,6 +48,16 @@
};
};
+ divclk4_pin_a: divclk4 {
+ pinconf {
+ pins = "gpio18";
+ function = PMIC_GPIO_FUNC_FUNC2;
+
+ bias-disable;
+ power-source = <PM8994_GPIO_S4>;
+ };
+ };
+
usb3_vbus_det_gpio: pm8996_gpio22 {
pinconf {
pins = "gpio22";
diff --git a/arch/arm64/boot/dts/qcom/apq8096-db820c.dtsi b/arch/arm64/boot/dts/qcom/apq8096-db820c.dtsi
index 1c8f1b8..4b8bb02 100644
--- a/arch/arm64/boot/dts/qcom/apq8096-db820c.dtsi
+++ b/arch/arm64/boot/dts/qcom/apq8096-db820c.dtsi
@@ -23,6 +23,7 @@
aliases {
serial0 = &blsp2_uart1;
serial1 = &blsp2_uart2;
+ serial2 = &blsp1_uart1;
i2c0 = &blsp1_i2c2;
i2c1 = &blsp2_i2c1;
i2c2 = &blsp2_i2c0;
@@ -34,7 +35,36 @@
stdout-path = "serial0:115200n8";
};
+ clocks {
+ divclk4: divclk4 {
+ compatible = "fixed-clock";
+ #clock-cells = <0>;
+ clock-frequency = <32768>;
+ clock-output-names = "divclk4";
+
+ pinctrl-names = "default";
+ pinctrl-0 = <&divclk4_pin_a>;
+ };
+ };
+
soc {
+ serial@7570000 {
+ label = "BT-UART";
+ status = "okay";
+ pinctrl-names = "default", "sleep";
+ pinctrl-0 = <&blsp1_uart1_default>;
+ pinctrl-1 = <&blsp1_uart1_sleep>;
+
+ bluetooth {
+ compatible = "qcom,qca6174-bt";
+
+ /* bt_disable_n gpio */
+ enable-gpios = <&pm8994_gpios 19 GPIO_ACTIVE_HIGH>;
+
+ clocks = <&divclk4>;
+ };
+ };
+
serial@75b0000 {
label = "LS-UART1";
status = "okay";
@@ -139,9 +169,40 @@
pinctrl-0 = <&usb2_vbus_det_gpio>;
};
+ bt_en: bt-en-1-8v {
+ pinctrl-names = "default";
+ pinctrl-0 = <&bt_en_gpios>;
+ compatible = "regulator-fixed";
+ regulator-name = "bt-en-regulator";
+ regulator-min-microvolt = <1800000>;
+ regulator-max-microvolt = <1800000>;
+
+ /* WLAN card specific delay */
+ startup-delay-us = <70000>;
+ enable-active-high;
+ };
+
+ wlan_en: wlan-en-1-8v {
+ pinctrl-names = "default";
+ pinctrl-0 = <&wlan_en_gpios>;
+ compatible = "regulator-fixed";
+ regulator-name = "wlan-en-regulator";
+ regulator-min-microvolt = <1800000>;
+ regulator-max-microvolt = <1800000>;
+
+ gpio = <&pm8994_gpios 8 0>;
+
+ /* WLAN card specific delay */
+ startup-delay-us = <70000>;
+ enable-active-high;
+ };
+
agnoc@0 {
qcom,pcie@600000 {
+ status = "okay";
perst-gpio = <&msmgpio 35 GPIO_ACTIVE_LOW>;
+ vddpe-supply = <&wlan_en>;
+ vddpe1-supply = <&bt_en>;
};
qcom,pcie@608000 {
diff --git a/arch/arm64/boot/dts/qcom/msm8996.dtsi b/arch/arm64/boot/dts/qcom/msm8996.dtsi
index 410ae78..f8e49d0 100644
--- a/arch/arm64/boot/dts/qcom/msm8996.dtsi
+++ b/arch/arm64/boot/dts/qcom/msm8996.dtsi
@@ -419,6 +419,16 @@
#clock-cells = <1>;
};
+ blsp1_uart1: serial@7570000 {
+ compatible = "qcom,msm-uartdm-v1.4", "qcom,msm-uartdm";
+ reg = <0x07570000 0x1000>;
+ interrupts = <GIC_SPI 108 IRQ_TYPE_LEVEL_HIGH>;
+ clocks = <&gcc GCC_BLSP1_UART2_APPS_CLK>,
+ <&gcc GCC_BLSP1_AHB_CLK>;
+ clock-names = "core", "iface";
+ status = "disabled";
+ };
+
blsp1_spi0: spi@7575000 {
compatible = "qcom,spi-qup-v2.2.1";
reg = <0x07575000 0x600>;
diff --git a/arch/arm64/net/bpf_jit_comp.c b/arch/arm64/net/bpf_jit_comp.c
index a933504..a6fdaea 100644
--- a/arch/arm64/net/bpf_jit_comp.c
+++ b/arch/arm64/net/bpf_jit_comp.c
@@ -21,7 +21,6 @@
#include <linux/bpf.h>
#include <linux/filter.h>
#include <linux/printk.h>
-#include <linux/skbuff.h>
#include <linux/slab.h>
#include <asm/byteorder.h>
@@ -80,37 +79,6 @@
ctx->idx++;
}
-static inline void emit_a64_mov_i64(const int reg, const u64 val,
- struct jit_ctx *ctx)
-{
- u64 tmp = val;
- int shift = 0;
-
- emit(A64_MOVZ(1, reg, tmp & 0xffff, shift), ctx);
- tmp >>= 16;
- shift += 16;
- while (tmp) {
- if (tmp & 0xffff)
- emit(A64_MOVK(1, reg, tmp & 0xffff, shift), ctx);
- tmp >>= 16;
- shift += 16;
- }
-}
-
-static inline void emit_addr_mov_i64(const int reg, const u64 val,
- struct jit_ctx *ctx)
-{
- u64 tmp = val;
- int shift = 0;
-
- emit(A64_MOVZ(1, reg, tmp & 0xffff, shift), ctx);
- for (;shift < 48;) {
- tmp >>= 16;
- shift += 16;
- emit(A64_MOVK(1, reg, tmp & 0xffff, shift), ctx);
- }
-}
-
static inline void emit_a64_mov_i(const int is64, const int reg,
const s32 val, struct jit_ctx *ctx)
{
@@ -122,7 +90,8 @@
emit(A64_MOVN(is64, reg, (u16)~lo, 0), ctx);
} else {
emit(A64_MOVN(is64, reg, (u16)~hi, 16), ctx);
- emit(A64_MOVK(is64, reg, lo, 0), ctx);
+ if (lo != 0xffff)
+ emit(A64_MOVK(is64, reg, lo, 0), ctx);
}
} else {
emit(A64_MOVZ(is64, reg, lo, 0), ctx);
@@ -131,6 +100,59 @@
}
}
+static int i64_i16_blocks(const u64 val, bool inverse)
+{
+ return (((val >> 0) & 0xffff) != (inverse ? 0xffff : 0x0000)) +
+ (((val >> 16) & 0xffff) != (inverse ? 0xffff : 0x0000)) +
+ (((val >> 32) & 0xffff) != (inverse ? 0xffff : 0x0000)) +
+ (((val >> 48) & 0xffff) != (inverse ? 0xffff : 0x0000));
+}
+
+static inline void emit_a64_mov_i64(const int reg, const u64 val,
+ struct jit_ctx *ctx)
+{
+ u64 nrm_tmp = val, rev_tmp = ~val;
+ bool inverse;
+ int shift;
+
+ if (!(nrm_tmp >> 32))
+ return emit_a64_mov_i(0, reg, (u32)val, ctx);
+
+ inverse = i64_i16_blocks(nrm_tmp, true) < i64_i16_blocks(nrm_tmp, false);
+ shift = max(round_down((inverse ? (fls64(rev_tmp) - 1) :
+ (fls64(nrm_tmp) - 1)), 16), 0);
+ if (inverse)
+ emit(A64_MOVN(1, reg, (rev_tmp >> shift) & 0xffff, shift), ctx);
+ else
+ emit(A64_MOVZ(1, reg, (nrm_tmp >> shift) & 0xffff, shift), ctx);
+ shift -= 16;
+ while (shift >= 0) {
+ if (((nrm_tmp >> shift) & 0xffff) != (inverse ? 0xffff : 0x0000))
+ emit(A64_MOVK(1, reg, (nrm_tmp >> shift) & 0xffff, shift), ctx);
+ shift -= 16;
+ }
+}
+
+/*
+ * This is an unoptimized 64 immediate emission used for BPF to BPF call
+ * addresses. It will always do a full 64 bit decomposition as otherwise
+ * more complexity in the last extra pass is required since we previously
+ * reserved 4 instructions for the address.
+ */
+static inline void emit_addr_mov_i64(const int reg, const u64 val,
+ struct jit_ctx *ctx)
+{
+ u64 tmp = val;
+ int shift = 0;
+
+ emit(A64_MOVZ(1, reg, tmp & 0xffff, shift), ctx);
+ for (;shift < 48;) {
+ tmp >>= 16;
+ shift += 16;
+ emit(A64_MOVK(1, reg, tmp & 0xffff, shift), ctx);
+ }
+}
+
static inline int bpf2a64_offset(int bpf_to, int bpf_from,
const struct jit_ctx *ctx)
{
@@ -163,7 +185,7 @@
/* Tail call offset to jump into */
#define PROLOGUE_OFFSET 7
-static int build_prologue(struct jit_ctx *ctx)
+static int build_prologue(struct jit_ctx *ctx, bool ebpf_from_cbpf)
{
const struct bpf_prog *prog = ctx->prog;
const u8 r6 = bpf2a64[BPF_REG_6];
@@ -188,7 +210,7 @@
* | ... | BPF prog stack
* | |
* +-----+ <= (BPF_FP - prog->aux->stack_depth)
- * |RSVD | JIT scratchpad
+ * |RSVD | padding
* current A64_SP => +-----+ <= (BPF_FP - ctx->stack_size)
* | |
* | ... | Function call stack
@@ -210,19 +232,19 @@
/* Set up BPF prog stack base register */
emit(A64_MOV(1, fp, A64_SP), ctx);
- /* Initialize tail_call_cnt */
- emit(A64_MOVZ(1, tcc, 0, 0), ctx);
+ if (!ebpf_from_cbpf) {
+ /* Initialize tail_call_cnt */
+ emit(A64_MOVZ(1, tcc, 0, 0), ctx);
- cur_offset = ctx->idx - idx0;
- if (cur_offset != PROLOGUE_OFFSET) {
- pr_err_once("PROLOGUE_OFFSET = %d, expected %d!\n",
- cur_offset, PROLOGUE_OFFSET);
- return -1;
+ cur_offset = ctx->idx - idx0;
+ if (cur_offset != PROLOGUE_OFFSET) {
+ pr_err_once("PROLOGUE_OFFSET = %d, expected %d!\n",
+ cur_offset, PROLOGUE_OFFSET);
+ return -1;
+ }
}
- /* 4 byte extra for skb_copy_bits buffer */
- ctx->stack_size = prog->aux->stack_depth + 4;
- ctx->stack_size = STACK_ALIGN(ctx->stack_size);
+ ctx->stack_size = STACK_ALIGN(prog->aux->stack_depth);
/* Set up function call stack */
emit(A64_SUB_I(1, A64_SP, A64_SP, ctx->stack_size), ctx);
@@ -723,71 +745,6 @@
emit(A64_CBNZ(0, tmp3, jmp_offset), ctx);
break;
- /* R0 = ntohx(*(size *)(((struct sk_buff *)R6)->data + imm)) */
- case BPF_LD | BPF_ABS | BPF_W:
- case BPF_LD | BPF_ABS | BPF_H:
- case BPF_LD | BPF_ABS | BPF_B:
- /* R0 = ntohx(*(size *)(((struct sk_buff *)R6)->data + src + imm)) */
- case BPF_LD | BPF_IND | BPF_W:
- case BPF_LD | BPF_IND | BPF_H:
- case BPF_LD | BPF_IND | BPF_B:
- {
- const u8 r0 = bpf2a64[BPF_REG_0]; /* r0 = return value */
- const u8 r6 = bpf2a64[BPF_REG_6]; /* r6 = pointer to sk_buff */
- const u8 fp = bpf2a64[BPF_REG_FP];
- const u8 r1 = bpf2a64[BPF_REG_1]; /* r1: struct sk_buff *skb */
- const u8 r2 = bpf2a64[BPF_REG_2]; /* r2: int k */
- const u8 r3 = bpf2a64[BPF_REG_3]; /* r3: unsigned int size */
- const u8 r4 = bpf2a64[BPF_REG_4]; /* r4: void *buffer */
- const u8 r5 = bpf2a64[BPF_REG_5]; /* r5: void *(*func)(...) */
- int size;
-
- emit(A64_MOV(1, r1, r6), ctx);
- emit_a64_mov_i(0, r2, imm, ctx);
- if (BPF_MODE(code) == BPF_IND)
- emit(A64_ADD(0, r2, r2, src), ctx);
- switch (BPF_SIZE(code)) {
- case BPF_W:
- size = 4;
- break;
- case BPF_H:
- size = 2;
- break;
- case BPF_B:
- size = 1;
- break;
- default:
- return -EINVAL;
- }
- emit_a64_mov_i64(r3, size, ctx);
- emit(A64_SUB_I(1, r4, fp, ctx->stack_size), ctx);
- emit_a64_mov_i64(r5, (unsigned long)bpf_load_pointer, ctx);
- emit(A64_BLR(r5), ctx);
- emit(A64_MOV(1, r0, A64_R(0)), ctx);
-
- jmp_offset = epilogue_offset(ctx);
- check_imm19(jmp_offset);
- emit(A64_CBZ(1, r0, jmp_offset), ctx);
- emit(A64_MOV(1, r5, r0), ctx);
- switch (BPF_SIZE(code)) {
- case BPF_W:
- emit(A64_LDR32(r0, r5, A64_ZR), ctx);
-#ifndef CONFIG_CPU_BIG_ENDIAN
- emit(A64_REV32(0, r0, r0), ctx);
-#endif
- break;
- case BPF_H:
- emit(A64_LDRH(r0, r5, A64_ZR), ctx);
-#ifndef CONFIG_CPU_BIG_ENDIAN
- emit(A64_REV16(0, r0, r0), ctx);
-#endif
- break;
- case BPF_B:
- emit(A64_LDRB(r0, r5, A64_ZR), ctx);
- break;
- }
- break;
- }
default:
pr_err_once("unknown opcode %02x\n", code);
return -EINVAL;
@@ -851,6 +808,7 @@
struct bpf_prog *tmp, *orig_prog = prog;
struct bpf_binary_header *header;
struct arm64_jit_data *jit_data;
+ bool was_classic = bpf_prog_was_classic(prog);
bool tmp_blinded = false;
bool extra_pass = false;
struct jit_ctx ctx;
@@ -905,7 +863,7 @@
goto out_off;
}
- if (build_prologue(&ctx)) {
+ if (build_prologue(&ctx, was_classic)) {
prog = orig_prog;
goto out_off;
}
@@ -928,7 +886,7 @@
skip_init_ctx:
ctx.idx = 0;
- build_prologue(&ctx);
+ build_prologue(&ctx, was_classic);
if (build_body(&ctx)) {
bpf_jit_binary_free(header);
diff --git a/arch/mips/net/ebpf_jit.c b/arch/mips/net/ebpf_jit.c
index 3e2798b..aeb7b1b 100644
--- a/arch/mips/net/ebpf_jit.c
+++ b/arch/mips/net/ebpf_jit.c
@@ -95,7 +95,6 @@
* struct jit_ctx - JIT context
* @skf: The sk_filter
* @stack_size: eBPF stack size
- * @tmp_offset: eBPF $sp offset to 8-byte temporary memory
* @idx: Instruction index
* @flags: JIT flags
* @offsets: Instruction offsets
@@ -105,7 +104,6 @@
struct jit_ctx {
const struct bpf_prog *skf;
int stack_size;
- int tmp_offset;
u32 idx;
u32 flags;
u32 *offsets;
@@ -293,7 +291,6 @@
locals_size = (ctx->flags & EBPF_SEEN_FP) ? MAX_BPF_STACK : 0;
stack_adjust += locals_size;
- ctx->tmp_offset = locals_size;
ctx->stack_size = stack_adjust;
@@ -399,7 +396,6 @@
emit_instr(ctx, lui, reg, upper >> 16);
emit_instr(ctx, addiu, reg, reg, lower);
}
-
}
static int gen_imm_insn(const struct bpf_insn *insn, struct jit_ctx *ctx,
@@ -547,28 +543,6 @@
return 0;
}
-static void * __must_check
-ool_skb_header_pointer(const struct sk_buff *skb, int offset,
- int len, void *buffer)
-{
- return skb_header_pointer(skb, offset, len, buffer);
-}
-
-static int size_to_len(const struct bpf_insn *insn)
-{
- switch (BPF_SIZE(insn->code)) {
- case BPF_B:
- return 1;
- case BPF_H:
- return 2;
- case BPF_W:
- return 4;
- case BPF_DW:
- return 8;
- }
- return 0;
-}
-
static void emit_const_to_reg(struct jit_ctx *ctx, int dst, u64 value)
{
if (value >= 0xffffffffffff8000ull || value < 0x8000ull) {
@@ -1267,110 +1241,6 @@
return -EINVAL;
break;
- case BPF_LD | BPF_B | BPF_ABS:
- case BPF_LD | BPF_H | BPF_ABS:
- case BPF_LD | BPF_W | BPF_ABS:
- case BPF_LD | BPF_DW | BPF_ABS:
- ctx->flags |= EBPF_SAVE_RA;
-
- gen_imm_to_reg(insn, MIPS_R_A1, ctx);
- emit_instr(ctx, addiu, MIPS_R_A2, MIPS_R_ZERO, size_to_len(insn));
-
- if (insn->imm < 0) {
- emit_const_to_reg(ctx, MIPS_R_T9, (u64)bpf_internal_load_pointer_neg_helper);
- } else {
- emit_const_to_reg(ctx, MIPS_R_T9, (u64)ool_skb_header_pointer);
- emit_instr(ctx, daddiu, MIPS_R_A3, MIPS_R_SP, ctx->tmp_offset);
- }
- goto ld_skb_common;
-
- case BPF_LD | BPF_B | BPF_IND:
- case BPF_LD | BPF_H | BPF_IND:
- case BPF_LD | BPF_W | BPF_IND:
- case BPF_LD | BPF_DW | BPF_IND:
- ctx->flags |= EBPF_SAVE_RA;
- src = ebpf_to_mips_reg(ctx, insn, src_reg_no_fp);
- if (src < 0)
- return src;
- ts = get_reg_val_type(ctx, this_idx, insn->src_reg);
- if (ts == REG_32BIT_ZERO_EX) {
- /* sign extend */
- emit_instr(ctx, sll, MIPS_R_A1, src, 0);
- src = MIPS_R_A1;
- }
- if (insn->imm >= S16_MIN && insn->imm <= S16_MAX) {
- emit_instr(ctx, daddiu, MIPS_R_A1, src, insn->imm);
- } else {
- gen_imm_to_reg(insn, MIPS_R_AT, ctx);
- emit_instr(ctx, daddu, MIPS_R_A1, MIPS_R_AT, src);
- }
- /* truncate to 32-bit int */
- emit_instr(ctx, sll, MIPS_R_A1, MIPS_R_A1, 0);
- emit_instr(ctx, daddiu, MIPS_R_A3, MIPS_R_SP, ctx->tmp_offset);
- emit_instr(ctx, slt, MIPS_R_AT, MIPS_R_A1, MIPS_R_ZERO);
-
- emit_const_to_reg(ctx, MIPS_R_T8, (u64)bpf_internal_load_pointer_neg_helper);
- emit_const_to_reg(ctx, MIPS_R_T9, (u64)ool_skb_header_pointer);
- emit_instr(ctx, addiu, MIPS_R_A2, MIPS_R_ZERO, size_to_len(insn));
- emit_instr(ctx, movn, MIPS_R_T9, MIPS_R_T8, MIPS_R_AT);
-
-ld_skb_common:
- emit_instr(ctx, jalr, MIPS_R_RA, MIPS_R_T9);
- /* delay slot move */
- emit_instr(ctx, daddu, MIPS_R_A0, MIPS_R_S0, MIPS_R_ZERO);
-
- /* Check the error value */
- b_off = b_imm(exit_idx, ctx);
- if (is_bad_offset(b_off)) {
- target = j_target(ctx, exit_idx);
- if (target == (unsigned int)-1)
- return -E2BIG;
-
- if (!(ctx->offsets[this_idx] & OFFSETS_B_CONV)) {
- ctx->offsets[this_idx] |= OFFSETS_B_CONV;
- ctx->long_b_conversion = 1;
- }
- emit_instr(ctx, bne, MIPS_R_V0, MIPS_R_ZERO, 4 * 3);
- emit_instr(ctx, nop);
- emit_instr(ctx, j, target);
- emit_instr(ctx, nop);
- } else {
- emit_instr(ctx, beq, MIPS_R_V0, MIPS_R_ZERO, b_off);
- emit_instr(ctx, nop);
- }
-
-#ifdef __BIG_ENDIAN
- need_swap = false;
-#else
- need_swap = true;
-#endif
- dst = MIPS_R_V0;
- switch (BPF_SIZE(insn->code)) {
- case BPF_B:
- emit_instr(ctx, lbu, dst, 0, MIPS_R_V0);
- break;
- case BPF_H:
- emit_instr(ctx, lhu, dst, 0, MIPS_R_V0);
- if (need_swap)
- emit_instr(ctx, wsbh, dst, dst);
- break;
- case BPF_W:
- emit_instr(ctx, lw, dst, 0, MIPS_R_V0);
- if (need_swap) {
- emit_instr(ctx, wsbh, dst, dst);
- emit_instr(ctx, rotr, dst, dst, 16);
- }
- break;
- case BPF_DW:
- emit_instr(ctx, ld, dst, 0, MIPS_R_V0);
- if (need_swap) {
- emit_instr(ctx, dsbh, dst, dst);
- emit_instr(ctx, dshd, dst, dst);
- }
- break;
- }
-
- break;
case BPF_ALU | BPF_END | BPF_FROM_BE:
case BPF_ALU | BPF_END | BPF_FROM_LE:
dst = ebpf_to_mips_reg(ctx, insn, dst_reg);
diff --git a/arch/powerpc/net/Makefile b/arch/powerpc/net/Makefile
index 02d369c..809f019 100644
--- a/arch/powerpc/net/Makefile
+++ b/arch/powerpc/net/Makefile
@@ -3,7 +3,7 @@
# Arch-specific network modules
#
ifeq ($(CONFIG_PPC64),y)
-obj-$(CONFIG_BPF_JIT) += bpf_jit_asm64.o bpf_jit_comp64.o
+obj-$(CONFIG_BPF_JIT) += bpf_jit_comp64.o
else
obj-$(CONFIG_BPF_JIT) += bpf_jit_asm.o bpf_jit_comp.o
endif
diff --git a/arch/powerpc/net/bpf_jit64.h b/arch/powerpc/net/bpf_jit64.h
index 8bdef7e..3609be4 100644
--- a/arch/powerpc/net/bpf_jit64.h
+++ b/arch/powerpc/net/bpf_jit64.h
@@ -20,7 +20,7 @@
* with our redzone usage.
*
* [ prev sp ] <-------------
- * [ nv gpr save area ] 8*8 |
+ * [ nv gpr save area ] 6*8 |
* [ tail_call_cnt ] 8 |
* [ local_tmp_var ] 8 |
* fp (r31) --> [ ebpf stack space ] upto 512 |
@@ -28,8 +28,8 @@
* sp (r1) ---> [ stack pointer ] --------------
*/
-/* for gpr non volatile registers BPG_REG_6 to 10, plus skb cache registers */
-#define BPF_PPC_STACK_SAVE (8*8)
+/* for gpr non volatile registers BPG_REG_6 to 10 */
+#define BPF_PPC_STACK_SAVE (6*8)
/* for bpf JIT code internal usage */
#define BPF_PPC_STACK_LOCALS 16
/* stack frame excluding BPF stack, ensure this is quadword aligned */
@@ -39,10 +39,8 @@
#ifndef __ASSEMBLY__
/* BPF register usage */
-#define SKB_HLEN_REG (MAX_BPF_JIT_REG + 0)
-#define SKB_DATA_REG (MAX_BPF_JIT_REG + 1)
-#define TMP_REG_1 (MAX_BPF_JIT_REG + 2)
-#define TMP_REG_2 (MAX_BPF_JIT_REG + 3)
+#define TMP_REG_1 (MAX_BPF_JIT_REG + 0)
+#define TMP_REG_2 (MAX_BPF_JIT_REG + 1)
/* BPF to ppc register mappings */
static const int b2p[] = {
@@ -63,40 +61,23 @@
[BPF_REG_FP] = 31,
/* eBPF jit internal registers */
[BPF_REG_AX] = 2,
- [SKB_HLEN_REG] = 25,
- [SKB_DATA_REG] = 26,
[TMP_REG_1] = 9,
[TMP_REG_2] = 10
};
-/* PPC NVR range -- update this if we ever use NVRs below r24 */
-#define BPF_PPC_NVR_MIN 24
-
-/* Assembly helpers */
-#define DECLARE_LOAD_FUNC(func) u64 func(u64 r3, u64 r4); \
- u64 func##_negative_offset(u64 r3, u64 r4); \
- u64 func##_positive_offset(u64 r3, u64 r4);
-
-DECLARE_LOAD_FUNC(sk_load_word);
-DECLARE_LOAD_FUNC(sk_load_half);
-DECLARE_LOAD_FUNC(sk_load_byte);
-
-#define CHOOSE_LOAD_FUNC(imm, func) \
- (imm < 0 ? \
- (imm >= SKF_LL_OFF ? func##_negative_offset : func) : \
- func##_positive_offset)
+/* PPC NVR range -- update this if we ever use NVRs below r27 */
+#define BPF_PPC_NVR_MIN 27
#define SEEN_FUNC 0x1000 /* might call external helpers */
#define SEEN_STACK 0x2000 /* uses BPF stack */
-#define SEEN_SKB 0x4000 /* uses sk_buff */
-#define SEEN_TAILCALL 0x8000 /* uses tail calls */
+#define SEEN_TAILCALL 0x4000 /* uses tail calls */
struct codegen_context {
/*
* This is used to track register usage as well
* as calls to external helpers.
* - register usage is tracked with corresponding
- * bits (r3-r10 and r25-r31)
+ * bits (r3-r10 and r27-r31)
* - rest of the bits can be used to track other
* things -- for now, we use bits 16 to 23
* encoded in SEEN_* macros above
diff --git a/arch/powerpc/net/bpf_jit_asm64.S b/arch/powerpc/net/bpf_jit_asm64.S
deleted file mode 100644
index 7e4c514..0000000
--- a/arch/powerpc/net/bpf_jit_asm64.S
+++ /dev/null
@@ -1,180 +0,0 @@
-/*
- * bpf_jit_asm64.S: Packet/header access helper functions
- * for PPC64 BPF compiler.
- *
- * Copyright 2016, Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
- * IBM Corporation
- *
- * Based on bpf_jit_asm.S by Matt Evans
- *
- * This program is free software; you can redistribute it and/or
- * modify it under the terms of the GNU General Public License
- * as published by the Free Software Foundation; version 2
- * of the License.
- */
-
-#include <asm/ppc_asm.h>
-#include <asm/ptrace.h>
-#include "bpf_jit64.h"
-
-/*
- * All of these routines are called directly from generated code,
- * with the below register usage:
- * r27 skb pointer (ctx)
- * r25 skb header length
- * r26 skb->data pointer
- * r4 offset
- *
- * Result is passed back in:
- * r8 data read in host endian format (accumulator)
- *
- * r9 is used as a temporary register
- */
-
-#define r_skb r27
-#define r_hlen r25
-#define r_data r26
-#define r_off r4
-#define r_val r8
-#define r_tmp r9
-
-_GLOBAL_TOC(sk_load_word)
- cmpdi r_off, 0
- blt bpf_slow_path_word_neg
- b sk_load_word_positive_offset
-
-_GLOBAL_TOC(sk_load_word_positive_offset)
- /* Are we accessing past headlen? */
- subi r_tmp, r_hlen, 4
- cmpd r_tmp, r_off
- blt bpf_slow_path_word
- /* Nope, just hitting the header. cr0 here is eq or gt! */
- LWZX_BE r_val, r_data, r_off
- blr /* Return success, cr0 != LT */
-
-_GLOBAL_TOC(sk_load_half)
- cmpdi r_off, 0
- blt bpf_slow_path_half_neg
- b sk_load_half_positive_offset
-
-_GLOBAL_TOC(sk_load_half_positive_offset)
- subi r_tmp, r_hlen, 2
- cmpd r_tmp, r_off
- blt bpf_slow_path_half
- LHZX_BE r_val, r_data, r_off
- blr
-
-_GLOBAL_TOC(sk_load_byte)
- cmpdi r_off, 0
- blt bpf_slow_path_byte_neg
- b sk_load_byte_positive_offset
-
-_GLOBAL_TOC(sk_load_byte_positive_offset)
- cmpd r_hlen, r_off
- ble bpf_slow_path_byte
- lbzx r_val, r_data, r_off
- blr
-
-/*
- * Call out to skb_copy_bits:
- * Allocate a new stack frame here to remain ABI-compliant in
- * stashing LR.
- */
-#define bpf_slow_path_common(SIZE) \
- mflr r0; \
- std r0, PPC_LR_STKOFF(r1); \
- stdu r1, -(STACK_FRAME_MIN_SIZE + BPF_PPC_STACK_LOCALS)(r1); \
- mr r3, r_skb; \
- /* r4 = r_off as passed */ \
- addi r5, r1, STACK_FRAME_MIN_SIZE; \
- li r6, SIZE; \
- bl skb_copy_bits; \
- nop; \
- /* save r5 */ \
- addi r5, r1, STACK_FRAME_MIN_SIZE; \
- /* r3 = 0 on success */ \
- addi r1, r1, STACK_FRAME_MIN_SIZE + BPF_PPC_STACK_LOCALS; \
- ld r0, PPC_LR_STKOFF(r1); \
- mtlr r0; \
- cmpdi r3, 0; \
- blt bpf_error; /* cr0 = LT */
-
-bpf_slow_path_word:
- bpf_slow_path_common(4)
- /* Data value is on stack, and cr0 != LT */
- LWZX_BE r_val, 0, r5
- blr
-
-bpf_slow_path_half:
- bpf_slow_path_common(2)
- LHZX_BE r_val, 0, r5
- blr
-
-bpf_slow_path_byte:
- bpf_slow_path_common(1)
- lbzx r_val, 0, r5
- blr
-
-/*
- * Call out to bpf_internal_load_pointer_neg_helper
- */
-#define sk_negative_common(SIZE) \
- mflr r0; \
- std r0, PPC_LR_STKOFF(r1); \
- stdu r1, -STACK_FRAME_MIN_SIZE(r1); \
- mr r3, r_skb; \
- /* r4 = r_off, as passed */ \
- li r5, SIZE; \
- bl bpf_internal_load_pointer_neg_helper; \
- nop; \
- addi r1, r1, STACK_FRAME_MIN_SIZE; \
- ld r0, PPC_LR_STKOFF(r1); \
- mtlr r0; \
- /* R3 != 0 on success */ \
- cmpldi r3, 0; \
- beq bpf_error_slow; /* cr0 = EQ */
-
-bpf_slow_path_word_neg:
- lis r_tmp, -32 /* SKF_LL_OFF */
- cmpd r_off, r_tmp /* addr < SKF_* */
- blt bpf_error /* cr0 = LT */
- b sk_load_word_negative_offset
-
-_GLOBAL_TOC(sk_load_word_negative_offset)
- sk_negative_common(4)
- LWZX_BE r_val, 0, r3
- blr
-
-bpf_slow_path_half_neg:
- lis r_tmp, -32 /* SKF_LL_OFF */
- cmpd r_off, r_tmp /* addr < SKF_* */
- blt bpf_error /* cr0 = LT */
- b sk_load_half_negative_offset
-
-_GLOBAL_TOC(sk_load_half_negative_offset)
- sk_negative_common(2)
- LHZX_BE r_val, 0, r3
- blr
-
-bpf_slow_path_byte_neg:
- lis r_tmp, -32 /* SKF_LL_OFF */
- cmpd r_off, r_tmp /* addr < SKF_* */
- blt bpf_error /* cr0 = LT */
- b sk_load_byte_negative_offset
-
-_GLOBAL_TOC(sk_load_byte_negative_offset)
- sk_negative_common(1)
- lbzx r_val, 0, r3
- blr
-
-bpf_error_slow:
- /* fabricate a cr0 = lt */
- li r_tmp, -1
- cmpdi r_tmp, 0
-bpf_error:
- /*
- * Entered with cr0 = lt
- * Generated code will 'blt epilogue', returning 0.
- */
- li r_val, 0
- blr
diff --git a/arch/powerpc/net/bpf_jit_comp64.c b/arch/powerpc/net/bpf_jit_comp64.c
index 0ef3d95..f1c9577 100644
--- a/arch/powerpc/net/bpf_jit_comp64.c
+++ b/arch/powerpc/net/bpf_jit_comp64.c
@@ -59,7 +59,7 @@
* [ prev sp ] <-------------
* [ ... ] |
* sp (r1) ---> [ stack pointer ] --------------
- * [ nv gpr save area ] 8*8
+ * [ nv gpr save area ] 6*8
* [ tail_call_cnt ] 8
* [ local_tmp_var ] 8
* [ unused red zone ] 208 bytes protected
@@ -88,21 +88,6 @@
BUG();
}
-static void bpf_jit_emit_skb_loads(u32 *image, struct codegen_context *ctx)
-{
- /*
- * Load skb->len and skb->data_len
- * r3 points to skb
- */
- PPC_LWZ(b2p[SKB_HLEN_REG], 3, offsetof(struct sk_buff, len));
- PPC_LWZ(b2p[TMP_REG_1], 3, offsetof(struct sk_buff, data_len));
- /* header_len = len - data_len */
- PPC_SUB(b2p[SKB_HLEN_REG], b2p[SKB_HLEN_REG], b2p[TMP_REG_1]);
-
- /* skb->data pointer */
- PPC_BPF_LL(b2p[SKB_DATA_REG], 3, offsetof(struct sk_buff, data));
-}
-
static void bpf_jit_build_prologue(u32 *image, struct codegen_context *ctx)
{
int i;
@@ -145,18 +130,6 @@
if (bpf_is_seen_register(ctx, i))
PPC_BPF_STL(b2p[i], 1, bpf_jit_stack_offsetof(ctx, b2p[i]));
- /*
- * Save additional non-volatile regs if we cache skb
- * Also, setup skb data
- */
- if (ctx->seen & SEEN_SKB) {
- PPC_BPF_STL(b2p[SKB_HLEN_REG], 1,
- bpf_jit_stack_offsetof(ctx, b2p[SKB_HLEN_REG]));
- PPC_BPF_STL(b2p[SKB_DATA_REG], 1,
- bpf_jit_stack_offsetof(ctx, b2p[SKB_DATA_REG]));
- bpf_jit_emit_skb_loads(image, ctx);
- }
-
/* Setup frame pointer to point to the bpf stack area */
if (bpf_is_seen_register(ctx, BPF_REG_FP))
PPC_ADDI(b2p[BPF_REG_FP], 1,
@@ -172,14 +145,6 @@
if (bpf_is_seen_register(ctx, i))
PPC_BPF_LL(b2p[i], 1, bpf_jit_stack_offsetof(ctx, b2p[i]));
- /* Restore non-volatile registers used for skb cache */
- if (ctx->seen & SEEN_SKB) {
- PPC_BPF_LL(b2p[SKB_HLEN_REG], 1,
- bpf_jit_stack_offsetof(ctx, b2p[SKB_HLEN_REG]));
- PPC_BPF_LL(b2p[SKB_DATA_REG], 1,
- bpf_jit_stack_offsetof(ctx, b2p[SKB_DATA_REG]));
- }
-
/* Tear down our stack frame */
if (bpf_has_stack_frame(ctx)) {
PPC_ADDI(1, 1, BPF_PPC_STACKFRAME + ctx->stack_size);
@@ -202,25 +167,37 @@
static void bpf_jit_emit_func_call(u32 *image, struct codegen_context *ctx, u64 func)
{
+ unsigned int i, ctx_idx = ctx->idx;
+
+ /* Load function address into r12 */
+ PPC_LI64(12, func);
+
+ /* For bpf-to-bpf function calls, the callee's address is unknown
+ * until the last extra pass. As seen above, we use PPC_LI64() to
+ * load the callee's address, but this may optimize the number of
+ * instructions required based on the nature of the address.
+ *
+ * Since we don't want the number of instructions emitted to change,
+ * we pad the optimized PPC_LI64() call with NOPs to guarantee that
+ * we always have a five-instruction sequence, which is the maximum
+ * that PPC_LI64() can emit.
+ */
+ for (i = ctx->idx - ctx_idx; i < 5; i++)
+ PPC_NOP();
+
#ifdef PPC64_ELF_ABI_v1
- /* func points to the function descriptor */
- PPC_LI64(b2p[TMP_REG_2], func);
- /* Load actual entry point from function descriptor */
- PPC_BPF_LL(b2p[TMP_REG_1], b2p[TMP_REG_2], 0);
- /* ... and move it to LR */
- PPC_MTLR(b2p[TMP_REG_1]);
/*
* Load TOC from function descriptor at offset 8.
* We can clobber r2 since we get called through a
* function pointer (so caller will save/restore r2)
* and since we don't use a TOC ourself.
*/
- PPC_BPF_LL(2, b2p[TMP_REG_2], 8);
-#else
- /* We can clobber r12 */
- PPC_FUNC_ADDR(12, func);
- PPC_MTLR(12);
+ PPC_BPF_LL(2, 12, 8);
+ /* Load actual entry point from function descriptor */
+ PPC_BPF_LL(12, 12, 0);
#endif
+
+ PPC_MTLR(12);
PPC_BLRL();
}
@@ -291,7 +268,7 @@
/* Assemble the body code between the prologue & epilogue */
static int bpf_jit_build_body(struct bpf_prog *fp, u32 *image,
struct codegen_context *ctx,
- u32 *addrs)
+ u32 *addrs, bool extra_pass)
{
const struct bpf_insn *insn = fp->insnsi;
int flen = fp->len;
@@ -747,29 +724,30 @@
break;
/*
- * Call kernel helper
+ * Call kernel helper or bpf function
*/
case BPF_JMP | BPF_CALL:
ctx->seen |= SEEN_FUNC;
- func = (u8 *) __bpf_call_base + imm;
- /* Save skb pointer if we need to re-cache skb data */
- if ((ctx->seen & SEEN_SKB) &&
- bpf_helper_changes_pkt_data(func))
- PPC_BPF_STL(3, 1, bpf_jit_stack_local(ctx));
+ /* bpf function call */
+ if (insn[i].src_reg == BPF_PSEUDO_CALL)
+ if (!extra_pass)
+ func = NULL;
+ else if (fp->aux->func && off < fp->aux->func_cnt)
+ /* use the subprog id from the off
+ * field to lookup the callee address
+ */
+ func = (u8 *) fp->aux->func[off]->bpf_func;
+ else
+ return -EINVAL;
+ /* kernel helper call */
+ else
+ func = (u8 *) __bpf_call_base + imm;
bpf_jit_emit_func_call(image, ctx, (u64)func);
/* move return value from r3 to BPF_REG_0 */
PPC_MR(b2p[BPF_REG_0], 3);
-
- /* refresh skb cache */
- if ((ctx->seen & SEEN_SKB) &&
- bpf_helper_changes_pkt_data(func)) {
- /* reload skb pointer to r3 */
- PPC_BPF_LL(3, 1, bpf_jit_stack_local(ctx));
- bpf_jit_emit_skb_loads(image, ctx);
- }
break;
/*
@@ -887,65 +865,6 @@
break;
/*
- * Loads from packet header/data
- * Assume 32-bit input value in imm and X (src_reg)
- */
-
- /* Absolute loads */
- case BPF_LD | BPF_W | BPF_ABS:
- func = (u8 *)CHOOSE_LOAD_FUNC(imm, sk_load_word);
- goto common_load_abs;
- case BPF_LD | BPF_H | BPF_ABS:
- func = (u8 *)CHOOSE_LOAD_FUNC(imm, sk_load_half);
- goto common_load_abs;
- case BPF_LD | BPF_B | BPF_ABS:
- func = (u8 *)CHOOSE_LOAD_FUNC(imm, sk_load_byte);
-common_load_abs:
- /*
- * Load from [imm]
- * Load into r4, which can just be passed onto
- * skb load helpers as the second parameter
- */
- PPC_LI32(4, imm);
- goto common_load;
-
- /* Indirect loads */
- case BPF_LD | BPF_W | BPF_IND:
- func = (u8 *)sk_load_word;
- goto common_load_ind;
- case BPF_LD | BPF_H | BPF_IND:
- func = (u8 *)sk_load_half;
- goto common_load_ind;
- case BPF_LD | BPF_B | BPF_IND:
- func = (u8 *)sk_load_byte;
-common_load_ind:
- /*
- * Load from [src_reg + imm]
- * Treat src_reg as a 32-bit value
- */
- PPC_EXTSW(4, src_reg);
- if (imm) {
- if (imm >= -32768 && imm < 32768)
- PPC_ADDI(4, 4, IMM_L(imm));
- else {
- PPC_LI32(b2p[TMP_REG_1], imm);
- PPC_ADD(4, 4, b2p[TMP_REG_1]);
- }
- }
-
-common_load:
- ctx->seen |= SEEN_SKB;
- ctx->seen |= SEEN_FUNC;
- bpf_jit_emit_func_call(image, ctx, (u64)func);
-
- /*
- * Helper returns 'lt' condition on error, and an
- * appropriate return value in BPF_REG_0
- */
- PPC_BCC(COND_LT, exit_addr);
- break;
-
- /*
* Tail call
*/
case BPF_JMP | BPF_TAIL_CALL:
@@ -971,6 +890,14 @@
return 0;
}
+struct powerpc64_jit_data {
+ struct bpf_binary_header *header;
+ u32 *addrs;
+ u8 *image;
+ u32 proglen;
+ struct codegen_context ctx;
+};
+
struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *fp)
{
u32 proglen;
@@ -978,6 +905,7 @@
u8 *image = NULL;
u32 *code_base;
u32 *addrs;
+ struct powerpc64_jit_data *jit_data;
struct codegen_context cgctx;
int pass;
int flen;
@@ -985,6 +913,7 @@
struct bpf_prog *org_fp = fp;
struct bpf_prog *tmp_fp;
bool bpf_blinded = false;
+ bool extra_pass = false;
if (!fp->jit_requested)
return org_fp;
@@ -998,11 +927,32 @@
fp = tmp_fp;
}
+ jit_data = fp->aux->jit_data;
+ if (!jit_data) {
+ jit_data = kzalloc(sizeof(*jit_data), GFP_KERNEL);
+ if (!jit_data) {
+ fp = org_fp;
+ goto out;
+ }
+ fp->aux->jit_data = jit_data;
+ }
+
flen = fp->len;
+ addrs = jit_data->addrs;
+ if (addrs) {
+ cgctx = jit_data->ctx;
+ image = jit_data->image;
+ bpf_hdr = jit_data->header;
+ proglen = jit_data->proglen;
+ alloclen = proglen + FUNCTION_DESCR_SIZE;
+ extra_pass = true;
+ goto skip_init_ctx;
+ }
+
addrs = kzalloc((flen+1) * sizeof(*addrs), GFP_KERNEL);
if (addrs == NULL) {
fp = org_fp;
- goto out;
+ goto out_addrs;
}
memset(&cgctx, 0, sizeof(struct codegen_context));
@@ -1011,10 +961,10 @@
cgctx.stack_size = round_up(fp->aux->stack_depth, 16);
/* Scouting faux-generate pass 0 */
- if (bpf_jit_build_body(fp, 0, &cgctx, addrs)) {
+ if (bpf_jit_build_body(fp, 0, &cgctx, addrs, false)) {
/* We hit something illegal or unsupported. */
fp = org_fp;
- goto out;
+ goto out_addrs;
}
/*
@@ -1032,9 +982,10 @@
bpf_jit_fill_ill_insns);
if (!bpf_hdr) {
fp = org_fp;
- goto out;
+ goto out_addrs;
}
+skip_init_ctx:
code_base = (u32 *)(image + FUNCTION_DESCR_SIZE);
/* Code generation passes 1-2 */
@@ -1042,7 +993,7 @@
/* Now build the prologue, body code & epilogue for real. */
cgctx.idx = 0;
bpf_jit_build_prologue(code_base, &cgctx);
- bpf_jit_build_body(fp, code_base, &cgctx, addrs);
+ bpf_jit_build_body(fp, code_base, &cgctx, addrs, extra_pass);
bpf_jit_build_epilogue(code_base, &cgctx);
if (bpf_jit_enable > 1)
@@ -1068,10 +1019,20 @@
fp->jited_len = alloclen;
bpf_flush_icache(bpf_hdr, (u8 *)bpf_hdr + (bpf_hdr->pages * PAGE_SIZE));
+ if (!fp->is_func || extra_pass) {
+out_addrs:
+ kfree(addrs);
+ kfree(jit_data);
+ fp->aux->jit_data = NULL;
+ } else {
+ jit_data->addrs = addrs;
+ jit_data->ctx = cgctx;
+ jit_data->proglen = proglen;
+ jit_data->image = image;
+ jit_data->header = bpf_hdr;
+ }
out:
- kfree(addrs);
-
if (bpf_blinded)
bpf_jit_prog_release_other(fp, fp == org_fp ? tmp_fp : org_fp);
diff --git a/arch/s390/net/Makefile b/arch/s390/net/Makefile
index e0d5f24..d4663b4 100644
--- a/arch/s390/net/Makefile
+++ b/arch/s390/net/Makefile
@@ -2,4 +2,4 @@
#
# Arch-specific network modules
#
-obj-$(CONFIG_BPF_JIT) += bpf_jit.o bpf_jit_comp.o
+obj-$(CONFIG_BPF_JIT) += bpf_jit_comp.o
diff --git a/arch/s390/net/bpf_jit.S b/arch/s390/net/bpf_jit.S
deleted file mode 100644
index 9f79486..0000000
--- a/arch/s390/net/bpf_jit.S
+++ /dev/null
@@ -1,120 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0 */
-/*
- * BPF Jit compiler for s390, help functions.
- *
- * Copyright IBM Corp. 2012,2015
- *
- * Author(s): Martin Schwidefsky <schwidefsky@de.ibm.com>
- * Michael Holzheu <holzheu@linux.vnet.ibm.com>
- */
-
-#include <linux/linkage.h>
-#include <asm/nospec-insn.h>
-#include "bpf_jit.h"
-
-/*
- * Calling convention:
- * registers %r7-%r10, %r11,%r13, and %r15 are call saved
- *
- * Input (64 bit):
- * %r3 (%b2) = offset into skb data
- * %r6 (%b5) = return address
- * %r7 (%b6) = skb pointer
- * %r12 = skb data pointer
- *
- * Output:
- * %r14= %b0 = return value (read skb value)
- *
- * Work registers: %r2,%r4,%r5,%r14
- *
- * skb_copy_bits takes 4 parameters:
- * %r2 = skb pointer
- * %r3 = offset into skb data
- * %r4 = pointer to temp buffer
- * %r5 = length to copy
- * Return value in %r2: 0 = ok
- *
- * bpf_internal_load_pointer_neg_helper takes 3 parameters:
- * %r2 = skb pointer
- * %r3 = offset into data
- * %r4 = length to copy
- * Return value in %r2: Pointer to data
- */
-
-#define SKF_MAX_NEG_OFF -0x200000 /* SKF_LL_OFF from filter.h */
-
-/*
- * Load SIZE bytes from SKB
- */
-#define sk_load_common(NAME, SIZE, LOAD) \
-ENTRY(sk_load_##NAME); \
- ltgr %r3,%r3; /* Is offset negative? */ \
- jl sk_load_##NAME##_slow_neg; \
-ENTRY(sk_load_##NAME##_pos); \
- aghi %r3,SIZE; /* Offset + SIZE */ \
- clg %r3,STK_OFF_HLEN(%r15); /* Offset + SIZE > hlen? */ \
- jh sk_load_##NAME##_slow; \
- LOAD %r14,-SIZE(%r3,%r12); /* Get data from skb */ \
- B_EX OFF_OK,%r6; /* Return */ \
- \
-sk_load_##NAME##_slow:; \
- lgr %r2,%r7; /* Arg1 = skb pointer */ \
- aghi %r3,-SIZE; /* Arg2 = offset */ \
- la %r4,STK_OFF_TMP(%r15); /* Arg3 = temp bufffer */ \
- lghi %r5,SIZE; /* Arg4 = size */ \
- brasl %r14,skb_copy_bits; /* Get data from skb */ \
- LOAD %r14,STK_OFF_TMP(%r15); /* Load from temp bufffer */ \
- ltgr %r2,%r2; /* Set cc to (%r2 != 0) */ \
- BR_EX %r6; /* Return */
-
-sk_load_common(word, 4, llgf) /* r14 = *(u32 *) (skb->data+offset) */
-sk_load_common(half, 2, llgh) /* r14 = *(u16 *) (skb->data+offset) */
-
- GEN_BR_THUNK %r6
- GEN_B_THUNK OFF_OK,%r6
-
-/*
- * Load 1 byte from SKB (optimized version)
- */
- /* r14 = *(u8 *) (skb->data+offset) */
-ENTRY(sk_load_byte)
- ltgr %r3,%r3 # Is offset negative?
- jl sk_load_byte_slow_neg
-ENTRY(sk_load_byte_pos)
- clg %r3,STK_OFF_HLEN(%r15) # Offset >= hlen?
- jnl sk_load_byte_slow
- llgc %r14,0(%r3,%r12) # Get byte from skb
- B_EX OFF_OK,%r6 # Return OK
-
-sk_load_byte_slow:
- lgr %r2,%r7 # Arg1 = skb pointer
- # Arg2 = offset
- la %r4,STK_OFF_TMP(%r15) # Arg3 = pointer to temp buffer
- lghi %r5,1 # Arg4 = size (1 byte)
- brasl %r14,skb_copy_bits # Get data from skb
- llgc %r14,STK_OFF_TMP(%r15) # Load result from temp buffer
- ltgr %r2,%r2 # Set cc to (%r2 != 0)
- BR_EX %r6 # Return cc
-
-#define sk_negative_common(NAME, SIZE, LOAD) \
-sk_load_##NAME##_slow_neg:; \
- cgfi %r3,SKF_MAX_NEG_OFF; \
- jl bpf_error; \
- lgr %r2,%r7; /* Arg1 = skb pointer */ \
- /* Arg2 = offset */ \
- lghi %r4,SIZE; /* Arg3 = size */ \
- brasl %r14,bpf_internal_load_pointer_neg_helper; \
- ltgr %r2,%r2; \
- jz bpf_error; \
- LOAD %r14,0(%r2); /* Get data from pointer */ \
- xr %r3,%r3; /* Set cc to zero */ \
- BR_EX %r6; /* Return cc */
-
-sk_negative_common(word, 4, llgf)
-sk_negative_common(half, 2, llgh)
-sk_negative_common(byte, 1, llgc)
-
-bpf_error:
-# force a return 0 from jit handler
- ltgr %r15,%r15 # Set condition code
- BR_EX %r6
diff --git a/arch/s390/net/bpf_jit.h b/arch/s390/net/bpf_jit.h
index 5e1e513..7822ea9 100644
--- a/arch/s390/net/bpf_jit.h
+++ b/arch/s390/net/bpf_jit.h
@@ -16,9 +16,6 @@
#include <linux/filter.h>
#include <linux/types.h>
-extern u8 sk_load_word_pos[], sk_load_half_pos[], sk_load_byte_pos[];
-extern u8 sk_load_word[], sk_load_half[], sk_load_byte[];
-
#endif /* __ASSEMBLY__ */
/*
@@ -36,15 +33,6 @@
* | | |
* | BPF stack | |
* | | |
- * +---------------+ |
- * | 8 byte skbp | |
- * R15+176 -> +---------------+ |
- * | 8 byte hlen | |
- * R15+168 -> +---------------+ |
- * | 4 byte align | |
- * +---------------+ |
- * | 4 byte temp | |
- * | for bpf_jit.S | |
* R15+160 -> +---------------+ |
* | new backchain | |
* R15+152 -> +---------------+ |
@@ -57,17 +45,11 @@
* The stack size used by the BPF program ("BPF stack" above) is passed
* via "aux->stack_depth".
*/
-#define STK_SPACE_ADD (8 + 8 + 4 + 4 + 160)
+#define STK_SPACE_ADD (160)
#define STK_160_UNUSED (160 - 12 * 8)
#define STK_OFF (STK_SPACE_ADD - STK_160_UNUSED)
-#define STK_OFF_TMP 160 /* Offset of tmp buffer on stack */
-#define STK_OFF_HLEN 168 /* Offset of SKB header length on stack */
-#define STK_OFF_SKBP 176 /* Offset of SKB pointer on stack */
#define STK_OFF_R6 (160 - 11 * 8) /* Offset of r6 on stack */
#define STK_OFF_TCCNT (160 - 12 * 8) /* Offset of tail_call_cnt on stack */
-/* Offset to skip condition code check */
-#define OFF_OK 4
-
#endif /* __ARCH_S390_NET_BPF_JIT_H */
diff --git a/arch/s390/net/bpf_jit_comp.c b/arch/s390/net/bpf_jit_comp.c
index dd2bcf0..d2db8ac 100644
--- a/arch/s390/net/bpf_jit_comp.c
+++ b/arch/s390/net/bpf_jit_comp.c
@@ -51,23 +51,21 @@
#define BPF_SIZE_MAX 0xffff /* Max size for program (16 bit branches) */
-#define SEEN_SKB 1 /* skb access */
-#define SEEN_MEM 2 /* use mem[] for temporary storage */
-#define SEEN_RET0 4 /* ret0_ip points to a valid return 0 */
-#define SEEN_LITERAL 8 /* code uses literals */
-#define SEEN_FUNC 16 /* calls C functions */
-#define SEEN_TAIL_CALL 32 /* code uses tail calls */
-#define SEEN_REG_AX 64 /* code uses constant blinding */
-#define SEEN_STACK (SEEN_FUNC | SEEN_MEM | SEEN_SKB)
+#define SEEN_MEM (1 << 0) /* use mem[] for temporary storage */
+#define SEEN_RET0 (1 << 1) /* ret0_ip points to a valid return 0 */
+#define SEEN_LITERAL (1 << 2) /* code uses literals */
+#define SEEN_FUNC (1 << 3) /* calls C functions */
+#define SEEN_TAIL_CALL (1 << 4) /* code uses tail calls */
+#define SEEN_REG_AX (1 << 5) /* code uses constant blinding */
+#define SEEN_STACK (SEEN_FUNC | SEEN_MEM)
/*
* s390 registers
*/
#define REG_W0 (MAX_BPF_JIT_REG + 0) /* Work register 1 (even) */
#define REG_W1 (MAX_BPF_JIT_REG + 1) /* Work register 2 (odd) */
-#define REG_SKB_DATA (MAX_BPF_JIT_REG + 2) /* SKB data register */
-#define REG_L (MAX_BPF_JIT_REG + 3) /* Literal pool register */
-#define REG_15 (MAX_BPF_JIT_REG + 4) /* Register 15 */
+#define REG_L (MAX_BPF_JIT_REG + 2) /* Literal pool register */
+#define REG_15 (MAX_BPF_JIT_REG + 3) /* Register 15 */
#define REG_0 REG_W0 /* Register 0 */
#define REG_1 REG_W1 /* Register 1 */
#define REG_2 BPF_REG_1 /* Register 2 */
@@ -92,10 +90,8 @@
[BPF_REG_9] = 10,
/* BPF stack pointer */
[BPF_REG_FP] = 13,
- /* Register for blinding (shared with REG_SKB_DATA) */
+ /* Register for blinding */
[BPF_REG_AX] = 12,
- /* SKB data pointer */
- [REG_SKB_DATA] = 12,
/* Work registers for s390x backend */
[REG_W0] = 0,
[REG_W1] = 1,
@@ -402,27 +398,6 @@
}
/*
- * For SKB access %b1 contains the SKB pointer. For "bpf_jit.S"
- * we store the SKB header length on the stack and the SKB data
- * pointer in REG_SKB_DATA if BPF_REG_AX is not used.
- */
-static void emit_load_skb_data_hlen(struct bpf_jit *jit)
-{
- /* Header length: llgf %w1,<len>(%b1) */
- EMIT6_DISP_LH(0xe3000000, 0x0016, REG_W1, REG_0, BPF_REG_1,
- offsetof(struct sk_buff, len));
- /* s %w1,<data_len>(%b1) */
- EMIT4_DISP(0x5b000000, REG_W1, BPF_REG_1,
- offsetof(struct sk_buff, data_len));
- /* stg %w1,ST_OFF_HLEN(%r0,%r15) */
- EMIT6_DISP_LH(0xe3000000, 0x0024, REG_W1, REG_0, REG_15, STK_OFF_HLEN);
- if (!(jit->seen & SEEN_REG_AX))
- /* lg %skb_data,data_off(%b1) */
- EMIT6_DISP_LH(0xe3000000, 0x0004, REG_SKB_DATA, REG_0,
- BPF_REG_1, offsetof(struct sk_buff, data));
-}
-
-/*
* Emit function prologue
*
* Save registers and create stack frame if necessary.
@@ -462,12 +437,6 @@
EMIT6_DISP_LH(0xe3000000, 0x0024, REG_W1, REG_0,
REG_15, 152);
}
- if (jit->seen & SEEN_SKB) {
- emit_load_skb_data_hlen(jit);
- /* stg %b1,ST_OFF_SKBP(%r0,%r15) */
- EMIT6_DISP_LH(0xe3000000, 0x0024, BPF_REG_1, REG_0, REG_15,
- STK_OFF_SKBP);
- }
}
/*
@@ -537,12 +506,12 @@
{
struct bpf_insn *insn = &fp->insnsi[i];
int jmp_off, last, insn_count = 1;
- unsigned int func_addr, mask;
u32 dst_reg = insn->dst_reg;
u32 src_reg = insn->src_reg;
u32 *addrs = jit->addrs;
s32 imm = insn->imm;
s16 off = insn->off;
+ unsigned int mask;
if (dst_reg == BPF_REG_AX || src_reg == BPF_REG_AX)
jit->seen |= SEEN_REG_AX;
@@ -1029,13 +998,6 @@
}
/* lgr %b0,%r2: load return value into %b0 */
EMIT4(0xb9040000, BPF_REG_0, REG_2);
- if ((jit->seen & SEEN_SKB) &&
- bpf_helper_changes_pkt_data((void *)func)) {
- /* lg %b1,ST_OFF_SKBP(%r15) */
- EMIT6_DISP_LH(0xe3000000, 0x0004, BPF_REG_1, REG_0,
- REG_15, STK_OFF_SKBP);
- emit_load_skb_data_hlen(jit);
- }
break;
}
case BPF_JMP | BPF_TAIL_CALL:
@@ -1235,73 +1197,6 @@
jmp_off = addrs[i + off + 1] - (addrs[i + 1] - 4);
EMIT4_PCREL(0xa7040000 | mask << 8, jmp_off);
break;
- /*
- * BPF_LD
- */
- case BPF_LD | BPF_ABS | BPF_B: /* b0 = *(u8 *) (skb->data+imm) */
- case BPF_LD | BPF_IND | BPF_B: /* b0 = *(u8 *) (skb->data+imm+src) */
- if ((BPF_MODE(insn->code) == BPF_ABS) && (imm >= 0))
- func_addr = __pa(sk_load_byte_pos);
- else
- func_addr = __pa(sk_load_byte);
- goto call_fn;
- case BPF_LD | BPF_ABS | BPF_H: /* b0 = *(u16 *) (skb->data+imm) */
- case BPF_LD | BPF_IND | BPF_H: /* b0 = *(u16 *) (skb->data+imm+src) */
- if ((BPF_MODE(insn->code) == BPF_ABS) && (imm >= 0))
- func_addr = __pa(sk_load_half_pos);
- else
- func_addr = __pa(sk_load_half);
- goto call_fn;
- case BPF_LD | BPF_ABS | BPF_W: /* b0 = *(u32 *) (skb->data+imm) */
- case BPF_LD | BPF_IND | BPF_W: /* b0 = *(u32 *) (skb->data+imm+src) */
- if ((BPF_MODE(insn->code) == BPF_ABS) && (imm >= 0))
- func_addr = __pa(sk_load_word_pos);
- else
- func_addr = __pa(sk_load_word);
- goto call_fn;
-call_fn:
- jit->seen |= SEEN_SKB | SEEN_RET0 | SEEN_FUNC;
- REG_SET_SEEN(REG_14); /* Return address of possible func call */
-
- /*
- * Implicit input:
- * BPF_REG_6 (R7) : skb pointer
- * REG_SKB_DATA (R12): skb data pointer (if no BPF_REG_AX)
- *
- * Calculated input:
- * BPF_REG_2 (R3) : offset of byte(s) to fetch in skb
- * BPF_REG_5 (R6) : return address
- *
- * Output:
- * BPF_REG_0 (R14): data read from skb
- *
- * Scratch registers (BPF_REG_1-5)
- */
-
- /* Call function: llilf %w1,func_addr */
- EMIT6_IMM(0xc00f0000, REG_W1, func_addr);
-
- /* Offset: lgfi %b2,imm */
- EMIT6_IMM(0xc0010000, BPF_REG_2, imm);
- if (BPF_MODE(insn->code) == BPF_IND)
- /* agfr %b2,%src (%src is s32 here) */
- EMIT4(0xb9180000, BPF_REG_2, src_reg);
-
- /* Reload REG_SKB_DATA if BPF_REG_AX is used */
- if (jit->seen & SEEN_REG_AX)
- /* lg %skb_data,data_off(%b6) */
- EMIT6_DISP_LH(0xe3000000, 0x0004, REG_SKB_DATA, REG_0,
- BPF_REG_6, offsetof(struct sk_buff, data));
- /* basr %b5,%w1 (%b5 is call saved) */
- EMIT2(0x0d00, BPF_REG_5, REG_W1);
-
- /*
- * Note: For fast access we jump directly after the
- * jnz instruction from bpf_jit.S
- */
- /* jnz <ret0> */
- EMIT4_PCREL(0xa7740000, jit->ret0_ip - jit->prg);
- break;
default: /* too complex, give up */
pr_err("Unknown opcode %02x\n", insn->code);
return -1;
diff --git a/arch/sparc/net/Makefile b/arch/sparc/net/Makefile
index 76fa8e9..d32aac3 100644
--- a/arch/sparc/net/Makefile
+++ b/arch/sparc/net/Makefile
@@ -1,4 +1,7 @@
#
# Arch-specific network modules
#
-obj-$(CONFIG_BPF_JIT) += bpf_jit_asm_$(BITS).o bpf_jit_comp_$(BITS).o
+obj-$(CONFIG_BPF_JIT) += bpf_jit_comp_$(BITS).o
+ifeq ($(BITS),32)
+obj-$(CONFIG_BPF_JIT) += bpf_jit_asm_32.o
+endif
diff --git a/arch/sparc/net/bpf_jit_64.h b/arch/sparc/net/bpf_jit_64.h
index 428f7fd..fbc836f 100644
--- a/arch/sparc/net/bpf_jit_64.h
+++ b/arch/sparc/net/bpf_jit_64.h
@@ -33,35 +33,6 @@
#define I5 0x1d
#define FP 0x1e
#define I7 0x1f
-
-#define r_SKB L0
-#define r_HEADLEN L4
-#define r_SKB_DATA L5
-#define r_TMP G1
-#define r_TMP2 G3
-
-/* assembly code in arch/sparc/net/bpf_jit_asm_64.S */
-extern u32 bpf_jit_load_word[];
-extern u32 bpf_jit_load_half[];
-extern u32 bpf_jit_load_byte[];
-extern u32 bpf_jit_load_byte_msh[];
-extern u32 bpf_jit_load_word_positive_offset[];
-extern u32 bpf_jit_load_half_positive_offset[];
-extern u32 bpf_jit_load_byte_positive_offset[];
-extern u32 bpf_jit_load_byte_msh_positive_offset[];
-extern u32 bpf_jit_load_word_negative_offset[];
-extern u32 bpf_jit_load_half_negative_offset[];
-extern u32 bpf_jit_load_byte_negative_offset[];
-extern u32 bpf_jit_load_byte_msh_negative_offset[];
-
-#else
-#define r_RESULT %o0
-#define r_SKB %o0
-#define r_OFF %o1
-#define r_HEADLEN %l4
-#define r_SKB_DATA %l5
-#define r_TMP %g1
-#define r_TMP2 %g3
#endif
#endif /* _BPF_JIT_H */
diff --git a/arch/sparc/net/bpf_jit_asm_64.S b/arch/sparc/net/bpf_jit_asm_64.S
deleted file mode 100644
index 7177867..0000000
--- a/arch/sparc/net/bpf_jit_asm_64.S
+++ /dev/null
@@ -1,162 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0 */
-#include <asm/ptrace.h>
-
-#include "bpf_jit_64.h"
-
-#define SAVE_SZ 176
-#define SCRATCH_OFF STACK_BIAS + 128
-#define BE_PTR(label) be,pn %xcc, label
-#define SIGN_EXTEND(reg) sra reg, 0, reg
-
-#define SKF_MAX_NEG_OFF (-0x200000) /* SKF_LL_OFF from filter.h */
-
- .text
- .globl bpf_jit_load_word
-bpf_jit_load_word:
- cmp r_OFF, 0
- bl bpf_slow_path_word_neg
- nop
- .globl bpf_jit_load_word_positive_offset
-bpf_jit_load_word_positive_offset:
- sub r_HEADLEN, r_OFF, r_TMP
- cmp r_TMP, 3
- ble bpf_slow_path_word
- add r_SKB_DATA, r_OFF, r_TMP
- andcc r_TMP, 3, %g0
- bne load_word_unaligned
- nop
- retl
- ld [r_TMP], r_RESULT
-load_word_unaligned:
- ldub [r_TMP + 0x0], r_OFF
- ldub [r_TMP + 0x1], r_TMP2
- sll r_OFF, 8, r_OFF
- or r_OFF, r_TMP2, r_OFF
- ldub [r_TMP + 0x2], r_TMP2
- sll r_OFF, 8, r_OFF
- or r_OFF, r_TMP2, r_OFF
- ldub [r_TMP + 0x3], r_TMP2
- sll r_OFF, 8, r_OFF
- retl
- or r_OFF, r_TMP2, r_RESULT
-
- .globl bpf_jit_load_half
-bpf_jit_load_half:
- cmp r_OFF, 0
- bl bpf_slow_path_half_neg
- nop
- .globl bpf_jit_load_half_positive_offset
-bpf_jit_load_half_positive_offset:
- sub r_HEADLEN, r_OFF, r_TMP
- cmp r_TMP, 1
- ble bpf_slow_path_half
- add r_SKB_DATA, r_OFF, r_TMP
- andcc r_TMP, 1, %g0
- bne load_half_unaligned
- nop
- retl
- lduh [r_TMP], r_RESULT
-load_half_unaligned:
- ldub [r_TMP + 0x0], r_OFF
- ldub [r_TMP + 0x1], r_TMP2
- sll r_OFF, 8, r_OFF
- retl
- or r_OFF, r_TMP2, r_RESULT
-
- .globl bpf_jit_load_byte
-bpf_jit_load_byte:
- cmp r_OFF, 0
- bl bpf_slow_path_byte_neg
- nop
- .globl bpf_jit_load_byte_positive_offset
-bpf_jit_load_byte_positive_offset:
- cmp r_OFF, r_HEADLEN
- bge bpf_slow_path_byte
- nop
- retl
- ldub [r_SKB_DATA + r_OFF], r_RESULT
-
-#define bpf_slow_path_common(LEN) \
- save %sp, -SAVE_SZ, %sp; \
- mov %i0, %o0; \
- mov %i1, %o1; \
- add %fp, SCRATCH_OFF, %o2; \
- call skb_copy_bits; \
- mov (LEN), %o3; \
- cmp %o0, 0; \
- restore;
-
-bpf_slow_path_word:
- bpf_slow_path_common(4)
- bl bpf_error
- ld [%sp + SCRATCH_OFF], r_RESULT
- retl
- nop
-bpf_slow_path_half:
- bpf_slow_path_common(2)
- bl bpf_error
- lduh [%sp + SCRATCH_OFF], r_RESULT
- retl
- nop
-bpf_slow_path_byte:
- bpf_slow_path_common(1)
- bl bpf_error
- ldub [%sp + SCRATCH_OFF], r_RESULT
- retl
- nop
-
-#define bpf_negative_common(LEN) \
- save %sp, -SAVE_SZ, %sp; \
- mov %i0, %o0; \
- mov %i1, %o1; \
- SIGN_EXTEND(%o1); \
- call bpf_internal_load_pointer_neg_helper; \
- mov (LEN), %o2; \
- mov %o0, r_TMP; \
- cmp %o0, 0; \
- BE_PTR(bpf_error); \
- restore;
-
-bpf_slow_path_word_neg:
- sethi %hi(SKF_MAX_NEG_OFF), r_TMP
- cmp r_OFF, r_TMP
- bl bpf_error
- nop
- .globl bpf_jit_load_word_negative_offset
-bpf_jit_load_word_negative_offset:
- bpf_negative_common(4)
- andcc r_TMP, 3, %g0
- bne load_word_unaligned
- nop
- retl
- ld [r_TMP], r_RESULT
-
-bpf_slow_path_half_neg:
- sethi %hi(SKF_MAX_NEG_OFF), r_TMP
- cmp r_OFF, r_TMP
- bl bpf_error
- nop
- .globl bpf_jit_load_half_negative_offset
-bpf_jit_load_half_negative_offset:
- bpf_negative_common(2)
- andcc r_TMP, 1, %g0
- bne load_half_unaligned
- nop
- retl
- lduh [r_TMP], r_RESULT
-
-bpf_slow_path_byte_neg:
- sethi %hi(SKF_MAX_NEG_OFF), r_TMP
- cmp r_OFF, r_TMP
- bl bpf_error
- nop
- .globl bpf_jit_load_byte_negative_offset
-bpf_jit_load_byte_negative_offset:
- bpf_negative_common(1)
- retl
- ldub [r_TMP], r_RESULT
-
-bpf_error:
- /* Make the JIT program itself return zero. */
- ret
- restore %g0, %g0, %o0
diff --git a/arch/sparc/net/bpf_jit_comp_64.c b/arch/sparc/net/bpf_jit_comp_64.c
index 48a2586..222785a 100644
--- a/arch/sparc/net/bpf_jit_comp_64.c
+++ b/arch/sparc/net/bpf_jit_comp_64.c
@@ -48,10 +48,6 @@
}
}
-#define SEEN_DATAREF 1 /* might call external helpers */
-#define SEEN_XREG 2 /* ebx is used */
-#define SEEN_MEM 4 /* use mem[] for temporary storage */
-
#define S13(X) ((X) & 0x1fff)
#define S5(X) ((X) & 0x1f)
#define IMMED 0x00002000
@@ -198,7 +194,6 @@
bool tmp_1_used;
bool tmp_2_used;
bool tmp_3_used;
- bool saw_ld_abs_ind;
bool saw_frame_pointer;
bool saw_call;
bool saw_tail_call;
@@ -207,9 +202,7 @@
#define TMP_REG_1 (MAX_BPF_JIT_REG + 0)
#define TMP_REG_2 (MAX_BPF_JIT_REG + 1)
-#define SKB_HLEN_REG (MAX_BPF_JIT_REG + 2)
-#define SKB_DATA_REG (MAX_BPF_JIT_REG + 3)
-#define TMP_REG_3 (MAX_BPF_JIT_REG + 4)
+#define TMP_REG_3 (MAX_BPF_JIT_REG + 2)
/* Map BPF registers to SPARC registers */
static const int bpf2sparc[] = {
@@ -238,9 +231,6 @@
[TMP_REG_1] = G1,
[TMP_REG_2] = G2,
[TMP_REG_3] = G3,
-
- [SKB_HLEN_REG] = L4,
- [SKB_DATA_REG] = L5,
};
static void emit(const u32 insn, struct jit_ctx *ctx)
@@ -800,25 +790,6 @@
return 0;
}
-static void load_skb_regs(struct jit_ctx *ctx, u8 r_skb)
-{
- const u8 r_headlen = bpf2sparc[SKB_HLEN_REG];
- const u8 r_data = bpf2sparc[SKB_DATA_REG];
- const u8 r_tmp = bpf2sparc[TMP_REG_1];
- unsigned int off;
-
- off = offsetof(struct sk_buff, len);
- emit(LD32I | RS1(r_skb) | S13(off) | RD(r_headlen), ctx);
-
- off = offsetof(struct sk_buff, data_len);
- emit(LD32I | RS1(r_skb) | S13(off) | RD(r_tmp), ctx);
-
- emit(SUB | RS1(r_headlen) | RS2(r_tmp) | RD(r_headlen), ctx);
-
- off = offsetof(struct sk_buff, data);
- emit(LDPTRI | RS1(r_skb) | S13(off) | RD(r_data), ctx);
-}
-
/* Just skip the save instruction and the ctx register move. */
#define BPF_TAILCALL_PROLOGUE_SKIP 16
#define BPF_TAILCALL_CNT_SP_OFF (STACK_BIAS + 128)
@@ -857,9 +828,6 @@
emit_reg_move(I0, O0, ctx);
/* If you add anything here, adjust BPF_TAILCALL_PROLOGUE_SKIP above. */
-
- if (ctx->saw_ld_abs_ind)
- load_skb_regs(ctx, bpf2sparc[BPF_REG_1]);
}
static void build_epilogue(struct jit_ctx *ctx)
@@ -926,7 +894,6 @@
const int i = insn - ctx->prog->insnsi;
const s16 off = insn->off;
const s32 imm = insn->imm;
- u32 *func;
if (insn->src_reg == BPF_REG_FP)
ctx->saw_frame_pointer = true;
@@ -1225,16 +1192,11 @@
u8 *func = ((u8 *)__bpf_call_base) + imm;
ctx->saw_call = true;
- if (ctx->saw_ld_abs_ind && bpf_helper_changes_pkt_data(func))
- emit_reg_move(bpf2sparc[BPF_REG_1], L7, ctx);
emit_call((u32 *)func, ctx);
emit_nop(ctx);
emit_reg_move(O0, bpf2sparc[BPF_REG_0], ctx);
-
- if (ctx->saw_ld_abs_ind && bpf_helper_changes_pkt_data(func))
- load_skb_regs(ctx, L7);
break;
}
@@ -1412,43 +1374,6 @@
emit_nop(ctx);
break;
}
-#define CHOOSE_LOAD_FUNC(K, func) \
- ((int)K < 0 ? ((int)K >= SKF_LL_OFF ? func##_negative_offset : func) : func##_positive_offset)
-
- /* R0 = ntohx(*(size *)(((struct sk_buff *)R6)->data + imm)) */
- case BPF_LD | BPF_ABS | BPF_W:
- func = CHOOSE_LOAD_FUNC(imm, bpf_jit_load_word);
- goto common_load;
- case BPF_LD | BPF_ABS | BPF_H:
- func = CHOOSE_LOAD_FUNC(imm, bpf_jit_load_half);
- goto common_load;
- case BPF_LD | BPF_ABS | BPF_B:
- func = CHOOSE_LOAD_FUNC(imm, bpf_jit_load_byte);
- goto common_load;
- /* R0 = ntohx(*(size *)(((struct sk_buff *)R6)->data + src + imm)) */
- case BPF_LD | BPF_IND | BPF_W:
- func = bpf_jit_load_word;
- goto common_load;
- case BPF_LD | BPF_IND | BPF_H:
- func = bpf_jit_load_half;
- goto common_load;
-
- case BPF_LD | BPF_IND | BPF_B:
- func = bpf_jit_load_byte;
- common_load:
- ctx->saw_ld_abs_ind = true;
-
- emit_reg_move(bpf2sparc[BPF_REG_6], O0, ctx);
- emit_loadimm(imm, O1, ctx);
-
- if (BPF_MODE(code) == BPF_IND)
- emit_alu(ADD, src, O1, ctx);
-
- emit_call(func, ctx);
- emit_alu_K(SRA, O1, 0, ctx);
-
- emit_reg_move(O0, bpf2sparc[BPF_REG_0], ctx);
- break;
default:
pr_err_once("unknown opcode %02x\n", code);
@@ -1583,12 +1508,11 @@
build_epilogue(&ctx);
if (bpf_jit_enable > 1)
- pr_info("Pass %d: shrink = %d, seen = [%c%c%c%c%c%c%c]\n", pass,
+ pr_info("Pass %d: shrink = %d, seen = [%c%c%c%c%c%c]\n", pass,
image_size - (ctx.idx * 4),
ctx.tmp_1_used ? '1' : ' ',
ctx.tmp_2_used ? '2' : ' ',
ctx.tmp_3_used ? '3' : ' ',
- ctx.saw_ld_abs_ind ? 'L' : ' ',
ctx.saw_frame_pointer ? 'F' : ' ',
ctx.saw_call ? 'C' : ' ',
ctx.saw_tail_call ? 'T' : ' ');
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index c07f492..d51a71d 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -138,7 +138,7 @@
select HAVE_DMA_CONTIGUOUS
select HAVE_DYNAMIC_FTRACE
select HAVE_DYNAMIC_FTRACE_WITH_REGS
- select HAVE_EBPF_JIT if X86_64
+ select HAVE_EBPF_JIT
select HAVE_EFFICIENT_UNALIGNED_ACCESS
select HAVE_EXIT_THREAD
select HAVE_FENTRY if X86_64 || DYNAMIC_FTRACE
diff --git a/arch/x86/include/asm/nospec-branch.h b/arch/x86/include/asm/nospec-branch.h
index 8b38df9..f6f6c63 100644
--- a/arch/x86/include/asm/nospec-branch.h
+++ b/arch/x86/include/asm/nospec-branch.h
@@ -308,16 +308,20 @@
* lfence
* jmp spec_trap
* do_rop:
- * mov %rax,(%rsp)
+ * mov %rax,(%rsp) for x86_64
+ * mov %edx,(%esp) for x86_32
* retq
*
* Without retpolines configured:
*
- * jmp *%rax
+ * jmp *%rax for x86_64
+ * jmp *%edx for x86_32
*/
#ifdef CONFIG_RETPOLINE
-# define RETPOLINE_RAX_BPF_JIT_SIZE 17
-# define RETPOLINE_RAX_BPF_JIT() \
+# ifdef CONFIG_X86_64
+# define RETPOLINE_RAX_BPF_JIT_SIZE 17
+# define RETPOLINE_RAX_BPF_JIT() \
+do { \
EMIT1_off32(0xE8, 7); /* callq do_rop */ \
/* spec_trap: */ \
EMIT2(0xF3, 0x90); /* pause */ \
@@ -325,11 +329,30 @@
EMIT2(0xEB, 0xF9); /* jmp spec_trap */ \
/* do_rop: */ \
EMIT4(0x48, 0x89, 0x04, 0x24); /* mov %rax,(%rsp) */ \
- EMIT1(0xC3); /* retq */
-#else
-# define RETPOLINE_RAX_BPF_JIT_SIZE 2
-# define RETPOLINE_RAX_BPF_JIT() \
- EMIT2(0xFF, 0xE0); /* jmp *%rax */
+ EMIT1(0xC3); /* retq */ \
+} while (0)
+# else /* !CONFIG_X86_64 */
+# define RETPOLINE_EDX_BPF_JIT() \
+do { \
+ EMIT1_off32(0xE8, 7); /* call do_rop */ \
+ /* spec_trap: */ \
+ EMIT2(0xF3, 0x90); /* pause */ \
+ EMIT3(0x0F, 0xAE, 0xE8); /* lfence */ \
+ EMIT2(0xEB, 0xF9); /* jmp spec_trap */ \
+ /* do_rop: */ \
+ EMIT3(0x89, 0x14, 0x24); /* mov %edx,(%esp) */ \
+ EMIT1(0xC3); /* ret */ \
+} while (0)
+# endif
+#else /* !CONFIG_RETPOLINE */
+# ifdef CONFIG_X86_64
+# define RETPOLINE_RAX_BPF_JIT_SIZE 2
+# define RETPOLINE_RAX_BPF_JIT() \
+ EMIT2(0xFF, 0xE0); /* jmp *%rax */
+# else /* !CONFIG_X86_64 */
+# define RETPOLINE_EDX_BPF_JIT() \
+ EMIT2(0xFF, 0xE2) /* jmp *%edx */
+# endif
#endif
#endif /* _ASM_X86_NOSPEC_BRANCH_H_ */
diff --git a/arch/x86/net/Makefile b/arch/x86/net/Makefile
index fefb4b6..59e123d 100644
--- a/arch/x86/net/Makefile
+++ b/arch/x86/net/Makefile
@@ -1,6 +1,9 @@
#
# Arch-specific network modules
#
-OBJECT_FILES_NON_STANDARD_bpf_jit.o += y
-obj-$(CONFIG_BPF_JIT) += bpf_jit.o bpf_jit_comp.o
+ifeq ($(CONFIG_X86_32),y)
+ obj-$(CONFIG_BPF_JIT) += bpf_jit_comp32.o
+else
+ obj-$(CONFIG_BPF_JIT) += bpf_jit_comp.o
+endif
diff --git a/arch/x86/net/bpf_jit.S b/arch/x86/net/bpf_jit.S
deleted file mode 100644
index b33093f..0000000
--- a/arch/x86/net/bpf_jit.S
+++ /dev/null
@@ -1,154 +0,0 @@
-/* bpf_jit.S : BPF JIT helper functions
- *
- * Copyright (C) 2011 Eric Dumazet (eric.dumazet@gmail.com)
- *
- * This program is free software; you can redistribute it and/or
- * modify it under the terms of the GNU General Public License
- * as published by the Free Software Foundation; version 2
- * of the License.
- */
-#include <linux/linkage.h>
-#include <asm/frame.h>
-
-/*
- * Calling convention :
- * rbx : skb pointer (callee saved)
- * esi : offset of byte(s) to fetch in skb (can be scratched)
- * r10 : copy of skb->data
- * r9d : hlen = skb->len - skb->data_len
- */
-#define SKBDATA %r10
-#define SKF_MAX_NEG_OFF $(-0x200000) /* SKF_LL_OFF from filter.h */
-
-#define FUNC(name) \
- .globl name; \
- .type name, @function; \
- name:
-
-FUNC(sk_load_word)
- test %esi,%esi
- js bpf_slow_path_word_neg
-
-FUNC(sk_load_word_positive_offset)
- mov %r9d,%eax # hlen
- sub %esi,%eax # hlen - offset
- cmp $3,%eax
- jle bpf_slow_path_word
- mov (SKBDATA,%rsi),%eax
- bswap %eax /* ntohl() */
- ret
-
-FUNC(sk_load_half)
- test %esi,%esi
- js bpf_slow_path_half_neg
-
-FUNC(sk_load_half_positive_offset)
- mov %r9d,%eax
- sub %esi,%eax # hlen - offset
- cmp $1,%eax
- jle bpf_slow_path_half
- movzwl (SKBDATA,%rsi),%eax
- rol $8,%ax # ntohs()
- ret
-
-FUNC(sk_load_byte)
- test %esi,%esi
- js bpf_slow_path_byte_neg
-
-FUNC(sk_load_byte_positive_offset)
- cmp %esi,%r9d /* if (offset >= hlen) goto bpf_slow_path_byte */
- jle bpf_slow_path_byte
- movzbl (SKBDATA,%rsi),%eax
- ret
-
-/* rsi contains offset and can be scratched */
-#define bpf_slow_path_common(LEN) \
- lea 32(%rbp), %rdx;\
- FRAME_BEGIN; \
- mov %rbx, %rdi; /* arg1 == skb */ \
- push %r9; \
- push SKBDATA; \
-/* rsi already has offset */ \
- mov $LEN,%ecx; /* len */ \
- call skb_copy_bits; \
- test %eax,%eax; \
- pop SKBDATA; \
- pop %r9; \
- FRAME_END
-
-
-bpf_slow_path_word:
- bpf_slow_path_common(4)
- js bpf_error
- mov 32(%rbp),%eax
- bswap %eax
- ret
-
-bpf_slow_path_half:
- bpf_slow_path_common(2)
- js bpf_error
- mov 32(%rbp),%ax
- rol $8,%ax
- movzwl %ax,%eax
- ret
-
-bpf_slow_path_byte:
- bpf_slow_path_common(1)
- js bpf_error
- movzbl 32(%rbp),%eax
- ret
-
-#define sk_negative_common(SIZE) \
- FRAME_BEGIN; \
- mov %rbx, %rdi; /* arg1 == skb */ \
- push %r9; \
- push SKBDATA; \
-/* rsi already has offset */ \
- mov $SIZE,%edx; /* size */ \
- call bpf_internal_load_pointer_neg_helper; \
- test %rax,%rax; \
- pop SKBDATA; \
- pop %r9; \
- FRAME_END; \
- jz bpf_error
-
-bpf_slow_path_word_neg:
- cmp SKF_MAX_NEG_OFF, %esi /* test range */
- jl bpf_error /* offset lower -> error */
-
-FUNC(sk_load_word_negative_offset)
- sk_negative_common(4)
- mov (%rax), %eax
- bswap %eax
- ret
-
-bpf_slow_path_half_neg:
- cmp SKF_MAX_NEG_OFF, %esi
- jl bpf_error
-
-FUNC(sk_load_half_negative_offset)
- sk_negative_common(2)
- mov (%rax),%ax
- rol $8,%ax
- movzwl %ax,%eax
- ret
-
-bpf_slow_path_byte_neg:
- cmp SKF_MAX_NEG_OFF, %esi
- jl bpf_error
-
-FUNC(sk_load_byte_negative_offset)
- sk_negative_common(1)
- movzbl (%rax), %eax
- ret
-
-bpf_error:
-# force a return 0 from jit handler
- xor %eax,%eax
- mov (%rbp),%rbx
- mov 8(%rbp),%r13
- mov 16(%rbp),%r14
- mov 24(%rbp),%r15
- add $40, %rbp
- leaveq
- ret
diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
index 263c845..8fca446 100644
--- a/arch/x86/net/bpf_jit_comp.c
+++ b/arch/x86/net/bpf_jit_comp.c
@@ -1,4 +1,5 @@
-/* bpf_jit_comp.c : BPF JIT compiler
+/*
+ * bpf_jit_comp.c: BPF JIT compiler
*
* Copyright (C) 2011-2013 Eric Dumazet (eric.dumazet@gmail.com)
* Internal BPF Copyright (c) 2011-2014 PLUMgrid, http://plumgrid.com
@@ -16,15 +17,6 @@
#include <asm/set_memory.h>
#include <asm/nospec-branch.h>
-/*
- * assembly code in arch/x86/net/bpf_jit.S
- */
-extern u8 sk_load_word[], sk_load_half[], sk_load_byte[];
-extern u8 sk_load_word_positive_offset[], sk_load_half_positive_offset[];
-extern u8 sk_load_byte_positive_offset[];
-extern u8 sk_load_word_negative_offset[], sk_load_half_negative_offset[];
-extern u8 sk_load_byte_negative_offset[];
-
static u8 *emit_code(u8 *ptr, u32 bytes, unsigned int len)
{
if (len == 1)
@@ -45,14 +37,15 @@
#define EMIT2(b1, b2) EMIT((b1) + ((b2) << 8), 2)
#define EMIT3(b1, b2, b3) EMIT((b1) + ((b2) << 8) + ((b3) << 16), 3)
#define EMIT4(b1, b2, b3, b4) EMIT((b1) + ((b2) << 8) + ((b3) << 16) + ((b4) << 24), 4)
+
#define EMIT1_off32(b1, off) \
- do {EMIT1(b1); EMIT(off, 4); } while (0)
+ do { EMIT1(b1); EMIT(off, 4); } while (0)
#define EMIT2_off32(b1, b2, off) \
- do {EMIT2(b1, b2); EMIT(off, 4); } while (0)
+ do { EMIT2(b1, b2); EMIT(off, 4); } while (0)
#define EMIT3_off32(b1, b2, b3, off) \
- do {EMIT3(b1, b2, b3); EMIT(off, 4); } while (0)
+ do { EMIT3(b1, b2, b3); EMIT(off, 4); } while (0)
#define EMIT4_off32(b1, b2, b3, b4, off) \
- do {EMIT4(b1, b2, b3, b4); EMIT(off, 4); } while (0)
+ do { EMIT4(b1, b2, b3, b4); EMIT(off, 4); } while (0)
static bool is_imm8(int value)
{
@@ -70,9 +63,10 @@
}
/* mov dst, src */
-#define EMIT_mov(DST, SRC) \
- do {if (DST != SRC) \
- EMIT3(add_2mod(0x48, DST, SRC), 0x89, add_2reg(0xC0, DST, SRC)); \
+#define EMIT_mov(DST, SRC) \
+ do { \
+ if (DST != SRC) \
+ EMIT3(add_2mod(0x48, DST, SRC), 0x89, add_2reg(0xC0, DST, SRC)); \
} while (0)
static int bpf_size_to_x86_bytes(int bpf_size)
@@ -89,7 +83,8 @@
return 0;
}
-/* list of x86 cond jumps opcodes (. + s8)
+/*
+ * List of x86 cond jumps opcodes (. + s8)
* Add 0x10 (and an extra 0x0f) to generate far jumps (. + s32)
*/
#define X86_JB 0x72
@@ -103,38 +98,37 @@
#define X86_JLE 0x7E
#define X86_JG 0x7F
-#define CHOOSE_LOAD_FUNC(K, func) \
- ((int)K < 0 ? ((int)K >= SKF_LL_OFF ? func##_negative_offset : func) : func##_positive_offset)
-
-/* pick a register outside of BPF range for JIT internal work */
+/* Pick a register outside of BPF range for JIT internal work */
#define AUX_REG (MAX_BPF_JIT_REG + 1)
-/* The following table maps BPF registers to x64 registers.
+/*
+ * The following table maps BPF registers to x86-64 registers.
*
- * x64 register r12 is unused, since if used as base address
+ * x86-64 register R12 is unused, since if used as base address
* register in load/store instructions, it always needs an
* extra byte of encoding and is callee saved.
*
- * r9 caches skb->len - skb->data_len
- * r10 caches skb->data, and used for blinding (if enabled)
+ * Also x86-64 register R9 is unused. x86-64 register R10 is
+ * used for blinding (if enabled).
*/
static const int reg2hex[] = {
- [BPF_REG_0] = 0, /* rax */
- [BPF_REG_1] = 7, /* rdi */
- [BPF_REG_2] = 6, /* rsi */
- [BPF_REG_3] = 2, /* rdx */
- [BPF_REG_4] = 1, /* rcx */
- [BPF_REG_5] = 0, /* r8 */
- [BPF_REG_6] = 3, /* rbx callee saved */
- [BPF_REG_7] = 5, /* r13 callee saved */
- [BPF_REG_8] = 6, /* r14 callee saved */
- [BPF_REG_9] = 7, /* r15 callee saved */
- [BPF_REG_FP] = 5, /* rbp readonly */
- [BPF_REG_AX] = 2, /* r10 temp register */
- [AUX_REG] = 3, /* r11 temp register */
+ [BPF_REG_0] = 0, /* RAX */
+ [BPF_REG_1] = 7, /* RDI */
+ [BPF_REG_2] = 6, /* RSI */
+ [BPF_REG_3] = 2, /* RDX */
+ [BPF_REG_4] = 1, /* RCX */
+ [BPF_REG_5] = 0, /* R8 */
+ [BPF_REG_6] = 3, /* RBX callee saved */
+ [BPF_REG_7] = 5, /* R13 callee saved */
+ [BPF_REG_8] = 6, /* R14 callee saved */
+ [BPF_REG_9] = 7, /* R15 callee saved */
+ [BPF_REG_FP] = 5, /* RBP readonly */
+ [BPF_REG_AX] = 2, /* R10 temp register */
+ [AUX_REG] = 3, /* R11 temp register */
};
-/* is_ereg() == true if BPF register 'reg' maps to x64 r8..r15
+/*
+ * is_ereg() == true if BPF register 'reg' maps to x86-64 r8..r15
* which need extra byte of encoding.
* rax,rcx,...,rbp have simpler encoding
*/
@@ -153,7 +147,7 @@
return reg == BPF_REG_0;
}
-/* add modifiers if 'reg' maps to x64 registers r8..r15 */
+/* Add modifiers if 'reg' maps to x86-64 registers R8..R15 */
static u8 add_1mod(u8 byte, u32 reg)
{
if (is_ereg(reg))
@@ -170,13 +164,13 @@
return byte;
}
-/* encode 'dst_reg' register into x64 opcode 'byte' */
+/* Encode 'dst_reg' register into x86-64 opcode 'byte' */
static u8 add_1reg(u8 byte, u32 dst_reg)
{
return byte + reg2hex[dst_reg];
}
-/* encode 'dst_reg' and 'src_reg' registers into x64 opcode 'byte' */
+/* Encode 'dst_reg' and 'src_reg' registers into x86-64 opcode 'byte' */
static u8 add_2reg(u8 byte, u32 dst_reg, u32 src_reg)
{
return byte + reg2hex[dst_reg] + (reg2hex[src_reg] << 3);
@@ -184,27 +178,24 @@
static void jit_fill_hole(void *area, unsigned int size)
{
- /* fill whole space with int3 instructions */
+ /* Fill whole space with INT3 instructions */
memset(area, 0xcc, size);
}
struct jit_context {
- int cleanup_addr; /* epilogue code offset */
- bool seen_ld_abs;
- bool seen_ax_reg;
+ int cleanup_addr; /* Epilogue code offset */
};
-/* maximum number of bytes emitted while JITing one eBPF insn */
+/* Maximum number of bytes emitted while JITing one eBPF insn */
#define BPF_MAX_INSN_SIZE 128
#define BPF_INSN_SAFETY 64
-#define AUX_STACK_SPACE \
- (32 /* space for rbx, r13, r14, r15 */ + \
- 8 /* space for skb_copy_bits() buffer */)
+#define AUX_STACK_SPACE 40 /* Space for RBX, R13, R14, R15, tailcnt */
-#define PROLOGUE_SIZE 37
+#define PROLOGUE_SIZE 37
-/* emit x64 prologue code for BPF program and check it's size.
+/*
+ * Emit x86-64 prologue code for BPF program and check its size.
* bpf_tail_call helper will skip it while jumping into another program
*/
static void emit_prologue(u8 **pprog, u32 stack_depth, bool ebpf_from_cbpf)
@@ -212,8 +203,11 @@
u8 *prog = *pprog;
int cnt = 0;
- EMIT1(0x55); /* push rbp */
- EMIT3(0x48, 0x89, 0xE5); /* mov rbp,rsp */
+ /* push rbp */
+ EMIT1(0x55);
+
+ /* mov rbp,rsp */
+ EMIT3(0x48, 0x89, 0xE5);
/* sub rsp, rounded_stack_depth + AUX_STACK_SPACE */
EMIT3_off32(0x48, 0x81, 0xEC,
@@ -222,19 +216,8 @@
/* sub rbp, AUX_STACK_SPACE */
EMIT4(0x48, 0x83, 0xED, AUX_STACK_SPACE);
- /* all classic BPF filters use R6(rbx) save it */
-
/* mov qword ptr [rbp+0],rbx */
EMIT4(0x48, 0x89, 0x5D, 0);
-
- /* bpf_convert_filter() maps classic BPF register X to R7 and uses R8
- * as temporary, so all tcpdump filters need to spill/fill R7(r13) and
- * R8(r14). R9(r15) spill could be made conditional, but there is only
- * one 'bpf_error' return path out of helper functions inside bpf_jit.S
- * The overhead of extra spill is negligible for any filter other
- * than synthetic ones. Therefore not worth adding complexity.
- */
-
/* mov qword ptr [rbp+8],r13 */
EMIT4(0x4C, 0x89, 0x6D, 8);
/* mov qword ptr [rbp+16],r14 */
@@ -243,9 +226,10 @@
EMIT4(0x4C, 0x89, 0x7D, 24);
if (!ebpf_from_cbpf) {
- /* Clear the tail call counter (tail_call_cnt): for eBPF tail
+ /*
+ * Clear the tail call counter (tail_call_cnt): for eBPF tail
* calls we need to reset the counter to 0. It's done in two
- * instructions, resetting rax register to 0, and moving it
+ * instructions, resetting RAX register to 0, and moving it
* to the counter location.
*/
@@ -260,7 +244,9 @@
*pprog = prog;
}
-/* generate the following code:
+/*
+ * Generate the following code:
+ *
* ... bpf_tail_call(void *ctx, struct bpf_array *array, u64 index) ...
* if (index >= array->map.max_entries)
* goto out;
@@ -278,23 +264,26 @@
int label1, label2, label3;
int cnt = 0;
- /* rdi - pointer to ctx
+ /*
+ * rdi - pointer to ctx
* rsi - pointer to bpf_array
* rdx - index in bpf_array
*/
- /* if (index >= array->map.max_entries)
- * goto out;
+ /*
+ * if (index >= array->map.max_entries)
+ * goto out;
*/
EMIT2(0x89, 0xD2); /* mov edx, edx */
EMIT3(0x39, 0x56, /* cmp dword ptr [rsi + 16], edx */
offsetof(struct bpf_array, map.max_entries));
-#define OFFSET1 (41 + RETPOLINE_RAX_BPF_JIT_SIZE) /* number of bytes to jump */
+#define OFFSET1 (41 + RETPOLINE_RAX_BPF_JIT_SIZE) /* Number of bytes to jump */
EMIT2(X86_JBE, OFFSET1); /* jbe out */
label1 = cnt;
- /* if (tail_call_cnt > MAX_TAIL_CALL_CNT)
- * goto out;
+ /*
+ * if (tail_call_cnt > MAX_TAIL_CALL_CNT)
+ * goto out;
*/
EMIT2_off32(0x8B, 0x85, 36); /* mov eax, dword ptr [rbp + 36] */
EMIT3(0x83, 0xF8, MAX_TAIL_CALL_CNT); /* cmp eax, MAX_TAIL_CALL_CNT */
@@ -308,8 +297,9 @@
EMIT4_off32(0x48, 0x8B, 0x84, 0xD6, /* mov rax, [rsi + rdx * 8 + offsetof(...)] */
offsetof(struct bpf_array, ptrs));
- /* if (prog == NULL)
- * goto out;
+ /*
+ * if (prog == NULL)
+ * goto out;
*/
EMIT3(0x48, 0x85, 0xC0); /* test rax,rax */
#define OFFSET3 (8 + RETPOLINE_RAX_BPF_JIT_SIZE)
@@ -321,7 +311,8 @@
offsetof(struct bpf_prog, bpf_func));
EMIT4(0x48, 0x83, 0xC0, PROLOGUE_SIZE); /* add rax, prologue_size */
- /* now we're ready to jump into next BPF program
+ /*
+ * Wow we're ready to jump into next BPF program
* rdi == ctx (1st arg)
* rax == prog->bpf_func + prologue_size
*/
@@ -334,26 +325,6 @@
*pprog = prog;
}
-
-static void emit_load_skb_data_hlen(u8 **pprog)
-{
- u8 *prog = *pprog;
- int cnt = 0;
-
- /* r9d = skb->len - skb->data_len (headlen)
- * r10 = skb->data
- */
- /* mov %r9d, off32(%rdi) */
- EMIT3_off32(0x44, 0x8b, 0x8f, offsetof(struct sk_buff, len));
-
- /* sub %r9d, off32(%rdi) */
- EMIT3_off32(0x44, 0x2b, 0x8f, offsetof(struct sk_buff, data_len));
-
- /* mov %r10, off32(%rdi) */
- EMIT3_off32(0x4c, 0x8b, 0x97, offsetof(struct sk_buff, data));
- *pprog = prog;
-}
-
static void emit_mov_imm32(u8 **pprog, bool sign_propagate,
u32 dst_reg, const u32 imm32)
{
@@ -361,7 +332,8 @@
u8 b1, b2, b3;
int cnt = 0;
- /* optimization: if imm32 is positive, use 'mov %eax, imm32'
+ /*
+ * Optimization: if imm32 is positive, use 'mov %eax, imm32'
* (which zero-extends imm32) to save 2 bytes.
*/
if (sign_propagate && (s32)imm32 < 0) {
@@ -373,7 +345,8 @@
goto done;
}
- /* optimization: if imm32 is zero, use 'xor %eax, %eax'
+ /*
+ * Optimization: if imm32 is zero, use 'xor %eax, %eax'
* to save 3 bytes.
*/
if (imm32 == 0) {
@@ -400,7 +373,8 @@
int cnt = 0;
if (is_uimm32(((u64)imm32_hi << 32) | (u32)imm32_lo)) {
- /* For emitting plain u32, where sign bit must not be
+ /*
+ * For emitting plain u32, where sign bit must not be
* propagated LLVM tends to load imm64 over mov32
* directly, so save couple of bytes by just doing
* 'mov %eax, imm32' instead.
@@ -439,8 +413,6 @@
{
struct bpf_insn *insn = bpf_prog->insnsi;
int insn_cnt = bpf_prog->len;
- bool seen_ld_abs = ctx->seen_ld_abs | (oldproglen == 0);
- bool seen_ax_reg = ctx->seen_ax_reg | (oldproglen == 0);
bool seen_exit = false;
u8 temp[BPF_MAX_INSN_SIZE + BPF_INSN_SAFETY];
int i, cnt = 0;
@@ -450,9 +422,6 @@
emit_prologue(&prog, bpf_prog->aux->stack_depth,
bpf_prog_was_classic(bpf_prog));
- if (seen_ld_abs)
- emit_load_skb_data_hlen(&prog);
-
for (i = 0; i < insn_cnt; i++, insn++) {
const s32 imm32 = insn->imm;
u32 dst_reg = insn->dst_reg;
@@ -460,13 +429,9 @@
u8 b2 = 0, b3 = 0;
s64 jmp_offset;
u8 jmp_cond;
- bool reload_skb_data;
int ilen;
u8 *func;
- if (dst_reg == BPF_REG_AX || src_reg == BPF_REG_AX)
- ctx->seen_ax_reg = seen_ax_reg = true;
-
switch (insn->code) {
/* ALU */
case BPF_ALU | BPF_ADD | BPF_X:
@@ -525,7 +490,8 @@
else if (is_ereg(dst_reg))
EMIT1(add_1mod(0x40, dst_reg));
- /* b3 holds 'normal' opcode, b2 short form only valid
+ /*
+ * b3 holds 'normal' opcode, b2 short form only valid
* in case dst is eax/rax.
*/
switch (BPF_OP(insn->code)) {
@@ -593,7 +559,8 @@
/* mov rax, dst_reg */
EMIT_mov(BPF_REG_0, dst_reg);
- /* xor edx, edx
+ /*
+ * xor edx, edx
* equivalent to 'xor rdx, rdx', but one byte less
*/
EMIT2(0x31, 0xd2);
@@ -655,7 +622,7 @@
}
break;
}
- /* shifts */
+ /* Shifts */
case BPF_ALU | BPF_LSH | BPF_K:
case BPF_ALU | BPF_RSH | BPF_K:
case BPF_ALU | BPF_ARSH | BPF_K:
@@ -686,7 +653,7 @@
case BPF_ALU64 | BPF_RSH | BPF_X:
case BPF_ALU64 | BPF_ARSH | BPF_X:
- /* check for bad case when dst_reg == rcx */
+ /* Check for bad case when dst_reg == rcx */
if (dst_reg == BPF_REG_4) {
/* mov r11, dst_reg */
EMIT_mov(AUX_REG, dst_reg);
@@ -724,13 +691,13 @@
case BPF_ALU | BPF_END | BPF_FROM_BE:
switch (imm32) {
case 16:
- /* emit 'ror %ax, 8' to swap lower 2 bytes */
+ /* Emit 'ror %ax, 8' to swap lower 2 bytes */
EMIT1(0x66);
if (is_ereg(dst_reg))
EMIT1(0x41);
EMIT3(0xC1, add_1reg(0xC8, dst_reg), 8);
- /* emit 'movzwl eax, ax' */
+ /* Emit 'movzwl eax, ax' */
if (is_ereg(dst_reg))
EMIT3(0x45, 0x0F, 0xB7);
else
@@ -738,7 +705,7 @@
EMIT1(add_2reg(0xC0, dst_reg, dst_reg));
break;
case 32:
- /* emit 'bswap eax' to swap lower 4 bytes */
+ /* Emit 'bswap eax' to swap lower 4 bytes */
if (is_ereg(dst_reg))
EMIT2(0x41, 0x0F);
else
@@ -746,7 +713,7 @@
EMIT1(add_1reg(0xC8, dst_reg));
break;
case 64:
- /* emit 'bswap rax' to swap 8 bytes */
+ /* Emit 'bswap rax' to swap 8 bytes */
EMIT3(add_1mod(0x48, dst_reg), 0x0F,
add_1reg(0xC8, dst_reg));
break;
@@ -756,7 +723,8 @@
case BPF_ALU | BPF_END | BPF_FROM_LE:
switch (imm32) {
case 16:
- /* emit 'movzwl eax, ax' to zero extend 16-bit
+ /*
+ * Emit 'movzwl eax, ax' to zero extend 16-bit
* into 64 bit
*/
if (is_ereg(dst_reg))
@@ -766,7 +734,7 @@
EMIT1(add_2reg(0xC0, dst_reg, dst_reg));
break;
case 32:
- /* emit 'mov eax, eax' to clear upper 32-bits */
+ /* Emit 'mov eax, eax' to clear upper 32-bits */
if (is_ereg(dst_reg))
EMIT1(0x45);
EMIT2(0x89, add_2reg(0xC0, dst_reg, dst_reg));
@@ -809,9 +777,9 @@
/* STX: *(u8*)(dst_reg + off) = src_reg */
case BPF_STX | BPF_MEM | BPF_B:
- /* emit 'mov byte ptr [rax + off], al' */
+ /* Emit 'mov byte ptr [rax + off], al' */
if (is_ereg(dst_reg) || is_ereg(src_reg) ||
- /* have to add extra byte for x86 SIL, DIL regs */
+ /* We have to add extra byte for x86 SIL, DIL regs */
src_reg == BPF_REG_1 || src_reg == BPF_REG_2)
EMIT2(add_2mod(0x40, dst_reg, src_reg), 0x88);
else
@@ -840,25 +808,26 @@
/* LDX: dst_reg = *(u8*)(src_reg + off) */
case BPF_LDX | BPF_MEM | BPF_B:
- /* emit 'movzx rax, byte ptr [rax + off]' */
+ /* Emit 'movzx rax, byte ptr [rax + off]' */
EMIT3(add_2mod(0x48, src_reg, dst_reg), 0x0F, 0xB6);
goto ldx;
case BPF_LDX | BPF_MEM | BPF_H:
- /* emit 'movzx rax, word ptr [rax + off]' */
+ /* Emit 'movzx rax, word ptr [rax + off]' */
EMIT3(add_2mod(0x48, src_reg, dst_reg), 0x0F, 0xB7);
goto ldx;
case BPF_LDX | BPF_MEM | BPF_W:
- /* emit 'mov eax, dword ptr [rax+0x14]' */
+ /* Emit 'mov eax, dword ptr [rax+0x14]' */
if (is_ereg(dst_reg) || is_ereg(src_reg))
EMIT2(add_2mod(0x40, src_reg, dst_reg), 0x8B);
else
EMIT1(0x8B);
goto ldx;
case BPF_LDX | BPF_MEM | BPF_DW:
- /* emit 'mov rax, qword ptr [rax+0x14]' */
+ /* Emit 'mov rax, qword ptr [rax+0x14]' */
EMIT2(add_2mod(0x48, src_reg, dst_reg), 0x8B);
-ldx: /* if insn->off == 0 we can save one extra byte, but
- * special case of x86 r13 which always needs an offset
+ldx: /*
+ * If insn->off == 0 we can save one extra byte, but
+ * special case of x86 R13 which always needs an offset
* is not worth the hassle
*/
if (is_imm8(insn->off))
@@ -870,7 +839,7 @@
/* STX XADD: lock *(u32*)(dst_reg + off) += src_reg */
case BPF_STX | BPF_XADD | BPF_W:
- /* emit 'lock add dword ptr [rax + off], eax' */
+ /* Emit 'lock add dword ptr [rax + off], eax' */
if (is_ereg(dst_reg) || is_ereg(src_reg))
EMIT3(0xF0, add_2mod(0x40, dst_reg, src_reg), 0x01);
else
@@ -889,35 +858,12 @@
case BPF_JMP | BPF_CALL:
func = (u8 *) __bpf_call_base + imm32;
jmp_offset = func - (image + addrs[i]);
- if (seen_ld_abs) {
- reload_skb_data = bpf_helper_changes_pkt_data(func);
- if (reload_skb_data) {
- EMIT1(0x57); /* push %rdi */
- jmp_offset += 22; /* pop, mov, sub, mov */
- } else {
- EMIT2(0x41, 0x52); /* push %r10 */
- EMIT2(0x41, 0x51); /* push %r9 */
- /* need to adjust jmp offset, since
- * pop %r9, pop %r10 take 4 bytes after call insn
- */
- jmp_offset += 4;
- }
- }
if (!imm32 || !is_simm32(jmp_offset)) {
- pr_err("unsupported bpf func %d addr %p image %p\n",
+ pr_err("unsupported BPF func %d addr %p image %p\n",
imm32, func, image);
return -EINVAL;
}
EMIT1_off32(0xE8, jmp_offset);
- if (seen_ld_abs) {
- if (reload_skb_data) {
- EMIT1(0x5F); /* pop %rdi */
- emit_load_skb_data_hlen(&prog);
- } else {
- EMIT2(0x41, 0x59); /* pop %r9 */
- EMIT2(0x41, 0x5A); /* pop %r10 */
- }
- }
break;
case BPF_JMP | BPF_TAIL_CALL:
@@ -970,7 +916,7 @@
else
EMIT2_off32(0x81, add_1reg(0xF8, dst_reg), imm32);
-emit_cond_jmp: /* convert BPF opcode to x86 */
+emit_cond_jmp: /* Convert BPF opcode to x86 */
switch (BPF_OP(insn->code)) {
case BPF_JEQ:
jmp_cond = X86_JE;
@@ -996,22 +942,22 @@
jmp_cond = X86_JBE;
break;
case BPF_JSGT:
- /* signed '>', GT in x86 */
+ /* Signed '>', GT in x86 */
jmp_cond = X86_JG;
break;
case BPF_JSLT:
- /* signed '<', LT in x86 */
+ /* Signed '<', LT in x86 */
jmp_cond = X86_JL;
break;
case BPF_JSGE:
- /* signed '>=', GE in x86 */
+ /* Signed '>=', GE in x86 */
jmp_cond = X86_JGE;
break;
case BPF_JSLE:
- /* signed '<=', LE in x86 */
+ /* Signed '<=', LE in x86 */
jmp_cond = X86_JLE;
break;
- default: /* to silence gcc warning */
+ default: /* to silence GCC warning */
return -EFAULT;
}
jmp_offset = addrs[i + insn->off] - addrs[i];
@@ -1039,7 +985,7 @@
jmp_offset = addrs[i + insn->off] - addrs[i];
if (!jmp_offset)
- /* optimize out nop jumps */
+ /* Optimize out nop jumps */
break;
emit_jmp:
if (is_imm8(jmp_offset)) {
@@ -1052,66 +998,13 @@
}
break;
- case BPF_LD | BPF_IND | BPF_W:
- func = sk_load_word;
- goto common_load;
- case BPF_LD | BPF_ABS | BPF_W:
- func = CHOOSE_LOAD_FUNC(imm32, sk_load_word);
-common_load:
- ctx->seen_ld_abs = seen_ld_abs = true;
- jmp_offset = func - (image + addrs[i]);
- if (!func || !is_simm32(jmp_offset)) {
- pr_err("unsupported bpf func %d addr %p image %p\n",
- imm32, func, image);
- return -EINVAL;
- }
- if (BPF_MODE(insn->code) == BPF_ABS) {
- /* mov %esi, imm32 */
- EMIT1_off32(0xBE, imm32);
- } else {
- /* mov %rsi, src_reg */
- EMIT_mov(BPF_REG_2, src_reg);
- if (imm32) {
- if (is_imm8(imm32))
- /* add %esi, imm8 */
- EMIT3(0x83, 0xC6, imm32);
- else
- /* add %esi, imm32 */
- EMIT2_off32(0x81, 0xC6, imm32);
- }
- }
- /* skb pointer is in R6 (%rbx), it will be copied into
- * %rdi if skb_copy_bits() call is necessary.
- * sk_load_* helpers also use %r10 and %r9d.
- * See bpf_jit.S
- */
- if (seen_ax_reg)
- /* r10 = skb->data, mov %r10, off32(%rbx) */
- EMIT3_off32(0x4c, 0x8b, 0x93,
- offsetof(struct sk_buff, data));
- EMIT1_off32(0xE8, jmp_offset); /* call */
- break;
-
- case BPF_LD | BPF_IND | BPF_H:
- func = sk_load_half;
- goto common_load;
- case BPF_LD | BPF_ABS | BPF_H:
- func = CHOOSE_LOAD_FUNC(imm32, sk_load_half);
- goto common_load;
- case BPF_LD | BPF_IND | BPF_B:
- func = sk_load_byte;
- goto common_load;
- case BPF_LD | BPF_ABS | BPF_B:
- func = CHOOSE_LOAD_FUNC(imm32, sk_load_byte);
- goto common_load;
-
case BPF_JMP | BPF_EXIT:
if (seen_exit) {
jmp_offset = ctx->cleanup_addr - addrs[i];
goto emit_jmp;
}
seen_exit = true;
- /* update cleanup_addr */
+ /* Update cleanup_addr */
ctx->cleanup_addr = proglen;
/* mov rbx, qword ptr [rbp+0] */
EMIT4(0x48, 0x8B, 0x5D, 0);
@@ -1129,10 +1022,11 @@
break;
default:
- /* By design x64 JIT should support all BPF instructions
+ /*
+ * By design x86-64 JIT should support all BPF instructions.
* This error will be seen if new instruction was added
- * to interpreter, but not to JIT
- * or if there is junk in bpf_prog
+ * to the interpreter, but not to the JIT, or if there is
+ * junk in bpf_prog.
*/
pr_err("bpf_jit: unknown opcode %02x\n", insn->code);
return -EINVAL;
@@ -1184,7 +1078,8 @@
return orig_prog;
tmp = bpf_jit_blind_constants(prog);
- /* If blinding was requested and we failed during blinding,
+ /*
+ * If blinding was requested and we failed during blinding,
* we must fall back to the interpreter.
*/
if (IS_ERR(tmp))
@@ -1218,8 +1113,9 @@
goto out_addrs;
}
- /* Before first pass, make a rough estimation of addrs[]
- * each bpf instruction is translated to less than 64 bytes
+ /*
+ * Before first pass, make a rough estimation of addrs[]
+ * each BPF instruction is translated to less than 64 bytes
*/
for (proglen = 0, i = 0; i < prog->len; i++) {
proglen += 64;
@@ -1228,10 +1124,11 @@
ctx.cleanup_addr = proglen;
skip_init_addrs:
- /* JITed image shrinks with every pass and the loop iterates
- * until the image stops shrinking. Very large bpf programs
+ /*
+ * JITed image shrinks with every pass and the loop iterates
+ * until the image stops shrinking. Very large BPF programs
* may converge on the last pass. In such case do one more
- * pass to emit the final image
+ * pass to emit the final image.
*/
for (pass = 0; pass < 20 || image; pass++) {
proglen = do_jit(prog, addrs, image, oldproglen, &ctx);
diff --git a/arch/x86/net/bpf_jit_comp32.c b/arch/x86/net/bpf_jit_comp32.c
new file mode 100644
index 0000000..0cc04e3
--- /dev/null
+++ b/arch/x86/net/bpf_jit_comp32.c
@@ -0,0 +1,2419 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Just-In-Time compiler for eBPF filters on IA32 (32bit x86)
+ *
+ * Author: Wang YanQing (udknight@gmail.com)
+ * The code based on code and ideas from:
+ * Eric Dumazet (eric.dumazet@gmail.com)
+ * and from:
+ * Shubham Bansal <illusionist.neo@gmail.com>
+ */
+
+#include <linux/netdevice.h>
+#include <linux/filter.h>
+#include <linux/if_vlan.h>
+#include <asm/cacheflush.h>
+#include <asm/set_memory.h>
+#include <asm/nospec-branch.h>
+#include <linux/bpf.h>
+
+/*
+ * eBPF prog stack layout:
+ *
+ * high
+ * original ESP => +-----+
+ * | | callee saved registers
+ * +-----+
+ * | ... | eBPF JIT scratch space
+ * BPF_FP,IA32_EBP => +-----+
+ * | ... | eBPF prog stack
+ * +-----+
+ * |RSVD | JIT scratchpad
+ * current ESP => +-----+
+ * | |
+ * | ... | Function call stack
+ * | |
+ * +-----+
+ * low
+ *
+ * The callee saved registers:
+ *
+ * high
+ * original ESP => +------------------+ \
+ * | ebp | |
+ * current EBP => +------------------+ } callee saved registers
+ * | ebx,esi,edi | |
+ * +------------------+ /
+ * low
+ */
+
+static u8 *emit_code(u8 *ptr, u32 bytes, unsigned int len)
+{
+ if (len == 1)
+ *ptr = bytes;
+ else if (len == 2)
+ *(u16 *)ptr = bytes;
+ else {
+ *(u32 *)ptr = bytes;
+ barrier();
+ }
+ return ptr + len;
+}
+
+#define EMIT(bytes, len) \
+ do { prog = emit_code(prog, bytes, len); cnt += len; } while (0)
+
+#define EMIT1(b1) EMIT(b1, 1)
+#define EMIT2(b1, b2) EMIT((b1) + ((b2) << 8), 2)
+#define EMIT3(b1, b2, b3) EMIT((b1) + ((b2) << 8) + ((b3) << 16), 3)
+#define EMIT4(b1, b2, b3, b4) \
+ EMIT((b1) + ((b2) << 8) + ((b3) << 16) + ((b4) << 24), 4)
+
+#define EMIT1_off32(b1, off) \
+ do { EMIT1(b1); EMIT(off, 4); } while (0)
+#define EMIT2_off32(b1, b2, off) \
+ do { EMIT2(b1, b2); EMIT(off, 4); } while (0)
+#define EMIT3_off32(b1, b2, b3, off) \
+ do { EMIT3(b1, b2, b3); EMIT(off, 4); } while (0)
+#define EMIT4_off32(b1, b2, b3, b4, off) \
+ do { EMIT4(b1, b2, b3, b4); EMIT(off, 4); } while (0)
+
+#define jmp_label(label, jmp_insn_len) (label - cnt - jmp_insn_len)
+
+static bool is_imm8(int value)
+{
+ return value <= 127 && value >= -128;
+}
+
+static bool is_simm32(s64 value)
+{
+ return value == (s64) (s32) value;
+}
+
+#define STACK_OFFSET(k) (k)
+#define TCALL_CNT (MAX_BPF_JIT_REG + 0) /* Tail Call Count */
+
+#define IA32_EAX (0x0)
+#define IA32_EBX (0x3)
+#define IA32_ECX (0x1)
+#define IA32_EDX (0x2)
+#define IA32_ESI (0x6)
+#define IA32_EDI (0x7)
+#define IA32_EBP (0x5)
+#define IA32_ESP (0x4)
+
+/*
+ * List of x86 cond jumps opcodes (. + s8)
+ * Add 0x10 (and an extra 0x0f) to generate far jumps (. + s32)
+ */
+#define IA32_JB 0x72
+#define IA32_JAE 0x73
+#define IA32_JE 0x74
+#define IA32_JNE 0x75
+#define IA32_JBE 0x76
+#define IA32_JA 0x77
+#define IA32_JL 0x7C
+#define IA32_JGE 0x7D
+#define IA32_JLE 0x7E
+#define IA32_JG 0x7F
+
+/*
+ * Map eBPF registers to IA32 32bit registers or stack scratch space.
+ *
+ * 1. All the registers, R0-R10, are mapped to scratch space on stack.
+ * 2. We need two 64 bit temp registers to do complex operations on eBPF
+ * registers.
+ * 3. For performance reason, the BPF_REG_AX for blinding constant, is
+ * mapped to real hardware register pair, IA32_ESI and IA32_EDI.
+ *
+ * As the eBPF registers are all 64 bit registers and IA32 has only 32 bit
+ * registers, we have to map each eBPF registers with two IA32 32 bit regs
+ * or scratch memory space and we have to build eBPF 64 bit register from those.
+ *
+ * We use IA32_EAX, IA32_EDX, IA32_ECX, IA32_EBX as temporary registers.
+ */
+static const u8 bpf2ia32[][2] = {
+ /* Return value from in-kernel function, and exit value from eBPF */
+ [BPF_REG_0] = {STACK_OFFSET(0), STACK_OFFSET(4)},
+
+ /* The arguments from eBPF program to in-kernel function */
+ /* Stored on stack scratch space */
+ [BPF_REG_1] = {STACK_OFFSET(8), STACK_OFFSET(12)},
+ [BPF_REG_2] = {STACK_OFFSET(16), STACK_OFFSET(20)},
+ [BPF_REG_3] = {STACK_OFFSET(24), STACK_OFFSET(28)},
+ [BPF_REG_4] = {STACK_OFFSET(32), STACK_OFFSET(36)},
+ [BPF_REG_5] = {STACK_OFFSET(40), STACK_OFFSET(44)},
+
+ /* Callee saved registers that in-kernel function will preserve */
+ /* Stored on stack scratch space */
+ [BPF_REG_6] = {STACK_OFFSET(48), STACK_OFFSET(52)},
+ [BPF_REG_7] = {STACK_OFFSET(56), STACK_OFFSET(60)},
+ [BPF_REG_8] = {STACK_OFFSET(64), STACK_OFFSET(68)},
+ [BPF_REG_9] = {STACK_OFFSET(72), STACK_OFFSET(76)},
+
+ /* Read only Frame Pointer to access Stack */
+ [BPF_REG_FP] = {STACK_OFFSET(80), STACK_OFFSET(84)},
+
+ /* Temporary register for blinding constants. */
+ [BPF_REG_AX] = {IA32_ESI, IA32_EDI},
+
+ /* Tail call count. Stored on stack scratch space. */
+ [TCALL_CNT] = {STACK_OFFSET(88), STACK_OFFSET(92)},
+};
+
+#define dst_lo dst[0]
+#define dst_hi dst[1]
+#define src_lo src[0]
+#define src_hi src[1]
+
+#define STACK_ALIGNMENT 8
+/*
+ * Stack space for BPF_REG_1, BPF_REG_2, BPF_REG_3, BPF_REG_4,
+ * BPF_REG_5, BPF_REG_6, BPF_REG_7, BPF_REG_8, BPF_REG_9,
+ * BPF_REG_FP, BPF_REG_AX and Tail call counts.
+ */
+#define SCRATCH_SIZE 96
+
+/* Total stack size used in JITed code */
+#define _STACK_SIZE (stack_depth + SCRATCH_SIZE)
+
+#define STACK_SIZE ALIGN(_STACK_SIZE, STACK_ALIGNMENT)
+
+/* Get the offset of eBPF REGISTERs stored on scratch space. */
+#define STACK_VAR(off) (off)
+
+/* Encode 'dst_reg' register into IA32 opcode 'byte' */
+static u8 add_1reg(u8 byte, u32 dst_reg)
+{
+ return byte + dst_reg;
+}
+
+/* Encode 'dst_reg' and 'src_reg' registers into IA32 opcode 'byte' */
+static u8 add_2reg(u8 byte, u32 dst_reg, u32 src_reg)
+{
+ return byte + dst_reg + (src_reg << 3);
+}
+
+static void jit_fill_hole(void *area, unsigned int size)
+{
+ /* Fill whole space with int3 instructions */
+ memset(area, 0xcc, size);
+}
+
+static inline void emit_ia32_mov_i(const u8 dst, const u32 val, bool dstk,
+ u8 **pprog)
+{
+ u8 *prog = *pprog;
+ int cnt = 0;
+
+ if (dstk) {
+ if (val == 0) {
+ /* xor eax,eax */
+ EMIT2(0x33, add_2reg(0xC0, IA32_EAX, IA32_EAX));
+ /* mov dword ptr [ebp+off],eax */
+ EMIT3(0x89, add_2reg(0x40, IA32_EBP, IA32_EAX),
+ STACK_VAR(dst));
+ } else {
+ EMIT3_off32(0xC7, add_1reg(0x40, IA32_EBP),
+ STACK_VAR(dst), val);
+ }
+ } else {
+ if (val == 0)
+ EMIT2(0x33, add_2reg(0xC0, dst, dst));
+ else
+ EMIT2_off32(0xC7, add_1reg(0xC0, dst),
+ val);
+ }
+ *pprog = prog;
+}
+
+/* dst = imm (4 bytes)*/
+static inline void emit_ia32_mov_r(const u8 dst, const u8 src, bool dstk,
+ bool sstk, u8 **pprog)
+{
+ u8 *prog = *pprog;
+ int cnt = 0;
+ u8 sreg = sstk ? IA32_EAX : src;
+
+ if (sstk)
+ /* mov eax,dword ptr [ebp+off] */
+ EMIT3(0x8B, add_2reg(0x40, IA32_EBP, IA32_EAX), STACK_VAR(src));
+ if (dstk)
+ /* mov dword ptr [ebp+off],eax */
+ EMIT3(0x89, add_2reg(0x40, IA32_EBP, sreg), STACK_VAR(dst));
+ else
+ /* mov dst,sreg */
+ EMIT2(0x89, add_2reg(0xC0, dst, sreg));
+
+ *pprog = prog;
+}
+
+/* dst = src */
+static inline void emit_ia32_mov_r64(const bool is64, const u8 dst[],
+ const u8 src[], bool dstk,
+ bool sstk, u8 **pprog)
+{
+ emit_ia32_mov_r(dst_lo, src_lo, dstk, sstk, pprog);
+ if (is64)
+ /* complete 8 byte move */
+ emit_ia32_mov_r(dst_hi, src_hi, dstk, sstk, pprog);
+ else
+ /* zero out high 4 bytes */
+ emit_ia32_mov_i(dst_hi, 0, dstk, pprog);
+}
+
+/* Sign extended move */
+static inline void emit_ia32_mov_i64(const bool is64, const u8 dst[],
+ const u32 val, bool dstk, u8 **pprog)
+{
+ u32 hi = 0;
+
+ if (is64 && (val & (1<<31)))
+ hi = (u32)~0;
+ emit_ia32_mov_i(dst_lo, val, dstk, pprog);
+ emit_ia32_mov_i(dst_hi, hi, dstk, pprog);
+}
+
+/*
+ * ALU operation (32 bit)
+ * dst = dst * src
+ */
+static inline void emit_ia32_mul_r(const u8 dst, const u8 src, bool dstk,
+ bool sstk, u8 **pprog)
+{
+ u8 *prog = *pprog;
+ int cnt = 0;
+ u8 sreg = sstk ? IA32_ECX : src;
+
+ if (sstk)
+ /* mov ecx,dword ptr [ebp+off] */
+ EMIT3(0x8B, add_2reg(0x40, IA32_EBP, IA32_ECX), STACK_VAR(src));
+
+ if (dstk)
+ /* mov eax,dword ptr [ebp+off] */
+ EMIT3(0x8B, add_2reg(0x40, IA32_EBP, IA32_EAX), STACK_VAR(dst));
+ else
+ /* mov eax,dst */
+ EMIT2(0x8B, add_2reg(0xC0, dst, IA32_EAX));
+
+
+ EMIT2(0xF7, add_1reg(0xE0, sreg));
+
+ if (dstk)
+ /* mov dword ptr [ebp+off],eax */
+ EMIT3(0x89, add_2reg(0x40, IA32_EBP, IA32_EAX),
+ STACK_VAR(dst));
+ else
+ /* mov dst,eax */
+ EMIT2(0x89, add_2reg(0xC0, dst, IA32_EAX));
+
+ *pprog = prog;
+}
+
+static inline void emit_ia32_to_le_r64(const u8 dst[], s32 val,
+ bool dstk, u8 **pprog)
+{
+ u8 *prog = *pprog;
+ int cnt = 0;
+ u8 dreg_lo = dstk ? IA32_EAX : dst_lo;
+ u8 dreg_hi = dstk ? IA32_EDX : dst_hi;
+
+ if (dstk && val != 64) {
+ EMIT3(0x8B, add_2reg(0x40, IA32_EBP, IA32_EAX),
+ STACK_VAR(dst_lo));
+ EMIT3(0x8B, add_2reg(0x40, IA32_EBP, IA32_EDX),
+ STACK_VAR(dst_hi));
+ }
+ switch (val) {
+ case 16:
+ /*
+ * Emit 'movzwl eax,ax' to zero extend 16-bit
+ * into 64 bit
+ */
+ EMIT2(0x0F, 0xB7);
+ EMIT1(add_2reg(0xC0, dreg_lo, dreg_lo));
+ /* xor dreg_hi,dreg_hi */
+ EMIT2(0x33, add_2reg(0xC0, dreg_hi, dreg_hi));
+ break;
+ case 32:
+ /* xor dreg_hi,dreg_hi */
+ EMIT2(0x33, add_2reg(0xC0, dreg_hi, dreg_hi));
+ break;
+ case 64:
+ /* nop */
+ break;
+ }
+
+ if (dstk && val != 64) {
+ /* mov dword ptr [ebp+off],dreg_lo */
+ EMIT3(0x89, add_2reg(0x40, IA32_EBP, dreg_lo),
+ STACK_VAR(dst_lo));
+ /* mov dword ptr [ebp+off],dreg_hi */
+ EMIT3(0x89, add_2reg(0x40, IA32_EBP, dreg_hi),
+ STACK_VAR(dst_hi));
+ }
+ *pprog = prog;
+}
+
+static inline void emit_ia32_to_be_r64(const u8 dst[], s32 val,
+ bool dstk, u8 **pprog)
+{
+ u8 *prog = *pprog;
+ int cnt = 0;
+ u8 dreg_lo = dstk ? IA32_EAX : dst_lo;
+ u8 dreg_hi = dstk ? IA32_EDX : dst_hi;
+
+ if (dstk) {
+ EMIT3(0x8B, add_2reg(0x40, IA32_EBP, IA32_EAX),
+ STACK_VAR(dst_lo));
+ EMIT3(0x8B, add_2reg(0x40, IA32_EBP, IA32_EDX),
+ STACK_VAR(dst_hi));
+ }
+ switch (val) {
+ case 16:
+ /* Emit 'ror %ax, 8' to swap lower 2 bytes */
+ EMIT1(0x66);
+ EMIT3(0xC1, add_1reg(0xC8, dreg_lo), 8);
+
+ EMIT2(0x0F, 0xB7);
+ EMIT1(add_2reg(0xC0, dreg_lo, dreg_lo));
+
+ /* xor dreg_hi,dreg_hi */
+ EMIT2(0x33, add_2reg(0xC0, dreg_hi, dreg_hi));
+ break;
+ case 32:
+ /* Emit 'bswap eax' to swap lower 4 bytes */
+ EMIT1(0x0F);
+ EMIT1(add_1reg(0xC8, dreg_lo));
+
+ /* xor dreg_hi,dreg_hi */
+ EMIT2(0x33, add_2reg(0xC0, dreg_hi, dreg_hi));
+ break;
+ case 64:
+ /* Emit 'bswap eax' to swap lower 4 bytes */
+ EMIT1(0x0F);
+ EMIT1(add_1reg(0xC8, dreg_lo));
+
+ /* Emit 'bswap edx' to swap lower 4 bytes */
+ EMIT1(0x0F);
+ EMIT1(add_1reg(0xC8, dreg_hi));
+
+ /* mov ecx,dreg_hi */
+ EMIT2(0x89, add_2reg(0xC0, IA32_ECX, dreg_hi));
+ /* mov dreg_hi,dreg_lo */
+ EMIT2(0x89, add_2reg(0xC0, dreg_hi, dreg_lo));
+ /* mov dreg_lo,ecx */
+ EMIT2(0x89, add_2reg(0xC0, dreg_lo, IA32_ECX));
+
+ break;
+ }
+ if (dstk) {
+ /* mov dword ptr [ebp+off],dreg_lo */
+ EMIT3(0x89, add_2reg(0x40, IA32_EBP, dreg_lo),
+ STACK_VAR(dst_lo));
+ /* mov dword ptr [ebp+off],dreg_hi */
+ EMIT3(0x89, add_2reg(0x40, IA32_EBP, dreg_hi),
+ STACK_VAR(dst_hi));
+ }
+ *pprog = prog;
+}
+
+/*
+ * ALU operation (32 bit)
+ * dst = dst (div|mod) src
+ */
+static inline void emit_ia32_div_mod_r(const u8 op, const u8 dst, const u8 src,
+ bool dstk, bool sstk, u8 **pprog)
+{
+ u8 *prog = *pprog;
+ int cnt = 0;
+
+ if (sstk)
+ /* mov ecx,dword ptr [ebp+off] */
+ EMIT3(0x8B, add_2reg(0x40, IA32_EBP, IA32_ECX),
+ STACK_VAR(src));
+ else if (src != IA32_ECX)
+ /* mov ecx,src */
+ EMIT2(0x8B, add_2reg(0xC0, src, IA32_ECX));
+
+ if (dstk)
+ /* mov eax,dword ptr [ebp+off] */
+ EMIT3(0x8B, add_2reg(0x40, IA32_EBP, IA32_EAX),
+ STACK_VAR(dst));
+ else
+ /* mov eax,dst */
+ EMIT2(0x8B, add_2reg(0xC0, dst, IA32_EAX));
+
+ /* xor edx,edx */
+ EMIT2(0x31, add_2reg(0xC0, IA32_EDX, IA32_EDX));
+ /* div ecx */
+ EMIT2(0xF7, add_1reg(0xF0, IA32_ECX));
+
+ if (op == BPF_MOD) {
+ if (dstk)
+ EMIT3(0x89, add_2reg(0x40, IA32_EBP, IA32_EDX),
+ STACK_VAR(dst));
+ else
+ EMIT2(0x89, add_2reg(0xC0, dst, IA32_EDX));
+ } else {
+ if (dstk)
+ EMIT3(0x89, add_2reg(0x40, IA32_EBP, IA32_EAX),
+ STACK_VAR(dst));
+ else
+ EMIT2(0x89, add_2reg(0xC0, dst, IA32_EAX));
+ }
+ *pprog = prog;
+}
+
+/*
+ * ALU operation (32 bit)
+ * dst = dst (shift) src
+ */
+static inline void emit_ia32_shift_r(const u8 op, const u8 dst, const u8 src,
+ bool dstk, bool sstk, u8 **pprog)
+{
+ u8 *prog = *pprog;
+ int cnt = 0;
+ u8 dreg = dstk ? IA32_EAX : dst;
+ u8 b2;
+
+ if (dstk)
+ /* mov eax,dword ptr [ebp+off] */
+ EMIT3(0x8B, add_2reg(0x40, IA32_EBP, IA32_EAX), STACK_VAR(dst));
+
+ if (sstk)
+ /* mov ecx,dword ptr [ebp+off] */
+ EMIT3(0x8B, add_2reg(0x40, IA32_EBP, IA32_ECX), STACK_VAR(src));
+ else if (src != IA32_ECX)
+ /* mov ecx,src */
+ EMIT2(0x8B, add_2reg(0xC0, src, IA32_ECX));
+
+ switch (op) {
+ case BPF_LSH:
+ b2 = 0xE0; break;
+ case BPF_RSH:
+ b2 = 0xE8; break;
+ case BPF_ARSH:
+ b2 = 0xF8; break;
+ default:
+ return;
+ }
+ EMIT2(0xD3, add_1reg(b2, dreg));
+
+ if (dstk)
+ /* mov dword ptr [ebp+off],dreg */
+ EMIT3(0x89, add_2reg(0x40, IA32_EBP, dreg), STACK_VAR(dst));
+ *pprog = prog;
+}
+
+/*
+ * ALU operation (32 bit)
+ * dst = dst (op) src
+ */
+static inline void emit_ia32_alu_r(const bool is64, const bool hi, const u8 op,
+ const u8 dst, const u8 src, bool dstk,
+ bool sstk, u8 **pprog)
+{
+ u8 *prog = *pprog;
+ int cnt = 0;
+ u8 sreg = sstk ? IA32_EAX : src;
+ u8 dreg = dstk ? IA32_EDX : dst;
+
+ if (sstk)
+ /* mov eax,dword ptr [ebp+off] */
+ EMIT3(0x8B, add_2reg(0x40, IA32_EBP, IA32_EAX), STACK_VAR(src));
+
+ if (dstk)
+ /* mov eax,dword ptr [ebp+off] */
+ EMIT3(0x8B, add_2reg(0x40, IA32_EBP, IA32_EDX), STACK_VAR(dst));
+
+ switch (BPF_OP(op)) {
+ /* dst = dst + src */
+ case BPF_ADD:
+ if (hi && is64)
+ EMIT2(0x11, add_2reg(0xC0, dreg, sreg));
+ else
+ EMIT2(0x01, add_2reg(0xC0, dreg, sreg));
+ break;
+ /* dst = dst - src */
+ case BPF_SUB:
+ if (hi && is64)
+ EMIT2(0x19, add_2reg(0xC0, dreg, sreg));
+ else
+ EMIT2(0x29, add_2reg(0xC0, dreg, sreg));
+ break;
+ /* dst = dst | src */
+ case BPF_OR:
+ EMIT2(0x09, add_2reg(0xC0, dreg, sreg));
+ break;
+ /* dst = dst & src */
+ case BPF_AND:
+ EMIT2(0x21, add_2reg(0xC0, dreg, sreg));
+ break;
+ /* dst = dst ^ src */
+ case BPF_XOR:
+ EMIT2(0x31, add_2reg(0xC0, dreg, sreg));
+ break;
+ }
+
+ if (dstk)
+ /* mov dword ptr [ebp+off],dreg */
+ EMIT3(0x89, add_2reg(0x40, IA32_EBP, dreg),
+ STACK_VAR(dst));
+ *pprog = prog;
+}
+
+/* ALU operation (64 bit) */
+static inline void emit_ia32_alu_r64(const bool is64, const u8 op,
+ const u8 dst[], const u8 src[],
+ bool dstk, bool sstk,
+ u8 **pprog)
+{
+ u8 *prog = *pprog;
+
+ emit_ia32_alu_r(is64, false, op, dst_lo, src_lo, dstk, sstk, &prog);
+ if (is64)
+ emit_ia32_alu_r(is64, true, op, dst_hi, src_hi, dstk, sstk,
+ &prog);
+ else
+ emit_ia32_mov_i(dst_hi, 0, dstk, &prog);
+ *pprog = prog;
+}
+
+/*
+ * ALU operation (32 bit)
+ * dst = dst (op) val
+ */
+static inline void emit_ia32_alu_i(const bool is64, const bool hi, const u8 op,
+ const u8 dst, const s32 val, bool dstk,
+ u8 **pprog)
+{
+ u8 *prog = *pprog;
+ int cnt = 0;
+ u8 dreg = dstk ? IA32_EAX : dst;
+ u8 sreg = IA32_EDX;
+
+ if (dstk)
+ /* mov eax,dword ptr [ebp+off] */
+ EMIT3(0x8B, add_2reg(0x40, IA32_EBP, IA32_EAX), STACK_VAR(dst));
+
+ if (!is_imm8(val))
+ /* mov edx,imm32*/
+ EMIT2_off32(0xC7, add_1reg(0xC0, IA32_EDX), val);
+
+ switch (op) {
+ /* dst = dst + val */
+ case BPF_ADD:
+ if (hi && is64) {
+ if (is_imm8(val))
+ EMIT3(0x83, add_1reg(0xD0, dreg), val);
+ else
+ EMIT2(0x11, add_2reg(0xC0, dreg, sreg));
+ } else {
+ if (is_imm8(val))
+ EMIT3(0x83, add_1reg(0xC0, dreg), val);
+ else
+ EMIT2(0x01, add_2reg(0xC0, dreg, sreg));
+ }
+ break;
+ /* dst = dst - val */
+ case BPF_SUB:
+ if (hi && is64) {
+ if (is_imm8(val))
+ EMIT3(0x83, add_1reg(0xD8, dreg), val);
+ else
+ EMIT2(0x19, add_2reg(0xC0, dreg, sreg));
+ } else {
+ if (is_imm8(val))
+ EMIT3(0x83, add_1reg(0xE8, dreg), val);
+ else
+ EMIT2(0x29, add_2reg(0xC0, dreg, sreg));
+ }
+ break;
+ /* dst = dst | val */
+ case BPF_OR:
+ if (is_imm8(val))
+ EMIT3(0x83, add_1reg(0xC8, dreg), val);
+ else
+ EMIT2(0x09, add_2reg(0xC0, dreg, sreg));
+ break;
+ /* dst = dst & val */
+ case BPF_AND:
+ if (is_imm8(val))
+ EMIT3(0x83, add_1reg(0xE0, dreg), val);
+ else
+ EMIT2(0x21, add_2reg(0xC0, dreg, sreg));
+ break;
+ /* dst = dst ^ val */
+ case BPF_XOR:
+ if (is_imm8(val))
+ EMIT3(0x83, add_1reg(0xF0, dreg), val);
+ else
+ EMIT2(0x31, add_2reg(0xC0, dreg, sreg));
+ break;
+ case BPF_NEG:
+ EMIT2(0xF7, add_1reg(0xD8, dreg));
+ break;
+ }
+
+ if (dstk)
+ /* mov dword ptr [ebp+off],dreg */
+ EMIT3(0x89, add_2reg(0x40, IA32_EBP, dreg),
+ STACK_VAR(dst));
+ *pprog = prog;
+}
+
+/* ALU operation (64 bit) */
+static inline void emit_ia32_alu_i64(const bool is64, const u8 op,
+ const u8 dst[], const u32 val,
+ bool dstk, u8 **pprog)
+{
+ u8 *prog = *pprog;
+ u32 hi = 0;
+
+ if (is64 && (val & (1<<31)))
+ hi = (u32)~0;
+
+ emit_ia32_alu_i(is64, false, op, dst_lo, val, dstk, &prog);
+ if (is64)
+ emit_ia32_alu_i(is64, true, op, dst_hi, hi, dstk, &prog);
+ else
+ emit_ia32_mov_i(dst_hi, 0, dstk, &prog);
+
+ *pprog = prog;
+}
+
+/* dst = ~dst (64 bit) */
+static inline void emit_ia32_neg64(const u8 dst[], bool dstk, u8 **pprog)
+{
+ u8 *prog = *pprog;
+ int cnt = 0;
+ u8 dreg_lo = dstk ? IA32_EAX : dst_lo;
+ u8 dreg_hi = dstk ? IA32_EDX : dst_hi;
+
+ if (dstk) {
+ EMIT3(0x8B, add_2reg(0x40, IA32_EBP, IA32_EAX),
+ STACK_VAR(dst_lo));
+ EMIT3(0x8B, add_2reg(0x40, IA32_EBP, IA32_EDX),
+ STACK_VAR(dst_hi));
+ }
+
+ /* xor ecx,ecx */
+ EMIT2(0x31, add_2reg(0xC0, IA32_ECX, IA32_ECX));
+ /* sub dreg_lo,ecx */
+ EMIT2(0x2B, add_2reg(0xC0, dreg_lo, IA32_ECX));
+ /* mov dreg_lo,ecx */
+ EMIT2(0x89, add_2reg(0xC0, dreg_lo, IA32_ECX));
+
+ /* xor ecx,ecx */
+ EMIT2(0x31, add_2reg(0xC0, IA32_ECX, IA32_ECX));
+ /* sbb dreg_hi,ecx */
+ EMIT2(0x19, add_2reg(0xC0, dreg_hi, IA32_ECX));
+ /* mov dreg_hi,ecx */
+ EMIT2(0x89, add_2reg(0xC0, dreg_hi, IA32_ECX));
+
+ if (dstk) {
+ /* mov dword ptr [ebp+off],dreg_lo */
+ EMIT3(0x89, add_2reg(0x40, IA32_EBP, dreg_lo),
+ STACK_VAR(dst_lo));
+ /* mov dword ptr [ebp+off],dreg_hi */
+ EMIT3(0x89, add_2reg(0x40, IA32_EBP, dreg_hi),
+ STACK_VAR(dst_hi));
+ }
+ *pprog = prog;
+}
+
+/* dst = dst << src */
+static inline void emit_ia32_lsh_r64(const u8 dst[], const u8 src[],
+ bool dstk, bool sstk, u8 **pprog)
+{
+ u8 *prog = *pprog;
+ int cnt = 0;
+ static int jmp_label1 = -1;
+ static int jmp_label2 = -1;
+ static int jmp_label3 = -1;
+ u8 dreg_lo = dstk ? IA32_EAX : dst_lo;
+ u8 dreg_hi = dstk ? IA32_EDX : dst_hi;
+
+ if (dstk) {
+ EMIT3(0x8B, add_2reg(0x40, IA32_EBP, IA32_EAX),
+ STACK_VAR(dst_lo));
+ EMIT3(0x8B, add_2reg(0x40, IA32_EBP, IA32_EDX),
+ STACK_VAR(dst_hi));
+ }
+
+ if (sstk)
+ /* mov ecx,dword ptr [ebp+off] */
+ EMIT3(0x8B, add_2reg(0x40, IA32_EBP, IA32_ECX),
+ STACK_VAR(src_lo));
+ else
+ /* mov ecx,src_lo */
+ EMIT2(0x8B, add_2reg(0xC0, src_lo, IA32_ECX));
+
+ /* cmp ecx,32 */
+ EMIT3(0x83, add_1reg(0xF8, IA32_ECX), 32);
+ /* Jumps when >= 32 */
+ if (is_imm8(jmp_label(jmp_label1, 2)))
+ EMIT2(IA32_JAE, jmp_label(jmp_label1, 2));
+ else
+ EMIT2_off32(0x0F, IA32_JAE + 0x10, jmp_label(jmp_label1, 6));
+
+ /* < 32 */
+ /* shl dreg_hi,cl */
+ EMIT2(0xD3, add_1reg(0xE0, dreg_hi));
+ /* mov ebx,dreg_lo */
+ EMIT2(0x8B, add_2reg(0xC0, dreg_lo, IA32_EBX));
+ /* shl dreg_lo,cl */
+ EMIT2(0xD3, add_1reg(0xE0, dreg_lo));
+
+ /* IA32_ECX = -IA32_ECX + 32 */
+ /* neg ecx */
+ EMIT2(0xF7, add_1reg(0xD8, IA32_ECX));
+ /* add ecx,32 */
+ EMIT3(0x83, add_1reg(0xC0, IA32_ECX), 32);
+
+ /* shr ebx,cl */
+ EMIT2(0xD3, add_1reg(0xE8, IA32_EBX));
+ /* or dreg_hi,ebx */
+ EMIT2(0x09, add_2reg(0xC0, dreg_hi, IA32_EBX));
+
+ /* goto out; */
+ if (is_imm8(jmp_label(jmp_label3, 2)))
+ EMIT2(0xEB, jmp_label(jmp_label3, 2));
+ else
+ EMIT1_off32(0xE9, jmp_label(jmp_label3, 5));
+
+ /* >= 32 */
+ if (jmp_label1 == -1)
+ jmp_label1 = cnt;
+
+ /* cmp ecx,64 */
+ EMIT3(0x83, add_1reg(0xF8, IA32_ECX), 64);
+ /* Jumps when >= 64 */
+ if (is_imm8(jmp_label(jmp_label2, 2)))
+ EMIT2(IA32_JAE, jmp_label(jmp_label2, 2));
+ else
+ EMIT2_off32(0x0F, IA32_JAE + 0x10, jmp_label(jmp_label2, 6));
+
+ /* >= 32 && < 64 */
+ /* sub ecx,32 */
+ EMIT3(0x83, add_1reg(0xE8, IA32_ECX), 32);
+ /* shl dreg_lo,cl */
+ EMIT2(0xD3, add_1reg(0xE0, dreg_lo));
+ /* mov dreg_hi,dreg_lo */
+ EMIT2(0x89, add_2reg(0xC0, dreg_hi, dreg_lo));
+
+ /* xor dreg_lo,dreg_lo */
+ EMIT2(0x33, add_2reg(0xC0, dreg_lo, dreg_lo));
+
+ /* goto out; */
+ if (is_imm8(jmp_label(jmp_label3, 2)))
+ EMIT2(0xEB, jmp_label(jmp_label3, 2));
+ else
+ EMIT1_off32(0xE9, jmp_label(jmp_label3, 5));
+
+ /* >= 64 */
+ if (jmp_label2 == -1)
+ jmp_label2 = cnt;
+ /* xor dreg_lo,dreg_lo */
+ EMIT2(0x33, add_2reg(0xC0, dreg_lo, dreg_lo));
+ /* xor dreg_hi,dreg_hi */
+ EMIT2(0x33, add_2reg(0xC0, dreg_hi, dreg_hi));
+
+ if (jmp_label3 == -1)
+ jmp_label3 = cnt;
+
+ if (dstk) {
+ /* mov dword ptr [ebp+off],dreg_lo */
+ EMIT3(0x89, add_2reg(0x40, IA32_EBP, dreg_lo),
+ STACK_VAR(dst_lo));
+ /* mov dword ptr [ebp+off],dreg_hi */
+ EMIT3(0x89, add_2reg(0x40, IA32_EBP, dreg_hi),
+ STACK_VAR(dst_hi));
+ }
+ /* out: */
+ *pprog = prog;
+}
+
+/* dst = dst >> src (signed)*/
+static inline void emit_ia32_arsh_r64(const u8 dst[], const u8 src[],
+ bool dstk, bool sstk, u8 **pprog)
+{
+ u8 *prog = *pprog;
+ int cnt = 0;
+ static int jmp_label1 = -1;
+ static int jmp_label2 = -1;
+ static int jmp_label3 = -1;
+ u8 dreg_lo = dstk ? IA32_EAX : dst_lo;
+ u8 dreg_hi = dstk ? IA32_EDX : dst_hi;
+
+ if (dstk) {
+ EMIT3(0x8B, add_2reg(0x40, IA32_EBP, IA32_EAX),
+ STACK_VAR(dst_lo));
+ EMIT3(0x8B, add_2reg(0x40, IA32_EBP, IA32_EDX),
+ STACK_VAR(dst_hi));
+ }
+
+ if (sstk)
+ /* mov ecx,dword ptr [ebp+off] */
+ EMIT3(0x8B, add_2reg(0x40, IA32_EBP, IA32_ECX),
+ STACK_VAR(src_lo));
+ else
+ /* mov ecx,src_lo */
+ EMIT2(0x8B, add_2reg(0xC0, src_lo, IA32_ECX));
+
+ /* cmp ecx,32 */
+ EMIT3(0x83, add_1reg(0xF8, IA32_ECX), 32);
+ /* Jumps when >= 32 */
+ if (is_imm8(jmp_label(jmp_label1, 2)))
+ EMIT2(IA32_JAE, jmp_label(jmp_label1, 2));
+ else
+ EMIT2_off32(0x0F, IA32_JAE + 0x10, jmp_label(jmp_label1, 6));
+
+ /* < 32 */
+ /* lshr dreg_lo,cl */
+ EMIT2(0xD3, add_1reg(0xE8, dreg_lo));
+ /* mov ebx,dreg_hi */
+ EMIT2(0x8B, add_2reg(0xC0, dreg_hi, IA32_EBX));
+ /* ashr dreg_hi,cl */
+ EMIT2(0xD3, add_1reg(0xF8, dreg_hi));
+
+ /* IA32_ECX = -IA32_ECX + 32 */
+ /* neg ecx */
+ EMIT2(0xF7, add_1reg(0xD8, IA32_ECX));
+ /* add ecx,32 */
+ EMIT3(0x83, add_1reg(0xC0, IA32_ECX), 32);
+
+ /* shl ebx,cl */
+ EMIT2(0xD3, add_1reg(0xE0, IA32_EBX));
+ /* or dreg_lo,ebx */
+ EMIT2(0x09, add_2reg(0xC0, dreg_lo, IA32_EBX));
+
+ /* goto out; */
+ if (is_imm8(jmp_label(jmp_label3, 2)))
+ EMIT2(0xEB, jmp_label(jmp_label3, 2));
+ else
+ EMIT1_off32(0xE9, jmp_label(jmp_label3, 5));
+
+ /* >= 32 */
+ if (jmp_label1 == -1)
+ jmp_label1 = cnt;
+
+ /* cmp ecx,64 */
+ EMIT3(0x83, add_1reg(0xF8, IA32_ECX), 64);
+ /* Jumps when >= 64 */
+ if (is_imm8(jmp_label(jmp_label2, 2)))
+ EMIT2(IA32_JAE, jmp_label(jmp_label2, 2));
+ else
+ EMIT2_off32(0x0F, IA32_JAE + 0x10, jmp_label(jmp_label2, 6));
+
+ /* >= 32 && < 64 */
+ /* sub ecx,32 */
+ EMIT3(0x83, add_1reg(0xE8, IA32_ECX), 32);
+ /* ashr dreg_hi,cl */
+ EMIT2(0xD3, add_1reg(0xF8, dreg_hi));
+ /* mov dreg_lo,dreg_hi */
+ EMIT2(0x89, add_2reg(0xC0, dreg_lo, dreg_hi));
+
+ /* ashr dreg_hi,imm8 */
+ EMIT3(0xC1, add_1reg(0xF8, dreg_hi), 31);
+
+ /* goto out; */
+ if (is_imm8(jmp_label(jmp_label3, 2)))
+ EMIT2(0xEB, jmp_label(jmp_label3, 2));
+ else
+ EMIT1_off32(0xE9, jmp_label(jmp_label3, 5));
+
+ /* >= 64 */
+ if (jmp_label2 == -1)
+ jmp_label2 = cnt;
+ /* ashr dreg_hi,imm8 */
+ EMIT3(0xC1, add_1reg(0xF8, dreg_hi), 31);
+ /* mov dreg_lo,dreg_hi */
+ EMIT2(0x89, add_2reg(0xC0, dreg_lo, dreg_hi));
+
+ if (jmp_label3 == -1)
+ jmp_label3 = cnt;
+
+ if (dstk) {
+ /* mov dword ptr [ebp+off],dreg_lo */
+ EMIT3(0x89, add_2reg(0x40, IA32_EBP, dreg_lo),
+ STACK_VAR(dst_lo));
+ /* mov dword ptr [ebp+off],dreg_hi */
+ EMIT3(0x89, add_2reg(0x40, IA32_EBP, dreg_hi),
+ STACK_VAR(dst_hi));
+ }
+ /* out: */
+ *pprog = prog;
+}
+
+/* dst = dst >> src */
+static inline void emit_ia32_rsh_r64(const u8 dst[], const u8 src[], bool dstk,
+ bool sstk, u8 **pprog)
+{
+ u8 *prog = *pprog;
+ int cnt = 0;
+ static int jmp_label1 = -1;
+ static int jmp_label2 = -1;
+ static int jmp_label3 = -1;
+ u8 dreg_lo = dstk ? IA32_EAX : dst_lo;
+ u8 dreg_hi = dstk ? IA32_EDX : dst_hi;
+
+ if (dstk) {
+ EMIT3(0x8B, add_2reg(0x40, IA32_EBP, IA32_EAX),
+ STACK_VAR(dst_lo));
+ EMIT3(0x8B, add_2reg(0x40, IA32_EBP, IA32_EDX),
+ STACK_VAR(dst_hi));
+ }
+
+ if (sstk)
+ /* mov ecx,dword ptr [ebp+off] */
+ EMIT3(0x8B, add_2reg(0x40, IA32_EBP, IA32_ECX),
+ STACK_VAR(src_lo));
+ else
+ /* mov ecx,src_lo */
+ EMIT2(0x8B, add_2reg(0xC0, src_lo, IA32_ECX));
+
+ /* cmp ecx,32 */
+ EMIT3(0x83, add_1reg(0xF8, IA32_ECX), 32);
+ /* Jumps when >= 32 */
+ if (is_imm8(jmp_label(jmp_label1, 2)))
+ EMIT2(IA32_JAE, jmp_label(jmp_label1, 2));
+ else
+ EMIT2_off32(0x0F, IA32_JAE + 0x10, jmp_label(jmp_label1, 6));
+
+ /* < 32 */
+ /* lshr dreg_lo,cl */
+ EMIT2(0xD3, add_1reg(0xE8, dreg_lo));
+ /* mov ebx,dreg_hi */
+ EMIT2(0x8B, add_2reg(0xC0, dreg_hi, IA32_EBX));
+ /* shr dreg_hi,cl */
+ EMIT2(0xD3, add_1reg(0xE8, dreg_hi));
+
+ /* IA32_ECX = -IA32_ECX + 32 */
+ /* neg ecx */
+ EMIT2(0xF7, add_1reg(0xD8, IA32_ECX));
+ /* add ecx,32 */
+ EMIT3(0x83, add_1reg(0xC0, IA32_ECX), 32);
+
+ /* shl ebx,cl */
+ EMIT2(0xD3, add_1reg(0xE0, IA32_EBX));
+ /* or dreg_lo,ebx */
+ EMIT2(0x09, add_2reg(0xC0, dreg_lo, IA32_EBX));
+
+ /* goto out; */
+ if (is_imm8(jmp_label(jmp_label3, 2)))
+ EMIT2(0xEB, jmp_label(jmp_label3, 2));
+ else
+ EMIT1_off32(0xE9, jmp_label(jmp_label3, 5));
+
+ /* >= 32 */
+ if (jmp_label1 == -1)
+ jmp_label1 = cnt;
+ /* cmp ecx,64 */
+ EMIT3(0x83, add_1reg(0xF8, IA32_ECX), 64);
+ /* Jumps when >= 64 */
+ if (is_imm8(jmp_label(jmp_label2, 2)))
+ EMIT2(IA32_JAE, jmp_label(jmp_label2, 2));
+ else
+ EMIT2_off32(0x0F, IA32_JAE + 0x10, jmp_label(jmp_label2, 6));
+
+ /* >= 32 && < 64 */
+ /* sub ecx,32 */
+ EMIT3(0x83, add_1reg(0xE8, IA32_ECX), 32);
+ /* shr dreg_hi,cl */
+ EMIT2(0xD3, add_1reg(0xE8, dreg_hi));
+ /* mov dreg_lo,dreg_hi */
+ EMIT2(0x89, add_2reg(0xC0, dreg_lo, dreg_hi));
+ /* xor dreg_hi,dreg_hi */
+ EMIT2(0x33, add_2reg(0xC0, dreg_hi, dreg_hi));
+
+ /* goto out; */
+ if (is_imm8(jmp_label(jmp_label3, 2)))
+ EMIT2(0xEB, jmp_label(jmp_label3, 2));
+ else
+ EMIT1_off32(0xE9, jmp_label(jmp_label3, 5));
+
+ /* >= 64 */
+ if (jmp_label2 == -1)
+ jmp_label2 = cnt;
+ /* xor dreg_lo,dreg_lo */
+ EMIT2(0x33, add_2reg(0xC0, dreg_lo, dreg_lo));
+ /* xor dreg_hi,dreg_hi */
+ EMIT2(0x33, add_2reg(0xC0, dreg_hi, dreg_hi));
+
+ if (jmp_label3 == -1)
+ jmp_label3 = cnt;
+
+ if (dstk) {
+ /* mov dword ptr [ebp+off],dreg_lo */
+ EMIT3(0x89, add_2reg(0x40, IA32_EBP, dreg_lo),
+ STACK_VAR(dst_lo));
+ /* mov dword ptr [ebp+off],dreg_hi */
+ EMIT3(0x89, add_2reg(0x40, IA32_EBP, dreg_hi),
+ STACK_VAR(dst_hi));
+ }
+ /* out: */
+ *pprog = prog;
+}
+
+/* dst = dst << val */
+static inline void emit_ia32_lsh_i64(const u8 dst[], const u32 val,
+ bool dstk, u8 **pprog)
+{
+ u8 *prog = *pprog;
+ int cnt = 0;
+ u8 dreg_lo = dstk ? IA32_EAX : dst_lo;
+ u8 dreg_hi = dstk ? IA32_EDX : dst_hi;
+
+ if (dstk) {
+ EMIT3(0x8B, add_2reg(0x40, IA32_EBP, IA32_EAX),
+ STACK_VAR(dst_lo));
+ EMIT3(0x8B, add_2reg(0x40, IA32_EBP, IA32_EDX),
+ STACK_VAR(dst_hi));
+ }
+ /* Do LSH operation */
+ if (val < 32) {
+ /* shl dreg_hi,imm8 */
+ EMIT3(0xC1, add_1reg(0xE0, dreg_hi), val);
+ /* mov ebx,dreg_lo */
+ EMIT2(0x8B, add_2reg(0xC0, dreg_lo, IA32_EBX));
+ /* shl dreg_lo,imm8 */
+ EMIT3(0xC1, add_1reg(0xE0, dreg_lo), val);
+
+ /* IA32_ECX = 32 - val */
+ /* mov ecx,val */
+ EMIT2(0xB1, val);
+ /* movzx ecx,ecx */
+ EMIT3(0x0F, 0xB6, add_2reg(0xC0, IA32_ECX, IA32_ECX));
+ /* neg ecx */
+ EMIT2(0xF7, add_1reg(0xD8, IA32_ECX));
+ /* add ecx,32 */
+ EMIT3(0x83, add_1reg(0xC0, IA32_ECX), 32);
+
+ /* shr ebx,cl */
+ EMIT2(0xD3, add_1reg(0xE8, IA32_EBX));
+ /* or dreg_hi,ebx */
+ EMIT2(0x09, add_2reg(0xC0, dreg_hi, IA32_EBX));
+ } else if (val >= 32 && val < 64) {
+ u32 value = val - 32;
+
+ /* shl dreg_lo,imm8 */
+ EMIT3(0xC1, add_1reg(0xE0, dreg_lo), value);
+ /* mov dreg_hi,dreg_lo */
+ EMIT2(0x89, add_2reg(0xC0, dreg_hi, dreg_lo));
+ /* xor dreg_lo,dreg_lo */
+ EMIT2(0x33, add_2reg(0xC0, dreg_lo, dreg_lo));
+ } else {
+ /* xor dreg_lo,dreg_lo */
+ EMIT2(0x33, add_2reg(0xC0, dreg_lo, dreg_lo));
+ /* xor dreg_hi,dreg_hi */
+ EMIT2(0x33, add_2reg(0xC0, dreg_hi, dreg_hi));
+ }
+
+ if (dstk) {
+ /* mov dword ptr [ebp+off],dreg_lo */
+ EMIT3(0x89, add_2reg(0x40, IA32_EBP, dreg_lo),
+ STACK_VAR(dst_lo));
+ /* mov dword ptr [ebp+off],dreg_hi */
+ EMIT3(0x89, add_2reg(0x40, IA32_EBP, dreg_hi),
+ STACK_VAR(dst_hi));
+ }
+ *pprog = prog;
+}
+
+/* dst = dst >> val */
+static inline void emit_ia32_rsh_i64(const u8 dst[], const u32 val,
+ bool dstk, u8 **pprog)
+{
+ u8 *prog = *pprog;
+ int cnt = 0;
+ u8 dreg_lo = dstk ? IA32_EAX : dst_lo;
+ u8 dreg_hi = dstk ? IA32_EDX : dst_hi;
+
+ if (dstk) {
+ EMIT3(0x8B, add_2reg(0x40, IA32_EBP, IA32_EAX),
+ STACK_VAR(dst_lo));
+ EMIT3(0x8B, add_2reg(0x40, IA32_EBP, IA32_EDX),
+ STACK_VAR(dst_hi));
+ }
+
+ /* Do RSH operation */
+ if (val < 32) {
+ /* shr dreg_lo,imm8 */
+ EMIT3(0xC1, add_1reg(0xE8, dreg_lo), val);
+ /* mov ebx,dreg_hi */
+ EMIT2(0x8B, add_2reg(0xC0, dreg_hi, IA32_EBX));
+ /* shr dreg_hi,imm8 */
+ EMIT3(0xC1, add_1reg(0xE8, dreg_hi), val);
+
+ /* IA32_ECX = 32 - val */
+ /* mov ecx,val */
+ EMIT2(0xB1, val);
+ /* movzx ecx,ecx */
+ EMIT3(0x0F, 0xB6, add_2reg(0xC0, IA32_ECX, IA32_ECX));
+ /* neg ecx */
+ EMIT2(0xF7, add_1reg(0xD8, IA32_ECX));
+ /* add ecx,32 */
+ EMIT3(0x83, add_1reg(0xC0, IA32_ECX), 32);
+
+ /* shl ebx,cl */
+ EMIT2(0xD3, add_1reg(0xE0, IA32_EBX));
+ /* or dreg_lo,ebx */
+ EMIT2(0x09, add_2reg(0xC0, dreg_lo, IA32_EBX));
+ } else if (val >= 32 && val < 64) {
+ u32 value = val - 32;
+
+ /* shr dreg_hi,imm8 */
+ EMIT3(0xC1, add_1reg(0xE8, dreg_hi), value);
+ /* mov dreg_lo,dreg_hi */
+ EMIT2(0x89, add_2reg(0xC0, dreg_lo, dreg_hi));
+ /* xor dreg_hi,dreg_hi */
+ EMIT2(0x33, add_2reg(0xC0, dreg_hi, dreg_hi));
+ } else {
+ /* xor dreg_lo,dreg_lo */
+ EMIT2(0x33, add_2reg(0xC0, dreg_lo, dreg_lo));
+ /* xor dreg_hi,dreg_hi */
+ EMIT2(0x33, add_2reg(0xC0, dreg_hi, dreg_hi));
+ }
+
+ if (dstk) {
+ /* mov dword ptr [ebp+off],dreg_lo */
+ EMIT3(0x89, add_2reg(0x40, IA32_EBP, dreg_lo),
+ STACK_VAR(dst_lo));
+ /* mov dword ptr [ebp+off],dreg_hi */
+ EMIT3(0x89, add_2reg(0x40, IA32_EBP, dreg_hi),
+ STACK_VAR(dst_hi));
+ }
+ *pprog = prog;
+}
+
+/* dst = dst >> val (signed) */
+static inline void emit_ia32_arsh_i64(const u8 dst[], const u32 val,
+ bool dstk, u8 **pprog)
+{
+ u8 *prog = *pprog;
+ int cnt = 0;
+ u8 dreg_lo = dstk ? IA32_EAX : dst_lo;
+ u8 dreg_hi = dstk ? IA32_EDX : dst_hi;
+
+ if (dstk) {
+ EMIT3(0x8B, add_2reg(0x40, IA32_EBP, IA32_EAX),
+ STACK_VAR(dst_lo));
+ EMIT3(0x8B, add_2reg(0x40, IA32_EBP, IA32_EDX),
+ STACK_VAR(dst_hi));
+ }
+ /* Do RSH operation */
+ if (val < 32) {
+ /* shr dreg_lo,imm8 */
+ EMIT3(0xC1, add_1reg(0xE8, dreg_lo), val);
+ /* mov ebx,dreg_hi */
+ EMIT2(0x8B, add_2reg(0xC0, dreg_hi, IA32_EBX));
+ /* ashr dreg_hi,imm8 */
+ EMIT3(0xC1, add_1reg(0xF8, dreg_hi), val);
+
+ /* IA32_ECX = 32 - val */
+ /* mov ecx,val */
+ EMIT2(0xB1, val);
+ /* movzx ecx,ecx */
+ EMIT3(0x0F, 0xB6, add_2reg(0xC0, IA32_ECX, IA32_ECX));
+ /* neg ecx */
+ EMIT2(0xF7, add_1reg(0xD8, IA32_ECX));
+ /* add ecx,32 */
+ EMIT3(0x83, add_1reg(0xC0, IA32_ECX), 32);
+
+ /* shl ebx,cl */
+ EMIT2(0xD3, add_1reg(0xE0, IA32_EBX));
+ /* or dreg_lo,ebx */
+ EMIT2(0x09, add_2reg(0xC0, dreg_lo, IA32_EBX));
+ } else if (val >= 32 && val < 64) {
+ u32 value = val - 32;
+
+ /* ashr dreg_hi,imm8 */
+ EMIT3(0xC1, add_1reg(0xF8, dreg_hi), value);
+ /* mov dreg_lo,dreg_hi */
+ EMIT2(0x89, add_2reg(0xC0, dreg_lo, dreg_hi));
+
+ /* ashr dreg_hi,imm8 */
+ EMIT3(0xC1, add_1reg(0xF8, dreg_hi), 31);
+ } else {
+ /* ashr dreg_hi,imm8 */
+ EMIT3(0xC1, add_1reg(0xF8, dreg_hi), 31);
+ /* mov dreg_lo,dreg_hi */
+ EMIT2(0x89, add_2reg(0xC0, dreg_lo, dreg_hi));
+ }
+
+ if (dstk) {
+ /* mov dword ptr [ebp+off],dreg_lo */
+ EMIT3(0x89, add_2reg(0x40, IA32_EBP, dreg_lo),
+ STACK_VAR(dst_lo));
+ /* mov dword ptr [ebp+off],dreg_hi */
+ EMIT3(0x89, add_2reg(0x40, IA32_EBP, dreg_hi),
+ STACK_VAR(dst_hi));
+ }
+ *pprog = prog;
+}
+
+static inline void emit_ia32_mul_r64(const u8 dst[], const u8 src[], bool dstk,
+ bool sstk, u8 **pprog)
+{
+ u8 *prog = *pprog;
+ int cnt = 0;
+
+ if (dstk)
+ /* mov eax,dword ptr [ebp+off] */
+ EMIT3(0x8B, add_2reg(0x40, IA32_EBP, IA32_EAX),
+ STACK_VAR(dst_hi));
+ else
+ /* mov eax,dst_hi */
+ EMIT2(0x8B, add_2reg(0xC0, dst_hi, IA32_EAX));
+
+ if (sstk)
+ /* mul dword ptr [ebp+off] */
+ EMIT3(0xF7, add_1reg(0x60, IA32_EBP), STACK_VAR(src_lo));
+ else
+ /* mul src_lo */
+ EMIT2(0xF7, add_1reg(0xE0, src_lo));
+
+ /* mov ecx,eax */
+ EMIT2(0x89, add_2reg(0xC0, IA32_ECX, IA32_EAX));
+
+ if (dstk)
+ /* mov eax,dword ptr [ebp+off] */
+ EMIT3(0x8B, add_2reg(0x40, IA32_EBP, IA32_EAX),
+ STACK_VAR(dst_lo));
+ else
+ /* mov eax,dst_lo */
+ EMIT2(0x8B, add_2reg(0xC0, dst_lo, IA32_EAX));
+
+ if (sstk)
+ /* mul dword ptr [ebp+off] */
+ EMIT3(0xF7, add_1reg(0x60, IA32_EBP), STACK_VAR(src_hi));
+ else
+ /* mul src_hi */
+ EMIT2(0xF7, add_1reg(0xE0, src_hi));
+
+ /* add eax,eax */
+ EMIT2(0x01, add_2reg(0xC0, IA32_ECX, IA32_EAX));
+
+ if (dstk)
+ /* mov eax,dword ptr [ebp+off] */
+ EMIT3(0x8B, add_2reg(0x40, IA32_EBP, IA32_EAX),
+ STACK_VAR(dst_lo));
+ else
+ /* mov eax,dst_lo */
+ EMIT2(0x8B, add_2reg(0xC0, dst_lo, IA32_EAX));
+
+ if (sstk)
+ /* mul dword ptr [ebp+off] */
+ EMIT3(0xF7, add_1reg(0x60, IA32_EBP), STACK_VAR(src_lo));
+ else
+ /* mul src_lo */
+ EMIT2(0xF7, add_1reg(0xE0, src_lo));
+
+ /* add ecx,edx */
+ EMIT2(0x01, add_2reg(0xC0, IA32_ECX, IA32_EDX));
+
+ if (dstk) {
+ /* mov dword ptr [ebp+off],eax */
+ EMIT3(0x89, add_2reg(0x40, IA32_EBP, IA32_EAX),
+ STACK_VAR(dst_lo));
+ /* mov dword ptr [ebp+off],ecx */
+ EMIT3(0x89, add_2reg(0x40, IA32_EBP, IA32_ECX),
+ STACK_VAR(dst_hi));
+ } else {
+ /* mov dst_lo,eax */
+ EMIT2(0x89, add_2reg(0xC0, dst_lo, IA32_EAX));
+ /* mov dst_hi,ecx */
+ EMIT2(0x89, add_2reg(0xC0, dst_hi, IA32_ECX));
+ }
+
+ *pprog = prog;
+}
+
+static inline void emit_ia32_mul_i64(const u8 dst[], const u32 val,
+ bool dstk, u8 **pprog)
+{
+ u8 *prog = *pprog;
+ int cnt = 0;
+ u32 hi;
+
+ hi = val & (1<<31) ? (u32)~0 : 0;
+ /* movl eax,imm32 */
+ EMIT2_off32(0xC7, add_1reg(0xC0, IA32_EAX), val);
+ if (dstk)
+ /* mul dword ptr [ebp+off] */
+ EMIT3(0xF7, add_1reg(0x60, IA32_EBP), STACK_VAR(dst_hi));
+ else
+ /* mul dst_hi */
+ EMIT2(0xF7, add_1reg(0xE0, dst_hi));
+
+ /* mov ecx,eax */
+ EMIT2(0x89, add_2reg(0xC0, IA32_ECX, IA32_EAX));
+
+ /* movl eax,imm32 */
+ EMIT2_off32(0xC7, add_1reg(0xC0, IA32_EAX), hi);
+ if (dstk)
+ /* mul dword ptr [ebp+off] */
+ EMIT3(0xF7, add_1reg(0x60, IA32_EBP), STACK_VAR(dst_lo));
+ else
+ /* mul dst_lo */
+ EMIT2(0xF7, add_1reg(0xE0, dst_lo));
+ /* add ecx,eax */
+ EMIT2(0x01, add_2reg(0xC0, IA32_ECX, IA32_EAX));
+
+ /* movl eax,imm32 */
+ EMIT2_off32(0xC7, add_1reg(0xC0, IA32_EAX), val);
+ if (dstk)
+ /* mul dword ptr [ebp+off] */
+ EMIT3(0xF7, add_1reg(0x60, IA32_EBP), STACK_VAR(dst_lo));
+ else
+ /* mul dst_lo */
+ EMIT2(0xF7, add_1reg(0xE0, dst_lo));
+
+ /* add ecx,edx */
+ EMIT2(0x01, add_2reg(0xC0, IA32_ECX, IA32_EDX));
+
+ if (dstk) {
+ /* mov dword ptr [ebp+off],eax */
+ EMIT3(0x89, add_2reg(0x40, IA32_EBP, IA32_EAX),
+ STACK_VAR(dst_lo));
+ /* mov dword ptr [ebp+off],ecx */
+ EMIT3(0x89, add_2reg(0x40, IA32_EBP, IA32_ECX),
+ STACK_VAR(dst_hi));
+ } else {
+ /* mov dword ptr [ebp+off],eax */
+ EMIT2(0x89, add_2reg(0xC0, dst_lo, IA32_EAX));
+ /* mov dword ptr [ebp+off],ecx */
+ EMIT2(0x89, add_2reg(0xC0, dst_hi, IA32_ECX));
+ }
+
+ *pprog = prog;
+}
+
+static int bpf_size_to_x86_bytes(int bpf_size)
+{
+ if (bpf_size == BPF_W)
+ return 4;
+ else if (bpf_size == BPF_H)
+ return 2;
+ else if (bpf_size == BPF_B)
+ return 1;
+ else if (bpf_size == BPF_DW)
+ return 4; /* imm32 */
+ else
+ return 0;
+}
+
+struct jit_context {
+ int cleanup_addr; /* Epilogue code offset */
+};
+
+/* Maximum number of bytes emitted while JITing one eBPF insn */
+#define BPF_MAX_INSN_SIZE 128
+#define BPF_INSN_SAFETY 64
+
+#define PROLOGUE_SIZE 35
+
+/*
+ * Emit prologue code for BPF program and check it's size.
+ * bpf_tail_call helper will skip it while jumping into another program.
+ */
+static void emit_prologue(u8 **pprog, u32 stack_depth)
+{
+ u8 *prog = *pprog;
+ int cnt = 0;
+ const u8 *r1 = bpf2ia32[BPF_REG_1];
+ const u8 fplo = bpf2ia32[BPF_REG_FP][0];
+ const u8 fphi = bpf2ia32[BPF_REG_FP][1];
+ const u8 *tcc = bpf2ia32[TCALL_CNT];
+
+ /* push ebp */
+ EMIT1(0x55);
+ /* mov ebp,esp */
+ EMIT2(0x89, 0xE5);
+ /* push edi */
+ EMIT1(0x57);
+ /* push esi */
+ EMIT1(0x56);
+ /* push ebx */
+ EMIT1(0x53);
+
+ /* sub esp,STACK_SIZE */
+ EMIT2_off32(0x81, 0xEC, STACK_SIZE);
+ /* sub ebp,SCRATCH_SIZE+4+12*/
+ EMIT3(0x83, add_1reg(0xE8, IA32_EBP), SCRATCH_SIZE + 16);
+ /* xor ebx,ebx */
+ EMIT2(0x31, add_2reg(0xC0, IA32_EBX, IA32_EBX));
+
+ /* Set up BPF prog stack base register */
+ EMIT3(0x89, add_2reg(0x40, IA32_EBP, IA32_EBP), STACK_VAR(fplo));
+ EMIT3(0x89, add_2reg(0x40, IA32_EBP, IA32_EBX), STACK_VAR(fphi));
+
+ /* Move BPF_CTX (EAX) to BPF_REG_R1 */
+ /* mov dword ptr [ebp+off],eax */
+ EMIT3(0x89, add_2reg(0x40, IA32_EBP, IA32_EAX), STACK_VAR(r1[0]));
+ EMIT3(0x89, add_2reg(0x40, IA32_EBP, IA32_EBX), STACK_VAR(r1[1]));
+
+ /* Initialize Tail Count */
+ EMIT3(0x89, add_2reg(0x40, IA32_EBP, IA32_EBX), STACK_VAR(tcc[0]));
+ EMIT3(0x89, add_2reg(0x40, IA32_EBP, IA32_EBX), STACK_VAR(tcc[1]));
+
+ BUILD_BUG_ON(cnt != PROLOGUE_SIZE);
+ *pprog = prog;
+}
+
+/* Emit epilogue code for BPF program */
+static void emit_epilogue(u8 **pprog, u32 stack_depth)
+{
+ u8 *prog = *pprog;
+ const u8 *r0 = bpf2ia32[BPF_REG_0];
+ int cnt = 0;
+
+ /* mov eax,dword ptr [ebp+off]*/
+ EMIT3(0x8B, add_2reg(0x40, IA32_EBP, IA32_EAX), STACK_VAR(r0[0]));
+ /* mov edx,dword ptr [ebp+off]*/
+ EMIT3(0x8B, add_2reg(0x40, IA32_EBP, IA32_EDX), STACK_VAR(r0[1]));
+
+ /* add ebp,SCRATCH_SIZE+4+12*/
+ EMIT3(0x83, add_1reg(0xC0, IA32_EBP), SCRATCH_SIZE + 16);
+
+ /* mov ebx,dword ptr [ebp-12]*/
+ EMIT3(0x8B, add_2reg(0x40, IA32_EBP, IA32_EBX), -12);
+ /* mov esi,dword ptr [ebp-8]*/
+ EMIT3(0x8B, add_2reg(0x40, IA32_EBP, IA32_ESI), -8);
+ /* mov edi,dword ptr [ebp-4]*/
+ EMIT3(0x8B, add_2reg(0x40, IA32_EBP, IA32_EDI), -4);
+
+ EMIT1(0xC9); /* leave */
+ EMIT1(0xC3); /* ret */
+ *pprog = prog;
+}
+
+/*
+ * Generate the following code:
+ * ... bpf_tail_call(void *ctx, struct bpf_array *array, u64 index) ...
+ * if (index >= array->map.max_entries)
+ * goto out;
+ * if (++tail_call_cnt > MAX_TAIL_CALL_CNT)
+ * goto out;
+ * prog = array->ptrs[index];
+ * if (prog == NULL)
+ * goto out;
+ * goto *(prog->bpf_func + prologue_size);
+ * out:
+ */
+static void emit_bpf_tail_call(u8 **pprog)
+{
+ u8 *prog = *pprog;
+ int cnt = 0;
+ const u8 *r1 = bpf2ia32[BPF_REG_1];
+ const u8 *r2 = bpf2ia32[BPF_REG_2];
+ const u8 *r3 = bpf2ia32[BPF_REG_3];
+ const u8 *tcc = bpf2ia32[TCALL_CNT];
+ u32 lo, hi;
+ static int jmp_label1 = -1;
+
+ /*
+ * if (index >= array->map.max_entries)
+ * goto out;
+ */
+ /* mov eax,dword ptr [ebp+off] */
+ EMIT3(0x8B, add_2reg(0x40, IA32_EBP, IA32_EAX), STACK_VAR(r2[0]));
+ /* mov edx,dword ptr [ebp+off] */
+ EMIT3(0x8B, add_2reg(0x40, IA32_EBP, IA32_EDX), STACK_VAR(r3[0]));
+
+ /* cmp dword ptr [eax+off],edx */
+ EMIT3(0x39, add_2reg(0x40, IA32_EAX, IA32_EDX),
+ offsetof(struct bpf_array, map.max_entries));
+ /* jbe out */
+ EMIT2(IA32_JBE, jmp_label(jmp_label1, 2));
+
+ /*
+ * if (tail_call_cnt > MAX_TAIL_CALL_CNT)
+ * goto out;
+ */
+ lo = (u32)MAX_TAIL_CALL_CNT;
+ hi = (u32)((u64)MAX_TAIL_CALL_CNT >> 32);
+ EMIT3(0x8B, add_2reg(0x40, IA32_EBP, IA32_ECX), STACK_VAR(tcc[0]));
+ EMIT3(0x8B, add_2reg(0x40, IA32_EBP, IA32_EBX), STACK_VAR(tcc[1]));
+
+ /* cmp edx,hi */
+ EMIT3(0x83, add_1reg(0xF8, IA32_EBX), hi);
+ EMIT2(IA32_JNE, 3);
+ /* cmp ecx,lo */
+ EMIT3(0x83, add_1reg(0xF8, IA32_ECX), lo);
+
+ /* ja out */
+ EMIT2(IA32_JAE, jmp_label(jmp_label1, 2));
+
+ /* add eax,0x1 */
+ EMIT3(0x83, add_1reg(0xC0, IA32_ECX), 0x01);
+ /* adc ebx,0x0 */
+ EMIT3(0x83, add_1reg(0xD0, IA32_EBX), 0x00);
+
+ /* mov dword ptr [ebp+off],eax */
+ EMIT3(0x89, add_2reg(0x40, IA32_EBP, IA32_ECX), STACK_VAR(tcc[0]));
+ /* mov dword ptr [ebp+off],edx */
+ EMIT3(0x89, add_2reg(0x40, IA32_EBP, IA32_EBX), STACK_VAR(tcc[1]));
+
+ /* prog = array->ptrs[index]; */
+ /* mov edx, [eax + edx * 4 + offsetof(...)] */
+ EMIT3_off32(0x8B, 0x94, 0x90, offsetof(struct bpf_array, ptrs));
+
+ /*
+ * if (prog == NULL)
+ * goto out;
+ */
+ /* test edx,edx */
+ EMIT2(0x85, add_2reg(0xC0, IA32_EDX, IA32_EDX));
+ /* je out */
+ EMIT2(IA32_JE, jmp_label(jmp_label1, 2));
+
+ /* goto *(prog->bpf_func + prologue_size); */
+ /* mov edx, dword ptr [edx + 32] */
+ EMIT3(0x8B, add_2reg(0x40, IA32_EDX, IA32_EDX),
+ offsetof(struct bpf_prog, bpf_func));
+ /* add edx,prologue_size */
+ EMIT3(0x83, add_1reg(0xC0, IA32_EDX), PROLOGUE_SIZE);
+
+ /* mov eax,dword ptr [ebp+off] */
+ EMIT3(0x8B, add_2reg(0x40, IA32_EBP, IA32_EAX), STACK_VAR(r1[0]));
+
+ /*
+ * Now we're ready to jump into next BPF program:
+ * eax == ctx (1st arg)
+ * edx == prog->bpf_func + prologue_size
+ */
+ RETPOLINE_EDX_BPF_JIT();
+
+ if (jmp_label1 == -1)
+ jmp_label1 = cnt;
+
+ /* out: */
+ *pprog = prog;
+}
+
+/* Push the scratch stack register on top of the stack. */
+static inline void emit_push_r64(const u8 src[], u8 **pprog)
+{
+ u8 *prog = *pprog;
+ int cnt = 0;
+
+ /* mov ecx,dword ptr [ebp+off] */
+ EMIT3(0x8B, add_2reg(0x40, IA32_EBP, IA32_ECX), STACK_VAR(src_hi));
+ /* push ecx */
+ EMIT1(0x51);
+
+ /* mov ecx,dword ptr [ebp+off] */
+ EMIT3(0x8B, add_2reg(0x40, IA32_EBP, IA32_ECX), STACK_VAR(src_lo));
+ /* push ecx */
+ EMIT1(0x51);
+
+ *pprog = prog;
+}
+
+static int do_jit(struct bpf_prog *bpf_prog, int *addrs, u8 *image,
+ int oldproglen, struct jit_context *ctx)
+{
+ struct bpf_insn *insn = bpf_prog->insnsi;
+ int insn_cnt = bpf_prog->len;
+ bool seen_exit = false;
+ u8 temp[BPF_MAX_INSN_SIZE + BPF_INSN_SAFETY];
+ int i, cnt = 0;
+ int proglen = 0;
+ u8 *prog = temp;
+
+ emit_prologue(&prog, bpf_prog->aux->stack_depth);
+
+ for (i = 0; i < insn_cnt; i++, insn++) {
+ const s32 imm32 = insn->imm;
+ const bool is64 = BPF_CLASS(insn->code) == BPF_ALU64;
+ const bool dstk = insn->dst_reg == BPF_REG_AX ? false : true;
+ const bool sstk = insn->src_reg == BPF_REG_AX ? false : true;
+ const u8 code = insn->code;
+ const u8 *dst = bpf2ia32[insn->dst_reg];
+ const u8 *src = bpf2ia32[insn->src_reg];
+ const u8 *r0 = bpf2ia32[BPF_REG_0];
+ s64 jmp_offset;
+ u8 jmp_cond;
+ int ilen;
+ u8 *func;
+
+ switch (code) {
+ /* ALU operations */
+ /* dst = src */
+ case BPF_ALU | BPF_MOV | BPF_K:
+ case BPF_ALU | BPF_MOV | BPF_X:
+ case BPF_ALU64 | BPF_MOV | BPF_K:
+ case BPF_ALU64 | BPF_MOV | BPF_X:
+ switch (BPF_SRC(code)) {
+ case BPF_X:
+ emit_ia32_mov_r64(is64, dst, src, dstk,
+ sstk, &prog);
+ break;
+ case BPF_K:
+ /* Sign-extend immediate value to dst reg */
+ emit_ia32_mov_i64(is64, dst, imm32,
+ dstk, &prog);
+ break;
+ }
+ break;
+ /* dst = dst + src/imm */
+ /* dst = dst - src/imm */
+ /* dst = dst | src/imm */
+ /* dst = dst & src/imm */
+ /* dst = dst ^ src/imm */
+ /* dst = dst * src/imm */
+ /* dst = dst << src */
+ /* dst = dst >> src */
+ case BPF_ALU | BPF_ADD | BPF_K:
+ case BPF_ALU | BPF_ADD | BPF_X:
+ case BPF_ALU | BPF_SUB | BPF_K:
+ case BPF_ALU | BPF_SUB | BPF_X:
+ case BPF_ALU | BPF_OR | BPF_K:
+ case BPF_ALU | BPF_OR | BPF_X:
+ case BPF_ALU | BPF_AND | BPF_K:
+ case BPF_ALU | BPF_AND | BPF_X:
+ case BPF_ALU | BPF_XOR | BPF_K:
+ case BPF_ALU | BPF_XOR | BPF_X:
+ case BPF_ALU64 | BPF_ADD | BPF_K:
+ case BPF_ALU64 | BPF_ADD | BPF_X:
+ case BPF_ALU64 | BPF_SUB | BPF_K:
+ case BPF_ALU64 | BPF_SUB | BPF_X:
+ case BPF_ALU64 | BPF_OR | BPF_K:
+ case BPF_ALU64 | BPF_OR | BPF_X:
+ case BPF_ALU64 | BPF_AND | BPF_K:
+ case BPF_ALU64 | BPF_AND | BPF_X:
+ case BPF_ALU64 | BPF_XOR | BPF_K:
+ case BPF_ALU64 | BPF_XOR | BPF_X:
+ switch (BPF_SRC(code)) {
+ case BPF_X:
+ emit_ia32_alu_r64(is64, BPF_OP(code), dst,
+ src, dstk, sstk, &prog);
+ break;
+ case BPF_K:
+ emit_ia32_alu_i64(is64, BPF_OP(code), dst,
+ imm32, dstk, &prog);
+ break;
+ }
+ break;
+ case BPF_ALU | BPF_MUL | BPF_K:
+ case BPF_ALU | BPF_MUL | BPF_X:
+ switch (BPF_SRC(code)) {
+ case BPF_X:
+ emit_ia32_mul_r(dst_lo, src_lo, dstk,
+ sstk, &prog);
+ break;
+ case BPF_K:
+ /* mov ecx,imm32*/
+ EMIT2_off32(0xC7, add_1reg(0xC0, IA32_ECX),
+ imm32);
+ emit_ia32_mul_r(dst_lo, IA32_ECX, dstk,
+ false, &prog);
+ break;
+ }
+ emit_ia32_mov_i(dst_hi, 0, dstk, &prog);
+ break;
+ case BPF_ALU | BPF_LSH | BPF_X:
+ case BPF_ALU | BPF_RSH | BPF_X:
+ case BPF_ALU | BPF_ARSH | BPF_K:
+ case BPF_ALU | BPF_ARSH | BPF_X:
+ switch (BPF_SRC(code)) {
+ case BPF_X:
+ emit_ia32_shift_r(BPF_OP(code), dst_lo, src_lo,
+ dstk, sstk, &prog);
+ break;
+ case BPF_K:
+ /* mov ecx,imm32*/
+ EMIT2_off32(0xC7, add_1reg(0xC0, IA32_ECX),
+ imm32);
+ emit_ia32_shift_r(BPF_OP(code), dst_lo,
+ IA32_ECX, dstk, false,
+ &prog);
+ break;
+ }
+ emit_ia32_mov_i(dst_hi, 0, dstk, &prog);
+ break;
+ /* dst = dst / src(imm) */
+ /* dst = dst % src(imm) */
+ case BPF_ALU | BPF_DIV | BPF_K:
+ case BPF_ALU | BPF_DIV | BPF_X:
+ case BPF_ALU | BPF_MOD | BPF_K:
+ case BPF_ALU | BPF_MOD | BPF_X:
+ switch (BPF_SRC(code)) {
+ case BPF_X:
+ emit_ia32_div_mod_r(BPF_OP(code), dst_lo,
+ src_lo, dstk, sstk, &prog);
+ break;
+ case BPF_K:
+ /* mov ecx,imm32*/
+ EMIT2_off32(0xC7, add_1reg(0xC0, IA32_ECX),
+ imm32);
+ emit_ia32_div_mod_r(BPF_OP(code), dst_lo,
+ IA32_ECX, dstk, false,
+ &prog);
+ break;
+ }
+ emit_ia32_mov_i(dst_hi, 0, dstk, &prog);
+ break;
+ case BPF_ALU64 | BPF_DIV | BPF_K:
+ case BPF_ALU64 | BPF_DIV | BPF_X:
+ case BPF_ALU64 | BPF_MOD | BPF_K:
+ case BPF_ALU64 | BPF_MOD | BPF_X:
+ goto notyet;
+ /* dst = dst >> imm */
+ /* dst = dst << imm */
+ case BPF_ALU | BPF_RSH | BPF_K:
+ case BPF_ALU | BPF_LSH | BPF_K:
+ if (unlikely(imm32 > 31))
+ return -EINVAL;
+ /* mov ecx,imm32*/
+ EMIT2_off32(0xC7, add_1reg(0xC0, IA32_ECX), imm32);
+ emit_ia32_shift_r(BPF_OP(code), dst_lo, IA32_ECX, dstk,
+ false, &prog);
+ emit_ia32_mov_i(dst_hi, 0, dstk, &prog);
+ break;
+ /* dst = dst << imm */
+ case BPF_ALU64 | BPF_LSH | BPF_K:
+ if (unlikely(imm32 > 63))
+ return -EINVAL;
+ emit_ia32_lsh_i64(dst, imm32, dstk, &prog);
+ break;
+ /* dst = dst >> imm */
+ case BPF_ALU64 | BPF_RSH | BPF_K:
+ if (unlikely(imm32 > 63))
+ return -EINVAL;
+ emit_ia32_rsh_i64(dst, imm32, dstk, &prog);
+ break;
+ /* dst = dst << src */
+ case BPF_ALU64 | BPF_LSH | BPF_X:
+ emit_ia32_lsh_r64(dst, src, dstk, sstk, &prog);
+ break;
+ /* dst = dst >> src */
+ case BPF_ALU64 | BPF_RSH | BPF_X:
+ emit_ia32_rsh_r64(dst, src, dstk, sstk, &prog);
+ break;
+ /* dst = dst >> src (signed) */
+ case BPF_ALU64 | BPF_ARSH | BPF_X:
+ emit_ia32_arsh_r64(dst, src, dstk, sstk, &prog);
+ break;
+ /* dst = dst >> imm (signed) */
+ case BPF_ALU64 | BPF_ARSH | BPF_K:
+ if (unlikely(imm32 > 63))
+ return -EINVAL;
+ emit_ia32_arsh_i64(dst, imm32, dstk, &prog);
+ break;
+ /* dst = ~dst */
+ case BPF_ALU | BPF_NEG:
+ emit_ia32_alu_i(is64, false, BPF_OP(code),
+ dst_lo, 0, dstk, &prog);
+ emit_ia32_mov_i(dst_hi, 0, dstk, &prog);
+ break;
+ /* dst = ~dst (64 bit) */
+ case BPF_ALU64 | BPF_NEG:
+ emit_ia32_neg64(dst, dstk, &prog);
+ break;
+ /* dst = dst * src/imm */
+ case BPF_ALU64 | BPF_MUL | BPF_X:
+ case BPF_ALU64 | BPF_MUL | BPF_K:
+ switch (BPF_SRC(code)) {
+ case BPF_X:
+ emit_ia32_mul_r64(dst, src, dstk, sstk, &prog);
+ break;
+ case BPF_K:
+ emit_ia32_mul_i64(dst, imm32, dstk, &prog);
+ break;
+ }
+ break;
+ /* dst = htole(dst) */
+ case BPF_ALU | BPF_END | BPF_FROM_LE:
+ emit_ia32_to_le_r64(dst, imm32, dstk, &prog);
+ break;
+ /* dst = htobe(dst) */
+ case BPF_ALU | BPF_END | BPF_FROM_BE:
+ emit_ia32_to_be_r64(dst, imm32, dstk, &prog);
+ break;
+ /* dst = imm64 */
+ case BPF_LD | BPF_IMM | BPF_DW: {
+ s32 hi, lo = imm32;
+
+ hi = insn[1].imm;
+ emit_ia32_mov_i(dst_lo, lo, dstk, &prog);
+ emit_ia32_mov_i(dst_hi, hi, dstk, &prog);
+ insn++;
+ i++;
+ break;
+ }
+ /* ST: *(u8*)(dst_reg + off) = imm */
+ case BPF_ST | BPF_MEM | BPF_H:
+ case BPF_ST | BPF_MEM | BPF_B:
+ case BPF_ST | BPF_MEM | BPF_W:
+ case BPF_ST | BPF_MEM | BPF_DW:
+ if (dstk)
+ /* mov eax,dword ptr [ebp+off] */
+ EMIT3(0x8B, add_2reg(0x40, IA32_EBP, IA32_EAX),
+ STACK_VAR(dst_lo));
+ else
+ /* mov eax,dst_lo */
+ EMIT2(0x8B, add_2reg(0xC0, dst_lo, IA32_EAX));
+
+ switch (BPF_SIZE(code)) {
+ case BPF_B:
+ EMIT(0xC6, 1); break;
+ case BPF_H:
+ EMIT2(0x66, 0xC7); break;
+ case BPF_W:
+ case BPF_DW:
+ EMIT(0xC7, 1); break;
+ }
+
+ if (is_imm8(insn->off))
+ EMIT2(add_1reg(0x40, IA32_EAX), insn->off);
+ else
+ EMIT1_off32(add_1reg(0x80, IA32_EAX),
+ insn->off);
+ EMIT(imm32, bpf_size_to_x86_bytes(BPF_SIZE(code)));
+
+ if (BPF_SIZE(code) == BPF_DW) {
+ u32 hi;
+
+ hi = imm32 & (1<<31) ? (u32)~0 : 0;
+ EMIT2_off32(0xC7, add_1reg(0x80, IA32_EAX),
+ insn->off + 4);
+ EMIT(hi, 4);
+ }
+ break;
+
+ /* STX: *(u8*)(dst_reg + off) = src_reg */
+ case BPF_STX | BPF_MEM | BPF_B:
+ case BPF_STX | BPF_MEM | BPF_H:
+ case BPF_STX | BPF_MEM | BPF_W:
+ case BPF_STX | BPF_MEM | BPF_DW:
+ if (dstk)
+ /* mov eax,dword ptr [ebp+off] */
+ EMIT3(0x8B, add_2reg(0x40, IA32_EBP, IA32_EAX),
+ STACK_VAR(dst_lo));
+ else
+ /* mov eax,dst_lo */
+ EMIT2(0x8B, add_2reg(0xC0, dst_lo, IA32_EAX));
+
+ if (sstk)
+ /* mov edx,dword ptr [ebp+off] */
+ EMIT3(0x8B, add_2reg(0x40, IA32_EBP, IA32_EDX),
+ STACK_VAR(src_lo));
+ else
+ /* mov edx,src_lo */
+ EMIT2(0x8B, add_2reg(0xC0, src_lo, IA32_EDX));
+
+ switch (BPF_SIZE(code)) {
+ case BPF_B:
+ EMIT(0x88, 1); break;
+ case BPF_H:
+ EMIT2(0x66, 0x89); break;
+ case BPF_W:
+ case BPF_DW:
+ EMIT(0x89, 1); break;
+ }
+
+ if (is_imm8(insn->off))
+ EMIT2(add_2reg(0x40, IA32_EAX, IA32_EDX),
+ insn->off);
+ else
+ EMIT1_off32(add_2reg(0x80, IA32_EAX, IA32_EDX),
+ insn->off);
+
+ if (BPF_SIZE(code) == BPF_DW) {
+ if (sstk)
+ /* mov edi,dword ptr [ebp+off] */
+ EMIT3(0x8B, add_2reg(0x40, IA32_EBP,
+ IA32_EDX),
+ STACK_VAR(src_hi));
+ else
+ /* mov edi,src_hi */
+ EMIT2(0x8B, add_2reg(0xC0, src_hi,
+ IA32_EDX));
+ EMIT1(0x89);
+ if (is_imm8(insn->off + 4)) {
+ EMIT2(add_2reg(0x40, IA32_EAX,
+ IA32_EDX),
+ insn->off + 4);
+ } else {
+ EMIT1(add_2reg(0x80, IA32_EAX,
+ IA32_EDX));
+ EMIT(insn->off + 4, 4);
+ }
+ }
+ break;
+
+ /* LDX: dst_reg = *(u8*)(src_reg + off) */
+ case BPF_LDX | BPF_MEM | BPF_B:
+ case BPF_LDX | BPF_MEM | BPF_H:
+ case BPF_LDX | BPF_MEM | BPF_W:
+ case BPF_LDX | BPF_MEM | BPF_DW:
+ if (sstk)
+ /* mov eax,dword ptr [ebp+off] */
+ EMIT3(0x8B, add_2reg(0x40, IA32_EBP, IA32_EAX),
+ STACK_VAR(src_lo));
+ else
+ /* mov eax,dword ptr [ebp+off] */
+ EMIT2(0x8B, add_2reg(0xC0, src_lo, IA32_EAX));
+
+ switch (BPF_SIZE(code)) {
+ case BPF_B:
+ EMIT2(0x0F, 0xB6); break;
+ case BPF_H:
+ EMIT2(0x0F, 0xB7); break;
+ case BPF_W:
+ case BPF_DW:
+ EMIT(0x8B, 1); break;
+ }
+
+ if (is_imm8(insn->off))
+ EMIT2(add_2reg(0x40, IA32_EAX, IA32_EDX),
+ insn->off);
+ else
+ EMIT1_off32(add_2reg(0x80, IA32_EAX, IA32_EDX),
+ insn->off);
+
+ if (dstk)
+ /* mov dword ptr [ebp+off],edx */
+ EMIT3(0x89, add_2reg(0x40, IA32_EBP, IA32_EDX),
+ STACK_VAR(dst_lo));
+ else
+ /* mov dst_lo,edx */
+ EMIT2(0x89, add_2reg(0xC0, dst_lo, IA32_EDX));
+ switch (BPF_SIZE(code)) {
+ case BPF_B:
+ case BPF_H:
+ case BPF_W:
+ if (dstk) {
+ EMIT3(0xC7, add_1reg(0x40, IA32_EBP),
+ STACK_VAR(dst_hi));
+ EMIT(0x0, 4);
+ } else {
+ EMIT3(0xC7, add_1reg(0xC0, dst_hi), 0);
+ }
+ break;
+ case BPF_DW:
+ EMIT2_off32(0x8B,
+ add_2reg(0x80, IA32_EAX, IA32_EDX),
+ insn->off + 4);
+ if (dstk)
+ EMIT3(0x89,
+ add_2reg(0x40, IA32_EBP,
+ IA32_EDX),
+ STACK_VAR(dst_hi));
+ else
+ EMIT2(0x89,
+ add_2reg(0xC0, dst_hi, IA32_EDX));
+ break;
+ default:
+ break;
+ }
+ break;
+ /* call */
+ case BPF_JMP | BPF_CALL:
+ {
+ const u8 *r1 = bpf2ia32[BPF_REG_1];
+ const u8 *r2 = bpf2ia32[BPF_REG_2];
+ const u8 *r3 = bpf2ia32[BPF_REG_3];
+ const u8 *r4 = bpf2ia32[BPF_REG_4];
+ const u8 *r5 = bpf2ia32[BPF_REG_5];
+
+ if (insn->src_reg == BPF_PSEUDO_CALL)
+ goto notyet;
+
+ func = (u8 *) __bpf_call_base + imm32;
+ jmp_offset = func - (image + addrs[i]);
+
+ if (!imm32 || !is_simm32(jmp_offset)) {
+ pr_err("unsupported BPF func %d addr %p image %p\n",
+ imm32, func, image);
+ return -EINVAL;
+ }
+
+ /* mov eax,dword ptr [ebp+off] */
+ EMIT3(0x8B, add_2reg(0x40, IA32_EBP, IA32_EAX),
+ STACK_VAR(r1[0]));
+ /* mov edx,dword ptr [ebp+off] */
+ EMIT3(0x8B, add_2reg(0x40, IA32_EBP, IA32_EDX),
+ STACK_VAR(r1[1]));
+
+ emit_push_r64(r5, &prog);
+ emit_push_r64(r4, &prog);
+ emit_push_r64(r3, &prog);
+ emit_push_r64(r2, &prog);
+
+ EMIT1_off32(0xE8, jmp_offset + 9);
+
+ /* mov dword ptr [ebp+off],eax */
+ EMIT3(0x89, add_2reg(0x40, IA32_EBP, IA32_EAX),
+ STACK_VAR(r0[0]));
+ /* mov dword ptr [ebp+off],edx */
+ EMIT3(0x89, add_2reg(0x40, IA32_EBP, IA32_EDX),
+ STACK_VAR(r0[1]));
+
+ /* add esp,32 */
+ EMIT3(0x83, add_1reg(0xC0, IA32_ESP), 32);
+ break;
+ }
+ case BPF_JMP | BPF_TAIL_CALL:
+ emit_bpf_tail_call(&prog);
+ break;
+
+ /* cond jump */
+ case BPF_JMP | BPF_JEQ | BPF_X:
+ case BPF_JMP | BPF_JNE | BPF_X:
+ case BPF_JMP | BPF_JGT | BPF_X:
+ case BPF_JMP | BPF_JLT | BPF_X:
+ case BPF_JMP | BPF_JGE | BPF_X:
+ case BPF_JMP | BPF_JLE | BPF_X:
+ case BPF_JMP | BPF_JSGT | BPF_X:
+ case BPF_JMP | BPF_JSLE | BPF_X:
+ case BPF_JMP | BPF_JSLT | BPF_X:
+ case BPF_JMP | BPF_JSGE | BPF_X: {
+ u8 dreg_lo = dstk ? IA32_EAX : dst_lo;
+ u8 dreg_hi = dstk ? IA32_EDX : dst_hi;
+ u8 sreg_lo = sstk ? IA32_ECX : src_lo;
+ u8 sreg_hi = sstk ? IA32_EBX : src_hi;
+
+ if (dstk) {
+ EMIT3(0x8B, add_2reg(0x40, IA32_EBP, IA32_EAX),
+ STACK_VAR(dst_lo));
+ EMIT3(0x8B, add_2reg(0x40, IA32_EBP, IA32_EDX),
+ STACK_VAR(dst_hi));
+ }
+
+ if (sstk) {
+ EMIT3(0x8B, add_2reg(0x40, IA32_EBP, IA32_ECX),
+ STACK_VAR(src_lo));
+ EMIT3(0x8B, add_2reg(0x40, IA32_EBP, IA32_EBX),
+ STACK_VAR(src_hi));
+ }
+
+ /* cmp dreg_hi,sreg_hi */
+ EMIT2(0x39, add_2reg(0xC0, dreg_hi, sreg_hi));
+ EMIT2(IA32_JNE, 2);
+ /* cmp dreg_lo,sreg_lo */
+ EMIT2(0x39, add_2reg(0xC0, dreg_lo, sreg_lo));
+ goto emit_cond_jmp;
+ }
+ case BPF_JMP | BPF_JSET | BPF_X: {
+ u8 dreg_lo = dstk ? IA32_EAX : dst_lo;
+ u8 dreg_hi = dstk ? IA32_EDX : dst_hi;
+ u8 sreg_lo = sstk ? IA32_ECX : src_lo;
+ u8 sreg_hi = sstk ? IA32_EBX : src_hi;
+
+ if (dstk) {
+ EMIT3(0x8B, add_2reg(0x40, IA32_EBP, IA32_EAX),
+ STACK_VAR(dst_lo));
+ EMIT3(0x8B, add_2reg(0x40, IA32_EBP, IA32_EDX),
+ STACK_VAR(dst_hi));
+ }
+
+ if (sstk) {
+ EMIT3(0x8B, add_2reg(0x40, IA32_EBP, IA32_ECX),
+ STACK_VAR(src_lo));
+ EMIT3(0x8B, add_2reg(0x40, IA32_EBP, IA32_EBX),
+ STACK_VAR(src_hi));
+ }
+ /* and dreg_lo,sreg_lo */
+ EMIT2(0x23, add_2reg(0xC0, sreg_lo, dreg_lo));
+ /* and dreg_hi,sreg_hi */
+ EMIT2(0x23, add_2reg(0xC0, sreg_hi, dreg_hi));
+ /* or dreg_lo,dreg_hi */
+ EMIT2(0x09, add_2reg(0xC0, dreg_lo, dreg_hi));
+ goto emit_cond_jmp;
+ }
+ case BPF_JMP | BPF_JSET | BPF_K: {
+ u32 hi;
+ u8 dreg_lo = dstk ? IA32_EAX : dst_lo;
+ u8 dreg_hi = dstk ? IA32_EDX : dst_hi;
+ u8 sreg_lo = IA32_ECX;
+ u8 sreg_hi = IA32_EBX;
+
+ if (dstk) {
+ EMIT3(0x8B, add_2reg(0x40, IA32_EBP, IA32_EAX),
+ STACK_VAR(dst_lo));
+ EMIT3(0x8B, add_2reg(0x40, IA32_EBP, IA32_EDX),
+ STACK_VAR(dst_hi));
+ }
+ hi = imm32 & (1<<31) ? (u32)~0 : 0;
+
+ /* mov ecx,imm32 */
+ EMIT2_off32(0xC7, add_1reg(0xC0, IA32_ECX), imm32);
+ /* mov ebx,imm32 */
+ EMIT2_off32(0xC7, add_1reg(0xC0, IA32_EBX), hi);
+
+ /* and dreg_lo,sreg_lo */
+ EMIT2(0x23, add_2reg(0xC0, sreg_lo, dreg_lo));
+ /* and dreg_hi,sreg_hi */
+ EMIT2(0x23, add_2reg(0xC0, sreg_hi, dreg_hi));
+ /* or dreg_lo,dreg_hi */
+ EMIT2(0x09, add_2reg(0xC0, dreg_lo, dreg_hi));
+ goto emit_cond_jmp;
+ }
+ case BPF_JMP | BPF_JEQ | BPF_K:
+ case BPF_JMP | BPF_JNE | BPF_K:
+ case BPF_JMP | BPF_JGT | BPF_K:
+ case BPF_JMP | BPF_JLT | BPF_K:
+ case BPF_JMP | BPF_JGE | BPF_K:
+ case BPF_JMP | BPF_JLE | BPF_K:
+ case BPF_JMP | BPF_JSGT | BPF_K:
+ case BPF_JMP | BPF_JSLE | BPF_K:
+ case BPF_JMP | BPF_JSLT | BPF_K:
+ case BPF_JMP | BPF_JSGE | BPF_K: {
+ u32 hi;
+ u8 dreg_lo = dstk ? IA32_EAX : dst_lo;
+ u8 dreg_hi = dstk ? IA32_EDX : dst_hi;
+ u8 sreg_lo = IA32_ECX;
+ u8 sreg_hi = IA32_EBX;
+
+ if (dstk) {
+ EMIT3(0x8B, add_2reg(0x40, IA32_EBP, IA32_EAX),
+ STACK_VAR(dst_lo));
+ EMIT3(0x8B, add_2reg(0x40, IA32_EBP, IA32_EDX),
+ STACK_VAR(dst_hi));
+ }
+
+ hi = imm32 & (1<<31) ? (u32)~0 : 0;
+ /* mov ecx,imm32 */
+ EMIT2_off32(0xC7, add_1reg(0xC0, IA32_ECX), imm32);
+ /* mov ebx,imm32 */
+ EMIT2_off32(0xC7, add_1reg(0xC0, IA32_EBX), hi);
+
+ /* cmp dreg_hi,sreg_hi */
+ EMIT2(0x39, add_2reg(0xC0, dreg_hi, sreg_hi));
+ EMIT2(IA32_JNE, 2);
+ /* cmp dreg_lo,sreg_lo */
+ EMIT2(0x39, add_2reg(0xC0, dreg_lo, sreg_lo));
+
+emit_cond_jmp: /* Convert BPF opcode to x86 */
+ switch (BPF_OP(code)) {
+ case BPF_JEQ:
+ jmp_cond = IA32_JE;
+ break;
+ case BPF_JSET:
+ case BPF_JNE:
+ jmp_cond = IA32_JNE;
+ break;
+ case BPF_JGT:
+ /* GT is unsigned '>', JA in x86 */
+ jmp_cond = IA32_JA;
+ break;
+ case BPF_JLT:
+ /* LT is unsigned '<', JB in x86 */
+ jmp_cond = IA32_JB;
+ break;
+ case BPF_JGE:
+ /* GE is unsigned '>=', JAE in x86 */
+ jmp_cond = IA32_JAE;
+ break;
+ case BPF_JLE:
+ /* LE is unsigned '<=', JBE in x86 */
+ jmp_cond = IA32_JBE;
+ break;
+ case BPF_JSGT:
+ /* Signed '>', GT in x86 */
+ jmp_cond = IA32_JG;
+ break;
+ case BPF_JSLT:
+ /* Signed '<', LT in x86 */
+ jmp_cond = IA32_JL;
+ break;
+ case BPF_JSGE:
+ /* Signed '>=', GE in x86 */
+ jmp_cond = IA32_JGE;
+ break;
+ case BPF_JSLE:
+ /* Signed '<=', LE in x86 */
+ jmp_cond = IA32_JLE;
+ break;
+ default: /* to silence GCC warning */
+ return -EFAULT;
+ }
+ jmp_offset = addrs[i + insn->off] - addrs[i];
+ if (is_imm8(jmp_offset)) {
+ EMIT2(jmp_cond, jmp_offset);
+ } else if (is_simm32(jmp_offset)) {
+ EMIT2_off32(0x0F, jmp_cond + 0x10, jmp_offset);
+ } else {
+ pr_err("cond_jmp gen bug %llx\n", jmp_offset);
+ return -EFAULT;
+ }
+
+ break;
+ }
+ case BPF_JMP | BPF_JA:
+ if (insn->off == -1)
+ /* -1 jmp instructions will always jump
+ * backwards two bytes. Explicitly handling
+ * this case avoids wasting too many passes
+ * when there are long sequences of replaced
+ * dead code.
+ */
+ jmp_offset = -2;
+ else
+ jmp_offset = addrs[i + insn->off] - addrs[i];
+
+ if (!jmp_offset)
+ /* Optimize out nop jumps */
+ break;
+emit_jmp:
+ if (is_imm8(jmp_offset)) {
+ EMIT2(0xEB, jmp_offset);
+ } else if (is_simm32(jmp_offset)) {
+ EMIT1_off32(0xE9, jmp_offset);
+ } else {
+ pr_err("jmp gen bug %llx\n", jmp_offset);
+ return -EFAULT;
+ }
+ break;
+ /* STX XADD: lock *(u32 *)(dst + off) += src */
+ case BPF_STX | BPF_XADD | BPF_W:
+ /* STX XADD: lock *(u64 *)(dst + off) += src */
+ case BPF_STX | BPF_XADD | BPF_DW:
+ goto notyet;
+ case BPF_JMP | BPF_EXIT:
+ if (seen_exit) {
+ jmp_offset = ctx->cleanup_addr - addrs[i];
+ goto emit_jmp;
+ }
+ seen_exit = true;
+ /* Update cleanup_addr */
+ ctx->cleanup_addr = proglen;
+ emit_epilogue(&prog, bpf_prog->aux->stack_depth);
+ break;
+notyet:
+ pr_info_once("*** NOT YET: opcode %02x ***\n", code);
+ return -EFAULT;
+ default:
+ /*
+ * This error will be seen if new instruction was added
+ * to interpreter, but not to JIT or if there is junk in
+ * bpf_prog
+ */
+ pr_err("bpf_jit: unknown opcode %02x\n", code);
+ return -EINVAL;
+ }
+
+ ilen = prog - temp;
+ if (ilen > BPF_MAX_INSN_SIZE) {
+ pr_err("bpf_jit: fatal insn size error\n");
+ return -EFAULT;
+ }
+
+ if (image) {
+ if (unlikely(proglen + ilen > oldproglen)) {
+ pr_err("bpf_jit: fatal error\n");
+ return -EFAULT;
+ }
+ memcpy(image + proglen, temp, ilen);
+ }
+ proglen += ilen;
+ addrs[i] = proglen;
+ prog = temp;
+ }
+ return proglen;
+}
+
+struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *prog)
+{
+ struct bpf_binary_header *header = NULL;
+ struct bpf_prog *tmp, *orig_prog = prog;
+ int proglen, oldproglen = 0;
+ struct jit_context ctx = {};
+ bool tmp_blinded = false;
+ u8 *image = NULL;
+ int *addrs;
+ int pass;
+ int i;
+
+ if (!prog->jit_requested)
+ return orig_prog;
+
+ tmp = bpf_jit_blind_constants(prog);
+ /*
+ * If blinding was requested and we failed during blinding,
+ * we must fall back to the interpreter.
+ */
+ if (IS_ERR(tmp))
+ return orig_prog;
+ if (tmp != prog) {
+ tmp_blinded = true;
+ prog = tmp;
+ }
+
+ addrs = kmalloc(prog->len * sizeof(*addrs), GFP_KERNEL);
+ if (!addrs) {
+ prog = orig_prog;
+ goto out;
+ }
+
+ /*
+ * Before first pass, make a rough estimation of addrs[]
+ * each BPF instruction is translated to less than 64 bytes
+ */
+ for (proglen = 0, i = 0; i < prog->len; i++) {
+ proglen += 64;
+ addrs[i] = proglen;
+ }
+ ctx.cleanup_addr = proglen;
+
+ /*
+ * JITed image shrinks with every pass and the loop iterates
+ * until the image stops shrinking. Very large BPF programs
+ * may converge on the last pass. In such case do one more
+ * pass to emit the final image.
+ */
+ for (pass = 0; pass < 20 || image; pass++) {
+ proglen = do_jit(prog, addrs, image, oldproglen, &ctx);
+ if (proglen <= 0) {
+out_image:
+ image = NULL;
+ if (header)
+ bpf_jit_binary_free(header);
+ prog = orig_prog;
+ goto out_addrs;
+ }
+ if (image) {
+ if (proglen != oldproglen) {
+ pr_err("bpf_jit: proglen=%d != oldproglen=%d\n",
+ proglen, oldproglen);
+ goto out_image;
+ }
+ break;
+ }
+ if (proglen == oldproglen) {
+ header = bpf_jit_binary_alloc(proglen, &image,
+ 1, jit_fill_hole);
+ if (!header) {
+ prog = orig_prog;
+ goto out_addrs;
+ }
+ }
+ oldproglen = proglen;
+ cond_resched();
+ }
+
+ if (bpf_jit_enable > 1)
+ bpf_jit_dump(prog->len, proglen, pass + 1, image);
+
+ if (image) {
+ bpf_jit_binary_lock_ro(header);
+ prog->bpf_func = (void *)image;
+ prog->jited = 1;
+ prog->jited_len = proglen;
+ } else {
+ prog = orig_prog;
+ }
+
+out_addrs:
+ kfree(addrs);
+out:
+ if (tmp_blinded)
+ bpf_jit_prog_release_other(prog, prog == orig_prog ?
+ tmp : orig_prog);
+ return prog;
+}
diff --git a/drivers/bluetooth/Kconfig b/drivers/bluetooth/Kconfig
index 010f5f5..f3c643a 100644
--- a/drivers/bluetooth/Kconfig
+++ b/drivers/bluetooth/Kconfig
@@ -197,6 +197,7 @@
config BT_HCIUART_QCA
bool "Qualcomm Atheros protocol support"
depends on BT_HCIUART
+ depends on BT_HCIUART_SERDEV
select BT_HCIUART_H4
select BT_QCA
help
diff --git a/drivers/bluetooth/btbcm.c b/drivers/bluetooth/btbcm.c
index 6659f11..99cde1f 100644
--- a/drivers/bluetooth/btbcm.c
+++ b/drivers/bluetooth/btbcm.c
@@ -315,10 +315,12 @@
return 0;
}
-static const struct {
+struct bcm_subver_table {
u16 subver;
const char *name;
-} bcm_uart_subver_table[] = {
+};
+
+static const struct bcm_subver_table bcm_uart_subver_table[] = {
{ 0x4103, "BCM4330B1" }, /* 002.001.003 */
{ 0x410e, "BCM43341B0" }, /* 002.001.014 */
{ 0x4406, "BCM4324B3" }, /* 002.004.006 */
@@ -330,98 +332,7 @@
{ }
};
-int btbcm_initialize(struct hci_dev *hdev, char *fw_name, size_t len)
-{
- u16 subver, rev;
- const char *hw_name = NULL;
- struct sk_buff *skb;
- struct hci_rp_read_local_version *ver;
- int i, err;
-
- /* Reset */
- err = btbcm_reset(hdev);
- if (err)
- return err;
-
- /* Read Local Version Info */
- skb = btbcm_read_local_version(hdev);
- if (IS_ERR(skb))
- return PTR_ERR(skb);
-
- ver = (struct hci_rp_read_local_version *)skb->data;
- rev = le16_to_cpu(ver->hci_rev);
- subver = le16_to_cpu(ver->lmp_subver);
- kfree_skb(skb);
-
- /* Read controller information */
- err = btbcm_read_info(hdev);
- if (err)
- return err;
-
- switch ((rev & 0xf000) >> 12) {
- case 0:
- case 1:
- case 2:
- case 3:
- for (i = 0; bcm_uart_subver_table[i].name; i++) {
- if (subver == bcm_uart_subver_table[i].subver) {
- hw_name = bcm_uart_subver_table[i].name;
- break;
- }
- }
-
- snprintf(fw_name, len, "brcm/%s.hcd", hw_name ? : "BCM");
- break;
- default:
- return 0;
- }
-
- bt_dev_info(hdev, "%s (%3.3u.%3.3u.%3.3u) build %4.4u",
- hw_name ? : "BCM", (subver & 0xe000) >> 13,
- (subver & 0x1f00) >> 8, (subver & 0x00ff), rev & 0x0fff);
-
- return 0;
-}
-EXPORT_SYMBOL_GPL(btbcm_initialize);
-
-int btbcm_finalize(struct hci_dev *hdev)
-{
- struct sk_buff *skb;
- struct hci_rp_read_local_version *ver;
- u16 subver, rev;
- int err;
-
- /* Reset */
- err = btbcm_reset(hdev);
- if (err)
- return err;
-
- /* Read Local Version Info */
- skb = btbcm_read_local_version(hdev);
- if (IS_ERR(skb))
- return PTR_ERR(skb);
-
- ver = (struct hci_rp_read_local_version *)skb->data;
- rev = le16_to_cpu(ver->hci_rev);
- subver = le16_to_cpu(ver->lmp_subver);
- kfree_skb(skb);
-
- bt_dev_info(hdev, "BCM (%3.3u.%3.3u.%3.3u) build %4.4u",
- (subver & 0xe000) >> 13, (subver & 0x1f00) >> 8,
- (subver & 0x00ff), rev & 0x0fff);
-
- btbcm_check_bdaddr(hdev);
-
- set_bit(HCI_QUIRK_STRICT_DUPLICATE_FILTER, &hdev->quirks);
-
- return 0;
-}
-EXPORT_SYMBOL_GPL(btbcm_finalize);
-
-static const struct {
- u16 subver;
- const char *name;
-} bcm_usb_subver_table[] = {
+static const struct bcm_subver_table bcm_usb_subver_table[] = {
{ 0x210b, "BCM43142A0" }, /* 001.001.011 */
{ 0x2112, "BCM4314A0" }, /* 001.001.018 */
{ 0x2118, "BCM20702A0" }, /* 001.001.024 */
@@ -435,14 +346,14 @@
{ }
};
-int btbcm_setup_patchram(struct hci_dev *hdev)
+int btbcm_initialize(struct hci_dev *hdev, char *fw_name, size_t len,
+ bool reinit)
{
- char fw_name[64];
- const struct firmware *fw;
u16 subver, rev, pid, vid;
- const char *hw_name = NULL;
+ const char *hw_name = "BCM";
struct sk_buff *skb;
struct hci_rp_read_local_version *ver;
+ const struct bcm_subver_table *bcm_subver_table;
int i, err;
/* Reset */
@@ -461,25 +372,27 @@
kfree_skb(skb);
/* Read controller information */
- err = btbcm_read_info(hdev);
- if (err)
- return err;
+ if (!reinit) {
+ err = btbcm_read_info(hdev);
+ if (err)
+ return err;
+ }
- switch ((rev & 0xf000) >> 12) {
- case 0:
- case 3:
- for (i = 0; bcm_uart_subver_table[i].name; i++) {
- if (subver == bcm_uart_subver_table[i].subver) {
- hw_name = bcm_uart_subver_table[i].name;
- break;
- }
+ /* Upper nibble of rev should be between 0 and 3? */
+ if (((rev & 0xf000) >> 12) > 3)
+ return 0;
+
+ bcm_subver_table = (hdev->bus == HCI_USB) ? bcm_usb_subver_table :
+ bcm_uart_subver_table;
+
+ for (i = 0; bcm_subver_table[i].name; i++) {
+ if (subver == bcm_subver_table[i].subver) {
+ hw_name = bcm_subver_table[i].name;
+ break;
}
+ }
- snprintf(fw_name, sizeof(fw_name), "brcm/%s.hcd",
- hw_name ? : "BCM");
- break;
- case 1:
- case 2:
+ if (hdev->bus == HCI_USB) {
/* Read USB Product Info */
skb = btbcm_read_usb_product(hdev);
if (IS_ERR(skb))
@@ -489,24 +402,50 @@
pid = get_unaligned_le16(skb->data + 3);
kfree_skb(skb);
- for (i = 0; bcm_usb_subver_table[i].name; i++) {
- if (subver == bcm_usb_subver_table[i].subver) {
- hw_name = bcm_usb_subver_table[i].name;
- break;
- }
- }
-
- snprintf(fw_name, sizeof(fw_name), "brcm/%s-%4.4x-%4.4x.hcd",
- hw_name ? : "BCM", vid, pid);
- break;
- default:
- return 0;
+ snprintf(fw_name, len, "brcm/%s-%4.4x-%4.4x.hcd",
+ hw_name, vid, pid);
+ } else {
+ snprintf(fw_name, len, "brcm/%s.hcd", hw_name);
}
bt_dev_info(hdev, "%s (%3.3u.%3.3u.%3.3u) build %4.4u",
- hw_name ? : "BCM", (subver & 0xe000) >> 13,
+ hw_name, (subver & 0xe000) >> 13,
(subver & 0x1f00) >> 8, (subver & 0x00ff), rev & 0x0fff);
+ return 0;
+}
+EXPORT_SYMBOL_GPL(btbcm_initialize);
+
+int btbcm_finalize(struct hci_dev *hdev)
+{
+ char fw_name[64];
+ int err;
+
+ /* Re-initialize */
+ err = btbcm_initialize(hdev, fw_name, sizeof(fw_name), true);
+ if (err)
+ return err;
+
+ btbcm_check_bdaddr(hdev);
+
+ set_bit(HCI_QUIRK_STRICT_DUPLICATE_FILTER, &hdev->quirks);
+
+ return 0;
+}
+EXPORT_SYMBOL_GPL(btbcm_finalize);
+
+int btbcm_setup_patchram(struct hci_dev *hdev)
+{
+ char fw_name[64];
+ const struct firmware *fw;
+ struct sk_buff *skb;
+ int err;
+
+ /* Initialize */
+ err = btbcm_initialize(hdev, fw_name, sizeof(fw_name), false);
+ if (err)
+ return err;
+
err = request_firmware(&fw, fw_name, &hdev->dev);
if (err < 0) {
bt_dev_info(hdev, "BCM: Patch %s not found", fw_name);
@@ -517,25 +456,11 @@
release_firmware(fw);
- /* Reset */
- err = btbcm_reset(hdev);
+ /* Re-initialize */
+ err = btbcm_initialize(hdev, fw_name, sizeof(fw_name), true);
if (err)
return err;
- /* Read Local Version Info */
- skb = btbcm_read_local_version(hdev);
- if (IS_ERR(skb))
- return PTR_ERR(skb);
-
- ver = (struct hci_rp_read_local_version *)skb->data;
- rev = le16_to_cpu(ver->hci_rev);
- subver = le16_to_cpu(ver->lmp_subver);
- kfree_skb(skb);
-
- bt_dev_info(hdev, "%s (%3.3u.%3.3u.%3.3u) build %4.4u",
- hw_name ? : "BCM", (subver & 0xe000) >> 13,
- (subver & 0x1f00) >> 8, (subver & 0x00ff), rev & 0x0fff);
-
/* Read Local Name */
skb = btbcm_read_local_name(hdev);
if (IS_ERR(skb))
diff --git a/drivers/bluetooth/btbcm.h b/drivers/bluetooth/btbcm.h
index cfe6ad4..5346515 100644
--- a/drivers/bluetooth/btbcm.h
+++ b/drivers/bluetooth/btbcm.h
@@ -73,7 +73,8 @@
int btbcm_setup_patchram(struct hci_dev *hdev);
int btbcm_setup_apple(struct hci_dev *hdev);
-int btbcm_initialize(struct hci_dev *hdev, char *fw_name, size_t len);
+int btbcm_initialize(struct hci_dev *hdev, char *fw_name, size_t len,
+ bool reinit);
int btbcm_finalize(struct hci_dev *hdev);
#else
@@ -104,7 +105,7 @@
}
static inline int btbcm_initialize(struct hci_dev *hdev, char *fw_name,
- size_t len)
+ size_t len, bool reinit)
{
return 0;
}
diff --git a/drivers/bluetooth/btqca.c b/drivers/bluetooth/btqca.c
index 2793d41..8219816 100644
--- a/drivers/bluetooth/btqca.c
+++ b/drivers/bluetooth/btqca.c
@@ -127,28 +127,41 @@
BT_DBG("TLV Type\t\t : 0x%x", type_len & 0x000000ff);
BT_DBG("Length\t\t : %d bytes", length);
+ config->dnld_mode = ROME_SKIP_EVT_NONE;
+
switch (config->type) {
case TLV_TYPE_PATCH:
tlv_patch = (struct tlv_type_patch *)tlv->data;
- BT_DBG("Total Length\t\t : %d bytes",
+
+ /* For Rome version 1.1 to 3.1, all segment commands
+ * are acked by a vendor specific event (VSE).
+ * For Rome >= 3.2, the download mode field indicates
+ * if VSE is skipped by the controller.
+ * In case VSE is skipped, only the last segment is acked.
+ */
+ config->dnld_mode = tlv_patch->download_mode;
+
+ BT_DBG("Total Length : %d bytes",
le32_to_cpu(tlv_patch->total_size));
- BT_DBG("Patch Data Length\t : %d bytes",
+ BT_DBG("Patch Data Length : %d bytes",
le32_to_cpu(tlv_patch->data_length));
BT_DBG("Signing Format Version : 0x%x",
tlv_patch->format_version);
- BT_DBG("Signature Algorithm\t : 0x%x",
+ BT_DBG("Signature Algorithm : 0x%x",
tlv_patch->signature);
- BT_DBG("Reserved\t\t : 0x%x",
- le16_to_cpu(tlv_patch->reserved1));
- BT_DBG("Product ID\t\t : 0x%04x",
+ BT_DBG("Download mode : 0x%x",
+ tlv_patch->download_mode);
+ BT_DBG("Reserved : 0x%x",
+ tlv_patch->reserved1);
+ BT_DBG("Product ID : 0x%04x",
le16_to_cpu(tlv_patch->product_id));
- BT_DBG("Rom Build Version\t : 0x%04x",
+ BT_DBG("Rom Build Version : 0x%04x",
le16_to_cpu(tlv_patch->rom_build));
- BT_DBG("Patch Version\t\t : 0x%04x",
+ BT_DBG("Patch Version : 0x%04x",
le16_to_cpu(tlv_patch->patch_version));
- BT_DBG("Reserved\t\t : 0x%x",
+ BT_DBG("Reserved : 0x%x",
le16_to_cpu(tlv_patch->reserved2));
- BT_DBG("Patch Entry Address\t : 0x%x",
+ BT_DBG("Patch Entry Address : 0x%x",
le32_to_cpu(tlv_patch->entry));
break;
@@ -194,8 +207,8 @@
}
}
-static int rome_tlv_send_segment(struct hci_dev *hdev, int idx, int seg_size,
- const u8 *data)
+static int rome_tlv_send_segment(struct hci_dev *hdev, int seg_size,
+ const u8 *data, enum rome_tlv_dnld_mode mode)
{
struct sk_buff *skb;
struct edl_event_hdr *edl;
@@ -203,12 +216,14 @@
u8 cmd[MAX_SIZE_PER_TLV_SEGMENT + 2];
int err = 0;
- BT_DBG("%s: Download segment #%d size %d", hdev->name, idx, seg_size);
-
cmd[0] = EDL_PATCH_TLV_REQ_CMD;
cmd[1] = seg_size;
memcpy(cmd + 2, data, seg_size);
+ if (mode == ROME_SKIP_EVT_VSE_CC || mode == ROME_SKIP_EVT_VSE)
+ return __hci_cmd_send(hdev, EDL_PATCH_CMD_OPCODE, seg_size + 2,
+ cmd);
+
skb = __hci_cmd_sync_ev(hdev, EDL_PATCH_CMD_OPCODE, seg_size + 2, cmd,
HCI_VENDOR_PKT, HCI_INIT_TIMEOUT);
if (IS_ERR(skb)) {
@@ -245,47 +260,12 @@
return err;
}
-static int rome_tlv_download_request(struct hci_dev *hdev,
- const struct firmware *fw)
-{
- const u8 *buffer, *data;
- int total_segment, remain_size;
- int ret, i;
-
- if (!fw || !fw->data)
- return -EINVAL;
-
- total_segment = fw->size / MAX_SIZE_PER_TLV_SEGMENT;
- remain_size = fw->size % MAX_SIZE_PER_TLV_SEGMENT;
-
- BT_DBG("%s: Total segment num %d remain size %d total size %zu",
- hdev->name, total_segment, remain_size, fw->size);
-
- data = fw->data;
- for (i = 0; i < total_segment; i++) {
- buffer = data + i * MAX_SIZE_PER_TLV_SEGMENT;
- ret = rome_tlv_send_segment(hdev, i, MAX_SIZE_PER_TLV_SEGMENT,
- buffer);
- if (ret < 0)
- return -EIO;
- }
-
- if (remain_size) {
- buffer = data + total_segment * MAX_SIZE_PER_TLV_SEGMENT;
- ret = rome_tlv_send_segment(hdev, total_segment, remain_size,
- buffer);
- if (ret < 0)
- return -EIO;
- }
-
- return 0;
-}
-
static int rome_download_firmware(struct hci_dev *hdev,
struct rome_config *config)
{
const struct firmware *fw;
- int ret;
+ const u8 *segment;
+ int ret, remain, i = 0;
bt_dev_info(hdev, "ROME Downloading %s", config->fwname);
@@ -298,10 +278,24 @@
rome_tlv_check_data(config, fw);
- ret = rome_tlv_download_request(hdev, fw);
- if (ret) {
- BT_ERR("%s: Failed to download file: %s (%d)", hdev->name,
- config->fwname, ret);
+ segment = fw->data;
+ remain = fw->size;
+ while (remain > 0) {
+ int segsize = min(MAX_SIZE_PER_TLV_SEGMENT, remain);
+
+ bt_dev_dbg(hdev, "Send segment %d, size %d", i++, segsize);
+
+ remain -= segsize;
+ /* The last segment is always acked regardless download mode */
+ if (!remain || segsize < MAX_SIZE_PER_TLV_SEGMENT)
+ config->dnld_mode = ROME_SKIP_EVT_NONE;
+
+ ret = rome_tlv_send_segment(hdev, segsize, segment,
+ config->dnld_mode);
+ if (ret)
+ break;
+
+ segment += segsize;
}
release_firmware(fw);
diff --git a/drivers/bluetooth/btqca.h b/drivers/bluetooth/btqca.h
index 65e994b..13d77fd 100644
--- a/drivers/bluetooth/btqca.h
+++ b/drivers/bluetooth/btqca.h
@@ -61,6 +61,13 @@
QCA_BAUDRATE_RESERVED
};
+enum rome_tlv_dnld_mode {
+ ROME_SKIP_EVT_NONE,
+ ROME_SKIP_EVT_VSE,
+ ROME_SKIP_EVT_CC,
+ ROME_SKIP_EVT_VSE_CC
+};
+
enum rome_tlv_type {
TLV_TYPE_PATCH = 1,
TLV_TYPE_NVM
@@ -70,6 +77,7 @@
u8 type;
char fwname[64];
uint8_t user_baud_rate;
+ enum rome_tlv_dnld_mode dnld_mode;
};
struct edl_event_hdr {
@@ -94,7 +102,8 @@
__le32 data_length;
__u8 format_version;
__u8 signature;
- __le16 reserved1;
+ __u8 download_mode;
+ __u8 reserved1;
__le16 product_id;
__le16 rom_build;
__le16 patch_version;
diff --git a/drivers/bluetooth/btqcomsmd.c b/drivers/bluetooth/btqcomsmd.c
index 2c9a5fc..7df3eed 100644
--- a/drivers/bluetooth/btqcomsmd.c
+++ b/drivers/bluetooth/btqcomsmd.c
@@ -65,6 +65,7 @@
{
struct btqcomsmd *btq = priv;
+ btq->hdev->stat.byte_rx += count;
return btqcomsmd_recv(btq->hdev, HCI_EVENT_PKT, data, count);
}
@@ -76,12 +77,21 @@
switch (hci_skb_pkt_type(skb)) {
case HCI_ACLDATA_PKT:
ret = rpmsg_send(btq->acl_channel, skb->data, skb->len);
+ if (ret) {
+ hdev->stat.err_tx++;
+ break;
+ }
hdev->stat.acl_tx++;
hdev->stat.byte_tx += skb->len;
break;
case HCI_COMMAND_PKT:
ret = rpmsg_send(btq->cmd_channel, skb->data, skb->len);
+ if (ret) {
+ hdev->stat.err_tx++;
+ break;
+ }
hdev->stat.cmd_tx++;
+ hdev->stat.byte_tx += skb->len;
break;
default:
ret = -EILSEQ;
diff --git a/drivers/bluetooth/btusb.c b/drivers/bluetooth/btusb.c
index b937cc1..91882f5 100644
--- a/drivers/bluetooth/btusb.c
+++ b/drivers/bluetooth/btusb.c
@@ -276,6 +276,8 @@
{ USB_DEVICE(0x04ca, 0x3011), .driver_info = BTUSB_QCA_ROME },
{ USB_DEVICE(0x04ca, 0x3015), .driver_info = BTUSB_QCA_ROME },
{ USB_DEVICE(0x04ca, 0x3016), .driver_info = BTUSB_QCA_ROME },
+ { USB_DEVICE(0x04ca, 0x301a), .driver_info = BTUSB_QCA_ROME },
+ { USB_DEVICE(0x13d3, 0x3496), .driver_info = BTUSB_QCA_ROME },
/* Broadcom BCM2035 */
{ USB_DEVICE(0x0a5c, 0x2009), .driver_info = BTUSB_BCM92035 },
diff --git a/drivers/bluetooth/hci_bcm.c b/drivers/bluetooth/hci_bcm.c
index 441f5e1..f06f0f1 100644
--- a/drivers/bluetooth/hci_bcm.c
+++ b/drivers/bluetooth/hci_bcm.c
@@ -501,7 +501,7 @@
hu->hdev->set_diag = bcm_set_diag;
hu->hdev->set_bdaddr = btbcm_set_bdaddr;
- err = btbcm_initialize(hu->hdev, fw_name, sizeof(fw_name));
+ err = btbcm_initialize(hu->hdev, fw_name, sizeof(fw_name), false);
if (err)
return err;
@@ -794,19 +794,21 @@
{ },
};
-#ifdef CONFIG_ACPI
-/* IRQ polarity of some chipsets are not defined correctly in ACPI table. */
-static const struct dmi_system_id bcm_active_low_irq_dmi_table[] = {
- { /* Handle ThinkPad 8 tablets with BCM2E55 chipset ACPI ID */
- .ident = "Lenovo ThinkPad 8",
+/* Some firmware reports an IRQ which does not work (wrong pin in fw table?) */
+static const struct dmi_system_id bcm_broken_irq_dmi_table[] = {
+ {
+ .ident = "Meegopad T08",
.matches = {
- DMI_EXACT_MATCH(DMI_SYS_VENDOR, "LENOVO"),
- DMI_EXACT_MATCH(DMI_PRODUCT_VERSION, "ThinkPad 8"),
+ DMI_EXACT_MATCH(DMI_BOARD_VENDOR,
+ "To be filled by OEM."),
+ DMI_EXACT_MATCH(DMI_BOARD_NAME, "T3 MRD"),
+ DMI_EXACT_MATCH(DMI_BOARD_VERSION, "V1.1"),
},
},
{ }
};
+#ifdef CONFIG_ACPI
static int bcm_resource(struct acpi_resource *ares, void *data)
{
struct bcm_device *dev = data;
@@ -904,6 +906,8 @@
static int bcm_get_resources(struct bcm_device *dev)
{
+ const struct dmi_system_id *dmi_id;
+
dev->name = dev_name(dev->dev);
if (x86_apple_machine && !bcm_apple_get_resources(dev))
@@ -936,6 +940,13 @@
dev->irq = gpiod_to_irq(gpio);
}
+ dmi_id = dmi_first_match(bcm_broken_irq_dmi_table);
+ if (dmi_id) {
+ dev_info(dev->dev, "%s: Has a broken IRQ config, disabling IRQ support / runtime-pm\n",
+ dmi_id->ident);
+ dev->irq = 0;
+ }
+
dev_dbg(dev->dev, "BCM irq: %d\n", dev->irq);
return 0;
}
@@ -944,7 +955,6 @@
static int bcm_acpi_probe(struct bcm_device *dev)
{
LIST_HEAD(resources);
- const struct dmi_system_id *dmi_id;
const struct acpi_gpio_mapping *gpio_mapping = acpi_bcm_int_last_gpios;
struct resource_entry *entry;
int ret;
@@ -991,13 +1001,6 @@
dev->irq_active_low = irq_polarity;
dev_warn(dev->dev, "Overwriting IRQ polarity to active %s by module-param\n",
dev->irq_active_low ? "low" : "high");
- } else {
- dmi_id = dmi_first_match(bcm_active_low_irq_dmi_table);
- if (dmi_id) {
- dev_warn(dev->dev, "%s: Overwriting IRQ polarity to active low",
- dmi_id->ident);
- dev->irq_active_low = true;
- }
}
return 0;
diff --git a/drivers/bluetooth/hci_ldisc.c b/drivers/bluetooth/hci_ldisc.c
index b6a7170..954213e 100644
--- a/drivers/bluetooth/hci_ldisc.c
+++ b/drivers/bluetooth/hci_ldisc.c
@@ -447,6 +447,8 @@
btbcm_check_bdaddr(hdev);
break;
#endif
+ default:
+ break;
}
done:
diff --git a/drivers/bluetooth/hci_qca.c b/drivers/bluetooth/hci_qca.c
index 05ec530..f05382b 100644
--- a/drivers/bluetooth/hci_qca.c
+++ b/drivers/bluetooth/hci_qca.c
@@ -29,7 +29,12 @@
*/
#include <linux/kernel.h>
+#include <linux/clk.h>
#include <linux/debugfs.h>
+#include <linux/gpio/consumer.h>
+#include <linux/mod_devicetable.h>
+#include <linux/module.h>
+#include <linux/serdev.h>
#include <net/bluetooth/bluetooth.h>
#include <net/bluetooth/hci_core.h>
@@ -50,6 +55,9 @@
#define IBS_TX_IDLE_TIMEOUT_MS 2000
#define BAUDRATE_SETTLE_TIMEOUT_MS 300
+/* susclk rate */
+#define SUSCLK_RATE_32KHZ 32768
+
/* HCI_IBS transmit side sleep protocol states */
enum tx_ibs_states {
HCI_IBS_TX_ASLEEP,
@@ -111,6 +119,12 @@
u64 votes_off;
};
+struct qca_serdev {
+ struct hci_uart serdev_hu;
+ struct gpio_desc *bt_en;
+ struct clk *susclk;
+};
+
static void __serial_clock_on(struct tty_struct *tty)
{
/* TODO: Some chipset requires to enable UART clock on client
@@ -386,6 +400,7 @@
/* Initialize protocol */
static int qca_open(struct hci_uart *hu)
{
+ struct qca_serdev *qcadev;
struct qca_data *qca;
BT_DBG("hu %p qca_open", hu);
@@ -444,6 +459,13 @@
timer_setup(&qca->tx_idle_timer, hci_ibs_tx_idle_timeout, 0);
qca->tx_idle_delay = IBS_TX_IDLE_TIMEOUT_MS;
+ if (hu->serdev) {
+ serdev_device_open(hu->serdev);
+
+ qcadev = serdev_device_get_drvdata(hu->serdev);
+ gpiod_set_value_cansleep(qcadev->bt_en, 1);
+ }
+
BT_DBG("HCI_UART_QCA open, tx_idle_delay=%u, wake_retrans=%u",
qca->tx_idle_delay, qca->wake_retrans);
@@ -512,6 +534,7 @@
/* Close protocol */
static int qca_close(struct hci_uart *hu)
{
+ struct qca_serdev *qcadev;
struct qca_data *qca = hu->priv;
BT_DBG("hu %p qca close", hu);
@@ -525,6 +548,13 @@
destroy_workqueue(qca->workqueue);
qca->hu = NULL;
+ if (hu->serdev) {
+ serdev_device_close(hu->serdev);
+
+ qcadev = serdev_device_get_drvdata(hu->serdev);
+ gpiod_set_value_cansleep(qcadev->bt_en, 0);
+ }
+
kfree_skb(qca->rx_skb);
hu->priv = NULL;
@@ -885,6 +915,14 @@
return 0;
}
+static inline void host_set_baudrate(struct hci_uart *hu, unsigned int speed)
+{
+ if (hu->serdev)
+ serdev_device_set_baudrate(hu->serdev, speed);
+ else
+ hci_uart_set_baudrate(hu, speed);
+}
+
static int qca_setup(struct hci_uart *hu)
{
struct hci_dev *hdev = hu->hdev;
@@ -905,7 +943,7 @@
speed = hu->proto->init_speed;
if (speed)
- hci_uart_set_baudrate(hu, speed);
+ host_set_baudrate(hu, speed);
/* Setup user speed if needed */
speed = 0;
@@ -924,7 +962,7 @@
ret);
return ret;
}
- hci_uart_set_baudrate(hu, speed);
+ host_set_baudrate(hu, speed);
}
/* Setup patch / NVM configurations */
@@ -935,6 +973,12 @@
} else if (ret == -ENOENT) {
/* No patch/nvm-config found, run with original fw/config */
ret = 0;
+ } else if (ret == -EAGAIN) {
+ /*
+ * Userspace firmware loader will return -EAGAIN in case no
+ * patch/nvm-config is found, so run with original fw/config.
+ */
+ ret = 0;
}
/* Setup bdaddr */
@@ -958,12 +1002,80 @@
.dequeue = qca_dequeue,
};
+static int qca_serdev_probe(struct serdev_device *serdev)
+{
+ struct qca_serdev *qcadev;
+ int err;
+
+ qcadev = devm_kzalloc(&serdev->dev, sizeof(*qcadev), GFP_KERNEL);
+ if (!qcadev)
+ return -ENOMEM;
+
+ qcadev->serdev_hu.serdev = serdev;
+ serdev_device_set_drvdata(serdev, qcadev);
+
+ qcadev->bt_en = devm_gpiod_get(&serdev->dev, "enable",
+ GPIOD_OUT_LOW);
+ if (IS_ERR(qcadev->bt_en)) {
+ dev_err(&serdev->dev, "failed to acquire enable gpio\n");
+ return PTR_ERR(qcadev->bt_en);
+ }
+
+ qcadev->susclk = devm_clk_get(&serdev->dev, NULL);
+ if (IS_ERR(qcadev->susclk)) {
+ dev_err(&serdev->dev, "failed to acquire clk\n");
+ return PTR_ERR(qcadev->susclk);
+ }
+
+ err = clk_set_rate(qcadev->susclk, SUSCLK_RATE_32KHZ);
+ if (err)
+ return err;
+
+ err = clk_prepare_enable(qcadev->susclk);
+ if (err)
+ return err;
+
+ err = hci_uart_register_device(&qcadev->serdev_hu, &qca_proto);
+ if (err)
+ clk_disable_unprepare(qcadev->susclk);
+
+ return err;
+}
+
+static void qca_serdev_remove(struct serdev_device *serdev)
+{
+ struct qca_serdev *qcadev = serdev_device_get_drvdata(serdev);
+
+ hci_uart_unregister_device(&qcadev->serdev_hu);
+
+ clk_disable_unprepare(qcadev->susclk);
+}
+
+static const struct of_device_id qca_bluetooth_of_match[] = {
+ { .compatible = "qcom,qca6174-bt" },
+ { /* sentinel */ }
+};
+MODULE_DEVICE_TABLE(of, qca_bluetooth_of_match);
+
+static struct serdev_device_driver qca_serdev_driver = {
+ .probe = qca_serdev_probe,
+ .remove = qca_serdev_remove,
+ .driver = {
+ .name = "hci_uart_qca",
+ .of_match_table = qca_bluetooth_of_match,
+ },
+};
+
int __init qca_init(void)
{
+ serdev_device_driver_register(&qca_serdev_driver);
+
return hci_uart_register_proto(&qca_proto);
}
int __exit qca_deinit(void)
{
+ serdev_device_driver_unregister(&qca_serdev_driver);
+
return hci_uart_unregister_proto(&qca_proto);
}
diff --git a/drivers/connector/cn_proc.c b/drivers/connector/cn_proc.c
index a782ce8..ed5e424 100644
--- a/drivers/connector/cn_proc.c
+++ b/drivers/connector/cn_proc.c
@@ -262,6 +262,8 @@
ev->what = PROC_EVENT_COREDUMP;
ev->event_data.coredump.process_pid = task->pid;
ev->event_data.coredump.process_tgid = task->tgid;
+ ev->event_data.coredump.parent_pid = task->real_parent->pid;
+ ev->event_data.coredump.parent_tgid = task->real_parent->tgid;
memcpy(&msg->id, &cn_proc_event_id, sizeof(msg->id));
msg->ack = 0; /* not used */
@@ -288,6 +290,8 @@
ev->event_data.exit.process_tgid = task->tgid;
ev->event_data.exit.exit_code = task->exit_code;
ev->event_data.exit.exit_signal = task->exit_signal;
+ ev->event_data.exit.parent_pid = task->real_parent->pid;
+ ev->event_data.exit.parent_tgid = task->real_parent->tgid;
memcpy(&msg->id, &cn_proc_event_id, sizeof(msg->id));
msg->ack = 0; /* not used */
diff --git a/drivers/dca/dca-core.c b/drivers/dca/dca-core.c
index 7afbb28..1bc5ffb 100644
--- a/drivers/dca/dca-core.c
+++ b/drivers/dca/dca-core.c
@@ -270,7 +270,7 @@
* @dev - the device that wants dca service
* @cpu - the cpuid as returned by get_cpu()
*/
-u8 dca_common_get_tag(struct device *dev, int cpu)
+static u8 dca_common_get_tag(struct device *dev, int cpu)
{
struct dca_provider *dca;
u8 tag;
diff --git a/drivers/infiniband/hw/mlx5/cq.c b/drivers/infiniband/hw/mlx5/cq.c
index 77d257e..6d52ea035 100644
--- a/drivers/infiniband/hw/mlx5/cq.c
+++ b/drivers/infiniband/hw/mlx5/cq.c
@@ -849,7 +849,7 @@
return 0;
err_cqb:
- kfree(*cqb);
+ kvfree(*cqb);
err_db:
mlx5_ib_db_unmap_user(to_mucontext(context), &cq->db);
diff --git a/drivers/infiniband/ulp/opa_vnic/opa_vnic_encap.c b/drivers/infiniband/ulp/opa_vnic/opa_vnic_encap.c
index 4be3aef..267da82 100644
--- a/drivers/infiniband/ulp/opa_vnic/opa_vnic_encap.c
+++ b/drivers/infiniband/ulp/opa_vnic/opa_vnic_encap.c
@@ -443,17 +443,16 @@
}
/* opa_vnic_calc_entropy - calculate the packet entropy */
-u8 opa_vnic_calc_entropy(struct opa_vnic_adapter *adapter, struct sk_buff *skb)
+u8 opa_vnic_calc_entropy(struct sk_buff *skb)
{
- u16 hash16;
+ u32 hash = skb_get_hash(skb);
- /*
- * Get flow based 16-bit hash and then XOR the upper and lower bytes
- * to get the entropy.
- * __skb_tx_hash limits qcount to 16 bits. Hence, get 15-bit hash.
- */
- hash16 = __skb_tx_hash(adapter->netdev, skb, BIT(15));
- return (u8)((hash16 >> 8) ^ (hash16 & 0xff));
+ /* store XOR of all bytes in lower 8 bits */
+ hash ^= hash >> 8;
+ hash ^= hash >> 16;
+
+ /* return lower 8 bits as entropy */
+ return (u8)(hash & 0xFF);
}
/* opa_vnic_get_def_port - get default port based on entropy */
@@ -490,7 +489,7 @@
hdr = skb_push(skb, OPA_VNIC_HDR_LEN);
- entropy = opa_vnic_calc_entropy(adapter, skb);
+ entropy = opa_vnic_calc_entropy(skb);
def_port = opa_vnic_get_def_port(adapter, entropy);
len = opa_vnic_wire_length(skb);
dlid = opa_vnic_get_dlid(adapter, skb, def_port);
diff --git a/drivers/infiniband/ulp/opa_vnic/opa_vnic_internal.h b/drivers/infiniband/ulp/opa_vnic/opa_vnic_internal.h
index afd95f4..43ac61f 100644
--- a/drivers/infiniband/ulp/opa_vnic/opa_vnic_internal.h
+++ b/drivers/infiniband/ulp/opa_vnic/opa_vnic_internal.h
@@ -299,7 +299,7 @@
void opa_vnic_rem_netdev(struct opa_vnic_adapter *adapter);
void opa_vnic_encap_skb(struct opa_vnic_adapter *adapter, struct sk_buff *skb);
u8 opa_vnic_get_vl(struct opa_vnic_adapter *adapter, struct sk_buff *skb);
-u8 opa_vnic_calc_entropy(struct opa_vnic_adapter *adapter, struct sk_buff *skb);
+u8 opa_vnic_calc_entropy(struct sk_buff *skb);
void opa_vnic_process_vema_config(struct opa_vnic_adapter *adapter);
void opa_vnic_release_mac_tbl(struct opa_vnic_adapter *adapter);
void opa_vnic_query_mac_tbl(struct opa_vnic_adapter *adapter,
diff --git a/drivers/infiniband/ulp/opa_vnic/opa_vnic_netdev.c b/drivers/infiniband/ulp/opa_vnic/opa_vnic_netdev.c
index ce57e0f..0c8aec6 100644
--- a/drivers/infiniband/ulp/opa_vnic/opa_vnic_netdev.c
+++ b/drivers/infiniband/ulp/opa_vnic/opa_vnic_netdev.c
@@ -104,7 +104,7 @@
/* pass entropy and vl as metadata in skb */
mdata = skb_push(skb, sizeof(*mdata));
- mdata->entropy = opa_vnic_calc_entropy(adapter, skb);
+ mdata->entropy = opa_vnic_calc_entropy(skb);
mdata->vl = opa_vnic_get_vl(adapter, skb);
rc = adapter->rn_ops->ndo_select_queue(netdev, skb,
accel_priv, fallback);
diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig
index 8918466..a029b27 100644
--- a/drivers/net/Kconfig
+++ b/drivers/net/Kconfig
@@ -198,6 +198,7 @@
config GENEVE
tristate "Generic Network Virtualization Encapsulation"
depends on INET && NET_UDP_TUNNEL
+ depends on IPV6 || !IPV6
select NET_IP_TUNNEL
select GRO_CELLS
---help---
diff --git a/drivers/net/bonding/bond_alb.c b/drivers/net/bonding/bond_alb.c
index 5eb0df2..e82108c 100644
--- a/drivers/net/bonding/bond_alb.c
+++ b/drivers/net/bonding/bond_alb.c
@@ -40,11 +40,6 @@
#include <net/bonding.h>
#include <net/bond_alb.h>
-
-
-static const u8 mac_bcast[ETH_ALEN + 2] __long_aligned = {
- 0xff, 0xff, 0xff, 0xff, 0xff, 0xff
-};
static const u8 mac_v6_allmcast[ETH_ALEN + 2] __long_aligned = {
0x33, 0x33, 0x00, 0x00, 0x00, 0x01
};
@@ -420,8 +415,7 @@
if (assigned_slave) {
rx_hash_table[index].slave = assigned_slave;
- if (!ether_addr_equal_64bits(rx_hash_table[index].mac_dst,
- mac_bcast)) {
+ if (is_valid_ether_addr(rx_hash_table[index].mac_dst)) {
bond_info->rx_hashtbl[index].ntt = 1;
bond_info->rx_ntt = 1;
/* A slave has been removed from the
@@ -524,7 +518,7 @@
client_info = &(bond_info->rx_hashtbl[hash_index]);
if ((client_info->slave == slave) &&
- !ether_addr_equal_64bits(client_info->mac_dst, mac_bcast)) {
+ is_valid_ether_addr(client_info->mac_dst)) {
client_info->ntt = 1;
ntt = 1;
}
@@ -565,7 +559,7 @@
if ((client_info->ip_src == src_ip) &&
!ether_addr_equal_64bits(client_info->slave->dev->dev_addr,
bond->dev->dev_addr) &&
- !ether_addr_equal_64bits(client_info->mac_dst, mac_bcast)) {
+ is_valid_ether_addr(client_info->mac_dst)) {
client_info->ntt = 1;
bond_info->rx_ntt = 1;
}
@@ -593,7 +587,7 @@
if ((client_info->ip_src == arp->ip_src) &&
(client_info->ip_dst == arp->ip_dst)) {
/* the entry is already assigned to this client */
- if (!ether_addr_equal_64bits(arp->mac_dst, mac_bcast)) {
+ if (!is_broadcast_ether_addr(arp->mac_dst)) {
/* update mac address from arp */
ether_addr_copy(client_info->mac_dst, arp->mac_dst);
}
@@ -641,7 +635,7 @@
ether_addr_copy(client_info->mac_src, arp->mac_src);
client_info->slave = assigned_slave;
- if (!ether_addr_equal_64bits(client_info->mac_dst, mac_bcast)) {
+ if (is_valid_ether_addr(client_info->mac_dst)) {
client_info->ntt = 1;
bond->alb_info.rx_ntt = 1;
} else {
@@ -733,8 +727,10 @@
assigned_slave = __rlb_next_rx_slave(bond);
if (assigned_slave && (client_info->slave != assigned_slave)) {
client_info->slave = assigned_slave;
- client_info->ntt = 1;
- ntt = 1;
+ if (!is_zero_ether_addr(client_info->mac_dst)) {
+ client_info->ntt = 1;
+ ntt = 1;
+ }
}
}
@@ -1319,8 +1315,8 @@
rlb_deinitialize(bond);
}
-static int bond_do_alb_xmit(struct sk_buff *skb, struct bonding *bond,
- struct slave *tx_slave)
+static netdev_tx_t bond_do_alb_xmit(struct sk_buff *skb, struct bonding *bond,
+ struct slave *tx_slave)
{
struct alb_bond_info *bond_info = &(BOND_ALB_INFO(bond));
struct ethhdr *eth_data = eth_hdr(skb);
@@ -1354,7 +1350,7 @@
return NETDEV_TX_OK;
}
-int bond_tlb_xmit(struct sk_buff *skb, struct net_device *bond_dev)
+netdev_tx_t bond_tlb_xmit(struct sk_buff *skb, struct net_device *bond_dev)
{
struct bonding *bond = netdev_priv(bond_dev);
struct ethhdr *eth_data;
@@ -1392,7 +1388,7 @@
return bond_do_alb_xmit(skb, bond, tx_slave);
}
-int bond_alb_xmit(struct sk_buff *skb, struct net_device *bond_dev)
+netdev_tx_t bond_alb_xmit(struct sk_buff *skb, struct net_device *bond_dev)
{
struct bonding *bond = netdev_priv(bond_dev);
struct ethhdr *eth_data;
@@ -1412,9 +1408,9 @@
case ETH_P_IP: {
const struct iphdr *iph = ip_hdr(skb);
- if (ether_addr_equal_64bits(eth_data->h_dest, mac_bcast) ||
- (iph->daddr == ip_bcast) ||
- (iph->protocol == IPPROTO_IGMP)) {
+ if (is_broadcast_ether_addr(eth_data->h_dest) ||
+ iph->daddr == ip_bcast ||
+ iph->protocol == IPPROTO_IGMP) {
do_tx_balance = false;
break;
}
@@ -1426,7 +1422,7 @@
/* IPv6 doesn't really use broadcast mac address, but leave
* that here just in case.
*/
- if (ether_addr_equal_64bits(eth_data->h_dest, mac_bcast)) {
+ if (is_broadcast_ether_addr(eth_data->h_dest)) {
do_tx_balance = false;
break;
}
@@ -1482,8 +1478,24 @@
}
if (do_tx_balance) {
- hash_index = _simple_hash(hash_start, hash_size);
- tx_slave = tlb_choose_channel(bond, hash_index, skb->len);
+ if (bond->params.tlb_dynamic_lb) {
+ hash_index = _simple_hash(hash_start, hash_size);
+ tx_slave = tlb_choose_channel(bond, hash_index, skb->len);
+ } else {
+ /*
+ * do_tx_balance means we are free to select the tx_slave
+ * So we do exactly what tlb would do for hash selection
+ */
+
+ struct bond_up_slave *slaves;
+ unsigned int count;
+
+ slaves = rcu_dereference(bond->slave_arr);
+ count = slaves ? READ_ONCE(slaves->count) : 0;
+ if (likely(count))
+ tx_slave = slaves->arr[bond_xmit_hash(bond, skb) %
+ count];
+ }
}
return bond_do_alb_xmit(skb, bond, tx_slave);
diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 1f1e97b..bd53a71 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -159,7 +159,7 @@
MODULE_PARM_DESC(min_links, "Minimum number of available links before turning on carrier");
module_param(xmit_hash_policy, charp, 0);
-MODULE_PARM_DESC(xmit_hash_policy, "balance-xor and 802.3ad hashing method; "
+MODULE_PARM_DESC(xmit_hash_policy, "balance-alb, balance-tlb, balance-xor, 802.3ad hashing method; "
"0 for layer 2 (default), 1 for layer 3+4, "
"2 for layer 2+3, 3 for encap layer 2+3, "
"4 for encap layer 3+4");
@@ -247,7 +247,7 @@
BUILD_BUG_ON(sizeof(skb->queue_mapping) !=
sizeof(qdisc_skb_cb(skb)->slave_dev_queue_mapping));
- skb->queue_mapping = qdisc_skb_cb(skb)->slave_dev_queue_mapping;
+ skb_set_queue_mapping(skb, qdisc_skb_cb(skb)->slave_dev_queue_mapping);
if (unlikely(netpoll_tx_running(bond->dev)))
bond_netpoll_send_skb(bond_get_slave_by_dev(bond, slave_dev), skb);
@@ -1107,7 +1107,8 @@
done:
bond_dev->vlan_features = vlan_features;
- bond_dev->hw_enc_features = enc_features | NETIF_F_GSO_ENCAP_ALL;
+ bond_dev->hw_enc_features = enc_features | NETIF_F_GSO_ENCAP_ALL |
+ NETIF_F_GSO_UDP_L4;
bond_dev->gso_max_segs = gso_max_segs;
netif_set_gso_max_size(bond_dev, gso_max_size);
@@ -1217,12 +1218,37 @@
}
}
+static enum netdev_lag_hash bond_lag_hash_type(struct bonding *bond,
+ enum netdev_lag_tx_type type)
+{
+ if (type != NETDEV_LAG_TX_TYPE_HASH)
+ return NETDEV_LAG_HASH_NONE;
+
+ switch (bond->params.xmit_policy) {
+ case BOND_XMIT_POLICY_LAYER2:
+ return NETDEV_LAG_HASH_L2;
+ case BOND_XMIT_POLICY_LAYER34:
+ return NETDEV_LAG_HASH_L34;
+ case BOND_XMIT_POLICY_LAYER23:
+ return NETDEV_LAG_HASH_L23;
+ case BOND_XMIT_POLICY_ENCAP23:
+ return NETDEV_LAG_HASH_E23;
+ case BOND_XMIT_POLICY_ENCAP34:
+ return NETDEV_LAG_HASH_E34;
+ default:
+ return NETDEV_LAG_HASH_UNKNOWN;
+ }
+}
+
static int bond_master_upper_dev_link(struct bonding *bond, struct slave *slave,
struct netlink_ext_ack *extack)
{
struct netdev_lag_upper_info lag_upper_info;
+ enum netdev_lag_tx_type type;
- lag_upper_info.tx_type = bond_lag_tx_type(bond);
+ type = bond_lag_tx_type(bond);
+ lag_upper_info.tx_type = type;
+ lag_upper_info.hash_type = bond_lag_hash_type(bond, type);
return netdev_master_upper_dev_link(slave->dev, bond->dev, slave,
&lag_upper_info, extack);
@@ -1735,7 +1761,7 @@
unblock_netpoll_tx();
}
- if (bond_mode_uses_xmit_hash(bond))
+ if (bond_mode_can_use_xmit_hash(bond))
bond_update_slave_arr(bond, NULL);
bond->nest_level = dev_get_nest_level(bond_dev);
@@ -1870,7 +1896,7 @@
if (BOND_MODE(bond) == BOND_MODE_8023AD)
bond_3ad_unbind_slave(slave);
- if (bond_mode_uses_xmit_hash(bond))
+ if (bond_mode_can_use_xmit_hash(bond))
bond_update_slave_arr(bond, slave);
netdev_info(bond_dev, "Releasing %s interface %s\n",
@@ -2137,6 +2163,24 @@
return commit;
}
+static void bond_miimon_link_change(struct bonding *bond,
+ struct slave *slave,
+ char link)
+{
+ switch (BOND_MODE(bond)) {
+ case BOND_MODE_8023AD:
+ bond_3ad_handle_link_change(slave, link);
+ break;
+ case BOND_MODE_TLB:
+ case BOND_MODE_ALB:
+ bond_alb_handle_link_change(bond, slave, link);
+ break;
+ case BOND_MODE_XOR:
+ bond_update_slave_arr(bond, NULL);
+ break;
+ }
+}
+
static void bond_miimon_commit(struct bonding *bond)
{
struct list_head *iter;
@@ -2178,16 +2222,7 @@
slave->speed == SPEED_UNKNOWN ? 0 : slave->speed,
slave->duplex ? "full" : "half");
- /* notify ad that the link status has changed */
- if (BOND_MODE(bond) == BOND_MODE_8023AD)
- bond_3ad_handle_link_change(slave, BOND_LINK_UP);
-
- if (bond_is_lb(bond))
- bond_alb_handle_link_change(bond, slave,
- BOND_LINK_UP);
-
- if (BOND_MODE(bond) == BOND_MODE_XOR)
- bond_update_slave_arr(bond, NULL);
+ bond_miimon_link_change(bond, slave, BOND_LINK_UP);
if (!bond->curr_active_slave || slave == primary)
goto do_failover;
@@ -2209,16 +2244,7 @@
netdev_info(bond->dev, "link status definitely down for interface %s, disabling it\n",
slave->dev->name);
- if (BOND_MODE(bond) == BOND_MODE_8023AD)
- bond_3ad_handle_link_change(slave,
- BOND_LINK_DOWN);
-
- if (bond_is_lb(bond))
- bond_alb_handle_link_change(bond, slave,
- BOND_LINK_DOWN);
-
- if (BOND_MODE(bond) == BOND_MODE_XOR)
- bond_update_slave_arr(bond, NULL);
+ bond_miimon_link_change(bond, slave, BOND_LINK_DOWN);
if (slave == rcu_access_pointer(bond->curr_active_slave))
goto do_failover;
@@ -3102,7 +3128,7 @@
* events. If these (miimon/arpmon) parameters are configured
* then array gets refreshed twice and that should be fine!
*/
- if (bond_mode_uses_xmit_hash(bond))
+ if (bond_mode_can_use_xmit_hash(bond))
bond_update_slave_arr(bond, NULL);
break;
case NETDEV_CHANGEMTU:
@@ -3322,7 +3348,7 @@
*/
if (bond_alb_initialize(bond, (BOND_MODE(bond) == BOND_MODE_ALB)))
return -ENOMEM;
- if (bond->params.tlb_dynamic_lb)
+ if (bond->params.tlb_dynamic_lb || BOND_MODE(bond) == BOND_MODE_ALB)
queue_delayed_work(bond->wq, &bond->alb_work, 0);
}
@@ -3341,7 +3367,7 @@
bond_3ad_initiate_agg_selection(bond, 1);
}
- if (bond_mode_uses_xmit_hash(bond))
+ if (bond_mode_can_use_xmit_hash(bond))
bond_update_slave_arr(bond, NULL);
return 0;
@@ -3807,7 +3833,8 @@
return slave_id;
}
-static int bond_xmit_roundrobin(struct sk_buff *skb, struct net_device *bond_dev)
+static netdev_tx_t bond_xmit_roundrobin(struct sk_buff *skb,
+ struct net_device *bond_dev)
{
struct bonding *bond = netdev_priv(bond_dev);
struct iphdr *iph = ip_hdr(skb);
@@ -3843,7 +3870,8 @@
/* In active-backup mode, we know that bond->curr_active_slave is always valid if
* the bond has a usable interface.
*/
-static int bond_xmit_activebackup(struct sk_buff *skb, struct net_device *bond_dev)
+static netdev_tx_t bond_xmit_activebackup(struct sk_buff *skb,
+ struct net_device *bond_dev)
{
struct bonding *bond = netdev_priv(bond_dev);
struct slave *slave;
@@ -3892,7 +3920,7 @@
* to determine the slave interface -
* (a) BOND_MODE_8023AD
* (b) BOND_MODE_XOR
- * (c) BOND_MODE_TLB && tlb_dynamic_lb == 0
+ * (c) (BOND_MODE_TLB || BOND_MODE_ALB) && tlb_dynamic_lb == 0
*
* The caller is expected to hold RTNL only and NO other lock!
*/
@@ -3945,6 +3973,11 @@
continue;
if (skipslave == slave)
continue;
+
+ netdev_dbg(bond->dev,
+ "Adding slave dev %s to tx hash array[%d]\n",
+ slave->dev->name, new_arr->count);
+
new_arr->arr[new_arr->count++] = slave;
}
@@ -3981,7 +4014,8 @@
* usable slave array is formed in the control path. The xmit function
* just calculates hash and sends the packet out.
*/
-static int bond_3ad_xor_xmit(struct sk_buff *skb, struct net_device *dev)
+static netdev_tx_t bond_3ad_xor_xmit(struct sk_buff *skb,
+ struct net_device *dev)
{
struct bonding *bond = netdev_priv(dev);
struct slave *slave;
@@ -4001,7 +4035,8 @@
}
/* in broadcast mode, we send everything to all usable interfaces. */
-static int bond_xmit_broadcast(struct sk_buff *skb, struct net_device *bond_dev)
+static netdev_tx_t bond_xmit_broadcast(struct sk_buff *skb,
+ struct net_device *bond_dev)
{
struct bonding *bond = netdev_priv(bond_dev);
struct slave *slave = NULL;
@@ -4038,12 +4073,12 @@
struct slave *slave = NULL;
struct list_head *iter;
- if (!skb->queue_mapping)
+ if (!skb_rx_queue_recorded(skb))
return 1;
/* Find out if any slaves have the same mapping as this skb. */
bond_for_each_slave_rcu(bond, slave, iter) {
- if (slave->queue_id == skb->queue_mapping) {
+ if (slave->queue_id == skb_get_queue_mapping(skb)) {
if (bond_slave_is_up(slave) &&
slave->link == BOND_LINK_UP) {
bond_dev_queue_xmit(bond, skb, slave->dev);
@@ -4069,7 +4104,7 @@
u16 txq = skb_rx_queue_recorded(skb) ? skb_get_rx_queue(skb) : 0;
/* Save the original txq to restore before passing to the driver */
- qdisc_skb_cb(skb)->slave_dev_queue_mapping = skb->queue_mapping;
+ qdisc_skb_cb(skb)->slave_dev_queue_mapping = skb_get_queue_mapping(skb);
if (unlikely(txq >= dev->real_num_tx_queues)) {
do {
@@ -4259,7 +4294,7 @@
NETIF_F_HW_VLAN_CTAG_RX |
NETIF_F_HW_VLAN_CTAG_FILTER;
- bond_dev->hw_features |= NETIF_F_GSO_ENCAP_ALL;
+ bond_dev->hw_features |= NETIF_F_GSO_ENCAP_ALL | NETIF_F_GSO_UDP_L4;
bond_dev->features |= bond_dev->hw_features;
}
@@ -4320,9 +4355,9 @@
}
if (xmit_hash_policy) {
- if ((bond_mode != BOND_MODE_XOR) &&
- (bond_mode != BOND_MODE_8023AD) &&
- (bond_mode != BOND_MODE_TLB)) {
+ if (bond_mode == BOND_MODE_ROUNDROBIN ||
+ bond_mode == BOND_MODE_ACTIVEBACKUP ||
+ bond_mode == BOND_MODE_BROADCAST) {
pr_info("xmit_hash_policy param is irrelevant in mode %s\n",
bond_mode_name(bond_mode));
} else {
diff --git a/drivers/net/bonding/bond_options.c b/drivers/net/bonding/bond_options.c
index 58c705f..8a945c9 100644
--- a/drivers/net/bonding/bond_options.c
+++ b/drivers/net/bonding/bond_options.c
@@ -395,7 +395,7 @@
.id = BOND_OPT_TLB_DYNAMIC_LB,
.name = "tlb_dynamic_lb",
.desc = "Enable dynamic flow shuffling",
- .unsuppmodes = BOND_MODE_ALL_EX(BIT(BOND_MODE_TLB)),
+ .unsuppmodes = BOND_MODE_ALL_EX(BIT(BOND_MODE_TLB) | BIT(BOND_MODE_ALB)),
.values = bond_tlb_dynamic_lb_tbl,
.flags = BOND_OPTFLAG_IFDOWN,
.set = bond_option_tlb_dynamic_lb_set,
diff --git a/drivers/net/dsa/b53/b53_common.c b/drivers/net/dsa/b53/b53_common.c
index 7861678..9f561fe 100644
--- a/drivers/net/dsa/b53/b53_common.c
+++ b/drivers/net/dsa/b53/b53_common.c
@@ -806,16 +806,39 @@
return B53_MIBS_SIZE;
}
-void b53_get_strings(struct dsa_switch *ds, int port, uint8_t *data)
+static struct phy_device *b53_get_phy_device(struct dsa_switch *ds, int port)
+{
+ /* These ports typically do not have built-in PHYs */
+ switch (port) {
+ case B53_CPU_PORT_25:
+ case 7:
+ case B53_CPU_PORT:
+ return NULL;
+ }
+
+ return mdiobus_get_phy(ds->slave_mii_bus, port);
+}
+
+void b53_get_strings(struct dsa_switch *ds, int port, u32 stringset,
+ uint8_t *data)
{
struct b53_device *dev = ds->priv;
const struct b53_mib_desc *mibs = b53_get_mib(dev);
unsigned int mib_size = b53_get_mib_size(dev);
+ struct phy_device *phydev;
unsigned int i;
- for (i = 0; i < mib_size; i++)
- strlcpy(data + i * ETH_GSTRING_LEN,
- mibs[i].name, ETH_GSTRING_LEN);
+ if (stringset == ETH_SS_STATS) {
+ for (i = 0; i < mib_size; i++)
+ strlcpy(data + i * ETH_GSTRING_LEN,
+ mibs[i].name, ETH_GSTRING_LEN);
+ } else if (stringset == ETH_SS_PHY_STATS) {
+ phydev = b53_get_phy_device(ds, port);
+ if (!phydev)
+ return;
+
+ phy_ethtool_get_strings(phydev, data);
+ }
}
EXPORT_SYMBOL(b53_get_strings);
@@ -852,11 +875,34 @@
}
EXPORT_SYMBOL(b53_get_ethtool_stats);
-int b53_get_sset_count(struct dsa_switch *ds, int port)
+void b53_get_ethtool_phy_stats(struct dsa_switch *ds, int port, uint64_t *data)
+{
+ struct phy_device *phydev;
+
+ phydev = b53_get_phy_device(ds, port);
+ if (!phydev)
+ return;
+
+ phy_ethtool_get_stats(phydev, NULL, data);
+}
+EXPORT_SYMBOL(b53_get_ethtool_phy_stats);
+
+int b53_get_sset_count(struct dsa_switch *ds, int port, int sset)
{
struct b53_device *dev = ds->priv;
+ struct phy_device *phydev;
- return b53_get_mib_size(dev);
+ if (sset == ETH_SS_STATS) {
+ return b53_get_mib_size(dev);
+ } else if (sset == ETH_SS_PHY_STATS) {
+ phydev = b53_get_phy_device(ds, port);
+ if (!phydev)
+ return 0;
+
+ return phy_ethtool_get_sset_count(phydev);
+ }
+
+ return 0;
}
EXPORT_SYMBOL(b53_get_sset_count);
@@ -1477,7 +1523,7 @@
}
EXPORT_SYMBOL(b53_br_fast_age);
-static bool b53_can_enable_brcm_tags(struct dsa_switch *ds, int port)
+static bool b53_possible_cpu_port(struct dsa_switch *ds, int port)
{
/* Broadcom switches will accept enabling Broadcom tags on the
* following ports: 5, 7 and 8, any other port is not supported
@@ -1489,10 +1535,19 @@
return true;
}
- dev_warn(ds->dev, "Port %d is not Broadcom tag capable\n", port);
return false;
}
+static bool b53_can_enable_brcm_tags(struct dsa_switch *ds, int port)
+{
+ bool ret = b53_possible_cpu_port(ds, port);
+
+ if (!ret)
+ dev_warn(ds->dev, "Port %d is not Broadcom tag capable\n",
+ port);
+ return ret;
+}
+
enum dsa_tag_protocol b53_get_tag_protocol(struct dsa_switch *ds, int port)
{
struct b53_device *dev = ds->priv;
@@ -1650,6 +1705,7 @@
.get_strings = b53_get_strings,
.get_ethtool_stats = b53_get_ethtool_stats,
.get_sset_count = b53_get_sset_count,
+ .get_ethtool_phy_stats = b53_get_ethtool_phy_stats,
.phy_read = b53_phy_read16,
.phy_write = b53_phy_write16,
.adjust_link = b53_adjust_link,
@@ -1954,6 +2010,15 @@
dev->num_ports = dev->cpu_port + 1;
dev->enabled_ports |= BIT(dev->cpu_port);
+ /* Include non standard CPU port built-in PHYs to be probed */
+ if (is539x(dev) || is531x5(dev)) {
+ for (i = 0; i < dev->num_ports; i++) {
+ if (!(dev->ds->phys_mii_mask & BIT(i)) &&
+ !b53_possible_cpu_port(dev->ds, i))
+ dev->ds->phys_mii_mask |= BIT(i);
+ }
+ }
+
dev->ports = devm_kzalloc(dev->dev,
sizeof(struct b53_port) * dev->num_ports,
GFP_KERNEL);
diff --git a/drivers/net/dsa/b53/b53_priv.h b/drivers/net/dsa/b53/b53_priv.h
index 1187ebd..cc284a5 100644
--- a/drivers/net/dsa/b53/b53_priv.h
+++ b/drivers/net/dsa/b53/b53_priv.h
@@ -286,9 +286,11 @@
/* Exported functions towards other drivers */
void b53_imp_vlan_setup(struct dsa_switch *ds, int cpu_port);
int b53_configure_vlan(struct dsa_switch *ds);
-void b53_get_strings(struct dsa_switch *ds, int port, uint8_t *data);
+void b53_get_strings(struct dsa_switch *ds, int port, u32 stringset,
+ uint8_t *data);
void b53_get_ethtool_stats(struct dsa_switch *ds, int port, uint64_t *data);
-int b53_get_sset_count(struct dsa_switch *ds, int port);
+int b53_get_sset_count(struct dsa_switch *ds, int port, int sset);
+void b53_get_ethtool_phy_stats(struct dsa_switch *ds, int port, uint64_t *data);
int b53_br_join(struct dsa_switch *ds, int port, struct net_device *bridge);
void b53_br_leave(struct dsa_switch *ds, int port, struct net_device *bridge);
void b53_br_set_stp_state(struct dsa_switch *ds, int port, u8 state);
diff --git a/drivers/net/dsa/bcm_sf2.c b/drivers/net/dsa/bcm_sf2.c
index 0378ede..02e8982 100644
--- a/drivers/net/dsa/bcm_sf2.c
+++ b/drivers/net/dsa/bcm_sf2.c
@@ -16,6 +16,7 @@
#include <linux/platform_device.h>
#include <linux/phy.h>
#include <linux/phy_fixed.h>
+#include <linux/phylink.h>
#include <linux/mii.h>
#include <linux/of.h>
#include <linux/of_irq.h>
@@ -306,7 +307,8 @@
static irqreturn_t bcm_sf2_switch_0_isr(int irq, void *dev_id)
{
- struct bcm_sf2_priv *priv = dev_id;
+ struct dsa_switch *ds = dev_id;
+ struct bcm_sf2_priv *priv = bcm_sf2_to_priv(ds);
priv->irq0_stat = intrl2_0_readl(priv, INTRL2_CPU_STATUS) &
~priv->irq0_mask;
@@ -317,16 +319,21 @@
static irqreturn_t bcm_sf2_switch_1_isr(int irq, void *dev_id)
{
- struct bcm_sf2_priv *priv = dev_id;
+ struct dsa_switch *ds = dev_id;
+ struct bcm_sf2_priv *priv = bcm_sf2_to_priv(ds);
priv->irq1_stat = intrl2_1_readl(priv, INTRL2_CPU_STATUS) &
~priv->irq1_mask;
intrl2_1_writel(priv, priv->irq1_stat, INTRL2_CPU_CLEAR);
- if (priv->irq1_stat & P_LINK_UP_IRQ(P7_IRQ_OFF))
- priv->port_sts[7].link = 1;
- if (priv->irq1_stat & P_LINK_DOWN_IRQ(P7_IRQ_OFF))
- priv->port_sts[7].link = 0;
+ if (priv->irq1_stat & P_LINK_UP_IRQ(P7_IRQ_OFF)) {
+ priv->port_sts[7].link = true;
+ dsa_port_phylink_mac_change(ds, 7, true);
+ }
+ if (priv->irq1_stat & P_LINK_DOWN_IRQ(P7_IRQ_OFF)) {
+ priv->port_sts[7].link = false;
+ dsa_port_phylink_mac_change(ds, 7, false);
+ }
return IRQ_HANDLED;
}
@@ -443,12 +450,8 @@
priv->slave_mii_bus->parent = ds->dev->parent;
priv->slave_mii_bus->phy_mask = ~priv->indir_phy_mask;
- if (dn)
- err = of_mdiobus_register(priv->slave_mii_bus, dn);
- else
- err = mdiobus_register(priv->slave_mii_bus);
-
- if (err)
+ err = of_mdiobus_register(priv->slave_mii_bus, dn);
+ if (err && dn)
of_node_put(dn);
return err;
@@ -473,13 +476,56 @@
return priv->hw_params.gphy_rev;
}
-static void bcm_sf2_sw_adjust_link(struct dsa_switch *ds, int port,
- struct phy_device *phydev)
+static void bcm_sf2_sw_validate(struct dsa_switch *ds, int port,
+ unsigned long *supported,
+ struct phylink_link_state *state)
+{
+ __ETHTOOL_DECLARE_LINK_MODE_MASK(mask) = { 0, };
+
+ if (!phy_interface_mode_is_rgmii(state->interface) &&
+ state->interface != PHY_INTERFACE_MODE_MII &&
+ state->interface != PHY_INTERFACE_MODE_REVMII &&
+ state->interface != PHY_INTERFACE_MODE_GMII &&
+ state->interface != PHY_INTERFACE_MODE_INTERNAL &&
+ state->interface != PHY_INTERFACE_MODE_MOCA) {
+ bitmap_zero(supported, __ETHTOOL_LINK_MODE_MASK_NBITS);
+ dev_err(ds->dev,
+ "Unsupported interface: %d\n", state->interface);
+ return;
+ }
+
+ /* Allow all the expected bits */
+ phylink_set(mask, Autoneg);
+ phylink_set_port_modes(mask);
+ phylink_set(mask, Pause);
+ phylink_set(mask, Asym_Pause);
+
+ /* With the exclusion of MII and Reverse MII, we support Gigabit,
+ * including Half duplex
+ */
+ if (state->interface != PHY_INTERFACE_MODE_MII &&
+ state->interface != PHY_INTERFACE_MODE_REVMII) {
+ phylink_set(mask, 1000baseT_Full);
+ phylink_set(mask, 1000baseT_Half);
+ }
+
+ phylink_set(mask, 10baseT_Half);
+ phylink_set(mask, 10baseT_Full);
+ phylink_set(mask, 100baseT_Half);
+ phylink_set(mask, 100baseT_Full);
+
+ bitmap_and(supported, supported, mask,
+ __ETHTOOL_LINK_MODE_MASK_NBITS);
+ bitmap_and(state->advertising, state->advertising, mask,
+ __ETHTOOL_LINK_MODE_MASK_NBITS);
+}
+
+static void bcm_sf2_sw_mac_config(struct dsa_switch *ds, int port,
+ unsigned int mode,
+ const struct phylink_link_state *state)
{
struct bcm_sf2_priv *priv = bcm_sf2_to_priv(ds);
- struct ethtool_eee *p = &priv->dev->ports[port].eee;
u32 id_mode_dis = 0, port_mode;
- const char *str = NULL;
u32 reg, offset;
if (priv->type == BCM7445_DEVICE_ID)
@@ -487,62 +533,48 @@
else
offset = CORE_STS_OVERRIDE_GMIIP2_PORT(port);
- switch (phydev->interface) {
+ switch (state->interface) {
case PHY_INTERFACE_MODE_RGMII:
- str = "RGMII (no delay)";
id_mode_dis = 1;
+ /* fallthrough */
case PHY_INTERFACE_MODE_RGMII_TXID:
- if (!str)
- str = "RGMII (TX delay)";
port_mode = EXT_GPHY;
break;
case PHY_INTERFACE_MODE_MII:
- str = "MII";
port_mode = EXT_EPHY;
break;
case PHY_INTERFACE_MODE_REVMII:
- str = "Reverse MII";
port_mode = EXT_REVMII;
break;
default:
- /* All other PHYs: internal and MoCA */
+ /* all other PHYs: internal and MoCA */
goto force_link;
}
- /* If the link is down, just disable the interface to conserve power */
- if (!phydev->link) {
- reg = reg_readl(priv, REG_RGMII_CNTRL_P(port));
- reg &= ~RGMII_MODE_EN;
- reg_writel(priv, reg, REG_RGMII_CNTRL_P(port));
- goto force_link;
- }
-
- /* Clear id_mode_dis bit, and the existing port mode, but
- * make sure we enable the RGMII block for data to pass
+ /* Clear id_mode_dis bit, and the existing port mode, let
+ * RGMII_MODE_EN bet set by mac_link_{up,down}
*/
reg = reg_readl(priv, REG_RGMII_CNTRL_P(port));
reg &= ~ID_MODE_DIS;
reg &= ~(PORT_MODE_MASK << PORT_MODE_SHIFT);
reg &= ~(RX_PAUSE_EN | TX_PAUSE_EN);
- reg |= port_mode | RGMII_MODE_EN;
+ reg |= port_mode;
if (id_mode_dis)
reg |= ID_MODE_DIS;
- if (phydev->pause) {
- if (phydev->asym_pause)
+ if (state->pause & MLO_PAUSE_TXRX_MASK) {
+ if (state->pause & MLO_PAUSE_TX)
reg |= TX_PAUSE_EN;
reg |= RX_PAUSE_EN;
}
reg_writel(priv, reg, REG_RGMII_CNTRL_P(port));
- pr_info("Port %d configured for %s\n", port, str);
-
force_link:
/* Force link settings detected from the PHY */
reg = SW_OVERRIDE;
- switch (phydev->speed) {
+ switch (state->speed) {
case SPEED_1000:
reg |= SPDSTS_1000 << SPEED_SHIFT;
break;
@@ -551,33 +583,61 @@
break;
}
- if (phydev->link)
+ if (state->link)
reg |= LINK_STS;
- if (phydev->duplex == DUPLEX_FULL)
+ if (state->duplex == DUPLEX_FULL)
reg |= DUPLX_MODE;
core_writel(priv, reg, offset);
+}
- if (!phydev->is_pseudo_fixed_link)
+static void bcm_sf2_sw_mac_link_set(struct dsa_switch *ds, int port,
+ phy_interface_t interface, bool link)
+{
+ struct bcm_sf2_priv *priv = bcm_sf2_to_priv(ds);
+ u32 reg;
+
+ if (!phy_interface_mode_is_rgmii(interface) &&
+ interface != PHY_INTERFACE_MODE_MII &&
+ interface != PHY_INTERFACE_MODE_REVMII)
+ return;
+
+ /* If the link is down, just disable the interface to conserve power */
+ reg = reg_readl(priv, REG_RGMII_CNTRL_P(port));
+ if (link)
+ reg |= RGMII_MODE_EN;
+ else
+ reg &= ~RGMII_MODE_EN;
+ reg_writel(priv, reg, REG_RGMII_CNTRL_P(port));
+}
+
+static void bcm_sf2_sw_mac_link_down(struct dsa_switch *ds, int port,
+ unsigned int mode,
+ phy_interface_t interface)
+{
+ bcm_sf2_sw_mac_link_set(ds, port, interface, false);
+}
+
+static void bcm_sf2_sw_mac_link_up(struct dsa_switch *ds, int port,
+ unsigned int mode,
+ phy_interface_t interface,
+ struct phy_device *phydev)
+{
+ struct bcm_sf2_priv *priv = bcm_sf2_to_priv(ds);
+ struct ethtool_eee *p = &priv->dev->ports[port].eee;
+
+ bcm_sf2_sw_mac_link_set(ds, port, interface, true);
+
+ if (mode == MLO_AN_PHY && phydev)
p->eee_enabled = b53_eee_init(ds, port, phydev);
}
-static void bcm_sf2_sw_fixed_link_update(struct dsa_switch *ds, int port,
- struct fixed_phy_status *status)
+static void bcm_sf2_sw_fixed_state(struct dsa_switch *ds, int port,
+ struct phylink_link_state *status)
{
struct bcm_sf2_priv *priv = bcm_sf2_to_priv(ds);
- u32 duplex, pause, offset;
- u32 reg;
- if (priv->type == BCM7445_DEVICE_ID)
- offset = CORE_STS_OVERRIDE_GMIIP_PORT(port);
- else
- offset = CORE_STS_OVERRIDE_GMIIP2_PORT(port);
-
- duplex = core_readl(priv, CORE_DUPSTS);
- pause = core_readl(priv, CORE_PAUSESTS);
-
- status->link = 0;
+ status->link = false;
/* MoCA port is special as we do not get link status from CORE_LNKSTS,
* which means that we need to force the link at the port override
@@ -596,28 +656,10 @@
*/
if (!status->link)
netif_carrier_off(ds->ports[port].slave);
- status->duplex = 1;
+ status->duplex = DUPLEX_FULL;
} else {
- status->link = 1;
- status->duplex = !!(duplex & (1 << port));
+ status->link = true;
}
-
- reg = core_readl(priv, offset);
- reg |= SW_OVERRIDE;
- if (status->link)
- reg |= LINK_STS;
- else
- reg &= ~LINK_STS;
- core_writel(priv, reg, offset);
-
- if ((pause & (1 << port)) &&
- (pause & (1 << (port + PAUSESTS_TX_PAUSE_SHIFT)))) {
- status->asym_pause = 1;
- status->pause = 1;
- }
-
- if (pause & (1 << port))
- status->pause = 1;
}
static void bcm_sf2_enable_acb(struct dsa_switch *ds)
@@ -859,9 +901,13 @@
.get_strings = b53_get_strings,
.get_ethtool_stats = b53_get_ethtool_stats,
.get_sset_count = b53_get_sset_count,
+ .get_ethtool_phy_stats = b53_get_ethtool_phy_stats,
.get_phy_flags = bcm_sf2_sw_get_phy_flags,
- .adjust_link = bcm_sf2_sw_adjust_link,
- .fixed_link_update = bcm_sf2_sw_fixed_link_update,
+ .phylink_validate = bcm_sf2_sw_validate,
+ .phylink_mac_config = bcm_sf2_sw_mac_config,
+ .phylink_mac_link_down = bcm_sf2_sw_mac_link_down,
+ .phylink_mac_link_up = bcm_sf2_sw_mac_link_up,
+ .phylink_fixed_state = bcm_sf2_sw_fixed_state,
.suspend = bcm_sf2_sw_suspend,
.resume = bcm_sf2_sw_resume,
.get_wol = bcm_sf2_sw_get_wol,
@@ -1064,14 +1110,14 @@
bcm_sf2_intr_disable(priv);
ret = devm_request_irq(&pdev->dev, priv->irq0, bcm_sf2_switch_0_isr, 0,
- "switch_0", priv);
+ "switch_0", ds);
if (ret < 0) {
pr_err("failed to request switch_0 IRQ\n");
goto out_mdio;
}
ret = devm_request_irq(&pdev->dev, priv->irq1, bcm_sf2_switch_1_isr, 0,
- "switch_1", priv);
+ "switch_1", ds);
if (ret < 0) {
pr_err("failed to request switch_1 IRQ\n");
goto out_mdio;
diff --git a/drivers/net/dsa/dsa_loop.c b/drivers/net/dsa/dsa_loop.c
index f77be9f..816f34d 100644
--- a/drivers/net/dsa/dsa_loop.c
+++ b/drivers/net/dsa/dsa_loop.c
@@ -67,7 +67,7 @@
static enum dsa_tag_protocol dsa_loop_get_protocol(struct dsa_switch *ds,
int port)
{
- dev_dbg(ds->dev, "%s\n", __func__);
+ dev_dbg(ds->dev, "%s: port: %d\n", __func__, port);
return DSA_TAG_PROTO_NONE;
}
@@ -86,16 +86,23 @@
return 0;
}
-static int dsa_loop_get_sset_count(struct dsa_switch *ds, int port)
+static int dsa_loop_get_sset_count(struct dsa_switch *ds, int port, int sset)
{
+ if (sset != ETH_SS_STATS && sset != ETH_SS_PHY_STATS)
+ return 0;
+
return __DSA_LOOP_CNT_MAX;
}
-static void dsa_loop_get_strings(struct dsa_switch *ds, int port, uint8_t *data)
+static void dsa_loop_get_strings(struct dsa_switch *ds, int port,
+ u32 stringset, uint8_t *data)
{
struct dsa_loop_priv *ps = ds->priv;
unsigned int i;
+ if (stringset != ETH_SS_STATS && stringset != ETH_SS_PHY_STATS)
+ return;
+
for (i = 0; i < __DSA_LOOP_CNT_MAX; i++)
memcpy(data + i * ETH_GSTRING_LEN,
ps->ports[port].mib[i].name, ETH_GSTRING_LEN);
@@ -117,8 +124,6 @@
struct mii_bus *bus = ps->bus;
int ret;
- dev_dbg(ds->dev, "%s\n", __func__);
-
ret = mdiobus_read_nested(bus, ps->port_base + port, regnum);
if (ret < 0)
ps->ports[port].mib[DSA_LOOP_PHY_READ_ERR].val++;
@@ -135,8 +140,6 @@
struct mii_bus *bus = ps->bus;
int ret;
- dev_dbg(ds->dev, "%s\n", __func__);
-
ret = mdiobus_write_nested(bus, ps->port_base + port, regnum, value);
if (ret < 0)
ps->ports[port].mib[DSA_LOOP_PHY_WRITE_ERR].val++;
@@ -149,7 +152,8 @@
static int dsa_loop_port_bridge_join(struct dsa_switch *ds, int port,
struct net_device *bridge)
{
- dev_dbg(ds->dev, "%s\n", __func__);
+ dev_dbg(ds->dev, "%s: port: %d, bridge: %s\n",
+ __func__, port, bridge->name);
return 0;
}
@@ -157,19 +161,22 @@
static void dsa_loop_port_bridge_leave(struct dsa_switch *ds, int port,
struct net_device *bridge)
{
- dev_dbg(ds->dev, "%s\n", __func__);
+ dev_dbg(ds->dev, "%s: port: %d, bridge: %s\n",
+ __func__, port, bridge->name);
}
static void dsa_loop_port_stp_state_set(struct dsa_switch *ds, int port,
u8 state)
{
- dev_dbg(ds->dev, "%s\n", __func__);
+ dev_dbg(ds->dev, "%s: port: %d, state: %d\n",
+ __func__, port, state);
}
static int dsa_loop_port_vlan_filtering(struct dsa_switch *ds, int port,
bool vlan_filtering)
{
- dev_dbg(ds->dev, "%s\n", __func__);
+ dev_dbg(ds->dev, "%s: port: %d, vlan_filtering: %d\n",
+ __func__, port, vlan_filtering);
return 0;
}
@@ -181,7 +188,8 @@
struct dsa_loop_priv *ps = ds->priv;
struct mii_bus *bus = ps->bus;
- dev_dbg(ds->dev, "%s\n", __func__);
+ dev_dbg(ds->dev, "%s: port: %d, vlan: %d-%d",
+ __func__, port, vlan->vid_begin, vlan->vid_end);
/* Just do a sleeping operation to make lockdep checks effective */
mdiobus_read(bus, ps->port_base + port, MII_BMSR);
@@ -202,8 +210,6 @@
struct dsa_loop_vlan *vl;
u16 vid;
- dev_dbg(ds->dev, "%s\n", __func__);
-
/* Just do a sleeping operation to make lockdep checks effective */
mdiobus_read(bus, ps->port_base + port, MII_BMSR);
@@ -215,6 +221,9 @@
vl->untagged |= BIT(port);
else
vl->untagged &= ~BIT(port);
+
+ dev_dbg(ds->dev, "%s: port: %d vlan: %d, %stagged, pvid: %d\n",
+ __func__, port, vid, untagged ? "un" : "", pvid);
}
if (pvid)
@@ -230,8 +239,6 @@
struct dsa_loop_vlan *vl;
u16 vid, pvid = ps->pvid;
- dev_dbg(ds->dev, "%s\n", __func__);
-
/* Just do a sleeping operation to make lockdep checks effective */
mdiobus_read(bus, ps->port_base + port, MII_BMSR);
@@ -244,6 +251,9 @@
if (pvid == vid)
pvid = 1;
+
+ dev_dbg(ds->dev, "%s: port: %d vlan: %d, %stagged, pvid: %d\n",
+ __func__, port, vid, untagged ? "un" : "", pvid);
}
ps->pvid = pvid;
@@ -256,6 +266,7 @@
.get_strings = dsa_loop_get_strings,
.get_ethtool_stats = dsa_loop_get_ethtool_stats,
.get_sset_count = dsa_loop_get_sset_count,
+ .get_ethtool_phy_stats = dsa_loop_get_ethtool_stats,
.phy_read = dsa_loop_phy_read,
.phy_write = dsa_loop_phy_write,
.port_bridge_join = dsa_loop_port_bridge_join,
diff --git a/drivers/net/dsa/lan9303-core.c b/drivers/net/dsa/lan9303-core.c
index fefa454..b4f6e1a 100644
--- a/drivers/net/dsa/lan9303-core.c
+++ b/drivers/net/dsa/lan9303-core.c
@@ -977,10 +977,14 @@
{ .offset = LAN9303_MAC_TX_LATECOL_0, .name = "TxLateCol", },
};
-static void lan9303_get_strings(struct dsa_switch *ds, int port, uint8_t *data)
+static void lan9303_get_strings(struct dsa_switch *ds, int port,
+ u32 stringset, uint8_t *data)
{
unsigned int u;
+ if (stringset != ETH_SS_STATS)
+ return;
+
for (u = 0; u < ARRAY_SIZE(lan9303_mib); u++) {
strncpy(data + u * ETH_GSTRING_LEN, lan9303_mib[u].name,
ETH_GSTRING_LEN);
@@ -1007,8 +1011,11 @@
}
}
-static int lan9303_get_sset_count(struct dsa_switch *ds, int port)
+static int lan9303_get_sset_count(struct dsa_switch *ds, int port, int sset)
{
+ if (sset != ETH_SS_STATS)
+ return 0;
+
return ARRAY_SIZE(lan9303_mib);
}
diff --git a/drivers/net/dsa/microchip/ksz_common.c b/drivers/net/dsa/microchip/ksz_common.c
index bcb3e6c..7210c49 100644
--- a/drivers/net/dsa/microchip/ksz_common.c
+++ b/drivers/net/dsa/microchip/ksz_common.c
@@ -439,15 +439,22 @@
ksz_port_cfg(dev, port, REG_PORT_CTRL_0, PORT_MAC_LOOPBACK, true);
}
-static int ksz_sset_count(struct dsa_switch *ds, int port)
+static int ksz_sset_count(struct dsa_switch *ds, int port, int sset)
{
+ if (sset != ETH_SS_STATS)
+ return 0;
+
return TOTAL_SWITCH_COUNTER_NUM;
}
-static void ksz_get_strings(struct dsa_switch *ds, int port, uint8_t *buf)
+static void ksz_get_strings(struct dsa_switch *ds, int port,
+ u32 stringset, uint8_t *buf)
{
int i;
+ if (stringset != ETH_SS_STATS)
+ return;
+
for (i = 0; i < TOTAL_SWITCH_COUNTER_NUM; i++) {
memcpy(buf + i * ETH_GSTRING_LEN, mib_names[i].string,
ETH_GSTRING_LEN);
diff --git a/drivers/net/dsa/mt7530.c b/drivers/net/dsa/mt7530.c
index 80a4dbc..62e4866 100644
--- a/drivers/net/dsa/mt7530.c
+++ b/drivers/net/dsa/mt7530.c
@@ -573,10 +573,14 @@
}
static void
-mt7530_get_strings(struct dsa_switch *ds, int port, uint8_t *data)
+mt7530_get_strings(struct dsa_switch *ds, int port, u32 stringset,
+ uint8_t *data)
{
int i;
+ if (stringset != ETH_SS_STATS)
+ return;
+
for (i = 0; i < ARRAY_SIZE(mt7530_mib); i++)
strncpy(data + i * ETH_GSTRING_LEN, mt7530_mib[i].name,
ETH_GSTRING_LEN);
@@ -604,8 +608,11 @@
}
static int
-mt7530_get_sset_count(struct dsa_switch *ds, int port)
+mt7530_get_sset_count(struct dsa_switch *ds, int port, int sset)
{
+ if (sset != ETH_SS_STATS)
+ return 0;
+
return ARRAY_SIZE(mt7530_mib);
}
diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c
index 5b4374f..12df00f 100644
--- a/drivers/net/dsa/mv88e6xxx/chip.c
+++ b/drivers/net/dsa/mv88e6xxx/chip.c
@@ -28,9 +28,11 @@
#include <linux/of_device.h>
#include <linux/of_irq.h>
#include <linux/of_mdio.h>
+#include <linux/platform_data/mv88e6xxx.h>
#include <linux/netdevice.h>
#include <linux/gpio/consumer.h>
#include <linux/phy.h>
+#include <linux/phylink.h>
#include <net/dsa.h>
#include "chip.h"
@@ -580,6 +582,83 @@
dev_err(ds->dev, "p%d: failed to configure MAC\n", port);
}
+static void mv88e6xxx_validate(struct dsa_switch *ds, int port,
+ unsigned long *supported,
+ struct phylink_link_state *state)
+{
+}
+
+static int mv88e6xxx_link_state(struct dsa_switch *ds, int port,
+ struct phylink_link_state *state)
+{
+ struct mv88e6xxx_chip *chip = ds->priv;
+ int err;
+
+ mutex_lock(&chip->reg_lock);
+ err = mv88e6xxx_port_link_state(chip, port, state);
+ mutex_unlock(&chip->reg_lock);
+
+ return err;
+}
+
+static void mv88e6xxx_mac_config(struct dsa_switch *ds, int port,
+ unsigned int mode,
+ const struct phylink_link_state *state)
+{
+ struct mv88e6xxx_chip *chip = ds->priv;
+ int speed, duplex, link, err;
+
+ if (mode == MLO_AN_PHY)
+ return;
+
+ if (mode == MLO_AN_FIXED) {
+ link = LINK_FORCED_UP;
+ speed = state->speed;
+ duplex = state->duplex;
+ } else {
+ speed = SPEED_UNFORCED;
+ duplex = DUPLEX_UNFORCED;
+ link = LINK_UNFORCED;
+ }
+
+ mutex_lock(&chip->reg_lock);
+ err = mv88e6xxx_port_setup_mac(chip, port, link, speed, duplex,
+ state->interface);
+ mutex_unlock(&chip->reg_lock);
+
+ if (err && err != -EOPNOTSUPP)
+ dev_err(ds->dev, "p%d: failed to configure MAC\n", port);
+}
+
+static void mv88e6xxx_mac_link_force(struct dsa_switch *ds, int port, int link)
+{
+ struct mv88e6xxx_chip *chip = ds->priv;
+ int err;
+
+ mutex_lock(&chip->reg_lock);
+ err = chip->info->ops->port_set_link(chip, port, link);
+ mutex_unlock(&chip->reg_lock);
+
+ if (err)
+ dev_err(chip->dev, "p%d: failed to force MAC link\n", port);
+}
+
+static void mv88e6xxx_mac_link_down(struct dsa_switch *ds, int port,
+ unsigned int mode,
+ phy_interface_t interface)
+{
+ if (mode == MLO_AN_FIXED)
+ mv88e6xxx_mac_link_force(ds, port, LINK_FORCED_DOWN);
+}
+
+static void mv88e6xxx_mac_link_up(struct dsa_switch *ds, int port,
+ unsigned int mode, phy_interface_t interface,
+ struct phy_device *phydev)
+{
+ if (mode == MLO_AN_FIXED)
+ mv88e6xxx_mac_link_force(ds, port, LINK_FORCED_UP);
+}
+
static int mv88e6xxx_stats_snapshot(struct mv88e6xxx_chip *chip, int port)
{
if (!chip->info->ops->stats_snapshot)
@@ -665,13 +744,13 @@
case STATS_TYPE_PORT:
err = mv88e6xxx_port_read(chip, port, s->reg, ®);
if (err)
- return UINT64_MAX;
+ return U64_MAX;
low = reg;
if (s->size == 4) {
err = mv88e6xxx_port_read(chip, port, s->reg + 1, ®);
if (err)
- return UINT64_MAX;
+ return U64_MAX;
high = reg;
}
break;
@@ -685,7 +764,7 @@
mv88e6xxx_g1_stats_read(chip, reg + 1, &high);
break;
default:
- return UINT64_MAX;
+ return U64_MAX;
}
value = (((u64)high) << 16) | low;
return value;
@@ -742,11 +821,14 @@
}
static void mv88e6xxx_get_strings(struct dsa_switch *ds, int port,
- uint8_t *data)
+ u32 stringset, uint8_t *data)
{
struct mv88e6xxx_chip *chip = ds->priv;
int count = 0;
+ if (stringset != ETH_SS_STATS)
+ return;
+
mutex_lock(&chip->reg_lock);
if (chip->info->ops->stats_get_strings)
@@ -789,12 +871,15 @@
STATS_TYPE_BANK1);
}
-static int mv88e6xxx_get_sset_count(struct dsa_switch *ds, int port)
+static int mv88e6xxx_get_sset_count(struct dsa_switch *ds, int port, int sset)
{
struct mv88e6xxx_chip *chip = ds->priv;
int serdes_count = 0;
int count = 0;
+ if (sset != ETH_SS_STATS)
+ return 0;
+
mutex_lock(&chip->reg_lock);
if (chip->info->ops->stats_get_sset_count)
count = chip->info->ops->stats_get_sset_count(chip);
@@ -911,14 +996,6 @@
}
-static int mv88e6xxx_stats_set_histogram(struct mv88e6xxx_chip *chip)
-{
- if (chip->info->ops->stats_set_histogram)
- return chip->info->ops->stats_set_histogram(chip);
-
- return 0;
-}
-
static int mv88e6xxx_get_regs_len(struct dsa_switch *ds, int port)
{
return 32 * sizeof(u16);
@@ -1020,6 +1097,76 @@
dev_err(ds->dev, "p%d: failed to update state\n", port);
}
+static int mv88e6xxx_pri_setup(struct mv88e6xxx_chip *chip)
+{
+ int err;
+
+ if (chip->info->ops->ieee_pri_map) {
+ err = chip->info->ops->ieee_pri_map(chip);
+ if (err)
+ return err;
+ }
+
+ if (chip->info->ops->ip_pri_map) {
+ err = chip->info->ops->ip_pri_map(chip);
+ if (err)
+ return err;
+ }
+
+ return 0;
+}
+
+static int mv88e6xxx_devmap_setup(struct mv88e6xxx_chip *chip)
+{
+ int target, port;
+ int err;
+
+ if (!chip->info->global2_addr)
+ return 0;
+
+ /* Initialize the routing port to the 32 possible target devices */
+ for (target = 0; target < 32; target++) {
+ port = 0x1f;
+ if (target < DSA_MAX_SWITCHES)
+ if (chip->ds->rtable[target] != DSA_RTABLE_NONE)
+ port = chip->ds->rtable[target];
+
+ err = mv88e6xxx_g2_device_mapping_write(chip, target, port);
+ if (err)
+ return err;
+ }
+
+ if (chip->info->ops->set_cascade_port) {
+ port = MV88E6XXX_CASCADE_PORT_MULTIPLE;
+ err = chip->info->ops->set_cascade_port(chip, port);
+ if (err)
+ return err;
+ }
+
+ err = mv88e6xxx_g1_set_device_number(chip, chip->ds->index);
+ if (err)
+ return err;
+
+ return 0;
+}
+
+static int mv88e6xxx_trunk_setup(struct mv88e6xxx_chip *chip)
+{
+ /* Clear all trunk masks and mapping */
+ if (chip->info->global2_addr)
+ return mv88e6xxx_g2_trunk_clear(chip);
+
+ return 0;
+}
+
+static int mv88e6xxx_rmu_setup(struct mv88e6xxx_chip *chip)
+{
+ if (chip->info->ops->rmu_disable)
+ return chip->info->ops->rmu_disable(chip);
+
+ return 0;
+}
+
static int mv88e6xxx_pot_setup(struct mv88e6xxx_chip *chip)
{
if (chip->info->ops->pot_clear)
@@ -2113,53 +2260,16 @@
return err;
}
-static int mv88e6xxx_g1_setup(struct mv88e6xxx_chip *chip)
+static int mv88e6xxx_stats_setup(struct mv88e6xxx_chip *chip)
{
- struct dsa_switch *ds = chip->ds;
int err;
- /* Disable remote management, and set the switch's DSA device number. */
- err = mv88e6xxx_g1_write(chip, MV88E6XXX_G1_CTL2,
- MV88E6XXX_G1_CTL2_MULTIPLE_CASCADE |
- (ds->index & 0x1f));
- if (err)
- return err;
-
- /* Configure the IP ToS mapping registers. */
- err = mv88e6xxx_g1_write(chip, MV88E6XXX_G1_IP_PRI_0, 0x0000);
- if (err)
- return err;
- err = mv88e6xxx_g1_write(chip, MV88E6XXX_G1_IP_PRI_1, 0x0000);
- if (err)
- return err;
- err = mv88e6xxx_g1_write(chip, MV88E6XXX_G1_IP_PRI_2, 0x5555);
- if (err)
- return err;
- err = mv88e6xxx_g1_write(chip, MV88E6XXX_G1_IP_PRI_3, 0x5555);
- if (err)
- return err;
- err = mv88e6xxx_g1_write(chip, MV88E6XXX_G1_IP_PRI_4, 0xaaaa);
- if (err)
- return err;
- err = mv88e6xxx_g1_write(chip, MV88E6XXX_G1_IP_PRI_5, 0xaaaa);
- if (err)
- return err;
- err = mv88e6xxx_g1_write(chip, MV88E6XXX_G1_IP_PRI_6, 0xffff);
- if (err)
- return err;
- err = mv88e6xxx_g1_write(chip, MV88E6XXX_G1_IP_PRI_7, 0xffff);
- if (err)
- return err;
-
- /* Configure the IEEE 802.1p priority mapping register. */
- err = mv88e6xxx_g1_write(chip, MV88E6XXX_G1_IEEE_PRI, 0xfa41);
- if (err)
- return err;
-
/* Initialize the statistics unit */
- err = mv88e6xxx_stats_set_histogram(chip);
- if (err)
- return err;
+ if (chip->info->ops->stats_set_histogram) {
+ err = chip->info->ops->stats_set_histogram(chip);
+ if (err)
+ return err;
+ }
return mv88e6xxx_g1_stats_clear(chip);
}
@@ -2185,18 +2295,6 @@
goto unlock;
}
- /* Setup Switch Global 1 Registers */
- err = mv88e6xxx_g1_setup(chip);
- if (err)
- goto unlock;
-
- /* Setup Switch Global 2 Registers */
- if (chip->info->global2_addr) {
- err = mv88e6xxx_g2_setup(chip);
- if (err)
- goto unlock;
- }
-
err = mv88e6xxx_irl_setup(chip);
if (err)
goto unlock;
@@ -2229,10 +2327,26 @@
if (err)
goto unlock;
+ err = mv88e6xxx_rmu_setup(chip);
+ if (err)
+ goto unlock;
+
err = mv88e6xxx_rsvd2cpu_setup(chip);
if (err)
goto unlock;
+ err = mv88e6xxx_trunk_setup(chip);
+ if (err)
+ goto unlock;
+
+ err = mv88e6xxx_devmap_setup(chip);
+ if (err)
+ goto unlock;
+
+ err = mv88e6xxx_pri_setup(chip);
+ if (err)
+ goto unlock;
+
/* Setup PTP Hardware Clock and timestamping */
if (chip->info->ptp_support) {
err = mv88e6xxx_ptp_setup(chip);
@@ -2244,6 +2358,10 @@
goto unlock;
}
+ err = mv88e6xxx_stats_setup(chip);
+ if (err)
+ goto unlock;
+
unlock:
mutex_unlock(&chip->reg_lock);
@@ -2337,10 +2455,7 @@
return err;
}
- if (np)
- err = of_mdiobus_register(bus, np);
- else
- err = mdiobus_register(bus);
+ err = of_mdiobus_register(bus, np);
if (err) {
dev_err(chip->dev, "Cannot register MDIO bus (%d)\n", err);
mv88e6xxx_g2_irq_mdio_free(chip, bus);
@@ -2460,6 +2575,8 @@
static const struct mv88e6xxx_ops mv88e6085_ops = {
/* MV88E6XXX_FAMILY_6097 */
+ .ieee_pri_map = mv88e6085_g1_ieee_pri_map,
+ .ip_pri_map = mv88e6085_g1_ip_pri_map,
.irl_init_all = mv88e6352_g2_irl_init_all,
.set_switch_mac = mv88e6xxx_g1_set_switch_mac,
.phy_read = mv88e6185_phy_ppu_read,
@@ -2488,12 +2605,16 @@
.ppu_enable = mv88e6185_g1_ppu_enable,
.ppu_disable = mv88e6185_g1_ppu_disable,
.reset = mv88e6185_g1_reset,
+ .rmu_disable = mv88e6085_g1_rmu_disable,
.vtu_getnext = mv88e6352_g1_vtu_getnext,
.vtu_loadpurge = mv88e6352_g1_vtu_loadpurge,
+ .serdes_power = mv88e6341_serdes_power,
};
static const struct mv88e6xxx_ops mv88e6095_ops = {
/* MV88E6XXX_FAMILY_6095 */
+ .ieee_pri_map = mv88e6085_g1_ieee_pri_map,
+ .ip_pri_map = mv88e6085_g1_ip_pri_map,
.set_switch_mac = mv88e6xxx_g1_set_switch_mac,
.phy_read = mv88e6185_phy_ppu_read,
.phy_write = mv88e6185_phy_ppu_write,
@@ -2518,6 +2639,8 @@
static const struct mv88e6xxx_ops mv88e6097_ops = {
/* MV88E6XXX_FAMILY_6097 */
+ .ieee_pri_map = mv88e6085_g1_ieee_pri_map,
+ .ip_pri_map = mv88e6085_g1_ip_pri_map,
.irl_init_all = mv88e6352_g2_irl_init_all,
.set_switch_mac = mv88e6xxx_g2_set_switch_mac,
.phy_read = mv88e6xxx_g2_smi_phy_read,
@@ -2545,12 +2668,15 @@
.mgmt_rsvd2cpu = mv88e6352_g2_mgmt_rsvd2cpu,
.pot_clear = mv88e6xxx_g2_pot_clear,
.reset = mv88e6352_g1_reset,
+ .rmu_disable = mv88e6085_g1_rmu_disable,
.vtu_getnext = mv88e6352_g1_vtu_getnext,
.vtu_loadpurge = mv88e6352_g1_vtu_loadpurge,
};
static const struct mv88e6xxx_ops mv88e6123_ops = {
/* MV88E6XXX_FAMILY_6165 */
+ .ieee_pri_map = mv88e6085_g1_ieee_pri_map,
+ .ip_pri_map = mv88e6085_g1_ip_pri_map,
.irl_init_all = mv88e6352_g2_irl_init_all,
.set_switch_mac = mv88e6xxx_g2_set_switch_mac,
.phy_read = mv88e6xxx_g2_smi_phy_read,
@@ -2579,6 +2705,8 @@
static const struct mv88e6xxx_ops mv88e6131_ops = {
/* MV88E6XXX_FAMILY_6185 */
+ .ieee_pri_map = mv88e6085_g1_ieee_pri_map,
+ .ip_pri_map = mv88e6085_g1_ip_pri_map,
.set_switch_mac = mv88e6xxx_g1_set_switch_mac,
.phy_read = mv88e6185_phy_ppu_read,
.phy_write = mv88e6185_phy_ppu_write,
@@ -2603,6 +2731,7 @@
.watchdog_ops = &mv88e6097_watchdog_ops,
.mgmt_rsvd2cpu = mv88e6185_g2_mgmt_rsvd2cpu,
.ppu_enable = mv88e6185_g1_ppu_enable,
+ .set_cascade_port = mv88e6185_g1_set_cascade_port,
.ppu_disable = mv88e6185_g1_ppu_disable,
.reset = mv88e6185_g1_reset,
.vtu_getnext = mv88e6185_g1_vtu_getnext,
@@ -2611,6 +2740,8 @@
static const struct mv88e6xxx_ops mv88e6141_ops = {
/* MV88E6XXX_FAMILY_6341 */
+ .ieee_pri_map = mv88e6085_g1_ieee_pri_map,
+ .ip_pri_map = mv88e6085_g1_ip_pri_map,
.irl_init_all = mv88e6352_g2_irl_init_all,
.get_eeprom = mv88e6xxx_g2_get_eeprom8,
.set_eeprom = mv88e6xxx_g2_set_eeprom8,
@@ -2648,6 +2779,8 @@
static const struct mv88e6xxx_ops mv88e6161_ops = {
/* MV88E6XXX_FAMILY_6165 */
+ .ieee_pri_map = mv88e6085_g1_ieee_pri_map,
+ .ip_pri_map = mv88e6085_g1_ip_pri_map,
.irl_init_all = mv88e6352_g2_irl_init_all,
.set_switch_mac = mv88e6xxx_g2_set_switch_mac,
.phy_read = mv88e6xxx_g2_smi_phy_read,
@@ -2681,6 +2814,8 @@
static const struct mv88e6xxx_ops mv88e6165_ops = {
/* MV88E6XXX_FAMILY_6165 */
+ .ieee_pri_map = mv88e6085_g1_ieee_pri_map,
+ .ip_pri_map = mv88e6085_g1_ip_pri_map,
.irl_init_all = mv88e6352_g2_irl_init_all,
.set_switch_mac = mv88e6xxx_g2_set_switch_mac,
.phy_read = mv88e6165_phy_read,
@@ -2707,6 +2842,8 @@
static const struct mv88e6xxx_ops mv88e6171_ops = {
/* MV88E6XXX_FAMILY_6351 */
+ .ieee_pri_map = mv88e6085_g1_ieee_pri_map,
+ .ip_pri_map = mv88e6085_g1_ip_pri_map,
.irl_init_all = mv88e6352_g2_irl_init_all,
.set_switch_mac = mv88e6xxx_g2_set_switch_mac,
.phy_read = mv88e6xxx_g2_smi_phy_read,
@@ -2741,6 +2878,8 @@
static const struct mv88e6xxx_ops mv88e6172_ops = {
/* MV88E6XXX_FAMILY_6352 */
+ .ieee_pri_map = mv88e6085_g1_ieee_pri_map,
+ .ip_pri_map = mv88e6085_g1_ip_pri_map,
.irl_init_all = mv88e6352_g2_irl_init_all,
.get_eeprom = mv88e6xxx_g2_get_eeprom16,
.set_eeprom = mv88e6xxx_g2_set_eeprom16,
@@ -2771,6 +2910,7 @@
.mgmt_rsvd2cpu = mv88e6352_g2_mgmt_rsvd2cpu,
.pot_clear = mv88e6xxx_g2_pot_clear,
.reset = mv88e6352_g1_reset,
+ .rmu_disable = mv88e6352_g1_rmu_disable,
.vtu_getnext = mv88e6352_g1_vtu_getnext,
.vtu_loadpurge = mv88e6352_g1_vtu_loadpurge,
.serdes_power = mv88e6352_serdes_power,
@@ -2779,6 +2919,8 @@
static const struct mv88e6xxx_ops mv88e6175_ops = {
/* MV88E6XXX_FAMILY_6351 */
+ .ieee_pri_map = mv88e6085_g1_ieee_pri_map,
+ .ip_pri_map = mv88e6085_g1_ip_pri_map,
.irl_init_all = mv88e6352_g2_irl_init_all,
.set_switch_mac = mv88e6xxx_g2_set_switch_mac,
.phy_read = mv88e6xxx_g2_smi_phy_read,
@@ -2809,10 +2951,13 @@
.reset = mv88e6352_g1_reset,
.vtu_getnext = mv88e6352_g1_vtu_getnext,
.vtu_loadpurge = mv88e6352_g1_vtu_loadpurge,
+ .serdes_power = mv88e6341_serdes_power,
};
static const struct mv88e6xxx_ops mv88e6176_ops = {
/* MV88E6XXX_FAMILY_6352 */
+ .ieee_pri_map = mv88e6085_g1_ieee_pri_map,
+ .ip_pri_map = mv88e6085_g1_ip_pri_map,
.irl_init_all = mv88e6352_g2_irl_init_all,
.get_eeprom = mv88e6xxx_g2_get_eeprom16,
.set_eeprom = mv88e6xxx_g2_set_eeprom16,
@@ -2843,6 +2988,7 @@
.mgmt_rsvd2cpu = mv88e6352_g2_mgmt_rsvd2cpu,
.pot_clear = mv88e6xxx_g2_pot_clear,
.reset = mv88e6352_g1_reset,
+ .rmu_disable = mv88e6352_g1_rmu_disable,
.vtu_getnext = mv88e6352_g1_vtu_getnext,
.vtu_loadpurge = mv88e6352_g1_vtu_loadpurge,
.serdes_power = mv88e6352_serdes_power,
@@ -2851,6 +2997,8 @@
static const struct mv88e6xxx_ops mv88e6185_ops = {
/* MV88E6XXX_FAMILY_6185 */
+ .ieee_pri_map = mv88e6085_g1_ieee_pri_map,
+ .ip_pri_map = mv88e6085_g1_ip_pri_map,
.set_switch_mac = mv88e6xxx_g1_set_switch_mac,
.phy_read = mv88e6185_phy_ppu_read,
.phy_write = mv88e6185_phy_ppu_write,
@@ -2870,6 +3018,7 @@
.set_egress_port = mv88e6095_g1_set_egress_port,
.watchdog_ops = &mv88e6097_watchdog_ops,
.mgmt_rsvd2cpu = mv88e6185_g2_mgmt_rsvd2cpu,
+ .set_cascade_port = mv88e6185_g1_set_cascade_port,
.ppu_enable = mv88e6185_g1_ppu_enable,
.ppu_disable = mv88e6185_g1_ppu_disable,
.reset = mv88e6185_g1_reset,
@@ -2907,6 +3056,7 @@
.mgmt_rsvd2cpu = mv88e6390_g1_mgmt_rsvd2cpu,
.pot_clear = mv88e6xxx_g2_pot_clear,
.reset = mv88e6352_g1_reset,
+ .rmu_disable = mv88e6390_g1_rmu_disable,
.vtu_getnext = mv88e6390_g1_vtu_getnext,
.vtu_loadpurge = mv88e6390_g1_vtu_loadpurge,
.serdes_power = mv88e6390_serdes_power,
@@ -2943,6 +3093,7 @@
.mgmt_rsvd2cpu = mv88e6390_g1_mgmt_rsvd2cpu,
.pot_clear = mv88e6xxx_g2_pot_clear,
.reset = mv88e6352_g1_reset,
+ .rmu_disable = mv88e6390_g1_rmu_disable,
.vtu_getnext = mv88e6390_g1_vtu_getnext,
.vtu_loadpurge = mv88e6390_g1_vtu_loadpurge,
.serdes_power = mv88e6390_serdes_power,
@@ -2979,6 +3130,7 @@
.mgmt_rsvd2cpu = mv88e6390_g1_mgmt_rsvd2cpu,
.pot_clear = mv88e6xxx_g2_pot_clear,
.reset = mv88e6352_g1_reset,
+ .rmu_disable = mv88e6390_g1_rmu_disable,
.vtu_getnext = mv88e6390_g1_vtu_getnext,
.vtu_loadpurge = mv88e6390_g1_vtu_loadpurge,
.serdes_power = mv88e6390_serdes_power,
@@ -2986,6 +3138,8 @@
static const struct mv88e6xxx_ops mv88e6240_ops = {
/* MV88E6XXX_FAMILY_6352 */
+ .ieee_pri_map = mv88e6085_g1_ieee_pri_map,
+ .ip_pri_map = mv88e6085_g1_ip_pri_map,
.irl_init_all = mv88e6352_g2_irl_init_all,
.get_eeprom = mv88e6xxx_g2_get_eeprom16,
.set_eeprom = mv88e6xxx_g2_set_eeprom16,
@@ -3016,6 +3170,7 @@
.mgmt_rsvd2cpu = mv88e6352_g2_mgmt_rsvd2cpu,
.pot_clear = mv88e6xxx_g2_pot_clear,
.reset = mv88e6352_g1_reset,
+ .rmu_disable = mv88e6352_g1_rmu_disable,
.vtu_getnext = mv88e6352_g1_vtu_getnext,
.vtu_loadpurge = mv88e6352_g1_vtu_loadpurge,
.serdes_power = mv88e6352_serdes_power,
@@ -3054,6 +3209,7 @@
.mgmt_rsvd2cpu = mv88e6390_g1_mgmt_rsvd2cpu,
.pot_clear = mv88e6xxx_g2_pot_clear,
.reset = mv88e6352_g1_reset,
+ .rmu_disable = mv88e6390_g1_rmu_disable,
.vtu_getnext = mv88e6390_g1_vtu_getnext,
.vtu_loadpurge = mv88e6390_g1_vtu_loadpurge,
.serdes_power = mv88e6390_serdes_power,
@@ -3063,6 +3219,8 @@
static const struct mv88e6xxx_ops mv88e6320_ops = {
/* MV88E6XXX_FAMILY_6320 */
+ .ieee_pri_map = mv88e6085_g1_ieee_pri_map,
+ .ip_pri_map = mv88e6085_g1_ip_pri_map,
.irl_init_all = mv88e6352_g2_irl_init_all,
.get_eeprom = mv88e6xxx_g2_get_eeprom16,
.set_eeprom = mv88e6xxx_g2_set_eeprom16,
@@ -3099,6 +3257,8 @@
static const struct mv88e6xxx_ops mv88e6321_ops = {
/* MV88E6XXX_FAMILY_6320 */
+ .ieee_pri_map = mv88e6085_g1_ieee_pri_map,
+ .ip_pri_map = mv88e6085_g1_ip_pri_map,
.irl_init_all = mv88e6352_g2_irl_init_all,
.get_eeprom = mv88e6xxx_g2_get_eeprom16,
.set_eeprom = mv88e6xxx_g2_set_eeprom16,
@@ -3133,6 +3293,8 @@
static const struct mv88e6xxx_ops mv88e6341_ops = {
/* MV88E6XXX_FAMILY_6341 */
+ .ieee_pri_map = mv88e6085_g1_ieee_pri_map,
+ .ip_pri_map = mv88e6085_g1_ip_pri_map,
.irl_init_all = mv88e6352_g2_irl_init_all,
.get_eeprom = mv88e6xxx_g2_get_eeprom8,
.set_eeprom = mv88e6xxx_g2_set_eeprom8,
@@ -3171,6 +3333,8 @@
static const struct mv88e6xxx_ops mv88e6350_ops = {
/* MV88E6XXX_FAMILY_6351 */
+ .ieee_pri_map = mv88e6085_g1_ieee_pri_map,
+ .ip_pri_map = mv88e6085_g1_ip_pri_map,
.irl_init_all = mv88e6352_g2_irl_init_all,
.set_switch_mac = mv88e6xxx_g2_set_switch_mac,
.phy_read = mv88e6xxx_g2_smi_phy_read,
@@ -3205,6 +3369,8 @@
static const struct mv88e6xxx_ops mv88e6351_ops = {
/* MV88E6XXX_FAMILY_6351 */
+ .ieee_pri_map = mv88e6085_g1_ieee_pri_map,
+ .ip_pri_map = mv88e6085_g1_ip_pri_map,
.irl_init_all = mv88e6352_g2_irl_init_all,
.set_switch_mac = mv88e6xxx_g2_set_switch_mac,
.phy_read = mv88e6xxx_g2_smi_phy_read,
@@ -3240,6 +3406,8 @@
static const struct mv88e6xxx_ops mv88e6352_ops = {
/* MV88E6XXX_FAMILY_6352 */
+ .ieee_pri_map = mv88e6085_g1_ieee_pri_map,
+ .ip_pri_map = mv88e6085_g1_ip_pri_map,
.irl_init_all = mv88e6352_g2_irl_init_all,
.get_eeprom = mv88e6xxx_g2_get_eeprom16,
.set_eeprom = mv88e6xxx_g2_set_eeprom16,
@@ -3270,6 +3438,7 @@
.mgmt_rsvd2cpu = mv88e6352_g2_mgmt_rsvd2cpu,
.pot_clear = mv88e6xxx_g2_pot_clear,
.reset = mv88e6352_g1_reset,
+ .rmu_disable = mv88e6352_g1_rmu_disable,
.vtu_getnext = mv88e6352_g1_vtu_getnext,
.vtu_loadpurge = mv88e6352_g1_vtu_loadpurge,
.serdes_power = mv88e6352_serdes_power,
@@ -3313,6 +3482,7 @@
.mgmt_rsvd2cpu = mv88e6390_g1_mgmt_rsvd2cpu,
.pot_clear = mv88e6xxx_g2_pot_clear,
.reset = mv88e6352_g1_reset,
+ .rmu_disable = mv88e6390_g1_rmu_disable,
.vtu_getnext = mv88e6390_g1_vtu_getnext,
.vtu_loadpurge = mv88e6390_g1_vtu_loadpurge,
.serdes_power = mv88e6390_serdes_power,
@@ -3353,6 +3523,7 @@
.mgmt_rsvd2cpu = mv88e6390_g1_mgmt_rsvd2cpu,
.pot_clear = mv88e6xxx_g2_pot_clear,
.reset = mv88e6352_g1_reset,
+ .rmu_disable = mv88e6390_g1_rmu_disable,
.vtu_getnext = mv88e6390_g1_vtu_getnext,
.vtu_loadpurge = mv88e6390_g1_vtu_loadpurge,
.serdes_power = mv88e6390_serdes_power,
@@ -4125,6 +4296,11 @@
.get_tag_protocol = mv88e6xxx_get_tag_protocol,
.setup = mv88e6xxx_setup,
.adjust_link = mv88e6xxx_adjust_link,
+ .phylink_validate = mv88e6xxx_validate,
+ .phylink_mac_link_state = mv88e6xxx_link_state,
+ .phylink_mac_config = mv88e6xxx_mac_config,
+ .phylink_mac_link_down = mv88e6xxx_mac_link_down,
+ .phylink_mac_link_up = mv88e6xxx_mac_link_up,
.get_strings = mv88e6xxx_get_strings,
.get_ethtool_stats = mv88e6xxx_get_ethtool_stats,
.get_sset_count = mv88e6xxx_get_sset_count,
@@ -4175,6 +4351,7 @@
return -ENOMEM;
ds->priv = chip;
+ ds->dev = dev;
ds->ops = &mv88e6xxx_switch_ops;
ds->ageing_time_min = chip->info->age_time_coeff;
ds->ageing_time_max = chip->info->age_time_coeff * U8_MAX;
@@ -4189,42 +4366,82 @@
dsa_unregister_switch(chip->ds);
}
+static const void *pdata_device_get_match_data(struct device *dev)
+{
+ const struct of_device_id *matches = dev->driver->of_match_table;
+ const struct dsa_mv88e6xxx_pdata *pdata = dev->platform_data;
+
+ for (; matches->name[0] || matches->type[0] || matches->compatible[0];
+ matches++) {
+ if (!strcmp(pdata->compatible, matches->compatible))
+ return matches->data;
+ }
+ return NULL;
+}
+
static int mv88e6xxx_probe(struct mdio_device *mdiodev)
{
+ struct dsa_mv88e6xxx_pdata *pdata = mdiodev->dev.platform_data;
+ const struct mv88e6xxx_info *compat_info = NULL;
struct device *dev = &mdiodev->dev;
struct device_node *np = dev->of_node;
- const struct mv88e6xxx_info *compat_info;
struct mv88e6xxx_chip *chip;
- u32 eeprom_len;
+ int port;
int err;
- compat_info = of_device_get_match_data(dev);
+ if (np)
+ compat_info = of_device_get_match_data(dev);
+
+ if (pdata) {
+ compat_info = pdata_device_get_match_data(dev);
+
+ if (!pdata->netdev)
+ return -EINVAL;
+
+ for (port = 0; port < DSA_MAX_PORTS; port++) {
+ if (!(pdata->enabled_ports & (1 << port)))
+ continue;
+ if (strcmp(pdata->cd.port_names[port], "cpu"))
+ continue;
+ pdata->cd.netdev[port] = &pdata->netdev->dev;
+ break;
+ }
+ }
+
if (!compat_info)
return -EINVAL;
chip = mv88e6xxx_alloc_chip(dev);
- if (!chip)
- return -ENOMEM;
+ if (!chip) {
+ err = -ENOMEM;
+ goto out;
+ }
chip->info = compat_info;
err = mv88e6xxx_smi_init(chip, mdiodev->bus, mdiodev->addr);
if (err)
- return err;
+ goto out;
chip->reset = devm_gpiod_get_optional(dev, "reset", GPIOD_OUT_LOW);
- if (IS_ERR(chip->reset))
- return PTR_ERR(chip->reset);
+ if (IS_ERR(chip->reset)) {
+ err = PTR_ERR(chip->reset);
+ goto out;
+ }
err = mv88e6xxx_detect(chip);
if (err)
- return err;
+ goto out;
mv88e6xxx_phy_init(chip);
- if (chip->info->ops->get_eeprom &&
- !of_property_read_u32(np, "eeprom-length", &eeprom_len))
- chip->eeprom_len = eeprom_len;
+ if (chip->info->ops->get_eeprom) {
+ if (np)
+ of_property_read_u32(np, "eeprom-length",
+ &chip->eeprom_len);
+ else
+ chip->eeprom_len = pdata->eeprom_len;
+ }
mutex_lock(&chip->reg_lock);
err = mv88e6xxx_switch_reset(chip);
@@ -4293,6 +4510,9 @@
mv88e6xxx_irq_poll_free(chip);
mutex_unlock(&chip->reg_lock);
out:
+ if (pdata)
+ dev_put(pdata->netdev);
+
return err;
}
diff --git a/drivers/net/dsa/mv88e6xxx/chip.h b/drivers/net/dsa/mv88e6xxx/chip.h
index 12b7f46..8ac3fbb 100644
--- a/drivers/net/dsa/mv88e6xxx/chip.h
+++ b/drivers/net/dsa/mv88e6xxx/chip.h
@@ -21,10 +21,6 @@
#include <linux/timecounter.h>
#include <net/dsa.h>
-#ifndef UINT64_MAX
-#define UINT64_MAX (u64)(~((u64)0))
-#endif
-
#define SMI_CMD 0x00
#define SMI_CMD_BUSY BIT(15)
#define SMI_CMD_CLAUSE_22 BIT(12)
@@ -242,7 +238,7 @@
struct gpio_desc *reset;
/* set to size of eeprom if supported by the switch */
- int eeprom_len;
+ u32 eeprom_len;
/* List of mdio busses */
struct list_head mdios;
@@ -298,6 +294,9 @@
};
struct mv88e6xxx_ops {
+ int (*ieee_pri_map)(struct mv88e6xxx_chip *chip);
+ int (*ip_pri_map)(struct mv88e6xxx_chip *chip);
+
/* Ingress Rate Limit unit (IRL) operations */
int (*irl_init_all)(struct mv88e6xxx_chip *chip, int port);
@@ -406,6 +405,12 @@
uint64_t *data);
int (*set_cpu_port)(struct mv88e6xxx_chip *chip, int port);
int (*set_egress_port)(struct mv88e6xxx_chip *chip, int port);
+
+#define MV88E6XXX_CASCADE_PORT_NONE 0xe
+#define MV88E6XXX_CASCADE_PORT_MULTIPLE 0xf
+
+ int (*set_cascade_port)(struct mv88e6xxx_chip *chip, int port);
+
const struct mv88e6xxx_irq_ops *watchdog_ops;
int (*mgmt_rsvd2cpu)(struct mv88e6xxx_chip *chip);
@@ -431,6 +436,9 @@
/* Interface to the AVB/PTP registers */
const struct mv88e6xxx_avb_ops *avb_ops;
+
+ /* Remote Management Unit operations */
+ int (*rmu_disable)(struct mv88e6xxx_chip *chip);
};
struct mv88e6xxx_irq_ops {
diff --git a/drivers/net/dsa/mv88e6xxx/global1.c b/drivers/net/dsa/mv88e6xxx/global1.c
index b43bd64..d721ccf 100644
--- a/drivers/net/dsa/mv88e6xxx/global1.c
+++ b/drivers/net/dsa/mv88e6xxx/global1.c
@@ -241,6 +241,64 @@
return mv88e6185_g1_wait_ppu_disabled(chip);
}
+/* Offset 0x10: IP-PRI Mapping Register 0
+ * Offset 0x11: IP-PRI Mapping Register 1
+ * Offset 0x12: IP-PRI Mapping Register 2
+ * Offset 0x13: IP-PRI Mapping Register 3
+ * Offset 0x14: IP-PRI Mapping Register 4
+ * Offset 0x15: IP-PRI Mapping Register 5
+ * Offset 0x16: IP-PRI Mapping Register 6
+ * Offset 0x17: IP-PRI Mapping Register 7
+ */
+
+int mv88e6085_g1_ip_pri_map(struct mv88e6xxx_chip *chip)
+{
+ int err;
+
+ /* Reset the IP TOS/DiffServ/Traffic priorities to defaults */
+ err = mv88e6xxx_g1_write(chip, MV88E6XXX_G1_IP_PRI_0, 0x0000);
+ if (err)
+ return err;
+
+ err = mv88e6xxx_g1_write(chip, MV88E6XXX_G1_IP_PRI_1, 0x0000);
+ if (err)
+ return err;
+
+ err = mv88e6xxx_g1_write(chip, MV88E6XXX_G1_IP_PRI_2, 0x5555);
+ if (err)
+ return err;
+
+ err = mv88e6xxx_g1_write(chip, MV88E6XXX_G1_IP_PRI_3, 0x5555);
+ if (err)
+ return err;
+
+ err = mv88e6xxx_g1_write(chip, MV88E6XXX_G1_IP_PRI_4, 0xaaaa);
+ if (err)
+ return err;
+
+ err = mv88e6xxx_g1_write(chip, MV88E6XXX_G1_IP_PRI_5, 0xaaaa);
+ if (err)
+ return err;
+
+ err = mv88e6xxx_g1_write(chip, MV88E6XXX_G1_IP_PRI_6, 0xffff);
+ if (err)
+ return err;
+
+ err = mv88e6xxx_g1_write(chip, MV88E6XXX_G1_IP_PRI_7, 0xffff);
+ if (err)
+ return err;
+
+ return 0;
+}
+
+/* Offset 0x18: IEEE-PRI Register */
+
+int mv88e6085_g1_ieee_pri_map(struct mv88e6xxx_chip *chip)
+{
+ /* Reset the IEEE Tag priorities to defaults */
+ return mv88e6xxx_g1_write(chip, MV88E6XXX_G1_IEEE_PRI, 0xfa41);
+}
+
/* Offset 0x1a: Monitor Control */
/* Offset 0x1a: Monitor & MGMT Control on some devices */
@@ -350,20 +408,59 @@
/* Offset 0x1c: Global Control 2 */
-int mv88e6390_g1_stats_set_histogram(struct mv88e6xxx_chip *chip)
+static int mv88e6xxx_g1_ctl2_mask(struct mv88e6xxx_chip *chip, u16 mask,
+ u16 val)
{
- u16 val;
+ u16 reg;
int err;
- err = mv88e6xxx_g1_read(chip, MV88E6XXX_G1_CTL2, &val);
+ err = mv88e6xxx_g1_read(chip, MV88E6XXX_G1_CTL2, ®);
if (err)
return err;
- val |= MV88E6XXX_G1_CTL2_HIST_RX_TX;
+ reg &= ~mask;
+ reg |= val & mask;
- err = mv88e6xxx_g1_write(chip, MV88E6XXX_G1_CTL2, val);
+ return mv88e6xxx_g1_write(chip, MV88E6XXX_G1_CTL2, reg);
+}
- return err;
+int mv88e6185_g1_set_cascade_port(struct mv88e6xxx_chip *chip, int port)
+{
+ const u16 mask = MV88E6185_G1_CTL2_CASCADE_PORT_MASK;
+
+ return mv88e6xxx_g1_ctl2_mask(chip, mask, port << __bf_shf(mask));
+}
+
+int mv88e6085_g1_rmu_disable(struct mv88e6xxx_chip *chip)
+{
+ return mv88e6xxx_g1_ctl2_mask(chip, MV88E6085_G1_CTL2_P10RM |
+ MV88E6085_G1_CTL2_RM_ENABLE, 0);
+}
+
+int mv88e6352_g1_rmu_disable(struct mv88e6xxx_chip *chip)
+{
+ return mv88e6xxx_g1_ctl2_mask(chip, MV88E6352_G1_CTL2_RMU_MODE_MASK,
+ MV88E6352_G1_CTL2_RMU_MODE_DISABLED);
+}
+
+int mv88e6390_g1_rmu_disable(struct mv88e6xxx_chip *chip)
+{
+ return mv88e6xxx_g1_ctl2_mask(chip, MV88E6390_G1_CTL2_RMU_MODE_MASK,
+ MV88E6390_G1_CTL2_RMU_MODE_DISABLED);
+}
+
+int mv88e6390_g1_stats_set_histogram(struct mv88e6xxx_chip *chip)
+{
+ return mv88e6xxx_g1_ctl2_mask(chip, MV88E6390_G1_CTL2_HIST_MODE_MASK,
+ MV88E6390_G1_CTL2_HIST_MODE_RX |
+ MV88E6390_G1_CTL2_HIST_MODE_TX);
+}
+
+int mv88e6xxx_g1_set_device_number(struct mv88e6xxx_chip *chip, int index)
+{
+ return mv88e6xxx_g1_ctl2_mask(chip,
+ MV88E6XXX_G1_CTL2_DEVICE_NUMBER_MASK,
+ index);
}
/* Offset 0x1d: Statistics Operation 2 */
diff --git a/drivers/net/dsa/mv88e6xxx/global1.h b/drivers/net/dsa/mv88e6xxx/global1.h
index 6aee731..7c791c1d 100644
--- a/drivers/net/dsa/mv88e6xxx/global1.h
+++ b/drivers/net/dsa/mv88e6xxx/global1.h
@@ -201,11 +201,35 @@
/* Offset 0x1C: Global Control 2 */
#define MV88E6XXX_G1_CTL2 0x1c
-#define MV88E6XXX_G1_CTL2_NO_CASCADE 0xe000
-#define MV88E6XXX_G1_CTL2_MULTIPLE_CASCADE 0xf000
-#define MV88E6XXX_G1_CTL2_HIST_RX 0x0040
-#define MV88E6XXX_G1_CTL2_HIST_TX 0x0080
-#define MV88E6XXX_G1_CTL2_HIST_RX_TX 0x00c0
+#define MV88E6185_G1_CTL2_CASCADE_PORT_MASK 0xf000
+#define MV88E6185_G1_CTL2_CASCADE_PORT_NONE 0xe000
+#define MV88E6185_G1_CTL2_CASCADE_PORT_MULTI 0xf000
+#define MV88E6352_G1_CTL2_HEADER_TYPE_MASK 0xc000
+#define MV88E6352_G1_CTL2_HEADER_TYPE_ORIG 0x0000
+#define MV88E6352_G1_CTL2_HEADER_TYPE_MGMT 0x4000
+#define MV88E6390_G1_CTL2_HEADER_TYPE_LAG 0x8000
+#define MV88E6352_G1_CTL2_RMU_MODE_MASK 0x3000
+#define MV88E6352_G1_CTL2_RMU_MODE_DISABLED 0x0000
+#define MV88E6352_G1_CTL2_RMU_MODE_PORT_4 0x1000
+#define MV88E6352_G1_CTL2_RMU_MODE_PORT_5 0x2000
+#define MV88E6352_G1_CTL2_RMU_MODE_PORT_6 0x3000
+#define MV88E6085_G1_CTL2_DA_CHECK 0x4000
+#define MV88E6085_G1_CTL2_P10RM 0x2000
+#define MV88E6085_G1_CTL2_RM_ENABLE 0x1000
+#define MV88E6352_G1_CTL2_DA_CHECK 0x0800
+#define MV88E6390_G1_CTL2_RMU_MODE_MASK 0x0700
+#define MV88E6390_G1_CTL2_RMU_MODE_PORT_0 0x0000
+#define MV88E6390_G1_CTL2_RMU_MODE_PORT_1 0x0100
+#define MV88E6390_G1_CTL2_RMU_MODE_PORT_9 0x0200
+#define MV88E6390_G1_CTL2_RMU_MODE_PORT_10 0x0300
+#define MV88E6390_G1_CTL2_RMU_MODE_ALL_DSA 0x0600
+#define MV88E6390_G1_CTL2_RMU_MODE_DISABLED 0x0700
+#define MV88E6390_G1_CTL2_HIST_MODE_MASK 0x00c0
+#define MV88E6390_G1_CTL2_HIST_MODE_RX 0x0040
+#define MV88E6390_G1_CTL2_HIST_MODE_TX 0x0080
+#define MV88E6352_G1_CTL2_CTR_MODE_MASK 0x0060
+#define MV88E6390_G1_CTL2_CTR_MODE 0x0020
+#define MV88E6XXX_G1_CTL2_DEVICE_NUMBER_MASK 0x001f
/* Offset 0x1D: Stats Operation Register */
#define MV88E6XXX_G1_STATS_OP 0x1d
@@ -253,6 +277,17 @@
int mv88e6390_g1_set_cpu_port(struct mv88e6xxx_chip *chip, int port);
int mv88e6390_g1_mgmt_rsvd2cpu(struct mv88e6xxx_chip *chip);
+int mv88e6085_g1_ip_pri_map(struct mv88e6xxx_chip *chip);
+int mv88e6085_g1_ieee_pri_map(struct mv88e6xxx_chip *chip);
+
+int mv88e6185_g1_set_cascade_port(struct mv88e6xxx_chip *chip, int port);
+
+int mv88e6085_g1_rmu_disable(struct mv88e6xxx_chip *chip);
+int mv88e6352_g1_rmu_disable(struct mv88e6xxx_chip *chip);
+int mv88e6390_g1_rmu_disable(struct mv88e6xxx_chip *chip);
+
+int mv88e6xxx_g1_set_device_number(struct mv88e6xxx_chip *chip, int index);
+
int mv88e6xxx_g1_atu_set_learn2all(struct mv88e6xxx_chip *chip, bool learn2all);
int mv88e6xxx_g1_atu_set_age_time(struct mv88e6xxx_chip *chip,
unsigned int msecs);
diff --git a/drivers/net/dsa/mv88e6xxx/global2.c b/drivers/net/dsa/mv88e6xxx/global2.c
index 8d22d66..91a3cb2 100644
--- a/drivers/net/dsa/mv88e6xxx/global2.c
+++ b/drivers/net/dsa/mv88e6xxx/global2.c
@@ -119,37 +119,17 @@
/* Offset 0x06: Device Mapping Table register */
-static int mv88e6xxx_g2_device_mapping_write(struct mv88e6xxx_chip *chip,
- int target, int port)
+int mv88e6xxx_g2_device_mapping_write(struct mv88e6xxx_chip *chip, int target,
+ int port)
{
- u16 val = (target << 8) | (port & 0xf);
+ u16 val = (target << 8) | (port & 0x1f);
+ /* Modern chips use 5 bits to define a device mapping port,
+ * but bit 4 is reserved on older chips, so it is safe to use.
+ */
return mv88e6xxx_g2_update(chip, MV88E6XXX_G2_DEVICE_MAPPING, val);
}
-static int mv88e6xxx_g2_set_device_mapping(struct mv88e6xxx_chip *chip)
-{
- int target, port;
- int err;
-
- /* Initialize the routing port to the 32 possible target devices */
- for (target = 0; target < 32; ++target) {
- port = 0xf;
-
- if (target < DSA_MAX_SWITCHES) {
- port = chip->ds->rtable[target];
- if (port == DSA_RTABLE_NONE)
- port = 0xf;
- }
-
- err = mv88e6xxx_g2_device_mapping_write(chip, target, port);
- if (err)
- break;
- }
-
- return err;
-}
-
/* Offset 0x07: Trunk Mask Table register */
static int mv88e6xxx_g2_trunk_mask_write(struct mv88e6xxx_chip *chip, int num,
@@ -174,7 +154,7 @@
return mv88e6xxx_g2_update(chip, MV88E6XXX_G2_TRUNK_MAPPING, val);
}
-static int mv88e6xxx_g2_clear_trunk(struct mv88e6xxx_chip *chip)
+int mv88e6xxx_g2_trunk_clear(struct mv88e6xxx_chip *chip)
{
const u16 port_mask = BIT(mv88e6xxx_num_ports(chip)) - 1;
int i, err;
@@ -1067,9 +1047,6 @@
{
int err, irq, virq;
- if (!chip->dev->of_node)
- return -EINVAL;
-
chip->g2_irq.domain = irq_domain_add_simple(
chip->dev->of_node, 16, 0, &mv88e6xxx_g2_irq_domain_ops, chip);
if (!chip->g2_irq.domain)
@@ -1138,31 +1115,3 @@
for (phy = 0; phy < chip->info->num_internal_phys; phy++)
irq_dispose_mapping(bus->irq[phy]);
}
-
-int mv88e6xxx_g2_setup(struct mv88e6xxx_chip *chip)
-{
- u16 reg;
- int err;
-
- /* Ignore removed tag data on doubly tagged packets, disable
- * flow control messages, force flow control priority to the
- * highest, and send all special multicast frames to the CPU
- * port at the highest priority.
- */
- reg = MV88E6XXX_G2_SWITCH_MGMT_FORCE_FLOW_CTL_PRI | (0x7 << 4);
- err = mv88e6xxx_g2_write(chip, MV88E6XXX_G2_SWITCH_MGMT, reg);
- if (err)
- return err;
-
- /* Program the DSA routing table. */
- err = mv88e6xxx_g2_set_device_mapping(chip);
- if (err)
- return err;
-
- /* Clear all trunk masks and mapping. */
- err = mv88e6xxx_g2_clear_trunk(chip);
- if (err)
- return err;
-
- return 0;
-}
diff --git a/drivers/net/dsa/mv88e6xxx/global2.h b/drivers/net/dsa/mv88e6xxx/global2.h
index 520ec70..37e8ce2 100644
--- a/drivers/net/dsa/mv88e6xxx/global2.h
+++ b/drivers/net/dsa/mv88e6xxx/global2.h
@@ -60,7 +60,8 @@
#define MV88E6XXX_G2_DEVICE_MAPPING 0x06
#define MV88E6XXX_G2_DEVICE_MAPPING_UPDATE 0x8000
#define MV88E6XXX_G2_DEVICE_MAPPING_DEV_MASK 0x1f00
-#define MV88E6XXX_G2_DEVICE_MAPPING_PORT_MASK 0x000f
+#define MV88E6352_G2_DEVICE_MAPPING_PORT_MASK 0x000f
+#define MV88E6390_G2_DEVICE_MAPPING_PORT_MASK 0x001f
/* Offset 0x07: Trunk Mask Table Register */
#define MV88E6XXX_G2_TRUNK_MASK 0x07
@@ -313,7 +314,6 @@
int src_port, u16 data);
int mv88e6xxx_g2_misc_4_bit_port(struct mv88e6xxx_chip *chip);
-int mv88e6xxx_g2_setup(struct mv88e6xxx_chip *chip);
int mv88e6xxx_g2_irq_setup(struct mv88e6xxx_chip *chip);
void mv88e6xxx_g2_irq_free(struct mv88e6xxx_chip *chip);
@@ -327,6 +327,11 @@
int mv88e6xxx_g2_pot_clear(struct mv88e6xxx_chip *chip);
+int mv88e6xxx_g2_trunk_clear(struct mv88e6xxx_chip *chip);
+
+int mv88e6xxx_g2_device_mapping_write(struct mv88e6xxx_chip *chip, int target,
+ int port);
+
extern const struct mv88e6xxx_irq_ops mv88e6097_watchdog_ops;
extern const struct mv88e6xxx_irq_ops mv88e6390_watchdog_ops;
@@ -441,11 +446,6 @@
return -EOPNOTSUPP;
}
-static inline int mv88e6xxx_g2_setup(struct mv88e6xxx_chip *chip)
-{
- return -EOPNOTSUPP;
-}
-
static inline int mv88e6xxx_g2_irq_setup(struct mv88e6xxx_chip *chip)
{
return -EOPNOTSUPP;
@@ -495,6 +495,17 @@
return -EOPNOTSUPP;
}
+static inline int mv88e6xxx_g2_trunk_clear(struct mv88e6xxx_chip *chip)
+{
+ return -EOPNOTSUPP;
+}
+
+static inline int mv88e6xxx_g2_device_mapping_write(struct mv88e6xxx_chip *chip,
+ int target, int port)
+{
+ return -EOPNOTSUPP;
+}
+
#endif /* CONFIG_NET_DSA_MV88E6XXX_GLOBAL2 */
#endif /* _MV88E6XXX_GLOBAL2_H */
diff --git a/drivers/net/dsa/mv88e6xxx/port.c b/drivers/net/dsa/mv88e6xxx/port.c
index 6315774..429d0eb 100644
--- a/drivers/net/dsa/mv88e6xxx/port.c
+++ b/drivers/net/dsa/mv88e6xxx/port.c
@@ -15,6 +15,7 @@
#include <linux/bitfield.h>
#include <linux/if_bridge.h>
#include <linux/phy.h>
+#include <linux/phylink.h>
#include "chip.h"
#include "port.h"
@@ -378,6 +379,44 @@
return 0;
}
+int mv88e6xxx_port_link_state(struct mv88e6xxx_chip *chip, int port,
+ struct phylink_link_state *state)
+{
+ int err;
+ u16 reg;
+
+ err = mv88e6xxx_port_read(chip, port, MV88E6XXX_PORT_STS, ®);
+ if (err)
+ return err;
+
+ switch (reg & MV88E6XXX_PORT_STS_SPEED_MASK) {
+ case MV88E6XXX_PORT_STS_SPEED_10:
+ state->speed = SPEED_10;
+ break;
+ case MV88E6XXX_PORT_STS_SPEED_100:
+ state->speed = SPEED_100;
+ break;
+ case MV88E6XXX_PORT_STS_SPEED_1000:
+ state->speed = SPEED_1000;
+ break;
+ case MV88E6XXX_PORT_STS_SPEED_10000:
+ if ((reg &MV88E6XXX_PORT_STS_CMODE_MASK) ==
+ MV88E6XXX_PORT_STS_CMODE_2500BASEX)
+ state->speed = SPEED_2500;
+ else
+ state->speed = SPEED_10000;
+ break;
+ }
+
+ state->duplex = reg & MV88E6XXX_PORT_STS_DUPLEX ?
+ DUPLEX_FULL : DUPLEX_HALF;
+ state->link = !!(reg & MV88E6XXX_PORT_STS_LINK);
+ state->an_enabled = 1;
+ state->an_complete = state->link;
+
+ return 0;
+}
+
/* Offset 0x02: Jamming Control
*
* Do not limit the period of time that this port can be paused for by
diff --git a/drivers/net/dsa/mv88e6xxx/port.h b/drivers/net/dsa/mv88e6xxx/port.h
index b16d5f0..5e1db1b 100644
--- a/drivers/net/dsa/mv88e6xxx/port.h
+++ b/drivers/net/dsa/mv88e6xxx/port.h
@@ -29,6 +29,7 @@
#define MV88E6XXX_PORT_STS_SPEED_10 0x0000
#define MV88E6XXX_PORT_STS_SPEED_100 0x0100
#define MV88E6XXX_PORT_STS_SPEED_1000 0x0200
+#define MV88E6XXX_PORT_STS_SPEED_10000 0x0300
#define MV88E6352_PORT_STS_EEE 0x0040
#define MV88E6165_PORT_STS_AM_DIS 0x0040
#define MV88E6185_PORT_STS_MGMII 0x0040
@@ -295,6 +296,8 @@
int mv88e6390x_port_set_cmode(struct mv88e6xxx_chip *chip, int port,
phy_interface_t mode);
int mv88e6xxx_port_get_cmode(struct mv88e6xxx_chip *chip, int port, u8 *cmode);
+int mv88e6xxx_port_link_state(struct mv88e6xxx_chip *chip, int port,
+ struct phylink_link_state *state);
int mv88e6xxx_port_set_map_da(struct mv88e6xxx_chip *chip, int port);
int mv88e6095_port_set_upstream_port(struct mv88e6xxx_chip *chip, int port,
int upstream_port);
diff --git a/drivers/net/dsa/mv88e6xxx/serdes.c b/drivers/net/dsa/mv88e6xxx/serdes.c
index fb058fd..880b2cf 100644
--- a/drivers/net/dsa/mv88e6xxx/serdes.c
+++ b/drivers/net/dsa/mv88e6xxx/serdes.c
@@ -326,3 +326,23 @@
return 0;
}
+
+int mv88e6341_serdes_power(struct mv88e6xxx_chip *chip, int port, bool on)
+{
+ int err;
+ u8 cmode;
+
+ if (port != 5)
+ return 0;
+
+ err = mv88e6xxx_port_get_cmode(chip, port, &cmode);
+ if (err)
+ return err;
+
+ if (cmode == MV88E6XXX_PORT_STS_CMODE_1000BASE_X ||
+ cmode == MV88E6XXX_PORT_STS_CMODE_SGMII ||
+ cmode == MV88E6XXX_PORT_STS_CMODE_2500BASEX)
+ return mv88e6390_serdes_sgmii(chip, MV88E6341_ADDR_SERDES, on);
+
+ return 0;
+}
diff --git a/drivers/net/dsa/mv88e6xxx/serdes.h b/drivers/net/dsa/mv88e6xxx/serdes.h
index 1897c01..b6e5fbd 100644
--- a/drivers/net/dsa/mv88e6xxx/serdes.h
+++ b/drivers/net/dsa/mv88e6xxx/serdes.h
@@ -19,6 +19,8 @@
#define MV88E6352_ADDR_SERDES 0x0f
#define MV88E6352_SERDES_PAGE_FIBER 0x01
+#define MV88E6341_ADDR_SERDES 0x15
+
#define MV88E6390_PORT9_LANE0 0x09
#define MV88E6390_PORT9_LANE1 0x12
#define MV88E6390_PORT9_LANE2 0x13
@@ -42,6 +44,7 @@
#define MV88E6390_SGMII_CONTROL_LOOPBACK BIT(14)
#define MV88E6390_SGMII_CONTROL_PDOWN BIT(11)
+int mv88e6341_serdes_power(struct mv88e6xxx_chip *chip, int port, bool on);
int mv88e6352_serdes_power(struct mv88e6xxx_chip *chip, int port, bool on);
int mv88e6390_serdes_power(struct mv88e6xxx_chip *chip, int port, bool on);
int mv88e6352_serdes_get_sset_count(struct mv88e6xxx_chip *chip, int port);
diff --git a/drivers/net/dsa/qca8k.c b/drivers/net/dsa/qca8k.c
index 600d5ad..cdcde7f 100644
--- a/drivers/net/dsa/qca8k.c
+++ b/drivers/net/dsa/qca8k.c
@@ -1,17 +1,9 @@
+// SPDX-License-Identifier: GPL-2.0
/*
* Copyright (C) 2009 Felix Fietkau <nbd@nbd.name>
* Copyright (C) 2011-2012 Gabor Juhos <juhosg@openwrt.org>
* Copyright (c) 2015, The Linux Foundation. All rights reserved.
* Copyright (c) 2016 John Crispin <john@phrozen.org>
- *
- * This program is free software; you can redistribute it and/or modify
- * it under the terms of the GNU General Public License version 2 and
- * only version 2 as published by the Free Software Foundation.
- *
- * This program is distributed in the hope that it will be useful,
- * but WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
- * GNU General Public License for more details.
*/
#include <linux/module.h>
@@ -473,10 +465,10 @@
static void
qca8k_port_set_status(struct qca8k_priv *priv, int port, int enable)
{
- u32 mask = QCA8K_PORT_STATUS_TXMAC;
+ u32 mask = QCA8K_PORT_STATUS_TXMAC | QCA8K_PORT_STATUS_RXMAC;
/* Port 0 and 6 have no internal PHY */
- if ((port > 0) && (port < 6))
+ if (port > 0 && port < 6)
mask |= QCA8K_PORT_STATUS_LINK_AUTO;
if (enable)
@@ -490,6 +482,7 @@
{
struct qca8k_priv *priv = (struct qca8k_priv *)ds->priv;
int ret, i, phy_mode = -1;
+ u32 mask;
/* Make sure that port 0 is the cpu port */
if (!dsa_is_cpu_port(ds, 0)) {
@@ -515,7 +508,10 @@
if (ret < 0)
return ret;
- /* Enable CPU Port */
+ /* Enable CPU Port, force it to maximum bandwidth and full-duplex */
+ mask = QCA8K_PORT_STATUS_SPEED_1000 | QCA8K_PORT_STATUS_TXFLOW |
+ QCA8K_PORT_STATUS_RXFLOW | QCA8K_PORT_STATUS_DUPLEX;
+ qca8k_write(priv, QCA8K_REG_PORT_STATUS(QCA8K_CPU_PORT), mask);
qca8k_reg_set(priv, QCA8K_REG_GLOBAL_FW_CTRL0,
QCA8K_GLOBAL_FW_CTRL0_CPU_PORT_EN);
qca8k_port_set_status(priv, QCA8K_CPU_PORT, 1);
@@ -583,6 +579,47 @@
return 0;
}
+static void
+qca8k_adjust_link(struct dsa_switch *ds, int port, struct phy_device *phy)
+{
+ struct qca8k_priv *priv = ds->priv;
+ u32 reg;
+
+ /* Force fixed-link setting for CPU port, skip others. */
+ if (!phy_is_pseudo_fixed_link(phy))
+ return;
+
+ /* Set port speed */
+ switch (phy->speed) {
+ case 10:
+ reg = QCA8K_PORT_STATUS_SPEED_10;
+ break;
+ case 100:
+ reg = QCA8K_PORT_STATUS_SPEED_100;
+ break;
+ case 1000:
+ reg = QCA8K_PORT_STATUS_SPEED_1000;
+ break;
+ default:
+ dev_dbg(priv->dev, "port%d link speed %dMbps not supported.\n",
+ port, phy->speed);
+ return;
+ }
+
+ /* Set duplex mode */
+ if (phy->duplex == DUPLEX_FULL)
+ reg |= QCA8K_PORT_STATUS_DUPLEX;
+
+ /* Force flow control */
+ if (dsa_is_cpu_port(ds, port))
+ reg |= QCA8K_PORT_STATUS_RXFLOW | QCA8K_PORT_STATUS_TXFLOW;
+
+ /* Force link down before changing MAC options */
+ qca8k_port_set_status(priv, port, 0);
+ qca8k_write(priv, QCA8K_REG_PORT_STATUS(port), reg);
+ qca8k_port_set_status(priv, port, 1);
+}
+
static int
qca8k_phy_read(struct dsa_switch *ds, int phy, int regnum)
{
@@ -600,10 +637,13 @@
}
static void
-qca8k_get_strings(struct dsa_switch *ds, int port, uint8_t *data)
+qca8k_get_strings(struct dsa_switch *ds, int port, u32 stringset, uint8_t *data)
{
int i;
+ if (stringset != ETH_SS_STATS)
+ return;
+
for (i = 0; i < ARRAY_SIZE(ar8327_mib); i++)
strncpy(data + i * ETH_GSTRING_LEN, ar8327_mib[i].name,
ETH_GSTRING_LEN);
@@ -631,8 +671,11 @@
}
static int
-qca8k_get_sset_count(struct dsa_switch *ds, int port)
+qca8k_get_sset_count(struct dsa_switch *ds, int port, int sset)
{
+ if (sset != ETH_SS_STATS)
+ return 0;
+
return ARRAY_SIZE(ar8327_mib);
}
@@ -831,6 +874,7 @@
static const struct dsa_switch_ops qca8k_switch_ops = {
.get_tag_protocol = qca8k_get_tag_protocol,
.setup = qca8k_setup,
+ .adjust_link = qca8k_adjust_link,
.get_strings = qca8k_get_strings,
.phy_read = qca8k_phy_read,
.phy_write = qca8k_phy_write,
@@ -862,6 +906,7 @@
return -ENOMEM;
priv->bus = mdiodev->bus;
+ priv->dev = &mdiodev->dev;
/* read the switches ID register */
id = qca8k_read(priv, QCA8K_REG_MASK_CTRL);
@@ -933,6 +978,7 @@
qca8k_suspend, qca8k_resume);
static const struct of_device_id qca8k_of_match[] = {
+ { .compatible = "qca,qca8334" },
{ .compatible = "qca,qca8337" },
{ /* sentinel */ },
};
diff --git a/drivers/net/dsa/qca8k.h b/drivers/net/dsa/qca8k.h
index 1cf8a92..613fe5c5 100644
--- a/drivers/net/dsa/qca8k.h
+++ b/drivers/net/dsa/qca8k.h
@@ -51,8 +51,10 @@
#define QCA8K_GOL_MAC_ADDR0 0x60
#define QCA8K_GOL_MAC_ADDR1 0x64
#define QCA8K_REG_PORT_STATUS(_i) (0x07c + (_i) * 4)
-#define QCA8K_PORT_STATUS_SPEED GENMASK(2, 0)
-#define QCA8K_PORT_STATUS_SPEED_S 0
+#define QCA8K_PORT_STATUS_SPEED GENMASK(1, 0)
+#define QCA8K_PORT_STATUS_SPEED_10 0
+#define QCA8K_PORT_STATUS_SPEED_100 0x1
+#define QCA8K_PORT_STATUS_SPEED_1000 0x2
#define QCA8K_PORT_STATUS_TXMAC BIT(2)
#define QCA8K_PORT_STATUS_RXMAC BIT(3)
#define QCA8K_PORT_STATUS_TXFLOW BIT(4)
@@ -165,6 +167,7 @@
struct ar8xxx_port_status port_sts[QCA8K_NUM_PORTS];
struct dsa_switch *ds;
struct mutex reg_mutex;
+ struct device *dev;
};
struct qca8k_mib_desc {
diff --git a/drivers/net/ethernet/3com/3c59x.c b/drivers/net/ethernet/3com/3c59x.c
index 176861b..5bc1683 100644
--- a/drivers/net/ethernet/3com/3c59x.c
+++ b/drivers/net/ethernet/3com/3c59x.c
@@ -765,8 +765,9 @@
struct net_device *dev);
static int vortex_rx(struct net_device *dev);
static int boomerang_rx(struct net_device *dev);
-static irqreturn_t vortex_interrupt(int irq, void *dev_id);
-static irqreturn_t boomerang_interrupt(int irq, void *dev_id);
+static irqreturn_t vortex_boomerang_interrupt(int irq, void *dev_id);
+static irqreturn_t _vortex_interrupt(int irq, struct net_device *dev);
+static irqreturn_t _boomerang_interrupt(int irq, struct net_device *dev);
static int vortex_close(struct net_device *dev);
static void dump_tx_ring(struct net_device *dev);
static void update_stats(void __iomem *ioaddr, struct net_device *dev);
@@ -838,11 +839,7 @@
#ifdef CONFIG_NET_POLL_CONTROLLER
static void poll_vortex(struct net_device *dev)
{
- struct vortex_private *vp = netdev_priv(dev);
- unsigned long flags;
- local_irq_save(flags);
- (vp->full_bus_master_rx ? boomerang_interrupt:vortex_interrupt)(dev->irq,dev);
- local_irq_restore(flags);
+ vortex_boomerang_interrupt(dev->irq, dev);
}
#endif
@@ -1728,8 +1725,7 @@
dma_addr_t dma;
/* Use the now-standard shared IRQ implementation. */
- if ((retval = request_irq(dev->irq, vp->full_bus_master_rx ?
- boomerang_interrupt : vortex_interrupt, IRQF_SHARED, dev->name, dev))) {
+ if ((retval = request_irq(dev->irq, vortex_boomerang_interrupt, IRQF_SHARED, dev->name, dev))) {
pr_err("%s: Could not reserve IRQ %d\n", dev->name, dev->irq);
goto err;
}
@@ -1904,18 +1900,7 @@
pr_err("%s: Interrupt posted but not delivered --"
" IRQ blocked by another device?\n", dev->name);
/* Bad idea here.. but we might as well handle a few events. */
- {
- /*
- * Block interrupts because vortex_interrupt does a bare spin_lock()
- */
- unsigned long flags;
- local_irq_save(flags);
- if (vp->full_bus_master_tx)
- boomerang_interrupt(dev->irq, dev);
- else
- vortex_interrupt(dev->irq, dev);
- local_irq_restore(flags);
- }
+ vortex_boomerang_interrupt(dev->irq, dev);
}
if (vortex_debug > 0)
@@ -2266,9 +2251,8 @@
*/
static irqreturn_t
-vortex_interrupt(int irq, void *dev_id)
+_vortex_interrupt(int irq, struct net_device *dev)
{
- struct net_device *dev = dev_id;
struct vortex_private *vp = netdev_priv(dev);
void __iomem *ioaddr;
int status;
@@ -2277,7 +2261,6 @@
unsigned int bytes_compl = 0, pkts_compl = 0;
ioaddr = vp->ioaddr;
- spin_lock(&vp->lock);
status = ioread16(ioaddr + EL3_STATUS);
@@ -2375,7 +2358,6 @@
pr_debug("%s: exiting interrupt, status %4.4x.\n",
dev->name, status);
handler_exit:
- spin_unlock(&vp->lock);
return IRQ_RETVAL(handled);
}
@@ -2385,9 +2367,8 @@
*/
static irqreturn_t
-boomerang_interrupt(int irq, void *dev_id)
+_boomerang_interrupt(int irq, struct net_device *dev)
{
- struct net_device *dev = dev_id;
struct vortex_private *vp = netdev_priv(dev);
void __iomem *ioaddr;
int status;
@@ -2397,12 +2378,6 @@
ioaddr = vp->ioaddr;
-
- /*
- * It seems dopey to put the spinlock this early, but we could race against vortex_tx_timeout
- * and boomerang_start_xmit
- */
- spin_lock(&vp->lock);
vp->handling_irq = 1;
status = ioread16(ioaddr + EL3_STATUS);
@@ -2521,10 +2496,29 @@
dev->name, status);
handler_exit:
vp->handling_irq = 0;
- spin_unlock(&vp->lock);
return IRQ_RETVAL(handled);
}
+static irqreturn_t
+vortex_boomerang_interrupt(int irq, void *dev_id)
+{
+ struct net_device *dev = dev_id;
+ struct vortex_private *vp = netdev_priv(dev);
+ unsigned long flags;
+ irqreturn_t ret;
+
+ spin_lock_irqsave(&vp->lock, flags);
+
+ if (vp->full_bus_master_rx)
+ ret = _boomerang_interrupt(dev->irq, dev);
+ else
+ ret = _vortex_interrupt(dev->irq, dev);
+
+ spin_unlock_irqrestore(&vp->lock, flags);
+
+ return ret;
+}
+
static int vortex_rx(struct net_device *dev)
{
struct vortex_private *vp = netdev_priv(dev);
diff --git a/drivers/net/ethernet/8390/Kconfig b/drivers/net/ethernet/8390/Kconfig
index 9fee7c8..f2f0264 100644
--- a/drivers/net/ethernet/8390/Kconfig
+++ b/drivers/net/ethernet/8390/Kconfig
@@ -29,8 +29,8 @@
called axnet_cs. If unsure, say N.
config AX88796
- tristate "ASIX AX88796 NE2000 clone support"
- depends on (ARM || MIPS || SUPERH)
+ tristate "ASIX AX88796 NE2000 clone support" if !ZORRO
+ depends on (ARM || MIPS || SUPERH || ZORRO || COMPILE_TEST)
select CRC32
select PHYLIB
select MDIO_BITBANG
@@ -45,6 +45,19 @@
---help---
Select this if your platform comes with an external 93CX6 eeprom.
+config XSURF100
+ tristate "Amiga XSurf 100 AX88796/NE2000 clone support"
+ depends on ZORRO
+ select AX88796
+ select ASIX_PHY
+ help
+ This driver is for the Individual Computers X-Surf 100 Ethernet
+ card (based on the Asix AX88796 chip). If you have such a card,
+ say Y. Otherwise, say N.
+
+ To compile this driver as a module, choose M here: the module
+ will be called xsurf100.
+
config HYDRA
tristate "Hydra support"
depends on ZORRO
diff --git a/drivers/net/ethernet/8390/Makefile b/drivers/net/ethernet/8390/Makefile
index 1d650e6..85c83c5 100644
--- a/drivers/net/ethernet/8390/Makefile
+++ b/drivers/net/ethernet/8390/Makefile
@@ -16,4 +16,5 @@
obj-$(CONFIG_STNIC) += stnic.o 8390.o
obj-$(CONFIG_ULTRA) += smc-ultra.o 8390.o
obj-$(CONFIG_WD80x3) += wd.o 8390.o
+obj-$(CONFIG_XSURF100) += xsurf100.o
obj-$(CONFIG_ZORRO8390) += zorro8390.o
diff --git a/drivers/net/ethernet/8390/ax88796.c b/drivers/net/ethernet/8390/ax88796.c
index da61cf3..2a0ddec 100644
--- a/drivers/net/ethernet/8390/ax88796.c
+++ b/drivers/net/ethernet/8390/ax88796.c
@@ -163,6 +163,21 @@
ei_outb(ENISR_RESET, addr + EN0_ISR); /* Ack intr. */
}
+/* Wrapper for __ei_interrupt for platforms that have a platform-specific
+ * way to find out whether the interrupt request might be caused by
+ * the ax88796 chip.
+ */
+static irqreturn_t ax_ei_interrupt_filtered(int irq, void *dev_id)
+{
+ struct net_device *dev = dev_id;
+ struct ax_device *ax = to_ax_dev(dev);
+ struct platform_device *pdev = to_platform_device(dev->dev.parent);
+
+ if (!ax->plat->check_irq(pdev))
+ return IRQ_NONE;
+
+ return ax_ei_interrupt(irq, dev_id);
+}
static void ax_get_8390_hdr(struct net_device *dev, struct e8390_pkt_hdr *hdr,
int ring_page)
@@ -387,6 +402,90 @@
ei_outb(reg_gpoc, ei_local->mem + EI_SHIFT(0x17));
}
+static void ax_bb_mdc(struct mdiobb_ctrl *ctrl, int level)
+{
+ struct ax_device *ax = container_of(ctrl, struct ax_device, bb_ctrl);
+
+ if (level)
+ ax->reg_memr |= AX_MEMR_MDC;
+ else
+ ax->reg_memr &= ~AX_MEMR_MDC;
+
+ ei_outb(ax->reg_memr, ax->addr_memr);
+}
+
+static void ax_bb_dir(struct mdiobb_ctrl *ctrl, int output)
+{
+ struct ax_device *ax = container_of(ctrl, struct ax_device, bb_ctrl);
+
+ if (output)
+ ax->reg_memr &= ~AX_MEMR_MDIR;
+ else
+ ax->reg_memr |= AX_MEMR_MDIR;
+
+ ei_outb(ax->reg_memr, ax->addr_memr);
+}
+
+static void ax_bb_set_data(struct mdiobb_ctrl *ctrl, int value)
+{
+ struct ax_device *ax = container_of(ctrl, struct ax_device, bb_ctrl);
+
+ if (value)
+ ax->reg_memr |= AX_MEMR_MDO;
+ else
+ ax->reg_memr &= ~AX_MEMR_MDO;
+
+ ei_outb(ax->reg_memr, ax->addr_memr);
+}
+
+static int ax_bb_get_data(struct mdiobb_ctrl *ctrl)
+{
+ struct ax_device *ax = container_of(ctrl, struct ax_device, bb_ctrl);
+ int reg_memr = ei_inb(ax->addr_memr);
+
+ return reg_memr & AX_MEMR_MDI ? 1 : 0;
+}
+
+static const struct mdiobb_ops bb_ops = {
+ .owner = THIS_MODULE,
+ .set_mdc = ax_bb_mdc,
+ .set_mdio_dir = ax_bb_dir,
+ .set_mdio_data = ax_bb_set_data,
+ .get_mdio_data = ax_bb_get_data,
+};
+
+static int ax_mii_init(struct net_device *dev)
+{
+ struct platform_device *pdev = to_platform_device(dev->dev.parent);
+ struct ei_device *ei_local = netdev_priv(dev);
+ struct ax_device *ax = to_ax_dev(dev);
+ int err;
+
+ ax->bb_ctrl.ops = &bb_ops;
+ ax->addr_memr = ei_local->mem + AX_MEMR;
+ ax->mii_bus = alloc_mdio_bitbang(&ax->bb_ctrl);
+ if (!ax->mii_bus) {
+ err = -ENOMEM;
+ goto out;
+ }
+
+ ax->mii_bus->name = "ax88796_mii_bus";
+ ax->mii_bus->parent = dev->dev.parent;
+ snprintf(ax->mii_bus->id, MII_BUS_ID_SIZE, "%s-%x",
+ pdev->name, pdev->id);
+
+ err = mdiobus_register(ax->mii_bus);
+ if (err)
+ goto out_free_mdio_bitbang;
+
+ return 0;
+
+ out_free_mdio_bitbang:
+ free_mdio_bitbang(ax->mii_bus);
+ out:
+ return err;
+}
+
static int ax_open(struct net_device *dev)
{
struct ax_device *ax = to_ax_dev(dev);
@@ -394,8 +493,16 @@
netdev_dbg(dev, "open\n");
- ret = request_irq(dev->irq, ax_ei_interrupt, ax->irqflags,
- dev->name, dev);
+ ret = ax_mii_init(dev);
+ if (ret)
+ goto failed_mii;
+
+ if (ax->plat->check_irq)
+ ret = request_irq(dev->irq, ax_ei_interrupt_filtered,
+ ax->irqflags, dev->name, dev);
+ else
+ ret = request_irq(dev->irq, ax_ei_interrupt, ax->irqflags,
+ dev->name, dev);
if (ret)
goto failed_request_irq;
@@ -421,6 +528,10 @@
ax_phy_switch(dev, 0);
free_irq(dev->irq, dev);
failed_request_irq:
+ /* unregister mdiobus */
+ mdiobus_unregister(ax->mii_bus);
+ free_mdio_bitbang(ax->mii_bus);
+ failed_mii:
return ret;
}
@@ -440,6 +551,9 @@
phy_disconnect(dev->phydev);
free_irq(dev->irq, dev);
+
+ mdiobus_unregister(ax->mii_bus);
+ free_mdio_bitbang(ax->mii_bus);
return 0;
}
@@ -539,92 +653,8 @@
#endif
};
-static void ax_bb_mdc(struct mdiobb_ctrl *ctrl, int level)
-{
- struct ax_device *ax = container_of(ctrl, struct ax_device, bb_ctrl);
-
- if (level)
- ax->reg_memr |= AX_MEMR_MDC;
- else
- ax->reg_memr &= ~AX_MEMR_MDC;
-
- ei_outb(ax->reg_memr, ax->addr_memr);
-}
-
-static void ax_bb_dir(struct mdiobb_ctrl *ctrl, int output)
-{
- struct ax_device *ax = container_of(ctrl, struct ax_device, bb_ctrl);
-
- if (output)
- ax->reg_memr &= ~AX_MEMR_MDIR;
- else
- ax->reg_memr |= AX_MEMR_MDIR;
-
- ei_outb(ax->reg_memr, ax->addr_memr);
-}
-
-static void ax_bb_set_data(struct mdiobb_ctrl *ctrl, int value)
-{
- struct ax_device *ax = container_of(ctrl, struct ax_device, bb_ctrl);
-
- if (value)
- ax->reg_memr |= AX_MEMR_MDO;
- else
- ax->reg_memr &= ~AX_MEMR_MDO;
-
- ei_outb(ax->reg_memr, ax->addr_memr);
-}
-
-static int ax_bb_get_data(struct mdiobb_ctrl *ctrl)
-{
- struct ax_device *ax = container_of(ctrl, struct ax_device, bb_ctrl);
- int reg_memr = ei_inb(ax->addr_memr);
-
- return reg_memr & AX_MEMR_MDI ? 1 : 0;
-}
-
-static const struct mdiobb_ops bb_ops = {
- .owner = THIS_MODULE,
- .set_mdc = ax_bb_mdc,
- .set_mdio_dir = ax_bb_dir,
- .set_mdio_data = ax_bb_set_data,
- .get_mdio_data = ax_bb_get_data,
-};
-
/* setup code */
-static int ax_mii_init(struct net_device *dev)
-{
- struct platform_device *pdev = to_platform_device(dev->dev.parent);
- struct ei_device *ei_local = netdev_priv(dev);
- struct ax_device *ax = to_ax_dev(dev);
- int err;
-
- ax->bb_ctrl.ops = &bb_ops;
- ax->addr_memr = ei_local->mem + AX_MEMR;
- ax->mii_bus = alloc_mdio_bitbang(&ax->bb_ctrl);
- if (!ax->mii_bus) {
- err = -ENOMEM;
- goto out;
- }
-
- ax->mii_bus->name = "ax88796_mii_bus";
- ax->mii_bus->parent = dev->dev.parent;
- snprintf(ax->mii_bus->id, MII_BUS_ID_SIZE, "%s-%x",
- pdev->name, pdev->id);
-
- err = mdiobus_register(ax->mii_bus);
- if (err)
- goto out_free_mdio_bitbang;
-
- return 0;
-
- out_free_mdio_bitbang:
- free_mdio_bitbang(ax->mii_bus);
- out:
- return err;
-}
-
static void ax_initial_setup(struct net_device *dev, struct ei_device *ei_local)
{
void __iomem *ioaddr = ei_local->mem;
@@ -669,10 +699,16 @@
if (ax->plat->flags & AXFLG_HAS_EEPROM) {
unsigned char SA_prom[32];
+ ei_outb(6, ioaddr + EN0_RCNTLO);
+ ei_outb(0, ioaddr + EN0_RCNTHI);
+ ei_outb(0, ioaddr + EN0_RSARLO);
+ ei_outb(0, ioaddr + EN0_RSARHI);
+ ei_outb(E8390_RREAD + E8390_START, ioaddr + NE_CMD);
for (i = 0; i < sizeof(SA_prom); i += 2) {
SA_prom[i] = ei_inb(ioaddr + NE_DATAPORT);
SA_prom[i + 1] = ei_inb(ioaddr + NE_DATAPORT);
}
+ ei_outb(ENISR_RDC, ioaddr + EN0_ISR); /* Ack intr. */
if (ax->plat->wordlength == 2)
for (i = 0; i < 16; i++)
@@ -741,18 +777,20 @@
#endif
ei_local->reset_8390 = &ax_reset_8390;
- ei_local->block_input = &ax_block_input;
- ei_local->block_output = &ax_block_output;
+ if (ax->plat->block_input)
+ ei_local->block_input = ax->plat->block_input;
+ else
+ ei_local->block_input = &ax_block_input;
+ if (ax->plat->block_output)
+ ei_local->block_output = ax->plat->block_output;
+ else
+ ei_local->block_output = &ax_block_output;
ei_local->get_8390_hdr = &ax_get_8390_hdr;
ei_local->priv = 0;
dev->netdev_ops = &ax_netdev_ops;
dev->ethtool_ops = &ax_ethtool_ops;
- ret = ax_mii_init(dev);
- if (ret)
- goto err_out;
-
ax_NS8390_init(dev, 0);
ret = register_netdev(dev);
@@ -777,7 +815,6 @@
struct resource *mem;
unregister_netdev(dev);
- free_irq(dev->irq, dev);
iounmap(ei_local->mem);
mem = platform_get_resource(pdev, IORESOURCE_MEM, 0);
@@ -789,6 +826,7 @@
release_mem_region(mem->start, resource_size(mem));
}
+ platform_set_drvdata(pdev, NULL);
free_netdev(dev);
return 0;
@@ -835,6 +873,9 @@
dev->irq = irq->start;
ax->irqflags = irq->flags & IRQF_TRIGGER_MASK;
+ if (irq->flags & IORESOURCE_IRQ_SHAREABLE)
+ ax->irqflags |= IRQF_SHARED;
+
mem = platform_get_resource(pdev, IORESOURCE_MEM, 0);
if (!mem) {
dev_err(&pdev->dev, "no MEM specified\n");
@@ -919,6 +960,7 @@
release_mem_region(mem->start, mem_size);
exit_mem:
+ platform_set_drvdata(pdev, NULL);
free_netdev(dev);
return ret;
diff --git a/drivers/net/ethernet/8390/xsurf100.c b/drivers/net/ethernet/8390/xsurf100.c
new file mode 100644
index 0000000..e2c9638
--- /dev/null
+++ b/drivers/net/ethernet/8390/xsurf100.c
@@ -0,0 +1,382 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <linux/module.h>
+#include <linux/netdevice.h>
+#include <linux/platform_device.h>
+#include <linux/zorro.h>
+#include <net/ax88796.h>
+#include <asm/amigaints.h>
+
+#define ZORRO_PROD_INDIVIDUAL_COMPUTERS_X_SURF100 \
+ ZORRO_ID(INDIVIDUAL_COMPUTERS, 0x64, 0)
+
+#define XS100_IRQSTATUS_BASE 0x40
+#define XS100_8390_BASE 0x800
+
+/* Longword-access area. Translated to 2 16-bit access cycles by the
+ * X-Surf 100 FPGA
+ */
+#define XS100_8390_DATA32_BASE 0x8000
+#define XS100_8390_DATA32_SIZE 0x2000
+/* Sub-Areas for fast data register access; addresses relative to area begin */
+#define XS100_8390_DATA_READ32_BASE 0x0880
+#define XS100_8390_DATA_WRITE32_BASE 0x0C80
+#define XS100_8390_DATA_AREA_SIZE 0x80
+
+#define __NS8390_init ax_NS8390_init
+
+/* force unsigned long back to 'void __iomem *' */
+#define ax_convert_addr(_a) ((void __force __iomem *)(_a))
+
+#define ei_inb(_a) z_readb(ax_convert_addr(_a))
+#define ei_outb(_v, _a) z_writeb(_v, ax_convert_addr(_a))
+
+#define ei_inw(_a) z_readw(ax_convert_addr(_a))
+#define ei_outw(_v, _a) z_writew(_v, ax_convert_addr(_a))
+
+#define ei_inb_p(_a) ei_inb(_a)
+#define ei_outb_p(_v, _a) ei_outb(_v, _a)
+
+/* define EI_SHIFT() to take into account our register offsets */
+#define EI_SHIFT(x) (ei_local->reg_offset[(x)])
+
+/* Ensure we have our RCR base value */
+#define AX88796_PLATFORM
+
+static unsigned char version[] =
+ "ax88796.c: Copyright 2005,2007 Simtec Electronics\n";
+
+#include "lib8390.c"
+
+/* from ne.c */
+#define NE_CMD EI_SHIFT(0x00)
+#define NE_RESET EI_SHIFT(0x1f)
+#define NE_DATAPORT EI_SHIFT(0x10)
+
+struct xsurf100_ax_plat_data {
+ struct ax_plat_data ax;
+ void __iomem *base_regs;
+ void __iomem *data_area;
+};
+
+static int is_xsurf100_network_irq(struct platform_device *pdev)
+{
+ struct xsurf100_ax_plat_data *xs100 = dev_get_platdata(&pdev->dev);
+
+ return (readw(xs100->base_regs + XS100_IRQSTATUS_BASE) & 0xaaaa) != 0;
+}
+
+/* These functions guarantee that the iomem is accessed with 32 bit
+ * cycles only. z_memcpy_fromio / z_memcpy_toio don't
+ */
+static void z_memcpy_fromio32(void *dst, const void __iomem *src, size_t bytes)
+{
+ while (bytes > 32) {
+ asm __volatile__
+ ("movem.l (%0)+,%%d0-%%d7\n"
+ "movem.l %%d0-%%d7,(%1)\n"
+ "adda.l #32,%1" : "=a"(src), "=a"(dst)
+ : "0"(src), "1"(dst) : "d0", "d1", "d2", "d3", "d4",
+ "d5", "d6", "d7", "memory");
+ bytes -= 32;
+ }
+ while (bytes) {
+ *(uint32_t *)dst = z_readl(src);
+ src += 4;
+ dst += 4;
+ bytes -= 4;
+ }
+}
+
+static void z_memcpy_toio32(void __iomem *dst, const void *src, size_t bytes)
+{
+ while (bytes) {
+ z_writel(*(const uint32_t *)src, dst);
+ src += 4;
+ dst += 4;
+ bytes -= 4;
+ }
+}
+
+static void xs100_write(struct net_device *dev, const void *src,
+ unsigned int count)
+{
+ struct ei_device *ei_local = netdev_priv(dev);
+ struct platform_device *pdev = to_platform_device(dev->dev.parent);
+ struct xsurf100_ax_plat_data *xs100 = dev_get_platdata(&pdev->dev);
+
+ /* copy whole blocks */
+ while (count > XS100_8390_DATA_AREA_SIZE) {
+ z_memcpy_toio32(xs100->data_area +
+ XS100_8390_DATA_WRITE32_BASE, src,
+ XS100_8390_DATA_AREA_SIZE);
+ src += XS100_8390_DATA_AREA_SIZE;
+ count -= XS100_8390_DATA_AREA_SIZE;
+ }
+ /* copy whole dwords */
+ z_memcpy_toio32(xs100->data_area + XS100_8390_DATA_WRITE32_BASE,
+ src, count & ~3);
+ src += count & ~3;
+ if (count & 2) {
+ ei_outw(*(uint16_t *)src, ei_local->mem + NE_DATAPORT);
+ src += 2;
+ }
+ if (count & 1)
+ ei_outb(*(uint8_t *)src, ei_local->mem + NE_DATAPORT);
+}
+
+static void xs100_read(struct net_device *dev, void *dst, unsigned int count)
+{
+ struct ei_device *ei_local = netdev_priv(dev);
+ struct platform_device *pdev = to_platform_device(dev->dev.parent);
+ struct xsurf100_ax_plat_data *xs100 = dev_get_platdata(&pdev->dev);
+
+ /* copy whole blocks */
+ while (count > XS100_8390_DATA_AREA_SIZE) {
+ z_memcpy_fromio32(dst, xs100->data_area +
+ XS100_8390_DATA_READ32_BASE,
+ XS100_8390_DATA_AREA_SIZE);
+ dst += XS100_8390_DATA_AREA_SIZE;
+ count -= XS100_8390_DATA_AREA_SIZE;
+ }
+ /* copy whole dwords */
+ z_memcpy_fromio32(dst, xs100->data_area + XS100_8390_DATA_READ32_BASE,
+ count & ~3);
+ dst += count & ~3;
+ if (count & 2) {
+ *(uint16_t *)dst = ei_inw(ei_local->mem + NE_DATAPORT);
+ dst += 2;
+ }
+ if (count & 1)
+ *(uint8_t *)dst = ei_inb(ei_local->mem + NE_DATAPORT);
+}
+
+/* Block input and output, similar to the Crynwr packet driver. If
+ * you are porting to a new ethercard, look at the packet driver
+ * source for hints. The NEx000 doesn't share the on-board packet
+ * memory -- you have to put the packet out through the "remote DMA"
+ * dataport using ei_outb.
+ */
+static void xs100_block_input(struct net_device *dev, int count,
+ struct sk_buff *skb, int ring_offset)
+{
+ struct ei_device *ei_local = netdev_priv(dev);
+ void __iomem *nic_base = ei_local->mem;
+ char *buf = skb->data;
+
+ if (ei_local->dmaing) {
+ netdev_err(dev,
+ "DMAing conflict in %s [DMAstat:%d][irqlock:%d]\n",
+ __func__,
+ ei_local->dmaing, ei_local->irqlock);
+ return;
+ }
+
+ ei_local->dmaing |= 0x01;
+
+ ei_outb(E8390_NODMA + E8390_PAGE0 + E8390_START, nic_base + NE_CMD);
+ ei_outb(count & 0xff, nic_base + EN0_RCNTLO);
+ ei_outb(count >> 8, nic_base + EN0_RCNTHI);
+ ei_outb(ring_offset & 0xff, nic_base + EN0_RSARLO);
+ ei_outb(ring_offset >> 8, nic_base + EN0_RSARHI);
+ ei_outb(E8390_RREAD + E8390_START, nic_base + NE_CMD);
+
+ xs100_read(dev, buf, count);
+
+ ei_local->dmaing &= ~1;
+}
+
+static void xs100_block_output(struct net_device *dev, int count,
+ const unsigned char *buf, const int start_page)
+{
+ struct ei_device *ei_local = netdev_priv(dev);
+ void __iomem *nic_base = ei_local->mem;
+ unsigned long dma_start;
+
+ /* Round the count up for word writes. Do we need to do this?
+ * What effect will an odd byte count have on the 8390? I
+ * should check someday.
+ */
+ if (ei_local->word16 && (count & 0x01))
+ count++;
+
+ /* This *shouldn't* happen. If it does, it's the last thing
+ * you'll see
+ */
+ if (ei_local->dmaing) {
+ netdev_err(dev,
+ "DMAing conflict in %s [DMAstat:%d][irqlock:%d]\n",
+ __func__,
+ ei_local->dmaing, ei_local->irqlock);
+ return;
+ }
+
+ ei_local->dmaing |= 0x01;
+ /* We should already be in page 0, but to be safe... */
+ ei_outb(E8390_PAGE0 + E8390_START + E8390_NODMA, nic_base + NE_CMD);
+
+ ei_outb(ENISR_RDC, nic_base + EN0_ISR);
+
+ /* Now the normal output. */
+ ei_outb(count & 0xff, nic_base + EN0_RCNTLO);
+ ei_outb(count >> 8, nic_base + EN0_RCNTHI);
+ ei_outb(0x00, nic_base + EN0_RSARLO);
+ ei_outb(start_page, nic_base + EN0_RSARHI);
+
+ ei_outb(E8390_RWRITE + E8390_START, nic_base + NE_CMD);
+
+ xs100_write(dev, buf, count);
+
+ dma_start = jiffies;
+
+ while ((ei_inb(nic_base + EN0_ISR) & ENISR_RDC) == 0) {
+ if (jiffies - dma_start > 2 * HZ / 100) { /* 20ms */
+ netdev_warn(dev, "timeout waiting for Tx RDC.\n");
+ ei_local->reset_8390(dev);
+ ax_NS8390_init(dev, 1);
+ break;
+ }
+ }
+
+ ei_outb(ENISR_RDC, nic_base + EN0_ISR); /* Ack intr. */
+ ei_local->dmaing &= ~0x01;
+}
+
+static int xsurf100_probe(struct zorro_dev *zdev,
+ const struct zorro_device_id *ent)
+{
+ struct platform_device *pdev;
+ struct xsurf100_ax_plat_data ax88796_data;
+ struct resource res[2] = {
+ DEFINE_RES_NAMED(IRQ_AMIGA_PORTS, 1, NULL,
+ IORESOURCE_IRQ | IORESOURCE_IRQ_SHAREABLE),
+ DEFINE_RES_MEM(zdev->resource.start + XS100_8390_BASE,
+ 4 * 0x20)
+ };
+ int reg;
+ /* This table is referenced in the device structure, so it must
+ * outlive the scope of xsurf100_probe.
+ */
+ static u32 reg_offsets[32];
+ int ret = 0;
+
+ /* X-Surf 100 control and 32 bit ring buffer data access areas.
+ * These resources are not used by the ax88796 driver, so must
+ * be requested here and passed via platform data.
+ */
+
+ if (!request_mem_region(zdev->resource.start, 0x100, zdev->name)) {
+ dev_err(&zdev->dev, "cannot reserve X-Surf 100 control registers\n");
+ return -ENXIO;
+ }
+
+ if (!request_mem_region(zdev->resource.start +
+ XS100_8390_DATA32_BASE,
+ XS100_8390_DATA32_SIZE,
+ "X-Surf 100 32-bit data access")) {
+ dev_err(&zdev->dev, "cannot reserve 32-bit area\n");
+ ret = -ENXIO;
+ goto exit_req;
+ }
+
+ for (reg = 0; reg < 0x20; reg++)
+ reg_offsets[reg] = 4 * reg;
+
+ memset(&ax88796_data, 0, sizeof(ax88796_data));
+ ax88796_data.ax.flags = AXFLG_HAS_EEPROM;
+ ax88796_data.ax.wordlength = 2;
+ ax88796_data.ax.dcr_val = 0x48;
+ ax88796_data.ax.rcr_val = 0x40;
+ ax88796_data.ax.reg_offsets = reg_offsets;
+ ax88796_data.ax.check_irq = is_xsurf100_network_irq;
+ ax88796_data.base_regs = ioremap(zdev->resource.start, 0x100);
+
+ /* error handling for ioremap regs */
+ if (!ax88796_data.base_regs) {
+ dev_err(&zdev->dev, "Cannot ioremap area %pR (registers)\n",
+ &zdev->resource);
+
+ ret = -ENXIO;
+ goto exit_req2;
+ }
+
+ ax88796_data.data_area = ioremap(zdev->resource.start +
+ XS100_8390_DATA32_BASE, XS100_8390_DATA32_SIZE);
+
+ /* error handling for ioremap data */
+ if (!ax88796_data.data_area) {
+ dev_err(&zdev->dev,
+ "Cannot ioremap area %pR offset %x (32-bit access)\n",
+ &zdev->resource, XS100_8390_DATA32_BASE);
+
+ ret = -ENXIO;
+ goto exit_mem;
+ }
+
+ ax88796_data.ax.block_output = xs100_block_output;
+ ax88796_data.ax.block_input = xs100_block_input;
+
+ pdev = platform_device_register_resndata(&zdev->dev, "ax88796",
+ zdev->slotaddr, res, 2,
+ &ax88796_data,
+ sizeof(ax88796_data));
+
+ if (IS_ERR(pdev)) {
+ dev_err(&zdev->dev, "cannot register platform device\n");
+ ret = -ENXIO;
+ goto exit_mem2;
+ }
+
+ zorro_set_drvdata(zdev, pdev);
+
+ if (!ret)
+ return 0;
+
+ exit_mem2:
+ iounmap(ax88796_data.data_area);
+
+ exit_mem:
+ iounmap(ax88796_data.base_regs);
+
+ exit_req2:
+ release_mem_region(zdev->resource.start + XS100_8390_DATA32_BASE,
+ XS100_8390_DATA32_SIZE);
+
+ exit_req:
+ release_mem_region(zdev->resource.start, 0x100);
+
+ return ret;
+}
+
+static void xsurf100_remove(struct zorro_dev *zdev)
+{
+ struct platform_device *pdev = zorro_get_drvdata(zdev);
+ struct xsurf100_ax_plat_data *xs100 = dev_get_platdata(&pdev->dev);
+
+ platform_device_unregister(pdev);
+
+ iounmap(xs100->base_regs);
+ release_mem_region(zdev->resource.start, 0x100);
+ iounmap(xs100->data_area);
+ release_mem_region(zdev->resource.start + XS100_8390_DATA32_BASE,
+ XS100_8390_DATA32_SIZE);
+}
+
+static const struct zorro_device_id xsurf100_zorro_tbl[] = {
+ { ZORRO_PROD_INDIVIDUAL_COMPUTERS_X_SURF100, },
+ { 0 }
+};
+
+MODULE_DEVICE_TABLE(zorro, xsurf100_zorro_tbl);
+
+static struct zorro_driver xsurf100_driver = {
+ .name = "xsurf100",
+ .id_table = xsurf100_zorro_tbl,
+ .probe = xsurf100_probe,
+ .remove = xsurf100_remove,
+};
+
+module_driver(xsurf100_driver, zorro_register_driver, zorro_unregister_driver);
+
+MODULE_DESCRIPTION("X-Surf 100 driver");
+MODULE_AUTHOR("Michael Karcher <kernel@mkarcher.dialup.fu-berlin.de>");
+MODULE_LICENSE("GPL v2");
diff --git a/drivers/net/ethernet/Kconfig b/drivers/net/ethernet/Kconfig
index 603a570..af766fd 100644
--- a/drivers/net/ethernet/Kconfig
+++ b/drivers/net/ethernet/Kconfig
@@ -33,9 +33,9 @@
source "drivers/net/ethernet/arc/Kconfig"
source "drivers/net/ethernet/atheros/Kconfig"
source "drivers/net/ethernet/aurora/Kconfig"
-source "drivers/net/ethernet/cadence/Kconfig"
source "drivers/net/ethernet/broadcom/Kconfig"
source "drivers/net/ethernet/brocade/Kconfig"
+source "drivers/net/ethernet/cadence/Kconfig"
source "drivers/net/ethernet/calxeda/Kconfig"
source "drivers/net/ethernet/cavium/Kconfig"
source "drivers/net/ethernet/chelsio/Kconfig"
@@ -72,16 +72,16 @@
source "drivers/net/ethernet/dlink/Kconfig"
source "drivers/net/ethernet/emulex/Kconfig"
source "drivers/net/ethernet/ezchip/Kconfig"
-source "drivers/net/ethernet/neterion/Kconfig"
source "drivers/net/ethernet/faraday/Kconfig"
source "drivers/net/ethernet/freescale/Kconfig"
source "drivers/net/ethernet/fujitsu/Kconfig"
source "drivers/net/ethernet/hisilicon/Kconfig"
source "drivers/net/ethernet/hp/Kconfig"
source "drivers/net/ethernet/huawei/Kconfig"
+source "drivers/net/ethernet/i825xx/Kconfig"
source "drivers/net/ethernet/ibm/Kconfig"
source "drivers/net/ethernet/intel/Kconfig"
-source "drivers/net/ethernet/i825xx/Kconfig"
+source "drivers/net/ethernet/neterion/Kconfig"
source "drivers/net/ethernet/xscale/Kconfig"
config JME
@@ -115,6 +115,7 @@
source "drivers/net/ethernet/micrel/Kconfig"
source "drivers/net/ethernet/microchip/Kconfig"
source "drivers/net/ethernet/moxa/Kconfig"
+source "drivers/net/ethernet/mscc/Kconfig"
source "drivers/net/ethernet/myricom/Kconfig"
config FEALNX
@@ -160,20 +161,21 @@
source "drivers/net/ethernet/pasemi/Kconfig"
source "drivers/net/ethernet/qlogic/Kconfig"
source "drivers/net/ethernet/qualcomm/Kconfig"
+source "drivers/net/ethernet/rdc/Kconfig"
source "drivers/net/ethernet/realtek/Kconfig"
source "drivers/net/ethernet/renesas/Kconfig"
-source "drivers/net/ethernet/rdc/Kconfig"
source "drivers/net/ethernet/rocker/Kconfig"
source "drivers/net/ethernet/samsung/Kconfig"
source "drivers/net/ethernet/seeq/Kconfig"
-source "drivers/net/ethernet/silan/Kconfig"
-source "drivers/net/ethernet/sis/Kconfig"
source "drivers/net/ethernet/sfc/Kconfig"
source "drivers/net/ethernet/sgi/Kconfig"
+source "drivers/net/ethernet/silan/Kconfig"
+source "drivers/net/ethernet/sis/Kconfig"
source "drivers/net/ethernet/smsc/Kconfig"
source "drivers/net/ethernet/socionext/Kconfig"
source "drivers/net/ethernet/stmicro/Kconfig"
source "drivers/net/ethernet/sun/Kconfig"
+source "drivers/net/ethernet/synopsys/Kconfig"
source "drivers/net/ethernet/tehuti/Kconfig"
source "drivers/net/ethernet/ti/Kconfig"
source "drivers/net/ethernet/toshiba/Kconfig"
@@ -182,6 +184,5 @@
source "drivers/net/ethernet/wiznet/Kconfig"
source "drivers/net/ethernet/xilinx/Kconfig"
source "drivers/net/ethernet/xircom/Kconfig"
-source "drivers/net/ethernet/synopsys/Kconfig"
endif # ETHERNET
diff --git a/drivers/net/ethernet/Makefile b/drivers/net/ethernet/Makefile
index 2bfd2ee..8fbfe9c 100644
--- a/drivers/net/ethernet/Makefile
+++ b/drivers/net/ethernet/Makefile
@@ -55,6 +55,7 @@
obj-$(CONFIG_NET_VENDOR_MELLANOX) += mellanox/
obj-$(CONFIG_NET_VENDOR_MICREL) += micrel/
obj-$(CONFIG_NET_VENDOR_MICROCHIP) += microchip/
+obj-$(CONFIG_NET_VENDOR_MICROSEMI) += mscc/
obj-$(CONFIG_NET_VENDOR_MOXART) += moxa/
obj-$(CONFIG_NET_VENDOR_MYRI) += myricom/
obj-$(CONFIG_FEALNX) += fealnx.o
diff --git a/drivers/net/ethernet/amd/amd8111e.c b/drivers/net/ethernet/amd/amd8111e.c
index c99e3e8..a90080f 100644
--- a/drivers/net/ethernet/amd/amd8111e.c
+++ b/drivers/net/ethernet/amd/amd8111e.c
@@ -1074,16 +1074,12 @@
amd8111e_set_coalesce(dev,TX_INTR_COAL);
coal_conf->tx_coal_type = MEDIUM_COALESCE;
}
-
- }
- else if(tx_pkt_size >= 1024){
- if (tx_pkt_size >= 1024){
- if(coal_conf->tx_coal_type != HIGH_COALESCE){
- coal_conf->tx_timeout = 4;
- coal_conf->tx_event_count = 8;
- amd8111e_set_coalesce(dev,TX_INTR_COAL);
- coal_conf->tx_coal_type = HIGH_COALESCE;
- }
+ } else if (tx_pkt_size >= 1024) {
+ if (coal_conf->tx_coal_type != HIGH_COALESCE) {
+ coal_conf->tx_timeout = 4;
+ coal_conf->tx_event_count = 8;
+ amd8111e_set_coalesce(dev, TX_INTR_COAL);
+ coal_conf->tx_coal_type = HIGH_COALESCE;
}
}
}
diff --git a/drivers/net/ethernet/amd/xgbe/xgbe-drv.c b/drivers/net/ethernet/amd/xgbe/xgbe-drv.c
index 7c204f0..24f1053 100644
--- a/drivers/net/ethernet/amd/xgbe/xgbe-drv.c
+++ b/drivers/net/ethernet/amd/xgbe/xgbe-drv.c
@@ -1312,14 +1312,83 @@
return 0;
}
+static void xgbe_free_memory(struct xgbe_prv_data *pdata)
+{
+ struct xgbe_desc_if *desc_if = &pdata->desc_if;
+
+ /* Free the ring descriptors and buffers */
+ desc_if->free_ring_resources(pdata);
+
+ /* Free the channel and ring structures */
+ xgbe_free_channels(pdata);
+}
+
+static int xgbe_alloc_memory(struct xgbe_prv_data *pdata)
+{
+ struct xgbe_desc_if *desc_if = &pdata->desc_if;
+ struct net_device *netdev = pdata->netdev;
+ int ret;
+
+ if (pdata->new_tx_ring_count) {
+ pdata->tx_ring_count = pdata->new_tx_ring_count;
+ pdata->tx_q_count = pdata->tx_ring_count;
+ pdata->new_tx_ring_count = 0;
+ }
+
+ if (pdata->new_rx_ring_count) {
+ pdata->rx_ring_count = pdata->new_rx_ring_count;
+ pdata->new_rx_ring_count = 0;
+ }
+
+ /* Calculate the Rx buffer size before allocating rings */
+ pdata->rx_buf_size = xgbe_calc_rx_buf_size(netdev, netdev->mtu);
+
+ /* Allocate the channel and ring structures */
+ ret = xgbe_alloc_channels(pdata);
+ if (ret)
+ return ret;
+
+ /* Allocate the ring descriptors and buffers */
+ ret = desc_if->alloc_ring_resources(pdata);
+ if (ret)
+ goto err_channels;
+
+ /* Initialize the service and Tx timers */
+ xgbe_init_timers(pdata);
+
+ return 0;
+
+err_channels:
+ xgbe_free_memory(pdata);
+
+ return ret;
+}
+
static int xgbe_start(struct xgbe_prv_data *pdata)
{
struct xgbe_hw_if *hw_if = &pdata->hw_if;
struct xgbe_phy_if *phy_if = &pdata->phy_if;
struct net_device *netdev = pdata->netdev;
+ unsigned int i;
int ret;
- DBGPR("-->xgbe_start\n");
+ /* Set the number of queues */
+ ret = netif_set_real_num_tx_queues(netdev, pdata->tx_ring_count);
+ if (ret) {
+ netdev_err(netdev, "error setting real tx queue count\n");
+ return ret;
+ }
+
+ ret = netif_set_real_num_rx_queues(netdev, pdata->rx_ring_count);
+ if (ret) {
+ netdev_err(netdev, "error setting real rx queue count\n");
+ return ret;
+ }
+
+ /* Set RSS lookup table data for programming */
+ for (i = 0; i < XGBE_RSS_MAX_TABLE_SIZE; i++)
+ XGMAC_SET_BITS(pdata->rss_table[i], MAC_RSSDR, DMCH,
+ i % pdata->rx_ring_count);
ret = hw_if->init(pdata);
if (ret)
@@ -1347,8 +1416,6 @@
clear_bit(XGBE_STOPPED, &pdata->dev_state);
- DBGPR("<--xgbe_start\n");
-
return 0;
err_irqs:
@@ -1426,10 +1493,22 @@
netdev_alert(pdata->netdev, "device stopped\n");
}
-static void xgbe_restart_dev(struct xgbe_prv_data *pdata)
+void xgbe_full_restart_dev(struct xgbe_prv_data *pdata)
{
- DBGPR("-->xgbe_restart_dev\n");
+ /* If not running, "restart" will happen on open */
+ if (!netif_running(pdata->netdev))
+ return;
+ xgbe_stop(pdata);
+
+ xgbe_free_memory(pdata);
+ xgbe_alloc_memory(pdata);
+
+ xgbe_start(pdata);
+}
+
+void xgbe_restart_dev(struct xgbe_prv_data *pdata)
+{
/* If not running, "restart" will happen on open */
if (!netif_running(pdata->netdev))
return;
@@ -1440,8 +1519,6 @@
xgbe_free_rx_data(pdata);
xgbe_start(pdata);
-
- DBGPR("<--xgbe_restart_dev\n");
}
static void xgbe_restart(struct work_struct *work)
@@ -1827,11 +1904,8 @@
static int xgbe_open(struct net_device *netdev)
{
struct xgbe_prv_data *pdata = netdev_priv(netdev);
- struct xgbe_desc_if *desc_if = &pdata->desc_if;
int ret;
- DBGPR("-->xgbe_open\n");
-
/* Create the various names based on netdev name */
snprintf(pdata->an_name, sizeof(pdata->an_name) - 1, "%s-pcs",
netdev_name(netdev));
@@ -1876,43 +1950,25 @@
goto err_sysclk;
}
- /* Calculate the Rx buffer size before allocating rings */
- ret = xgbe_calc_rx_buf_size(netdev, netdev->mtu);
- if (ret < 0)
- goto err_ptpclk;
- pdata->rx_buf_size = ret;
-
- /* Allocate the channel and ring structures */
- ret = xgbe_alloc_channels(pdata);
- if (ret)
- goto err_ptpclk;
-
- /* Allocate the ring descriptors and buffers */
- ret = desc_if->alloc_ring_resources(pdata);
- if (ret)
- goto err_channels;
-
INIT_WORK(&pdata->service_work, xgbe_service);
INIT_WORK(&pdata->restart_work, xgbe_restart);
INIT_WORK(&pdata->stopdev_work, xgbe_stopdev);
INIT_WORK(&pdata->tx_tstamp_work, xgbe_tx_tstamp);
- xgbe_init_timers(pdata);
+
+ ret = xgbe_alloc_memory(pdata);
+ if (ret)
+ goto err_ptpclk;
ret = xgbe_start(pdata);
if (ret)
- goto err_rings;
+ goto err_mem;
clear_bit(XGBE_DOWN, &pdata->dev_state);
- DBGPR("<--xgbe_open\n");
-
return 0;
-err_rings:
- desc_if->free_ring_resources(pdata);
-
-err_channels:
- xgbe_free_channels(pdata);
+err_mem:
+ xgbe_free_memory(pdata);
err_ptpclk:
clk_disable_unprepare(pdata->ptpclk);
@@ -1932,18 +1988,11 @@
static int xgbe_close(struct net_device *netdev)
{
struct xgbe_prv_data *pdata = netdev_priv(netdev);
- struct xgbe_desc_if *desc_if = &pdata->desc_if;
-
- DBGPR("-->xgbe_close\n");
/* Stop the device */
xgbe_stop(pdata);
- /* Free the ring descriptors and buffers */
- desc_if->free_ring_resources(pdata);
-
- /* Free the channel and ring structures */
- xgbe_free_channels(pdata);
+ xgbe_free_memory(pdata);
/* Disable the clocks */
clk_disable_unprepare(pdata->ptpclk);
@@ -1957,8 +2006,6 @@
set_bit(XGBE_DOWN, &pdata->dev_state);
- DBGPR("<--xgbe_close\n");
-
return 0;
}
diff --git a/drivers/net/ethernet/amd/xgbe/xgbe-ethtool.c b/drivers/net/ethernet/amd/xgbe/xgbe-ethtool.c
index ff397bb..a880f10 100644
--- a/drivers/net/ethernet/amd/xgbe/xgbe-ethtool.c
+++ b/drivers/net/ethernet/amd/xgbe/xgbe-ethtool.c
@@ -626,6 +626,217 @@
return 0;
}
+static int xgbe_get_module_info(struct net_device *netdev,
+ struct ethtool_modinfo *modinfo)
+{
+ struct xgbe_prv_data *pdata = netdev_priv(netdev);
+
+ return pdata->phy_if.module_info(pdata, modinfo);
+}
+
+static int xgbe_get_module_eeprom(struct net_device *netdev,
+ struct ethtool_eeprom *eeprom, u8 *data)
+{
+ struct xgbe_prv_data *pdata = netdev_priv(netdev);
+
+ return pdata->phy_if.module_eeprom(pdata, eeprom, data);
+}
+
+static void xgbe_get_ringparam(struct net_device *netdev,
+ struct ethtool_ringparam *ringparam)
+{
+ struct xgbe_prv_data *pdata = netdev_priv(netdev);
+
+ ringparam->rx_max_pending = XGBE_RX_DESC_CNT_MAX;
+ ringparam->tx_max_pending = XGBE_TX_DESC_CNT_MAX;
+ ringparam->rx_pending = pdata->rx_desc_count;
+ ringparam->tx_pending = pdata->tx_desc_count;
+}
+
+static int xgbe_set_ringparam(struct net_device *netdev,
+ struct ethtool_ringparam *ringparam)
+{
+ struct xgbe_prv_data *pdata = netdev_priv(netdev);
+ unsigned int rx, tx;
+
+ if (ringparam->rx_mini_pending || ringparam->rx_jumbo_pending) {
+ netdev_err(netdev, "unsupported ring parameter\n");
+ return -EINVAL;
+ }
+
+ if ((ringparam->rx_pending < XGBE_RX_DESC_CNT_MIN) ||
+ (ringparam->rx_pending > XGBE_RX_DESC_CNT_MAX)) {
+ netdev_err(netdev,
+ "rx ring parameter must be between %u and %u\n",
+ XGBE_RX_DESC_CNT_MIN, XGBE_RX_DESC_CNT_MAX);
+ return -EINVAL;
+ }
+
+ if ((ringparam->tx_pending < XGBE_TX_DESC_CNT_MIN) ||
+ (ringparam->tx_pending > XGBE_TX_DESC_CNT_MAX)) {
+ netdev_err(netdev,
+ "tx ring parameter must be between %u and %u\n",
+ XGBE_TX_DESC_CNT_MIN, XGBE_TX_DESC_CNT_MAX);
+ return -EINVAL;
+ }
+
+ rx = __rounddown_pow_of_two(ringparam->rx_pending);
+ if (rx != ringparam->rx_pending)
+ netdev_notice(netdev,
+ "rx ring parameter rounded to power of two: %u\n",
+ rx);
+
+ tx = __rounddown_pow_of_two(ringparam->tx_pending);
+ if (tx != ringparam->tx_pending)
+ netdev_notice(netdev,
+ "tx ring parameter rounded to power of two: %u\n",
+ tx);
+
+ if ((rx == pdata->rx_desc_count) &&
+ (tx == pdata->tx_desc_count))
+ goto out;
+
+ pdata->rx_desc_count = rx;
+ pdata->tx_desc_count = tx;
+
+ xgbe_restart_dev(pdata);
+
+out:
+ return 0;
+}
+
+static void xgbe_get_channels(struct net_device *netdev,
+ struct ethtool_channels *channels)
+{
+ struct xgbe_prv_data *pdata = netdev_priv(netdev);
+ unsigned int rx, tx, combined;
+
+ /* Calculate maximums allowed:
+ * - Take into account the number of available IRQs
+ * - Do not take into account the number of online CPUs so that
+ * the user can over-subscribe if desired
+ * - Tx is additionally limited by the number of hardware queues
+ */
+ rx = min(pdata->hw_feat.rx_ch_cnt, pdata->rx_max_channel_count);
+ rx = min(rx, pdata->channel_irq_count);
+ tx = min(pdata->hw_feat.tx_ch_cnt, pdata->tx_max_channel_count);
+ tx = min(tx, pdata->channel_irq_count);
+ tx = min(tx, pdata->tx_max_q_count);
+
+ combined = min(rx, tx);
+
+ channels->max_combined = combined;
+ channels->max_rx = rx ? rx - 1 : 0;
+ channels->max_tx = tx ? tx - 1 : 0;
+
+ /* Get current settings based on device state */
+ rx = pdata->new_rx_ring_count ? : pdata->rx_ring_count;
+ tx = pdata->new_tx_ring_count ? : pdata->tx_ring_count;
+
+ combined = min(rx, tx);
+ rx -= combined;
+ tx -= combined;
+
+ channels->combined_count = combined;
+ channels->rx_count = rx;
+ channels->tx_count = tx;
+}
+
+static void xgbe_print_set_channels_input(struct net_device *netdev,
+ struct ethtool_channels *channels)
+{
+ netdev_err(netdev, "channel inputs: combined=%u, rx-only=%u, tx-only=%u\n",
+ channels->combined_count, channels->rx_count,
+ channels->tx_count);
+}
+
+static int xgbe_set_channels(struct net_device *netdev,
+ struct ethtool_channels *channels)
+{
+ struct xgbe_prv_data *pdata = netdev_priv(netdev);
+ unsigned int rx, rx_curr, tx, tx_curr, combined;
+
+ /* Calculate maximums allowed:
+ * - Take into account the number of available IRQs
+ * - Do not take into account the number of online CPUs so that
+ * the user can over-subscribe if desired
+ * - Tx is additionally limited by the number of hardware queues
+ */
+ rx = min(pdata->hw_feat.rx_ch_cnt, pdata->rx_max_channel_count);
+ rx = min(rx, pdata->channel_irq_count);
+ tx = min(pdata->hw_feat.tx_ch_cnt, pdata->tx_max_channel_count);
+ tx = min(tx, pdata->tx_max_q_count);
+ tx = min(tx, pdata->channel_irq_count);
+
+ combined = min(rx, tx);
+
+ /* Should not be setting other count */
+ if (channels->other_count) {
+ netdev_err(netdev,
+ "other channel count must be zero\n");
+ return -EINVAL;
+ }
+
+ /* Require at least one Combined (Rx and Tx) channel */
+ if (!channels->combined_count) {
+ netdev_err(netdev,
+ "at least one combined Rx/Tx channel is required\n");
+ xgbe_print_set_channels_input(netdev, channels);
+ return -EINVAL;
+ }
+
+ /* Check combined channels */
+ if (channels->combined_count > combined) {
+ netdev_err(netdev,
+ "combined channel count cannot exceed %u\n",
+ combined);
+ xgbe_print_set_channels_input(netdev, channels);
+ return -EINVAL;
+ }
+
+ /* Can have some Rx-only or Tx-only channels, but not both */
+ if (channels->rx_count && channels->tx_count) {
+ netdev_err(netdev,
+ "cannot specify both Rx-only and Tx-only channels\n");
+ xgbe_print_set_channels_input(netdev, channels);
+ return -EINVAL;
+ }
+
+ /* Check that we don't exceed the maximum number of channels */
+ if ((channels->combined_count + channels->rx_count) > rx) {
+ netdev_err(netdev,
+ "total Rx channels (%u) requested exceeds maximum available (%u)\n",
+ channels->combined_count + channels->rx_count, rx);
+ xgbe_print_set_channels_input(netdev, channels);
+ return -EINVAL;
+ }
+
+ if ((channels->combined_count + channels->tx_count) > tx) {
+ netdev_err(netdev,
+ "total Tx channels (%u) requested exceeds maximum available (%u)\n",
+ channels->combined_count + channels->tx_count, tx);
+ xgbe_print_set_channels_input(netdev, channels);
+ return -EINVAL;
+ }
+
+ rx = channels->combined_count + channels->rx_count;
+ tx = channels->combined_count + channels->tx_count;
+
+ rx_curr = pdata->new_rx_ring_count ? : pdata->rx_ring_count;
+ tx_curr = pdata->new_tx_ring_count ? : pdata->tx_ring_count;
+
+ if ((rx == rx_curr) && (tx == tx_curr))
+ goto out;
+
+ pdata->new_rx_ring_count = rx;
+ pdata->new_tx_ring_count = tx;
+
+ xgbe_full_restart_dev(pdata);
+
+out:
+ return 0;
+}
+
static const struct ethtool_ops xgbe_ethtool_ops = {
.get_drvinfo = xgbe_get_drvinfo,
.get_msglevel = xgbe_get_msglevel,
@@ -646,6 +857,12 @@
.get_ts_info = xgbe_get_ts_info,
.get_link_ksettings = xgbe_get_link_ksettings,
.set_link_ksettings = xgbe_set_link_ksettings,
+ .get_module_info = xgbe_get_module_info,
+ .get_module_eeprom = xgbe_get_module_eeprom,
+ .get_ringparam = xgbe_get_ringparam,
+ .set_ringparam = xgbe_set_ringparam,
+ .get_channels = xgbe_get_channels,
+ .set_channels = xgbe_set_channels,
};
const struct ethtool_ops *xgbe_get_ethtool_ops(void)
diff --git a/drivers/net/ethernet/amd/xgbe/xgbe-main.c b/drivers/net/ethernet/amd/xgbe/xgbe-main.c
index 441d0973..b41f236 100644
--- a/drivers/net/ethernet/amd/xgbe/xgbe-main.c
+++ b/drivers/net/ethernet/amd/xgbe/xgbe-main.c
@@ -265,7 +265,6 @@
{
struct net_device *netdev = pdata->netdev;
struct device *dev = pdata->dev;
- unsigned int i;
int ret;
netdev->irq = pdata->dev_irq;
@@ -324,26 +323,9 @@
pdata->tx_ring_count, pdata->rx_ring_count);
}
- /* Set the number of queues */
- ret = netif_set_real_num_tx_queues(netdev, pdata->tx_ring_count);
- if (ret) {
- dev_err(dev, "error setting real tx queue count\n");
- return ret;
- }
-
- ret = netif_set_real_num_rx_queues(netdev, pdata->rx_ring_count);
- if (ret) {
- dev_err(dev, "error setting real rx queue count\n");
- return ret;
- }
-
- /* Initialize RSS hash key and lookup table */
+ /* Initialize RSS hash key */
netdev_rss_key_fill(pdata->rss_key, sizeof(pdata->rss_key));
- for (i = 0; i < XGBE_RSS_MAX_TABLE_SIZE; i++)
- XGMAC_SET_BITS(pdata->rss_table[i], MAC_RSSDR, DMCH,
- i % pdata->rx_ring_count);
-
XGMAC_SET_BITS(pdata->rss_options, MAC_RSSCR, IP2TE, 1);
XGMAC_SET_BITS(pdata->rss_options, MAC_RSSCR, TCP4TE, 1);
XGMAC_SET_BITS(pdata->rss_options, MAC_RSSCR, UDP4TE, 1);
diff --git a/drivers/net/ethernet/amd/xgbe/xgbe-mdio.c b/drivers/net/ethernet/amd/xgbe/xgbe-mdio.c
index 1b45cd7..4b5d625 100644
--- a/drivers/net/ethernet/amd/xgbe/xgbe-mdio.c
+++ b/drivers/net/ethernet/amd/xgbe/xgbe-mdio.c
@@ -126,6 +126,24 @@
#include "xgbe.h"
#include "xgbe-common.h"
+static int xgbe_phy_module_eeprom(struct xgbe_prv_data *pdata,
+ struct ethtool_eeprom *eeprom, u8 *data)
+{
+ if (!pdata->phy_if.phy_impl.module_eeprom)
+ return -ENXIO;
+
+ return pdata->phy_if.phy_impl.module_eeprom(pdata, eeprom, data);
+}
+
+static int xgbe_phy_module_info(struct xgbe_prv_data *pdata,
+ struct ethtool_modinfo *modinfo)
+{
+ if (!pdata->phy_if.phy_impl.module_info)
+ return -ENXIO;
+
+ return pdata->phy_if.phy_impl.module_info(pdata, modinfo);
+}
+
static void xgbe_an37_clear_interrupts(struct xgbe_prv_data *pdata)
{
int reg;
@@ -198,31 +216,8 @@
xgbe_an37_clear_interrupts(pdata);
}
-static void xgbe_an73_enable_kr_training(struct xgbe_prv_data *pdata)
-{
- unsigned int reg;
-
- reg = XMDIO_READ(pdata, MDIO_MMD_PMAPMD, MDIO_PMA_10GBR_PMD_CTRL);
-
- reg |= XGBE_KR_TRAINING_ENABLE;
- XMDIO_WRITE(pdata, MDIO_MMD_PMAPMD, MDIO_PMA_10GBR_PMD_CTRL, reg);
-}
-
-static void xgbe_an73_disable_kr_training(struct xgbe_prv_data *pdata)
-{
- unsigned int reg;
-
- reg = XMDIO_READ(pdata, MDIO_MMD_PMAPMD, MDIO_PMA_10GBR_PMD_CTRL);
-
- reg &= ~XGBE_KR_TRAINING_ENABLE;
- XMDIO_WRITE(pdata, MDIO_MMD_PMAPMD, MDIO_PMA_10GBR_PMD_CTRL, reg);
-}
-
static void xgbe_kr_mode(struct xgbe_prv_data *pdata)
{
- /* Enable KR training */
- xgbe_an73_enable_kr_training(pdata);
-
/* Set MAC to 10G speed */
pdata->hw_if.set_speed(pdata, SPEED_10000);
@@ -232,9 +227,6 @@
static void xgbe_kx_2500_mode(struct xgbe_prv_data *pdata)
{
- /* Disable KR training */
- xgbe_an73_disable_kr_training(pdata);
-
/* Set MAC to 2.5G speed */
pdata->hw_if.set_speed(pdata, SPEED_2500);
@@ -244,9 +236,6 @@
static void xgbe_kx_1000_mode(struct xgbe_prv_data *pdata)
{
- /* Disable KR training */
- xgbe_an73_disable_kr_training(pdata);
-
/* Set MAC to 1G speed */
pdata->hw_if.set_speed(pdata, SPEED_1000);
@@ -260,9 +249,6 @@
if (pdata->kr_redrv)
return xgbe_kr_mode(pdata);
- /* Disable KR training */
- xgbe_an73_disable_kr_training(pdata);
-
/* Set MAC to 10G speed */
pdata->hw_if.set_speed(pdata, SPEED_10000);
@@ -272,9 +258,6 @@
static void xgbe_x_mode(struct xgbe_prv_data *pdata)
{
- /* Disable KR training */
- xgbe_an73_disable_kr_training(pdata);
-
/* Set MAC to 1G speed */
pdata->hw_if.set_speed(pdata, SPEED_1000);
@@ -284,9 +267,6 @@
static void xgbe_sgmii_1000_mode(struct xgbe_prv_data *pdata)
{
- /* Disable KR training */
- xgbe_an73_disable_kr_training(pdata);
-
/* Set MAC to 1G speed */
pdata->hw_if.set_speed(pdata, SPEED_1000);
@@ -296,9 +276,6 @@
static void xgbe_sgmii_100_mode(struct xgbe_prv_data *pdata)
{
- /* Disable KR training */
- xgbe_an73_disable_kr_training(pdata);
-
/* Set MAC to 1G speed */
pdata->hw_if.set_speed(pdata, SPEED_1000);
@@ -354,13 +331,15 @@
xgbe_change_mode(pdata, pdata->phy_if.phy_impl.switch_mode(pdata));
}
-static void xgbe_set_mode(struct xgbe_prv_data *pdata,
+static bool xgbe_set_mode(struct xgbe_prv_data *pdata,
enum xgbe_mode mode)
{
if (mode == xgbe_cur_mode(pdata))
- return;
+ return false;
xgbe_change_mode(pdata, mode);
+
+ return true;
}
static bool xgbe_use_mode(struct xgbe_prv_data *pdata,
@@ -407,6 +386,12 @@
{
unsigned int reg;
+ /* Disable KR training for now */
+ reg = XMDIO_READ(pdata, MDIO_MMD_PMAPMD, MDIO_PMA_10GBR_PMD_CTRL);
+ reg &= ~XGBE_KR_TRAINING_ENABLE;
+ XMDIO_WRITE(pdata, MDIO_MMD_PMAPMD, MDIO_PMA_10GBR_PMD_CTRL, reg);
+
+ /* Update AN settings */
reg = XMDIO_READ(pdata, MDIO_MMD_AN, MDIO_CTRL1);
reg &= ~MDIO_AN_CTRL1_ENABLE;
@@ -504,21 +489,19 @@
XMDIO_WRITE(pdata, MDIO_MMD_PMAPMD, MDIO_PMA_10GBR_FECCTRL, reg);
/* Start KR training */
+ if (pdata->phy_if.phy_impl.kr_training_pre)
+ pdata->phy_if.phy_impl.kr_training_pre(pdata);
+
reg = XMDIO_READ(pdata, MDIO_MMD_PMAPMD, MDIO_PMA_10GBR_PMD_CTRL);
- if (reg & XGBE_KR_TRAINING_ENABLE) {
- if (pdata->phy_if.phy_impl.kr_training_pre)
- pdata->phy_if.phy_impl.kr_training_pre(pdata);
+ reg |= XGBE_KR_TRAINING_ENABLE;
+ reg |= XGBE_KR_TRAINING_START;
+ XMDIO_WRITE(pdata, MDIO_MMD_PMAPMD, MDIO_PMA_10GBR_PMD_CTRL, reg);
- reg |= XGBE_KR_TRAINING_START;
- XMDIO_WRITE(pdata, MDIO_MMD_PMAPMD, MDIO_PMA_10GBR_PMD_CTRL,
- reg);
+ netif_dbg(pdata, link, pdata->netdev,
+ "KR training initiated\n");
- netif_dbg(pdata, link, pdata->netdev,
- "KR training initiated\n");
-
- if (pdata->phy_if.phy_impl.kr_training_post)
- pdata->phy_if.phy_impl.kr_training_post(pdata);
- }
+ if (pdata->phy_if.phy_impl.kr_training_post)
+ pdata->phy_if.phy_impl.kr_training_post(pdata);
return XGBE_AN_PAGE_RECEIVED;
}
@@ -1197,21 +1180,23 @@
return 0;
}
-static int __xgbe_phy_config_aneg(struct xgbe_prv_data *pdata)
+static int __xgbe_phy_config_aneg(struct xgbe_prv_data *pdata, bool set_mode)
{
int ret;
+ mutex_lock(&pdata->an_mutex);
+
set_bit(XGBE_LINK_INIT, &pdata->dev_state);
pdata->link_check = jiffies;
ret = pdata->phy_if.phy_impl.an_config(pdata);
if (ret)
- return ret;
+ goto out;
if (pdata->phy.autoneg != AUTONEG_ENABLE) {
ret = xgbe_phy_config_fixed(pdata);
if (ret || !pdata->kr_redrv)
- return ret;
+ goto out;
netif_dbg(pdata, link, pdata->netdev, "AN redriver support\n");
} else {
@@ -1221,24 +1206,27 @@
/* Disable auto-negotiation interrupt */
disable_irq(pdata->an_irq);
- /* Start auto-negotiation in a supported mode */
- if (xgbe_use_mode(pdata, XGBE_MODE_KR)) {
- xgbe_set_mode(pdata, XGBE_MODE_KR);
- } else if (xgbe_use_mode(pdata, XGBE_MODE_KX_2500)) {
- xgbe_set_mode(pdata, XGBE_MODE_KX_2500);
- } else if (xgbe_use_mode(pdata, XGBE_MODE_KX_1000)) {
- xgbe_set_mode(pdata, XGBE_MODE_KX_1000);
- } else if (xgbe_use_mode(pdata, XGBE_MODE_SFI)) {
- xgbe_set_mode(pdata, XGBE_MODE_SFI);
- } else if (xgbe_use_mode(pdata, XGBE_MODE_X)) {
- xgbe_set_mode(pdata, XGBE_MODE_X);
- } else if (xgbe_use_mode(pdata, XGBE_MODE_SGMII_1000)) {
- xgbe_set_mode(pdata, XGBE_MODE_SGMII_1000);
- } else if (xgbe_use_mode(pdata, XGBE_MODE_SGMII_100)) {
- xgbe_set_mode(pdata, XGBE_MODE_SGMII_100);
- } else {
- enable_irq(pdata->an_irq);
- return -EINVAL;
+ if (set_mode) {
+ /* Start auto-negotiation in a supported mode */
+ if (xgbe_use_mode(pdata, XGBE_MODE_KR)) {
+ xgbe_set_mode(pdata, XGBE_MODE_KR);
+ } else if (xgbe_use_mode(pdata, XGBE_MODE_KX_2500)) {
+ xgbe_set_mode(pdata, XGBE_MODE_KX_2500);
+ } else if (xgbe_use_mode(pdata, XGBE_MODE_KX_1000)) {
+ xgbe_set_mode(pdata, XGBE_MODE_KX_1000);
+ } else if (xgbe_use_mode(pdata, XGBE_MODE_SFI)) {
+ xgbe_set_mode(pdata, XGBE_MODE_SFI);
+ } else if (xgbe_use_mode(pdata, XGBE_MODE_X)) {
+ xgbe_set_mode(pdata, XGBE_MODE_X);
+ } else if (xgbe_use_mode(pdata, XGBE_MODE_SGMII_1000)) {
+ xgbe_set_mode(pdata, XGBE_MODE_SGMII_1000);
+ } else if (xgbe_use_mode(pdata, XGBE_MODE_SGMII_100)) {
+ xgbe_set_mode(pdata, XGBE_MODE_SGMII_100);
+ } else {
+ enable_irq(pdata->an_irq);
+ ret = -EINVAL;
+ goto out;
+ }
}
/* Disable and stop any in progress auto-negotiation */
@@ -1258,16 +1246,7 @@
xgbe_an_init(pdata);
xgbe_an_restart(pdata);
- return 0;
-}
-
-static int xgbe_phy_config_aneg(struct xgbe_prv_data *pdata)
-{
- int ret;
-
- mutex_lock(&pdata->an_mutex);
-
- ret = __xgbe_phy_config_aneg(pdata);
+out:
if (ret)
set_bit(XGBE_LINK_ERR, &pdata->dev_state);
else
@@ -1278,6 +1257,16 @@
return ret;
}
+static int xgbe_phy_config_aneg(struct xgbe_prv_data *pdata)
+{
+ return __xgbe_phy_config_aneg(pdata, true);
+}
+
+static int xgbe_phy_reconfig_aneg(struct xgbe_prv_data *pdata)
+{
+ return __xgbe_phy_config_aneg(pdata, false);
+}
+
static bool xgbe_phy_aneg_done(struct xgbe_prv_data *pdata)
{
return (pdata->an_result == XGBE_AN_COMPLETE);
@@ -1334,7 +1323,8 @@
pdata->phy.duplex = DUPLEX_FULL;
- xgbe_set_mode(pdata, mode);
+ if (xgbe_set_mode(pdata, mode) && pdata->an_again)
+ xgbe_phy_reconfig_aneg(pdata);
}
static void xgbe_phy_status(struct xgbe_prv_data *pdata)
@@ -1639,4 +1629,7 @@
phy_if->phy_valid_speed = xgbe_phy_valid_speed;
phy_if->an_isr = xgbe_an_combined_isr;
+
+ phy_if->module_info = xgbe_phy_module_info;
+ phy_if->module_eeprom = xgbe_phy_module_eeprom;
}
diff --git a/drivers/net/ethernet/amd/xgbe/xgbe-pci.c b/drivers/net/ethernet/amd/xgbe/xgbe-pci.c
index 82d1f41..7b86240 100644
--- a/drivers/net/ethernet/amd/xgbe/xgbe-pci.c
+++ b/drivers/net/ethernet/amd/xgbe/xgbe-pci.c
@@ -335,16 +335,33 @@
pdata->awcr = XGBE_DMA_PCI_AWCR;
pdata->awarcr = XGBE_DMA_PCI_AWARCR;
+ /* Read the port property registers */
+ pdata->pp0 = XP_IOREAD(pdata, XP_PROP_0);
+ pdata->pp1 = XP_IOREAD(pdata, XP_PROP_1);
+ pdata->pp2 = XP_IOREAD(pdata, XP_PROP_2);
+ pdata->pp3 = XP_IOREAD(pdata, XP_PROP_3);
+ pdata->pp4 = XP_IOREAD(pdata, XP_PROP_4);
+ if (netif_msg_probe(pdata)) {
+ dev_dbg(dev, "port property 0 = %#010x\n", pdata->pp0);
+ dev_dbg(dev, "port property 1 = %#010x\n", pdata->pp1);
+ dev_dbg(dev, "port property 2 = %#010x\n", pdata->pp2);
+ dev_dbg(dev, "port property 3 = %#010x\n", pdata->pp3);
+ dev_dbg(dev, "port property 4 = %#010x\n", pdata->pp4);
+ }
+
/* Set the maximum channels and queues */
- reg = XP_IOREAD(pdata, XP_PROP_1);
- pdata->tx_max_channel_count = XP_GET_BITS(reg, XP_PROP_1, MAX_TX_DMA);
- pdata->rx_max_channel_count = XP_GET_BITS(reg, XP_PROP_1, MAX_RX_DMA);
- pdata->tx_max_q_count = XP_GET_BITS(reg, XP_PROP_1, MAX_TX_QUEUES);
- pdata->rx_max_q_count = XP_GET_BITS(reg, XP_PROP_1, MAX_RX_QUEUES);
+ pdata->tx_max_channel_count = XP_GET_BITS(pdata->pp1, XP_PROP_1,
+ MAX_TX_DMA);
+ pdata->rx_max_channel_count = XP_GET_BITS(pdata->pp1, XP_PROP_1,
+ MAX_RX_DMA);
+ pdata->tx_max_q_count = XP_GET_BITS(pdata->pp1, XP_PROP_1,
+ MAX_TX_QUEUES);
+ pdata->rx_max_q_count = XP_GET_BITS(pdata->pp1, XP_PROP_1,
+ MAX_RX_QUEUES);
if (netif_msg_probe(pdata)) {
dev_dbg(dev, "max tx/rx channel count = %u/%u\n",
pdata->tx_max_channel_count,
- pdata->tx_max_channel_count);
+ pdata->rx_max_channel_count);
dev_dbg(dev, "max tx/rx hw queue count = %u/%u\n",
pdata->tx_max_q_count, pdata->rx_max_q_count);
}
@@ -353,12 +370,13 @@
xgbe_set_counts(pdata);
/* Set the maximum fifo amounts */
- reg = XP_IOREAD(pdata, XP_PROP_2);
- pdata->tx_max_fifo_size = XP_GET_BITS(reg, XP_PROP_2, TX_FIFO_SIZE);
+ pdata->tx_max_fifo_size = XP_GET_BITS(pdata->pp2, XP_PROP_2,
+ TX_FIFO_SIZE);
pdata->tx_max_fifo_size *= 16384;
pdata->tx_max_fifo_size = min(pdata->tx_max_fifo_size,
pdata->vdata->tx_max_fifo_size);
- pdata->rx_max_fifo_size = XP_GET_BITS(reg, XP_PROP_2, RX_FIFO_SIZE);
+ pdata->rx_max_fifo_size = XP_GET_BITS(pdata->pp2, XP_PROP_2,
+ RX_FIFO_SIZE);
pdata->rx_max_fifo_size *= 16384;
pdata->rx_max_fifo_size = min(pdata->rx_max_fifo_size,
pdata->vdata->rx_max_fifo_size);
diff --git a/drivers/net/ethernet/amd/xgbe/xgbe-phy-v2.c b/drivers/net/ethernet/amd/xgbe/xgbe-phy-v2.c
index aac8843..3ceb4f9 100644
--- a/drivers/net/ethernet/amd/xgbe/xgbe-phy-v2.c
+++ b/drivers/net/ethernet/amd/xgbe/xgbe-phy-v2.c
@@ -119,6 +119,7 @@
#include <linux/kmod.h>
#include <linux/mdio.h>
#include <linux/phy.h>
+#include <linux/ethtool.h>
#include "xgbe.h"
#include "xgbe-common.h"
@@ -270,6 +271,15 @@
u8 vendor[32];
};
+#define XGBE_SFP_DIAGS_SUPPORTED(_x) \
+ ((_x)->extd[XGBE_SFP_EXTD_SFF_8472] && \
+ !((_x)->extd[XGBE_SFP_EXTD_DIAG] & XGBE_SFP_EXTD_DIAG_ADDR_CHANGE))
+
+#define XGBE_SFP_EEPROM_BASE_LEN 256
+#define XGBE_SFP_EEPROM_DIAG_LEN 256
+#define XGBE_SFP_EEPROM_MAX (XGBE_SFP_EEPROM_BASE_LEN + \
+ XGBE_SFP_EEPROM_DIAG_LEN)
+
#define XGBE_BEL_FUSE_VENDOR "BEL-FUSE "
#define XGBE_BEL_FUSE_PARTNO "1GBT-SFP06 "
@@ -327,8 +337,6 @@
unsigned int mdio_addr;
- unsigned int comm_owned;
-
/* SFP Support */
enum xgbe_sfp_comm sfp_comm;
unsigned int sfp_mux_address;
@@ -345,7 +353,6 @@
unsigned int sfp_rx_los;
unsigned int sfp_tx_fault;
unsigned int sfp_mod_absent;
- unsigned int sfp_diags;
unsigned int sfp_changed;
unsigned int sfp_phy_avail;
unsigned int sfp_cable_len;
@@ -382,12 +389,6 @@
static int xgbe_phy_i2c_xfer(struct xgbe_prv_data *pdata,
struct xgbe_i2c_op *i2c_op)
{
- struct xgbe_phy_data *phy_data = pdata->phy_data;
-
- /* Be sure we own the bus */
- if (WARN_ON(!phy_data->comm_owned))
- return -EIO;
-
return pdata->i2c_if.i2c_xfer(pdata, i2c_op);
}
@@ -549,10 +550,6 @@
static void xgbe_phy_put_comm_ownership(struct xgbe_prv_data *pdata)
{
- struct xgbe_phy_data *phy_data = pdata->phy_data;
-
- phy_data->comm_owned = 0;
-
mutex_unlock(&xgbe_phy_comm_lock);
}
@@ -562,9 +559,6 @@
unsigned long timeout;
unsigned int mutex_id;
- if (phy_data->comm_owned)
- return 0;
-
/* The I2C and MDIO/GPIO bus is multiplexed between multiple devices,
* the driver needs to take the software mutex and then the hardware
* mutexes before being able to use the busses.
@@ -593,7 +587,6 @@
XP_IOWRITE(pdata, XP_I2C_MUTEX, mutex_id);
XP_IOWRITE(pdata, XP_MDIO_MUTEX, mutex_id);
- phy_data->comm_owned = 1;
return 0;
}
@@ -867,6 +860,9 @@
struct xgbe_phy_data *phy_data = pdata->phy_data;
unsigned int phy_id = phy_data->phydev->phy_id;
+ if (phy_data->port_mode != XGBE_PORT_MODE_SFP)
+ return false;
+
if ((phy_id & 0xfffffff0) != 0x01ff0cc0)
return false;
@@ -892,8 +888,83 @@
return true;
}
+static bool xgbe_phy_belfuse_phy_quirks(struct xgbe_prv_data *pdata)
+{
+ struct xgbe_phy_data *phy_data = pdata->phy_data;
+ struct xgbe_sfp_eeprom *sfp_eeprom = &phy_data->sfp_eeprom;
+ unsigned int phy_id = phy_data->phydev->phy_id;
+ int reg;
+
+ if (phy_data->port_mode != XGBE_PORT_MODE_SFP)
+ return false;
+
+ if (memcmp(&sfp_eeprom->base[XGBE_SFP_BASE_VENDOR_NAME],
+ XGBE_BEL_FUSE_VENDOR, XGBE_SFP_BASE_VENDOR_NAME_LEN))
+ return false;
+
+ /* For Bel-Fuse, use the extra AN flag */
+ pdata->an_again = 1;
+
+ if (memcmp(&sfp_eeprom->base[XGBE_SFP_BASE_VENDOR_PN],
+ XGBE_BEL_FUSE_PARTNO, XGBE_SFP_BASE_VENDOR_PN_LEN))
+ return false;
+
+ if ((phy_id & 0xfffffff0) != 0x03625d10)
+ return false;
+
+ /* Disable RGMII mode */
+ phy_write(phy_data->phydev, 0x18, 0x7007);
+ reg = phy_read(phy_data->phydev, 0x18);
+ phy_write(phy_data->phydev, 0x18, reg & ~0x0080);
+
+ /* Enable fiber register bank */
+ phy_write(phy_data->phydev, 0x1c, 0x7c00);
+ reg = phy_read(phy_data->phydev, 0x1c);
+ reg &= 0x03ff;
+ reg &= ~0x0001;
+ phy_write(phy_data->phydev, 0x1c, 0x8000 | 0x7c00 | reg | 0x0001);
+
+ /* Power down SerDes */
+ reg = phy_read(phy_data->phydev, 0x00);
+ phy_write(phy_data->phydev, 0x00, reg | 0x00800);
+
+ /* Configure SGMII-to-Copper mode */
+ phy_write(phy_data->phydev, 0x1c, 0x7c00);
+ reg = phy_read(phy_data->phydev, 0x1c);
+ reg &= 0x03ff;
+ reg &= ~0x0006;
+ phy_write(phy_data->phydev, 0x1c, 0x8000 | 0x7c00 | reg | 0x0004);
+
+ /* Power up SerDes */
+ reg = phy_read(phy_data->phydev, 0x00);
+ phy_write(phy_data->phydev, 0x00, reg & ~0x00800);
+
+ /* Enable copper register bank */
+ phy_write(phy_data->phydev, 0x1c, 0x7c00);
+ reg = phy_read(phy_data->phydev, 0x1c);
+ reg &= 0x03ff;
+ reg &= ~0x0001;
+ phy_write(phy_data->phydev, 0x1c, 0x8000 | 0x7c00 | reg);
+
+ /* Power up SerDes */
+ reg = phy_read(phy_data->phydev, 0x00);
+ phy_write(phy_data->phydev, 0x00, reg & ~0x00800);
+
+ phy_data->phydev->supported = PHY_GBIT_FEATURES;
+ phy_data->phydev->supported |= SUPPORTED_Pause | SUPPORTED_Asym_Pause;
+ phy_data->phydev->advertising = phy_data->phydev->supported;
+
+ netif_dbg(pdata, drv, pdata->netdev,
+ "BelFuse PHY quirk in place\n");
+
+ return true;
+}
+
static void xgbe_phy_external_phy_quirks(struct xgbe_prv_data *pdata)
{
+ if (xgbe_phy_belfuse_phy_quirks(pdata))
+ return;
+
if (xgbe_phy_finisar_phy_quirks(pdata))
return;
}
@@ -910,6 +981,9 @@
if (phy_data->phydev)
return 0;
+ /* Clear the extra AN flag */
+ pdata->an_again = 0;
+
/* Check for the use of an external PHY */
if (phy_data->phydev_mode == XGBE_MDIO_MODE_NONE)
return 0;
@@ -1034,37 +1108,6 @@
return false;
}
-static bool xgbe_phy_belfuse_parse_quirks(struct xgbe_prv_data *pdata)
-{
- struct xgbe_phy_data *phy_data = pdata->phy_data;
- struct xgbe_sfp_eeprom *sfp_eeprom = &phy_data->sfp_eeprom;
-
- if (memcmp(&sfp_eeprom->base[XGBE_SFP_BASE_VENDOR_NAME],
- XGBE_BEL_FUSE_VENDOR, XGBE_SFP_BASE_VENDOR_NAME_LEN))
- return false;
-
- if (!memcmp(&sfp_eeprom->base[XGBE_SFP_BASE_VENDOR_PN],
- XGBE_BEL_FUSE_PARTNO, XGBE_SFP_BASE_VENDOR_PN_LEN)) {
- phy_data->sfp_base = XGBE_SFP_BASE_1000_SX;
- phy_data->sfp_cable = XGBE_SFP_CABLE_ACTIVE;
- phy_data->sfp_speed = XGBE_SFP_SPEED_1000;
- if (phy_data->sfp_changed)
- netif_dbg(pdata, drv, pdata->netdev,
- "Bel-Fuse SFP quirk in place\n");
- return true;
- }
-
- return false;
-}
-
-static bool xgbe_phy_sfp_parse_quirks(struct xgbe_prv_data *pdata)
-{
- if (xgbe_phy_belfuse_parse_quirks(pdata))
- return true;
-
- return false;
-}
-
static void xgbe_phy_sfp_parse_eeprom(struct xgbe_prv_data *pdata)
{
struct xgbe_phy_data *phy_data = pdata->phy_data;
@@ -1083,9 +1126,6 @@
phy_data->sfp_tx_fault = xgbe_phy_check_sfp_tx_fault(phy_data);
phy_data->sfp_rx_los = xgbe_phy_check_sfp_rx_los(phy_data);
- if (xgbe_phy_sfp_parse_quirks(pdata))
- return;
-
/* Assume ACTIVE cable unless told it is PASSIVE */
if (sfp_base[XGBE_SFP_BASE_CABLE] & XGBE_SFP_BASE_CABLE_PASSIVE) {
phy_data->sfp_cable = XGBE_SFP_CABLE_PASSIVE;
@@ -1227,13 +1267,6 @@
memcpy(&phy_data->sfp_eeprom, &sfp_eeprom, sizeof(sfp_eeprom));
- if (sfp_eeprom.extd[XGBE_SFP_EXTD_SFF_8472]) {
- u8 diag_type = sfp_eeprom.extd[XGBE_SFP_EXTD_DIAG];
-
- if (!(diag_type & XGBE_SFP_EXTD_DIAG_ADDR_CHANGE))
- phy_data->sfp_diags = 1;
- }
-
xgbe_phy_free_phy_device(pdata);
} else {
phy_data->sfp_changed = 0;
@@ -1283,7 +1316,6 @@
phy_data->sfp_rx_los = 0;
phy_data->sfp_tx_fault = 0;
phy_data->sfp_mod_absent = 1;
- phy_data->sfp_diags = 0;
phy_data->sfp_base = XGBE_SFP_BASE_UNKNOWN;
phy_data->sfp_cable = XGBE_SFP_CABLE_UNKNOWN;
phy_data->sfp_speed = XGBE_SFP_SPEED_UNKNOWN;
@@ -1326,6 +1358,130 @@
xgbe_phy_put_comm_ownership(pdata);
}
+static int xgbe_phy_module_eeprom(struct xgbe_prv_data *pdata,
+ struct ethtool_eeprom *eeprom, u8 *data)
+{
+ struct xgbe_phy_data *phy_data = pdata->phy_data;
+ u8 eeprom_addr, eeprom_data[XGBE_SFP_EEPROM_MAX];
+ struct xgbe_sfp_eeprom *sfp_eeprom;
+ unsigned int i, j, rem;
+ int ret;
+
+ rem = eeprom->len;
+
+ if (!eeprom->len) {
+ ret = -EINVAL;
+ goto done;
+ }
+
+ if ((eeprom->offset + eeprom->len) > XGBE_SFP_EEPROM_MAX) {
+ ret = -EINVAL;
+ goto done;
+ }
+
+ if (phy_data->port_mode != XGBE_PORT_MODE_SFP) {
+ ret = -ENXIO;
+ goto done;
+ }
+
+ if (!netif_running(pdata->netdev)) {
+ ret = -EIO;
+ goto done;
+ }
+
+ if (phy_data->sfp_mod_absent) {
+ ret = -EIO;
+ goto done;
+ }
+
+ ret = xgbe_phy_get_comm_ownership(pdata);
+ if (ret) {
+ ret = -EIO;
+ goto done;
+ }
+
+ ret = xgbe_phy_sfp_get_mux(pdata);
+ if (ret) {
+ netdev_err(pdata->netdev, "I2C error setting SFP MUX\n");
+ ret = -EIO;
+ goto put_own;
+ }
+
+ /* Read the SFP serial ID eeprom */
+ eeprom_addr = 0;
+ ret = xgbe_phy_i2c_read(pdata, XGBE_SFP_SERIAL_ID_ADDRESS,
+ &eeprom_addr, sizeof(eeprom_addr),
+ eeprom_data, XGBE_SFP_EEPROM_BASE_LEN);
+ if (ret) {
+ netdev_err(pdata->netdev,
+ "I2C error reading SFP EEPROM\n");
+ ret = -EIO;
+ goto put_mux;
+ }
+
+ sfp_eeprom = (struct xgbe_sfp_eeprom *)eeprom_data;
+
+ if (XGBE_SFP_DIAGS_SUPPORTED(sfp_eeprom)) {
+ /* Read the SFP diagnostic eeprom */
+ eeprom_addr = 0;
+ ret = xgbe_phy_i2c_read(pdata, XGBE_SFP_DIAG_INFO_ADDRESS,
+ &eeprom_addr, sizeof(eeprom_addr),
+ eeprom_data + XGBE_SFP_EEPROM_BASE_LEN,
+ XGBE_SFP_EEPROM_DIAG_LEN);
+ if (ret) {
+ netdev_err(pdata->netdev,
+ "I2C error reading SFP DIAGS\n");
+ ret = -EIO;
+ goto put_mux;
+ }
+ }
+
+ for (i = 0, j = eeprom->offset; i < eeprom->len; i++, j++) {
+ if ((j >= XGBE_SFP_EEPROM_BASE_LEN) &&
+ !XGBE_SFP_DIAGS_SUPPORTED(sfp_eeprom))
+ break;
+
+ data[i] = eeprom_data[j];
+ rem--;
+ }
+
+put_mux:
+ xgbe_phy_sfp_put_mux(pdata);
+
+put_own:
+ xgbe_phy_put_comm_ownership(pdata);
+
+done:
+ eeprom->len -= rem;
+
+ return ret;
+}
+
+static int xgbe_phy_module_info(struct xgbe_prv_data *pdata,
+ struct ethtool_modinfo *modinfo)
+{
+ struct xgbe_phy_data *phy_data = pdata->phy_data;
+
+ if (phy_data->port_mode != XGBE_PORT_MODE_SFP)
+ return -ENXIO;
+
+ if (!netif_running(pdata->netdev))
+ return -EIO;
+
+ if (phy_data->sfp_mod_absent)
+ return -EIO;
+
+ if (XGBE_SFP_DIAGS_SUPPORTED(&phy_data->sfp_eeprom)) {
+ modinfo->type = ETH_MODULE_SFF_8472;
+ modinfo->eeprom_len = ETH_MODULE_SFF_8472_LEN;
+ } else {
+ modinfo->type = ETH_MODULE_SFF_8079;
+ modinfo->eeprom_len = ETH_MODULE_SFF_8079_LEN;
+ }
+
+ return 0;
+}
+
static void xgbe_phy_phydev_flowctrl(struct xgbe_prv_data *pdata)
{
struct ethtool_link_ksettings *lks = &pdata->phy.lks;
@@ -1611,6 +1767,10 @@
XGBE_CLR_ADV(dlks, 1000baseKX_Full);
XGBE_CLR_ADV(dlks, 10000baseKR_Full);
+ /* Advertise FEC support is present */
+ if (pdata->fec_ability & MDIO_PMA_10GBR_FECABLE_ABLE)
+ XGBE_SET_ADV(dlks, 10000baseR_FEC);
+
switch (phy_data->port_mode) {
case XGBE_PORT_MODE_BACKPLANE:
XGBE_SET_ADV(dlks, 10000baseKR_Full);
@@ -2421,22 +2581,21 @@
static void xgbe_phy_sfp_gpio_setup(struct xgbe_prv_data *pdata)
{
struct xgbe_phy_data *phy_data = pdata->phy_data;
- unsigned int reg;
-
- reg = XP_IOREAD(pdata, XP_PROP_3);
phy_data->sfp_gpio_address = XGBE_GPIO_ADDRESS_PCA9555 +
- XP_GET_BITS(reg, XP_PROP_3, GPIO_ADDR);
+ XP_GET_BITS(pdata->pp3, XP_PROP_3,
+ GPIO_ADDR);
- phy_data->sfp_gpio_mask = XP_GET_BITS(reg, XP_PROP_3, GPIO_MASK);
+ phy_data->sfp_gpio_mask = XP_GET_BITS(pdata->pp3, XP_PROP_3,
+ GPIO_MASK);
- phy_data->sfp_gpio_rx_los = XP_GET_BITS(reg, XP_PROP_3,
+ phy_data->sfp_gpio_rx_los = XP_GET_BITS(pdata->pp3, XP_PROP_3,
GPIO_RX_LOS);
- phy_data->sfp_gpio_tx_fault = XP_GET_BITS(reg, XP_PROP_3,
+ phy_data->sfp_gpio_tx_fault = XP_GET_BITS(pdata->pp3, XP_PROP_3,
GPIO_TX_FAULT);
- phy_data->sfp_gpio_mod_absent = XP_GET_BITS(reg, XP_PROP_3,
+ phy_data->sfp_gpio_mod_absent = XP_GET_BITS(pdata->pp3, XP_PROP_3,
GPIO_MOD_ABS);
- phy_data->sfp_gpio_rate_select = XP_GET_BITS(reg, XP_PROP_3,
+ phy_data->sfp_gpio_rate_select = XP_GET_BITS(pdata->pp3, XP_PROP_3,
GPIO_RATE_SELECT);
if (netif_msg_probe(pdata)) {
@@ -2458,18 +2617,17 @@
static void xgbe_phy_sfp_comm_setup(struct xgbe_prv_data *pdata)
{
struct xgbe_phy_data *phy_data = pdata->phy_data;
- unsigned int reg, mux_addr_hi, mux_addr_lo;
+ unsigned int mux_addr_hi, mux_addr_lo;
- reg = XP_IOREAD(pdata, XP_PROP_4);
-
- mux_addr_hi = XP_GET_BITS(reg, XP_PROP_4, MUX_ADDR_HI);
- mux_addr_lo = XP_GET_BITS(reg, XP_PROP_4, MUX_ADDR_LO);
+ mux_addr_hi = XP_GET_BITS(pdata->pp4, XP_PROP_4, MUX_ADDR_HI);
+ mux_addr_lo = XP_GET_BITS(pdata->pp4, XP_PROP_4, MUX_ADDR_LO);
if (mux_addr_lo == XGBE_SFP_DIRECT)
return;
phy_data->sfp_comm = XGBE_SFP_COMM_PCA9545;
phy_data->sfp_mux_address = (mux_addr_hi << 2) + mux_addr_lo;
- phy_data->sfp_mux_channel = XP_GET_BITS(reg, XP_PROP_4, MUX_CHAN);
+ phy_data->sfp_mux_channel = XP_GET_BITS(pdata->pp4, XP_PROP_4,
+ MUX_CHAN);
if (netif_msg_probe(pdata)) {
dev_dbg(pdata->dev, "SFP: mux_address=%#x\n",
@@ -2592,13 +2750,11 @@
static int xgbe_phy_mdio_reset_setup(struct xgbe_prv_data *pdata)
{
struct xgbe_phy_data *phy_data = pdata->phy_data;
- unsigned int reg;
if (phy_data->conn_type != XGBE_CONN_TYPE_MDIO)
return 0;
- reg = XP_IOREAD(pdata, XP_PROP_3);
- phy_data->mdio_reset = XP_GET_BITS(reg, XP_PROP_3, MDIO_RESET);
+ phy_data->mdio_reset = XP_GET_BITS(pdata->pp3, XP_PROP_3, MDIO_RESET);
switch (phy_data->mdio_reset) {
case XGBE_MDIO_RESET_NONE:
case XGBE_MDIO_RESET_I2C_GPIO:
@@ -2612,12 +2768,12 @@
if (phy_data->mdio_reset == XGBE_MDIO_RESET_I2C_GPIO) {
phy_data->mdio_reset_addr = XGBE_GPIO_ADDRESS_PCA9555 +
- XP_GET_BITS(reg, XP_PROP_3,
+ XP_GET_BITS(pdata->pp3, XP_PROP_3,
MDIO_RESET_I2C_ADDR);
- phy_data->mdio_reset_gpio = XP_GET_BITS(reg, XP_PROP_3,
+ phy_data->mdio_reset_gpio = XP_GET_BITS(pdata->pp3, XP_PROP_3,
MDIO_RESET_I2C_GPIO);
} else if (phy_data->mdio_reset == XGBE_MDIO_RESET_INT_GPIO) {
- phy_data->mdio_reset_gpio = XP_GET_BITS(reg, XP_PROP_3,
+ phy_data->mdio_reset_gpio = XP_GET_BITS(pdata->pp3, XP_PROP_3,
MDIO_RESET_INT_GPIO);
}
@@ -2707,12 +2863,9 @@
static bool xgbe_phy_port_enabled(struct xgbe_prv_data *pdata)
{
- unsigned int reg;
-
- reg = XP_IOREAD(pdata, XP_PROP_0);
- if (!XP_GET_BITS(reg, XP_PROP_0, PORT_SPEEDS))
+ if (!XP_GET_BITS(pdata->pp0, XP_PROP_0, PORT_SPEEDS))
return false;
- if (!XP_GET_BITS(reg, XP_PROP_0, CONN_TYPE))
+ if (!XP_GET_BITS(pdata->pp0, XP_PROP_0, CONN_TYPE))
return false;
return true;
@@ -2921,7 +3074,6 @@
struct ethtool_link_ksettings *lks = &pdata->phy.lks;
struct xgbe_phy_data *phy_data;
struct mii_bus *mii;
- unsigned int reg;
int ret;
/* Check if enabled */
@@ -2940,12 +3092,11 @@
return -ENOMEM;
pdata->phy_data = phy_data;
- reg = XP_IOREAD(pdata, XP_PROP_0);
- phy_data->port_mode = XP_GET_BITS(reg, XP_PROP_0, PORT_MODE);
- phy_data->port_id = XP_GET_BITS(reg, XP_PROP_0, PORT_ID);
- phy_data->port_speeds = XP_GET_BITS(reg, XP_PROP_0, PORT_SPEEDS);
- phy_data->conn_type = XP_GET_BITS(reg, XP_PROP_0, CONN_TYPE);
- phy_data->mdio_addr = XP_GET_BITS(reg, XP_PROP_0, MDIO_ADDR);
+ phy_data->port_mode = XP_GET_BITS(pdata->pp0, XP_PROP_0, PORT_MODE);
+ phy_data->port_id = XP_GET_BITS(pdata->pp0, XP_PROP_0, PORT_ID);
+ phy_data->port_speeds = XP_GET_BITS(pdata->pp0, XP_PROP_0, PORT_SPEEDS);
+ phy_data->conn_type = XP_GET_BITS(pdata->pp0, XP_PROP_0, CONN_TYPE);
+ phy_data->mdio_addr = XP_GET_BITS(pdata->pp0, XP_PROP_0, MDIO_ADDR);
if (netif_msg_probe(pdata)) {
dev_dbg(pdata->dev, "port mode=%u\n", phy_data->port_mode);
dev_dbg(pdata->dev, "port id=%u\n", phy_data->port_id);
@@ -2954,12 +3105,11 @@
dev_dbg(pdata->dev, "mdio addr=%u\n", phy_data->mdio_addr);
}
- reg = XP_IOREAD(pdata, XP_PROP_4);
- phy_data->redrv = XP_GET_BITS(reg, XP_PROP_4, REDRV_PRESENT);
- phy_data->redrv_if = XP_GET_BITS(reg, XP_PROP_4, REDRV_IF);
- phy_data->redrv_addr = XP_GET_BITS(reg, XP_PROP_4, REDRV_ADDR);
- phy_data->redrv_lane = XP_GET_BITS(reg, XP_PROP_4, REDRV_LANE);
- phy_data->redrv_model = XP_GET_BITS(reg, XP_PROP_4, REDRV_MODEL);
+ phy_data->redrv = XP_GET_BITS(pdata->pp4, XP_PROP_4, REDRV_PRESENT);
+ phy_data->redrv_if = XP_GET_BITS(pdata->pp4, XP_PROP_4, REDRV_IF);
+ phy_data->redrv_addr = XP_GET_BITS(pdata->pp4, XP_PROP_4, REDRV_ADDR);
+ phy_data->redrv_lane = XP_GET_BITS(pdata->pp4, XP_PROP_4, REDRV_LANE);
+ phy_data->redrv_model = XP_GET_BITS(pdata->pp4, XP_PROP_4, REDRV_MODEL);
if (phy_data->redrv && netif_msg_probe(pdata)) {
dev_dbg(pdata->dev, "redrv present\n");
dev_dbg(pdata->dev, "redrv i/f=%u\n", phy_data->redrv_if);
@@ -3231,4 +3381,7 @@
phy_impl->kr_training_pre = xgbe_phy_kr_training_pre;
phy_impl->kr_training_post = xgbe_phy_kr_training_post;
+
+ phy_impl->module_info = xgbe_phy_module_info;
+ phy_impl->module_eeprom = xgbe_phy_module_eeprom;
}
diff --git a/drivers/net/ethernet/amd/xgbe/xgbe.h b/drivers/net/ethernet/amd/xgbe/xgbe.h
index 95d4b56..47bcbcf 100644
--- a/drivers/net/ethernet/amd/xgbe/xgbe.h
+++ b/drivers/net/ethernet/amd/xgbe/xgbe.h
@@ -144,6 +144,11 @@
#define XGBE_TX_DESC_MAX_PROC (XGBE_TX_DESC_CNT >> 1)
#define XGBE_RX_DESC_CNT 512
+#define XGBE_TX_DESC_CNT_MIN 64
+#define XGBE_TX_DESC_CNT_MAX 4096
+#define XGBE_RX_DESC_CNT_MIN 64
+#define XGBE_RX_DESC_CNT_MAX 4096
+
#define XGBE_TX_MAX_BUF_SIZE (0x3fff & ~(64 - 1))
/* Descriptors required for maximum contiguous TSO/GSO packet */
@@ -835,6 +840,7 @@
* Optional routines:
* an_pre, an_post
* kr_training_pre, kr_training_post
+ * module_info, module_eeprom
*/
struct xgbe_phy_impl_if {
/* Perform Setup/teardown actions */
@@ -883,6 +889,12 @@
/* Pre/Post KR training enablement support */
void (*kr_training_pre)(struct xgbe_prv_data *);
void (*kr_training_post)(struct xgbe_prv_data *);
+
+ /* SFP module related info */
+ int (*module_info)(struct xgbe_prv_data *pdata,
+ struct ethtool_modinfo *modinfo);
+ int (*module_eeprom)(struct xgbe_prv_data *pdata,
+ struct ethtool_eeprom *eeprom, u8 *data);
};
struct xgbe_phy_if {
@@ -905,6 +917,12 @@
/* For single interrupt support */
irqreturn_t (*an_isr)(struct xgbe_prv_data *);
+ /* For ethtool PHY support */
+ int (*module_info)(struct xgbe_prv_data *pdata,
+ struct ethtool_modinfo *modinfo);
+ int (*module_eeprom)(struct xgbe_prv_data *pdata,
+ struct ethtool_eeprom *eeprom, u8 *data);
+
/* PHY implementation specific services */
struct xgbe_phy_impl_if phy_impl;
};
@@ -1027,6 +1045,13 @@
void __iomem *xprop_regs; /* XGBE property registers */
void __iomem *xi2c_regs; /* XGBE I2C CSRs */
+ /* Port property registers */
+ unsigned int pp0;
+ unsigned int pp1;
+ unsigned int pp2;
+ unsigned int pp3;
+ unsigned int pp4;
+
/* Overall device lock */
spinlock_t lock;
@@ -1097,6 +1122,9 @@
unsigned int rx_ring_count;
unsigned int rx_desc_count;
+ unsigned int new_tx_ring_count;
+ unsigned int new_rx_ring_count;
+
unsigned int tx_max_q_count;
unsigned int rx_max_q_count;
unsigned int tx_q_count;
@@ -1233,6 +1261,7 @@
enum xgbe_rx kr_state;
enum xgbe_rx kx_state;
struct work_struct an_work;
+ unsigned int an_again;
unsigned int an_supported;
unsigned int parallel_detect;
unsigned int fec_ability;
@@ -1310,6 +1339,8 @@
int xgbe_powerdown(struct net_device *, unsigned int);
void xgbe_init_rx_coalesce(struct xgbe_prv_data *);
void xgbe_init_tx_coalesce(struct xgbe_prv_data *);
+void xgbe_restart_dev(struct xgbe_prv_data *pdata);
+void xgbe_full_restart_dev(struct xgbe_prv_data *pdata);
#ifdef CONFIG_DEBUG_FS
void xgbe_debugfs_init(struct xgbe_prv_data *);
diff --git a/drivers/net/ethernet/broadcom/bcmsysport.c b/drivers/net/ethernet/broadcom/bcmsysport.c
index f33b25f..d5fca2e 100644
--- a/drivers/net/ethernet/broadcom/bcmsysport.c
+++ b/drivers/net/ethernet/broadcom/bcmsysport.c
@@ -654,7 +654,7 @@
pkts = priv->rx_max_coalesced_frames;
if (ec->use_adaptive_rx_coalesce && !priv->dim.use_dim) {
- moder = net_dim_get_def_profile(priv->dim.dim.mode);
+ moder = net_dim_get_def_rx_moderation(priv->dim.dim.mode);
usecs = moder.usec;
pkts = moder.pkts;
}
@@ -1064,7 +1064,7 @@
struct bcm_sysport_priv *priv =
container_of(ndim, struct bcm_sysport_priv, dim);
struct net_dim_cq_moder cur_profile =
- net_dim_get_profile(dim->mode, dim->profile_ix);
+ net_dim_get_rx_moderation(dim->mode, dim->profile_ix);
bcm_sysport_set_rx_coalesce(priv, cur_profile.usec, cur_profile.pkts);
dim->state = NET_DIM_START_MEASURE;
@@ -1437,7 +1437,7 @@
/* If DIM was enabled, re-apply default parameters */
if (dim->use_dim) {
- moder = net_dim_get_def_profile(dim->dim.mode);
+ moder = net_dim_get_def_rx_moderation(dim->dim.mode);
usecs = moder.usec;
pkts = moder.pkts;
}
diff --git a/drivers/net/ethernet/broadcom/bnxt/Makefile b/drivers/net/ethernet/broadcom/bnxt/Makefile
index 7c560d5..5a779b1 100644
--- a/drivers/net/ethernet/broadcom/bnxt/Makefile
+++ b/drivers/net/ethernet/broadcom/bnxt/Makefile
@@ -2,3 +2,4 @@
bnxt_en-y := bnxt.o bnxt_sriov.o bnxt_ethtool.o bnxt_dcb.o bnxt_ulp.o bnxt_xdp.o bnxt_vfr.o bnxt_devlink.o bnxt_dim.o
bnxt_en-$(CONFIG_BNXT_FLOWER_OFFLOAD) += bnxt_tc.o
+bnxt_en-$(CONFIG_DEBUG_FS) += bnxt_debugfs.o
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index f83769d..dfa0839 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -62,6 +62,7 @@
#include "bnxt_vfr.h"
#include "bnxt_tc.h"
#include "bnxt_devlink.h"
+#include "bnxt_debugfs.h"
#define BNXT_TX_TIMEOUT (5 * HZ)
@@ -2383,6 +2384,7 @@
for (i = 0, j = 0; i < bp->tx_nr_rings; i++) {
struct bnxt_tx_ring_info *txr = &bp->tx_ring[i];
struct bnxt_ring_struct *ring;
+ u8 qidx;
ring = &txr->tx_ring_struct;
@@ -2411,7 +2413,8 @@
memset(txr->tx_push, 0, sizeof(struct tx_push_bd));
}
- ring->queue_id = bp->q_info[j].queue_id;
+ qidx = bp->tc_to_qidx[j];
+ ring->queue_id = bp->q_info[qidx].queue_id;
if (i < bp->tx_nr_rings_xdp)
continue;
if (i % bp->tx_nr_rings_per_tc == (bp->tx_nr_rings_per_tc - 1))
@@ -3493,15 +3496,29 @@
if (!timeout)
timeout = DFLT_HWRM_CMD_TIMEOUT;
+ /* convert timeout to usec */
+ timeout *= 1000;
i = 0;
- tmo_count = timeout * 40;
+ /* Short timeout for the first few iterations:
+ * number of loops = number of loops for short timeout +
+ * number of loops for standard timeout.
+ */
+ tmo_count = HWRM_SHORT_TIMEOUT_COUNTER;
+ timeout = timeout - HWRM_SHORT_MIN_TIMEOUT * HWRM_SHORT_TIMEOUT_COUNTER;
+ tmo_count += DIV_ROUND_UP(timeout, HWRM_MIN_TIMEOUT);
resp_len = bp->hwrm_cmd_resp_addr + HWRM_RESP_LEN_OFFSET;
if (intr_process) {
/* Wait until hwrm response cmpl interrupt is processed */
while (bp->hwrm_intr_seq_id != HWRM_SEQ_ID_INVALID &&
i++ < tmo_count) {
- usleep_range(25, 40);
+ /* on first few passes, just barely sleep */
+ if (i < HWRM_SHORT_TIMEOUT_COUNTER)
+ usleep_range(HWRM_SHORT_MIN_TIMEOUT,
+ HWRM_SHORT_MAX_TIMEOUT);
+ else
+ usleep_range(HWRM_MIN_TIMEOUT,
+ HWRM_MAX_TIMEOUT);
}
if (bp->hwrm_intr_seq_id != HWRM_SEQ_ID_INVALID) {
@@ -3513,25 +3530,34 @@
HWRM_RESP_LEN_SFT;
valid = bp->hwrm_cmd_resp_addr + len - 1;
} else {
+ int j;
+
/* Check if response len is updated */
for (i = 0; i < tmo_count; i++) {
len = (le32_to_cpu(*resp_len) & HWRM_RESP_LEN_MASK) >>
HWRM_RESP_LEN_SFT;
if (len)
break;
- usleep_range(25, 40);
+ /* on first few passes, just barely sleep */
+ if (i < DFLT_HWRM_CMD_TIMEOUT)
+ usleep_range(HWRM_SHORT_MIN_TIMEOUT,
+ HWRM_SHORT_MAX_TIMEOUT);
+ else
+ usleep_range(HWRM_MIN_TIMEOUT,
+ HWRM_MAX_TIMEOUT);
}
if (i >= tmo_count) {
netdev_err(bp->dev, "Error (timeout: %d) msg {0x%x 0x%x} len:%d\n",
- timeout, le16_to_cpu(req->req_type),
+ HWRM_TOTAL_TIMEOUT(i),
+ le16_to_cpu(req->req_type),
le16_to_cpu(req->seq_id), len);
return -1;
}
/* Last byte of resp contains valid bit */
valid = bp->hwrm_cmd_resp_addr + len - 1;
- for (i = 0; i < 5; i++) {
+ for (j = 0; j < HWRM_VALID_BIT_DELAY_USEC; j++) {
/* make sure we read from updated DMA memory */
dma_rmb();
if (*valid)
@@ -3539,9 +3565,10 @@
udelay(1);
}
- if (i >= 5) {
+ if (j >= HWRM_VALID_BIT_DELAY_USEC) {
netdev_err(bp->dev, "Error (timeout: %d) msg {0x%x 0x%x} len:%d v:%d\n",
- timeout, le16_to_cpu(req->req_type),
+ HWRM_TOTAL_TIMEOUT(i),
+ le16_to_cpu(req->req_type),
le16_to_cpu(req->seq_id), len, *valid);
return -1;
}
@@ -4334,26 +4361,9 @@
mutex_unlock(&bp->hwrm_cmd_lock);
if (rc || err) {
- switch (ring_type) {
- case RING_FREE_REQ_RING_TYPE_L2_CMPL:
- netdev_err(bp->dev, "hwrm_ring_alloc cp failed. rc:%x err:%x\n",
- rc, err);
- return -1;
-
- case RING_FREE_REQ_RING_TYPE_RX:
- netdev_err(bp->dev, "hwrm_ring_alloc rx failed. rc:%x err:%x\n",
- rc, err);
- return -1;
-
- case RING_FREE_REQ_RING_TYPE_TX:
- netdev_err(bp->dev, "hwrm_ring_alloc tx failed. rc:%x err:%x\n",
- rc, err);
- return -1;
-
- default:
- netdev_err(bp->dev, "Invalid ring\n");
- return -1;
- }
+ netdev_err(bp->dev, "hwrm_ring_alloc type %d failed. rc:%x err:%x\n",
+ ring_type, rc, err);
+ return -EIO;
}
ring->fw_ring_id = ring_id;
return rc;
@@ -4477,23 +4487,9 @@
mutex_unlock(&bp->hwrm_cmd_lock);
if (rc || error_code) {
- switch (ring_type) {
- case RING_FREE_REQ_RING_TYPE_L2_CMPL:
- netdev_err(bp->dev, "hwrm_ring_free cp failed. rc:%d\n",
- rc);
- return rc;
- case RING_FREE_REQ_RING_TYPE_RX:
- netdev_err(bp->dev, "hwrm_ring_free rx failed. rc:%d\n",
- rc);
- return rc;
- case RING_FREE_REQ_RING_TYPE_TX:
- netdev_err(bp->dev, "hwrm_ring_free tx failed. rc:%d\n",
- rc);
- return rc;
- default:
- netdev_err(bp->dev, "Invalid ring\n");
- return -1;
- }
+ netdev_err(bp->dev, "hwrm_ring_free type %d failed. rc:%x err:%x\n",
+ ring_type, rc, error_code);
+ return -EIO;
}
return 0;
}
@@ -4721,6 +4717,10 @@
__bnxt_hwrm_reserve_vf_rings(bp, &req, tx_rings, rx_rings, ring_grps,
cp_rings, vnics);
+ req.enables |= cpu_to_le32(FUNC_VF_CFG_REQ_ENABLES_NUM_RSSCOS_CTXS |
+ FUNC_VF_CFG_REQ_ENABLES_NUM_L2_CTXS);
+ req.num_rsscos_ctxs = cpu_to_le16(BNXT_VF_MAX_RSS_CTX);
+ req.num_l2_ctxs = cpu_to_le16(BNXT_VF_MAX_L2_CTX);
rc = hwrm_send_message(bp, &req, sizeof(req), HWRM_CMD_TIMEOUT);
if (rc)
return -ENOMEM;
@@ -5309,6 +5309,7 @@
for (i = 0; i < bp->max_tc; i++) {
bp->q_info[i].queue_id = *qptr++;
bp->q_info[i].queue_profile = *qptr++;
+ bp->tc_to_qidx[i] = i;
}
qportcfg_exit:
@@ -5376,7 +5377,8 @@
struct tm tm;
time64_t now = ktime_get_real_seconds();
- if (bp->hwrm_spec_code < 0x10400)
+ if ((BNXT_VF(bp) && bp->hwrm_spec_code < 0x10901) ||
+ bp->hwrm_spec_code < 0x10400)
return -EOPNOTSUPP;
time64_to_tm(now, 0, &tm);
@@ -5958,6 +5960,9 @@
if (total_vecs > max)
total_vecs = max;
+ if (!total_vecs)
+ return 0;
+
msix_ent = kcalloc(total_vecs, sizeof(struct msix_entry), GFP_KERNEL);
if (!msix_ent)
return -ENOMEM;
@@ -6457,6 +6462,9 @@
}
mutex_unlock(&bp->hwrm_cmd_lock);
+ if (!BNXT_SINGLE_PF(bp))
+ return 0;
+
diff = link_info->support_auto_speeds ^ link_info->advertising;
if ((link_info->support_auto_speeds | diff) !=
link_info->support_auto_speeds) {
@@ -6843,6 +6851,8 @@
}
}
+static int bnxt_init_dflt_ring_mode(struct bnxt *bp);
+
static int __bnxt_open_nic(struct bnxt *bp, bool irq_re_init, bool link_re_init)
{
int rc = 0;
@@ -6850,6 +6860,12 @@
bnxt_preset_reg_win(bp);
netif_carrier_off(bp->dev);
if (irq_re_init) {
+ /* Reserve rings now if none were reserved at driver probe. */
+ rc = bnxt_init_dflt_ring_mode(bp);
+ if (rc) {
+ netdev_err(bp->dev, "Failed to reserve default rings at open\n");
+ return rc;
+ }
rc = bnxt_reserve_rings(bp);
if (rc)
return rc;
@@ -6877,6 +6893,7 @@
}
bnxt_enable_napi(bp);
+ bnxt_debug_dev_init(bp);
rc = bnxt_init_nic(bp, irq_re_init);
if (rc) {
@@ -6909,6 +6926,7 @@
return 0;
open_err:
+ bnxt_debug_dev_exit(bp);
bnxt_disable_napi(bp);
bnxt_del_napi(bp);
@@ -7002,6 +7020,7 @@
/* TODO CHIMP_FW: Link/PHY related cleanup if (link_re_init) */
+ bnxt_debug_dev_exit(bp);
bnxt_disable_napi(bp);
del_timer_sync(&bp->timer);
bnxt_free_skbs(bp);
@@ -7279,6 +7298,25 @@
return rc;
}
+static bool bnxt_can_reserve_rings(struct bnxt *bp)
+{
+#ifdef CONFIG_BNXT_SRIOV
+ if ((bp->flags & BNXT_FLAG_NEW_RM) && BNXT_VF(bp)) {
+ struct bnxt_hw_resc *hw_resc = &bp->hw_resc;
+
+ /* No minimum rings were provisioned by the PF. Don't
+ * reserve rings by default when device is down.
+ */
+ if (hw_resc->min_tx_rings || hw_resc->resv_tx_rings)
+ return true;
+
+ if (!netif_running(bp->dev))
+ return false;
+ }
+#endif
+ return true;
+}
+
/* If the chip and firmware supports RFS */
static bool bnxt_rfs_supported(struct bnxt *bp)
{
@@ -7295,7 +7333,7 @@
#ifdef CONFIG_RFS_ACCEL
int vnics, max_vnics, max_rss_ctxs;
- if (!(bp->flags & BNXT_FLAG_MSIX_CAP))
+ if (!(bp->flags & BNXT_FLAG_MSIX_CAP) || !bnxt_can_reserve_rings(bp))
return false;
vnics = 1 + bp->rx_nr_rings;
@@ -7729,7 +7767,7 @@
coal->coal_bufs = 30;
coal->coal_ticks_irq = 1;
coal->coal_bufs_irq = 2;
- coal->idle_thresh = 25;
+ coal->idle_thresh = 50;
coal->bufs_per_record = 2;
coal->budget = 64; /* NAPI budget */
@@ -8529,6 +8567,9 @@
{
int dflt_rings, max_rx_rings, max_tx_rings, rc;
+ if (!bnxt_can_reserve_rings(bp))
+ return 0;
+
if (sh)
bp->flags |= BNXT_FLAG_SHARED_RINGS;
dflt_rings = netif_get_num_default_rss_queues();
@@ -8574,6 +8615,29 @@
return rc;
}
+static int bnxt_init_dflt_ring_mode(struct bnxt *bp)
+{
+ int rc;
+
+ if (bp->tx_nr_rings)
+ return 0;
+
+ rc = bnxt_set_dflt_rings(bp, true);
+ if (rc) {
+ netdev_err(bp->dev, "Not enough rings available.\n");
+ return rc;
+ }
+ rc = bnxt_init_int_mode(bp);
+ if (rc)
+ return rc;
+ bp->tx_nr_rings_per_tc = bp->tx_nr_rings;
+ if (bnxt_rfs_supported(bp) && bnxt_rfs_capable(bp)) {
+ bp->flags |= BNXT_FLAG_RFS;
+ bp->dev->features |= NETIF_F_NTUPLE;
+ }
+ return 0;
+}
+
int bnxt_restore_pf_fw_resources(struct bnxt *bp)
{
int rc;
@@ -8614,8 +8678,8 @@
memcpy(bp->dev->dev_addr, vf->mac_addr, ETH_ALEN);
} else {
eth_hw_addr_random(bp->dev);
- rc = bnxt_approve_mac(bp, bp->dev->dev_addr);
}
+ rc = bnxt_approve_mac(bp, bp->dev->dev_addr);
#endif
}
return rc;
@@ -9078,6 +9142,7 @@
static int __init bnxt_init(void)
{
+ bnxt_debug_init();
return pci_register_driver(&bnxt_pci_driver);
}
@@ -9086,6 +9151,7 @@
pci_unregister_driver(&bnxt_pci_driver);
if (bnxt_pf_wq)
destroy_workqueue(bnxt_pf_wq);
+ bnxt_debug_exit();
}
module_init(bnxt_init);
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.h b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
index 3d55d3b..9b14eb6 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.h
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
@@ -532,6 +532,19 @@
#define BNXT_HWRM_REQ_MAX_SIZE 128
#define BNXT_HWRM_REQS_PER_PAGE (BNXT_PAGE_SIZE / \
BNXT_HWRM_REQ_MAX_SIZE)
+#define HWRM_SHORT_MIN_TIMEOUT 3
+#define HWRM_SHORT_MAX_TIMEOUT 10
+#define HWRM_SHORT_TIMEOUT_COUNTER 5
+
+#define HWRM_MIN_TIMEOUT 25
+#define HWRM_MAX_TIMEOUT 40
+
+#define HWRM_TOTAL_TIMEOUT(n) (((n) <= HWRM_SHORT_TIMEOUT_COUNTER) ? \
+ ((n) * HWRM_SHORT_MIN_TIMEOUT) : \
+ (HWRM_SHORT_TIMEOUT_COUNTER * HWRM_SHORT_MIN_TIMEOUT + \
+ ((n) - HWRM_SHORT_TIMEOUT_COUNTER) * HWRM_MIN_TIMEOUT))
+
+#define HWRM_VALID_BIT_DELAY_USEC 20
#define BNXT_RX_EVENT 1
#define BNXT_AGG_EVENT 2
@@ -1242,6 +1255,7 @@
u8 max_tc;
u8 max_lltc; /* lossless TCs */
struct bnxt_queue_info q_info[BNXT_MAX_QUEUE];
+ u8 tc_to_qidx[BNXT_MAX_QUEUE];
unsigned int current_interval;
#define BNXT_TIMER_INTERVAL HZ
@@ -1384,6 +1398,8 @@
u16 *cfa_code_map; /* cfa_code -> vf_idx map */
u8 switch_id[8];
struct bnxt_tc_info *tc_info;
+ struct dentry *debugfs_pdev;
+ struct dentry *debugfs_dim;
};
#define BNXT_RX_STATS_OFFSET(counter) \
@@ -1398,8 +1414,7 @@
#define I2C_DEV_ADDR_A0 0xa0
#define I2C_DEV_ADDR_A2 0xa2
-#define SFP_EEPROM_SFF_8472_COMP_ADDR 0x5e
-#define SFP_EEPROM_SFF_8472_COMP_SIZE 1
+#define SFF_DIAG_SUPPORT_OFFSET 0x5c
#define SFF_MODULE_ID_SFP 0x3
#define SFF_MODULE_ID_QSFP 0xc
#define SFF_MODULE_ID_QSFP_PLUS 0xd
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_dcb.c b/drivers/net/ethernet/broadcom/bnxt/bnxt_dcb.c
index 3c746f2..d5bc72c 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt_dcb.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_dcb.c
@@ -21,6 +21,21 @@
#include "bnxt_dcb.h"
#ifdef CONFIG_BNXT_DCB
+static int bnxt_queue_to_tc(struct bnxt *bp, u8 queue_id)
+{
+ int i, j;
+
+ for (i = 0; i < bp->max_tc; i++) {
+ if (bp->q_info[i].queue_id == queue_id) {
+ for (j = 0; j < bp->max_tc; j++) {
+ if (bp->tc_to_qidx[j] == i)
+ return j;
+ }
+ }
+ }
+ return -EINVAL;
+}
+
static int bnxt_hwrm_queue_pri2cos_cfg(struct bnxt *bp, struct ieee_ets *ets)
{
struct hwrm_queue_pri2cos_cfg_input req = {0};
@@ -33,10 +48,13 @@
pri2cos = &req.pri0_cos_queue_id;
for (i = 0; i < IEEE_8021QAZ_MAX_TCS; i++) {
+ u8 qidx;
+
req.enables |= cpu_to_le32(
QUEUE_PRI2COS_CFG_REQ_ENABLES_PRI0_COS_QUEUE_ID << i);
- pri2cos[i] = bp->q_info[ets->prio_tc[i]].queue_id;
+ qidx = bp->tc_to_qidx[ets->prio_tc[i]];
+ pri2cos[i] = bp->q_info[qidx].queue_id;
}
rc = hwrm_send_message(bp, &req, sizeof(req), HWRM_CMD_TIMEOUT);
return rc;
@@ -55,17 +73,15 @@
rc = _hwrm_send_message(bp, &req, sizeof(req), HWRM_CMD_TIMEOUT);
if (!rc) {
u8 *pri2cos = &resp->pri0_cos_queue_id;
- int i, j;
+ int i;
for (i = 0; i < IEEE_8021QAZ_MAX_TCS; i++) {
u8 queue_id = pri2cos[i];
+ int tc;
- for (j = 0; j < bp->max_tc; j++) {
- if (bp->q_info[j].queue_id == queue_id) {
- ets->prio_tc[i] = j;
- break;
- }
- }
+ tc = bnxt_queue_to_tc(bp, queue_id);
+ if (tc >= 0)
+ ets->prio_tc[i] = tc;
}
}
mutex_unlock(&bp->hwrm_cmd_lock);
@@ -81,13 +97,15 @@
void *data;
bnxt_hwrm_cmd_hdr_init(bp, &req, HWRM_QUEUE_COS2BW_CFG, -1, -1);
- data = &req.unused_0;
- for (i = 0; i < max_tc; i++, data += sizeof(cos2bw) - 4) {
+ for (i = 0; i < max_tc; i++) {
+ u8 qidx;
+
req.enables |= cpu_to_le32(
QUEUE_COS2BW_CFG_REQ_ENABLES_COS_QUEUE_ID0_VALID << i);
memset(&cos2bw, 0, sizeof(cos2bw));
- cos2bw.queue_id = bp->q_info[i].queue_id;
+ qidx = bp->tc_to_qidx[i];
+ cos2bw.queue_id = bp->q_info[qidx].queue_id;
if (ets->tc_tsa[i] == IEEE_8021QAZ_TSA_STRICT) {
cos2bw.tsa =
QUEUE_COS2BW_QCFG_RESP_QUEUE_ID0_TSA_ASSIGN_SP;
@@ -103,8 +121,9 @@
cpu_to_le32((ets->tc_tx_bw[i] * 100) |
BW_VALUE_UNIT_PERCENT1_100);
}
+ data = &req.unused_0 + qidx * (sizeof(cos2bw) - 4);
memcpy(data, &cos2bw.queue_id, sizeof(cos2bw) - 4);
- if (i == 0) {
+ if (qidx == 0) {
req.queue_id0 = cos2bw.queue_id;
req.unused_0 = 0;
}
@@ -132,66 +151,81 @@
data = &resp->queue_id0 + offsetof(struct bnxt_cos2bw_cfg, queue_id);
for (i = 0; i < bp->max_tc; i++, data += sizeof(cos2bw) - 4) {
- int j;
+ int tc;
memcpy(&cos2bw.queue_id, data, sizeof(cos2bw) - 4);
if (i == 0)
cos2bw.queue_id = resp->queue_id0;
- for (j = 0; j < bp->max_tc; j++) {
- if (bp->q_info[j].queue_id != cos2bw.queue_id)
- continue;
- if (cos2bw.tsa ==
- QUEUE_COS2BW_QCFG_RESP_QUEUE_ID0_TSA_ASSIGN_SP) {
- ets->tc_tsa[j] = IEEE_8021QAZ_TSA_STRICT;
- } else {
- ets->tc_tsa[j] = IEEE_8021QAZ_TSA_ETS;
- ets->tc_tx_bw[j] = cos2bw.bw_weight;
- }
+ tc = bnxt_queue_to_tc(bp, cos2bw.queue_id);
+ if (tc < 0)
+ continue;
+
+ if (cos2bw.tsa ==
+ QUEUE_COS2BW_QCFG_RESP_QUEUE_ID0_TSA_ASSIGN_SP) {
+ ets->tc_tsa[tc] = IEEE_8021QAZ_TSA_STRICT;
+ } else {
+ ets->tc_tsa[tc] = IEEE_8021QAZ_TSA_ETS;
+ ets->tc_tx_bw[tc] = cos2bw.bw_weight;
}
}
mutex_unlock(&bp->hwrm_cmd_lock);
return 0;
}
-static int bnxt_hwrm_queue_cfg(struct bnxt *bp, unsigned int lltc_mask)
+static int bnxt_queue_remap(struct bnxt *bp, unsigned int lltc_mask)
{
- struct hwrm_queue_cfg_input req = {0};
- int i;
+ unsigned long qmap = 0;
+ int max = bp->max_tc;
+ int i, j, rc;
- if (netif_running(bp->dev))
- bnxt_tx_disable(bp);
-
- bnxt_hwrm_cmd_hdr_init(bp, &req, HWRM_QUEUE_CFG, -1, -1);
- req.flags = cpu_to_le32(QUEUE_CFG_REQ_FLAGS_PATH_BIDIR);
- req.enables = cpu_to_le32(QUEUE_CFG_REQ_ENABLES_SERVICE_PROFILE);
-
- /* Configure lossless queues to lossy first */
- req.service_profile = QUEUE_CFG_REQ_SERVICE_PROFILE_LOSSY;
- for (i = 0; i < bp->max_tc; i++) {
- if (BNXT_LLQ(bp->q_info[i].queue_profile)) {
- req.queue_id = cpu_to_le32(bp->q_info[i].queue_id);
- hwrm_send_message(bp, &req, sizeof(req),
- HWRM_CMD_TIMEOUT);
- bp->q_info[i].queue_profile =
- QUEUE_CFG_REQ_SERVICE_PROFILE_LOSSY;
- }
- }
-
- /* Now configure desired queues to lossless */
- req.service_profile = QUEUE_CFG_REQ_SERVICE_PROFILE_LOSSLESS;
- for (i = 0; i < bp->max_tc; i++) {
+ /* Assign lossless TCs first */
+ for (i = 0, j = 0; i < max; ) {
if (lltc_mask & (1 << i)) {
- req.queue_id = cpu_to_le32(bp->q_info[i].queue_id);
- hwrm_send_message(bp, &req, sizeof(req),
- HWRM_CMD_TIMEOUT);
- bp->q_info[i].queue_profile =
- QUEUE_CFG_REQ_SERVICE_PROFILE_LOSSLESS;
+ if (BNXT_LLQ(bp->q_info[j].queue_profile)) {
+ bp->tc_to_qidx[i] = j;
+ __set_bit(j, &qmap);
+ i++;
+ }
+ j++;
+ continue;
+ }
+ i++;
+ }
+
+ for (i = 0, j = 0; i < max; i++) {
+ if (lltc_mask & (1 << i))
+ continue;
+ j = find_next_zero_bit(&qmap, max, j);
+ bp->tc_to_qidx[i] = j;
+ __set_bit(j, &qmap);
+ j++;
+ }
+
+ if (netif_running(bp->dev)) {
+ bnxt_close_nic(bp, false, false);
+ rc = bnxt_open_nic(bp, false, false);
+ if (rc) {
+ netdev_warn(bp->dev, "failed to open NIC, rc = %d\n", rc);
+ return rc;
}
}
- if (netif_running(bp->dev))
- bnxt_tx_enable(bp);
+ if (bp->ieee_ets) {
+ int tc = netdev_get_num_tc(bp->dev);
+ if (!tc)
+ tc = 1;
+ rc = bnxt_hwrm_queue_cos2bw_cfg(bp, bp->ieee_ets, tc);
+ if (rc) {
+ netdev_warn(bp->dev, "failed to config BW, rc = %d\n", rc);
+ return rc;
+ }
+ rc = bnxt_hwrm_queue_pri2cos_cfg(bp, bp->ieee_ets);
+ if (rc) {
+ netdev_warn(bp->dev, "failed to config prio, rc = %d\n", rc);
+ return rc;
+ }
+ }
return 0;
}
@@ -201,7 +235,7 @@
struct ieee_ets *my_ets = bp->ieee_ets;
unsigned int tc_mask = 0, pri_mask = 0;
u8 i, pri, lltc_count = 0;
- bool need_q_recfg = false;
+ bool need_q_remap = false;
int rc;
if (!my_ets)
@@ -221,22 +255,26 @@
if (lltc_count > bp->max_lltc)
return -EINVAL;
+ for (i = 0; i < bp->max_tc; i++) {
+ if (tc_mask & (1 << i)) {
+ u8 qidx = bp->tc_to_qidx[i];
+
+ if (!BNXT_LLQ(bp->q_info[qidx].queue_profile)) {
+ need_q_remap = true;
+ break;
+ }
+ }
+ }
+
+ if (need_q_remap)
+ rc = bnxt_queue_remap(bp, tc_mask);
+
bnxt_hwrm_cmd_hdr_init(bp, &req, HWRM_QUEUE_PFCENABLE_CFG, -1, -1);
req.flags = cpu_to_le32(pri_mask);
rc = hwrm_send_message(bp, &req, sizeof(req), HWRM_CMD_TIMEOUT);
if (rc)
return rc;
- for (i = 0; i < bp->max_tc; i++) {
- if (tc_mask & (1 << i)) {
- if (!BNXT_LLQ(bp->q_info[i].queue_profile))
- need_q_recfg = true;
- }
- }
-
- if (need_q_recfg)
- rc = bnxt_hwrm_queue_cfg(bp, tc_mask);
-
return rc;
}
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_debugfs.c b/drivers/net/ethernet/broadcom/bnxt/bnxt_debugfs.c
new file mode 100644
index 0000000..94e208e
--- /dev/null
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_debugfs.c
@@ -0,0 +1,124 @@
+/* Broadcom NetXtreme-C/E network driver.
+ *
+ * Copyright (c) 2017-2018 Broadcom Limited
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation.
+ */
+
+#include <linux/debugfs.h>
+#include <linux/module.h>
+#include <linux/pci.h>
+#include "bnxt_hsi.h"
+#include <linux/net_dim.h>
+#include "bnxt.h"
+#include "bnxt_debugfs.h"
+
+static struct dentry *bnxt_debug_mnt;
+
+static ssize_t debugfs_dim_read(struct file *filep,
+ char __user *buffer,
+ size_t count, loff_t *ppos)
+{
+ struct net_dim *dim = filep->private_data;
+ int len;
+ char *buf;
+
+ if (*ppos)
+ return 0;
+ if (!dim)
+ return -ENODEV;
+ buf = kasprintf(GFP_KERNEL,
+ "state = %d\n" \
+ "profile_ix = %d\n" \
+ "mode = %d\n" \
+ "tune_state = %d\n" \
+ "steps_right = %d\n" \
+ "steps_left = %d\n" \
+ "tired = %d\n",
+ dim->state,
+ dim->profile_ix,
+ dim->mode,
+ dim->tune_state,
+ dim->steps_right,
+ dim->steps_left,
+ dim->tired);
+ if (!buf)
+ return -ENOMEM;
+ if (count < strlen(buf)) {
+ kfree(buf);
+ return -ENOSPC;
+ }
+ len = simple_read_from_buffer(buffer, count, ppos, buf, strlen(buf));
+ kfree(buf);
+ return len;
+}
+
+static const struct file_operations debugfs_dim_fops = {
+ .owner = THIS_MODULE,
+ .open = simple_open,
+ .read = debugfs_dim_read,
+};
+
+static struct dentry *debugfs_dim_ring_init(struct net_dim *dim, int ring_idx,
+ struct dentry *dd)
+{
+ static char qname[16];
+
+ snprintf(qname, 10, "%d", ring_idx);
+ return debugfs_create_file(qname, 0600, dd,
+ dim, &debugfs_dim_fops);
+}
+
+void bnxt_debug_dev_init(struct bnxt *bp)
+{
+ const char *pname = pci_name(bp->pdev);
+ struct dentry *pdevf;
+ int i;
+
+ bp->debugfs_pdev = debugfs_create_dir(pname, bnxt_debug_mnt);
+ if (bp->debugfs_pdev) {
+ pdevf = debugfs_create_dir("dim", bp->debugfs_pdev);
+ if (!pdevf) {
+ pr_err("failed to create debugfs entry %s/dim\n",
+ pname);
+ return;
+ }
+ bp->debugfs_dim = pdevf;
+ /* create files for each rx ring */
+ for (i = 0; i < bp->cp_nr_rings; i++) {
+ struct bnxt_cp_ring_info *cpr = &bp->bnapi[i]->cp_ring;
+
+ if (cpr && bp->bnapi[i]->rx_ring) {
+ pdevf = debugfs_dim_ring_init(&cpr->dim, i,
+ bp->debugfs_dim);
+ if (!pdevf)
+ pr_err("failed to create debugfs entry %s/dim/%d\n",
+ pname, i);
+ }
+ }
+ } else {
+ pr_err("failed to create debugfs entry %s\n", pname);
+ }
+}
+
+void bnxt_debug_dev_exit(struct bnxt *bp)
+{
+ if (bp) {
+ debugfs_remove_recursive(bp->debugfs_pdev);
+ bp->debugfs_pdev = NULL;
+ }
+}
+
+void bnxt_debug_init(void)
+{
+ bnxt_debug_mnt = debugfs_create_dir("bnxt_en", NULL);
+ if (!bnxt_debug_mnt)
+ pr_err("failed to init bnxt_en debugfs\n");
+}
+
+void bnxt_debug_exit(void)
+{
+ debugfs_remove_recursive(bnxt_debug_mnt);
+}
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_debugfs.h b/drivers/net/ethernet/broadcom/bnxt/bnxt_debugfs.h
new file mode 100644
index 0000000..d0bb488
--- /dev/null
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_debugfs.h
@@ -0,0 +1,23 @@
+/* Broadcom NetXtreme-C/E network driver.
+ *
+ * Copyright (c) 2017-2018 Broadcom Limited
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation.
+ */
+
+#include "bnxt_hsi.h"
+#include "bnxt.h"
+
+#ifdef CONFIG_DEBUG_FS
+void bnxt_debug_init(void);
+void bnxt_debug_exit(void);
+void bnxt_debug_dev_init(struct bnxt *bp);
+void bnxt_debug_dev_exit(struct bnxt *bp);
+#else
+static inline void bnxt_debug_init(void) {}
+static inline void bnxt_debug_exit(void) {}
+static inline void bnxt_debug_dev_init(struct bnxt *bp) {}
+static inline void bnxt_debug_dev_exit(struct bnxt *bp) {}
+#endif
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_dim.c b/drivers/net/ethernet/broadcom/bnxt/bnxt_dim.c
index 408dd19..afa97c8 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt_dim.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_dim.c
@@ -21,11 +21,11 @@
struct bnxt_napi *bnapi = container_of(cpr,
struct bnxt_napi,
cp_ring);
- struct net_dim_cq_moder cur_profile = net_dim_get_profile(dim->mode,
- dim->profile_ix);
+ struct net_dim_cq_moder cur_moder =
+ net_dim_get_rx_moderation(dim->mode, dim->profile_ix);
- cpr->rx_ring_coal.coal_ticks = cur_profile.usec;
- cpr->rx_ring_coal.coal_bufs = cur_profile.pkts;
+ cpr->rx_ring_coal.coal_ticks = cur_moder.usec;
+ cpr->rx_ring_coal.coal_bufs = cur_moder.pkts;
bnxt_hwrm_set_ring_coal(bnapi->bp, bnapi);
dim->state = NET_DIM_START_MEASURE;
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c b/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c
index 8ba14ae..7270c8b 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c
@@ -140,6 +140,19 @@
#define BNXT_RX_STATS_EXT_ENTRY(counter) \
{ BNXT_RX_STATS_EXT_OFFSET(counter), __stringify(counter) }
+enum {
+ RX_TOTAL_DISCARDS,
+ TX_TOTAL_DISCARDS,
+};
+
+static struct {
+ u64 counter;
+ char string[ETH_GSTRING_LEN];
+} bnxt_sw_func_stats[] = {
+ {0, "rx_total_discard_pkts"},
+ {0, "tx_total_discard_pkts"},
+};
+
static const struct {
long offset;
char string[ETH_GSTRING_LEN];
@@ -237,6 +250,7 @@
BNXT_RX_STATS_EXT_ENTRY(resume_roce_pause_events),
};
+#define BNXT_NUM_SW_FUNC_STATS ARRAY_SIZE(bnxt_sw_func_stats)
#define BNXT_NUM_PORT_STATS ARRAY_SIZE(bnxt_port_stats_arr)
#define BNXT_NUM_PORT_STATS_EXT ARRAY_SIZE(bnxt_port_stats_ext_arr)
@@ -244,6 +258,8 @@
{
int num_stats = BNXT_NUM_STATS * bp->cp_nr_rings;
+ num_stats += BNXT_NUM_SW_FUNC_STATS;
+
if (bp->flags & BNXT_FLAG_PORT_STATS)
num_stats += BNXT_NUM_PORT_STATS;
@@ -279,6 +295,9 @@
if (!bp->bnapi)
return;
+ for (i = 0; i < BNXT_NUM_SW_FUNC_STATS; i++)
+ bnxt_sw_func_stats[i].counter = 0;
+
for (i = 0; i < bp->cp_nr_rings; i++) {
struct bnxt_napi *bnapi = bp->bnapi[i];
struct bnxt_cp_ring_info *cpr = &bnapi->cp_ring;
@@ -288,7 +307,16 @@
for (k = 0; k < stat_fields; j++, k++)
buf[j] = le64_to_cpu(hw_stats[k]);
buf[j++] = cpr->rx_l4_csum_errors;
+
+ bnxt_sw_func_stats[RX_TOTAL_DISCARDS].counter +=
+ le64_to_cpu(cpr->hw_stats->rx_discard_pkts);
+ bnxt_sw_func_stats[TX_TOTAL_DISCARDS].counter +=
+ le64_to_cpu(cpr->hw_stats->tx_discard_pkts);
}
+
+ for (i = 0; i < BNXT_NUM_SW_FUNC_STATS; i++, j++)
+ buf[j] = bnxt_sw_func_stats[i].counter;
+
if (bp->flags & BNXT_FLAG_PORT_STATS) {
__le64 *port_stats = (__le64 *)bp->hw_rx_port_stats;
@@ -359,6 +387,11 @@
sprintf(buf, "[%d]: rx_l4_csum_errors", i);
buf += ETH_GSTRING_LEN;
}
+ for (i = 0; i < BNXT_NUM_SW_FUNC_STATS; i++) {
+ strcpy(buf, bnxt_sw_func_stats[i].string);
+ buf += ETH_GSTRING_LEN;
+ }
+
if (bp->flags & BNXT_FLAG_PORT_STATS) {
for (i = 0; i < BNXT_NUM_PORT_STATS; i++) {
strcpy(buf, bnxt_port_stats_arr[i].string);
@@ -551,6 +584,8 @@
* to renable
*/
}
+ } else {
+ rc = bnxt_reserve_rings(bp);
}
return rc;
@@ -1785,6 +1820,11 @@
static int bnxt_get_eeprom_len(struct net_device *dev)
{
+ struct bnxt *bp = netdev_priv(dev);
+
+ if (BNXT_VF(bp))
+ return 0;
+
/* The -1 return value allows the entire 32-bit range of offsets to be
* passed via the ethtool command-line utility.
*/
@@ -2144,9 +2184,8 @@
static int bnxt_get_module_info(struct net_device *dev,
struct ethtool_modinfo *modinfo)
{
+ u8 data[SFF_DIAG_SUPPORT_OFFSET + 1];
struct bnxt *bp = netdev_priv(dev);
- struct hwrm_port_phy_i2c_read_input req = {0};
- struct hwrm_port_phy_i2c_read_output *output = bp->hwrm_cmd_resp_addr;
int rc;
/* No point in going further if phy status indicates
@@ -2161,21 +2200,19 @@
if (bp->hwrm_spec_code < 0x10202)
return -EOPNOTSUPP;
- bnxt_hwrm_cmd_hdr_init(bp, &req, HWRM_PORT_PHY_I2C_READ, -1, -1);
- req.i2c_slave_addr = I2C_DEV_ADDR_A0;
- req.page_number = 0;
- req.page_offset = cpu_to_le16(SFP_EEPROM_SFF_8472_COMP_ADDR);
- req.data_length = SFP_EEPROM_SFF_8472_COMP_SIZE;
- req.port_id = cpu_to_le16(bp->pf.port_id);
- mutex_lock(&bp->hwrm_cmd_lock);
- rc = _hwrm_send_message(bp, &req, sizeof(req), HWRM_CMD_TIMEOUT);
+ rc = bnxt_read_sfp_module_eeprom_info(bp, I2C_DEV_ADDR_A0, 0, 0,
+ SFF_DIAG_SUPPORT_OFFSET + 1,
+ data);
if (!rc) {
- u32 module_id = le32_to_cpu(output->data[0]);
+ u8 module_id = data[0];
+ u8 diag_supported = data[SFF_DIAG_SUPPORT_OFFSET];
switch (module_id) {
case SFF_MODULE_ID_SFP:
modinfo->type = ETH_MODULE_SFF_8472;
modinfo->eeprom_len = ETH_MODULE_SFF_8472_LEN;
+ if (!diag_supported)
+ modinfo->eeprom_len = ETH_MODULE_SFF_8436_LEN;
break;
case SFF_MODULE_ID_QSFP:
case SFF_MODULE_ID_QSFP_PLUS:
@@ -2191,7 +2228,6 @@
break;
}
}
- mutex_unlock(&bp->hwrm_cmd_lock);
return rc;
}
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.c b/drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.c
index f952963..a649108 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.c
@@ -462,13 +462,13 @@
vf_vnics = hw_resc->max_vnics - bp->nr_vnics;
vf_vnics = min_t(u16, vf_vnics, vf_rx_rings);
- req.min_rsscos_ctx = cpu_to_le16(1);
- req.max_rsscos_ctx = cpu_to_le16(1);
+ req.min_rsscos_ctx = cpu_to_le16(BNXT_VF_MIN_RSS_CTX);
+ req.max_rsscos_ctx = cpu_to_le16(BNXT_VF_MAX_RSS_CTX);
if (pf->vf_resv_strategy == BNXT_VF_RESV_STRATEGY_MINIMAL) {
req.min_cmpl_rings = cpu_to_le16(1);
req.min_tx_rings = cpu_to_le16(1);
req.min_rx_rings = cpu_to_le16(1);
- req.min_l2_ctxs = cpu_to_le16(1);
+ req.min_l2_ctxs = cpu_to_le16(BNXT_VF_MIN_L2_CTX);
req.min_vnics = cpu_to_le16(1);
req.min_stat_ctx = cpu_to_le16(1);
req.min_hw_ring_grps = cpu_to_le16(1);
@@ -483,7 +483,7 @@
req.min_cmpl_rings = cpu_to_le16(vf_cp_rings);
req.min_tx_rings = cpu_to_le16(vf_tx_rings);
req.min_rx_rings = cpu_to_le16(vf_rx_rings);
- req.min_l2_ctxs = cpu_to_le16(4);
+ req.min_l2_ctxs = cpu_to_le16(BNXT_VF_MAX_L2_CTX);
req.min_vnics = cpu_to_le16(vf_vnics);
req.min_stat_ctx = cpu_to_le16(vf_stat_ctx);
req.min_hw_ring_grps = cpu_to_le16(vf_ring_grps);
@@ -491,7 +491,7 @@
req.max_cmpl_rings = cpu_to_le16(vf_cp_rings);
req.max_tx_rings = cpu_to_le16(vf_tx_rings);
req.max_rx_rings = cpu_to_le16(vf_rx_rings);
- req.max_l2_ctxs = cpu_to_le16(4);
+ req.max_l2_ctxs = cpu_to_le16(BNXT_VF_MAX_L2_CTX);
req.max_vnics = cpu_to_le16(vf_vnics);
req.max_stat_ctx = cpu_to_le16(vf_stat_ctx);
req.max_hw_ring_grps = cpu_to_le16(vf_ring_grps);
@@ -809,6 +809,9 @@
struct hwrm_fwd_resp_input req = {0};
struct hwrm_fwd_resp_output *resp = bp->hwrm_cmd_resp_addr;
+ if (BNXT_FWD_RESP_SIZE_ERR(msg_size))
+ return -EINVAL;
+
bnxt_hwrm_cmd_hdr_init(bp, &req, HWRM_FWD_RESP, -1, -1);
/* Set the new target id */
@@ -845,6 +848,9 @@
struct hwrm_reject_fwd_resp_input req = {0};
struct hwrm_reject_fwd_resp_output *resp = bp->hwrm_cmd_resp_addr;
+ if (BNXT_REJ_FWD_RESP_SIZE_ERR(msg_size))
+ return -EINVAL;
+
bnxt_hwrm_cmd_hdr_init(bp, &req, HWRM_REJECT_FWD_RESP, -1, -1);
/* Set the new target id */
req.target_id = cpu_to_le16(vf->fw_fid);
@@ -877,6 +883,9 @@
struct hwrm_exec_fwd_resp_input req = {0};
struct hwrm_exec_fwd_resp_output *resp = bp->hwrm_cmd_resp_addr;
+ if (BNXT_EXEC_FWD_RESP_SIZE_ERR(msg_size))
+ return -EINVAL;
+
bnxt_hwrm_cmd_hdr_init(bp, &req, HWRM_EXEC_FWD_RESP, -1, -1);
/* Set the new target id */
req.target_id = cpu_to_le16(vf->fw_fid);
@@ -914,7 +923,8 @@
if (req->enables & cpu_to_le32(FUNC_VF_CFG_REQ_ENABLES_DFLT_MAC_ADDR)) {
if (is_valid_ether_addr(req->dflt_mac_addr) &&
((vf->flags & BNXT_VF_TRUST) ||
- (!is_valid_ether_addr(vf->mac_addr)))) {
+ !is_valid_ether_addr(vf->mac_addr) ||
+ ether_addr_equal(req->dflt_mac_addr, vf->mac_addr))) {
ether_addr_copy(vf->vf_mac_addr, req->dflt_mac_addr);
return bnxt_hwrm_exec_fwd_resp(bp, vf, msg_size);
}
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.h b/drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.h
index d10f6f6..e9b20cd 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.h
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.h
@@ -11,6 +11,23 @@
#ifndef BNXT_SRIOV_H
#define BNXT_SRIOV_H
+#define BNXT_FWD_RESP_SIZE_ERR(n) \
+ ((offsetof(struct hwrm_fwd_resp_input, encap_resp) + n) > \
+ sizeof(struct hwrm_fwd_resp_input))
+
+#define BNXT_EXEC_FWD_RESP_SIZE_ERR(n) \
+ ((offsetof(struct hwrm_exec_fwd_resp_input, encap_request) + n) >\
+ offsetof(struct hwrm_exec_fwd_resp_input, encap_resp_target_id))
+
+#define BNXT_REJ_FWD_RESP_SIZE_ERR(n) \
+ ((offsetof(struct hwrm_reject_fwd_resp_input, encap_request) + n) >\
+ offsetof(struct hwrm_reject_fwd_resp_input, encap_resp_target_id))
+
+#define BNXT_VF_MIN_RSS_CTX 1
+#define BNXT_VF_MAX_RSS_CTX 1
+#define BNXT_VF_MIN_L2_CTX 1
+#define BNXT_VF_MAX_L2_CTX 4
+
int bnxt_get_vf_config(struct net_device *, int, struct ifla_vf_info *);
int bnxt_set_vf_mac(struct net_device *, int, u8 *);
int bnxt_set_vf_vlan(struct net_device *, int, u16, u8, __be16);
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c b/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c
index 1389ab5..1f0e872 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c
@@ -113,10 +113,10 @@
if (tx_avail != bp->tx_ring_size)
*event &= ~BNXT_RX_EVENT;
+ *len = xdp.data_end - xdp.data;
if (orig_data != xdp.data) {
offset = xdp.data - xdp.data_hard_start;
*data_ptr = xdp.data_hard_start + offset;
- *len = xdp.data_end - xdp.data;
}
switch (act) {
case XDP_PASS:
diff --git a/drivers/net/ethernet/broadcom/genet/bcmgenet.c b/drivers/net/ethernet/broadcom/genet/bcmgenet.c
index 0445f2c..20c1681 100644
--- a/drivers/net/ethernet/broadcom/genet/bcmgenet.c
+++ b/drivers/net/ethernet/broadcom/genet/bcmgenet.c
@@ -652,7 +652,7 @@
pkts = ring->rx_max_coalesced_frames;
if (ec->use_adaptive_rx_coalesce && !ring->dim.use_dim) {
- moder = net_dim_get_def_profile(ring->dim.dim.mode);
+ moder = net_dim_get_def_rx_moderation(ring->dim.dim.mode);
usecs = moder.usec;
pkts = moder.pkts;
}
@@ -1925,7 +1925,7 @@
struct bcmgenet_rx_ring *ring =
container_of(ndim, struct bcmgenet_rx_ring, dim);
struct net_dim_cq_moder cur_profile =
- net_dim_get_profile(dim->mode, dim->profile_ix);
+ net_dim_get_rx_moderation(dim->mode, dim->profile_ix);
bcmgenet_set_rx_coalesce(ring, cur_profile.usec, cur_profile.pkts);
dim->state = NET_DIM_START_MEASURE;
@@ -2102,7 +2102,7 @@
/* If DIM was enabled, re-apply default parameters */
if (dim->use_dim) {
- moder = net_dim_get_def_profile(dim->dim.mode);
+ moder = net_dim_get_def_rx_moderation(dim->dim.mode);
usecs = moder.usec;
pkts = moder.pkts;
}
diff --git a/drivers/net/ethernet/cadence/macb_main.c b/drivers/net/ethernet/cadence/macb_main.c
index b4c9268..3e93df5 100644
--- a/drivers/net/ethernet/cadence/macb_main.c
+++ b/drivers/net/ethernet/cadence/macb_main.c
@@ -591,16 +591,10 @@
dev_set_drvdata(&bp->dev->dev, bp->mii_bus);
np = bp->pdev->dev.of_node;
+ if (pdata)
+ bp->mii_bus->phy_mask = pdata->phy_mask;
- if (np) {
- err = of_mdiobus_register(bp->mii_bus, np);
- } else {
- if (pdata)
- bp->mii_bus->phy_mask = pdata->phy_mask;
-
- err = mdiobus_register(bp->mii_bus);
- }
-
+ err = of_mdiobus_register(bp->mii_bus, np);
if (err)
goto err_out_free_mdiobus;
diff --git a/drivers/net/ethernet/cavium/liquidio/cn23xx_pf_device.c b/drivers/net/ethernet/cavium/liquidio/cn23xx_pf_device.c
index e8b2904..929d485 100644
--- a/drivers/net/ethernet/cavium/liquidio/cn23xx_pf_device.c
+++ b/drivers/net/ethernet/cavium/liquidio/cn23xx_pf_device.c
@@ -1245,7 +1245,7 @@
CN23XX_SLI_MAC_PF_INT_ENB64(oct->pcie_port, oct->pf_num);
}
-static int cn23xx_sriov_config(struct octeon_device *oct)
+int cn23xx_sriov_config(struct octeon_device *oct)
{
struct octeon_cn23xx_pf *cn23xx = (struct octeon_cn23xx_pf *)oct->chip;
u32 max_rings, total_rings, max_vfs, rings_per_vf;
@@ -1269,8 +1269,8 @@
break;
}
- if (max_rings <= num_present_cpus())
- num_pf_rings = 1;
+ if (oct->sriov_info.num_pf_rings)
+ num_pf_rings = oct->sriov_info.num_pf_rings;
else
num_pf_rings = num_present_cpus();
@@ -1497,3 +1497,57 @@
octeon_mbox_write(oct, &mbox_cmd);
}
}
+
+static void
+cn23xx_get_vf_stats_callback(struct octeon_device *oct,
+ struct octeon_mbox_cmd *cmd, void *arg)
+{
+ struct oct_vf_stats_ctx *ctx = arg;
+
+ memcpy(ctx->stats, cmd->data, sizeof(struct oct_vf_stats));
+ atomic_set(&ctx->status, 1);
+}
+
+int cn23xx_get_vf_stats(struct octeon_device *oct, int vfidx,
+ struct oct_vf_stats *stats)
+{
+ u32 timeout = HZ; // 1sec
+ struct octeon_mbox_cmd mbox_cmd;
+ struct oct_vf_stats_ctx ctx;
+ u32 count = 0, ret;
+
+ if (!(oct->sriov_info.vf_drv_loaded_mask & (1ULL << vfidx)))
+ return -1;
+
+ if (sizeof(struct oct_vf_stats) > sizeof(mbox_cmd.data))
+ return -1;
+
+ mbox_cmd.msg.u64 = 0;
+ mbox_cmd.msg.s.type = OCTEON_MBOX_REQUEST;
+ mbox_cmd.msg.s.resp_needed = 1;
+ mbox_cmd.msg.s.cmd = OCTEON_GET_VF_STATS;
+ mbox_cmd.msg.s.len = 1;
+ mbox_cmd.q_no = vfidx * oct->sriov_info.rings_per_vf;
+ mbox_cmd.recv_len = 0;
+ mbox_cmd.recv_status = 0;
+ mbox_cmd.fn = (octeon_mbox_callback_t)cn23xx_get_vf_stats_callback;
+ ctx.stats = stats;
+ atomic_set(&ctx.status, 0);
+ mbox_cmd.fn_arg = (void *)&ctx;
+ memset(mbox_cmd.data, 0, sizeof(mbox_cmd.data));
+ octeon_mbox_write(oct, &mbox_cmd);
+
+ do {
+ schedule_timeout_uninterruptible(1);
+ } while ((atomic_read(&ctx.status) == 0) && (count++ < timeout));
+
+ ret = atomic_read(&ctx.status);
+ if (ret == 0) {
+ octeon_mbox_cancel(oct, 0);
+ dev_err(&oct->pci_dev->dev, "Unable to get stats from VF-%d, timedout\n",
+ vfidx);
+ return -1;
+ }
+
+ return 0;
+}
diff --git a/drivers/net/ethernet/cavium/liquidio/cn23xx_pf_device.h b/drivers/net/ethernet/cavium/liquidio/cn23xx_pf_device.h
index 2aba524..e6f31d0 100644
--- a/drivers/net/ethernet/cavium/liquidio/cn23xx_pf_device.h
+++ b/drivers/net/ethernet/cavium/liquidio/cn23xx_pf_device.h
@@ -43,6 +43,15 @@
#define CN23XX_SLI_DEF_BP 0x40
+struct oct_vf_stats {
+ u64 rx_packets;
+ u64 tx_packets;
+ u64 rx_bytes;
+ u64 tx_bytes;
+ u64 broadcast;
+ u64 multicast;
+};
+
int setup_cn23xx_octeon_pf_device(struct octeon_device *oct);
int validate_cn23xx_pf_config_info(struct octeon_device *oct,
@@ -52,8 +61,13 @@
void cn23xx_dump_pf_initialized_regs(struct octeon_device *oct);
+int cn23xx_sriov_config(struct octeon_device *oct);
+
int cn23xx_fw_loaded(struct octeon_device *oct);
void cn23xx_tell_vf_its_macaddr_changed(struct octeon_device *oct, int vfidx,
u8 *mac);
+
+int cn23xx_get_vf_stats(struct octeon_device *oct, int ifidx,
+ struct oct_vf_stats *stats);
#endif
diff --git a/drivers/net/ethernet/cavium/liquidio/lio_core.c b/drivers/net/ethernet/cavium/liquidio/lio_core.c
index 2a94eee..8093c5e 100644
--- a/drivers/net/ethernet/cavium/liquidio/lio_core.c
+++ b/drivers/net/ethernet/cavium/liquidio/lio_core.c
@@ -29,6 +29,162 @@
/* OOM task polling interval */
#define LIO_OOM_POLL_INTERVAL_MS 250
+#define OCTNIC_MAX_SG MAX_SKB_FRAGS
+
+/**
+ * \brief Callback for getting interface configuration
+ * @param status status of request
+ * @param buf pointer to resp structure
+ */
+void lio_if_cfg_callback(struct octeon_device *oct,
+ u32 status __attribute__((unused)), void *buf)
+{
+ struct octeon_soft_command *sc = (struct octeon_soft_command *)buf;
+ struct liquidio_if_cfg_context *ctx;
+ struct liquidio_if_cfg_resp *resp;
+
+ resp = (struct liquidio_if_cfg_resp *)sc->virtrptr;
+ ctx = (struct liquidio_if_cfg_context *)sc->ctxptr;
+
+ oct = lio_get_device(ctx->octeon_id);
+ if (resp->status)
+ dev_err(&oct->pci_dev->dev, "nic if cfg instruction failed. Status: %llx\n",
+ CVM_CAST64(resp->status));
+ WRITE_ONCE(ctx->cond, 1);
+
+ snprintf(oct->fw_info.liquidio_firmware_version, 32, "%s",
+ resp->cfg_info.liquidio_firmware_version);
+
+ /* This barrier is required to be sure that the response has been
+ * written fully before waking up the handler
+ */
+ wmb();
+
+ wake_up_interruptible(&ctx->wc);
+}
+
+/**
+ * \brief Delete gather lists
+ * @param lio per-network private data
+ */
+void lio_delete_glists(struct lio *lio)
+{
+ struct octnic_gather *g;
+ int i;
+
+ kfree(lio->glist_lock);
+ lio->glist_lock = NULL;
+
+ if (!lio->glist)
+ return;
+
+ for (i = 0; i < lio->oct_dev->num_iqs; i++) {
+ do {
+ g = (struct octnic_gather *)
+ lio_list_delete_head(&lio->glist[i]);
+ kfree(g);
+ } while (g);
+
+ if (lio->glists_virt_base && lio->glists_virt_base[i] &&
+ lio->glists_dma_base && lio->glists_dma_base[i]) {
+ lio_dma_free(lio->oct_dev,
+ lio->glist_entry_size * lio->tx_qsize,
+ lio->glists_virt_base[i],
+ lio->glists_dma_base[i]);
+ }
+ }
+
+ kfree(lio->glists_virt_base);
+ lio->glists_virt_base = NULL;
+
+ kfree(lio->glists_dma_base);
+ lio->glists_dma_base = NULL;
+
+ kfree(lio->glist);
+ lio->glist = NULL;
+}
+
+/**
+ * \brief Setup gather lists
+ * @param lio per-network private data
+ */
+int lio_setup_glists(struct octeon_device *oct, struct lio *lio, int num_iqs)
+{
+ struct octnic_gather *g;
+ int i, j;
+
+ lio->glist_lock =
+ kcalloc(num_iqs, sizeof(*lio->glist_lock), GFP_KERNEL);
+ if (!lio->glist_lock)
+ return -ENOMEM;
+
+ lio->glist =
+ kcalloc(num_iqs, sizeof(*lio->glist), GFP_KERNEL);
+ if (!lio->glist) {
+ kfree(lio->glist_lock);
+ lio->glist_lock = NULL;
+ return -ENOMEM;
+ }
+
+ lio->glist_entry_size =
+ ROUNDUP8((ROUNDUP4(OCTNIC_MAX_SG) >> 2) * OCT_SG_ENTRY_SIZE);
+
+ /* allocate memory to store virtual and dma base address of
+ * per glist consistent memory
+ */
+ lio->glists_virt_base = kcalloc(num_iqs, sizeof(*lio->glists_virt_base),
+ GFP_KERNEL);
+ lio->glists_dma_base = kcalloc(num_iqs, sizeof(*lio->glists_dma_base),
+ GFP_KERNEL);
+
+ if (!lio->glists_virt_base || !lio->glists_dma_base) {
+ lio_delete_glists(lio);
+ return -ENOMEM;
+ }
+
+ for (i = 0; i < num_iqs; i++) {
+ int numa_node = dev_to_node(&oct->pci_dev->dev);
+
+ spin_lock_init(&lio->glist_lock[i]);
+
+ INIT_LIST_HEAD(&lio->glist[i]);
+
+ lio->glists_virt_base[i] =
+ lio_dma_alloc(oct,
+ lio->glist_entry_size * lio->tx_qsize,
+ &lio->glists_dma_base[i]);
+
+ if (!lio->glists_virt_base[i]) {
+ lio_delete_glists(lio);
+ return -ENOMEM;
+ }
+
+ for (j = 0; j < lio->tx_qsize; j++) {
+ g = kzalloc_node(sizeof(*g), GFP_KERNEL,
+ numa_node);
+ if (!g)
+ g = kzalloc(sizeof(*g), GFP_KERNEL);
+ if (!g)
+ break;
+
+ g->sg = lio->glists_virt_base[i] +
+ (j * lio->glist_entry_size);
+
+ g->sg_dma_ptr = lio->glists_dma_base[i] +
+ (j * lio->glist_entry_size);
+
+ list_add_tail(&g->list, &lio->glist[i]);
+ }
+
+ if (j != lio->tx_qsize) {
+ lio_delete_glists(lio);
+ return -ENOMEM;
+ }
+ }
+
+ return 0;
+}
+
int liquidio_set_feature(struct net_device *netdev, int cmd, u16 param1)
{
struct lio *lio = GET_LIO(netdev);
@@ -880,8 +1036,8 @@
int num_ioq_vectors;
int irqret, err;
- oct->num_msix_irqs = num_ioqs;
if (oct->msix_on) {
+ oct->num_msix_irqs = num_ioqs;
if (OCTEON_CN23XX_PF(oct)) {
num_interrupts = MAX_IOQ_INTERRUPTS_PER_PF + 1;
@@ -1169,3 +1325,355 @@
return pending_pkts;
}
+
+static void
+octnet_nic_stats_callback(struct octeon_device *oct_dev,
+ u32 status, void *ptr)
+{
+ struct octeon_soft_command *sc = (struct octeon_soft_command *)ptr;
+ struct oct_nic_stats_resp *resp =
+ (struct oct_nic_stats_resp *)sc->virtrptr;
+ struct oct_nic_stats_ctrl *ctrl =
+ (struct oct_nic_stats_ctrl *)sc->ctxptr;
+ struct nic_rx_stats *rsp_rstats = &resp->stats.fromwire;
+ struct nic_tx_stats *rsp_tstats = &resp->stats.fromhost;
+ struct nic_rx_stats *rstats = &oct_dev->link_stats.fromwire;
+ struct nic_tx_stats *tstats = &oct_dev->link_stats.fromhost;
+
+ if (status != OCTEON_REQUEST_TIMEOUT && !resp->status) {
+ octeon_swap_8B_data((u64 *)&resp->stats,
+ (sizeof(struct oct_link_stats)) >> 3);
+
+ /* RX link-level stats */
+ rstats->total_rcvd = rsp_rstats->total_rcvd;
+ rstats->bytes_rcvd = rsp_rstats->bytes_rcvd;
+ rstats->total_bcst = rsp_rstats->total_bcst;
+ rstats->total_mcst = rsp_rstats->total_mcst;
+ rstats->runts = rsp_rstats->runts;
+ rstats->ctl_rcvd = rsp_rstats->ctl_rcvd;
+ /* Accounts for over/under-run of buffers */
+ rstats->fifo_err = rsp_rstats->fifo_err;
+ rstats->dmac_drop = rsp_rstats->dmac_drop;
+ rstats->fcs_err = rsp_rstats->fcs_err;
+ rstats->jabber_err = rsp_rstats->jabber_err;
+ rstats->l2_err = rsp_rstats->l2_err;
+ rstats->frame_err = rsp_rstats->frame_err;
+ rstats->red_drops = rsp_rstats->red_drops;
+
+ /* RX firmware stats */
+ rstats->fw_total_rcvd = rsp_rstats->fw_total_rcvd;
+ rstats->fw_total_fwd = rsp_rstats->fw_total_fwd;
+ rstats->fw_total_mcast = rsp_rstats->fw_total_mcast;
+ rstats->fw_total_bcast = rsp_rstats->fw_total_bcast;
+ rstats->fw_err_pko = rsp_rstats->fw_err_pko;
+ rstats->fw_err_link = rsp_rstats->fw_err_link;
+ rstats->fw_err_drop = rsp_rstats->fw_err_drop;
+ rstats->fw_rx_vxlan = rsp_rstats->fw_rx_vxlan;
+ rstats->fw_rx_vxlan_err = rsp_rstats->fw_rx_vxlan_err;
+
+ /* Number of packets that are LROed */
+ rstats->fw_lro_pkts = rsp_rstats->fw_lro_pkts;
+ /* Number of octets that are LROed */
+ rstats->fw_lro_octs = rsp_rstats->fw_lro_octs;
+ /* Number of LRO packets formed */
+ rstats->fw_total_lro = rsp_rstats->fw_total_lro;
+ /* Number of times lRO of packet aborted */
+ rstats->fw_lro_aborts = rsp_rstats->fw_lro_aborts;
+ rstats->fw_lro_aborts_port = rsp_rstats->fw_lro_aborts_port;
+ rstats->fw_lro_aborts_seq = rsp_rstats->fw_lro_aborts_seq;
+ rstats->fw_lro_aborts_tsval = rsp_rstats->fw_lro_aborts_tsval;
+ rstats->fw_lro_aborts_timer = rsp_rstats->fw_lro_aborts_timer;
+ /* intrmod: packet forward rate */
+ rstats->fwd_rate = rsp_rstats->fwd_rate;
+
+ /* TX link-level stats */
+ tstats->total_pkts_sent = rsp_tstats->total_pkts_sent;
+ tstats->total_bytes_sent = rsp_tstats->total_bytes_sent;
+ tstats->mcast_pkts_sent = rsp_tstats->mcast_pkts_sent;
+ tstats->bcast_pkts_sent = rsp_tstats->bcast_pkts_sent;
+ tstats->ctl_sent = rsp_tstats->ctl_sent;
+ /* Packets sent after one collision*/
+ tstats->one_collision_sent = rsp_tstats->one_collision_sent;
+ /* Packets sent after multiple collision*/
+ tstats->multi_collision_sent = rsp_tstats->multi_collision_sent;
+ /* Packets not sent due to max collisions */
+ tstats->max_collision_fail = rsp_tstats->max_collision_fail;
+ /* Packets not sent due to max deferrals */
+ tstats->max_deferral_fail = rsp_tstats->max_deferral_fail;
+ /* Accounts for over/under-run of buffers */
+ tstats->fifo_err = rsp_tstats->fifo_err;
+ tstats->runts = rsp_tstats->runts;
+ /* Total number of collisions detected */
+ tstats->total_collisions = rsp_tstats->total_collisions;
+
+ /* firmware stats */
+ tstats->fw_total_sent = rsp_tstats->fw_total_sent;
+ tstats->fw_total_fwd = rsp_tstats->fw_total_fwd;
+ tstats->fw_total_mcast_sent = rsp_tstats->fw_total_mcast_sent;
+ tstats->fw_total_bcast_sent = rsp_tstats->fw_total_bcast_sent;
+ tstats->fw_err_pko = rsp_tstats->fw_err_pko;
+ tstats->fw_err_pki = rsp_tstats->fw_err_pki;
+ tstats->fw_err_link = rsp_tstats->fw_err_link;
+ tstats->fw_err_drop = rsp_tstats->fw_err_drop;
+ tstats->fw_tso = rsp_tstats->fw_tso;
+ tstats->fw_tso_fwd = rsp_tstats->fw_tso_fwd;
+ tstats->fw_err_tso = rsp_tstats->fw_err_tso;
+ tstats->fw_tx_vxlan = rsp_tstats->fw_tx_vxlan;
+
+ resp->status = 1;
+ } else {
+ resp->status = -1;
+ }
+ complete(&ctrl->complete);
+}
+
+int octnet_get_link_stats(struct net_device *netdev)
+{
+ struct lio *lio = GET_LIO(netdev);
+ struct octeon_device *oct_dev = lio->oct_dev;
+ struct octeon_soft_command *sc;
+ struct oct_nic_stats_ctrl *ctrl;
+ struct oct_nic_stats_resp *resp;
+ int retval;
+
+ /* Alloc soft command */
+ sc = (struct octeon_soft_command *)
+ octeon_alloc_soft_command(oct_dev,
+ 0,
+ sizeof(struct oct_nic_stats_resp),
+ sizeof(struct octnic_ctrl_pkt));
+
+ if (!sc)
+ return -ENOMEM;
+
+ resp = (struct oct_nic_stats_resp *)sc->virtrptr;
+ memset(resp, 0, sizeof(struct oct_nic_stats_resp));
+
+ ctrl = (struct oct_nic_stats_ctrl *)sc->ctxptr;
+ memset(ctrl, 0, sizeof(struct oct_nic_stats_ctrl));
+ ctrl->netdev = netdev;
+ init_completion(&ctrl->complete);
+
+ sc->iq_no = lio->linfo.txpciq[0].s.q_no;
+
+ octeon_prepare_soft_command(oct_dev, sc, OPCODE_NIC,
+ OPCODE_NIC_PORT_STATS, 0, 0, 0);
+
+ sc->callback = octnet_nic_stats_callback;
+ sc->callback_arg = sc;
+ sc->wait_time = 500; /*in milli seconds*/
+
+ retval = octeon_send_soft_command(oct_dev, sc);
+ if (retval == IQ_SEND_FAILED) {
+ octeon_free_soft_command(oct_dev, sc);
+ return -EINVAL;
+ }
+
+ wait_for_completion_timeout(&ctrl->complete, msecs_to_jiffies(1000));
+
+ if (resp->status != 1) {
+ octeon_free_soft_command(oct_dev, sc);
+
+ return -EINVAL;
+ }
+
+ octeon_free_soft_command(oct_dev, sc);
+
+ return 0;
+}
+
+static void liquidio_nic_seapi_ctl_callback(struct octeon_device *oct,
+ u32 status,
+ void *buf)
+{
+ struct liquidio_nic_seapi_ctl_context *ctx;
+ struct octeon_soft_command *sc = buf;
+
+ ctx = sc->ctxptr;
+
+ oct = lio_get_device(ctx->octeon_id);
+ if (status) {
+ dev_err(&oct->pci_dev->dev, "%s: instruction failed. Status: %llx\n",
+ __func__,
+ CVM_CAST64(status));
+ }
+ ctx->status = status;
+ complete(&ctx->complete);
+}
+
+int liquidio_set_speed(struct lio *lio, int speed)
+{
+ struct liquidio_nic_seapi_ctl_context *ctx;
+ struct octeon_device *oct = lio->oct_dev;
+ struct oct_nic_seapi_resp *resp;
+ struct octeon_soft_command *sc;
+ union octnet_cmd *ncmd;
+ u32 ctx_size;
+ int retval;
+ u32 var;
+
+ if (oct->speed_setting == speed)
+ return 0;
+
+ if (!OCTEON_CN23XX_PF(oct)) {
+ dev_err(&oct->pci_dev->dev, "%s: SET SPEED only for PF\n",
+ __func__);
+ return -EOPNOTSUPP;
+ }
+
+ ctx_size = sizeof(struct liquidio_nic_seapi_ctl_context);
+ sc = octeon_alloc_soft_command(oct, OCTNET_CMD_SIZE,
+ sizeof(struct oct_nic_seapi_resp),
+ ctx_size);
+ if (!sc)
+ return -ENOMEM;
+
+ ncmd = sc->virtdptr;
+ ctx = sc->ctxptr;
+ resp = sc->virtrptr;
+ memset(resp, 0, sizeof(struct oct_nic_seapi_resp));
+
+ ctx->octeon_id = lio_get_device_id(oct);
+ ctx->status = 0;
+ init_completion(&ctx->complete);
+
+ ncmd->u64 = 0;
+ ncmd->s.cmd = SEAPI_CMD_SPEED_SET;
+ ncmd->s.param1 = speed;
+
+ octeon_swap_8B_data((u64 *)ncmd, (OCTNET_CMD_SIZE >> 3));
+
+ sc->iq_no = lio->linfo.txpciq[0].s.q_no;
+
+ octeon_prepare_soft_command(oct, sc, OPCODE_NIC,
+ OPCODE_NIC_UBOOT_CTL, 0, 0, 0);
+
+ sc->callback = liquidio_nic_seapi_ctl_callback;
+ sc->callback_arg = sc;
+ sc->wait_time = 5000;
+
+ retval = octeon_send_soft_command(oct, sc);
+ if (retval == IQ_SEND_FAILED) {
+ dev_info(&oct->pci_dev->dev, "Failed to send soft command\n");
+ retval = -EBUSY;
+ } else {
+ /* Wait for response or timeout */
+ if (wait_for_completion_timeout(&ctx->complete,
+ msecs_to_jiffies(10000)) == 0) {
+ dev_err(&oct->pci_dev->dev, "%s: sc timeout\n",
+ __func__);
+ octeon_free_soft_command(oct, sc);
+ return -EINTR;
+ }
+
+ retval = resp->status;
+
+ if (retval) {
+ dev_err(&oct->pci_dev->dev, "%s failed, retval=%d\n",
+ __func__, retval);
+ octeon_free_soft_command(oct, sc);
+ return -EIO;
+ }
+
+ var = be32_to_cpu((__force __be32)resp->speed);
+ if (var != speed) {
+ dev_err(&oct->pci_dev->dev,
+ "%s: setting failed speed= %x, expect %x\n",
+ __func__, var, speed);
+ }
+
+ oct->speed_setting = var;
+ }
+
+ octeon_free_soft_command(oct, sc);
+
+ return retval;
+}
+
+int liquidio_get_speed(struct lio *lio)
+{
+ struct liquidio_nic_seapi_ctl_context *ctx;
+ struct octeon_device *oct = lio->oct_dev;
+ struct oct_nic_seapi_resp *resp;
+ struct octeon_soft_command *sc;
+ union octnet_cmd *ncmd;
+ u32 ctx_size;
+ int retval;
+
+ ctx_size = sizeof(struct liquidio_nic_seapi_ctl_context);
+ sc = octeon_alloc_soft_command(oct, OCTNET_CMD_SIZE,
+ sizeof(struct oct_nic_seapi_resp),
+ ctx_size);
+ if (!sc)
+ return -ENOMEM;
+
+ ncmd = sc->virtdptr;
+ ctx = sc->ctxptr;
+ resp = sc->virtrptr;
+ memset(resp, 0, sizeof(struct oct_nic_seapi_resp));
+
+ ctx->octeon_id = lio_get_device_id(oct);
+ ctx->status = 0;
+ init_completion(&ctx->complete);
+
+ ncmd->u64 = 0;
+ ncmd->s.cmd = SEAPI_CMD_SPEED_GET;
+
+ octeon_swap_8B_data((u64 *)ncmd, (OCTNET_CMD_SIZE >> 3));
+
+ sc->iq_no = lio->linfo.txpciq[0].s.q_no;
+
+ octeon_prepare_soft_command(oct, sc, OPCODE_NIC,
+ OPCODE_NIC_UBOOT_CTL, 0, 0, 0);
+
+ sc->callback = liquidio_nic_seapi_ctl_callback;
+ sc->callback_arg = sc;
+ sc->wait_time = 5000;
+
+ retval = octeon_send_soft_command(oct, sc);
+ if (retval == IQ_SEND_FAILED) {
+ dev_info(&oct->pci_dev->dev, "Failed to send soft command\n");
+ oct->no_speed_setting = 1;
+ oct->speed_setting = 25;
+
+ retval = -EBUSY;
+ } else {
+ if (wait_for_completion_timeout(&ctx->complete,
+ msecs_to_jiffies(10000)) == 0) {
+ dev_err(&oct->pci_dev->dev, "%s: sc timeout\n",
+ __func__);
+
+ oct->speed_setting = 25;
+ oct->no_speed_setting = 1;
+
+ octeon_free_soft_command(oct, sc);
+
+ return -EINTR;
+ }
+ retval = resp->status;
+ if (retval) {
+ dev_err(&oct->pci_dev->dev,
+ "%s failed retval=%d\n", __func__, retval);
+ oct->no_speed_setting = 1;
+ oct->speed_setting = 25;
+ octeon_free_soft_command(oct, sc);
+ retval = -EIO;
+ } else {
+ u32 var;
+
+ var = be32_to_cpu((__force __be32)resp->speed);
+ oct->speed_setting = var;
+ if (var == 0xffff) {
+ oct->no_speed_setting = 1;
+ /* unable to access boot variables
+ * get the default value based on the NIC type
+ */
+ oct->speed_setting = 25;
+ }
+ }
+ }
+
+ octeon_free_soft_command(oct, sc);
+
+ return retval;
+}
diff --git a/drivers/net/ethernet/cavium/liquidio/lio_ethtool.c b/drivers/net/ethernet/cavium/liquidio/lio_ethtool.c
index 550ac29..06f7449 100644
--- a/drivers/net/ethernet/cavium/liquidio/lio_ethtool.c
+++ b/drivers/net/ethernet/cavium/liquidio/lio_ethtool.c
@@ -32,7 +32,6 @@
#include "cn23xx_vf_device.h"
static int lio_reset_queues(struct net_device *netdev, uint32_t num_qs);
-static int octnet_get_link_stats(struct net_device *netdev);
struct oct_intrmod_context {
int octeon_id;
@@ -96,11 +95,9 @@
"tx_packets",
"rx_bytes",
"tx_bytes",
- "rx_errors", /*jabber_err+l2_err+frame_err */
- "tx_errors", /*fw_err_pko+fw_err_link+fw_err_drop */
- "rx_dropped", /*st->fromwire.total_rcvd - st->fromwire.fw_total_rcvd +
- *st->fromwire.dmac_drop + st->fromwire.fw_err_drop
- */
+ "rx_errors",
+ "tx_errors",
+ "rx_dropped",
"tx_dropped",
"tx_total_sent",
@@ -115,14 +112,17 @@
"tx_tso_err",
"tx_vxlan",
+ "tx_mcast",
+ "tx_bcast",
+
"mac_tx_total_pkts",
"mac_tx_total_bytes",
"mac_tx_mcast_pkts",
"mac_tx_bcast_pkts",
- "mac_tx_ctl_packets", /*oct->link_stats.fromhost.ctl_sent */
+ "mac_tx_ctl_packets",
"mac_tx_total_collisions",
"mac_tx_one_collision",
- "mac_tx_multi_collison",
+ "mac_tx_multi_collision",
"mac_tx_max_collision_fail",
"mac_tx_max_deferal_fail",
"mac_tx_fifo_err",
@@ -130,6 +130,8 @@
"rx_total_rcvd",
"rx_total_fwd",
+ "rx_mcast",
+ "rx_bcast",
"rx_jabber_err",
"rx_l2_err",
"rx_frame_err",
@@ -170,17 +172,21 @@
"tx_packets",
"rx_bytes",
"tx_bytes",
- "rx_errors", /* jabber_err + l2_err+frame_err */
- "tx_errors", /* fw_err_pko + fw_err_link+fw_err_drop */
- "rx_dropped", /* total_rcvd - fw_total_rcvd + dmac_drop + fw_err_drop */
+ "rx_errors",
+ "tx_errors",
+ "rx_dropped",
"tx_dropped",
+ "rx_mcast",
+ "tx_mcast",
+ "rx_bcast",
+ "tx_bcast",
"link_state_changes",
};
/* statistics of host tx queue */
static const char oct_iq_stats_strings[][ETH_GSTRING_LEN] = {
- "packets", /*oct->instr_queue[iq_no]->stats.tx_done*/
- "bytes", /*oct->instr_queue[iq_no]->stats.tx_tot_bytes*/
+ "packets",
+ "bytes",
"dropped",
"iq_busy",
"sgentry_sent",
@@ -197,13 +203,9 @@
/* statistics of host rx queue */
static const char oct_droq_stats_strings[][ETH_GSTRING_LEN] = {
- "packets", /*oct->droq[oq_no]->stats.rx_pkts_received */
- "bytes", /*oct->droq[oq_no]->stats.rx_bytes_received */
- "dropped", /*oct->droq[oq_no]->stats.rx_dropped+
- *oct->droq[oq_no]->stats.dropped_nodispatch+
- *oct->droq[oq_no]->stats.dropped_toomany+
- *oct->droq[oq_no]->stats.dropped_nomem
- */
+ "packets",
+ "bytes",
+ "dropped",
"dropped_nomem",
"dropped_toomany",
"fw_dropped",
@@ -228,44 +230,145 @@
struct lio *lio = GET_LIO(netdev);
struct octeon_device *oct = lio->oct_dev;
struct oct_link_info *linfo;
- u32 supported = 0, advertising = 0;
linfo = &lio->linfo;
+ ethtool_link_ksettings_zero_link_mode(ecmd, supported);
+ ethtool_link_ksettings_zero_link_mode(ecmd, advertising);
+
switch (linfo->link.s.phy_type) {
case LIO_PHY_PORT_TP:
ecmd->base.port = PORT_TP;
- supported = (SUPPORTED_10000baseT_Full |
- SUPPORTED_TP | SUPPORTED_Pause);
- advertising = (ADVERTISED_10000baseT_Full | ADVERTISED_Pause);
ecmd->base.autoneg = AUTONEG_DISABLE;
+ ethtool_link_ksettings_add_link_mode(ecmd, supported, TP);
+ ethtool_link_ksettings_add_link_mode(ecmd, supported, Pause);
+ ethtool_link_ksettings_add_link_mode(ecmd, supported,
+ 10000baseT_Full);
+
+ ethtool_link_ksettings_add_link_mode(ecmd, advertising, Pause);
+ ethtool_link_ksettings_add_link_mode(ecmd, advertising,
+ 10000baseT_Full);
+
break;
case LIO_PHY_PORT_FIBRE:
- ecmd->base.port = PORT_FIBRE;
-
- if (linfo->link.s.speed == SPEED_10000) {
- supported = SUPPORTED_10000baseT_Full;
- advertising = ADVERTISED_10000baseT_Full;
+ if (linfo->link.s.if_mode == INTERFACE_MODE_XAUI ||
+ linfo->link.s.if_mode == INTERFACE_MODE_RXAUI ||
+ linfo->link.s.if_mode == INTERFACE_MODE_XLAUI ||
+ linfo->link.s.if_mode == INTERFACE_MODE_XFI) {
+ dev_dbg(&oct->pci_dev->dev, "ecmd->base.transceiver is XCVR_EXTERNAL\n");
+ } else {
+ dev_err(&oct->pci_dev->dev, "Unknown link interface mode: %d\n",
+ linfo->link.s.if_mode);
}
- supported |= SUPPORTED_FIBRE | SUPPORTED_Pause;
- advertising |= ADVERTISED_Pause;
+ ecmd->base.port = PORT_FIBRE;
ecmd->base.autoneg = AUTONEG_DISABLE;
- break;
- }
+ ethtool_link_ksettings_add_link_mode(ecmd, supported, FIBRE);
- if (linfo->link.s.if_mode == INTERFACE_MODE_XAUI ||
- linfo->link.s.if_mode == INTERFACE_MODE_RXAUI ||
- linfo->link.s.if_mode == INTERFACE_MODE_XLAUI ||
- linfo->link.s.if_mode == INTERFACE_MODE_XFI) {
- ethtool_convert_legacy_u32_to_link_mode(
- ecmd->link_modes.supported, supported);
- ethtool_convert_legacy_u32_to_link_mode(
- ecmd->link_modes.advertising, advertising);
- } else {
- dev_err(&oct->pci_dev->dev, "Unknown link interface reported %d\n",
- linfo->link.s.if_mode);
+ ethtool_link_ksettings_add_link_mode(ecmd, supported, Pause);
+ ethtool_link_ksettings_add_link_mode(ecmd, advertising, Pause);
+ if (oct->subsystem_id == OCTEON_CN2350_25GB_SUBSYS_ID ||
+ oct->subsystem_id == OCTEON_CN2360_25GB_SUBSYS_ID) {
+ if (OCTEON_CN23XX_PF(oct)) {
+ ethtool_link_ksettings_add_link_mode
+ (ecmd, supported, 25000baseSR_Full);
+ ethtool_link_ksettings_add_link_mode
+ (ecmd, supported, 25000baseKR_Full);
+ ethtool_link_ksettings_add_link_mode
+ (ecmd, supported, 25000baseCR_Full);
+
+ if (oct->no_speed_setting == 0) {
+ ethtool_link_ksettings_add_link_mode
+ (ecmd, supported,
+ 10000baseSR_Full);
+ ethtool_link_ksettings_add_link_mode
+ (ecmd, supported,
+ 10000baseKR_Full);
+ ethtool_link_ksettings_add_link_mode
+ (ecmd, supported,
+ 10000baseCR_Full);
+ }
+
+ if (oct->no_speed_setting == 0)
+ liquidio_get_speed(lio);
+ else
+ oct->speed_setting = 25;
+
+ if (oct->speed_setting == 10) {
+ ethtool_link_ksettings_add_link_mode
+ (ecmd, advertising,
+ 10000baseSR_Full);
+ ethtool_link_ksettings_add_link_mode
+ (ecmd, advertising,
+ 10000baseKR_Full);
+ ethtool_link_ksettings_add_link_mode
+ (ecmd, advertising,
+ 10000baseCR_Full);
+ }
+ if (oct->speed_setting == 25) {
+ ethtool_link_ksettings_add_link_mode
+ (ecmd, advertising,
+ 25000baseSR_Full);
+ ethtool_link_ksettings_add_link_mode
+ (ecmd, advertising,
+ 25000baseKR_Full);
+ ethtool_link_ksettings_add_link_mode
+ (ecmd, advertising,
+ 25000baseCR_Full);
+ }
+ } else { /* VF */
+ if (linfo->link.s.speed == 10000) {
+ ethtool_link_ksettings_add_link_mode
+ (ecmd, supported,
+ 10000baseSR_Full);
+ ethtool_link_ksettings_add_link_mode
+ (ecmd, supported,
+ 10000baseKR_Full);
+ ethtool_link_ksettings_add_link_mode
+ (ecmd, supported,
+ 10000baseCR_Full);
+
+ ethtool_link_ksettings_add_link_mode
+ (ecmd, advertising,
+ 10000baseSR_Full);
+ ethtool_link_ksettings_add_link_mode
+ (ecmd, advertising,
+ 10000baseKR_Full);
+ ethtool_link_ksettings_add_link_mode
+ (ecmd, advertising,
+ 10000baseCR_Full);
+ }
+
+ if (linfo->link.s.speed == 25000) {
+ ethtool_link_ksettings_add_link_mode
+ (ecmd, supported,
+ 25000baseSR_Full);
+ ethtool_link_ksettings_add_link_mode
+ (ecmd, supported,
+ 25000baseKR_Full);
+ ethtool_link_ksettings_add_link_mode
+ (ecmd, supported,
+ 25000baseCR_Full);
+
+ ethtool_link_ksettings_add_link_mode
+ (ecmd, advertising,
+ 25000baseSR_Full);
+ ethtool_link_ksettings_add_link_mode
+ (ecmd, advertising,
+ 25000baseKR_Full);
+ ethtool_link_ksettings_add_link_mode
+ (ecmd, advertising,
+ 25000baseCR_Full);
+ }
+ }
+ } else {
+ ethtool_link_ksettings_add_link_mode(ecmd, supported,
+ 10000baseT_Full);
+ ethtool_link_ksettings_add_link_mode(ecmd, advertising,
+ 10000baseT_Full);
+ }
+ break;
}
if (linfo->link.s.link_up) {
@@ -279,6 +382,51 @@
return 0;
}
+static int lio_set_link_ksettings(struct net_device *netdev,
+ const struct ethtool_link_ksettings *ecmd)
+{
+ const int speed = ecmd->base.speed;
+ struct lio *lio = GET_LIO(netdev);
+ struct oct_link_info *linfo;
+ struct octeon_device *oct;
+ u32 is25G = 0;
+
+ oct = lio->oct_dev;
+
+ linfo = &lio->linfo;
+
+ if (oct->subsystem_id == OCTEON_CN2350_25GB_SUBSYS_ID ||
+ oct->subsystem_id == OCTEON_CN2360_25GB_SUBSYS_ID) {
+ is25G = 1;
+ } else {
+ return -EOPNOTSUPP;
+ }
+
+ if (oct->no_speed_setting) {
+ dev_err(&oct->pci_dev->dev, "%s: Changing speed is not supported\n",
+ __func__);
+ return -EOPNOTSUPP;
+ }
+
+ if ((ecmd->base.duplex != DUPLEX_UNKNOWN &&
+ ecmd->base.duplex != linfo->link.s.duplex) ||
+ ecmd->base.autoneg != AUTONEG_DISABLE ||
+ (ecmd->base.speed != 10000 && ecmd->base.speed != 25000 &&
+ ecmd->base.speed != SPEED_UNKNOWN))
+ return -EOPNOTSUPP;
+
+ if ((oct->speed_boot == speed / 1000) &&
+ oct->speed_boot == oct->speed_setting)
+ return 0;
+
+ liquidio_set_speed(lio, speed / 1000);
+
+ dev_dbg(&oct->pci_dev->dev, "Port speed is set to %dG\n",
+ oct->speed_setting);
+
+ return 0;
+}
+
static void
lio_get_drvinfo(struct net_device *netdev, struct ethtool_drvinfo *drvinfo)
{
@@ -359,7 +507,14 @@
rx_count = CFG_GET_NUM_RXQS_NIC_IF(conf6x, lio->ifidx);
tx_count = CFG_GET_NUM_TXQS_NIC_IF(conf6x, lio->ifidx);
} else if (OCTEON_CN23XX_PF(oct)) {
- max_combined = lio->linfo.num_txpciq;
+ if (oct->sriov_info.sriov_enabled) {
+ max_combined = lio->linfo.num_txpciq;
+ } else {
+ struct octeon_config *conf23_pf =
+ CHIP_CONF(oct, cn23xx_pf);
+
+ max_combined = CFG_GET_IQ_MAX_Q(conf23_pf);
+ }
combined_count = oct->num_iqs;
} else if (OCTEON_CN23XX_VF(oct)) {
u64 reg_val = 0ULL;
@@ -423,9 +578,15 @@
kfree(oct->irq_name_storage);
oct->irq_name_storage = NULL;
+
+ if (octeon_allocate_ioq_vector(oct, num_ioqs)) {
+ dev_err(&oct->pci_dev->dev, "OCTEON: ioq vector allocation failed\n");
+ return -1;
+ }
+
if (octeon_setup_interrupt(oct, num_ioqs)) {
dev_info(&oct->pci_dev->dev, "Setup interrupt failed\n");
- return 1;
+ return -1;
}
/* Enable Octeon device interrupts */
@@ -455,7 +616,16 @@
combined_count = channel->combined_count;
if (OCTEON_CN23XX_PF(oct)) {
- max_combined = channel->max_combined;
+ if (oct->sriov_info.sriov_enabled) {
+ max_combined = lio->linfo.num_txpciq;
+ } else {
+ struct octeon_config *conf23_pf =
+ CHIP_CONF(oct,
+ cn23xx_pf);
+
+ max_combined =
+ CFG_GET_IQ_MAX_Q(conf23_pf);
+ }
} else if (OCTEON_CN23XX_VF(oct)) {
u64 reg_val = 0ULL;
u64 ctrl = CN23XX_VF_SLI_IQ_PKT_CONTROL64(0);
@@ -483,7 +653,6 @@
if (lio_reset_queues(dev, combined_count))
return -EINVAL;
- lio_irq_reallocate_irqs(oct, combined_count);
if (stopped)
dev->netdev_ops->ndo_open(dev);
@@ -822,12 +991,120 @@
ering->rx_jumbo_max_pending = 0;
}
+static int lio_23xx_reconfigure_queue_count(struct lio *lio)
+{
+ struct octeon_device *oct = lio->oct_dev;
+ struct liquidio_if_cfg_context *ctx;
+ u32 resp_size, ctx_size, data_size;
+ struct liquidio_if_cfg_resp *resp;
+ struct octeon_soft_command *sc;
+ union oct_nic_if_cfg if_cfg;
+ struct lio_version *vdata;
+ u32 ifidx_or_pfnum;
+ int retval;
+ int j;
+
+ resp_size = sizeof(struct liquidio_if_cfg_resp);
+ ctx_size = sizeof(struct liquidio_if_cfg_context);
+ data_size = sizeof(struct lio_version);
+ sc = (struct octeon_soft_command *)
+ octeon_alloc_soft_command(oct, data_size,
+ resp_size, ctx_size);
+ if (!sc) {
+ dev_err(&oct->pci_dev->dev, "%s: Failed to allocate soft command\n",
+ __func__);
+ return -1;
+ }
+
+ resp = (struct liquidio_if_cfg_resp *)sc->virtrptr;
+ ctx = (struct liquidio_if_cfg_context *)sc->ctxptr;
+ vdata = (struct lio_version *)sc->virtdptr;
+
+ vdata->major = (__force u16)cpu_to_be16(LIQUIDIO_BASE_MAJOR_VERSION);
+ vdata->minor = (__force u16)cpu_to_be16(LIQUIDIO_BASE_MINOR_VERSION);
+ vdata->micro = (__force u16)cpu_to_be16(LIQUIDIO_BASE_MICRO_VERSION);
+
+ ifidx_or_pfnum = oct->pf_num;
+ WRITE_ONCE(ctx->cond, 0);
+ ctx->octeon_id = lio_get_device_id(oct);
+ init_waitqueue_head(&ctx->wc);
+
+ if_cfg.u64 = 0;
+ if_cfg.s.num_iqueues = oct->sriov_info.num_pf_rings;
+ if_cfg.s.num_oqueues = oct->sriov_info.num_pf_rings;
+ if_cfg.s.base_queue = oct->sriov_info.pf_srn;
+ if_cfg.s.gmx_port_id = oct->pf_num;
+
+ sc->iq_no = 0;
+ octeon_prepare_soft_command(oct, sc, OPCODE_NIC,
+ OPCODE_NIC_QCOUNT_UPDATE, 0,
+ if_cfg.u64, 0);
+ sc->callback = lio_if_cfg_callback;
+ sc->callback_arg = sc;
+ sc->wait_time = LIO_IFCFG_WAIT_TIME;
+
+ retval = octeon_send_soft_command(oct, sc);
+ if (retval == IQ_SEND_FAILED) {
+ dev_err(&oct->pci_dev->dev,
+ "iq/oq config failed status: %x\n",
+ retval);
+ goto qcount_update_fail;
+ }
+
+ if (sleep_cond(&ctx->wc, &ctx->cond) == -EINTR) {
+ dev_err(&oct->pci_dev->dev, "Wait interrupted\n");
+ return -1;
+ }
+
+ retval = resp->status;
+ if (retval) {
+ dev_err(&oct->pci_dev->dev, "iq/oq config failed\n");
+ goto qcount_update_fail;
+ }
+
+ octeon_swap_8B_data((u64 *)(&resp->cfg_info),
+ (sizeof(struct liquidio_if_cfg_info)) >> 3);
+
+ lio->ifidx = ifidx_or_pfnum;
+ lio->linfo.num_rxpciq = hweight64(resp->cfg_info.iqmask);
+ lio->linfo.num_txpciq = hweight64(resp->cfg_info.iqmask);
+ for (j = 0; j < lio->linfo.num_rxpciq; j++) {
+ lio->linfo.rxpciq[j].u64 =
+ resp->cfg_info.linfo.rxpciq[j].u64;
+ }
+
+ for (j = 0; j < lio->linfo.num_txpciq; j++) {
+ lio->linfo.txpciq[j].u64 =
+ resp->cfg_info.linfo.txpciq[j].u64;
+ }
+
+ lio->linfo.hw_addr = resp->cfg_info.linfo.hw_addr;
+ lio->linfo.gmxport = resp->cfg_info.linfo.gmxport;
+ lio->linfo.link.u64 = resp->cfg_info.linfo.link.u64;
+ lio->txq = lio->linfo.txpciq[0].s.q_no;
+ lio->rxq = lio->linfo.rxpciq[0].s.q_no;
+
+ octeon_free_soft_command(oct, sc);
+ dev_info(&oct->pci_dev->dev, "Queue count updated to %d\n",
+ lio->linfo.num_rxpciq);
+
+ return 0;
+
+qcount_update_fail:
+ octeon_free_soft_command(oct, sc);
+
+ return -1;
+}
+
static int lio_reset_queues(struct net_device *netdev, uint32_t num_qs)
{
struct lio *lio = GET_LIO(netdev);
struct octeon_device *oct = lio->oct_dev;
+ int i, queue_count_update = 0;
struct napi_struct *napi, *n;
- int i, update = 0;
+ int ret;
+
+ schedule_timeout_uninterruptible(msecs_to_jiffies(100));
if (wait_for_pending_requests(oct))
dev_err(&oct->pci_dev->dev, "There were pending requests\n");
@@ -836,7 +1113,7 @@
dev_err(&oct->pci_dev->dev, "IQ had pending instructions\n");
if (octeon_set_io_queues_off(oct)) {
- dev_err(&oct->pci_dev->dev, "setting io queues off failed\n");
+ dev_err(&oct->pci_dev->dev, "Setting io queues off failed\n");
return -1;
}
@@ -849,9 +1126,40 @@
netif_napi_del(napi);
if (num_qs != oct->num_iqs) {
- netif_set_real_num_rx_queues(netdev, num_qs);
- netif_set_real_num_tx_queues(netdev, num_qs);
- update = 1;
+ ret = netif_set_real_num_rx_queues(netdev, num_qs);
+ if (ret) {
+ dev_err(&oct->pci_dev->dev,
+ "Setting real number rx failed\n");
+ return ret;
+ }
+
+ ret = netif_set_real_num_tx_queues(netdev, num_qs);
+ if (ret) {
+ dev_err(&oct->pci_dev->dev,
+ "Setting real number tx failed\n");
+ return ret;
+ }
+
+ /* The value of queue_count_update decides whether it is the
+ * queue count or the descriptor count that is being
+ * re-configured.
+ */
+ queue_count_update = 1;
+ }
+
+ /* Re-configuration of queues can happen in two scenarios, SRIOV enabled
+ * and SRIOV disabled. Few things like recreating queue zero, resetting
+ * glists and IRQs are required for both. For the latter, some more
+ * steps like updating sriov_info for the octeon device need to be done.
+ */
+ if (queue_count_update) {
+ lio_delete_glists(lio);
+
+ /* Delete mbox for PF which is SRIOV disabled because sriov_info
+ * will be now changed.
+ */
+ if ((OCTEON_CN23XX_PF(oct)) && !oct->sriov_info.sriov_enabled)
+ oct->fn_list.free_mbox(oct);
}
for (i = 0; i < MAX_OCTEON_OUTPUT_QUEUES(oct); i++) {
@@ -866,24 +1174,91 @@
octeon_delete_instr_queue(oct, i);
}
+ if (queue_count_update) {
+ /* For PF re-configure sriov related information */
+ if ((OCTEON_CN23XX_PF(oct)) &&
+ !oct->sriov_info.sriov_enabled) {
+ oct->sriov_info.num_pf_rings = num_qs;
+ if (cn23xx_sriov_config(oct)) {
+ dev_err(&oct->pci_dev->dev,
+ "Queue reset aborted: SRIOV config failed\n");
+ return -1;
+ }
+
+ num_qs = oct->sriov_info.num_pf_rings;
+ }
+ }
+
if (oct->fn_list.setup_device_regs(oct)) {
dev_err(&oct->pci_dev->dev, "Failed to configure device registers\n");
return -1;
}
+ /* The following are needed in case of queue count re-configuration and
+ * not for descriptor count re-configuration.
+ */
+ if (queue_count_update) {
+ if (octeon_setup_instr_queues(oct))
+ return -1;
+
+ if (octeon_setup_output_queues(oct))
+ return -1;
+
+ /* Recreating mbox for PF that is SRIOV disabled */
+ if (OCTEON_CN23XX_PF(oct) && !oct->sriov_info.sriov_enabled) {
+ if (oct->fn_list.setup_mbox(oct)) {
+ dev_err(&oct->pci_dev->dev, "Mailbox setup failed\n");
+ return -1;
+ }
+ }
+
+ /* Deleting and recreating IRQs whether the interface is SRIOV
+ * enabled or disabled.
+ */
+ if (lio_irq_reallocate_irqs(oct, num_qs)) {
+ dev_err(&oct->pci_dev->dev, "IRQs could not be allocated\n");
+ return -1;
+ }
+
+ /* Enable the input and output queues for this Octeon device */
+ if (oct->fn_list.enable_io_queues(oct)) {
+ dev_err(&oct->pci_dev->dev, "Failed to enable input/output queues\n");
+ return -1;
+ }
+
+ for (i = 0; i < oct->num_oqs; i++)
+ writel(oct->droq[i]->max_count,
+ oct->droq[i]->pkts_credit_reg);
+
+ /* Informing firmware about the new queue count. It is required
+ * for firmware to allocate more number of queues than those at
+ * load time.
+ */
+ if (OCTEON_CN23XX_PF(oct) && !oct->sriov_info.sriov_enabled) {
+ if (lio_23xx_reconfigure_queue_count(lio))
+ return -1;
+ }
+ }
+
+ /* Once firmware is aware of the new value, queues can be recreated */
if (liquidio_setup_io_queues(oct, 0, num_qs, num_qs)) {
- dev_err(&oct->pci_dev->dev, "IO queues initialization failed\n");
+ dev_err(&oct->pci_dev->dev, "I/O queues creation failed\n");
return -1;
}
- /* Enable the input and output queues for this Octeon device */
- if (oct->fn_list.enable_io_queues(oct)) {
- dev_err(&oct->pci_dev->dev, "Failed to enable input/output queues");
- return -1;
- }
+ if (queue_count_update) {
+ if (lio_setup_glists(oct, lio, num_qs)) {
+ dev_err(&oct->pci_dev->dev, "Gather list allocation failed\n");
+ return -1;
+ }
- if (update && lio_send_queue_count_update(netdev, num_qs))
- return -1;
+ /* Send firmware the information about new number of queues
+ * if the interface is a VF or a PF that is SRIOV enabled.
+ */
+ if (oct->sriov_info.sriov_enabled || OCTEON_CN23XX_VF(oct))
+ if (lio_send_queue_count_update(netdev, num_qs))
+ return -1;
+ }
return 0;
}
@@ -928,7 +1303,7 @@
CFG_SET_NUM_RX_DESCS_NIC_IF(octeon_get_conf(oct), lio->ifidx,
rx_count);
- if (lio_reset_queues(netdev, lio->linfo.num_txpciq))
+ if (lio_reset_queues(netdev, oct->num_iqs))
goto err_lio_reset_queues;
if (stopped)
@@ -1063,33 +1438,48 @@
{
struct lio *lio = GET_LIO(netdev);
struct octeon_device *oct_dev = lio->oct_dev;
- struct net_device_stats *netstats = &netdev->stats;
+ struct rtnl_link_stats64 lstats;
int i = 0, j;
if (ifstate_check(lio, LIO_IFSTATE_RESETTING))
return;
- netdev->netdev_ops->ndo_get_stats(netdev);
- octnet_get_link_stats(netdev);
-
+ netdev->netdev_ops->ndo_get_stats64(netdev, &lstats);
/*sum of oct->droq[oq_no]->stats->rx_pkts_received */
- data[i++] = CVM_CAST64(netstats->rx_packets);
+ data[i++] = lstats.rx_packets;
/*sum of oct->instr_queue[iq_no]->stats.tx_done */
- data[i++] = CVM_CAST64(netstats->tx_packets);
+ data[i++] = lstats.tx_packets;
/*sum of oct->droq[oq_no]->stats->rx_bytes_received */
- data[i++] = CVM_CAST64(netstats->rx_bytes);
+ data[i++] = lstats.rx_bytes;
/*sum of oct->instr_queue[iq_no]->stats.tx_tot_bytes */
- data[i++] = CVM_CAST64(netstats->tx_bytes);
- data[i++] = CVM_CAST64(netstats->rx_errors);
- data[i++] = CVM_CAST64(netstats->tx_errors);
+ data[i++] = lstats.tx_bytes;
+ data[i++] = lstats.rx_errors +
+ oct_dev->link_stats.fromwire.fcs_err +
+ oct_dev->link_stats.fromwire.jabber_err +
+ oct_dev->link_stats.fromwire.l2_err +
+ oct_dev->link_stats.fromwire.frame_err;
+ data[i++] = lstats.tx_errors;
/*sum of oct->droq[oq_no]->stats->rx_dropped +
*oct->droq[oq_no]->stats->dropped_nodispatch +
*oct->droq[oq_no]->stats->dropped_toomany +
*oct->droq[oq_no]->stats->dropped_nomem
*/
- data[i++] = CVM_CAST64(netstats->rx_dropped);
+ data[i++] = lstats.rx_dropped +
+ oct_dev->link_stats.fromwire.fifo_err +
+ oct_dev->link_stats.fromwire.dmac_drop +
+ oct_dev->link_stats.fromwire.red_drops +
+ oct_dev->link_stats.fromwire.fw_err_pko +
+ oct_dev->link_stats.fromwire.fw_err_link +
+ oct_dev->link_stats.fromwire.fw_err_drop;
/*sum of oct->instr_queue[iq_no]->stats.tx_dropped */
- data[i++] = CVM_CAST64(netstats->tx_dropped);
+ data[i++] = lstats.tx_dropped +
+ oct_dev->link_stats.fromhost.max_collision_fail +
+ oct_dev->link_stats.fromhost.max_deferral_fail +
+ oct_dev->link_stats.fromhost.total_collisions +
+ oct_dev->link_stats.fromhost.fw_err_pko +
+ oct_dev->link_stats.fromhost.fw_err_link +
+ oct_dev->link_stats.fromhost.fw_err_drop +
+ oct_dev->link_stats.fromhost.fw_err_pki;
/* firmware tx stats */
/*per_core_stats[cvmx_get_core_num()].link_stats[mdata->from_ifidx].
@@ -1124,6 +1514,10 @@
*/
data[i++] = CVM_CAST64(oct_dev->link_stats.fromhost.fw_tx_vxlan);
+ /* Multicast packets sent by this port */
+ data[i++] = oct_dev->link_stats.fromhost.fw_total_mcast_sent;
+ data[i++] = oct_dev->link_stats.fromhost.fw_total_bcast_sent;
+
/* mac tx statistics */
/*CVMX_BGXX_CMRX_TX_STAT5 */
data[i++] = CVM_CAST64(oct_dev->link_stats.fromhost.total_pkts_sent);
@@ -1160,6 +1554,9 @@
*fw_total_fwd
*/
data[i++] = CVM_CAST64(oct_dev->link_stats.fromwire.fw_total_fwd);
+ /* Multicast packets received on this port */
+ data[i++] = oct_dev->link_stats.fromwire.fw_total_mcast;
+ data[i++] = oct_dev->link_stats.fromwire.fw_total_bcast;
/*per_core_stats[core_id].link_stats[ifidx].fromwire.jabber_err */
data[i++] = CVM_CAST64(oct_dev->link_stats.fromwire.jabber_err);
/*per_core_stats[core_id].link_stats[ifidx].fromwire.l2_err */
@@ -1328,7 +1725,7 @@
__attribute__((unused)),
u64 *data)
{
- struct net_device_stats *netstats = &netdev->stats;
+ struct rtnl_link_stats64 lstats;
struct lio *lio = GET_LIO(netdev);
struct octeon_device *oct_dev = lio->oct_dev;
int i = 0, j, vj;
@@ -1336,25 +1733,31 @@
if (ifstate_check(lio, LIO_IFSTATE_RESETTING))
return;
- netdev->netdev_ops->ndo_get_stats(netdev);
+ netdev->netdev_ops->ndo_get_stats64(netdev, &lstats);
/* sum of oct->droq[oq_no]->stats->rx_pkts_received */
- data[i++] = CVM_CAST64(netstats->rx_packets);
+ data[i++] = lstats.rx_packets;
/* sum of oct->instr_queue[iq_no]->stats.tx_done */
- data[i++] = CVM_CAST64(netstats->tx_packets);
+ data[i++] = lstats.tx_packets;
/* sum of oct->droq[oq_no]->stats->rx_bytes_received */
- data[i++] = CVM_CAST64(netstats->rx_bytes);
+ data[i++] = lstats.rx_bytes;
/* sum of oct->instr_queue[iq_no]->stats.tx_tot_bytes */
- data[i++] = CVM_CAST64(netstats->tx_bytes);
- data[i++] = CVM_CAST64(netstats->rx_errors);
- data[i++] = CVM_CAST64(netstats->tx_errors);
+ data[i++] = lstats.tx_bytes;
+ data[i++] = lstats.rx_errors;
+ data[i++] = lstats.tx_errors;
/* sum of oct->droq[oq_no]->stats->rx_dropped +
* oct->droq[oq_no]->stats->dropped_nodispatch +
* oct->droq[oq_no]->stats->dropped_toomany +
* oct->droq[oq_no]->stats->dropped_nomem
*/
- data[i++] = CVM_CAST64(netstats->rx_dropped);
+ data[i++] = lstats.rx_dropped;
/* sum of oct->instr_queue[iq_no]->stats.tx_dropped */
- data[i++] = CVM_CAST64(netstats->tx_dropped);
+ data[i++] = lstats.tx_dropped;
+
+ data[i++] = oct_dev->link_stats.fromwire.fw_total_mcast;
+ data[i++] = oct_dev->link_stats.fromhost.fw_total_mcast_sent;
+ data[i++] = oct_dev->link_stats.fromwire.fw_total_bcast;
+ data[i++] = oct_dev->link_stats.fromhost.fw_total_bcast_sent;
+
/* lio->link_changes */
data[i++] = CVM_CAST64(lio->link_changes);
@@ -1765,161 +2168,6 @@
return -EINTR;
}
-static void
-octnet_nic_stats_callback(struct octeon_device *oct_dev,
- u32 status, void *ptr)
-{
- struct octeon_soft_command *sc = (struct octeon_soft_command *)ptr;
- struct oct_nic_stats_resp *resp =
- (struct oct_nic_stats_resp *)sc->virtrptr;
- struct oct_nic_stats_ctrl *ctrl =
- (struct oct_nic_stats_ctrl *)sc->ctxptr;
- struct nic_rx_stats *rsp_rstats = &resp->stats.fromwire;
- struct nic_tx_stats *rsp_tstats = &resp->stats.fromhost;
-
- struct nic_rx_stats *rstats = &oct_dev->link_stats.fromwire;
- struct nic_tx_stats *tstats = &oct_dev->link_stats.fromhost;
-
- if ((status != OCTEON_REQUEST_TIMEOUT) && !resp->status) {
- octeon_swap_8B_data((u64 *)&resp->stats,
- (sizeof(struct oct_link_stats)) >> 3);
-
- /* RX link-level stats */
- rstats->total_rcvd = rsp_rstats->total_rcvd;
- rstats->bytes_rcvd = rsp_rstats->bytes_rcvd;
- rstats->total_bcst = rsp_rstats->total_bcst;
- rstats->total_mcst = rsp_rstats->total_mcst;
- rstats->runts = rsp_rstats->runts;
- rstats->ctl_rcvd = rsp_rstats->ctl_rcvd;
- /* Accounts for over/under-run of buffers */
- rstats->fifo_err = rsp_rstats->fifo_err;
- rstats->dmac_drop = rsp_rstats->dmac_drop;
- rstats->fcs_err = rsp_rstats->fcs_err;
- rstats->jabber_err = rsp_rstats->jabber_err;
- rstats->l2_err = rsp_rstats->l2_err;
- rstats->frame_err = rsp_rstats->frame_err;
-
- /* RX firmware stats */
- rstats->fw_total_rcvd = rsp_rstats->fw_total_rcvd;
- rstats->fw_total_fwd = rsp_rstats->fw_total_fwd;
- rstats->fw_err_pko = rsp_rstats->fw_err_pko;
- rstats->fw_err_link = rsp_rstats->fw_err_link;
- rstats->fw_err_drop = rsp_rstats->fw_err_drop;
- rstats->fw_rx_vxlan = rsp_rstats->fw_rx_vxlan;
- rstats->fw_rx_vxlan_err = rsp_rstats->fw_rx_vxlan_err;
-
- /* Number of packets that are LROed */
- rstats->fw_lro_pkts = rsp_rstats->fw_lro_pkts;
- /* Number of octets that are LROed */
- rstats->fw_lro_octs = rsp_rstats->fw_lro_octs;
- /* Number of LRO packets formed */
- rstats->fw_total_lro = rsp_rstats->fw_total_lro;
- /* Number of times lRO of packet aborted */
- rstats->fw_lro_aborts = rsp_rstats->fw_lro_aborts;
- rstats->fw_lro_aborts_port = rsp_rstats->fw_lro_aborts_port;
- rstats->fw_lro_aborts_seq = rsp_rstats->fw_lro_aborts_seq;
- rstats->fw_lro_aborts_tsval = rsp_rstats->fw_lro_aborts_tsval;
- rstats->fw_lro_aborts_timer = rsp_rstats->fw_lro_aborts_timer;
- /* intrmod: packet forward rate */
- rstats->fwd_rate = rsp_rstats->fwd_rate;
-
- /* TX link-level stats */
- tstats->total_pkts_sent = rsp_tstats->total_pkts_sent;
- tstats->total_bytes_sent = rsp_tstats->total_bytes_sent;
- tstats->mcast_pkts_sent = rsp_tstats->mcast_pkts_sent;
- tstats->bcast_pkts_sent = rsp_tstats->bcast_pkts_sent;
- tstats->ctl_sent = rsp_tstats->ctl_sent;
- /* Packets sent after one collision*/
- tstats->one_collision_sent = rsp_tstats->one_collision_sent;
- /* Packets sent after multiple collision*/
- tstats->multi_collision_sent = rsp_tstats->multi_collision_sent;
- /* Packets not sent due to max collisions */
- tstats->max_collision_fail = rsp_tstats->max_collision_fail;
- /* Packets not sent due to max deferrals */
- tstats->max_deferral_fail = rsp_tstats->max_deferral_fail;
- /* Accounts for over/under-run of buffers */
- tstats->fifo_err = rsp_tstats->fifo_err;
- tstats->runts = rsp_tstats->runts;
- /* Total number of collisions detected */
- tstats->total_collisions = rsp_tstats->total_collisions;
-
- /* firmware stats */
- tstats->fw_total_sent = rsp_tstats->fw_total_sent;
- tstats->fw_total_fwd = rsp_tstats->fw_total_fwd;
- tstats->fw_err_pko = rsp_tstats->fw_err_pko;
- tstats->fw_err_pki = rsp_tstats->fw_err_pki;
- tstats->fw_err_link = rsp_tstats->fw_err_link;
- tstats->fw_err_drop = rsp_tstats->fw_err_drop;
- tstats->fw_tso = rsp_tstats->fw_tso;
- tstats->fw_tso_fwd = rsp_tstats->fw_tso_fwd;
- tstats->fw_err_tso = rsp_tstats->fw_err_tso;
- tstats->fw_tx_vxlan = rsp_tstats->fw_tx_vxlan;
-
- resp->status = 1;
- } else {
- resp->status = -1;
- }
- complete(&ctrl->complete);
-}
-
-/* Configure interrupt moderation parameters */
-static int octnet_get_link_stats(struct net_device *netdev)
-{
- struct lio *lio = GET_LIO(netdev);
- struct octeon_device *oct_dev = lio->oct_dev;
-
- struct octeon_soft_command *sc;
- struct oct_nic_stats_ctrl *ctrl;
- struct oct_nic_stats_resp *resp;
-
- int retval;
-
- /* Alloc soft command */
- sc = (struct octeon_soft_command *)
- octeon_alloc_soft_command(oct_dev,
- 0,
- sizeof(struct oct_nic_stats_resp),
- sizeof(struct octnic_ctrl_pkt));
-
- if (!sc)
- return -ENOMEM;
-
- resp = (struct oct_nic_stats_resp *)sc->virtrptr;
- memset(resp, 0, sizeof(struct oct_nic_stats_resp));
-
- ctrl = (struct oct_nic_stats_ctrl *)sc->ctxptr;
- memset(ctrl, 0, sizeof(struct oct_nic_stats_ctrl));
- ctrl->netdev = netdev;
- init_completion(&ctrl->complete);
-
- sc->iq_no = lio->linfo.txpciq[0].s.q_no;
-
- octeon_prepare_soft_command(oct_dev, sc, OPCODE_NIC,
- OPCODE_NIC_PORT_STATS, 0, 0, 0);
-
- sc->callback = octnet_nic_stats_callback;
- sc->callback_arg = sc;
- sc->wait_time = 500; /*in milli seconds*/
-
- retval = octeon_send_soft_command(oct_dev, sc);
- if (retval == IQ_SEND_FAILED) {
- octeon_free_soft_command(oct_dev, sc);
- return -EINVAL;
- }
-
- wait_for_completion_timeout(&ctrl->complete, msecs_to_jiffies(1000));
-
- if (resp->status != 1) {
- octeon_free_soft_command(oct_dev, sc);
-
- return -EINVAL;
- }
-
- octeon_free_soft_command(oct_dev, sc);
-
- return 0;
-}
-
static int lio_get_intr_coalesce(struct net_device *netdev,
struct ethtool_coalesce *intr_coal)
{
@@ -2864,6 +3112,7 @@
static const struct ethtool_ops lio_ethtool_ops = {
.get_link_ksettings = lio_get_link_ksettings,
+ .set_link_ksettings = lio_set_link_ksettings,
.get_link = ethtool_op_get_link,
.get_drvinfo = lio_get_drvinfo,
.get_ringparam = lio_ethtool_get_ringparam,
diff --git a/drivers/net/ethernet/cavium/liquidio/lio_main.c b/drivers/net/ethernet/cavium/liquidio/lio_main.c
index 603a144..e500528 100644
--- a/drivers/net/ethernet/cavium/liquidio/lio_main.c
+++ b/drivers/net/ethernet/cavium/liquidio/lio_main.c
@@ -138,33 +138,10 @@
* by this structure in the NIC module.
*/
-#define OCTNIC_MAX_SG (MAX_SKB_FRAGS)
-
#define OCTNIC_GSO_MAX_HEADER_SIZE 128
#define OCTNIC_GSO_MAX_SIZE \
(CN23XX_DEFAULT_INPUT_JABBER - OCTNIC_GSO_MAX_HEADER_SIZE)
-/** Structure of a node in list of gather components maintained by
- * NIC driver for each network device.
- */
-struct octnic_gather {
- /** List manipulation. Next and prev pointers. */
- struct list_head list;
-
- /** Size of the gather component at sg in bytes. */
- int sg_size;
-
- /** Number of bytes that sg was adjusted to make it 8B-aligned. */
- int adjust;
-
- /** Gather component that can accommodate max sized fragment list
- * received from the IP layer.
- */
- struct octeon_sg_entry *sg;
-
- dma_addr_t sg_dma_ptr;
-};
-
struct handshake {
struct completion init;
struct completion started;
@@ -520,7 +497,7 @@
*/
static inline int check_txq_status(struct lio *lio)
{
- int numqs = lio->netdev->num_tx_queues;
+ int numqs = lio->netdev->real_num_tx_queues;
int ret_val = 0;
int q, iq;
@@ -542,148 +519,6 @@
}
/**
- * Remove the node at the head of the list. The list would be empty at
- * the end of this call if there are no more nodes in the list.
- */
-static inline struct list_head *list_delete_head(struct list_head *root)
-{
- struct list_head *node;
-
- if ((root->prev == root) && (root->next == root))
- node = NULL;
- else
- node = root->next;
-
- if (node)
- list_del(node);
-
- return node;
-}
-
-/**
- * \brief Delete gather lists
- * @param lio per-network private data
- */
-static void delete_glists(struct lio *lio)
-{
- struct octnic_gather *g;
- int i;
-
- kfree(lio->glist_lock);
- lio->glist_lock = NULL;
-
- if (!lio->glist)
- return;
-
- for (i = 0; i < lio->linfo.num_txpciq; i++) {
- do {
- g = (struct octnic_gather *)
- list_delete_head(&lio->glist[i]);
- if (g)
- kfree(g);
- } while (g);
-
- if (lio->glists_virt_base && lio->glists_virt_base[i] &&
- lio->glists_dma_base && lio->glists_dma_base[i]) {
- lio_dma_free(lio->oct_dev,
- lio->glist_entry_size * lio->tx_qsize,
- lio->glists_virt_base[i],
- lio->glists_dma_base[i]);
- }
- }
-
- kfree(lio->glists_virt_base);
- lio->glists_virt_base = NULL;
-
- kfree(lio->glists_dma_base);
- lio->glists_dma_base = NULL;
-
- kfree(lio->glist);
- lio->glist = NULL;
-}
-
-/**
- * \brief Setup gather lists
- * @param lio per-network private data
- */
-static int setup_glists(struct octeon_device *oct, struct lio *lio, int num_iqs)
-{
- int i, j;
- struct octnic_gather *g;
-
- lio->glist_lock = kcalloc(num_iqs, sizeof(*lio->glist_lock),
- GFP_KERNEL);
- if (!lio->glist_lock)
- return -ENOMEM;
-
- lio->glist = kcalloc(num_iqs, sizeof(*lio->glist),
- GFP_KERNEL);
- if (!lio->glist) {
- kfree(lio->glist_lock);
- lio->glist_lock = NULL;
- return -ENOMEM;
- }
-
- lio->glist_entry_size =
- ROUNDUP8((ROUNDUP4(OCTNIC_MAX_SG) >> 2) * OCT_SG_ENTRY_SIZE);
-
- /* allocate memory to store virtual and dma base address of
- * per glist consistent memory
- */
- lio->glists_virt_base = kcalloc(num_iqs, sizeof(*lio->glists_virt_base),
- GFP_KERNEL);
- lio->glists_dma_base = kcalloc(num_iqs, sizeof(*lio->glists_dma_base),
- GFP_KERNEL);
-
- if (!lio->glists_virt_base || !lio->glists_dma_base) {
- delete_glists(lio);
- return -ENOMEM;
- }
-
- for (i = 0; i < num_iqs; i++) {
- int numa_node = dev_to_node(&oct->pci_dev->dev);
-
- spin_lock_init(&lio->glist_lock[i]);
-
- INIT_LIST_HEAD(&lio->glist[i]);
-
- lio->glists_virt_base[i] =
- lio_dma_alloc(oct,
- lio->glist_entry_size * lio->tx_qsize,
- &lio->glists_dma_base[i]);
-
- if (!lio->glists_virt_base[i]) {
- delete_glists(lio);
- return -ENOMEM;
- }
-
- for (j = 0; j < lio->tx_qsize; j++) {
- g = kzalloc_node(sizeof(*g), GFP_KERNEL,
- numa_node);
- if (!g)
- g = kzalloc(sizeof(*g), GFP_KERNEL);
- if (!g)
- break;
-
- g->sg = lio->glists_virt_base[i] +
- (j * lio->glist_entry_size);
-
- g->sg_dma_ptr = lio->glists_dma_base[i] +
- (j * lio->glist_entry_size);
-
- list_add_tail(&g->list, &lio->glist[i]);
- }
-
- if (j != lio->tx_qsize) {
- delete_glists(lio);
- return -ENOMEM;
- }
- }
-
- return 0;
-}
-
-/**
* \brief Print link information
* @param netdev network device
*/
@@ -1077,6 +912,9 @@
/* set linux specific device pointer */
oct_dev->pci_dev = (void *)pdev;
+ oct_dev->subsystem_id = pdev->subsystem_vendor |
+ (pdev->subsystem_device << 16);
+
hs = &handshake[oct_dev->octeon_id];
init_completion(&hs->init);
init_completion(&hs->started);
@@ -1471,7 +1309,7 @@
cleanup_rx_oom_poll_fn(netdev);
- delete_glists(lio);
+ lio_delete_glists(lio);
free_netdev(netdev);
@@ -1686,7 +1524,7 @@
i++;
}
- iq = skb_iq(lio, skb);
+ iq = skb_iq(lio->oct_dev, skb);
spin_lock(&lio->glist_lock[iq]);
list_add_tail(&g->list, &lio->glist[iq]);
spin_unlock(&lio->glist_lock[iq]);
@@ -1729,7 +1567,7 @@
i++;
}
- iq = skb_iq(lio, skb);
+ iq = skb_iq(lio->oct_dev, skb);
spin_lock(&lio->glist_lock[iq]);
list_add_tail(&g->list, &lio->glist[iq]);
@@ -1942,39 +1780,6 @@
}
/**
- * \brief Callback for getting interface configuration
- * @param status status of request
- * @param buf pointer to resp structure
- */
-static void if_cfg_callback(struct octeon_device *oct,
- u32 status __attribute__((unused)),
- void *buf)
-{
- struct octeon_soft_command *sc = (struct octeon_soft_command *)buf;
- struct liquidio_if_cfg_resp *resp;
- struct liquidio_if_cfg_context *ctx;
-
- resp = (struct liquidio_if_cfg_resp *)sc->virtrptr;
- ctx = (struct liquidio_if_cfg_context *)sc->ctxptr;
-
- oct = lio_get_device(ctx->octeon_id);
- if (resp->status)
- dev_err(&oct->pci_dev->dev, "nic if cfg instruction failed. Status: 0x%llx (0x%08x)\n",
- CVM_CAST64(resp->status), status);
- WRITE_ONCE(ctx->cond, 1);
-
- snprintf(oct->fw_info.liquidio_firmware_version, 32, "%s",
- resp->cfg_info.liquidio_firmware_version);
-
- /* This barrier is required to be sure that the response has been
- * written fully before waking up the handler
- */
- wmb();
-
- wake_up_interruptible(&ctx->wc);
-}
-
-/**
* \brief Poll routine for checking transmit queue status
* @param work work_struct data structure
*/
@@ -2049,11 +1854,6 @@
ifstate_set(lio, LIO_IFSTATE_RUNNING);
- /* Ready for link status updates */
- lio->intf_open = 1;
-
- netif_info(lio, ifup, lio->netdev, "Interface Open, ready for traffic\n");
-
if (OCTEON_CN23XX_PF(oct)) {
if (!oct->msix_on)
if (setup_tx_poll_fn(netdev))
@@ -2063,7 +1863,12 @@
return -1;
}
- start_txqs(netdev);
+ netif_tx_start_all_queues(netdev);
+
+ /* Ready for link status updates */
+ lio->intf_open = 1;
+
+ netif_info(lio, ifup, lio->netdev, "Interface Open, ready for traffic\n");
/* tell Octeon to start forwarding packets to host */
send_rx_ctrl_cmd(lio, 1);
@@ -2086,11 +1891,15 @@
ifstate_reset(lio, LIO_IFSTATE_RUNNING);
- netif_tx_disable(netdev);
+ /* Stop any link updates */
+ lio->intf_open = 0;
+
+ stop_txqs(netdev);
/* Inform that netif carrier is down */
netif_carrier_off(netdev);
- lio->intf_open = 0;
+ netif_tx_disable(netdev);
+
lio->linfo.link.s.link_up = 0;
lio->link_changes++;
@@ -2252,14 +2061,11 @@
return 0;
}
-/**
- * \brief Net device get_stats
- * @param netdev network device
- */
-static struct net_device_stats *liquidio_get_stats(struct net_device *netdev)
+static void
+liquidio_get_stats64(struct net_device *netdev,
+ struct rtnl_link_stats64 *lstats)
{
struct lio *lio = GET_LIO(netdev);
- struct net_device_stats *stats = &netdev->stats;
struct octeon_device *oct;
u64 pkts = 0, drop = 0, bytes = 0;
struct oct_droq_stats *oq_stats;
@@ -2269,7 +2075,7 @@
oct = lio->oct_dev;
if (ifstate_check(lio, LIO_IFSTATE_RESETTING))
- return stats;
+ return;
for (i = 0; i < oct->num_iqs; i++) {
iq_no = lio->linfo.txpciq[i].s.q_no;
@@ -2279,9 +2085,9 @@
bytes += iq_stats->tx_tot_bytes;
}
- stats->tx_packets = pkts;
- stats->tx_bytes = bytes;
- stats->tx_dropped = drop;
+ lstats->tx_packets = pkts;
+ lstats->tx_bytes = bytes;
+ lstats->tx_dropped = drop;
pkts = 0;
drop = 0;
@@ -2298,11 +2104,34 @@
bytes += oq_stats->rx_bytes_received;
}
- stats->rx_bytes = bytes;
- stats->rx_packets = pkts;
- stats->rx_dropped = drop;
+ lstats->rx_bytes = bytes;
+ lstats->rx_packets = pkts;
+ lstats->rx_dropped = drop;
- return stats;
+ octnet_get_link_stats(netdev);
+ lstats->multicast = oct->link_stats.fromwire.fw_total_mcast;
+ lstats->collisions = oct->link_stats.fromhost.total_collisions;
+
+ /* detailed rx_errors: */
+ lstats->rx_length_errors = oct->link_stats.fromwire.l2_err;
+ /* recved pkt with crc error */
+ lstats->rx_crc_errors = oct->link_stats.fromwire.fcs_err;
+ /* recv'd frame alignment error */
+ lstats->rx_frame_errors = oct->link_stats.fromwire.frame_err;
+ /* recv'r fifo overrun */
+ lstats->rx_fifo_errors = oct->link_stats.fromwire.fifo_err;
+
+ lstats->rx_errors = lstats->rx_length_errors + lstats->rx_crc_errors +
+ lstats->rx_frame_errors + lstats->rx_fifo_errors;
+
+ /* detailed tx_errors */
+ lstats->tx_aborted_errors = oct->link_stats.fromhost.fw_err_pko;
+ lstats->tx_carrier_errors = oct->link_stats.fromhost.fw_err_link;
+ lstats->tx_fifo_errors = oct->link_stats.fromhost.fifo_err;
+
+ lstats->tx_errors = lstats->tx_aborted_errors +
+ lstats->tx_carrier_errors +
+ lstats->tx_fifo_errors;
}
/**
@@ -2510,7 +2339,7 @@
lio = GET_LIO(netdev);
oct = lio->oct_dev;
- q_idx = skb_iq(lio, skb);
+ q_idx = skb_iq(oct, skb);
tag = q_idx;
iq_no = lio->linfo.txpciq[q_idx].s.q_no;
@@ -2585,6 +2414,7 @@
if (dma_mapping_error(&oct->pci_dev->dev, dptr)) {
dev_err(&oct->pci_dev->dev, "%s DMA mapping error 1\n",
__func__);
+ stats->tx_dmamap_fail++;
return NETDEV_TX_BUSY;
}
@@ -2602,7 +2432,7 @@
spin_lock(&lio->glist_lock[q_idx]);
g = (struct octnic_gather *)
- list_delete_head(&lio->glist[q_idx]);
+ lio_list_delete_head(&lio->glist[q_idx]);
spin_unlock(&lio->glist_lock[q_idx]);
if (!g) {
@@ -2624,6 +2454,7 @@
if (dma_mapping_error(&oct->pci_dev->dev, g->sg[0].ptr[0])) {
dev_err(&oct->pci_dev->dev, "%s DMA mapping error 2\n",
__func__);
+ stats->tx_dmamap_fail++;
return NETDEV_TX_BUSY;
}
add_sg_size(&g->sg[0], (skb->len - skb->data_len), 0);
@@ -3324,11 +3155,36 @@
.switchdev_port_attr_get = lio_pf_switchdev_attr_get,
};
+static int liquidio_get_vf_stats(struct net_device *netdev, int vfidx,
+ struct ifla_vf_stats *vf_stats)
+{
+ struct lio *lio = GET_LIO(netdev);
+ struct octeon_device *oct = lio->oct_dev;
+ struct oct_vf_stats stats;
+ int ret;
+
+ if (vfidx < 0 || vfidx >= oct->sriov_info.num_vfs_alloced)
+ return -EINVAL;
+
+ memset(&stats, 0, sizeof(struct oct_vf_stats));
+ ret = cn23xx_get_vf_stats(oct, vfidx, &stats);
+ if (!ret) {
+ vf_stats->rx_packets = stats.rx_packets;
+ vf_stats->tx_packets = stats.tx_packets;
+ vf_stats->rx_bytes = stats.rx_bytes;
+ vf_stats->tx_bytes = stats.tx_bytes;
+ vf_stats->broadcast = stats.broadcast;
+ vf_stats->multicast = stats.multicast;
+ }
+
+ return ret;
+}
+
static const struct net_device_ops lionetdevops = {
.ndo_open = liquidio_open,
.ndo_stop = liquidio_stop,
.ndo_start_xmit = liquidio_xmit,
- .ndo_get_stats = liquidio_get_stats,
+ .ndo_get_stats64 = liquidio_get_stats64,
.ndo_set_mac_address = liquidio_set_mac,
.ndo_set_rx_mode = liquidio_set_mcast_list,
.ndo_tx_timeout = liquidio_tx_timeout,
@@ -3346,6 +3202,7 @@
.ndo_get_vf_config = liquidio_get_vf_config,
.ndo_set_vf_trust = liquidio_set_vf_trust,
.ndo_set_vf_link_state = liquidio_set_vf_link_state,
+ .ndo_get_vf_stats = liquidio_get_vf_stats,
};
/** \brief Entry point for the liquidio module
@@ -3448,6 +3305,7 @@
struct liquidio_if_cfg_resp *resp;
struct octdev_props *props;
int retval, num_iqueues, num_oqueues;
+ int max_num_queues = 0;
union oct_nic_if_cfg if_cfg;
unsigned int base_queue;
unsigned int gmx_port_id;
@@ -3528,9 +3386,9 @@
OPCODE_NIC_IF_CFG, 0,
if_cfg.u64, 0);
- sc->callback = if_cfg_callback;
+ sc->callback = lio_if_cfg_callback;
sc->callback_arg = sc;
- sc->wait_time = 3000;
+ sc->wait_time = LIO_IFCFG_WAIT_TIME;
retval = octeon_send_soft_command(octeon_dev, sc);
if (retval == IQ_SEND_FAILED) {
@@ -3584,11 +3442,20 @@
resp->cfg_info.oqmask);
goto setup_nic_dev_fail;
}
+
+ if (OCTEON_CN6XXX(octeon_dev)) {
+ max_num_queues = CFG_GET_IQ_MAX_Q(CHIP_CONF(octeon_dev,
+ cn6xxx));
+ } else if (OCTEON_CN23XX_PF(octeon_dev)) {
+ max_num_queues = CFG_GET_IQ_MAX_Q(CHIP_CONF(octeon_dev,
+ cn23xx_pf));
+ }
+
dev_dbg(&octeon_dev->pci_dev->dev,
- "interface %d, iqmask %016llx, oqmask %016llx, numiqueues %d, numoqueues %d\n",
+ "interface %d, iqmask %016llx, oqmask %016llx, numiqueues %d, numoqueues %d max_num_queues: %d\n",
i, resp->cfg_info.iqmask, resp->cfg_info.oqmask,
- num_iqueues, num_oqueues);
- netdev = alloc_etherdev_mq(LIO_SIZE, num_iqueues);
+ num_iqueues, num_oqueues, max_num_queues);
+ netdev = alloc_etherdev_mq(LIO_SIZE, max_num_queues);
if (!netdev) {
dev_err(&octeon_dev->pci_dev->dev, "Device allocation failed\n");
@@ -3603,6 +3470,20 @@
netdev->netdev_ops = &lionetdevops;
SWITCHDEV_SET_OPS(netdev, &lio_pf_switchdev_ops);
+ retval = netif_set_real_num_rx_queues(netdev, num_oqueues);
+ if (retval) {
+ dev_err(&octeon_dev->pci_dev->dev,
+ "setting real number rx failed\n");
+ goto setup_nic_dev_fail;
+ }
+
+ retval = netif_set_real_num_tx_queues(netdev, num_iqueues);
+ if (retval) {
+ dev_err(&octeon_dev->pci_dev->dev,
+ "setting real number tx failed\n");
+ goto setup_nic_dev_fail;
+ }
+
lio = GET_LIO(netdev);
memset(lio, 0, sizeof(struct lio));
@@ -3724,7 +3605,7 @@
lio->tx_qsize = octeon_get_tx_qsize(octeon_dev, lio->txq);
lio->rx_qsize = octeon_get_rx_qsize(octeon_dev, lio->rxq);
- if (setup_glists(octeon_dev, lio, num_iqueues)) {
+ if (lio_setup_glists(octeon_dev, lio, num_iqueues)) {
dev_err(&octeon_dev->pci_dev->dev,
"Gather list allocation failed\n");
goto setup_nic_dev_fail;
@@ -3786,6 +3667,23 @@
"NIC ifidx:%d Setup successful\n", i);
octeon_free_soft_command(octeon_dev, sc);
+
+ if (octeon_dev->subsystem_id ==
+ OCTEON_CN2350_25GB_SUBSYS_ID ||
+ octeon_dev->subsystem_id ==
+ OCTEON_CN2360_25GB_SUBSYS_ID) {
+ liquidio_get_speed(lio);
+
+ if (octeon_dev->speed_setting == 0) {
+ octeon_dev->speed_setting = 25;
+ octeon_dev->no_speed_setting = 1;
+ }
+ } else {
+ octeon_dev->no_speed_setting = 1;
+ octeon_dev->speed_setting = 10;
+ }
+ octeon_dev->speed_boot = octeon_dev->speed_setting;
+
}
devlink = devlink_alloc(&liquidio_devlink_ops,
@@ -4223,7 +4121,9 @@
}
atomic_set(&octeon_dev->status, OCT_DEV_MBOX_SETUP_DONE);
- if (octeon_allocate_ioq_vector(octeon_dev)) {
+ if (octeon_allocate_ioq_vector
+ (octeon_dev,
+ octeon_dev->sriov_info.num_pf_rings)) {
dev_err(&octeon_dev->pci_dev->dev, "OCTEON: ioq vector allocation failed\n");
return 1;
}
diff --git a/drivers/net/ethernet/cavium/liquidio/lio_vf_main.c b/drivers/net/ethernet/cavium/liquidio/lio_vf_main.c
index f92dfa4..7fa0212 100644
--- a/drivers/net/ethernet/cavium/liquidio/lio_vf_main.c
+++ b/drivers/net/ethernet/cavium/liquidio/lio_vf_main.c
@@ -69,30 +69,10 @@
} s;
};
-#define OCTNIC_MAX_SG (MAX_SKB_FRAGS)
-
#define OCTNIC_GSO_MAX_HEADER_SIZE 128
#define OCTNIC_GSO_MAX_SIZE \
(CN23XX_DEFAULT_INPUT_JABBER - OCTNIC_GSO_MAX_HEADER_SIZE)
-struct octnic_gather {
- /* List manipulation. Next and prev pointers. */
- struct list_head list;
-
- /* Size of the gather component at sg in bytes. */
- int sg_size;
-
- /* Number of bytes that sg was adjusted to make it 8B-aligned. */
- int adjust;
-
- /* Gather component that can accommodate max sized fragment list
- * received from the IP layer.
- */
- struct octeon_sg_entry *sg;
-
- dma_addr_t sg_dma_ptr;
-};
-
static int
liquidio_vf_probe(struct pci_dev *pdev, const struct pci_device_id *ent);
static void liquidio_vf_remove(struct pci_dev *pdev);
@@ -285,142 +265,6 @@
};
/**
- * Remove the node at the head of the list. The list would be empty at
- * the end of this call if there are no more nodes in the list.
- */
-static struct list_head *list_delete_head(struct list_head *root)
-{
- struct list_head *node;
-
- if ((root->prev == root) && (root->next == root))
- node = NULL;
- else
- node = root->next;
-
- if (node)
- list_del(node);
-
- return node;
-}
-
-/**
- * \brief Delete gather lists
- * @param lio per-network private data
- */
-static void delete_glists(struct lio *lio)
-{
- struct octnic_gather *g;
- int i;
-
- kfree(lio->glist_lock);
- lio->glist_lock = NULL;
-
- if (!lio->glist)
- return;
-
- for (i = 0; i < lio->linfo.num_txpciq; i++) {
- do {
- g = (struct octnic_gather *)
- list_delete_head(&lio->glist[i]);
- kfree(g);
- } while (g);
-
- if (lio->glists_virt_base && lio->glists_virt_base[i] &&
- lio->glists_dma_base && lio->glists_dma_base[i]) {
- lio_dma_free(lio->oct_dev,
- lio->glist_entry_size * lio->tx_qsize,
- lio->glists_virt_base[i],
- lio->glists_dma_base[i]);
- }
- }
-
- kfree(lio->glists_virt_base);
- lio->glists_virt_base = NULL;
-
- kfree(lio->glists_dma_base);
- lio->glists_dma_base = NULL;
-
- kfree(lio->glist);
- lio->glist = NULL;
-}
-
-/**
- * \brief Setup gather lists
- * @param lio per-network private data
- */
-static int setup_glists(struct lio *lio, int num_iqs)
-{
- struct octnic_gather *g;
- int i, j;
-
- lio->glist_lock =
- kzalloc(sizeof(*lio->glist_lock) * num_iqs, GFP_KERNEL);
- if (!lio->glist_lock)
- return -ENOMEM;
-
- lio->glist =
- kzalloc(sizeof(*lio->glist) * num_iqs, GFP_KERNEL);
- if (!lio->glist) {
- kfree(lio->glist_lock);
- lio->glist_lock = NULL;
- return -ENOMEM;
- }
-
- lio->glist_entry_size =
- ROUNDUP8((ROUNDUP4(OCTNIC_MAX_SG) >> 2) * OCT_SG_ENTRY_SIZE);
-
- /* allocate memory to store virtual and dma base address of
- * per glist consistent memory
- */
- lio->glists_virt_base = kcalloc(num_iqs, sizeof(*lio->glists_virt_base),
- GFP_KERNEL);
- lio->glists_dma_base = kcalloc(num_iqs, sizeof(*lio->glists_dma_base),
- GFP_KERNEL);
-
- if (!lio->glists_virt_base || !lio->glists_dma_base) {
- delete_glists(lio);
- return -ENOMEM;
- }
-
- for (i = 0; i < num_iqs; i++) {
- spin_lock_init(&lio->glist_lock[i]);
-
- INIT_LIST_HEAD(&lio->glist[i]);
-
- lio->glists_virt_base[i] =
- lio_dma_alloc(lio->oct_dev,
- lio->glist_entry_size * lio->tx_qsize,
- &lio->glists_dma_base[i]);
-
- if (!lio->glists_virt_base[i]) {
- delete_glists(lio);
- return -ENOMEM;
- }
-
- for (j = 0; j < lio->tx_qsize; j++) {
- g = kzalloc(sizeof(*g), GFP_KERNEL);
- if (!g)
- break;
-
- g->sg = lio->glists_virt_base[i] +
- (j * lio->glist_entry_size);
-
- g->sg_dma_ptr = lio->glists_dma_base[i] +
- (j * lio->glist_entry_size);
-
- list_add_tail(&g->list, &lio->glist[i]);
- }
-
- if (j != lio->tx_qsize) {
- delete_glists(lio);
- return -ENOMEM;
- }
- }
-
- return 0;
-}
-
-/**
* \brief Print link information
* @param netdev network device
*/
@@ -567,6 +411,9 @@
/* set linux specific device pointer */
oct_dev->pci_dev = pdev;
+ oct_dev->subsystem_id = pdev->subsystem_vendor |
+ (pdev->subsystem_device << 16);
+
if (octeon_device_init(oct_dev)) {
liquidio_vf_remove(pdev);
return -ENOMEM;
@@ -856,7 +703,7 @@
cleanup_link_status_change_wq(netdev);
- delete_glists(lio);
+ lio_delete_glists(lio);
free_netdev(netdev);
@@ -1005,7 +852,7 @@
i++;
}
- iq = skb_iq(lio, skb);
+ iq = skb_iq(lio->oct_dev, skb);
spin_lock(&lio->glist_lock[iq]);
list_add_tail(&g->list, &lio->glist[iq]);
@@ -1049,7 +896,7 @@
i++;
}
- iq = skb_iq(lio, skb);
+ iq = skb_iq(lio->oct_dev, skb);
spin_lock(&lio->glist_lock[iq]);
list_add_tail(&g->list, &lio->glist[iq]);
@@ -1059,38 +906,6 @@
}
/**
- * \brief Callback for getting interface configuration
- * @param status status of request
- * @param buf pointer to resp structure
- */
-static void if_cfg_callback(struct octeon_device *oct,
- u32 status __attribute__((unused)), void *buf)
-{
- struct octeon_soft_command *sc = (struct octeon_soft_command *)buf;
- struct liquidio_if_cfg_context *ctx;
- struct liquidio_if_cfg_resp *resp;
-
- resp = (struct liquidio_if_cfg_resp *)sc->virtrptr;
- ctx = (struct liquidio_if_cfg_context *)sc->ctxptr;
-
- oct = lio_get_device(ctx->octeon_id);
- if (resp->status)
- dev_err(&oct->pci_dev->dev, "nic if cfg instruction failed. Status: %llx\n",
- CVM_CAST64(resp->status));
- WRITE_ONCE(ctx->cond, 1);
-
- snprintf(oct->fw_info.liquidio_firmware_version, 32, "%s",
- resp->cfg_info.liquidio_firmware_version);
-
- /* This barrier is required to be sure that the response has been
- * written fully before waking up the handler
- */
- wmb();
-
- wake_up_interruptible(&ctx->wc);
-}
-
-/**
* \brief Net device open for LiquidIO
* @param netdev network device
*/
@@ -1336,24 +1151,21 @@
return 0;
}
-/**
- * \brief Net device get_stats
- * @param netdev network device
- */
-static struct net_device_stats *liquidio_get_stats(struct net_device *netdev)
+static void
+liquidio_get_stats64(struct net_device *netdev,
+ struct rtnl_link_stats64 *lstats)
{
struct lio *lio = GET_LIO(netdev);
- struct net_device_stats *stats = &netdev->stats;
+ struct octeon_device *oct;
u64 pkts = 0, drop = 0, bytes = 0;
struct oct_droq_stats *oq_stats;
struct oct_iq_stats *iq_stats;
- struct octeon_device *oct;
int i, iq_no, oq_no;
oct = lio->oct_dev;
if (ifstate_check(lio, LIO_IFSTATE_RESETTING))
- return stats;
+ return;
for (i = 0; i < oct->num_iqs; i++) {
iq_no = lio->linfo.txpciq[i].s.q_no;
@@ -1363,9 +1175,9 @@
bytes += iq_stats->tx_tot_bytes;
}
- stats->tx_packets = pkts;
- stats->tx_bytes = bytes;
- stats->tx_dropped = drop;
+ lstats->tx_packets = pkts;
+ lstats->tx_bytes = bytes;
+ lstats->tx_dropped = drop;
pkts = 0;
drop = 0;
@@ -1382,11 +1194,29 @@
bytes += oq_stats->rx_bytes_received;
}
- stats->rx_bytes = bytes;
- stats->rx_packets = pkts;
- stats->rx_dropped = drop;
+ lstats->rx_bytes = bytes;
+ lstats->rx_packets = pkts;
+ lstats->rx_dropped = drop;
- return stats;
+ octnet_get_link_stats(netdev);
+ lstats->multicast = oct->link_stats.fromwire.fw_total_mcast;
+
+ /* detailed rx_errors: */
+ lstats->rx_length_errors = oct->link_stats.fromwire.l2_err;
+ /* recved pkt with crc error */
+ lstats->rx_crc_errors = oct->link_stats.fromwire.fcs_err;
+ /* recv'd frame alignment error */
+ lstats->rx_frame_errors = oct->link_stats.fromwire.frame_err;
+
+ lstats->rx_errors = lstats->rx_length_errors + lstats->rx_crc_errors +
+ lstats->rx_frame_errors;
+
+ /* detailed tx_errors */
+ lstats->tx_aborted_errors = oct->link_stats.fromhost.fw_err_pko;
+ lstats->tx_carrier_errors = oct->link_stats.fromhost.fw_err_link;
+
+ lstats->tx_errors = lstats->tx_aborted_errors +
+ lstats->tx_carrier_errors;
}
/**
@@ -1580,7 +1410,7 @@
lio = GET_LIO(netdev);
oct = lio->oct_dev;
- q_idx = skb_iq(lio, skb);
+ q_idx = skb_iq(lio->oct_dev, skb);
tag = q_idx;
iq_no = lio->linfo.txpciq[q_idx].s.q_no;
@@ -1661,8 +1491,8 @@
int i, frags;
spin_lock(&lio->glist_lock[q_idx]);
- g = (struct octnic_gather *)list_delete_head(
- &lio->glist[q_idx]);
+ g = (struct octnic_gather *)
+ lio_list_delete_head(&lio->glist[q_idx]);
spin_unlock(&lio->glist_lock[q_idx]);
if (!g) {
@@ -2034,7 +1864,7 @@
.ndo_open = liquidio_open,
.ndo_stop = liquidio_stop,
.ndo_start_xmit = liquidio_xmit,
- .ndo_get_stats = liquidio_get_stats,
+ .ndo_get_stats64 = liquidio_get_stats64,
.ndo_set_mac_address = liquidio_set_mac,
.ndo_set_rx_mode = liquidio_set_mcast_list,
.ndo_tx_timeout = liquidio_tx_timeout,
@@ -2156,7 +1986,7 @@
OPCODE_NIC_IF_CFG, 0, if_cfg.u64,
0);
- sc->callback = if_cfg_callback;
+ sc->callback = lio_if_cfg_callback;
sc->callback_arg = sc;
sc->wait_time = 5000;
@@ -2273,6 +2103,7 @@
netdev->features = (lio->dev_capability & ~NETIF_F_LRO);
netdev->hw_features = lio->dev_capability;
+ netdev->hw_features &= ~NETIF_F_HW_VLAN_CTAG_RX;
/* MTU range: 68 - 16000 */
netdev->min_mtu = LIO_MIN_MTU_SIZE;
@@ -2321,7 +2152,7 @@
lio->tx_qsize = octeon_get_tx_qsize(octeon_dev, lio->txq);
lio->rx_qsize = octeon_get_rx_qsize(octeon_dev, lio->rxq);
- if (setup_glists(lio, num_iqueues)) {
+ if (lio_setup_glists(octeon_dev, lio, num_iqueues)) {
dev_err(&octeon_dev->pci_dev->dev,
"Gather list allocation failed\n");
goto setup_nic_dev_fail;
@@ -2371,6 +2202,8 @@
"NIC ifidx:%d Setup successful\n", i);
octeon_free_soft_command(octeon_dev, sc);
+
+ octeon_dev->no_speed_setting = 1;
}
return 0;
@@ -2512,7 +2345,7 @@
}
atomic_set(&oct->status, OCT_DEV_MBOX_SETUP_DONE);
- if (octeon_allocate_ioq_vector(oct)) {
+ if (octeon_allocate_ioq_vector(oct, oct->sriov_info.rings_per_vf)) {
dev_err(&oct->pci_dev->dev, "ioq vector allocation failed\n");
return 1;
}
diff --git a/drivers/net/ethernet/cavium/liquidio/lio_vf_rep.c b/drivers/net/ethernet/cavium/liquidio/lio_vf_rep.c
index 2adafa3..ddd7431 100644
--- a/drivers/net/ethernet/cavium/liquidio/lio_vf_rep.c
+++ b/drivers/net/ethernet/cavium/liquidio/lio_vf_rep.c
@@ -201,13 +201,14 @@
{
struct lio_vf_rep_desc *vf_rep = netdev_priv(dev);
- stats64->tx_packets = vf_rep->stats.tx_packets;
- stats64->tx_bytes = vf_rep->stats.tx_bytes;
- stats64->tx_dropped = vf_rep->stats.tx_dropped;
+ /* Swap tx and rx stats as VF rep is a switch port */
+ stats64->tx_packets = vf_rep->stats.rx_packets;
+ stats64->tx_bytes = vf_rep->stats.rx_bytes;
+ stats64->tx_dropped = vf_rep->stats.rx_dropped;
- stats64->rx_packets = vf_rep->stats.rx_packets;
- stats64->rx_bytes = vf_rep->stats.rx_bytes;
- stats64->rx_dropped = vf_rep->stats.rx_dropped;
+ stats64->rx_packets = vf_rep->stats.tx_packets;
+ stats64->rx_bytes = vf_rep->stats.tx_bytes;
+ stats64->rx_dropped = vf_rep->stats.tx_dropped;
}
static int
diff --git a/drivers/net/ethernet/cavium/liquidio/liquidio_common.h b/drivers/net/ethernet/cavium/liquidio/liquidio_common.h
index 75eea83..690424b 100644
--- a/drivers/net/ethernet/cavium/liquidio/liquidio_common.h
+++ b/drivers/net/ethernet/cavium/liquidio/liquidio_common.h
@@ -28,7 +28,7 @@
#define LIQUIDIO_PACKAGE ""
#define LIQUIDIO_BASE_MAJOR_VERSION 1
#define LIQUIDIO_BASE_MINOR_VERSION 7
-#define LIQUIDIO_BASE_MICRO_VERSION 0
+#define LIQUIDIO_BASE_MICRO_VERSION 2
#define LIQUIDIO_BASE_VERSION __stringify(LIQUIDIO_BASE_MAJOR_VERSION) "." \
__stringify(LIQUIDIO_BASE_MINOR_VERSION)
#define LIQUIDIO_MICRO_VERSION "." __stringify(LIQUIDIO_BASE_MICRO_VERSION)
@@ -84,6 +84,7 @@
#define OPCODE_NIC_IF_CFG 0x09
#define OPCODE_NIC_VF_DRV_NOTICE 0x0A
#define OPCODE_NIC_INTRMOD_PARAMS 0x0B
+#define OPCODE_NIC_QCOUNT_UPDATE 0x12
#define OPCODE_NIC_SET_TRUSTED_VF 0x13
#define OPCODE_NIC_SYNC_OCTEON_TIME 0x14
#define VF_DRV_LOADED 1
@@ -92,6 +93,7 @@
#define OPCODE_NIC_VF_REP_PKT 0x15
#define OPCODE_NIC_VF_REP_CMD 0x16
+#define OPCODE_NIC_UBOOT_CTL 0x17
#define CORE_DRV_TEST_SCATTER_OP 0xFFF5
@@ -248,6 +250,9 @@
#define OCTNET_CMD_VLAN_FILTER_ENABLE 0x1
#define OCTNET_CMD_VLAN_FILTER_DISABLE 0x0
+#define SEAPI_CMD_SPEED_SET 0x2
+#define SEAPI_CMD_SPEED_GET 0x3
+
#define LIO_CMD_WAIT_TM 100
/* RX(packets coming from wire) Checksum verification flags */
@@ -779,23 +784,32 @@
/** Stats for each NIC port in RX direction. */
struct nic_rx_stats {
/* link-level stats */
- u64 total_rcvd;
- u64 bytes_rcvd;
- u64 total_bcst;
- u64 total_mcst;
- u64 runts;
- u64 ctl_rcvd;
- u64 fifo_err; /* Accounts for over/under-run of buffers */
- u64 dmac_drop;
- u64 fcs_err;
- u64 jabber_err;
- u64 l2_err;
- u64 frame_err;
+ u64 total_rcvd; /* Received packets */
+ u64 bytes_rcvd; /* Octets of received packets */
+ u64 total_bcst; /* Number of non-dropped L2 broadcast packets */
+ u64 total_mcst; /* Number of non-dropped L2 multicast packets */
+ u64 runts; /* Packets shorter than allowed */
+ u64 ctl_rcvd; /* Received PAUSE packets */
+ u64 fifo_err; /* Packets dropped due to RX FIFO full */
+ u64 dmac_drop; /* Packets dropped by the DMAC filter */
+ u64 fcs_err; /* Sum of fragment, overrun, and FCS errors */
+ u64 jabber_err; /* Packets larger than allowed */
+ u64 l2_err; /* Sum of DMA, parity, PCAM access, no memory,
+ * buffer overflow, malformed L2 header or
+ * length, oversize errors
+ **/
+ u64 frame_err; /* Sum of IPv4 and L4 checksum errors */
+ u64 red_drops; /* Packets dropped by RED due to buffer
+ * exhaustion
+ **/
/* firmware stats */
u64 fw_total_rcvd;
u64 fw_total_fwd;
u64 fw_total_fwd_bytes;
+ u64 fw_total_mcast;
+ u64 fw_total_bcast;
+
u64 fw_err_pko;
u64 fw_err_link;
u64 fw_err_drop;
@@ -806,11 +820,11 @@
u64 fw_lro_pkts; /* Number of packets that are LROed */
u64 fw_lro_octs; /* Number of octets that are LROed */
u64 fw_total_lro; /* Number of LRO packets formed */
- u64 fw_lro_aborts; /* Number of times lRO of packet aborted */
+ u64 fw_lro_aborts; /* Number of times LRO of packet aborted */
u64 fw_lro_aborts_port;
u64 fw_lro_aborts_seq;
u64 fw_lro_aborts_tsval;
- u64 fw_lro_aborts_timer;
+ u64 fw_lro_aborts_timer; /* Timer setting error */
/* intrmod: packet forward rate */
u64 fwd_rate;
};
@@ -818,23 +832,42 @@
/** Stats for each NIC port in RX direction. */
struct nic_tx_stats {
/* link-level stats */
- u64 total_pkts_sent;
- u64 total_bytes_sent;
- u64 mcast_pkts_sent;
- u64 bcast_pkts_sent;
- u64 ctl_sent;
- u64 one_collision_sent; /* Packets sent after one collision*/
- u64 multi_collision_sent; /* Packets sent after multiple collision*/
- u64 max_collision_fail; /* Packets not sent due to max collisions */
- u64 max_deferral_fail; /* Packets not sent due to max deferrals */
- u64 fifo_err; /* Accounts for over/under-run of buffers */
- u64 runts;
- u64 total_collisions; /* Total number of collisions detected */
+ u64 total_pkts_sent; /* Total frames sent on the interface */
+ u64 total_bytes_sent; /* Total octets sent on the interface */
+ u64 mcast_pkts_sent; /* Packets sent to the multicast DMAC */
+ u64 bcast_pkts_sent; /* Packets sent to a broadcast DMAC */
+ u64 ctl_sent; /* Control/PAUSE packets sent */
+ u64 one_collision_sent; /* Packets sent that experienced a
+ * single collision before successful
+ * transmission
+ **/
+ u64 multi_collision_sent; /* Packets sent that experienced
+ * multiple collisions before successful
+ * transmission
+ **/
+ u64 max_collision_fail; /* Packets dropped due to excessive
+ * collisions
+ **/
+ u64 max_deferral_fail; /* Packets not sent due to max
+ * deferrals
+ **/
+ u64 fifo_err; /* Packets sent that experienced a
+ * transmit underflow and were
+ * truncated
+ **/
+ u64 runts; /* Packets sent with an octet count
+ * lessthan 64
+ **/
+ u64 total_collisions; /* Packets dropped due to excessive
+ * collisions
+ **/
/* firmware stats */
u64 fw_total_sent;
u64 fw_total_fwd;
u64 fw_total_fwd_bytes;
+ u64 fw_total_mcast_sent;
+ u64 fw_total_bcast_sent;
u64 fw_err_pko;
u64 fw_err_link;
u64 fw_err_drop;
diff --git a/drivers/net/ethernet/cavium/liquidio/octeon_device.c b/drivers/net/ethernet/cavium/liquidio/octeon_device.c
index f38abf6..f878a55 100644
--- a/drivers/net/ethernet/cavium/liquidio/octeon_device.c
+++ b/drivers/net/ethernet/cavium/liquidio/octeon_device.c
@@ -824,23 +824,18 @@
}
int
-octeon_allocate_ioq_vector(struct octeon_device *oct)
+octeon_allocate_ioq_vector(struct octeon_device *oct, u32 num_ioqs)
{
- int i, num_ioqs = 0;
struct octeon_ioq_vector *ioq_vector;
int cpu_num;
int size;
-
- if (OCTEON_CN23XX_PF(oct))
- num_ioqs = oct->sriov_info.num_pf_rings;
- else if (OCTEON_CN23XX_VF(oct))
- num_ioqs = oct->sriov_info.rings_per_vf;
+ int i;
size = sizeof(struct octeon_ioq_vector) * num_ioqs;
oct->ioq_vector = vzalloc(size);
if (!oct->ioq_vector)
- return 1;
+ return -1;
for (i = 0; i < num_ioqs; i++) {
ioq_vector = &oct->ioq_vector[i];
ioq_vector->oct_dev = oct;
@@ -856,6 +851,7 @@
else
ioq_vector->ioq_num = i;
}
+
return 0;
}
diff --git a/drivers/net/ethernet/cavium/liquidio/octeon_device.h b/drivers/net/ethernet/cavium/liquidio/octeon_device.h
index 91937cc..94a4ed88d 100644
--- a/drivers/net/ethernet/cavium/liquidio/octeon_device.h
+++ b/drivers/net/ethernet/cavium/liquidio/octeon_device.h
@@ -43,6 +43,13 @@
#define OCTEON_CN23XX_REV_1_1 0x01
#define OCTEON_CN23XX_REV_2_0 0x80
+/**SubsystemId for the chips */
+#define OCTEON_CN2350_10GB_SUBSYS_ID_1 0X3177d
+#define OCTEON_CN2350_10GB_SUBSYS_ID_2 0X4177d
+#define OCTEON_CN2360_10GB_SUBSYS_ID 0X5177d
+#define OCTEON_CN2350_25GB_SUBSYS_ID 0X7177d
+#define OCTEON_CN2360_25GB_SUBSYS_ID 0X6177d
+
/** Endian-swap modes supported by Octeon. */
enum octeon_pci_swap_mode {
OCTEON_PCI_PASSTHROUGH = 0,
@@ -430,6 +437,8 @@
u16 rev_id;
+ u32 subsystem_id;
+
u16 pf_num;
u16 vf_num;
@@ -584,6 +593,11 @@
struct lio_vf_rep_list vf_rep_list;
struct devlink *devlink;
enum devlink_eswitch_mode eswitch_mode;
+
+ /* for 25G NIC speed change */
+ u8 speed_boot;
+ u8 speed_setting;
+ u8 no_speed_setting;
};
#define OCT_DRV_ONLINE 1
@@ -867,7 +881,7 @@
struct octeon_config *octeon_get_conf(struct octeon_device *oct);
void octeon_free_ioq_vector(struct octeon_device *oct);
-int octeon_allocate_ioq_vector(struct octeon_device *oct);
+int octeon_allocate_ioq_vector(struct octeon_device *oct, u32 num_ioqs);
void lio_enable_irq(struct octeon_droq *droq, struct octeon_instr_queue *iq);
/* LiquidIO driver pivate flags */
diff --git a/drivers/net/ethernet/cavium/liquidio/octeon_iq.h b/drivers/net/ethernet/cavium/liquidio/octeon_iq.h
index 81c9876..5fed7b6 100644
--- a/drivers/net/ethernet/cavium/liquidio/octeon_iq.h
+++ b/drivers/net/ethernet/cavium/liquidio/octeon_iq.h
@@ -62,8 +62,8 @@
u64 tx_tot_bytes;/**< Total count of bytes sento to network. */
u64 tx_gso; /* count of tso */
u64 tx_vxlan; /* tunnel */
- u64 tx_dmamap_fail;
- u64 tx_restart;
+ u64 tx_dmamap_fail; /* Number of times dma mapping failed */
+ u64 tx_restart; /* Number of times this queue restarted */
};
#define OCT_IQ_STATS_SIZE (sizeof(struct oct_iq_stats))
diff --git a/drivers/net/ethernet/cavium/liquidio/octeon_mailbox.c b/drivers/net/ethernet/cavium/liquidio/octeon_mailbox.c
index 28e74ee..021d99c 100644
--- a/drivers/net/ethernet/cavium/liquidio/octeon_mailbox.c
+++ b/drivers/net/ethernet/cavium/liquidio/octeon_mailbox.c
@@ -24,6 +24,7 @@
#include "octeon_device.h"
#include "octeon_main.h"
#include "octeon_mailbox.h"
+#include "cn23xx_pf_device.h"
/**
* octeon_mbox_read:
@@ -205,6 +206,26 @@
return ret;
}
+static void get_vf_stats(struct octeon_device *oct,
+ struct oct_vf_stats *stats)
+{
+ int i;
+
+ for (i = 0; i < oct->num_iqs; i++) {
+ if (!oct->instr_queue[i])
+ continue;
+ stats->tx_packets += oct->instr_queue[i]->stats.tx_done;
+ stats->tx_bytes += oct->instr_queue[i]->stats.tx_tot_bytes;
+ }
+
+ for (i = 0; i < oct->num_oqs; i++) {
+ if (!oct->droq[i])
+ continue;
+ stats->rx_packets += oct->droq[i]->stats.rx_pkts_received;
+ stats->rx_bytes += oct->droq[i]->stats.rx_bytes_received;
+ }
+}
+
/**
* octeon_mbox_process_cmd:
* @mbox: Pointer mailbox
@@ -250,6 +271,15 @@
mbox_cmd->msg.s.params);
break;
+ case OCTEON_GET_VF_STATS:
+ dev_dbg(&oct->pci_dev->dev, "Got VF stats request. Sending data back\n");
+ mbox_cmd->msg.s.type = OCTEON_MBOX_RESPONSE;
+ mbox_cmd->msg.s.resp_needed = 1;
+ mbox_cmd->msg.s.len = 1 +
+ sizeof(struct oct_vf_stats) / sizeof(u64);
+ get_vf_stats(oct, (struct oct_vf_stats *)mbox_cmd->data);
+ octeon_mbox_write(oct, mbox_cmd);
+ break;
default:
break;
}
@@ -322,3 +352,25 @@
return 0;
}
+
+int octeon_mbox_cancel(struct octeon_device *oct, int q_no)
+{
+ struct octeon_mbox *mbox = oct->mbox[q_no];
+ struct octeon_mbox_cmd *mbox_cmd;
+ unsigned long flags = 0;
+
+ spin_lock_irqsave(&mbox->lock, flags);
+ mbox_cmd = &mbox->mbox_resp;
+
+ if (!(mbox->state & OCTEON_MBOX_STATE_RESPONSE_PENDING)) {
+ spin_unlock_irqrestore(&mbox->lock, flags);
+ return 1;
+ }
+
+ mbox->state = OCTEON_MBOX_STATE_IDLE;
+ memset(mbox_cmd, 0, sizeof(*mbox_cmd));
+ writeq(OCTEON_PFVFSIG, mbox->mbox_read_reg);
+ spin_unlock_irqrestore(&mbox->lock, flags);
+
+ return 0;
+}
diff --git a/drivers/net/ethernet/cavium/liquidio/octeon_mailbox.h b/drivers/net/ethernet/cavium/liquidio/octeon_mailbox.h
index 1def22a..d92bd7e 100644
--- a/drivers/net/ethernet/cavium/liquidio/octeon_mailbox.h
+++ b/drivers/net/ethernet/cavium/liquidio/octeon_mailbox.h
@@ -25,6 +25,7 @@
#define OCTEON_VF_ACTIVE 0x1
#define OCTEON_VF_FLR_REQUEST 0x2
#define OCTEON_PF_CHANGED_VF_MACADDR 0x4
+#define OCTEON_GET_VF_STATS 0x8
/*Macro for Read acknowldgement*/
#define OCTEON_PFVFACK 0xffffffffffffffffULL
@@ -107,9 +108,15 @@
};
+struct oct_vf_stats_ctx {
+ atomic_t status;
+ struct oct_vf_stats *stats;
+};
+
int octeon_mbox_read(struct octeon_mbox *mbox);
int octeon_mbox_write(struct octeon_device *oct,
struct octeon_mbox_cmd *mbox_cmd);
int octeon_mbox_process_message(struct octeon_mbox *mbox);
+int octeon_mbox_cancel(struct octeon_device *oct, int q_no);
#endif
diff --git a/drivers/net/ethernet/cavium/liquidio/octeon_network.h b/drivers/net/ethernet/cavium/liquidio/octeon_network.h
index 4069710..d7a3916 100644
--- a/drivers/net/ethernet/cavium/liquidio/octeon_network.h
+++ b/drivers/net/ethernet/cavium/liquidio/octeon_network.h
@@ -47,6 +47,29 @@
u64 status;
};
+#define LIO_IFCFG_WAIT_TIME 3000 /* In milli seconds */
+
+/* Structure of a node in list of gather components maintained by
+ * NIC driver for each network device.
+ */
+struct octnic_gather {
+ /* List manipulation. Next and prev pointers. */
+ struct list_head list;
+
+ /* Size of the gather component at sg in bytes. */
+ int sg_size;
+
+ /* Number of bytes that sg was adjusted to make it 8B-aligned. */
+ int adjust;
+
+ /* Gather component that can accommodate max sized fragment list
+ * received from the IP layer.
+ */
+ struct octeon_sg_entry *sg;
+
+ dma_addr_t sg_dma_ptr;
+};
+
struct oct_nic_stats_resp {
u64 rh;
struct oct_link_stats stats;
@@ -58,6 +81,18 @@
struct net_device *netdev;
};
+struct oct_nic_seapi_resp {
+ u64 rh;
+ u32 speed;
+ u64 status;
+};
+
+struct liquidio_nic_seapi_ctl_context {
+ int octeon_id;
+ u32 status;
+ struct completion complete;
+};
+
/** LiquidIO per-interface network private data */
struct lio {
/** State of the interface. Rx/Tx happens only in the RUNNING state. */
@@ -157,7 +192,7 @@
#define LIO_SIZE (sizeof(struct lio))
#define GET_LIO(netdev) ((struct lio *)netdev_priv(netdev))
-#define LIO_MAX_CORES 12
+#define LIO_MAX_CORES 16
/**
* \brief Enable or disable feature
@@ -190,6 +225,8 @@
int octeon_setup_interrupt(struct octeon_device *oct, u32 num_ioqs);
+int octnet_get_link_stats(struct net_device *netdev);
+
int lio_wait_for_clean_oq(struct octeon_device *oct);
/**
* \brief Register ethtool operations
@@ -197,6 +234,17 @@
*/
void liquidio_set_ethtool_ops(struct net_device *netdev);
+void lio_if_cfg_callback(struct octeon_device *oct,
+ u32 status __attribute__((unused)),
+ void *buf);
+
+void lio_delete_glists(struct lio *lio);
+
+int lio_setup_glists(struct octeon_device *oct, struct lio *lio, int num_qs);
+
+int liquidio_get_speed(struct lio *lio);
+int liquidio_set_speed(struct lio *lio, int speed);
+
/**
* \brief Net device change_mtu
* @param netdev network device
@@ -515,7 +563,7 @@
{
int i;
- for (i = 0; i < netdev->num_tx_queues; i++)
+ for (i = 0; i < netdev->real_num_tx_queues; i++)
netif_stop_subqueue(netdev, i);
}
@@ -528,7 +576,7 @@
struct lio *lio = GET_LIO(netdev);
int i, qno;
- for (i = 0; i < netdev->num_tx_queues; i++) {
+ for (i = 0; i < netdev->real_num_tx_queues; i++) {
qno = lio->linfo.txpciq[i % lio->oct_dev->num_iqs].s.q_no;
if (__netif_subqueue_stopped(netdev, i)) {
@@ -549,14 +597,33 @@
int i;
if (lio->linfo.link.s.link_up) {
- for (i = 0; i < netdev->num_tx_queues; i++)
+ for (i = 0; i < netdev->real_num_tx_queues; i++)
netif_start_subqueue(netdev, i);
}
}
-static inline int skb_iq(struct lio *lio, struct sk_buff *skb)
+static inline int skb_iq(struct octeon_device *oct, struct sk_buff *skb)
{
- return skb->queue_mapping % lio->linfo.num_txpciq;
+ return skb->queue_mapping % oct->num_iqs;
+}
+
+/**
+ * Remove the node at the head of the list. The list would be empty at
+ * the end of this call if there are no more nodes in the list.
+ */
+static inline struct list_head *lio_list_delete_head(struct list_head *root)
+{
+ struct list_head *node;
+
+ if (root->prev == root && root->next == root)
+ node = NULL;
+ else
+ node = root->next;
+
+ if (node)
+ list_del(node);
+
+ return node;
}
#endif
diff --git a/drivers/net/ethernet/cavium/thunder/nicvf_main.c b/drivers/net/ethernet/cavium/thunder/nicvf_main.c
index 707db33..7135db4 100644
--- a/drivers/net/ethernet/cavium/thunder/nicvf_main.c
+++ b/drivers/net/ethernet/cavium/thunder/nicvf_main.c
@@ -538,9 +538,9 @@
action = bpf_prog_run_xdp(prog, &xdp);
rcu_read_unlock();
+ len = xdp.data_end - xdp.data;
/* Check if XDP program has changed headers */
if (orig_data != xdp.data) {
- len = xdp.data_end - xdp.data;
offset = orig_data - xdp.data;
dma_addr -= offset;
}
diff --git a/drivers/net/ethernet/chelsio/cxgb4/cudbg_entity.h b/drivers/net/ethernet/chelsio/cxgb4/cudbg_entity.h
index dc25066..3c50578 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cudbg_entity.h
+++ b/drivers/net/ethernet/chelsio/cxgb4/cudbg_entity.h
@@ -62,6 +62,18 @@
u32 map;
};
+#define SGE_QBASE_DATA_REG_NUM 4
+
+struct sge_qbase_reg_field {
+ u32 reg_addr;
+ u32 reg_data[SGE_QBASE_DATA_REG_NUM];
+ /* Max supported PFs */
+ u32 pf_data_value[PCIE_FW_MASTER_M + 1][SGE_QBASE_DATA_REG_NUM];
+ /* Max supported VFs */
+ u32 vf_data_value[T6_VF_M + 1][SGE_QBASE_DATA_REG_NUM];
+ u32 vfcount; /* Actual number of max vfs in current configuration */
+};
+
struct ireg_field {
u32 ireg_addr;
u32 ireg_data;
@@ -235,6 +247,9 @@
};
#define CUDBG_MAX_TCAM_TID 0x800
+#define CUDBG_T6_CLIP 1536
+#define CUDBG_MAX_TID_COMP_EN 6144
+#define CUDBG_MAX_TID_COMP_DIS 3072
enum cudbg_le_entry_types {
LE_ET_UNKNOWN = 0,
@@ -354,6 +369,11 @@
{0x10cc, 0x10d4, 0x0, 16},
};
+static const u32 t6_sge_qbase_index_array[] = {
+ /* 1 addr reg SGE_QBASE_INDEX and 4 data reg SGE_QBASE_MAP[0-3] */
+ 0x1250, 0x1240, 0x1244, 0x1248, 0x124c,
+};
+
static const u32 t5_pcie_pdbg_array[][IREG_NUM_ELEM] = {
{0x5a04, 0x5a0c, 0x00, 0x20}, /* t5_pcie_pdbg_regs_00_to_20 */
{0x5a04, 0x5a0c, 0x21, 0x20}, /* t5_pcie_pdbg_regs_21_to_40 */
diff --git a/drivers/net/ethernet/chelsio/cxgb4/cudbg_if.h b/drivers/net/ethernet/chelsio/cxgb4/cudbg_if.h
index 8568a51..215fe62 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cudbg_if.h
+++ b/drivers/net/ethernet/chelsio/cxgb4/cudbg_if.h
@@ -24,6 +24,7 @@
#define CUDBG_STATUS_NOT_IMPLEMENTED -28
#define CUDBG_SYSTEM_ERROR -29
#define CUDBG_STATUS_CCLK_NOT_DEFINED -32
+#define CUDBG_STATUS_PARTIAL_DATA -41
#define CUDBG_MAJOR_VERSION 1
#define CUDBG_MINOR_VERSION 14
diff --git a/drivers/net/ethernet/chelsio/cxgb4/cudbg_lib.c b/drivers/net/ethernet/chelsio/cxgb4/cudbg_lib.c
index 9da6f57..0afcfe9 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cudbg_lib.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/cudbg_lib.c
@@ -1339,16 +1339,39 @@
return cudbg_write_and_release_buff(pdbg_init, &temp_buff, dbg_buff);
}
+static void cudbg_read_sge_qbase_indirect_reg(struct adapter *padap,
+ struct sge_qbase_reg_field *qbase,
+ u32 func, bool is_pf)
+{
+ u32 *buff, i;
+
+ if (is_pf) {
+ buff = qbase->pf_data_value[func];
+ } else {
+ buff = qbase->vf_data_value[func];
+ /* In SGE_QBASE_INDEX,
+ * Entries 0->7 are PF0->7, Entries 8->263 are VFID0->256.
+ */
+ func += 8;
+ }
+
+ t4_write_reg(padap, qbase->reg_addr, func);
+ for (i = 0; i < SGE_QBASE_DATA_REG_NUM; i++, buff++)
+ *buff = t4_read_reg(padap, qbase->reg_data[i]);
+}
+
int cudbg_collect_sge_indirect(struct cudbg_init *pdbg_init,
struct cudbg_buffer *dbg_buff,
struct cudbg_error *cudbg_err)
{
struct adapter *padap = pdbg_init->adap;
struct cudbg_buffer temp_buff = { 0 };
+ struct sge_qbase_reg_field *sge_qbase;
struct ireg_buf *ch_sge_dbg;
int i, rc;
- rc = cudbg_get_buff(pdbg_init, dbg_buff, sizeof(*ch_sge_dbg) * 2,
+ rc = cudbg_get_buff(pdbg_init, dbg_buff,
+ sizeof(*ch_sge_dbg) * 2 + sizeof(*sge_qbase),
&temp_buff);
if (rc)
return rc;
@@ -1370,6 +1393,28 @@
sge_pio->ireg_local_offset);
ch_sge_dbg++;
}
+
+ if (CHELSIO_CHIP_VERSION(padap->params.chip) > CHELSIO_T5) {
+ sge_qbase = (struct sge_qbase_reg_field *)ch_sge_dbg;
+ /* 1 addr reg SGE_QBASE_INDEX and 4 data reg
+ * SGE_QBASE_MAP[0-3]
+ */
+ sge_qbase->reg_addr = t6_sge_qbase_index_array[0];
+ for (i = 0; i < SGE_QBASE_DATA_REG_NUM; i++)
+ sge_qbase->reg_data[i] =
+ t6_sge_qbase_index_array[i + 1];
+
+ for (i = 0; i <= PCIE_FW_MASTER_M; i++)
+ cudbg_read_sge_qbase_indirect_reg(padap, sge_qbase,
+ i, true);
+
+ for (i = 0; i < padap->params.arch.vfcount; i++)
+ cudbg_read_sge_qbase_indirect_reg(padap, sge_qbase,
+ i, false);
+
+ sge_qbase->vfcount = padap->params.arch.vfcount;
+ }
+
return cudbg_write_and_release_buff(pdbg_init, &temp_buff, dbg_buff);
}
@@ -2366,8 +2411,11 @@
value = t4_read_reg(padap, LE_DB_ROUTING_TABLE_INDEX_A);
tcam_region->routing_start = value;
- /*Get clip table index */
- value = t4_read_reg(padap, LE_DB_CLIP_TABLE_INDEX_A);
+ /* Get clip table index. For T6 there is separate CLIP TCAM */
+ if (is_t6(padap->params.chip))
+ value = t4_read_reg(padap, LE_DB_CLCAM_TID_BASE_A);
+ else
+ value = t4_read_reg(padap, LE_DB_CLIP_TABLE_INDEX_A);
tcam_region->clip_start = value;
/* Get filter table index */
@@ -2392,8 +2440,16 @@
tcam_region->tid_hash_base;
}
} else { /* hash not enabled */
- tcam_region->max_tid = CUDBG_MAX_TCAM_TID;
+ if (is_t6(padap->params.chip))
+ tcam_region->max_tid = (value & ASLIPCOMPEN_F) ?
+ CUDBG_MAX_TID_COMP_EN :
+ CUDBG_MAX_TID_COMP_DIS;
+ else
+ tcam_region->max_tid = CUDBG_MAX_TCAM_TID;
}
+
+ if (is_t6(padap->params.chip))
+ tcam_region->max_tid += CUDBG_T6_CLIP;
}
int cudbg_collect_le_tcam(struct cudbg_init *pdbg_init,
@@ -2423,18 +2479,31 @@
for (i = 0; i < tcam_region.max_tid; ) {
rc = cudbg_read_tid(pdbg_init, i, tid_data);
if (rc) {
- cudbg_err->sys_err = rc;
- cudbg_put_buff(pdbg_init, &temp_buff);
- return rc;
+ cudbg_err->sys_warn = CUDBG_STATUS_PARTIAL_DATA;
+ /* Update tcam header and exit */
+ tcam_region.max_tid = i;
+ memcpy(temp_buff.data, &tcam_region,
+ sizeof(struct cudbg_tcam));
+ goto out;
}
- /* ipv6 takes two tids */
- cudbg_is_ipv6_entry(tid_data, tcam_region) ? i += 2 : i++;
+ if (cudbg_is_ipv6_entry(tid_data, tcam_region)) {
+ /* T6 CLIP TCAM: ipv6 takes 4 entries */
+ if (is_t6(padap->params.chip) &&
+ i >= tcam_region.clip_start &&
+ i < tcam_region.clip_start + CUDBG_T6_CLIP)
+ i += 4;
+ else /* Main TCAM: ipv6 takes two tids */
+ i += 2;
+ } else {
+ i++;
+ }
tid_data++;
bytes += sizeof(struct cudbg_tid_data);
}
+out:
return cudbg_write_and_release_buff(pdbg_init, &temp_buff, dbg_buff);
}
diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h b/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h
index 688f954..0dbe2d9 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h
@@ -50,6 +50,7 @@
#include <linux/net_tstamp.h>
#include <linux/ptp_clock_kernel.h>
#include <linux/ptp_classify.h>
+#include <linux/crash_dump.h>
#include <asm/io.h>
#include "t4_chip_type.h"
#include "cxgb4_uld.h"
@@ -490,6 +491,9 @@
unsigned char link_ok; /* link up? */
unsigned char link_down_rc; /* link down reason */
+
+ bool new_module; /* ->OS Transceiver Module inserted */
+ bool redo_l1cfg; /* ->CC redo current "sticky" L1 CFG */
};
#define FW_LEN16(fw_struct) FW_CMD_LEN16_V(sizeof(fw_struct) / 16)
@@ -964,6 +968,9 @@
struct hma_data hma;
struct srq_data *srq;
+
+ /* Dump buffer for collecting logs in kdump kernel */
+ struct vmcoredd_data vmcoredd;
};
/* Support for "sched-class" command to allow a TX Scheduling Class to be
@@ -1034,6 +1041,7 @@
#define VF_BITWIDTH 8
#define IVLAN_BITWIDTH 16
#define OVLAN_BITWIDTH 16
+#define ENCAP_VNI_BITWIDTH 24
/* Filter matching rules. These consist of a set of ingress packet field
* (value, mask) tuples. The associated ingress packet field matches the
@@ -1064,6 +1072,7 @@
uint32_t ivlan_vld:1; /* inner VLAN valid */
uint32_t ovlan_vld:1; /* outer VLAN valid */
uint32_t pfvf_vld:1; /* PF/VF valid */
+ uint32_t encap_vld:1; /* Encapsulation valid */
uint32_t macidx:MACIDX_BITWIDTH; /* exact match MAC index */
uint32_t fcoe:FCOE_BITWIDTH; /* FCoE packet */
uint32_t iport:IPORT_BITWIDTH; /* ingress port */
@@ -1074,6 +1083,7 @@
uint32_t vf:VF_BITWIDTH; /* PCI-E VF ID */
uint32_t ivlan:IVLAN_BITWIDTH; /* inner VLAN */
uint32_t ovlan:OVLAN_BITWIDTH; /* outer VLAN */
+ uint32_t vni:ENCAP_VNI_BITWIDTH; /* VNI of tunnel */
/* Uncompressed header matching field rules. These are always
* available for field rules.
@@ -1317,7 +1327,7 @@
extern char cxgb4_driver_name[];
extern const char cxgb4_driver_version[];
-void t4_os_portmod_changed(const struct adapter *adap, int port_id);
+void t4_os_portmod_changed(struct adapter *adap, int port_id);
void t4_os_link_changed(struct adapter *adap, int port_id, int link_stat);
void t4_free_sge_resources(struct adapter *adap);
@@ -1498,8 +1508,25 @@
int t4_slow_intr_handler(struct adapter *adapter);
int t4_wait_dev_ready(void __iomem *regs);
-int t4_link_l1cfg(struct adapter *adap, unsigned int mbox, unsigned int port,
- struct link_config *lc);
+
+int t4_link_l1cfg_core(struct adapter *adap, unsigned int mbox,
+ unsigned int port, struct link_config *lc,
+ bool sleep_ok, int timeout);
+
+static inline int t4_link_l1cfg(struct adapter *adapter, unsigned int mbox,
+ unsigned int port, struct link_config *lc)
+{
+ return t4_link_l1cfg_core(adapter, mbox, port, lc,
+ true, FW_CMD_MAX_TIMEOUT);
+}
+
+static inline int t4_link_l1cfg_ns(struct adapter *adapter, unsigned int mbox,
+ unsigned int port, struct link_config *lc)
+{
+ return t4_link_l1cfg_core(adapter, mbox, port, lc,
+ false, FW_CMD_MAX_TIMEOUT);
+}
+
int t4_restart_aneg(struct adapter *adap, unsigned int mbox, unsigned int port);
u32 t4_read_pcie_cfg4(struct adapter *adap, int reg);
@@ -1690,6 +1717,12 @@
int t4_free_raw_mac_filt(struct adapter *adap, unsigned int viid,
const u8 *addr, const u8 *mask, unsigned int idx,
u8 lookup_type, u8 port_id, bool sleep_ok);
+int t4_free_encap_mac_filt(struct adapter *adap, unsigned int viid, int idx,
+ bool sleep_ok);
+int t4_alloc_encap_mac_filt(struct adapter *adap, unsigned int viid,
+ const u8 *addr, const u8 *mask, unsigned int vni,
+ unsigned int vni_mask, u8 dip_hit, u8 lookup_type,
+ bool sleep_ok);
int t4_alloc_raw_mac_filt(struct adapter *adap, unsigned int viid,
const u8 *addr, const u8 *mask, unsigned int idx,
u8 lookup_type, u8 port_id, bool sleep_ok);
@@ -1705,6 +1738,9 @@
bool ucast, u64 vec, bool sleep_ok);
int t4_enable_vi_params(struct adapter *adap, unsigned int mbox,
unsigned int viid, bool rx_en, bool tx_en, bool dcb_en);
+int t4_enable_pi_params(struct adapter *adap, unsigned int mbox,
+ struct port_info *pi,
+ bool rx_en, bool tx_en, bool dcb_en);
int t4_enable_vi(struct adapter *adap, unsigned int mbox, unsigned int viid,
bool rx_en, bool tx_en);
int t4_identify_port(struct adapter *adap, unsigned int mbox, unsigned int viid,
diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_cudbg.c b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_cudbg.c
index 143686c..8d751ef 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_cudbg.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_cudbg.c
@@ -214,7 +214,8 @@
len = sizeof(struct ireg_buf) * n;
break;
case CUDBG_SGE_INDIRECT:
- len = sizeof(struct ireg_buf) * 2;
+ len = sizeof(struct ireg_buf) * 2 +
+ sizeof(struct sge_qbase_reg_field);
break;
case CUDBG_ULPRX_LA:
len = sizeof(struct cudbg_ulprx_la);
@@ -488,3 +489,28 @@
adapter->eth_dump.version = adapter->params.fw_vers;
adapter->eth_dump.len = 0;
}
+
+static int cxgb4_cudbg_vmcoredd_collect(struct vmcoredd_data *data, void *buf)
+{
+ struct adapter *adap = container_of(data, struct adapter, vmcoredd);
+ u32 len = data->size;
+
+ return cxgb4_cudbg_collect(adap, buf, &len, CXGB4_ETH_DUMP_ALL);
+}
+
+int cxgb4_cudbg_vmcore_add_dump(struct adapter *adap)
+{
+ struct vmcoredd_data *data = &adap->vmcoredd;
+ u32 len;
+
+ len = sizeof(struct cudbg_hdr) +
+ sizeof(struct cudbg_entity_hdr) * CUDBG_MAX_ENTITY;
+ len += CUDBG_DUMP_BUFF_SIZE;
+
+ data->size = len;
+ snprintf(data->dump_name, sizeof(data->dump_name), "%s_%s",
+ cxgb4_driver_name, adap->name);
+ data->vmcoredd_callback = cxgb4_cudbg_vmcoredd_collect;
+
+ return vmcore_add_device_dump(data);
+}
diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_cudbg.h b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_cudbg.h
index ce1ac9a..ef59ba1 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_cudbg.h
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_cudbg.h
@@ -41,8 +41,11 @@
CXGB4_ETH_DUMP_HW = (1 << 1), /* various FW and HW dumps */
};
+#define CXGB4_ETH_DUMP_ALL (CXGB4_ETH_DUMP_MEM | CXGB4_ETH_DUMP_HW)
+
u32 cxgb4_get_dump_length(struct adapter *adap, u32 flag);
int cxgb4_cudbg_collect(struct adapter *adap, void *buf, u32 *buf_size,
u32 flag);
void cxgb4_init_ethtool_dump(struct adapter *adapter);
+int cxgb4_cudbg_vmcore_add_dump(struct adapter *adap);
#endif /* __CXGB4_CUDBG_H__ */
diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_ethtool.c b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_ethtool.c
index 59d04d7..f7eef93 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_ethtool.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_ethtool.c
@@ -800,24 +800,20 @@
if (base->duplex != DUPLEX_FULL)
return -EINVAL;
- if (!(lc->pcaps & FW_PORT_CAP32_ANEG)) {
- /* PHY offers a single speed. See if that's what's
- * being requested.
- */
- if (base->autoneg == AUTONEG_DISABLE &&
- (lc->pcaps & speed_to_fw_caps(base->speed)))
- return 0;
- return -EINVAL;
- }
-
old_lc = *lc;
- if (base->autoneg == AUTONEG_DISABLE) {
+ if (!(lc->pcaps & FW_PORT_CAP32_ANEG) ||
+ base->autoneg == AUTONEG_DISABLE) {
fw_caps = speed_to_fw_caps(base->speed);
- if (!(lc->pcaps & fw_caps))
+ /* Must only specify a single speed which must be supported
+ * as part of the Physical Port Capabilities.
+ */
+ if ((fw_caps & (fw_caps - 1)) != 0 ||
+ !(lc->pcaps & fw_caps))
return -EINVAL;
+
lc->speed_caps = fw_caps;
- lc->acaps = 0;
+ lc->acaps = fw_caps;
} else {
fw_caps =
lmm_to_fw_caps(link_ksettings->link_modes.advertising);
diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_filter.c b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_filter.c
index b76447b..00fc5f1 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_filter.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_filter.c
@@ -64,8 +64,7 @@
if (!skb)
return -ENOMEM;
- req = (struct cpl_set_tcb_field *)__skb_put(skb, sizeof(*req));
- memset(req, 0, sizeof(*req));
+ req = (struct cpl_set_tcb_field *)__skb_put_zero(skb, sizeof(*req));
INIT_TP_WR_CPL(req, CPL_SET_TCB_FIELD, ftid);
req->reply_ctrl = htons(REPLY_CHAN_V(0) |
QUEUENO_V(adap->sge.fw_evtq.abs_id) |
@@ -266,6 +265,8 @@
fs->mask.pfvf_vld) ||
unsupported(fconf, VNIC_ID_F, fs->val.ovlan_vld,
fs->mask.ovlan_vld) ||
+ unsupported(fconf, VNIC_ID_F, fs->val.encap_vld,
+ fs->mask.encap_vld) ||
unsupported(fconf, VLAN_F, fs->val.ivlan_vld, fs->mask.ivlan_vld))
return -EOPNOTSUPP;
@@ -276,8 +277,12 @@
* carries that overlap, we need to translate any PF/VF
* specification into that internal format below.
*/
- if (is_field_set(fs->val.pfvf_vld, fs->mask.pfvf_vld) &&
- is_field_set(fs->val.ovlan_vld, fs->mask.ovlan_vld))
+ if ((is_field_set(fs->val.pfvf_vld, fs->mask.pfvf_vld) &&
+ is_field_set(fs->val.ovlan_vld, fs->mask.ovlan_vld)) ||
+ (is_field_set(fs->val.pfvf_vld, fs->mask.pfvf_vld) &&
+ is_field_set(fs->val.encap_vld, fs->mask.encap_vld)) ||
+ (is_field_set(fs->val.ovlan_vld, fs->mask.ovlan_vld) &&
+ is_field_set(fs->val.encap_vld, fs->mask.encap_vld)))
return -EOPNOTSUPP;
if (unsupported(iconf, VNIC_F, fs->val.pfvf_vld, fs->mask.pfvf_vld) ||
(is_field_set(fs->val.ovlan_vld, fs->mask.ovlan_vld) &&
@@ -307,6 +312,9 @@
fs->newvlan == VLAN_REWRITE))
return -EOPNOTSUPP;
+ if (fs->val.encap_vld &&
+ CHELSIO_CHIP_VERSION(adapter->params.chip) < CHELSIO_T6)
+ return -EOPNOTSUPP;
return 0;
}
@@ -706,6 +714,8 @@
*/
void clear_filter(struct adapter *adap, struct filter_entry *f)
{
+ struct port_info *pi = netdev_priv(f->dev);
+
/* If the new or old filter have loopback rewriteing rules then we'll
* need to free any existing L2T, SMT, CLIP entries of filter
* rule.
@@ -716,6 +726,12 @@
if (f->smt)
cxgb4_smt_release(f->smt);
+ if (f->fs.val.encap_vld && f->fs.val.ovlan_vld)
+ if (atomic_dec_and_test(&adap->mps_encap[f->fs.val.ovlan &
+ 0x1ff].refcnt))
+ t4_free_encap_mac_filt(adap, pi->viid,
+ f->fs.val.ovlan & 0x1ff, 0);
+
if ((f->fs.hash || is_t6(adap->params.chip)) && f->fs.type)
cxgb4_clip_release(f->dev, (const u32 *)&f->fs.val.lip, 1);
@@ -841,6 +857,10 @@
if (!is_hashfilter(adap))
return false;
+ /* Keep tunnel VNI match disabled for hash-filters for now */
+ if (fs->mask.encap_vld)
+ return false;
+
if (fs->type) {
if (is_inaddr_any(fs->val.fip, AF_INET6) ||
!is_addr_all_mask(fs->mask.fip, AF_INET6))
@@ -934,8 +954,12 @@
ntuple |= (u64)(fs->val.tos) << tp->tos_shift;
if (tp->vnic_shift >= 0) {
- if ((adap->params.tp.ingress_config & VNIC_F) &&
- fs->mask.pfvf_vld)
+ if ((adap->params.tp.ingress_config & USE_ENC_IDX_F) &&
+ fs->mask.encap_vld)
+ ntuple |= (u64)((fs->val.encap_vld << 16) |
+ (fs->val.ovlan)) << tp->vnic_shift;
+ else if ((adap->params.tp.ingress_config & VNIC_F) &&
+ fs->mask.pfvf_vld)
ntuple |= (u64)((fs->val.pfvf_vld << 16) |
(fs->val.pf << 13) |
(fs->val.vf)) << tp->vnic_shift;
@@ -1049,6 +1073,7 @@
struct filter_ctx *ctx)
{
struct adapter *adapter = netdev2adap(dev);
+ struct port_info *pi = netdev_priv(dev);
struct tid_info *t = &adapter->tids;
struct filter_entry *f;
struct sk_buff *skb;
@@ -1115,13 +1140,34 @@
f->fs.mask.ovlan = (fs->mask.pf << 13) | fs->mask.vf;
f->fs.val.ovlan_vld = fs->val.pfvf_vld;
f->fs.mask.ovlan_vld = fs->mask.pfvf_vld;
+ } else if (iconf & USE_ENC_IDX_F) {
+ if (f->fs.val.encap_vld) {
+ struct port_info *pi = netdev_priv(f->dev);
+ u8 match_all_mac[] = { 0, 0, 0, 0, 0, 0 };
+
+ /* allocate MPS TCAM entry */
+ ret = t4_alloc_encap_mac_filt(adapter, pi->viid,
+ match_all_mac,
+ match_all_mac,
+ f->fs.val.vni,
+ f->fs.mask.vni,
+ 0, 1, 1);
+ if (ret < 0)
+ goto free_atid;
+
+ atomic_inc(&adapter->mps_encap[ret].refcnt);
+ f->fs.val.ovlan = ret;
+ f->fs.mask.ovlan = 0xffff;
+ f->fs.val.ovlan_vld = 1;
+ f->fs.mask.ovlan_vld = 1;
+ }
}
size = sizeof(struct cpl_t6_act_open_req);
if (f->fs.type) {
ret = cxgb4_clip_get(f->dev, (const u32 *)&f->fs.val.lip, 1);
if (ret)
- goto free_atid;
+ goto free_mps;
skb = alloc_skb(size, GFP_KERNEL);
if (!skb) {
@@ -1136,7 +1182,7 @@
skb = alloc_skb(size, GFP_KERNEL);
if (!skb) {
ret = -ENOMEM;
- goto free_atid;
+ goto free_mps;
}
mk_act_open_req(f, skb,
@@ -1152,6 +1198,10 @@
free_clip:
cxgb4_clip_release(f->dev, (const u32 *)&f->fs.val.lip, 1);
+free_mps:
+ if (f->fs.val.encap_vld && f->fs.val.ovlan_vld)
+ t4_free_encap_mac_filt(adapter, pi->viid, f->fs.val.ovlan, 1);
+
free_atid:
cxgb4_free_atid(t, atid);
@@ -1333,6 +1383,27 @@
f->fs.mask.ovlan = (fs->mask.pf << 13) | fs->mask.vf;
f->fs.val.ovlan_vld = fs->val.pfvf_vld;
f->fs.mask.ovlan_vld = fs->mask.pfvf_vld;
+ } else if (iconf & USE_ENC_IDX_F) {
+ if (f->fs.val.encap_vld) {
+ struct port_info *pi = netdev_priv(f->dev);
+ u8 match_all_mac[] = { 0, 0, 0, 0, 0, 0 };
+
+ /* allocate MPS TCAM entry */
+ ret = t4_alloc_encap_mac_filt(adapter, pi->viid,
+ match_all_mac,
+ match_all_mac,
+ f->fs.val.vni,
+ f->fs.mask.vni,
+ 0, 1, 1);
+ if (ret < 0)
+ goto free_clip;
+
+ atomic_inc(&adapter->mps_encap[ret].refcnt);
+ f->fs.val.ovlan = ret;
+ f->fs.mask.ovlan = 0x1ff;
+ f->fs.val.ovlan_vld = 1;
+ f->fs.mask.ovlan_vld = 1;
+ }
}
/* Attempt to set the filter. If we don't succeed, we clear
@@ -1349,6 +1420,13 @@
}
return ret;
+
+free_clip:
+ if (is_t6(adapter->params.chip) && f->fs.type)
+ cxgb4_clip_release(f->dev, (const u32 *)&f->fs.val.lip, 1);
+ cxgb4_clear_ftid(&adapter->tids, filter_id,
+ fs->type ? PF_INET6 : PF_INET, chip_ver);
+ return ret;
}
static int cxgb4_del_hash_filter(struct net_device *dev, int filter_id,
diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
index 005283c..0efae20 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
@@ -301,14 +301,14 @@
}
}
-void t4_os_portmod_changed(const struct adapter *adap, int port_id)
+void t4_os_portmod_changed(struct adapter *adap, int port_id)
{
static const char *mod_str[] = {
NULL, "LR", "SR", "ER", "passive DA", "active DA", "LRM"
};
- const struct net_device *dev = adap->port[port_id];
- const struct port_info *pi = netdev_priv(dev);
+ struct net_device *dev = adap->port[port_id];
+ struct port_info *pi = netdev_priv(dev);
if (pi->mod_type == FW_PORT_MOD_TYPE_NONE)
netdev_info(dev, "port module unplugged\n");
@@ -325,6 +325,11 @@
else
netdev_info(dev, "%s: unknown module type %d inserted\n",
dev->name, pi->mod_type);
+
+ /* If the interface is running, then we'll need any "sticky" Link
+ * Parameters redone with a new Transceiver Module.
+ */
+ pi->link_cfg.redo_l1cfg = netif_running(dev);
}
int dbfifo_int_thresh = 10; /* 10 == 640 entry threshold */
@@ -460,7 +465,7 @@
&pi->link_cfg);
if (ret == 0) {
local_bh_disable();
- ret = t4_enable_vi_params(pi->adapter, mb, pi->viid, true,
+ ret = t4_enable_pi_params(pi->adapter, mb, pi, true,
true, CXGB4_DCB_ENABLED);
local_bh_enable();
}
@@ -2339,7 +2344,8 @@
netif_tx_stop_all_queues(dev);
netif_carrier_off(dev);
- ret = t4_enable_vi(adapter, adapter->pf, pi->viid, false, false);
+ ret = t4_enable_pi_params(adapter, adapter->pf, pi,
+ false, false, false);
#ifdef CONFIG_CHELSIO_T4_DCB
cxgb4_dcb_reset(dev);
dcb_tx_queue_prio_enable(dev, false);
@@ -2886,13 +2892,13 @@
}
/* Convert from Mbps to Kbps */
- req_rate = rate << 10;
+ req_rate = rate * 1000;
/* Max rate is 100 Gbps */
- if (req_rate >= SCHED_MAX_RATE_KBPS) {
+ if (req_rate > SCHED_MAX_RATE_KBPS) {
dev_err(adap->pdev_dev,
"Invalid rate %u Mbps, Max rate is %u Mbps\n",
- rate, SCHED_MAX_RATE_KBPS >> 10);
+ rate, SCHED_MAX_RATE_KBPS / 1000);
return -ERANGE;
}
@@ -3081,7 +3087,7 @@
match_all_mac, match_all_mac,
adapter->rawf_start +
pi->port_id,
- 1, pi->port_id, true);
+ 1, pi->port_id, false);
if (ret < 0) {
netdev_info(netdev, "Failed to free mac filter entry, for port %d\n",
i);
@@ -3169,7 +3175,7 @@
match_all_mac,
adapter->rawf_start +
pi->port_id,
- 1, pi->port_id, true);
+ 1, pi->port_id, false);
if (ret < 0) {
netdev_info(netdev, "Failed to allocate a mac filter entry, not adding port %d\n",
be16_to_cpu(ti->port));
@@ -4135,6 +4141,10 @@
* card
*/
card_fw = kvzalloc(sizeof(*card_fw), GFP_KERNEL);
+ if (!card_fw) {
+ ret = -ENOMEM;
+ goto bye;
+ }
/* Get FW from from /lib/firmware/ */
ret = request_firmware(&fw, fw_info->fw_mod_name,
@@ -4276,6 +4286,20 @@
adap->tids.nftids = val[4] - val[3] + 1;
adap->sge.ingr_start = val[5];
+ if (CHELSIO_CHIP_VERSION(adap->params.chip) > CHELSIO_T5) {
+ /* Read the raw mps entries. In T6, the last 2 tcam entries
+ * are reserved for raw mac addresses (rawf = 2, one per port).
+ */
+ params[0] = FW_PARAM_PFVF(RAWF_START);
+ params[1] = FW_PARAM_PFVF(RAWF_END);
+ ret = t4_query_params(adap, adap->mbox, adap->pf, 0, 2,
+ params, val);
+ if (ret == 0) {
+ adap->rawf_start = val[0];
+ adap->rawf_cnt = val[1] - val[0] + 1;
+ }
+ }
+
/* qids (ingress/egress) returned from firmware can be anywhere
* in the range from EQ(IQFLINT)_START to EQ(IQFLINT)_END.
* Hence driver needs to allocate memory for this range to
@@ -5181,6 +5205,7 @@
{
unsigned int i;
+ kvfree(adapter->mps_encap);
kvfree(adapter->smt);
kvfree(adapter->l2t);
kvfree(adapter->srq);
@@ -5216,14 +5241,11 @@
NETIF_F_IPV6_CSUM | NETIF_F_HIGHDMA)
#define SEGMENT_SIZE 128
-static int get_chip_type(struct pci_dev *pdev, u32 pl_rev)
+static int t4_get_chip_type(struct adapter *adap, int ver)
{
- u16 device_id;
+ u32 pl_rev = REV_G(t4_read_reg(adap, PL_REV_A));
- /* Retrieve adapter's device ID */
- pci_read_config_word(pdev, PCI_DEVICE_ID, &device_id);
-
- switch (device_id >> 12) {
+ switch (ver) {
case CHELSIO_T4:
return CHELSIO_CHIP_CODE(CHELSIO_T4, pl_rev);
case CHELSIO_T5:
@@ -5231,8 +5253,7 @@
case CHELSIO_T6:
return CHELSIO_CHIP_CODE(CHELSIO_T6, pl_rev);
default:
- dev_err(&pdev->dev, "Device %d is not supported\n",
- device_id);
+ break;
}
return -EINVAL;
}
@@ -5261,13 +5282,9 @@
u32 pcie_fw;
pcie_fw = readl(adap->regs + PCIE_FW_A);
- /* Check if cxgb4 is the MASTER and fw is initialized */
- if (num_vfs &&
- (!(pcie_fw & PCIE_FW_INIT_F) ||
- !(pcie_fw & PCIE_FW_MASTER_VLD_F) ||
- PCIE_FW_MASTER_G(pcie_fw) != CXGB4_UNIFIED_PF)) {
- dev_warn(&pdev->dev,
- "cxgb4 driver needs to be MASTER to support SRIOV\n");
+ /* Check if fw is initialized */
+ if (!(pcie_fw & PCIE_FW_INIT_F)) {
+ dev_warn(&pdev->dev, "Device not initialized\n");
return -EOPNOTSUPP;
}
@@ -5406,15 +5423,18 @@
static int init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
{
- int func, i, err, s_qpp, qpp, num_seg;
+ struct net_device *netdev;
+ struct adapter *adapter;
+ static int adap_idx = 1;
+ int s_qpp, qpp, num_seg;
struct port_info *pi;
bool highdma = false;
- struct adapter *adapter = NULL;
- struct net_device *netdev;
- void __iomem *regs;
- u32 whoami, pl_rev;
enum chip_type chip;
- static int adap_idx = 1;
+ void __iomem *regs;
+ int func, chip_ver;
+ u16 device_id;
+ int i, err;
+ u32 whoami;
printk_once(KERN_INFO "%s - version %s\n", DRV_DESC, DRV_VERSION);
@@ -5450,11 +5470,17 @@
goto out_free_adapter;
/* We control everything through one PF */
- whoami = readl(regs + PL_WHOAMI_A);
- pl_rev = REV_G(readl(regs + PL_REV_A));
- chip = get_chip_type(pdev, pl_rev);
- func = CHELSIO_CHIP_VERSION(chip) <= CHELSIO_T5 ?
- SOURCEPF_G(whoami) : T6_SOURCEPF_G(whoami);
+ whoami = t4_read_reg(adapter, PL_WHOAMI_A);
+ pci_read_config_word(pdev, PCI_DEVICE_ID, &device_id);
+ chip = t4_get_chip_type(adapter, CHELSIO_PCI_ID_VER(device_id));
+ if (chip < 0) {
+ dev_err(&pdev->dev, "Device %d is not supported\n", device_id);
+ err = chip;
+ goto out_free_adapter;
+ }
+ chip_ver = CHELSIO_CHIP_VERSION(chip);
+ func = chip_ver <= CHELSIO_T5 ?
+ SOURCEPF_G(whoami) : T6_SOURCEPF_G(whoami);
adapter->pdev = pdev;
adapter->pdev_dev = &pdev->dev;
@@ -5543,6 +5569,16 @@
if (err)
goto out_free_adapter;
+ if (is_kdump_kernel()) {
+ /* Collect hardware state and append to /proc/vmcore */
+ err = cxgb4_cudbg_vmcore_add_dump(adapter);
+ if (err) {
+ dev_warn(adapter->pdev_dev,
+ "Fail collecting vmcore device dump, err: %d. Continuing\n",
+ err);
+ err = 0;
+ }
+ }
if (!is_t4(adapter->params.chip)) {
s_qpp = (QUEUESPERPAGEPF0_S +
@@ -5610,8 +5646,15 @@
NETIF_F_HW_VLAN_CTAG_TX | NETIF_F_HW_VLAN_CTAG_RX |
NETIF_F_HW_TC;
- if (CHELSIO_CHIP_VERSION(chip) > CHELSIO_T5)
+ if (chip_ver > CHELSIO_T5) {
+ netdev->hw_enc_features |= NETIF_F_IP_CSUM |
+ NETIF_F_IPV6_CSUM |
+ NETIF_F_RXCSUM |
+ NETIF_F_GSO_UDP_TUNNEL |
+ NETIF_F_TSO | NETIF_F_TSO6;
+
netdev->hw_features |= NETIF_F_GSO_UDP_TUNNEL;
+ }
if (highdma)
netdev->hw_features |= NETIF_F_HIGHDMA;
@@ -5676,8 +5719,14 @@
adapter->params.offload = 0;
}
+ adapter->mps_encap = kvzalloc(sizeof(struct mps_encap_entry) *
+ adapter->params.arch.mps_tcam_size,
+ GFP_KERNEL);
+ if (!adapter->mps_encap)
+ dev_warn(&pdev->dev, "could not allocate MPS Encap entries, continuing\n");
+
#if IS_ENABLED(CONFIG_IPV6)
- if ((CHELSIO_CHIP_VERSION(adapter->params.chip) <= CHELSIO_T5) &&
+ if (chip_ver <= CHELSIO_T5 &&
(!(t4_read_reg(adapter, LE_DB_CONFIG_A) & ASLIPCOMPEN_F))) {
/* CLIP functionality is not present in hardware,
* hence disable all offload features
diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_tc_flower.c b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_tc_flower.c
index 3656336..3ddd2c4 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_tc_flower.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_tc_flower.c
@@ -194,6 +194,23 @@
fs->mask.tos = mask->tos;
}
+ if (dissector_uses_key(cls->dissector, FLOW_DISSECTOR_KEY_ENC_KEYID)) {
+ struct flow_dissector_key_keyid *key, *mask;
+
+ key = skb_flow_dissector_target(cls->dissector,
+ FLOW_DISSECTOR_KEY_ENC_KEYID,
+ cls->key);
+ mask = skb_flow_dissector_target(cls->dissector,
+ FLOW_DISSECTOR_KEY_ENC_KEYID,
+ cls->mask);
+ fs->val.vni = be32_to_cpu(key->keyid);
+ fs->mask.vni = be32_to_cpu(mask->keyid);
+ if (fs->mask.vni) {
+ fs->val.encap_vld = 1;
+ fs->mask.encap_vld = 1;
+ }
+ }
+
if (dissector_uses_key(cls->dissector, FLOW_DISSECTOR_KEY_VLAN)) {
struct flow_dissector_key_vlan *key, *mask;
u16 vlan_tci, vlan_tci_mask;
@@ -247,6 +264,7 @@
BIT(FLOW_DISSECTOR_KEY_IPV4_ADDRS) |
BIT(FLOW_DISSECTOR_KEY_IPV6_ADDRS) |
BIT(FLOW_DISSECTOR_KEY_PORTS) |
+ BIT(FLOW_DISSECTOR_KEY_ENC_KEYID) |
BIT(FLOW_DISSECTOR_KEY_VLAN) |
BIT(FLOW_DISSECTOR_KEY_IP))) {
netdev_warn(dev, "Unsupported key used: 0x%x\n",
diff --git a/drivers/net/ethernet/chelsio/cxgb4/l2t.c b/drivers/net/ethernet/chelsio/cxgb4/l2t.c
index 1817a03..77c2c53 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/l2t.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/l2t.c
@@ -491,7 +491,7 @@
if (tp->protocol_shift >= 0)
ntuple |= (u64)IPPROTO_TCP << tp->protocol_shift;
- if (tp->vnic_shift >= 0) {
+ if (tp->vnic_shift >= 0 && (tp->ingress_config & VNIC_F)) {
u32 viid = cxgb4_port_viid(dev);
u32 vf = FW_VIID_VIN_G(viid);
u32 pf = FW_VIID_PFN_G(viid);
diff --git a/drivers/net/ethernet/chelsio/cxgb4/sge.c b/drivers/net/ethernet/chelsio/cxgb4/sge.c
index 1a28df1..276f223 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/sge.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/sge.c
@@ -1072,12 +1072,27 @@
static u64 hwcsum(enum chip_type chip, const struct sk_buff *skb)
{
int csum_type;
- const struct iphdr *iph = ip_hdr(skb);
+ bool inner_hdr_csum = false;
+ u16 proto, ver;
- if (iph->version == 4) {
- if (iph->protocol == IPPROTO_TCP)
+ if (skb->encapsulation &&
+ (CHELSIO_CHIP_VERSION(chip) > CHELSIO_T5))
+ inner_hdr_csum = true;
+
+ if (inner_hdr_csum) {
+ ver = inner_ip_hdr(skb)->version;
+ proto = (ver == 4) ? inner_ip_hdr(skb)->protocol :
+ inner_ipv6_hdr(skb)->nexthdr;
+ } else {
+ ver = ip_hdr(skb)->version;
+ proto = (ver == 4) ? ip_hdr(skb)->protocol :
+ ipv6_hdr(skb)->nexthdr;
+ }
+
+ if (ver == 4) {
+ if (proto == IPPROTO_TCP)
csum_type = TX_CSUM_TCPIP;
- else if (iph->protocol == IPPROTO_UDP)
+ else if (proto == IPPROTO_UDP)
csum_type = TX_CSUM_UDPIP;
else {
nocsum: /*
@@ -1090,19 +1105,29 @@
/*
* this doesn't work with extension headers
*/
- const struct ipv6hdr *ip6h = (const struct ipv6hdr *)iph;
-
- if (ip6h->nexthdr == IPPROTO_TCP)
+ if (proto == IPPROTO_TCP)
csum_type = TX_CSUM_TCPIP6;
- else if (ip6h->nexthdr == IPPROTO_UDP)
+ else if (proto == IPPROTO_UDP)
csum_type = TX_CSUM_UDPIP6;
else
goto nocsum;
}
if (likely(csum_type >= TX_CSUM_TCPIP)) {
- u64 hdr_len = TXPKT_IPHDR_LEN_V(skb_network_header_len(skb));
- int eth_hdr_len = skb_network_offset(skb) - ETH_HLEN;
+ int eth_hdr_len, l4_len;
+ u64 hdr_len;
+
+ if (inner_hdr_csum) {
+ /* This allows checksum offload for all encapsulated
+ * packets like GRE etc..
+ */
+ l4_len = skb_inner_network_header_len(skb);
+ eth_hdr_len = skb_inner_network_offset(skb) - ETH_HLEN;
+ } else {
+ l4_len = skb_network_header_len(skb);
+ eth_hdr_len = skb_network_offset(skb) - ETH_HLEN;
+ }
+ hdr_len = TXPKT_IPHDR_LEN_V(l4_len);
if (CHELSIO_CHIP_VERSION(chip) <= CHELSIO_T5)
hdr_len |= TXPKT_ETHHDR_LEN_V(eth_hdr_len);
@@ -1273,7 +1298,7 @@
netdev_tx_t t4_eth_xmit(struct sk_buff *skb, struct net_device *dev)
{
u32 wr_mid, ctrl0, op;
- u64 cntrl, *end;
+ u64 cntrl, *end, *sgl;
int qidx, credits;
unsigned int flits, ndesc;
struct adapter *adap;
@@ -1386,8 +1411,9 @@
end = (u64 *)wr + flits;
len = immediate ? skb->len : 0;
+ len += sizeof(*cpl);
if (ssi->gso_size) {
- struct cpl_tx_pkt_lso *lso = (void *)wr;
+ struct cpl_tx_pkt_lso_core *lso = (void *)(wr + 1);
bool v6 = (ssi->gso_type & SKB_GSO_TCPV6) != 0;
int l3hdr_len = skb_network_header_len(skb);
int eth_xtra_len = skb_network_offset(skb) - ETH_HLEN;
@@ -1417,20 +1443,19 @@
if (skb->ip_summed == CHECKSUM_PARTIAL)
cntrl = hwcsum(adap->params.chip, skb);
} else {
- lso->c.lso_ctrl = htonl(LSO_OPCODE_V(CPL_TX_PKT_LSO) |
- LSO_FIRST_SLICE_F | LSO_LAST_SLICE_F |
- LSO_IPV6_V(v6) |
- LSO_ETHHDR_LEN_V(eth_xtra_len / 4) |
- LSO_IPHDR_LEN_V(l3hdr_len / 4) |
- LSO_TCPHDR_LEN_V(tcp_hdr(skb)->doff));
- lso->c.ipid_ofst = htons(0);
- lso->c.mss = htons(ssi->gso_size);
- lso->c.seqno_offset = htonl(0);
+ lso->lso_ctrl = htonl(LSO_OPCODE_V(CPL_TX_PKT_LSO) |
+ LSO_FIRST_SLICE_F | LSO_LAST_SLICE_F |
+ LSO_IPV6_V(v6) |
+ LSO_ETHHDR_LEN_V(eth_xtra_len / 4) |
+ LSO_IPHDR_LEN_V(l3hdr_len / 4) |
+ LSO_TCPHDR_LEN_V(tcp_hdr(skb)->doff));
+ lso->ipid_ofst = htons(0);
+ lso->mss = htons(ssi->gso_size);
+ lso->seqno_offset = htonl(0);
if (is_t4(adap->params.chip))
- lso->c.len = htonl(skb->len);
+ lso->len = htonl(skb->len);
else
- lso->c.len =
- htonl(LSO_T5_XFER_SIZE_V(skb->len));
+ lso->len = htonl(LSO_T5_XFER_SIZE_V(skb->len));
cpl = (void *)(lso + 1);
if (CHELSIO_CHIP_VERSION(adap->params.chip)
@@ -1443,10 +1468,22 @@
TX_CSUM_TCPIP6 : TX_CSUM_TCPIP) |
TXPKT_IPHDR_LEN_V(l3hdr_len);
}
+ sgl = (u64 *)(cpl + 1); /* sgl start here */
+ if (unlikely((u8 *)sgl >= (u8 *)q->q.stat)) {
+ /* If current position is already at the end of the
+ * txq, reset the current to point to start of the queue
+ * and update the end ptr as well.
+ */
+ if (sgl == (u64 *)q->q.stat) {
+ int left = (u8 *)end - (u8 *)q->q.stat;
+
+ end = (void *)q->q.desc + left;
+ sgl = (void *)q->q.desc;
+ }
+ }
q->tso++;
q->tx_cso += ssi->gso_segs;
} else {
- len += sizeof(*cpl);
if (ptp_enabled)
op = FW_PTP_TX_PKT_WR;
else
@@ -1454,6 +1491,7 @@
wr->op_immdlen = htonl(FW_WR_OP_V(op) |
FW_WR_IMMDLEN_V(len));
cpl = (void *)(wr + 1);
+ sgl = (u64 *)(cpl + 1);
if (skb->ip_summed == CHECKSUM_PARTIAL) {
cntrl = hwcsum(adap->params.chip, skb) |
TXPKT_IPCSUM_DIS_F;
@@ -1487,20 +1525,19 @@
cpl->ctrl1 = cpu_to_be64(cntrl);
if (immediate) {
- cxgb4_inline_tx_skb(skb, &q->q, cpl + 1);
+ cxgb4_inline_tx_skb(skb, &q->q, sgl);
dev_consume_skb_any(skb);
} else {
int last_desc;
- cxgb4_write_sgl(skb, &q->q, (struct ulptx_sgl *)(cpl + 1),
- end, 0, addr);
+ cxgb4_write_sgl(skb, &q->q, (void *)sgl, end, 0, addr);
skb_orphan(skb);
last_desc = q->q.pidx + ndesc - 1;
if (last_desc >= q->q.size)
last_desc -= q->q.size;
q->q.sdesc[last_desc].skb = skb;
- q->q.sdesc[last_desc].sgl = (struct ulptx_sgl *)(cpl + 1);
+ q->q.sdesc[last_desc].sgl = (struct ulptx_sgl *)sgl;
}
txq_advance(&q->q, ndesc);
@@ -2259,7 +2296,7 @@
}
static void do_gro(struct sge_eth_rxq *rxq, const struct pkt_gl *gl,
- const struct cpl_rx_pkt *pkt)
+ const struct cpl_rx_pkt *pkt, unsigned long tnl_hdr_len)
{
struct adapter *adapter = rxq->rspq.adap;
struct sge *s = &adapter->sge;
@@ -2275,6 +2312,8 @@
}
copy_frags(skb, gl, s->pktshift);
+ if (tnl_hdr_len)
+ skb->csum_level = 1;
skb->len = gl->tot_len - s->pktshift;
skb->data_len = skb->len;
skb->truesize += skb->data_len;
@@ -2406,7 +2445,7 @@
struct sge *s = &q->adap->sge;
int cpl_trace_pkt = is_t4(q->adap->params.chip) ?
CPL_TRACE_PKT : CPL_TRACE_PKT_T5;
- u16 err_vec;
+ u16 err_vec, tnl_hdr_len = 0;
struct port_info *pi;
int ret = 0;
@@ -2415,16 +2454,19 @@
pkt = (const struct cpl_rx_pkt *)rsp;
/* Compressed error vector is enabled for T6 only */
- if (q->adap->params.tp.rx_pkt_encap)
+ if (q->adap->params.tp.rx_pkt_encap) {
err_vec = T6_COMPR_RXERR_VEC_G(be16_to_cpu(pkt->err_vec));
- else
+ tnl_hdr_len = T6_RX_TNLHDR_LEN_G(ntohs(pkt->err_vec));
+ } else {
err_vec = be16_to_cpu(pkt->err_vec);
+ }
csum_ok = pkt->csum_calc && !err_vec &&
(q->netdev->features & NETIF_F_RXCSUM);
- if ((pkt->l2info & htonl(RXF_TCP_F)) &&
+ if (((pkt->l2info & htonl(RXF_TCP_F)) ||
+ tnl_hdr_len) &&
(q->netdev->features & NETIF_F_GRO) && csum_ok && !pkt->ip_frag) {
- do_gro(rxq, si, pkt);
+ do_gro(rxq, si, pkt, tnl_hdr_len);
return 0;
}
@@ -2471,7 +2513,13 @@
} else if (pkt->l2info & htonl(RXF_IP_F)) {
__sum16 c = (__force __sum16)pkt->csum;
skb->csum = csum_unfold(c);
- skb->ip_summed = CHECKSUM_COMPLETE;
+
+ if (tnl_hdr_len) {
+ skb->ip_summed = CHECKSUM_UNNECESSARY;
+ skb->csum_level = 1;
+ } else {
+ skb->ip_summed = CHECKSUM_COMPLETE;
+ }
rxq->stats.rx_cso++;
}
} else {
diff --git a/drivers/net/ethernet/chelsio/cxgb4/srq.c b/drivers/net/ethernet/chelsio/cxgb4/srq.c
index 6228a57..82b70a5 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/srq.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/srq.c
@@ -84,8 +84,7 @@
if (!skb)
return -ENOMEM;
req = (struct cpl_srq_table_req *)
- __skb_put(skb, sizeof(*req));
- memset(req, 0, sizeof(*req));
+ __skb_put_zero(skb, sizeof(*req));
INIT_TP_WR(req, 0);
OPCODE_TID(req) = htonl(MK_OPCODE_TID(CPL_SRQ_TABLE_REQ,
TID_TID_V(srq_idx) |
diff --git a/drivers/net/ethernet/chelsio/cxgb4/t4_chip_type.h b/drivers/net/ethernet/chelsio/cxgb4/t4_chip_type.h
index 54b7181..721c775 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/t4_chip_type.h
+++ b/drivers/net/ethernet/chelsio/cxgb4/t4_chip_type.h
@@ -34,6 +34,8 @@
#ifndef __T4_CHIP_TYPE_H__
#define __T4_CHIP_TYPE_H__
+#define CHELSIO_PCI_ID_VER(__DeviceID) ((__DeviceID) >> 12)
+
#define CHELSIO_T4 0x4
#define CHELSIO_T5 0x5
#define CHELSIO_T6 0x6
diff --git a/drivers/net/ethernet/chelsio/cxgb4/t4_hw.c b/drivers/net/ethernet/chelsio/cxgb4/t4_hw.c
index 7cb3ef4..39da7e3 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/t4_hw.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/t4_hw.c
@@ -3941,8 +3941,8 @@
CAP16_TO_CAP32(FC_RX);
CAP16_TO_CAP32(FC_TX);
CAP16_TO_CAP32(ANEG);
- CAP16_TO_CAP32(MDIX);
CAP16_TO_CAP32(MDIAUTO);
+ CAP16_TO_CAP32(MDISTRAIGHT);
CAP16_TO_CAP32(FEC_RS);
CAP16_TO_CAP32(FEC_BASER_RS);
CAP16_TO_CAP32(802_3_PAUSE);
@@ -3982,8 +3982,8 @@
CAP32_TO_CAP16(802_3_PAUSE);
CAP32_TO_CAP16(802_3_ASM_DIR);
CAP32_TO_CAP16(ANEG);
- CAP32_TO_CAP16(MDIX);
CAP32_TO_CAP16(MDIAUTO);
+ CAP32_TO_CAP16(MDISTRAIGHT);
CAP32_TO_CAP16(FEC_RS);
CAP32_TO_CAP16(FEC_BASER_RS);
@@ -4058,14 +4058,17 @@
* - If auto-negotiation is off set the MAC to the proper speed/duplex/FC,
* otherwise do it later based on the outcome of auto-negotiation.
*/
-int t4_link_l1cfg(struct adapter *adapter, unsigned int mbox,
- unsigned int port, struct link_config *lc)
+int t4_link_l1cfg_core(struct adapter *adapter, unsigned int mbox,
+ unsigned int port, struct link_config *lc,
+ bool sleep_ok, int timeout)
{
unsigned int fw_caps = adapter->params.fw_caps_support;
- struct fw_port_cmd cmd;
- unsigned int fw_mdi = FW_PORT_CAP32_MDI_V(FW_PORT_CAP32_MDI_AUTO);
fw_port_cap32_t fw_fc, cc_fec, fw_fec, rcap;
+ struct fw_port_cmd cmd;
+ unsigned int fw_mdi;
+ int ret;
+ fw_mdi = (FW_PORT_CAP32_MDI_V(FW_PORT_CAP32_MDI_AUTO) & lc->pcaps);
/* Convert driver coding of Pause Frame Flow Control settings into the
* Firmware's API.
*/
@@ -4087,7 +4090,7 @@
/* Figure out what our Requested Port Capabilities are going to be.
*/
if (!(lc->pcaps & FW_PORT_CAP32_ANEG)) {
- rcap = (lc->pcaps & ADVERT_MASK) | fw_fc | fw_fec;
+ rcap = lc->acaps | fw_fc | fw_fec;
lc->fc = lc->requested_fc & ~PAUSE_AUTONEG;
lc->fec = cc_fec;
} else if (lc->autoneg == AUTONEG_DISABLE) {
@@ -4098,6 +4101,13 @@
rcap = lc->acaps | fw_fc | fw_fec | fw_mdi;
}
+ if (rcap & ~lc->pcaps) {
+ dev_err(adapter->pdev_dev,
+ "Requested Port Capabilities %#x exceed Physical Port Capabilities %#x\n",
+ rcap, lc->pcaps);
+ return -EINVAL;
+ }
+
/* And send that on to the Firmware ...
*/
memset(&cmd, 0, sizeof(cmd));
@@ -4108,12 +4118,21 @@
cpu_to_be32(FW_PORT_CMD_ACTION_V(fw_caps == FW_CAPS16
? FW_PORT_ACTION_L1_CFG
: FW_PORT_ACTION_L1_CFG32) |
- FW_LEN16(cmd));
+ FW_LEN16(cmd));
if (fw_caps == FW_CAPS16)
cmd.u.l1cfg.rcap = cpu_to_be32(fwcaps32_to_caps16(rcap));
else
cmd.u.l1cfg32.rcap32 = cpu_to_be32(rcap);
- return t4_wr_mbox(adapter, mbox, &cmd, sizeof(cmd), NULL);
+
+ ret = t4_wr_mbox_meat_timeout(adapter, mbox, &cmd, sizeof(cmd), NULL,
+ sleep_ok, timeout);
+ if (ret) {
+ dev_err(adapter->pdev_dev,
+ "Requested Port Capabilities %#x rejected, error %d\n",
+ rcap, -ret);
+ return ret;
+ }
+ return ret;
}
/**
@@ -7513,6 +7532,43 @@
}
/**
+ * t4_free_encap_mac_filt - frees MPS entry at given index
+ * @adap: the adapter
+ * @viid: the VI id
+ * @idx: index of MPS entry to be freed
+ * @sleep_ok: call is allowed to sleep
+ *
+ * Frees the MPS entry at supplied index
+ *
+ * Returns a negative error number or zero on success
+ */
+int t4_free_encap_mac_filt(struct adapter *adap, unsigned int viid,
+ int idx, bool sleep_ok)
+{
+ struct fw_vi_mac_exact *p;
+ u8 addr[] = {0, 0, 0, 0, 0, 0};
+ struct fw_vi_mac_cmd c;
+ int ret = 0;
+ u32 exact;
+
+ memset(&c, 0, sizeof(c));
+ c.op_to_viid = cpu_to_be32(FW_CMD_OP_V(FW_VI_MAC_CMD) |
+ FW_CMD_REQUEST_F | FW_CMD_WRITE_F |
+ FW_CMD_EXEC_V(0) |
+ FW_VI_MAC_CMD_VIID_V(viid));
+ exact = FW_VI_MAC_CMD_ENTRY_TYPE_V(FW_VI_MAC_TYPE_EXACTMAC);
+ c.freemacs_to_len16 = cpu_to_be32(FW_VI_MAC_CMD_FREEMACS_V(0) |
+ exact |
+ FW_CMD_LEN16_V(1));
+ p = c.u.exact;
+ p->valid_to_idx = cpu_to_be16(FW_VI_MAC_CMD_VALID_F |
+ FW_VI_MAC_CMD_IDX_V(idx));
+ memcpy(p->macaddr, addr, sizeof(p->macaddr));
+ ret = t4_wr_mbox_meat(adap, adap->mbox, &c, sizeof(c), &c, sleep_ok);
+ return ret;
+}
+
+/**
* t4_free_raw_mac_filt - Frees a raw mac entry in mps tcam
* @adap: the adapter
* @viid: the VI id
@@ -7563,6 +7619,55 @@
}
/**
+ * t4_alloc_encap_mac_filt - Adds a mac entry in mps tcam with VNI support
+ * @adap: the adapter
+ * @viid: the VI id
+ * @mac: the MAC address
+ * @mask: the mask
+ * @vni: the VNI id for the tunnel protocol
+ * @vni_mask: mask for the VNI id
+ * @dip_hit: to enable DIP match for the MPS entry
+ * @lookup_type: MAC address for inner (1) or outer (0) header
+ * @sleep_ok: call is allowed to sleep
+ *
+ * Allocates an MPS entry with specified MAC address and VNI value.
+ *
+ * Returns a negative error number or the allocated index for this mac.
+ */
+int t4_alloc_encap_mac_filt(struct adapter *adap, unsigned int viid,
+ const u8 *addr, const u8 *mask, unsigned int vni,
+ unsigned int vni_mask, u8 dip_hit, u8 lookup_type,
+ bool sleep_ok)
+{
+ struct fw_vi_mac_cmd c;
+ struct fw_vi_mac_vni *p = c.u.exact_vni;
+ int ret = 0;
+ u32 val;
+
+ memset(&c, 0, sizeof(c));
+ c.op_to_viid = cpu_to_be32(FW_CMD_OP_V(FW_VI_MAC_CMD) |
+ FW_CMD_REQUEST_F | FW_CMD_WRITE_F |
+ FW_VI_MAC_CMD_VIID_V(viid));
+ val = FW_CMD_LEN16_V(1) |
+ FW_VI_MAC_CMD_ENTRY_TYPE_V(FW_VI_MAC_TYPE_EXACTMAC_VNI);
+ c.freemacs_to_len16 = cpu_to_be32(val);
+ p->valid_to_idx = cpu_to_be16(FW_VI_MAC_CMD_VALID_F |
+ FW_VI_MAC_CMD_IDX_V(FW_VI_MAC_ADD_MAC));
+ memcpy(p->macaddr, addr, sizeof(p->macaddr));
+ memcpy(p->macaddr_mask, mask, sizeof(p->macaddr_mask));
+
+ p->lookup_type_to_vni =
+ cpu_to_be32(FW_VI_MAC_CMD_VNI_V(vni) |
+ FW_VI_MAC_CMD_DIP_HIT_V(dip_hit) |
+ FW_VI_MAC_CMD_LOOKUP_TYPE_V(lookup_type));
+ p->vni_mask_pkd = cpu_to_be32(FW_VI_MAC_CMD_VNI_MASK_V(vni_mask));
+ ret = t4_wr_mbox_meat(adap, adap->mbox, &c, sizeof(c), &c, sleep_ok);
+ if (ret == 0)
+ ret = FW_VI_MAC_CMD_IDX_G(be16_to_cpu(p->valid_to_idx));
+ return ret;
+}
+
+/**
* t4_alloc_raw_mac_filt - Adds a mac entry in mps tcam
* @adap: the adapter
* @viid: the VI id
@@ -7909,6 +8014,34 @@
}
/**
+ * t4_enable_pi_params - enable/disable a Port's Virtual Interface
+ * @adap: the adapter
+ * @mbox: mailbox to use for the FW command
+ * @pi: the Port Information structure
+ * @rx_en: 1=enable Rx, 0=disable Rx
+ * @tx_en: 1=enable Tx, 0=disable Tx
+ * @dcb_en: 1=enable delivery of Data Center Bridging messages.
+ *
+ * Enables/disables a Port's Virtual Interface. Note that setting DCB
+ * Enable only makes sense when enabling a Virtual Interface ...
+ * If the Virtual Interface enable/disable operation is successful,
+ * we notify the OS-specific code of a potential Link Status change
+ * via the OS Contract API t4_os_link_changed().
+ */
+int t4_enable_pi_params(struct adapter *adap, unsigned int mbox,
+ struct port_info *pi,
+ bool rx_en, bool tx_en, bool dcb_en)
+{
+ int ret = t4_enable_vi_params(adap, mbox, pi->viid,
+ rx_en, tx_en, dcb_en);
+ if (ret)
+ return ret;
+ t4_os_link_changed(adap, pi->port_id,
+ rx_en && tx_en && pi->link_cfg.link_ok);
+ return 0;
+}
+
+/**
* t4_identify_port - identify a VI's port by blinking its LED
* @adap: the adapter
* @mbox: mailbox to use for the FW command
@@ -8249,6 +8382,9 @@
fc = fwcap_to_cc_pause(linkattr);
speed = fwcap_to_speed(linkattr);
+ lc->new_module = false;
+ lc->redo_l1cfg = false;
+
if (mod_type != pi->mod_type) {
/* With the newer SFP28 and QSFP28 Transceiver Module Types,
* various fundamental Port Capabilities which used to be
@@ -8283,6 +8419,8 @@
pi->port_type = port_type;
pi->mod_type = mod_type;
+
+ lc->new_module = t4_is_inserted_mod_type(mod_type);
t4_os_portmod_changed(adapter, pi->port_id);
}
@@ -8301,7 +8439,9 @@
lc->lpacaps = lpacaps;
lc->acaps = acaps & ADVERT_MASK;
- if (lc->acaps & FW_PORT_CAP32_ANEG) {
+ if (!(lc->acaps & FW_PORT_CAP32_ANEG)) {
+ lc->autoneg = AUTONEG_DISABLE;
+ } else if (lc->acaps & FW_PORT_CAP32_ANEG) {
lc->autoneg = AUTONEG_ENABLE;
} else {
/* When Autoneg is disabled, user needs to set
@@ -8315,6 +8455,26 @@
t4_os_link_changed(adapter, pi->port_id, link_ok);
}
+
+ if (lc->new_module && lc->redo_l1cfg) {
+ struct link_config old_lc;
+ int ret;
+
+ /* Save the current L1 Configuration and restore it if an
+ * error occurs. We probably should fix the l1_cfg*()
+ * routines not to change the link_config when an error
+ * occurs ...
+ */
+ old_lc = *lc;
+ ret = t4_link_l1cfg_ns(adapter, adapter->mbox, pi->lport, lc);
+ if (ret) {
+ *lc = old_lc;
+ dev_warn(adapter->pdev_dev,
+ "Attempt to update new Transceiver Module settings failed\n");
+ }
+ }
+ lc->new_module = false;
+ lc->redo_l1cfg = false;
}
/**
@@ -8486,6 +8646,13 @@
lc->requested_fec = FEC_AUTO;
lc->fec = fwcap_to_cc_fec(lc->def_acaps);
+ /* If the Port is capable of Auto-Negtotiation, initialize it as
+ * "enabled" and copy over all of the Physical Port Capabilities
+ * to the Advertised Port Capabilities. Otherwise mark it as
+ * Auto-Negotiate disabled and select the highest supported speed
+ * for the link. Note parallel structure in t4_link_l1cfg_core()
+ * and t4_handle_get_port_info().
+ */
if (lc->pcaps & FW_PORT_CAP32_ANEG) {
lc->acaps = lc->pcaps & ADVERT_MASK;
lc->autoneg = AUTONEG_ENABLE;
@@ -8493,6 +8660,7 @@
} else {
lc->acaps = 0;
lc->autoneg = AUTONEG_DISABLE;
+ lc->speed_caps = fwcap_to_fwspeed(acaps);
}
}
diff --git a/drivers/net/ethernet/chelsio/cxgb4/t4_msg.h b/drivers/net/ethernet/chelsio/cxgb4/t4_msg.h
index fe2029e9..09e38f0 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/t4_msg.h
+++ b/drivers/net/ethernet/chelsio/cxgb4/t4_msg.h
@@ -1233,6 +1233,11 @@
#define T6_COMPR_RXERR_SUM_V(x) ((x) << T6_COMPR_RXERR_SUM_S)
#define T6_COMPR_RXERR_SUM_F T6_COMPR_RXERR_SUM_V(1U)
+#define T6_RX_TNLHDR_LEN_S 8
+#define T6_RX_TNLHDR_LEN_M 0xFF
+#define T6_RX_TNLHDR_LEN_V(x) ((x) << T6_RX_TNLHDR_LEN_S)
+#define T6_RX_TNLHDR_LEN_G(x) (((x) >> T6_RX_TNLHDR_LEN_S) & T6_RX_TNLHDR_LEN_M)
+
struct cpl_trace_pkt {
u8 opcode;
u8 intf;
diff --git a/drivers/net/ethernet/chelsio/cxgb4/t4_pci_id_tbl.h b/drivers/net/ethernet/chelsio/cxgb4/t4_pci_id_tbl.h
index 51b1803..c7f8d04 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/t4_pci_id_tbl.h
+++ b/drivers/net/ethernet/chelsio/cxgb4/t4_pci_id_tbl.h
@@ -145,6 +145,9 @@
CH_PCI_ID_TABLE_FENTRY(0x5016), /* T580-OCP-SO */
CH_PCI_ID_TABLE_FENTRY(0x5017), /* T520-OCP-SO */
CH_PCI_ID_TABLE_FENTRY(0x5018), /* T540-BT */
+ CH_PCI_ID_TABLE_FENTRY(0x5019), /* T540-LP-BT */
+ CH_PCI_ID_TABLE_FENTRY(0x501a), /* T540-SO-BT */
+ CH_PCI_ID_TABLE_FENTRY(0x501b), /* T540-SO-CR */
CH_PCI_ID_TABLE_FENTRY(0x5080), /* Custom T540-cr */
CH_PCI_ID_TABLE_FENTRY(0x5081), /* Custom T540-LL-cr */
CH_PCI_ID_TABLE_FENTRY(0x5082), /* Custom T504-cr */
@@ -184,6 +187,7 @@
CH_PCI_ID_TABLE_FENTRY(0x50aa), /* Custom T580-CR */
CH_PCI_ID_TABLE_FENTRY(0x50ab), /* Custom T520-CR */
CH_PCI_ID_TABLE_FENTRY(0x50ac), /* Custom T540-BT */
+ CH_PCI_ID_TABLE_FENTRY(0x50ad), /* Custom T520-CR */
/* T6 adapters:
*/
@@ -208,6 +212,8 @@
CH_PCI_ID_TABLE_FENTRY(0x6085), /* Custom T6240-SO */
CH_PCI_ID_TABLE_FENTRY(0x6086), /* Custom T6225-SO-CR */
CH_PCI_ID_TABLE_FENTRY(0x6087), /* Custom T6225-CR */
+ CH_PCI_ID_TABLE_FENTRY(0x6088), /* Custom T62100-CR */
+ CH_PCI_ID_TABLE_FENTRY(0x6089), /* Custom T62100-KR */
CH_PCI_DEVICE_ID_TABLE_DEFINE_END;
#endif /* __T4_PCI_ID_TBL_H__ */
diff --git a/drivers/net/ethernet/chelsio/cxgb4/t4_regs.h b/drivers/net/ethernet/chelsio/cxgb4/t4_regs.h
index 276fdf2..6b55aa2 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/t4_regs.h
+++ b/drivers/net/ethernet/chelsio/cxgb4/t4_regs.h
@@ -1598,6 +1598,10 @@
#define VNIC_V(x) ((x) << VNIC_S)
#define VNIC_F VNIC_V(1U)
+#define USE_ENC_IDX_S 13
+#define USE_ENC_IDX_V(x) ((x) << USE_ENC_IDX_S)
+#define USE_ENC_IDX_F USE_ENC_IDX_V(1U)
+
#define CSUM_HAS_PSEUDO_HDR_S 10
#define CSUM_HAS_PSEUDO_HDR_V(x) ((x) << CSUM_HAS_PSEUDO_HDR_S)
#define CSUM_HAS_PSEUDO_HDR_F CSUM_HAS_PSEUDO_HDR_V(1U)
@@ -2995,6 +2999,7 @@
#define LE_DB_HASH_TID_BASE_A 0x19c30
#define LE_DB_HASH_TBL_BASE_ADDR_A 0x19c30
#define LE_DB_INT_CAUSE_A 0x19c3c
+#define LE_DB_CLCAM_TID_BASE_A 0x19df4
#define LE_DB_TID_HASHBASE_A 0x19df8
#define T6_LE_DB_HASH_TID_BASE_A 0x19df8
diff --git a/drivers/net/ethernet/chelsio/cxgb4/t4fw_api.h b/drivers/net/ethernet/chelsio/cxgb4/t4fw_api.h
index e3d4751..2d91480 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/t4fw_api.h
+++ b/drivers/net/ethernet/chelsio/cxgb4/t4fw_api.h
@@ -1305,6 +1305,8 @@
FW_PARAMS_PARAM_PFVF_HPFILTER_END = 0x33,
FW_PARAMS_PARAM_PFVF_TLS_START = 0x34,
FW_PARAMS_PARAM_PFVF_TLS_END = 0x35,
+ FW_PARAMS_PARAM_PFVF_RAWF_START = 0x36,
+ FW_PARAMS_PARAM_PFVF_RAWF_END = 0x37,
FW_PARAMS_PARAM_PFVF_NCRYPTO_LOOKASIDE = 0x39,
FW_PARAMS_PARAM_PFVF_PORT_CAPS32 = 0x3A,
};
@@ -2156,6 +2158,14 @@
__be64 data0m_pkd;
__be32 data1m[2];
} raw;
+ struct fw_vi_mac_vni {
+ __be16 valid_to_idx;
+ __u8 macaddr[6];
+ __be16 r7;
+ __u8 macaddr_mask[6];
+ __be32 lookup_type_to_vni;
+ __be32 vni_mask_pkd;
+ } exact_vni[2];
} u;
};
@@ -2203,6 +2213,32 @@
#define FW_VI_MAC_CMD_RAW_IDX_G(x) \
(((x) >> FW_VI_MAC_CMD_RAW_IDX_S) & FW_VI_MAC_CMD_RAW_IDX_M)
+#define FW_VI_MAC_CMD_LOOKUP_TYPE_S 31
+#define FW_VI_MAC_CMD_LOOKUP_TYPE_M 0x1
+#define FW_VI_MAC_CMD_LOOKUP_TYPE_V(x) ((x) << FW_VI_MAC_CMD_LOOKUP_TYPE_S)
+#define FW_VI_MAC_CMD_LOOKUP_TYPE_G(x) \
+ (((x) >> FW_VI_MAC_CMD_LOOKUP_TYPE_S) & FW_VI_MAC_CMD_LOOKUP_TYPE_M)
+#define FW_VI_MAC_CMD_LOOKUP_TYPE_F FW_VI_MAC_CMD_LOOKUP_TYPE_V(1U)
+
+#define FW_VI_MAC_CMD_DIP_HIT_S 30
+#define FW_VI_MAC_CMD_DIP_HIT_M 0x1
+#define FW_VI_MAC_CMD_DIP_HIT_V(x) ((x) << FW_VI_MAC_CMD_DIP_HIT_S)
+#define FW_VI_MAC_CMD_DIP_HIT_G(x) \
+ (((x) >> FW_VI_MAC_CMD_DIP_HIT_S) & FW_VI_MAC_CMD_DIP_HIT_M)
+#define FW_VI_MAC_CMD_DIP_HIT_F FW_VI_MAC_CMD_DIP_HIT_V(1U)
+
+#define FW_VI_MAC_CMD_VNI_S 0
+#define FW_VI_MAC_CMD_VNI_M 0xffffff
+#define FW_VI_MAC_CMD_VNI_V(x) ((x) << FW_VI_MAC_CMD_VNI_S)
+#define FW_VI_MAC_CMD_VNI_G(x) \
+ (((x) >> FW_VI_MAC_CMD_VNI_S) & FW_VI_MAC_CMD_VNI_M)
+
+#define FW_VI_MAC_CMD_VNI_MASK_S 0
+#define FW_VI_MAC_CMD_VNI_MASK_M 0xffffff
+#define FW_VI_MAC_CMD_VNI_MASK_V(x) ((x) << FW_VI_MAC_CMD_VNI_MASK_S)
+#define FW_VI_MAC_CMD_VNI_MASK_G(x) \
+ (((x) >> FW_VI_MAC_CMD_VNI_MASK_S) & FW_VI_MAC_CMD_VNI_MASK_M)
+
#define FW_RXMODE_MTU_NO_CHG 65535
struct fw_vi_rxmode_cmd {
@@ -2435,8 +2471,8 @@
FW_PORT_CAP_FC_RX = 0x0040,
FW_PORT_CAP_FC_TX = 0x0080,
FW_PORT_CAP_ANEG = 0x0100,
- FW_PORT_CAP_MDIX = 0x0200,
- FW_PORT_CAP_MDIAUTO = 0x0400,
+ FW_PORT_CAP_MDIAUTO = 0x0200,
+ FW_PORT_CAP_MDISTRAIGHT = 0x0400,
FW_PORT_CAP_FEC_RS = 0x0800,
FW_PORT_CAP_FEC_BASER_RS = 0x1000,
FW_PORT_CAP_FEC_RESERVED = 0x2000,
@@ -2479,8 +2515,8 @@
#define FW_PORT_CAP32_802_3_PAUSE 0x00040000UL
#define FW_PORT_CAP32_802_3_ASM_DIR 0x00080000UL
#define FW_PORT_CAP32_ANEG 0x00100000UL
-#define FW_PORT_CAP32_MDIX 0x00200000UL
-#define FW_PORT_CAP32_MDIAUTO 0x00400000UL
+#define FW_PORT_CAP32_MDIAUTO 0x00200000UL
+#define FW_PORT_CAP32_MDISTRAIGHT 0x00400000UL
#define FW_PORT_CAP32_FEC_RS 0x00800000UL
#define FW_PORT_CAP32_FEC_BASER_RS 0x01000000UL
#define FW_PORT_CAP32_FEC_RESERVED1 0x02000000UL
diff --git a/drivers/net/ethernet/chelsio/cxgb4/t4fw_version.h b/drivers/net/ethernet/chelsio/cxgb4/t4fw_version.h
index 123e2c1..4eb15ce 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/t4fw_version.h
+++ b/drivers/net/ethernet/chelsio/cxgb4/t4fw_version.h
@@ -36,8 +36,8 @@
#define __T4FW_VERSION_H__
#define T4FW_VERSION_MAJOR 0x01
-#define T4FW_VERSION_MINOR 0x10
-#define T4FW_VERSION_MICRO 0x3F
+#define T4FW_VERSION_MINOR 0x13
+#define T4FW_VERSION_MICRO 0x01
#define T4FW_VERSION_BUILD 0x00
#define T4FW_MIN_VERSION_MAJOR 0x01
@@ -45,8 +45,8 @@
#define T4FW_MIN_VERSION_MICRO 0x00
#define T5FW_VERSION_MAJOR 0x01
-#define T5FW_VERSION_MINOR 0x10
-#define T5FW_VERSION_MICRO 0x3F
+#define T5FW_VERSION_MINOR 0x13
+#define T5FW_VERSION_MICRO 0x01
#define T5FW_VERSION_BUILD 0x00
#define T5FW_MIN_VERSION_MAJOR 0x00
@@ -54,8 +54,8 @@
#define T5FW_MIN_VERSION_MICRO 0x00
#define T6FW_VERSION_MAJOR 0x01
-#define T6FW_VERSION_MINOR 0x10
-#define T6FW_VERSION_MICRO 0x3F
+#define T6FW_VERSION_MINOR 0x13
+#define T6FW_VERSION_MICRO 0x01
#define T6FW_VERSION_BUILD 0x00
#define T6FW_MIN_VERSION_MAJOR 0x00
diff --git a/drivers/net/ethernet/chelsio/cxgb4vf/cxgb4vf_main.c b/drivers/net/ethernet/chelsio/cxgb4vf/cxgb4vf_main.c
index 9a81b523..ff84791 100644
--- a/drivers/net/ethernet/chelsio/cxgb4vf/cxgb4vf_main.c
+++ b/drivers/net/ethernet/chelsio/cxgb4vf/cxgb4vf_main.c
@@ -274,7 +274,7 @@
* is enabled on a port.
*/
if (ret == 0)
- ret = t4vf_enable_vi(pi->adapter, pi->viid, true, true);
+ ret = t4vf_enable_pi(pi->adapter, pi, true, true);
/* The Virtual Interfaces are connected to an internal switch on the
* chip which allows VIs attached to the same port to talk to each
@@ -822,8 +822,7 @@
netif_tx_stop_all_queues(dev);
netif_carrier_off(dev);
- t4vf_enable_vi(adapter, pi->viid, false, false);
- pi->link_cfg.link_ok = 0;
+ t4vf_enable_pi(adapter, pi, false, false);
clear_bit(pi->port_id, &adapter->open_device_map);
if (adapter->open_device_map == 0)
@@ -1419,6 +1418,22 @@
base->duplex = DUPLEX_UNKNOWN;
}
+ if (pi->link_cfg.fc & PAUSE_RX) {
+ if (pi->link_cfg.fc & PAUSE_TX) {
+ ethtool_link_ksettings_add_link_mode(link_ksettings,
+ advertising,
+ Pause);
+ } else {
+ ethtool_link_ksettings_add_link_mode(link_ksettings,
+ advertising,
+ Asym_Pause);
+ }
+ } else if (pi->link_cfg.fc & PAUSE_TX) {
+ ethtool_link_ksettings_add_link_mode(link_ksettings,
+ advertising,
+ Asym_Pause);
+ }
+
base->autoneg = pi->link_cfg.autoneg;
if (pi->link_cfg.pcaps & FW_PORT_CAP32_ANEG)
ethtool_link_ksettings_add_link_mode(link_ksettings,
diff --git a/drivers/net/ethernet/chelsio/cxgb4vf/t4vf_common.h b/drivers/net/ethernet/chelsio/cxgb4vf/t4vf_common.h
index 712e8f0..ccca67c 100644
--- a/drivers/net/ethernet/chelsio/cxgb4vf/t4vf_common.h
+++ b/drivers/net/ethernet/chelsio/cxgb4vf/t4vf_common.h
@@ -391,7 +391,10 @@
int t4vf_alloc_vi(struct adapter *, int);
int t4vf_free_vi(struct adapter *, int);
-int t4vf_enable_vi(struct adapter *, unsigned int, bool, bool);
+int t4vf_enable_vi(struct adapter *adapter, unsigned int viid, bool rx_en,
+ bool tx_en);
+int t4vf_enable_pi(struct adapter *adapter, struct port_info *pi, bool rx_en,
+ bool tx_en);
int t4vf_identify_port(struct adapter *, unsigned int, unsigned int);
int t4vf_set_rxmode(struct adapter *, unsigned int, int, int, int, int, int,
diff --git a/drivers/net/ethernet/chelsio/cxgb4vf/t4vf_hw.c b/drivers/net/ethernet/chelsio/cxgb4vf/t4vf_hw.c
index 798695b..5b8c08c 100644
--- a/drivers/net/ethernet/chelsio/cxgb4vf/t4vf_hw.c
+++ b/drivers/net/ethernet/chelsio/cxgb4vf/t4vf_hw.c
@@ -341,8 +341,8 @@
CAP16_TO_CAP32(FC_RX);
CAP16_TO_CAP32(FC_TX);
CAP16_TO_CAP32(ANEG);
- CAP16_TO_CAP32(MDIX);
CAP16_TO_CAP32(MDIAUTO);
+ CAP16_TO_CAP32(MDISTRAIGHT);
CAP16_TO_CAP32(FEC_RS);
CAP16_TO_CAP32(FEC_BASER_RS);
CAP16_TO_CAP32(802_3_PAUSE);
@@ -405,6 +405,36 @@
return 0;
}
+/**
+ * fwcap_to_fwspeed - return highest speed in Port Capabilities
+ * @acaps: advertised Port Capabilities
+ *
+ * Get the highest speed for the port from the advertised Port
+ * Capabilities. It will be either the highest speed from the list of
+ * speeds or whatever user has set using ethtool.
+ */
+static fw_port_cap32_t fwcap_to_fwspeed(fw_port_cap32_t acaps)
+{
+ #define TEST_SPEED_RETURN(__caps_speed) \
+ do { \
+ if (acaps & FW_PORT_CAP32_SPEED_##__caps_speed) \
+ return FW_PORT_CAP32_SPEED_##__caps_speed; \
+ } while (0)
+
+ TEST_SPEED_RETURN(400G);
+ TEST_SPEED_RETURN(200G);
+ TEST_SPEED_RETURN(100G);
+ TEST_SPEED_RETURN(50G);
+ TEST_SPEED_RETURN(40G);
+ TEST_SPEED_RETURN(25G);
+ TEST_SPEED_RETURN(10G);
+ TEST_SPEED_RETURN(1G);
+ TEST_SPEED_RETURN(100M);
+
+ #undef TEST_SPEED_RETURN
+ return 0;
+}
+
/*
* init_link_config - initialize a link's SW state
* @lc: structure holding the link state
@@ -431,6 +461,13 @@
lc->requested_fec = FEC_AUTO;
lc->fec = lc->auto_fec;
+ /* If the Port is capable of Auto-Negtotiation, initialize it as
+ * "enabled" and copy over all of the Physical Port Capabilities
+ * to the Advertised Port Capabilities. Otherwise mark it as
+ * Auto-Negotiate disabled and select the highest supported speed
+ * for the link. Note parallel structure in t4_link_l1cfg_core()
+ * and t4_handle_get_port_info().
+ */
if (lc->pcaps & FW_PORT_CAP32_ANEG) {
lc->acaps = acaps & ADVERT_MASK;
lc->autoneg = AUTONEG_ENABLE;
@@ -438,6 +475,7 @@
} else {
lc->acaps = 0;
lc->autoneg = AUTONEG_DISABLE;
+ lc->speed_caps = fwcap_to_fwspeed(acaps);
}
}
@@ -1362,6 +1400,30 @@
}
/**
+ * t4vf_enable_pi - enable/disable a Port's virtual interface
+ * @adapter: the adapter
+ * @pi: the Port Information structure
+ * @rx_en: 1=enable Rx, 0=disable Rx
+ * @tx_en: 1=enable Tx, 0=disable Tx
+ *
+ * Enables/disables a Port's virtual interface. If the Virtual
+ * Interface enable/disable operation is successful, we notify the
+ * OS-specific code of a potential Link Status change via the OS Contract
+ * API t4vf_os_link_changed().
+ */
+int t4vf_enable_pi(struct adapter *adapter, struct port_info *pi,
+ bool rx_en, bool tx_en)
+{
+ int ret = t4vf_enable_vi(adapter, pi->viid, rx_en, tx_en);
+
+ if (ret)
+ return ret;
+ t4vf_os_link_changed(adapter, pi->pidx,
+ rx_en && tx_en && pi->link_cfg.link_ok);
+ return 0;
+}
+
+/**
* t4vf_identify_port - identify a VI's port by blinking its LED
* @adapter: the adapter
* @viid: the Virtual Interface ID
@@ -1955,7 +2017,14 @@
lc->lpacaps = lpacaps;
lc->acaps = acaps & ADVERT_MASK;
- if (lc->acaps & FW_PORT_CAP32_ANEG) {
+ /* If we're not physically capable of Auto-Negotiation, note
+ * this as Auto-Negotiation disabled. Otherwise, we track
+ * what Auto-Negotiation settings we have. Note parallel
+ * structure in init_link_config().
+ */
+ if (!(lc->pcaps & FW_PORT_CAP32_ANEG)) {
+ lc->autoneg = AUTONEG_DISABLE;
+ } else if (lc->acaps & FW_PORT_CAP32_ANEG) {
lc->autoneg = AUTONEG_ENABLE;
} else {
/* When Autoneg is disabled, user needs to set
diff --git a/drivers/net/ethernet/chelsio/libcxgb/libcxgb_cm.h b/drivers/net/ethernet/chelsio/libcxgb/libcxgb_cm.h
index 4b5aacc..240ba9d 100644
--- a/drivers/net/ethernet/chelsio/libcxgb/libcxgb_cm.h
+++ b/drivers/net/ethernet/chelsio/libcxgb/libcxgb_cm.h
@@ -90,8 +90,7 @@
{
struct cpl_tid_release *req;
- req = __skb_put(skb, len);
- memset(req, 0, len);
+ req = __skb_put_zero(skb, len);
INIT_TP_WR(req, tid);
OPCODE_TID(req) = cpu_to_be32(MK_OPCODE_TID(CPL_TID_RELEASE, tid));
@@ -104,8 +103,7 @@
{
struct cpl_close_con_req *req;
- req = __skb_put(skb, len);
- memset(req, 0, len);
+ req = __skb_put_zero(skb, len);
INIT_TP_WR(req, tid);
OPCODE_TID(req) = cpu_to_be32(MK_OPCODE_TID(CPL_CLOSE_CON_REQ, tid));
@@ -119,8 +117,7 @@
{
struct cpl_abort_req *req;
- req = __skb_put(skb, len);
- memset(req, 0, len);
+ req = __skb_put_zero(skb, len);
INIT_TP_WR(req, tid);
OPCODE_TID(req) = cpu_to_be32(MK_OPCODE_TID(CPL_ABORT_REQ, tid));
@@ -134,8 +131,7 @@
{
struct cpl_abort_rpl *rpl;
- rpl = __skb_put(skb, len);
- memset(rpl, 0, len);
+ rpl = __skb_put_zero(skb, len);
INIT_TP_WR(rpl, tid);
OPCODE_TID(rpl) = cpu_to_be32(MK_OPCODE_TID(CPL_ABORT_RPL, tid));
@@ -149,8 +145,7 @@
{
struct cpl_rx_data_ack *req;
- req = __skb_put(skb, len);
- memset(req, 0, len);
+ req = __skb_put_zero(skb, len);
INIT_TP_WR(req, tid);
OPCODE_TID(req) = cpu_to_be32(MK_OPCODE_TID(CPL_RX_DATA_ACK, tid));
diff --git a/drivers/net/ethernet/ethoc.c b/drivers/net/ethernet/ethoc.c
index 8bb0db9..00a5727 100644
--- a/drivers/net/ethernet/ethoc.c
+++ b/drivers/net/ethernet/ethoc.c
@@ -1246,8 +1246,7 @@
mdiobus_unregister(priv->mdio);
mdiobus_free(priv->mdio);
free2:
- if (priv->clk)
- clk_disable_unprepare(priv->clk);
+ clk_disable_unprepare(priv->clk);
free:
free_netdev(netdev);
out:
@@ -1271,8 +1270,7 @@
mdiobus_unregister(priv->mdio);
mdiobus_free(priv->mdio);
}
- if (priv->clk)
- clk_disable_unprepare(priv->clk);
+ clk_disable_unprepare(priv->clk);
unregister_netdev(netdev);
free_netdev(netdev);
}
diff --git a/drivers/net/ethernet/freescale/Kconfig b/drivers/net/ethernet/freescale/Kconfig
index 6e490fd2..a580a3d 100644
--- a/drivers/net/ethernet/freescale/Kconfig
+++ b/drivers/net/ethernet/freescale/Kconfig
@@ -22,7 +22,7 @@
config FEC
tristate "FEC ethernet controller (of ColdFire and some i.MX CPUs)"
depends on (M523x || M527x || M5272 || M528x || M520x || M532x || \
- ARCH_MXC || SOC_IMX28)
+ ARCH_MXC || SOC_IMX28 || COMPILE_TEST)
default ARCH_MXC || SOC_IMX28 if ARM
select PHYLIB
imply PTP_1588_CLOCK
diff --git a/drivers/net/ethernet/freescale/fec.h b/drivers/net/ethernet/freescale/fec.h
index e7381f8..4778b66 100644
--- a/drivers/net/ethernet/freescale/fec.h
+++ b/drivers/net/ethernet/freescale/fec.h
@@ -21,7 +21,7 @@
#if defined(CONFIG_M523x) || defined(CONFIG_M527x) || defined(CONFIG_M528x) || \
defined(CONFIG_M520x) || defined(CONFIG_M532x) || defined(CONFIG_ARM) || \
- defined(CONFIG_ARM64)
+ defined(CONFIG_ARM64) || defined(CONFIG_COMPILE_TEST)
/*
* Just figures, Motorola would have to change the offsets for
* registers in the same peripheral device on different models
diff --git a/drivers/net/ethernet/freescale/fec_main.c b/drivers/net/ethernet/freescale/fec_main.c
index 9d3eed4..ab7521c 100644
--- a/drivers/net/ethernet/freescale/fec_main.c
+++ b/drivers/net/ethernet/freescale/fec_main.c
@@ -2053,13 +2053,9 @@
fep->mii_bus->parent = &pdev->dev;
node = of_get_child_by_name(pdev->dev.of_node, "mdio");
- if (node) {
- err = of_mdiobus_register(fep->mii_bus, node);
+ err = of_mdiobus_register(fep->mii_bus, node);
+ if (node)
of_node_put(node);
- } else {
- err = mdiobus_register(fep->mii_bus);
- }
-
if (err)
goto err_out_free_mdiobus;
@@ -2112,7 +2108,7 @@
/* List of registers that can be safety be read to dump them with ethtool */
#if defined(CONFIG_M523x) || defined(CONFIG_M527x) || defined(CONFIG_M528x) || \
defined(CONFIG_M520x) || defined(CONFIG_M532x) || defined(CONFIG_ARM) || \
- defined(CONFIG_ARM64)
+ defined(CONFIG_ARM64) || defined(CONFIG_COMPILE_TEST)
static u32 fec_enet_register_offset[] = {
FEC_IEVENT, FEC_IMASK, FEC_R_DES_ACTIVE_0, FEC_X_DES_ACTIVE_0,
FEC_ECNTRL, FEC_MII_DATA, FEC_MII_SPEED, FEC_MIB_CTRLSTAT, FEC_R_CNTRL,
diff --git a/drivers/net/ethernet/freescale/fec_ptp.c b/drivers/net/ethernet/freescale/fec_ptp.c
index 43d9732..36c2d7d 100644
--- a/drivers/net/ethernet/freescale/fec_ptp.c
+++ b/drivers/net/ethernet/freescale/fec_ptp.c
@@ -454,12 +454,6 @@
return -EOPNOTSUPP;
}
-/**
- * fec_ptp_hwtstamp_ioctl - control hardware time stamping
- * @ndev: pointer to net_device
- * @ifreq: ioctl data
- * @cmd: particular ioctl requested
- */
int fec_ptp_set(struct net_device *ndev, struct ifreq *ifr)
{
struct fec_enet_private *fep = netdev_priv(ndev);
diff --git a/drivers/net/ethernet/freescale/fman/fman_port.c b/drivers/net/ethernet/freescale/fman/fman_port.c
index 6552d68..ce6e24c 100644
--- a/drivers/net/ethernet/freescale/fman/fman_port.c
+++ b/drivers/net/ethernet/freescale/fman/fman_port.c
@@ -1391,12 +1391,10 @@
/* FM_WRONG_RESET_VALUES_ERRATA_FMAN_A005127 Errata
* workaround
*/
- if (port->rev_info.major >= 6) {
- u32 reg;
+ u32 reg;
- reg = 0x00001013;
- iowrite32be(reg, &port->bmi_regs->tx.fmbm_tfp);
- }
+ reg = 0x00001013;
+ iowrite32be(reg, &port->bmi_regs->tx.fmbm_tfp);
}
return 0;
diff --git a/drivers/net/ethernet/hisilicon/hns3/hnae3.c b/drivers/net/ethernet/hisilicon/hns3/hnae3.c
index 02145f2..63d7dbf 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hnae3.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hnae3.c
@@ -50,13 +50,22 @@
/* now, (un-)instantiate client by calling lower layer */
if (is_reg) {
ret = ae_dev->ops->init_client_instance(client, ae_dev);
- if (ret)
+ if (ret) {
dev_err(&ae_dev->pdev->dev,
"fail to instantiate client\n");
- return ret;
+ return ret;
+ }
+
+ hnae_set_bit(ae_dev->flag, HNAE3_CLIENT_INITED_B, 1);
+ return 0;
}
- ae_dev->ops->uninit_client_instance(client, ae_dev);
+ if (hnae_get_bit(ae_dev->flag, HNAE3_CLIENT_INITED_B)) {
+ ae_dev->ops->uninit_client_instance(client, ae_dev);
+
+ hnae_set_bit(ae_dev->flag, HNAE3_CLIENT_INITED_B, 0);
+ }
+
return 0;
}
@@ -89,7 +98,7 @@
exit:
mutex_unlock(&hnae3_common_lock);
- return ret;
+ return 0;
}
EXPORT_SYMBOL(hnae3_register_client);
@@ -112,7 +121,7 @@
* @ae_algo: AE algorithm
* NOTE: the duplicated name will not be checked
*/
-int hnae3_register_ae_algo(struct hnae3_ae_algo *ae_algo)
+void hnae3_register_ae_algo(struct hnae3_ae_algo *ae_algo)
{
const struct pci_device_id *id;
struct hnae3_ae_dev *ae_dev;
@@ -151,8 +160,6 @@
}
mutex_unlock(&hnae3_common_lock);
-
- return ret;
}
EXPORT_SYMBOL(hnae3_register_ae_algo);
@@ -168,6 +175,9 @@
mutex_lock(&hnae3_common_lock);
/* Check if there are matched ae_dev */
list_for_each_entry(ae_dev, &hnae3_ae_dev_list, node) {
+ if (!hnae_get_bit(ae_dev->flag, HNAE3_DEV_INITED_B))
+ continue;
+
id = pci_match_id(ae_algo->pdev_id_table, ae_dev->pdev);
if (!id)
continue;
@@ -191,22 +201,14 @@
* @ae_dev: the AE device
* NOTE: the duplicated name will not be checked
*/
-int hnae3_register_ae_dev(struct hnae3_ae_dev *ae_dev)
+void hnae3_register_ae_dev(struct hnae3_ae_dev *ae_dev)
{
const struct pci_device_id *id;
struct hnae3_ae_algo *ae_algo;
struct hnae3_client *client;
- int ret = 0, lock_acquired;
+ int ret = 0;
- /* we can get deadlocked if SRIOV is being enabled in context to probe
- * and probe gets called again in same context. This can happen when
- * pci_enable_sriov() is called to create VFs from PF probes context.
- * Therefore, for simplicity uniformly defering further probing in all
- * cases where we detect contention.
- */
- lock_acquired = mutex_trylock(&hnae3_common_lock);
- if (!lock_acquired)
- return -EPROBE_DEFER;
+ mutex_lock(&hnae3_common_lock);
list_add_tail(&ae_dev->node, &hnae3_ae_dev_list);
@@ -220,7 +222,6 @@
if (!ae_dev->ops) {
dev_err(&ae_dev->pdev->dev, "ae_dev ops are null\n");
- ret = -EOPNOTSUPP;
goto out_err;
}
@@ -247,8 +248,6 @@
out_err:
mutex_unlock(&hnae3_common_lock);
-
- return ret;
}
EXPORT_SYMBOL(hnae3_register_ae_dev);
@@ -264,6 +263,9 @@
mutex_lock(&hnae3_common_lock);
/* Check if there are matched ae_algo */
list_for_each_entry(ae_algo, &hnae3_ae_algo_list, node) {
+ if (!hnae_get_bit(ae_dev->flag, HNAE3_DEV_INITED_B))
+ continue;
+
id = pci_match_id(ae_algo->pdev_id_table, ae_dev->pdev);
if (!id)
continue;
@@ -283,3 +285,4 @@
MODULE_AUTHOR("Huawei Tech. Co., Ltd.");
MODULE_LICENSE("GPL");
MODULE_DESCRIPTION("HNAE3(Hisilicon Network Acceleration Engine) Framework");
+MODULE_VERSION(HNAE3_MOD_VERSION);
diff --git a/drivers/net/ethernet/hisilicon/hns3/hnae3.h b/drivers/net/ethernet/hisilicon/hns3/hnae3.h
index 37ec1b3..45c571e 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hnae3.h
+++ b/drivers/net/ethernet/hisilicon/hns3/hnae3.h
@@ -36,6 +36,8 @@
#include <linux/pci.h>
#include <linux/types.h>
+#define HNAE3_MOD_VERSION "1.0"
+
/* Device IDs */
#define HNAE3_DEV_ID_GE 0xA220
#define HNAE3_DEV_ID_25GE 0xA221
@@ -52,6 +54,7 @@
#define HNAE3_DEV_INITED_B 0x0
#define HNAE3_DEV_SUPPORT_ROCE_B 0x1
#define HNAE3_DEV_SUPPORT_DCB_B 0x2
+#define HNAE3_CLIENT_INITED_B 0x3
#define HNAE3_DEV_SUPPORT_ROCE_DCB_BITS (BIT(HNAE3_DEV_SUPPORT_DCB_B) |\
BIT(HNAE3_DEV_SUPPORT_ROCE_B))
@@ -273,10 +276,6 @@
* Map rings to vector
* unmap_ring_from_vector()
* Unmap rings from vector
- * add_tunnel_udp()
- * Add tunnel information to hardware
- * del_tunnel_udp()
- * Delete tunnel information from hardware
* reset_queue()
* Reset queue
* get_fw_version()
@@ -388,9 +387,6 @@
int vector_num,
struct hnae3_ring_chain_node *vr_chain);
- int (*add_tunnel_udp)(struct hnae3_handle *handle, u16 port_num);
- int (*del_tunnel_udp)(struct hnae3_handle *handle, u16 port_num);
-
void (*reset_queue)(struct hnae3_handle *handle, u16 queue_id);
u32 (*get_fw_version)(struct hnae3_handle *handle);
void (*get_mdix_mode)(struct hnae3_handle *handle,
@@ -521,11 +517,11 @@
#define hnae_get_bit(origin, shift) \
hnae_get_field((origin), (0x1 << (shift)), (shift))
-int hnae3_register_ae_dev(struct hnae3_ae_dev *ae_dev);
+void hnae3_register_ae_dev(struct hnae3_ae_dev *ae_dev);
void hnae3_unregister_ae_dev(struct hnae3_ae_dev *ae_dev);
void hnae3_unregister_ae_algo(struct hnae3_ae_algo *ae_algo);
-int hnae3_register_ae_algo(struct hnae3_ae_algo *ae_algo);
+void hnae3_register_ae_algo(struct hnae3_ae_algo *ae_algo);
void hnae3_unregister_client(struct hnae3_client *client);
int hnae3_register_client(struct hnae3_client *client);
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c
index 8c55965..cac5195 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c
@@ -502,7 +502,7 @@
/* find outer header point */
l3.hdr = skb_network_header(skb);
- l4_hdr = skb_inner_transport_header(skb);
+ l4_hdr = skb_transport_header(skb);
if (skb->protocol == htons(ETH_P_IPV6)) {
exthdr = l3.hdr + sizeof(*l3.v6);
@@ -1244,93 +1244,6 @@
stats->tx_compressed = netdev->stats.tx_compressed;
}
-static void hns3_add_tunnel_port(struct net_device *netdev, u16 port,
- enum hns3_udp_tnl_type type)
-{
- struct hns3_nic_priv *priv = netdev_priv(netdev);
- struct hns3_udp_tunnel *udp_tnl = &priv->udp_tnl[type];
- struct hnae3_handle *h = priv->ae_handle;
-
- if (udp_tnl->used && udp_tnl->dst_port == port) {
- udp_tnl->used++;
- return;
- }
-
- if (udp_tnl->used) {
- netdev_warn(netdev,
- "UDP tunnel [%d], port [%d] offload\n", type, port);
- return;
- }
-
- udp_tnl->dst_port = port;
- udp_tnl->used = 1;
- /* TBD send command to hardware to add port */
- if (h->ae_algo->ops->add_tunnel_udp)
- h->ae_algo->ops->add_tunnel_udp(h, port);
-}
-
-static void hns3_del_tunnel_port(struct net_device *netdev, u16 port,
- enum hns3_udp_tnl_type type)
-{
- struct hns3_nic_priv *priv = netdev_priv(netdev);
- struct hns3_udp_tunnel *udp_tnl = &priv->udp_tnl[type];
- struct hnae3_handle *h = priv->ae_handle;
-
- if (!udp_tnl->used || udp_tnl->dst_port != port) {
- netdev_warn(netdev,
- "Invalid UDP tunnel port %d\n", port);
- return;
- }
-
- udp_tnl->used--;
- if (udp_tnl->used)
- return;
-
- udp_tnl->dst_port = 0;
- /* TBD send command to hardware to del port */
- if (h->ae_algo->ops->del_tunnel_udp)
- h->ae_algo->ops->del_tunnel_udp(h, port);
-}
-
-/* hns3_nic_udp_tunnel_add - Get notifiacetion about UDP tunnel ports
- * @netdev: This physical ports's netdev
- * @ti: Tunnel information
- */
-static void hns3_nic_udp_tunnel_add(struct net_device *netdev,
- struct udp_tunnel_info *ti)
-{
- u16 port_n = ntohs(ti->port);
-
- switch (ti->type) {
- case UDP_TUNNEL_TYPE_VXLAN:
- hns3_add_tunnel_port(netdev, port_n, HNS3_UDP_TNL_VXLAN);
- break;
- case UDP_TUNNEL_TYPE_GENEVE:
- hns3_add_tunnel_port(netdev, port_n, HNS3_UDP_TNL_GENEVE);
- break;
- default:
- netdev_err(netdev, "unsupported tunnel type %d\n", ti->type);
- break;
- }
-}
-
-static void hns3_nic_udp_tunnel_del(struct net_device *netdev,
- struct udp_tunnel_info *ti)
-{
- u16 port_n = ntohs(ti->port);
-
- switch (ti->type) {
- case UDP_TUNNEL_TYPE_VXLAN:
- hns3_del_tunnel_port(netdev, port_n, HNS3_UDP_TNL_VXLAN);
- break;
- case UDP_TUNNEL_TYPE_GENEVE:
- hns3_del_tunnel_port(netdev, port_n, HNS3_UDP_TNL_GENEVE);
- break;
- default:
- break;
- }
-}
-
static int hns3_setup_tc(struct net_device *netdev, void *type_data)
{
struct tc_mqprio_qopt_offload *mqprio_qopt = type_data;
@@ -1569,13 +1482,50 @@
.ndo_get_stats64 = hns3_nic_get_stats64,
.ndo_setup_tc = hns3_nic_setup_tc,
.ndo_set_rx_mode = hns3_nic_set_rx_mode,
- .ndo_udp_tunnel_add = hns3_nic_udp_tunnel_add,
- .ndo_udp_tunnel_del = hns3_nic_udp_tunnel_del,
.ndo_vlan_rx_add_vid = hns3_vlan_rx_add_vid,
.ndo_vlan_rx_kill_vid = hns3_vlan_rx_kill_vid,
.ndo_set_vf_vlan = hns3_ndo_set_vf_vlan,
};
+static bool hns3_is_phys_func(struct pci_dev *pdev)
+{
+ u32 dev_id = pdev->device;
+
+ switch (dev_id) {
+ case HNAE3_DEV_ID_GE:
+ case HNAE3_DEV_ID_25GE:
+ case HNAE3_DEV_ID_25GE_RDMA:
+ case HNAE3_DEV_ID_25GE_RDMA_MACSEC:
+ case HNAE3_DEV_ID_50GE_RDMA:
+ case HNAE3_DEV_ID_50GE_RDMA_MACSEC:
+ case HNAE3_DEV_ID_100G_RDMA_MACSEC:
+ return true;
+ case HNAE3_DEV_ID_100G_VF:
+ case HNAE3_DEV_ID_100G_RDMA_DCB_PFC_VF:
+ return false;
+ default:
+ dev_warn(&pdev->dev, "un-recognized pci device-id %d",
+ dev_id);
+ }
+
+ return false;
+}
+
+static void hns3_disable_sriov(struct pci_dev *pdev)
+{
+ /* If our VFs are assigned we cannot shut down SR-IOV
+ * without causing issues, so just leave the hardware
+ * available but disabled
+ */
+ if (pci_vfs_assigned(pdev)) {
+ dev_warn(&pdev->dev,
+ "disabling driver while VFs are assigned\n");
+ return;
+ }
+
+ pci_disable_sriov(pdev);
+}
+
/* hns3_probe - Device initialization routine
* @pdev: PCI device information struct
* @ent: entry in hns3_pci_tbl
@@ -1603,7 +1553,9 @@
ae_dev->dev_type = HNAE3_DEV_KNIC;
pci_set_drvdata(pdev, ae_dev);
- return hnae3_register_ae_dev(ae_dev);
+ hnae3_register_ae_dev(ae_dev);
+
+ return 0;
}
/* hns3_remove - Device removal routine
@@ -1613,21 +1565,56 @@
{
struct hnae3_ae_dev *ae_dev = pci_get_drvdata(pdev);
+ if (hns3_is_phys_func(pdev) && IS_ENABLED(CONFIG_PCI_IOV))
+ hns3_disable_sriov(pdev);
+
hnae3_unregister_ae_dev(ae_dev);
}
+/**
+ * hns3_pci_sriov_configure
+ * @pdev: pointer to a pci_dev structure
+ * @num_vfs: number of VFs to allocate
+ *
+ * Enable or change the number of VFs. Called when the user updates the number
+ * of VFs in sysfs.
+ **/
+static int hns3_pci_sriov_configure(struct pci_dev *pdev, int num_vfs)
+{
+ int ret;
+
+ if (!(hns3_is_phys_func(pdev) && IS_ENABLED(CONFIG_PCI_IOV))) {
+ dev_warn(&pdev->dev, "Can not config SRIOV\n");
+ return -EINVAL;
+ }
+
+ if (num_vfs) {
+ ret = pci_enable_sriov(pdev, num_vfs);
+ if (ret)
+ dev_err(&pdev->dev, "SRIOV enable failed %d\n", ret);
+ else
+ return num_vfs;
+ } else if (!pci_vfs_assigned(pdev)) {
+ pci_disable_sriov(pdev);
+ } else {
+ dev_warn(&pdev->dev,
+ "Unable to free VFs because some are assigned to VMs.\n");
+ }
+
+ return 0;
+}
+
static struct pci_driver hns3_driver = {
.name = hns3_driver_name,
.id_table = hns3_pci_tbl,
.probe = hns3_probe,
.remove = hns3_remove,
+ .sriov_configure = hns3_pci_sriov_configure,
};
/* set default feature to hns3 */
static void hns3_set_default_feature(struct net_device *netdev)
{
- struct hnae3_handle *h = hns3_get_handle(netdev);
-
netdev->priv_flags |= IFF_UNICAST_FLT;
netdev->hw_enc_features |= NETIF_F_IP_CSUM | NETIF_F_IPV6_CSUM |
@@ -1656,15 +1643,11 @@
NETIF_F_GSO_UDP_TUNNEL_CSUM;
netdev->hw_features |= NETIF_F_IP_CSUM | NETIF_F_IPV6_CSUM |
- NETIF_F_HW_VLAN_CTAG_TX |
+ NETIF_F_HW_VLAN_CTAG_TX | NETIF_F_HW_VLAN_CTAG_RX |
NETIF_F_RXCSUM | NETIF_F_SG | NETIF_F_GSO |
NETIF_F_GRO | NETIF_F_TSO | NETIF_F_TSO6 | NETIF_F_GSO_GRE |
NETIF_F_GSO_GRE_CSUM | NETIF_F_GSO_UDP_TUNNEL |
NETIF_F_GSO_UDP_TUNNEL_CSUM;
-
- if (!(h->flags & HNAE3_SUPPORT_VF))
- netdev->hw_features |=
- NETIF_F_HW_VLAN_CTAG_FILTER | NETIF_F_HW_VLAN_CTAG_RX;
}
static int hns3_alloc_buffer(struct hns3_enet_ring *ring,
@@ -1971,106 +1954,6 @@
writel_relaxed(i, ring->tqp->io_base + HNS3_RING_RX_RING_HEAD_REG);
}
-/* hns3_nic_get_headlen - determine size of header for LRO/GRO
- * @data: pointer to the start of the headers
- * @max: total length of section to find headers in
- *
- * This function is meant to determine the length of headers that will
- * be recognized by hardware for LRO, GRO, and RSC offloads. The main
- * motivation of doing this is to only perform one pull for IPv4 TCP
- * packets so that we can do basic things like calculating the gso_size
- * based on the average data per packet.
- */
-static unsigned int hns3_nic_get_headlen(unsigned char *data, u32 flag,
- unsigned int max_size)
-{
- unsigned char *network;
- u8 hlen;
-
- /* This should never happen, but better safe than sorry */
- if (max_size < ETH_HLEN)
- return max_size;
-
- /* Initialize network frame pointer */
- network = data;
-
- /* Set first protocol and move network header forward */
- network += ETH_HLEN;
-
- /* Handle any vlan tag if present */
- if (hnae_get_field(flag, HNS3_RXD_VLAN_M, HNS3_RXD_VLAN_S)
- == HNS3_RX_FLAG_VLAN_PRESENT) {
- if ((typeof(max_size))(network - data) > (max_size - VLAN_HLEN))
- return max_size;
-
- network += VLAN_HLEN;
- }
-
- /* Handle L3 protocols */
- if (hnae_get_field(flag, HNS3_RXD_L3ID_M, HNS3_RXD_L3ID_S)
- == HNS3_RX_FLAG_L3ID_IPV4) {
- if ((typeof(max_size))(network - data) >
- (max_size - sizeof(struct iphdr)))
- return max_size;
-
- /* Access ihl as a u8 to avoid unaligned access on ia64 */
- hlen = (network[0] & 0x0F) << 2;
-
- /* Verify hlen meets minimum size requirements */
- if (hlen < sizeof(struct iphdr))
- return network - data;
-
- /* Record next protocol if header is present */
- } else if (hnae_get_field(flag, HNS3_RXD_L3ID_M, HNS3_RXD_L3ID_S)
- == HNS3_RX_FLAG_L3ID_IPV6) {
- if ((typeof(max_size))(network - data) >
- (max_size - sizeof(struct ipv6hdr)))
- return max_size;
-
- /* Record next protocol */
- hlen = sizeof(struct ipv6hdr);
- } else {
- return network - data;
- }
-
- /* Relocate pointer to start of L4 header */
- network += hlen;
-
- /* Finally sort out TCP/UDP */
- if (hnae_get_field(flag, HNS3_RXD_L4ID_M, HNS3_RXD_L4ID_S)
- == HNS3_RX_FLAG_L4ID_TCP) {
- if ((typeof(max_size))(network - data) >
- (max_size - sizeof(struct tcphdr)))
- return max_size;
-
- /* Access doff as a u8 to avoid unaligned access on ia64 */
- hlen = (network[12] & 0xF0) >> 2;
-
- /* Verify hlen meets minimum size requirements */
- if (hlen < sizeof(struct tcphdr))
- return network - data;
-
- network += hlen;
- } else if (hnae_get_field(flag, HNS3_RXD_L4ID_M, HNS3_RXD_L4ID_S)
- == HNS3_RX_FLAG_L4ID_UDP) {
- if ((typeof(max_size))(network - data) >
- (max_size - sizeof(struct udphdr)))
- return max_size;
-
- network += sizeof(struct udphdr);
- }
-
- /* If everything has gone correctly network should be the
- * data section of the packet and will be the end of the header.
- * If not then it probably represents the end of the last recognized
- * header.
- */
- if ((typeof(max_size))(network - data) < max_size)
- return network - data;
- else
- return max_size;
-}
-
static void hns3_nic_reuse_page(struct sk_buff *skb, int i,
struct hns3_enet_ring *ring, int pull_len,
struct hns3_desc_cb *desc_cb)
@@ -2270,8 +2153,8 @@
ring->stats.seg_pkt_cnt++;
u64_stats_update_end(&ring->syncp);
- pull_len = hns3_nic_get_headlen(va, l234info,
- HNS3_RX_HEAD_SIZE);
+ pull_len = eth_get_headlen(va, HNS3_RX_HEAD_SIZE);
+
memcpy(__skb_put(skb, pull_len), va,
ALIGN(pull_len, sizeof(long)));
@@ -3052,13 +2935,13 @@
}
/* Set mac addr if it is configured. or leave it to the AE driver */
-static void hns3_init_mac_addr(struct net_device *netdev)
+static void hns3_init_mac_addr(struct net_device *netdev, bool init)
{
struct hns3_nic_priv *priv = netdev_priv(netdev);
struct hnae3_handle *h = priv->ae_handle;
u8 mac_addr_temp[ETH_ALEN];
- if (h->ae_algo->ops->get_mac_addr) {
+ if (h->ae_algo->ops->get_mac_addr && init) {
h->ae_algo->ops->get_mac_addr(h, mac_addr_temp);
ether_addr_copy(netdev->dev_addr, mac_addr_temp);
}
@@ -3112,7 +2995,7 @@
handle->kinfo.netdev = netdev;
handle->priv = (void *)priv;
- hns3_init_mac_addr(netdev);
+ hns3_init_mac_addr(netdev, true);
hns3_set_default_feature(netdev);
@@ -3298,9 +3181,35 @@
hns3_nic_mc_sync(ndev, ha->addr);
}
-static void hns3_drop_skb_data(struct hns3_enet_ring *ring, struct sk_buff *skb)
+static void hns3_clear_tx_ring(struct hns3_enet_ring *ring)
{
- dev_kfree_skb_any(skb);
+ if (!HNAE3_IS_TX_RING(ring))
+ return;
+
+ while (ring->next_to_clean != ring->next_to_use) {
+ hns3_free_buffer_detach(ring, ring->next_to_clean);
+ ring_ptr_move_fw(ring, next_to_clean);
+ }
+}
+
+static void hns3_clear_rx_ring(struct hns3_enet_ring *ring)
+{
+ if (HNAE3_IS_TX_RING(ring))
+ return;
+
+ while (ring->next_to_use != ring->next_to_clean) {
+ /* When a buffer is not reused, it's memory has been
+ * freed in hns3_handle_rx_bd or will be freed by
+ * stack, so only need to unmap the buffer here.
+ */
+ if (!ring->desc_cb[ring->next_to_use].reuse_flag) {
+ hns3_unmap_buffer(ring,
+ &ring->desc_cb[ring->next_to_use]);
+ ring->desc_cb[ring->next_to_use].dma = 0;
+ }
+
+ ring_ptr_move_fw(ring, next_to_use);
+ }
}
static void hns3_clear_all_ring(struct hnae3_handle *h)
@@ -3314,13 +3223,13 @@
struct hns3_enet_ring *ring;
ring = priv->ring_data[i].ring;
- hns3_clean_tx_ring(ring, ring->desc_num);
+ hns3_clear_tx_ring(ring);
dev_queue = netdev_get_tx_queue(ndev,
priv->ring_data[i].queue_index);
netdev_tx_reset_queue(dev_queue);
ring = priv->ring_data[i + h->kinfo.num_tqps].ring;
- hns3_clean_rx_ring(ring, ring->desc_num, hns3_drop_skb_data);
+ hns3_clear_rx_ring(ring);
}
}
@@ -3359,7 +3268,7 @@
struct hns3_nic_priv *priv = netdev_priv(netdev);
int ret;
- hns3_init_mac_addr(netdev);
+ hns3_init_mac_addr(netdev, false);
hns3_nic_set_rx_mode(netdev);
hns3_recover_hw_addr(netdev);
@@ -3600,6 +3509,8 @@
client.ops = &client_ops;
+ INIT_LIST_HEAD(&client.node);
+
ret = hnae3_register_client(&client);
if (ret)
return ret;
@@ -3627,3 +3538,4 @@
MODULE_AUTHOR("Huawei Tech. Co., Ltd.");
MODULE_LICENSE("GPL");
MODULE_ALIAS("pci:hns-nic");
+MODULE_VERSION(HNS3_MOD_VERSION);
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.h b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.h
index 98cdbd3..5b40f5a 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.h
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.h
@@ -14,6 +14,8 @@
#include "hnae3.h"
+#define HNS3_MOD_VERSION "1.0"
+
extern const char hns3_driver_version[];
enum hns3_nic_state {
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c b/drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c
index eb3c34f..c16bb6c 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c
@@ -74,7 +74,7 @@
u32 ethtool_link_mode;
};
-static int hns3_lp_setup(struct net_device *ndev, enum hnae3_loop loop)
+static int hns3_lp_setup(struct net_device *ndev, enum hnae3_loop loop, bool en)
{
struct hnae3_handle *h = hns3_get_handle(ndev);
int ret;
@@ -85,11 +85,7 @@
switch (loop) {
case HNAE3_MAC_INTER_LOOP_MAC:
- ret = h->ae_algo->ops->set_loopback(h, loop, true);
- break;
- case HNAE3_MAC_LOOP_NONE:
- ret = h->ae_algo->ops->set_loopback(h,
- HNAE3_MAC_INTER_LOOP_MAC, false);
+ ret = h->ae_algo->ops->set_loopback(h, loop, en);
break;
default:
ret = -ENOTSUPP;
@@ -99,10 +95,7 @@
if (ret)
return ret;
- if (loop == HNAE3_MAC_LOOP_NONE)
- h->ae_algo->ops->set_promisc_mode(h, ndev->flags & IFF_PROMISC);
- else
- h->ae_algo->ops->set_promisc_mode(h, 1);
+ h->ae_algo->ops->set_promisc_mode(h, en);
return ret;
}
@@ -122,13 +115,13 @@
return ret;
}
- ret = hns3_lp_setup(ndev, loop_mode);
+ ret = hns3_lp_setup(ndev, loop_mode, true);
usleep_range(10000, 20000);
return ret;
}
-static int hns3_lp_down(struct net_device *ndev)
+static int hns3_lp_down(struct net_device *ndev, enum hnae3_loop loop_mode)
{
struct hnae3_handle *h = hns3_get_handle(ndev);
int ret;
@@ -136,7 +129,7 @@
if (!h->ae_algo->ops->stop)
return -EOPNOTSUPP;
- ret = hns3_lp_setup(ndev, HNAE3_MAC_LOOP_NONE);
+ ret = hns3_lp_setup(ndev, loop_mode, false);
if (ret) {
netdev_err(ndev, "lb_setup return error: %d\n", ret);
return ret;
@@ -332,7 +325,7 @@
data[test_index] = hns3_lp_up(ndev, loop_type);
if (!data[test_index]) {
data[test_index] = hns3_lp_run_test(ndev, loop_type);
- hns3_lp_down(ndev);
+ hns3_lp_down(ndev, loop_type);
}
if (data[test_index])
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_cmd.c b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_cmd.c
index ff13d18..c36d647 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_cmd.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_cmd.c
@@ -31,6 +31,17 @@
return ring->desc_num - used - 1;
}
+static int is_valid_csq_clean_head(struct hclge_cmq_ring *ring, int h)
+{
+ int u = ring->next_to_use;
+ int c = ring->next_to_clean;
+
+ if (unlikely(h >= ring->desc_num))
+ return 0;
+
+ return u > c ? (h > c && h <= u) : (h > c || h <= u);
+}
+
static int hclge_alloc_cmd_desc(struct hclge_cmq_ring *ring)
{
int size = ring->desc_num * sizeof(struct hclge_desc);
@@ -141,6 +152,7 @@
static int hclge_cmd_csq_clean(struct hclge_hw *hw)
{
+ struct hclge_dev *hdev = (struct hclge_dev *)hw->back;
struct hclge_cmq_ring *csq = &hw->cmq.csq;
u16 ntc = csq->next_to_clean;
struct hclge_desc *desc;
@@ -149,6 +161,13 @@
desc = &csq->desc[ntc];
head = hclge_read_dev(hw, HCLGE_NIC_CSQ_HEAD_REG);
+ rmb(); /* Make sure head is ready before touch any data */
+
+ if (!is_valid_csq_clean_head(csq, head)) {
+ dev_warn(&hdev->pdev->dev, "wrong head (%d, %d-%d)\n", head,
+ csq->next_to_use, csq->next_to_clean);
+ return 0;
+ }
while (head != ntc) {
memset(desc, 0, sizeof(*desc));
@@ -171,7 +190,11 @@
static bool hclge_is_special_opcode(u16 opcode)
{
- u16 spec_opcode[3] = {0x0030, 0x0031, 0x0032};
+ /* these commands have several descriptors,
+ * and use the first one to save opcode and return value
+ */
+ u16 spec_opcode[3] = {HCLGE_OPC_STATS_64_BIT,
+ HCLGE_OPC_STATS_32_BIT, HCLGE_OPC_STATS_MAC};
int i;
for (i = 0; i < ARRAY_SIZE(spec_opcode); i++) {
@@ -362,9 +385,9 @@
static void hclge_destroy_queue(struct hclge_cmq_ring *ring)
{
- spin_lock_bh(&ring->lock);
+ spin_lock(&ring->lock);
hclge_free_cmd_desc(ring);
- spin_unlock_bh(&ring->lock);
+ spin_unlock(&ring->lock);
}
void hclge_destroy_cmd_queue(struct hclge_hw *hw)
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
index 2066dd7..2f0bbb6 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
@@ -304,8 +304,6 @@
HCLGE_MAC_STATS_FIELD_OFF(mac_tx_2048_4095_oct_pkt_num)},
{"mac_tx_4096_8191_oct_pkt_num",
HCLGE_MAC_STATS_FIELD_OFF(mac_tx_4096_8191_oct_pkt_num)},
- {"mac_tx_8192_12287_oct_pkt_num",
- HCLGE_MAC_STATS_FIELD_OFF(mac_tx_8192_12287_oct_pkt_num)},
{"mac_tx_8192_9216_oct_pkt_num",
HCLGE_MAC_STATS_FIELD_OFF(mac_tx_8192_9216_oct_pkt_num)},
{"mac_tx_9217_12287_oct_pkt_num",
@@ -356,8 +354,6 @@
HCLGE_MAC_STATS_FIELD_OFF(mac_rx_2048_4095_oct_pkt_num)},
{"mac_rx_4096_8191_oct_pkt_num",
HCLGE_MAC_STATS_FIELD_OFF(mac_rx_4096_8191_oct_pkt_num)},
- {"mac_rx_8192_12287_oct_pkt_num",
- HCLGE_MAC_STATS_FIELD_OFF(mac_rx_8192_12287_oct_pkt_num)},
{"mac_rx_8192_9216_oct_pkt_num",
HCLGE_MAC_STATS_FIELD_OFF(mac_rx_8192_9216_oct_pkt_num)},
{"mac_rx_9217_12287_oct_pkt_num",
@@ -1459,8 +1455,11 @@
/* We need to alloc a vport for main NIC of PF */
num_vport = hdev->num_vmdq_vport + hdev->num_req_vfs + 1;
- if (hdev->num_tqps < num_vport)
- num_vport = hdev->num_tqps;
+ if (hdev->num_tqps < num_vport) {
+ dev_err(&hdev->pdev->dev, "tqps(%d) is less than vports(%d)",
+ hdev->num_tqps, num_vport);
+ return -EINVAL;
+ }
/* Alloc the same number of TQPs for every vport */
tqp_per_vport = hdev->num_tqps / num_vport;
@@ -1474,21 +1473,8 @@
hdev->vport = vport;
hdev->num_alloc_vport = num_vport;
-#ifdef CONFIG_PCI_IOV
- /* Enable SRIOV */
- if (hdev->num_req_vfs) {
- dev_info(&pdev->dev, "active VFs(%d) found, enabling SRIOV\n",
- hdev->num_req_vfs);
- ret = pci_enable_sriov(hdev->pdev, hdev->num_req_vfs);
- if (ret) {
- hdev->num_alloc_vfs = 0;
- dev_err(&pdev->dev, "SRIOV enable failed %d\n",
- ret);
- return ret;
- }
- }
- hdev->num_alloc_vfs = hdev->num_req_vfs;
-#endif
+ if (IS_ENABLED(CONFIG_PCI_IOV))
+ hdev->num_alloc_vfs = hdev->num_req_vfs;
for (i = 0; i < num_vport; i++) {
vport->back = hdev;
@@ -2947,21 +2933,6 @@
hclge_service_complete(hdev);
}
-static void hclge_disable_sriov(struct hclge_dev *hdev)
-{
- /* If our VFs are assigned we cannot shut down SR-IOV
- * without causing issues, so just leave the hardware
- * available but disabled
- */
- if (pci_vfs_assigned(hdev->pdev)) {
- dev_warn(&hdev->pdev->dev,
- "disabling driver while VFs are assigned\n");
- return;
- }
-
- pci_disable_sriov(hdev->pdev);
-}
-
struct hclge_vport *hclge_get_vport(struct hnae3_handle *handle)
{
/* VF handle has no client */
@@ -3683,48 +3654,50 @@
"mac enable fail, ret =%d.\n", ret);
}
-static int hclge_set_loopback(struct hnae3_handle *handle,
- enum hnae3_loop loop_mode, bool en)
+static int hclge_set_mac_loopback(struct hclge_dev *hdev, bool en)
{
- struct hclge_vport *vport = hclge_get_vport(handle);
struct hclge_config_mac_mode_cmd *req;
- struct hclge_dev *hdev = vport->back;
struct hclge_desc desc;
u32 loop_en;
int ret;
+ req = (struct hclge_config_mac_mode_cmd *)&desc.data[0];
+ /* 1 Read out the MAC mode config at first */
+ hclge_cmd_setup_basic_desc(&desc, HCLGE_OPC_CONFIG_MAC_MODE, true);
+ ret = hclge_cmd_send(&hdev->hw, &desc, 1);
+ if (ret) {
+ dev_err(&hdev->pdev->dev,
+ "mac loopback get fail, ret =%d.\n", ret);
+ return ret;
+ }
+
+ /* 2 Then setup the loopback flag */
+ loop_en = le32_to_cpu(req->txrx_pad_fcs_loop_en);
+ hnae_set_bit(loop_en, HCLGE_MAC_APP_LP_B, en ? 1 : 0);
+
+ req->txrx_pad_fcs_loop_en = cpu_to_le32(loop_en);
+
+ /* 3 Config mac work mode with loopback flag
+ * and its original configure parameters
+ */
+ hclge_cmd_reuse_desc(&desc, false);
+ ret = hclge_cmd_send(&hdev->hw, &desc, 1);
+ if (ret)
+ dev_err(&hdev->pdev->dev,
+ "mac loopback set fail, ret =%d.\n", ret);
+ return ret;
+}
+
+static int hclge_set_loopback(struct hnae3_handle *handle,
+ enum hnae3_loop loop_mode, bool en)
+{
+ struct hclge_vport *vport = hclge_get_vport(handle);
+ struct hclge_dev *hdev = vport->back;
+ int ret;
+
switch (loop_mode) {
case HNAE3_MAC_INTER_LOOP_MAC:
- req = (struct hclge_config_mac_mode_cmd *)&desc.data[0];
- /* 1 Read out the MAC mode config at first */
- hclge_cmd_setup_basic_desc(&desc,
- HCLGE_OPC_CONFIG_MAC_MODE,
- true);
- ret = hclge_cmd_send(&hdev->hw, &desc, 1);
- if (ret) {
- dev_err(&hdev->pdev->dev,
- "mac loopback get fail, ret =%d.\n",
- ret);
- return ret;
- }
-
- /* 2 Then setup the loopback flag */
- loop_en = le32_to_cpu(req->txrx_pad_fcs_loop_en);
- if (en)
- hnae_set_bit(loop_en, HCLGE_MAC_APP_LP_B, 1);
- else
- hnae_set_bit(loop_en, HCLGE_MAC_APP_LP_B, 0);
-
- req->txrx_pad_fcs_loop_en = cpu_to_le32(loop_en);
-
- /* 3 Config mac work mode with loopback flag
- * and its original configure parameters
- */
- hclge_cmd_reuse_desc(&desc, false);
- ret = hclge_cmd_send(&hdev->hw, &desc, 1);
- if (ret)
- dev_err(&hdev->pdev->dev,
- "mac loopback set fail, ret =%d.\n", ret);
+ ret = hclge_set_mac_loopback(hdev, en);
break;
default:
ret = -ENOTSUPP;
@@ -3783,6 +3756,7 @@
hclge_cfg_mac_mode(hdev, true);
clear_bit(HCLGE_STATE_DOWN, &hdev->state);
mod_timer(&hdev->service_timer, jiffies + HZ);
+ hdev->hw.mac.link = 0;
/* reset tqp stats */
hclge_reset_tqp_stats(handle);
@@ -3819,6 +3793,8 @@
/* reset tqp stats */
hclge_reset_tqp_stats(handle);
+ del_timer_sync(&hdev->service_timer);
+ cancel_work_sync(&hdev->service_task);
hclge_update_link_status(hdev);
}
@@ -4540,8 +4516,9 @@
hclge_set_vlan_filter_ctrl(hdev, HCLGE_FILTER_TYPE_VF, enable);
}
-int hclge_set_vf_vlan_common(struct hclge_dev *hdev, int vfid,
- bool is_kill, u16 vlan, u8 qos, __be16 proto)
+static int hclge_set_vf_vlan_common(struct hclge_dev *hdev, int vfid,
+ bool is_kill, u16 vlan, u8 qos,
+ __be16 proto)
{
#define HCLGE_MAX_VF_BYTES 16
struct hclge_vlan_filter_vf_cfg_cmd *req0;
@@ -4599,12 +4576,9 @@
return -EIO;
}
-static int hclge_set_port_vlan_filter(struct hnae3_handle *handle,
- __be16 proto, u16 vlan_id,
- bool is_kill)
+static int hclge_set_port_vlan_filter(struct hclge_dev *hdev, __be16 proto,
+ u16 vlan_id, bool is_kill)
{
- struct hclge_vport *vport = hclge_get_vport(handle);
- struct hclge_dev *hdev = vport->back;
struct hclge_vlan_filter_pf_cfg_cmd *req;
struct hclge_desc desc;
u8 vlan_offset_byte_val;
@@ -4624,22 +4598,66 @@
req->vlan_offset_bitmap[vlan_offset_byte] = vlan_offset_byte_val;
ret = hclge_cmd_send(&hdev->hw, &desc, 1);
+ if (ret)
+ dev_err(&hdev->pdev->dev,
+ "port vlan command, send fail, ret =%d.\n", ret);
+ return ret;
+}
+
+static int hclge_set_vlan_filter_hw(struct hclge_dev *hdev, __be16 proto,
+ u16 vport_id, u16 vlan_id, u8 qos,
+ bool is_kill)
+{
+ u16 vport_idx, vport_num = 0;
+ int ret;
+
+ ret = hclge_set_vf_vlan_common(hdev, vport_id, is_kill, vlan_id,
+ 0, proto);
if (ret) {
dev_err(&hdev->pdev->dev,
- "port vlan command, send fail, ret =%d.\n",
- ret);
+ "Set %d vport vlan filter config fail, ret =%d.\n",
+ vport_id, ret);
return ret;
}
- ret = hclge_set_vf_vlan_common(hdev, 0, is_kill, vlan_id, 0, proto);
- if (ret) {
+ /* vlan 0 may be added twice when 8021q module is enabled */
+ if (!is_kill && !vlan_id &&
+ test_bit(vport_id, hdev->vlan_table[vlan_id]))
+ return 0;
+
+ if (!is_kill && test_and_set_bit(vport_id, hdev->vlan_table[vlan_id])) {
dev_err(&hdev->pdev->dev,
- "Set pf vlan filter config fail, ret =%d.\n",
- ret);
- return -EIO;
+ "Add port vlan failed, vport %d is already in vlan %d\n",
+ vport_id, vlan_id);
+ return -EINVAL;
}
- return 0;
+ if (is_kill &&
+ !test_and_clear_bit(vport_id, hdev->vlan_table[vlan_id])) {
+ dev_err(&hdev->pdev->dev,
+ "Delete port vlan failed, vport %d is not in vlan %d\n",
+ vport_id, vlan_id);
+ return -EINVAL;
+ }
+
+ for_each_set_bit(vport_idx, hdev->vlan_table[vlan_id], VLAN_N_VID)
+ vport_num++;
+
+ if ((is_kill && vport_num == 0) || (!is_kill && vport_num == 1))
+ ret = hclge_set_port_vlan_filter(hdev, proto, vlan_id,
+ is_kill);
+
+ return ret;
+}
+
+int hclge_set_vlan_filter(struct hnae3_handle *handle, __be16 proto,
+ u16 vlan_id, bool is_kill)
+{
+ struct hclge_vport *vport = hclge_get_vport(handle);
+ struct hclge_dev *hdev = vport->back;
+
+ return hclge_set_vlan_filter_hw(hdev, proto, vport->vport_id, vlan_id,
+ 0, is_kill);
}
static int hclge_set_vf_vlan_filter(struct hnae3_handle *handle, int vfid,
@@ -4653,7 +4671,7 @@
if (proto != htons(ETH_P_8021Q))
return -EPROTONOSUPPORT;
- return hclge_set_vf_vlan_common(hdev, vfid, false, vlan, qos, proto);
+ return hclge_set_vlan_filter_hw(hdev, proto, vfid, vlan, qos, false);
}
static int hclge_set_vlan_tx_offload_cfg(struct hclge_vport *vport)
@@ -4818,10 +4836,10 @@
}
handle = &hdev->vport[0].nic;
- return hclge_set_port_vlan_filter(handle, htons(ETH_P_8021Q), 0, false);
+ return hclge_set_vlan_filter(handle, htons(ETH_P_8021Q), 0, false);
}
-static int hclge_en_hw_strip_rxvtag(struct hnae3_handle *handle, bool enable)
+int hclge_en_hw_strip_rxvtag(struct hnae3_handle *handle, bool enable)
{
struct hclge_vport *vport = hclge_get_vport(handle);
@@ -5166,12 +5184,6 @@
struct phy_device *phydev = hdev->hw.mac.phydev;
u32 fc_autoneg;
- /* Only support flow control negotiation for netdev with
- * phy attached for now.
- */
- if (!phydev)
- return -EOPNOTSUPP;
-
fc_autoneg = hclge_get_autoneg(handle);
if (auto_neg != fc_autoneg) {
dev_info(&hdev->pdev->dev,
@@ -5190,6 +5202,12 @@
if (!fc_autoneg)
return hclge_cfg_pauseparam(hdev, rx_en, tx_en);
+ /* Only support flow control negotiation for netdev with
+ * phy attached for now.
+ */
+ if (!phydev)
+ return -EOPNOTSUPP;
+
return phy_start_aneg(phydev);
}
@@ -5282,7 +5300,7 @@
vport->nic.client = client;
ret = client->ops->init_instance(&vport->nic);
if (ret)
- goto err;
+ return ret;
if (hdev->roce_client &&
hnae3_dev_roce_supported(hdev)) {
@@ -5290,11 +5308,11 @@
ret = hclge_init_roce_base_info(vport);
if (ret)
- goto err;
+ return ret;
ret = rc->ops->init_instance(&vport->roce);
if (ret)
- goto err;
+ return ret;
}
break;
@@ -5304,7 +5322,7 @@
ret = client->ops->init_instance(&vport->nic);
if (ret)
- goto err;
+ return ret;
break;
case HNAE3_CLIENT_ROCE:
@@ -5316,18 +5334,16 @@
if (hdev->roce_client && hdev->nic_client) {
ret = hclge_init_roce_base_info(vport);
if (ret)
- goto err;
+ return ret;
ret = client->ops->init_instance(&vport->roce);
if (ret)
- goto err;
+ return ret;
}
}
}
return 0;
-err:
- return ret;
}
static void hclge_uninit_client_instance(struct hnae3_client *client,
@@ -5364,7 +5380,7 @@
ret = pci_enable_device(pdev);
if (ret) {
dev_err(&pdev->dev, "failed to enable PCI device\n");
- goto err_no_drvdata;
+ return ret;
}
ret = dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(64));
@@ -5402,8 +5418,6 @@
pci_release_regions(pdev);
err_disable_device:
pci_disable_device(pdev);
-err_no_drvdata:
- pci_set_drvdata(pdev, NULL);
return ret;
}
@@ -5412,6 +5426,7 @@
{
struct pci_dev *pdev = hdev->pdev;
+ pcim_iounmap(pdev, hdev->hw.io_base);
pci_free_irq_vectors(pdev);
pci_clear_master(pdev);
pci_release_mem_regions(pdev);
@@ -5427,7 +5442,7 @@
hdev = devm_kzalloc(&pdev->dev, sizeof(*hdev), GFP_KERNEL);
if (!hdev) {
ret = -ENOMEM;
- goto err_hclge_dev;
+ goto out;
}
hdev->pdev = pdev;
@@ -5440,38 +5455,38 @@
ret = hclge_pci_init(hdev);
if (ret) {
dev_err(&pdev->dev, "PCI init failed\n");
- goto err_pci_init;
+ goto out;
}
/* Firmware command queue initialize */
ret = hclge_cmd_queue_init(hdev);
if (ret) {
dev_err(&pdev->dev, "Cmd queue init failed, ret = %d.\n", ret);
- return ret;
+ goto err_pci_uninit;
}
/* Firmware command initialize */
ret = hclge_cmd_init(hdev);
if (ret)
- goto err_cmd_init;
+ goto err_cmd_uninit;
ret = hclge_get_cap(hdev);
if (ret) {
dev_err(&pdev->dev, "get hw capability error, ret = %d.\n",
ret);
- return ret;
+ goto err_cmd_uninit;
}
ret = hclge_configure(hdev);
if (ret) {
dev_err(&pdev->dev, "Configure dev error, ret = %d.\n", ret);
- return ret;
+ goto err_cmd_uninit;
}
ret = hclge_init_msi(hdev);
if (ret) {
dev_err(&pdev->dev, "Init MSI/MSI-X error, ret = %d.\n", ret);
- return ret;
+ goto err_cmd_uninit;
}
ret = hclge_misc_irq_init(hdev);
@@ -5479,69 +5494,71 @@
dev_err(&pdev->dev,
"Misc IRQ(vector0) init error, ret = %d.\n",
ret);
- return ret;
+ goto err_msi_uninit;
}
ret = hclge_alloc_tqps(hdev);
if (ret) {
dev_err(&pdev->dev, "Allocate TQPs error, ret = %d.\n", ret);
- return ret;
+ goto err_msi_irq_uninit;
}
ret = hclge_alloc_vport(hdev);
if (ret) {
dev_err(&pdev->dev, "Allocate vport error, ret = %d.\n", ret);
- return ret;
+ goto err_msi_irq_uninit;
}
ret = hclge_map_tqp(hdev);
if (ret) {
dev_err(&pdev->dev, "Map tqp error, ret = %d.\n", ret);
- return ret;
+ goto err_msi_irq_uninit;
}
- ret = hclge_mac_mdio_config(hdev);
- if (ret) {
- dev_warn(&hdev->pdev->dev,
- "mdio config fail ret=%d\n", ret);
- return ret;
+ if (hdev->hw.mac.media_type == HNAE3_MEDIA_TYPE_COPPER) {
+ ret = hclge_mac_mdio_config(hdev);
+ if (ret) {
+ dev_err(&hdev->pdev->dev,
+ "mdio config fail ret=%d\n", ret);
+ goto err_msi_irq_uninit;
+ }
}
ret = hclge_mac_init(hdev);
if (ret) {
dev_err(&pdev->dev, "Mac init error, ret = %d\n", ret);
- return ret;
+ goto err_mdiobus_unreg;
}
ret = hclge_config_tso(hdev, HCLGE_TSO_MSS_MIN, HCLGE_TSO_MSS_MAX);
if (ret) {
dev_err(&pdev->dev, "Enable tso fail, ret =%d\n", ret);
- return ret;
+ goto err_mdiobus_unreg;
}
ret = hclge_init_vlan_config(hdev);
if (ret) {
dev_err(&pdev->dev, "VLAN init fail, ret =%d\n", ret);
- return ret;
+ goto err_mdiobus_unreg;
}
ret = hclge_tm_schd_init(hdev);
if (ret) {
dev_err(&pdev->dev, "tm schd init fail, ret =%d\n", ret);
- return ret;
+ goto err_mdiobus_unreg;
}
hclge_rss_init_cfg(hdev);
ret = hclge_rss_init_hw(hdev);
if (ret) {
dev_err(&pdev->dev, "Rss init fail, ret =%d\n", ret);
- return ret;
+ goto err_mdiobus_unreg;
}
ret = init_mgr_tbl(hdev);
if (ret) {
dev_err(&pdev->dev, "manager table init fail, ret =%d\n", ret);
- return ret;
+ goto err_mdiobus_unreg;
}
hclge_dcb_ops_set(hdev);
@@ -5564,11 +5581,21 @@
pr_info("%s driver initialization finished.\n", HCLGE_DRIVER_NAME);
return 0;
-err_cmd_init:
+err_mdiobus_unreg:
+ if (hdev->hw.mac.phydev)
+ mdiobus_unregister(hdev->hw.mac.mdio_bus);
+err_msi_irq_uninit:
+ hclge_misc_irq_uninit(hdev);
+err_msi_uninit:
+ pci_free_irq_vectors(pdev);
+err_cmd_uninit:
+ hclge_destroy_cmd_queue(&hdev->hw);
+err_pci_uninit:
+ pcim_iounmap(pdev, hdev->hw.io_base);
+ pci_clear_master(pdev);
pci_release_regions(pdev);
-err_pci_init:
- pci_set_drvdata(pdev, NULL);
-err_hclge_dev:
+ pci_disable_device(pdev);
+out:
return ret;
}
@@ -5586,6 +5613,7 @@
set_bit(HCLGE_STATE_DOWN, &hdev->state);
hclge_stats_clear(hdev);
+ memset(hdev->vlan_table, 0, sizeof(hdev->vlan_table));
ret = hclge_cmd_init(hdev);
if (ret) {
@@ -5658,9 +5686,6 @@
set_bit(HCLGE_STATE_DOWN, &hdev->state);
- if (IS_ENABLED(CONFIG_PCI_IOV))
- hclge_disable_sriov(hdev);
-
if (hdev->service_timer.function)
del_timer_sync(&hdev->service_timer);
if (hdev->service_task.func)
@@ -6203,7 +6228,7 @@
.get_fw_version = hclge_get_fw_version,
.get_mdix_mode = hclge_get_mdix_mode,
.enable_vlan_filter = hclge_enable_vlan_filter,
- .set_vlan_filter = hclge_set_port_vlan_filter,
+ .set_vlan_filter = hclge_set_vlan_filter,
.set_vf_vlan_filter = hclge_set_vf_vlan_filter,
.enable_hw_strip_rxvtag = hclge_en_hw_strip_rxvtag,
.reset_event = hclge_reset_event,
@@ -6228,7 +6253,9 @@
{
pr_info("%s is initializing\n", HCLGE_NAME);
- return hnae3_register_ae_algo(&ae_algo);
+ hnae3_register_ae_algo(&ae_algo);
+
+ return 0;
}
static void hclge_exit(void)
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.h b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.h
index 0f4157e..93177d9 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.h
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.h
@@ -12,10 +12,12 @@
#include <linux/fs.h>
#include <linux/types.h>
#include <linux/phy.h>
+#include <linux/if_vlan.h>
+
#include "hclge_cmd.h"
#include "hnae3.h"
-#define HCLGE_MOD_VERSION "v1.0"
+#define HCLGE_MOD_VERSION "1.0"
#define HCLGE_DRIVER_NAME "hclge"
#define HCLGE_INVALID_VPORT 0xffff
@@ -406,9 +408,9 @@
u64 mac_tx_1519_2047_oct_pkt_num;
u64 mac_tx_2048_4095_oct_pkt_num;
u64 mac_tx_4096_8191_oct_pkt_num;
- u64 mac_tx_8192_12287_oct_pkt_num; /* valid for GE MAC only */
- u64 mac_tx_8192_9216_oct_pkt_num; /* valid for LGE & CGE MAC only */
- u64 mac_tx_9217_12287_oct_pkt_num; /* valid for LGE & CGE MAC */
+ u64 rsv0;
+ u64 mac_tx_8192_9216_oct_pkt_num;
+ u64 mac_tx_9217_12287_oct_pkt_num;
u64 mac_tx_12288_16383_oct_pkt_num;
u64 mac_tx_1519_max_good_oct_pkt_num;
u64 mac_tx_1519_max_bad_oct_pkt_num;
@@ -433,9 +435,9 @@
u64 mac_rx_1519_2047_oct_pkt_num;
u64 mac_rx_2048_4095_oct_pkt_num;
u64 mac_rx_4096_8191_oct_pkt_num;
- u64 mac_rx_8192_12287_oct_pkt_num;/* valid for GE MAC only */
- u64 mac_rx_8192_9216_oct_pkt_num; /* valid for LGE & CGE MAC only */
- u64 mac_rx_9217_12287_oct_pkt_num; /* valid for LGE & CGE MAC only */
+ u64 rsv1;
+ u64 mac_rx_8192_9216_oct_pkt_num;
+ u64 mac_rx_9217_12287_oct_pkt_num;
u64 mac_rx_12288_16383_oct_pkt_num;
u64 mac_rx_1519_max_good_oct_pkt_num;
u64 mac_rx_1519_max_bad_oct_pkt_num;
@@ -471,6 +473,7 @@
u16 tx_in_vlan_type;
};
+#define HCLGE_VPORT_NUM 256
struct hclge_dev {
struct pci_dev *pdev;
struct hnae3_ae_dev *ae_dev;
@@ -562,6 +565,7 @@
u64 rx_pkts_for_led;
u64 tx_pkts_for_led;
+ unsigned long vlan_table[VLAN_N_VID][BITS_TO_LONGS(HCLGE_VPORT_NUM)];
};
/* VPort level vlan tag configuration for TX direction */
@@ -646,8 +650,9 @@
}
int hclge_cfg_mac_speed_dup(struct hclge_dev *hdev, int speed, u8 duplex);
-int hclge_set_vf_vlan_common(struct hclge_dev *vport, int vfid,
- bool is_kill, u16 vlan, u8 qos, __be16 proto);
+int hclge_set_vlan_filter(struct hnae3_handle *handle, __be16 proto,
+ u16 vlan_id, bool is_kill);
+int hclge_en_hw_strip_rxvtag(struct hnae3_handle *handle, bool enable);
int hclge_buffer_alloc(struct hclge_dev *hdev);
int hclge_rss_init_hw(struct hclge_dev *hdev);
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_mbx.c b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_mbx.c
index a6f7ffa..b6ae26b 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_mbx.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_mbx.c
@@ -264,19 +264,23 @@
struct hclge_mbx_vf_to_pf_cmd *mbx_req,
bool gen_resp)
{
- struct hclge_dev *hdev = vport->back;
int status = 0;
if (mbx_req->msg[1] == HCLGE_MBX_VLAN_FILTER) {
+ struct hnae3_handle *handle = &vport->nic;
u16 vlan, proto;
bool is_kill;
is_kill = !!mbx_req->msg[2];
memcpy(&vlan, &mbx_req->msg[3], sizeof(vlan));
memcpy(&proto, &mbx_req->msg[5], sizeof(proto));
- status = hclge_set_vf_vlan_common(hdev, vport->vport_id,
- is_kill, vlan, 0,
- cpu_to_be16(proto));
+ status = hclge_set_vlan_filter(handle, cpu_to_be16(proto),
+ vlan, is_kill);
+ } else if (mbx_req->msg[1] == HCLGE_MBX_VLAN_RX_OFF_CFG) {
+ struct hnae3_handle *handle = &vport->nic;
+ bool en = mbx_req->msg[2] ? true : false;
+
+ status = hclge_en_hw_strip_rxvtag(handle, en);
}
if (gen_resp)
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_mdio.c b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_mdio.c
index 682c2d6..9f7932e42 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_mdio.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_mdio.c
@@ -140,8 +140,11 @@
struct mii_bus *mdio_bus;
int ret;
- if (hdev->hw.mac.phy_addr >= PHY_MAX_ADDR)
- return 0;
+ if (hdev->hw.mac.phy_addr >= PHY_MAX_ADDR) {
+ dev_err(&hdev->pdev->dev, "phy_addr(%d) is too large.\n",
+ hdev->hw.mac.phy_addr);
+ return -EINVAL;
+ }
mdio_bus = devm_mdiobus_alloc(&hdev->pdev->dev);
if (!mdio_bus)
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_tm.c b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_tm.c
index 885f25c..262c125 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_tm.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_tm.c
@@ -134,11 +134,8 @@
}
ret = hclge_cmd_send(&hdev->hw, desc, HCLGE_TM_PFC_PKT_GET_CMD_NUM);
- if (ret) {
- dev_err(&hdev->pdev->dev,
- "Get pfc pause stats fail, ret = %d.\n", ret);
+ if (ret)
return ret;
- }
for (i = 0; i < HCLGE_TM_PFC_PKT_GET_CMD_NUM; i++) {
struct hclge_pfc_stats_cmd *pfc_stats =
@@ -503,7 +500,8 @@
return hclge_cmd_send(&hdev->hw, &desc, 1);
}
-static int hclge_tm_qs_bp_cfg(struct hclge_dev *hdev, u8 tc)
+static int hclge_tm_qs_bp_cfg(struct hclge_dev *hdev, u8 tc, u8 grp_id,
+ u32 bit_map)
{
struct hclge_bp_to_qs_map_cmd *bp_to_qs_map_cmd;
struct hclge_desc desc;
@@ -514,9 +512,8 @@
bp_to_qs_map_cmd = (struct hclge_bp_to_qs_map_cmd *)desc.data;
bp_to_qs_map_cmd->tc_id = tc;
-
- /* Qset and tc is one by one mapping */
- bp_to_qs_map_cmd->qs_bit_map = cpu_to_le32(1 << tc);
+ bp_to_qs_map_cmd->qs_group_id = grp_id;
+ bp_to_qs_map_cmd->qs_bit_map = cpu_to_le32(bit_map);
return hclge_cmd_send(&hdev->hw, &desc, 1);
}
@@ -1170,6 +1167,41 @@
hdev->tm_info.hw_pfc_map);
}
+/* Each Tc has a 1024 queue sets to backpress, it divides to
+ * 32 group, each group contains 32 queue sets, which can be
+ * represented by u32 bitmap.
+ */
+static int hclge_bp_setup_hw(struct hclge_dev *hdev, u8 tc)
+{
+ struct hclge_vport *vport = hdev->vport;
+ u32 i, k, qs_bitmap;
+ int ret;
+
+ for (i = 0; i < HCLGE_BP_GRP_NUM; i++) {
+ qs_bitmap = 0;
+
+ for (k = 0; k < hdev->num_alloc_vport; k++) {
+ u16 qs_id = vport->qs_offset + tc;
+ u8 grp, sub_grp;
+
+ grp = hnae_get_field(qs_id, HCLGE_BP_GRP_ID_M,
+ HCLGE_BP_GRP_ID_S);
+ sub_grp = hnae_get_field(qs_id, HCLGE_BP_SUB_GRP_ID_M,
+ HCLGE_BP_SUB_GRP_ID_S);
+ if (i == grp)
+ qs_bitmap |= (1 << sub_grp);
+
+ vport++;
+ }
+
+ ret = hclge_tm_qs_bp_cfg(hdev, tc, i, qs_bitmap);
+ if (ret)
+ return ret;
+ }
+
+ return 0;
+}
+
static int hclge_mac_pause_setup_hw(struct hclge_dev *hdev)
{
bool tx_en, rx_en;
@@ -1221,7 +1253,7 @@
dev_warn(&hdev->pdev->dev, "set pfc pause failed:%d\n", ret);
for (i = 0; i < hdev->tm_info.num_tc; i++) {
- ret = hclge_tm_qs_bp_cfg(hdev, i);
+ ret = hclge_bp_setup_hw(hdev, i);
if (ret)
return ret;
}
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_tm.h b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_tm.h
index 2dbe177..c2b6e8a 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_tm.h
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_tm.h
@@ -89,6 +89,11 @@
__le32 pg_shapping_para;
};
+#define HCLGE_BP_GRP_NUM 32
+#define HCLGE_BP_SUB_GRP_ID_S 0
+#define HCLGE_BP_SUB_GRP_ID_M GENMASK(4, 0)
+#define HCLGE_BP_GRP_ID_S 5
+#define HCLGE_BP_GRP_ID_M GENMASK(9, 5)
struct hclge_bp_to_qs_map_cmd {
u8 tc_id;
u8 rsvd[2];
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c b/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c
index 2b84264..2b0e329 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c
@@ -830,6 +830,17 @@
HCLGEVF_VLAN_MBX_MSG_LEN, false, NULL, 0);
}
+static int hclgevf_en_hw_strip_rxvtag(struct hnae3_handle *handle, bool enable)
+{
+ struct hclgevf_dev *hdev = hclgevf_ae_get_hdev(handle);
+ u8 msg_data;
+
+ msg_data = enable ? 1 : 0;
+ return hclgevf_send_mbx_msg(hdev, HCLGE_MBX_SET_VLAN,
+ HCLGE_MBX_VLAN_RX_OFF_CFG, &msg_data,
+ 1, false, NULL, 0);
+}
+
static void hclgevf_reset_tqp(struct hnae3_handle *handle, u16 queue_id)
{
struct hclgevf_dev *hdev = hclgevf_ae_get_hdev(handle);
@@ -1552,7 +1563,7 @@
ret = pci_enable_device(pdev);
if (ret) {
dev_err(&pdev->dev, "failed to enable PCI device\n");
- goto err_no_drvdata;
+ return ret;
}
ret = dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(64));
@@ -1584,8 +1595,7 @@
pci_release_regions(pdev);
err_disable_device:
pci_disable_device(pdev);
-err_no_drvdata:
- pci_set_drvdata(pdev, NULL);
+
return ret;
}
@@ -1597,7 +1607,6 @@
pci_clear_master(pdev);
pci_release_regions(pdev);
pci_disable_device(pdev);
- pci_set_drvdata(pdev, NULL);
}
static int hclgevf_init_hdev(struct hclgevf_dev *hdev)
@@ -1625,6 +1634,10 @@
hclgevf_state_init(hdev);
+ ret = hclgevf_cmd_init(hdev);
+ if (ret)
+ goto err_cmd_init;
+
ret = hclgevf_misc_irq_init(hdev);
if (ret) {
dev_err(&pdev->dev, "failed(%d) to init Misc IRQ(vector0)\n",
@@ -1632,10 +1645,6 @@
goto err_misc_irq_init;
}
- ret = hclgevf_cmd_init(hdev);
- if (ret)
- goto err_cmd_init;
-
ret = hclgevf_configure(hdev);
if (ret) {
dev_err(&pdev->dev, "failed(%d) to fetch configuration\n", ret);
@@ -1683,10 +1692,10 @@
return 0;
err_config:
- hclgevf_cmd_uninit(hdev);
-err_cmd_init:
hclgevf_misc_irq_uninit(hdev);
err_misc_irq_init:
+ hclgevf_cmd_uninit(hdev);
+err_cmd_init:
hclgevf_state_uninit(hdev);
hclgevf_uninit_msi(hdev);
err_irq_init:
@@ -1696,9 +1705,9 @@
static void hclgevf_uninit_hdev(struct hclgevf_dev *hdev)
{
- hclgevf_cmd_uninit(hdev);
- hclgevf_misc_irq_uninit(hdev);
hclgevf_state_uninit(hdev);
+ hclgevf_misc_irq_uninit(hdev);
+ hclgevf_cmd_uninit(hdev);
hclgevf_uninit_msi(hdev);
hclgevf_pci_uninit(hdev);
}
@@ -1825,6 +1834,7 @@
.get_tc_size = hclgevf_get_tc_size,
.get_fw_version = hclgevf_get_fw_version,
.set_vlan_filter = hclgevf_set_vlan_filter,
+ .enable_hw_strip_rxvtag = hclgevf_en_hw_strip_rxvtag,
.reset_event = hclgevf_reset_event,
.get_channels = hclgevf_get_channels,
.get_tqps_and_rss_info = hclgevf_get_tqps_and_rss_info,
@@ -1842,7 +1852,9 @@
{
pr_info("%s is initializing\n", HCLGEVF_NAME);
- return hnae3_register_ae_algo(&ae_algovf);
+ hnae3_register_ae_algo(&ae_algovf);
+
+ return 0;
}
static void hclgevf_exit(void)
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.h b/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.h
index a477a7c..9763e74 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.h
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.h
@@ -9,7 +9,7 @@
#include "hclgevf_cmd.h"
#include "hnae3.h"
-#define HCLGEVF_MOD_VERSION "v1.0"
+#define HCLGEVF_MOD_VERSION "1.0"
#define HCLGEVF_DRIVER_NAME "hclgevf"
#define HCLGEVF_ROCEE_VECTOR_NUM 0
diff --git a/drivers/net/ethernet/huawei/hinic/Kconfig b/drivers/net/ethernet/huawei/hinic/Kconfig
index 08db249..e4e8b24 100644
--- a/drivers/net/ethernet/huawei/hinic/Kconfig
+++ b/drivers/net/ethernet/huawei/hinic/Kconfig
@@ -4,7 +4,7 @@
config HINIC
tristate "Huawei Intelligent PCIE Network Interface Card"
- depends on (PCI_MSI && X86)
+ depends on (PCI_MSI && (X86 || ARM64))
---help---
This driver supports HiNIC PCIE Ethernet cards.
To compile this driver as part of the kernel, choose Y here.
diff --git a/drivers/net/ethernet/huawei/hinic/hinic_main.c b/drivers/net/ethernet/huawei/hinic/hinic_main.c
index eb53bd9..5b12272 100644
--- a/drivers/net/ethernet/huawei/hinic/hinic_main.c
+++ b/drivers/net/ethernet/huawei/hinic/hinic_main.c
@@ -51,7 +51,9 @@
module_param(rx_weight, uint, 0644);
MODULE_PARM_DESC(rx_weight, "Number Rx packets for NAPI budget (default=64)");
-#define PCI_DEVICE_ID_HI1822_PF 0x1822
+#define HINIC_DEV_ID_QUAD_PORT_25GE 0x1822
+#define HINIC_DEV_ID_DUAL_PORT_25GE 0x0200
+#define HINIC_DEV_ID_DUAL_PORT_100GE 0x0201
#define HINIC_WQ_NAME "hinic_dev"
@@ -1097,7 +1099,9 @@
}
static const struct pci_device_id hinic_pci_table[] = {
- { PCI_VDEVICE(HUAWEI, PCI_DEVICE_ID_HI1822_PF), 0},
+ { PCI_VDEVICE(HUAWEI, HINIC_DEV_ID_QUAD_PORT_25GE), 0},
+ { PCI_VDEVICE(HUAWEI, HINIC_DEV_ID_DUAL_PORT_25GE), 0},
+ { PCI_VDEVICE(HUAWEI, HINIC_DEV_ID_DUAL_PORT_100GE), 0},
{ 0, 0}
};
MODULE_DEVICE_TABLE(pci, hinic_pci_table);
diff --git a/drivers/net/ethernet/ibm/ibmvnic.c b/drivers/net/ethernet/ibm/ibmvnic.c
index 5ec1185..d0e196b 100644
--- a/drivers/net/ethernet/ibm/ibmvnic.c
+++ b/drivers/net/ethernet/ibm/ibmvnic.c
@@ -109,13 +109,14 @@
struct ibmvnic_sub_crq_queue *);
static int ibmvnic_poll(struct napi_struct *napi, int data);
static void send_map_query(struct ibmvnic_adapter *adapter);
-static void send_request_map(struct ibmvnic_adapter *, dma_addr_t, __be32, u8);
-static void send_request_unmap(struct ibmvnic_adapter *, u8);
+static int send_request_map(struct ibmvnic_adapter *, dma_addr_t, __be32, u8);
+static int send_request_unmap(struct ibmvnic_adapter *, u8);
static int send_login(struct ibmvnic_adapter *adapter);
static void send_cap_queries(struct ibmvnic_adapter *adapter);
static int init_sub_crqs(struct ibmvnic_adapter *);
static int init_sub_crq_irqs(struct ibmvnic_adapter *adapter);
static int ibmvnic_init(struct ibmvnic_adapter *);
+static int ibmvnic_reset_init(struct ibmvnic_adapter *);
static void release_crq_queue(struct ibmvnic_adapter *);
static int __ibmvnic_set_mac(struct net_device *netdev, struct sockaddr *p);
static int init_crq_queue(struct ibmvnic_adapter *adapter);
@@ -172,6 +173,7 @@
struct ibmvnic_long_term_buff *ltb, int size)
{
struct device *dev = &adapter->vdev->dev;
+ int rc;
ltb->size = size;
ltb->buff = dma_alloc_coherent(dev, ltb->size, <b->addr,
@@ -185,8 +187,12 @@
adapter->map_id++;
init_completion(&adapter->fw_done);
- send_request_map(adapter, ltb->addr,
- ltb->size, ltb->map_id);
+ rc = send_request_map(adapter, ltb->addr,
+ ltb->size, ltb->map_id);
+ if (rc) {
+ dma_free_coherent(dev, ltb->size, ltb->buff, ltb->addr);
+ return rc;
+ }
wait_for_completion(&adapter->fw_done);
if (adapter->fw_done_rc) {
@@ -215,10 +221,14 @@
static int reset_long_term_buff(struct ibmvnic_adapter *adapter,
struct ibmvnic_long_term_buff *ltb)
{
+ int rc;
+
memset(ltb->buff, 0, ltb->size);
init_completion(&adapter->fw_done);
- send_request_map(adapter, ltb->addr, ltb->size, ltb->map_id);
+ rc = send_request_map(adapter, ltb->addr, ltb->size, ltb->map_id);
+ if (rc)
+ return rc;
wait_for_completion(&adapter->fw_done);
if (adapter->fw_done_rc) {
@@ -789,6 +799,7 @@
kfree(adapter->napi);
adapter->napi = NULL;
adapter->num_active_rx_napi = 0;
+ adapter->napi_enabled = false;
}
static int ibmvnic_login(struct net_device *netdev)
@@ -924,6 +935,10 @@
/* Partuial success, delay and re-send */
mdelay(1000);
resend = true;
+ } else if (adapter->init_done_rc) {
+ netdev_warn(netdev, "Unable to set link state, rc=%d\n",
+ adapter->init_done_rc);
+ return adapter->init_done_rc;
}
} while (resend);
@@ -956,6 +971,7 @@
struct device *dev = &adapter->vdev->dev;
union ibmvnic_crq crq;
int len = 0;
+ int rc;
if (adapter->vpd->buff)
len = adapter->vpd->len;
@@ -963,7 +979,9 @@
init_completion(&adapter->fw_done);
crq.get_vpd_size.first = IBMVNIC_CRQ_CMD;
crq.get_vpd_size.cmd = GET_VPD_SIZE;
- ibmvnic_send_crq(adapter, &crq);
+ rc = ibmvnic_send_crq(adapter, &crq);
+ if (rc)
+ return rc;
wait_for_completion(&adapter->fw_done);
if (!adapter->vpd->len)
@@ -996,7 +1014,12 @@
crq.get_vpd.cmd = GET_VPD;
crq.get_vpd.ioba = cpu_to_be32(adapter->vpd->dma_addr);
crq.get_vpd.len = cpu_to_be32((u32)adapter->vpd->len);
- ibmvnic_send_crq(adapter, &crq);
+ rc = ibmvnic_send_crq(adapter, &crq);
+ if (rc) {
+ kfree(adapter->vpd->buff);
+ adapter->vpd->buff = NULL;
+ return rc;
+ }
wait_for_completion(&adapter->fw_done);
return 0;
@@ -1695,6 +1718,7 @@
struct ibmvnic_adapter *adapter = netdev_priv(netdev);
struct sockaddr *addr = p;
union ibmvnic_crq crq;
+ int rc;
if (!is_valid_ether_addr(addr->sa_data))
return -EADDRNOTAVAIL;
@@ -1705,7 +1729,9 @@
ether_addr_copy(&crq.change_mac_addr.mac_addr[0], addr->sa_data);
init_completion(&adapter->fw_done);
- ibmvnic_send_crq(adapter, &crq);
+ rc = ibmvnic_send_crq(adapter, &crq);
+ if (rc)
+ return rc;
wait_for_completion(&adapter->fw_done);
/* netdev->dev_addr is changed in handle_change_mac_rsp function */
return adapter->fw_done_rc ? -EIO : 0;
@@ -1787,7 +1813,7 @@
return rc;
}
- rc = ibmvnic_init(adapter);
+ rc = ibmvnic_reset_init(adapter);
if (rc)
return IBMVNIC_INIT_FAILED;
@@ -1857,6 +1883,85 @@
return 0;
}
+static int do_hard_reset(struct ibmvnic_adapter *adapter,
+ struct ibmvnic_rwi *rwi, u32 reset_state)
+{
+ struct net_device *netdev = adapter->netdev;
+ int rc;
+
+ netdev_dbg(adapter->netdev, "Hard resetting driver (%d)\n",
+ rwi->reset_reason);
+
+ netif_carrier_off(netdev);
+ adapter->reset_reason = rwi->reset_reason;
+
+ ibmvnic_cleanup(netdev);
+ release_resources(adapter);
+ release_sub_crqs(adapter, 0);
+ release_crq_queue(adapter);
+
+ /* remove the closed state so when we call open it appears
+ * we are coming from the probed state.
+ */
+ adapter->state = VNIC_PROBED;
+
+ rc = init_crq_queue(adapter);
+ if (rc) {
+ netdev_err(adapter->netdev,
+ "Couldn't initialize crq. rc=%d\n", rc);
+ return rc;
+ }
+
+ rc = ibmvnic_init(adapter);
+ if (rc)
+ return rc;
+
+ /* If the adapter was in PROBE state prior to the reset,
+ * exit here.
+ */
+ if (reset_state == VNIC_PROBED)
+ return 0;
+
+ rc = ibmvnic_login(netdev);
+ if (rc) {
+ adapter->state = VNIC_PROBED;
+ return 0;
+ }
+ /* netif_set_real_num_xx_queues needs to take rtnl lock here
+ * unless wait_for_reset is set, in which case the rtnl lock
+ * has already been taken before initializing the reset
+ */
+ if (!adapter->wait_for_reset) {
+ rtnl_lock();
+ rc = init_resources(adapter);
+ rtnl_unlock();
+ } else {
+ rc = init_resources(adapter);
+ }
+ if (rc)
+ return rc;
+
+ ibmvnic_disable_irqs(adapter);
+ adapter->state = VNIC_CLOSED;
+
+ if (reset_state == VNIC_CLOSED)
+ return 0;
+
+ rc = __ibmvnic_open(netdev);
+ if (rc) {
+ if (list_empty(&adapter->rwi_list))
+ adapter->state = VNIC_CLOSED;
+ else
+ adapter->state = reset_state;
+
+ return 0;
+ }
+
+ netif_carrier_on(netdev);
+
+ return 0;
+}
+
static struct ibmvnic_rwi *get_next_rwi(struct ibmvnic_adapter *adapter)
{
struct ibmvnic_rwi *rwi;
@@ -1898,14 +2003,19 @@
netdev = adapter->netdev;
mutex_lock(&adapter->reset_lock);
- adapter->resetting = true;
reset_state = adapter->state;
rwi = get_next_rwi(adapter);
while (rwi) {
- rc = do_reset(adapter, rwi, reset_state);
+ if (adapter->force_reset_recovery) {
+ adapter->force_reset_recovery = false;
+ rc = do_hard_reset(adapter, rwi, reset_state);
+ } else {
+ rc = do_reset(adapter, rwi, reset_state);
+ }
kfree(rwi);
- if (rc && rc != IBMVNIC_INIT_FAILED)
+ if (rc && rc != IBMVNIC_INIT_FAILED &&
+ !adapter->force_reset_recovery)
break;
rwi = get_next_rwi(adapter);
@@ -1931,9 +2041,9 @@
static int ibmvnic_reset(struct ibmvnic_adapter *adapter,
enum ibmvnic_reset_reason reason)
{
+ struct list_head *entry, *tmp_entry;
struct ibmvnic_rwi *rwi, *tmp;
struct net_device *netdev = adapter->netdev;
- struct list_head *entry;
int ret;
if (adapter->state == VNIC_REMOVING ||
@@ -1969,11 +2079,17 @@
ret = ENOMEM;
goto err;
}
-
+ /* if we just received a transport event,
+ * flush reset queue and process this reset
+ */
+ if (adapter->force_reset_recovery && !list_empty(&adapter->rwi_list)) {
+ list_for_each_safe(entry, tmp_entry, &adapter->rwi_list)
+ list_del(entry);
+ }
rwi->reset_reason = reason;
list_add_tail(&rwi->list, &adapter->rwi_list);
mutex_unlock(&adapter->rwi_lock);
-
+ adapter->resetting = true;
netdev_dbg(adapter->netdev, "Scheduling reset (reason %d)\n", reason);
schedule_work(&adapter->ibmvnic_reset);
@@ -2369,6 +2485,7 @@
struct ibmvnic_adapter *adapter = netdev_priv(dev);
union ibmvnic_crq crq;
int i, j;
+ int rc;
memset(&crq, 0, sizeof(crq));
crq.request_statistics.first = IBMVNIC_CRQ_CMD;
@@ -2379,7 +2496,9 @@
/* Wait for data to be written */
init_completion(&adapter->stats_done);
- ibmvnic_send_crq(adapter, &crq);
+ rc = ibmvnic_send_crq(adapter, &crq);
+ if (rc)
+ return;
wait_for_completion(&adapter->stats_done);
for (i = 0; i < ARRAY_SIZE(ibmvnic_stats); i++)
@@ -3154,6 +3273,12 @@
(unsigned long int)cpu_to_be64(u64_crq[0]),
(unsigned long int)cpu_to_be64(u64_crq[1]));
+ if (!adapter->crq.active &&
+ crq->generic.first != IBMVNIC_CRQ_INIT_CMD) {
+ dev_warn(dev, "Invalid request detected while CRQ is inactive, possible device state change during reset\n");
+ return -EINVAL;
+ }
+
/* Make sure the hypervisor sees the complete request */
mb();
@@ -3378,8 +3503,8 @@
return -1;
}
-static void send_request_map(struct ibmvnic_adapter *adapter, dma_addr_t addr,
- u32 len, u8 map_id)
+static int send_request_map(struct ibmvnic_adapter *adapter, dma_addr_t addr,
+ u32 len, u8 map_id)
{
union ibmvnic_crq crq;
@@ -3389,10 +3514,10 @@
crq.request_map.map_id = map_id;
crq.request_map.ioba = cpu_to_be32(addr);
crq.request_map.len = cpu_to_be32(len);
- ibmvnic_send_crq(adapter, &crq);
+ return ibmvnic_send_crq(adapter, &crq);
}
-static void send_request_unmap(struct ibmvnic_adapter *adapter, u8 map_id)
+static int send_request_unmap(struct ibmvnic_adapter *adapter, u8 map_id)
{
union ibmvnic_crq crq;
@@ -3400,7 +3525,7 @@
crq.request_unmap.first = IBMVNIC_CRQ_CMD;
crq.request_unmap.cmd = REQUEST_UNMAP;
crq.request_unmap.map_id = map_id;
- ibmvnic_send_crq(adapter, &crq);
+ return ibmvnic_send_crq(adapter, &crq);
}
static void send_map_query(struct ibmvnic_adapter *adapter)
@@ -4227,11 +4352,15 @@
dev_info(dev, "Partner initialized\n");
adapter->from_passive_init = true;
adapter->failover_pending = false;
- complete(&adapter->init_done);
+ if (!completion_done(&adapter->init_done)) {
+ complete(&adapter->init_done);
+ adapter->init_done_rc = -EIO;
+ }
ibmvnic_reset(adapter, VNIC_RESET_FAILOVER);
break;
case IBMVNIC_CRQ_INIT_COMPLETE:
dev_info(dev, "Partner initialization complete\n");
+ adapter->crq.active = true;
send_version_xchg(adapter);
break;
default:
@@ -4240,6 +4369,9 @@
return;
case IBMVNIC_CRQ_XPORT_EVENT:
netif_carrier_off(netdev);
+ adapter->crq.active = false;
+ if (adapter->resetting)
+ adapter->force_reset_recovery = true;
if (gen_crq->cmd == IBMVNIC_PARTITION_MIGRATED) {
dev_info(dev, "Migrated, re-enabling adapter\n");
ibmvnic_reset(adapter, VNIC_RESET_MOBILITY);
@@ -4427,6 +4559,7 @@
/* Clean out the queue */
memset(crq->msgs, 0, PAGE_SIZE);
crq->cur = 0;
+ crq->active = false;
/* And re-open it again */
rc = plpar_hcall_norets(H_REG_CRQ, vdev->unit_address,
@@ -4461,6 +4594,7 @@
DMA_BIDIRECTIONAL);
free_page((unsigned long)crq->msgs);
crq->msgs = NULL;
+ crq->active = false;
}
static int init_crq_queue(struct ibmvnic_adapter *adapter)
@@ -4538,7 +4672,7 @@
return retrc;
}
-static int ibmvnic_init(struct ibmvnic_adapter *adapter)
+static int ibmvnic_reset_init(struct ibmvnic_adapter *adapter)
{
struct device *dev = &adapter->vdev->dev;
unsigned long timeout = msecs_to_jiffies(30000);
@@ -4597,6 +4731,49 @@
return rc;
}
+static int ibmvnic_init(struct ibmvnic_adapter *adapter)
+{
+ struct device *dev = &adapter->vdev->dev;
+ unsigned long timeout = msecs_to_jiffies(30000);
+ int rc;
+
+ adapter->from_passive_init = false;
+
+ init_completion(&adapter->init_done);
+ adapter->init_done_rc = 0;
+ ibmvnic_send_crq_init(adapter);
+ if (!wait_for_completion_timeout(&adapter->init_done, timeout)) {
+ dev_err(dev, "Initialization sequence timed out\n");
+ return -1;
+ }
+
+ if (adapter->init_done_rc) {
+ release_crq_queue(adapter);
+ return adapter->init_done_rc;
+ }
+
+ if (adapter->from_passive_init) {
+ adapter->state = VNIC_OPEN;
+ adapter->from_passive_init = false;
+ return -1;
+ }
+
+ rc = init_sub_crqs(adapter);
+ if (rc) {
+ dev_err(dev, "Initialization of sub crqs failed\n");
+ release_crq_queue(adapter);
+ return rc;
+ }
+
+ rc = init_sub_crq_irqs(adapter);
+ if (rc) {
+ dev_err(dev, "Failed to initialize sub crq irqs\n");
+ release_crq_queue(adapter);
+ }
+
+ return rc;
+}
+
static struct device_attribute dev_attr_failover;
static int ibmvnic_probe(struct vio_dev *dev, const struct vio_device_id *id)
diff --git a/drivers/net/ethernet/ibm/ibmvnic.h b/drivers/net/ethernet/ibm/ibmvnic.h
index 22391e8..f9fb780 100644
--- a/drivers/net/ethernet/ibm/ibmvnic.h
+++ b/drivers/net/ethernet/ibm/ibmvnic.h
@@ -865,6 +865,7 @@
int size, cur;
dma_addr_t msg_token;
spinlock_t lock;
+ bool active;
};
union sub_crq {
@@ -1108,6 +1109,7 @@
bool mac_change_pending;
bool failover_pending;
+ bool force_reset_recovery;
struct ibmvnic_tunables desired;
struct ibmvnic_tunables fallback;
diff --git a/drivers/net/ethernet/intel/e100.c b/drivers/net/ethernet/intel/e100.c
index 41ad56e..27d5f271 100644
--- a/drivers/net/ethernet/intel/e100.c
+++ b/drivers/net/ethernet/intel/e100.c
@@ -1,31 +1,5 @@
// SPDX-License-Identifier: GPL-2.0
-/*******************************************************************************
-
- Intel PRO/100 Linux driver
- Copyright(c) 1999 - 2006 Intel Corporation.
-
- This program is free software; you can redistribute it and/or modify it
- under the terms and conditions of the GNU General Public License,
- version 2, as published by the Free Software Foundation.
-
- This program is distributed in the hope it will be useful, but WITHOUT
- ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
- FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
- more details.
-
- You should have received a copy of the GNU General Public License along with
- this program; if not, write to the Free Software Foundation, Inc.,
- 51 Franklin St - Fifth Floor, Boston, MA 02110-1301 USA.
-
- The full GNU General Public License is included in this distribution in
- the file called "COPYING".
-
- Contact Information:
- Linux NICS <linux.nics@intel.com>
- e1000-devel Mailing List <e1000-devel@lists.sourceforge.net>
- Intel Corporation, 5200 N.E. Elam Young Parkway, Hillsboro, OR 97124-6497
-
-*******************************************************************************/
+/* Copyright(c) 1999 - 2006 Intel Corporation. */
/*
* e100.c: Intel(R) PRO/100 ethernet driver
diff --git a/drivers/net/ethernet/intel/e1000/Makefile b/drivers/net/ethernet/intel/e1000/Makefile
index c7caadd..314c52d 100644
--- a/drivers/net/ethernet/intel/e1000/Makefile
+++ b/drivers/net/ethernet/intel/e1000/Makefile
@@ -1,31 +1,5 @@
# SPDX-License-Identifier: GPL-2.0
-################################################################################
-#
-# Intel PRO/1000 Linux driver
# Copyright(c) 1999 - 2006 Intel Corporation.
-#
-# This program is free software; you can redistribute it and/or modify it
-# under the terms and conditions of the GNU General Public License,
-# version 2, as published by the Free Software Foundation.
-#
-# This program is distributed in the hope it will be useful, but WITHOUT
-# ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
-# FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
-# more details.
-#
-# You should have received a copy of the GNU General Public License along with
-# this program; if not, write to the Free Software Foundation, Inc.,
-# 51 Franklin St - Fifth Floor, Boston, MA 02110-1301 USA.
-#
-# The full GNU General Public License is included in this distribution in
-# the file called "COPYING".
-#
-# Contact Information:
-# Linux NICS <linux.nics@intel.com>
-# e1000-devel Mailing List <e1000-devel@lists.sourceforge.net>
-# Intel Corporation, 5200 N.E. Elam Young Parkway, Hillsboro, OR 97124-6497
-#
-################################################################################
#
# Makefile for the Intel(R) PRO/1000 ethernet driver
diff --git a/drivers/net/ethernet/intel/e1000/e1000.h b/drivers/net/ethernet/intel/e1000/e1000.h
index 3a0feea..c40729b 100644
--- a/drivers/net/ethernet/intel/e1000/e1000.h
+++ b/drivers/net/ethernet/intel/e1000/e1000.h
@@ -1,32 +1,5 @@
/* SPDX-License-Identifier: GPL-2.0 */
-/*******************************************************************************
-
- Intel PRO/1000 Linux driver
- Copyright(c) 1999 - 2006 Intel Corporation.
-
- This program is free software; you can redistribute it and/or modify it
- under the terms and conditions of the GNU General Public License,
- version 2, as published by the Free Software Foundation.
-
- This program is distributed in the hope it will be useful, but WITHOUT
- ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
- FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
- more details.
-
- You should have received a copy of the GNU General Public License along with
- this program; if not, write to the Free Software Foundation, Inc.,
- 51 Franklin St - Fifth Floor, Boston, MA 02110-1301 USA.
-
- The full GNU General Public License is included in this distribution in
- the file called "COPYING".
-
- Contact Information:
- Linux NICS <linux.nics@intel.com>
- e1000-devel Mailing List <e1000-devel@lists.sourceforge.net>
- Intel Corporation, 5200 N.E. Elam Young Parkway, Hillsboro, OR 97124-6497
-
-*******************************************************************************/
-
+/* Copyright(c) 1999 - 2006 Intel Corporation. */
/* Linux PRO/1000 Ethernet Driver main header file */
diff --git a/drivers/net/ethernet/intel/e1000/e1000_ethtool.c b/drivers/net/ethernet/intel/e1000/e1000_ethtool.c
index 3e80ca1..5d365a9 100644
--- a/drivers/net/ethernet/intel/e1000/e1000_ethtool.c
+++ b/drivers/net/ethernet/intel/e1000/e1000_ethtool.c
@@ -1,26 +1,5 @@
// SPDX-License-Identifier: GPL-2.0
-/*******************************************************************************
- * Intel PRO/1000 Linux driver
- * Copyright(c) 1999 - 2006 Intel Corporation.
- *
- * This program is free software; you can redistribute it and/or modify it
- * under the terms and conditions of the GNU General Public License,
- * version 2, as published by the Free Software Foundation.
- *
- * This program is distributed in the hope it will be useful, but WITHOUT
- * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
- * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
- * more details.
- *
- * The full GNU General Public License is included in this distribution in
- * the file called "COPYING".
- *
- * Contact Information:
- * Linux NICS <linux.nics@intel.com>
- * e1000-devel Mailing List <e1000-devel@lists.sourceforge.net>
- * Intel Corporation, 5200 N.E. Elam Young Parkway, Hillsboro, OR 97124-6497
- *
- ******************************************************************************/
+/* Copyright(c) 1999 - 2006 Intel Corporation. */
/* ethtool support for e1000 */
diff --git a/drivers/net/ethernet/intel/e1000/e1000_hw.c b/drivers/net/ethernet/intel/e1000/e1000_hw.c
index 6e7e923..48428d6 100644
--- a/drivers/net/ethernet/intel/e1000/e1000_hw.c
+++ b/drivers/net/ethernet/intel/e1000/e1000_hw.c
@@ -1,31 +1,5 @@
// SPDX-License-Identifier: GPL-2.0
-/*******************************************************************************
-*
- Intel PRO/1000 Linux driver
- Copyright(c) 1999 - 2006 Intel Corporation.
-
- This program is free software; you can redistribute it and/or modify it
- under the terms and conditions of the GNU General Public License,
- version 2, as published by the Free Software Foundation.
-
- This program is distributed in the hope it will be useful, but WITHOUT
- ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
- FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
- more details.
-
- You should have received a copy of the GNU General Public License along with
- this program; if not, write to the Free Software Foundation, Inc.,
- 51 Franklin St - Fifth Floor, Boston, MA 02110-1301 USA.
-
- The full GNU General Public License is included in this distribution in
- the file called "COPYING".
-
- Contact Information:
- Linux NICS <linux.nics@intel.com>
- e1000-devel Mailing List <e1000-devel@lists.sourceforge.net>
- Intel Corporation, 5200 N.E. Elam Young Parkway, Hillsboro, OR 97124-6497
-
- */
+/* Copyright(c) 1999 - 2006 Intel Corporation. */
/* e1000_hw.c
* Shared functions for accessing and configuring the MAC
diff --git a/drivers/net/ethernet/intel/e1000/e1000_hw.h b/drivers/net/ethernet/intel/e1000/e1000_hw.h
index f09c569..b57a049 100644
--- a/drivers/net/ethernet/intel/e1000/e1000_hw.h
+++ b/drivers/net/ethernet/intel/e1000/e1000_hw.h
@@ -1,31 +1,5 @@
/* SPDX-License-Identifier: GPL-2.0 */
-/*******************************************************************************
-
- Intel PRO/1000 Linux driver
- Copyright(c) 1999 - 2006 Intel Corporation.
-
- This program is free software; you can redistribute it and/or modify it
- under the terms and conditions of the GNU General Public License,
- version 2, as published by the Free Software Foundation.
-
- This program is distributed in the hope it will be useful, but WITHOUT
- ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
- FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
- more details.
-
- You should have received a copy of the GNU General Public License along with
- this program; if not, write to the Free Software Foundation, Inc.,
- 51 Franklin St - Fifth Floor, Boston, MA 02110-1301 USA.
-
- The full GNU General Public License is included in this distribution in
- the file called "COPYING".
-
- Contact Information:
- Linux NICS <linux.nics@intel.com>
- e1000-devel Mailing List <e1000-devel@lists.sourceforge.net>
- Intel Corporation, 5200 N.E. Elam Young Parkway, Hillsboro, OR 97124-6497
-
-*******************************************************************************/
+/* Copyright(c) 1999 - 2006 Intel Corporation. */
/* e1000_hw.h
* Structures, enums, and macros for the MAC
diff --git a/drivers/net/ethernet/intel/e1000/e1000_main.c b/drivers/net/ethernet/intel/e1000/e1000_main.c
index d5eb19b..2110d5f 100644
--- a/drivers/net/ethernet/intel/e1000/e1000_main.c
+++ b/drivers/net/ethernet/intel/e1000/e1000_main.c
@@ -1,31 +1,5 @@
// SPDX-License-Identifier: GPL-2.0
-/*******************************************************************************
-
- Intel PRO/1000 Linux driver
- Copyright(c) 1999 - 2006 Intel Corporation.
-
- This program is free software; you can redistribute it and/or modify it
- under the terms and conditions of the GNU General Public License,
- version 2, as published by the Free Software Foundation.
-
- This program is distributed in the hope it will be useful, but WITHOUT
- ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
- FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
- more details.
-
- You should have received a copy of the GNU General Public License along with
- this program; if not, write to the Free Software Foundation, Inc.,
- 51 Franklin St - Fifth Floor, Boston, MA 02110-1301 USA.
-
- The full GNU General Public License is included in this distribution in
- the file called "COPYING".
-
- Contact Information:
- Linux NICS <linux.nics@intel.com>
- e1000-devel Mailing List <e1000-devel@lists.sourceforge.net>
- Intel Corporation, 5200 N.E. Elam Young Parkway, Hillsboro, OR 97124-6497
-
-*******************************************************************************/
+/* Copyright(c) 1999 - 2006 Intel Corporation. */
#include "e1000.h"
#include <net/ip6_checksum.h>
diff --git a/drivers/net/ethernet/intel/e1000/e1000_osdep.h b/drivers/net/ethernet/intel/e1000/e1000_osdep.h
index ae0559b..e966bb2 100644
--- a/drivers/net/ethernet/intel/e1000/e1000_osdep.h
+++ b/drivers/net/ethernet/intel/e1000/e1000_osdep.h
@@ -1,32 +1,5 @@
/* SPDX-License-Identifier: GPL-2.0 */
-/*******************************************************************************
-
- Intel PRO/1000 Linux driver
- Copyright(c) 1999 - 2006 Intel Corporation.
-
- This program is free software; you can redistribute it and/or modify it
- under the terms and conditions of the GNU General Public License,
- version 2, as published by the Free Software Foundation.
-
- This program is distributed in the hope it will be useful, but WITHOUT
- ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
- FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
- more details.
-
- You should have received a copy of the GNU General Public License along with
- this program; if not, write to the Free Software Foundation, Inc.,
- 51 Franklin St - Fifth Floor, Boston, MA 02110-1301 USA.
-
- The full GNU General Public License is included in this distribution in
- the file called "COPYING".
-
- Contact Information:
- Linux NICS <linux.nics@intel.com>
- e1000-devel Mailing List <e1000-devel@lists.sourceforge.net>
- Intel Corporation, 5200 N.E. Elam Young Parkway, Hillsboro, OR 97124-6497
-
-*******************************************************************************/
-
+/* Copyright(c) 1999 - 2006 Intel Corporation. */
/* glue for the OS independent part of e1000
* includes register access macros
diff --git a/drivers/net/ethernet/intel/e1000/e1000_param.c b/drivers/net/ethernet/intel/e1000/e1000_param.c
index 345f239..d3f29ff 100644
--- a/drivers/net/ethernet/intel/e1000/e1000_param.c
+++ b/drivers/net/ethernet/intel/e1000/e1000_param.c
@@ -1,31 +1,5 @@
// SPDX-License-Identifier: GPL-2.0
-/*******************************************************************************
-
- Intel PRO/1000 Linux driver
- Copyright(c) 1999 - 2006 Intel Corporation.
-
- This program is free software; you can redistribute it and/or modify it
- under the terms and conditions of the GNU General Public License,
- version 2, as published by the Free Software Foundation.
-
- This program is distributed in the hope it will be useful, but WITHOUT
- ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
- FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
- more details.
-
- You should have received a copy of the GNU General Public License along with
- this program; if not, write to the Free Software Foundation, Inc.,
- 51 Franklin St - Fifth Floor, Boston, MA 02110-1301 USA.
-
- The full GNU General Public License is included in this distribution in
- the file called "COPYING".
-
- Contact Information:
- Linux NICS <linux.nics@intel.com>
- e1000-devel Mailing List <e1000-devel@lists.sourceforge.net>
- Intel Corporation, 5200 N.E. Elam Young Parkway, Hillsboro, OR 97124-6497
-
-*******************************************************************************/
+/* Copyright(c) 1999 - 2006 Intel Corporation. */
#include "e1000.h"
diff --git a/drivers/net/ethernet/intel/e1000e/80003es2lan.c b/drivers/net/ethernet/intel/e1000e/80003es2lan.c
index 953e99d..257bd59 100644
--- a/drivers/net/ethernet/intel/e1000e/80003es2lan.c
+++ b/drivers/net/ethernet/intel/e1000e/80003es2lan.c
@@ -1,24 +1,5 @@
// SPDX-License-Identifier: GPL-2.0
-/* Intel PRO/1000 Linux driver
- * Copyright(c) 1999 - 2015 Intel Corporation.
- *
- * This program is free software; you can redistribute it and/or modify it
- * under the terms and conditions of the GNU General Public License,
- * version 2, as published by the Free Software Foundation.
- *
- * This program is distributed in the hope it will be useful, but WITHOUT
- * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
- * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
- * more details.
- *
- * The full GNU General Public License is included in this distribution in
- * the file called "COPYING".
- *
- * Contact Information:
- * Linux NICS <linux.nics@intel.com>
- * e1000-devel Mailing List <e1000-devel@lists.sourceforge.net>
- * Intel Corporation, 5200 N.E. Elam Young Parkway, Hillsboro, OR 97124-6497
- */
+/* Copyright(c) 1999 - 2018 Intel Corporation. */
/* 80003ES2LAN Gigabit Ethernet Controller (Copper)
* 80003ES2LAN Gigabit Ethernet Controller (Serdes)
diff --git a/drivers/net/ethernet/intel/e1000e/80003es2lan.h b/drivers/net/ethernet/intel/e1000e/80003es2lan.h
index ee6d125..aa9d639 100644
--- a/drivers/net/ethernet/intel/e1000e/80003es2lan.h
+++ b/drivers/net/ethernet/intel/e1000e/80003es2lan.h
@@ -1,24 +1,5 @@
/* SPDX-License-Identifier: GPL-2.0 */
-/* Intel PRO/1000 Linux driver
- * Copyright(c) 1999 - 2015 Intel Corporation.
- *
- * This program is free software; you can redistribute it and/or modify it
- * under the terms and conditions of the GNU General Public License,
- * version 2, as published by the Free Software Foundation.
- *
- * This program is distributed in the hope it will be useful, but WITHOUT
- * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
- * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
- * more details.
- *
- * The full GNU General Public License is included in this distribution in
- * the file called "COPYING".
- *
- * Contact Information:
- * Linux NICS <linux.nics@intel.com>
- * e1000-devel Mailing List <e1000-devel@lists.sourceforge.net>
- * Intel Corporation, 5200 N.E. Elam Young Parkway, Hillsboro, OR 97124-6497
- */
+/* Copyright(c) 1999 - 2018 Intel Corporation. */
#ifndef _E1000E_80003ES2LAN_H_
#define _E1000E_80003ES2LAN_H_
diff --git a/drivers/net/ethernet/intel/e1000e/82571.c b/drivers/net/ethernet/intel/e1000e/82571.c
index 924f2c8..b930930 100644
--- a/drivers/net/ethernet/intel/e1000e/82571.c
+++ b/drivers/net/ethernet/intel/e1000e/82571.c
@@ -1,24 +1,5 @@
// SPDX-License-Identifier: GPL-2.0
-/* Intel PRO/1000 Linux driver
- * Copyright(c) 1999 - 2015 Intel Corporation.
- *
- * This program is free software; you can redistribute it and/or modify it
- * under the terms and conditions of the GNU General Public License,
- * version 2, as published by the Free Software Foundation.
- *
- * This program is distributed in the hope it will be useful, but WITHOUT
- * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
- * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
- * more details.
- *
- * The full GNU General Public License is included in this distribution in
- * the file called "COPYING".
- *
- * Contact Information:
- * Linux NICS <linux.nics@intel.com>
- * e1000-devel Mailing List <e1000-devel@lists.sourceforge.net>
- * Intel Corporation, 5200 N.E. Elam Young Parkway, Hillsboro, OR 97124-6497
- */
+/* Copyright(c) 1999 - 2018 Intel Corporation. */
/* 82571EB Gigabit Ethernet Controller
* 82571EB Gigabit Ethernet Controller (Copper)
diff --git a/drivers/net/ethernet/intel/e1000e/82571.h b/drivers/net/ethernet/intel/e1000e/82571.h
index 9a24c64..834c238 100644
--- a/drivers/net/ethernet/intel/e1000e/82571.h
+++ b/drivers/net/ethernet/intel/e1000e/82571.h
@@ -1,24 +1,5 @@
/* SPDX-License-Identifier: GPL-2.0 */
-/* Intel PRO/1000 Linux driver
- * Copyright(c) 1999 - 2015 Intel Corporation.
- *
- * This program is free software; you can redistribute it and/or modify it
- * under the terms and conditions of the GNU General Public License,
- * version 2, as published by the Free Software Foundation.
- *
- * This program is distributed in the hope it will be useful, but WITHOUT
- * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
- * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
- * more details.
- *
- * The full GNU General Public License is included in this distribution in
- * the file called "COPYING".
- *
- * Contact Information:
- * Linux NICS <linux.nics@intel.com>
- * e1000-devel Mailing List <e1000-devel@lists.sourceforge.net>
- * Intel Corporation, 5200 N.E. Elam Young Parkway, Hillsboro, OR 97124-6497
- */
+/* Copyright(c) 1999 - 2018 Intel Corporation. */
#ifndef _E1000E_82571_H_
#define _E1000E_82571_H_
diff --git a/drivers/net/ethernet/intel/e1000e/Makefile b/drivers/net/ethernet/intel/e1000e/Makefile
index 24e391a..44e58b6 100644
--- a/drivers/net/ethernet/intel/e1000e/Makefile
+++ b/drivers/net/ethernet/intel/e1000e/Makefile
@@ -1,30 +1,5 @@
# SPDX-License-Identifier: GPL-2.0
-################################################################################
-#
-# Intel PRO/1000 Linux driver
-# Copyright(c) 1999 - 2014 Intel Corporation.
-#
-# This program is free software; you can redistribute it and/or modify it
-# under the terms and conditions of the GNU General Public License,
-# version 2, as published by the Free Software Foundation.
-#
-# This program is distributed in the hope it will be useful, but WITHOUT
-# ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
-# FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
-# more details.
-#
-# You should have received a copy of the GNU General Public License
-# along with this program; if not, see <http://www.gnu.org/licenses/>.
-#
-# The full GNU General Public License is included in this distribution in
-# the file called "COPYING".
-#
-# Contact Information:
-# Linux NICS <linux.nics@intel.com>
-# e1000-devel Mailing List <e1000-devel@lists.sourceforge.net>
-# Intel Corporation, 5200 N.E. Elam Young Parkway, Hillsboro, OR 97124-6497
-#
-################################################################################
+# Copyright(c) 1999 - 2018 Intel Corporation.
#
# Makefile for the Intel(R) PRO/1000 ethernet driver
diff --git a/drivers/net/ethernet/intel/e1000e/defines.h b/drivers/net/ethernet/intel/e1000e/defines.h
index 2288301..fd550de 100644
--- a/drivers/net/ethernet/intel/e1000e/defines.h
+++ b/drivers/net/ethernet/intel/e1000e/defines.h
@@ -1,24 +1,5 @@
/* SPDX-License-Identifier: GPL-2.0 */
-/* Intel PRO/1000 Linux driver
- * Copyright(c) 1999 - 2015 Intel Corporation.
- *
- * This program is free software; you can redistribute it and/or modify it
- * under the terms and conditions of the GNU General Public License,
- * version 2, as published by the Free Software Foundation.
- *
- * This program is distributed in the hope it will be useful, but WITHOUT
- * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
- * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
- * more details.
- *
- * The full GNU General Public License is included in this distribution in
- * the file called "COPYING".
- *
- * Contact Information:
- * Linux NICS <linux.nics@intel.com>
- * e1000-devel Mailing List <e1000-devel@lists.sourceforge.net>
- * Intel Corporation, 5200 N.E. Elam Young Parkway, Hillsboro, OR 97124-6497
- */
+/* Copyright(c) 1999 - 2018 Intel Corporation. */
#ifndef _E1000_DEFINES_H_
#define _E1000_DEFINES_H_
diff --git a/drivers/net/ethernet/intel/e1000e/e1000.h b/drivers/net/ethernet/intel/e1000e/e1000.h
index da88555..c760dc7 100644
--- a/drivers/net/ethernet/intel/e1000e/e1000.h
+++ b/drivers/net/ethernet/intel/e1000e/e1000.h
@@ -1,24 +1,5 @@
/* SPDX-License-Identifier: GPL-2.0 */
-/* Intel PRO/1000 Linux driver
- * Copyright(c) 1999 - 2015 Intel Corporation.
- *
- * This program is free software; you can redistribute it and/or modify it
- * under the terms and conditions of the GNU General Public License,
- * version 2, as published by the Free Software Foundation.
- *
- * This program is distributed in the hope it will be useful, but WITHOUT
- * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
- * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
- * more details.
- *
- * The full GNU General Public License is included in this distribution in
- * the file called "COPYING".
- *
- * Contact Information:
- * Linux NICS <linux.nics@intel.com>
- * e1000-devel Mailing List <e1000-devel@lists.sourceforge.net>
- * Intel Corporation, 5200 N.E. Elam Young Parkway, Hillsboro, OR 97124-6497
- */
+/* Copyright(c) 1999 - 2018 Intel Corporation. */
/* Linux PRO/1000 Ethernet Driver main header file */
diff --git a/drivers/net/ethernet/intel/e1000e/ethtool.c b/drivers/net/ethernet/intel/e1000e/ethtool.c
index 64dc0c1..e084cb7 100644
--- a/drivers/net/ethernet/intel/e1000e/ethtool.c
+++ b/drivers/net/ethernet/intel/e1000e/ethtool.c
@@ -1,24 +1,5 @@
// SPDX-License-Identifier: GPL-2.0
-/* Intel PRO/1000 Linux driver
- * Copyright(c) 1999 - 2015 Intel Corporation.
- *
- * This program is free software; you can redistribute it and/or modify it
- * under the terms and conditions of the GNU General Public License,
- * version 2, as published by the Free Software Foundation.
- *
- * This program is distributed in the hope it will be useful, but WITHOUT
- * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
- * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
- * more details.
- *
- * The full GNU General Public License is included in this distribution in
- * the file called "COPYING".
- *
- * Contact Information:
- * Linux NICS <linux.nics@intel.com>
- * e1000-devel Mailing List <e1000-devel@lists.sourceforge.net>
- * Intel Corporation, 5200 N.E. Elam Young Parkway, Hillsboro, OR 97124-6497
- */
+/* Copyright(c) 1999 - 2018 Intel Corporation. */
/* ethtool support for e1000 */
diff --git a/drivers/net/ethernet/intel/e1000e/hw.h b/drivers/net/ethernet/intel/e1000e/hw.h
index 2180239..eff75bd 100644
--- a/drivers/net/ethernet/intel/e1000e/hw.h
+++ b/drivers/net/ethernet/intel/e1000e/hw.h
@@ -1,24 +1,5 @@
/* SPDX-License-Identifier: GPL-2.0 */
-/* Intel PRO/1000 Linux driver
- * Copyright(c) 1999 - 2015 Intel Corporation.
- *
- * This program is free software; you can redistribute it and/or modify it
- * under the terms and conditions of the GNU General Public License,
- * version 2, as published by the Free Software Foundation.
- *
- * This program is distributed in the hope it will be useful, but WITHOUT
- * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
- * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
- * more details.
- *
- * The full GNU General Public License is included in this distribution in
- * the file called "COPYING".
- *
- * Contact Information:
- * Linux NICS <linux.nics@intel.com>
- * e1000-devel Mailing List <e1000-devel@lists.sourceforge.net>
- * Intel Corporation, 5200 N.E. Elam Young Parkway, Hillsboro, OR 97124-6497
- */
+/* Copyright(c) 1999 - 2018 Intel Corporation. */
#ifndef _E1000_HW_H_
#define _E1000_HW_H_
diff --git a/drivers/net/ethernet/intel/e1000e/ich8lan.c b/drivers/net/ethernet/intel/e1000e/ich8lan.c
index 1551d6c..cdae0ef 100644
--- a/drivers/net/ethernet/intel/e1000e/ich8lan.c
+++ b/drivers/net/ethernet/intel/e1000e/ich8lan.c
@@ -1,24 +1,5 @@
// SPDX-License-Identifier: GPL-2.0
-/* Intel PRO/1000 Linux driver
- * Copyright(c) 1999 - 2015 Intel Corporation.
- *
- * This program is free software; you can redistribute it and/or modify it
- * under the terms and conditions of the GNU General Public License,
- * version 2, as published by the Free Software Foundation.
- *
- * This program is distributed in the hope it will be useful, but WITHOUT
- * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
- * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
- * more details.
- *
- * The full GNU General Public License is included in this distribution in
- * the file called "COPYING".
- *
- * Contact Information:
- * Linux NICS <linux.nics@intel.com>
- * e1000-devel Mailing List <e1000-devel@lists.sourceforge.net>
- * Intel Corporation, 5200 N.E. Elam Young Parkway, Hillsboro, OR 97124-6497
- */
+/* Copyright(c) 1999 - 2018 Intel Corporation. */
/* 82562G 10/100 Network Connection
* 82562G-2 10/100 Network Connection
diff --git a/drivers/net/ethernet/intel/e1000e/ich8lan.h b/drivers/net/ethernet/intel/e1000e/ich8lan.h
index 3c4f82c..eb09c755f 100644
--- a/drivers/net/ethernet/intel/e1000e/ich8lan.h
+++ b/drivers/net/ethernet/intel/e1000e/ich8lan.h
@@ -1,24 +1,5 @@
/* SPDX-License-Identifier: GPL-2.0 */
-/* Intel PRO/1000 Linux driver
- * Copyright(c) 1999 - 2015 Intel Corporation.
- *
- * This program is free software; you can redistribute it and/or modify it
- * under the terms and conditions of the GNU General Public License,
- * version 2, as published by the Free Software Foundation.
- *
- * This program is distributed in the hope it will be useful, but WITHOUT
- * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
- * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
- * more details.
- *
- * The full GNU General Public License is included in this distribution in
- * the file called "COPYING".
- *
- * Contact Information:
- * Linux NICS <linux.nics@intel.com>
- * e1000-devel Mailing List <e1000-devel@lists.sourceforge.net>
- * Intel Corporation, 5200 N.E. Elam Young Parkway, Hillsboro, OR 97124-6497
- */
+/* Copyright(c) 1999 - 2018 Intel Corporation. */
#ifndef _E1000E_ICH8LAN_H_
#define _E1000E_ICH8LAN_H_
diff --git a/drivers/net/ethernet/intel/e1000e/mac.c b/drivers/net/ethernet/intel/e1000e/mac.c
index b293464..4abd55d 100644
--- a/drivers/net/ethernet/intel/e1000e/mac.c
+++ b/drivers/net/ethernet/intel/e1000e/mac.c
@@ -1,24 +1,5 @@
// SPDX-License-Identifier: GPL-2.0
-/* Intel PRO/1000 Linux driver
- * Copyright(c) 1999 - 2015 Intel Corporation.
- *
- * This program is free software; you can redistribute it and/or modify it
- * under the terms and conditions of the GNU General Public License,
- * version 2, as published by the Free Software Foundation.
- *
- * This program is distributed in the hope it will be useful, but WITHOUT
- * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
- * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
- * more details.
- *
- * The full GNU General Public License is included in this distribution in
- * the file called "COPYING".
- *
- * Contact Information:
- * Linux NICS <linux.nics@intel.com>
- * e1000-devel Mailing List <e1000-devel@lists.sourceforge.net>
- * Intel Corporation, 5200 N.E. Elam Young Parkway, Hillsboro, OR 97124-6497
- */
+/* Copyright(c) 1999 - 2018 Intel Corporation. */
#include "e1000.h"
diff --git a/drivers/net/ethernet/intel/e1000e/mac.h b/drivers/net/ethernet/intel/e1000e/mac.h
index cb0abf6..6ab2611 100644
--- a/drivers/net/ethernet/intel/e1000e/mac.h
+++ b/drivers/net/ethernet/intel/e1000e/mac.h
@@ -1,24 +1,5 @@
/* SPDX-License-Identifier: GPL-2.0 */
-/* Intel PRO/1000 Linux driver
- * Copyright(c) 1999 - 2015 Intel Corporation.
- *
- * This program is free software; you can redistribute it and/or modify it
- * under the terms and conditions of the GNU General Public License,
- * version 2, as published by the Free Software Foundation.
- *
- * This program is distributed in the hope it will be useful, but WITHOUT
- * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
- * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
- * more details.
- *
- * The full GNU General Public License is included in this distribution in
- * the file called "COPYING".
- *
- * Contact Information:
- * Linux NICS <linux.nics@intel.com>
- * e1000-devel Mailing List <e1000-devel@lists.sourceforge.net>
- * Intel Corporation, 5200 N.E. Elam Young Parkway, Hillsboro, OR 97124-6497
- */
+/* Copyright(c) 1999 - 2018 Intel Corporation. */
#ifndef _E1000E_MAC_H_
#define _E1000E_MAC_H_
diff --git a/drivers/net/ethernet/intel/e1000e/manage.c b/drivers/net/ethernet/intel/e1000e/manage.c
index e027660..c4c9b20 100644
--- a/drivers/net/ethernet/intel/e1000e/manage.c
+++ b/drivers/net/ethernet/intel/e1000e/manage.c
@@ -1,24 +1,5 @@
// SPDX-License-Identifier: GPL-2.0
-/* Intel PRO/1000 Linux driver
- * Copyright(c) 1999 - 2015 Intel Corporation.
- *
- * This program is free software; you can redistribute it and/or modify it
- * under the terms and conditions of the GNU General Public License,
- * version 2, as published by the Free Software Foundation.
- *
- * This program is distributed in the hope it will be useful, but WITHOUT
- * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
- * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
- * more details.
- *
- * The full GNU General Public License is included in this distribution in
- * the file called "COPYING".
- *
- * Contact Information:
- * Linux NICS <linux.nics@intel.com>
- * e1000-devel Mailing List <e1000-devel@lists.sourceforge.net>
- * Intel Corporation, 5200 N.E. Elam Young Parkway, Hillsboro, OR 97124-6497
- */
+/* Copyright(c) 1999 - 2018 Intel Corporation. */
#include "e1000.h"
diff --git a/drivers/net/ethernet/intel/e1000e/manage.h b/drivers/net/ethernet/intel/e1000e/manage.h
index 3268f2e..d868aad 100644
--- a/drivers/net/ethernet/intel/e1000e/manage.h
+++ b/drivers/net/ethernet/intel/e1000e/manage.h
@@ -1,24 +1,5 @@
/* SPDX-License-Identifier: GPL-2.0 */
-/* Intel PRO/1000 Linux driver
- * Copyright(c) 1999 - 2015 Intel Corporation.
- *
- * This program is free software; you can redistribute it and/or modify it
- * under the terms and conditions of the GNU General Public License,
- * version 2, as published by the Free Software Foundation.
- *
- * This program is distributed in the hope it will be useful, but WITHOUT
- * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
- * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
- * more details.
- *
- * The full GNU General Public License is included in this distribution in
- * the file called "COPYING".
- *
- * Contact Information:
- * Linux NICS <linux.nics@intel.com>
- * e1000-devel Mailing List <e1000-devel@lists.sourceforge.net>
- * Intel Corporation, 5200 N.E. Elam Young Parkway, Hillsboro, OR 97124-6497
- */
+/* Copyright(c) 1999 - 2018 Intel Corporation. */
#ifndef _E1000E_MANAGE_H_
#define _E1000E_MANAGE_H_
diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c b/drivers/net/ethernet/intel/e1000e/netdev.c
index ec4a975..d3fef7f 100644
--- a/drivers/net/ethernet/intel/e1000e/netdev.c
+++ b/drivers/net/ethernet/intel/e1000e/netdev.c
@@ -1,24 +1,5 @@
// SPDX-License-Identifier: GPL-2.0
-/* Intel PRO/1000 Linux driver
- * Copyright(c) 1999 - 2015 Intel Corporation.
- *
- * This program is free software; you can redistribute it and/or modify it
- * under the terms and conditions of the GNU General Public License,
- * version 2, as published by the Free Software Foundation.
- *
- * This program is distributed in the hope it will be useful, but WITHOUT
- * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
- * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
- * more details.
- *
- * The full GNU General Public License is included in this distribution in
- * the file called "COPYING".
- *
- * Contact Information:
- * Linux NICS <linux.nics@intel.com>
- * e1000-devel Mailing List <e1000-devel@lists.sourceforge.net>
- * Intel Corporation, 5200 N.E. Elam Young Parkway, Hillsboro, OR 97124-6497
- */
+/* Copyright(c) 1999 - 2018 Intel Corporation. */
#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
diff --git a/drivers/net/ethernet/intel/e1000e/nvm.c b/drivers/net/ethernet/intel/e1000e/nvm.c
index 68949bb..937f9af 100644
--- a/drivers/net/ethernet/intel/e1000e/nvm.c
+++ b/drivers/net/ethernet/intel/e1000e/nvm.c
@@ -1,24 +1,5 @@
// SPDX-License-Identifier: GPL-2.0
-/* Intel PRO/1000 Linux driver
- * Copyright(c) 1999 - 2015 Intel Corporation.
- *
- * This program is free software; you can redistribute it and/or modify it
- * under the terms and conditions of the GNU General Public License,
- * version 2, as published by the Free Software Foundation.
- *
- * This program is distributed in the hope it will be useful, but WITHOUT
- * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
- * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
- * more details.
- *
- * The full GNU General Public License is included in this distribution in
- * the file called "COPYING".
- *
- * Contact Information:
- * Linux NICS <linux.nics@intel.com>
- * e1000-devel Mailing List <e1000-devel@lists.sourceforge.net>
- * Intel Corporation, 5200 N.E. Elam Young Parkway, Hillsboro, OR 97124-6497
- */
+/* Copyright(c) 1999 - 2018 Intel Corporation. */
#include "e1000.h"
diff --git a/drivers/net/ethernet/intel/e1000e/nvm.h b/drivers/net/ethernet/intel/e1000e/nvm.h
index 8e082028..6a30dfe 100644
--- a/drivers/net/ethernet/intel/e1000e/nvm.h
+++ b/drivers/net/ethernet/intel/e1000e/nvm.h
@@ -1,24 +1,5 @@
/* SPDX-License-Identifier: GPL-2.0 */
-/* Intel PRO/1000 Linux driver
- * Copyright(c) 1999 - 2015 Intel Corporation.
- *
- * This program is free software; you can redistribute it and/or modify it
- * under the terms and conditions of the GNU General Public License,
- * version 2, as published by the Free Software Foundation.
- *
- * This program is distributed in the hope it will be useful, but WITHOUT
- * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
- * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
- * more details.
- *
- * The full GNU General Public License is included in this distribution in
- * the file called "COPYING".
- *
- * Contact Information:
- * Linux NICS <linux.nics@intel.com>
- * e1000-devel Mailing List <e1000-devel@lists.sourceforge.net>
- * Intel Corporation, 5200 N.E. Elam Young Parkway, Hillsboro, OR 97124-6497
- */
+/* Copyright(c) 1999 - 2018 Intel Corporation. */
#ifndef _E1000E_NVM_H_
#define _E1000E_NVM_H_
diff --git a/drivers/net/ethernet/intel/e1000e/param.c b/drivers/net/ethernet/intel/e1000e/param.c
index 2def33e..098369f 100644
--- a/drivers/net/ethernet/intel/e1000e/param.c
+++ b/drivers/net/ethernet/intel/e1000e/param.c
@@ -1,24 +1,5 @@
// SPDX-License-Identifier: GPL-2.0
-/* Intel PRO/1000 Linux driver
- * Copyright(c) 1999 - 2015 Intel Corporation.
- *
- * This program is free software; you can redistribute it and/or modify it
- * under the terms and conditions of the GNU General Public License,
- * version 2, as published by the Free Software Foundation.
- *
- * This program is distributed in the hope it will be useful, but WITHOUT
- * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
- * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
- * more details.
- *
- * The full GNU General Public License is included in this distribution in
- * the file called "COPYING".
- *
- * Contact Information:
- * Linux NICS <linux.nics@intel.com>
- * e1000-devel Mailing List <e1000-devel@lists.sourceforge.net>
- * Intel Corporation, 5200 N.E. Elam Young Parkway, Hillsboro, OR 97124-6497
- */
+/* Copyright(c) 1999 - 2018 Intel Corporation. */
#include <linux/netdevice.h>
#include <linux/module.h>
diff --git a/drivers/net/ethernet/intel/e1000e/phy.c b/drivers/net/ethernet/intel/e1000e/phy.c
index b8226ed..4223301 100644
--- a/drivers/net/ethernet/intel/e1000e/phy.c
+++ b/drivers/net/ethernet/intel/e1000e/phy.c
@@ -1,24 +1,5 @@
// SPDX-License-Identifier: GPL-2.0
-/* Intel PRO/1000 Linux driver
- * Copyright(c) 1999 - 2015 Intel Corporation.
- *
- * This program is free software; you can redistribute it and/or modify it
- * under the terms and conditions of the GNU General Public License,
- * version 2, as published by the Free Software Foundation.
- *
- * This program is distributed in the hope it will be useful, but WITHOUT
- * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
- * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
- * more details.
- *
- * The full GNU General Public License is included in this distribution in
- * the file called "COPYING".
- *
- * Contact Information:
- * Linux NICS <linux.nics@intel.com>
- * e1000-devel Mailing List <e1000-devel@lists.sourceforge.net>
- * Intel Corporation, 5200 N.E. Elam Young Parkway, Hillsboro, OR 97124-6497
- */
+/* Copyright(c) 1999 - 2018 Intel Corporation. */
#include "e1000.h"
diff --git a/drivers/net/ethernet/intel/e1000e/phy.h b/drivers/net/ethernet/intel/e1000e/phy.h
index d4180b5..c48777d 100644
--- a/drivers/net/ethernet/intel/e1000e/phy.h
+++ b/drivers/net/ethernet/intel/e1000e/phy.h
@@ -1,24 +1,5 @@
/* SPDX-License-Identifier: GPL-2.0 */
-/* Intel PRO/1000 Linux driver
- * Copyright(c) 1999 - 2015 Intel Corporation.
- *
- * This program is free software; you can redistribute it and/or modify it
- * under the terms and conditions of the GNU General Public License,
- * version 2, as published by the Free Software Foundation.
- *
- * This program is distributed in the hope it will be useful, but WITHOUT
- * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
- * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
- * more details.
- *
- * The full GNU General Public License is included in this distribution in
- * the file called "COPYING".
- *
- * Contact Information:
- * Linux NICS <linux.nics@intel.com>
- * e1000-devel Mailing List <e1000-devel@lists.sourceforge.net>
- * Intel Corporation, 5200 N.E. Elam Young Parkway, Hillsboro, OR 97124-6497
- */
+/* Copyright(c) 1999 - 2018 Intel Corporation. */
#ifndef _E1000E_PHY_H_
#define _E1000E_PHY_H_
diff --git a/drivers/net/ethernet/intel/e1000e/ptp.c b/drivers/net/ethernet/intel/e1000e/ptp.c
index f941e50..37c7694 100644
--- a/drivers/net/ethernet/intel/e1000e/ptp.c
+++ b/drivers/net/ethernet/intel/e1000e/ptp.c
@@ -1,24 +1,5 @@
// SPDX-License-Identifier: GPL-2.0
-/* Intel PRO/1000 Linux driver
- * Copyright(c) 1999 - 2015 Intel Corporation.
- *
- * This program is free software; you can redistribute it and/or modify it
- * under the terms and conditions of the GNU General Public License,
- * version 2, as published by the Free Software Foundation.
- *
- * This program is distributed in the hope it will be useful, but WITHOUT
- * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
- * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
- * more details.
- *
- * The full GNU General Public License is included in this distribution in
- * the file called "COPYING".
- *
- * Contact Information:
- * Linux NICS <linux.nics@intel.com>
- * e1000-devel Mailing List <e1000-devel@lists.sourceforge.net>
- * Intel Corporation, 5200 N.E. Elam Young Parkway, Hillsboro, OR 97124-6497
- */
+/* Copyright(c) 1999 - 2018 Intel Corporation. */
/* PTP 1588 Hardware Clock (PHC)
* Derived from PTP Hardware Clock driver for Intel 82576 and 82580 (igb)
diff --git a/drivers/net/ethernet/intel/e1000e/regs.h b/drivers/net/ethernet/intel/e1000e/regs.h
index 16afc3c..47f5ca7 100644
--- a/drivers/net/ethernet/intel/e1000e/regs.h
+++ b/drivers/net/ethernet/intel/e1000e/regs.h
@@ -1,24 +1,5 @@
/* SPDX-License-Identifier: GPL-2.0 */
-/* Intel PRO/1000 Linux driver
- * Copyright(c) 1999 - 2015 Intel Corporation.
- *
- * This program is free software; you can redistribute it and/or modify it
- * under the terms and conditions of the GNU General Public License,
- * version 2, as published by the Free Software Foundation.
- *
- * This program is distributed in the hope it will be useful, but WITHOUT
- * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
- * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
- * more details.
- *
- * The full GNU General Public License is included in this distribution in
- * the file called "COPYING".
- *
- * Contact Information:
- * Linux NICS <linux.nics@intel.com>
- * e1000-devel Mailing List <e1000-devel@lists.sourceforge.net>
- * Intel Corporation, 5200 N.E. Elam Young Parkway, Hillsboro, OR 97124-6497
- */
+/* Copyright(c) 1999 - 2018 Intel Corporation. */
#ifndef _E1000E_REGS_H_
#define _E1000E_REGS_H_
diff --git a/drivers/net/ethernet/intel/fm10k/Makefile b/drivers/net/ethernet/intel/fm10k/Makefile
index 93277cb..26a9746 100644
--- a/drivers/net/ethernet/intel/fm10k/Makefile
+++ b/drivers/net/ethernet/intel/fm10k/Makefile
@@ -1,26 +1,5 @@
# SPDX-License-Identifier: GPL-2.0
-################################################################################
-#
-# Intel(R) Ethernet Switch Host Interface Driver
-# Copyright(c) 2013 - 2016 Intel Corporation.
-#
-# This program is free software; you can redistribute it and/or modify it
-# under the terms and conditions of the GNU General Public License,
-# version 2, as published by the Free Software Foundation.
-#
-# This program is distributed in the hope it will be useful, but WITHOUT
-# ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
-# FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
-# more details.
-#
-# The full GNU General Public License is included in this distribution in
-# the file called "COPYING".
-#
-# Contact Information:
-# e1000-devel Mailing List <e1000-devel@lists.sourceforge.net>
-# Intel Corporation, 5200 N.E. Elam Young Parkway, Hillsboro, OR 97124-6497
-#
-################################################################################
+# Copyright(c) 2013 - 2018 Intel Corporation.
#
# Makefile for the Intel(R) Ethernet Switch Host Interface Driver
diff --git a/drivers/net/ethernet/intel/fm10k/fm10k.h b/drivers/net/ethernet/intel/fm10k/fm10k.h
index a9cdf76..a903a0b 100644
--- a/drivers/net/ethernet/intel/fm10k/fm10k.h
+++ b/drivers/net/ethernet/intel/fm10k/fm10k.h
@@ -1,23 +1,5 @@
/* SPDX-License-Identifier: GPL-2.0 */
-/* Intel(R) Ethernet Switch Host Interface Driver
- * Copyright(c) 2013 - 2017 Intel Corporation.
- *
- * This program is free software; you can redistribute it and/or modify it
- * under the terms and conditions of the GNU General Public License,
- * version 2, as published by the Free Software Foundation.
- *
- * This program is distributed in the hope it will be useful, but WITHOUT
- * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
- * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
- * more details.
- *
- * The full GNU General Public License is included in this distribution in
- * the file called "COPYING".
- *
- * Contact Information:
- * e1000-devel Mailing List <e1000-devel@lists.sourceforge.net>
- * Intel Corporation, 5200 N.E. Elam Young Parkway, Hillsboro, OR 97124-6497
- */
+/* Copyright(c) 2013 - 2018 Intel Corporation. */
#ifndef _FM10K_H_
#define _FM10K_H_
diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_common.c b/drivers/net/ethernet/intel/fm10k/fm10k_common.c
index e303d88..f51a63f 100644
--- a/drivers/net/ethernet/intel/fm10k/fm10k_common.c
+++ b/drivers/net/ethernet/intel/fm10k/fm10k_common.c
@@ -1,23 +1,5 @@
// SPDX-License-Identifier: GPL-2.0
-/* Intel(R) Ethernet Switch Host Interface Driver
- * Copyright(c) 2013 - 2018 Intel Corporation.
- *
- * This program is free software; you can redistribute it and/or modify it
- * under the terms and conditions of the GNU General Public License,
- * version 2, as published by the Free Software Foundation.
- *
- * This program is distributed in the hope it will be useful, but WITHOUT
- * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
- * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
- * more details.
- *
- * The full GNU General Public License is included in this distribution in
- * the file called "COPYING".
- *
- * Contact Information:
- * e1000-devel Mailing List <e1000-devel@lists.sourceforge.net>
- * Intel Corporation, 5200 N.E. Elam Young Parkway, Hillsboro, OR 97124-6497
- */
+/* Copyright(c) 2013 - 2018 Intel Corporation. */
#include "fm10k_common.h"
diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_common.h b/drivers/net/ethernet/intel/fm10k/fm10k_common.h
index 2bdb24d..4c48fb7 100644
--- a/drivers/net/ethernet/intel/fm10k/fm10k_common.h
+++ b/drivers/net/ethernet/intel/fm10k/fm10k_common.h
@@ -1,23 +1,5 @@
/* SPDX-License-Identifier: GPL-2.0 */
-/* Intel(R) Ethernet Switch Host Interface Driver
- * Copyright(c) 2013 - 2016 Intel Corporation.
- *
- * This program is free software; you can redistribute it and/or modify it
- * under the terms and conditions of the GNU General Public License,
- * version 2, as published by the Free Software Foundation.
- *
- * This program is distributed in the hope it will be useful, but WITHOUT
- * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
- * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
- * more details.
- *
- * The full GNU General Public License is included in this distribution in
- * the file called "COPYING".
- *
- * Contact Information:
- * e1000-devel Mailing List <e1000-devel@lists.sourceforge.net>
- * Intel Corporation, 5200 N.E. Elam Young Parkway, Hillsboro, OR 97124-6497
- */
+/* Copyright(c) 2013 - 2018 Intel Corporation. */
#ifndef _FM10K_COMMON_H_
#define _FM10K_COMMON_H_
diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_dcbnl.c b/drivers/net/ethernet/intel/fm10k/fm10k_dcbnl.c
index c4f7334..20768ac 100644
--- a/drivers/net/ethernet/intel/fm10k/fm10k_dcbnl.c
+++ b/drivers/net/ethernet/intel/fm10k/fm10k_dcbnl.c
@@ -1,23 +1,5 @@
// SPDX-License-Identifier: GPL-2.0
-/* Intel(R) Ethernet Switch Host Interface Driver
- * Copyright(c) 2013 - 2016 Intel Corporation.
- *
- * This program is free software; you can redistribute it and/or modify it
- * under the terms and conditions of the GNU General Public License,
- * version 2, as published by the Free Software Foundation.
- *
- * This program is distributed in the hope it will be useful, but WITHOUT
- * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
- * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
- * more details.
- *
- * The full GNU General Public License is included in this distribution in
- * the file called "COPYING".
- *
- * Contact Information:
- * e1000-devel Mailing List <e1000-devel@lists.sourceforge.net>
- * Intel Corporation, 5200 N.E. Elam Young Parkway, Hillsboro, OR 97124-6497
- */
+/* Copyright(c) 2013 - 2018 Intel Corporation. */
#include "fm10k.h"
diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_debugfs.c b/drivers/net/ethernet/intel/fm10k/fm10k_debugfs.c
index 43e8d83..dca1041 100644
--- a/drivers/net/ethernet/intel/fm10k/fm10k_debugfs.c
+++ b/drivers/net/ethernet/intel/fm10k/fm10k_debugfs.c
@@ -1,23 +1,5 @@
// SPDX-License-Identifier: GPL-2.0
-/* Intel(R) Ethernet Switch Host Interface Driver
- * Copyright(c) 2013 - 2016 Intel Corporation.
- *
- * This program is free software; you can redistribute it and/or modify it
- * under the terms and conditions of the GNU General Public License,
- * version 2, as published by the Free Software Foundation.
- *
- * This program is distributed in the hope it will be useful, but WITHOUT
- * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
- * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
- * more details.
- *
- * The full GNU General Public License is included in this distribution in
- * the file called "COPYING".
- *
- * Contact Information:
- * e1000-devel Mailing List <e1000-devel@lists.sourceforge.net>
- * Intel Corporation, 5200 N.E. Elam Young Parkway, Hillsboro, OR 97124-6497
- */
+/* Copyright(c) 2013 - 2018 Intel Corporation. */
#include "fm10k.h"
diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_ethtool.c b/drivers/net/ethernet/intel/fm10k/fm10k_ethtool.c
index 28b6b4e..7657daa 100644
--- a/drivers/net/ethernet/intel/fm10k/fm10k_ethtool.c
+++ b/drivers/net/ethernet/intel/fm10k/fm10k_ethtool.c
@@ -1,40 +1,32 @@
// SPDX-License-Identifier: GPL-2.0
-/* Intel(R) Ethernet Switch Host Interface Driver
- * Copyright(c) 2013 - 2017 Intel Corporation.
- *
- * This program is free software; you can redistribute it and/or modify it
- * under the terms and conditions of the GNU General Public License,
- * version 2, as published by the Free Software Foundation.
- *
- * This program is distributed in the hope it will be useful, but WITHOUT
- * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
- * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
- * more details.
- *
- * The full GNU General Public License is included in this distribution in
- * the file called "COPYING".
- *
- * Contact Information:
- * e1000-devel Mailing List <e1000-devel@lists.sourceforge.net>
- * Intel Corporation, 5200 N.E. Elam Young Parkway, Hillsboro, OR 97124-6497
- */
+/* Copyright(c) 2013 - 2018 Intel Corporation. */
#include <linux/vmalloc.h>
#include "fm10k.h"
struct fm10k_stats {
+ /* The stat_string is expected to be a format string formatted using
+ * vsnprintf by fm10k_add_stat_strings. Every member of a stats array
+ * should use the same format specifiers as they will be formatted
+ * using the same variadic arguments.
+ */
char stat_string[ETH_GSTRING_LEN];
int sizeof_stat;
int stat_offset;
};
-#define FM10K_NETDEV_STAT(_net_stat) { \
- .stat_string = #_net_stat, \
- .sizeof_stat = FIELD_SIZEOF(struct net_device_stats, _net_stat), \
- .stat_offset = offsetof(struct net_device_stats, _net_stat) \
+#define FM10K_STAT_FIELDS(_type, _name, _stat) { \
+ .stat_string = _name, \
+ .sizeof_stat = FIELD_SIZEOF(_type, _stat), \
+ .stat_offset = offsetof(_type, _stat) \
}
+/* netdevice statistics */
+#define FM10K_NETDEV_STAT(_net_stat) \
+ FM10K_STAT_FIELDS(struct net_device_stats, __stringify(_net_stat), \
+ _net_stat)
+
static const struct fm10k_stats fm10k_gstrings_net_stats[] = {
FM10K_NETDEV_STAT(tx_packets),
FM10K_NETDEV_STAT(tx_bytes),
@@ -52,11 +44,9 @@
#define FM10K_NETDEV_STATS_LEN ARRAY_SIZE(fm10k_gstrings_net_stats)
-#define FM10K_STAT(_name, _stat) { \
- .stat_string = _name, \
- .sizeof_stat = FIELD_SIZEOF(struct fm10k_intfc, _stat), \
- .stat_offset = offsetof(struct fm10k_intfc, _stat) \
-}
+/* General interface statistics */
+#define FM10K_STAT(_name, _stat) \
+ FM10K_STAT_FIELDS(struct fm10k_intfc, _name, _stat)
static const struct fm10k_stats fm10k_gstrings_global_stats[] = {
FM10K_STAT("tx_restart_queue", restart_queue),
@@ -93,11 +83,9 @@
FM10K_STAT("nodesc_drop", stats.nodesc_drop.count),
};
-#define FM10K_MBX_STAT(_name, _stat) { \
- .stat_string = _name, \
- .sizeof_stat = FIELD_SIZEOF(struct fm10k_mbx_info, _stat), \
- .stat_offset = offsetof(struct fm10k_mbx_info, _stat) \
-}
+/* mailbox statistics */
+#define FM10K_MBX_STAT(_name, _stat) \
+ FM10K_STAT_FIELDS(struct fm10k_mbx_info, _name, _stat)
static const struct fm10k_stats fm10k_gstrings_mbx_stats[] = {
FM10K_MBX_STAT("mbx_tx_busy", tx_busy),
@@ -111,15 +99,13 @@
FM10K_MBX_STAT("mbx_rx_mbmem_pushed", rx_mbmem_pushed),
};
-#define FM10K_QUEUE_STAT(_name, _stat) { \
- .stat_string = _name, \
- .sizeof_stat = FIELD_SIZEOF(struct fm10k_ring, _stat), \
- .stat_offset = offsetof(struct fm10k_ring, _stat) \
-}
+/* per-queue ring statistics */
+#define FM10K_QUEUE_STAT(_name, _stat) \
+ FM10K_STAT_FIELDS(struct fm10k_ring, _name, _stat)
static const struct fm10k_stats fm10k_gstrings_queue_stats[] = {
- FM10K_QUEUE_STAT("packets", stats.packets),
- FM10K_QUEUE_STAT("bytes", stats.bytes),
+ FM10K_QUEUE_STAT("%s_queue_%u_packets", stats.packets),
+ FM10K_QUEUE_STAT("%s_queue_%u_bytes", stats.bytes),
};
#define FM10K_GLOBAL_STATS_LEN ARRAY_SIZE(fm10k_gstrings_global_stats)
@@ -149,49 +135,44 @@
static const char fm10k_prv_flags[FM10K_PRV_FLAG_LEN][ETH_GSTRING_LEN] = {
};
-static void fm10k_add_stat_strings(u8 **p, const char *prefix,
- const struct fm10k_stats stats[],
- const unsigned int size)
+static void __fm10k_add_stat_strings(u8 **p, const struct fm10k_stats stats[],
+ const unsigned int size, ...)
{
unsigned int i;
for (i = 0; i < size; i++) {
- snprintf(*p, ETH_GSTRING_LEN, "%s%s",
- prefix, stats[i].stat_string);
+ va_list args;
+
+ va_start(args, size);
+ vsnprintf(*p, ETH_GSTRING_LEN, stats[i].stat_string, args);
*p += ETH_GSTRING_LEN;
+ va_end(args);
}
}
+#define fm10k_add_stat_strings(p, stats, ...) \
+ __fm10k_add_stat_strings(p, stats, ARRAY_SIZE(stats), ## __VA_ARGS__)
+
static void fm10k_get_stat_strings(struct net_device *dev, u8 *data)
{
struct fm10k_intfc *interface = netdev_priv(dev);
unsigned int i;
- fm10k_add_stat_strings(&data, "", fm10k_gstrings_net_stats,
- FM10K_NETDEV_STATS_LEN);
+ fm10k_add_stat_strings(&data, fm10k_gstrings_net_stats);
- fm10k_add_stat_strings(&data, "", fm10k_gstrings_global_stats,
- FM10K_GLOBAL_STATS_LEN);
+ fm10k_add_stat_strings(&data, fm10k_gstrings_global_stats);
- fm10k_add_stat_strings(&data, "", fm10k_gstrings_mbx_stats,
- FM10K_MBX_STATS_LEN);
+ fm10k_add_stat_strings(&data, fm10k_gstrings_mbx_stats);
if (interface->hw.mac.type != fm10k_mac_vf)
- fm10k_add_stat_strings(&data, "", fm10k_gstrings_pf_stats,
- FM10K_PF_STATS_LEN);
+ fm10k_add_stat_strings(&data, fm10k_gstrings_pf_stats);
for (i = 0; i < interface->hw.mac.max_queues; i++) {
- char prefix[ETH_GSTRING_LEN];
+ fm10k_add_stat_strings(&data, fm10k_gstrings_queue_stats,
+ "tx", i);
- snprintf(prefix, ETH_GSTRING_LEN, "tx_queue_%u_", i);
- fm10k_add_stat_strings(&data, prefix,
- fm10k_gstrings_queue_stats,
- FM10K_QUEUE_STATS_LEN);
-
- snprintf(prefix, ETH_GSTRING_LEN, "rx_queue_%u_", i);
- fm10k_add_stat_strings(&data, prefix,
- fm10k_gstrings_queue_stats,
- FM10K_QUEUE_STATS_LEN);
+ fm10k_add_stat_strings(&data, fm10k_gstrings_queue_stats,
+ "rx", i);
}
}
@@ -236,9 +217,9 @@
}
}
-static void fm10k_add_ethtool_stats(u64 **data, void *pointer,
- const struct fm10k_stats stats[],
- const unsigned int size)
+static void __fm10k_add_ethtool_stats(u64 **data, void *pointer,
+ const struct fm10k_stats stats[],
+ const unsigned int size)
{
unsigned int i;
char *p;
@@ -267,11 +248,16 @@
*((*data)++) = *(u8 *)p;
break;
default:
+ WARN_ONCE(1, "unexpected stat size for %s",
+ stats[i].stat_string);
*((*data)++) = 0;
}
}
}
+#define fm10k_add_ethtool_stats(data, pointer, stats) \
+ __fm10k_add_ethtool_stats(data, pointer, stats, ARRAY_SIZE(stats))
+
static void fm10k_get_ethtool_stats(struct net_device *netdev,
struct ethtool_stats __always_unused *stats,
u64 *data)
@@ -282,20 +268,16 @@
fm10k_update_stats(interface);
- fm10k_add_ethtool_stats(&data, net_stats, fm10k_gstrings_net_stats,
- FM10K_NETDEV_STATS_LEN);
+ fm10k_add_ethtool_stats(&data, net_stats, fm10k_gstrings_net_stats);
- fm10k_add_ethtool_stats(&data, interface, fm10k_gstrings_global_stats,
- FM10K_GLOBAL_STATS_LEN);
+ fm10k_add_ethtool_stats(&data, interface, fm10k_gstrings_global_stats);
fm10k_add_ethtool_stats(&data, &interface->hw.mbx,
- fm10k_gstrings_mbx_stats,
- FM10K_MBX_STATS_LEN);
+ fm10k_gstrings_mbx_stats);
if (interface->hw.mac.type != fm10k_mac_vf) {
fm10k_add_ethtool_stats(&data, interface,
- fm10k_gstrings_pf_stats,
- FM10K_PF_STATS_LEN);
+ fm10k_gstrings_pf_stats);
}
for (i = 0; i < interface->hw.mac.max_queues; i++) {
@@ -303,13 +285,11 @@
ring = interface->tx_ring[i];
fm10k_add_ethtool_stats(&data, ring,
- fm10k_gstrings_queue_stats,
- FM10K_QUEUE_STATS_LEN);
+ fm10k_gstrings_queue_stats);
ring = interface->rx_ring[i];
fm10k_add_ethtool_stats(&data, ring,
- fm10k_gstrings_queue_stats,
- FM10K_QUEUE_STATS_LEN);
+ fm10k_gstrings_queue_stats);
}
}
diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_iov.c b/drivers/net/ethernet/intel/fm10k/fm10k_iov.c
index 30395f5..e707d71 100644
--- a/drivers/net/ethernet/intel/fm10k/fm10k_iov.c
+++ b/drivers/net/ethernet/intel/fm10k/fm10k_iov.c
@@ -1,23 +1,5 @@
// SPDX-License-Identifier: GPL-2.0
-/* Intel(R) Ethernet Switch Host Interface Driver
- * Copyright(c) 2013 - 2017 Intel Corporation.
- *
- * This program is free software; you can redistribute it and/or modify it
- * under the terms and conditions of the GNU General Public License,
- * version 2, as published by the Free Software Foundation.
- *
- * This program is distributed in the hope it will be useful, but WITHOUT
- * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
- * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
- * more details.
- *
- * The full GNU General Public License is included in this distribution in
- * the file called "COPYING".
- *
- * Contact Information:
- * e1000-devel Mailing List <e1000-devel@lists.sourceforge.net>
- * Intel Corporation, 5200 N.E. Elam Young Parkway, Hillsboro, OR 97124-6497
- */
+/* Copyright(c) 2013 - 2018 Intel Corporation. */
#include "fm10k.h"
#include "fm10k_vf.h"
diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_main.c b/drivers/net/ethernet/intel/fm10k/fm10k_main.c
index df86070..3f53654 100644
--- a/drivers/net/ethernet/intel/fm10k/fm10k_main.c
+++ b/drivers/net/ethernet/intel/fm10k/fm10k_main.c
@@ -1,23 +1,5 @@
// SPDX-License-Identifier: GPL-2.0
-/* Intel(R) Ethernet Switch Host Interface Driver
- * Copyright(c) 2013 - 2017 Intel Corporation.
- *
- * This program is free software; you can redistribute it and/or modify it
- * under the terms and conditions of the GNU General Public License,
- * version 2, as published by the Free Software Foundation.
- *
- * This program is distributed in the hope it will be useful, but WITHOUT
- * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
- * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
- * more details.
- *
- * The full GNU General Public License is included in this distribution in
- * the file called "COPYING".
- *
- * Contact Information:
- * e1000-devel Mailing List <e1000-devel@lists.sourceforge.net>
- * Intel Corporation, 5200 N.E. Elam Young Parkway, Hillsboro, OR 97124-6497
- */
+/* Copyright(c) 2013 - 2018 Intel Corporation. */
#include <linux/types.h>
#include <linux/module.h>
@@ -445,15 +427,14 @@
l2_accel = NULL;
}
- skb->protocol = eth_type_trans(skb, dev);
-
/* Record Rx queue, or update macvlan statistics */
if (!l2_accel)
skb_record_rx_queue(skb, rx_ring->queue_index);
else
macvlan_count_rx(netdev_priv(dev), skb->len + ETH_HLEN, true,
- (skb->pkt_type == PACKET_BROADCAST) ||
- (skb->pkt_type == PACKET_MULTICAST));
+ false);
+
+ skb->protocol = eth_type_trans(skb, dev);
}
/**
diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_mbx.c b/drivers/net/ethernet/intel/fm10k/fm10k_mbx.c
index c01bf30..21021fe4 100644
--- a/drivers/net/ethernet/intel/fm10k/fm10k_mbx.c
+++ b/drivers/net/ethernet/intel/fm10k/fm10k_mbx.c
@@ -1,23 +1,5 @@
// SPDX-License-Identifier: GPL-2.0
-/* Intel(R) Ethernet Switch Host Interface Driver
- * Copyright(c) 2013 - 2017 Intel Corporation.
- *
- * This program is free software; you can redistribute it and/or modify it
- * under the terms and conditions of the GNU General Public License,
- * version 2, as published by the Free Software Foundation.
- *
- * This program is distributed in the hope it will be useful, but WITHOUT
- * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
- * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
- * more details.
- *
- * The full GNU General Public License is included in this distribution in
- * the file called "COPYING".
- *
- * Contact Information:
- * e1000-devel Mailing List <e1000-devel@lists.sourceforge.net>
- * Intel Corporation, 5200 N.E. Elam Young Parkway, Hillsboro, OR 97124-6497
- */
+/* Copyright(c) 2013 - 2018 Intel Corporation. */
#include "fm10k_common.h"
diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_mbx.h b/drivers/net/ethernet/intel/fm10k/fm10k_mbx.h
index 007e1df..56d1abf 100644
--- a/drivers/net/ethernet/intel/fm10k/fm10k_mbx.h
+++ b/drivers/net/ethernet/intel/fm10k/fm10k_mbx.h
@@ -1,23 +1,5 @@
/* SPDX-License-Identifier: GPL-2.0 */
-/* Intel(R) Ethernet Switch Host Interface Driver
- * Copyright(c) 2013 - 2016 Intel Corporation.
- *
- * This program is free software; you can redistribute it and/or modify it
- * under the terms and conditions of the GNU General Public License,
- * version 2, as published by the Free Software Foundation.
- *
- * This program is distributed in the hope it will be useful, but WITHOUT
- * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
- * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
- * more details.
- *
- * The full GNU General Public License is included in this distribution in
- * the file called "COPYING".
- *
- * Contact Information:
- * e1000-devel Mailing List <e1000-devel@lists.sourceforge.net>
- * Intel Corporation, 5200 N.E. Elam Young Parkway, Hillsboro, OR 97124-6497
- */
+/* Copyright(c) 2013 - 2018 Intel Corporation. */
#ifndef _FM10K_MBX_H_
#define _FM10K_MBX_H_
diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_netdev.c b/drivers/net/ethernet/intel/fm10k/fm10k_netdev.c
index 4579349..929f538 100644
--- a/drivers/net/ethernet/intel/fm10k/fm10k_netdev.c
+++ b/drivers/net/ethernet/intel/fm10k/fm10k_netdev.c
@@ -1,27 +1,10 @@
// SPDX-License-Identifier: GPL-2.0
-/* Intel(R) Ethernet Switch Host Interface Driver
- * Copyright(c) 2013 - 2018 Intel Corporation.
- *
- * This program is free software; you can redistribute it and/or modify it
- * under the terms and conditions of the GNU General Public License,
- * version 2, as published by the Free Software Foundation.
- *
- * This program is distributed in the hope it will be useful, but WITHOUT
- * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
- * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
- * more details.
- *
- * The full GNU General Public License is included in this distribution in
- * the file called "COPYING".
- *
- * Contact Information:
- * e1000-devel Mailing List <e1000-devel@lists.sourceforge.net>
- * Intel Corporation, 5200 N.E. Elam Young Parkway, Hillsboro, OR 97124-6497
- */
+/* Copyright(c) 2013 - 2018 Intel Corporation. */
#include "fm10k.h"
#include <linux/vmalloc.h>
#include <net/udp_tunnel.h>
+#include <linux/if_macvlan.h>
/**
* fm10k_setup_tx_resources - allocate Tx resources (Descriptors)
@@ -924,7 +907,9 @@
static int fm10k_update_vid(struct net_device *netdev, u16 vid, bool set)
{
struct fm10k_intfc *interface = netdev_priv(netdev);
+ struct fm10k_l2_accel *l2_accel = interface->l2_accel;
struct fm10k_hw *hw = &interface->hw;
+ u16 glort;
s32 err;
int i;
@@ -992,6 +977,22 @@
if (err)
goto err_out;
+ /* Update L2 accelerated macvlan addresses */
+ if (l2_accel) {
+ for (i = 0; i < l2_accel->size; i++) {
+ struct net_device *sdev = l2_accel->macvlan[i];
+
+ if (!sdev)
+ continue;
+
+ glort = l2_accel->dglort + 1 + i;
+
+ fm10k_queue_mac_request(interface, glort,
+ sdev->dev_addr,
+ vid, set);
+ }
+ }
+
/* set VLAN ID prior to syncing/unsyncing the VLAN */
interface->vid = vid + (set ? VLAN_N_VID : 0);
@@ -1231,6 +1232,22 @@
fm10k_queue_mac_request(interface, glort,
hw->mac.addr, vid, true);
+
+ /* synchronize macvlan addresses */
+ if (l2_accel) {
+ for (i = 0; i < l2_accel->size; i++) {
+ struct net_device *sdev = l2_accel->macvlan[i];
+
+ if (!sdev)
+ continue;
+
+ glort = l2_accel->dglort + 1 + i;
+
+ fm10k_queue_mac_request(interface, glort,
+ sdev->dev_addr,
+ vid, true);
+ }
+ }
}
/* update xcast mode before synchronizing addresses if host's mailbox
@@ -1254,7 +1271,7 @@
glort = l2_accel->dglort + 1 + i;
hw->mac.ops.update_xcast_mode(hw, glort,
- FM10K_XCAST_MODE_MULTI);
+ FM10K_XCAST_MODE_NONE);
fm10k_queue_mac_request(interface, glort,
sdev->dev_addr,
hw->mac.default_vid, true);
@@ -1447,7 +1464,14 @@
struct fm10k_dglort_cfg dglort = { 0 };
struct fm10k_hw *hw = &interface->hw;
int size = 0, i;
- u16 glort;
+ u16 vid, glort;
+
+ /* The hardware supported by fm10k only filters on the destination MAC
+ * address. In order to avoid issues we only support offloading modes
+ * where the hardware can actually provide the functionality.
+ */
+ if (!macvlan_supports_dest_filter(sdev))
+ return ERR_PTR(-EMEDIUMTYPE);
/* allocate l2 accel structure if it is not available */
if (!l2_accel) {
@@ -1513,12 +1537,18 @@
glort = l2_accel->dglort + 1 + i;
- if (fm10k_host_mbx_ready(interface)) {
+ if (fm10k_host_mbx_ready(interface))
hw->mac.ops.update_xcast_mode(hw, glort,
- FM10K_XCAST_MODE_MULTI);
+ FM10K_XCAST_MODE_NONE);
+
+ fm10k_queue_mac_request(interface, glort, sdev->dev_addr,
+ hw->mac.default_vid, true);
+
+ for (vid = fm10k_find_next_vlan(interface, 0);
+ vid < VLAN_N_VID;
+ vid = fm10k_find_next_vlan(interface, vid))
fm10k_queue_mac_request(interface, glort, sdev->dev_addr,
- hw->mac.default_vid, true);
- }
+ vid, true);
fm10k_mbx_unlock(interface);
@@ -1532,8 +1562,8 @@
struct fm10k_dglort_cfg dglort = { 0 };
struct fm10k_hw *hw = &interface->hw;
struct net_device *sdev = priv;
+ u16 vid, glort;
int i;
- u16 glort;
if (!l2_accel)
return;
@@ -1553,12 +1583,18 @@
glort = l2_accel->dglort + 1 + i;
- if (fm10k_host_mbx_ready(interface)) {
+ if (fm10k_host_mbx_ready(interface))
hw->mac.ops.update_xcast_mode(hw, glort,
FM10K_XCAST_MODE_NONE);
+
+ fm10k_queue_mac_request(interface, glort, sdev->dev_addr,
+ hw->mac.default_vid, false);
+
+ for (vid = fm10k_find_next_vlan(interface, 0);
+ vid < VLAN_N_VID;
+ vid = fm10k_find_next_vlan(interface, vid))
fm10k_queue_mac_request(interface, glort, sdev->dev_addr,
- hw->mac.default_vid, false);
- }
+ vid, false);
fm10k_mbx_unlock(interface);
diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_pci.c b/drivers/net/ethernet/intel/fm10k/fm10k_pci.c
index c4a2b68..15071e4 100644
--- a/drivers/net/ethernet/intel/fm10k/fm10k_pci.c
+++ b/drivers/net/ethernet/intel/fm10k/fm10k_pci.c
@@ -1,23 +1,5 @@
// SPDX-License-Identifier: GPL-2.0
-/* Intel(R) Ethernet Switch Host Interface Driver
- * Copyright(c) 2013 - 2018 Intel Corporation.
- *
- * This program is free software; you can redistribute it and/or modify it
- * under the terms and conditions of the GNU General Public License,
- * version 2, as published by the Free Software Foundation.
- *
- * This program is distributed in the hope it will be useful, but WITHOUT
- * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
- * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
- * more details.
- *
- * The full GNU General Public License is included in this distribution in
- * the file called "COPYING".
- *
- * Contact Information:
- * e1000-devel Mailing List <e1000-devel@lists.sourceforge.net>
- * Intel Corporation, 5200 N.E. Elam Young Parkway, Hillsboro, OR 97124-6497
- */
+/* Copyright(c) 2013 - 2018 Intel Corporation. */
#include <linux/module.h>
#include <linux/interrupt.h>
diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_pf.c b/drivers/net/ethernet/intel/fm10k/fm10k_pf.c
index 7ba54c5..8f0a99b6 100644
--- a/drivers/net/ethernet/intel/fm10k/fm10k_pf.c
+++ b/drivers/net/ethernet/intel/fm10k/fm10k_pf.c
@@ -1,23 +1,5 @@
// SPDX-License-Identifier: GPL-2.0
-/* Intel(R) Ethernet Switch Host Interface Driver
- * Copyright(c) 2013 - 2018 Intel Corporation.
- *
- * This program is free software; you can redistribute it and/or modify it
- * under the terms and conditions of the GNU General Public License,
- * version 2, as published by the Free Software Foundation.
- *
- * This program is distributed in the hope it will be useful, but WITHOUT
- * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
- * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
- * more details.
- *
- * The full GNU General Public License is included in this distribution in
- * the file called "COPYING".
- *
- * Contact Information:
- * e1000-devel Mailing List <e1000-devel@lists.sourceforge.net>
- * Intel Corporation, 5200 N.E. Elam Young Parkway, Hillsboro, OR 97124-6497
- */
+/* Copyright(c) 2013 - 2018 Intel Corporation. */
#include "fm10k_pf.h"
#include "fm10k_vf.h"
diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_pf.h b/drivers/net/ethernet/intel/fm10k/fm10k_pf.h
index ae81f9a..8e814df 100644
--- a/drivers/net/ethernet/intel/fm10k/fm10k_pf.h
+++ b/drivers/net/ethernet/intel/fm10k/fm10k_pf.h
@@ -1,23 +1,5 @@
/* SPDX-License-Identifier: GPL-2.0 */
-/* Intel(R) Ethernet Switch Host Interface Driver
- * Copyright(c) 2013 - 2017 Intel Corporation.
- *
- * This program is free software; you can redistribute it and/or modify it
- * under the terms and conditions of the GNU General Public License,
- * version 2, as published by the Free Software Foundation.
- *
- * This program is distributed in the hope it will be useful, but WITHOUT
- * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
- * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
- * more details.
- *
- * The full GNU General Public License is included in this distribution in
- * the file called "COPYING".
- *
- * Contact Information:
- * e1000-devel Mailing List <e1000-devel@lists.sourceforge.net>
- * Intel Corporation, 5200 N.E. Elam Young Parkway, Hillsboro, OR 97124-6497
- */
+/* Copyright(c) 2013 - 2018 Intel Corporation. */
#ifndef _FM10K_PF_H_
#define _FM10K_PF_H_
diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_tlv.c b/drivers/net/ethernet/intel/fm10k/fm10k_tlv.c
index 725ecb7..2a7a40b 100644
--- a/drivers/net/ethernet/intel/fm10k/fm10k_tlv.c
+++ b/drivers/net/ethernet/intel/fm10k/fm10k_tlv.c
@@ -1,23 +1,5 @@
// SPDX-License-Identifier: GPL-2.0
-/* Intel(R) Ethernet Switch Host Interface Driver
- * Copyright(c) 2013 - 2018 Intel Corporation.
- *
- * This program is free software; you can redistribute it and/or modify it
- * under the terms and conditions of the GNU General Public License,
- * version 2, as published by the Free Software Foundation.
- *
- * This program is distributed in the hope it will be useful, but WITHOUT
- * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
- * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
- * more details.
- *
- * The full GNU General Public License is included in this distribution in
- * the file called "COPYING".
- *
- * Contact Information:
- * e1000-devel Mailing List <e1000-devel@lists.sourceforge.net>
- * Intel Corporation, 5200 N.E. Elam Young Parkway, Hillsboro, OR 97124-6497
- */
+/* Copyright(c) 2013 - 2018 Intel Corporation. */
#include "fm10k_tlv.h"
diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_tlv.h b/drivers/net/ethernet/intel/fm10k/fm10k_tlv.h
index 5d2ee75..160bc5b 100644
--- a/drivers/net/ethernet/intel/fm10k/fm10k_tlv.h
+++ b/drivers/net/ethernet/intel/fm10k/fm10k_tlv.h
@@ -1,23 +1,5 @@
/* SPDX-License-Identifier: GPL-2.0 */
-/* Intel(R) Ethernet Switch Host Interface Driver
- * Copyright(c) 2013 - 2016 Intel Corporation.
- *
- * This program is free software; you can redistribute it and/or modify it
- * under the terms and conditions of the GNU General Public License,
- * version 2, as published by the Free Software Foundation.
- *
- * This program is distributed in the hope it will be useful, but WITHOUT
- * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
- * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
- * more details.
- *
- * The full GNU General Public License is included in this distribution in
- * the file called "COPYING".
- *
- * Contact Information:
- * e1000-devel Mailing List <e1000-devel@lists.sourceforge.net>
- * Intel Corporation, 5200 N.E. Elam Young Parkway, Hillsboro, OR 97124-6497
- */
+/* Copyright(c) 2013 - 2018 Intel Corporation. */
#ifndef _FM10K_TLV_H_
#define _FM10K_TLV_H_
diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_type.h b/drivers/net/ethernet/intel/fm10k/fm10k_type.h
index dd23af1..3e608e4 100644
--- a/drivers/net/ethernet/intel/fm10k/fm10k_type.h
+++ b/drivers/net/ethernet/intel/fm10k/fm10k_type.h
@@ -1,23 +1,5 @@
/* SPDX-License-Identifier: GPL-2.0 */
-/* Intel(R) Ethernet Switch Host Interface Driver
- * Copyright(c) 2013 - 2016 Intel Corporation.
- *
- * This program is free software; you can redistribute it and/or modify it
- * under the terms and conditions of the GNU General Public License,
- * version 2, as published by the Free Software Foundation.
- *
- * This program is distributed in the hope it will be useful, but WITHOUT
- * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
- * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
- * more details.
- *
- * The full GNU General Public License is included in this distribution in
- * the file called "COPYING".
- *
- * Contact Information:
- * e1000-devel Mailing List <e1000-devel@lists.sourceforge.net>
- * Intel Corporation, 5200 N.E. Elam Young Parkway, Hillsboro, OR 97124-6497
- */
+/* Copyright(c) 2013 - 2018 Intel Corporation. */
#ifndef _FM10K_TYPE_H_
#define _FM10K_TYPE_H_
diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_vf.c b/drivers/net/ethernet/intel/fm10k/fm10k_vf.c
index f069136..a8519c1 100644
--- a/drivers/net/ethernet/intel/fm10k/fm10k_vf.c
+++ b/drivers/net/ethernet/intel/fm10k/fm10k_vf.c
@@ -1,23 +1,5 @@
// SPDX-License-Identifier: GPL-2.0
-/* Intel(R) Ethernet Switch Host Interface Driver
- * Copyright(c) 2013 - 2016 Intel Corporation.
- *
- * This program is free software; you can redistribute it and/or modify it
- * under the terms and conditions of the GNU General Public License,
- * version 2, as published by the Free Software Foundation.
- *
- * This program is distributed in the hope it will be useful, but WITHOUT
- * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
- * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
- * more details.
- *
- * The full GNU General Public License is included in this distribution in
- * the file called "COPYING".
- *
- * Contact Information:
- * e1000-devel Mailing List <e1000-devel@lists.sourceforge.net>
- * Intel Corporation, 5200 N.E. Elam Young Parkway, Hillsboro, OR 97124-6497
- */
+/* Copyright(c) 2013 - 2018 Intel Corporation. */
#include "fm10k_vf.h"
diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_vf.h b/drivers/net/ethernet/intel/fm10k/fm10k_vf.h
index 66a66b7..787d0d5 100644
--- a/drivers/net/ethernet/intel/fm10k/fm10k_vf.h
+++ b/drivers/net/ethernet/intel/fm10k/fm10k_vf.h
@@ -1,23 +1,5 @@
/* SPDX-License-Identifier: GPL-2.0 */
-/* Intel(R) Ethernet Switch Host Interface Driver
- * Copyright(c) 2013 - 2016 Intel Corporation.
- *
- * This program is free software; you can redistribute it and/or modify it
- * under the terms and conditions of the GNU General Public License,
- * version 2, as published by the Free Software Foundation.
- *
- * This program is distributed in the hope it will be useful, but WITHOUT
- * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
- * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
- * more details.
- *
- * The full GNU General Public License is included in this distribution in
- * the file called "COPYING".
- *
- * Contact Information:
- * e1000-devel Mailing List <e1000-devel@lists.sourceforge.net>
- * Intel Corporation, 5200 N.E. Elam Young Parkway, Hillsboro, OR 97124-6497
- */
+/* Copyright(c) 2013 - 2018 Intel Corporation. */
#ifndef _FM10K_VF_H_
#define _FM10K_VF_H_
diff --git a/drivers/net/ethernet/intel/i40e/Makefile b/drivers/net/ethernet/intel/i40e/Makefile
index 7543776..14397e7 100644
--- a/drivers/net/ethernet/intel/i40e/Makefile
+++ b/drivers/net/ethernet/intel/i40e/Makefile
@@ -1,29 +1,5 @@
# SPDX-License-Identifier: GPL-2.0
-################################################################################
-#
-# Intel Ethernet Controller XL710 Family Linux Driver
-# Copyright(c) 2013 - 2015 Intel Corporation.
-#
-# This program is free software; you can redistribute it and/or modify it
-# under the terms and conditions of the GNU General Public License,
-# version 2, as published by the Free Software Foundation.
-#
-# This program is distributed in the hope it will be useful, but WITHOUT
-# ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
-# FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
-# more details.
-#
-# You should have received a copy of the GNU General Public License along
-# with this program. If not, see <http://www.gnu.org/licenses/>.
-#
-# The full GNU General Public License is included in this distribution in
-# the file called "COPYING".
-#
-# Contact Information:
-# e1000-devel Mailing List <e1000-devel@lists.sourceforge.net>
-# Intel Corporation, 5200 N.E. Elam Young Parkway, Hillsboro, OR 97124-6497
-#
-################################################################################
+# Copyright(c) 2013 - 2018 Intel Corporation.
#
# Makefile for the Intel(R) Ethernet Connection XL710 (i40e.ko) driver
diff --git a/drivers/net/ethernet/intel/i40e/i40e.h b/drivers/net/ethernet/intel/i40e/i40e.h
index a44139c..7a80652 100644
--- a/drivers/net/ethernet/intel/i40e/i40e.h
+++ b/drivers/net/ethernet/intel/i40e/i40e.h
@@ -1,29 +1,5 @@
/* SPDX-License-Identifier: GPL-2.0 */
-/*******************************************************************************
- *
- * Intel Ethernet Controller XL710 Family Linux Driver
- * Copyright(c) 2013 - 2017 Intel Corporation.
- *
- * This program is free software; you can redistribute it and/or modify it
- * under the terms and conditions of the GNU General Public License,
- * version 2, as published by the Free Software Foundation.
- *
- * This program is distributed in the hope it will be useful, but WITHOUT
- * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
- * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
- * more details.
- *
- * You should have received a copy of the GNU General Public License along
- * with this program. If not, see <http://www.gnu.org/licenses/>.
- *
- * The full GNU General Public License is included in this distribution in
- * the file called "COPYING".
- *
- * Contact Information:
- * e1000-devel Mailing List <e1000-devel@lists.sourceforge.net>
- * Intel Corporation, 5200 N.E. Elam Young Parkway, Hillsboro, OR 97124-6497
- *
- ******************************************************************************/
+/* Copyright(c) 2013 - 2018 Intel Corporation. */
#ifndef _I40E_H_
#define _I40E_H_
@@ -334,10 +310,12 @@
struct i40e_tc_info tc_info[I40E_MAX_TRAFFIC_CLASS];
};
+#define I40E_UDP_PORT_INDEX_UNUSED 255
struct i40e_udp_port_config {
/* AdminQ command interface expects port number in Host byte order */
u16 port;
u8 type;
+ u8 filter_index;
};
/* macros related to FLX_PIT */
@@ -608,7 +586,7 @@
unsigned long ptp_tx_start;
struct hwtstamp_config tstamp_config;
struct mutex tmreg_lock; /* Used to protect the SYSTIME registers. */
- u64 ptp_base_adj;
+ u32 ptp_adj_mult;
u32 tx_hwtstamp_timeouts;
u32 tx_hwtstamp_skipped;
u32 rx_hwtstamp_cleared;
@@ -1009,6 +987,9 @@
void i40e_notify_client_of_vf_msg(struct i40e_vsi *vsi, u32 vf_id,
u8 *msg, u16 len);
+int i40e_control_wait_tx_q(int seid, struct i40e_pf *pf, int pf_q, bool is_xdp,
+ bool enable);
+int i40e_control_wait_rx_q(struct i40e_pf *pf, int pf_q, bool enable);
int i40e_vsi_start_rings(struct i40e_vsi *vsi);
void i40e_vsi_stop_rings(struct i40e_vsi *vsi);
void i40e_vsi_stop_rings_no_wait(struct i40e_vsi *vsi);
diff --git a/drivers/net/ethernet/intel/i40e/i40e_adminq.c b/drivers/net/ethernet/intel/i40e/i40e_adminq.c
index 843fc77..ddbea79 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_adminq.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_adminq.c
@@ -1,29 +1,5 @@
// SPDX-License-Identifier: GPL-2.0
-/*******************************************************************************
- *
- * Intel Ethernet Controller XL710 Family Linux Driver
- * Copyright(c) 2013 - 2016 Intel Corporation.
- *
- * This program is free software; you can redistribute it and/or modify it
- * under the terms and conditions of the GNU General Public License,
- * version 2, as published by the Free Software Foundation.
- *
- * This program is distributed in the hope it will be useful, but WITHOUT
- * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
- * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
- * more details.
- *
- * You should have received a copy of the GNU General Public License along
- * with this program. If not, see <http://www.gnu.org/licenses/>.
- *
- * The full GNU General Public License is included in this distribution in
- * the file called "COPYING".
- *
- * Contact Information:
- * e1000-devel Mailing List <e1000-devel@lists.sourceforge.net>
- * Intel Corporation, 5200 N.E. Elam Young Parkway, Hillsboro, OR 97124-6497
- *
- ******************************************************************************/
+/* Copyright(c) 2013 - 2018 Intel Corporation. */
#include "i40e_status.h"
#include "i40e_type.h"
diff --git a/drivers/net/ethernet/intel/i40e/i40e_adminq.h b/drivers/net/ethernet/intel/i40e/i40e_adminq.h
index 0a8749e..edec3df 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_adminq.h
+++ b/drivers/net/ethernet/intel/i40e/i40e_adminq.h
@@ -1,29 +1,5 @@
/* SPDX-License-Identifier: GPL-2.0 */
-/*******************************************************************************
- *
- * Intel Ethernet Controller XL710 Family Linux Driver
- * Copyright(c) 2013 - 2014 Intel Corporation.
- *
- * This program is free software; you can redistribute it and/or modify it
- * under the terms and conditions of the GNU General Public License,
- * version 2, as published by the Free Software Foundation.
- *
- * This program is distributed in the hope it will be useful, but WITHOUT
- * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
- * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
- * more details.
- *
- * You should have received a copy of the GNU General Public License along
- * with this program. If not, see <http://www.gnu.org/licenses/>.
- *
- * The full GNU General Public License is included in this distribution in
- * the file called "COPYING".
- *
- * Contact Information:
- * e1000-devel Mailing List <e1000-devel@lists.sourceforge.net>
- * Intel Corporation, 5200 N.E. Elam Young Parkway, Hillsboro, OR 97124-6497
- *
- ******************************************************************************/
+/* Copyright(c) 2013 - 2018 Intel Corporation. */
#ifndef _I40E_ADMINQ_H_
#define _I40E_ADMINQ_H_
diff --git a/drivers/net/ethernet/intel/i40e/i40e_adminq_cmd.h b/drivers/net/ethernet/intel/i40e/i40e_adminq_cmd.h
index 0244923..7d888e0 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_adminq_cmd.h
+++ b/drivers/net/ethernet/intel/i40e/i40e_adminq_cmd.h
@@ -1,29 +1,5 @@
/* SPDX-License-Identifier: GPL-2.0 */
-/*******************************************************************************
- *
- * Intel Ethernet Controller XL710 Family Linux Driver
- * Copyright(c) 2013 - 2017 Intel Corporation.
- *
- * This program is free software; you can redistribute it and/or modify it
- * under the terms and conditions of the GNU General Public License,
- * version 2, as published by the Free Software Foundation.
- *
- * This program is distributed in the hope it will be useful, but WITHOUT
- * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
- * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
- * more details.
- *
- * You should have received a copy of the GNU General Public License along
- * with this program. If not, see <http://www.gnu.org/licenses/>.
- *
- * The full GNU General Public License is included in this distribution in
- * the file called "COPYING".
- *
- * Contact Information:
- * e1000-devel Mailing List <e1000-devel@lists.sourceforge.net>
- * Intel Corporation, 5200 N.E. Elam Young Parkway, Hillsboro, OR 97124-6497
- *
- ******************************************************************************/
+/* Copyright(c) 2013 - 2018 Intel Corporation. */
#ifndef _I40E_ADMINQ_CMD_H_
#define _I40E_ADMINQ_CMD_H_
diff --git a/drivers/net/ethernet/intel/i40e/i40e_alloc.h b/drivers/net/ethernet/intel/i40e/i40e_alloc.h
index abed0c5..cb86892 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_alloc.h
+++ b/drivers/net/ethernet/intel/i40e/i40e_alloc.h
@@ -1,29 +1,5 @@
/* SPDX-License-Identifier: GPL-2.0 */
-/*******************************************************************************
- *
- * Intel Ethernet Controller XL710 Family Linux Driver
- * Copyright(c) 2013 - 2014 Intel Corporation.
- *
- * This program is free software; you can redistribute it and/or modify it
- * under the terms and conditions of the GNU General Public License,
- * version 2, as published by the Free Software Foundation.
- *
- * This program is distributed in the hope it will be useful, but WITHOUT
- * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
- * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
- * more details.
- *
- * You should have received a copy of the GNU General Public License along
- * with this program. If not, see <http://www.gnu.org/licenses/>.
- *
- * The full GNU General Public License is included in this distribution in
- * the file called "COPYING".
- *
- * Contact Information:
- * e1000-devel Mailing List <e1000-devel@lists.sourceforge.net>
- * Intel Corporation, 5200 N.E. Elam Young Parkway, Hillsboro, OR 97124-6497
- *
- ******************************************************************************/
+/* Copyright(c) 2013 - 2018 Intel Corporation. */
#ifndef _I40E_ALLOC_H_
#define _I40E_ALLOC_H_
diff --git a/drivers/net/ethernet/intel/i40e/i40e_client.c b/drivers/net/ethernet/intel/i40e/i40e_client.c
index d8ce499..5f3b8b9 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_client.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_client.c
@@ -1,29 +1,5 @@
// SPDX-License-Identifier: GPL-2.0
-/*******************************************************************************
- *
- * Intel Ethernet Controller XL710 Family Linux Driver
- * Copyright(c) 2013 - 2017 Intel Corporation.
- *
- * This program is free software; you can redistribute it and/or modify it
- * under the terms and conditions of the GNU General Public License,
- * version 2, as published by the Free Software Foundation.
- *
- * This program is distributed in the hope it will be useful, but WITHOUT
- * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
- * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
- * more details.
- *
- * You should have received a copy of the GNU General Public License along
- * with this program. If not, see <http://www.gnu.org/licenses/>.
- *
- * The full GNU General Public License is included in this distribution in
- * the file called "COPYING".
- *
- * Contact Information:
- * e1000-devel Mailing List <e1000-devel@lists.sourceforge.net>
- * Intel Corporation, 5200 N.E. Elam Young Parkway, Hillsboro, OR 97124-6497
- *
- ******************************************************************************/
+/* Copyright(c) 2013 - 2018 Intel Corporation. */
#include <linux/list.h>
#include <linux/errno.h>
@@ -64,7 +40,7 @@
/**
* i40e_client_get_params - Get the params that can change at runtime
* @vsi: the VSI with the message
- * @param: clinet param struct
+ * @params: client param struct
*
**/
static
@@ -590,7 +566,7 @@
* i40e_client_setup_qvlist
* @ldev: pointer to L2 context.
* @client: Client pointer.
- * @qv_info: queue and vector list
+ * @qvlist_info: queue and vector list
*
* Return 0 on success or < 0 on error
**/
@@ -665,7 +641,7 @@
* i40e_client_request_reset
* @ldev: pointer to L2 context.
* @client: Client pointer.
- * @level: reset level
+ * @reset_level: reset level
**/
static void i40e_client_request_reset(struct i40e_info *ldev,
struct i40e_client *client,
diff --git a/drivers/net/ethernet/intel/i40e/i40e_client.h b/drivers/net/ethernet/intel/i40e/i40e_client.h
index 9d464d4..72994ba 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_client.h
+++ b/drivers/net/ethernet/intel/i40e/i40e_client.h
@@ -1,29 +1,5 @@
/* SPDX-License-Identifier: GPL-2.0 */
-/*******************************************************************************
- *
- * Intel Ethernet Controller XL710 Family Linux Driver
- * Copyright(c) 2013 - 2015 Intel Corporation.
- *
- * This program is free software; you can redistribute it and/or modify it
- * under the terms and conditions of the GNU General Public License,
- * version 2, as published by the Free Software Foundation.
- *
- * This program is distributed in the hope it will be useful, but WITHOUT
- * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
- * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
- * more details.
- *
- * You should have received a copy of the GNU General Public License along
- * with this program. If not, see <http://www.gnu.org/licenses/>.
- *
- * The full GNU General Public License is included in this distribution in
- * the file called "COPYING".
- *
- * Contact Information:
- * e1000-devel Mailing List <e1000-devel@lists.sourceforge.net>
- * Intel Corporation, 5200 N.E. Elam Young Parkway, Hillsboro, OR 97124-6497
- *
- ******************************************************************************/
+/* Copyright(c) 2013 - 2018 Intel Corporation. */
#ifndef _I40E_CLIENT_H_
#define _I40E_CLIENT_H_
diff --git a/drivers/net/ethernet/intel/i40e/i40e_common.c b/drivers/net/ethernet/intel/i40e/i40e_common.c
index c0a3dae..eb2d153 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_common.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_common.c
@@ -1,29 +1,5 @@
// SPDX-License-Identifier: GPL-2.0
-/*******************************************************************************
- *
- * Intel Ethernet Controller XL710 Family Linux Driver
- * Copyright(c) 2013 - 2016 Intel Corporation.
- *
- * This program is free software; you can redistribute it and/or modify it
- * under the terms and conditions of the GNU General Public License,
- * version 2, as published by the Free Software Foundation.
- *
- * This program is distributed in the hope it will be useful, but WITHOUT
- * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
- * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
- * more details.
- *
- * You should have received a copy of the GNU General Public License along
- * with this program. If not, see <http://www.gnu.org/licenses/>.
- *
- * The full GNU General Public License is included in this distribution in
- * the file called "COPYING".
- *
- * Contact Information:
- * e1000-devel Mailing List <e1000-devel@lists.sourceforge.net>
- * Intel Corporation, 5200 N.E. Elam Young Parkway, Hillsboro, OR 97124-6497
- *
- ******************************************************************************/
+/* Copyright(c) 2013 - 2018 Intel Corporation. */
#include "i40e_type.h"
#include "i40e_adminq.h"
@@ -1695,6 +1671,8 @@
/**
* i40e_set_fc
* @hw: pointer to the hw struct
+ * @aq_failures: buffer to return AdminQ failure information
+ * @atomic_restart: whether to enable atomic link restart
*
* Set the requested flow control mode using set_phy_config.
**/
@@ -2831,8 +2809,8 @@
* @mr_list: list of mirrored VSI SEIDs or VLAN IDs
* @cmd_details: pointer to command details structure or NULL
* @rule_id: Rule ID returned from FW
- * @rule_used: Number of rules used in internal switch
- * @rule_free: Number of rules free in internal switch
+ * @rules_used: Number of rules used in internal switch
+ * @rules_free: Number of rules free in internal switch
*
* Add/Delete a mirror rule to a specific switch. Mirror rules are supported for
* VEBs/VEPA elements only
@@ -2892,8 +2870,8 @@
* @mr_list: list of mirrored VSI SEIDs or VLAN IDs
* @cmd_details: pointer to command details structure or NULL
* @rule_id: Rule ID returned from FW
- * @rule_used: Number of rules used in internal switch
- * @rule_free: Number of rules free in internal switch
+ * @rules_used: Number of rules used in internal switch
+ * @rules_free: Number of rules free in internal switch
*
* Add mirror rule. Mirror rules are supported for VEBs or VEPA elements only
**/
@@ -2923,8 +2901,8 @@
* add_mirrorrule.
* @mr_list: list of mirrored VLAN IDs to be removed
* @cmd_details: pointer to command details structure or NULL
- * @rule_used: Number of rules used in internal switch
- * @rule_free: Number of rules free in internal switch
+ * @rules_used: Number of rules used in internal switch
+ * @rules_free: Number of rules free in internal switch
*
* Delete a mirror rule. Mirror rules are supported for VEBs/VEPA elements only
**/
@@ -3672,6 +3650,8 @@
/**
* i40e_aq_start_lldp
* @hw: pointer to the hw struct
+ * @buff: buffer for result
+ * @buff_size: buffer size
* @cmd_details: pointer to command details structure or NULL
*
* Start the embedded LLDP Agent on all ports.
@@ -3752,7 +3732,6 @@
* i40e_aq_add_udp_tunnel
* @hw: pointer to the hw struct
* @udp_port: the UDP port to add in Host byte order
- * @header_len: length of the tunneling header length in DWords
* @protocol_index: protocol index type
* @filter_index: pointer to filter index
* @cmd_details: pointer to command details structure or NULL
@@ -3971,6 +3950,7 @@
* @hw: pointer to the hw struct
* @seid: seid of the switching component connected to Physical Port
* @ets_data: Buffer holding ETS parameters
+ * @opcode: Tx scheduler AQ command opcode
* @cmd_details: pointer to command details structure or NULL
**/
i40e_status i40e_aq_config_switch_comp_ets(struct i40e_hw *hw,
@@ -4314,10 +4294,10 @@
* @hw: pointer to the hw struct
* @seid: VSI seid to add ethertype filter from
**/
-#define I40E_FLOW_CONTROL_ETHTYPE 0x8808
void i40e_add_filter_to_drop_tx_flow_control_frames(struct i40e_hw *hw,
u16 seid)
{
+#define I40E_FLOW_CONTROL_ETHTYPE 0x8808
u16 flag = I40E_AQC_ADD_CONTROL_PACKET_FLAGS_IGNORE_MAC |
I40E_AQC_ADD_CONTROL_PACKET_FLAGS_DROP |
I40E_AQC_ADD_CONTROL_PACKET_FLAGS_TX;
@@ -4448,6 +4428,7 @@
* @ret_buff_size: actual buffer size returned
* @ret_next_table: next block to read
* @ret_next_index: next index to read
+ * @cmd_details: pointer to command details structure or NULL
*
* Dump internal FW/HW data for debug purposes.
*
@@ -4574,7 +4555,7 @@
* i40e_read_phy_register_clause22
* @hw: pointer to the HW structure
* @reg: register address in the page
- * @phy_adr: PHY address on MDIO interface
+ * @phy_addr: PHY address on MDIO interface
* @value: PHY register value
*
* Reads specified PHY register value
@@ -4619,7 +4600,7 @@
* i40e_write_phy_register_clause22
* @hw: pointer to the HW structure
* @reg: register address in the page
- * @phy_adr: PHY address on MDIO interface
+ * @phy_addr: PHY address on MDIO interface
* @value: PHY register value
*
* Writes specified PHY register value
@@ -4660,7 +4641,7 @@
* @hw: pointer to the HW structure
* @page: registers page number
* @reg: register address in the page
- * @phy_adr: PHY address on MDIO interface
+ * @phy_addr: PHY address on MDIO interface
* @value: PHY register value
*
* Reads specified PHY register value
@@ -4734,7 +4715,7 @@
* @hw: pointer to the HW structure
* @page: registers page number
* @reg: register address in the page
- * @phy_adr: PHY address on MDIO interface
+ * @phy_addr: PHY address on MDIO interface
* @value: PHY register value
*
* Writes value to specified PHY register
@@ -4801,7 +4782,7 @@
* @hw: pointer to the HW structure
* @page: registers page number
* @reg: register address in the page
- * @phy_adr: PHY address on MDIO interface
+ * @phy_addr: PHY address on MDIO interface
* @value: PHY register value
*
* Writes value to specified PHY register
@@ -4837,7 +4818,7 @@
* @hw: pointer to the HW structure
* @page: registers page number
* @reg: register address in the page
- * @phy_adr: PHY address on MDIO interface
+ * @phy_addr: PHY address on MDIO interface
* @value: PHY register value
*
* Reads specified PHY register value
@@ -4872,7 +4853,6 @@
* i40e_get_phy_address
* @hw: pointer to the HW structure
* @dev_num: PHY port num that address we want
- * @phy_addr: Returned PHY address
*
* Gets PHY address for current port
**/
@@ -5082,7 +5062,9 @@
* i40e_led_set_phy
* @hw: pointer to the HW structure
* @on: true or false
+ * @led_addr: address of led register to use
* @mode: original val plus bit for set or ignore
+ *
* Set led's on or off when controlled by the PHY
*
**/
@@ -5371,6 +5353,7 @@
* @hw: pointer to the hw struct
* @buff: command buffer (size in bytes = buff_size)
* @buff_size: buffer size in bytes
+ * @flags: AdminQ command flags
* @cmd_details: pointer to command details structure or NULL
**/
enum
diff --git a/drivers/net/ethernet/intel/i40e/i40e_dcb.c b/drivers/net/ethernet/intel/i40e/i40e_dcb.c
index 9fec728..56bff8f 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_dcb.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_dcb.c
@@ -1,29 +1,5 @@
// SPDX-License-Identifier: GPL-2.0
-/*******************************************************************************
- *
- * Intel Ethernet Controller XL710 Family Linux Driver
- * Copyright(c) 2013 - 2017 Intel Corporation.
- *
- * This program is free software; you can redistribute it and/or modify it
- * under the terms and conditions of the GNU General Public License,
- * version 2, as published by the Free Software Foundation.
- *
- * This program is distributed in the hope it will be useful, but WITHOUT
- * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
- * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
- * more details.
- *
- * You should have received a copy of the GNU General Public License along
- * with this program. If not, see <http://www.gnu.org/licenses/>.
- *
- * The full GNU General Public License is included in this distribution in
- * the file called "COPYING".
- *
- * Contact Information:
- * e1000-devel Mailing List <e1000-devel@lists.sourceforge.net>
- * Intel Corporation, 5200 N.E. Elam Young Parkway, Hillsboro, OR 97124-6497
- *
- ******************************************************************************/
+/* Copyright(c) 2013 - 2018 Intel Corporation. */
#include "i40e_adminq.h"
#include "i40e_prototype.h"
@@ -945,6 +921,70 @@
}
/**
+ * _i40e_read_lldp_cfg - generic read of LLDP Configuration data from NVM
+ * @hw: pointer to the HW structure
+ * @lldp_cfg: pointer to hold lldp configuration variables
+ * @module: address of the module pointer
+ * @word_offset: offset of LLDP configuration
+ *
+ * Reads the LLDP configuration data from NVM using passed addresses
+ **/
+static i40e_status _i40e_read_lldp_cfg(struct i40e_hw *hw,
+ struct i40e_lldp_variables *lldp_cfg,
+ u8 module, u32 word_offset)
+{
+ u32 address, offset = (2 * word_offset);
+ i40e_status ret;
+ __le16 raw_mem;
+ u16 mem;
+
+ ret = i40e_acquire_nvm(hw, I40E_RESOURCE_READ);
+ if (ret)
+ return ret;
+
+ ret = i40e_aq_read_nvm(hw, 0x0, module * 2, sizeof(raw_mem), &raw_mem,
+ true, NULL);
+ i40e_release_nvm(hw);
+ if (ret)
+ return ret;
+
+ mem = le16_to_cpu(raw_mem);
+ /* Check if this pointer needs to be read in word size or 4K sector
+ * units.
+ */
+ if (mem & I40E_PTR_TYPE)
+ address = (0x7FFF & mem) * 4096;
+ else
+ address = (0x7FFF & mem) * 2;
+
+ ret = i40e_acquire_nvm(hw, I40E_RESOURCE_READ);
+ if (ret)
+ goto err_lldp_cfg;
+
+ ret = i40e_aq_read_nvm(hw, module, offset, sizeof(raw_mem), &raw_mem,
+ true, NULL);
+ i40e_release_nvm(hw);
+ if (ret)
+ return ret;
+
+ mem = le16_to_cpu(raw_mem);
+ offset = mem + word_offset;
+ offset *= 2;
+
+ ret = i40e_acquire_nvm(hw, I40E_RESOURCE_READ);
+ if (ret)
+ goto err_lldp_cfg;
+
+ ret = i40e_aq_read_nvm(hw, 0, address + offset,
+ sizeof(struct i40e_lldp_variables), lldp_cfg,
+ true, NULL);
+ i40e_release_nvm(hw);
+
+err_lldp_cfg:
+ return ret;
+}
+
+/**
* i40e_read_lldp_cfg - read LLDP Configuration data from NVM
* @hw: pointer to the HW structure
* @lldp_cfg: pointer to hold lldp configuration variables
@@ -955,21 +995,34 @@
struct i40e_lldp_variables *lldp_cfg)
{
i40e_status ret = 0;
- u32 offset = (2 * I40E_NVM_LLDP_CFG_PTR);
+ u32 mem;
if (!lldp_cfg)
return I40E_ERR_PARAM;
ret = i40e_acquire_nvm(hw, I40E_RESOURCE_READ);
if (ret)
- goto err_lldp_cfg;
+ return ret;
- ret = i40e_aq_read_nvm(hw, I40E_SR_EMP_MODULE_PTR, offset,
- sizeof(struct i40e_lldp_variables),
- (u8 *)lldp_cfg,
- true, NULL);
+ ret = i40e_aq_read_nvm(hw, I40E_SR_NVM_CONTROL_WORD, 0, sizeof(mem),
+ &mem, true, NULL);
i40e_release_nvm(hw);
+ if (ret)
+ return ret;
-err_lldp_cfg:
+ /* Read a bit that holds information whether we are running flat or
+ * structured NVM image. Flat image has LLDP configuration in shadow
+ * ram, so there is a need to pass different addresses for both cases.
+ */
+ if (mem & I40E_SR_NVM_MAP_STRUCTURE_TYPE) {
+ /* Flat NVM case */
+ ret = _i40e_read_lldp_cfg(hw, lldp_cfg, I40E_SR_EMP_MODULE_PTR,
+ I40E_SR_LLDP_CFG_PTR);
+ } else {
+ /* Good old structured NVM image */
+ ret = _i40e_read_lldp_cfg(hw, lldp_cfg, I40E_EMP_MODULE_PTR,
+ I40E_NVM_LLDP_CFG_PTR);
+ }
+
return ret;
}
diff --git a/drivers/net/ethernet/intel/i40e/i40e_dcb.h b/drivers/net/ethernet/intel/i40e/i40e_dcb.h
index 4f80638..2b748a6 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_dcb.h
+++ b/drivers/net/ethernet/intel/i40e/i40e_dcb.h
@@ -1,29 +1,5 @@
/* SPDX-License-Identifier: GPL-2.0 */
-/*******************************************************************************
- *
- * Intel Ethernet Controller XL710 Family Linux Driver
- * Copyright(c) 2013 - 2014 Intel Corporation.
- *
- * This program is free software; you can redistribute it and/or modify it
- * under the terms and conditions of the GNU General Public License,
- * version 2, as published by the Free Software Foundation.
- *
- * This program is distributed in the hope it will be useful, but WITHOUT
- * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
- * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
- * more details.
- *
- * You should have received a copy of the GNU General Public License along
- * with this program. If not, see <http://www.gnu.org/licenses/>.
- *
- * The full GNU General Public License is included in this distribution in
- * the file called "COPYING".
- *
- * Contact Information:
- * e1000-devel Mailing List <e1000-devel@lists.sourceforge.net>
- * Intel Corporation, 5200 N.E. Elam Young Parkway, Hillsboro, OR 97124-6497
- *
- ******************************************************************************/
+/* Copyright(c) 2013 - 2018 Intel Corporation. */
#ifndef _I40E_DCB_H_
#define _I40E_DCB_H_
diff --git a/drivers/net/ethernet/intel/i40e/i40e_dcb_nl.c b/drivers/net/ethernet/intel/i40e/i40e_dcb_nl.c
index 502818e3..9deae9a 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_dcb_nl.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_dcb_nl.c
@@ -1,29 +1,5 @@
// SPDX-License-Identifier: GPL-2.0
-/*******************************************************************************
- *
- * Intel Ethernet Controller XL710 Family Linux Driver
- * Copyright(c) 2013 - 2014 Intel Corporation.
- *
- * This program is free software; you can redistribute it and/or modify it
- * under the terms and conditions of the GNU General Public License,
- * version 2, as published by the Free Software Foundation.
- *
- * This program is distributed in the hope it will be useful, but WITHOUT
- * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
- * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
- * more details.
- *
- * You should have received a copy of the GNU General Public License along
- * with this program. If not, see <http://www.gnu.org/licenses/>.
- *
- * The full GNU General Public License is included in this distribution in
- * the file called "COPYING".
- *
- * Contact Information:
- * e1000-devel Mailing List <e1000-devel@lists.sourceforge.net>
- * Intel Corporation, 5200 N.E. Elam Young Parkway, Hillsboro, OR 97124-6497
- *
- ******************************************************************************/
+/* Copyright(c) 2013 - 2018 Intel Corporation. */
#ifdef CONFIG_I40E_DCB
#include "i40e.h"
@@ -47,7 +23,7 @@
/**
* i40e_dcbnl_ieee_getets - retrieve local IEEE ETS configuration
- * @netdev: the corresponding netdev
+ * @dev: the corresponding netdev
* @ets: structure to hold the ETS information
*
* Returns local IEEE ETS configuration
@@ -86,8 +62,8 @@
/**
* i40e_dcbnl_ieee_getpfc - retrieve local IEEE PFC configuration
- * @netdev: the corresponding netdev
- * @ets: structure to hold the PFC information
+ * @dev: the corresponding netdev
+ * @pfc: structure to hold the PFC information
*
* Returns local IEEE PFC configuration
**/
@@ -119,7 +95,7 @@
/**
* i40e_dcbnl_getdcbx - retrieve current DCBx capability
- * @netdev: the corresponding netdev
+ * @dev: the corresponding netdev
*
* Returns DCBx capability features
**/
@@ -132,7 +108,8 @@
/**
* i40e_dcbnl_get_perm_hw_addr - MAC address used by DCBx
- * @netdev: the corresponding netdev
+ * @dev: the corresponding netdev
+ * @perm_addr: buffer to store the MAC address
*
* Returns the SAN MAC address used for LLDP exchange
**/
diff --git a/drivers/net/ethernet/intel/i40e/i40e_debugfs.c b/drivers/net/ethernet/intel/i40e/i40e_debugfs.c
index d494dca..56b911a 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_debugfs.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_debugfs.c
@@ -1,29 +1,5 @@
// SPDX-License-Identifier: GPL-2.0
-/*******************************************************************************
- *
- * Intel Ethernet Controller XL710 Family Linux Driver
- * Copyright(c) 2013 - 2016 Intel Corporation.
- *
- * This program is free software; you can redistribute it and/or modify it
- * under the terms and conditions of the GNU General Public License,
- * version 2, as published by the Free Software Foundation.
- *
- * This program is distributed in the hope it will be useful, but WITHOUT
- * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
- * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
- * more details.
- *
- * You should have received a copy of the GNU General Public License along
- * with this program. If not, see <http://www.gnu.org/licenses/>.
- *
- * The full GNU General Public License is included in this distribution in
- * the file called "COPYING".
- *
- * Contact Information:
- * e1000-devel Mailing List <e1000-devel@lists.sourceforge.net>
- * Intel Corporation, 5200 N.E. Elam Young Parkway, Hillsboro, OR 97124-6497
- *
- ******************************************************************************/
+/* Copyright(c) 2013 - 2018 Intel Corporation. */
#ifdef CONFIG_DEBUG_FS
@@ -36,8 +12,8 @@
/**
* i40e_dbg_find_vsi - searches for the vsi with the given seid
- * @pf - the PF structure to search for the vsi
- * @seid - seid of the vsi it is searching for
+ * @pf: the PF structure to search for the vsi
+ * @seid: seid of the vsi it is searching for
**/
static struct i40e_vsi *i40e_dbg_find_vsi(struct i40e_pf *pf, int seid)
{
@@ -55,8 +31,8 @@
/**
* i40e_dbg_find_veb - searches for the veb with the given seid
- * @pf - the PF structure to search for the veb
- * @seid - seid of the veb it is searching for
+ * @pf: the PF structure to search for the veb
+ * @seid: seid of the veb it is searching for
**/
static struct i40e_veb *i40e_dbg_find_veb(struct i40e_pf *pf, int seid)
{
diff --git a/drivers/net/ethernet/intel/i40e/i40e_devids.h b/drivers/net/ethernet/intel/i40e/i40e_devids.h
index ad6a66c..334b05f 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_devids.h
+++ b/drivers/net/ethernet/intel/i40e/i40e_devids.h
@@ -1,29 +1,5 @@
/* SPDX-License-Identifier: GPL-2.0 */
-/*******************************************************************************
- *
- * Intel Ethernet Controller XL710 Family Linux Driver
- * Copyright(c) 2013 - 2015 Intel Corporation.
- *
- * This program is free software; you can redistribute it and/or modify it
- * under the terms and conditions of the GNU General Public License,
- * version 2, as published by the Free Software Foundation.
- *
- * This program is distributed in the hope it will be useful, but WITHOUT
- * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
- * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
- * more details.
- *
- * You should have received a copy of the GNU General Public License along
- * with this program. If not, see <http://www.gnu.org/licenses/>.
- *
- * The full GNU General Public License is included in this distribution in
- * the file called "COPYING".
- *
- * Contact Information:
- * e1000-devel Mailing List <e1000-devel@lists.sourceforge.net>
- * Intel Corporation, 5200 N.E. Elam Young Parkway, Hillsboro, OR 97124-6497
- *
- ******************************************************************************/
+/* Copyright(c) 2013 - 2018 Intel Corporation. */
#ifndef _I40E_DEVIDS_H_
#define _I40E_DEVIDS_H_
diff --git a/drivers/net/ethernet/intel/i40e/i40e_diag.c b/drivers/net/ethernet/intel/i40e/i40e_diag.c
index df3e604..ef4d3762 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_diag.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_diag.c
@@ -1,29 +1,5 @@
// SPDX-License-Identifier: GPL-2.0
-/*******************************************************************************
- *
- * Intel Ethernet Controller XL710 Family Linux Driver
- * Copyright(c) 2013 - 2014 Intel Corporation.
- *
- * This program is free software; you can redistribute it and/or modify it
- * under the terms and conditions of the GNU General Public License,
- * version 2, as published by the Free Software Foundation.
- *
- * This program is distributed in the hope it will be useful, but WITHOUT
- * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
- * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
- * more details.
- *
- * You should have received a copy of the GNU General Public License along
- * with this program. If not, see <http://www.gnu.org/licenses/>.
- *
- * The full GNU General Public License is included in this distribution in
- * the file called "COPYING".
- *
- * Contact Information:
- * e1000-devel Mailing List <e1000-devel@lists.sourceforge.net>
- * Intel Corporation, 5200 N.E. Elam Young Parkway, Hillsboro, OR 97124-6497
- *
- ******************************************************************************/
+/* Copyright(c) 2013 - 2018 Intel Corporation. */
#include "i40e_diag.h"
#include "i40e_prototype.h"
diff --git a/drivers/net/ethernet/intel/i40e/i40e_diag.h b/drivers/net/ethernet/intel/i40e/i40e_diag.h
index be83417..c3340f3 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_diag.h
+++ b/drivers/net/ethernet/intel/i40e/i40e_diag.h
@@ -1,29 +1,5 @@
/* SPDX-License-Identifier: GPL-2.0 */
-/*******************************************************************************
- *
- * Intel Ethernet Controller XL710 Family Linux Driver
- * Copyright(c) 2013 - 2014 Intel Corporation.
- *
- * This program is free software; you can redistribute it and/or modify it
- * under the terms and conditions of the GNU General Public License,
- * version 2, as published by the Free Software Foundation.
- *
- * This program is distributed in the hope it will be useful, but WITHOUT
- * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
- * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
- * more details.
- *
- * You should have received a copy of the GNU General Public License along
- * with this program. If not, see <http://www.gnu.org/licenses/>.
- *
- * The full GNU General Public License is included in this distribution in
- * the file called "COPYING".
- *
- * Contact Information:
- * e1000-devel Mailing List <e1000-devel@lists.sourceforge.net>
- * Intel Corporation, 5200 N.E. Elam Young Parkway, Hillsboro, OR 97124-6497
- *
- ******************************************************************************/
+/* Copyright(c) 2013 - 2018 Intel Corporation. */
#ifndef _I40E_DIAG_H_
#define _I40E_DIAG_H_
diff --git a/drivers/net/ethernet/intel/i40e/i40e_ethtool.c b/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
index b974482..6947a2a 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
@@ -1,29 +1,5 @@
// SPDX-License-Identifier: GPL-2.0
-/*******************************************************************************
- *
- * Intel Ethernet Controller XL710 Family Linux Driver
- * Copyright(c) 2013 - 2016 Intel Corporation.
- *
- * This program is free software; you can redistribute it and/or modify it
- * under the terms and conditions of the GNU General Public License,
- * version 2, as published by the Free Software Foundation.
- *
- * This program is distributed in the hope it will be useful, but WITHOUT
- * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
- * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
- * more details.
- *
- * You should have received a copy of the GNU General Public License along
- * with this program. If not, see <http://www.gnu.org/licenses/>.
- *
- * The full GNU General Public License is included in this distribution in
- * the file called "COPYING".
- *
- * Contact Information:
- * e1000-devel Mailing List <e1000-devel@lists.sourceforge.net>
- * Intel Corporation, 5200 N.E. Elam Young Parkway, Hillsboro, OR 97124-6497
- *
- ******************************************************************************/
+/* Copyright(c) 2013 - 2018 Intel Corporation. */
/* ethtool support for i40e */
@@ -43,13 +19,13 @@
}
#define I40E_NETDEV_STAT(_net_stat) \
- I40E_STAT(struct rtnl_link_stats64, #_net_stat, _net_stat)
+ I40E_STAT(struct rtnl_link_stats64, #_net_stat, _net_stat)
#define I40E_PF_STAT(_name, _stat) \
- I40E_STAT(struct i40e_pf, _name, _stat)
+ I40E_STAT(struct i40e_pf, _name, _stat)
#define I40E_VSI_STAT(_name, _stat) \
- I40E_STAT(struct i40e_vsi, _name, _stat)
+ I40E_STAT(struct i40e_vsi, _name, _stat)
#define I40E_VEB_STAT(_name, _stat) \
- I40E_STAT(struct i40e_veb, _name, _stat)
+ I40E_STAT(struct i40e_veb, _name, _stat)
static const struct i40e_stats i40e_gstrings_net_stats[] = {
I40E_NETDEV_STAT(rx_packets),
@@ -66,18 +42,18 @@
};
static const struct i40e_stats i40e_gstrings_veb_stats[] = {
- I40E_VEB_STAT("rx_bytes", stats.rx_bytes),
- I40E_VEB_STAT("tx_bytes", stats.tx_bytes),
- I40E_VEB_STAT("rx_unicast", stats.rx_unicast),
- I40E_VEB_STAT("tx_unicast", stats.tx_unicast),
- I40E_VEB_STAT("rx_multicast", stats.rx_multicast),
- I40E_VEB_STAT("tx_multicast", stats.tx_multicast),
- I40E_VEB_STAT("rx_broadcast", stats.rx_broadcast),
- I40E_VEB_STAT("tx_broadcast", stats.tx_broadcast),
- I40E_VEB_STAT("rx_discards", stats.rx_discards),
- I40E_VEB_STAT("tx_discards", stats.tx_discards),
- I40E_VEB_STAT("tx_errors", stats.tx_errors),
- I40E_VEB_STAT("rx_unknown_protocol", stats.rx_unknown_protocol),
+ I40E_VEB_STAT("veb.rx_bytes", stats.rx_bytes),
+ I40E_VEB_STAT("veb.tx_bytes", stats.tx_bytes),
+ I40E_VEB_STAT("veb.rx_unicast", stats.rx_unicast),
+ I40E_VEB_STAT("veb.tx_unicast", stats.tx_unicast),
+ I40E_VEB_STAT("veb.rx_multicast", stats.rx_multicast),
+ I40E_VEB_STAT("veb.tx_multicast", stats.tx_multicast),
+ I40E_VEB_STAT("veb.rx_broadcast", stats.rx_broadcast),
+ I40E_VEB_STAT("veb.tx_broadcast", stats.tx_broadcast),
+ I40E_VEB_STAT("veb.rx_discards", stats.rx_discards),
+ I40E_VEB_STAT("veb.tx_discards", stats.tx_discards),
+ I40E_VEB_STAT("veb.tx_errors", stats.tx_errors),
+ I40E_VEB_STAT("veb.rx_unknown_protocol", stats.rx_unknown_protocol),
};
static const struct i40e_stats i40e_gstrings_misc_stats[] = {
@@ -90,6 +66,7 @@
I40E_VSI_STAT("rx_unknown_protocol", eth_stats.rx_unknown_protocol),
I40E_VSI_STAT("tx_linearize", tx_linearize),
I40E_VSI_STAT("tx_force_wb", tx_force_wb),
+ I40E_VSI_STAT("tx_busy", tx_busy),
I40E_VSI_STAT("rx_alloc_fail", rx_buf_failed),
I40E_VSI_STAT("rx_pg_alloc_fail", rx_page_failed),
};
@@ -105,76 +82,77 @@
* is queried on the base PF netdev, not on the VMDq or FCoE netdev.
*/
static const struct i40e_stats i40e_gstrings_stats[] = {
- I40E_PF_STAT("rx_bytes", stats.eth.rx_bytes),
- I40E_PF_STAT("tx_bytes", stats.eth.tx_bytes),
- I40E_PF_STAT("rx_unicast", stats.eth.rx_unicast),
- I40E_PF_STAT("tx_unicast", stats.eth.tx_unicast),
- I40E_PF_STAT("rx_multicast", stats.eth.rx_multicast),
- I40E_PF_STAT("tx_multicast", stats.eth.tx_multicast),
- I40E_PF_STAT("rx_broadcast", stats.eth.rx_broadcast),
- I40E_PF_STAT("tx_broadcast", stats.eth.tx_broadcast),
- I40E_PF_STAT("tx_errors", stats.eth.tx_errors),
- I40E_PF_STAT("rx_dropped", stats.eth.rx_discards),
- I40E_PF_STAT("tx_dropped_link_down", stats.tx_dropped_link_down),
- I40E_PF_STAT("rx_crc_errors", stats.crc_errors),
- I40E_PF_STAT("illegal_bytes", stats.illegal_bytes),
- I40E_PF_STAT("mac_local_faults", stats.mac_local_faults),
- I40E_PF_STAT("mac_remote_faults", stats.mac_remote_faults),
- I40E_PF_STAT("tx_timeout", tx_timeout_count),
- I40E_PF_STAT("rx_csum_bad", hw_csum_rx_error),
- I40E_PF_STAT("rx_length_errors", stats.rx_length_errors),
- I40E_PF_STAT("link_xon_rx", stats.link_xon_rx),
- I40E_PF_STAT("link_xoff_rx", stats.link_xoff_rx),
- I40E_PF_STAT("link_xon_tx", stats.link_xon_tx),
- I40E_PF_STAT("link_xoff_tx", stats.link_xoff_tx),
- I40E_PF_STAT("priority_xon_rx", stats.priority_xon_rx),
- I40E_PF_STAT("priority_xoff_rx", stats.priority_xoff_rx),
- I40E_PF_STAT("priority_xon_tx", stats.priority_xon_tx),
- I40E_PF_STAT("priority_xoff_tx", stats.priority_xoff_tx),
- I40E_PF_STAT("rx_size_64", stats.rx_size_64),
- I40E_PF_STAT("rx_size_127", stats.rx_size_127),
- I40E_PF_STAT("rx_size_255", stats.rx_size_255),
- I40E_PF_STAT("rx_size_511", stats.rx_size_511),
- I40E_PF_STAT("rx_size_1023", stats.rx_size_1023),
- I40E_PF_STAT("rx_size_1522", stats.rx_size_1522),
- I40E_PF_STAT("rx_size_big", stats.rx_size_big),
- I40E_PF_STAT("tx_size_64", stats.tx_size_64),
- I40E_PF_STAT("tx_size_127", stats.tx_size_127),
- I40E_PF_STAT("tx_size_255", stats.tx_size_255),
- I40E_PF_STAT("tx_size_511", stats.tx_size_511),
- I40E_PF_STAT("tx_size_1023", stats.tx_size_1023),
- I40E_PF_STAT("tx_size_1522", stats.tx_size_1522),
- I40E_PF_STAT("tx_size_big", stats.tx_size_big),
- I40E_PF_STAT("rx_undersize", stats.rx_undersize),
- I40E_PF_STAT("rx_fragments", stats.rx_fragments),
- I40E_PF_STAT("rx_oversize", stats.rx_oversize),
- I40E_PF_STAT("rx_jabber", stats.rx_jabber),
- I40E_PF_STAT("VF_admin_queue_requests", vf_aq_requests),
- I40E_PF_STAT("arq_overflows", arq_overflows),
- I40E_PF_STAT("rx_hwtstamp_cleared", rx_hwtstamp_cleared),
- I40E_PF_STAT("tx_hwtstamp_skipped", tx_hwtstamp_skipped),
- I40E_PF_STAT("fdir_flush_cnt", fd_flush_cnt),
- I40E_PF_STAT("fdir_atr_match", stats.fd_atr_match),
- I40E_PF_STAT("fdir_atr_tunnel_match", stats.fd_atr_tunnel_match),
- I40E_PF_STAT("fdir_atr_status", stats.fd_atr_status),
- I40E_PF_STAT("fdir_sb_match", stats.fd_sb_match),
- I40E_PF_STAT("fdir_sb_status", stats.fd_sb_status),
+ I40E_PF_STAT("port.rx_bytes", stats.eth.rx_bytes),
+ I40E_PF_STAT("port.tx_bytes", stats.eth.tx_bytes),
+ I40E_PF_STAT("port.rx_unicast", stats.eth.rx_unicast),
+ I40E_PF_STAT("port.tx_unicast", stats.eth.tx_unicast),
+ I40E_PF_STAT("port.rx_multicast", stats.eth.rx_multicast),
+ I40E_PF_STAT("port.tx_multicast", stats.eth.tx_multicast),
+ I40E_PF_STAT("port.rx_broadcast", stats.eth.rx_broadcast),
+ I40E_PF_STAT("port.tx_broadcast", stats.eth.tx_broadcast),
+ I40E_PF_STAT("port.tx_errors", stats.eth.tx_errors),
+ I40E_PF_STAT("port.rx_dropped", stats.eth.rx_discards),
+ I40E_PF_STAT("port.tx_dropped_link_down", stats.tx_dropped_link_down),
+ I40E_PF_STAT("port.rx_crc_errors", stats.crc_errors),
+ I40E_PF_STAT("port.illegal_bytes", stats.illegal_bytes),
+ I40E_PF_STAT("port.mac_local_faults", stats.mac_local_faults),
+ I40E_PF_STAT("port.mac_remote_faults", stats.mac_remote_faults),
+ I40E_PF_STAT("port.tx_timeout", tx_timeout_count),
+ I40E_PF_STAT("port.rx_csum_bad", hw_csum_rx_error),
+ I40E_PF_STAT("port.rx_length_errors", stats.rx_length_errors),
+ I40E_PF_STAT("port.link_xon_rx", stats.link_xon_rx),
+ I40E_PF_STAT("port.link_xoff_rx", stats.link_xoff_rx),
+ I40E_PF_STAT("port.link_xon_tx", stats.link_xon_tx),
+ I40E_PF_STAT("port.link_xoff_tx", stats.link_xoff_tx),
+ I40E_PF_STAT("port.rx_size_64", stats.rx_size_64),
+ I40E_PF_STAT("port.rx_size_127", stats.rx_size_127),
+ I40E_PF_STAT("port.rx_size_255", stats.rx_size_255),
+ I40E_PF_STAT("port.rx_size_511", stats.rx_size_511),
+ I40E_PF_STAT("port.rx_size_1023", stats.rx_size_1023),
+ I40E_PF_STAT("port.rx_size_1522", stats.rx_size_1522),
+ I40E_PF_STAT("port.rx_size_big", stats.rx_size_big),
+ I40E_PF_STAT("port.tx_size_64", stats.tx_size_64),
+ I40E_PF_STAT("port.tx_size_127", stats.tx_size_127),
+ I40E_PF_STAT("port.tx_size_255", stats.tx_size_255),
+ I40E_PF_STAT("port.tx_size_511", stats.tx_size_511),
+ I40E_PF_STAT("port.tx_size_1023", stats.tx_size_1023),
+ I40E_PF_STAT("port.tx_size_1522", stats.tx_size_1522),
+ I40E_PF_STAT("port.tx_size_big", stats.tx_size_big),
+ I40E_PF_STAT("port.rx_undersize", stats.rx_undersize),
+ I40E_PF_STAT("port.rx_fragments", stats.rx_fragments),
+ I40E_PF_STAT("port.rx_oversize", stats.rx_oversize),
+ I40E_PF_STAT("port.rx_jabber", stats.rx_jabber),
+ I40E_PF_STAT("port.VF_admin_queue_requests", vf_aq_requests),
+ I40E_PF_STAT("port.arq_overflows", arq_overflows),
+ I40E_PF_STAT("port.tx_hwtstamp_timeouts", tx_hwtstamp_timeouts),
+ I40E_PF_STAT("port.rx_hwtstamp_cleared", rx_hwtstamp_cleared),
+ I40E_PF_STAT("port.tx_hwtstamp_skipped", tx_hwtstamp_skipped),
+ I40E_PF_STAT("port.fdir_flush_cnt", fd_flush_cnt),
+ I40E_PF_STAT("port.fdir_atr_match", stats.fd_atr_match),
+ I40E_PF_STAT("port.fdir_atr_tunnel_match", stats.fd_atr_tunnel_match),
+ I40E_PF_STAT("port.fdir_atr_status", stats.fd_atr_status),
+ I40E_PF_STAT("port.fdir_sb_match", stats.fd_sb_match),
+ I40E_PF_STAT("port.fdir_sb_status", stats.fd_sb_status),
/* LPI stats */
- I40E_PF_STAT("tx_lpi_status", stats.tx_lpi_status),
- I40E_PF_STAT("rx_lpi_status", stats.rx_lpi_status),
- I40E_PF_STAT("tx_lpi_count", stats.tx_lpi_count),
- I40E_PF_STAT("rx_lpi_count", stats.rx_lpi_count),
+ I40E_PF_STAT("port.tx_lpi_status", stats.tx_lpi_status),
+ I40E_PF_STAT("port.rx_lpi_status", stats.rx_lpi_status),
+ I40E_PF_STAT("port.tx_lpi_count", stats.tx_lpi_count),
+ I40E_PF_STAT("port.rx_lpi_count", stats.rx_lpi_count),
};
-#define I40E_QUEUE_STATS_LEN(n) \
- (((struct i40e_netdev_priv *)netdev_priv((n)))->vsi->num_queue_pairs \
+/* We use num_tx_queues here as a proxy for the maximum number of queues
+ * available because we always allocate queues symmetrically.
+ */
+#define I40E_MAX_NUM_QUEUES(n) ((n)->num_tx_queues)
+#define I40E_QUEUE_STATS_LEN(n) \
+ (I40E_MAX_NUM_QUEUES(n) \
* 2 /* Tx and Rx together */ \
* (sizeof(struct i40e_queue_stats) / sizeof(u64)))
#define I40E_GLOBAL_STATS_LEN ARRAY_SIZE(i40e_gstrings_stats)
-#define I40E_NETDEV_STATS_LEN ARRAY_SIZE(i40e_gstrings_net_stats)
+#define I40E_NETDEV_STATS_LEN ARRAY_SIZE(i40e_gstrings_net_stats)
#define I40E_MISC_STATS_LEN ARRAY_SIZE(i40e_gstrings_misc_stats)
-#define I40E_VSI_STATS_LEN(n) (I40E_NETDEV_STATS_LEN + \
+#define I40E_VSI_STATS_LEN(n) (I40E_NETDEV_STATS_LEN + \
I40E_MISC_STATS_LEN + \
I40E_QUEUE_STATS_LEN((n)))
#define I40E_PFC_STATS_LEN ( \
@@ -977,7 +955,9 @@
ethtool_link_ksettings_test_link_mode(ks, advertising,
10000baseCR_Full) ||
ethtool_link_ksettings_test_link_mode(ks, advertising,
- 10000baseSR_Full))
+ 10000baseSR_Full) ||
+ ethtool_link_ksettings_test_link_mode(ks, advertising,
+ 10000baseLR_Full))
config.link_speed |= I40E_LINK_SPEED_10GB;
if (ethtool_link_ksettings_test_link_mode(ks, advertising,
20000baseKR2_Full))
@@ -1079,6 +1059,9 @@
/**
* i40e_get_pauseparam - Get Flow Control status
+ * @netdev: netdevice structure
+ * @pause: buffer to return pause parameters
+ *
* Return tx/rx-pause status
**/
static void i40e_get_pauseparam(struct net_device *netdev,
@@ -1677,6 +1660,32 @@
return err;
}
+/**
+ * i40e_get_stats_count - return the stats count for a device
+ * @netdev: the netdev to return the count for
+ *
+ * Returns the total number of statistics for this netdev. Note that even
+ * though this is a function, it is required that the count for a specific
+ * netdev must never change. Basing the count on static values such as the
+ * maximum number of queues or the device type is ok. However, the API for
+ * obtaining stats is *not* safe against changes based on non-static
+ * values such as the *current* number of queues, or runtime flags.
+ *
+ * If a statistic is not always enabled, return it as part of the count
+ * anyways, always return its string, and report its value as zero.
+ **/
+static int i40e_get_stats_count(struct net_device *netdev)
+{
+ struct i40e_netdev_priv *np = netdev_priv(netdev);
+ struct i40e_vsi *vsi = np->vsi;
+ struct i40e_pf *pf = vsi->back;
+
+ if (vsi == pf->vsi[pf->lan_vsi] && pf->hw.partition_id == 1)
+ return I40E_PF_STATS_LEN(netdev) + I40E_VEB_STATS_TOTAL;
+ else
+ return I40E_VSI_STATS_LEN(netdev);
+}
+
static int i40e_get_sset_count(struct net_device *netdev, int sset)
{
struct i40e_netdev_priv *np = netdev_priv(netdev);
@@ -1687,16 +1696,7 @@
case ETH_SS_TEST:
return I40E_TEST_LEN;
case ETH_SS_STATS:
- if (vsi == pf->vsi[pf->lan_vsi] && pf->hw.partition_id == 1) {
- int len = I40E_PF_STATS_LEN(netdev);
-
- if ((pf->lan_veb != I40E_NO_VEB) &&
- (pf->flags & I40E_FLAG_VEB_STATS_ENABLED))
- len += I40E_VEB_STATS_TOTAL;
- return len;
- } else {
- return I40E_VSI_STATS_LEN(netdev);
- }
+ return i40e_get_stats_count(netdev);
case ETH_SS_PRIV_FLAGS:
return I40E_PRIV_FLAGS_STR_LEN +
(pf->hw.pf_id == 0 ? I40E_GL_PRIV_FLAGS_STR_LEN : 0);
@@ -1705,6 +1705,20 @@
}
}
+/**
+ * i40e_get_ethtool_stats - copy stat values into supplied buffer
+ * @netdev: the netdev to collect stats for
+ * @stats: ethtool stats command structure
+ * @data: ethtool supplied buffer
+ *
+ * Copy the stats values for this netdev into the buffer. Expects data to be
+ * pre-allocated to the size returned by i40e_get_stats_count.. Note that all
+ * statistics must be copied in a static order, and the count must not change
+ * for a given netdev. See i40e_get_stats_count for more details.
+ *
+ * If a statistic is not currently valid (such as a disabled queue), this
+ * function reports its value as zero.
+ **/
static void i40e_get_ethtool_stats(struct net_device *netdev,
struct ethtool_stats *stats, u64 *data)
{
@@ -1712,47 +1726,54 @@
struct i40e_ring *tx_ring, *rx_ring;
struct i40e_vsi *vsi = np->vsi;
struct i40e_pf *pf = vsi->back;
- unsigned int j;
- int i = 0;
+ unsigned int i;
char *p;
struct rtnl_link_stats64 *net_stats = i40e_get_vsi_stats_struct(vsi);
unsigned int start;
i40e_update_stats(vsi);
- for (j = 0; j < I40E_NETDEV_STATS_LEN; j++) {
- p = (char *)net_stats + i40e_gstrings_net_stats[j].stat_offset;
- data[i++] = (i40e_gstrings_net_stats[j].sizeof_stat ==
+ for (i = 0; i < I40E_NETDEV_STATS_LEN; i++) {
+ p = (char *)net_stats + i40e_gstrings_net_stats[i].stat_offset;
+ *(data++) = (i40e_gstrings_net_stats[i].sizeof_stat ==
sizeof(u64)) ? *(u64 *)p : *(u32 *)p;
}
- for (j = 0; j < I40E_MISC_STATS_LEN; j++) {
- p = (char *)vsi + i40e_gstrings_misc_stats[j].stat_offset;
- data[i++] = (i40e_gstrings_misc_stats[j].sizeof_stat ==
+ for (i = 0; i < I40E_MISC_STATS_LEN; i++) {
+ p = (char *)vsi + i40e_gstrings_misc_stats[i].stat_offset;
+ *(data++) = (i40e_gstrings_misc_stats[i].sizeof_stat ==
sizeof(u64)) ? *(u64 *)p : *(u32 *)p;
}
rcu_read_lock();
- for (j = 0; j < vsi->num_queue_pairs; j++) {
- tx_ring = READ_ONCE(vsi->tx_rings[j]);
+ for (i = 0; i < I40E_MAX_NUM_QUEUES(netdev) ; i++) {
+ tx_ring = READ_ONCE(vsi->tx_rings[i]);
- if (!tx_ring)
+ if (!tx_ring) {
+ /* Bump the stat counter to skip these stats, and make
+ * sure the memory is zero'd
+ */
+ *(data++) = 0;
+ *(data++) = 0;
+ *(data++) = 0;
+ *(data++) = 0;
continue;
+ }
/* process Tx ring statistics */
do {
start = u64_stats_fetch_begin_irq(&tx_ring->syncp);
- data[i] = tx_ring->stats.packets;
- data[i + 1] = tx_ring->stats.bytes;
+ data[0] = tx_ring->stats.packets;
+ data[1] = tx_ring->stats.bytes;
} while (u64_stats_fetch_retry_irq(&tx_ring->syncp, start));
- i += 2;
+ data += 2;
/* Rx ring is the 2nd half of the queue pair */
rx_ring = &tx_ring[1];
do {
start = u64_stats_fetch_begin_irq(&rx_ring->syncp);
- data[i] = rx_ring->stats.packets;
- data[i + 1] = rx_ring->stats.bytes;
+ data[0] = rx_ring->stats.packets;
+ data[1] = rx_ring->stats.bytes;
} while (u64_stats_fetch_retry_irq(&rx_ring->syncp, start));
- i += 2;
+ data += 2;
}
rcu_read_unlock();
if (vsi != pf->vsi[pf->lan_vsi] || pf->hw.partition_id != 1)
@@ -1762,38 +1783,131 @@
(pf->flags & I40E_FLAG_VEB_STATS_ENABLED)) {
struct i40e_veb *veb = pf->veb[pf->lan_veb];
- for (j = 0; j < I40E_VEB_STATS_LEN; j++) {
+ for (i = 0; i < I40E_VEB_STATS_LEN; i++) {
p = (char *)veb;
- p += i40e_gstrings_veb_stats[j].stat_offset;
- data[i++] = (i40e_gstrings_veb_stats[j].sizeof_stat ==
+ p += i40e_gstrings_veb_stats[i].stat_offset;
+ *(data++) = (i40e_gstrings_veb_stats[i].sizeof_stat ==
sizeof(u64)) ? *(u64 *)p : *(u32 *)p;
}
- for (j = 0; j < I40E_MAX_TRAFFIC_CLASS; j++) {
- data[i++] = veb->tc_stats.tc_tx_packets[j];
- data[i++] = veb->tc_stats.tc_tx_bytes[j];
- data[i++] = veb->tc_stats.tc_rx_packets[j];
- data[i++] = veb->tc_stats.tc_rx_bytes[j];
+ for (i = 0; i < I40E_MAX_TRAFFIC_CLASS; i++) {
+ *(data++) = veb->tc_stats.tc_tx_packets[i];
+ *(data++) = veb->tc_stats.tc_tx_bytes[i];
+ *(data++) = veb->tc_stats.tc_rx_packets[i];
+ *(data++) = veb->tc_stats.tc_rx_bytes[i];
}
+ } else {
+ data += I40E_VEB_STATS_TOTAL;
}
- for (j = 0; j < I40E_GLOBAL_STATS_LEN; j++) {
- p = (char *)pf + i40e_gstrings_stats[j].stat_offset;
- data[i++] = (i40e_gstrings_stats[j].sizeof_stat ==
+ for (i = 0; i < I40E_GLOBAL_STATS_LEN; i++) {
+ p = (char *)pf + i40e_gstrings_stats[i].stat_offset;
+ *(data++) = (i40e_gstrings_stats[i].sizeof_stat ==
sizeof(u64)) ? *(u64 *)p : *(u32 *)p;
}
- for (j = 0; j < I40E_MAX_USER_PRIORITY; j++) {
- data[i++] = pf->stats.priority_xon_tx[j];
- data[i++] = pf->stats.priority_xoff_tx[j];
+ for (i = 0; i < I40E_MAX_USER_PRIORITY; i++) {
+ *(data++) = pf->stats.priority_xon_tx[i];
+ *(data++) = pf->stats.priority_xoff_tx[i];
}
- for (j = 0; j < I40E_MAX_USER_PRIORITY; j++) {
- data[i++] = pf->stats.priority_xon_rx[j];
- data[i++] = pf->stats.priority_xoff_rx[j];
+ for (i = 0; i < I40E_MAX_USER_PRIORITY; i++) {
+ *(data++) = pf->stats.priority_xon_rx[i];
+ *(data++) = pf->stats.priority_xoff_rx[i];
}
- for (j = 0; j < I40E_MAX_USER_PRIORITY; j++)
- data[i++] = pf->stats.priority_xon_2_xoff[j];
+ for (i = 0; i < I40E_MAX_USER_PRIORITY; i++)
+ *(data++) = pf->stats.priority_xon_2_xoff[i];
}
-static void i40e_get_strings(struct net_device *netdev, u32 stringset,
- u8 *data)
+/**
+ * i40e_get_stat_strings - copy stat strings into supplied buffer
+ * @netdev: the netdev to collect strings for
+ * @data: supplied buffer to copy strings into
+ *
+ * Copy the strings related to stats for this netdev. Expects data to be
+ * pre-allocated with the size reported by i40e_get_stats_count. Note that the
+ * strings must be copied in a static order and the total count must not
+ * change for a given netdev. See i40e_get_stats_count for more details.
+ **/
+static void i40e_get_stat_strings(struct net_device *netdev, u8 *data)
+{
+ struct i40e_netdev_priv *np = netdev_priv(netdev);
+ struct i40e_vsi *vsi = np->vsi;
+ struct i40e_pf *pf = vsi->back;
+ unsigned int i;
+ u8 *p = data;
+
+ for (i = 0; i < I40E_NETDEV_STATS_LEN; i++) {
+ snprintf(data, ETH_GSTRING_LEN, "%s",
+ i40e_gstrings_net_stats[i].stat_string);
+ data += ETH_GSTRING_LEN;
+ }
+ for (i = 0; i < I40E_MISC_STATS_LEN; i++) {
+ snprintf(data, ETH_GSTRING_LEN, "%s",
+ i40e_gstrings_misc_stats[i].stat_string);
+ data += ETH_GSTRING_LEN;
+ }
+ for (i = 0; i < I40E_MAX_NUM_QUEUES(netdev); i++) {
+ snprintf(data, ETH_GSTRING_LEN, "tx-%u.tx_packets", i);
+ data += ETH_GSTRING_LEN;
+ snprintf(data, ETH_GSTRING_LEN, "tx-%u.tx_bytes", i);
+ data += ETH_GSTRING_LEN;
+ snprintf(data, ETH_GSTRING_LEN, "rx-%u.rx_packets", i);
+ data += ETH_GSTRING_LEN;
+ snprintf(data, ETH_GSTRING_LEN, "rx-%u.rx_bytes", i);
+ data += ETH_GSTRING_LEN;
+ }
+ if (vsi != pf->vsi[pf->lan_vsi] || pf->hw.partition_id != 1)
+ return;
+
+ for (i = 0; i < I40E_VEB_STATS_LEN; i++) {
+ snprintf(data, ETH_GSTRING_LEN, "%s",
+ i40e_gstrings_veb_stats[i].stat_string);
+ data += ETH_GSTRING_LEN;
+ }
+ for (i = 0; i < I40E_MAX_TRAFFIC_CLASS; i++) {
+ snprintf(data, ETH_GSTRING_LEN,
+ "veb.tc_%u_tx_packets", i);
+ data += ETH_GSTRING_LEN;
+ snprintf(data, ETH_GSTRING_LEN,
+ "veb.tc_%u_tx_bytes", i);
+ data += ETH_GSTRING_LEN;
+ snprintf(data, ETH_GSTRING_LEN,
+ "veb.tc_%u_rx_packets", i);
+ data += ETH_GSTRING_LEN;
+ snprintf(data, ETH_GSTRING_LEN,
+ "veb.tc_%u_rx_bytes", i);
+ data += ETH_GSTRING_LEN;
+ }
+
+ for (i = 0; i < I40E_GLOBAL_STATS_LEN; i++) {
+ snprintf(data, ETH_GSTRING_LEN, "%s",
+ i40e_gstrings_stats[i].stat_string);
+ data += ETH_GSTRING_LEN;
+ }
+ for (i = 0; i < I40E_MAX_USER_PRIORITY; i++) {
+ snprintf(data, ETH_GSTRING_LEN,
+ "port.tx_priority_%u_xon", i);
+ data += ETH_GSTRING_LEN;
+ snprintf(data, ETH_GSTRING_LEN,
+ "port.tx_priority_%u_xoff", i);
+ data += ETH_GSTRING_LEN;
+ }
+ for (i = 0; i < I40E_MAX_USER_PRIORITY; i++) {
+ snprintf(data, ETH_GSTRING_LEN,
+ "port.rx_priority_%u_xon", i);
+ data += ETH_GSTRING_LEN;
+ snprintf(data, ETH_GSTRING_LEN,
+ "port.rx_priority_%u_xoff", i);
+ data += ETH_GSTRING_LEN;
+ }
+ for (i = 0; i < I40E_MAX_USER_PRIORITY; i++) {
+ snprintf(data, ETH_GSTRING_LEN,
+ "port.rx_priority_%u_xon_2_xoff", i);
+ data += ETH_GSTRING_LEN;
+ }
+
+ WARN_ONCE(p - data != i40e_get_stats_count(netdev) * ETH_GSTRING_LEN,
+ "stat strings count mismatch!");
+}
+
+static void i40e_get_priv_flag_strings(struct net_device *netdev, u8 *data)
{
struct i40e_netdev_priv *np = netdev_priv(netdev);
struct i40e_vsi *vsi = np->vsi;
@@ -1801,98 +1915,33 @@
char *p = (char *)data;
unsigned int i;
+ for (i = 0; i < I40E_PRIV_FLAGS_STR_LEN; i++) {
+ snprintf(p, ETH_GSTRING_LEN, "%s",
+ i40e_gstrings_priv_flags[i].flag_string);
+ p += ETH_GSTRING_LEN;
+ }
+ if (pf->hw.pf_id != 0)
+ return;
+ for (i = 0; i < I40E_GL_PRIV_FLAGS_STR_LEN; i++) {
+ snprintf(p, ETH_GSTRING_LEN, "%s",
+ i40e_gl_gstrings_priv_flags[i].flag_string);
+ p += ETH_GSTRING_LEN;
+ }
+}
+
+static void i40e_get_strings(struct net_device *netdev, u32 stringset,
+ u8 *data)
+{
switch (stringset) {
case ETH_SS_TEST:
memcpy(data, i40e_gstrings_test,
I40E_TEST_LEN * ETH_GSTRING_LEN);
break;
case ETH_SS_STATS:
- for (i = 0; i < I40E_NETDEV_STATS_LEN; i++) {
- snprintf(p, ETH_GSTRING_LEN, "%s",
- i40e_gstrings_net_stats[i].stat_string);
- p += ETH_GSTRING_LEN;
- }
- for (i = 0; i < I40E_MISC_STATS_LEN; i++) {
- snprintf(p, ETH_GSTRING_LEN, "%s",
- i40e_gstrings_misc_stats[i].stat_string);
- p += ETH_GSTRING_LEN;
- }
- for (i = 0; i < vsi->num_queue_pairs; i++) {
- snprintf(p, ETH_GSTRING_LEN, "tx-%d.tx_packets", i);
- p += ETH_GSTRING_LEN;
- snprintf(p, ETH_GSTRING_LEN, "tx-%d.tx_bytes", i);
- p += ETH_GSTRING_LEN;
- snprintf(p, ETH_GSTRING_LEN, "rx-%d.rx_packets", i);
- p += ETH_GSTRING_LEN;
- snprintf(p, ETH_GSTRING_LEN, "rx-%d.rx_bytes", i);
- p += ETH_GSTRING_LEN;
- }
- if (vsi != pf->vsi[pf->lan_vsi] || pf->hw.partition_id != 1)
- return;
-
- if ((pf->lan_veb != I40E_NO_VEB) &&
- (pf->flags & I40E_FLAG_VEB_STATS_ENABLED)) {
- for (i = 0; i < I40E_VEB_STATS_LEN; i++) {
- snprintf(p, ETH_GSTRING_LEN, "veb.%s",
- i40e_gstrings_veb_stats[i].stat_string);
- p += ETH_GSTRING_LEN;
- }
- for (i = 0; i < I40E_MAX_TRAFFIC_CLASS; i++) {
- snprintf(p, ETH_GSTRING_LEN,
- "veb.tc_%d_tx_packets", i);
- p += ETH_GSTRING_LEN;
- snprintf(p, ETH_GSTRING_LEN,
- "veb.tc_%d_tx_bytes", i);
- p += ETH_GSTRING_LEN;
- snprintf(p, ETH_GSTRING_LEN,
- "veb.tc_%d_rx_packets", i);
- p += ETH_GSTRING_LEN;
- snprintf(p, ETH_GSTRING_LEN,
- "veb.tc_%d_rx_bytes", i);
- p += ETH_GSTRING_LEN;
- }
- }
- for (i = 0; i < I40E_GLOBAL_STATS_LEN; i++) {
- snprintf(p, ETH_GSTRING_LEN, "port.%s",
- i40e_gstrings_stats[i].stat_string);
- p += ETH_GSTRING_LEN;
- }
- for (i = 0; i < I40E_MAX_USER_PRIORITY; i++) {
- snprintf(p, ETH_GSTRING_LEN,
- "port.tx_priority_%d_xon", i);
- p += ETH_GSTRING_LEN;
- snprintf(p, ETH_GSTRING_LEN,
- "port.tx_priority_%d_xoff", i);
- p += ETH_GSTRING_LEN;
- }
- for (i = 0; i < I40E_MAX_USER_PRIORITY; i++) {
- snprintf(p, ETH_GSTRING_LEN,
- "port.rx_priority_%d_xon", i);
- p += ETH_GSTRING_LEN;
- snprintf(p, ETH_GSTRING_LEN,
- "port.rx_priority_%d_xoff", i);
- p += ETH_GSTRING_LEN;
- }
- for (i = 0; i < I40E_MAX_USER_PRIORITY; i++) {
- snprintf(p, ETH_GSTRING_LEN,
- "port.rx_priority_%d_xon_2_xoff", i);
- p += ETH_GSTRING_LEN;
- }
- /* BUG_ON(p - data != I40E_STATS_LEN * ETH_GSTRING_LEN); */
+ i40e_get_stat_strings(netdev, data);
break;
case ETH_SS_PRIV_FLAGS:
- for (i = 0; i < I40E_PRIV_FLAGS_STR_LEN; i++) {
- snprintf(p, ETH_GSTRING_LEN, "%s",
- i40e_gstrings_priv_flags[i].flag_string);
- p += ETH_GSTRING_LEN;
- }
- if (pf->hw.pf_id != 0)
- break;
- for (i = 0; i < I40E_GL_PRIV_FLAGS_STR_LEN; i++) {
- snprintf(p, ETH_GSTRING_LEN, "%s",
- i40e_gl_gstrings_priv_flags[i].flag_string);
- p += ETH_GSTRING_LEN;
- }
+ i40e_get_priv_flag_strings(netdev, data);
break;
default:
break;
@@ -2550,7 +2599,7 @@
/**
* i40e_check_mask - Check whether a mask field is set
* @mask: the full mask value
- * @field; mask of the field to check
+ * @field: mask of the field to check
*
* If the given mask is fully set, return positive value. If the mask for the
* field is fully unset, return zero. Otherwise return a negative error code.
@@ -2621,6 +2670,7 @@
/**
* i40e_fill_rx_flow_user_data - Fill in user-defined data field
* @fsp: pointer to rx_flow specification
+ * @data: pointer to return userdef data
*
* Reads the userdef data structure and properly fills in the user defined
* fields of the rx_flow_spec.
@@ -2799,6 +2849,7 @@
* i40e_get_rxnfc - command to get RX flow classification rules
* @netdev: network interface device structure
* @cmd: ethtool rxnfc command
+ * @rule_locs: pointer to store rule data
*
* Returns Success if the command is supported.
**/
@@ -2840,7 +2891,7 @@
/**
* i40e_get_rss_hash_bits - Read RSS Hash bits from register
* @nfc: pointer to user request
- * @i_setc bits currently set
+ * @i_setc: bits currently set
*
* Returns value of bits to be set per user request
**/
@@ -2885,7 +2936,7 @@
/**
* i40e_set_rss_hash_opt - Enable/Disable flow types for RSS hash
* @pf: pointer to the physical function struct
- * @cmd: ethtool rxnfc command
+ * @nfc: ethtool rxnfc command
*
* Returns Success if the flow input set is supported.
**/
@@ -3284,7 +3335,7 @@
* __i40e_reprogram_flex_pit - Re-program specific FLX_PIT table
* @pf: Pointer to the PF structure
* @flex_pit_list: list of flexible src offsets in use
- * #flex_pit_start: index to first entry for this section of the table
+ * @flex_pit_start: index to first entry for this section of the table
*
* In order to handle flexible data, the hardware uses a table of values
* called the FLX_PIT table. This table is used to indicate which sections of
@@ -3398,7 +3449,7 @@
/**
* i40e_flow_str - Converts a flow_type into a human readable string
- * @flow_type: the flow type from a flow specification
+ * @fsp: the flow specification
*
* Currently only flow types we support are included here, and the string
* value attempts to match what ethtool would use to configure this flow type.
@@ -4103,7 +4154,7 @@
/**
* i40e_get_channels - Get the current channels enabled and max supported etc.
- * @netdev: network interface device structure
+ * @dev: network interface device structure
* @ch: ethtool channels structure
*
* We don't support separate tx and rx queues as channels. The other count
@@ -4112,7 +4163,7 @@
* q_vectors since we support a lot more queue pairs than q_vectors.
**/
static void i40e_get_channels(struct net_device *dev,
- struct ethtool_channels *ch)
+ struct ethtool_channels *ch)
{
struct i40e_netdev_priv *np = netdev_priv(dev);
struct i40e_vsi *vsi = np->vsi;
@@ -4131,14 +4182,14 @@
/**
* i40e_set_channels - Set the new channels count.
- * @netdev: network interface device structure
+ * @dev: network interface device structure
* @ch: ethtool channels structure
*
* The new channels count may not be the same as requested by the user
* since it gets rounded down to a power of 2 value.
**/
static int i40e_set_channels(struct net_device *dev,
- struct ethtool_channels *ch)
+ struct ethtool_channels *ch)
{
const u8 drop = I40E_FILTER_PROGRAM_DESC_DEST_DROP_PACKET;
struct i40e_netdev_priv *np = netdev_priv(dev);
@@ -4273,6 +4324,7 @@
* @netdev: network interface device structure
* @indir: indirection table
* @key: hash key
+ * @hfunc: hash function to use
*
* Returns -EINVAL if the table specifies an invalid queue id, otherwise
* returns 0 after programming the table.
diff --git a/drivers/net/ethernet/intel/i40e/i40e_hmc.c b/drivers/net/ethernet/intel/i40e/i40e_hmc.c
index 6d4b590..19ce93d 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_hmc.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_hmc.c
@@ -1,29 +1,5 @@
// SPDX-License-Identifier: GPL-2.0
-/*******************************************************************************
- *
- * Intel Ethernet Controller XL710 Family Linux Driver
- * Copyright(c) 2013 - 2014 Intel Corporation.
- *
- * This program is free software; you can redistribute it and/or modify it
- * under the terms and conditions of the GNU General Public License,
- * version 2, as published by the Free Software Foundation.
- *
- * This program is distributed in the hope it will be useful, but WITHOUT
- * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
- * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
- * more details.
- *
- * You should have received a copy of the GNU General Public License along
- * with this program. If not, see <http://www.gnu.org/licenses/>.
- *
- * The full GNU General Public License is included in this distribution in
- * the file called "COPYING".
- *
- * Contact Information:
- * e1000-devel Mailing List <e1000-devel@lists.sourceforge.net>
- * Intel Corporation, 5200 N.E. Elam Young Parkway, Hillsboro, OR 97124-6497
- *
- ******************************************************************************/
+/* Copyright(c) 2013 - 2018 Intel Corporation. */
#include "i40e_osdep.h"
#include "i40e_register.h"
@@ -198,7 +174,6 @@
* @hw: pointer to our HW structure
* @hmc_info: pointer to the HMC configuration information structure
* @idx: the page index
- * @is_pf: distinguishes a VF from a PF
*
* This function:
* 1. Marks the entry in pd tabe (for paged address mode) or in sd table
diff --git a/drivers/net/ethernet/intel/i40e/i40e_hmc.h b/drivers/net/ethernet/intel/i40e/i40e_hmc.h
index 7b5fd33..1c78de8 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_hmc.h
+++ b/drivers/net/ethernet/intel/i40e/i40e_hmc.h
@@ -1,29 +1,5 @@
/* SPDX-License-Identifier: GPL-2.0 */
-/*******************************************************************************
- *
- * Intel Ethernet Controller XL710 Family Linux Driver
- * Copyright(c) 2013 - 2014 Intel Corporation.
- *
- * This program is free software; you can redistribute it and/or modify it
- * under the terms and conditions of the GNU General Public License,
- * version 2, as published by the Free Software Foundation.
- *
- * This program is distributed in the hope it will be useful, but WITHOUT
- * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
- * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
- * more details.
- *
- * You should have received a copy of the GNU General Public License along
- * with this program. If not, see <http://www.gnu.org/licenses/>.
- *
- * The full GNU General Public License is included in this distribution in
- * the file called "COPYING".
- *
- * Contact Information:
- * e1000-devel Mailing List <e1000-devel@lists.sourceforge.net>
- * Intel Corporation, 5200 N.E. Elam Young Parkway, Hillsboro, OR 97124-6497
- *
- ******************************************************************************/
+/* Copyright(c) 2013 - 2018 Intel Corporation. */
#ifndef _I40E_HMC_H_
#define _I40E_HMC_H_
diff --git a/drivers/net/ethernet/intel/i40e/i40e_lan_hmc.c b/drivers/net/ethernet/intel/i40e/i40e_lan_hmc.c
index cd40dc4..994011c 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_lan_hmc.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_lan_hmc.c
@@ -1,29 +1,5 @@
// SPDX-License-Identifier: GPL-2.0
-/*******************************************************************************
- *
- * Intel Ethernet Controller XL710 Family Linux Driver
- * Copyright(c) 2013 - 2014 Intel Corporation.
- *
- * This program is free software; you can redistribute it and/or modify it
- * under the terms and conditions of the GNU General Public License,
- * version 2, as published by the Free Software Foundation.
- *
- * This program is distributed in the hope it will be useful, but WITHOUT
- * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
- * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
- * more details.
- *
- * You should have received a copy of the GNU General Public License along
- * with this program. If not, see <http://www.gnu.org/licenses/>.
- *
- * The full GNU General Public License is included in this distribution in
- * the file called "COPYING".
- *
- * Contact Information:
- * e1000-devel Mailing List <e1000-devel@lists.sourceforge.net>
- * Intel Corporation, 5200 N.E. Elam Young Parkway, Hillsboro, OR 97124-6497
- *
- ******************************************************************************/
+/* Copyright(c) 2013 - 2018 Intel Corporation. */
#include "i40e_osdep.h"
#include "i40e_register.h"
diff --git a/drivers/net/ethernet/intel/i40e/i40e_lan_hmc.h b/drivers/net/ethernet/intel/i40e/i40e_lan_hmc.h
index 79e1396..c46a2c4 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_lan_hmc.h
+++ b/drivers/net/ethernet/intel/i40e/i40e_lan_hmc.h
@@ -1,29 +1,5 @@
/* SPDX-License-Identifier: GPL-2.0 */
-/*******************************************************************************
- *
- * Intel Ethernet Controller XL710 Family Linux Driver
- * Copyright(c) 2013 - 2014 Intel Corporation.
- *
- * This program is free software; you can redistribute it and/or modify it
- * under the terms and conditions of the GNU General Public License,
- * version 2, as published by the Free Software Foundation.
- *
- * This program is distributed in the hope it will be useful, but WITHOUT
- * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
- * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
- * more details.
- *
- * You should have received a copy of the GNU General Public License along
- * with this program. If not, see <http://www.gnu.org/licenses/>.
- *
- * The full GNU General Public License is included in this distribution in
- * the file called "COPYING".
- *
- * Contact Information:
- * e1000-devel Mailing List <e1000-devel@lists.sourceforge.net>
- * Intel Corporation, 5200 N.E. Elam Young Parkway, Hillsboro, OR 97124-6497
- *
- ******************************************************************************/
+/* Copyright(c) 2013 - 2018 Intel Corporation. */
#ifndef _I40E_LAN_HMC_H_
#define _I40E_LAN_HMC_H_
diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index 1622999..b5daa5c 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -1,29 +1,5 @@
// SPDX-License-Identifier: GPL-2.0
-/*******************************************************************************
- *
- * Intel Ethernet Controller XL710 Family Linux Driver
- * Copyright(c) 2013 - 2017 Intel Corporation.
- *
- * This program is free software; you can redistribute it and/or modify it
- * under the terms and conditions of the GNU General Public License,
- * version 2, as published by the Free Software Foundation.
- *
- * This program is distributed in the hope it will be useful, but WITHOUT
- * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
- * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
- * more details.
- *
- * You should have received a copy of the GNU General Public License along
- * with this program. If not, see <http://www.gnu.org/licenses/>.
- *
- * The full GNU General Public License is included in this distribution in
- * the file called "COPYING".
- *
- * Contact Information:
- * e1000-devel Mailing List <e1000-devel@lists.sourceforge.net>
- * Intel Corporation, 5200 N.E. Elam Young Parkway, Hillsboro, OR 97124-6497
- *
- ******************************************************************************/
+/* Copyright(c) 2013 - 2018 Intel Corporation. */
#include <linux/etherdevice.h>
#include <linux/of_net.h>
@@ -278,8 +254,8 @@
/**
* i40e_find_vsi_from_id - searches for the vsi with the given id
- * @pf - the pf structure to search for the vsi
- * @id - id of the vsi it is searching for
+ * @pf: the pf structure to search for the vsi
+ * @id: id of the vsi it is searching for
**/
struct i40e_vsi *i40e_find_vsi_from_id(struct i40e_pf *pf, u16 id)
{
@@ -435,6 +411,7 @@
/**
* i40e_get_netdev_stats_struct - Get statistics for netdev interface
* @netdev: network interface device structure
+ * @stats: data structure to store statistics
*
* Returns the address of the device statistics structure.
* The statistics are actually updated from the service task.
@@ -2027,7 +2004,7 @@
* from firmware
* @count: Number of filters added
* @add_list: return data from fw
- * @head: pointer to first filter in current batch
+ * @add_head: pointer to first filter in current batch
*
* MAC filter entries from list were slated to be added to device. Returns
* number of successful filters. Note that 0 does NOT mean success!
@@ -2134,6 +2111,7 @@
/**
* i40e_aqc_broadcast_filter - Set promiscuous broadcast flags
* @vsi: pointer to the VSI
+ * @vsi_name: the VSI name
* @f: filter data
*
* This function sets or clears the promiscuous broadcast flags for VLAN
@@ -2840,6 +2818,7 @@
/**
* i40e_vlan_rx_add_vid - Add a vlan id filter to HW offload
* @netdev: network interface to be adjusted
+ * @proto: unused protocol value
* @vid: vlan id to be added
*
* net_device_ops implementation for adding vlan ids
@@ -2862,8 +2841,26 @@
}
/**
+ * i40e_vlan_rx_add_vid_up - Add a vlan id filter to HW offload in UP path
+ * @netdev: network interface to be adjusted
+ * @proto: unused protocol value
+ * @vid: vlan id to be added
+ **/
+static void i40e_vlan_rx_add_vid_up(struct net_device *netdev,
+ __always_unused __be16 proto, u16 vid)
+{
+ struct i40e_netdev_priv *np = netdev_priv(netdev);
+ struct i40e_vsi *vsi = np->vsi;
+
+ if (vid >= VLAN_N_VID)
+ return;
+ set_bit(vid, vsi->active_vlans);
+}
+
+/**
* i40e_vlan_rx_kill_vid - Remove a vlan id filter from HW offload
* @netdev: network interface to be adjusted
+ * @proto: unused protocol value
* @vid: vlan id to be removed
*
* net_device_ops implementation for removing vlan ids
@@ -2902,8 +2899,8 @@
i40e_vlan_stripping_disable(vsi);
for_each_set_bit(vid, vsi->active_vlans, VLAN_N_VID)
- i40e_vlan_rx_add_vid(vsi->netdev, htons(ETH_P_8021Q),
- vid);
+ i40e_vlan_rx_add_vid_up(vsi->netdev, htons(ETH_P_8021Q),
+ vid);
}
/**
@@ -3485,7 +3482,7 @@
/**
* i40e_enable_misc_int_causes - enable the non-queue interrupts
- * @hw: ptr to the hardware info
+ * @pf: pointer to private device data structure
**/
static void i40e_enable_misc_int_causes(struct i40e_pf *pf)
{
@@ -4255,8 +4252,8 @@
* @is_xdp: true if the queue is used for XDP
* @enable: start or stop the queue
**/
-static int i40e_control_wait_tx_q(int seid, struct i40e_pf *pf, int pf_q,
- bool is_xdp, bool enable)
+int i40e_control_wait_tx_q(int seid, struct i40e_pf *pf, int pf_q,
+ bool is_xdp, bool enable)
{
int ret;
@@ -4301,7 +4298,6 @@
if (ret)
break;
}
-
return ret;
}
@@ -4340,9 +4336,9 @@
* @pf_q: the PF queue to configure
* @enable: start or stop the queue
*
- * This function enables or disables a single queue. Note that any delay
- * required after the operation is expected to be handled by the caller of
- * this function.
+ * This function enables or disables a single queue. Note that
+ * any delay required after the operation is expected to be
+ * handled by the caller of this function.
**/
static void i40e_control_rx_q(struct i40e_pf *pf, int pf_q, bool enable)
{
@@ -4372,6 +4368,30 @@
}
/**
+ * i40e_control_wait_rx_q
+ * @pf: the PF structure
+ * @pf_q: queue being configured
+ * @enable: start or stop the rings
+ *
+ * This function enables or disables a single queue along with waiting
+ * for the change to finish. The caller of this function should handle
+ * the delays needed in the case of disabling queues.
+ **/
+int i40e_control_wait_rx_q(struct i40e_pf *pf, int pf_q, bool enable)
+{
+ int ret = 0;
+
+ i40e_control_rx_q(pf, pf_q, enable);
+
+ /* wait for the change to finish */
+ ret = i40e_pf_rxq_wait(pf, pf_q, enable);
+ if (ret)
+ return ret;
+
+ return ret;
+}
+
+/**
* i40e_vsi_control_rx - Start or stop a VSI's rings
* @vsi: the VSI being configured
* @enable: start or stop the rings
@@ -4383,10 +4403,7 @@
pf_q = vsi->base_queue;
for (i = 0; i < vsi->num_queue_pairs; i++, pf_q++) {
- i40e_control_rx_q(pf, pf_q, enable);
-
- /* wait for the change to finish */
- ret = i40e_pf_rxq_wait(pf, pf_q, enable);
+ ret = i40e_control_wait_rx_q(pf, pf_q, enable);
if (ret) {
dev_info(&pf->pdev->dev,
"VSI seid %d Rx ring %d %sable timeout\n",
@@ -5096,7 +5113,7 @@
* i40e_vsi_configure_bw_alloc - Configure VSI BW allocation per TC
* @vsi: the VSI being configured
* @enabled_tc: TC bitmap
- * @bw_credits: BW shared credits per TC
+ * @bw_share: BW shared credits per TC
*
* Returns 0 on success, negative value on failure
**/
@@ -6353,6 +6370,7 @@
/**
* i40e_print_link_message - print link up or down
* @vsi: the VSI for which link needs a message
+ * @isup: true of link is up, false otherwise
*/
void i40e_print_link_message(struct i40e_vsi *vsi, bool isup)
{
@@ -7212,8 +7230,7 @@
if (mask->dst == cpu_to_be32(0xffffffff)) {
field_flags |= I40E_CLOUD_FIELD_IIP;
} else {
- mask->dst = be32_to_cpu(mask->dst);
- dev_err(&pf->pdev->dev, "Bad ip dst mask %pI4\n",
+ dev_err(&pf->pdev->dev, "Bad ip dst mask %pI4b\n",
&mask->dst);
return I40E_ERR_CONFIG;
}
@@ -7223,8 +7240,7 @@
if (mask->src == cpu_to_be32(0xffffffff)) {
field_flags |= I40E_CLOUD_FIELD_IIP;
} else {
- mask->src = be32_to_cpu(mask->src);
- dev_err(&pf->pdev->dev, "Bad ip src mask %pI4\n",
+ dev_err(&pf->pdev->dev, "Bad ip src mask %pI4b\n",
&mask->src);
return I40E_ERR_CONFIG;
}
@@ -9691,9 +9707,9 @@
i40e_flush(hw);
}
-static const char *i40e_tunnel_name(struct i40e_udp_port_config *port)
+static const char *i40e_tunnel_name(u8 type)
{
- switch (port->type) {
+ switch (type) {
case UDP_TUNNEL_TYPE_VXLAN:
return "vxlan";
case UDP_TUNNEL_TYPE_GENEVE:
@@ -9727,37 +9743,68 @@
static void i40e_sync_udp_filters_subtask(struct i40e_pf *pf)
{
struct i40e_hw *hw = &pf->hw;
- i40e_status ret;
+ u8 filter_index, type;
u16 port;
int i;
if (!test_and_clear_bit(__I40E_UDP_FILTER_SYNC_PENDING, pf->state))
return;
+ /* acquire RTNL to maintain state of flags and port requests */
+ rtnl_lock();
+
for (i = 0; i < I40E_MAX_PF_UDP_OFFLOAD_PORTS; i++) {
if (pf->pending_udp_bitmap & BIT_ULL(i)) {
+ struct i40e_udp_port_config *udp_port;
+ i40e_status ret = 0;
+
+ udp_port = &pf->udp_ports[i];
pf->pending_udp_bitmap &= ~BIT_ULL(i);
- port = pf->udp_ports[i].port;
+
+ port = READ_ONCE(udp_port->port);
+ type = READ_ONCE(udp_port->type);
+ filter_index = READ_ONCE(udp_port->filter_index);
+
+ /* release RTNL while we wait on AQ command */
+ rtnl_unlock();
+
if (port)
ret = i40e_aq_add_udp_tunnel(hw, port,
- pf->udp_ports[i].type,
- NULL, NULL);
- else
- ret = i40e_aq_del_udp_tunnel(hw, i, NULL);
+ type,
+ &filter_index,
+ NULL);
+ else if (filter_index != I40E_UDP_PORT_INDEX_UNUSED)
+ ret = i40e_aq_del_udp_tunnel(hw, filter_index,
+ NULL);
+
+ /* reacquire RTNL so we can update filter_index */
+ rtnl_lock();
if (ret) {
dev_info(&pf->pdev->dev,
"%s %s port %d, index %d failed, err %s aq_err %s\n",
- i40e_tunnel_name(&pf->udp_ports[i]),
+ i40e_tunnel_name(type),
port ? "add" : "delete",
- port, i,
+ port,
+ filter_index,
i40e_stat_str(&pf->hw, ret),
i40e_aq_str(&pf->hw,
pf->hw.aq.asq_last_status));
- pf->udp_ports[i].port = 0;
+ if (port) {
+ /* failed to add, just reset port,
+ * drop pending bit for any deletion
+ */
+ udp_port->port = 0;
+ pf->pending_udp_bitmap &= ~BIT_ULL(i);
+ }
+ } else if (port) {
+ /* record filter index on success */
+ udp_port->filter_index = filter_index;
}
}
}
+
+ rtnl_unlock();
}
/**
@@ -10004,7 +10051,7 @@
/**
* i40e_vsi_free_arrays - Free queue and vector pointer arrays for the VSI
- * @type: VSI pointer
+ * @vsi: VSI pointer
* @free_qvectors: a bool to specify if q_vectors need to be freed.
*
* On error: returns error code (negative)
@@ -10279,21 +10326,28 @@
/* any vectors left over go for VMDq support */
if (pf->flags & I40E_FLAG_VMDQ_ENABLED) {
- int vmdq_vecs_wanted = pf->num_vmdq_vsis * pf->num_vmdq_qps;
- int vmdq_vecs = min_t(int, vectors_left, vmdq_vecs_wanted);
-
if (!vectors_left) {
pf->num_vmdq_msix = 0;
pf->num_vmdq_qps = 0;
} else {
+ int vmdq_vecs_wanted =
+ pf->num_vmdq_vsis * pf->num_vmdq_qps;
+ int vmdq_vecs =
+ min_t(int, vectors_left, vmdq_vecs_wanted);
+
/* if we're short on vectors for what's desired, we limit
* the queues per vmdq. If this is still more than are
* available, the user will need to change the number of
* queues/vectors used by the PF later with the ethtool
* channels command
*/
- if (vmdq_vecs < vmdq_vecs_wanted)
+ if (vectors_left < vmdq_vecs_wanted) {
pf->num_vmdq_qps = 1;
+ vmdq_vecs_wanted = pf->num_vmdq_vsis;
+ vmdq_vecs = min_t(int,
+ vectors_left,
+ vmdq_vecs_wanted);
+ }
pf->num_vmdq_msix = pf->num_vmdq_qps;
v_budget += vmdq_vecs;
@@ -10800,7 +10854,7 @@
* @vsi: Pointer to VSI structure
* @seed: Buffer to store the keys
* @lut: Buffer to store the lookup table entries
- * lut_size: Size of buffer to store the lookup table entries
+ * @lut_size: Size of buffer to store the lookup table entries
*
* Returns 0 on success, negative on failure
*/
@@ -11374,6 +11428,11 @@
u8 i;
for (i = 0; i < I40E_MAX_PF_UDP_OFFLOAD_PORTS; i++) {
+ /* Do not report ports with pending deletions as
+ * being available.
+ */
+ if (!port && (pf->pending_udp_bitmap & BIT_ULL(i)))
+ continue;
if (pf->udp_ports[i].port == port)
return i;
}
@@ -11428,6 +11487,7 @@
/* New port: add it and mark its index in the bitmap */
pf->udp_ports[next_idx].port = port;
+ pf->udp_ports[next_idx].filter_index = I40E_UDP_PORT_INDEX_UNUSED;
pf->pending_udp_bitmap |= BIT_ULL(next_idx);
set_bit(__I40E_UDP_FILTER_SYNC_PENDING, pf->state);
}
@@ -11469,7 +11529,12 @@
* and make it pending
*/
pf->udp_ports[idx].port = 0;
- pf->pending_udp_bitmap |= BIT_ULL(idx);
+
+ /* Toggle pending bit instead of setting it. This way if we are
+ * deleting a port that has yet to be added we just clear the pending
+ * bit and don't have to worry about it.
+ */
+ pf->pending_udp_bitmap ^= BIT_ULL(idx);
set_bit(__I40E_UDP_FILTER_SYNC_PENDING, pf->state);
return;
@@ -11500,6 +11565,7 @@
* @tb: pointer to array of nladdr (unused)
* @dev: the net device pointer
* @addr: the MAC address entry being added
+ * @vid: VLAN ID
* @flags: instructions from stack about fdb operation
*/
static int i40e_ndo_fdb_add(struct ndmsg *ndm, struct nlattr *tb[],
@@ -11545,6 +11611,7 @@
* i40e_ndo_bridge_setlink - Set the hardware bridge mode
* @dev: the netdev being configured
* @nlh: RTNL message
+ * @flags: bridge flags
*
* Inserts a new hardware bridge if not already created and
* enables the bridging mode requested (VEB or VEPA). If the
@@ -14118,6 +14185,7 @@
/**
* i40e_pci_error_detected - warning that something funky happened in PCI land
* @pdev: PCI device information struct
+ * @error: the type of PCI error
*
* Called to warn that something happened and the error handling steps
* are in progress. Allows the driver to quiesce things, be ready for
diff --git a/drivers/net/ethernet/intel/i40e/i40e_nvm.c b/drivers/net/ethernet/intel/i40e/i40e_nvm.c
index ba9687c..0299e5b 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_nvm.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_nvm.c
@@ -1,29 +1,5 @@
// SPDX-License-Identifier: GPL-2.0
-/*******************************************************************************
- *
- * Intel Ethernet Controller XL710 Family Linux Driver
- * Copyright(c) 2013 - 2014 Intel Corporation.
- *
- * This program is free software; you can redistribute it and/or modify it
- * under the terms and conditions of the GNU General Public License,
- * version 2, as published by the Free Software Foundation.
- *
- * This program is distributed in the hope it will be useful, but WITHOUT
- * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
- * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
- * more details.
- *
- * You should have received a copy of the GNU General Public License along
- * with this program. If not, see <http://www.gnu.org/licenses/>.
- *
- * The full GNU General Public License is included in this distribution in
- * the file called "COPYING".
- *
- * Contact Information:
- * e1000-devel Mailing List <e1000-devel@lists.sourceforge.net>
- * Intel Corporation, 5200 N.E. Elam Young Parkway, Hillsboro, OR 97124-6497
- *
- ******************************************************************************/
+/* Copyright(c) 2013 - 2018 Intel Corporation. */
#include "i40e_prototype.h"
@@ -1173,6 +1149,7 @@
* i40e_nvmupd_check_wait_event - handle NVM update operation events
* @hw: pointer to the hardware structure
* @opcode: the event that just happened
+ * @desc: AdminQ descriptor
**/
void i40e_nvmupd_check_wait_event(struct i40e_hw *hw, u16 opcode,
struct i40e_aq_desc *desc)
diff --git a/drivers/net/ethernet/intel/i40e/i40e_osdep.h b/drivers/net/ethernet/intel/i40e/i40e_osdep.h
index 9c3c3b0..a07574b 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_osdep.h
+++ b/drivers/net/ethernet/intel/i40e/i40e_osdep.h
@@ -1,29 +1,5 @@
/* SPDX-License-Identifier: GPL-2.0 */
-/*******************************************************************************
- *
- * Intel Ethernet Controller XL710 Family Linux Driver
- * Copyright(c) 2013 - 2014 Intel Corporation.
- *
- * This program is free software; you can redistribute it and/or modify it
- * under the terms and conditions of the GNU General Public License,
- * version 2, as published by the Free Software Foundation.
- *
- * This program is distributed in the hope it will be useful, but WITHOUT
- * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
- * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
- * more details.
- *
- * You should have received a copy of the GNU General Public License along
- * with this program. If not, see <http://www.gnu.org/licenses/>.
- *
- * The full GNU General Public License is included in this distribution in
- * the file called "COPYING".
- *
- * Contact Information:
- * e1000-devel Mailing List <e1000-devel@lists.sourceforge.net>
- * Intel Corporation, 5200 N.E. Elam Young Parkway, Hillsboro, OR 97124-6497
- *
- ******************************************************************************/
+/* Copyright(c) 2013 - 2018 Intel Corporation. */
#ifndef _I40E_OSDEP_H_
#define _I40E_OSDEP_H_
diff --git a/drivers/net/ethernet/intel/i40e/i40e_prototype.h b/drivers/net/ethernet/intel/i40e/i40e_prototype.h
index 2ec2418..3170655 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_prototype.h
+++ b/drivers/net/ethernet/intel/i40e/i40e_prototype.h
@@ -1,29 +1,5 @@
/* SPDX-License-Identifier: GPL-2.0 */
-/*******************************************************************************
- *
- * Intel Ethernet Controller XL710 Family Linux Driver
- * Copyright(c) 2013 - 2016 Intel Corporation.
- *
- * This program is free software; you can redistribute it and/or modify it
- * under the terms and conditions of the GNU General Public License,
- * version 2, as published by the Free Software Foundation.
- *
- * This program is distributed in the hope it will be useful, but WITHOUT
- * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
- * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
- * more details.
- *
- * You should have received a copy of the GNU General Public License along
- * with this program. If not, see <http://www.gnu.org/licenses/>.
- *
- * The full GNU General Public License is included in this distribution in
- * the file called "COPYING".
- *
- * Contact Information:
- * e1000-devel Mailing List <e1000-devel@lists.sourceforge.net>
- * Intel Corporation, 5200 N.E. Elam Young Parkway, Hillsboro, OR 97124-6497
- *
- ******************************************************************************/
+/* Copyright(c) 2013 - 2018 Intel Corporation. */
#ifndef _I40E_PROTOTYPE_H_
#define _I40E_PROTOTYPE_H_
diff --git a/drivers/net/ethernet/intel/i40e/i40e_ptp.c b/drivers/net/ethernet/intel/i40e/i40e_ptp.c
index 5b47dd1..35f2866 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_ptp.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_ptp.c
@@ -1,29 +1,5 @@
// SPDX-License-Identifier: GPL-2.0
-/*******************************************************************************
- *
- * Intel Ethernet Controller XL710 Family Linux Driver
- * Copyright(c) 2013 - 2014 Intel Corporation.
- *
- * This program is free software; you can redistribute it and/or modify it
- * under the terms and conditions of the GNU General Public License,
- * version 2, as published by the Free Software Foundation.
- *
- * This program is distributed in the hope it will be useful, but WITHOUT
- * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
- * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
- * more details.
- *
- * You should have received a copy of the GNU General Public License along
- * with this program. If not, see <http://www.gnu.org/licenses/>.
- *
- * The full GNU General Public License is included in this distribution in
- * the file called "COPYING".
- *
- * Contact Information:
- * e1000-devel Mailing List <e1000-devel@lists.sourceforge.net>
- * Intel Corporation, 5200 N.E. Elam Young Parkway, Hillsboro, OR 97124-6497
- *
- ******************************************************************************/
+/* Copyright(c) 2013 - 2018 Intel Corporation. */
#include "i40e.h"
#include <linux/ptp_classify.h>
@@ -40,9 +16,9 @@
* At 1Gb link, the period is multiplied by 20. (32ns)
* 1588 functionality is not supported at 100Mbps.
*/
-#define I40E_PTP_40GB_INCVAL 0x0199999999ULL
-#define I40E_PTP_10GB_INCVAL 0x0333333333ULL
-#define I40E_PTP_1GB_INCVAL 0x2000000000ULL
+#define I40E_PTP_40GB_INCVAL 0x0199999999ULL
+#define I40E_PTP_10GB_INCVAL_MULT 2
+#define I40E_PTP_1GB_INCVAL_MULT 20
#define I40E_PRTTSYN_CTL1_TSYNTYPE_V1 BIT(I40E_PRTTSYN_CTL1_TSYNTYPE_SHIFT)
#define I40E_PRTTSYN_CTL1_TSYNTYPE_V2 (2 << \
@@ -130,17 +106,24 @@
ppb = -ppb;
}
- smp_mb(); /* Force any pending update before accessing. */
- adj = READ_ONCE(pf->ptp_base_adj);
-
- freq = adj;
+ freq = I40E_PTP_40GB_INCVAL;
freq *= ppb;
diff = div_u64(freq, 1000000000ULL);
if (neg_adj)
- adj -= diff;
+ adj = I40E_PTP_40GB_INCVAL - diff;
else
- adj += diff;
+ adj = I40E_PTP_40GB_INCVAL + diff;
+
+ /* At some link speeds, the base incval is so large that directly
+ * multiplying by ppb would result in arithmetic overflow even when
+ * using a u64. Avoid this by instead calculating the new incval
+ * always in terms of the 40GbE clock rate and then multiplying by the
+ * link speed factor afterwards. This does result in slightly lower
+ * precision at lower link speeds, but it is fairly minor.
+ */
+ smp_mb(); /* Force any pending update before accessing. */
+ adj *= READ_ONCE(pf->ptp_adj_mult);
wr32(hw, I40E_PRTTSYN_INC_L, adj & 0xFFFFFFFF);
wr32(hw, I40E_PRTTSYN_INC_H, adj >> 32);
@@ -334,10 +317,12 @@
* This watchdog task is run periodically to make sure that we clear the Tx
* timestamp logic if we don't obtain a timestamp in a reasonable amount of
* time. It is unexpected in the normal case but if it occurs it results in
- * permanently prevent timestamps of future packets
+ * permanently preventing timestamps of future packets.
**/
void i40e_ptp_tx_hang(struct i40e_pf *pf)
{
+ struct sk_buff *skb;
+
if (!(pf->flags & I40E_FLAG_PTP) || !pf->ptp_tx)
return;
@@ -350,9 +335,12 @@
* within a second it is reasonable to assume that we never will.
*/
if (time_is_before_jiffies(pf->ptp_tx_start + HZ)) {
- dev_kfree_skb_any(pf->ptp_tx_skb);
+ skb = pf->ptp_tx_skb;
pf->ptp_tx_skb = NULL;
clear_bit_unlock(__I40E_PTP_TX_IN_PROGRESS, pf->state);
+
+ /* Free the skb after we clear the bitlock */
+ dev_kfree_skb_any(skb);
pf->tx_hwtstamp_timeouts++;
}
}
@@ -462,6 +450,7 @@
struct i40e_link_status *hw_link_info;
struct i40e_hw *hw = &pf->hw;
u64 incval;
+ u32 mult;
hw_link_info = &hw->phy.link_info;
@@ -469,10 +458,10 @@
switch (hw_link_info->link_speed) {
case I40E_LINK_SPEED_10GB:
- incval = I40E_PTP_10GB_INCVAL;
+ mult = I40E_PTP_10GB_INCVAL_MULT;
break;
case I40E_LINK_SPEED_1GB:
- incval = I40E_PTP_1GB_INCVAL;
+ mult = I40E_PTP_1GB_INCVAL_MULT;
break;
case I40E_LINK_SPEED_100MB:
{
@@ -483,15 +472,20 @@
"1588 functionality is not supported at 100 Mbps. Stopping the PHC.\n");
warn_once++;
}
- incval = 0;
+ mult = 0;
break;
}
case I40E_LINK_SPEED_40GB:
default:
- incval = I40E_PTP_40GB_INCVAL;
+ mult = 1;
break;
}
+ /* The increment value is calculated by taking the base 40GbE incvalue
+ * and multiplying it by a factor based on the link speed.
+ */
+ incval = I40E_PTP_40GB_INCVAL * mult;
+
/* Write the new increment value into the increment register. The
* hardware will not update the clock until both registers have been
* written.
@@ -500,14 +494,14 @@
wr32(hw, I40E_PRTTSYN_INC_H, incval >> 32);
/* Update the base adjustement value. */
- WRITE_ONCE(pf->ptp_base_adj, incval);
+ WRITE_ONCE(pf->ptp_adj_mult, mult);
smp_mb(); /* Force the above update. */
}
/**
* i40e_ptp_get_ts_config - ioctl interface to read the HW timestamping
* @pf: Board private structure
- * @ifreq: ioctl data
+ * @ifr: ioctl data
*
* Obtain the current hardware timestamping settigs as requested. To do this,
* keep a shadow copy of the timestamp settings rather than attempting to
@@ -651,7 +645,7 @@
/**
* i40e_ptp_set_ts_config - ioctl interface to control the HW timestamping
* @pf: Board private structure
- * @ifreq: ioctl data
+ * @ifr: ioctl data
*
* Respond to the user filter requests and make the appropriate hardware
* changes here. The XL710 cannot support splitting of the Tx/Rx timestamping
@@ -805,9 +799,11 @@
pf->ptp_rx = false;
if (pf->ptp_tx_skb) {
- dev_kfree_skb_any(pf->ptp_tx_skb);
+ struct sk_buff *skb = pf->ptp_tx_skb;
+
pf->ptp_tx_skb = NULL;
clear_bit_unlock(__I40E_PTP_TX_IN_PROGRESS, pf->state);
+ dev_kfree_skb_any(skb);
}
if (pf->ptp_clock) {
diff --git a/drivers/net/ethernet/intel/i40e/i40e_register.h b/drivers/net/ethernet/intel/i40e/i40e_register.h
index b3e206e..52e3680 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_register.h
+++ b/drivers/net/ethernet/intel/i40e/i40e_register.h
@@ -1,29 +1,5 @@
/* SPDX-License-Identifier: GPL-2.0 */
-/*******************************************************************************
- *
- * Intel Ethernet Controller XL710 Family Linux Driver
- * Copyright(c) 2013 - 2014 Intel Corporation.
- *
- * This program is free software; you can redistribute it and/or modify it
- * under the terms and conditions of the GNU General Public License,
- * version 2, as published by the Free Software Foundation.
- *
- * This program is distributed in the hope it will be useful, but WITHOUT
- * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
- * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
- * more details.
- *
- * You should have received a copy of the GNU General Public License along
- * with this program. If not, see <http://www.gnu.org/licenses/>.
- *
- * The full GNU General Public License is included in this distribution in
- * the file called "COPYING".
- *
- * Contact Information:
- * e1000-devel Mailing List <e1000-devel@lists.sourceforge.net>
- * Intel Corporation, 5200 N.E. Elam Young Parkway, Hillsboro, OR 97124-6497
- *
- ******************************************************************************/
+/* Copyright(c) 2013 - 2018 Intel Corporation. */
#ifndef _I40E_REGISTER_H_
#define _I40E_REGISTER_H_
diff --git a/drivers/net/ethernet/intel/i40e/i40e_status.h b/drivers/net/ethernet/intel/i40e/i40e_status.h
index 10c86f6..77be070 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_status.h
+++ b/drivers/net/ethernet/intel/i40e/i40e_status.h
@@ -1,29 +1,5 @@
/* SPDX-License-Identifier: GPL-2.0 */
-/*******************************************************************************
- *
- * Intel Ethernet Controller XL710 Family Linux Driver
- * Copyright(c) 2013 - 2014 Intel Corporation.
- *
- * This program is free software; you can redistribute it and/or modify it
- * under the terms and conditions of the GNU General Public License,
- * version 2, as published by the Free Software Foundation.
- *
- * This program is distributed in the hope it will be useful, but WITHOUT
- * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
- * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
- * more details.
- *
- * You should have received a copy of the GNU General Public License along
- * with this program. If not, see <http://www.gnu.org/licenses/>.
- *
- * The full GNU General Public License is included in this distribution in
- * the file called "COPYING".
- *
- * Contact Information:
- * e1000-devel Mailing List <e1000-devel@lists.sourceforge.net>
- * Intel Corporation, 5200 N.E. Elam Young Parkway, Hillsboro, OR 97124-6497
- *
- ******************************************************************************/
+/* Copyright(c) 2013 - 2018 Intel Corporation. */
#ifndef _I40E_STATUS_H_
#define _I40E_STATUS_H_
diff --git a/drivers/net/ethernet/intel/i40e/i40e_trace.h b/drivers/net/ethernet/intel/i40e/i40e_trace.h
index 410ba13..424f020 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_trace.h
+++ b/drivers/net/ethernet/intel/i40e/i40e_trace.h
@@ -1,26 +1,5 @@
/* SPDX-License-Identifier: GPL-2.0 */
-/*******************************************************************************
- *
- * Intel(R) 40-10 Gigabit Ethernet Connection Network Driver
- * Copyright(c) 2013 - 2017 Intel Corporation.
- *
- * This program is free software; you can redistribute it and/or modify it
- * under the terms and conditions of the GNU General Public License,
- * version 2, as published by the Free Software Foundation.
- *
- * This program is distributed in the hope it will be useful, but WITHOUT
- * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
- * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
- * more details.
- *
- * The full GNU General Public License is included in this distribution in
- * the file called "COPYING".
- *
- * Contact Information:
- * e1000-devel Mailing List <e1000-devel@lists.sourceforge.net>
- * Intel Corporation, 5200 N.E. Elam Young Parkway, Hillsboro, OR 97124-6497
- *
- ******************************************************************************/
+/* Copyright(c) 2013 - 2018 Intel Corporation. */
/* Modeled on trace-events-sample.h */
diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.c b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
index f174c72..9b698c5 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_txrx.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
@@ -1,29 +1,5 @@
// SPDX-License-Identifier: GPL-2.0
-/*******************************************************************************
- *
- * Intel Ethernet Controller XL710 Family Linux Driver
- * Copyright(c) 2013 - 2016 Intel Corporation.
- *
- * This program is free software; you can redistribute it and/or modify it
- * under the terms and conditions of the GNU General Public License,
- * version 2, as published by the Free Software Foundation.
- *
- * This program is distributed in the hope it will be useful, but WITHOUT
- * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
- * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
- * more details.
- *
- * You should have received a copy of the GNU General Public License along
- * with this program. If not, see <http://www.gnu.org/licenses/>.
- *
- * The full GNU General Public License is included in this distribution in
- * the file called "COPYING".
- *
- * Contact Information:
- * e1000-devel Mailing List <e1000-devel@lists.sourceforge.net>
- * Intel Corporation, 5200 N.E. Elam Young Parkway, Hillsboro, OR 97124-6497
- *
- ******************************************************************************/
+/* Copyright(c) 2013 - 2018 Intel Corporation. */
#include <linux/prefetch.h>
#include <net/busy_poll.h>
@@ -495,7 +471,7 @@
/**
* i40e_add_del_fdir - Build raw packets to add/del fdir filter
* @vsi: pointer to the targeted VSI
- * @cmd: command to get or set RX flow classification rules
+ * @input: filter to add or delete
* @add: true adds a filter, false removes it
*
**/
@@ -638,7 +614,7 @@
if (tx_buffer->tx_flags & I40E_TX_FLAGS_FD_SB)
kfree(tx_buffer->raw_buf);
else if (ring_is_xdp(ring))
- page_frag_free(tx_buffer->raw_buf);
+ xdp_return_frame(tx_buffer->xdpf);
else
dev_kfree_skb_any(tx_buffer->skb);
if (dma_unmap_len(tx_buffer, len))
@@ -713,7 +689,7 @@
/**
* i40e_get_tx_pending - how many tx descriptors not processed
- * @tx_ring: the ring of descriptors
+ * @ring: the ring of descriptors
* @in_sw: use SW variables
*
* Since there is no access to the ring head register
@@ -841,7 +817,7 @@
/* free the skb/XDP data */
if (ring_is_xdp(tx_ring))
- page_frag_free(tx_buf->raw_buf);
+ xdp_return_frame(tx_buf->xdpf);
else
napi_consume_skb(tx_buf->skb, napi_budget);
@@ -1795,6 +1771,8 @@
* i40e_rx_hash - set the hash value in the skb
* @ring: descriptor ring
* @rx_desc: specific descriptor
+ * @skb: skb currently being received and modified
+ * @rx_ptype: Rx packet type
**/
static inline void i40e_rx_hash(struct i40e_ring *ring,
union i40e_rx_desc *rx_desc,
@@ -2203,9 +2181,20 @@
#define I40E_XDP_CONSUMED 1
#define I40E_XDP_TX 2
-static int i40e_xmit_xdp_ring(struct xdp_buff *xdp,
+static int i40e_xmit_xdp_ring(struct xdp_frame *xdpf,
struct i40e_ring *xdp_ring);
+static int i40e_xmit_xdp_tx_ring(struct xdp_buff *xdp,
+ struct i40e_ring *xdp_ring)
+{
+ struct xdp_frame *xdpf = convert_to_xdp_frame(xdp);
+
+ if (unlikely(!xdpf))
+ return I40E_XDP_CONSUMED;
+
+ return i40e_xmit_xdp_ring(xdpf, xdp_ring);
+}
+
/**
* i40e_run_xdp - run an XDP program
* @rx_ring: Rx ring being processed
@@ -2225,13 +2214,15 @@
if (!xdp_prog)
goto xdp_out;
+ prefetchw(xdp->data_hard_start); /* xdp_frame write */
+
act = bpf_prog_run_xdp(xdp_prog, xdp);
switch (act) {
case XDP_PASS:
break;
case XDP_TX:
xdp_ring = rx_ring->vsi->xdp_rings[rx_ring->queue_index];
- result = i40e_xmit_xdp_ring(xdp, xdp_ring);
+ result = i40e_xmit_xdp_tx_ring(xdp, xdp_ring);
break;
case XDP_REDIRECT:
err = xdp_do_redirect(rx_ring->netdev, xdp, xdp_prog);
@@ -3478,13 +3469,13 @@
* @xdp: data to transmit
* @xdp_ring: XDP Tx ring
**/
-static int i40e_xmit_xdp_ring(struct xdp_buff *xdp,
+static int i40e_xmit_xdp_ring(struct xdp_frame *xdpf,
struct i40e_ring *xdp_ring)
{
- u32 size = xdp->data_end - xdp->data;
u16 i = xdp_ring->next_to_use;
struct i40e_tx_buffer *tx_bi;
struct i40e_tx_desc *tx_desc;
+ u32 size = xdpf->len;
dma_addr_t dma;
if (!unlikely(I40E_DESC_UNUSED(xdp_ring))) {
@@ -3492,14 +3483,14 @@
return I40E_XDP_CONSUMED;
}
- dma = dma_map_single(xdp_ring->dev, xdp->data, size, DMA_TO_DEVICE);
+ dma = dma_map_single(xdp_ring->dev, xdpf->data, size, DMA_TO_DEVICE);
if (dma_mapping_error(xdp_ring->dev, dma))
return I40E_XDP_CONSUMED;
tx_bi = &xdp_ring->tx_bi[i];
tx_bi->bytecount = size;
tx_bi->gso_segs = 1;
- tx_bi->raw_buf = xdp->data;
+ tx_bi->xdpf = xdpf;
/* record length, and DMA address */
dma_unmap_len_set(tx_bi, len, size);
@@ -3673,14 +3664,19 @@
* @dev: netdev
* @xdp: XDP buffer
*
- * Returns Zero if sent, else an error code
+ * Returns number of frames successfully sent. Frames that fail are
+ * free'ed via XDP return API.
+ *
+ * For error cases, a negative errno code is returned and no-frames
+ * are transmitted (caller must handle freeing frames).
**/
-int i40e_xdp_xmit(struct net_device *dev, struct xdp_buff *xdp)
+int i40e_xdp_xmit(struct net_device *dev, int n, struct xdp_frame **frames)
{
struct i40e_netdev_priv *np = netdev_priv(dev);
unsigned int queue_index = smp_processor_id();
struct i40e_vsi *vsi = np->vsi;
- int err;
+ int drops = 0;
+ int i;
if (test_bit(__I40E_VSI_DOWN, vsi->state))
return -ENETDOWN;
@@ -3688,11 +3684,18 @@
if (!i40e_enabled_xdp_vsi(vsi) || queue_index >= vsi->num_queue_pairs)
return -ENXIO;
- err = i40e_xmit_xdp_ring(xdp, vsi->xdp_rings[queue_index]);
- if (err != I40E_XDP_TX)
- return -ENOSPC;
+ for (i = 0; i < n; i++) {
+ struct xdp_frame *xdpf = frames[i];
+ int err;
- return 0;
+ err = i40e_xmit_xdp_ring(xdpf, vsi->xdp_rings[queue_index]);
+ if (err != I40E_XDP_TX) {
+ xdp_return_frame_rx_napi(xdpf);
+ drops++;
+ }
+ }
+
+ return n - drops;
}
/**
diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.h b/drivers/net/ethernet/intel/i40e/i40e_txrx.h
index 3043483..eb8804b 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_txrx.h
+++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.h
@@ -1,29 +1,5 @@
/* SPDX-License-Identifier: GPL-2.0 */
-/*******************************************************************************
- *
- * Intel Ethernet Controller XL710 Family Linux Driver
- * Copyright(c) 2013 - 2016 Intel Corporation.
- *
- * This program is free software; you can redistribute it and/or modify it
- * under the terms and conditions of the GNU General Public License,
- * version 2, as published by the Free Software Foundation.
- *
- * This program is distributed in the hope it will be useful, but WITHOUT
- * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
- * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
- * more details.
- *
- * You should have received a copy of the GNU General Public License along
- * with this program. If not, see <http://www.gnu.org/licenses/>.
- *
- * The full GNU General Public License is included in this distribution in
- * the file called "COPYING".
- *
- * Contact Information:
- * e1000-devel Mailing List <e1000-devel@lists.sourceforge.net>
- * Intel Corporation, 5200 N.E. Elam Young Parkway, Hillsboro, OR 97124-6497
- *
- ******************************************************************************/
+/* Copyright(c) 2013 - 2018 Intel Corporation. */
#ifndef _I40E_TXRX_H_
#define _I40E_TXRX_H_
@@ -306,6 +282,7 @@
struct i40e_tx_buffer {
struct i40e_tx_desc *next_to_watch;
union {
+ struct xdp_frame *xdpf;
struct sk_buff *skb;
void *raw_buf;
};
@@ -510,7 +487,7 @@
void i40e_detect_recover_hung(struct i40e_vsi *vsi);
int __i40e_maybe_stop_tx(struct i40e_ring *tx_ring, int size);
bool __i40e_chk_linearize(struct sk_buff *skb);
-int i40e_xdp_xmit(struct net_device *dev, struct xdp_buff *xdp);
+int i40e_xdp_xmit(struct net_device *dev, int n, struct xdp_frame **frames);
void i40e_xdp_flush(struct net_device *dev);
/**
diff --git a/drivers/net/ethernet/intel/i40e/i40e_type.h b/drivers/net/ethernet/intel/i40e/i40e_type.h
index bfb8009..7df969c 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_type.h
+++ b/drivers/net/ethernet/intel/i40e/i40e_type.h
@@ -1,29 +1,5 @@
/* SPDX-License-Identifier: GPL-2.0 */
-/*******************************************************************************
- *
- * Intel Ethernet Controller XL710 Family Linux Driver
- * Copyright(c) 2013 - 2015 Intel Corporation.
- *
- * This program is free software; you can redistribute it and/or modify it
- * under the terms and conditions of the GNU General Public License,
- * version 2, as published by the Free Software Foundation.
- *
- * This program is distributed in the hope it will be useful, but WITHOUT
- * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
- * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
- * more details.
- *
- * You should have received a copy of the GNU General Public License along
- * with this program. If not, see <http://www.gnu.org/licenses/>.
- *
- * The full GNU General Public License is included in this distribution in
- * the file called "COPYING".
- *
- * Contact Information:
- * e1000-devel Mailing List <e1000-devel@lists.sourceforge.net>
- * Intel Corporation, 5200 N.E. Elam Young Parkway, Hillsboro, OR 97124-6497
- *
- ******************************************************************************/
+/* Copyright(c) 2013 - 2018 Intel Corporation. */
#ifndef _I40E_TYPE_H_
#define _I40E_TYPE_H_
@@ -1318,7 +1294,8 @@
/* Checksum and Shadow RAM pointers */
#define I40E_SR_NVM_CONTROL_WORD 0x00
-#define I40E_SR_EMP_MODULE_PTR 0x0F
+#define I40E_EMP_MODULE_PTR 0x0F
+#define I40E_SR_EMP_MODULE_PTR 0x48
#define I40E_SR_PBA_FLAGS 0x15
#define I40E_SR_PBA_BLOCK_PTR 0x16
#define I40E_SR_BOOT_CONFIG_PTR 0x17
@@ -1337,6 +1314,8 @@
#define I40E_SR_PCIE_ALT_MODULE_MAX_SIZE 1024
#define I40E_SR_CONTROL_WORD_1_SHIFT 0x06
#define I40E_SR_CONTROL_WORD_1_MASK (0x03 << I40E_SR_CONTROL_WORD_1_SHIFT)
+#define I40E_SR_CONTROL_WORD_1_NVM_BANK_VALID BIT(5)
+#define I40E_SR_NVM_MAP_STRUCTURE_TYPE BIT(12)
#define I40E_PTR_TYPE BIT(15)
#define I40E_SR_OCP_CFG_WORD0 0x2B
#define I40E_SR_OCP_ENABLED BIT(15)
@@ -1454,7 +1433,8 @@
};
/* IEEE 802.1AB LLDP Agent Variables from NVM */
-#define I40E_NVM_LLDP_CFG_PTR 0xD
+#define I40E_NVM_LLDP_CFG_PTR 0x06
+#define I40E_SR_LLDP_CFG_PTR 0x31
struct i40e_lldp_variables {
u16 length;
u16 adminstatus;
diff --git a/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c b/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
index 35173cb..c6d24ea 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
@@ -1,29 +1,5 @@
// SPDX-License-Identifier: GPL-2.0
-/*******************************************************************************
- *
- * Intel Ethernet Controller XL710 Family Linux Driver
- * Copyright(c) 2013 - 2016 Intel Corporation.
- *
- * This program is free software; you can redistribute it and/or modify it
- * under the terms and conditions of the GNU General Public License,
- * version 2, as published by the Free Software Foundation.
- *
- * This program is distributed in the hope it will be useful, but WITHOUT
- * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
- * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
- * more details.
- *
- * You should have received a copy of the GNU General Public License along
- * with this program. If not, see <http://www.gnu.org/licenses/>.
- *
- * The full GNU General Public License is included in this distribution in
- * the file called "COPYING".
- *
- * Contact Information:
- * e1000-devel Mailing List <e1000-devel@lists.sourceforge.net>
- * Intel Corporation, 5200 N.E. Elam Young Parkway, Hillsboro, OR 97124-6497
- *
- ******************************************************************************/
+/* Copyright(c) 2013 - 2018 Intel Corporation. */
#include "i40e.h"
@@ -32,8 +8,8 @@
/**
* i40e_vc_vf_broadcast
* @pf: pointer to the PF structure
- * @opcode: operation code
- * @retval: return value
+ * @v_opcode: operation code
+ * @v_retval: return value
* @msg: pointer to the msg buffer
* @msglen: msg length
*
@@ -1663,6 +1639,7 @@
/**
* i40e_vc_get_version_msg
* @vf: pointer to the VF info
+ * @msg: pointer to the msg buffer
*
* called from the VF to request the API version used by the PF
**/
@@ -1706,7 +1683,6 @@
* i40e_vc_get_vf_resources_msg
* @vf: pointer to the VF info
* @msg: pointer to the msg buffer
- * @msglen: msg length
*
* called from the VF to request its resources
**/
@@ -1830,8 +1806,6 @@
/**
* i40e_vc_reset_vf_msg
* @vf: pointer to the VF info
- * @msg: pointer to the msg buffer
- * @msglen: msg length
*
* called from the VF to reset itself,
* unlike other virtchnl messages, PF driver
@@ -2180,6 +2154,51 @@
}
/**
+ * i40e_ctrl_vf_tx_rings
+ * @vsi: the SRIOV VSI being configured
+ * @q_map: bit map of the queues to be enabled
+ * @enable: start or stop the queue
+ **/
+static int i40e_ctrl_vf_tx_rings(struct i40e_vsi *vsi, unsigned long q_map,
+ bool enable)
+{
+ struct i40e_pf *pf = vsi->back;
+ int ret = 0;
+ u16 q_id;
+
+ for_each_set_bit(q_id, &q_map, I40E_MAX_VF_QUEUES) {
+ ret = i40e_control_wait_tx_q(vsi->seid, pf,
+ vsi->base_queue + q_id,
+ false /*is xdp*/, enable);
+ if (ret)
+ break;
+ }
+ return ret;
+}
+
+/**
+ * i40e_ctrl_vf_rx_rings
+ * @vsi: the SRIOV VSI being configured
+ * @q_map: bit map of the queues to be enabled
+ * @enable: start or stop the queue
+ **/
+static int i40e_ctrl_vf_rx_rings(struct i40e_vsi *vsi, unsigned long q_map,
+ bool enable)
+{
+ struct i40e_pf *pf = vsi->back;
+ int ret = 0;
+ u16 q_id;
+
+ for_each_set_bit(q_id, &q_map, I40E_MAX_VF_QUEUES) {
+ ret = i40e_control_wait_rx_q(pf, vsi->base_queue + q_id,
+ enable);
+ if (ret)
+ break;
+ }
+ return ret;
+}
+
+/**
* i40e_vc_enable_queues_msg
* @vf: pointer to the VF info
* @msg: pointer to the msg buffer
@@ -2211,8 +2230,17 @@
goto error_param;
}
- if (i40e_vsi_start_rings(pf->vsi[vf->lan_vsi_idx]))
+ /* Use the queue bit map sent by the VF */
+ if (i40e_ctrl_vf_rx_rings(pf->vsi[vf->lan_vsi_idx], vqs->rx_queues,
+ true)) {
aq_ret = I40E_ERR_TIMEOUT;
+ goto error_param;
+ }
+ if (i40e_ctrl_vf_tx_rings(pf->vsi[vf->lan_vsi_idx], vqs->tx_queues,
+ true)) {
+ aq_ret = I40E_ERR_TIMEOUT;
+ goto error_param;
+ }
/* need to start the rings for additional ADq VSI's as well */
if (vf->adq_enabled) {
@@ -2260,8 +2288,17 @@
goto error_param;
}
- i40e_vsi_stop_rings(pf->vsi[vf->lan_vsi_idx]);
-
+ /* Use the queue bit map sent by the VF */
+ if (i40e_ctrl_vf_tx_rings(pf->vsi[vf->lan_vsi_idx], vqs->tx_queues,
+ false)) {
+ aq_ret = I40E_ERR_TIMEOUT;
+ goto error_param;
+ }
+ if (i40e_ctrl_vf_rx_rings(pf->vsi[vf->lan_vsi_idx], vqs->rx_queues,
+ false)) {
+ aq_ret = I40E_ERR_TIMEOUT;
+ goto error_param;
+ }
error_param:
/* send the response to the VF */
return i40e_vc_send_resp_to_vf(vf, VIRTCHNL_OP_DISABLE_QUEUES,
@@ -3556,15 +3593,16 @@
* i40e_vc_process_vf_msg
* @pf: pointer to the PF structure
* @vf_id: source VF id
+ * @v_opcode: operation code
+ * @v_retval: unused return value code
* @msg: pointer to the msg buffer
* @msglen: msg length
- * @msghndl: msg handle
*
* called from the common aeq/arq handler to
* process request from VF
**/
int i40e_vc_process_vf_msg(struct i40e_pf *pf, s16 vf_id, u32 v_opcode,
- u32 v_retval, u8 *msg, u16 msglen)
+ u32 __always_unused v_retval, u8 *msg, u16 msglen)
{
struct i40e_hw *hw = &pf->hw;
int local_vf_id = vf_id - (s16)hw->func_caps.vf_base_id;
@@ -4015,7 +4053,8 @@
* i40e_ndo_set_vf_bw
* @netdev: network interface device structure
* @vf_id: VF identifier
- * @tx_rate: Tx rate
+ * @min_tx_rate: Minimum Tx rate
+ * @max_tx_rate: Maximum Tx rate
*
* configure VF Tx rate
**/
diff --git a/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.h b/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.h
index 57f727b..bf67d62 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.h
+++ b/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.h
@@ -1,29 +1,5 @@
/* SPDX-License-Identifier: GPL-2.0 */
-/*******************************************************************************
- *
- * Intel Ethernet Controller XL710 Family Linux Driver
- * Copyright(c) 2013 - 2015 Intel Corporation.
- *
- * This program is free software; you can redistribute it and/or modify it
- * under the terms and conditions of the GNU General Public License,
- * version 2, as published by the Free Software Foundation.
- *
- * This program is distributed in the hope it will be useful, but WITHOUT
- * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
- * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
- * more details.
- *
- * You should have received a copy of the GNU General Public License along
- * with this program. If not, see <http://www.gnu.org/licenses/>.
- *
- * The full GNU General Public License is included in this distribution in
- * the file called "COPYING".
- *
- * Contact Information:
- * e1000-devel Mailing List <e1000-devel@lists.sourceforge.net>
- * Intel Corporation, 5200 N.E. Elam Young Parkway, Hillsboro, OR 97124-6497
- *
- ******************************************************************************/
+/* Copyright(c) 2013 - 2018 Intel Corporation. */
#ifndef _I40E_VIRTCHNL_PF_H_
#define _I40E_VIRTCHNL_PF_H_
diff --git a/drivers/net/ethernet/intel/i40evf/Makefile b/drivers/net/ethernet/intel/i40evf/Makefile
index 1e89c54..3c5c6e9 100644
--- a/drivers/net/ethernet/intel/i40evf/Makefile
+++ b/drivers/net/ethernet/intel/i40evf/Makefile
@@ -1,29 +1,5 @@
# SPDX-License-Identifier: GPL-2.0
-################################################################################
-#
-# Intel Ethernet Controller XL710 Family Linux Virtual Function Driver
-# Copyright(c) 2013 - 2014 Intel Corporation.
-#
-# This program is free software; you can redistribute it and/or modify it
-# under the terms and conditions of the GNU General Public License,
-# version 2, as published by the Free Software Foundation.
-#
-# This program is distributed in the hope it will be useful, but WITHOUT
-# ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
-# FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
-# more details.
-#
-# You should have received a copy of the GNU General Public License along
-# with this program. If not, see <http://www.gnu.org/licenses/>.
-#
-# The full GNU General Public License is included in this distribution in
-# the file called "COPYING".
-#
-# Contact Information:
-# e1000-devel Mailing List <e1000-devel@lists.sourceforge.net>
-# Intel Corporation, 5200 N.E. Elam Young Parkway, Hillsboro, OR 97124-6497
-#
-################################################################################
+# Copyright(c) 2013 - 2018 Intel Corporation.
#
## Makefile for the Intel(R) 40GbE VF driver
diff --git a/drivers/net/ethernet/intel/i40evf/i40e_adminq.c b/drivers/net/ethernet/intel/i40evf/i40e_adminq.c
index 6fd677e..c355120 100644
--- a/drivers/net/ethernet/intel/i40evf/i40e_adminq.c
+++ b/drivers/net/ethernet/intel/i40evf/i40e_adminq.c
@@ -1,29 +1,5 @@
// SPDX-License-Identifier: GPL-2.0
-/*******************************************************************************
- *
- * Intel Ethernet Controller XL710 Family Linux Virtual Function Driver
- * Copyright(c) 2013 - 2016 Intel Corporation.
- *
- * This program is free software; you can redistribute it and/or modify it
- * under the terms and conditions of the GNU General Public License,
- * version 2, as published by the Free Software Foundation.
- *
- * This program is distributed in the hope it will be useful, but WITHOUT
- * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
- * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
- * more details.
- *
- * You should have received a copy of the GNU General Public License along
- * with this program. If not, see <http://www.gnu.org/licenses/>.
- *
- * The full GNU General Public License is included in this distribution in
- * the file called "COPYING".
- *
- * Contact Information:
- * e1000-devel Mailing List <e1000-devel@lists.sourceforge.net>
- * Intel Corporation, 5200 N.E. Elam Young Parkway, Hillsboro, OR 97124-6497
- *
- ******************************************************************************/
+/* Copyright(c) 2013 - 2018 Intel Corporation. */
#include "i40e_status.h"
#include "i40e_type.h"
diff --git a/drivers/net/ethernet/intel/i40evf/i40e_adminq.h b/drivers/net/ethernet/intel/i40evf/i40e_adminq.h
index a7137c1..1f264b9 100644
--- a/drivers/net/ethernet/intel/i40evf/i40e_adminq.h
+++ b/drivers/net/ethernet/intel/i40evf/i40e_adminq.h
@@ -1,29 +1,5 @@
/* SPDX-License-Identifier: GPL-2.0 */
-/*******************************************************************************
- *
- * Intel Ethernet Controller XL710 Family Linux Virtual Function Driver
- * Copyright(c) 2013 - 2014 Intel Corporation.
- *
- * This program is free software; you can redistribute it and/or modify it
- * under the terms and conditions of the GNU General Public License,
- * version 2, as published by the Free Software Foundation.
- *
- * This program is distributed in the hope it will be useful, but WITHOUT
- * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
- * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
- * more details.
- *
- * You should have received a copy of the GNU General Public License along
- * with this program. If not, see <http://www.gnu.org/licenses/>.
- *
- * The full GNU General Public License is included in this distribution in
- * the file called "COPYING".
- *
- * Contact Information:
- * e1000-devel Mailing List <e1000-devel@lists.sourceforge.net>
- * Intel Corporation, 5200 N.E. Elam Young Parkway, Hillsboro, OR 97124-6497
- *
- ******************************************************************************/
+/* Copyright(c) 2013 - 2018 Intel Corporation. */
#ifndef _I40E_ADMINQ_H_
#define _I40E_ADMINQ_H_
diff --git a/drivers/net/ethernet/intel/i40evf/i40e_adminq_cmd.h b/drivers/net/ethernet/intel/i40evf/i40e_adminq_cmd.h
index 439e718..aa81e87 100644
--- a/drivers/net/ethernet/intel/i40evf/i40e_adminq_cmd.h
+++ b/drivers/net/ethernet/intel/i40evf/i40e_adminq_cmd.h
@@ -1,29 +1,5 @@
/* SPDX-License-Identifier: GPL-2.0 */
-/*******************************************************************************
- *
- * Intel Ethernet Controller XL710 Family Linux Virtual Function Driver
- * Copyright(c) 2013 - 2017 Intel Corporation.
- *
- * This program is free software; you can redistribute it and/or modify it
- * under the terms and conditions of the GNU General Public License,
- * version 2, as published by the Free Software Foundation.
- *
- * This program is distributed in the hope it will be useful, but WITHOUT
- * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
- * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
- * more details.
- *
- * You should have received a copy of the GNU General Public License along
- * with this program. If not, see <http://www.gnu.org/licenses/>.
- *
- * The full GNU General Public License is included in this distribution in
- * the file called "COPYING".
- *
- * Contact Information:
- * e1000-devel Mailing List <e1000-devel@lists.sourceforge.net>
- * Intel Corporation, 5200 N.E. Elam Young Parkway, Hillsboro, OR 97124-6497
- *
- ******************************************************************************/
+/* Copyright(c) 2013 - 2018 Intel Corporation. */
#ifndef _I40E_ADMINQ_CMD_H_
#define _I40E_ADMINQ_CMD_H_
diff --git a/drivers/net/ethernet/intel/i40evf/i40e_alloc.h b/drivers/net/ethernet/intel/i40evf/i40e_alloc.h
index 7e0fddd..cb86892 100644
--- a/drivers/net/ethernet/intel/i40evf/i40e_alloc.h
+++ b/drivers/net/ethernet/intel/i40evf/i40e_alloc.h
@@ -1,29 +1,5 @@
/* SPDX-License-Identifier: GPL-2.0 */
-/*******************************************************************************
- *
- * Intel Ethernet Controller XL710 Family Linux Virtual Function Driver
- * Copyright(c) 2013 - 2014 Intel Corporation.
- *
- * This program is free software; you can redistribute it and/or modify it
- * under the terms and conditions of the GNU General Public License,
- * version 2, as published by the Free Software Foundation.
- *
- * This program is distributed in the hope it will be useful, but WITHOUT
- * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
- * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
- * more details.
- *
- * You should have received a copy of the GNU General Public License along
- * with this program. If not, see <http://www.gnu.org/licenses/>.
- *
- * The full GNU General Public License is included in this distribution in
- * the file called "COPYING".
- *
- * Contact Information:
- * e1000-devel Mailing List <e1000-devel@lists.sourceforge.net>
- * Intel Corporation, 5200 N.E. Elam Young Parkway, Hillsboro, OR 97124-6497
- *
- ******************************************************************************/
+/* Copyright(c) 2013 - 2018 Intel Corporation. */
#ifndef _I40E_ALLOC_H_
#define _I40E_ALLOC_H_
diff --git a/drivers/net/ethernet/intel/i40evf/i40e_common.c b/drivers/net/ethernet/intel/i40evf/i40e_common.c
index 67140cd..9cef549 100644
--- a/drivers/net/ethernet/intel/i40evf/i40e_common.c
+++ b/drivers/net/ethernet/intel/i40evf/i40e_common.c
@@ -1,29 +1,5 @@
// SPDX-License-Identifier: GPL-2.0
-/*******************************************************************************
- *
- * Intel Ethernet Controller XL710 Family Linux Virtual Function Driver
- * Copyright(c) 2013 - 2014 Intel Corporation.
- *
- * This program is free software; you can redistribute it and/or modify it
- * under the terms and conditions of the GNU General Public License,
- * version 2, as published by the Free Software Foundation.
- *
- * This program is distributed in the hope it will be useful, but WITHOUT
- * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
- * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
- * more details.
- *
- * You should have received a copy of the GNU General Public License along
- * with this program. If not, see <http://www.gnu.org/licenses/>.
- *
- * The full GNU General Public License is included in this distribution in
- * the file called "COPYING".
- *
- * Contact Information:
- * e1000-devel Mailing List <e1000-devel@lists.sourceforge.net>
- * Intel Corporation, 5200 N.E. Elam Young Parkway, Hillsboro, OR 97124-6497
- *
- ******************************************************************************/
+/* Copyright(c) 2013 - 2018 Intel Corporation. */
#include "i40e_type.h"
#include "i40e_adminq.h"
@@ -1255,6 +1231,7 @@
* @hw: pointer to the hw struct
* @buff: command buffer (size in bytes = buff_size)
* @buff_size: buffer size in bytes
+ * @flags: AdminQ command flags
* @cmd_details: pointer to command details structure or NULL
**/
enum
diff --git a/drivers/net/ethernet/intel/i40evf/i40e_devids.h b/drivers/net/ethernet/intel/i40evf/i40e_devids.h
index 352dd3f..f300bf2 100644
--- a/drivers/net/ethernet/intel/i40evf/i40e_devids.h
+++ b/drivers/net/ethernet/intel/i40evf/i40e_devids.h
@@ -1,29 +1,5 @@
/* SPDX-License-Identifier: GPL-2.0 */
-/*******************************************************************************
- *
- * Intel Ethernet Controller XL710 Family Linux Virtual Function Driver
- * Copyright(c) 2013 - 2015 Intel Corporation.
- *
- * This program is free software; you can redistribute it and/or modify it
- * under the terms and conditions of the GNU General Public License,
- * version 2, as published by the Free Software Foundation.
- *
- * This program is distributed in the hope it will be useful, but WITHOUT
- * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
- * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
- * more details.
- *
- * You should have received a copy of the GNU General Public License along
- * with this program. If not, see <http://www.gnu.org/licenses/>.
- *
- * The full GNU General Public License is included in this distribution in
- * the file called "COPYING".
- *
- * Contact Information:
- * e1000-devel Mailing List <e1000-devel@lists.sourceforge.net>
- * Intel Corporation, 5200 N.E. Elam Young Parkway, Hillsboro, OR 97124-6497
- *
- ******************************************************************************/
+/* Copyright(c) 2013 - 2018 Intel Corporation. */
#ifndef _I40E_DEVIDS_H_
#define _I40E_DEVIDS_H_
diff --git a/drivers/net/ethernet/intel/i40evf/i40e_hmc.h b/drivers/net/ethernet/intel/i40evf/i40e_hmc.h
index 7432596..1c78de8 100644
--- a/drivers/net/ethernet/intel/i40evf/i40e_hmc.h
+++ b/drivers/net/ethernet/intel/i40evf/i40e_hmc.h
@@ -1,29 +1,5 @@
/* SPDX-License-Identifier: GPL-2.0 */
-/*******************************************************************************
- *
- * Intel Ethernet Controller XL710 Family Linux Virtual Function Driver
- * Copyright(c) 2013 - 2014 Intel Corporation.
- *
- * This program is free software; you can redistribute it and/or modify it
- * under the terms and conditions of the GNU General Public License,
- * version 2, as published by the Free Software Foundation.
- *
- * This program is distributed in the hope it will be useful, but WITHOUT
- * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
- * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
- * more details.
- *
- * You should have received a copy of the GNU General Public License along
- * with this program. If not, see <http://www.gnu.org/licenses/>.
- *
- * The full GNU General Public License is included in this distribution in
- * the file called "COPYING".
- *
- * Contact Information:
- * e1000-devel Mailing List <e1000-devel@lists.sourceforge.net>
- * Intel Corporation, 5200 N.E. Elam Young Parkway, Hillsboro, OR 97124-6497
- *
- ******************************************************************************/
+/* Copyright(c) 2013 - 2018 Intel Corporation. */
#ifndef _I40E_HMC_H_
#define _I40E_HMC_H_
diff --git a/drivers/net/ethernet/intel/i40evf/i40e_lan_hmc.h b/drivers/net/ethernet/intel/i40evf/i40e_lan_hmc.h
index ddac0e4..82b00f7 100644
--- a/drivers/net/ethernet/intel/i40evf/i40e_lan_hmc.h
+++ b/drivers/net/ethernet/intel/i40evf/i40e_lan_hmc.h
@@ -1,29 +1,5 @@
/* SPDX-License-Identifier: GPL-2.0 */
-/*******************************************************************************
- *
- * Intel Ethernet Controller XL710 Family Linux Virtual Function Driver
- * Copyright(c) 2013 - 2014 Intel Corporation.
- *
- * This program is free software; you can redistribute it and/or modify it
- * under the terms and conditions of the GNU General Public License,
- * version 2, as published by the Free Software Foundation.
- *
- * This program is distributed in the hope it will be useful, but WITHOUT
- * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
- * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
- * more details.
- *
- * You should have received a copy of the GNU General Public License along
- * with this program. If not, see <http://www.gnu.org/licenses/>.
- *
- * The full GNU General Public License is included in this distribution in
- * the file called "COPYING".
- *
- * Contact Information:
- * e1000-devel Mailing List <e1000-devel@lists.sourceforge.net>
- * Intel Corporation, 5200 N.E. Elam Young Parkway, Hillsboro, OR 97124-6497
- *
- ******************************************************************************/
+/* Copyright(c) 2013 - 2018 Intel Corporation. */
#ifndef _I40E_LAN_HMC_H_
#define _I40E_LAN_HMC_H_
diff --git a/drivers/net/ethernet/intel/i40evf/i40e_osdep.h b/drivers/net/ethernet/intel/i40evf/i40e_osdep.h
index 8668ad6..3ddddb4 100644
--- a/drivers/net/ethernet/intel/i40evf/i40e_osdep.h
+++ b/drivers/net/ethernet/intel/i40evf/i40e_osdep.h
@@ -1,29 +1,5 @@
/* SPDX-License-Identifier: GPL-2.0 */
-/*******************************************************************************
- *
- * Intel Ethernet Controller XL710 Family Linux Virtual Function Driver
- * Copyright(c) 2013 - 2014 Intel Corporation.
- *
- * This program is free software; you can redistribute it and/or modify it
- * under the terms and conditions of the GNU General Public License,
- * version 2, as published by the Free Software Foundation.
- *
- * This program is distributed in the hope it will be useful, but WITHOUT
- * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
- * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
- * more details.
- *
- * You should have received a copy of the GNU General Public License along
- * with this program. If not, see <http://www.gnu.org/licenses/>.
- *
- * The full GNU General Public License is included in this distribution in
- * the file called "COPYING".
- *
- * Contact Information:
- * e1000-devel Mailing List <e1000-devel@lists.sourceforge.net>
- * Intel Corporation, 5200 N.E. Elam Young Parkway, Hillsboro, OR 97124-6497
- *
- ******************************************************************************/
+/* Copyright(c) 2013 - 2018 Intel Corporation. */
#ifndef _I40E_OSDEP_H_
#define _I40E_OSDEP_H_
diff --git a/drivers/net/ethernet/intel/i40evf/i40e_prototype.h b/drivers/net/ethernet/intel/i40evf/i40e_prototype.h
index 72501bd..a358f4b 100644
--- a/drivers/net/ethernet/intel/i40evf/i40e_prototype.h
+++ b/drivers/net/ethernet/intel/i40evf/i40e_prototype.h
@@ -1,29 +1,5 @@
/* SPDX-License-Identifier: GPL-2.0 */
-/*******************************************************************************
- *
- * Intel Ethernet Controller XL710 Family Linux Virtual Function Driver
- * Copyright(c) 2013 - 2014 Intel Corporation.
- *
- * This program is free software; you can redistribute it and/or modify it
- * under the terms and conditions of the GNU General Public License,
- * version 2, as published by the Free Software Foundation.
- *
- * This program is distributed in the hope it will be useful, but WITHOUT
- * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
- * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
- * more details.
- *
- * You should have received a copy of the GNU General Public License along
- * with this program. If not, see <http://www.gnu.org/licenses/>.
- *
- * The full GNU General Public License is included in this distribution in
- * the file called "COPYING".
- *
- * Contact Information:
- * e1000-devel Mailing List <e1000-devel@lists.sourceforge.net>
- * Intel Corporation, 5200 N.E. Elam Young Parkway, Hillsboro, OR 97124-6497
- *
- ******************************************************************************/
+/* Copyright(c) 2013 - 2018 Intel Corporation. */
#ifndef _I40E_PROTOTYPE_H_
#define _I40E_PROTOTYPE_H_
diff --git a/drivers/net/ethernet/intel/i40evf/i40e_register.h b/drivers/net/ethernet/intel/i40evf/i40e_register.h
index c9c9356..49e1f57 100644
--- a/drivers/net/ethernet/intel/i40evf/i40e_register.h
+++ b/drivers/net/ethernet/intel/i40evf/i40e_register.h
@@ -1,29 +1,5 @@
/* SPDX-License-Identifier: GPL-2.0 */
-/*******************************************************************************
- *
- * Intel Ethernet Controller XL710 Family Linux Virtual Function Driver
- * Copyright(c) 2013 - 2014 Intel Corporation.
- *
- * This program is free software; you can redistribute it and/or modify it
- * under the terms and conditions of the GNU General Public License,
- * version 2, as published by the Free Software Foundation.
- *
- * This program is distributed in the hope it will be useful, but WITHOUT
- * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
- * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
- * more details.
- *
- * You should have received a copy of the GNU General Public License along
- * with this program. If not, see <http://www.gnu.org/licenses/>.
- *
- * The full GNU General Public License is included in this distribution in
- * the file called "COPYING".
- *
- * Contact Information:
- * e1000-devel Mailing List <e1000-devel@lists.sourceforge.net>
- * Intel Corporation, 5200 N.E. Elam Young Parkway, Hillsboro, OR 97124-6497
- *
- ******************************************************************************/
+/* Copyright(c) 2013 - 2018 Intel Corporation. */
#ifndef _I40E_REGISTER_H_
#define _I40E_REGISTER_H_
diff --git a/drivers/net/ethernet/intel/i40evf/i40e_status.h b/drivers/net/ethernet/intel/i40evf/i40e_status.h
index 0d7993e..77be070 100644
--- a/drivers/net/ethernet/intel/i40evf/i40e_status.h
+++ b/drivers/net/ethernet/intel/i40evf/i40e_status.h
@@ -1,29 +1,5 @@
/* SPDX-License-Identifier: GPL-2.0 */
-/*******************************************************************************
- *
- * Intel Ethernet Controller XL710 Family Linux Virtual Function Driver
- * Copyright(c) 2013 - 2014 Intel Corporation.
- *
- * This program is free software; you can redistribute it and/or modify it
- * under the terms and conditions of the GNU General Public License,
- * version 2, as published by the Free Software Foundation.
- *
- * This program is distributed in the hope it will be useful, but WITHOUT
- * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
- * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
- * more details.
- *
- * You should have received a copy of the GNU General Public License along
- * with this program. If not, see <http://www.gnu.org/licenses/>.
- *
- * The full GNU General Public License is included in this distribution in
- * the file called "COPYING".
- *
- * Contact Information:
- * e1000-devel Mailing List <e1000-devel@lists.sourceforge.net>
- * Intel Corporation, 5200 N.E. Elam Young Parkway, Hillsboro, OR 97124-6497
- *
- ******************************************************************************/
+/* Copyright(c) 2013 - 2018 Intel Corporation. */
#ifndef _I40E_STATUS_H_
#define _I40E_STATUS_H_
diff --git a/drivers/net/ethernet/intel/i40evf/i40e_trace.h b/drivers/net/ethernet/intel/i40evf/i40e_trace.h
index ece01dd..d7a4e68 100644
--- a/drivers/net/ethernet/intel/i40evf/i40e_trace.h
+++ b/drivers/net/ethernet/intel/i40evf/i40e_trace.h
@@ -1,26 +1,5 @@
/* SPDX-License-Identifier: GPL-2.0 */
-/*******************************************************************************
- *
- * Intel(R) 40-10 Gigabit Ethernet Virtual Function Driver
- * Copyright(c) 2013 - 2017 Intel Corporation.
- *
- * This program is free software; you can redistribute it and/or modify it
- * under the terms and conditions of the GNU General Public License,
- * version 2, as published by the Free Software Foundation.
- *
- * This program is distributed in the hope it will be useful, but WITHOUT
- * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
- * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
- * more details.
- *
- * The full GNU General Public License is included in this distribution in
- * the file called "COPYING".
- *
- * Contact Information:
- * e1000-devel Mailing List <e1000-devel@lists.sourceforge.net>
- * Intel Corporation, 5200 N.E. Elam Young Parkway, Hillsboro, OR 97124-6497
- *
- ******************************************************************************/
+/* Copyright(c) 2013 - 2018 Intel Corporation. */
/* Modeled on trace-events-sample.h */
diff --git a/drivers/net/ethernet/intel/i40evf/i40e_txrx.c b/drivers/net/ethernet/intel/i40evf/i40e_txrx.c
index 12bd937..a973071 100644
--- a/drivers/net/ethernet/intel/i40evf/i40e_txrx.c
+++ b/drivers/net/ethernet/intel/i40evf/i40e_txrx.c
@@ -1,29 +1,5 @@
// SPDX-License-Identifier: GPL-2.0
-/*******************************************************************************
- *
- * Intel Ethernet Controller XL710 Family Linux Virtual Function Driver
- * Copyright(c) 2013 - 2016 Intel Corporation.
- *
- * This program is free software; you can redistribute it and/or modify it
- * under the terms and conditions of the GNU General Public License,
- * version 2, as published by the Free Software Foundation.
- *
- * This program is distributed in the hope it will be useful, but WITHOUT
- * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
- * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
- * more details.
- *
- * You should have received a copy of the GNU General Public License along
- * with this program. If not, see <http://www.gnu.org/licenses/>.
- *
- * The full GNU General Public License is included in this distribution in
- * the file called "COPYING".
- *
- * Contact Information:
- * e1000-devel Mailing List <e1000-devel@lists.sourceforge.net>
- * Intel Corporation, 5200 N.E. Elam Young Parkway, Hillsboro, OR 97124-6497
- *
- ******************************************************************************/
+/* Copyright(c) 2013 - 2018 Intel Corporation. */
#include <linux/prefetch.h>
#include <net/busy_poll.h>
@@ -129,7 +105,7 @@
/**
* i40evf_get_tx_pending - how many Tx descriptors not processed
- * @tx_ring: the ring of descriptors
+ * @ring: the ring of descriptors
* @in_sw: is tx_pending being checked in SW or HW
*
* Since there is no access to the ring head register
@@ -1070,6 +1046,8 @@
* i40e_rx_hash - set the hash value in the skb
* @ring: descriptor ring
* @rx_desc: specific descriptor
+ * @skb: skb currently being received and modified
+ * @rx_ptype: Rx packet type
**/
static inline void i40e_rx_hash(struct i40e_ring *ring,
union i40e_rx_desc *rx_desc,
diff --git a/drivers/net/ethernet/intel/i40evf/i40e_txrx.h b/drivers/net/ethernet/intel/i40evf/i40e_txrx.h
index 5790897..3b5a63b 100644
--- a/drivers/net/ethernet/intel/i40evf/i40e_txrx.h
+++ b/drivers/net/ethernet/intel/i40evf/i40e_txrx.h
@@ -1,29 +1,5 @@
/* SPDX-License-Identifier: GPL-2.0 */
-/*******************************************************************************
- *
- * Intel Ethernet Controller XL710 Family Linux Virtual Function Driver
- * Copyright(c) 2013 - 2016 Intel Corporation.
- *
- * This program is free software; you can redistribute it and/or modify it
- * under the terms and conditions of the GNU General Public License,
- * version 2, as published by the Free Software Foundation.
- *
- * This program is distributed in the hope it will be useful, but WITHOUT
- * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
- * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
- * more details.
- *
- * You should have received a copy of the GNU General Public License along
- * with this program. If not, see <http://www.gnu.org/licenses/>.
- *
- * The full GNU General Public License is included in this distribution in
- * the file called "COPYING".
- *
- * Contact Information:
- * e1000-devel Mailing List <e1000-devel@lists.sourceforge.net>
- * Intel Corporation, 5200 N.E. Elam Young Parkway, Hillsboro, OR 97124-6497
- *
- ******************************************************************************/
+/* Copyright(c) 2013 - 2018 Intel Corporation. */
#ifndef _I40E_TXRX_H_
#define _I40E_TXRX_H_
diff --git a/drivers/net/ethernet/intel/i40evf/i40e_type.h b/drivers/net/ethernet/intel/i40evf/i40e_type.h
index 449de4b..094387d 100644
--- a/drivers/net/ethernet/intel/i40evf/i40e_type.h
+++ b/drivers/net/ethernet/intel/i40evf/i40e_type.h
@@ -1,29 +1,5 @@
/* SPDX-License-Identifier: GPL-2.0 */
-/*******************************************************************************
- *
- * Intel Ethernet Controller XL710 Family Linux Virtual Function Driver
- * Copyright(c) 2013 - 2015 Intel Corporation.
- *
- * This program is free software; you can redistribute it and/or modify it
- * under the terms and conditions of the GNU General Public License,
- * version 2, as published by the Free Software Foundation.
- *
- * This program is distributed in the hope it will be useful, but WITHOUT
- * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
- * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
- * more details.
- *
- * You should have received a copy of the GNU General Public License along
- * with this program. If not, see <http://www.gnu.org/licenses/>.
- *
- * The full GNU General Public License is included in this distribution in
- * the file called "COPYING".
- *
- * Contact Information:
- * e1000-devel Mailing List <e1000-devel@lists.sourceforge.net>
- * Intel Corporation, 5200 N.E. Elam Young Parkway, Hillsboro, OR 97124-6497
- *
- ******************************************************************************/
+/* Copyright(c) 2013 - 2018 Intel Corporation. */
#ifndef _I40E_TYPE_H_
#define _I40E_TYPE_H_
@@ -1257,7 +1233,8 @@
/* Checksum and Shadow RAM pointers */
#define I40E_SR_NVM_CONTROL_WORD 0x00
-#define I40E_SR_EMP_MODULE_PTR 0x0F
+#define I40E_EMP_MODULE_PTR 0x0F
+#define I40E_SR_EMP_MODULE_PTR 0x48
#define I40E_NVM_OEM_VER_OFF 0x83
#define I40E_SR_NVM_DEV_STARTER_VERSION 0x18
#define I40E_SR_NVM_WAKE_ON_LAN 0x19
@@ -1273,6 +1250,9 @@
#define I40E_SR_PCIE_ALT_MODULE_MAX_SIZE 1024
#define I40E_SR_CONTROL_WORD_1_SHIFT 0x06
#define I40E_SR_CONTROL_WORD_1_MASK (0x03 << I40E_SR_CONTROL_WORD_1_SHIFT)
+#define I40E_SR_CONTROL_WORD_1_NVM_BANK_VALID BIT(5)
+#define I40E_SR_NVM_MAP_STRUCTURE_TYPE BIT(12)
+#define I40E_PTR_TYPE BIT(15)
/* Shadow RAM related */
#define I40E_SR_SECTOR_SIZE_IN_WORDS 0x800
@@ -1386,6 +1366,10 @@
I40E_RESET_EMPR = 3,
};
+/* IEEE 802.1AB LLDP Agent Variables from NVM */
+#define I40E_NVM_LLDP_CFG_PTR 0x06
+#define I40E_SR_LLDP_CFG_PTR 0x31
+
/* RSS Hash Table Size */
#define I40E_PFQF_CTL_0_HASHLUTSIZE_512 0x00010000
diff --git a/drivers/net/ethernet/intel/i40evf/i40evf.h b/drivers/net/ethernet/intel/i40evf/i40evf.h
index 3a7a1e7..96e537a 100644
--- a/drivers/net/ethernet/intel/i40evf/i40evf.h
+++ b/drivers/net/ethernet/intel/i40evf/i40evf.h
@@ -1,29 +1,5 @@
/* SPDX-License-Identifier: GPL-2.0 */
-/*******************************************************************************
- *
- * Intel Ethernet Controller XL710 Family Linux Virtual Function Driver
- * Copyright(c) 2013 - 2016 Intel Corporation.
- *
- * This program is free software; you can redistribute it and/or modify it
- * under the terms and conditions of the GNU General Public License,
- * version 2, as published by the Free Software Foundation.
- *
- * This program is distributed in the hope it will be useful, but WITHOUT
- * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
- * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
- * more details.
- *
- * You should have received a copy of the GNU General Public License along
- * with this program. If not, see <http://www.gnu.org/licenses/>.
- *
- * The full GNU General Public License is included in this distribution in
- * the file called "COPYING".
- *
- * Contact Information:
- * e1000-devel Mailing List <e1000-devel@lists.sourceforge.net>
- * Intel Corporation, 5200 N.E. Elam Young Parkway, Hillsboro, OR 97124-6497
- *
- ******************************************************************************/
+/* Copyright(c) 2013 - 2018 Intel Corporation. */
#ifndef _I40EVF_H_
#define _I40EVF_H_
@@ -105,7 +81,6 @@
#define I40E_TX_DESC(R, i) (&(((struct i40e_tx_desc *)((R)->desc))[i]))
#define I40E_TX_CTXTDESC(R, i) \
(&(((struct i40e_tx_context_desc *)((R)->desc))[i]))
-#define MAX_QUEUES 16
#define I40EVF_MAX_REQ_QUEUES 4
#define I40EVF_HKEY_ARRAY_SIZE ((I40E_VFQF_HKEY_MAX_INDEX + 1) * 4)
diff --git a/drivers/net/ethernet/intel/i40evf/i40evf_client.c b/drivers/net/ethernet/intel/i40evf/i40evf_client.c
index da60ce1..3cc9d60 100644
--- a/drivers/net/ethernet/intel/i40evf/i40evf_client.c
+++ b/drivers/net/ethernet/intel/i40evf/i40evf_client.c
@@ -1,4 +1,6 @@
// SPDX-License-Identifier: GPL-2.0
+/* Copyright(c) 2013 - 2018 Intel Corporation. */
+
#include <linux/list.h>
#include <linux/errno.h>
@@ -176,7 +178,6 @@
/**
* i40evf_client_add_instance - add a client instance to the instance list
* @adapter: pointer to the board struct
- * @client: pointer to a client struct in the client list.
*
* Returns cinst ptr on success, NULL on failure
**/
@@ -234,7 +235,6 @@
/**
* i40evf_client_del_instance - removes a client instance from the list
* @adapter: pointer to the board struct
- * @client: pointer to the client struct
*
**/
static
@@ -438,7 +438,7 @@
* i40evf_client_setup_qvlist - send a message to the PF to setup iwarp qv map
* @ldev: pointer to L2 context.
* @client: Client pointer.
- * @qv_info: queue and vector list
+ * @qvlist_info: queue and vector list
*
* Return 0 on success or < 0 on error
**/
diff --git a/drivers/net/ethernet/intel/i40evf/i40evf_client.h b/drivers/net/ethernet/intel/i40evf/i40evf_client.h
index 15a10da..5585f36 100644
--- a/drivers/net/ethernet/intel/i40evf/i40evf_client.h
+++ b/drivers/net/ethernet/intel/i40evf/i40evf_client.h
@@ -1,6 +1,8 @@
/* SPDX-License-Identifier: GPL-2.0 */
-#ifndef _I40E_CLIENT_H_
-#define _I40E_CLIENT_H_
+/* Copyright(c) 2013 - 2018 Intel Corporation. */
+
+#ifndef _I40EVF_CLIENT_H_
+#define _I40EVF_CLIENT_H_
#define I40EVF_CLIENT_STR_LENGTH 10
@@ -164,4 +166,4 @@
/* used by clients */
int i40evf_register_client(struct i40e_client *client);
int i40evf_unregister_client(struct i40e_client *client);
-#endif /* _I40E_CLIENT_H_ */
+#endif /* _I40EVF_CLIENT_H_ */
diff --git a/drivers/net/ethernet/intel/i40evf/i40evf_ethtool.c b/drivers/net/ethernet/intel/i40evf/i40evf_ethtool.c
index dc4cde2..69efe0a 100644
--- a/drivers/net/ethernet/intel/i40evf/i40evf_ethtool.c
+++ b/drivers/net/ethernet/intel/i40evf/i40evf_ethtool.c
@@ -1,29 +1,5 @@
// SPDX-License-Identifier: GPL-2.0
-/*******************************************************************************
- *
- * Intel Ethernet Controller XL710 Family Linux Virtual Function Driver
- * Copyright(c) 2013 - 2016 Intel Corporation.
- *
- * This program is free software; you can redistribute it and/or modify it
- * under the terms and conditions of the GNU General Public License,
- * version 2, as published by the Free Software Foundation.
- *
- * This program is distributed in the hope it will be useful, but WITHOUT
- * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
- * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
- * more details.
- *
- * You should have received a copy of the GNU General Public License along
- * with this program. If not, see <http://www.gnu.org/licenses/>.
- *
- * The full GNU General Public License is included in this distribution in
- * the file called "COPYING".
- *
- * Contact Information:
- * e1000-devel Mailing List <e1000-devel@lists.sourceforge.net>
- * Intel Corporation, 5200 N.E. Elam Young Parkway, Hillsboro, OR 97124-6497
- *
- ******************************************************************************/
+/* Copyright(c) 2013 - 2018 Intel Corporation. */
/* ethtool support for i40evf */
#include "i40evf.h"
@@ -226,7 +202,7 @@
/**
* i40evf_get_priv_flags - report device private flags
- * @dev: network interface device structure
+ * @netdev: network interface device structure
*
* The get string set count and the string set should be matched for each
* flag returned. Add new strings for each flag to the i40e_gstrings_priv_flags
@@ -253,7 +229,7 @@
/**
* i40evf_set_priv_flags - set private flags
- * @dev: network interface device structure
+ * @netdev: network interface device structure
* @flags: bit flags to be set
**/
static int i40evf_set_priv_flags(struct net_device *netdev, u32 flags)
@@ -627,6 +603,7 @@
* i40evf_get_rxnfc - command to get RX flow classification rules
* @netdev: network interface device structure
* @cmd: ethtool rxnfc command
+ * @rule_locs: pointer to store rule locations
*
* Returns Success if the command is supported.
**/
@@ -746,6 +723,7 @@
* @netdev: network interface device structure
* @indir: indirection table
* @key: hash key
+ * @hfunc: hash function in use
*
* Reads the indirection table directly from the hardware. Always returns 0.
**/
@@ -774,6 +752,7 @@
* @netdev: network interface device structure
* @indir: indirection table
* @key: hash key
+ * @hfunc: hash function to use
*
* Returns -EINVAL if the table specifies an inavlid queue id, otherwise
* returns 0 after programming the table.
diff --git a/drivers/net/ethernet/intel/i40evf/i40evf_main.c b/drivers/net/ethernet/intel/i40evf/i40evf_main.c
index 5f71532..a7b87f9 100644
--- a/drivers/net/ethernet/intel/i40evf/i40evf_main.c
+++ b/drivers/net/ethernet/intel/i40evf/i40evf_main.c
@@ -1,29 +1,5 @@
// SPDX-License-Identifier: GPL-2.0
-/*******************************************************************************
- *
- * Intel Ethernet Controller XL710 Family Linux Virtual Function Driver
- * Copyright(c) 2013 - 2016 Intel Corporation.
- *
- * This program is free software; you can redistribute it and/or modify it
- * under the terms and conditions of the GNU General Public License,
- * version 2, as published by the Free Software Foundation.
- *
- * This program is distributed in the hope it will be useful, but WITHOUT
- * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
- * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
- * more details.
- *
- * You should have received a copy of the GNU General Public License along
- * with this program. If not, see <http://www.gnu.org/licenses/>.
- *
- * The full GNU General Public License is included in this distribution in
- * the file called "COPYING".
- *
- * Contact Information:
- * e1000-devel Mailing List <e1000-devel@lists.sourceforge.net>
- * Intel Corporation, 5200 N.E. Elam Young Parkway, Hillsboro, OR 97124-6497
- *
- ******************************************************************************/
+/* Copyright(c) 2013 - 2018 Intel Corporation. */
#include "i40evf.h"
#include "i40e_prototype.h"
@@ -473,6 +449,7 @@
/**
* i40evf_request_traffic_irqs - Initialize MSI-X interrupts
* @adapter: board private structure
+ * @basename: device basename
*
* Allocates MSI-X vectors for tx and rx handling, and requests
* interrupts from the kernel.
@@ -705,7 +682,7 @@
f = i40evf_find_vlan(adapter, vlan);
if (!f) {
- f = kzalloc(sizeof(*f), GFP_ATOMIC);
+ f = kzalloc(sizeof(*f), GFP_KERNEL);
if (!f)
goto clearout;
@@ -745,6 +722,7 @@
/**
* i40evf_vlan_rx_add_vid - Add a VLAN filter to a device
* @netdev: network device struct
+ * @proto: unused protocol data
* @vid: VLAN tag
**/
static int i40evf_vlan_rx_add_vid(struct net_device *netdev,
@@ -762,6 +740,7 @@
/**
* i40evf_vlan_rx_kill_vid - Remove a VLAN filter from a device
* @netdev: network device struct
+ * @proto: unused protocol data
* @vid: VLAN tag
**/
static int i40evf_vlan_rx_kill_vid(struct net_device *netdev,
@@ -1946,7 +1925,8 @@
* ndo_open() returning, so we can't assume it means all our open
* tasks have finished, since we're not holding the rtnl_lock here.
*/
- running = (adapter->state == __I40EVF_RUNNING);
+ running = ((adapter->state == __I40EVF_RUNNING) ||
+ (adapter->state == __I40EVF_RESETTING));
if (running) {
netif_carrier_off(netdev);
@@ -2352,7 +2332,7 @@
total_max_rate += tx_rate;
num_qps += mqprio_qopt->qopt.count[i];
}
- if (num_qps > MAX_QUEUES)
+ if (num_qps > I40EVF_MAX_REQ_QUEUES)
return -EINVAL;
ret = i40evf_validate_tx_bandwidth(adapter, total_max_rate);
@@ -3160,7 +3140,7 @@
/**
* i40evf_features_check - Validate encapsulated packet conforms to limits
* @skb: skb buff
- * @netdev: This physical port's netdev
+ * @dev: This physical port's netdev
* @features: Offload features that the stack believes apply
**/
static netdev_features_t i40evf_features_check(struct sk_buff *skb,
@@ -3378,6 +3358,24 @@
if (vfres->vf_cap_flags & VIRTCHNL_VF_OFFLOAD_VLAN)
netdev->features |= NETIF_F_HW_VLAN_CTAG_FILTER;
+ /* Do not turn on offloads when they are requested to be turned off.
+ * TSO needs minimum 576 bytes to work correctly.
+ */
+ if (netdev->wanted_features) {
+ if (!(netdev->wanted_features & NETIF_F_TSO) ||
+ netdev->mtu < 576)
+ netdev->features &= ~NETIF_F_TSO;
+ if (!(netdev->wanted_features & NETIF_F_TSO6) ||
+ netdev->mtu < 576)
+ netdev->features &= ~NETIF_F_TSO6;
+ if (!(netdev->wanted_features & NETIF_F_TSO_ECN))
+ netdev->features &= ~NETIF_F_TSO_ECN;
+ if (!(netdev->wanted_features & NETIF_F_GRO))
+ netdev->features &= ~NETIF_F_GRO;
+ if (!(netdev->wanted_features & NETIF_F_GSO))
+ netdev->features &= ~NETIF_F_GSO;
+ }
+
adapter->vsi.id = adapter->vsi_res->vsi_id;
adapter->vsi.back = adapter;
@@ -3692,7 +3690,8 @@
pci_set_master(pdev);
- netdev = alloc_etherdev_mq(sizeof(struct i40evf_adapter), MAX_QUEUES);
+ netdev = alloc_etherdev_mq(sizeof(struct i40evf_adapter),
+ I40EVF_MAX_REQ_QUEUES);
if (!netdev) {
err = -ENOMEM;
goto err_alloc_etherdev;
diff --git a/drivers/net/ethernet/intel/i40evf/i40evf_virtchnl.c b/drivers/net/ethernet/intel/i40evf/i40evf_virtchnl.c
index 26a5989..565677d 100644
--- a/drivers/net/ethernet/intel/i40evf/i40evf_virtchnl.c
+++ b/drivers/net/ethernet/intel/i40evf/i40evf_virtchnl.c
@@ -1,29 +1,5 @@
// SPDX-License-Identifier: GPL-2.0
-/*******************************************************************************
- *
- * Intel Ethernet Controller XL710 Family Linux Virtual Function Driver
- * Copyright(c) 2013 - 2014 Intel Corporation.
- *
- * This program is free software; you can redistribute it and/or modify it
- * under the terms and conditions of the GNU General Public License,
- * version 2, as published by the Free Software Foundation.
- *
- * This program is distributed in the hope it will be useful, but WITHOUT
- * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
- * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
- * more details.
- *
- * You should have received a copy of the GNU General Public License along
- * with this program. If not, see <http://www.gnu.org/licenses/>.
- *
- * The full GNU General Public License is included in this distribution in
- * the file called "COPYING".
- *
- * Contact Information:
- * e1000-devel Mailing List <e1000-devel@lists.sourceforge.net>
- * Intel Corporation, 5200 N.E. Elam Young Parkway, Hillsboro, OR 97124-6497
- *
- ******************************************************************************/
+/* Copyright(c) 2013 - 2018 Intel Corporation. */
#include "i40evf.h"
#include "i40e_prototype.h"
@@ -179,8 +155,7 @@
/**
* i40evf_get_vf_config
- * @hw: pointer to the hardware structure
- * @len: length of buffer
+ * @adapter: private adapter structure
*
* Get VF configuration from PF and populate hw structure. Must be called after
* admin queue is initialized. Busy waits until response is received from PF,
@@ -423,8 +398,6 @@
/**
* i40evf_add_ether_addrs
* @adapter: adapter structure
- * @addrs: the MAC address filters to add (contiguous)
- * @count: number of filters
*
* Request that the PF add one or more addresses to our filters.
**/
@@ -497,8 +470,6 @@
/**
* i40evf_del_ether_addrs
* @adapter: adapter structure
- * @addrs: the MAC address filters to remove (contiguous)
- * @count: number of filtes
*
* Request that the PF remove one or more addresses from our filters.
**/
@@ -571,8 +542,6 @@
/**
* i40evf_add_vlans
* @adapter: adapter structure
- * @vlans: the VLANs to add
- * @count: number of VLANs
*
* Request that the PF add one or more VLAN filters to our VSI.
**/
@@ -643,8 +612,6 @@
/**
* i40evf_del_vlans
* @adapter: adapter structure
- * @vlans: the VLANs to remove
- * @count: number of VLANs
*
* Request that the PF remove one or more VLAN filters from our VSI.
**/
diff --git a/drivers/net/ethernet/intel/ice/ice_adminq_cmd.h b/drivers/net/ethernet/intel/ice/ice_adminq_cmd.h
index 7dc5f04..7541ec2 100644
--- a/drivers/net/ethernet/intel/ice/ice_adminq_cmd.h
+++ b/drivers/net/ethernet/intel/ice/ice_adminq_cmd.h
@@ -1049,7 +1049,9 @@
* NVM Update commands (indirect 0x0703)
*/
struct ice_aqc_nvm {
- u8 cmd_flags;
+ __le16 offset_low;
+ u8 offset_high;
+ u8 cmd_flags;
#define ICE_AQC_NVM_LAST_CMD BIT(0)
#define ICE_AQC_NVM_PCIR_REQ BIT(0) /* Used by NVM Update reply */
#define ICE_AQC_NVM_PRESERVATION_S 1
@@ -1058,12 +1060,11 @@
#define ICE_AQC_NVM_PRESERVE_ALL BIT(1)
#define ICE_AQC_NVM_PRESERVE_SELECTED (3 << CSR_AQ_NVM_PRESERVATION_S)
#define ICE_AQC_NVM_FLASH_ONLY BIT(7)
- u8 module_typeid;
- __le16 length;
+ __le16 module_typeid;
+ __le16 length;
#define ICE_AQC_NVM_ERASE_LEN 0xFFFF
- __le32 offset;
- __le32 addr_high;
- __le32 addr_low;
+ __le32 addr_high;
+ __le32 addr_low;
};
/* Get/Set RSS key (indirect 0x0B04/0x0B02) */
diff --git a/drivers/net/ethernet/intel/ice/ice_nvm.c b/drivers/net/ethernet/intel/ice/ice_nvm.c
index fa7a69a..92da0a6 100644
--- a/drivers/net/ethernet/intel/ice/ice_nvm.c
+++ b/drivers/net/ethernet/intel/ice/ice_nvm.c
@@ -16,7 +16,7 @@
* Read the NVM using the admin queue commands (0x0701)
*/
static enum ice_status
-ice_aq_read_nvm(struct ice_hw *hw, u8 module_typeid, u32 offset, u16 length,
+ice_aq_read_nvm(struct ice_hw *hw, u16 module_typeid, u32 offset, u16 length,
void *data, bool last_command, struct ice_sq_cd *cd)
{
struct ice_aq_desc desc;
@@ -33,8 +33,9 @@
/* If this is the last command in a series, set the proper flag. */
if (last_command)
cmd->cmd_flags |= ICE_AQC_NVM_LAST_CMD;
- cmd->module_typeid = module_typeid;
- cmd->offset = cpu_to_le32(offset);
+ cmd->module_typeid = cpu_to_le16(module_typeid);
+ cmd->offset_low = cpu_to_le16(offset & 0xFFFF);
+ cmd->offset_high = (offset >> 16) & 0xFF;
cmd->length = cpu_to_le16(length);
return ice_aq_send_cmd(hw, &desc, data, length, cd);
diff --git a/drivers/net/ethernet/intel/igb/Makefile b/drivers/net/ethernet/intel/igb/Makefile
index c48583e..394c1e0 100644
--- a/drivers/net/ethernet/intel/igb/Makefile
+++ b/drivers/net/ethernet/intel/igb/Makefile
@@ -1,31 +1,5 @@
# SPDX-License-Identifier: GPL-2.0
-################################################################################
-#
-# Intel 82575 PCI-Express Ethernet Linux driver
-# Copyright(c) 1999 - 2014 Intel Corporation.
-#
-# This program is free software; you can redistribute it and/or modify it
-# under the terms and conditions of the GNU General Public License,
-# version 2, as published by the Free Software Foundation.
-#
-# This program is distributed in the hope it will be useful, but WITHOUT
-# ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
-# FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
-# more details.
-#
-# You should have received a copy of the GNU General Public License along with
-# this program; if not, see <http://www.gnu.org/licenses/>.
-#
-# The full GNU General Public License is included in this distribution in
-# the file called "COPYING".
-#
-# Contact Information:
-# Linux NICS <linux.nics@intel.com>
-# e1000-devel Mailing List <e1000-devel@lists.sourceforge.net>
-# Intel Corporation, 5200 N.E. Elam Young Parkway, Hillsboro, OR 97124-6497
-#
-################################################################################
-
+# Copyright(c) 1999 - 2018 Intel Corporation.
#
# Makefile for the Intel(R) 82575 PCI-Express ethernet driver
#
diff --git a/drivers/net/ethernet/intel/igb/e1000_82575.c b/drivers/net/ethernet/intel/igb/e1000_82575.c
index dd9b6ca..b13b42e 100644
--- a/drivers/net/ethernet/intel/igb/e1000_82575.c
+++ b/drivers/net/ethernet/intel/igb/e1000_82575.c
@@ -1,26 +1,5 @@
// SPDX-License-Identifier: GPL-2.0
-/* Intel(R) Gigabit Ethernet Linux driver
- * Copyright(c) 2007-2015 Intel Corporation.
- *
- * This program is free software; you can redistribute it and/or modify it
- * under the terms and conditions of the GNU General Public License,
- * version 2, as published by the Free Software Foundation.
- *
- * This program is distributed in the hope it will be useful, but WITHOUT
- * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
- * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
- * more details.
- *
- * You should have received a copy of the GNU General Public License along with
- * this program; if not, see <http://www.gnu.org/licenses/>.
- *
- * The full GNU General Public License is included in this distribution in
- * the file called "COPYING".
- *
- * Contact Information:
- * e1000-devel Mailing List <e1000-devel@lists.sourceforge.net>
- * Intel Corporation, 5200 N.E. Elam Young Parkway, Hillsboro, OR 97124-6497
- */
+/* Copyright(c) 2007 - 2018 Intel Corporation. */
/* e1000_82575
* e1000_82576
diff --git a/drivers/net/ethernet/intel/igb/e1000_82575.h b/drivers/net/ethernet/intel/igb/e1000_82575.h
index e53ebe9..6ad775b 100644
--- a/drivers/net/ethernet/intel/igb/e1000_82575.h
+++ b/drivers/net/ethernet/intel/igb/e1000_82575.h
@@ -1,26 +1,5 @@
/* SPDX-License-Identifier: GPL-2.0 */
-/* Intel(R) Gigabit Ethernet Linux driver
- * Copyright(c) 2007-2014 Intel Corporation.
- *
- * This program is free software; you can redistribute it and/or modify it
- * under the terms and conditions of the GNU General Public License,
- * version 2, as published by the Free Software Foundation.
- *
- * This program is distributed in the hope it will be useful, but WITHOUT
- * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
- * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
- * more details.
- *
- * You should have received a copy of the GNU General Public License along with
- * this program; if not, see <http://www.gnu.org/licenses/>.
- *
- * The full GNU General Public License is included in this distribution in
- * the file called "COPYING".
- *
- * Contact Information:
- * e1000-devel Mailing List <e1000-devel@lists.sourceforge.net>
- * Intel Corporation, 5200 N.E. Elam Young Parkway, Hillsboro, OR 97124-6497
- */
+/* Copyright(c) 2007 - 2018 Intel Corporation. */
#ifndef _E1000_82575_H_
#define _E1000_82575_H_
diff --git a/drivers/net/ethernet/intel/igb/e1000_defines.h b/drivers/net/ethernet/intel/igb/e1000_defines.h
index 98534f7..252440a 100644
--- a/drivers/net/ethernet/intel/igb/e1000_defines.h
+++ b/drivers/net/ethernet/intel/igb/e1000_defines.h
@@ -1,26 +1,5 @@
/* SPDX-License-Identifier: GPL-2.0 */
-/* Intel(R) Gigabit Ethernet Linux driver
- * Copyright(c) 2007-2014 Intel Corporation.
- *
- * This program is free software; you can redistribute it and/or modify it
- * under the terms and conditions of the GNU General Public License,
- * version 2, as published by the Free Software Foundation.
- *
- * This program is distributed in the hope it will be useful, but WITHOUT
- * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
- * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
- * more details.
- *
- * You should have received a copy of the GNU General Public License along with
- * this program; if not, see <http://www.gnu.org/licenses/>.
- *
- * The full GNU General Public License is included in this distribution in
- * the file called "COPYING".
- *
- * Contact Information:
- * e1000-devel Mailing List <e1000-devel@lists.sourceforge.net>
- * Intel Corporation, 5200 N.E. Elam Young Parkway, Hillsboro, OR 97124-6497
- */
+/* Copyright(c) 2007 - 2018 Intel Corporation. */
#ifndef _E1000_DEFINES_H_
#define _E1000_DEFINES_H_
@@ -491,6 +470,8 @@
* manageability enabled, allowing us room for 15 multicast addresses.
*/
#define E1000_RAH_AV 0x80000000 /* Receive descriptor valid */
+#define E1000_RAH_ASEL_SRC_ADDR 0x00010000
+#define E1000_RAH_QSEL_ENABLE 0x10000000
#define E1000_RAL_MAC_ADDR_LEN 4
#define E1000_RAH_MAC_ADDR_LEN 2
#define E1000_RAH_POOL_MASK 0x03FC0000
diff --git a/drivers/net/ethernet/intel/igb/e1000_hw.h b/drivers/net/ethernet/intel/igb/e1000_hw.h
index ff835e1..5d87957 100644
--- a/drivers/net/ethernet/intel/igb/e1000_hw.h
+++ b/drivers/net/ethernet/intel/igb/e1000_hw.h
@@ -1,25 +1,5 @@
/* SPDX-License-Identifier: GPL-2.0 */
-/* Intel(R) Gigabit Ethernet Linux driver
- * Copyright(c) 2007-2014 Intel Corporation.
- *
- * This program is free software; you can redistribute it and/or modify it
- * under the terms and conditions of the GNU General Public License,
- *
- * This program is distributed in the hope it will be useful, but WITHOUT
- * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
- * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
- * more details.
- *
- * You should have received a copy of the GNU General Public License along with
- * this program; if not, see <http://www.gnu.org/licenses/>.
- *
- * The full GNU General Public License is included in this distribution in
- * the file called "COPYING".
- *
- * Contact Information:
- * e1000-devel Mailing List <e1000-devel@lists.sourceforge.net>
- * Intel Corporation, 5200 N.E. Elam Young Parkway, Hillsboro, OR 97124-6497
- */
+/* Copyright(c) 2007 - 2018 Intel Corporation. */
#ifndef _E1000_HW_H_
#define _E1000_HW_H_
diff --git a/drivers/net/ethernet/intel/igb/e1000_i210.c b/drivers/net/ethernet/intel/igb/e1000_i210.c
index 6f54824..c54ebed 100644
--- a/drivers/net/ethernet/intel/igb/e1000_i210.c
+++ b/drivers/net/ethernet/intel/igb/e1000_i210.c
@@ -1,26 +1,5 @@
// SPDX-License-Identifier: GPL-2.0
-/* Intel(R) Gigabit Ethernet Linux driver
- * Copyright(c) 2007-2014 Intel Corporation.
- *
- * This program is free software; you can redistribute it and/or modify it
- * under the terms and conditions of the GNU General Public License,
- * version 2, as published by the Free Software Foundation.
- *
- * This program is distributed in the hope it will be useful, but WITHOUT
- * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
- * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
- * more details.
- *
- * You should have received a copy of the GNU General Public License along with
- * this program; if not, see <http://www.gnu.org/licenses/>.
- *
- * The full GNU General Public License is included in this distribution in
- * the file called "COPYING".
- *
- * Contact Information:
- * e1000-devel Mailing List <e1000-devel@lists.sourceforge.net>
- * Intel Corporation, 5200 N.E. Elam Young Parkway, Hillsboro, OR 97124-6497
- */
+/* Copyright(c) 2007 - 2018 Intel Corporation. */
/* e1000_i210
* e1000_i211
diff --git a/drivers/net/ethernet/intel/igb/e1000_i210.h b/drivers/net/ethernet/intel/igb/e1000_i210.h
index 56f015c..5c437fd 100644
--- a/drivers/net/ethernet/intel/igb/e1000_i210.h
+++ b/drivers/net/ethernet/intel/igb/e1000_i210.h
@@ -1,26 +1,5 @@
/* SPDX-License-Identifier: GPL-2.0 */
-/* Intel(R) Gigabit Ethernet Linux driver
- * Copyright(c) 2007-2014 Intel Corporation.
- *
- * This program is free software; you can redistribute it and/or modify it
- * under the terms and conditions of the GNU General Public License,
- * version 2, as published by the Free Software Foundation.
- *
- * This program is distributed in the hope it will be useful, but WITHOUT
- * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
- * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
- * more details.
- *
- * You should have received a copy of the GNU General Public License along with
- * this program; if not, see <http://www.gnu.org/licenses/>.
- *
- * The full GNU General Public License is included in this distribution in
- * the file called "COPYING".
- *
- * Contact Information:
- * e1000-devel Mailing List <e1000-devel@lists.sourceforge.net>
- * Intel Corporation, 5200 N.E. Elam Young Parkway, Hillsboro, OR 97124-6497
- */
+/* Copyright(c) 2007 - 2018 Intel Corporation. */
#ifndef _E1000_I210_H_
#define _E1000_I210_H_
diff --git a/drivers/net/ethernet/intel/igb/e1000_mac.c b/drivers/net/ethernet/intel/igb/e1000_mac.c
index 298afa0..79ee0a7 100644
--- a/drivers/net/ethernet/intel/igb/e1000_mac.c
+++ b/drivers/net/ethernet/intel/igb/e1000_mac.c
@@ -1,26 +1,5 @@
// SPDX-License-Identifier: GPL-2.0
-/* Intel(R) Gigabit Ethernet Linux driver
- * Copyright(c) 2007-2014 Intel Corporation.
- *
- * This program is free software; you can redistribute it and/or modify it
- * under the terms and conditions of the GNU General Public License,
- * version 2, as published by the Free Software Foundation.
- *
- * This program is distributed in the hope it will be useful, but WITHOUT
- * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
- * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
- * more details.
- *
- * You should have received a copy of the GNU General Public License along with
- * this program; if not, see <http://www.gnu.org/licenses/>.
- *
- * The full GNU General Public License is included in this distribution in
- * the file called "COPYING".
- *
- * Contact Information:
- * e1000-devel Mailing List <e1000-devel@lists.sourceforge.net>
- * Intel Corporation, 5200 N.E. Elam Young Parkway, Hillsboro, OR 97124-6497
- */
+/* Copyright(c) 2007 - 2018 Intel Corporation. */
#include <linux/if_ether.h>
#include <linux/delay.h>
diff --git a/drivers/net/ethernet/intel/igb/e1000_mac.h b/drivers/net/ethernet/intel/igb/e1000_mac.h
index 04d80c7..6e110f2 100644
--- a/drivers/net/ethernet/intel/igb/e1000_mac.h
+++ b/drivers/net/ethernet/intel/igb/e1000_mac.h
@@ -1,26 +1,5 @@
/* SPDX-License-Identifier: GPL-2.0 */
-/* Intel(R) Gigabit Ethernet Linux driver
- * Copyright(c) 2007-2014 Intel Corporation.
- *
- * This program is free software; you can redistribute it and/or modify it
- * under the terms and conditions of the GNU General Public License,
- * version 2, as published by the Free Software Foundation.
- *
- * This program is distributed in the hope it will be useful, but WITHOUT
- * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
- * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
- * more details.
- *
- * You should have received a copy of the GNU General Public License along with
- * this program; if not, see <http://www.gnu.org/licenses/>.
- *
- * The full GNU General Public License is included in this distribution in
- * the file called "COPYING".
- *
- * Contact Information:
- * e1000-devel Mailing List <e1000-devel@lists.sourceforge.net>
- * Intel Corporation, 5200 N.E. Elam Young Parkway, Hillsboro, OR 97124-6497
- */
+/* Copyright(c) 2007 - 2018 Intel Corporation. */
#ifndef _E1000_MAC_H_
#define _E1000_MAC_H_
diff --git a/drivers/net/ethernet/intel/igb/e1000_mbx.c b/drivers/net/ethernet/intel/igb/e1000_mbx.c
index ef42f16..46debd9 100644
--- a/drivers/net/ethernet/intel/igb/e1000_mbx.c
+++ b/drivers/net/ethernet/intel/igb/e1000_mbx.c
@@ -1,26 +1,5 @@
// SPDX-License-Identifier: GPL-2.0
-/* Intel(R) Gigabit Ethernet Linux driver
- * Copyright(c) 2007-2014 Intel Corporation.
- *
- * This program is free software; you can redistribute it and/or modify it
- * under the terms and conditions of the GNU General Public License,
- * version 2, as published by the Free Software Foundation.
- *
- * This program is distributed in the hope it will be useful, but WITHOUT
- * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
- * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
- * more details.
- *
- * You should have received a copy of the GNU General Public License along with
- * this program; if not, see <http://www.gnu.org/licenses/>.
- *
- * The full GNU General Public License is included in this distribution in
- * the file called "COPYING".
- *
- * Contact Information:
- * e1000-devel Mailing List <e1000-devel@lists.sourceforge.net>
- * Intel Corporation, 5200 N.E. Elam Young Parkway, Hillsboro, OR 97124-6497
- */
+/* Copyright(c) 2007 - 2018 Intel Corporation. */
#include "e1000_mbx.h"
diff --git a/drivers/net/ethernet/intel/igb/e1000_mbx.h b/drivers/net/ethernet/intel/igb/e1000_mbx.h
index 4f0ecd2..178e60e 100644
--- a/drivers/net/ethernet/intel/igb/e1000_mbx.h
+++ b/drivers/net/ethernet/intel/igb/e1000_mbx.h
@@ -1,26 +1,5 @@
/* SPDX-License-Identifier: GPL-2.0 */
-/* Intel(R) Gigabit Ethernet Linux driver
- * Copyright(c) 2007-2014 Intel Corporation.
- *
- * This program is free software; you can redistribute it and/or modify it
- * under the terms and conditions of the GNU General Public License,
- * version 2, as published by the Free Software Foundation.
- *
- * This program is distributed in the hope it will be useful, but WITHOUT
- * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
- * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
- * more details.
- *
- * You should have received a copy of the GNU General Public License along with
- * this program; if not, see <http://www.gnu.org/licenses/>.
- *
- * The full GNU General Public License is included in this distribution in
- * the file called "COPYING".
- *
- * Contact Information:
- * e1000-devel Mailing List <e1000-devel@lists.sourceforge.net>
- * Intel Corporation, 5200 N.E. Elam Young Parkway, Hillsboro, OR 97124-6497
- */
+/* Copyright(c) 2007 - 2018 Intel Corporation. */
#ifndef _E1000_MBX_H_
#define _E1000_MBX_H_
diff --git a/drivers/net/ethernet/intel/igb/e1000_nvm.c b/drivers/net/ethernet/intel/igb/e1000_nvm.c
index e4596f1..09f4dcb 100644
--- a/drivers/net/ethernet/intel/igb/e1000_nvm.c
+++ b/drivers/net/ethernet/intel/igb/e1000_nvm.c
@@ -1,25 +1,5 @@
// SPDX-License-Identifier: GPL-2.0
-/* Intel(R) Gigabit Ethernet Linux driver
- * Copyright(c) 2007-2014 Intel Corporation.
- * This program is free software; you can redistribute it and/or modify it
- * under the terms and conditions of the GNU General Public License,
- * version 2, as published by the Free Software Foundation.
- *
- * This program is distributed in the hope it will be useful, but WITHOUT
- * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
- * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
- * more details.
- *
- * You should have received a copy of the GNU General Public License along with
- * this program; if not, see <http://www.gnu.org/licenses/>.
- *
- * The full GNU General Public License is included in this distribution in
- * the file called "COPYING".
- *
- * Contact Information:
- * e1000-devel Mailing List <e1000-devel@lists.sourceforge.net>
- * Intel Corporation, 5200 N.E. Elam Young Parkway, Hillsboro, OR 97124-6497
- */
+/* Copyright(c) 2007 - 2018 Intel Corporation. */
#include <linux/if_ether.h>
#include <linux/delay.h>
diff --git a/drivers/net/ethernet/intel/igb/e1000_nvm.h b/drivers/net/ethernet/intel/igb/e1000_nvm.h
index dde68cd..091cddf 100644
--- a/drivers/net/ethernet/intel/igb/e1000_nvm.h
+++ b/drivers/net/ethernet/intel/igb/e1000_nvm.h
@@ -1,26 +1,5 @@
/* SPDX-License-Identifier: GPL-2.0 */
-/* Intel(R) Gigabit Ethernet Linux driver
- * Copyright(c) 2007-2014 Intel Corporation.
- *
- * This program is free software; you can redistribute it and/or modify it
- * under the terms and conditions of the GNU General Public License,
- * version 2, as published by the Free Software Foundation.
- *
- * This program is distributed in the hope it will be useful, but WITHOUT
- * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
- * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
- * more details.
- *
- * You should have received a copy of the GNU General Public License along with
- * this program; if not, see <http://www.gnu.org/licenses/>.
- *
- * The full GNU General Public License is included in this distribution in
- * the file called "COPYING".
- *
- * Contact Information:
- * e1000-devel Mailing List <e1000-devel@lists.sourceforge.net>
- * Intel Corporation, 5200 N.E. Elam Young Parkway, Hillsboro, OR 97124-6497
- */
+/* Copyright(c) 2007 - 2018 Intel Corporation. */
#ifndef _E1000_NVM_H_
#define _E1000_NVM_H_
diff --git a/drivers/net/ethernet/intel/igb/e1000_phy.c b/drivers/net/ethernet/intel/igb/e1000_phy.c
index 4ec6124..2be0e76 100644
--- a/drivers/net/ethernet/intel/igb/e1000_phy.c
+++ b/drivers/net/ethernet/intel/igb/e1000_phy.c
@@ -1,26 +1,5 @@
// SPDX-License-Identifier: GPL-2.0
-/* Intel(R) Gigabit Ethernet Linux driver
- * Copyright(c) 2007-2015 Intel Corporation.
- *
- * This program is free software; you can redistribute it and/or modify it
- * under the terms and conditions of the GNU General Public License,
- * version 2, as published by the Free Software Foundation.
- *
- * This program is distributed in the hope it will be useful, but WITHOUT
- * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
- * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
- * more details.
- *
- * You should have received a copy of the GNU General Public License along with
- * this program; if not, see <http://www.gnu.org/licenses/>.
- *
- * The full GNU General Public License is included in this distribution in
- * the file called "COPYING".
- *
- * Contact Information:
- * e1000-devel Mailing List <e1000-devel@lists.sourceforge.net>
- * Intel Corporation, 5200 N.E. Elam Young Parkway, Hillsboro, OR 97124-6497
- */
+/* Copyright(c) 2007 - 2018 Intel Corporation. */
#include <linux/if_ether.h>
#include <linux/delay.h>
diff --git a/drivers/net/ethernet/intel/igb/e1000_phy.h b/drivers/net/ethernet/intel/igb/e1000_phy.h
index 856d2cd..5894e4b 100644
--- a/drivers/net/ethernet/intel/igb/e1000_phy.h
+++ b/drivers/net/ethernet/intel/igb/e1000_phy.h
@@ -1,26 +1,5 @@
/* SPDX-License-Identifier: GPL-2.0 */
-/* Intel(R) Gigabit Ethernet Linux driver
- * Copyright(c) 2007-2014 Intel Corporation.
- *
- * This program is free software; you can redistribute it and/or modify it
- * under the terms and conditions of the GNU General Public License,
- * version 2, as published by the Free Software Foundation.
- *
- * This program is distributed in the hope it will be useful, but WITHOUT
- * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
- * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
- * more details.
- *
- * You should have received a copy of the GNU General Public License along with
- * this program; if not, see <http://www.gnu.org/licenses/>.
- *
- * The full GNU General Public License is included in this distribution in
- * the file called "COPYING".
- *
- * Contact Information:
- * e1000-devel Mailing List <e1000-devel@lists.sourceforge.net>
- * Intel Corporation, 5200 N.E. Elam Young Parkway, Hillsboro, OR 97124-6497
- */
+/* Copyright(c) 2007 - 2018 Intel Corporation. */
#ifndef _E1000_PHY_H_
#define _E1000_PHY_H_
diff --git a/drivers/net/ethernet/intel/igb/e1000_regs.h b/drivers/net/ethernet/intel/igb/e1000_regs.h
index e8fa8c6..0ad737d 100644
--- a/drivers/net/ethernet/intel/igb/e1000_regs.h
+++ b/drivers/net/ethernet/intel/igb/e1000_regs.h
@@ -1,26 +1,5 @@
/* SPDX-License-Identifier: GPL-2.0 */
-/* Intel(R) Gigabit Ethernet Linux driver
- * Copyright(c) 2007-2014 Intel Corporation.
- *
- * This program is free software; you can redistribute it and/or modify it
- * under the terms and conditions of the GNU General Public License,
- * version 2, as published by the Free Software Foundation.
- *
- * This program is distributed in the hope it will be useful, but WITHOUT
- * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
- * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
- * more details.
- *
- * You should have received a copy of the GNU General Public License along with
- * this program; if not, see <http://www.gnu.org/licenses/>.
- *
- * The full GNU General Public License is included in this distribution in
- * the file called "COPYING".
- *
- * Contact Information:
- * e1000-devel Mailing List <e1000-devel@lists.sourceforge.net>
- * Intel Corporation, 5200 N.E. Elam Young Parkway, Hillsboro, OR 97124-6497
- */
+/* Copyright(c) 2007 - 2018 Intel Corporation. */
#ifndef _E1000_REGS_H_
#define _E1000_REGS_H_
diff --git a/drivers/net/ethernet/intel/igb/igb.h b/drivers/net/ethernet/intel/igb/igb.h
index 8dbc399..9643b5b 100644
--- a/drivers/net/ethernet/intel/igb/igb.h
+++ b/drivers/net/ethernet/intel/igb/igb.h
@@ -1,26 +1,5 @@
/* SPDX-License-Identifier: GPL-2.0 */
-/* Intel(R) Gigabit Ethernet Linux driver
- * Copyright(c) 2007-2014 Intel Corporation.
- *
- * This program is free software; you can redistribute it and/or modify it
- * under the terms and conditions of the GNU General Public License,
- * version 2, as published by the Free Software Foundation.
- *
- * This program is distributed in the hope it will be useful, but WITHOUT
- * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
- * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
- * more details.
- *
- * You should have received a copy of the GNU General Public License along with
- * this program; if not, see <http://www.gnu.org/licenses/>.
- *
- * The full GNU General Public License is included in this distribution in
- * the file called "COPYING".
- *
- * Contact Information:
- * e1000-devel Mailing List <e1000-devel@lists.sourceforge.net>
- * Intel Corporation, 5200 N.E. Elam Young Parkway, Hillsboro, OR 97124-6497
- */
+/* Copyright(c) 2007 - 2018 Intel Corporation. */
/* Linux PRO/1000 Ethernet Driver main header file */
@@ -442,6 +421,8 @@
enum igb_filter_match_flags {
IGB_FILTER_FLAG_ETHER_TYPE = 0x1,
IGB_FILTER_FLAG_VLAN_TCI = 0x2,
+ IGB_FILTER_FLAG_SRC_MAC_ADDR = 0x4,
+ IGB_FILTER_FLAG_DST_MAC_ADDR = 0x8,
};
#define IGB_MAX_RXNFC_FILTERS 16
@@ -456,11 +437,14 @@
u8 match_flags;
__be16 etype;
__be16 vlan_tci;
+ u8 src_addr[ETH_ALEN];
+ u8 dst_addr[ETH_ALEN];
};
struct igb_nfc_filter {
struct hlist_node nfc_node;
struct igb_nfc_input filter;
+ unsigned long cookie;
u16 etype_reg_index;
u16 sw_idx;
u16 action;
@@ -474,6 +458,8 @@
#define IGB_MAC_STATE_DEFAULT 0x1
#define IGB_MAC_STATE_IN_USE 0x2
+#define IGB_MAC_STATE_SRC_ADDR 0x4
+#define IGB_MAC_STATE_QUEUE_STEERING 0x8
/* board specific private data structure */
struct igb_adapter {
@@ -598,6 +584,7 @@
/* RX network flow classification support */
struct hlist_head nfc_filter_list;
+ struct hlist_head cls_flower_list;
unsigned int nfc_filter_count;
/* lock for RX network flow classification filter */
spinlock_t nfc_lock;
@@ -739,4 +726,9 @@
int igb_erase_filter(struct igb_adapter *adapter,
struct igb_nfc_filter *input);
+int igb_add_mac_steering_filter(struct igb_adapter *adapter,
+ const u8 *addr, u8 queue, u8 flags);
+int igb_del_mac_steering_filter(struct igb_adapter *adapter,
+ const u8 *addr, u8 queue, u8 flags);
+
#endif /* _IGB_H_ */
diff --git a/drivers/net/ethernet/intel/igb/igb_ethtool.c b/drivers/net/ethernet/intel/igb/igb_ethtool.c
index e77ba0d..2d798499d 100644
--- a/drivers/net/ethernet/intel/igb/igb_ethtool.c
+++ b/drivers/net/ethernet/intel/igb/igb_ethtool.c
@@ -1,26 +1,5 @@
// SPDX-License-Identifier: GPL-2.0
-/* Intel(R) Gigabit Ethernet Linux driver
- * Copyright(c) 2007-2014 Intel Corporation.
- *
- * This program is free software; you can redistribute it and/or modify it
- * under the terms and conditions of the GNU General Public License,
- * version 2, as published by the Free Software Foundation.
- *
- * This program is distributed in the hope it will be useful, but WITHOUT
- * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
- * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
- * more details.
- *
- * You should have received a copy of the GNU General Public License along with
- * this program; if not, see <http://www.gnu.org/licenses/>.
- *
- * The full GNU General Public License is included in this distribution in
- * the file called "COPYING".
- *
- * Contact Information:
- * e1000-devel Mailing List <e1000-devel@lists.sourceforge.net>
- * Intel Corporation, 5200 N.E. Elam Young Parkway, Hillsboro, OR 97124-6497
- */
+/* Copyright(c) 2007 - 2018 Intel Corporation. */
/* ethtool support for igb */
@@ -2495,6 +2474,23 @@
fsp->h_ext.vlan_tci = rule->filter.vlan_tci;
fsp->m_ext.vlan_tci = htons(VLAN_PRIO_MASK);
}
+ if (rule->filter.match_flags & IGB_FILTER_FLAG_DST_MAC_ADDR) {
+ ether_addr_copy(fsp->h_u.ether_spec.h_dest,
+ rule->filter.dst_addr);
+ /* As we only support matching by the full
+ * mask, return the mask to userspace
+ */
+ eth_broadcast_addr(fsp->m_u.ether_spec.h_dest);
+ }
+ if (rule->filter.match_flags & IGB_FILTER_FLAG_SRC_MAC_ADDR) {
+ ether_addr_copy(fsp->h_u.ether_spec.h_source,
+ rule->filter.src_addr);
+ /* As we only support matching by the full
+ * mask, return the mask to userspace
+ */
+ eth_broadcast_addr(fsp->m_u.ether_spec.h_source);
+ }
+
return 0;
}
return -EINVAL;
@@ -2768,14 +2764,41 @@
int igb_add_filter(struct igb_adapter *adapter, struct igb_nfc_filter *input)
{
+ struct e1000_hw *hw = &adapter->hw;
int err = -EINVAL;
+ if (hw->mac.type == e1000_i210 &&
+ !(input->filter.match_flags & ~IGB_FILTER_FLAG_SRC_MAC_ADDR)) {
+ dev_err(&adapter->pdev->dev,
+ "i210 doesn't support flow classification rules specifying only source addresses.\n");
+ return -EOPNOTSUPP;
+ }
+
if (input->filter.match_flags & IGB_FILTER_FLAG_ETHER_TYPE) {
err = igb_rxnfc_write_etype_filter(adapter, input);
if (err)
return err;
}
+ if (input->filter.match_flags & IGB_FILTER_FLAG_DST_MAC_ADDR) {
+ err = igb_add_mac_steering_filter(adapter,
+ input->filter.dst_addr,
+ input->action, 0);
+ err = min_t(int, err, 0);
+ if (err)
+ return err;
+ }
+
+ if (input->filter.match_flags & IGB_FILTER_FLAG_SRC_MAC_ADDR) {
+ err = igb_add_mac_steering_filter(adapter,
+ input->filter.src_addr,
+ input->action,
+ IGB_MAC_STATE_SRC_ADDR);
+ err = min_t(int, err, 0);
+ if (err)
+ return err;
+ }
+
if (input->filter.match_flags & IGB_FILTER_FLAG_VLAN_TCI)
err = igb_rxnfc_write_vlan_prio_filter(adapter, input);
@@ -2824,6 +2847,15 @@
igb_clear_vlan_prio_filter(adapter,
ntohs(input->filter.vlan_tci));
+ if (input->filter.match_flags & IGB_FILTER_FLAG_SRC_MAC_ADDR)
+ igb_del_mac_steering_filter(adapter, input->filter.src_addr,
+ input->action,
+ IGB_MAC_STATE_SRC_ADDR);
+
+ if (input->filter.match_flags & IGB_FILTER_FLAG_DST_MAC_ADDR)
+ igb_del_mac_steering_filter(adapter, input->filter.dst_addr,
+ input->action, 0);
+
return 0;
}
@@ -2865,7 +2897,7 @@
/* add filter to the list */
if (parent)
- hlist_add_behind(&parent->nfc_node, &input->nfc_node);
+ hlist_add_behind(&input->nfc_node, &parent->nfc_node);
else
hlist_add_head(&input->nfc_node, &adapter->nfc_filter_list);
@@ -2905,10 +2937,6 @@
if ((fsp->flow_type & ~FLOW_EXT) != ETHER_FLOW)
return -EINVAL;
- if (fsp->m_u.ether_spec.h_proto != ETHER_TYPE_FULL_MASK &&
- fsp->m_ext.vlan_tci != htons(VLAN_PRIO_MASK))
- return -EINVAL;
-
input = kzalloc(sizeof(*input), GFP_KERNEL);
if (!input)
return -ENOMEM;
@@ -2918,6 +2946,20 @@
input->filter.match_flags = IGB_FILTER_FLAG_ETHER_TYPE;
}
+ /* Only support matching addresses by the full mask */
+ if (is_broadcast_ether_addr(fsp->m_u.ether_spec.h_source)) {
+ input->filter.match_flags |= IGB_FILTER_FLAG_SRC_MAC_ADDR;
+ ether_addr_copy(input->filter.src_addr,
+ fsp->h_u.ether_spec.h_source);
+ }
+
+ /* Only support matching addresses by the full mask */
+ if (is_broadcast_ether_addr(fsp->m_u.ether_spec.h_dest)) {
+ input->filter.match_flags |= IGB_FILTER_FLAG_DST_MAC_ADDR;
+ ether_addr_copy(input->filter.dst_addr,
+ fsp->h_u.ether_spec.h_dest);
+ }
+
if ((fsp->flow_type & FLOW_EXT) && fsp->m_ext.vlan_tci) {
if (fsp->m_ext.vlan_tci != htons(VLAN_PRIO_MASK)) {
err = -EINVAL;
diff --git a/drivers/net/ethernet/intel/igb/igb_hwmon.c b/drivers/net/ethernet/intel/igb/igb_hwmon.c
index bebe43b..3b83747 100644
--- a/drivers/net/ethernet/intel/igb/igb_hwmon.c
+++ b/drivers/net/ethernet/intel/igb/igb_hwmon.c
@@ -1,26 +1,5 @@
// SPDX-License-Identifier: GPL-2.0
-/* Intel(R) Gigabit Ethernet Linux driver
- * Copyright(c) 2007-2014 Intel Corporation.
- *
- * This program is free software; you can redistribute it and/or modify it
- * under the terms and conditions of the GNU General Public License,
- * version 2, as published by the Free Software Foundation.
- *
- * This program is distributed in the hope it will be useful, but WITHOUT
- * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
- * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
- * more details.
- *
- * You should have received a copy of the GNU General Public License along with
- * this program; if not, see <http://www.gnu.org/licenses/>.
- *
- * The full GNU General Public License is included in this distribution in
- * the file called "COPYING".
- *
- * Contact Information:
- * e1000-devel Mailing List <e1000-devel@lists.sourceforge.net>
- * Intel Corporation, 5200 N.E. Elam Young Parkway, Hillsboro, OR 97124-6497
- */
+/* Copyright(c) 2007 - 2018 Intel Corporation. */
#include "igb.h"
#include "e1000_82575.h"
diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
index cce7ada..78574c0 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -1,26 +1,5 @@
// SPDX-License-Identifier: GPL-2.0
-/* Intel(R) Gigabit Ethernet Linux driver
- * Copyright(c) 2007-2014 Intel Corporation.
- *
- * This program is free software; you can redistribute it and/or modify it
- * under the terms and conditions of the GNU General Public License,
- * version 2, as published by the Free Software Foundation.
- *
- * This program is distributed in the hope it will be useful, but WITHOUT
- * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
- * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
- * more details.
- *
- * You should have received a copy of the GNU General Public License along with
- * this program; if not, see <http://www.gnu.org/licenses/>.
- *
- * The full GNU General Public License is included in this distribution in
- * the file called "COPYING".
- *
- * Contact Information:
- * e1000-devel Mailing List <e1000-devel@lists.sourceforge.net>
- * Intel Corporation, 5200 N.E. Elam Young Parkway, Hillsboro, OR 97124-6497
- */
+/* Copyright(c) 2007 - 2018 Intel Corporation. */
#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
@@ -36,6 +15,7 @@
#include <net/checksum.h>
#include <net/ip6_checksum.h>
#include <net/pkt_sched.h>
+#include <net/pkt_cls.h>
#include <linux/net_tstamp.h>
#include <linux/mii.h>
#include <linux/ethtool.h>
@@ -2513,6 +2493,250 @@
return 0;
}
+#define ETHER_TYPE_FULL_MASK ((__force __be16)~0)
+#define VLAN_PRIO_FULL_MASK (0x07)
+
+static int igb_parse_cls_flower(struct igb_adapter *adapter,
+ struct tc_cls_flower_offload *f,
+ int traffic_class,
+ struct igb_nfc_filter *input)
+{
+ struct netlink_ext_ack *extack = f->common.extack;
+
+ if (f->dissector->used_keys &
+ ~(BIT(FLOW_DISSECTOR_KEY_BASIC) |
+ BIT(FLOW_DISSECTOR_KEY_CONTROL) |
+ BIT(FLOW_DISSECTOR_KEY_ETH_ADDRS) |
+ BIT(FLOW_DISSECTOR_KEY_VLAN))) {
+ NL_SET_ERR_MSG_MOD(extack,
+ "Unsupported key used, only BASIC, CONTROL, ETH_ADDRS and VLAN are supported");
+ return -EOPNOTSUPP;
+ }
+
+ if (dissector_uses_key(f->dissector, FLOW_DISSECTOR_KEY_ETH_ADDRS)) {
+ struct flow_dissector_key_eth_addrs *key, *mask;
+
+ key = skb_flow_dissector_target(f->dissector,
+ FLOW_DISSECTOR_KEY_ETH_ADDRS,
+ f->key);
+ mask = skb_flow_dissector_target(f->dissector,
+ FLOW_DISSECTOR_KEY_ETH_ADDRS,
+ f->mask);
+
+ if (!is_zero_ether_addr(mask->dst)) {
+ if (!is_broadcast_ether_addr(mask->dst)) {
+ NL_SET_ERR_MSG_MOD(extack, "Only full masks are supported for destination MAC address");
+ return -EINVAL;
+ }
+
+ input->filter.match_flags |=
+ IGB_FILTER_FLAG_DST_MAC_ADDR;
+ ether_addr_copy(input->filter.dst_addr, key->dst);
+ }
+
+ if (!is_zero_ether_addr(mask->src)) {
+ if (!is_broadcast_ether_addr(mask->src)) {
+ NL_SET_ERR_MSG_MOD(extack, "Only full masks are supported for source MAC address");
+ return -EINVAL;
+ }
+
+ input->filter.match_flags |=
+ IGB_FILTER_FLAG_SRC_MAC_ADDR;
+ ether_addr_copy(input->filter.src_addr, key->src);
+ }
+ }
+
+ if (dissector_uses_key(f->dissector, FLOW_DISSECTOR_KEY_BASIC)) {
+ struct flow_dissector_key_basic *key, *mask;
+
+ key = skb_flow_dissector_target(f->dissector,
+ FLOW_DISSECTOR_KEY_BASIC,
+ f->key);
+ mask = skb_flow_dissector_target(f->dissector,
+ FLOW_DISSECTOR_KEY_BASIC,
+ f->mask);
+
+ if (mask->n_proto) {
+ if (mask->n_proto != ETHER_TYPE_FULL_MASK) {
+ NL_SET_ERR_MSG_MOD(extack, "Only full mask is supported for EtherType filter");
+ return -EINVAL;
+ }
+
+ input->filter.match_flags |= IGB_FILTER_FLAG_ETHER_TYPE;
+ input->filter.etype = key->n_proto;
+ }
+ }
+
+ if (dissector_uses_key(f->dissector, FLOW_DISSECTOR_KEY_VLAN)) {
+ struct flow_dissector_key_vlan *key, *mask;
+
+ key = skb_flow_dissector_target(f->dissector,
+ FLOW_DISSECTOR_KEY_VLAN,
+ f->key);
+ mask = skb_flow_dissector_target(f->dissector,
+ FLOW_DISSECTOR_KEY_VLAN,
+ f->mask);
+
+ if (mask->vlan_priority) {
+ if (mask->vlan_priority != VLAN_PRIO_FULL_MASK) {
+ NL_SET_ERR_MSG_MOD(extack, "Only full mask is supported for VLAN priority");
+ return -EINVAL;
+ }
+
+ input->filter.match_flags |= IGB_FILTER_FLAG_VLAN_TCI;
+ input->filter.vlan_tci = key->vlan_priority;
+ }
+ }
+
+ input->action = traffic_class;
+ input->cookie = f->cookie;
+
+ return 0;
+}
+
+static int igb_configure_clsflower(struct igb_adapter *adapter,
+ struct tc_cls_flower_offload *cls_flower)
+{
+ struct netlink_ext_ack *extack = cls_flower->common.extack;
+ struct igb_nfc_filter *filter, *f;
+ int err, tc;
+
+ tc = tc_classid_to_hwtc(adapter->netdev, cls_flower->classid);
+ if (tc < 0) {
+ NL_SET_ERR_MSG_MOD(extack, "Invalid traffic class");
+ return -EINVAL;
+ }
+
+ filter = kzalloc(sizeof(*filter), GFP_KERNEL);
+ if (!filter)
+ return -ENOMEM;
+
+ err = igb_parse_cls_flower(adapter, cls_flower, tc, filter);
+ if (err < 0)
+ goto err_parse;
+
+ spin_lock(&adapter->nfc_lock);
+
+ hlist_for_each_entry(f, &adapter->nfc_filter_list, nfc_node) {
+ if (!memcmp(&f->filter, &filter->filter, sizeof(f->filter))) {
+ err = -EEXIST;
+ NL_SET_ERR_MSG_MOD(extack,
+ "This filter is already set in ethtool");
+ goto err_locked;
+ }
+ }
+
+ hlist_for_each_entry(f, &adapter->cls_flower_list, nfc_node) {
+ if (!memcmp(&f->filter, &filter->filter, sizeof(f->filter))) {
+ err = -EEXIST;
+ NL_SET_ERR_MSG_MOD(extack,
+ "This filter is already set in cls_flower");
+ goto err_locked;
+ }
+ }
+
+ err = igb_add_filter(adapter, filter);
+ if (err < 0) {
+ NL_SET_ERR_MSG_MOD(extack, "Could not add filter to the adapter");
+ goto err_locked;
+ }
+
+ hlist_add_head(&filter->nfc_node, &adapter->cls_flower_list);
+
+ spin_unlock(&adapter->nfc_lock);
+
+ return 0;
+
+err_locked:
+ spin_unlock(&adapter->nfc_lock);
+
+err_parse:
+ kfree(filter);
+
+ return err;
+}
+
+static int igb_delete_clsflower(struct igb_adapter *adapter,
+ struct tc_cls_flower_offload *cls_flower)
+{
+ struct igb_nfc_filter *filter;
+ int err;
+
+ spin_lock(&adapter->nfc_lock);
+
+ hlist_for_each_entry(filter, &adapter->cls_flower_list, nfc_node)
+ if (filter->cookie == cls_flower->cookie)
+ break;
+
+ if (!filter) {
+ err = -ENOENT;
+ goto out;
+ }
+
+ err = igb_erase_filter(adapter, filter);
+ if (err < 0)
+ goto out;
+
+ hlist_del(&filter->nfc_node);
+ kfree(filter);
+
+out:
+ spin_unlock(&adapter->nfc_lock);
+
+ return err;
+}
+
+static int igb_setup_tc_cls_flower(struct igb_adapter *adapter,
+ struct tc_cls_flower_offload *cls_flower)
+{
+ switch (cls_flower->command) {
+ case TC_CLSFLOWER_REPLACE:
+ return igb_configure_clsflower(adapter, cls_flower);
+ case TC_CLSFLOWER_DESTROY:
+ return igb_delete_clsflower(adapter, cls_flower);
+ case TC_CLSFLOWER_STATS:
+ return -EOPNOTSUPP;
+ default:
+ return -EINVAL;
+ }
+}
+
+static int igb_setup_tc_block_cb(enum tc_setup_type type, void *type_data,
+ void *cb_priv)
+{
+ struct igb_adapter *adapter = cb_priv;
+
+ if (!tc_cls_can_offload_and_chain0(adapter->netdev, type_data))
+ return -EOPNOTSUPP;
+
+ switch (type) {
+ case TC_SETUP_CLSFLOWER:
+ return igb_setup_tc_cls_flower(adapter, type_data);
+
+ default:
+ return -EOPNOTSUPP;
+ }
+}
+
+static int igb_setup_tc_block(struct igb_adapter *adapter,
+ struct tc_block_offload *f)
+{
+ if (f->binder_type != TCF_BLOCK_BINDER_TYPE_CLSACT_INGRESS)
+ return -EOPNOTSUPP;
+
+ switch (f->command) {
+ case TC_BLOCK_BIND:
+ return tcf_block_cb_register(f->block, igb_setup_tc_block_cb,
+ adapter, adapter);
+ case TC_BLOCK_UNBIND:
+ tcf_block_cb_unregister(f->block, igb_setup_tc_block_cb,
+ adapter);
+ return 0;
+ default:
+ return -EOPNOTSUPP;
+ }
+}
+
static int igb_setup_tc(struct net_device *dev, enum tc_setup_type type,
void *type_data)
{
@@ -2521,6 +2745,8 @@
switch (type) {
case TC_SETUP_QDISC_CBS:
return igb_offload_cbs(adapter, type_data);
+ case TC_SETUP_BLOCK:
+ return igb_setup_tc_block(adapter, type_data);
default:
return -EOPNOTSUPP;
@@ -2822,6 +3048,9 @@
if (hw->mac.type >= e1000_82576)
netdev->features |= NETIF_F_SCTP_CRC;
+ if (hw->mac.type >= e1000_i350)
+ netdev->features |= NETIF_F_HW_TC;
+
#define IGB_GSO_PARTIAL_FEATURES (NETIF_F_GSO_GRE | \
NETIF_F_GSO_GRE_CSUM | \
NETIF_F_GSO_IPXIP4 | \
@@ -6856,8 +7085,35 @@
igb_rar_set_index(adapter, 0);
}
-static int igb_add_mac_filter(struct igb_adapter *adapter, const u8 *addr,
- const u8 queue)
+/* If the filter to be added and an already existing filter express
+ * the same address and address type, it should be possible to only
+ * override the other configurations, for example the queue to steer
+ * traffic.
+ */
+static bool igb_mac_entry_can_be_used(const struct igb_mac_addr *entry,
+ const u8 *addr, const u8 flags)
+{
+ if (!(entry->state & IGB_MAC_STATE_IN_USE))
+ return true;
+
+ if ((entry->state & IGB_MAC_STATE_SRC_ADDR) !=
+ (flags & IGB_MAC_STATE_SRC_ADDR))
+ return false;
+
+ if (!ether_addr_equal(addr, entry->addr))
+ return false;
+
+ return true;
+}
+
+/* Add a MAC filter for 'addr' directing matching traffic to 'queue',
+ * 'flags' is used to indicate what kind of match is made, match is by
+ * default for the destination address, if matching by source address
+ * is desired the flag IGB_MAC_STATE_SRC_ADDR can be used.
+ */
+static int igb_add_mac_filter_flags(struct igb_adapter *adapter,
+ const u8 *addr, const u8 queue,
+ const u8 flags)
{
struct e1000_hw *hw = &adapter->hw;
int rar_entries = hw->mac.rar_entry_count -
@@ -6872,12 +7128,13 @@
* addresses.
*/
for (i = 0; i < rar_entries; i++) {
- if (adapter->mac_table[i].state & IGB_MAC_STATE_IN_USE)
+ if (!igb_mac_entry_can_be_used(&adapter->mac_table[i],
+ addr, flags))
continue;
ether_addr_copy(adapter->mac_table[i].addr, addr);
adapter->mac_table[i].queue = queue;
- adapter->mac_table[i].state |= IGB_MAC_STATE_IN_USE;
+ adapter->mac_table[i].state |= IGB_MAC_STATE_IN_USE | flags;
igb_rar_set_index(adapter, i);
return i;
@@ -6886,9 +7143,22 @@
return -ENOSPC;
}
-static int igb_del_mac_filter(struct igb_adapter *adapter, const u8 *addr,
+static int igb_add_mac_filter(struct igb_adapter *adapter, const u8 *addr,
const u8 queue)
{
+ return igb_add_mac_filter_flags(adapter, addr, queue, 0);
+}
+
+/* Remove a MAC filter for 'addr' directing matching traffic to
+ * 'queue', 'flags' is used to indicate what kind of match need to be
+ * removed, match is by default for the destination address, if
+ * matching by source address is to be removed the flag
+ * IGB_MAC_STATE_SRC_ADDR can be used.
+ */
+static int igb_del_mac_filter_flags(struct igb_adapter *adapter,
+ const u8 *addr, const u8 queue,
+ const u8 flags)
+{
struct e1000_hw *hw = &adapter->hw;
int rar_entries = hw->mac.rar_entry_count -
adapter->vfs_allocated_count;
@@ -6904,14 +7174,26 @@
for (i = 0; i < rar_entries; i++) {
if (!(adapter->mac_table[i].state & IGB_MAC_STATE_IN_USE))
continue;
+ if ((adapter->mac_table[i].state & flags) != flags)
+ continue;
if (adapter->mac_table[i].queue != queue)
continue;
if (!ether_addr_equal(adapter->mac_table[i].addr, addr))
continue;
- adapter->mac_table[i].state &= ~IGB_MAC_STATE_IN_USE;
- memset(adapter->mac_table[i].addr, 0, ETH_ALEN);
- adapter->mac_table[i].queue = 0;
+ /* When a filter for the default address is "deleted",
+ * we return it to its initial configuration
+ */
+ if (adapter->mac_table[i].state & IGB_MAC_STATE_DEFAULT) {
+ adapter->mac_table[i].state =
+ IGB_MAC_STATE_DEFAULT | IGB_MAC_STATE_IN_USE;
+ adapter->mac_table[i].queue =
+ adapter->vfs_allocated_count;
+ } else {
+ adapter->mac_table[i].state = 0;
+ adapter->mac_table[i].queue = 0;
+ memset(adapter->mac_table[i].addr, 0, ETH_ALEN);
+ }
igb_rar_set_index(adapter, i);
return 0;
@@ -6920,6 +7202,34 @@
return -ENOENT;
}
+static int igb_del_mac_filter(struct igb_adapter *adapter, const u8 *addr,
+ const u8 queue)
+{
+ return igb_del_mac_filter_flags(adapter, addr, queue, 0);
+}
+
+int igb_add_mac_steering_filter(struct igb_adapter *adapter,
+ const u8 *addr, u8 queue, u8 flags)
+{
+ struct e1000_hw *hw = &adapter->hw;
+
+ /* In theory, this should be supported on 82575 as well, but
+ * that part wasn't easily accessible during development.
+ */
+ if (hw->mac.type != e1000_i210)
+ return -EOPNOTSUPP;
+
+ return igb_add_mac_filter_flags(adapter, addr, queue,
+ IGB_MAC_STATE_QUEUE_STEERING | flags);
+}
+
+int igb_del_mac_steering_filter(struct igb_adapter *adapter,
+ const u8 *addr, u8 queue, u8 flags)
+{
+ return igb_del_mac_filter_flags(adapter, addr, queue,
+ IGB_MAC_STATE_QUEUE_STEERING | flags);
+}
+
static int igb_uc_sync(struct net_device *netdev, const unsigned char *addr)
{
struct igb_adapter *adapter = netdev_priv(netdev);
@@ -8763,12 +9073,24 @@
if (is_valid_ether_addr(addr))
rar_high |= E1000_RAH_AV;
- if (hw->mac.type == e1000_82575)
+ if (adapter->mac_table[index].state & IGB_MAC_STATE_SRC_ADDR)
+ rar_high |= E1000_RAH_ASEL_SRC_ADDR;
+
+ switch (hw->mac.type) {
+ case e1000_82575:
+ case e1000_i210:
+ if (adapter->mac_table[index].state &
+ IGB_MAC_STATE_QUEUE_STEERING)
+ rar_high |= E1000_RAH_QSEL_ENABLE;
+
rar_high |= E1000_RAH_POOL_1 *
adapter->mac_table[index].queue;
- else
+ break;
+ default:
rar_high |= E1000_RAH_POOL_1 <<
adapter->mac_table[index].queue;
+ break;
+ }
}
wr32(E1000_RAL(index), rar_low);
@@ -9206,6 +9528,9 @@
hlist_for_each_entry(rule, &adapter->nfc_filter_list, nfc_node)
igb_erase_filter(adapter, rule);
+ hlist_for_each_entry(rule, &adapter->cls_flower_list, nfc_node)
+ igb_erase_filter(adapter, rule);
+
spin_unlock(&adapter->nfc_lock);
}
diff --git a/drivers/net/ethernet/intel/igb/igb_ptp.c b/drivers/net/ethernet/intel/igb/igb_ptp.c
index 7454b98..9f4d700 100644
--- a/drivers/net/ethernet/intel/igb/igb_ptp.c
+++ b/drivers/net/ethernet/intel/igb/igb_ptp.c
@@ -1,21 +1,6 @@
// SPDX-License-Identifier: GPL-2.0+
-/* PTP Hardware Clock (PHC) driver for the Intel 82576 and 82580
- *
- * Copyright (C) 2011 Richard Cochran <richardcochran@gmail.com>
- *
- * This program is free software; you can redistribute it and/or modify
- * it under the terms of the GNU General Public License as published by
- * the Free Software Foundation; either version 2 of the License, or
- * (at your option) any later version.
- *
- * This program is distributed in the hope that it will be useful,
- * but WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
- * GNU General Public License for more details.
- *
- * You should have received a copy of the GNU General Public License along with
- * this program; if not, see <http://www.gnu.org/licenses/>.
- */
+/* Copyright (C) 2011 Richard Cochran <richardcochran@gmail.com> */
+
#include <linux/module.h>
#include <linux/device.h>
#include <linux/pci.h>
diff --git a/drivers/net/ethernet/intel/igbvf/Makefile b/drivers/net/ethernet/intel/igbvf/Makefile
index efe29da..afd3e36 100644
--- a/drivers/net/ethernet/intel/igbvf/Makefile
+++ b/drivers/net/ethernet/intel/igbvf/Makefile
@@ -1,31 +1,5 @@
# SPDX-License-Identifier: GPL-2.0
-################################################################################
-#
-# Intel(R) 82576 Virtual Function Linux driver
-# Copyright(c) 2009 - 2012 Intel Corporation.
-#
-# This program is free software; you can redistribute it and/or modify it
-# under the terms and conditions of the GNU General Public License,
-# version 2, as published by the Free Software Foundation.
-#
-# This program is distributed in the hope it will be useful, but WITHOUT
-# ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
-# FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
-# more details.
-#
-# You should have received a copy of the GNU General Public License along with
-# this program; if not, write to the Free Software Foundation, Inc.,
-# 51 Franklin St - Fifth Floor, Boston, MA 02110-1301 USA.
-#
-# The full GNU General Public License is included in this distribution in
-# the file called "COPYING".
-#
-# Contact Information:
-# e1000-devel Mailing List <e1000-devel@lists.sourceforge.net>
-# Intel Corporation, 5200 N.E. Elam Young Parkway, Hillsboro, OR 97124-6497
-#
-################################################################################
-
+# Copyright(c) 2009 - 2018 Intel Corporation.
#
# Makefile for the Intel(R) 82576 VF ethernet driver
#
diff --git a/drivers/net/ethernet/intel/igbvf/defines.h b/drivers/net/ethernet/intel/igbvf/defines.h
index 04bcfec..4437f83 100644
--- a/drivers/net/ethernet/intel/igbvf/defines.h
+++ b/drivers/net/ethernet/intel/igbvf/defines.h
@@ -1,29 +1,5 @@
/* SPDX-License-Identifier: GPL-2.0 */
-/*******************************************************************************
-
- Intel(R) 82576 Virtual Function Linux driver
- Copyright(c) 1999 - 2012 Intel Corporation.
-
- This program is free software; you can redistribute it and/or modify it
- under the terms and conditions of the GNU General Public License,
- version 2, as published by the Free Software Foundation.
-
- This program is distributed in the hope it will be useful, but WITHOUT
- ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
- FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
- more details.
-
- You should have received a copy of the GNU General Public License along with
- this program; if not, see <http://www.gnu.org/licenses/>.
-
- The full GNU General Public License is included in this distribution in
- the file called "COPYING".
-
- Contact Information:
- e1000-devel Mailing List <e1000-devel@lists.sourceforge.net>
- Intel Corporation, 5200 N.E. Elam Young Parkway, Hillsboro, OR 97124-6497
-
-*******************************************************************************/
+/* Copyright(c) 1999 - 2018 Intel Corporation. */
#ifndef _E1000_DEFINES_H_
#define _E1000_DEFINES_H_
diff --git a/drivers/net/ethernet/intel/igbvf/ethtool.c b/drivers/net/ethernet/intel/igbvf/ethtool.c
index ca39e3c..3ae358b 100644
--- a/drivers/net/ethernet/intel/igbvf/ethtool.c
+++ b/drivers/net/ethernet/intel/igbvf/ethtool.c
@@ -1,29 +1,5 @@
// SPDX-License-Identifier: GPL-2.0
-/*******************************************************************************
-
- Intel(R) 82576 Virtual Function Linux driver
- Copyright(c) 2009 - 2012 Intel Corporation.
-
- This program is free software; you can redistribute it and/or modify it
- under the terms and conditions of the GNU General Public License,
- version 2, as published by the Free Software Foundation.
-
- This program is distributed in the hope it will be useful, but WITHOUT
- ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
- FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
- more details.
-
- You should have received a copy of the GNU General Public License along with
- this program; if not, see <http://www.gnu.org/licenses/>.
-
- The full GNU General Public License is included in this distribution in
- the file called "COPYING".
-
- Contact Information:
- e1000-devel Mailing List <e1000-devel@lists.sourceforge.net>
- Intel Corporation, 5200 N.E. Elam Young Parkway, Hillsboro, OR 97124-6497
-
-*******************************************************************************/
+/* Copyright(c) 2009 - 2018 Intel Corporation. */
/* ethtool support for igbvf */
diff --git a/drivers/net/ethernet/intel/igbvf/igbvf.h b/drivers/net/ethernet/intel/igbvf/igbvf.h
index f5bf248..eee26a3 100644
--- a/drivers/net/ethernet/intel/igbvf/igbvf.h
+++ b/drivers/net/ethernet/intel/igbvf/igbvf.h
@@ -1,29 +1,5 @@
/* SPDX-License-Identifier: GPL-2.0 */
-/*******************************************************************************
-
- Intel(R) 82576 Virtual Function Linux driver
- Copyright(c) 2009 - 2012 Intel Corporation.
-
- This program is free software; you can redistribute it and/or modify it
- under the terms and conditions of the GNU General Public License,
- version 2, as published by the Free Software Foundation.
-
- This program is distributed in the hope it will be useful, but WITHOUT
- ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
- FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
- more details.
-
- You should have received a copy of the GNU General Public License along with
- this program; if not, see <http://www.gnu.org/licenses/>.
-
- The full GNU General Public License is included in this distribution in
- the file called "COPYING".
-
- Contact Information:
- e1000-devel Mailing List <e1000-devel@lists.sourceforge.net>
- Intel Corporation, 5200 N.E. Elam Young Parkway, Hillsboro, OR 97124-6497
-
-*******************************************************************************/
+/* Copyright(c) 2009 - 2018 Intel Corporation. */
/* Linux PRO/1000 Ethernet Driver main header file */
diff --git a/drivers/net/ethernet/intel/igbvf/mbx.c b/drivers/net/ethernet/intel/igbvf/mbx.c
index 9195884..163e583 100644
--- a/drivers/net/ethernet/intel/igbvf/mbx.c
+++ b/drivers/net/ethernet/intel/igbvf/mbx.c
@@ -1,29 +1,5 @@
// SPDX-License-Identifier: GPL-2.0
-/*******************************************************************************
-
- Intel(R) 82576 Virtual Function Linux driver
- Copyright(c) 2009 - 2012 Intel Corporation.
-
- This program is free software; you can redistribute it and/or modify it
- under the terms and conditions of the GNU General Public License,
- version 2, as published by the Free Software Foundation.
-
- This program is distributed in the hope it will be useful, but WITHOUT
- ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
- FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
- more details.
-
- You should have received a copy of the GNU General Public License along with
- this program; if not, see <http://www.gnu.org/licenses/>.
-
- The full GNU General Public License is included in this distribution in
- the file called "COPYING".
-
- Contact Information:
- e1000-devel Mailing List <e1000-devel@lists.sourceforge.net>
- Intel Corporation, 5200 N.E. Elam Young Parkway, Hillsboro, OR 97124-6497
-
-*******************************************************************************/
+/* Copyright(c) 2009 - 2018 Intel Corporation. */
#include "mbx.h"
diff --git a/drivers/net/ethernet/intel/igbvf/mbx.h b/drivers/net/ethernet/intel/igbvf/mbx.h
index 479b062..e5b3181 100644
--- a/drivers/net/ethernet/intel/igbvf/mbx.h
+++ b/drivers/net/ethernet/intel/igbvf/mbx.h
@@ -1,29 +1,5 @@
/* SPDX-License-Identifier: GPL-2.0 */
-/*******************************************************************************
-
- Intel(R) 82576 Virtual Function Linux driver
- Copyright(c) 1999 - 2012 Intel Corporation.
-
- This program is free software; you can redistribute it and/or modify it
- under the terms and conditions of the GNU General Public License,
- version 2, as published by the Free Software Foundation.
-
- This program is distributed in the hope it will be useful, but WITHOUT
- ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
- FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
- more details.
-
- You should have received a copy of the GNU General Public License along with
- this program; if not, see <http://www.gnu.org/licenses/>.
-
- The full GNU General Public License is included in this distribution in
- the file called "COPYING".
-
- Contact Information:
- e1000-devel Mailing List <e1000-devel@lists.sourceforge.net>
- Intel Corporation, 5200 N.E. Elam Young Parkway, Hillsboro, OR 97124-6497
-
-*******************************************************************************/
+/* Copyright(c) 1999 - 2018 Intel Corporation. */
#ifndef _E1000_MBX_H_
#define _E1000_MBX_H_
diff --git a/drivers/net/ethernet/intel/igbvf/netdev.c b/drivers/net/ethernet/intel/igbvf/netdev.c
index e2b7502..f818f06 100644
--- a/drivers/net/ethernet/intel/igbvf/netdev.c
+++ b/drivers/net/ethernet/intel/igbvf/netdev.c
@@ -1,29 +1,5 @@
// SPDX-License-Identifier: GPL-2.0
-/*******************************************************************************
-
- Intel(R) 82576 Virtual Function Linux driver
- Copyright(c) 2009 - 2012 Intel Corporation.
-
- This program is free software; you can redistribute it and/or modify it
- under the terms and conditions of the GNU General Public License,
- version 2, as published by the Free Software Foundation.
-
- This program is distributed in the hope it will be useful, but WITHOUT
- ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
- FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
- more details.
-
- You should have received a copy of the GNU General Public License along with
- this program; if not, see <http://www.gnu.org/licenses/>.
-
- The full GNU General Public License is included in this distribution in
- the file called "COPYING".
-
- Contact Information:
- e1000-devel Mailing List <e1000-devel@lists.sourceforge.net>
- Intel Corporation, 5200 N.E. Elam Young Parkway, Hillsboro, OR 97124-6497
-
-*******************************************************************************/
+/* Copyright(c) 2009 - 2018 Intel Corporation. */
#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
diff --git a/drivers/net/ethernet/intel/igbvf/regs.h b/drivers/net/ethernet/intel/igbvf/regs.h
index 614e524..625a309 100644
--- a/drivers/net/ethernet/intel/igbvf/regs.h
+++ b/drivers/net/ethernet/intel/igbvf/regs.h
@@ -1,29 +1,5 @@
/* SPDX-License-Identifier: GPL-2.0 */
-/*******************************************************************************
-
- Intel(R) 82576 Virtual Function Linux driver
- Copyright(c) 2009 - 2012 Intel Corporation.
-
- This program is free software; you can redistribute it and/or modify it
- under the terms and conditions of the GNU General Public License,
- version 2, as published by the Free Software Foundation.
-
- This program is distributed in the hope it will be useful, but WITHOUT
- ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
- FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
- more details.
-
- You should have received a copy of the GNU General Public License along with
- this program; if not, see <http://www.gnu.org/licenses/>.
-
- The full GNU General Public License is included in this distribution in
- the file called "COPYING".
-
- Contact Information:
- e1000-devel Mailing List <e1000-devel@lists.sourceforge.net>
- Intel Corporation, 5200 N.E. Elam Young Parkway, Hillsboro, OR 97124-6497
-
-*******************************************************************************/
+/* Copyright(c) 2009 - 2018 Intel Corporation. */
#ifndef _E1000_REGS_H_
#define _E1000_REGS_H_
diff --git a/drivers/net/ethernet/intel/igbvf/vf.c b/drivers/net/ethernet/intel/igbvf/vf.c
index bfe8d82..b8ba3f9 100644
--- a/drivers/net/ethernet/intel/igbvf/vf.c
+++ b/drivers/net/ethernet/intel/igbvf/vf.c
@@ -1,29 +1,5 @@
// SPDX-License-Identifier: GPL-2.0
-/*******************************************************************************
-
- Intel(R) 82576 Virtual Function Linux driver
- Copyright(c) 2009 - 2012 Intel Corporation.
-
- This program is free software; you can redistribute it and/or modify it
- under the terms and conditions of the GNU General Public License,
- version 2, as published by the Free Software Foundation.
-
- This program is distributed in the hope it will be useful, but WITHOUT
- ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
- FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
- more details.
-
- You should have received a copy of the GNU General Public License along with
- this program; if not, see <http://www.gnu.org/licenses/>.
-
- The full GNU General Public License is included in this distribution in
- the file called "COPYING".
-
- Contact Information:
- e1000-devel Mailing List <e1000-devel@lists.sourceforge.net>
- Intel Corporation, 5200 N.E. Elam Young Parkway, Hillsboro, OR 97124-6497
-
-*******************************************************************************/
+/* Copyright(c) 2009 - 2018 Intel Corporation. */
#include "vf.h"
diff --git a/drivers/net/ethernet/intel/igbvf/vf.h b/drivers/net/ethernet/intel/igbvf/vf.h
index 193b500..c71b0d7 100644
--- a/drivers/net/ethernet/intel/igbvf/vf.h
+++ b/drivers/net/ethernet/intel/igbvf/vf.h
@@ -1,29 +1,5 @@
/* SPDX-License-Identifier: GPL-2.0 */
-/*******************************************************************************
-
- Intel(R) 82576 Virtual Function Linux driver
- Copyright(c) 2009 - 2012 Intel Corporation.
-
- This program is free software; you can redistribute it and/or modify it
- under the terms and conditions of the GNU General Public License,
- version 2, as published by the Free Software Foundation.
-
- This program is distributed in the hope it will be useful, but WITHOUT
- ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
- FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
- more details.
-
- You should have received a copy of the GNU General Public License along with
- this program; if not, see <http://www.gnu.org/licenses/>.
-
- The full GNU General Public License is included in this distribution in
- the file called "COPYING".
-
- Contact Information:
- e1000-devel Mailing List <e1000-devel@lists.sourceforge.net>
- Intel Corporation, 5200 N.E. Elam Young Parkway, Hillsboro, OR 97124-6497
-
-*******************************************************************************/
+/* Copyright(c) 2009 - 2018 Intel Corporation. */
#ifndef _E1000_VF_H_
#define _E1000_VF_H_
diff --git a/drivers/net/ethernet/intel/ixgb/Makefile b/drivers/net/ethernet/intel/ixgb/Makefile
index 1b42dd5..2433e93 100644
--- a/drivers/net/ethernet/intel/ixgb/Makefile
+++ b/drivers/net/ethernet/intel/ixgb/Makefile
@@ -1,33 +1,6 @@
# SPDX-License-Identifier: GPL-2.0
-################################################################################
-#
-# Intel PRO/10GbE Linux driver
# Copyright(c) 1999 - 2008 Intel Corporation.
#
-# This program is free software; you can redistribute it and/or modify it
-# under the terms and conditions of the GNU General Public License,
-# version 2, as published by the Free Software Foundation.
-#
-# This program is distributed in the hope it will be useful, but WITHOUT
-# ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
-# FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
-# more details.
-#
-# You should have received a copy of the GNU General Public License along with
-# this program; if not, write to the Free Software Foundation, Inc.,
-# 51 Franklin St - Fifth Floor, Boston, MA 02110-1301 USA.
-#
-# The full GNU General Public License is included in this distribution in
-# the file called "COPYING".
-#
-# Contact Information:
-# Linux NICS <linux.nics@intel.com>
-# e1000-devel Mailing List <e1000-devel@lists.sourceforge.net>
-# Intel Corporation, 5200 N.E. Elam Young Parkway, Hillsboro, OR 97124-6497
-#
-################################################################################
-
-#
# Makefile for the Intel(R) PRO/10GbE ethernet driver
#
diff --git a/drivers/net/ethernet/intel/ixgb/ixgb.h b/drivers/net/ethernet/intel/ixgb/ixgb.h
index 9202284..e85271b 100644
--- a/drivers/net/ethernet/intel/ixgb/ixgb.h
+++ b/drivers/net/ethernet/intel/ixgb/ixgb.h
@@ -1,31 +1,5 @@
/* SPDX-License-Identifier: GPL-2.0 */
-/*******************************************************************************
-
- Intel PRO/10GbE Linux driver
- Copyright(c) 1999 - 2008 Intel Corporation.
-
- This program is free software; you can redistribute it and/or modify it
- under the terms and conditions of the GNU General Public License,
- version 2, as published by the Free Software Foundation.
-
- This program is distributed in the hope it will be useful, but WITHOUT
- ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
- FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
- more details.
-
- You should have received a copy of the GNU General Public License along with
- this program; if not, write to the Free Software Foundation, Inc.,
- 51 Franklin St - Fifth Floor, Boston, MA 02110-1301 USA.
-
- The full GNU General Public License is included in this distribution in
- the file called "COPYING".
-
- Contact Information:
- Linux NICS <linux.nics@intel.com>
- e1000-devel Mailing List <e1000-devel@lists.sourceforge.net>
- Intel Corporation, 5200 N.E. Elam Young Parkway, Hillsboro, OR 97124-6497
-
-*******************************************************************************/
+/* Copyright(c) 1999 - 2008 Intel Corporation. */
#ifndef _IXGB_H_
#define _IXGB_H_
diff --git a/drivers/net/ethernet/intel/ixgb/ixgb_ee.c b/drivers/net/ethernet/intel/ixgb/ixgb_ee.c
index eca216b..129286f 100644
--- a/drivers/net/ethernet/intel/ixgb/ixgb_ee.c
+++ b/drivers/net/ethernet/intel/ixgb/ixgb_ee.c
@@ -1,30 +1,5 @@
-/*******************************************************************************
-
- Intel PRO/10GbE Linux driver
- Copyright(c) 1999 - 2008 Intel Corporation.
-
- This program is free software; you can redistribute it and/or modify it
- under the terms and conditions of the GNU General Public License,
- version 2, as published by the Free Software Foundation.
-
- This program is distributed in the hope it will be useful, but WITHOUT
- ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
- FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
- more details.
-
- You should have received a copy of the GNU General Public License along with
- this program; if not, write to the Free Software Foundation, Inc.,
- 51 Franklin St - Fifth Floor, Boston, MA 02110-1301 USA.
-
- The full GNU General Public License is included in this distribution in
- the file called "COPYING".
-
- Contact Information:
- Linux NICS <linux.nics@intel.com>
- e1000-devel Mailing List <e1000-devel@lists.sourceforge.net>
- Intel Corporation, 5200 N.E. Elam Young Parkway, Hillsboro, OR 97124-6497
-
-*******************************************************************************/
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright(c) 1999 - 2008 Intel Corporation. */
#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
diff --git a/drivers/net/ethernet/intel/ixgb/ixgb_ee.h b/drivers/net/ethernet/intel/ixgb/ixgb_ee.h
index 475297a..3ee0a09 100644
--- a/drivers/net/ethernet/intel/ixgb/ixgb_ee.h
+++ b/drivers/net/ethernet/intel/ixgb/ixgb_ee.h
@@ -1,31 +1,5 @@
/* SPDX-License-Identifier: GPL-2.0 */
-/*******************************************************************************
-
- Intel PRO/10GbE Linux driver
- Copyright(c) 1999 - 2008 Intel Corporation.
-
- This program is free software; you can redistribute it and/or modify it
- under the terms and conditions of the GNU General Public License,
- version 2, as published by the Free Software Foundation.
-
- This program is distributed in the hope it will be useful, but WITHOUT
- ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
- FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
- more details.
-
- You should have received a copy of the GNU General Public License along with
- this program; if not, write to the Free Software Foundation, Inc.,
- 51 Franklin St - Fifth Floor, Boston, MA 02110-1301 USA.
-
- The full GNU General Public License is included in this distribution in
- the file called "COPYING".
-
- Contact Information:
- Linux NICS <linux.nics@intel.com>
- e1000-devel Mailing List <e1000-devel@lists.sourceforge.net>
- Intel Corporation, 5200 N.E. Elam Young Parkway, Hillsboro, OR 97124-6497
-
-*******************************************************************************/
+/* Copyright(c) 1999 - 2008 Intel Corporation. */
#ifndef _IXGB_EE_H_
#define _IXGB_EE_H_
diff --git a/drivers/net/ethernet/intel/ixgb/ixgb_ethtool.c b/drivers/net/ethernet/intel/ixgb/ixgb_ethtool.c
index d10a0d2..43744bf 100644
--- a/drivers/net/ethernet/intel/ixgb/ixgb_ethtool.c
+++ b/drivers/net/ethernet/intel/ixgb/ixgb_ethtool.c
@@ -1,30 +1,5 @@
-/*******************************************************************************
-
- Intel PRO/10GbE Linux driver
- Copyright(c) 1999 - 2008 Intel Corporation.
-
- This program is free software; you can redistribute it and/or modify it
- under the terms and conditions of the GNU General Public License,
- version 2, as published by the Free Software Foundation.
-
- This program is distributed in the hope it will be useful, but WITHOUT
- ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
- FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
- more details.
-
- You should have received a copy of the GNU General Public License along with
- this program; if not, write to the Free Software Foundation, Inc.,
- 51 Franklin St - Fifth Floor, Boston, MA 02110-1301 USA.
-
- The full GNU General Public License is included in this distribution in
- the file called "COPYING".
-
- Contact Information:
- Linux NICS <linux.nics@intel.com>
- e1000-devel Mailing List <e1000-devel@lists.sourceforge.net>
- Intel Corporation, 5200 N.E. Elam Young Parkway, Hillsboro, OR 97124-6497
-
-*******************************************************************************/
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright(c) 1999 - 2008 Intel Corporation. */
/* ethtool support for ixgb */
diff --git a/drivers/net/ethernet/intel/ixgb/ixgb_hw.c b/drivers/net/ethernet/intel/ixgb/ixgb_hw.c
index bf9a220..cbaa933 100644
--- a/drivers/net/ethernet/intel/ixgb/ixgb_hw.c
+++ b/drivers/net/ethernet/intel/ixgb/ixgb_hw.c
@@ -1,30 +1,5 @@
-/*******************************************************************************
-
- Intel PRO/10GbE Linux driver
- Copyright(c) 1999 - 2008 Intel Corporation.
-
- This program is free software; you can redistribute it and/or modify it
- under the terms and conditions of the GNU General Public License,
- version 2, as published by the Free Software Foundation.
-
- This program is distributed in the hope it will be useful, but WITHOUT
- ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
- FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
- more details.
-
- You should have received a copy of the GNU General Public License along with
- this program; if not, write to the Free Software Foundation, Inc.,
- 51 Franklin St - Fifth Floor, Boston, MA 02110-1301 USA.
-
- The full GNU General Public License is included in this distribution in
- the file called "COPYING".
-
- Contact Information:
- Linux NICS <linux.nics@intel.com>
- e1000-devel Mailing List <e1000-devel@lists.sourceforge.net>
- Intel Corporation, 5200 N.E. Elam Young Parkway, Hillsboro, OR 97124-6497
-
-*******************************************************************************/
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright(c) 1999 - 2008 Intel Corporation. */
/* ixgb_hw.c
* Shared functions for accessing and configuring the adapter
diff --git a/drivers/net/ethernet/intel/ixgb/ixgb_hw.h b/drivers/net/ethernet/intel/ixgb/ixgb_hw.h
index 19f36d8..6064583 100644
--- a/drivers/net/ethernet/intel/ixgb/ixgb_hw.h
+++ b/drivers/net/ethernet/intel/ixgb/ixgb_hw.h
@@ -1,31 +1,5 @@
/* SPDX-License-Identifier: GPL-2.0 */
-/*******************************************************************************
-
- Intel PRO/10GbE Linux driver
- Copyright(c) 1999 - 2008 Intel Corporation.
-
- This program is free software; you can redistribute it and/or modify it
- under the terms and conditions of the GNU General Public License,
- version 2, as published by the Free Software Foundation.
-
- This program is distributed in the hope it will be useful, but WITHOUT
- ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
- FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
- more details.
-
- You should have received a copy of the GNU General Public License along with
- this program; if not, write to the Free Software Foundation, Inc.,
- 51 Franklin St - Fifth Floor, Boston, MA 02110-1301 USA.
-
- The full GNU General Public License is included in this distribution in
- the file called "COPYING".
-
- Contact Information:
- Linux NICS <linux.nics@intel.com>
- e1000-devel Mailing List <e1000-devel@lists.sourceforge.net>
- Intel Corporation, 5200 N.E. Elam Young Parkway, Hillsboro, OR 97124-6497
-
-*******************************************************************************/
+/* Copyright(c) 1999 - 2008 Intel Corporation. */
#ifndef _IXGB_HW_H_
#define _IXGB_HW_H_
diff --git a/drivers/net/ethernet/intel/ixgb/ixgb_ids.h b/drivers/net/ethernet/intel/ixgb/ixgb_ids.h
index 24e8499..9695b82 100644
--- a/drivers/net/ethernet/intel/ixgb/ixgb_ids.h
+++ b/drivers/net/ethernet/intel/ixgb/ixgb_ids.h
@@ -1,31 +1,5 @@
/* SPDX-License-Identifier: GPL-2.0 */
-/*******************************************************************************
-
- Intel PRO/10GbE Linux driver
- Copyright(c) 1999 - 2008 Intel Corporation.
-
- This program is free software; you can redistribute it and/or modify it
- under the terms and conditions of the GNU General Public License,
- version 2, as published by the Free Software Foundation.
-
- This program is distributed in the hope it will be useful, but WITHOUT
- ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
- FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
- more details.
-
- You should have received a copy of the GNU General Public License along with
- this program; if not, write to the Free Software Foundation, Inc.,
- 51 Franklin St - Fifth Floor, Boston, MA 02110-1301 USA.
-
- The full GNU General Public License is included in this distribution in
- the file called "COPYING".
-
- Contact Information:
- Linux NICS <linux.nics@intel.com>
- e1000-devel Mailing List <e1000-devel@lists.sourceforge.net>
- Intel Corporation, 5200 N.E. Elam Young Parkway, Hillsboro, OR 97124-6497
-
-*******************************************************************************/
+/* Copyright(c) 1999 - 2008 Intel Corporation. */
#ifndef _IXGB_IDS_H_
#define _IXGB_IDS_H_
diff --git a/drivers/net/ethernet/intel/ixgb/ixgb_main.c b/drivers/net/ethernet/intel/ixgb/ixgb_main.c
index 2353c383..62f2173 100644
--- a/drivers/net/ethernet/intel/ixgb/ixgb_main.c
+++ b/drivers/net/ethernet/intel/ixgb/ixgb_main.c
@@ -1,30 +1,5 @@
-/*******************************************************************************
-
- Intel PRO/10GbE Linux driver
- Copyright(c) 1999 - 2008 Intel Corporation.
-
- This program is free software; you can redistribute it and/or modify it
- under the terms and conditions of the GNU General Public License,
- version 2, as published by the Free Software Foundation.
-
- This program is distributed in the hope it will be useful, but WITHOUT
- ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
- FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
- more details.
-
- You should have received a copy of the GNU General Public License along with
- this program; if not, write to the Free Software Foundation, Inc.,
- 51 Franklin St - Fifth Floor, Boston, MA 02110-1301 USA.
-
- The full GNU General Public License is included in this distribution in
- the file called "COPYING".
-
- Contact Information:
- Linux NICS <linux.nics@intel.com>
- e1000-devel Mailing List <e1000-devel@lists.sourceforge.net>
- Intel Corporation, 5200 N.E. Elam Young Parkway, Hillsboro, OR 97124-6497
-
-*******************************************************************************/
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright(c) 1999 - 2008 Intel Corporation. */
#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
diff --git a/drivers/net/ethernet/intel/ixgb/ixgb_osdep.h b/drivers/net/ethernet/intel/ixgb/ixgb_osdep.h
index b171037..7bd54efa 100644
--- a/drivers/net/ethernet/intel/ixgb/ixgb_osdep.h
+++ b/drivers/net/ethernet/intel/ixgb/ixgb_osdep.h
@@ -1,31 +1,5 @@
/* SPDX-License-Identifier: GPL-2.0 */
-/*******************************************************************************
-
- Intel PRO/10GbE Linux driver
- Copyright(c) 1999 - 2008 Intel Corporation.
-
- This program is free software; you can redistribute it and/or modify it
- under the terms and conditions of the GNU General Public License,
- version 2, as published by the Free Software Foundation.
-
- This program is distributed in the hope it will be useful, but WITHOUT
- ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
- FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
- more details.
-
- You should have received a copy of the GNU General Public License along with
- this program; if not, write to the Free Software Foundation, Inc.,
- 51 Franklin St - Fifth Floor, Boston, MA 02110-1301 USA.
-
- The full GNU General Public License is included in this distribution in
- the file called "COPYING".
-
- Contact Information:
- Linux NICS <linux.nics@intel.com>
- e1000-devel Mailing List <e1000-devel@lists.sourceforge.net>
- Intel Corporation, 5200 N.E. Elam Young Parkway, Hillsboro, OR 97124-6497
-
-*******************************************************************************/
+/* Copyright(c) 1999 - 2008 Intel Corporation. */
/* glue for the OS independent part of ixgb
* includes register access macros
diff --git a/drivers/net/ethernet/intel/ixgb/ixgb_param.c b/drivers/net/ethernet/intel/ixgb/ixgb_param.c
index 04a6064..f0cadd5 100644
--- a/drivers/net/ethernet/intel/ixgb/ixgb_param.c
+++ b/drivers/net/ethernet/intel/ixgb/ixgb_param.c
@@ -1,30 +1,5 @@
-/*******************************************************************************
-
- Intel PRO/10GbE Linux driver
- Copyright(c) 1999 - 2008 Intel Corporation.
-
- This program is free software; you can redistribute it and/or modify it
- under the terms and conditions of the GNU General Public License,
- version 2, as published by the Free Software Foundation.
-
- This program is distributed in the hope it will be useful, but WITHOUT
- ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
- FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
- more details.
-
- You should have received a copy of the GNU General Public License along with
- this program; if not, write to the Free Software Foundation, Inc.,
- 51 Franklin St - Fifth Floor, Boston, MA 02110-1301 USA.
-
- The full GNU General Public License is included in this distribution in
- the file called "COPYING".
-
- Contact Information:
- Linux NICS <linux.nics@intel.com>
- e1000-devel Mailing List <e1000-devel@lists.sourceforge.net>
- Intel Corporation, 5200 N.E. Elam Young Parkway, Hillsboro, OR 97124-6497
-
-*******************************************************************************/
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright(c) 1999 - 2008 Intel Corporation. */
#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
diff --git a/drivers/net/ethernet/intel/ixgbe/Makefile b/drivers/net/ethernet/intel/ixgbe/Makefile
index 4cd96c8..5414685 100644
--- a/drivers/net/ethernet/intel/ixgbe/Makefile
+++ b/drivers/net/ethernet/intel/ixgbe/Makefile
@@ -1,32 +1,5 @@
# SPDX-License-Identifier: GPL-2.0
-################################################################################
-#
-# Intel 10 Gigabit PCI Express Linux driver
-# Copyright(c) 1999 - 2013 Intel Corporation.
-#
-# This program is free software; you can redistribute it and/or modify it
-# under the terms and conditions of the GNU General Public License,
-# version 2, as published by the Free Software Foundation.
-#
-# This program is distributed in the hope it will be useful, but WITHOUT
-# ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
-# FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
-# more details.
-#
-# You should have received a copy of the GNU General Public License along with
-# this program; if not, write to the Free Software Foundation, Inc.,
-# 51 Franklin St - Fifth Floor, Boston, MA 02110-1301 USA.
-#
-# The full GNU General Public License is included in this distribution in
-# the file called "COPYING".
-#
-# Contact Information:
-# Linux NICS <linux.nics@intel.com>
-# e1000-devel Mailing List <e1000-devel@lists.sourceforge.net>
-# Intel Corporation, 5200 N.E. Elam Young Parkway, Hillsboro, OR 97124-6497
-#
-################################################################################
-
+# Copyright(c) 1999 - 2018 Intel Corporation.
#
# Makefile for the Intel(R) 10GbE PCI Express ethernet driver
#
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe.h b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
index 4f08c71..fc534e9 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe.h
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
@@ -1,31 +1,5 @@
/* SPDX-License-Identifier: GPL-2.0 */
-/*******************************************************************************
-
- Intel 10 Gigabit PCI Express Linux driver
- Copyright(c) 1999 - 2016 Intel Corporation.
-
- This program is free software; you can redistribute it and/or modify it
- under the terms and conditions of the GNU General Public License,
- version 2, as published by the Free Software Foundation.
-
- This program is distributed in the hope it will be useful, but WITHOUT
- ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
- FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
- more details.
-
- You should have received a copy of the GNU General Public License along with
- this program; if not, write to the Free Software Foundation, Inc.,
- 51 Franklin St - Fifth Floor, Boston, MA 02110-1301 USA.
-
- The full GNU General Public License is included in this distribution in
- the file called "COPYING".
-
- Contact Information:
- Linux NICS <linux.nics@intel.com>
- e1000-devel Mailing List <e1000-devel@lists.sourceforge.net>
- Intel Corporation, 5200 N.E. Elam Young Parkway, Hillsboro, OR 97124-6497
-
-*******************************************************************************/
+/* Copyright(c) 1999 - 2018 Intel Corporation. */
#ifndef _IXGBE_H_
#define _IXGBE_H_
@@ -241,8 +215,7 @@
unsigned long time_stamp;
union {
struct sk_buff *skb;
- /* XDP uses address ptr on irq_clean */
- void *data;
+ struct xdp_frame *xdpf;
};
unsigned int bytecount;
unsigned short gso_segs;
@@ -306,7 +279,6 @@
struct ixgbe_fwd_adapter {
unsigned long active_vlans[BITS_TO_LONGS(VLAN_N_VID)];
struct net_device *netdev;
- struct ixgbe_adapter *real_adapter;
unsigned int tx_base_queue;
unsigned int rx_base_queue;
int pool;
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_82598.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_82598.c
index cb0fe5f..eee277c 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_82598.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_82598.c
@@ -1,31 +1,5 @@
// SPDX-License-Identifier: GPL-2.0
-/*******************************************************************************
-
- Intel 10 Gigabit PCI Express Linux driver
- Copyright(c) 1999 - 2016 Intel Corporation.
-
- This program is free software; you can redistribute it and/or modify it
- under the terms and conditions of the GNU General Public License,
- version 2, as published by the Free Software Foundation.
-
- This program is distributed in the hope it will be useful, but WITHOUT
- ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
- FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
- more details.
-
- You should have received a copy of the GNU General Public License along with
- this program; if not, write to the Free Software Foundation, Inc.,
- 51 Franklin St - Fifth Floor, Boston, MA 02110-1301 USA.
-
- The full GNU General Public License is included in this distribution in
- the file called "COPYING".
-
- Contact Information:
- Linux NICS <linux.nics@intel.com>
- e1000-devel Mailing List <e1000-devel@lists.sourceforge.net>
- Intel Corporation, 5200 N.E. Elam Young Parkway, Hillsboro, OR 97124-6497
-
-*******************************************************************************/
+/* Copyright(c) 1999 - 2018 Intel Corporation. */
#include <linux/pci.h>
#include <linux/delay.h>
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_82599.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_82599.c
index 66a74f4..1e49716 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_82599.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_82599.c
@@ -1,31 +1,5 @@
// SPDX-License-Identifier: GPL-2.0
-/*******************************************************************************
-
- Intel 10 Gigabit PCI Express Linux driver
- Copyright(c) 1999 - 2016 Intel Corporation.
-
- This program is free software; you can redistribute it and/or modify it
- under the terms and conditions of the GNU General Public License,
- version 2, as published by the Free Software Foundation.
-
- This program is distributed in the hope it will be useful, but WITHOUT
- ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
- FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
- more details.
-
- You should have received a copy of the GNU General Public License along with
- this program; if not, write to the Free Software Foundation, Inc.,
- 51 Franklin St - Fifth Floor, Boston, MA 02110-1301 USA.
-
- The full GNU General Public License is included in this distribution in
- the file called "COPYING".
-
- Contact Information:
- Linux NICS <linux.nics@intel.com>
- e1000-devel Mailing List <e1000-devel@lists.sourceforge.net>
- Intel Corporation, 5200 N.E. Elam Young Parkway, Hillsboro, OR 97124-6497
-
-*******************************************************************************/
+/* Copyright(c) 1999 - 2018 Intel Corporation. */
#include <linux/pci.h>
#include <linux/delay.h>
@@ -1462,7 +1436,8 @@
{
u32 hi_hash_dword, lo_hash_dword, flow_vm_vlan;
- u32 bucket_hash = 0, hi_dword = 0;
+ u32 bucket_hash = 0;
+ __be32 hi_dword = 0;
int i;
/* Apply masks to input data */
@@ -1501,7 +1476,7 @@
* Limit hash to 13 bits since max bucket count is 8K.
* Store result at the end of the input stream.
*/
- input->formatted.bkt_hash = bucket_hash & 0x1FFF;
+ input->formatted.bkt_hash = (__force __be16)(bucket_hash & 0x1FFF);
}
/**
@@ -1610,7 +1585,7 @@
return IXGBE_ERR_CONFIG;
}
- switch (input_mask->formatted.flex_bytes & 0xFFFF) {
+ switch ((__force u16)input_mask->formatted.flex_bytes & 0xFFFF) {
case 0x0000:
/* Mask Flex Bytes */
fdirm |= IXGBE_FDIRM_FLEX;
@@ -1680,13 +1655,13 @@
IXGBE_WRITE_REG(hw, IXGBE_FDIRPORT, fdirport);
/* record vlan (little-endian) and flex_bytes(big-endian) */
- fdirvlan = IXGBE_STORE_AS_BE16(input->formatted.flex_bytes);
+ fdirvlan = IXGBE_STORE_AS_BE16((__force u16)input->formatted.flex_bytes);
fdirvlan <<= IXGBE_FDIRVLAN_FLEX_SHIFT;
fdirvlan |= ntohs(input->formatted.vlan_id);
IXGBE_WRITE_REG(hw, IXGBE_FDIRVLAN, fdirvlan);
/* configure FDIRHASH register */
- fdirhash = input->formatted.bkt_hash;
+ fdirhash = (__force u32)input->formatted.bkt_hash;
fdirhash |= soft_id << IXGBE_FDIRHASH_SIG_SW_INDEX_SHIFT;
IXGBE_WRITE_REG(hw, IXGBE_FDIRHASH, fdirhash);
@@ -1724,7 +1699,7 @@
s32 err;
/* configure FDIRHASH register */
- fdirhash = input->formatted.bkt_hash;
+ fdirhash = (__force u32)input->formatted.bkt_hash;
fdirhash |= soft_id << IXGBE_FDIRHASH_SIG_SW_INDEX_SHIFT;
IXGBE_WRITE_REG(hw, IXGBE_FDIRHASH, fdirhash);
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_common.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_common.c
index 633be93..3f5c350 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_common.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_common.c
@@ -1,31 +1,5 @@
// SPDX-License-Identifier: GPL-2.0
-/*******************************************************************************
-
- Intel 10 Gigabit PCI Express Linux driver
- Copyright(c) 1999 - 2016 Intel Corporation.
-
- This program is free software; you can redistribute it and/or modify it
- under the terms and conditions of the GNU General Public License,
- version 2, as published by the Free Software Foundation.
-
- This program is distributed in the hope it will be useful, but WITHOUT
- ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
- FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
- more details.
-
- You should have received a copy of the GNU General Public License along with
- this program; if not, write to the Free Software Foundation, Inc.,
- 51 Franklin St - Fifth Floor, Boston, MA 02110-1301 USA.
-
- The full GNU General Public License is included in this distribution in
- the file called "COPYING".
-
- Contact Information:
- Linux NICS <linux.nics@intel.com>
- e1000-devel Mailing List <e1000-devel@lists.sourceforge.net>
- Intel Corporation, 5200 N.E. Elam Young Parkway, Hillsboro, OR 97124-6497
-
-*******************************************************************************/
+/* Copyright(c) 1999 - 2018 Intel Corporation. */
#include <linux/pci.h>
#include <linux/delay.h>
@@ -3652,7 +3626,7 @@
*/
for (i = 0; i < dword_len; i++)
IXGBE_WRITE_REG_ARRAY(hw, IXGBE_FLEX_MNG,
- i, cpu_to_le32(buffer[i]));
+ i, (__force u32)cpu_to_le32(buffer[i]));
/* Setting this bit tells the ARC that a new command is pending. */
IXGBE_WRITE_REG(hw, IXGBE_HICR, hicr | IXGBE_HICR_C);
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_common.h b/drivers/net/ethernet/intel/ixgbe/ixgbe_common.h
index 2b31138..4b531e8 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_common.h
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_common.h
@@ -1,31 +1,5 @@
/* SPDX-License-Identifier: GPL-2.0 */
-/*******************************************************************************
-
- Intel 10 Gigabit PCI Express Linux driver
- Copyright(c) 1999 - 2016 Intel Corporation.
-
- This program is free software; you can redistribute it and/or modify it
- under the terms and conditions of the GNU General Public License,
- version 2, as published by the Free Software Foundation.
-
- This program is distributed in the hope it will be useful, but WITHOUT
- ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
- FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
- more details.
-
- You should have received a copy of the GNU General Public License along with
- this program; if not, write to the Free Software Foundation, Inc.,
- 51 Franklin St - Fifth Floor, Boston, MA 02110-1301 USA.
-
- The full GNU General Public License is included in this distribution in
- the file called "COPYING".
-
- Contact Information:
- Linux NICS <linux.nics@intel.com>
- e1000-devel Mailing List <e1000-devel@lists.sourceforge.net>
- Intel Corporation, 5200 N.E. Elam Young Parkway, Hillsboro, OR 97124-6497
-
-*******************************************************************************/
+/* Copyright(c) 1999 - 2018 Intel Corporation. */
#ifndef _IXGBE_COMMON_H_
#define _IXGBE_COMMON_H_
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_dcb.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_dcb.c
index aaea828..d26cea5 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_dcb.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_dcb.c
@@ -1,31 +1,5 @@
-/*******************************************************************************
-
- Intel 10 Gigabit PCI Express Linux driver
- Copyright(c) 1999 - 2016 Intel Corporation.
-
- This program is free software; you can redistribute it and/or modify it
- under the terms and conditions of the GNU General Public License,
- version 2, as published by the Free Software Foundation.
-
- This program is distributed in the hope it will be useful, but WITHOUT
- ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
- FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
- more details.
-
- You should have received a copy of the GNU General Public License along with
- this program; if not, write to the Free Software Foundation, Inc.,
- 51 Franklin St - Fifth Floor, Boston, MA 02110-1301 USA.
-
- The full GNU General Public License is included in this distribution in
- the file called "COPYING".
-
- Contact Information:
- Linux NICS <linux.nics@intel.com>
- e1000-devel Mailing List <e1000-devel@lists.sourceforge.net>
- Intel Corporation, 5200 N.E. Elam Young Parkway, Hillsboro, OR 97124-6497
-
-*******************************************************************************/
-
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright(c) 1999 - 2018 Intel Corporation. */
#include "ixgbe.h"
#include "ixgbe_type.h"
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_dcb.h b/drivers/net/ethernet/intel/ixgbe/ixgbe_dcb.h
index 73b6362..60cd586 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_dcb.h
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_dcb.h
@@ -1,31 +1,5 @@
/* SPDX-License-Identifier: GPL-2.0 */
-/*******************************************************************************
-
- Intel 10 Gigabit PCI Express Linux driver
- Copyright(c) 1999 - 2013 Intel Corporation.
-
- This program is free software; you can redistribute it and/or modify it
- under the terms and conditions of the GNU General Public License,
- version 2, as published by the Free Software Foundation.
-
- This program is distributed in the hope it will be useful, but WITHOUT
- ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
- FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
- more details.
-
- You should have received a copy of the GNU General Public License along with
- this program; if not, write to the Free Software Foundation, Inc.,
- 51 Franklin St - Fifth Floor, Boston, MA 02110-1301 USA.
-
- The full GNU General Public License is included in this distribution in
- the file called "COPYING".
-
- Contact Information:
- Linux NICS <linux.nics@intel.com>
- e1000-devel Mailing List <e1000-devel@lists.sourceforge.net>
- Intel Corporation, 5200 N.E. Elam Young Parkway, Hillsboro, OR 97124-6497
-
-*******************************************************************************/
+/* Copyright(c) 1999 - 2018 Intel Corporation. */
#ifndef _DCB_CONFIG_H_
#define _DCB_CONFIG_H_
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_dcb_82598.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_dcb_82598.c
index 0851306..379ae74 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_dcb_82598.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_dcb_82598.c
@@ -1,31 +1,5 @@
// SPDX-License-Identifier: GPL-2.0
-/*******************************************************************************
-
- Intel 10 Gigabit PCI Express Linux driver
- Copyright(c) 1999 - 2013 Intel Corporation.
-
- This program is free software; you can redistribute it and/or modify it
- under the terms and conditions of the GNU General Public License,
- version 2, as published by the Free Software Foundation.
-
- This program is distributed in the hope it will be useful, but WITHOUT
- ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
- FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
- more details.
-
- You should have received a copy of the GNU General Public License along with
- this program; if not, write to the Free Software Foundation, Inc.,
- 51 Franklin St - Fifth Floor, Boston, MA 02110-1301 USA.
-
- The full GNU General Public License is included in this distribution in
- the file called "COPYING".
-
- Contact Information:
- Linux NICS <linux.nics@intel.com>
- e1000-devel Mailing List <e1000-devel@lists.sourceforge.net>
- Intel Corporation, 5200 N.E. Elam Young Parkway, Hillsboro, OR 97124-6497
-
-*******************************************************************************/
+/* Copyright(c) 1999 - 2018 Intel Corporation. */
#include "ixgbe.h"
#include "ixgbe_type.h"
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_dcb_82598.h b/drivers/net/ethernet/intel/ixgbe/ixgbe_dcb_82598.h
index 7edce60..fdca41a 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_dcb_82598.h
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_dcb_82598.h
@@ -1,31 +1,5 @@
/* SPDX-License-Identifier: GPL-2.0 */
-/*******************************************************************************
-
- Intel 10 Gigabit PCI Express Linux driver
- Copyright(c) 1999 - 2013 Intel Corporation.
-
- This program is free software; you can redistribute it and/or modify it
- under the terms and conditions of the GNU General Public License,
- version 2, as published by the Free Software Foundation.
-
- This program is distributed in the hope it will be useful, but WITHOUT
- ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
- FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
- more details.
-
- You should have received a copy of the GNU General Public License along with
- this program; if not, write to the Free Software Foundation, Inc.,
- 51 Franklin St - Fifth Floor, Boston, MA 02110-1301 USA.
-
- The full GNU General Public License is included in this distribution in
- the file called "COPYING".
-
- Contact Information:
- Linux NICS <linux.nics@intel.com>
- e1000-devel Mailing List <e1000-devel@lists.sourceforge.net>
- Intel Corporation, 5200 N.E. Elam Young Parkway, Hillsboro, OR 97124-6497
-
-*******************************************************************************/
+/* Copyright(c) 1999 - 2018 Intel Corporation. */
#ifndef _DCB_82598_CONFIG_H_
#define _DCB_82598_CONFIG_H_
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_dcb_82599.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_dcb_82599.c
index 1eed681..7948849 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_dcb_82599.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_dcb_82599.c
@@ -1,30 +1,5 @@
-/*******************************************************************************
-
- Intel 10 Gigabit PCI Express Linux driver
- Copyright(c) 1999 - 2013 Intel Corporation.
-
- This program is free software; you can redistribute it and/or modify it
- under the terms and conditions of the GNU General Public License,
- version 2, as published by the Free Software Foundation.
-
- This program is distributed in the hope it will be useful, but WITHOUT
- ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
- FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
- more details.
-
- You should have received a copy of the GNU General Public License along with
- this program; if not, write to the Free Software Foundation, Inc.,
- 51 Franklin St - Fifth Floor, Boston, MA 02110-1301 USA.
-
- The full GNU General Public License is included in this distribution in
- the file called "COPYING".
-
- Contact Information:
- Linux NICS <linux.nics@intel.com>
- e1000-devel Mailing List <e1000-devel@lists.sourceforge.net>
- Intel Corporation, 5200 N.E. Elam Young Parkway, Hillsboro, OR 97124-6497
-
-*******************************************************************************/
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright(c) 1999 - 2018 Intel Corporation. */
#include "ixgbe.h"
#include "ixgbe_type.h"
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_dcb_82599.h b/drivers/net/ethernet/intel/ixgbe/ixgbe_dcb_82599.h
index fa030f0..c6f0848 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_dcb_82599.h
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_dcb_82599.h
@@ -1,31 +1,5 @@
/* SPDX-License-Identifier: GPL-2.0 */
-/*******************************************************************************
-
- Intel 10 Gigabit PCI Express Linux driver
- Copyright(c) 1999 - 2013 Intel Corporation.
-
- This program is free software; you can redistribute it and/or modify it
- under the terms and conditions of the GNU General Public License,
- version 2, as published by the Free Software Foundation.
-
- This program is distributed in the hope it will be useful, but WITHOUT
- ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
- FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
- more details.
-
- You should have received a copy of the GNU General Public License along with
- this program; if not, write to the Free Software Foundation, Inc.,
- 51 Franklin St - Fifth Floor, Boston, MA 02110-1301 USA.
-
- The full GNU General Public License is included in this distribution in
- the file called "COPYING".
-
- Contact Information:
- Linux NICS <linux.nics@intel.com>
- e1000-devel Mailing List <e1000-devel@lists.sourceforge.net>
- Intel Corporation, 5200 N.E. Elam Young Parkway, Hillsboro, OR 97124-6497
-
-*******************************************************************************/
+/* Copyright(c) 1999 - 2018 Intel Corporation. */
#ifndef _DCB_82599_CONFIG_H_
#define _DCB_82599_CONFIG_H_
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_dcb_nl.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_dcb_nl.c
index b33f3f8..c00332d 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_dcb_nl.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_dcb_nl.c
@@ -1,30 +1,5 @@
-/*******************************************************************************
-
- Intel 10 Gigabit PCI Express Linux driver
- Copyright(c) 1999 - 2014 Intel Corporation.
-
- This program is free software; you can redistribute it and/or modify it
- under the terms and conditions of the GNU General Public License,
- version 2, as published by the Free Software Foundation.
-
- This program is distributed in the hope it will be useful, but WITHOUT
- ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
- FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
- more details.
-
- You should have received a copy of the GNU General Public License along with
- this program; if not, write to the Free Software Foundation, Inc.,
- 51 Franklin St - Fifth Floor, Boston, MA 02110-1301 USA.
-
- The full GNU General Public License is included in this distribution in
- the file called "COPYING".
-
- Contact Information:
- Linux NICS <linux.nics@intel.com>
- e1000-devel Mailing List <e1000-devel@lists.sourceforge.net>
- Intel Corporation, 5200 N.E. Elam Young Parkway, Hillsboro, OR 97124-6497
-
-*******************************************************************************/
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright(c) 1999 - 2018 Intel Corporation. */
#include "ixgbe.h"
#include <linux/dcbnl.h>
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_debugfs.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_debugfs.c
index ad54080..55fe811 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_debugfs.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_debugfs.c
@@ -1,30 +1,6 @@
-/*******************************************************************************
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright(c) 1999 - 2018 Intel Corporation. */
- Intel 10 Gigabit PCI Express Linux driver
- Copyright(c) 1999 - 2013 Intel Corporation.
-
- This program is free software; you can redistribute it and/or modify it
- under the terms and conditions of the GNU General Public License,
- version 2, as published by the Free Software Foundation.
-
- This program is distributed in the hope it will be useful, but WITHOUT
- ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
- FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
- more details.
-
- You should have received a copy of the GNU General Public License along with
- this program; if not, write to the Free Software Foundation, Inc.,
- 51 Franklin St - Fifth Floor, Boston, MA 02110-1301 USA.
-
- The full GNU General Public License is included in this distribution in
- the file called "COPYING".
-
- Contact Information:
- Linux NICS <linux.nics@intel.com>
- e1000-devel Mailing List <e1000-devel@lists.sourceforge.net>
- Intel Corporation, 5200 N.E. Elam Young Parkway, Hillsboro, OR 97124-6497
-
-*******************************************************************************/
#include <linux/debugfs.h>
#include <linux/module.h>
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c
index c0e6ab4..bdd179c 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c
@@ -1,30 +1,5 @@
-/*******************************************************************************
-
- Intel 10 Gigabit PCI Express Linux driver
- Copyright(c) 1999 - 2016 Intel Corporation.
-
- This program is free software; you can redistribute it and/or modify it
- under the terms and conditions of the GNU General Public License,
- version 2, as published by the Free Software Foundation.
-
- This program is distributed in the hope it will be useful, but WITHOUT
- ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
- FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
- more details.
-
- You should have received a copy of the GNU General Public License along with
- this program; if not, write to the Free Software Foundation, Inc.,
- 51 Franklin St - Fifth Floor, Boston, MA 02110-1301 USA.
-
- The full GNU General Public License is included in this distribution in
- the file called "COPYING".
-
- Contact Information:
- Linux NICS <linux.nics@intel.com>
- e1000-devel Mailing List <e1000-devel@lists.sourceforge.net>
- Intel Corporation, 5200 N.E. Elam Young Parkway, Hillsboro, OR 97124-6497
-
-*******************************************************************************/
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright(c) 1999 - 2018 Intel Corporation. */
/* ethtool support for ixgbe */
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_fcoe.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_fcoe.c
index 7a09a40..94b3165 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_fcoe.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_fcoe.c
@@ -1,30 +1,5 @@
-/*******************************************************************************
-
- Intel 10 Gigabit PCI Express Linux driver
- Copyright(c) 1999 - 2014 Intel Corporation.
-
- This program is free software; you can redistribute it and/or modify it
- under the terms and conditions of the GNU General Public License,
- version 2, as published by the Free Software Foundation.
-
- This program is distributed in the hope it will be useful, but WITHOUT
- ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
- FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
- more details.
-
- You should have received a copy of the GNU General Public License along with
- this program; if not, write to the Free Software Foundation, Inc.,
- 51 Franklin St - Fifth Floor, Boston, MA 02110-1301 USA.
-
- The full GNU General Public License is included in this distribution in
- the file called "COPYING".
-
- Contact Information:
- Linux NICS <linux.nics@intel.com>
- e1000-devel Mailing List <e1000-devel@lists.sourceforge.net>
- Intel Corporation, 5200 N.E. Elam Young Parkway, Hillsboro, OR 97124-6497
-
-*******************************************************************************/
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright(c) 1999 - 2018 Intel Corporation. */
#include "ixgbe.h"
#include <linux/if_ether.h>
@@ -465,7 +440,7 @@
case cpu_to_le32(IXGBE_RXDADV_STAT_FCSTAT_FCPRSP):
dma_unmap_sg(&adapter->pdev->dev, ddp->sgl,
ddp->sgc, DMA_FROM_DEVICE);
- ddp->err = ddp_err;
+ ddp->err = (__force u32)ddp_err;
ddp->sgl = NULL;
ddp->sgc = 0;
/* fall through */
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_fcoe.h b/drivers/net/ethernet/intel/ixgbe/ixgbe_fcoe.h
index cf19199..724f538 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_fcoe.h
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_fcoe.h
@@ -1,31 +1,5 @@
/* SPDX-License-Identifier: GPL-2.0 */
-/*******************************************************************************
-
- Intel 10 Gigabit PCI Express Linux driver
- Copyright(c) 1999 - 2013 Intel Corporation.
-
- This program is free software; you can redistribute it and/or modify it
- under the terms and conditions of the GNU General Public License,
- version 2, as published by the Free Software Foundation.
-
- This program is distributed in the hope it will be useful, but WITHOUT
- ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
- FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
- more details.
-
- You should have received a copy of the GNU General Public License along with
- this program; if not, write to the Free Software Foundation, Inc.,
- 51 Franklin St - Fifth Floor, Boston, MA 02110-1301 USA.
-
- The full GNU General Public License is included in this distribution in
- the file called "COPYING".
-
- Contact Information:
- Linux NICS <linux.nics@intel.com>
- e1000-devel Mailing List <e1000-devel@lists.sourceforge.net>
- Intel Corporation, 5200 N.E. Elam Young Parkway, Hillsboro, OR 97124-6497
-
-*******************************************************************************/
+/* Copyright(c) 1999 - 2018 Intel Corporation. */
#ifndef _IXGBE_FCOE_H
#define _IXGBE_FCOE_H
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c
index cead23e..99b170f 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c
@@ -1,29 +1,5 @@
-/*******************************************************************************
- *
- * Intel 10 Gigabit PCI Express Linux driver
- * Copyright(c) 2017 Oracle and/or its affiliates. All rights reserved.
- *
- * This program is free software; you can redistribute it and/or modify it
- * under the terms and conditions of the GNU General Public License,
- * version 2, as published by the Free Software Foundation.
- *
- * This program is distributed in the hope it will be useful, but WITHOUT
- * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
- * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
- * more details.
- *
- * You should have received a copy of the GNU General Public License along with
- * this program. If not, see <http://www.gnu.org/licenses/>.
- *
- * The full GNU General Public License is included in this distribution in
- * the file called "COPYING".
- *
- * Contact Information:
- * Linux NICS <linux.nics@intel.com>
- * e1000-devel Mailing List <e1000-devel@lists.sourceforge.net>
- * Intel Corporation, 5200 N.E. Elam Young Parkway, Hillsboro, OR 97124-6497
- *
- ******************************************************************************/
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright(c) 2017 Oracle and/or its affiliates. All rights reserved. */
#include "ixgbe.h"
#include <net/xfrm.h>
@@ -43,8 +19,9 @@
int i;
for (i = 0; i < 4; i++)
- IXGBE_WRITE_REG(hw, IXGBE_IPSTXKEY(i), cpu_to_be32(key[3 - i]));
- IXGBE_WRITE_REG(hw, IXGBE_IPSTXSALT, cpu_to_be32(salt));
+ IXGBE_WRITE_REG(hw, IXGBE_IPSTXKEY(i),
+ (__force u32)cpu_to_be32(key[3 - i]));
+ IXGBE_WRITE_REG(hw, IXGBE_IPSTXSALT, (__force u32)cpu_to_be32(salt));
IXGBE_WRITE_FLUSH(hw);
reg = IXGBE_READ_REG(hw, IXGBE_IPSTXIDX);
@@ -93,7 +70,8 @@
int i;
/* store the SPI (in bigendian) and IPidx */
- IXGBE_WRITE_REG(hw, IXGBE_IPSRXSPI, cpu_to_le32(spi));
+ IXGBE_WRITE_REG(hw, IXGBE_IPSRXSPI,
+ (__force u32)cpu_to_le32((__force u32)spi));
IXGBE_WRITE_REG(hw, IXGBE_IPSRXIPIDX, ip_idx);
IXGBE_WRITE_FLUSH(hw);
@@ -101,8 +79,9 @@
/* store the key, salt, and mode */
for (i = 0; i < 4; i++)
- IXGBE_WRITE_REG(hw, IXGBE_IPSRXKEY(i), cpu_to_be32(key[3 - i]));
- IXGBE_WRITE_REG(hw, IXGBE_IPSRXSALT, cpu_to_be32(salt));
+ IXGBE_WRITE_REG(hw, IXGBE_IPSRXKEY(i),
+ (__force u32)cpu_to_be32(key[3 - i]));
+ IXGBE_WRITE_REG(hw, IXGBE_IPSRXSALT, (__force u32)cpu_to_be32(salt));
IXGBE_WRITE_REG(hw, IXGBE_IPSRXMOD, mode);
IXGBE_WRITE_FLUSH(hw);
@@ -121,7 +100,8 @@
/* store the ip address */
for (i = 0; i < 4; i++)
- IXGBE_WRITE_REG(hw, IXGBE_IPSRXIPADDR(i), cpu_to_le32(addr[i]));
+ IXGBE_WRITE_REG(hw, IXGBE_IPSRXIPADDR(i),
+ (__force u32)cpu_to_le32((__force u32)addr[i]));
IXGBE_WRITE_FLUSH(hw);
ixgbe_ipsec_set_rx_item(hw, idx, ips_rx_ip_tbl);
@@ -391,7 +371,8 @@
struct xfrm_state *ret = NULL;
rcu_read_lock();
- hash_for_each_possible_rcu(ipsec->rx_sa_list, rsa, hlist, spi)
+ hash_for_each_possible_rcu(ipsec->rx_sa_list, rsa, hlist,
+ (__force u32)spi) {
if (spi == rsa->xs->id.spi &&
((ip4 && *daddr == rsa->xs->id.daddr.a4) ||
(!ip4 && !memcmp(daddr, &rsa->xs->id.daddr.a6,
@@ -401,6 +382,7 @@
xfrm_state_hold(ret);
break;
}
+ }
rcu_read_unlock();
return ret;
}
@@ -593,7 +575,7 @@
/* hash the new entry for faster search in Rx path */
hash_add_rcu(ipsec->rx_sa_list, &ipsec->rx_tbl[sa_idx].hlist,
- rsa.xs->id.spi);
+ (__force u64)rsa.xs->id.spi);
} else {
struct tx_sa tsa;
@@ -677,7 +659,8 @@
if (!ipsec->ip_tbl[ipi].ref_cnt) {
memset(&ipsec->ip_tbl[ipi], 0,
sizeof(struct rx_ip_sa));
- ixgbe_ipsec_set_rx_ip(hw, ipi, zerobuf);
+ ixgbe_ipsec_set_rx_ip(hw, ipi,
+ (__force __be32 *)zerobuf);
}
}
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.h b/drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.h
index 4f099f5..9ef7faa 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.h
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.h
@@ -1,30 +1,5 @@
/* SPDX-License-Identifier: GPL-2.0 */
-/*******************************************************************************
-
- Intel 10 Gigabit PCI Express Linux driver
- Copyright(c) 2017 Oracle and/or its affiliates. All rights reserved.
-
- This program is free software; you can redistribute it and/or modify it
- under the terms and conditions of the GNU General Public License,
- version 2, as published by the Free Software Foundation.
-
- This program is distributed in the hope it will be useful, but WITHOUT
- ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
- FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
- more details.
-
- You should have received a copy of the GNU General Public License along with
- this program. If not, see <http://www.gnu.org/licenses/>.
-
- The full GNU General Public License is included in this distribution in
- the file called "COPYING".
-
- Contact Information:
- Linux NICS <linux.nics@intel.com>
- e1000-devel Mailing List <e1000-devel@lists.sourceforge.net>
- Intel Corporation, 5200 N.E. Elam Young Parkway, Hillsboro, OR 97124-6497
-
-*******************************************************************************/
+/* Copyright(c) 2017 Oracle and/or its affiliates. All rights reserved. */
#ifndef _IXGBE_IPSEC_H_
#define _IXGBE_IPSEC_H_
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_lib.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_lib.c
index ed4cbe9..893a920 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_lib.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_lib.c
@@ -1,30 +1,5 @@
-/*******************************************************************************
-
- Intel 10 Gigabit PCI Express Linux driver
- Copyright(c) 1999 - 2016 Intel Corporation.
-
- This program is free software; you can redistribute it and/or modify it
- under the terms and conditions of the GNU General Public License,
- version 2, as published by the Free Software Foundation.
-
- This program is distributed in the hope it will be useful, but WITHOUT
- ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
- FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
- more details.
-
- You should have received a copy of the GNU General Public License along with
- this program; if not, write to the Free Software Foundation, Inc.,
- 51 Franklin St - Fifth Floor, Boston, MA 02110-1301 USA.
-
- The full GNU General Public License is included in this distribution in
- the file called "COPYING".
-
- Contact Information:
- Linux NICS <linux.nics@intel.com>
- e1000-devel Mailing List <e1000-devel@lists.sourceforge.net>
- Intel Corporation, 5200 N.E. Elam Young Parkway, Hillsboro, OR 97124-6497
-
-*******************************************************************************/
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright(c) 1999 - 2018 Intel Corporation. */
#include "ixgbe.h"
#include "ixgbe_sriov.h"
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index afadba9..031d65c 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -1,30 +1,5 @@
-/*******************************************************************************
-
- Intel 10 Gigabit PCI Express Linux driver
- Copyright(c) 1999 - 2016 Intel Corporation.
-
- This program is free software; you can redistribute it and/or modify it
- under the terms and conditions of the GNU General Public License,
- version 2, as published by the Free Software Foundation.
-
- This program is distributed in the hope it will be useful, but WITHOUT
- ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
- FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
- more details.
-
- You should have received a copy of the GNU General Public License along with
- this program; if not, write to the Free Software Foundation, Inc.,
- 51 Franklin St - Fifth Floor, Boston, MA 02110-1301 USA.
-
- The full GNU General Public License is included in this distribution in
- the file called "COPYING".
-
- Contact Information:
- Linux NICS <linux.nics@intel.com>
- e1000-devel Mailing List <e1000-devel@lists.sourceforge.net>
- Intel Corporation, 5200 N.E. Elam Young Parkway, Hillsboro, OR 97124-6497
-
-*******************************************************************************/
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright(c) 1999 - 2018 Intel Corporation. */
#include <linux/types.h>
#include <linux/module.h>
@@ -752,8 +727,8 @@
ring_desc = "";
pr_info("T [0x%03X] %016llX %016llX %016llX %08X %p %016llX %p%s",
i,
- le64_to_cpu(u0->a),
- le64_to_cpu(u0->b),
+ le64_to_cpu((__force __le64)u0->a),
+ le64_to_cpu((__force __le64)u0->b),
(u64)dma_unmap_addr(tx_buffer, dma),
dma_unmap_len(tx_buffer, len),
tx_buffer->next_to_watch,
@@ -864,15 +839,15 @@
/* Descriptor Done */
pr_info("RWB[0x%03X] %016llX %016llX ---------------- %p%s\n",
i,
- le64_to_cpu(u0->a),
- le64_to_cpu(u0->b),
+ le64_to_cpu((__force __le64)u0->a),
+ le64_to_cpu((__force __le64)u0->b),
rx_buffer_info->skb,
ring_desc);
} else {
pr_info("R [0x%03X] %016llX %016llX %016llX %p%s\n",
i,
- le64_to_cpu(u0->a),
- le64_to_cpu(u0->b),
+ le64_to_cpu((__force __le64)u0->a),
+ le64_to_cpu((__force __le64)u0->b),
(u64)rx_buffer_info->dma,
rx_buffer_info->skb,
ring_desc);
@@ -1216,7 +1191,7 @@
/* free the skb */
if (ring_is_xdp(tx_ring))
- page_frag_free(tx_buffer->data);
+ xdp_return_frame(tx_buffer->xdpf);
else
napi_consume_skb(tx_buffer->skb, napi_budget);
@@ -1768,15 +1743,14 @@
if (ixgbe_test_staterr(rx_desc, IXGBE_RXDADV_STAT_SECP))
ixgbe_ipsec_rx(rx_ring, rx_desc, skb);
- skb->protocol = eth_type_trans(skb, dev);
-
/* record Rx queue, or update MACVLAN statistics */
if (netif_is_ixgbe(dev))
skb_record_rx_queue(skb, rx_ring->queue_index);
else
macvlan_count_rx(netdev_priv(dev), skb->len + ETH_HLEN, true,
- (skb->pkt_type == PACKET_BROADCAST) ||
- (skb->pkt_type == PACKET_MULTICAST));
+ false);
+
+ skb->protocol = eth_type_trans(skb, dev);
}
static void ixgbe_rx_skb(struct ixgbe_q_vector *q_vector,
@@ -2262,7 +2236,7 @@
#define IXGBE_XDP_TX 2
static int ixgbe_xmit_xdp_ring(struct ixgbe_adapter *adapter,
- struct xdp_buff *xdp);
+ struct xdp_frame *xdpf);
static struct sk_buff *ixgbe_run_xdp(struct ixgbe_adapter *adapter,
struct ixgbe_ring *rx_ring,
@@ -2270,6 +2244,7 @@
{
int err, result = IXGBE_XDP_PASS;
struct bpf_prog *xdp_prog;
+ struct xdp_frame *xdpf;
u32 act;
rcu_read_lock();
@@ -2278,12 +2253,19 @@
if (!xdp_prog)
goto xdp_out;
+ prefetchw(xdp->data_hard_start); /* xdp_frame write */
+
act = bpf_prog_run_xdp(xdp_prog, xdp);
switch (act) {
case XDP_PASS:
break;
case XDP_TX:
- result = ixgbe_xmit_xdp_ring(adapter, xdp);
+ xdpf = convert_to_xdp_frame(xdp);
+ if (unlikely(!xdpf)) {
+ result = IXGBE_XDP_CONSUMED;
+ break;
+ }
+ result = ixgbe_xmit_xdp_ring(adapter, xdpf);
break;
case XDP_REDIRECT:
err = xdp_do_redirect(adapter->netdev, xdp, xdp_prog);
@@ -4211,7 +4193,8 @@
static void ixgbe_configure_virtualization(struct ixgbe_adapter *adapter)
{
struct ixgbe_hw *hw = &adapter->hw;
- u32 reg_offset, vf_shift;
+ u16 pool = adapter->num_rx_pools;
+ u32 reg_offset, vf_shift, vmolr;
u32 gcr_ext, vmdctl;
int i;
@@ -4225,6 +4208,13 @@
vmdctl |= IXGBE_VT_CTL_REPLEN;
IXGBE_WRITE_REG(hw, IXGBE_VT_CTL, vmdctl);
+ /* accept untagged packets until a vlan tag is
+ * specifically set for the VMDQ queue/pool
+ */
+ vmolr = IXGBE_VMOLR_AUPE;
+ while (pool--)
+ IXGBE_WRITE_REG(hw, IXGBE_VMOLR(VMDQ_P(pool)), vmolr);
+
vf_shift = VMDQ_P(0) % 32;
reg_offset = (VMDQ_P(0) >= 32) ? 1 : 0;
@@ -4892,36 +4882,6 @@
return -ENOMEM;
}
-/**
- * ixgbe_write_uc_addr_list - write unicast addresses to RAR table
- * @netdev: network interface device structure
- * @vfn: pool to associate with unicast addresses
- *
- * Writes unicast address list to the RAR table.
- * Returns: -ENOMEM on failure/insufficient address space
- * 0 on no addresses written
- * X on writing X addresses to the RAR table
- **/
-static int ixgbe_write_uc_addr_list(struct net_device *netdev, int vfn)
-{
- struct ixgbe_adapter *adapter = netdev_priv(netdev);
- int count = 0;
-
- /* return ENOMEM indicating insufficient memory for addresses */
- if (netdev_uc_count(netdev) > ixgbe_available_rars(adapter, vfn))
- return -ENOMEM;
-
- if (!netdev_uc_empty(netdev)) {
- struct netdev_hw_addr *ha;
- netdev_for_each_uc_addr(ha, netdev) {
- ixgbe_del_mac_filter(adapter, ha->addr, vfn);
- ixgbe_add_mac_filter(adapter, ha->addr, vfn);
- count++;
- }
- }
- return count;
-}
-
static int ixgbe_uc_sync(struct net_device *netdev, const unsigned char *addr)
{
struct ixgbe_adapter *adapter = netdev_priv(netdev);
@@ -5301,29 +5261,6 @@
spin_unlock(&adapter->fdir_perfect_lock);
}
-static void ixgbe_macvlan_set_rx_mode(struct net_device *dev, unsigned int pool,
- struct ixgbe_adapter *adapter)
-{
- struct ixgbe_hw *hw = &adapter->hw;
- u32 vmolr;
-
- /* No unicast promiscuous support for VMDQ devices. */
- vmolr = IXGBE_READ_REG(hw, IXGBE_VMOLR(pool));
- vmolr |= (IXGBE_VMOLR_ROMPE | IXGBE_VMOLR_BAM | IXGBE_VMOLR_AUPE);
-
- /* clear the affected bit */
- vmolr &= ~IXGBE_VMOLR_MPE;
-
- if (dev->flags & IFF_ALLMULTI) {
- vmolr |= IXGBE_VMOLR_MPE;
- } else {
- vmolr |= IXGBE_VMOLR_ROMPE;
- hw->mac.ops.update_mc_addr_list(hw, dev);
- }
- ixgbe_write_uc_addr_list(adapter->netdev, pool);
- IXGBE_WRITE_REG(hw, IXGBE_VMOLR(pool), vmolr);
-}
-
/**
* ixgbe_clean_rx_ring - Free Rx Buffers per Queue
* @rx_ring: ring to free buffers from
@@ -5376,21 +5313,17 @@
rx_ring->next_to_use = 0;
}
-static int ixgbe_fwd_ring_up(struct net_device *vdev,
+static int ixgbe_fwd_ring_up(struct ixgbe_adapter *adapter,
struct ixgbe_fwd_adapter *accel)
{
- struct ixgbe_adapter *adapter = accel->real_adapter;
+ struct net_device *vdev = accel->netdev;
int i, baseq, err;
- if (!test_bit(accel->pool, adapter->fwd_bitmask))
- return 0;
-
baseq = accel->pool * adapter->num_rx_queues_per_pool;
netdev_dbg(vdev, "pool %i:%i queues %i:%i\n",
accel->pool, adapter->num_rx_pools,
baseq, baseq + adapter->num_rx_queues_per_pool);
- accel->netdev = vdev;
accel->rx_base_queue = baseq;
accel->tx_base_queue = baseq;
@@ -5407,26 +5340,36 @@
*/
err = ixgbe_add_mac_filter(adapter, vdev->dev_addr,
VMDQ_P(accel->pool));
- if (err >= 0) {
- ixgbe_macvlan_set_rx_mode(vdev, accel->pool, adapter);
+ if (err >= 0)
return 0;
- }
+
+ /* if we cannot add the MAC rule then disable the offload */
+ macvlan_release_l2fw_offload(vdev);
for (i = 0; i < adapter->num_rx_queues_per_pool; i++)
adapter->rx_ring[baseq + i]->netdev = NULL;
+ netdev_err(vdev, "L2FW offload disabled due to L2 filter error\n");
+
+ clear_bit(accel->pool, adapter->fwd_bitmask);
+ kfree(accel);
+
return err;
}
-static int ixgbe_upper_dev_walk(struct net_device *upper, void *data)
+static int ixgbe_macvlan_up(struct net_device *vdev, void *data)
{
- if (netif_is_macvlan(upper)) {
- struct macvlan_dev *dfwd = netdev_priv(upper);
- struct ixgbe_fwd_adapter *vadapter = dfwd->fwd_priv;
+ struct ixgbe_adapter *adapter = data;
+ struct ixgbe_fwd_adapter *accel;
- if (dfwd->fwd_priv)
- ixgbe_fwd_ring_up(upper, vadapter);
- }
+ if (!netif_is_macvlan(vdev))
+ return 0;
+
+ accel = macvlan_accel_priv(vdev);
+ if (!accel)
+ return 0;
+
+ ixgbe_fwd_ring_up(adapter, accel);
return 0;
}
@@ -5434,7 +5377,7 @@
static void ixgbe_configure_dfwd(struct ixgbe_adapter *adapter)
{
netdev_walk_all_upper_dev_rcu(adapter->netdev,
- ixgbe_upper_dev_walk, NULL);
+ ixgbe_macvlan_up, adapter);
}
static void ixgbe_configure(struct ixgbe_adapter *adapter)
@@ -5797,7 +5740,7 @@
/* Free all the Tx ring sk_buffs */
if (ring_is_xdp(tx_ring))
- page_frag_free(tx_buffer->data);
+ xdp_return_frame(tx_buffer->xdpf);
else
dev_kfree_skb_any(tx_buffer->skb);
@@ -6370,7 +6313,7 @@
struct device *dev = rx_ring->dev;
int orig_node = dev_to_node(dev);
int ring_node = -1;
- int size;
+ int size, err;
size = sizeof(struct ixgbe_rx_buffer) * rx_ring->count;
@@ -6407,6 +6350,13 @@
rx_ring->queue_index) < 0)
goto err;
+ err = xdp_rxq_info_reg_mem_model(&rx_ring->xdp_rxq,
+ MEM_TYPE_PAGE_SHARED, NULL);
+ if (err) {
+ xdp_rxq_info_unreg(&rx_ring->xdp_rxq);
+ goto err;
+ }
+
rx_ring->xdp_prog = adapter->xdp_prog;
return 0;
@@ -7801,7 +7751,7 @@
/* remove payload length from inner checksum */
paylen = skb->len - l4_offset;
- csum_replace_by_diff(&l4.tcp->check, htonl(paylen));
+ csum_replace_by_diff(&l4.tcp->check, (__force __wsum)htonl(paylen));
/* update gso size and bytecount with header size */
first->gso_segs = skb_shinfo(skb)->gso_segs;
@@ -8336,7 +8286,7 @@
}
static int ixgbe_xmit_xdp_ring(struct ixgbe_adapter *adapter,
- struct xdp_buff *xdp)
+ struct xdp_frame *xdpf)
{
struct ixgbe_ring *ring = adapter->xdp_ring[smp_processor_id()];
struct ixgbe_tx_buffer *tx_buffer;
@@ -8345,12 +8295,12 @@
dma_addr_t dma;
u16 i;
- len = xdp->data_end - xdp->data;
+ len = xdpf->len;
if (unlikely(!ixgbe_desc_unused(ring)))
return IXGBE_XDP_CONSUMED;
- dma = dma_map_single(ring->dev, xdp->data, len, DMA_TO_DEVICE);
+ dma = dma_map_single(ring->dev, xdpf->data, len, DMA_TO_DEVICE);
if (dma_mapping_error(ring->dev, dma))
return IXGBE_XDP_CONSUMED;
@@ -8365,7 +8315,8 @@
dma_unmap_len_set(tx_buffer, len, len);
dma_unmap_addr_set(tx_buffer, dma, dma);
- tx_buffer->data = xdp->data;
+ tx_buffer->xdpf = xdpf;
+
tx_desc->read.buffer_addr = cpu_to_le64(dma);
/* put descriptor type bits */
@@ -8827,6 +8778,49 @@
}
#endif /* CONFIG_IXGBE_DCB */
+static int ixgbe_reassign_macvlan_pool(struct net_device *vdev, void *data)
+{
+ struct ixgbe_adapter *adapter = data;
+ struct ixgbe_fwd_adapter *accel;
+ int pool;
+
+ /* we only care about macvlans... */
+ if (!netif_is_macvlan(vdev))
+ return 0;
+
+ /* that have hardware offload enabled... */
+ accel = macvlan_accel_priv(vdev);
+ if (!accel)
+ return 0;
+
+ /* If we can relocate to a different bit do so */
+ pool = find_first_zero_bit(adapter->fwd_bitmask, adapter->num_rx_pools);
+ if (pool < adapter->num_rx_pools) {
+ set_bit(pool, adapter->fwd_bitmask);
+ accel->pool = pool;
+ return 0;
+ }
+
+ /* if we cannot find a free pool then disable the offload */
+ netdev_err(vdev, "L2FW offload disabled due to lack of queue resources\n");
+ macvlan_release_l2fw_offload(vdev);
+ kfree(accel);
+
+ return 0;
+}
+
+static void ixgbe_defrag_macvlan_pools(struct net_device *dev)
+{
+ struct ixgbe_adapter *adapter = netdev_priv(dev);
+
+ /* flush any stale bits out of the fwd bitmask */
+ bitmap_clear(adapter->fwd_bitmask, 1, 63);
+
+ /* walk through upper devices reassigning pools */
+ netdev_walk_all_upper_dev_rcu(dev, ixgbe_reassign_macvlan_pool,
+ adapter);
+}
+
/**
* ixgbe_setup_tc - configure net_device for multiple traffic classes
*
@@ -8894,6 +8888,8 @@
#endif /* CONFIG_IXGBE_DCB */
ixgbe_init_interrupt_scheme(adapter);
+ ixgbe_defrag_macvlan_pools(dev);
+
if (netif_running(dev))
return ixgbe_open(dev);
@@ -8998,13 +8994,12 @@
static int get_macvlan_queue(struct net_device *upper, void *_data)
{
if (netif_is_macvlan(upper)) {
- struct macvlan_dev *dfwd = netdev_priv(upper);
- struct ixgbe_fwd_adapter *vadapter = dfwd->fwd_priv;
+ struct ixgbe_fwd_adapter *vadapter = macvlan_accel_priv(upper);
struct upper_walk_data *data = _data;
struct ixgbe_adapter *adapter = data->adapter;
int ifindex = data->ifindex;
- if (vadapter && vadapter->netdev->ifindex == ifindex) {
+ if (vadapter && upper->ifindex == ifindex) {
data->queue = adapter->rx_ring[vadapter->rx_base_queue]->reg_idx;
data->action = data->queue;
return 1;
@@ -9109,7 +9104,8 @@
for (j = 0; field_ptr[j].val; j++) {
if (field_ptr[j].off == off) {
- field_ptr[j].val(input, mask, val, m);
+ field_ptr[j].val(input, mask, (__force u32)val,
+ (__force u32)m);
input->filter.formatted.flow_type |=
field_ptr[j].type;
found_entry = true;
@@ -9118,8 +9114,10 @@
}
if (nexthdr) {
if (nexthdr->off == cls->knode.sel->keys[i].off &&
- nexthdr->val == cls->knode.sel->keys[i].val &&
- nexthdr->mask == cls->knode.sel->keys[i].mask)
+ nexthdr->val ==
+ (__force u32)cls->knode.sel->keys[i].val &&
+ nexthdr->mask ==
+ (__force u32)cls->knode.sel->keys[i].mask)
found_jump_field = true;
else
continue;
@@ -9223,7 +9221,8 @@
for (i = 0; nexthdr[i].jump; i++) {
if (nexthdr[i].o != cls->knode.sel->offoff ||
nexthdr[i].s != cls->knode.sel->offshift ||
- nexthdr[i].m != cls->knode.sel->offmask)
+ nexthdr[i].m !=
+ (__force u32)cls->knode.sel->offmask)
return err;
jump = kzalloc(sizeof(*jump), GFP_KERNEL);
@@ -9444,6 +9443,22 @@
return features;
}
+static void ixgbe_reset_l2fw_offload(struct ixgbe_adapter *adapter)
+{
+ int rss = min_t(int, ixgbe_max_rss_indices(adapter),
+ num_online_cpus());
+
+ /* go back to full RSS if we're not running SR-IOV */
+ if (!adapter->ring_feature[RING_F_VMDQ].offset)
+ adapter->flags &= ~(IXGBE_FLAG_VMDQ_ENABLED |
+ IXGBE_FLAG_SRIOV_ENABLED);
+
+ adapter->ring_feature[RING_F_RSS].limit = rss;
+ adapter->ring_feature[RING_F_VMDQ].limit = 1;
+
+ ixgbe_setup_tc(adapter->netdev, adapter->hw_tcs);
+}
+
static int ixgbe_set_features(struct net_device *netdev,
netdev_features_t features)
{
@@ -9524,7 +9539,9 @@
}
}
- if (need_reset)
+ if ((changed & NETIF_F_HW_L2FW_DOFFLOAD) && adapter->num_rx_pools > 1)
+ ixgbe_reset_l2fw_offload(adapter);
+ else if (need_reset)
ixgbe_do_reset(netdev);
else if (changed & (NETIF_F_HW_VLAN_CTAG_RX |
NETIF_F_HW_VLAN_CTAG_FILTER))
@@ -9787,71 +9804,98 @@
static void *ixgbe_fwd_add(struct net_device *pdev, struct net_device *vdev)
{
- struct ixgbe_fwd_adapter *fwd_adapter = NULL;
struct ixgbe_adapter *adapter = netdev_priv(pdev);
- int used_pools = adapter->num_vfs + adapter->num_rx_pools;
+ struct ixgbe_fwd_adapter *accel;
int tcs = adapter->hw_tcs ? : 1;
- unsigned int limit;
int pool, err;
- /* Hardware has a limited number of available pools. Each VF, and the
- * PF require a pool. Check to ensure we don't attempt to use more
- * then the available number of pools.
+ /* The hardware supported by ixgbe only filters on the destination MAC
+ * address. In order to avoid issues we only support offloading modes
+ * where the hardware can actually provide the functionality.
*/
- if (used_pools >= IXGBE_MAX_VF_FUNCTIONS)
- return ERR_PTR(-EINVAL);
-
- if (((adapter->flags & IXGBE_FLAG_DCB_ENABLED) &&
- adapter->num_rx_pools >= (MAX_TX_QUEUES / tcs)) ||
- (adapter->num_rx_pools > IXGBE_MAX_MACVLANS))
- return ERR_PTR(-EBUSY);
-
- fwd_adapter = kzalloc(sizeof(*fwd_adapter), GFP_KERNEL);
- if (!fwd_adapter)
- return ERR_PTR(-ENOMEM);
+ if (!macvlan_supports_dest_filter(vdev))
+ return ERR_PTR(-EMEDIUMTYPE);
pool = find_first_zero_bit(adapter->fwd_bitmask, adapter->num_rx_pools);
+ if (pool == adapter->num_rx_pools) {
+ u16 used_pools = adapter->num_vfs + adapter->num_rx_pools;
+ u16 reserved_pools;
+
+ if (((adapter->flags & IXGBE_FLAG_DCB_ENABLED) &&
+ adapter->num_rx_pools >= (MAX_TX_QUEUES / tcs)) ||
+ adapter->num_rx_pools > IXGBE_MAX_MACVLANS)
+ return ERR_PTR(-EBUSY);
+
+ /* Hardware has a limited number of available pools. Each VF,
+ * and the PF require a pool. Check to ensure we don't
+ * attempt to use more then the available number of pools.
+ */
+ if (used_pools >= IXGBE_MAX_VF_FUNCTIONS)
+ return ERR_PTR(-EBUSY);
+
+ /* Enable VMDq flag so device will be set in VM mode */
+ adapter->flags |= IXGBE_FLAG_VMDQ_ENABLED |
+ IXGBE_FLAG_SRIOV_ENABLED;
+
+ /* Try to reserve as many queues per pool as possible,
+ * we start with the configurations that support 4 queues
+ * per pools, followed by 2, and then by just 1 per pool.
+ */
+ if (used_pools < 32 && adapter->num_rx_pools < 16)
+ reserved_pools = min_t(u16,
+ 32 - used_pools,
+ 16 - adapter->num_rx_pools);
+ else if (adapter->num_rx_pools < 32)
+ reserved_pools = min_t(u16,
+ 64 - used_pools,
+ 32 - adapter->num_rx_pools);
+ else
+ reserved_pools = 64 - used_pools;
+
+
+ if (!reserved_pools)
+ return ERR_PTR(-EBUSY);
+
+ adapter->ring_feature[RING_F_VMDQ].limit += reserved_pools;
+
+ /* Force reinit of ring allocation with VMDQ enabled */
+ err = ixgbe_setup_tc(pdev, adapter->hw_tcs);
+ if (err)
+ return ERR_PTR(err);
+
+ if (pool >= adapter->num_rx_pools)
+ return ERR_PTR(-ENOMEM);
+ }
+
+ accel = kzalloc(sizeof(*accel), GFP_KERNEL);
+ if (!accel)
+ return ERR_PTR(-ENOMEM);
+
set_bit(pool, adapter->fwd_bitmask);
- limit = find_last_bit(adapter->fwd_bitmask, adapter->num_rx_pools + 1);
+ accel->pool = pool;
+ accel->netdev = vdev;
- /* Enable VMDq flag so device will be set in VM mode */
- adapter->flags |= IXGBE_FLAG_VMDQ_ENABLED | IXGBE_FLAG_SRIOV_ENABLED;
- adapter->ring_feature[RING_F_VMDQ].limit = limit + 1;
+ if (!netif_running(pdev))
+ return accel;
- fwd_adapter->pool = pool;
- fwd_adapter->real_adapter = adapter;
+ err = ixgbe_fwd_ring_up(adapter, accel);
+ if (err)
+ return ERR_PTR(err);
- /* Force reinit of ring allocation with VMDQ enabled */
- err = ixgbe_setup_tc(pdev, adapter->hw_tcs);
-
- if (!err && netif_running(pdev))
- err = ixgbe_fwd_ring_up(vdev, fwd_adapter);
-
- if (!err)
- return fwd_adapter;
-
- /* unwind counter and free adapter struct */
- netdev_info(pdev,
- "%s: dfwd hardware acceleration failed\n", vdev->name);
- clear_bit(pool, adapter->fwd_bitmask);
- kfree(fwd_adapter);
- return ERR_PTR(err);
+ return accel;
}
static void ixgbe_fwd_del(struct net_device *pdev, void *priv)
{
struct ixgbe_fwd_adapter *accel = priv;
- struct ixgbe_adapter *adapter = accel->real_adapter;
+ struct ixgbe_adapter *adapter = netdev_priv(pdev);
unsigned int rxbase = accel->rx_base_queue;
- unsigned int limit, i;
+ unsigned int i;
/* delete unicast filter associated with offloaded interface */
ixgbe_del_mac_filter(adapter, accel->netdev->dev_addr,
VMDQ_P(accel->pool));
- /* disable ability to receive packets for this pool */
- IXGBE_WRITE_REG(&adapter->hw, IXGBE_VMOLR(accel->pool), 0);
-
/* Allow remaining Rx packets to get flushed out of the
* Rx FIFO before we drop the netdev for the ring.
*/
@@ -9870,25 +9914,6 @@
}
clear_bit(accel->pool, adapter->fwd_bitmask);
- limit = find_last_bit(adapter->fwd_bitmask, adapter->num_rx_pools);
- adapter->ring_feature[RING_F_VMDQ].limit = limit + 1;
-
- /* go back to full RSS if we're done with our VMQs */
- if (adapter->ring_feature[RING_F_VMDQ].limit == 1) {
- int rss = min_t(int, ixgbe_max_rss_indices(adapter),
- num_online_cpus());
-
- adapter->flags &= ~IXGBE_FLAG_VMDQ_ENABLED;
- adapter->flags &= ~IXGBE_FLAG_SRIOV_ENABLED;
- adapter->ring_feature[RING_F_RSS].limit = rss;
- }
-
- ixgbe_setup_tc(pdev, adapter->hw_tcs);
- netdev_dbg(pdev, "pool %i:%i queues %i:%i\n",
- accel->pool, adapter->num_rx_pools,
- accel->rx_base_queue,
- accel->rx_base_queue +
- adapter->num_rx_queues_per_pool);
kfree(accel);
}
@@ -9970,7 +9995,8 @@
}
} else {
for (i = 0; i < adapter->num_rx_queues; i++)
- xchg(&adapter->rx_ring[i]->xdp_prog, adapter->xdp_prog);
+ (void)xchg(&adapter->rx_ring[i]->xdp_prog,
+ adapter->xdp_prog);
}
if (old_prog)
@@ -9996,11 +10022,13 @@
}
}
-static int ixgbe_xdp_xmit(struct net_device *dev, struct xdp_buff *xdp)
+static int ixgbe_xdp_xmit(struct net_device *dev, int n,
+ struct xdp_frame **frames)
{
struct ixgbe_adapter *adapter = netdev_priv(dev);
struct ixgbe_ring *ring;
- int err;
+ int drops = 0;
+ int i;
if (unlikely(test_bit(__IXGBE_DOWN, &adapter->state)))
return -ENETDOWN;
@@ -10012,11 +10040,18 @@
if (unlikely(!ring))
return -ENXIO;
- err = ixgbe_xmit_xdp_ring(adapter, xdp);
- if (err != IXGBE_XDP_TX)
- return -ENOSPC;
+ for (i = 0; i < n; i++) {
+ struct xdp_frame *xdpf = frames[i];
+ int err;
- return 0;
+ err = ixgbe_xmit_xdp_ring(adapter, xdpf);
+ if (err != IXGBE_XDP_TX) {
+ xdp_return_frame_rx_napi(xdpf);
+ drops++;
+ }
+ }
+
+ return n - drops;
}
static void ixgbe_xdp_flush(struct net_device *dev)
@@ -10909,14 +10944,14 @@
rtnl_lock();
netif_device_detach(netdev);
+ if (netif_running(netdev))
+ ixgbe_close_suspend(adapter);
+
if (state == pci_channel_io_perm_failure) {
rtnl_unlock();
return PCI_ERS_RESULT_DISCONNECT;
}
- if (netif_running(netdev))
- ixgbe_close_suspend(adapter);
-
if (!test_and_set_bit(__IXGBE_DISABLED, &adapter->state))
pci_disable_device(pdev);
rtnl_unlock();
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_mbx.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_mbx.c
index a0cb843..5679293 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_mbx.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_mbx.c
@@ -1,30 +1,5 @@
-/*******************************************************************************
-
- Intel 10 Gigabit PCI Express Linux driver
- Copyright(c) 1999 - 2016 Intel Corporation.
-
- This program is free software; you can redistribute it and/or modify it
- under the terms and conditions of the GNU General Public License,
- version 2, as published by the Free Software Foundation.
-
- This program is distributed in the hope it will be useful, but WITHOUT
- ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
- FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
- more details.
-
- You should have received a copy of the GNU General Public License along with
- this program; if not, write to the Free Software Foundation, Inc.,
- 51 Franklin St - Fifth Floor, Boston, MA 02110-1301 USA.
-
- The full GNU General Public License is included in this distribution in
- the file called "COPYING".
-
- Contact Information:
- Linux NICS <linux.nics@intel.com>
- e1000-devel Mailing List <e1000-devel@lists.sourceforge.net>
- Intel Corporation, 5200 N.E. Elam Young Parkway, Hillsboro, OR 97124-6497
-
-*******************************************************************************/
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright(c) 1999 - 2018 Intel Corporation. */
#include <linux/pci.h>
#include <linux/delay.h>
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_mbx.h b/drivers/net/ethernet/intel/ixgbe/ixgbe_mbx.h
index c4628b6..e085b65 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_mbx.h
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_mbx.h
@@ -1,31 +1,5 @@
/* SPDX-License-Identifier: GPL-2.0 */
-/*******************************************************************************
-
- Intel 10 Gigabit PCI Express Linux driver
- Copyright(c) 1999 - 2016 Intel Corporation.
-
- This program is free software; you can redistribute it and/or modify it
- under the terms and conditions of the GNU General Public License,
- version 2, as published by the Free Software Foundation.
-
- This program is distributed in the hope it will be useful, but WITHOUT
- ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
- FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
- more details.
-
- You should have received a copy of the GNU General Public License along with
- this program; if not, write to the Free Software Foundation, Inc.,
- 51 Franklin St - Fifth Floor, Boston, MA 02110-1301 USA.
-
- The full GNU General Public License is included in this distribution in
- the file called "COPYING".
-
- Contact Information:
- Linux NICS <linux.nics@intel.com>
- e1000-devel Mailing List <e1000-devel@lists.sourceforge.net>
- Intel Corporation, 5200 N.E. Elam Young Parkway, Hillsboro, OR 97124-6497
-
-*******************************************************************************/
+/* Copyright(c) 1999 - 2018 Intel Corporation. */
#ifndef _IXGBE_MBX_H_
#define _IXGBE_MBX_H_
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_model.h b/drivers/net/ethernet/intel/ixgbe/ixgbe_model.h
index 7244664..1e6cf22 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_model.h
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_model.h
@@ -1,29 +1,5 @@
/* SPDX-License-Identifier: GPL-2.0 */
-/*******************************************************************************
- *
- * Intel 10 Gigabit PCI Express Linux drive
- * Copyright(c) 2016 Intel Corporation.
- *
- * This program is free software; you can redistribute it and/or modify it
- * under the terms and conditions of the GNU General Public License,
- * version 2, as published by the Free Software Foundation.
- *
- * This program is distributed in the hope it will be useful, but WITHOUT
- * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
- * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
- * more details.
- *
- * You should have received a copy of the GNU General Public License along
- * with this program. If not, see <http://www.gnu.org/licenses/>.
- *
- * The full GNU General Public License is included in this distribution in
- * the file called "COPYING".
- *
- * Contact Information:
- * e1000-devel Mailing List <e1000-devel@lists.sourceforge.net>
- * Intel Corporation, 5200 N.E. Elam Young Parkway, Hillsboro, OR 97124-6497
- *
- ******************************************************************************/
+/* Copyright(c) 1999 - 2018 Intel Corporation. */
#ifndef _IXGBE_MODEL_H_
#define _IXGBE_MODEL_H_
@@ -53,8 +29,8 @@
union ixgbe_atr_input *mask,
u32 val, u32 m)
{
- input->filter.formatted.src_ip[0] = val;
- mask->formatted.src_ip[0] = m;
+ input->filter.formatted.src_ip[0] = (__force __be32)val;
+ mask->formatted.src_ip[0] = (__force __be32)m;
return 0;
}
@@ -62,8 +38,8 @@
union ixgbe_atr_input *mask,
u32 val, u32 m)
{
- input->filter.formatted.dst_ip[0] = val;
- mask->formatted.dst_ip[0] = m;
+ input->filter.formatted.dst_ip[0] = (__force __be32)val;
+ mask->formatted.dst_ip[0] = (__force __be32)m;
return 0;
}
@@ -79,10 +55,10 @@
union ixgbe_atr_input *mask,
u32 val, u32 m)
{
- input->filter.formatted.src_port = val & 0xffff;
- mask->formatted.src_port = m & 0xffff;
- input->filter.formatted.dst_port = val >> 16;
- mask->formatted.dst_port = m >> 16;
+ input->filter.formatted.src_port = (__force __be16)(val & 0xffff);
+ mask->formatted.src_port = (__force __be16)(m & 0xffff);
+ input->filter.formatted.dst_port = (__force __be16)(val >> 16);
+ mask->formatted.dst_port = (__force __be16)(m >> 16);
return 0;
};
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_phy.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_phy.c
index 91bde90..919a7af 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_phy.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_phy.c
@@ -1,30 +1,5 @@
-/*******************************************************************************
-
- Intel 10 Gigabit PCI Express Linux driver
- Copyright(c) 1999 - 2014 Intel Corporation.
-
- This program is free software; you can redistribute it and/or modify it
- under the terms and conditions of the GNU General Public License,
- version 2, as published by the Free Software Foundation.
-
- This program is distributed in the hope it will be useful, but WITHOUT
- ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
- FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
- more details.
-
- You should have received a copy of the GNU General Public License along with
- this program; if not, write to the Free Software Foundation, Inc.,
- 51 Franklin St - Fifth Floor, Boston, MA 02110-1301 USA.
-
- The full GNU General Public License is included in this distribution in
- the file called "COPYING".
-
- Contact Information:
- Linux NICS <linux.nics@intel.com>
- e1000-devel Mailing List <e1000-devel@lists.sourceforge.net>
- Intel Corporation, 5200 N.E. Elam Young Parkway, Hillsboro, OR 97124-6497
-
-*******************************************************************************/
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright(c) 1999 - 2018 Intel Corporation. */
#include <linux/pci.h>
#include <linux/delay.h>
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_phy.h b/drivers/net/ethernet/intel/ixgbe/ixgbe_phy.h
index d6a7e77..64e44e0 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_phy.h
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_phy.h
@@ -1,31 +1,5 @@
/* SPDX-License-Identifier: GPL-2.0 */
-/*******************************************************************************
-
- Intel 10 Gigabit PCI Express Linux driver
- Copyright(c) 1999 - 2016 Intel Corporation.
-
- This program is free software; you can redistribute it and/or modify it
- under the terms and conditions of the GNU General Public License,
- version 2, as published by the Free Software Foundation.
-
- This program is distributed in the hope it will be useful, but WITHOUT
- ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
- FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
- more details.
-
- You should have received a copy of the GNU General Public License along with
- this program; if not, write to the Free Software Foundation, Inc.,
- 51 Franklin St - Fifth Floor, Boston, MA 02110-1301 USA.
-
- The full GNU General Public License is included in this distribution in
- the file called "COPYING".
-
- Contact Information:
- Linux NICS <linux.nics@intel.com>
- e1000-devel Mailing List <e1000-devel@lists.sourceforge.net>
- Intel Corporation, 5200 N.E. Elam Young Parkway, Hillsboro, OR 97124-6497
-
-*******************************************************************************/
+/* Copyright(c) 1999 - 2018 Intel Corporation. */
#ifndef _IXGBE_PHY_H_
#define _IXGBE_PHY_H_
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_ptp.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_ptp.c
index f6cc916..b3e0d8b 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_ptp.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_ptp.c
@@ -1,30 +1,6 @@
-/*******************************************************************************
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright(c) 1999 - 2018 Intel Corporation. */
- Intel 10 Gigabit PCI Express Linux driver
- Copyright(c) 1999 - 2016 Intel Corporation.
-
- This program is free software; you can redistribute it and/or modify it
- under the terms and conditions of the GNU General Public License,
- version 2, as published by the Free Software Foundation.
-
- This program is distributed in the hope it will be useful, but WITHOUT
- ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
- FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
- more details.
-
- You should have received a copy of the GNU General Public License along with
- this program; if not, write to the Free Software Foundation, Inc.,
- 51 Franklin St - Fifth Floor, Boston, MA 02110-1301 USA.
-
- The full GNU General Public License is included in this distribution in
- the file called "COPYING".
-
- Contact Information:
- Linux NICS <linux.nics@intel.com>
- e1000-devel Mailing List <e1000-devel@lists.sourceforge.net>
- Intel Corporation, 5200 N.E. Elam Young Parkway, Hillsboro, OR 97124-6497
-
-*******************************************************************************/
#include "ixgbe.h"
#include <linux/ptp_classify.h>
#include <linux/clocksource.h>
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c
index 008aa07..6f59933 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c
@@ -1,30 +1,5 @@
-/*******************************************************************************
-
- Intel 10 Gigabit PCI Express Linux driver
- Copyright(c) 1999 - 2015 Intel Corporation.
-
- This program is free software; you can redistribute it and/or modify it
- under the terms and conditions of the GNU General Public License,
- version 2, as published by the Free Software Foundation.
-
- This program is distributed in the hope it will be useful, but WITHOUT
- ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
- FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
- more details.
-
- You should have received a copy of the GNU General Public License along with
- this program; if not, write to the Free Software Foundation, Inc.,
- 51 Franklin St - Fifth Floor, Boston, MA 02110-1301 USA.
-
- The full GNU General Public License is included in this distribution in
- the file called "COPYING".
-
- Contact Information:
- Linux NICS <linux.nics@intel.com>
- e1000-devel Mailing List <e1000-devel@lists.sourceforge.net>
- Intel Corporation, 5200 N.E. Elam Young Parkway, Hillsboro, OR 97124-6497
-
-*******************************************************************************/
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright(c) 1999 - 2018 Intel Corporation. */
#include <linux/types.h>
#include <linux/module.h>
@@ -266,7 +241,7 @@
#endif
/* Disable VMDq flag so device will be set in VM mode */
- if (adapter->ring_feature[RING_F_VMDQ].limit == 1) {
+ if (bitmap_weight(adapter->fwd_bitmask, adapter->num_rx_pools) == 1) {
adapter->flags &= ~IXGBE_FLAG_VMDQ_ENABLED;
adapter->flags &= ~IXGBE_FLAG_SRIOV_ENABLED;
rss = min_t(int, ixgbe_max_rss_indices(adapter),
@@ -312,7 +287,8 @@
* other values out of range.
*/
num_tc = adapter->hw_tcs;
- num_rx_pools = adapter->num_rx_pools;
+ num_rx_pools = bitmap_weight(adapter->fwd_bitmask,
+ adapter->num_rx_pools);
limit = (num_tc > 4) ? IXGBE_MAX_VFS_8TC :
(num_tc > 1) ? IXGBE_MAX_VFS_4TC : IXGBE_MAX_VFS_1TC;
@@ -878,14 +854,11 @@
/* reply to reset with ack and vf mac address */
msgbuf[0] = IXGBE_VF_RESET;
- if (!is_zero_ether_addr(vf_mac)) {
+ if (!is_zero_ether_addr(vf_mac) && adapter->vfinfo[vf].pf_set_mac) {
msgbuf[0] |= IXGBE_VT_MSGTYPE_ACK;
memcpy(addr, vf_mac, ETH_ALEN);
} else {
msgbuf[0] |= IXGBE_VT_MSGTYPE_NACK;
- dev_warn(&adapter->pdev->dev,
- "VF %d has no MAC address assigned, you may have to assign one manually\n",
- vf);
}
/*
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.h b/drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.h
index e30d1f0..3ec2192 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.h
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.h
@@ -1,31 +1,5 @@
/* SPDX-License-Identifier: GPL-2.0 */
-/*******************************************************************************
-
- Intel 10 Gigabit PCI Express Linux driver
- Copyright(c) 1999 - 2013 Intel Corporation.
-
- This program is free software; you can redistribute it and/or modify it
- under the terms and conditions of the GNU General Public License,
- version 2, as published by the Free Software Foundation.
-
- This program is distributed in the hope it will be useful, but WITHOUT
- ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
- FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
- more details.
-
- You should have received a copy of the GNU General Public License along with
- this program; if not, write to the Free Software Foundation, Inc.,
- 51 Franklin St - Fifth Floor, Boston, MA 02110-1301 USA.
-
- The full GNU General Public License is included in this distribution in
- the file called "COPYING".
-
- Contact Information:
- Linux NICS <linux.nics@intel.com>
- e1000-devel Mailing List <e1000-devel@lists.sourceforge.net>
- Intel Corporation, 5200 N.E. Elam Young Parkway, Hillsboro, OR 97124-6497
-
-*******************************************************************************/
+/* Copyright(c) 1999 - 2018 Intel Corporation. */
#ifndef _IXGBE_SRIOV_H_
#define _IXGBE_SRIOV_H_
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_sysfs.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_sysfs.c
index 24766e1..2048442 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_sysfs.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_sysfs.c
@@ -1,30 +1,5 @@
-/*******************************************************************************
-
- Intel 10 Gigabit PCI Express Linux driver
- Copyright(c) 1999 - 2013 Intel Corporation.
-
- This program is free software; you can redistribute it and/or modify it
- under the terms and conditions of the GNU General Public License,
- version 2, as published by the Free Software Foundation.
-
- This program is distributed in the hope it will be useful, but WITHOUT
- ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
- FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
- more details.
-
- You should have received a copy of the GNU General Public License along with
- this program; if not, write to the Free Software Foundation, Inc.,
- 51 Franklin St - Fifth Floor, Boston, MA 02110-1301 USA.
-
- The full GNU General Public License is included in this distribution in
- the file called "COPYING".
-
- Contact Information:
- Linux NICS <linux.nics@intel.com>
- e1000-devel Mailing List <e1000-devel@lists.sourceforge.net>
- Intel Corporation, 5200 N.E. Elam Young Parkway, Hillsboro, OR 97124-6497
-
-*******************************************************************************/
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright(c) 1999 - 2018 Intel Corporation. */
#include "ixgbe.h"
#include "ixgbe_common.h"
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_type.h b/drivers/net/ethernet/intel/ixgbe/ixgbe_type.h
index 2daa81e..e8ed377 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_type.h
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_type.h
@@ -1,31 +1,5 @@
/* SPDX-License-Identifier: GPL-2.0 */
-/*******************************************************************************
-
- Intel 10 Gigabit PCI Express Linux driver
- Copyright(c) 1999 - 2016 Intel Corporation.
-
- This program is free software; you can redistribute it and/or modify it
- under the terms and conditions of the GNU General Public License,
- version 2, as published by the Free Software Foundation.
-
- This program is distributed in the hope it will be useful, but WITHOUT
- ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
- FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
- more details.
-
- You should have received a copy of the GNU General Public License along with
- this program; if not, write to the Free Software Foundation, Inc.,
- 51 Franklin St - Fifth Floor, Boston, MA 02110-1301 USA.
-
- The full GNU General Public License is included in this distribution in
- the file called "COPYING".
-
- Contact Information:
- Linux NICS <linux.nics@intel.com>
- e1000-devel Mailing List <e1000-devel@lists.sourceforge.net>
- Intel Corporation, 5200 N.E. Elam Young Parkway, Hillsboro, OR 97124-6497
-
-*******************************************************************************/
+/* Copyright(c) 1999 - 2018 Intel Corporation. */
#ifndef _IXGBE_TYPE_H_
#define _IXGBE_TYPE_H_
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_x540.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_x540.c
index b8c5fd2..de563cf 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_x540.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_x540.c
@@ -1,30 +1,5 @@
-/*******************************************************************************
-
- Intel 10 Gigabit PCI Express Linux driver
- Copyright(c) 1999 - 2016 Intel Corporation.
-
- This program is free software; you can redistribute it and/or modify it
- under the terms and conditions of the GNU General Public License,
- version 2, as published by the Free Software Foundation.
-
- This program is distributed in the hope it will be useful, but WITHOUT
- ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
- FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
- more details.
-
- You should have received a copy of the GNU General Public License along with
- this program; if not, write to the Free Software Foundation, Inc.,
- 51 Franklin St - Fifth Floor, Boston, MA 02110-1301 USA.
-
- The full GNU General Public License is included in this distribution in
- the file called "COPYING".
-
- Contact Information:
- Linux NICS <linux.nics@intel.com>
- e1000-devel Mailing List <e1000-devel@lists.sourceforge.net>
- Intel Corporation, 5200 N.E. Elam Young Parkway, Hillsboro, OR 97124-6497
-
-*******************************************************************************/
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright(c) 1999 - 2018 Intel Corporation. */
#include <linux/pci.h>
#include <linux/delay.h>
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_x540.h b/drivers/net/ethernet/intel/ixgbe/ixgbe_x540.h
index 182d640..e246c0d 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_x540.h
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_x540.h
@@ -1,27 +1,5 @@
/* SPDX-License-Identifier: GPL-2.0 */
-/*******************************************************************************
- *
- * Intel 10 Gigabit PCI Express Linux driver
- * Copyright(c) 1999 - 2014 Intel Corporation.
- *
- * This program is free software; you can redistribute it and/or modify it
- * under the terms and conditions of the GNU General Public License,
- * version 2, as published by the Free Software Foundation.
- *
- * This program is distributed in the hope it will be useful, but WITHOUT
- * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
- * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
- * more details.
- *
- * The full GNU General Public License is included in this distribution in
- * the file called "COPYING".
- *
- * Contact Information:
- * Linux NICS <linux.nics@intel.com>
- * e1000-devel Mailing List <e1000-devel@lists.sourceforge.net>
- * Intel Corporation, 5200 N.E. Elam Young Parkway, Hillsboro, OR 97124-6497
- *
- *****************************************************************************/
+/* Copyright(c) 1999 - 2018 Intel Corporation. */
#include "ixgbe_type.h"
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_x550.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_x550.c
index 9592f3e..a8148c7 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_x550.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_x550.c
@@ -1,26 +1,6 @@
-/*******************************************************************************
- *
- * Intel 10 Gigabit PCI Express Linux driver
- * Copyright(c) 1999 - 2016 Intel Corporation.
- *
- * This program is free software; you can redistribute it and/or modify it
- * under the terms and conditions of the GNU General Public License,
- * version 2, as published by the Free Software Foundation.
- *
- * This program is distributed in the hope it will be useful, but WITHOUT
- * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
- * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
- * more details.
- *
- * The full GNU General Public License is included in this distribution in
- * the file called "COPYING".
- *
- * Contact Information:
- * Linux NICS <linux.nics@intel.com>
- * e1000-devel Mailing List <e1000-devel@lists.sourceforge.net>
- * Intel Corporation, 5200 N.E. Elam Young Parkway, Hillsboro, OR 97124-6497
- *
- ******************************************************************************/
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright(c) 1999 - 2018 Intel Corporation. */
+
#include "ixgbe_x540.h"
#include "ixgbe_type.h"
#include "ixgbe_common.h"
@@ -898,8 +878,9 @@
buffer.hdr.req.checksum = FW_DEFAULT_CHECKSUM;
/* convert offset from words to bytes */
- buffer.address = cpu_to_be32((offset + current_word) * 2);
- buffer.length = cpu_to_be16(words_to_read * 2);
+ buffer.address = (__force u32)cpu_to_be32((offset +
+ current_word) * 2);
+ buffer.length = (__force u16)cpu_to_be16(words_to_read * 2);
buffer.pad2 = 0;
buffer.pad3 = 0;
@@ -1109,9 +1090,9 @@
buffer.hdr.req.checksum = FW_DEFAULT_CHECKSUM;
/* convert offset from words to bytes */
- buffer.address = cpu_to_be32(offset * 2);
+ buffer.address = (__force u32)cpu_to_be32(offset * 2);
/* one word */
- buffer.length = cpu_to_be16(sizeof(u16));
+ buffer.length = (__force u16)cpu_to_be16(sizeof(u16));
status = hw->mac.ops.acquire_swfw_sync(hw, mask);
if (status)
diff --git a/drivers/net/ethernet/intel/ixgbevf/Makefile b/drivers/net/ethernet/intel/ixgbevf/Makefile
index bb47814..aba1e6a3 100644
--- a/drivers/net/ethernet/intel/ixgbevf/Makefile
+++ b/drivers/net/ethernet/intel/ixgbevf/Makefile
@@ -1,31 +1,5 @@
# SPDX-License-Identifier: GPL-2.0
-################################################################################
-#
-# Intel 82599 Virtual Function driver
-# Copyright(c) 1999 - 2012 Intel Corporation.
-#
-# This program is free software; you can redistribute it and/or modify it
-# under the terms and conditions of the GNU General Public License,
-# version 2, as published by the Free Software Foundation.
-#
-# This program is distributed in the hope it will be useful, but WITHOUT
-# ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
-# FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
-# more details.
-#
-# You should have received a copy of the GNU General Public License along with
-# this program; if not, write to the Free Software Foundation, Inc.,
-# 51 Franklin St - Fifth Floor, Boston, MA 02110-1301 USA.
-#
-# The full GNU General Public License is included in this distribution in
-# the file called "COPYING".
-#
-# Contact Information:
-# e1000-devel Mailing List <e1000-devel@lists.sourceforge.net>
-# Intel Corporation, 5200 N.E. Elam Young Parkway, Hillsboro, OR 97124-6497
-#
-################################################################################
-
+# Copyright(c) 1999 - 2018 Intel Corporation.
#
# Makefile for the Intel(R) 82599 VF ethernet driver
#
diff --git a/drivers/net/ethernet/intel/ixgbevf/defines.h b/drivers/net/ethernet/intel/ixgbevf/defines.h
index 71c8288..700d8eb 100644
--- a/drivers/net/ethernet/intel/ixgbevf/defines.h
+++ b/drivers/net/ethernet/intel/ixgbevf/defines.h
@@ -1,29 +1,5 @@
/* SPDX-License-Identifier: GPL-2.0 */
-/*******************************************************************************
-
- Intel 82599 Virtual Function driver
- Copyright(c) 1999 - 2015 Intel Corporation.
-
- This program is free software; you can redistribute it and/or modify it
- under the terms and conditions of the GNU General Public License,
- version 2, as published by the Free Software Foundation.
-
- This program is distributed in the hope it will be useful, but WITHOUT
- ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
- FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
- more details.
-
- You should have received a copy of the GNU General Public License along with
- this program; if not, see <http://www.gnu.org/licenses/>.
-
- The full GNU General Public License is included in this distribution in
- the file called "COPYING".
-
- Contact Information:
- e1000-devel Mailing List <e1000-devel@lists.sourceforge.net>
- Intel Corporation, 5200 N.E. Elam Young Parkway, Hillsboro, OR 97124-6497
-
-*******************************************************************************/
+/* Copyright(c) 1999 - 2018 Intel Corporation. */
#ifndef _IXGBEVF_DEFINES_H_
#define _IXGBEVF_DEFINES_H_
diff --git a/drivers/net/ethernet/intel/ixgbevf/ethtool.c b/drivers/net/ethernet/intel/ixgbevf/ethtool.c
index 8e7d6c6..e7813d7 100644
--- a/drivers/net/ethernet/intel/ixgbevf/ethtool.c
+++ b/drivers/net/ethernet/intel/ixgbevf/ethtool.c
@@ -1,28 +1,5 @@
-/*******************************************************************************
-
- Intel 82599 Virtual Function driver
- Copyright(c) 1999 - 2018 Intel Corporation.
-
- This program is free software; you can redistribute it and/or modify it
- under the terms and conditions of the GNU General Public License,
- version 2, as published by the Free Software Foundation.
-
- This program is distributed in the hope it will be useful, but WITHOUT
- ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
- FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
- more details.
-
- You should have received a copy of the GNU General Public License along with
- this program; if not, see <http://www.gnu.org/licenses/>.
-
- The full GNU General Public License is included in this distribution in
- the file called "COPYING".
-
- Contact Information:
- e1000-devel Mailing List <e1000-devel@lists.sourceforge.net>
- Intel Corporation, 5200 N.E. Elam Young Parkway, Hillsboro, OR 97124-6497
-
-*******************************************************************************/
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright(c) 1999 - 2018 Intel Corporation. */
/* ethtool support for ixgbevf */
diff --git a/drivers/net/ethernet/intel/ixgbevf/ixgbevf.h b/drivers/net/ethernet/intel/ixgbevf/ixgbevf.h
index 447ce1d..70c7568 100644
--- a/drivers/net/ethernet/intel/ixgbevf/ixgbevf.h
+++ b/drivers/net/ethernet/intel/ixgbevf/ixgbevf.h
@@ -1,29 +1,5 @@
/* SPDX-License-Identifier: GPL-2.0 */
-/*******************************************************************************
-
- Intel 82599 Virtual Function driver
- Copyright(c) 1999 - 2018 Intel Corporation.
-
- This program is free software; you can redistribute it and/or modify it
- under the terms and conditions of the GNU General Public License,
- version 2, as published by the Free Software Foundation.
-
- This program is distributed in the hope it will be useful, but WITHOUT
- ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
- FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
- more details.
-
- You should have received a copy of the GNU General Public License along with
- this program; if not, see <http://www.gnu.org/licenses/>.
-
- The full GNU General Public License is included in this distribution in
- the file called "COPYING".
-
- Contact Information:
- e1000-devel Mailing List <e1000-devel@lists.sourceforge.net>
- Intel Corporation, 5200 N.E. Elam Young Parkway, Hillsboro, OR 97124-6497
-
-*******************************************************************************/
+/* Copyright(c) 1999 - 2018 Intel Corporation. */
#ifndef _IXGBEVF_H_
#define _IXGBEVF_H_
diff --git a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
index 850f8af..0830411 100644
--- a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
+++ b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
@@ -1,28 +1,5 @@
-/*******************************************************************************
-
- Intel 82599 Virtual Function driver
- Copyright(c) 1999 - 2018 Intel Corporation.
-
- This program is free software; you can redistribute it and/or modify it
- under the terms and conditions of the GNU General Public License,
- version 2, as published by the Free Software Foundation.
-
- This program is distributed in the hope it will be useful, but WITHOUT
- ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
- FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
- more details.
-
- You should have received a copy of the GNU General Public License along with
- this program; if not, see <http://www.gnu.org/licenses/>.
-
- The full GNU General Public License is included in this distribution in
- the file called "COPYING".
-
- Contact Information:
- e1000-devel Mailing List <e1000-devel@lists.sourceforge.net>
- Intel Corporation, 5200 N.E. Elam Young Parkway, Hillsboro, OR 97124-6497
-
-*******************************************************************************/
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright(c) 1999 - 2018 Intel Corporation. */
/******************************************************************************
Copyright (c)2006 - 2007 Myricom, Inc. for some LRO specific code
@@ -4187,6 +4164,7 @@
return -EPERM;
ether_addr_copy(hw->mac.addr, addr->sa_data);
+ ether_addr_copy(hw->mac.perm_addr, addr->sa_data);
ether_addr_copy(netdev->dev_addr, addr->sa_data);
return 0;
@@ -4770,14 +4748,14 @@
rtnl_lock();
netif_device_detach(netdev);
+ if (netif_running(netdev))
+ ixgbevf_close_suspend(adapter);
+
if (state == pci_channel_io_perm_failure) {
rtnl_unlock();
return PCI_ERS_RESULT_DISCONNECT;
}
- if (netif_running(netdev))
- ixgbevf_close_suspend(adapter);
-
if (!test_and_set_bit(__IXGBEVF_DISABLED, &adapter->state))
pci_disable_device(pdev);
rtnl_unlock();
diff --git a/drivers/net/ethernet/intel/ixgbevf/mbx.c b/drivers/net/ethernet/intel/ixgbevf/mbx.c
index 2819abc..6bc1953 100644
--- a/drivers/net/ethernet/intel/ixgbevf/mbx.c
+++ b/drivers/net/ethernet/intel/ixgbevf/mbx.c
@@ -1,28 +1,5 @@
-/*******************************************************************************
-
- Intel 82599 Virtual Function driver
- Copyright(c) 1999 - 2015 Intel Corporation.
-
- This program is free software; you can redistribute it and/or modify it
- under the terms and conditions of the GNU General Public License,
- version 2, as published by the Free Software Foundation.
-
- This program is distributed in the hope it will be useful, but WITHOUT
- ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
- FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
- more details.
-
- You should have received a copy of the GNU General Public License along with
- this program; if not, see <http://www.gnu.org/licenses/>.
-
- The full GNU General Public License is included in this distribution in
- the file called "COPYING".
-
- Contact Information:
- e1000-devel Mailing List <e1000-devel@lists.sourceforge.net>
- Intel Corporation, 5200 N.E. Elam Young Parkway, Hillsboro, OR 97124-6497
-
-*******************************************************************************/
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright(c) 1999 - 2018 Intel Corporation. */
#include "mbx.h"
#include "ixgbevf.h"
diff --git a/drivers/net/ethernet/intel/ixgbevf/mbx.h b/drivers/net/ethernet/intel/ixgbevf/mbx.h
index 5ec947f..bfd9ae1 100644
--- a/drivers/net/ethernet/intel/ixgbevf/mbx.h
+++ b/drivers/net/ethernet/intel/ixgbevf/mbx.h
@@ -1,29 +1,5 @@
/* SPDX-License-Identifier: GPL-2.0 */
-/*******************************************************************************
-
- Intel 82599 Virtual Function driver
- Copyright(c) 1999 - 2015 Intel Corporation.
-
- This program is free software; you can redistribute it and/or modify it
- under the terms and conditions of the GNU General Public License,
- version 2, as published by the Free Software Foundation.
-
- This program is distributed in the hope it will be useful, but WITHOUT
- ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
- FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
- more details.
-
- You should have received a copy of the GNU General Public License along with
- this program; if not, see <http://www.gnu.org/licenses/>.
-
- The full GNU General Public License is included in this distribution in
- the file called "COPYING".
-
- Contact Information:
- e1000-devel Mailing List <e1000-devel@lists.sourceforge.net>
- Intel Corporation, 5200 N.E. Elam Young Parkway, Hillsboro, OR 97124-6497
-
-*******************************************************************************/
+/* Copyright(c) 1999 - 2018 Intel Corporation. */
#ifndef _IXGBE_MBX_H_
#define _IXGBE_MBX_H_
diff --git a/drivers/net/ethernet/intel/ixgbevf/regs.h b/drivers/net/ethernet/intel/ixgbevf/regs.h
index 278f739..68d16ae 100644
--- a/drivers/net/ethernet/intel/ixgbevf/regs.h
+++ b/drivers/net/ethernet/intel/ixgbevf/regs.h
@@ -1,29 +1,5 @@
/* SPDX-License-Identifier: GPL-2.0 */
-/*******************************************************************************
-
- Intel 82599 Virtual Function driver
- Copyright(c) 1999 - 2015 Intel Corporation.
-
- This program is free software; you can redistribute it and/or modify it
- under the terms and conditions of the GNU General Public License,
- version 2, as published by the Free Software Foundation.
-
- This program is distributed in the hope it will be useful, but WITHOUT
- ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
- FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
- more details.
-
- You should have received a copy of the GNU General Public License along with
- this program; if not, see <http://www.gnu.org/licenses/>.
-
- The full GNU General Public License is included in this distribution in
- the file called "COPYING".
-
- Contact Information:
- e1000-devel Mailing List <e1000-devel@lists.sourceforge.net>
- Intel Corporation, 5200 N.E. Elam Young Parkway, Hillsboro, OR 97124-6497
-
-*******************************************************************************/
+/* Copyright(c) 1999 - 2018 Intel Corporation. */
#ifndef _IXGBEVF_REGS_H_
#define _IXGBEVF_REGS_H_
diff --git a/drivers/net/ethernet/intel/ixgbevf/vf.c b/drivers/net/ethernet/intel/ixgbevf/vf.c
index 38d3a32..bf0577e 100644
--- a/drivers/net/ethernet/intel/ixgbevf/vf.c
+++ b/drivers/net/ethernet/intel/ixgbevf/vf.c
@@ -1,28 +1,5 @@
-/*******************************************************************************
-
- Intel 82599 Virtual Function driver
- Copyright(c) 1999 - 2015 Intel Corporation.
-
- This program is free software; you can redistribute it and/or modify it
- under the terms and conditions of the GNU General Public License,
- version 2, as published by the Free Software Foundation.
-
- This program is distributed in the hope it will be useful, but WITHOUT
- ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
- FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
- more details.
-
- You should have received a copy of the GNU General Public License along with
- this program; if not, see <http://www.gnu.org/licenses/>.
-
- The full GNU General Public License is included in this distribution in
- the file called "COPYING".
-
- Contact Information:
- e1000-devel Mailing List <e1000-devel@lists.sourceforge.net>
- Intel Corporation, 5200 N.E. Elam Young Parkway, Hillsboro, OR 97124-6497
-
-*******************************************************************************/
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright(c) 1999 - 2018 Intel Corporation. */
#include "vf.h"
#include "ixgbevf.h"
diff --git a/drivers/net/ethernet/intel/ixgbevf/vf.h b/drivers/net/ethernet/intel/ixgbevf/vf.h
index 194fbdaa..d1e9e30 100644
--- a/drivers/net/ethernet/intel/ixgbevf/vf.h
+++ b/drivers/net/ethernet/intel/ixgbevf/vf.h
@@ -1,29 +1,5 @@
/* SPDX-License-Identifier: GPL-2.0 */
-/*******************************************************************************
-
- Intel 82599 Virtual Function driver
- Copyright(c) 1999 - 2015 Intel Corporation.
-
- This program is free software; you can redistribute it and/or modify it
- under the terms and conditions of the GNU General Public License,
- version 2, as published by the Free Software Foundation.
-
- This program is distributed in the hope it will be useful, but WITHOUT
- ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
- FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
- more details.
-
- You should have received a copy of the GNU General Public License along with
- this program; if not, see <http://www.gnu.org/licenses/>.
-
- The full GNU General Public License is included in this distribution in
- the file called "COPYING".
-
- Contact Information:
- e1000-devel Mailing List <e1000-devel@lists.sourceforge.net>
- Intel Corporation, 5200 N.E. Elam Young Parkway, Hillsboro, OR 97124-6497
-
-*******************************************************************************/
+/* Copyright(c) 1999 - 2018 Intel Corporation. */
#ifndef __IXGBE_VF_H__
#define __IXGBE_VF_H__
diff --git a/drivers/net/ethernet/marvell/Kconfig b/drivers/net/ethernet/marvell/Kconfig
index ebe5c91..cc2f770 100644
--- a/drivers/net/ethernet/marvell/Kconfig
+++ b/drivers/net/ethernet/marvell/Kconfig
@@ -86,6 +86,7 @@
depends on ARCH_MVEBU || COMPILE_TEST
depends on HAS_DMA
select MVMDIO
+ select PHYLINK
---help---
This driver supports the network interface units in the
Marvell ARMADA 375, 7K and 8K SoCs.
diff --git a/drivers/net/ethernet/marvell/mvmdio.c b/drivers/net/ethernet/marvell/mvmdio.c
index 0495487..c5dac6b 100644
--- a/drivers/net/ethernet/marvell/mvmdio.c
+++ b/drivers/net/ethernet/marvell/mvmdio.c
@@ -348,10 +348,7 @@
goto out_mdio;
}
- if (pdev->dev.of_node)
- ret = of_mdiobus_register(bus, pdev->dev.of_node);
- else
- ret = mdiobus_register(bus);
+ ret = of_mdiobus_register(bus, pdev->dev.of_node);
if (ret < 0) {
dev_err(&pdev->dev, "Cannot register MDIO bus (%d)\n", ret);
goto out_mdio;
diff --git a/drivers/net/ethernet/marvell/mvpp2.c b/drivers/net/ethernet/marvell/mvpp2.c
index 6f41023..6847cd4 100644
--- a/drivers/net/ethernet/marvell/mvpp2.c
+++ b/drivers/net/ethernet/marvell/mvpp2.c
@@ -29,6 +29,7 @@
#include <linux/of_address.h>
#include <linux/of_device.h>
#include <linux/phy.h>
+#include <linux/phylink.h>
#include <linux/phy/phy.h>
#include <linux/clk.h>
#include <linux/hrtimer.h>
@@ -359,15 +360,23 @@
#define MVPP2_GMAC_FORCE_LINK_PASS BIT(1)
#define MVPP2_GMAC_IN_BAND_AUTONEG BIT(2)
#define MVPP2_GMAC_IN_BAND_AUTONEG_BYPASS BIT(3)
+#define MVPP2_GMAC_IN_BAND_RESTART_AN BIT(4)
#define MVPP2_GMAC_CONFIG_MII_SPEED BIT(5)
#define MVPP2_GMAC_CONFIG_GMII_SPEED BIT(6)
#define MVPP2_GMAC_AN_SPEED_EN BIT(7)
#define MVPP2_GMAC_FC_ADV_EN BIT(9)
+#define MVPP2_GMAC_FC_ADV_ASM_EN BIT(10)
#define MVPP2_GMAC_FLOW_CTRL_AUTONEG BIT(11)
#define MVPP2_GMAC_CONFIG_FULL_DUPLEX BIT(12)
#define MVPP2_GMAC_AN_DUPLEX_EN BIT(13)
#define MVPP2_GMAC_STATUS0 0x10
#define MVPP2_GMAC_STATUS0_LINK_UP BIT(0)
+#define MVPP2_GMAC_STATUS0_GMII_SPEED BIT(1)
+#define MVPP2_GMAC_STATUS0_MII_SPEED BIT(2)
+#define MVPP2_GMAC_STATUS0_FULL_DUPLEX BIT(3)
+#define MVPP2_GMAC_STATUS0_RX_PAUSE BIT(6)
+#define MVPP2_GMAC_STATUS0_TX_PAUSE BIT(7)
+#define MVPP2_GMAC_STATUS0_AN_COMPLETE BIT(11)
#define MVPP2_GMAC_PORT_FIFO_CFG_1_REG 0x1c
#define MVPP2_GMAC_TX_FIFO_MIN_TH_OFFS 6
#define MVPP2_GMAC_TX_FIFO_MIN_TH_ALL_MASK 0x1fc0
@@ -379,6 +388,8 @@
#define MVPP22_GMAC_INT_MASK_LINK_STAT BIT(1)
#define MVPP22_GMAC_CTRL_4_REG 0x90
#define MVPP22_CTRL4_EXT_PIN_GMII_SEL BIT(0)
+#define MVPP22_CTRL4_RX_FC_EN BIT(3)
+#define MVPP22_CTRL4_TX_FC_EN BIT(4)
#define MVPP22_CTRL4_DP_CLK_SEL BIT(5)
#define MVPP22_CTRL4_SYNC_BYPASS_DIS BIT(6)
#define MVPP22_CTRL4_QSGMII_BYPASS_ACTIVE BIT(7)
@@ -392,6 +403,7 @@
#define MVPP22_XLG_CTRL0_PORT_EN BIT(0)
#define MVPP22_XLG_CTRL0_MAC_RESET_DIS BIT(1)
#define MVPP22_XLG_CTRL0_RX_FLOW_CTRL_EN BIT(7)
+#define MVPP22_XLG_CTRL0_TX_FLOW_CTRL_EN BIT(8)
#define MVPP22_XLG_CTRL0_MIB_CNT_DIS BIT(14)
#define MVPP22_XLG_CTRL1_REG 0x104
#define MVPP22_XLG_CTRL1_FRAMESIZELIMIT_OFFS 0
@@ -413,6 +425,7 @@
#define MVPP22_XLG_CTRL4_FWD_FC BIT(5)
#define MVPP22_XLG_CTRL4_FWD_PFC BIT(6)
#define MVPP22_XLG_CTRL4_MACMODSELECT_GMAC BIT(12)
+#define MVPP22_XLG_CTRL4_EN_IDLE_CHECK BIT(14)
/* SMI registers. PPv2.2 only, relative to priv->iface_base. */
#define MVPP22_SMI_MISC_CFG_REG 0x1204
@@ -1017,6 +1030,9 @@
/* Firmware node associated to the port */
struct fwnode_handle *fwnode;
+ /* Is a PHY always connected to the port */
+ bool has_phy;
+
/* Per-port registers' base address */
void __iomem *base;
void __iomem *stats_base;
@@ -1044,12 +1060,11 @@
struct mutex gather_stats_lock;
struct delayed_work stats_work;
+ struct device_node *of_node;
+
phy_interface_t phy_interface;
- struct device_node *phy_node;
+ struct phylink *phylink;
struct phy *comphy;
- unsigned int link;
- unsigned int duplex;
- unsigned int speed;
struct mvpp2_bm_pool *pool_long;
struct mvpp2_bm_pool *pool_short;
@@ -1338,6 +1353,12 @@
(addr) < (txq_pcpu)->tso_headers_dma + \
(txq_pcpu)->size * TSO_HEADER_SIZE)
+/* The prototype is added here to be used in start_dev when using ACPI. This
+ * will be removed once phylink is used for all modes (dt+ACPI).
+ */
+static void mvpp2_mac_config(struct net_device *dev, unsigned int mode,
+ const struct phylink_link_state *state);
+
/* Queue modes */
#define MVPP2_QDIST_SINGLE_MODE 0
#define MVPP2_QDIST_MULTI_MODE 1
@@ -1735,7 +1756,6 @@
int i, ai_idx = MVPP2_PRS_TCAM_AI_BYTE;
for (i = 0; i < MVPP2_PRS_AI_BITS; i++) {
-
if (!(enable & BIT(i)))
continue;
@@ -1819,7 +1839,6 @@
int ai_off = MVPP2_PRS_SRAM_AI_OFFS;
for (i = 0; i < MVPP2_PRS_SRAM_AI_CTRL_BITS; i++) {
-
if (!(mask & BIT(i)))
continue;
@@ -2109,6 +2128,9 @@
mvpp2_prs_sram_ai_update(&pe, 0,
MVPP2_PRS_SRAM_AI_MASK);
+ /* Set result info bits to 'single vlan' */
+ mvpp2_prs_sram_ri_update(&pe, MVPP2_PRS_RI_VLAN_SINGLE,
+ MVPP2_PRS_RI_VLAN_MASK);
/* If packet is tagged continue check vid filtering */
mvpp2_prs_sram_next_lu_set(&pe, MVPP2_PRS_LU_VID);
} else {
@@ -4849,6 +4871,8 @@
mvpp22_gop_init_rgmii(port);
break;
case PHY_INTERFACE_MODE_SGMII:
+ case PHY_INTERFACE_MODE_1000BASEX:
+ case PHY_INTERFACE_MODE_2500BASEX:
mvpp22_gop_init_sgmii(port);
break;
case PHY_INTERFACE_MODE_10GKR:
@@ -4886,7 +4910,9 @@
u32 val;
if (phy_interface_mode_is_rgmii(port->phy_interface) ||
- port->phy_interface == PHY_INTERFACE_MODE_SGMII) {
+ port->phy_interface == PHY_INTERFACE_MODE_SGMII ||
+ port->phy_interface == PHY_INTERFACE_MODE_1000BASEX ||
+ port->phy_interface == PHY_INTERFACE_MODE_2500BASEX) {
/* Enable the GMAC link status irq for this port */
val = readl(port->base + MVPP22_GMAC_INT_SUM_MASK);
val |= MVPP22_GMAC_INT_SUM_MASK_LINK_STAT;
@@ -4911,12 +4937,14 @@
if (port->gop_id == 0) {
val = readl(port->base + MVPP22_XLG_EXT_INT_MASK);
val &= ~(MVPP22_XLG_EXT_INT_MASK_XLG |
- MVPP22_XLG_EXT_INT_MASK_GIG);
+ MVPP22_XLG_EXT_INT_MASK_GIG);
writel(val, port->base + MVPP22_XLG_EXT_INT_MASK);
}
if (phy_interface_mode_is_rgmii(port->phy_interface) ||
- port->phy_interface == PHY_INTERFACE_MODE_SGMII) {
+ port->phy_interface == PHY_INTERFACE_MODE_SGMII ||
+ port->phy_interface == PHY_INTERFACE_MODE_1000BASEX ||
+ port->phy_interface == PHY_INTERFACE_MODE_2500BASEX) {
val = readl(port->base + MVPP22_GMAC_INT_SUM_MASK);
val &= ~MVPP22_GMAC_INT_SUM_MASK_LINK_STAT;
writel(val, port->base + MVPP22_GMAC_INT_SUM_MASK);
@@ -4928,7 +4956,9 @@
u32 val;
if (phy_interface_mode_is_rgmii(port->phy_interface) ||
- port->phy_interface == PHY_INTERFACE_MODE_SGMII) {
+ port->phy_interface == PHY_INTERFACE_MODE_SGMII ||
+ port->phy_interface == PHY_INTERFACE_MODE_1000BASEX ||
+ port->phy_interface == PHY_INTERFACE_MODE_2500BASEX) {
val = readl(port->base + MVPP22_GMAC_INT_MASK);
val |= MVPP22_GMAC_INT_MASK_LINK_STAT;
writel(val, port->base + MVPP22_GMAC_INT_MASK);
@@ -4943,6 +4973,16 @@
mvpp22_gop_unmask_irq(port);
}
+/* Sets the PHY mode of the COMPHY (which configures the serdes lanes).
+ *
+ * The PHY mode used by the PPv2 driver comes from the network subsystem, while
+ * the one given to the COMPHY comes from the generic PHY subsystem. Hence they
+ * differ.
+ *
+ * The COMPHY configures the serdes lanes regardless of the actual use of the
+ * lanes by the physical layer. This is why configurations like
+ * "PPv2 (2500BaseX) - COMPHY (2500SGMII)" are valid.
+ */
static int mvpp22_comphy_init(struct mvpp2_port *port)
{
enum phy_mode mode;
@@ -4953,8 +4993,12 @@
switch (port->phy_interface) {
case PHY_INTERFACE_MODE_SGMII:
+ case PHY_INTERFACE_MODE_1000BASEX:
mode = PHY_MODE_SGMII;
break;
+ case PHY_INTERFACE_MODE_2500BASEX:
+ mode = PHY_MODE_2500SGMII;
+ break;
case PHY_INTERFACE_MODE_10GKR:
mode = PHY_MODE_10GKR;
break;
@@ -4969,133 +5013,6 @@
return phy_power_on(port->comphy);
}
-static void mvpp2_port_mii_gmac_configure_mode(struct mvpp2_port *port)
-{
- u32 val;
-
- if (port->phy_interface == PHY_INTERFACE_MODE_SGMII) {
- val = readl(port->base + MVPP22_GMAC_CTRL_4_REG);
- val |= MVPP22_CTRL4_SYNC_BYPASS_DIS | MVPP22_CTRL4_DP_CLK_SEL |
- MVPP22_CTRL4_QSGMII_BYPASS_ACTIVE;
- val &= ~MVPP22_CTRL4_EXT_PIN_GMII_SEL;
- writel(val, port->base + MVPP22_GMAC_CTRL_4_REG);
- } else if (phy_interface_mode_is_rgmii(port->phy_interface)) {
- val = readl(port->base + MVPP22_GMAC_CTRL_4_REG);
- val |= MVPP22_CTRL4_EXT_PIN_GMII_SEL |
- MVPP22_CTRL4_SYNC_BYPASS_DIS |
- MVPP22_CTRL4_QSGMII_BYPASS_ACTIVE;
- val &= ~MVPP22_CTRL4_DP_CLK_SEL;
- writel(val, port->base + MVPP22_GMAC_CTRL_4_REG);
- }
-
- /* The port is connected to a copper PHY */
- val = readl(port->base + MVPP2_GMAC_CTRL_0_REG);
- val &= ~MVPP2_GMAC_PORT_TYPE_MASK;
- writel(val, port->base + MVPP2_GMAC_CTRL_0_REG);
-
- val = readl(port->base + MVPP2_GMAC_AUTONEG_CONFIG);
- val |= MVPP2_GMAC_IN_BAND_AUTONEG_BYPASS |
- MVPP2_GMAC_AN_SPEED_EN | MVPP2_GMAC_FLOW_CTRL_AUTONEG |
- MVPP2_GMAC_AN_DUPLEX_EN;
- if (port->phy_interface == PHY_INTERFACE_MODE_SGMII)
- val |= MVPP2_GMAC_IN_BAND_AUTONEG;
- writel(val, port->base + MVPP2_GMAC_AUTONEG_CONFIG);
-}
-
-static void mvpp2_port_mii_gmac_configure(struct mvpp2_port *port)
-{
- u32 val;
-
- /* Force link down */
- val = readl(port->base + MVPP2_GMAC_AUTONEG_CONFIG);
- val &= ~MVPP2_GMAC_FORCE_LINK_PASS;
- val |= MVPP2_GMAC_FORCE_LINK_DOWN;
- writel(val, port->base + MVPP2_GMAC_AUTONEG_CONFIG);
-
- /* Set the GMAC in a reset state */
- val = readl(port->base + MVPP2_GMAC_CTRL_2_REG);
- val |= MVPP2_GMAC_PORT_RESET_MASK;
- writel(val, port->base + MVPP2_GMAC_CTRL_2_REG);
-
- /* Configure the PCS and in-band AN */
- val = readl(port->base + MVPP2_GMAC_CTRL_2_REG);
- if (port->phy_interface == PHY_INTERFACE_MODE_SGMII) {
- val |= MVPP2_GMAC_INBAND_AN_MASK | MVPP2_GMAC_PCS_ENABLE_MASK;
- } else if (phy_interface_mode_is_rgmii(port->phy_interface)) {
- val &= ~MVPP2_GMAC_PCS_ENABLE_MASK;
- }
- writel(val, port->base + MVPP2_GMAC_CTRL_2_REG);
-
- mvpp2_port_mii_gmac_configure_mode(port);
-
- /* Unset the GMAC reset state */
- val = readl(port->base + MVPP2_GMAC_CTRL_2_REG);
- val &= ~MVPP2_GMAC_PORT_RESET_MASK;
- writel(val, port->base + MVPP2_GMAC_CTRL_2_REG);
-
- /* Stop forcing link down */
- val = readl(port->base + MVPP2_GMAC_AUTONEG_CONFIG);
- val &= ~MVPP2_GMAC_FORCE_LINK_DOWN;
- writel(val, port->base + MVPP2_GMAC_AUTONEG_CONFIG);
-}
-
-static void mvpp2_port_mii_xlg_configure(struct mvpp2_port *port)
-{
- u32 val;
-
- if (port->gop_id != 0)
- return;
-
- val = readl(port->base + MVPP22_XLG_CTRL0_REG);
- val |= MVPP22_XLG_CTRL0_RX_FLOW_CTRL_EN;
- writel(val, port->base + MVPP22_XLG_CTRL0_REG);
-
- val = readl(port->base + MVPP22_XLG_CTRL4_REG);
- val &= ~MVPP22_XLG_CTRL4_MACMODSELECT_GMAC;
- val |= MVPP22_XLG_CTRL4_FWD_FC | MVPP22_XLG_CTRL4_FWD_PFC;
- writel(val, port->base + MVPP22_XLG_CTRL4_REG);
-}
-
-static void mvpp22_port_mii_set(struct mvpp2_port *port)
-{
- u32 val;
-
- /* Only GOP port 0 has an XLG MAC */
- if (port->gop_id == 0) {
- val = readl(port->base + MVPP22_XLG_CTRL3_REG);
- val &= ~MVPP22_XLG_CTRL3_MACMODESELECT_MASK;
-
- if (port->phy_interface == PHY_INTERFACE_MODE_XAUI ||
- port->phy_interface == PHY_INTERFACE_MODE_10GKR)
- val |= MVPP22_XLG_CTRL3_MACMODESELECT_10G;
- else
- val |= MVPP22_XLG_CTRL3_MACMODESELECT_GMAC;
-
- writel(val, port->base + MVPP22_XLG_CTRL3_REG);
- }
-}
-
-static void mvpp2_port_mii_set(struct mvpp2_port *port)
-{
- if (port->priv->hw_version == MVPP22)
- mvpp22_port_mii_set(port);
-
- if (phy_interface_mode_is_rgmii(port->phy_interface) ||
- port->phy_interface == PHY_INTERFACE_MODE_SGMII)
- mvpp2_port_mii_gmac_configure(port);
- else if (port->phy_interface == PHY_INTERFACE_MODE_10GKR)
- mvpp2_port_mii_xlg_configure(port);
-}
-
-static void mvpp2_port_fc_adv_enable(struct mvpp2_port *port)
-{
- u32 val;
-
- val = readl(port->base + MVPP2_GMAC_AUTONEG_CONFIG);
- val |= MVPP2_GMAC_FC_ADV_EN;
- writel(val, port->base + MVPP2_GMAC_AUTONEG_CONFIG);
-}
-
static void mvpp2_port_enable(struct mvpp2_port *port)
{
u32 val;
@@ -5126,8 +5043,11 @@
(port->phy_interface == PHY_INTERFACE_MODE_XAUI ||
port->phy_interface == PHY_INTERFACE_MODE_10GKR)) {
val = readl(port->base + MVPP22_XLG_CTRL0_REG);
- val &= ~(MVPP22_XLG_CTRL0_PORT_EN |
- MVPP22_XLG_CTRL0_MAC_RESET_DIS);
+ val &= ~MVPP22_XLG_CTRL0_PORT_EN;
+ writel(val, port->base + MVPP22_XLG_CTRL0_REG);
+
+ /* Disable & reset should be done separately */
+ val &= ~MVPP22_XLG_CTRL0_MAC_RESET_DIS;
writel(val, port->base + MVPP22_XLG_CTRL0_REG);
} else {
val = readl(port->base + MVPP2_GMAC_CTRL_0_REG);
@@ -5147,18 +5067,21 @@
}
/* Configure loopback port */
-static void mvpp2_port_loopback_set(struct mvpp2_port *port)
+static void mvpp2_port_loopback_set(struct mvpp2_port *port,
+ const struct phylink_link_state *state)
{
u32 val;
val = readl(port->base + MVPP2_GMAC_CTRL_1_REG);
- if (port->speed == 1000)
+ if (state->speed == 1000)
val |= MVPP2_GMAC_GMII_LB_EN_MASK;
else
val &= ~MVPP2_GMAC_GMII_LB_EN_MASK;
- if (port->phy_interface == PHY_INTERFACE_MODE_SGMII)
+ if (port->phy_interface == PHY_INTERFACE_MODE_SGMII ||
+ port->phy_interface == PHY_INTERFACE_MODE_1000BASEX ||
+ port->phy_interface == PHY_INTERFACE_MODE_2500BASEX)
val |= MVPP2_GMAC_PCS_LB_EN_MASK;
else
val &= ~MVPP2_GMAC_PCS_LB_EN_MASK;
@@ -5331,10 +5254,6 @@
int tx_port_num, val, queue, ptxq, lrxq;
if (port->priv->hw_version == MVPP21) {
- /* Configure port to loopback if needed */
- if (port->flags & MVPP2_F_LOOPBACK)
- mvpp2_port_loopback_set(port);
-
/* Update TX FIFO MIN Threshold */
val = readl(port->base + MVPP2_GMAC_PORT_FIFO_CFG_1_REG);
val &= ~MVPP2_GMAC_TX_FIFO_MIN_TH_ALL_MASK;
@@ -5552,7 +5471,6 @@
MVPP2_AGGR_TXQ_UPDATE_REG, pending);
}
-
/* Check if there are enough free descriptors in aggregated txq.
* If not, update the number of occupied descriptors and repeat the check.
*
@@ -5569,11 +5487,10 @@
MVPP2_AGGR_TXQ_STATUS_REG(cpu));
aggr_txq->count = val & MVPP2_AGGR_TXQ_PENDING_MASK;
+
+ if ((aggr_txq->count + num) > MVPP2_AGGR_TXQ_SIZE)
+ return -ENOMEM;
}
-
- if ((aggr_txq->count + num) > MVPP2_AGGR_TXQ_SIZE)
- return -ENOMEM;
-
return 0;
}
@@ -5633,7 +5550,7 @@
txq_pcpu->reserved_num += mvpp2_txq_alloc_reserved_desc(priv, txq, req);
- /* OK, the descriptor cound has been updated: check again. */
+ /* OK, the descriptor could have been updated: check again. */
if (txq_pcpu->reserved_num < num)
return -ENOMEM;
return 0;
@@ -6115,7 +6032,7 @@
/* Calculate base address in prefetch buffer. We reserve 16 descriptors
* for each existing TXQ.
* TCONTS for PON port must be continuous from 0 to MVPP2_MAX_TCONT
- * GBE ports assumed to be continious from 0 to MVPP2_MAX_PORTS
+ * GBE ports assumed to be continuous from 0 to MVPP2_MAX_PORTS
*/
desc_per_txq = 16;
desc = (port->id * MVPP2_MAX_TXQ * desc_per_txq) +
@@ -6372,7 +6289,9 @@
link = true;
}
} else if (phy_interface_mode_is_rgmii(port->phy_interface) ||
- port->phy_interface == PHY_INTERFACE_MODE_SGMII) {
+ port->phy_interface == PHY_INTERFACE_MODE_SGMII ||
+ port->phy_interface == PHY_INTERFACE_MODE_1000BASEX ||
+ port->phy_interface == PHY_INTERFACE_MODE_2500BASEX) {
val = readl(port->base + MVPP22_GMAC_INT_STAT);
if (val & MVPP22_GMAC_INT_STAT_LINK) {
event = true;
@@ -6382,6 +6301,11 @@
}
}
+ if (port->phylink) {
+ phylink_mac_change(port->phylink, link);
+ goto handled;
+ }
+
if (!netif_running(dev) || !event)
goto handled;
@@ -6406,111 +6330,6 @@
return IRQ_HANDLED;
}
-static void mvpp2_gmac_set_autoneg(struct mvpp2_port *port,
- struct phy_device *phydev)
-{
- u32 val;
-
- if (port->phy_interface != PHY_INTERFACE_MODE_RGMII &&
- port->phy_interface != PHY_INTERFACE_MODE_RGMII_ID &&
- port->phy_interface != PHY_INTERFACE_MODE_RGMII_RXID &&
- port->phy_interface != PHY_INTERFACE_MODE_RGMII_TXID &&
- port->phy_interface != PHY_INTERFACE_MODE_SGMII)
- return;
-
- val = readl(port->base + MVPP2_GMAC_AUTONEG_CONFIG);
- val &= ~(MVPP2_GMAC_CONFIG_MII_SPEED |
- MVPP2_GMAC_CONFIG_GMII_SPEED |
- MVPP2_GMAC_CONFIG_FULL_DUPLEX |
- MVPP2_GMAC_AN_SPEED_EN |
- MVPP2_GMAC_AN_DUPLEX_EN);
-
- if (phydev->duplex)
- val |= MVPP2_GMAC_CONFIG_FULL_DUPLEX;
-
- if (phydev->speed == SPEED_1000)
- val |= MVPP2_GMAC_CONFIG_GMII_SPEED;
- else if (phydev->speed == SPEED_100)
- val |= MVPP2_GMAC_CONFIG_MII_SPEED;
-
- writel(val, port->base + MVPP2_GMAC_AUTONEG_CONFIG);
-}
-
-/* Adjust link */
-static void mvpp2_link_event(struct net_device *dev)
-{
- struct mvpp2_port *port = netdev_priv(dev);
- struct phy_device *phydev = dev->phydev;
- bool link_reconfigured = false;
- u32 val;
-
- if (phydev->link) {
- if (port->phy_interface != phydev->interface && port->comphy) {
- /* disable current port for reconfiguration */
- mvpp2_interrupts_disable(port);
- netif_carrier_off(port->dev);
- mvpp2_port_disable(port);
- phy_power_off(port->comphy);
-
- /* comphy reconfiguration */
- port->phy_interface = phydev->interface;
- mvpp22_comphy_init(port);
-
- /* gop/mac reconfiguration */
- mvpp22_gop_init(port);
- mvpp2_port_mii_set(port);
-
- link_reconfigured = true;
- }
-
- if ((port->speed != phydev->speed) ||
- (port->duplex != phydev->duplex)) {
- mvpp2_gmac_set_autoneg(port, phydev);
-
- port->duplex = phydev->duplex;
- port->speed = phydev->speed;
- }
- }
-
- if (phydev->link != port->link || link_reconfigured) {
- port->link = phydev->link;
-
- if (phydev->link) {
- if (port->phy_interface == PHY_INTERFACE_MODE_RGMII ||
- port->phy_interface == PHY_INTERFACE_MODE_RGMII_ID ||
- port->phy_interface == PHY_INTERFACE_MODE_RGMII_RXID ||
- port->phy_interface == PHY_INTERFACE_MODE_RGMII_TXID ||
- port->phy_interface == PHY_INTERFACE_MODE_SGMII) {
- val = readl(port->base + MVPP2_GMAC_AUTONEG_CONFIG);
- val |= (MVPP2_GMAC_FORCE_LINK_PASS |
- MVPP2_GMAC_FORCE_LINK_DOWN);
- writel(val, port->base + MVPP2_GMAC_AUTONEG_CONFIG);
- }
-
- mvpp2_interrupts_enable(port);
- mvpp2_port_enable(port);
-
- mvpp2_egress_enable(port);
- mvpp2_ingress_enable(port);
- netif_carrier_on(dev);
- netif_tx_wake_all_queues(dev);
- } else {
- port->duplex = -1;
- port->speed = 0;
-
- netif_tx_stop_all_queues(dev);
- netif_carrier_off(dev);
- mvpp2_ingress_disable(port);
- mvpp2_egress_disable(port);
-
- mvpp2_port_disable(port);
- mvpp2_interrupts_disable(port);
- }
-
- phy_print_status(phydev);
- }
-}
-
static void mvpp2_timer_set(struct mvpp2_port_pcpu *port_pcpu)
{
ktime_t interval;
@@ -6562,21 +6381,23 @@
{
u32 status = mvpp2_rxdesc_status_get(port, rx_desc);
size_t sz = mvpp2_rxdesc_size_get(port, rx_desc);
+ char *err_str = NULL;
switch (status & MVPP2_RXD_ERR_CODE_MASK) {
case MVPP2_RXD_ERR_CRC:
- netdev_err(port->dev, "bad rx status %08x (crc error), size=%zu\n",
- status, sz);
+ err_str = "crc";
break;
case MVPP2_RXD_ERR_OVERRUN:
- netdev_err(port->dev, "bad rx status %08x (overrun error), size=%zu\n",
- status, sz);
+ err_str = "overrun";
break;
case MVPP2_RXD_ERR_RESOURCE:
- netdev_err(port->dev, "bad rx status %08x (resource error), size=%zu\n",
- status, sz);
+ err_str = "resource";
break;
}
+ if (err_str && net_ratelimit())
+ netdev_err(port->dev,
+ "bad rx status %08x (%s error), size=%zu\n",
+ status, err_str, sz);
}
/* Handle RX checksum offload */
@@ -6781,8 +6602,7 @@
mvpp2_txdesc_size_set(port, tx_desc, frag->size);
buf_dma_addr = dma_map_single(port->dev->dev.parent, addr,
- frag->size,
- DMA_TO_DEVICE);
+ frag->size, DMA_TO_DEVICE);
if (dma_mapping_error(port->dev->dev.parent, buf_dma_addr)) {
mvpp2_txq_desc_put(txq);
goto cleanup;
@@ -7118,11 +6938,29 @@
return rx_done;
}
-/* Set hw internals when starting port */
-static void mvpp2_start_dev(struct mvpp2_port *port)
+static void mvpp22_mode_reconfigure(struct mvpp2_port *port)
{
- struct net_device *ndev = port->dev;
- int i;
+ u32 ctrl3;
+
+ /* comphy reconfiguration */
+ mvpp22_comphy_init(port);
+
+ /* gop reconfiguration */
+ mvpp22_gop_init(port);
+
+ /* Only GOP port 0 has an XLG MAC */
+ if (port->gop_id == 0) {
+ ctrl3 = readl(port->base + MVPP22_XLG_CTRL3_REG);
+ ctrl3 &= ~MVPP22_XLG_CTRL3_MACMODESELECT_MASK;
+
+ if (port->phy_interface == PHY_INTERFACE_MODE_XAUI ||
+ port->phy_interface == PHY_INTERFACE_MODE_10GKR)
+ ctrl3 |= MVPP22_XLG_CTRL3_MACMODESELECT_10G;
+ else
+ ctrl3 |= MVPP22_XLG_CTRL3_MACMODESELECT_GMAC;
+
+ writel(ctrl3, port->base + MVPP22_XLG_CTRL3_REG);
+ }
if (port->gop_id == 0 &&
(port->phy_interface == PHY_INTERFACE_MODE_XAUI ||
@@ -7130,6 +6968,12 @@
mvpp2_xlg_max_rx_size_set(port);
else
mvpp2_gmac_max_rx_size_set(port);
+}
+
+/* Set hw internals when starting port */
+static void mvpp2_start_dev(struct mvpp2_port *port)
+{
+ int i;
mvpp2_txp_max_tx_size_set(port);
@@ -7139,42 +6983,39 @@
/* Enable interrupts on all CPUs */
mvpp2_interrupts_enable(port);
- if (port->priv->hw_version == MVPP22) {
- mvpp22_comphy_init(port);
- mvpp22_gop_init(port);
+ if (port->priv->hw_version == MVPP22)
+ mvpp22_mode_reconfigure(port);
+
+ if (port->phylink) {
+ phylink_start(port->phylink);
+ } else {
+ /* Phylink isn't used as of now for ACPI, so the MAC has to be
+ * configured manually when the interface is started. This will
+ * be removed as soon as the phylink ACPI support lands in.
+ */
+ struct phylink_link_state state = {
+ .interface = port->phy_interface,
+ .link = 1,
+ };
+ mvpp2_mac_config(port->dev, MLO_AN_INBAND, &state);
}
- mvpp2_port_mii_set(port);
- mvpp2_port_enable(port);
- if (ndev->phydev)
- phy_start(ndev->phydev);
netif_tx_start_all_queues(port->dev);
}
/* Set hw internals when stopping port */
static void mvpp2_stop_dev(struct mvpp2_port *port)
{
- struct net_device *ndev = port->dev;
int i;
- /* Stop new packets from arriving to RXQs */
- mvpp2_ingress_disable(port);
-
- mdelay(10);
-
/* Disable interrupts on all CPUs */
mvpp2_interrupts_disable(port);
for (i = 0; i < port->nqvecs; i++)
napi_disable(&port->qvecs[i].napi);
- netif_carrier_off(port->dev);
- netif_tx_stop_all_queues(port->dev);
-
- mvpp2_egress_disable(port);
- mvpp2_port_disable(port);
- if (ndev->phydev)
- phy_stop(ndev->phydev);
+ if (port->phylink)
+ phylink_stop(port->phylink);
phy_power_off(port->comphy);
}
@@ -7233,40 +7074,6 @@
addr[5] = (mac_addr_l >> MVPP2_GMAC_SA_LOW_OFFS) & 0xFF;
}
-static int mvpp2_phy_connect(struct mvpp2_port *port)
-{
- struct phy_device *phy_dev;
-
- /* No PHY is attached */
- if (!port->phy_node)
- return 0;
-
- phy_dev = of_phy_connect(port->dev, port->phy_node, mvpp2_link_event, 0,
- port->phy_interface);
- if (!phy_dev) {
- netdev_err(port->dev, "cannot connect to phy\n");
- return -ENODEV;
- }
- phy_dev->supported &= PHY_GBIT_FEATURES;
- phy_dev->advertising = phy_dev->supported;
-
- port->link = 0;
- port->duplex = 0;
- port->speed = 0;
-
- return 0;
-}
-
-static void mvpp2_phy_disconnect(struct mvpp2_port *port)
-{
- struct net_device *ndev = port->dev;
-
- if (!ndev->phydev)
- return;
-
- phy_disconnect(ndev->phydev);
-}
-
static int mvpp2_irqs_init(struct mvpp2_port *port)
{
int err, i;
@@ -7350,6 +7157,7 @@
struct mvpp2 *priv = port->priv;
unsigned char mac_bcast[ETH_ALEN] = {
0xff, 0xff, 0xff, 0xff, 0xff, 0xff };
+ bool valid = false;
int err;
err = mvpp2_prs_mac_da_accept(port, mac_bcast, true);
@@ -7392,7 +7200,19 @@
goto err_cleanup_txqs;
}
- if (priv->hw_version == MVPP22 && !port->phy_node && port->link_irq) {
+ /* Phylink isn't supported yet in ACPI mode */
+ if (port->of_node) {
+ err = phylink_of_phy_connect(port->phylink, port->of_node, 0);
+ if (err) {
+ netdev_err(port->dev, "could not attach PHY (%d)\n",
+ err);
+ goto err_free_irq;
+ }
+
+ valid = true;
+ }
+
+ if (priv->hw_version == MVPP22 && port->link_irq && !port->phylink) {
err = request_irq(port->link_irq, mvpp2_link_status_isr, 0,
dev->name, port);
if (err) {
@@ -7402,14 +7222,20 @@
}
mvpp22_gop_setup_irq(port);
+
+ /* In default link is down */
+ netif_carrier_off(port->dev);
+
+ valid = true;
+ } else {
+ port->link_irq = 0;
}
- /* In default link is down */
- netif_carrier_off(port->dev);
-
- err = mvpp2_phy_connect(port);
- if (err < 0)
- goto err_free_link_irq;
+ if (!valid) {
+ netdev_err(port->dev,
+ "invalid configuration: no dt or link IRQ");
+ goto err_free_irq;
+ }
/* Unmask interrupts on all CPUs */
on_each_cpu(mvpp2_interrupts_unmask, port, 1);
@@ -7426,9 +7252,6 @@
return 0;
-err_free_link_irq:
- if (priv->hw_version == MVPP22 && !port->phy_node && port->link_irq)
- free_irq(port->link_irq, port);
err_free_irq:
mvpp2_irqs_deinit(port);
err_cleanup_txqs:
@@ -7442,17 +7265,17 @@
{
struct mvpp2_port *port = netdev_priv(dev);
struct mvpp2_port_pcpu *port_pcpu;
- struct mvpp2 *priv = port->priv;
int cpu;
mvpp2_stop_dev(port);
- mvpp2_phy_disconnect(port);
/* Mask interrupts on all CPUs */
on_each_cpu(mvpp2_interrupts_mask, port, 1);
mvpp2_shared_interrupt_mask_unmask(port, true);
- if (priv->hw_version == MVPP22 && !port->phy_node && port->link_irq)
+ if (port->phylink)
+ phylink_disconnect_phy(port->phylink);
+ if (port->link_irq)
free_irq(port->link_irq, port);
mvpp2_irqs_deinit(port);
@@ -7535,42 +7358,18 @@
static int mvpp2_set_mac_address(struct net_device *dev, void *p)
{
- struct mvpp2_port *port = netdev_priv(dev);
const struct sockaddr *addr = p;
int err;
- if (!is_valid_ether_addr(addr->sa_data)) {
- err = -EADDRNOTAVAIL;
- goto log_error;
- }
-
- if (!netif_running(dev)) {
- err = mvpp2_prs_update_mac_da(dev, addr->sa_data);
- if (!err)
- return 0;
- /* Reconfigure parser to accept the original MAC address */
- err = mvpp2_prs_update_mac_da(dev, dev->dev_addr);
- if (err)
- goto log_error;
- }
-
- mvpp2_stop_dev(port);
+ if (!is_valid_ether_addr(addr->sa_data))
+ return -EADDRNOTAVAIL;
err = mvpp2_prs_update_mac_da(dev, addr->sa_data);
- if (!err)
- goto out_start;
-
- /* Reconfigure parser accept the original MAC address */
- err = mvpp2_prs_update_mac_da(dev, dev->dev_addr);
- if (err)
- goto log_error;
-out_start:
- mvpp2_start_dev(port);
- mvpp2_egress_enable(port);
- mvpp2_ingress_enable(port);
- return 0;
-log_error:
- netdev_err(dev, "failed to change MAC address\n");
+ if (err) {
+ /* Reconfigure parser accept the original MAC address */
+ mvpp2_prs_update_mac_da(dev, dev->dev_addr);
+ netdev_err(dev, "failed to change MAC address\n");
+ }
return err;
}
@@ -7658,16 +7457,12 @@
static int mvpp2_ioctl(struct net_device *dev, struct ifreq *ifr, int cmd)
{
- int ret;
+ struct mvpp2_port *port = netdev_priv(dev);
- if (!dev->phydev)
+ if (!port->phylink)
return -ENOTSUPP;
- ret = phy_mii_ioctl(dev->phydev, ifr, cmd);
- if (!ret)
- mvpp2_link_event(dev);
-
- return ret;
+ return phylink_mii_ioctl(port->phylink, ifr, cmd);
}
static int mvpp2_vlan_rx_add_vid(struct net_device *dev, __be16 proto, u16 vid)
@@ -7714,6 +7509,16 @@
/* Ethtool methods */
+static int mvpp2_ethtool_nway_reset(struct net_device *dev)
+{
+ struct mvpp2_port *port = netdev_priv(dev);
+
+ if (!port->phylink)
+ return -ENOTSUPP;
+
+ return phylink_ethtool_nway_reset(port->phylink);
+}
+
/* Set interrupt coalescing for ethtools */
static int mvpp2_ethtool_set_coalesce(struct net_device *dev,
struct ethtool_coalesce *c)
@@ -7842,6 +7647,50 @@
return err;
}
+static void mvpp2_ethtool_get_pause_param(struct net_device *dev,
+ struct ethtool_pauseparam *pause)
+{
+ struct mvpp2_port *port = netdev_priv(dev);
+
+ if (!port->phylink)
+ return;
+
+ phylink_ethtool_get_pauseparam(port->phylink, pause);
+}
+
+static int mvpp2_ethtool_set_pause_param(struct net_device *dev,
+ struct ethtool_pauseparam *pause)
+{
+ struct mvpp2_port *port = netdev_priv(dev);
+
+ if (!port->phylink)
+ return -ENOTSUPP;
+
+ return phylink_ethtool_set_pauseparam(port->phylink, pause);
+}
+
+static int mvpp2_ethtool_get_link_ksettings(struct net_device *dev,
+ struct ethtool_link_ksettings *cmd)
+{
+ struct mvpp2_port *port = netdev_priv(dev);
+
+ if (!port->phylink)
+ return -ENOTSUPP;
+
+ return phylink_ethtool_ksettings_get(port->phylink, cmd);
+}
+
+static int mvpp2_ethtool_set_link_ksettings(struct net_device *dev,
+ const struct ethtool_link_ksettings *cmd)
+{
+ struct mvpp2_port *port = netdev_priv(dev);
+
+ if (!port->phylink)
+ return -ENOTSUPP;
+
+ return phylink_ethtool_ksettings_set(port->phylink, cmd);
+}
+
/* Device ops */
static const struct net_device_ops mvpp2_netdev_ops = {
@@ -7859,18 +7708,20 @@
};
static const struct ethtool_ops mvpp2_eth_tool_ops = {
- .nway_reset = phy_ethtool_nway_reset,
- .get_link = ethtool_op_get_link,
- .set_coalesce = mvpp2_ethtool_set_coalesce,
- .get_coalesce = mvpp2_ethtool_get_coalesce,
- .get_drvinfo = mvpp2_ethtool_get_drvinfo,
- .get_ringparam = mvpp2_ethtool_get_ringparam,
- .set_ringparam = mvpp2_ethtool_set_ringparam,
- .get_strings = mvpp2_ethtool_get_strings,
- .get_ethtool_stats = mvpp2_ethtool_get_stats,
- .get_sset_count = mvpp2_ethtool_get_sset_count,
- .get_link_ksettings = phy_ethtool_get_link_ksettings,
- .set_link_ksettings = phy_ethtool_set_link_ksettings,
+ .nway_reset = mvpp2_ethtool_nway_reset,
+ .get_link = ethtool_op_get_link,
+ .set_coalesce = mvpp2_ethtool_set_coalesce,
+ .get_coalesce = mvpp2_ethtool_get_coalesce,
+ .get_drvinfo = mvpp2_ethtool_get_drvinfo,
+ .get_ringparam = mvpp2_ethtool_get_ringparam,
+ .set_ringparam = mvpp2_ethtool_set_ringparam,
+ .get_strings = mvpp2_ethtool_get_strings,
+ .get_ethtool_stats = mvpp2_ethtool_get_stats,
+ .get_sset_count = mvpp2_ethtool_get_sset_count,
+ .get_pauseparam = mvpp2_ethtool_get_pause_param,
+ .set_pauseparam = mvpp2_ethtool_set_pause_param,
+ .get_link_ksettings = mvpp2_ethtool_get_link_ksettings,
+ .set_link_ksettings = mvpp2_ethtool_set_link_ksettings,
};
/* Used for PPv2.1, or PPv2.2 with the old Device Tree binding that
@@ -8172,18 +8023,361 @@
eth_hw_addr_random(dev);
}
+static void mvpp2_phylink_validate(struct net_device *dev,
+ unsigned long *supported,
+ struct phylink_link_state *state)
+{
+ __ETHTOOL_DECLARE_LINK_MODE_MASK(mask) = { 0, };
+
+ phylink_set(mask, Autoneg);
+ phylink_set_port_modes(mask);
+ phylink_set(mask, Pause);
+ phylink_set(mask, Asym_Pause);
+
+ switch (state->interface) {
+ case PHY_INTERFACE_MODE_10GKR:
+ phylink_set(mask, 10000baseCR_Full);
+ phylink_set(mask, 10000baseSR_Full);
+ phylink_set(mask, 10000baseLR_Full);
+ phylink_set(mask, 10000baseLRM_Full);
+ phylink_set(mask, 10000baseER_Full);
+ phylink_set(mask, 10000baseKR_Full);
+ /* Fall-through */
+ default:
+ phylink_set(mask, 10baseT_Half);
+ phylink_set(mask, 10baseT_Full);
+ phylink_set(mask, 100baseT_Half);
+ phylink_set(mask, 100baseT_Full);
+ phylink_set(mask, 10000baseT_Full);
+ /* Fall-through */
+ case PHY_INTERFACE_MODE_1000BASEX:
+ case PHY_INTERFACE_MODE_2500BASEX:
+ phylink_set(mask, 1000baseT_Full);
+ phylink_set(mask, 1000baseX_Full);
+ phylink_set(mask, 2500baseX_Full);
+ }
+
+ bitmap_and(supported, supported, mask, __ETHTOOL_LINK_MODE_MASK_NBITS);
+ bitmap_and(state->advertising, state->advertising, mask,
+ __ETHTOOL_LINK_MODE_MASK_NBITS);
+}
+
+static void mvpp22_xlg_link_state(struct mvpp2_port *port,
+ struct phylink_link_state *state)
+{
+ u32 val;
+
+ state->speed = SPEED_10000;
+ state->duplex = 1;
+ state->an_complete = 1;
+
+ val = readl(port->base + MVPP22_XLG_STATUS);
+ state->link = !!(val & MVPP22_XLG_STATUS_LINK_UP);
+
+ state->pause = 0;
+ val = readl(port->base + MVPP22_XLG_CTRL0_REG);
+ if (val & MVPP22_XLG_CTRL0_TX_FLOW_CTRL_EN)
+ state->pause |= MLO_PAUSE_TX;
+ if (val & MVPP22_XLG_CTRL0_RX_FLOW_CTRL_EN)
+ state->pause |= MLO_PAUSE_RX;
+}
+
+static void mvpp2_gmac_link_state(struct mvpp2_port *port,
+ struct phylink_link_state *state)
+{
+ u32 val;
+
+ val = readl(port->base + MVPP2_GMAC_STATUS0);
+
+ state->an_complete = !!(val & MVPP2_GMAC_STATUS0_AN_COMPLETE);
+ state->link = !!(val & MVPP2_GMAC_STATUS0_LINK_UP);
+ state->duplex = !!(val & MVPP2_GMAC_STATUS0_FULL_DUPLEX);
+
+ switch (port->phy_interface) {
+ case PHY_INTERFACE_MODE_1000BASEX:
+ state->speed = SPEED_1000;
+ break;
+ case PHY_INTERFACE_MODE_2500BASEX:
+ state->speed = SPEED_2500;
+ break;
+ default:
+ if (val & MVPP2_GMAC_STATUS0_GMII_SPEED)
+ state->speed = SPEED_1000;
+ else if (val & MVPP2_GMAC_STATUS0_MII_SPEED)
+ state->speed = SPEED_100;
+ else
+ state->speed = SPEED_10;
+ }
+
+ state->pause = 0;
+ if (val & MVPP2_GMAC_STATUS0_RX_PAUSE)
+ state->pause |= MLO_PAUSE_RX;
+ if (val & MVPP2_GMAC_STATUS0_TX_PAUSE)
+ state->pause |= MLO_PAUSE_TX;
+}
+
+static int mvpp2_phylink_mac_link_state(struct net_device *dev,
+ struct phylink_link_state *state)
+{
+ struct mvpp2_port *port = netdev_priv(dev);
+
+ if (port->priv->hw_version == MVPP22 && port->gop_id == 0) {
+ u32 mode = readl(port->base + MVPP22_XLG_CTRL3_REG);
+ mode &= MVPP22_XLG_CTRL3_MACMODESELECT_MASK;
+
+ if (mode == MVPP22_XLG_CTRL3_MACMODESELECT_10G) {
+ mvpp22_xlg_link_state(port, state);
+ return 1;
+ }
+ }
+
+ mvpp2_gmac_link_state(port, state);
+ return 1;
+}
+
+static void mvpp2_mac_an_restart(struct net_device *dev)
+{
+ struct mvpp2_port *port = netdev_priv(dev);
+ u32 val;
+
+ if (port->phy_interface != PHY_INTERFACE_MODE_SGMII)
+ return;
+
+ val = readl(port->base + MVPP2_GMAC_AUTONEG_CONFIG);
+ /* The RESTART_AN bit is cleared by the h/w after restarting the AN
+ * process.
+ */
+ val |= MVPP2_GMAC_IN_BAND_RESTART_AN | MVPP2_GMAC_IN_BAND_AUTONEG;
+ writel(val, port->base + MVPP2_GMAC_AUTONEG_CONFIG);
+}
+
+static void mvpp2_xlg_config(struct mvpp2_port *port, unsigned int mode,
+ const struct phylink_link_state *state)
+{
+ u32 ctrl0, ctrl4;
+
+ ctrl0 = readl(port->base + MVPP22_XLG_CTRL0_REG);
+ ctrl4 = readl(port->base + MVPP22_XLG_CTRL4_REG);
+
+ if (state->pause & MLO_PAUSE_TX)
+ ctrl0 |= MVPP22_XLG_CTRL0_TX_FLOW_CTRL_EN;
+ if (state->pause & MLO_PAUSE_RX)
+ ctrl0 |= MVPP22_XLG_CTRL0_RX_FLOW_CTRL_EN;
+
+ ctrl4 &= ~MVPP22_XLG_CTRL4_MACMODSELECT_GMAC;
+ ctrl4 |= MVPP22_XLG_CTRL4_FWD_FC | MVPP22_XLG_CTRL4_FWD_PFC |
+ MVPP22_XLG_CTRL4_EN_IDLE_CHECK;
+
+ writel(ctrl0, port->base + MVPP22_XLG_CTRL0_REG);
+ writel(ctrl4, port->base + MVPP22_XLG_CTRL4_REG);
+}
+
+static void mvpp2_gmac_config(struct mvpp2_port *port, unsigned int mode,
+ const struct phylink_link_state *state)
+{
+ u32 an, ctrl0, ctrl2, ctrl4;
+
+ an = readl(port->base + MVPP2_GMAC_AUTONEG_CONFIG);
+ ctrl0 = readl(port->base + MVPP2_GMAC_CTRL_0_REG);
+ ctrl2 = readl(port->base + MVPP2_GMAC_CTRL_2_REG);
+ ctrl4 = readl(port->base + MVPP22_GMAC_CTRL_4_REG);
+
+ /* Force link down */
+ an &= ~MVPP2_GMAC_FORCE_LINK_PASS;
+ an |= MVPP2_GMAC_FORCE_LINK_DOWN;
+ writel(an, port->base + MVPP2_GMAC_AUTONEG_CONFIG);
+
+ /* Set the GMAC in a reset state */
+ ctrl2 |= MVPP2_GMAC_PORT_RESET_MASK;
+ writel(ctrl2, port->base + MVPP2_GMAC_CTRL_2_REG);
+
+ an &= ~(MVPP2_GMAC_CONFIG_MII_SPEED | MVPP2_GMAC_CONFIG_GMII_SPEED |
+ MVPP2_GMAC_AN_SPEED_EN | MVPP2_GMAC_FC_ADV_EN |
+ MVPP2_GMAC_FC_ADV_ASM_EN | MVPP2_GMAC_FLOW_CTRL_AUTONEG |
+ MVPP2_GMAC_CONFIG_FULL_DUPLEX | MVPP2_GMAC_AN_DUPLEX_EN |
+ MVPP2_GMAC_FORCE_LINK_DOWN);
+ ctrl0 &= ~MVPP2_GMAC_PORT_TYPE_MASK;
+ ctrl2 &= ~(MVPP2_GMAC_PORT_RESET_MASK | MVPP2_GMAC_PCS_ENABLE_MASK);
+
+ if (state->interface == PHY_INTERFACE_MODE_1000BASEX ||
+ state->interface == PHY_INTERFACE_MODE_2500BASEX) {
+ /* 1000BaseX and 2500BaseX ports cannot negotiate speed nor can
+ * they negotiate duplex: they are always operating with a fixed
+ * speed of 1000/2500Mbps in full duplex, so force 1000/2500
+ * speed and full duplex here.
+ */
+ ctrl0 |= MVPP2_GMAC_PORT_TYPE_MASK;
+ an |= MVPP2_GMAC_CONFIG_GMII_SPEED |
+ MVPP2_GMAC_CONFIG_FULL_DUPLEX;
+ } else if (!phy_interface_mode_is_rgmii(state->interface)) {
+ an |= MVPP2_GMAC_AN_SPEED_EN | MVPP2_GMAC_FLOW_CTRL_AUTONEG;
+ }
+
+ if (state->duplex)
+ an |= MVPP2_GMAC_CONFIG_FULL_DUPLEX;
+ if (phylink_test(state->advertising, Pause))
+ an |= MVPP2_GMAC_FC_ADV_EN;
+ if (phylink_test(state->advertising, Asym_Pause))
+ an |= MVPP2_GMAC_FC_ADV_ASM_EN;
+
+ if (state->interface == PHY_INTERFACE_MODE_SGMII ||
+ state->interface == PHY_INTERFACE_MODE_1000BASEX ||
+ state->interface == PHY_INTERFACE_MODE_2500BASEX) {
+ an |= MVPP2_GMAC_IN_BAND_AUTONEG;
+ ctrl2 |= MVPP2_GMAC_INBAND_AN_MASK | MVPP2_GMAC_PCS_ENABLE_MASK;
+
+ ctrl4 &= ~(MVPP22_CTRL4_EXT_PIN_GMII_SEL |
+ MVPP22_CTRL4_RX_FC_EN | MVPP22_CTRL4_TX_FC_EN);
+ ctrl4 |= MVPP22_CTRL4_SYNC_BYPASS_DIS |
+ MVPP22_CTRL4_DP_CLK_SEL |
+ MVPP22_CTRL4_QSGMII_BYPASS_ACTIVE;
+
+ if (state->pause & MLO_PAUSE_TX)
+ ctrl4 |= MVPP22_CTRL4_TX_FC_EN;
+ if (state->pause & MLO_PAUSE_RX)
+ ctrl4 |= MVPP22_CTRL4_RX_FC_EN;
+ } else if (phy_interface_mode_is_rgmii(state->interface)) {
+ an |= MVPP2_GMAC_IN_BAND_AUTONEG_BYPASS;
+
+ if (state->speed == SPEED_1000)
+ an |= MVPP2_GMAC_CONFIG_GMII_SPEED;
+ else if (state->speed == SPEED_100)
+ an |= MVPP2_GMAC_CONFIG_MII_SPEED;
+
+ ctrl4 &= ~MVPP22_CTRL4_DP_CLK_SEL;
+ ctrl4 |= MVPP22_CTRL4_EXT_PIN_GMII_SEL |
+ MVPP22_CTRL4_SYNC_BYPASS_DIS |
+ MVPP22_CTRL4_QSGMII_BYPASS_ACTIVE;
+ }
+
+ writel(ctrl0, port->base + MVPP2_GMAC_CTRL_0_REG);
+ writel(ctrl2, port->base + MVPP2_GMAC_CTRL_2_REG);
+ writel(ctrl4, port->base + MVPP22_GMAC_CTRL_4_REG);
+ writel(an, port->base + MVPP2_GMAC_AUTONEG_CONFIG);
+}
+
+static void mvpp2_mac_config(struct net_device *dev, unsigned int mode,
+ const struct phylink_link_state *state)
+{
+ struct mvpp2_port *port = netdev_priv(dev);
+
+ /* Check for invalid configuration */
+ if (state->interface == PHY_INTERFACE_MODE_10GKR && port->gop_id != 0) {
+ netdev_err(dev, "Invalid mode on %s\n", dev->name);
+ return;
+ }
+
+ netif_tx_stop_all_queues(port->dev);
+ if (!port->has_phy)
+ netif_carrier_off(port->dev);
+
+ /* Make sure the port is disabled when reconfiguring the mode */
+ mvpp2_port_disable(port);
+
+ if (port->priv->hw_version == MVPP22 &&
+ port->phy_interface != state->interface) {
+ port->phy_interface = state->interface;
+
+ /* Reconfigure the serdes lanes */
+ phy_power_off(port->comphy);
+ mvpp22_mode_reconfigure(port);
+ }
+
+ /* mac (re)configuration */
+ if (state->interface == PHY_INTERFACE_MODE_10GKR)
+ mvpp2_xlg_config(port, mode, state);
+ else if (phy_interface_mode_is_rgmii(state->interface) ||
+ state->interface == PHY_INTERFACE_MODE_SGMII ||
+ state->interface == PHY_INTERFACE_MODE_1000BASEX ||
+ state->interface == PHY_INTERFACE_MODE_2500BASEX)
+ mvpp2_gmac_config(port, mode, state);
+
+ if (port->priv->hw_version == MVPP21 && port->flags & MVPP2_F_LOOPBACK)
+ mvpp2_port_loopback_set(port, state);
+
+ /* If the port already was up, make sure it's still in the same state */
+ if (state->link || !port->has_phy) {
+ mvpp2_port_enable(port);
+
+ mvpp2_egress_enable(port);
+ mvpp2_ingress_enable(port);
+ if (!port->has_phy)
+ netif_carrier_on(dev);
+ netif_tx_wake_all_queues(dev);
+ }
+}
+
+static void mvpp2_mac_link_up(struct net_device *dev, unsigned int mode,
+ phy_interface_t interface, struct phy_device *phy)
+{
+ struct mvpp2_port *port = netdev_priv(dev);
+ u32 val;
+
+ if (!phylink_autoneg_inband(mode) &&
+ interface != PHY_INTERFACE_MODE_10GKR) {
+ val = readl(port->base + MVPP2_GMAC_AUTONEG_CONFIG);
+ val &= ~MVPP2_GMAC_FORCE_LINK_DOWN;
+ if (phy_interface_mode_is_rgmii(interface))
+ val |= MVPP2_GMAC_FORCE_LINK_PASS;
+ writel(val, port->base + MVPP2_GMAC_AUTONEG_CONFIG);
+ }
+
+ mvpp2_port_enable(port);
+
+ mvpp2_egress_enable(port);
+ mvpp2_ingress_enable(port);
+ netif_tx_wake_all_queues(dev);
+}
+
+static void mvpp2_mac_link_down(struct net_device *dev, unsigned int mode,
+ phy_interface_t interface)
+{
+ struct mvpp2_port *port = netdev_priv(dev);
+ u32 val;
+
+ if (!phylink_autoneg_inband(mode) &&
+ interface != PHY_INTERFACE_MODE_10GKR) {
+ val = readl(port->base + MVPP2_GMAC_AUTONEG_CONFIG);
+ val &= ~MVPP2_GMAC_FORCE_LINK_PASS;
+ val |= MVPP2_GMAC_FORCE_LINK_DOWN;
+ writel(val, port->base + MVPP2_GMAC_AUTONEG_CONFIG);
+ }
+
+ netif_tx_stop_all_queues(dev);
+ mvpp2_egress_disable(port);
+ mvpp2_ingress_disable(port);
+
+ /* When using link interrupts to notify phylink of a MAC state change,
+ * we do not want the port to be disabled (we want to receive further
+ * interrupts, to be notified when the port will have a link later).
+ */
+ if (!port->has_phy)
+ return;
+
+ mvpp2_port_disable(port);
+}
+
+static const struct phylink_mac_ops mvpp2_phylink_ops = {
+ .validate = mvpp2_phylink_validate,
+ .mac_link_state = mvpp2_phylink_mac_link_state,
+ .mac_an_restart = mvpp2_mac_an_restart,
+ .mac_config = mvpp2_mac_config,
+ .mac_link_up = mvpp2_mac_link_up,
+ .mac_link_down = mvpp2_mac_link_down,
+};
+
/* Ports initialization */
static int mvpp2_port_probe(struct platform_device *pdev,
struct fwnode_handle *port_fwnode,
struct mvpp2 *priv)
{
- struct device_node *phy_node;
struct phy *comphy = NULL;
struct mvpp2_port *port;
struct mvpp2_port_pcpu *port_pcpu;
struct device_node *port_node = to_of_node(port_fwnode);
struct net_device *dev;
struct resource *res;
+ struct phylink *phylink;
char *mac_from = "";
unsigned int ntxqs, nrxqs;
bool has_tx_irqs;
@@ -8212,11 +8406,6 @@
if (!dev)
return -ENOMEM;
- if (port_node)
- phy_node = of_parse_phandle(port_node, "phy", 0);
- else
- phy_node = NULL;
-
phy_mode = fwnode_get_phy_mode(port_fwnode);
if (phy_mode < 0) {
dev_err(&pdev->dev, "incorrect phy mode\n");
@@ -8249,6 +8438,7 @@
port = netdev_priv(dev);
port->dev = dev;
port->fwnode = port_fwnode;
+ port->has_phy = !!of_find_property(port_node, "phy", NULL);
port->ntxqs = ntxqs;
port->nrxqs = nrxqs;
port->priv = priv;
@@ -8279,7 +8469,7 @@
else
port->first_rxq = port->id * priv->max_port_rxqs;
- port->phy_node = phy_node;
+ port->of_node = port_node;
port->phy_interface = phy_mode;
port->comphy = comphy;
@@ -8340,9 +8530,6 @@
mvpp2_port_periodic_xon_disable(port);
- if (priv->hw_version == MVPP21)
- mvpp2_port_fc_adv_enable(port);
-
mvpp2_port_reset(port);
port->pcpu = alloc_percpu(struct mvpp2_port_pcpu);
@@ -8386,10 +8573,23 @@
/* 9704 == 9728 - 20 and rounding to 8 */
dev->max_mtu = MVPP2_BM_JUMBO_PKT_SIZE;
+ /* Phylink isn't used w/ ACPI as of now */
+ if (port_node) {
+ phylink = phylink_create(dev, port_fwnode, phy_mode,
+ &mvpp2_phylink_ops);
+ if (IS_ERR(phylink)) {
+ err = PTR_ERR(phylink);
+ goto err_free_port_pcpu;
+ }
+ port->phylink = phylink;
+ } else {
+ port->phylink = NULL;
+ }
+
err = register_netdev(dev);
if (err < 0) {
dev_err(&pdev->dev, "failed to register netdev\n");
- goto err_free_port_pcpu;
+ goto err_phylink;
}
netdev_info(dev, "Using %s mac address %pM\n", mac_from, dev->dev_addr);
@@ -8397,6 +8597,9 @@
return 0;
+err_phylink:
+ if (port->phylink)
+ phylink_destroy(port->phylink);
err_free_port_pcpu:
free_percpu(port->pcpu);
err_free_txq_pcpu:
@@ -8410,7 +8613,6 @@
err_deinit_qvecs:
mvpp2_queue_vectors_deinit(port);
err_free_netdev:
- of_node_put(phy_node);
free_netdev(dev);
return err;
}
@@ -8421,7 +8623,8 @@
int i;
unregister_netdev(port->dev);
- of_node_put(port->phy_node);
+ if (port->phylink)
+ phylink_destroy(port->phylink);
free_percpu(port->pcpu);
free_percpu(port->stats);
for (i = 0; i < port->ntxqs; i++)
diff --git a/drivers/net/ethernet/mediatek/mtk_eth_soc.c b/drivers/net/ethernet/mediatek/mtk_eth_soc.c
index e0b72bf..d8ebf0a 100644
--- a/drivers/net/ethernet/mediatek/mtk_eth_soc.c
+++ b/drivers/net/ethernet/mediatek/mtk_eth_soc.c
@@ -2503,7 +2503,6 @@
{
struct resource *res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
struct device_node *mac_np;
- const struct of_device_id *match;
struct mtk_eth *eth;
int err;
int i;
@@ -2512,8 +2511,7 @@
if (!eth)
return -ENOMEM;
- match = of_match_device(of_mtk_match, &pdev->dev);
- eth->soc = (struct mtk_soc_data *)match->data;
+ eth->soc = of_device_get_match_data(&pdev->dev);
eth->dev = &pdev->dev;
eth->base = devm_ioremap_resource(&pdev->dev, res);
diff --git a/drivers/net/ethernet/mellanox/mlx4/en_rx.c b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
index 5c613c6..9f54ccb 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
@@ -593,30 +593,25 @@
}
#if IS_ENABLED(CONFIG_IPV6)
-/* In IPv6 packets, besides subtracting the pseudo header checksum,
- * we also compute/add the IP header checksum which
- * is not added by the HW.
+/* In IPv6 packets, hw_checksum lacks 6 bytes from IPv6 header:
+ * 4 first bytes : priority, version, flow_lbl
+ * and 2 additional bytes : nexthdr, hop_limit.
*/
static int get_fixed_ipv6_csum(__wsum hw_checksum, struct sk_buff *skb,
struct ipv6hdr *ipv6h)
{
__u8 nexthdr = ipv6h->nexthdr;
- __wsum csum_pseudo_hdr = 0;
+ __wsum temp;
if (unlikely(nexthdr == IPPROTO_FRAGMENT ||
nexthdr == IPPROTO_HOPOPTS ||
nexthdr == IPPROTO_SCTP))
return -1;
- hw_checksum = csum_add(hw_checksum, (__force __wsum)htons(nexthdr));
- csum_pseudo_hdr = csum_partial(&ipv6h->saddr,
- sizeof(ipv6h->saddr) + sizeof(ipv6h->daddr), 0);
- csum_pseudo_hdr = csum_add(csum_pseudo_hdr, (__force __wsum)ipv6h->payload_len);
- csum_pseudo_hdr = csum_add(csum_pseudo_hdr,
- (__force __wsum)htons(nexthdr));
-
- skb->csum = csum_sub(hw_checksum, csum_pseudo_hdr);
- skb->csum = csum_add(skb->csum, csum_partial(ipv6h, sizeof(struct ipv6hdr), 0));
+ /* priority, version, flow_lbl */
+ temp = csum_add(hw_checksum, *(__wsum *)ipv6h);
+ /* nexthdr and hop_limit */
+ skb->csum = csum_add(temp, (__force __wsum)*(__be16 *)&ipv6h->nexthdr);
return 0;
}
#endif
@@ -775,8 +770,8 @@
act = bpf_prog_run_xdp(xdp_prog, &xdp);
+ length = xdp.data_end - xdp.data;
if (xdp.data != orig_data) {
- length = xdp.data_end - xdp.data;
frags[0].page_offset = xdp.data -
xdp.data_hard_start;
va = xdp.data;
diff --git a/drivers/net/ethernet/mellanox/mlx4/en_tx.c b/drivers/net/ethernet/mellanox/mlx4/en_tx.c
index 6b68537..0227786 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_tx.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_tx.c
@@ -694,7 +694,7 @@
u16 rings_p_up = priv->num_tx_rings_p_up;
if (netdev_get_num_tc(dev))
- return skb_tx_hash(dev, skb);
+ return fallback(dev, skb);
return fallback(dev, skb) % rings_p_up;
}
diff --git a/drivers/net/ethernet/mellanox/mlx4/fw.c b/drivers/net/ethernet/mellanox/mlx4/fw.c
index de6b3d4..46dcbfb 100644
--- a/drivers/net/ethernet/mellanox/mlx4/fw.c
+++ b/drivers/net/ethernet/mellanox/mlx4/fw.c
@@ -165,6 +165,7 @@
[36] = "QinQ VST mode support",
[37] = "sl to vl mapping table change event support",
[38] = "user MAC support",
+ [39] = "Report driver version to FW support",
};
int i;
@@ -1038,6 +1039,8 @@
dev_cap->flags2 |= MLX4_DEV_CAP_FLAG2_ETH_BACKPL_AN_REP;
if (field32 & (1 << 7))
dev_cap->flags2 |= MLX4_DEV_CAP_FLAG2_RECOVERABLE_ERROR_EVENT;
+ if (field32 & (1 << 8))
+ dev_cap->flags2 |= MLX4_DEV_CAP_FLAG2_DRIVER_VERSION_TO_FW;
MLX4_GET(field32, outbox, QUERY_DEV_CAP_DIAG_RPRT_PER_PORT);
if (field32 & (1 << 17))
dev_cap->flags2 |= MLX4_DEV_CAP_FLAG2_DIAG_PER_PORT;
@@ -1860,6 +1863,8 @@
#define INIT_HCA_UC_STEERING_OFFSET (INIT_HCA_MCAST_OFFSET + 0x18)
#define INIT_HCA_LOG_MC_TABLE_SZ_OFFSET (INIT_HCA_MCAST_OFFSET + 0x1b)
#define INIT_HCA_DEVICE_MANAGED_FLOW_STEERING_EN 0x6
+#define INIT_HCA_DRIVER_VERSION_OFFSET 0x140
+#define INIT_HCA_DRIVER_VERSION_SZ 0x40
#define INIT_HCA_FS_PARAM_OFFSET 0x1d0
#define INIT_HCA_FS_BASE_OFFSET (INIT_HCA_FS_PARAM_OFFSET + 0x00)
#define INIT_HCA_FS_LOG_ENTRY_SZ_OFFSET (INIT_HCA_FS_PARAM_OFFSET + 0x12)
@@ -1950,6 +1955,13 @@
if (dev->caps.flags2 & MLX4_DEV_CAP_FLAG2_RECOVERABLE_ERROR_EVENT)
*(inbox + INIT_HCA_RECOVERABLE_ERROR_EVENT_OFFSET / 4) |= cpu_to_be32(1 << 31);
+ if (dev->caps.flags2 & MLX4_DEV_CAP_FLAG2_DRIVER_VERSION_TO_FW) {
+ u8 *dst = (u8 *)(inbox + INIT_HCA_DRIVER_VERSION_OFFSET / 4);
+
+ strncpy(dst, DRV_NAME_FOR_FW, INIT_HCA_DRIVER_VERSION_SZ - 1);
+ mlx4_dbg(dev, "Reporting Driver Version to FW: %s\n", dst);
+ }
+
/* QPC/EEC/CQC/EQC/RDMARC attributes */
MLX4_PUT(inbox, param->qpc_base, INIT_HCA_QPC_BASE_OFFSET);
diff --git a/drivers/net/ethernet/mellanox/mlx4/main.c b/drivers/net/ethernet/mellanox/mlx4/main.c
index 60172a3..0a30d81 100644
--- a/drivers/net/ethernet/mellanox/mlx4/main.c
+++ b/drivers/net/ethernet/mellanox/mlx4/main.c
@@ -73,7 +73,7 @@
static int msi_x = 1;
module_param(msi_x, int, 0444);
-MODULE_PARM_DESC(msi_x, "attempt to use MSI-X if nonzero");
+MODULE_PARM_DESC(msi_x, "0 - don't use MSI-X, 1 - use MSI-X, >1 - limit number of MSI-X irqs to msi_x");
#else /* CONFIG_PCI_MSI */
@@ -2815,6 +2815,9 @@
dev->caps.num_eqs - dev->caps.reserved_eqs,
MAX_MSIX);
+ if (msi_x > 1)
+ nreq = min_t(int, nreq, msi_x);
+
entries = kcalloc(nreq, sizeof(*entries), GFP_KERNEL);
if (!entries)
goto no_msi;
@@ -4127,17 +4130,68 @@
.resume = mlx4_pci_resume,
};
+static int mlx4_suspend(struct pci_dev *pdev, pm_message_t state)
+{
+ struct mlx4_dev_persistent *persist = pci_get_drvdata(pdev);
+ struct mlx4_dev *dev = persist->dev;
+
+ mlx4_err(dev, "suspend was called\n");
+ mutex_lock(&persist->interface_state_mutex);
+ if (persist->interface_state & MLX4_INTERFACE_STATE_UP)
+ mlx4_unload_one(pdev);
+ mutex_unlock(&persist->interface_state_mutex);
+
+ return 0;
+}
+
+static int mlx4_resume(struct pci_dev *pdev)
+{
+ struct mlx4_dev_persistent *persist = pci_get_drvdata(pdev);
+ struct mlx4_dev *dev = persist->dev;
+ struct mlx4_priv *priv = mlx4_priv(dev);
+ int nvfs[MLX4_MAX_PORTS + 1] = {0, 0, 0};
+ int total_vfs;
+ int ret = 0;
+
+ mlx4_err(dev, "resume was called\n");
+ total_vfs = dev->persist->num_vfs;
+ memcpy(nvfs, dev->persist->nvfs, sizeof(dev->persist->nvfs));
+
+ mutex_lock(&persist->interface_state_mutex);
+ if (!(persist->interface_state & MLX4_INTERFACE_STATE_UP)) {
+ ret = mlx4_load_one(pdev, priv->pci_dev_data, total_vfs,
+ nvfs, priv, 1);
+ if (!ret) {
+ ret = restore_current_port_types(dev,
+ dev->persist->curr_port_type,
+ dev->persist->curr_port_poss_type);
+ if (ret)
+ mlx4_err(dev, "resume: could not restore original port types (%d)\n", ret);
+ }
+ }
+ mutex_unlock(&persist->interface_state_mutex);
+
+ return ret;
+}
+
static struct pci_driver mlx4_driver = {
.name = DRV_NAME,
.id_table = mlx4_pci_table,
.probe = mlx4_init_one,
.shutdown = mlx4_shutdown,
.remove = mlx4_remove_one,
+ .suspend = mlx4_suspend,
+ .resume = mlx4_resume,
.err_handler = &mlx4_err_handler,
};
static int __init mlx4_verify_params(void)
{
+ if (msi_x < 0) {
+ pr_warn("mlx4_core: bad msi_x: %d\n", msi_x);
+ return -1;
+ }
+
if ((log_num_mac < 0) || (log_num_mac > 7)) {
pr_warn("mlx4_core: bad num_mac: %d\n", log_num_mac);
return -1;
diff --git a/drivers/net/ethernet/mellanox/mlx4/mlx4.h b/drivers/net/ethernet/mellanox/mlx4/mlx4.h
index c68da19..cb9e923 100644
--- a/drivers/net/ethernet/mellanox/mlx4/mlx4.h
+++ b/drivers/net/ethernet/mellanox/mlx4/mlx4.h
@@ -55,8 +55,8 @@
#include "fw_qos.h"
#define DRV_NAME "mlx4_core"
-#define PFX DRV_NAME ": "
#define DRV_VERSION "4.0-0"
+#define DRV_NAME_FOR_FW "Linux," DRV_NAME "," DRV_VERSION
#define MLX4_FS_UDP_UC_EN (1 << 1)
#define MLX4_FS_TCP_UC_EN (1 << 2)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/Kconfig b/drivers/net/ethernet/mellanox/mlx5/core/Kconfig
index c032319..ee66847 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/Kconfig
+++ b/drivers/net/ethernet/mellanox/mlx5/core/Kconfig
@@ -30,6 +30,7 @@
bool "Mellanox Technologies ConnectX-4 Ethernet support"
depends on NETDEVICES && ETHERNET && INET && PCI && MLX5_CORE
depends on IPV6=y || IPV6=n || MLX5_CORE=m
+ select PAGE_POOL
default n
---help---
Ethernet support in Mellanox Technologies ConnectX-4 NIC.
@@ -85,3 +86,14 @@
Build support for IPsec cryptography-offload accelaration in the NIC.
Note: Support for hardware with this capability needs to be selected
for this option to become available.
+
+config MLX5_EN_TLS
+ bool "TLS cryptography-offload accelaration"
+ depends on MLX5_CORE_EN
+ depends on TLS_DEVICE
+ depends on MLX5_ACCEL
+ default n
+ ---help---
+ Build support for TLS cryptography-offload accelaration in the NIC.
+ Note: Support for hardware with this capability needs to be selected
+ for this option to become available.
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/Makefile b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
index c805769..9efbf19 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/Makefile
+++ b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
@@ -8,24 +8,26 @@
fs_counters.o rl.o lag.o dev.o wq.o lib/gid.o lib/clock.o \
diag/fs_tracepoint.o
-mlx5_core-$(CONFIG_MLX5_ACCEL) += accel/ipsec.o
+mlx5_core-$(CONFIG_MLX5_ACCEL) += accel/ipsec.o accel/tls.o
mlx5_core-$(CONFIG_MLX5_FPGA) += fpga/cmd.o fpga/core.o fpga/conn.o fpga/sdk.o \
- fpga/ipsec.o
+ fpga/ipsec.o fpga/tls.o
mlx5_core-$(CONFIG_MLX5_CORE_EN) += en_main.o en_common.o en_fs.o en_ethtool.o \
en_tx.o en_rx.o en_dim.o en_txrx.o en_stats.o vxlan.o \
- en_arfs.o en_fs_ethtool.o en_selftest.o
+ en_arfs.o en_fs_ethtool.o en_selftest.o en/port.o
mlx5_core-$(CONFIG_MLX5_MPFS) += lib/mpfs.o
mlx5_core-$(CONFIG_MLX5_ESWITCH) += eswitch.o eswitch_offloads.o en_rep.o en_tc.o
-mlx5_core-$(CONFIG_MLX5_CORE_EN_DCB) += en_dcbnl.o
+mlx5_core-$(CONFIG_MLX5_CORE_EN_DCB) += en_dcbnl.o en/port_buffer.o
mlx5_core-$(CONFIG_MLX5_CORE_IPOIB) += ipoib/ipoib.o ipoib/ethtool.o ipoib/ipoib_vlan.o
mlx5_core-$(CONFIG_MLX5_EN_IPSEC) += en_accel/ipsec.o en_accel/ipsec_rxtx.o \
en_accel/ipsec_stats.o
+mlx5_core-$(CONFIG_MLX5_EN_TLS) += en_accel/tls.o en_accel/tls_rxtx.o en_accel/tls_stats.o
+
CFLAGS_tracepoint.o := -I$(src)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/accel/tls.c b/drivers/net/ethernet/mellanox/mlx5/core/accel/tls.c
new file mode 100644
index 0000000..77ac19f
--- /dev/null
+++ b/drivers/net/ethernet/mellanox/mlx5/core/accel/tls.c
@@ -0,0 +1,71 @@
+/*
+ * Copyright (c) 2018 Mellanox Technologies. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses. You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ * - Redistributions of source code must retain the above
+ * copyright notice, this list of conditions and the following
+ * disclaimer.
+ *
+ * - Redistributions in binary form must reproduce the above
+ * copyright notice, this list of conditions and the following
+ * disclaimer in the documentation and/or other materials
+ * provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ *
+ */
+
+#include <linux/mlx5/device.h>
+
+#include "accel/tls.h"
+#include "mlx5_core.h"
+#include "fpga/tls.h"
+
+int mlx5_accel_tls_add_tx_flow(struct mlx5_core_dev *mdev, void *flow,
+ struct tls_crypto_info *crypto_info,
+ u32 start_offload_tcp_sn, u32 *p_swid)
+{
+ return mlx5_fpga_tls_add_tx_flow(mdev, flow, crypto_info,
+ start_offload_tcp_sn, p_swid);
+}
+
+void mlx5_accel_tls_del_tx_flow(struct mlx5_core_dev *mdev, u32 swid)
+{
+ mlx5_fpga_tls_del_tx_flow(mdev, swid, GFP_KERNEL);
+}
+
+bool mlx5_accel_is_tls_device(struct mlx5_core_dev *mdev)
+{
+ return mlx5_fpga_is_tls_device(mdev);
+}
+
+u32 mlx5_accel_tls_device_caps(struct mlx5_core_dev *mdev)
+{
+ return mlx5_fpga_tls_device_caps(mdev);
+}
+
+int mlx5_accel_tls_init(struct mlx5_core_dev *mdev)
+{
+ return mlx5_fpga_tls_init(mdev);
+}
+
+void mlx5_accel_tls_cleanup(struct mlx5_core_dev *mdev)
+{
+ mlx5_fpga_tls_cleanup(mdev);
+}
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/accel/tls.h b/drivers/net/ethernet/mellanox/mlx5/core/accel/tls.h
new file mode 100644
index 0000000..6f9c9f4
--- /dev/null
+++ b/drivers/net/ethernet/mellanox/mlx5/core/accel/tls.h
@@ -0,0 +1,86 @@
+/*
+ * Copyright (c) 2018 Mellanox Technologies. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses. You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ * - Redistributions of source code must retain the above
+ * copyright notice, this list of conditions and the following
+ * disclaimer.
+ *
+ * - Redistributions in binary form must reproduce the above
+ * copyright notice, this list of conditions and the following
+ * disclaimer in the documentation and/or other materials
+ * provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ *
+ */
+
+#ifndef __MLX5_ACCEL_TLS_H__
+#define __MLX5_ACCEL_TLS_H__
+
+#include <linux/mlx5/driver.h>
+#include <linux/tls.h>
+
+#ifdef CONFIG_MLX5_ACCEL
+
+enum {
+ MLX5_ACCEL_TLS_TX = BIT(0),
+ MLX5_ACCEL_TLS_RX = BIT(1),
+ MLX5_ACCEL_TLS_V12 = BIT(2),
+ MLX5_ACCEL_TLS_V13 = BIT(3),
+ MLX5_ACCEL_TLS_LRO = BIT(4),
+ MLX5_ACCEL_TLS_IPV6 = BIT(5),
+ MLX5_ACCEL_TLS_AES_GCM128 = BIT(30),
+ MLX5_ACCEL_TLS_AES_GCM256 = BIT(31),
+};
+
+struct mlx5_ifc_tls_flow_bits {
+ u8 src_port[0x10];
+ u8 dst_port[0x10];
+ union mlx5_ifc_ipv6_layout_ipv4_layout_auto_bits src_ipv4_src_ipv6;
+ union mlx5_ifc_ipv6_layout_ipv4_layout_auto_bits dst_ipv4_dst_ipv6;
+ u8 ipv6[0x1];
+ u8 direction_sx[0x1];
+ u8 reserved_at_2[0x1e];
+};
+
+int mlx5_accel_tls_add_tx_flow(struct mlx5_core_dev *mdev, void *flow,
+ struct tls_crypto_info *crypto_info,
+ u32 start_offload_tcp_sn, u32 *p_swid);
+void mlx5_accel_tls_del_tx_flow(struct mlx5_core_dev *mdev, u32 swid);
+bool mlx5_accel_is_tls_device(struct mlx5_core_dev *mdev);
+u32 mlx5_accel_tls_device_caps(struct mlx5_core_dev *mdev);
+int mlx5_accel_tls_init(struct mlx5_core_dev *mdev);
+void mlx5_accel_tls_cleanup(struct mlx5_core_dev *mdev);
+
+#else
+
+static inline int
+mlx5_accel_tls_add_tx_flow(struct mlx5_core_dev *mdev, void *flow,
+ struct tls_crypto_info *crypto_info,
+ u32 start_offload_tcp_sn, u32 *p_swid) { return 0; }
+static inline void mlx5_accel_tls_del_tx_flow(struct mlx5_core_dev *mdev, u32 swid) { }
+static inline bool mlx5_accel_is_tls_device(struct mlx5_core_dev *mdev) { return false; }
+static inline u32 mlx5_accel_tls_device_caps(struct mlx5_core_dev *mdev) { return 0; }
+static inline int mlx5_accel_tls_init(struct mlx5_core_dev *mdev) { return 0; }
+static inline void mlx5_accel_tls_cleanup(struct mlx5_core_dev *mdev) { }
+
+#endif
+
+#endif /* __MLX5_ACCEL_TLS_H__ */
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/cmd.c b/drivers/net/ethernet/mellanox/mlx5/core/cmd.c
index 21cd170..487388a 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/cmd.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/cmd.c
@@ -135,6 +135,14 @@
return cmd->cmd_buf + (idx << cmd->log_stride);
}
+static int mlx5_calc_cmd_blocks(struct mlx5_cmd_msg *msg)
+{
+ int size = msg->len;
+ int blen = size - min_t(int, sizeof(msg->first.data), size);
+
+ return DIV_ROUND_UP(blen, MLX5_CMD_DATA_BLOCK_SIZE);
+}
+
static u8 xor8_buf(void *buf, size_t offset, int len)
{
u8 *ptr = buf;
@@ -174,10 +182,7 @@
static void calc_chain_sig(struct mlx5_cmd_msg *msg)
{
struct mlx5_cmd_mailbox *next = msg->next;
- int size = msg->len;
- int blen = size - min_t(int, sizeof(msg->first.data), size);
- int n = (blen + MLX5_CMD_DATA_BLOCK_SIZE - 1)
- / MLX5_CMD_DATA_BLOCK_SIZE;
+ int n = mlx5_calc_cmd_blocks(msg);
int i = 0;
for (i = 0; i < n && next; i++) {
@@ -220,12 +225,9 @@
static int verify_signature(struct mlx5_cmd_work_ent *ent)
{
struct mlx5_cmd_mailbox *next = ent->out->next;
+ int n = mlx5_calc_cmd_blocks(ent->out);
int err;
u8 sig;
- int size = ent->out->len;
- int blen = size - min_t(int, sizeof(ent->out->first.data), size);
- int n = (blen + MLX5_CMD_DATA_BLOCK_SIZE - 1)
- / MLX5_CMD_DATA_BLOCK_SIZE;
int i = 0;
sig = xor8_buf(ent->lay, 0, sizeof(*ent->lay));
@@ -720,9 +722,11 @@
struct mlx5_cmd_msg *msg = input ? ent->in : ent->out;
u16 op = MLX5_GET(mbox_in, ent->lay->in, opcode);
struct mlx5_cmd_mailbox *next = msg->next;
+ int n = mlx5_calc_cmd_blocks(msg);
int data_only;
u32 offset = 0;
int dump_len;
+ int i;
data_only = !!(mlx5_core_debug_mask & (1 << MLX5_CMD_DATA));
@@ -749,7 +753,7 @@
offset += sizeof(*ent->lay);
}
- while (next && offset < msg->len) {
+ for (i = 0; i < n && next; i++) {
if (data_only) {
dump_len = min_t(int, MLX5_CMD_DATA_BLOCK_SIZE, msg->len - offset);
dump_buf(next->buf, dump_len, 1, offset);
@@ -1137,7 +1141,6 @@
struct mlx5_cmd_mailbox *tmp, *head = NULL;
struct mlx5_cmd_prot_block *block;
struct mlx5_cmd_msg *msg;
- int blen;
int err;
int n;
int i;
@@ -1146,8 +1149,8 @@
if (!msg)
return ERR_PTR(-ENOMEM);
- blen = size - min_t(int, sizeof(msg->first.data), size);
- n = (blen + MLX5_CMD_DATA_BLOCK_SIZE - 1) / MLX5_CMD_DATA_BLOCK_SIZE;
+ msg->len = size;
+ n = mlx5_calc_cmd_blocks(msg);
for (i = 0; i < n; i++) {
tmp = alloc_cmd_box(dev, flags);
@@ -1165,7 +1168,6 @@
head = tmp;
}
msg->next = head;
- msg->len = size;
return msg;
err_alloc:
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/diag/fs_tracepoint.c b/drivers/net/ethernet/mellanox/mlx5/core/diag/fs_tracepoint.c
index d93ff56..b3820a3 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/diag/fs_tracepoint.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/diag/fs_tracepoint.c
@@ -235,7 +235,7 @@
switch (dst->type) {
case MLX5_FLOW_DESTINATION_TYPE_VPORT:
- trace_seq_printf(p, "vport=%u\n", dst->vport_num);
+ trace_seq_printf(p, "vport=%u\n", dst->vport.num);
break;
case MLX5_FLOW_DESTINATION_TYPE_FLOW_TABLE:
trace_seq_printf(p, "ft=%p\n", dst->ft);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index 30cad07..c5c7a6d6 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -53,6 +53,11 @@
#include "mlx5_core.h"
#include "en_stats.h"
+struct page_pool;
+
+#define MLX5E_METADATA_ETHER_TYPE (0x8CE4)
+#define MLX5E_METADATA_ETHER_LEN 8
+
#define MLX5_SET_CFG(p, f, v) MLX5_SET(create_flow_group_in, p, f, v)
#define MLX5E_ETH_HARD_MTU (ETH_HLEN + VLAN_HLEN + ETH_FCS_LEN)
@@ -60,6 +65,7 @@
#define MLX5E_HW2SW_MTU(params, hwmtu) ((hwmtu) - ((params)->hard_mtu))
#define MLX5E_SW2HW_MTU(params, swmtu) ((swmtu) + ((params)->hard_mtu))
+#define MLX5E_MAX_PRIORITY 8
#define MLX5E_MAX_DSCP 64
#define MLX5E_MAX_NUM_TC 8
@@ -239,6 +245,7 @@
bool vlan_strip_disable;
bool scatter_fcs_en;
bool rx_dim_enabled;
+ bool tx_dim_enabled;
u32 lro_timeout;
u32 pflags;
struct bpf_prog *xdp_prog;
@@ -269,6 +276,11 @@
/* The only setting that cannot be read from FW */
u8 tc_tsa[IEEE_8021QAZ_MAX_TCS];
u8 cap;
+
+ /* Buffer configuration */
+ bool manual_buffer;
+ u32 cable_len;
+ u32 xoff;
};
struct mlx5e_dcbx_dp {
@@ -282,8 +294,6 @@
MLX5E_RQ_STATE_AM,
};
-#define MLX5E_TEST_BIT(state, nr) (state & BIT(nr))
-
struct mlx5e_cq {
/* data path - accessed per cqe */
struct mlx5_cqwq wq;
@@ -328,6 +338,8 @@
MLX5E_SQ_STATE_ENABLED,
MLX5E_SQ_STATE_RECOVERING,
MLX5E_SQ_STATE_IPSEC,
+ MLX5E_SQ_STATE_AM,
+ MLX5E_SQ_STATE_TLS,
};
struct mlx5e_sq_wqe_info {
@@ -340,6 +352,7 @@
/* dirtied @completion */
u16 cc;
u32 dma_fifo_cc;
+ struct net_dim dim; /* Adaptive Moderation */
/* dirtied @xmit */
u16 pc ____cacheline_aligned_in_smp;
@@ -392,6 +405,7 @@
struct {
struct mlx5e_dma_info *di;
bool doorbell;
+ bool redirect_flush;
} db;
/* read only */
@@ -533,6 +547,7 @@
unsigned int hw_mtu;
struct mlx5e_xdpsq xdpsq;
DECLARE_BITMAP(flags, 8);
+ struct page_pool *page_pool;
/* control */
struct mlx5_wq_ctrl wq_ctrl;
@@ -625,7 +640,6 @@
struct mlx5e_tc_table {
struct mlx5_flow_table *t;
- struct rhashtable_params ht_params;
struct rhashtable ht;
DECLARE_HASHTABLE(mod_hdr_tbl, 8);
@@ -790,6 +804,9 @@
#ifdef CONFIG_MLX5_EN_IPSEC
struct mlx5e_ipsec *ipsec;
#endif
+#ifdef CONFIG_MLX5_EN_TLS
+ struct mlx5e_tls *tls;
+#endif
};
struct mlx5e_profile {
@@ -820,6 +837,8 @@
u16 mlx5e_select_queue(struct net_device *dev, struct sk_buff *skb,
void *accel_priv, select_queue_fallback_t fallback);
netdev_tx_t mlx5e_xmit(struct sk_buff *skb, struct net_device *dev);
+netdev_tx_t mlx5e_sq_xmit(struct mlx5e_txqsq *sq, struct sk_buff *skb,
+ struct mlx5e_tx_wqe *wqe, u16 pi);
void mlx5e_completion_event(struct mlx5_core_cq *mcq);
void mlx5e_cq_error_event(struct mlx5_core_cq *mcq, enum mlx5_event event);
@@ -919,8 +938,6 @@
void mlx5e_build_default_indir_rqt(u32 *indirection_rqt, int len,
int num_channels);
-int mlx5e_get_max_linkspeed(struct mlx5_core_dev *mdev, u32 *speed);
-
void mlx5e_set_tx_cq_mode_params(struct mlx5e_params *params,
u8 cq_period_mode);
void mlx5e_set_rx_cq_mode_params(struct mlx5e_params *params,
@@ -935,6 +952,18 @@
MLX5_CAP_FLOWTABLE_NIC_RX(mdev, ft_field_support.inner_ip_version));
}
+static inline void mlx5e_sq_fetch_wqe(struct mlx5e_txqsq *sq,
+ struct mlx5e_tx_wqe **wqe,
+ u16 *pi)
+{
+ struct mlx5_wq_cyc *wq;
+
+ wq = &sq->wq;
+ *pi = sq->pc & wq->sz_m1;
+ *wqe = mlx5_wq_cyc_get_wqe(wq, *pi);
+ memset(*wqe, 0, sizeof(**wqe));
+}
+
static inline
struct mlx5e_tx_wqe *mlx5e_post_nop(struct mlx5_wq_cyc *wq, u32 sqn, u16 *pc)
{
@@ -1092,9 +1121,6 @@
int mlx5e_ethtool_flash_device(struct mlx5e_priv *priv,
struct ethtool_flash *flash);
-int mlx5e_setup_tc_block_cb(enum tc_setup_type type, void *type_data,
- void *cb_priv);
-
/* mlx5e generic netdev management API */
struct net_device*
mlx5e_create_netdev(struct mlx5_core_dev *mdev, const struct mlx5e_profile *profile,
@@ -1107,4 +1133,5 @@
u16 max_channels, u16 mtu);
u8 mlx5e_params_calculate_tx_min_inline(struct mlx5_core_dev *mdev);
void mlx5e_rx_dim_work(struct work_struct *work);
+void mlx5e_tx_dim_work(struct work_struct *work);
#endif /* __MLX5_EN_H__ */
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/Makefile b/drivers/net/ethernet/mellanox/mlx5/core/en/Makefile
new file mode 100644
index 0000000..d8e1711
--- /dev/null
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/Makefile
@@ -0,0 +1 @@
+subdir-ccflags-y += -I$(src)/..
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/port.c b/drivers/net/ethernet/mellanox/mlx5/core/en/port.c
new file mode 100644
index 0000000..24e3b56
--- /dev/null
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/port.c
@@ -0,0 +1,237 @@
+/*
+ * Copyright (c) 2018, Mellanox Technologies. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses. You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ * - Redistributions of source code must retain the above
+ * copyright notice, this list of conditions and the following
+ * disclaimer.
+ *
+ * - Redistributions in binary form must reproduce the above
+ * copyright notice, this list of conditions and the following
+ * disclaimer in the documentation and/or other materials
+ * provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#include "port.h"
+
+/* speed in units of 1Mb */
+static const u32 mlx5e_link_speed[MLX5E_LINK_MODES_NUMBER] = {
+ [MLX5E_1000BASE_CX_SGMII] = 1000,
+ [MLX5E_1000BASE_KX] = 1000,
+ [MLX5E_10GBASE_CX4] = 10000,
+ [MLX5E_10GBASE_KX4] = 10000,
+ [MLX5E_10GBASE_KR] = 10000,
+ [MLX5E_20GBASE_KR2] = 20000,
+ [MLX5E_40GBASE_CR4] = 40000,
+ [MLX5E_40GBASE_KR4] = 40000,
+ [MLX5E_56GBASE_R4] = 56000,
+ [MLX5E_10GBASE_CR] = 10000,
+ [MLX5E_10GBASE_SR] = 10000,
+ [MLX5E_10GBASE_ER] = 10000,
+ [MLX5E_40GBASE_SR4] = 40000,
+ [MLX5E_40GBASE_LR4] = 40000,
+ [MLX5E_50GBASE_SR2] = 50000,
+ [MLX5E_100GBASE_CR4] = 100000,
+ [MLX5E_100GBASE_SR4] = 100000,
+ [MLX5E_100GBASE_KR4] = 100000,
+ [MLX5E_100GBASE_LR4] = 100000,
+ [MLX5E_100BASE_TX] = 100,
+ [MLX5E_1000BASE_T] = 1000,
+ [MLX5E_10GBASE_T] = 10000,
+ [MLX5E_25GBASE_CR] = 25000,
+ [MLX5E_25GBASE_KR] = 25000,
+ [MLX5E_25GBASE_SR] = 25000,
+ [MLX5E_50GBASE_CR2] = 50000,
+ [MLX5E_50GBASE_KR2] = 50000,
+};
+
+u32 mlx5e_port_ptys2speed(u32 eth_proto_oper)
+{
+ unsigned long temp = eth_proto_oper;
+ u32 speed = 0;
+ int i;
+
+ i = find_first_bit(&temp, MLX5E_LINK_MODES_NUMBER);
+ if (i < MLX5E_LINK_MODES_NUMBER)
+ speed = mlx5e_link_speed[i];
+
+ return speed;
+}
+
+int mlx5e_port_linkspeed(struct mlx5_core_dev *mdev, u32 *speed)
+{
+ u32 out[MLX5_ST_SZ_DW(ptys_reg)] = {};
+ u32 eth_proto_oper;
+ int err;
+
+ err = mlx5_query_port_ptys(mdev, out, sizeof(out), MLX5_PTYS_EN, 1);
+ if (err)
+ return err;
+
+ eth_proto_oper = MLX5_GET(ptys_reg, out, eth_proto_oper);
+ *speed = mlx5e_port_ptys2speed(eth_proto_oper);
+ if (!(*speed)) {
+ mlx5_core_warn(mdev, "cannot get port speed\n");
+ err = -EINVAL;
+ }
+
+ return err;
+}
+
+int mlx5e_port_max_linkspeed(struct mlx5_core_dev *mdev, u32 *speed)
+{
+ u32 max_speed = 0;
+ u32 proto_cap;
+ int err;
+ int i;
+
+ err = mlx5_query_port_proto_cap(mdev, &proto_cap, MLX5_PTYS_EN);
+ if (err)
+ return err;
+
+ for (i = 0; i < MLX5E_LINK_MODES_NUMBER; ++i)
+ if (proto_cap & MLX5E_PROT_MASK(i))
+ max_speed = max(max_speed, mlx5e_link_speed[i]);
+
+ *speed = max_speed;
+ return 0;
+}
+
+u32 mlx5e_port_speed2linkmodes(u32 speed)
+{
+ u32 link_modes = 0;
+ int i;
+
+ for (i = 0; i < MLX5E_LINK_MODES_NUMBER; ++i) {
+ if (mlx5e_link_speed[i] == speed)
+ link_modes |= MLX5E_PROT_MASK(i);
+ }
+
+ return link_modes;
+}
+
+int mlx5e_port_query_pbmc(struct mlx5_core_dev *mdev, void *out)
+{
+ int sz = MLX5_ST_SZ_BYTES(pbmc_reg);
+ void *in;
+ int err;
+
+ in = kzalloc(sz, GFP_KERNEL);
+ if (!in)
+ return -ENOMEM;
+
+ MLX5_SET(pbmc_reg, in, local_port, 1);
+ err = mlx5_core_access_reg(mdev, in, sz, out, sz, MLX5_REG_PBMC, 0, 0);
+
+ kfree(in);
+ return err;
+}
+
+int mlx5e_port_set_pbmc(struct mlx5_core_dev *mdev, void *in)
+{
+ int sz = MLX5_ST_SZ_BYTES(pbmc_reg);
+ void *out;
+ int err;
+
+ out = kzalloc(sz, GFP_KERNEL);
+ if (!out)
+ return -ENOMEM;
+
+ MLX5_SET(pbmc_reg, in, local_port, 1);
+ err = mlx5_core_access_reg(mdev, in, sz, out, sz, MLX5_REG_PBMC, 0, 1);
+
+ kfree(out);
+ return err;
+}
+
+/* buffer[i]: buffer that priority i mapped to */
+int mlx5e_port_query_priority2buffer(struct mlx5_core_dev *mdev, u8 *buffer)
+{
+ int sz = MLX5_ST_SZ_BYTES(pptb_reg);
+ u32 prio_x_buff;
+ void *out;
+ void *in;
+ int prio;
+ int err;
+
+ in = kzalloc(sz, GFP_KERNEL);
+ out = kzalloc(sz, GFP_KERNEL);
+ if (!in || !out) {
+ err = -ENOMEM;
+ goto out;
+ }
+
+ MLX5_SET(pptb_reg, in, local_port, 1);
+ err = mlx5_core_access_reg(mdev, in, sz, out, sz, MLX5_REG_PPTB, 0, 0);
+ if (err)
+ goto out;
+
+ prio_x_buff = MLX5_GET(pptb_reg, out, prio_x_buff);
+ for (prio = 0; prio < 8; prio++) {
+ buffer[prio] = (u8)(prio_x_buff >> (4 * prio)) & 0xF;
+ mlx5_core_dbg(mdev, "prio %d, buffer %d\n", prio, buffer[prio]);
+ }
+out:
+ kfree(in);
+ kfree(out);
+ return err;
+}
+
+int mlx5e_port_set_priority2buffer(struct mlx5_core_dev *mdev, u8 *buffer)
+{
+ int sz = MLX5_ST_SZ_BYTES(pptb_reg);
+ u32 prio_x_buff;
+ void *out;
+ void *in;
+ int prio;
+ int err;
+
+ in = kzalloc(sz, GFP_KERNEL);
+ out = kzalloc(sz, GFP_KERNEL);
+ if (!in || !out) {
+ err = -ENOMEM;
+ goto out;
+ }
+
+ /* First query the pptb register */
+ MLX5_SET(pptb_reg, in, local_port, 1);
+ err = mlx5_core_access_reg(mdev, in, sz, out, sz, MLX5_REG_PPTB, 0, 0);
+ if (err)
+ goto out;
+
+ memcpy(in, out, sz);
+ MLX5_SET(pptb_reg, in, local_port, 1);
+
+ /* Update the pm and prio_x_buff */
+ MLX5_SET(pptb_reg, in, pm, 0xFF);
+
+ prio_x_buff = 0;
+ for (prio = 0; prio < 8; prio++)
+ prio_x_buff |= (buffer[prio] << (4 * prio));
+ MLX5_SET(pptb_reg, in, prio_x_buff, prio_x_buff);
+
+ err = mlx5_core_access_reg(mdev, in, sz, out, sz, MLX5_REG_PPTB, 0, 1);
+
+out:
+ kfree(in);
+ kfree(out);
+ return err;
+}
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/port.h b/drivers/net/ethernet/mellanox/mlx5/core/en/port.h
new file mode 100644
index 0000000..f8cbd81
--- /dev/null
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/port.h
@@ -0,0 +1,48 @@
+/*
+ * Copyright (c) 2018, Mellanox Technologies. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses. You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ * - Redistributions of source code must retain the above
+ * copyright notice, this list of conditions and the following
+ * disclaimer.
+ *
+ * - Redistributions in binary form must reproduce the above
+ * copyright notice, this list of conditions and the following
+ * disclaimer in the documentation and/or other materials
+ * provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#ifndef __MLX5E_EN_PORT_H
+#define __MLX5E_EN_PORT_H
+
+#include <linux/mlx5/driver.h>
+#include "en.h"
+
+u32 mlx5e_port_ptys2speed(u32 eth_proto_oper);
+int mlx5e_port_linkspeed(struct mlx5_core_dev *mdev, u32 *speed);
+int mlx5e_port_max_linkspeed(struct mlx5_core_dev *mdev, u32 *speed);
+u32 mlx5e_port_speed2linkmodes(u32 speed);
+
+int mlx5e_port_query_pbmc(struct mlx5_core_dev *mdev, void *out);
+int mlx5e_port_set_pbmc(struct mlx5_core_dev *mdev, void *in);
+int mlx5e_port_query_priority2buffer(struct mlx5_core_dev *mdev, u8 *buffer);
+int mlx5e_port_set_priority2buffer(struct mlx5_core_dev *mdev, u8 *buffer);
+#endif
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/port_buffer.c b/drivers/net/ethernet/mellanox/mlx5/core/en/port_buffer.c
new file mode 100644
index 0000000..c047da8
--- /dev/null
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/port_buffer.c
@@ -0,0 +1,327 @@
+/*
+ * Copyright (c) 2018, Mellanox Technologies. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses. You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ * - Redistributions of source code must retain the above
+ * copyright notice, this list of conditions and the following
+ * disclaimer.
+ *
+ * - Redistributions in binary form must reproduce the above
+ * copyright notice, this list of conditions and the following
+ * disclaimer in the documentation and/or other materials
+ * provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+#include "port_buffer.h"
+
+int mlx5e_port_query_buffer(struct mlx5e_priv *priv,
+ struct mlx5e_port_buffer *port_buffer)
+{
+ struct mlx5_core_dev *mdev = priv->mdev;
+ int sz = MLX5_ST_SZ_BYTES(pbmc_reg);
+ u32 total_used = 0;
+ void *buffer;
+ void *out;
+ int err;
+ int i;
+
+ out = kzalloc(sz, GFP_KERNEL);
+ if (!out)
+ return -ENOMEM;
+
+ err = mlx5e_port_query_pbmc(mdev, out);
+ if (err)
+ goto out;
+
+ for (i = 0; i < MLX5E_MAX_BUFFER; i++) {
+ buffer = MLX5_ADDR_OF(pbmc_reg, out, buffer[i]);
+ port_buffer->buffer[i].lossy =
+ MLX5_GET(bufferx_reg, buffer, lossy);
+ port_buffer->buffer[i].epsb =
+ MLX5_GET(bufferx_reg, buffer, epsb);
+ port_buffer->buffer[i].size =
+ MLX5_GET(bufferx_reg, buffer, size) << MLX5E_BUFFER_CELL_SHIFT;
+ port_buffer->buffer[i].xon =
+ MLX5_GET(bufferx_reg, buffer, xon_threshold) << MLX5E_BUFFER_CELL_SHIFT;
+ port_buffer->buffer[i].xoff =
+ MLX5_GET(bufferx_reg, buffer, xoff_threshold) << MLX5E_BUFFER_CELL_SHIFT;
+ total_used += port_buffer->buffer[i].size;
+
+ mlx5e_dbg(HW, priv, "buffer %d: size=%d, xon=%d, xoff=%d, epsb=%d, lossy=%d\n", i,
+ port_buffer->buffer[i].size,
+ port_buffer->buffer[i].xon,
+ port_buffer->buffer[i].xoff,
+ port_buffer->buffer[i].epsb,
+ port_buffer->buffer[i].lossy);
+ }
+
+ port_buffer->port_buffer_size =
+ MLX5_GET(pbmc_reg, out, port_buffer_size) << MLX5E_BUFFER_CELL_SHIFT;
+ port_buffer->spare_buffer_size =
+ port_buffer->port_buffer_size - total_used;
+
+ mlx5e_dbg(HW, priv, "total buffer size=%d, spare buffer size=%d\n",
+ port_buffer->port_buffer_size,
+ port_buffer->spare_buffer_size);
+out:
+ kfree(out);
+ return err;
+}
+
+static int port_set_buffer(struct mlx5e_priv *priv,
+ struct mlx5e_port_buffer *port_buffer)
+{
+ struct mlx5_core_dev *mdev = priv->mdev;
+ int sz = MLX5_ST_SZ_BYTES(pbmc_reg);
+ void *buffer;
+ void *in;
+ int err;
+ int i;
+
+ in = kzalloc(sz, GFP_KERNEL);
+ if (!in)
+ return -ENOMEM;
+
+ err = mlx5e_port_query_pbmc(mdev, in);
+ if (err)
+ goto out;
+
+ for (i = 0; i < MLX5E_MAX_BUFFER; i++) {
+ buffer = MLX5_ADDR_OF(pbmc_reg, in, buffer[i]);
+
+ MLX5_SET(bufferx_reg, buffer, size,
+ port_buffer->buffer[i].size >> MLX5E_BUFFER_CELL_SHIFT);
+ MLX5_SET(bufferx_reg, buffer, lossy,
+ port_buffer->buffer[i].lossy);
+ MLX5_SET(bufferx_reg, buffer, xoff_threshold,
+ port_buffer->buffer[i].xoff >> MLX5E_BUFFER_CELL_SHIFT);
+ MLX5_SET(bufferx_reg, buffer, xon_threshold,
+ port_buffer->buffer[i].xon >> MLX5E_BUFFER_CELL_SHIFT);
+ }
+
+ err = mlx5e_port_set_pbmc(mdev, in);
+out:
+ kfree(in);
+ return err;
+}
+
+/* xoff = ((301+2.16 * len [m]) * speed [Gbps] + 2.72 MTU [B]) */
+static u32 calculate_xoff(struct mlx5e_priv *priv, unsigned int mtu)
+{
+ u32 speed;
+ u32 xoff;
+ int err;
+
+ err = mlx5e_port_linkspeed(priv->mdev, &speed);
+ if (err)
+ return 0;
+
+ xoff = (301 + 216 * priv->dcbx.cable_len / 100) * speed / 1000 + 272 * mtu / 100;
+
+ mlx5e_dbg(HW, priv, "%s: xoff=%d\n", __func__, xoff);
+ return xoff;
+}
+
+static int update_xoff_threshold(struct mlx5e_port_buffer *port_buffer,
+ u32 xoff, unsigned int mtu)
+{
+ int i;
+
+ for (i = 0; i < MLX5E_MAX_BUFFER; i++) {
+ if (port_buffer->buffer[i].lossy) {
+ port_buffer->buffer[i].xoff = 0;
+ port_buffer->buffer[i].xon = 0;
+ continue;
+ }
+
+ if (port_buffer->buffer[i].size <
+ (xoff + mtu + (1 << MLX5E_BUFFER_CELL_SHIFT)))
+ return -ENOMEM;
+
+ port_buffer->buffer[i].xoff = port_buffer->buffer[i].size - xoff;
+ port_buffer->buffer[i].xon = port_buffer->buffer[i].xoff - mtu;
+ }
+
+ return 0;
+}
+
+/**
+ * update_buffer_lossy()
+ * mtu: device's MTU
+ * pfc_en: <input> current pfc configuration
+ * buffer: <input> current prio to buffer mapping
+ * xoff: <input> xoff value
+ * port_buffer: <output> port receive buffer configuration
+ * change: <output>
+ *
+ * Update buffer configuration based on pfc configuraiton and priority
+ * to buffer mapping.
+ * Buffer's lossy bit is changed to:
+ * lossless if there is at least one PFC enabled priority mapped to this buffer
+ * lossy if all priorities mapped to this buffer are PFC disabled
+ *
+ * Return:
+ * Return 0 if no error.
+ * Set change to true if buffer configuration is modified.
+ */
+static int update_buffer_lossy(unsigned int mtu,
+ u8 pfc_en, u8 *buffer, u32 xoff,
+ struct mlx5e_port_buffer *port_buffer,
+ bool *change)
+{
+ bool changed = false;
+ u8 lossy_count;
+ u8 prio_count;
+ u8 lossy;
+ int prio;
+ int err;
+ int i;
+
+ for (i = 0; i < MLX5E_MAX_BUFFER; i++) {
+ prio_count = 0;
+ lossy_count = 0;
+
+ for (prio = 0; prio < MLX5E_MAX_PRIORITY; prio++) {
+ if (buffer[prio] != i)
+ continue;
+
+ prio_count++;
+ lossy_count += !(pfc_en & (1 << prio));
+ }
+
+ if (lossy_count == prio_count)
+ lossy = 1;
+ else /* lossy_count < prio_count */
+ lossy = 0;
+
+ if (lossy != port_buffer->buffer[i].lossy) {
+ port_buffer->buffer[i].lossy = lossy;
+ changed = true;
+ }
+ }
+
+ if (changed) {
+ err = update_xoff_threshold(port_buffer, xoff, mtu);
+ if (err)
+ return err;
+
+ *change = true;
+ }
+
+ return 0;
+}
+
+int mlx5e_port_manual_buffer_config(struct mlx5e_priv *priv,
+ u32 change, unsigned int mtu,
+ struct ieee_pfc *pfc,
+ u32 *buffer_size,
+ u8 *prio2buffer)
+{
+ struct mlx5e_port_buffer port_buffer;
+ u32 xoff = calculate_xoff(priv, mtu);
+ bool update_prio2buffer = false;
+ u8 buffer[MLX5E_MAX_PRIORITY];
+ bool update_buffer = false;
+ u32 total_used = 0;
+ u8 curr_pfc_en;
+ int err;
+ int i;
+
+ mlx5e_dbg(HW, priv, "%s: change=%x\n", __func__, change);
+
+ err = mlx5e_port_query_buffer(priv, &port_buffer);
+ if (err)
+ return err;
+
+ if (change & MLX5E_PORT_BUFFER_CABLE_LEN) {
+ update_buffer = true;
+ err = update_xoff_threshold(&port_buffer, xoff, mtu);
+ if (err)
+ return err;
+ }
+
+ if (change & MLX5E_PORT_BUFFER_PFC) {
+ err = mlx5e_port_query_priority2buffer(priv->mdev, buffer);
+ if (err)
+ return err;
+
+ err = update_buffer_lossy(mtu, pfc->pfc_en, buffer, xoff,
+ &port_buffer, &update_buffer);
+ if (err)
+ return err;
+ }
+
+ if (change & MLX5E_PORT_BUFFER_PRIO2BUFFER) {
+ update_prio2buffer = true;
+ err = mlx5_query_port_pfc(priv->mdev, &curr_pfc_en, NULL);
+ if (err)
+ return err;
+
+ err = update_buffer_lossy(mtu, curr_pfc_en, prio2buffer, xoff,
+ &port_buffer, &update_buffer);
+ if (err)
+ return err;
+ }
+
+ if (change & MLX5E_PORT_BUFFER_SIZE) {
+ for (i = 0; i < MLX5E_MAX_BUFFER; i++) {
+ mlx5e_dbg(HW, priv, "%s: buffer[%d]=%d\n", __func__, i, buffer_size[i]);
+ if (!port_buffer.buffer[i].lossy && !buffer_size[i]) {
+ mlx5e_dbg(HW, priv, "%s: lossless buffer[%d] size cannot be zero\n",
+ __func__, i);
+ return -EINVAL;
+ }
+
+ port_buffer.buffer[i].size = buffer_size[i];
+ total_used += buffer_size[i];
+ }
+
+ mlx5e_dbg(HW, priv, "%s: total buffer requested=%d\n", __func__, total_used);
+
+ if (total_used > port_buffer.port_buffer_size)
+ return -EINVAL;
+
+ update_buffer = true;
+ err = update_xoff_threshold(&port_buffer, xoff, mtu);
+ if (err)
+ return err;
+ }
+
+ /* Need to update buffer configuration if xoff value is changed */
+ if (!update_buffer && xoff != priv->dcbx.xoff) {
+ update_buffer = true;
+ err = update_xoff_threshold(&port_buffer, xoff, mtu);
+ if (err)
+ return err;
+ }
+ priv->dcbx.xoff = xoff;
+
+ /* Apply the settings */
+ if (update_buffer) {
+ err = port_set_buffer(priv, &port_buffer);
+ if (err)
+ return err;
+ }
+
+ if (update_prio2buffer)
+ err = mlx5e_port_set_priority2buffer(priv->mdev, prio2buffer);
+
+ return err;
+}
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/port_buffer.h b/drivers/net/ethernet/mellanox/mlx5/core/en/port_buffer.h
new file mode 100644
index 0000000..34f55b8
--- /dev/null
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/port_buffer.h
@@ -0,0 +1,75 @@
+/*
+ * Copyright (c) 2018, Mellanox Technologies. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses. You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ * - Redistributions of source code must retain the above
+ * copyright notice, this list of conditions and the following
+ * disclaimer.
+ *
+ * - Redistributions in binary form must reproduce the above
+ * copyright notice, this list of conditions and the following
+ * disclaimer in the documentation and/or other materials
+ * provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+#ifndef __MLX5_EN_PORT_BUFFER_H__
+#define __MLX5_EN_PORT_BUFFER_H__
+
+#include "en.h"
+#include "port.h"
+
+#define MLX5E_MAX_BUFFER 8
+#define MLX5E_BUFFER_CELL_SHIFT 7
+#define MLX5E_DEFAULT_CABLE_LEN 7 /* 7 meters */
+
+#define MLX5_BUFFER_SUPPORTED(mdev) (MLX5_CAP_GEN(mdev, pcam_reg) && \
+ MLX5_CAP_PCAM_REG(mdev, pbmc) && \
+ MLX5_CAP_PCAM_REG(mdev, pptb))
+
+enum {
+ MLX5E_PORT_BUFFER_CABLE_LEN = BIT(0),
+ MLX5E_PORT_BUFFER_PFC = BIT(1),
+ MLX5E_PORT_BUFFER_PRIO2BUFFER = BIT(2),
+ MLX5E_PORT_BUFFER_SIZE = BIT(3),
+};
+
+struct mlx5e_bufferx_reg {
+ u8 lossy;
+ u8 epsb;
+ u32 size;
+ u32 xoff;
+ u32 xon;
+};
+
+struct mlx5e_port_buffer {
+ u32 port_buffer_size;
+ u32 spare_buffer_size;
+ struct mlx5e_bufferx_reg buffer[MLX5E_MAX_BUFFER];
+};
+
+int mlx5e_port_manual_buffer_config(struct mlx5e_priv *priv,
+ u32 change, unsigned int mtu,
+ struct ieee_pfc *pfc,
+ u32 *buffer_size,
+ u8 *prio2buffer);
+
+int mlx5e_port_query_buffer(struct mlx5e_priv *priv,
+ struct mlx5e_port_buffer *port_buffer);
+#endif
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/en_accel.h b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/en_accel.h
new file mode 100644
index 0000000..f20074d
--- /dev/null
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/en_accel.h
@@ -0,0 +1,72 @@
+/*
+ * Copyright (c) 2018 Mellanox Technologies. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses. You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ * - Redistributions of source code must retain the above
+ * copyright notice, this list of conditions and the following
+ * disclaimer.
+ *
+ * - Redistributions in binary form must reproduce the above
+ * copyright notice, this list of conditions and the following
+ * disclaimer in the documentation and/or other materials
+ * provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ *
+ */
+
+#ifndef __MLX5E_EN_ACCEL_H__
+#define __MLX5E_EN_ACCEL_H__
+
+#ifdef CONFIG_MLX5_ACCEL
+
+#include <linux/skbuff.h>
+#include <linux/netdevice.h>
+#include "en_accel/ipsec_rxtx.h"
+#include "en_accel/tls_rxtx.h"
+#include "en.h"
+
+static inline struct sk_buff *mlx5e_accel_handle_tx(struct sk_buff *skb,
+ struct mlx5e_txqsq *sq,
+ struct net_device *dev,
+ struct mlx5e_tx_wqe **wqe,
+ u16 *pi)
+{
+#ifdef CONFIG_MLX5_EN_TLS
+ if (test_bit(MLX5E_SQ_STATE_TLS, &sq->state)) {
+ skb = mlx5e_tls_handle_tx_skb(dev, sq, skb, wqe, pi);
+ if (unlikely(!skb))
+ return NULL;
+ }
+#endif
+
+#ifdef CONFIG_MLX5_EN_IPSEC
+ if (test_bit(MLX5E_SQ_STATE_IPSEC, &sq->state)) {
+ skb = mlx5e_ipsec_handle_tx_skb(dev, *wqe, skb);
+ if (unlikely(!skb))
+ return NULL;
+ }
+#endif
+
+ return skb;
+}
+
+#endif /* CONFIG_MLX5_ACCEL */
+
+#endif /* __MLX5E_EN_ACCEL_H__ */
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.h b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.h
index 1198fc1..93bf10e 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.h
@@ -45,9 +45,6 @@
#define MLX5E_IPSEC_SADB_RX_BITS 10
#define MLX5E_IPSEC_ESN_SCOPE_MID 0x80000000L
-#define MLX5E_METADATA_ETHER_TYPE (0x8CE4)
-#define MLX5E_METADATA_ETHER_LEN 8
-
struct mlx5e_priv;
struct mlx5e_ipsec_sw_stats {
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/tls.c b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/tls.c
new file mode 100644
index 0000000..d167845
--- /dev/null
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/tls.c
@@ -0,0 +1,197 @@
+/*
+ * Copyright (c) 2018 Mellanox Technologies. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses. You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ * - Redistributions of source code must retain the above
+ * copyright notice, this list of conditions and the following
+ * disclaimer.
+ *
+ * - Redistributions in binary form must reproduce the above
+ * copyright notice, this list of conditions and the following
+ * disclaimer in the documentation and/or other materials
+ * provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ *
+ */
+
+#include <linux/netdevice.h>
+#include <net/ipv6.h>
+#include "en_accel/tls.h"
+#include "accel/tls.h"
+
+static void mlx5e_tls_set_ipv4_flow(void *flow, struct sock *sk)
+{
+ struct inet_sock *inet = inet_sk(sk);
+
+ MLX5_SET(tls_flow, flow, ipv6, 0);
+ memcpy(MLX5_ADDR_OF(tls_flow, flow, dst_ipv4_dst_ipv6.ipv4_layout.ipv4),
+ &inet->inet_daddr, MLX5_FLD_SZ_BYTES(ipv4_layout, ipv4));
+ memcpy(MLX5_ADDR_OF(tls_flow, flow, src_ipv4_src_ipv6.ipv4_layout.ipv4),
+ &inet->inet_rcv_saddr, MLX5_FLD_SZ_BYTES(ipv4_layout, ipv4));
+}
+
+#if IS_ENABLED(CONFIG_IPV6)
+static void mlx5e_tls_set_ipv6_flow(void *flow, struct sock *sk)
+{
+ struct ipv6_pinfo *np = inet6_sk(sk);
+
+ MLX5_SET(tls_flow, flow, ipv6, 1);
+ memcpy(MLX5_ADDR_OF(tls_flow, flow, dst_ipv4_dst_ipv6.ipv6_layout.ipv6),
+ &sk->sk_v6_daddr, MLX5_FLD_SZ_BYTES(ipv6_layout, ipv6));
+ memcpy(MLX5_ADDR_OF(tls_flow, flow, src_ipv4_src_ipv6.ipv6_layout.ipv6),
+ &np->saddr, MLX5_FLD_SZ_BYTES(ipv6_layout, ipv6));
+}
+#endif
+
+static void mlx5e_tls_set_flow_tcp_ports(void *flow, struct sock *sk)
+{
+ struct inet_sock *inet = inet_sk(sk);
+
+ memcpy(MLX5_ADDR_OF(tls_flow, flow, src_port), &inet->inet_sport,
+ MLX5_FLD_SZ_BYTES(tls_flow, src_port));
+ memcpy(MLX5_ADDR_OF(tls_flow, flow, dst_port), &inet->inet_dport,
+ MLX5_FLD_SZ_BYTES(tls_flow, dst_port));
+}
+
+static int mlx5e_tls_set_flow(void *flow, struct sock *sk, u32 caps)
+{
+ switch (sk->sk_family) {
+ case AF_INET:
+ mlx5e_tls_set_ipv4_flow(flow, sk);
+ break;
+#if IS_ENABLED(CONFIG_IPV6)
+ case AF_INET6:
+ if (!sk->sk_ipv6only &&
+ ipv6_addr_type(&sk->sk_v6_daddr) == IPV6_ADDR_MAPPED) {
+ mlx5e_tls_set_ipv4_flow(flow, sk);
+ break;
+ }
+ if (!(caps & MLX5_ACCEL_TLS_IPV6))
+ goto error_out;
+
+ mlx5e_tls_set_ipv6_flow(flow, sk);
+ break;
+#endif
+ default:
+ goto error_out;
+ }
+
+ mlx5e_tls_set_flow_tcp_ports(flow, sk);
+ return 0;
+error_out:
+ return -EINVAL;
+}
+
+static int mlx5e_tls_add(struct net_device *netdev, struct sock *sk,
+ enum tls_offload_ctx_dir direction,
+ struct tls_crypto_info *crypto_info,
+ u32 start_offload_tcp_sn)
+{
+ struct mlx5e_priv *priv = netdev_priv(netdev);
+ struct tls_context *tls_ctx = tls_get_ctx(sk);
+ struct mlx5_core_dev *mdev = priv->mdev;
+ u32 caps = mlx5_accel_tls_device_caps(mdev);
+ int ret = -ENOMEM;
+ void *flow;
+
+ if (direction != TLS_OFFLOAD_CTX_DIR_TX)
+ return -EINVAL;
+
+ flow = kzalloc(MLX5_ST_SZ_BYTES(tls_flow), GFP_KERNEL);
+ if (!flow)
+ return ret;
+
+ ret = mlx5e_tls_set_flow(flow, sk, caps);
+ if (ret)
+ goto free_flow;
+
+ if (direction == TLS_OFFLOAD_CTX_DIR_TX) {
+ struct mlx5e_tls_offload_context *tx_ctx =
+ mlx5e_get_tls_tx_context(tls_ctx);
+ u32 swid;
+
+ ret = mlx5_accel_tls_add_tx_flow(mdev, flow, crypto_info,
+ start_offload_tcp_sn, &swid);
+ if (ret < 0)
+ goto free_flow;
+
+ tx_ctx->swid = htonl(swid);
+ tx_ctx->expected_seq = start_offload_tcp_sn;
+ }
+
+ return 0;
+free_flow:
+ kfree(flow);
+ return ret;
+}
+
+static void mlx5e_tls_del(struct net_device *netdev,
+ struct tls_context *tls_ctx,
+ enum tls_offload_ctx_dir direction)
+{
+ struct mlx5e_priv *priv = netdev_priv(netdev);
+
+ if (direction == TLS_OFFLOAD_CTX_DIR_TX) {
+ u32 swid = ntohl(mlx5e_get_tls_tx_context(tls_ctx)->swid);
+
+ mlx5_accel_tls_del_tx_flow(priv->mdev, swid);
+ } else {
+ netdev_err(netdev, "unsupported direction %d\n", direction);
+ }
+}
+
+static const struct tlsdev_ops mlx5e_tls_ops = {
+ .tls_dev_add = mlx5e_tls_add,
+ .tls_dev_del = mlx5e_tls_del,
+};
+
+void mlx5e_tls_build_netdev(struct mlx5e_priv *priv)
+{
+ struct net_device *netdev = priv->netdev;
+
+ if (!mlx5_accel_is_tls_device(priv->mdev))
+ return;
+
+ netdev->features |= NETIF_F_HW_TLS_TX;
+ netdev->hw_features |= NETIF_F_HW_TLS_TX;
+ netdev->tlsdev_ops = &mlx5e_tls_ops;
+}
+
+int mlx5e_tls_init(struct mlx5e_priv *priv)
+{
+ struct mlx5e_tls *tls = kzalloc(sizeof(*tls), GFP_KERNEL);
+
+ if (!tls)
+ return -ENOMEM;
+
+ priv->tls = tls;
+ return 0;
+}
+
+void mlx5e_tls_cleanup(struct mlx5e_priv *priv)
+{
+ struct mlx5e_tls *tls = priv->tls;
+
+ if (!tls)
+ return;
+
+ kfree(tls);
+ priv->tls = NULL;
+}
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/tls.h b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/tls.h
new file mode 100644
index 0000000..b616217
--- /dev/null
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/tls.h
@@ -0,0 +1,87 @@
+/*
+ * Copyright (c) 2018 Mellanox Technologies. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses. You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ * - Redistributions of source code must retain the above
+ * copyright notice, this list of conditions and the following
+ * disclaimer.
+ *
+ * - Redistributions in binary form must reproduce the above
+ * copyright notice, this list of conditions and the following
+ * disclaimer in the documentation and/or other materials
+ * provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ *
+ */
+#ifndef __MLX5E_TLS_H__
+#define __MLX5E_TLS_H__
+
+#ifdef CONFIG_MLX5_EN_TLS
+
+#include <net/tls.h>
+#include "en.h"
+
+struct mlx5e_tls_sw_stats {
+ atomic64_t tx_tls_drop_metadata;
+ atomic64_t tx_tls_drop_resync_alloc;
+ atomic64_t tx_tls_drop_no_sync_data;
+ atomic64_t tx_tls_drop_bypass_required;
+};
+
+struct mlx5e_tls {
+ struct mlx5e_tls_sw_stats sw_stats;
+};
+
+struct mlx5e_tls_offload_context {
+ struct tls_offload_context base;
+ u32 expected_seq;
+ __be32 swid;
+};
+
+static inline struct mlx5e_tls_offload_context *
+mlx5e_get_tls_tx_context(struct tls_context *tls_ctx)
+{
+ BUILD_BUG_ON(sizeof(struct mlx5e_tls_offload_context) >
+ TLS_OFFLOAD_CONTEXT_SIZE);
+ return container_of(tls_offload_ctx(tls_ctx),
+ struct mlx5e_tls_offload_context,
+ base);
+}
+
+void mlx5e_tls_build_netdev(struct mlx5e_priv *priv);
+int mlx5e_tls_init(struct mlx5e_priv *priv);
+void mlx5e_tls_cleanup(struct mlx5e_priv *priv);
+
+int mlx5e_tls_get_count(struct mlx5e_priv *priv);
+int mlx5e_tls_get_strings(struct mlx5e_priv *priv, uint8_t *data);
+int mlx5e_tls_get_stats(struct mlx5e_priv *priv, u64 *data);
+
+#else
+
+static inline void mlx5e_tls_build_netdev(struct mlx5e_priv *priv) { }
+static inline int mlx5e_tls_init(struct mlx5e_priv *priv) { return 0; }
+static inline void mlx5e_tls_cleanup(struct mlx5e_priv *priv) { }
+static inline int mlx5e_tls_get_count(struct mlx5e_priv *priv) { return 0; }
+static inline int mlx5e_tls_get_strings(struct mlx5e_priv *priv, uint8_t *data) { return 0; }
+static inline int mlx5e_tls_get_stats(struct mlx5e_priv *priv, u64 *data) { return 0; }
+
+#endif
+
+#endif /* __MLX5E_TLS_H__ */
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/tls_rxtx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/tls_rxtx.c
new file mode 100644
index 0000000..ad2790f
--- /dev/null
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/tls_rxtx.c
@@ -0,0 +1,278 @@
+/*
+ * Copyright (c) 2018 Mellanox Technologies. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses. You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ * - Redistributions of source code must retain the above
+ * copyright notice, this list of conditions and the following
+ * disclaimer.
+ *
+ * - Redistributions in binary form must reproduce the above
+ * copyright notice, this list of conditions and the following
+ * disclaimer in the documentation and/or other materials
+ * provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ *
+ */
+
+#include "en_accel/tls.h"
+#include "en_accel/tls_rxtx.h"
+
+#define SYNDROME_OFFLOAD_REQUIRED 32
+#define SYNDROME_SYNC 33
+
+struct sync_info {
+ u64 rcd_sn;
+ s32 sync_len;
+ int nr_frags;
+ skb_frag_t frags[MAX_SKB_FRAGS];
+};
+
+struct mlx5e_tls_metadata {
+ /* One byte of syndrome followed by 3 bytes of swid */
+ __be32 syndrome_swid;
+ __be16 first_seq;
+ /* packet type ID field */
+ __be16 ethertype;
+} __packed;
+
+static int mlx5e_tls_add_metadata(struct sk_buff *skb, __be32 swid)
+{
+ struct mlx5e_tls_metadata *pet;
+ struct ethhdr *eth;
+
+ if (skb_cow_head(skb, sizeof(struct mlx5e_tls_metadata)))
+ return -ENOMEM;
+
+ eth = (struct ethhdr *)skb_push(skb, sizeof(struct mlx5e_tls_metadata));
+ skb->mac_header -= sizeof(struct mlx5e_tls_metadata);
+ pet = (struct mlx5e_tls_metadata *)(eth + 1);
+
+ memmove(skb->data, skb->data + sizeof(struct mlx5e_tls_metadata),
+ 2 * ETH_ALEN);
+
+ eth->h_proto = cpu_to_be16(MLX5E_METADATA_ETHER_TYPE);
+ pet->syndrome_swid = htonl(SYNDROME_OFFLOAD_REQUIRED << 24) | swid;
+
+ return 0;
+}
+
+static int mlx5e_tls_get_sync_data(struct mlx5e_tls_offload_context *context,
+ u32 tcp_seq, struct sync_info *info)
+{
+ int remaining, i = 0, ret = -EINVAL;
+ struct tls_record_info *record;
+ unsigned long flags;
+ s32 sync_size;
+
+ spin_lock_irqsave(&context->base.lock, flags);
+ record = tls_get_record(&context->base, tcp_seq, &info->rcd_sn);
+
+ if (unlikely(!record))
+ goto out;
+
+ sync_size = tcp_seq - tls_record_start_seq(record);
+ info->sync_len = sync_size;
+ if (unlikely(sync_size < 0)) {
+ if (tls_record_is_start_marker(record))
+ goto done;
+
+ goto out;
+ }
+
+ remaining = sync_size;
+ while (remaining > 0) {
+ info->frags[i] = record->frags[i];
+ __skb_frag_ref(&info->frags[i]);
+ remaining -= skb_frag_size(&info->frags[i]);
+
+ if (remaining < 0)
+ skb_frag_size_add(&info->frags[i], remaining);
+
+ i++;
+ }
+ info->nr_frags = i;
+done:
+ ret = 0;
+out:
+ spin_unlock_irqrestore(&context->base.lock, flags);
+ return ret;
+}
+
+static void mlx5e_tls_complete_sync_skb(struct sk_buff *skb,
+ struct sk_buff *nskb, u32 tcp_seq,
+ int headln, __be64 rcd_sn)
+{
+ struct mlx5e_tls_metadata *pet;
+ u8 syndrome = SYNDROME_SYNC;
+ struct iphdr *iph;
+ struct tcphdr *th;
+ int data_len, mss;
+
+ nskb->dev = skb->dev;
+ skb_reset_mac_header(nskb);
+ skb_set_network_header(nskb, skb_network_offset(skb));
+ skb_set_transport_header(nskb, skb_transport_offset(skb));
+ memcpy(nskb->data, skb->data, headln);
+ memcpy(nskb->data + headln, &rcd_sn, sizeof(rcd_sn));
+
+ iph = ip_hdr(nskb);
+ iph->tot_len = htons(nskb->len - skb_network_offset(nskb));
+ th = tcp_hdr(nskb);
+ data_len = nskb->len - headln;
+ tcp_seq -= data_len;
+ th->seq = htonl(tcp_seq);
+
+ mss = nskb->dev->mtu - (headln - skb_network_offset(nskb));
+ skb_shinfo(nskb)->gso_size = 0;
+ if (data_len > mss) {
+ skb_shinfo(nskb)->gso_size = mss;
+ skb_shinfo(nskb)->gso_segs = DIV_ROUND_UP(data_len, mss);
+ }
+ skb_shinfo(nskb)->gso_type = skb_shinfo(skb)->gso_type;
+
+ pet = (struct mlx5e_tls_metadata *)(nskb->data + sizeof(struct ethhdr));
+ memcpy(pet, &syndrome, sizeof(syndrome));
+ pet->first_seq = htons(tcp_seq);
+
+ /* MLX5 devices don't care about the checksum partial start, offset
+ * and pseudo header
+ */
+ nskb->ip_summed = CHECKSUM_PARTIAL;
+
+ nskb->xmit_more = 1;
+ nskb->queue_mapping = skb->queue_mapping;
+}
+
+static struct sk_buff *
+mlx5e_tls_handle_ooo(struct mlx5e_tls_offload_context *context,
+ struct mlx5e_txqsq *sq, struct sk_buff *skb,
+ struct mlx5e_tx_wqe **wqe,
+ u16 *pi,
+ struct mlx5e_tls *tls)
+{
+ u32 tcp_seq = ntohl(tcp_hdr(skb)->seq);
+ struct sync_info info;
+ struct sk_buff *nskb;
+ int linear_len = 0;
+ int headln;
+ int i;
+
+ sq->stats.tls_ooo++;
+
+ if (mlx5e_tls_get_sync_data(context, tcp_seq, &info)) {
+ /* We might get here if a retransmission reaches the driver
+ * after the relevant record is acked.
+ * It should be safe to drop the packet in this case
+ */
+ atomic64_inc(&tls->sw_stats.tx_tls_drop_no_sync_data);
+ goto err_out;
+ }
+
+ if (unlikely(info.sync_len < 0)) {
+ u32 payload;
+
+ headln = skb_transport_offset(skb) + tcp_hdrlen(skb);
+ payload = skb->len - headln;
+ if (likely(payload <= -info.sync_len))
+ /* SKB payload doesn't require offload
+ */
+ return skb;
+
+ atomic64_inc(&tls->sw_stats.tx_tls_drop_bypass_required);
+ goto err_out;
+ }
+
+ if (unlikely(mlx5e_tls_add_metadata(skb, context->swid))) {
+ atomic64_inc(&tls->sw_stats.tx_tls_drop_metadata);
+ goto err_out;
+ }
+
+ headln = skb_transport_offset(skb) + tcp_hdrlen(skb);
+ linear_len += headln + sizeof(info.rcd_sn);
+ nskb = alloc_skb(linear_len, GFP_ATOMIC);
+ if (unlikely(!nskb)) {
+ atomic64_inc(&tls->sw_stats.tx_tls_drop_resync_alloc);
+ goto err_out;
+ }
+
+ context->expected_seq = tcp_seq + skb->len - headln;
+ skb_put(nskb, linear_len);
+ for (i = 0; i < info.nr_frags; i++)
+ skb_shinfo(nskb)->frags[i] = info.frags[i];
+
+ skb_shinfo(nskb)->nr_frags = info.nr_frags;
+ nskb->data_len = info.sync_len;
+ nskb->len += info.sync_len;
+ sq->stats.tls_resync_bytes += nskb->len;
+ mlx5e_tls_complete_sync_skb(skb, nskb, tcp_seq, headln,
+ cpu_to_be64(info.rcd_sn));
+ mlx5e_sq_xmit(sq, nskb, *wqe, *pi);
+ mlx5e_sq_fetch_wqe(sq, wqe, pi);
+ return skb;
+
+err_out:
+ dev_kfree_skb_any(skb);
+ return NULL;
+}
+
+struct sk_buff *mlx5e_tls_handle_tx_skb(struct net_device *netdev,
+ struct mlx5e_txqsq *sq,
+ struct sk_buff *skb,
+ struct mlx5e_tx_wqe **wqe,
+ u16 *pi)
+{
+ struct mlx5e_priv *priv = netdev_priv(netdev);
+ struct mlx5e_tls_offload_context *context;
+ struct tls_context *tls_ctx;
+ u32 expected_seq;
+ int datalen;
+ u32 skb_seq;
+
+ if (!skb->sk || !tls_is_sk_tx_device_offloaded(skb->sk))
+ goto out;
+
+ datalen = skb->len - (skb_transport_offset(skb) + tcp_hdrlen(skb));
+ if (!datalen)
+ goto out;
+
+ tls_ctx = tls_get_ctx(skb->sk);
+ if (unlikely(tls_ctx->netdev != netdev))
+ goto out;
+
+ skb_seq = ntohl(tcp_hdr(skb)->seq);
+ context = mlx5e_get_tls_tx_context(tls_ctx);
+ expected_seq = context->expected_seq;
+
+ if (unlikely(expected_seq != skb_seq)) {
+ skb = mlx5e_tls_handle_ooo(context, sq, skb, wqe, pi, priv->tls);
+ goto out;
+ }
+
+ if (unlikely(mlx5e_tls_add_metadata(skb, context->swid))) {
+ atomic64_inc(&priv->tls->sw_stats.tx_tls_drop_metadata);
+ dev_kfree_skb_any(skb);
+ skb = NULL;
+ goto out;
+ }
+
+ context->expected_seq = skb_seq + datalen;
+out:
+ return skb;
+}
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/tls_rxtx.h b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/tls_rxtx.h
new file mode 100644
index 0000000..405dfd3
--- /dev/null
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/tls_rxtx.h
@@ -0,0 +1,50 @@
+/*
+ * Copyright (c) 2018 Mellanox Technologies. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses. You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ * - Redistributions of source code must retain the above
+ * copyright notice, this list of conditions and the following
+ * disclaimer.
+ *
+ * - Redistributions in binary form must reproduce the above
+ * copyright notice, this list of conditions and the following
+ * disclaimer in the documentation and/or other materials
+ * provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ *
+ */
+
+#ifndef __MLX5E_TLS_RXTX_H__
+#define __MLX5E_TLS_RXTX_H__
+
+#ifdef CONFIG_MLX5_EN_TLS
+
+#include <linux/skbuff.h>
+#include "en.h"
+
+struct sk_buff *mlx5e_tls_handle_tx_skb(struct net_device *netdev,
+ struct mlx5e_txqsq *sq,
+ struct sk_buff *skb,
+ struct mlx5e_tx_wqe **wqe,
+ u16 *pi);
+
+#endif /* CONFIG_MLX5_EN_TLS */
+
+#endif /* __MLX5E_TLS_RXTX_H__ */
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/tls_stats.c b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/tls_stats.c
new file mode 100644
index 0000000..01468ec
--- /dev/null
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/tls_stats.c
@@ -0,0 +1,89 @@
+/*
+ * Copyright (c) 2018 Mellanox Technologies. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses. You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ * - Redistributions of source code must retain the above
+ * copyright notice, this list of conditions and the following
+ * disclaimer.
+ *
+ * - Redistributions in binary form must reproduce the above
+ * copyright notice, this list of conditions and the following
+ * disclaimer in the documentation and/or other materials
+ * provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ *
+ */
+
+#include <linux/ethtool.h>
+#include <net/sock.h>
+
+#include "en.h"
+#include "accel/tls.h"
+#include "fpga/sdk.h"
+#include "en_accel/tls.h"
+
+static const struct counter_desc mlx5e_tls_sw_stats_desc[] = {
+ { MLX5E_DECLARE_STAT(struct mlx5e_tls_sw_stats, tx_tls_drop_metadata) },
+ { MLX5E_DECLARE_STAT(struct mlx5e_tls_sw_stats, tx_tls_drop_resync_alloc) },
+ { MLX5E_DECLARE_STAT(struct mlx5e_tls_sw_stats, tx_tls_drop_no_sync_data) },
+ { MLX5E_DECLARE_STAT(struct mlx5e_tls_sw_stats, tx_tls_drop_bypass_required) },
+};
+
+#define MLX5E_READ_CTR_ATOMIC64(ptr, dsc, i) \
+ atomic64_read((atomic64_t *)((char *)(ptr) + (dsc)[i].offset))
+
+#define NUM_TLS_SW_COUNTERS ARRAY_SIZE(mlx5e_tls_sw_stats_desc)
+
+int mlx5e_tls_get_count(struct mlx5e_priv *priv)
+{
+ if (!priv->tls)
+ return 0;
+
+ return NUM_TLS_SW_COUNTERS;
+}
+
+int mlx5e_tls_get_strings(struct mlx5e_priv *priv, uint8_t *data)
+{
+ unsigned int i, idx = 0;
+
+ if (!priv->tls)
+ return 0;
+
+ for (i = 0; i < NUM_TLS_SW_COUNTERS; i++)
+ strcpy(data + (idx++) * ETH_GSTRING_LEN,
+ mlx5e_tls_sw_stats_desc[i].format);
+
+ return NUM_TLS_SW_COUNTERS;
+}
+
+int mlx5e_tls_get_stats(struct mlx5e_priv *priv, u64 *data)
+{
+ int i, idx = 0;
+
+ if (!priv->tls)
+ return 0;
+
+ for (i = 0; i < NUM_TLS_SW_COUNTERS; i++)
+ data[idx++] =
+ MLX5E_READ_CTR_ATOMIC64(&priv->tls->sw_stats,
+ mlx5e_tls_sw_stats_desc, i);
+
+ return NUM_TLS_SW_COUNTERS;
+}
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_arfs.c b/drivers/net/ethernet/mellanox/mlx5/core/en_arfs.c
index 610d485..f64b5e7 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_arfs.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_arfs.c
@@ -565,7 +565,7 @@
err = mlx5_modify_rule_destination(rule, &dst, NULL);
if (err)
netdev_warn(priv->netdev,
- "Failed to modfiy aRFS rule destination to rq=%d\n", rxq);
+ "Failed to modify aRFS rule destination to rq=%d\n", rxq);
}
static void arfs_handle_work(struct work_struct *work)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_dcbnl.c b/drivers/net/ethernet/mellanox/mlx5/core/en_dcbnl.c
index c641d56..0a52f31 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_dcbnl.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_dcbnl.c
@@ -32,8 +32,8 @@
#include <linux/device.h>
#include <linux/netdevice.h>
#include "en.h"
-
-#define MLX5E_MAX_PRIORITY 8
+#include "en/port.h"
+#include "en/port_buffer.h"
#define MLX5E_100MB (100000)
#define MLX5E_1GB (1000000)
@@ -41,6 +41,9 @@
#define MLX5E_CEE_STATE_UP 1
#define MLX5E_CEE_STATE_DOWN 0
+/* Max supported cable length is 1000 meters */
+#define MLX5E_MAX_CABLE_LENGTH 1000
+
enum {
MLX5E_VENDOR_TC_GROUP_NUM = 7,
MLX5E_LOWEST_PRIO_GROUP = 0,
@@ -338,6 +341,9 @@
pfc->indications[i] = PPORT_PER_PRIO_GET(pstats, i, rx_pause);
}
+ if (MLX5_BUFFER_SUPPORTED(mdev))
+ pfc->delay = priv->dcbx.cable_len;
+
return mlx5_query_port_pfc(mdev, &pfc->pfc_en, NULL);
}
@@ -346,16 +352,39 @@
{
struct mlx5e_priv *priv = netdev_priv(dev);
struct mlx5_core_dev *mdev = priv->mdev;
+ u32 old_cable_len = priv->dcbx.cable_len;
+ struct ieee_pfc pfc_new;
+ u32 changed = 0;
u8 curr_pfc_en;
- int ret;
+ int ret = 0;
+ /* pfc_en */
mlx5_query_port_pfc(mdev, &curr_pfc_en, NULL);
+ if (pfc->pfc_en != curr_pfc_en) {
+ ret = mlx5_set_port_pfc(mdev, pfc->pfc_en, pfc->pfc_en);
+ if (ret)
+ return ret;
+ mlx5_toggle_port_link(mdev);
+ changed |= MLX5E_PORT_BUFFER_PFC;
+ }
- if (pfc->pfc_en == curr_pfc_en)
- return 0;
+ if (pfc->delay &&
+ pfc->delay < MLX5E_MAX_CABLE_LENGTH &&
+ pfc->delay != priv->dcbx.cable_len) {
+ priv->dcbx.cable_len = pfc->delay;
+ changed |= MLX5E_PORT_BUFFER_CABLE_LEN;
+ }
- ret = mlx5_set_port_pfc(mdev, pfc->pfc_en, pfc->pfc_en);
- mlx5_toggle_port_link(mdev);
+ if (MLX5_BUFFER_SUPPORTED(mdev)) {
+ pfc_new.pfc_en = (changed & MLX5E_PORT_BUFFER_PFC) ? pfc->pfc_en : curr_pfc_en;
+ if (priv->dcbx.manual_buffer)
+ ret = mlx5e_port_manual_buffer_config(priv, changed,
+ dev->mtu, &pfc_new,
+ NULL, NULL);
+
+ if (ret && (changed & MLX5E_PORT_BUFFER_CABLE_LEN))
+ priv->dcbx.cable_len = old_cable_len;
+ }
if (!ret) {
mlx5e_dbg(HW, priv,
@@ -873,6 +902,90 @@
cee_cfg->pfc_enable = state;
}
+static int mlx5e_dcbnl_getbuffer(struct net_device *dev,
+ struct dcbnl_buffer *dcb_buffer)
+{
+ struct mlx5e_priv *priv = netdev_priv(dev);
+ struct mlx5_core_dev *mdev = priv->mdev;
+ struct mlx5e_port_buffer port_buffer;
+ u8 buffer[MLX5E_MAX_PRIORITY];
+ int i, err;
+
+ if (!MLX5_BUFFER_SUPPORTED(mdev))
+ return -EOPNOTSUPP;
+
+ err = mlx5e_port_query_priority2buffer(mdev, buffer);
+ if (err)
+ return err;
+
+ for (i = 0; i < MLX5E_MAX_PRIORITY; i++)
+ dcb_buffer->prio2buffer[i] = buffer[i];
+
+ err = mlx5e_port_query_buffer(priv, &port_buffer);
+ if (err)
+ return err;
+
+ for (i = 0; i < MLX5E_MAX_BUFFER; i++)
+ dcb_buffer->buffer_size[i] = port_buffer.buffer[i].size;
+ dcb_buffer->total_size = port_buffer.port_buffer_size;
+
+ return 0;
+}
+
+static int mlx5e_dcbnl_setbuffer(struct net_device *dev,
+ struct dcbnl_buffer *dcb_buffer)
+{
+ struct mlx5e_priv *priv = netdev_priv(dev);
+ struct mlx5_core_dev *mdev = priv->mdev;
+ struct mlx5e_port_buffer port_buffer;
+ u8 old_prio2buffer[MLX5E_MAX_PRIORITY];
+ u32 *buffer_size = NULL;
+ u8 *prio2buffer = NULL;
+ u32 changed = 0;
+ int i, err;
+
+ if (!MLX5_BUFFER_SUPPORTED(mdev))
+ return -EOPNOTSUPP;
+
+ for (i = 0; i < DCBX_MAX_BUFFERS; i++)
+ mlx5_core_dbg(mdev, "buffer[%d]=%d\n", i, dcb_buffer->buffer_size[i]);
+
+ for (i = 0; i < MLX5E_MAX_PRIORITY; i++)
+ mlx5_core_dbg(mdev, "priority %d buffer%d\n", i, dcb_buffer->prio2buffer[i]);
+
+ err = mlx5e_port_query_priority2buffer(mdev, old_prio2buffer);
+ if (err)
+ return err;
+
+ for (i = 0; i < MLX5E_MAX_PRIORITY; i++) {
+ if (dcb_buffer->prio2buffer[i] != old_prio2buffer[i]) {
+ changed |= MLX5E_PORT_BUFFER_PRIO2BUFFER;
+ prio2buffer = dcb_buffer->prio2buffer;
+ break;
+ }
+ }
+
+ err = mlx5e_port_query_buffer(priv, &port_buffer);
+ if (err)
+ return err;
+
+ for (i = 0; i < MLX5E_MAX_BUFFER; i++) {
+ if (port_buffer.buffer[i].size != dcb_buffer->buffer_size[i]) {
+ changed |= MLX5E_PORT_BUFFER_SIZE;
+ buffer_size = dcb_buffer->buffer_size;
+ break;
+ }
+ }
+
+ if (!changed)
+ return 0;
+
+ priv->dcbx.manual_buffer = true;
+ err = mlx5e_port_manual_buffer_config(priv, changed, dev->mtu, NULL,
+ buffer_size, prio2buffer);
+ return err;
+}
+
const struct dcbnl_rtnl_ops mlx5e_dcbnl_ops = {
.ieee_getets = mlx5e_dcbnl_ieee_getets,
.ieee_setets = mlx5e_dcbnl_ieee_setets,
@@ -884,6 +997,8 @@
.ieee_delapp = mlx5e_dcbnl_ieee_delapp,
.getdcbx = mlx5e_dcbnl_getdcbx,
.setdcbx = mlx5e_dcbnl_setdcbx,
+ .dcbnl_getbuffer = mlx5e_dcbnl_getbuffer,
+ .dcbnl_setbuffer = mlx5e_dcbnl_setbuffer,
/* CEE interfaces */
.setall = mlx5e_dcbnl_setall,
@@ -1091,5 +1206,8 @@
if (priv->dcbx.mode == MLX5E_DCBX_PARAM_VER_OPER_HOST)
priv->dcbx.cap |= DCB_CAP_DCBX_HOST;
+ priv->dcbx.manual_buffer = false;
+ priv->dcbx.cable_len = MLX5E_DEFAULT_CABLE_LEN;
+
mlx5e_ets_init(priv);
}
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_dim.c b/drivers/net/ethernet/mellanox/mlx5/core/en_dim.c
index 602851a..d67adf7 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_dim.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_dim.c
@@ -33,16 +33,30 @@
#include <linux/net_dim.h>
#include "en.h"
+static void
+mlx5e_complete_dim_work(struct net_dim *dim, struct net_dim_cq_moder moder,
+ struct mlx5_core_dev *mdev, struct mlx5_core_cq *mcq)
+{
+ mlx5_core_modify_cq_moderation(mdev, mcq, moder.usec, moder.pkts);
+ dim->state = NET_DIM_START_MEASURE;
+}
+
void mlx5e_rx_dim_work(struct work_struct *work)
{
- struct net_dim *dim = container_of(work, struct net_dim,
- work);
+ struct net_dim *dim = container_of(work, struct net_dim, work);
struct mlx5e_rq *rq = container_of(dim, struct mlx5e_rq, dim);
- struct net_dim_cq_moder cur_profile = net_dim_get_profile(dim->mode,
- dim->profile_ix);
+ struct net_dim_cq_moder cur_moder =
+ net_dim_get_rx_moderation(dim->mode, dim->profile_ix);
- mlx5_core_modify_cq_moderation(rq->mdev, &rq->cq.mcq,
- cur_profile.usec, cur_profile.pkts);
+ mlx5e_complete_dim_work(dim, cur_moder, rq->mdev, &rq->cq.mcq);
+}
- dim->state = NET_DIM_START_MEASURE;
+void mlx5e_tx_dim_work(struct work_struct *work)
+{
+ struct net_dim *dim = container_of(work, struct net_dim, work);
+ struct mlx5e_txqsq *sq = container_of(dim, struct mlx5e_txqsq, dim);
+ struct net_dim_cq_moder cur_moder =
+ net_dim_get_tx_moderation(dim->mode, dim->profile_ix);
+
+ mlx5e_complete_dim_work(dim, cur_moder, sq->cq.mdev, &sq->cq.mcq);
}
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
index 37fd024..42bd256 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
@@ -31,6 +31,7 @@
*/
#include "en.h"
+#include "en/port.h"
void mlx5e_ethtool_get_drvinfo(struct mlx5e_priv *priv,
struct ethtool_drvinfo *drvinfo)
@@ -59,18 +60,16 @@
struct ptys2ethtool_config {
__ETHTOOL_DECLARE_LINK_MODE_MASK(supported);
__ETHTOOL_DECLARE_LINK_MODE_MASK(advertised);
- u32 speed;
};
static struct ptys2ethtool_config ptys2ethtool_table[MLX5E_LINK_MODES_NUMBER];
-#define MLX5_BUILD_PTYS2ETHTOOL_CONFIG(reg_, speed_, ...) \
+#define MLX5_BUILD_PTYS2ETHTOOL_CONFIG(reg_, ...) \
({ \
struct ptys2ethtool_config *cfg; \
const unsigned int modes[] = { __VA_ARGS__ }; \
unsigned int i; \
cfg = &ptys2ethtool_table[reg_]; \
- cfg->speed = speed_; \
bitmap_zero(cfg->supported, \
__ETHTOOL_LINK_MODE_MASK_NBITS); \
bitmap_zero(cfg->advertised, \
@@ -83,55 +82,55 @@
void mlx5e_build_ptys2ethtool_map(void)
{
- MLX5_BUILD_PTYS2ETHTOOL_CONFIG(MLX5E_1000BASE_CX_SGMII, SPEED_1000,
+ MLX5_BUILD_PTYS2ETHTOOL_CONFIG(MLX5E_1000BASE_CX_SGMII,
ETHTOOL_LINK_MODE_1000baseKX_Full_BIT);
- MLX5_BUILD_PTYS2ETHTOOL_CONFIG(MLX5E_1000BASE_KX, SPEED_1000,
+ MLX5_BUILD_PTYS2ETHTOOL_CONFIG(MLX5E_1000BASE_KX,
ETHTOOL_LINK_MODE_1000baseKX_Full_BIT);
- MLX5_BUILD_PTYS2ETHTOOL_CONFIG(MLX5E_10GBASE_CX4, SPEED_10000,
+ MLX5_BUILD_PTYS2ETHTOOL_CONFIG(MLX5E_10GBASE_CX4,
ETHTOOL_LINK_MODE_10000baseKX4_Full_BIT);
- MLX5_BUILD_PTYS2ETHTOOL_CONFIG(MLX5E_10GBASE_KX4, SPEED_10000,
+ MLX5_BUILD_PTYS2ETHTOOL_CONFIG(MLX5E_10GBASE_KX4,
ETHTOOL_LINK_MODE_10000baseKX4_Full_BIT);
- MLX5_BUILD_PTYS2ETHTOOL_CONFIG(MLX5E_10GBASE_KR, SPEED_10000,
+ MLX5_BUILD_PTYS2ETHTOOL_CONFIG(MLX5E_10GBASE_KR,
ETHTOOL_LINK_MODE_10000baseKR_Full_BIT);
- MLX5_BUILD_PTYS2ETHTOOL_CONFIG(MLX5E_20GBASE_KR2, SPEED_20000,
+ MLX5_BUILD_PTYS2ETHTOOL_CONFIG(MLX5E_20GBASE_KR2,
ETHTOOL_LINK_MODE_20000baseKR2_Full_BIT);
- MLX5_BUILD_PTYS2ETHTOOL_CONFIG(MLX5E_40GBASE_CR4, SPEED_40000,
+ MLX5_BUILD_PTYS2ETHTOOL_CONFIG(MLX5E_40GBASE_CR4,
ETHTOOL_LINK_MODE_40000baseCR4_Full_BIT);
- MLX5_BUILD_PTYS2ETHTOOL_CONFIG(MLX5E_40GBASE_KR4, SPEED_40000,
+ MLX5_BUILD_PTYS2ETHTOOL_CONFIG(MLX5E_40GBASE_KR4,
ETHTOOL_LINK_MODE_40000baseKR4_Full_BIT);
- MLX5_BUILD_PTYS2ETHTOOL_CONFIG(MLX5E_56GBASE_R4, SPEED_56000,
+ MLX5_BUILD_PTYS2ETHTOOL_CONFIG(MLX5E_56GBASE_R4,
ETHTOOL_LINK_MODE_56000baseKR4_Full_BIT);
- MLX5_BUILD_PTYS2ETHTOOL_CONFIG(MLX5E_10GBASE_CR, SPEED_10000,
+ MLX5_BUILD_PTYS2ETHTOOL_CONFIG(MLX5E_10GBASE_CR,
ETHTOOL_LINK_MODE_10000baseKR_Full_BIT);
- MLX5_BUILD_PTYS2ETHTOOL_CONFIG(MLX5E_10GBASE_SR, SPEED_10000,
+ MLX5_BUILD_PTYS2ETHTOOL_CONFIG(MLX5E_10GBASE_SR,
ETHTOOL_LINK_MODE_10000baseKR_Full_BIT);
- MLX5_BUILD_PTYS2ETHTOOL_CONFIG(MLX5E_10GBASE_ER, SPEED_10000,
+ MLX5_BUILD_PTYS2ETHTOOL_CONFIG(MLX5E_10GBASE_ER,
ETHTOOL_LINK_MODE_10000baseKR_Full_BIT);
- MLX5_BUILD_PTYS2ETHTOOL_CONFIG(MLX5E_40GBASE_SR4, SPEED_40000,
+ MLX5_BUILD_PTYS2ETHTOOL_CONFIG(MLX5E_40GBASE_SR4,
ETHTOOL_LINK_MODE_40000baseSR4_Full_BIT);
- MLX5_BUILD_PTYS2ETHTOOL_CONFIG(MLX5E_40GBASE_LR4, SPEED_40000,
+ MLX5_BUILD_PTYS2ETHTOOL_CONFIG(MLX5E_40GBASE_LR4,
ETHTOOL_LINK_MODE_40000baseLR4_Full_BIT);
- MLX5_BUILD_PTYS2ETHTOOL_CONFIG(MLX5E_50GBASE_SR2, SPEED_50000,
+ MLX5_BUILD_PTYS2ETHTOOL_CONFIG(MLX5E_50GBASE_SR2,
ETHTOOL_LINK_MODE_50000baseSR2_Full_BIT);
- MLX5_BUILD_PTYS2ETHTOOL_CONFIG(MLX5E_100GBASE_CR4, SPEED_100000,
+ MLX5_BUILD_PTYS2ETHTOOL_CONFIG(MLX5E_100GBASE_CR4,
ETHTOOL_LINK_MODE_100000baseCR4_Full_BIT);
- MLX5_BUILD_PTYS2ETHTOOL_CONFIG(MLX5E_100GBASE_SR4, SPEED_100000,
+ MLX5_BUILD_PTYS2ETHTOOL_CONFIG(MLX5E_100GBASE_SR4,
ETHTOOL_LINK_MODE_100000baseSR4_Full_BIT);
- MLX5_BUILD_PTYS2ETHTOOL_CONFIG(MLX5E_100GBASE_KR4, SPEED_100000,
+ MLX5_BUILD_PTYS2ETHTOOL_CONFIG(MLX5E_100GBASE_KR4,
ETHTOOL_LINK_MODE_100000baseKR4_Full_BIT);
- MLX5_BUILD_PTYS2ETHTOOL_CONFIG(MLX5E_100GBASE_LR4, SPEED_100000,
+ MLX5_BUILD_PTYS2ETHTOOL_CONFIG(MLX5E_100GBASE_LR4,
ETHTOOL_LINK_MODE_100000baseLR4_ER4_Full_BIT);
- MLX5_BUILD_PTYS2ETHTOOL_CONFIG(MLX5E_10GBASE_T, SPEED_10000,
+ MLX5_BUILD_PTYS2ETHTOOL_CONFIG(MLX5E_10GBASE_T,
ETHTOOL_LINK_MODE_10000baseT_Full_BIT);
- MLX5_BUILD_PTYS2ETHTOOL_CONFIG(MLX5E_25GBASE_CR, SPEED_25000,
+ MLX5_BUILD_PTYS2ETHTOOL_CONFIG(MLX5E_25GBASE_CR,
ETHTOOL_LINK_MODE_25000baseCR_Full_BIT);
- MLX5_BUILD_PTYS2ETHTOOL_CONFIG(MLX5E_25GBASE_KR, SPEED_25000,
+ MLX5_BUILD_PTYS2ETHTOOL_CONFIG(MLX5E_25GBASE_KR,
ETHTOOL_LINK_MODE_25000baseKR_Full_BIT);
- MLX5_BUILD_PTYS2ETHTOOL_CONFIG(MLX5E_25GBASE_SR, SPEED_25000,
+ MLX5_BUILD_PTYS2ETHTOOL_CONFIG(MLX5E_25GBASE_SR,
ETHTOOL_LINK_MODE_25000baseSR_Full_BIT);
- MLX5_BUILD_PTYS2ETHTOOL_CONFIG(MLX5E_50GBASE_CR2, SPEED_50000,
+ MLX5_BUILD_PTYS2ETHTOOL_CONFIG(MLX5E_50GBASE_CR2,
ETHTOOL_LINK_MODE_50000baseCR2_Full_BIT);
- MLX5_BUILD_PTYS2ETHTOOL_CONFIG(MLX5E_50GBASE_KR2, SPEED_50000,
+ MLX5_BUILD_PTYS2ETHTOOL_CONFIG(MLX5E_50GBASE_KR2,
ETHTOOL_LINK_MODE_50000baseKR2_Full_BIT);
}
@@ -389,14 +388,20 @@
int mlx5e_ethtool_get_coalesce(struct mlx5e_priv *priv,
struct ethtool_coalesce *coal)
{
+ struct net_dim_cq_moder *rx_moder, *tx_moder;
+
if (!MLX5_CAP_GEN(priv->mdev, cq_moderation))
return -EOPNOTSUPP;
- coal->rx_coalesce_usecs = priv->channels.params.rx_cq_moderation.usec;
- coal->rx_max_coalesced_frames = priv->channels.params.rx_cq_moderation.pkts;
- coal->tx_coalesce_usecs = priv->channels.params.tx_cq_moderation.usec;
- coal->tx_max_coalesced_frames = priv->channels.params.tx_cq_moderation.pkts;
- coal->use_adaptive_rx_coalesce = priv->channels.params.rx_dim_enabled;
+ rx_moder = &priv->channels.params.rx_cq_moderation;
+ coal->rx_coalesce_usecs = rx_moder->usec;
+ coal->rx_max_coalesced_frames = rx_moder->pkts;
+ coal->use_adaptive_rx_coalesce = priv->channels.params.rx_dim_enabled;
+
+ tx_moder = &priv->channels.params.tx_cq_moderation;
+ coal->tx_coalesce_usecs = tx_moder->usec;
+ coal->tx_max_coalesced_frames = tx_moder->pkts;
+ coal->use_adaptive_tx_coalesce = priv->channels.params.tx_dim_enabled;
return 0;
}
@@ -438,6 +443,7 @@
int mlx5e_ethtool_set_coalesce(struct mlx5e_priv *priv,
struct ethtool_coalesce *coal)
{
+ struct net_dim_cq_moder *rx_moder, *tx_moder;
struct mlx5_core_dev *mdev = priv->mdev;
struct mlx5e_channels new_channels = {};
int err = 0;
@@ -463,11 +469,15 @@
mutex_lock(&priv->state_lock);
new_channels.params = priv->channels.params;
- new_channels.params.tx_cq_moderation.usec = coal->tx_coalesce_usecs;
- new_channels.params.tx_cq_moderation.pkts = coal->tx_max_coalesced_frames;
- new_channels.params.rx_cq_moderation.usec = coal->rx_coalesce_usecs;
- new_channels.params.rx_cq_moderation.pkts = coal->rx_max_coalesced_frames;
- new_channels.params.rx_dim_enabled = !!coal->use_adaptive_rx_coalesce;
+ rx_moder = &new_channels.params.rx_cq_moderation;
+ rx_moder->usec = coal->rx_coalesce_usecs;
+ rx_moder->pkts = coal->rx_max_coalesced_frames;
+ new_channels.params.rx_dim_enabled = !!coal->use_adaptive_rx_coalesce;
+
+ tx_moder = &new_channels.params.tx_cq_moderation;
+ tx_moder->usec = coal->tx_coalesce_usecs;
+ tx_moder->pkts = coal->tx_max_coalesced_frames;
+ new_channels.params.tx_dim_enabled = !!coal->use_adaptive_tx_coalesce;
if (!test_bit(MLX5E_STATE_OPENED, &priv->state)) {
priv->channels.params = new_channels.params;
@@ -475,7 +485,9 @@
}
/* we are opened */
- reset = !!coal->use_adaptive_rx_coalesce != priv->channels.params.rx_dim_enabled;
+ reset = (!!coal->use_adaptive_rx_coalesce != priv->channels.params.rx_dim_enabled) ||
+ (!!coal->use_adaptive_tx_coalesce != priv->channels.params.tx_dim_enabled);
+
if (!reset) {
mlx5e_set_priv_channels_coalesce(priv, coal);
priv->channels.params = new_channels.params;
@@ -604,43 +616,24 @@
}
}
-int mlx5e_get_max_linkspeed(struct mlx5_core_dev *mdev, u32 *speed)
-{
- u32 max_speed = 0;
- u32 proto_cap;
- int err;
- int i;
-
- err = mlx5_query_port_proto_cap(mdev, &proto_cap, MLX5_PTYS_EN);
- if (err)
- return err;
-
- for (i = 0; i < MLX5E_LINK_MODES_NUMBER; ++i)
- if (proto_cap & MLX5E_PROT_MASK(i))
- max_speed = max(max_speed, ptys2ethtool_table[i].speed);
-
- *speed = max_speed;
- return 0;
-}
-
static void get_speed_duplex(struct net_device *netdev,
u32 eth_proto_oper,
struct ethtool_link_ksettings *link_ksettings)
{
- int i;
u32 speed = SPEED_UNKNOWN;
u8 duplex = DUPLEX_UNKNOWN;
if (!netif_carrier_ok(netdev))
goto out;
- for (i = 0; i < MLX5E_LINK_MODES_NUMBER; ++i) {
- if (eth_proto_oper & MLX5E_PROT_MASK(i)) {
- speed = ptys2ethtool_table[i].speed;
- duplex = DUPLEX_FULL;
- break;
- }
+ speed = mlx5e_port_ptys2speed(eth_proto_oper);
+ if (!speed) {
+ speed = SPEED_UNKNOWN;
+ goto out;
}
+
+ duplex = DUPLEX_FULL;
+
out:
link_ksettings->base.speed = speed;
link_ksettings->base.duplex = duplex;
@@ -798,18 +791,6 @@
return ptys_modes;
}
-static u32 mlx5e_ethtool2ptys_speed_link(u32 speed)
-{
- u32 i, speed_links = 0;
-
- for (i = 0; i < MLX5E_LINK_MODES_NUMBER; ++i) {
- if (ptys2ethtool_table[i].speed == speed)
- speed_links |= MLX5E_PROT_MASK(i);
- }
-
- return speed_links;
-}
-
static int mlx5e_set_link_ksettings(struct net_device *netdev,
const struct ethtool_link_ksettings *link_ksettings)
{
@@ -829,7 +810,7 @@
link_modes = link_ksettings->base.autoneg == AUTONEG_ENABLE ?
mlx5e_ethtool2ptys_adver_link(link_ksettings->link_modes.advertising) :
- mlx5e_ethtool2ptys_speed_link(speed);
+ mlx5e_port_speed2linkmodes(speed);
err = mlx5_query_port_proto_cap(mdev, ð_proto_cap, MLX5_PTYS_EN);
if (err) {
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_fs.c b/drivers/net/ethernet/mellanox/mlx5/core/en_fs.c
index f64dda2..76cc10e 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_fs.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_fs.c
@@ -277,7 +277,6 @@
}
break;
case MLX5E_VLAN_RULE_TYPE_MATCH_CTAG_VID:
- mlx5e_vport_context_update_vlans(priv);
if (priv->fs.vlan.active_cvlans_rule[vid]) {
mlx5_del_flow_rules(priv->fs.vlan.active_cvlans_rule[vid]);
priv->fs.vlan.active_cvlans_rule[vid] = NULL;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index b29c1d9..cee44c2 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -35,14 +35,18 @@
#include <linux/mlx5/fs.h>
#include <net/vxlan.h>
#include <linux/bpf.h>
+#include <net/page_pool.h>
#include "eswitch.h"
#include "en.h"
#include "en_tc.h"
#include "en_rep.h"
#include "en_accel/ipsec.h"
#include "en_accel/ipsec_rxtx.h"
+#include "en_accel/tls.h"
#include "accel/ipsec.h"
+#include "accel/tls.h"
#include "vxlan.h"
+#include "en/port.h"
struct mlx5e_rq_param {
u32 rqc[MLX5_ST_SZ_DW(rqc)];
@@ -389,10 +393,11 @@
struct mlx5e_rq_param *rqp,
struct mlx5e_rq *rq)
{
+ struct page_pool_params pp_params = { 0 };
struct mlx5_core_dev *mdev = c->mdev;
void *rqc = rqp->rqc;
void *rqc_wq = MLX5_ADDR_OF(rqc, rqc, wq);
- u32 byte_count;
+ u32 byte_count, pool_size;
int npages;
int wq_sz;
int err;
@@ -432,9 +437,12 @@
rq->buff.map_dir = rq->xdp_prog ? DMA_BIDIRECTIONAL : DMA_FROM_DEVICE;
rq->buff.headroom = mlx5e_get_rq_headroom(mdev, params);
+ pool_size = 1 << params->log_rq_mtu_frames;
switch (rq->wq_type) {
case MLX5_WQ_TYPE_LINKED_LIST_STRIDING_RQ:
+
+ pool_size = MLX5_MPWRQ_PAGES_PER_WQE << mlx5e_mpwqe_get_log_rq_size(params);
rq->post_wqes = mlx5e_post_rx_mpwqes;
rq->dealloc_wqe = mlx5e_dealloc_rx_mpwqe;
@@ -512,6 +520,32 @@
rq->mkey_be = c->mkey_be;
}
+ /* Create a page_pool and register it with rxq */
+ pp_params.order = rq->buff.page_order;
+ pp_params.flags = 0; /* No-internal DMA mapping in page_pool */
+ pp_params.pool_size = pool_size;
+ pp_params.nid = cpu_to_node(c->cpu);
+ pp_params.dev = c->pdev;
+ pp_params.dma_dir = rq->buff.map_dir;
+
+ /* page_pool can be used even when there is no rq->xdp_prog,
+ * given page_pool does not handle DMA mapping there is no
+ * required state to clear. And page_pool gracefully handle
+ * elevated refcnt.
+ */
+ rq->page_pool = page_pool_create(&pp_params);
+ if (IS_ERR(rq->page_pool)) {
+ if (rq->wq_type != MLX5_WQ_TYPE_LINKED_LIST_STRIDING_RQ)
+ kfree(rq->wqe.frag_info);
+ err = PTR_ERR(rq->page_pool);
+ rq->page_pool = NULL;
+ goto err_rq_wq_destroy;
+ }
+ err = xdp_rxq_info_reg_mem_model(&rq->xdp_rxq,
+ MEM_TYPE_PAGE_POOL, rq->page_pool);
+ if (err)
+ goto err_rq_wq_destroy;
+
for (i = 0; i < wq_sz; i++) {
struct mlx5e_rx_wqe *wqe = mlx5_wq_ll_get_wqe(&rq->wq, i);
@@ -548,6 +582,8 @@
if (rq->xdp_prog)
bpf_prog_put(rq->xdp_prog);
xdp_rxq_info_unreg(&rq->xdp_rxq);
+ if (rq->page_pool)
+ page_pool_destroy(rq->page_pool);
mlx5_wq_destroy(&rq->wq_ctrl);
return err;
@@ -561,6 +597,8 @@
bpf_prog_put(rq->xdp_prog);
xdp_rxq_info_unreg(&rq->xdp_rxq);
+ if (rq->page_pool)
+ page_pool_destroy(rq->page_pool);
switch (rq->wq_type) {
case MLX5_WQ_TYPE_LINKED_LIST_STRIDING_RQ:
@@ -710,23 +748,24 @@
mlx5_core_destroy_rq(rq->mdev, rq->rqn);
}
-static int mlx5e_wait_for_min_rx_wqes(struct mlx5e_rq *rq)
+static int mlx5e_wait_for_min_rx_wqes(struct mlx5e_rq *rq, int wait_time)
{
- unsigned long exp_time = jiffies + msecs_to_jiffies(20000);
+ unsigned long exp_time = jiffies + msecs_to_jiffies(wait_time);
struct mlx5e_channel *c = rq->channel;
struct mlx5_wq_ll *wq = &rq->wq;
u16 min_wqes = mlx5_min_rx_wqes(rq->wq_type, mlx5_wq_ll_get_size(wq));
- while (time_before(jiffies, exp_time)) {
+ do {
if (wq->cur_sz >= min_wqes)
return 0;
msleep(20);
- }
+ } while (time_before(jiffies, exp_time));
- netdev_warn(c->netdev, "Failed to get min RX wqes on RQN[0x%x] wq cur_sz(%d) min_rx_wqes(%d)\n",
- rq->rqn, wq->cur_sz, min_wqes);
+ netdev_warn(c->netdev, "Failed to get min RX wqes on Channel[%d] RQN[0x%x] wq cur_sz(%d) min_rx_wqes(%d)\n",
+ c->ix, rq->rqn, wq->cur_sz, min_wqes);
+
return -ETIMEDOUT;
}
@@ -782,7 +821,7 @@
goto err_destroy_rq;
if (params->rx_dim_enabled)
- c->rq.state |= BIT(MLX5E_RQ_STATE_AM);
+ __set_bit(MLX5E_RQ_STATE_AM, &c->rq.state);
return 0;
@@ -979,6 +1018,8 @@
INIT_WORK(&sq->recover.recover_work, mlx5e_sq_recover);
if (MLX5_IPSEC_DEV(c->priv->mdev))
set_bit(MLX5E_SQ_STATE_IPSEC, &sq->state);
+ if (mlx5_accel_is_tls_device(c->priv->mdev))
+ set_bit(MLX5E_SQ_STATE_TLS, &sq->state);
param->wq.db_numa_node = cpu_to_node(c->cpu);
err = mlx5_wq_cyc_create(mdev, ¶m->wq, sqc_wq, &sq->wq, &sq->wq_ctrl);
@@ -990,6 +1031,9 @@
if (err)
goto err_sq_wq_destroy;
+ INIT_WORK(&sq->dim.work, mlx5e_tx_dim_work);
+ sq->dim.mode = params->tx_cq_moderation.cq_period_mode;
+
sq->edge = (sq->wq.sz_m1 + 1) - MLX5_SEND_WQE_MAX_WQEBBS;
return 0;
@@ -1153,6 +1197,9 @@
if (tx_rate)
mlx5e_set_sq_maxrate(c->netdev, sq, tx_rate);
+ if (params->tx_dim_enabled)
+ sq->state |= BIT(MLX5E_SQ_STATE_AM);
+
return 0;
err_free_txqsq:
@@ -1896,7 +1943,6 @@
MLX5_SET(rqc, rqc, scatter_fcs, params->scatter_fcs_en);
param->wq.buf_numa_node = dev_to_node(&mdev->pdev->dev);
- param->wq.linear = 1;
}
static void mlx5e_build_drop_rq_param(struct mlx5e_priv *priv,
@@ -2084,13 +2130,11 @@
int err = 0;
int i;
- for (i = 0; i < chs->num; i++) {
- err = mlx5e_wait_for_min_rx_wqes(&chs->c[i]->rq);
- if (err)
- break;
- }
+ for (i = 0; i < chs->num; i++)
+ err |= mlx5e_wait_for_min_rx_wqes(&chs->c[i]->rq,
+ err ? 0 : 20000);
- return err;
+ return err ? -ETIMEDOUT : 0;
}
static void mlx5e_deactivate_channels(struct mlx5e_channels *chs)
@@ -3093,22 +3137,23 @@
#ifdef CONFIG_MLX5_ESWITCH
static int mlx5e_setup_tc_cls_flower(struct mlx5e_priv *priv,
- struct tc_cls_flower_offload *cls_flower)
+ struct tc_cls_flower_offload *cls_flower,
+ int flags)
{
switch (cls_flower->command) {
case TC_CLSFLOWER_REPLACE:
- return mlx5e_configure_flower(priv, cls_flower);
+ return mlx5e_configure_flower(priv, cls_flower, flags);
case TC_CLSFLOWER_DESTROY:
- return mlx5e_delete_flower(priv, cls_flower);
+ return mlx5e_delete_flower(priv, cls_flower, flags);
case TC_CLSFLOWER_STATS:
- return mlx5e_stats_flower(priv, cls_flower);
+ return mlx5e_stats_flower(priv, cls_flower, flags);
default:
return -EOPNOTSUPP;
}
}
-int mlx5e_setup_tc_block_cb(enum tc_setup_type type, void *type_data,
- void *cb_priv)
+static int mlx5e_setup_tc_block_cb(enum tc_setup_type type, void *type_data,
+ void *cb_priv)
{
struct mlx5e_priv *priv = cb_priv;
@@ -3117,7 +3162,7 @@
switch (type) {
case TC_SETUP_CLSFLOWER:
- return mlx5e_setup_tc_cls_flower(priv, type_data);
+ return mlx5e_setup_tc_cls_flower(priv, type_data, MLX5E_TC_INGRESS);
default:
return -EOPNOTSUPP;
}
@@ -4038,7 +4083,7 @@
u32 link_speed = 0;
u32 pci_bw = 0;
- mlx5e_get_max_linkspeed(mdev, &link_speed);
+ mlx5e_port_max_linkspeed(mdev, &link_speed);
pci_bw = pcie_bandwidth_available(mdev->pdev, NULL, NULL, NULL);
mlx5_core_dbg_once(mdev, "Max link speed = %d, PCI BW = %d\n",
link_speed, pci_bw);
@@ -4049,18 +4094,48 @@
link_speed > MLX5E_SLOW_PCI_RATIO * pci_bw;
}
+static struct net_dim_cq_moder mlx5e_get_def_tx_moderation(u8 cq_period_mode)
+{
+ struct net_dim_cq_moder moder;
+
+ moder.cq_period_mode = cq_period_mode;
+ moder.pkts = MLX5E_PARAMS_DEFAULT_TX_CQ_MODERATION_PKTS;
+ moder.usec = MLX5E_PARAMS_DEFAULT_TX_CQ_MODERATION_USEC;
+ if (cq_period_mode == MLX5_CQ_PERIOD_MODE_START_FROM_CQE)
+ moder.usec = MLX5E_PARAMS_DEFAULT_TX_CQ_MODERATION_USEC_FROM_CQE;
+
+ return moder;
+}
+
+static struct net_dim_cq_moder mlx5e_get_def_rx_moderation(u8 cq_period_mode)
+{
+ struct net_dim_cq_moder moder;
+
+ moder.cq_period_mode = cq_period_mode;
+ moder.pkts = MLX5E_PARAMS_DEFAULT_RX_CQ_MODERATION_PKTS;
+ moder.usec = MLX5E_PARAMS_DEFAULT_RX_CQ_MODERATION_USEC;
+ if (cq_period_mode == MLX5_CQ_PERIOD_MODE_START_FROM_CQE)
+ moder.usec = MLX5E_PARAMS_DEFAULT_RX_CQ_MODERATION_USEC_FROM_CQE;
+
+ return moder;
+}
+
+static u8 mlx5_to_net_dim_cq_period_mode(u8 cq_period_mode)
+{
+ return cq_period_mode == MLX5_CQ_PERIOD_MODE_START_FROM_CQE ?
+ NET_DIM_CQ_PERIOD_MODE_START_FROM_CQE :
+ NET_DIM_CQ_PERIOD_MODE_START_FROM_EQE;
+}
+
void mlx5e_set_tx_cq_mode_params(struct mlx5e_params *params, u8 cq_period_mode)
{
- params->tx_cq_moderation.cq_period_mode = cq_period_mode;
+ if (params->tx_dim_enabled) {
+ u8 dim_period_mode = mlx5_to_net_dim_cq_period_mode(cq_period_mode);
- params->tx_cq_moderation.pkts =
- MLX5E_PARAMS_DEFAULT_TX_CQ_MODERATION_PKTS;
- params->tx_cq_moderation.usec =
- MLX5E_PARAMS_DEFAULT_TX_CQ_MODERATION_USEC;
-
- if (cq_period_mode == MLX5_CQ_PERIOD_MODE_START_FROM_CQE)
- params->tx_cq_moderation.usec =
- MLX5E_PARAMS_DEFAULT_TX_CQ_MODERATION_USEC_FROM_CQE;
+ params->tx_cq_moderation = net_dim_get_def_tx_moderation(dim_period_mode);
+ } else {
+ params->tx_cq_moderation = mlx5e_get_def_tx_moderation(cq_period_mode);
+ }
MLX5E_SET_PFLAG(params, MLX5E_PFLAG_TX_CQE_BASED_MODER,
params->tx_cq_moderation.cq_period_mode ==
@@ -4069,28 +4144,12 @@
void mlx5e_set_rx_cq_mode_params(struct mlx5e_params *params, u8 cq_period_mode)
{
- params->rx_cq_moderation.cq_period_mode = cq_period_mode;
-
- params->rx_cq_moderation.pkts =
- MLX5E_PARAMS_DEFAULT_RX_CQ_MODERATION_PKTS;
- params->rx_cq_moderation.usec =
- MLX5E_PARAMS_DEFAULT_RX_CQ_MODERATION_USEC;
-
- if (cq_period_mode == MLX5_CQ_PERIOD_MODE_START_FROM_CQE)
- params->rx_cq_moderation.usec =
- MLX5E_PARAMS_DEFAULT_RX_CQ_MODERATION_USEC_FROM_CQE;
-
if (params->rx_dim_enabled) {
- switch (cq_period_mode) {
- case MLX5_CQ_PERIOD_MODE_START_FROM_CQE:
- params->rx_cq_moderation =
- net_dim_get_def_profile(NET_DIM_CQ_PERIOD_MODE_START_FROM_CQE);
- break;
- case MLX5_CQ_PERIOD_MODE_START_FROM_EQE:
- default:
- params->rx_cq_moderation =
- net_dim_get_def_profile(NET_DIM_CQ_PERIOD_MODE_START_FROM_EQE);
- }
+ u8 dim_period_mode = mlx5_to_net_dim_cq_period_mode(cq_period_mode);
+
+ params->rx_cq_moderation = net_dim_get_def_rx_moderation(dim_period_mode);
+ } else {
+ params->rx_cq_moderation = mlx5e_get_def_rx_moderation(cq_period_mode);
}
MLX5E_SET_PFLAG(params, MLX5E_PFLAG_RX_CQE_BASED_MODER,
@@ -4154,6 +4213,7 @@
MLX5_CQ_PERIOD_MODE_START_FROM_CQE :
MLX5_CQ_PERIOD_MODE_START_FROM_EQE;
params->rx_dim_enabled = MLX5_CAP_GEN(mdev, cq_moderation);
+ params->tx_dim_enabled = MLX5_CAP_GEN(mdev, cq_moderation);
mlx5e_set_rx_cq_mode_params(params, rx_cq_period_mode);
mlx5e_set_tx_cq_mode_params(params, MLX5_CQ_PERIOD_MODE_START_FROM_EQE);
@@ -4320,6 +4380,7 @@
#endif
mlx5e_ipsec_build_netdev(priv);
+ mlx5e_tls_build_netdev(priv);
}
static void mlx5e_create_q_counters(struct mlx5e_priv *priv)
@@ -4361,12 +4422,16 @@
err = mlx5e_ipsec_init(priv);
if (err)
mlx5_core_err(mdev, "IPSec initialization failed, %d\n", err);
+ err = mlx5e_tls_init(priv);
+ if (err)
+ mlx5_core_err(mdev, "TLS initialization failed, %d\n", err);
mlx5e_build_nic_netdev(netdev);
mlx5e_vxlan_init(priv);
}
static void mlx5e_nic_cleanup(struct mlx5e_priv *priv)
{
+ mlx5e_tls_cleanup(priv);
mlx5e_ipsec_cleanup(priv);
mlx5e_vxlan_cleanup(priv);
}
@@ -4398,7 +4463,7 @@
goto err_destroy_direct_tirs;
}
- err = mlx5e_tc_init(priv);
+ err = mlx5e_tc_nic_init(priv);
if (err)
goto err_destroy_flow_steering;
@@ -4419,7 +4484,7 @@
static void mlx5e_cleanup_nic_rx(struct mlx5e_priv *priv)
{
- mlx5e_tc_cleanup(priv);
+ mlx5e_tc_nic_cleanup(priv);
mlx5e_destroy_flow_steering(priv);
mlx5e_destroy_direct_tirs(priv);
mlx5e_destroy_indirect_tirs(priv);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
index 876c3e4..c3034f5 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
@@ -66,18 +66,36 @@
{ MLX5E_DECLARE_STAT(struct mlx5e_sw_stats, tx_bytes) },
};
-#define NUM_VPORT_REP_COUNTERS ARRAY_SIZE(sw_rep_stats_desc)
+struct vport_stats {
+ u64 vport_rx_packets;
+ u64 vport_tx_packets;
+ u64 vport_rx_bytes;
+ u64 vport_tx_bytes;
+};
+
+static const struct counter_desc vport_rep_stats_desc[] = {
+ { MLX5E_DECLARE_STAT(struct vport_stats, vport_rx_packets) },
+ { MLX5E_DECLARE_STAT(struct vport_stats, vport_rx_bytes) },
+ { MLX5E_DECLARE_STAT(struct vport_stats, vport_tx_packets) },
+ { MLX5E_DECLARE_STAT(struct vport_stats, vport_tx_bytes) },
+};
+
+#define NUM_VPORT_REP_SW_COUNTERS ARRAY_SIZE(sw_rep_stats_desc)
+#define NUM_VPORT_REP_HW_COUNTERS ARRAY_SIZE(vport_rep_stats_desc)
static void mlx5e_rep_get_strings(struct net_device *dev,
u32 stringset, uint8_t *data)
{
- int i;
+ int i, j;
switch (stringset) {
case ETH_SS_STATS:
- for (i = 0; i < NUM_VPORT_REP_COUNTERS; i++)
+ for (i = 0; i < NUM_VPORT_REP_SW_COUNTERS; i++)
strcpy(data + (i * ETH_GSTRING_LEN),
sw_rep_stats_desc[i].format);
+ for (j = 0; j < NUM_VPORT_REP_HW_COUNTERS; j++, i++)
+ strcpy(data + (i * ETH_GSTRING_LEN),
+ vport_rep_stats_desc[j].format);
break;
}
}
@@ -140,7 +158,7 @@
struct ethtool_stats *stats, u64 *data)
{
struct mlx5e_priv *priv = netdev_priv(dev);
- int i;
+ int i, j;
if (!data)
return;
@@ -148,18 +166,23 @@
mutex_lock(&priv->state_lock);
if (test_bit(MLX5E_STATE_OPENED, &priv->state))
mlx5e_rep_update_sw_counters(priv);
+ mlx5e_rep_update_hw_counters(priv);
mutex_unlock(&priv->state_lock);
- for (i = 0; i < NUM_VPORT_REP_COUNTERS; i++)
+ for (i = 0; i < NUM_VPORT_REP_SW_COUNTERS; i++)
data[i] = MLX5E_READ_CTR64_CPU(&priv->stats.sw,
sw_rep_stats_desc, i);
+
+ for (j = 0; j < NUM_VPORT_REP_HW_COUNTERS; j++, i++)
+ data[i] = MLX5E_READ_CTR64_CPU(&priv->stats.vf_vport,
+ vport_rep_stats_desc, j);
}
static int mlx5e_rep_get_sset_count(struct net_device *dev, int sset)
{
switch (sset) {
case ETH_SS_STATS:
- return NUM_VPORT_REP_COUNTERS;
+ return NUM_VPORT_REP_SW_COUNTERS + NUM_VPORT_REP_HW_COUNTERS;
default:
return -EOPNOTSUPP;
}
@@ -681,8 +704,8 @@
goto unlock;
if (!mlx5_modify_vport_admin_state(priv->mdev,
- MLX5_QUERY_VPORT_STATE_IN_OP_MOD_ESW_VPORT,
- rep->vport, MLX5_ESW_VPORT_ADMIN_STATE_UP))
+ MLX5_QUERY_VPORT_STATE_IN_OP_MOD_ESW_VPORT,
+ rep->vport, MLX5_ESW_VPORT_ADMIN_STATE_UP))
netif_carrier_on(dev);
unlock:
@@ -699,8 +722,8 @@
mutex_lock(&priv->state_lock);
mlx5_modify_vport_admin_state(priv->mdev,
- MLX5_QUERY_VPORT_STATE_IN_OP_MOD_ESW_VPORT,
- rep->vport, MLX5_ESW_VPORT_ADMIN_STATE_DOWN);
+ MLX5_QUERY_VPORT_STATE_IN_OP_MOD_ESW_VPORT,
+ rep->vport, MLX5_ESW_VPORT_ADMIN_STATE_DOWN);
ret = mlx5e_close_locked(dev);
mutex_unlock(&priv->state_lock);
return ret;
@@ -723,15 +746,31 @@
static int
mlx5e_rep_setup_tc_cls_flower(struct mlx5e_priv *priv,
- struct tc_cls_flower_offload *cls_flower)
+ struct tc_cls_flower_offload *cls_flower, int flags)
{
switch (cls_flower->command) {
case TC_CLSFLOWER_REPLACE:
- return mlx5e_configure_flower(priv, cls_flower);
+ return mlx5e_configure_flower(priv, cls_flower, flags);
case TC_CLSFLOWER_DESTROY:
- return mlx5e_delete_flower(priv, cls_flower);
+ return mlx5e_delete_flower(priv, cls_flower, flags);
case TC_CLSFLOWER_STATS:
- return mlx5e_stats_flower(priv, cls_flower);
+ return mlx5e_stats_flower(priv, cls_flower, flags);
+ default:
+ return -EOPNOTSUPP;
+ }
+}
+
+static int mlx5e_rep_setup_tc_cb_egdev(enum tc_setup_type type, void *type_data,
+ void *cb_priv)
+{
+ struct mlx5e_priv *priv = cb_priv;
+
+ if (!tc_cls_can_offload_and_chain0(priv->netdev, type_data))
+ return -EOPNOTSUPP;
+
+ switch (type) {
+ case TC_SETUP_CLSFLOWER:
+ return mlx5e_rep_setup_tc_cls_flower(priv, type_data, MLX5E_TC_EGRESS);
default:
return -EOPNOTSUPP;
}
@@ -747,7 +786,7 @@
switch (type) {
case TC_SETUP_CLSFLOWER:
- return mlx5e_rep_setup_tc_cls_flower(priv, type_data);
+ return mlx5e_rep_setup_tc_cls_flower(priv, type_data, MLX5E_TC_INGRESS);
default:
return -EOPNOTSUPP;
}
@@ -965,14 +1004,8 @@
}
rpriv->vport_rx_rule = flow_rule;
- err = mlx5e_tc_init(priv);
- if (err)
- goto err_del_flow_rule;
-
return 0;
-err_del_flow_rule:
- mlx5_del_flow_rules(rpriv->vport_rx_rule);
err_destroy_direct_tirs:
mlx5e_destroy_direct_tirs(priv);
err_destroy_direct_rqts:
@@ -984,7 +1017,6 @@
{
struct mlx5e_rep_priv *rpriv = priv->ppriv;
- mlx5e_tc_cleanup(priv);
mlx5_del_flow_rules(rpriv->vport_rx_rule);
mlx5e_destroy_direct_tirs(priv);
mlx5e_destroy_direct_rqts(priv);
@@ -1042,8 +1074,15 @@
if (err)
goto err_remove_sqs;
+ /* init shared tc flow table */
+ err = mlx5e_tc_esw_init(&rpriv->tc_ht);
+ if (err)
+ goto err_neigh_cleanup;
+
return 0;
+err_neigh_cleanup:
+ mlx5e_rep_neigh_cleanup(rpriv);
err_remove_sqs:
mlx5e_remove_sqs_fwd_rules(priv);
return err;
@@ -1058,9 +1097,8 @@
if (test_bit(MLX5E_STATE_OPENED, &priv->state))
mlx5e_remove_sqs_fwd_rules(priv);
- /* clean (and re-init) existing uplink offloaded TC rules */
- mlx5e_tc_cleanup(priv);
- mlx5e_tc_init(priv);
+ /* clean uplink offloaded TC rules, delete shared tc flow table */
+ mlx5e_tc_esw_cleanup(&rpriv->tc_ht);
mlx5e_rep_neigh_cleanup(rpriv);
}
@@ -1107,7 +1145,7 @@
uplink_rpriv = mlx5_eswitch_get_uplink_priv(dev->priv.eswitch, REP_ETH);
upriv = netdev_priv(uplink_rpriv->netdev);
- err = tc_setup_cb_egdev_register(netdev, mlx5e_setup_tc_block_cb,
+ err = tc_setup_cb_egdev_register(netdev, mlx5e_rep_setup_tc_cb_egdev,
upriv);
if (err)
goto err_neigh_cleanup;
@@ -1122,7 +1160,7 @@
return 0;
err_egdev_cleanup:
- tc_setup_cb_egdev_unregister(netdev, mlx5e_setup_tc_block_cb,
+ tc_setup_cb_egdev_unregister(netdev, mlx5e_rep_setup_tc_cb_egdev,
upriv);
err_neigh_cleanup:
@@ -1151,7 +1189,7 @@
uplink_rpriv = mlx5_eswitch_get_uplink_priv(priv->mdev->priv.eswitch,
REP_ETH);
upriv = netdev_priv(uplink_rpriv->netdev);
- tc_setup_cb_egdev_unregister(netdev, mlx5e_setup_tc_block_cb,
+ tc_setup_cb_egdev_unregister(netdev, mlx5e_rep_setup_tc_cb_egdev,
upriv);
mlx5e_rep_neigh_cleanup(rpriv);
mlx5e_detach_netdev(priv);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.h b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.h
index b9b481f..844d32d 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.h
@@ -59,6 +59,7 @@
struct net_device *netdev;
struct mlx5_flow_handle *vport_rx_rule;
struct list_head vport_sqs_list;
+ struct rhashtable tc_ht; /* valid for uplink rep */
};
static inline
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
index 1ff0b0e..a6a92c4 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
@@ -37,6 +37,7 @@
#include <linux/bpf_trace.h>
#include <net/busy_poll.h>
#include <net/ip6_checksum.h>
+#include <net/page_pool.h>
#include "en.h"
#include "en_tc.h"
#include "eswitch.h"
@@ -221,7 +222,7 @@
if (mlx5e_rx_cache_get(rq, dma_info))
return 0;
- dma_info->page = dev_alloc_pages(rq->buff.page_order);
+ dma_info->page = page_pool_dev_alloc_pages(rq->page_pool);
if (unlikely(!dma_info->page))
return -ENOMEM;
@@ -236,15 +237,26 @@
return 0;
}
+static void mlx5e_page_dma_unmap(struct mlx5e_rq *rq,
+ struct mlx5e_dma_info *dma_info)
+{
+ dma_unmap_page(rq->pdev, dma_info->addr, RQ_PAGE_SIZE(rq),
+ rq->buff.map_dir);
+}
+
void mlx5e_page_release(struct mlx5e_rq *rq, struct mlx5e_dma_info *dma_info,
bool recycle)
{
- if (likely(recycle) && mlx5e_rx_cache_put(rq, dma_info))
- return;
+ if (likely(recycle)) {
+ if (mlx5e_rx_cache_put(rq, dma_info))
+ return;
- dma_unmap_page(rq->pdev, dma_info->addr, RQ_PAGE_SIZE(rq),
- rq->buff.map_dir);
- put_page(dma_info->page);
+ mlx5e_page_dma_unmap(rq, dma_info);
+ page_pool_recycle_direct(rq->page_pool, dma_info->page);
+ } else {
+ mlx5e_page_dma_unmap(rq, dma_info);
+ put_page(dma_info->page);
+ }
}
static inline bool mlx5e_page_reuse(struct mlx5e_rq *rq,
@@ -438,7 +450,7 @@
struct mlx5_wq_ll *wq = &rq->wq;
int err;
- if (unlikely(!MLX5E_TEST_BIT(rq->state, MLX5E_RQ_STATE_ENABLED)))
+ if (unlikely(!test_bit(MLX5E_RQ_STATE_ENABLED, &rq->state)))
return false;
if (mlx5_wq_ll_is_full(wq))
@@ -496,7 +508,7 @@
struct mlx5e_icosq *sq = container_of(cq, struct mlx5e_icosq, cq);
struct mlx5_cqe64 *cqe;
- if (unlikely(!MLX5E_TEST_BIT(sq->state, MLX5E_SQ_STATE_ENABLED)))
+ if (unlikely(!test_bit(MLX5E_SQ_STATE_ENABLED, &sq->state)))
return;
cqe = mlx5_cqwq_get_cqe(&cq->wq);
@@ -513,7 +525,7 @@
{
struct mlx5_wq_ll *wq = &rq->wq;
- if (unlikely(!MLX5E_TEST_BIT(rq->state, MLX5E_RQ_STATE_ENABLED)))
+ if (unlikely(!test_bit(MLX5E_RQ_STATE_ENABLED, &rq->state)))
return false;
mlx5e_poll_ico_cq(&rq->channel->icosq.cq, rq);
@@ -711,11 +723,10 @@
struct mlx5e_rq *rq,
struct sk_buff *skb)
{
+ u8 lro_num_seg = be32_to_cpu(cqe->srqn) >> 24;
struct net_device *netdev = rq->netdev;
- int lro_num_seg;
skb->mac_len = ETH_HLEN;
- lro_num_seg = be32_to_cpu(cqe->srqn) >> 24;
if (lro_num_seg > 1) {
mlx5e_lro_update_hdr(skb, cqe, cqe_bcnt);
skb_shinfo(skb)->gso_size = DIV_ROUND_UP(cqe_bcnt, lro_num_seg);
@@ -838,13 +849,14 @@
}
/* returns true if packet was consumed by xdp */
-static inline int mlx5e_xdp_handle(struct mlx5e_rq *rq,
- struct mlx5e_dma_info *di,
- void *va, u16 *rx_headroom, u32 *len)
+static inline bool mlx5e_xdp_handle(struct mlx5e_rq *rq,
+ struct mlx5e_dma_info *di,
+ void *va, u16 *rx_headroom, u32 *len)
{
- const struct bpf_prog *prog = READ_ONCE(rq->xdp_prog);
+ struct bpf_prog *prog = READ_ONCE(rq->xdp_prog);
struct xdp_buff xdp;
u32 act;
+ int err;
if (!prog)
return false;
@@ -865,6 +877,15 @@
if (unlikely(!mlx5e_xmit_xdp_frame(rq, di, &xdp)))
trace_xdp_exception(rq->netdev, prog, act);
return true;
+ case XDP_REDIRECT:
+ /* When XDP enabled then page-refcnt==1 here */
+ err = xdp_do_redirect(rq->netdev, &xdp, prog);
+ if (!err) {
+ __set_bit(MLX5E_RQ_FLAG_XDP_XMIT, rq->flags);
+ rq->xdpsq.db.redirect_flush = true;
+ mlx5e_page_dma_unmap(rq, di);
+ }
+ return true;
default:
bpf_warn_invalid_xdp_action(act);
case XDP_ABORTED:
@@ -910,6 +931,7 @@
dma_sync_single_range_for_cpu(rq->pdev, di->addr, wi->offset,
frag_size, DMA_FROM_DEVICE);
+ prefetchw(va); /* xdp_frame data area */
prefetch(data);
wi->offset += frag_size;
@@ -1152,7 +1174,7 @@
struct mlx5_cqe64 *cqe;
int work_done = 0;
- if (unlikely(!MLX5E_TEST_BIT(rq->state, MLX5E_RQ_STATE_ENABLED)))
+ if (unlikely(!test_bit(MLX5E_RQ_STATE_ENABLED, &rq->state)))
return 0;
if (cq->decmprs_left)
@@ -1182,6 +1204,11 @@
xdpsq->db.doorbell = false;
}
+ if (xdpsq->db.redirect_flush) {
+ xdp_do_flush_map();
+ xdpsq->db.redirect_flush = false;
+ }
+
mlx5_cqwq_update_db_record(&cq->wq);
/* ensure cq space is freed before enabling more cqes */
@@ -1200,7 +1227,7 @@
sq = container_of(cq, struct mlx5e_xdpsq, cq);
- if (unlikely(!MLX5E_TEST_BIT(sq->state, MLX5E_SQ_STATE_ENABLED)))
+ if (unlikely(!test_bit(MLX5E_SQ_STATE_ENABLED, &sq->state)))
return false;
cqe = mlx5_cqwq_get_cqe(&cq->wq);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_stats.c b/drivers/net/ethernet/mellanox/mlx5/core/en_stats.c
index b08c944..e17919c 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_stats.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_stats.c
@@ -32,6 +32,7 @@
#include "en.h"
#include "en_accel/ipsec.h"
+#include "en_accel/tls.h"
static const struct counter_desc sw_stats_desc[] = {
{ MLX5E_DECLARE_STAT(struct mlx5e_sw_stats, rx_packets) },
@@ -43,6 +44,12 @@
{ MLX5E_DECLARE_STAT(struct mlx5e_sw_stats, tx_tso_inner_packets) },
{ MLX5E_DECLARE_STAT(struct mlx5e_sw_stats, tx_tso_inner_bytes) },
{ MLX5E_DECLARE_STAT(struct mlx5e_sw_stats, tx_added_vlan_packets) },
+
+#ifdef CONFIG_MLX5_EN_TLS
+ { MLX5E_DECLARE_STAT(struct mlx5e_sw_stats, tx_tls_ooo) },
+ { MLX5E_DECLARE_STAT(struct mlx5e_sw_stats, tx_tls_resync_bytes) },
+#endif
+
{ MLX5E_DECLARE_STAT(struct mlx5e_sw_stats, rx_lro_packets) },
{ MLX5E_DECLARE_STAT(struct mlx5e_sw_stats, rx_lro_bytes) },
{ MLX5E_DECLARE_STAT(struct mlx5e_sw_stats, rx_removed_vlan_packets) },
@@ -161,6 +168,10 @@
s->tx_csum_partial_inner += sq_stats->csum_partial_inner;
s->tx_csum_none += sq_stats->csum_none;
s->tx_csum_partial += sq_stats->csum_partial;
+#ifdef CONFIG_MLX5_EN_TLS
+ s->tx_tls_ooo += sq_stats->tls_ooo;
+ s->tx_tls_resync_bytes += sq_stats->tls_resync_bytes;
+#endif
}
}
@@ -1065,6 +1076,22 @@
mlx5e_ipsec_update_stats(priv);
}
+static int mlx5e_grp_tls_get_num_stats(struct mlx5e_priv *priv)
+{
+ return mlx5e_tls_get_count(priv);
+}
+
+static int mlx5e_grp_tls_fill_strings(struct mlx5e_priv *priv, u8 *data,
+ int idx)
+{
+ return idx + mlx5e_tls_get_strings(priv, data + idx * ETH_GSTRING_LEN);
+}
+
+static int mlx5e_grp_tls_fill_stats(struct mlx5e_priv *priv, u64 *data, int idx)
+{
+ return idx + mlx5e_tls_get_stats(priv, data + idx);
+}
+
static const struct counter_desc rq_stats_desc[] = {
{ MLX5E_DECLARE_RX_STAT(struct mlx5e_rq_stats, packets) },
{ MLX5E_DECLARE_RX_STAT(struct mlx5e_rq_stats, bytes) },
@@ -1268,6 +1295,11 @@
.update_stats = mlx5e_grp_ipsec_update_stats,
},
{
+ .get_num_stats = mlx5e_grp_tls_get_num_stats,
+ .fill_strings = mlx5e_grp_tls_fill_strings,
+ .fill_stats = mlx5e_grp_tls_fill_stats,
+ },
+ {
.get_num_stats = mlx5e_grp_channels_get_num_stats,
.fill_strings = mlx5e_grp_channels_fill_strings,
.fill_stats = mlx5e_grp_channels_fill_stats,
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_stats.h b/drivers/net/ethernet/mellanox/mlx5/core/en_stats.h
index 53111a2..a36e6a8 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_stats.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_stats.h
@@ -93,6 +93,11 @@
u64 rx_cache_waive;
u64 ch_eq_rearm;
+#ifdef CONFIG_MLX5_EN_TLS
+ u64 tx_tls_ooo;
+ u64 tx_tls_resync_bytes;
+#endif
+
/* Special handling counters */
u64 link_down_events_phy;
};
@@ -194,6 +199,10 @@
u64 csum_partial_inner;
u64 added_vlan_packets;
u64 nop;
+#ifdef CONFIG_MLX5_EN_TLS
+ u64 tls_ooo;
+ u64 tls_resync_bytes;
+#endif
/* less likely accessed in data path */
u64 csum_none;
u64 stopped;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
index b94276d..a9c96fe 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
@@ -52,25 +52,32 @@
#include "eswitch.h"
#include "vxlan.h"
#include "fs_core.h"
+#include "en/port.h"
struct mlx5_nic_flow_attr {
u32 action;
u32 flow_tag;
u32 mod_hdr_id;
u32 hairpin_tirn;
+ u8 match_level;
struct mlx5_flow_table *hairpin_ft;
};
+#define MLX5E_TC_FLOW_BASE (MLX5E_TC_LAST_EXPORTED_BIT + 1)
+
enum {
- MLX5E_TC_FLOW_ESWITCH = BIT(0),
- MLX5E_TC_FLOW_NIC = BIT(1),
- MLX5E_TC_FLOW_OFFLOADED = BIT(2),
- MLX5E_TC_FLOW_HAIRPIN = BIT(3),
- MLX5E_TC_FLOW_HAIRPIN_RSS = BIT(4),
+ MLX5E_TC_FLOW_INGRESS = MLX5E_TC_INGRESS,
+ MLX5E_TC_FLOW_EGRESS = MLX5E_TC_EGRESS,
+ MLX5E_TC_FLOW_ESWITCH = BIT(MLX5E_TC_FLOW_BASE),
+ MLX5E_TC_FLOW_NIC = BIT(MLX5E_TC_FLOW_BASE + 1),
+ MLX5E_TC_FLOW_OFFLOADED = BIT(MLX5E_TC_FLOW_BASE + 2),
+ MLX5E_TC_FLOW_HAIRPIN = BIT(MLX5E_TC_FLOW_BASE + 3),
+ MLX5E_TC_FLOW_HAIRPIN_RSS = BIT(MLX5E_TC_FLOW_BASE + 4),
};
struct mlx5e_tc_flow {
struct rhash_head node;
+ struct mlx5e_priv *priv;
u64 cookie;
u8 flags;
struct mlx5_flow_handle *rule;
@@ -97,7 +104,7 @@
};
#define MLX5E_TC_TABLE_NUM_GROUPS 4
-#define MLX5E_TC_TABLE_MAX_GROUP_SIZE (1 << 16)
+#define MLX5E_TC_TABLE_MAX_GROUP_SIZE BIT(16)
struct mlx5e_hairpin {
struct mlx5_hairpin *pair;
@@ -607,7 +614,7 @@
params.q_counter = priv->q_counter;
/* set hairpin pair per each 50Gbs share of the link */
- mlx5e_get_max_linkspeed(priv->mdev, &link_speed);
+ mlx5e_port_max_linkspeed(priv->mdev, &link_speed);
link_speed = max_t(u32, link_speed, 50000);
link_speed64 = link_speed;
do_div(link_speed64, 50000);
@@ -753,7 +760,9 @@
table_created = true;
}
- parse_attr->spec.match_criteria_enable = MLX5_MATCH_OUTER_HEADERS;
+ if (attr->match_level != MLX5_MATCH_NONE)
+ parse_attr->spec.match_criteria_enable = MLX5_MATCH_OUTER_HEADERS;
+
rule = mlx5_add_flow_rules(priv->fs.tc.t, &parse_attr->spec,
&flow_act, dest, dest_ix);
@@ -789,7 +798,7 @@
mlx5_del_flow_rules(flow->rule);
mlx5_fc_destroy(priv->mdev, counter);
- if (!mlx5e_tc_num_filters(priv) && (priv->fs.tc.t)) {
+ if (!mlx5e_tc_num_filters(priv) && priv->fs.tc.t) {
mlx5_destroy_flow_table(priv->fs.tc.t);
priv->fs.tc.t = NULL;
}
@@ -836,6 +845,7 @@
out_priv = netdev_priv(encap_dev);
rpriv = out_priv->ppriv;
attr->out_rep = rpriv->rep;
+ attr->out_mdev = out_priv->mdev;
}
err = mlx5_eswitch_add_vlan_action(esw, attr);
@@ -982,6 +992,8 @@
}
}
}
+ if (neigh_used)
+ break;
}
if (neigh_used) {
@@ -1190,7 +1202,7 @@
static int __parse_cls_flower(struct mlx5e_priv *priv,
struct mlx5_flow_spec *spec,
struct tc_cls_flower_offload *f,
- u8 *min_inline)
+ u8 *match_level)
{
void *headers_c = MLX5_ADDR_OF(fte_match_param, spec->match_criteria,
outer_headers);
@@ -1199,7 +1211,7 @@
u16 addr_type = 0;
u8 ip_proto = 0;
- *min_inline = MLX5_INLINE_MODE_L2;
+ *match_level = MLX5_MATCH_NONE;
if (f->dissector->used_keys &
~(BIT(FLOW_DISSECTOR_KEY_CONTROL) |
@@ -1249,58 +1261,6 @@
inner_headers);
}
- if (dissector_uses_key(f->dissector, FLOW_DISSECTOR_KEY_CONTROL)) {
- struct flow_dissector_key_control *key =
- skb_flow_dissector_target(f->dissector,
- FLOW_DISSECTOR_KEY_CONTROL,
- f->key);
-
- struct flow_dissector_key_control *mask =
- skb_flow_dissector_target(f->dissector,
- FLOW_DISSECTOR_KEY_CONTROL,
- f->mask);
- addr_type = key->addr_type;
-
- /* the HW doesn't support frag first/later */
- if (mask->flags & FLOW_DIS_FIRST_FRAG)
- return -EOPNOTSUPP;
-
- if (mask->flags & FLOW_DIS_IS_FRAGMENT) {
- MLX5_SET(fte_match_set_lyr_2_4, headers_c, frag, 1);
- MLX5_SET(fte_match_set_lyr_2_4, headers_v, frag,
- key->flags & FLOW_DIS_IS_FRAGMENT);
-
- /* the HW doesn't need L3 inline to match on frag=no */
- if (key->flags & FLOW_DIS_IS_FRAGMENT)
- *min_inline = MLX5_INLINE_MODE_IP;
- }
- }
-
- if (dissector_uses_key(f->dissector, FLOW_DISSECTOR_KEY_BASIC)) {
- struct flow_dissector_key_basic *key =
- skb_flow_dissector_target(f->dissector,
- FLOW_DISSECTOR_KEY_BASIC,
- f->key);
- struct flow_dissector_key_basic *mask =
- skb_flow_dissector_target(f->dissector,
- FLOW_DISSECTOR_KEY_BASIC,
- f->mask);
- ip_proto = key->ip_proto;
-
- MLX5_SET(fte_match_set_lyr_2_4, headers_c, ethertype,
- ntohs(mask->n_proto));
- MLX5_SET(fte_match_set_lyr_2_4, headers_v, ethertype,
- ntohs(key->n_proto));
-
- MLX5_SET(fte_match_set_lyr_2_4, headers_c, ip_protocol,
- mask->ip_proto);
- MLX5_SET(fte_match_set_lyr_2_4, headers_v, ip_protocol,
- key->ip_proto);
-
- if (mask->ip_proto)
- *min_inline = MLX5_INLINE_MODE_IP;
- }
-
if (dissector_uses_key(f->dissector, FLOW_DISSECTOR_KEY_ETH_ADDRS)) {
struct flow_dissector_key_eth_addrs *key =
skb_flow_dissector_target(f->dissector,
@@ -1324,6 +1284,9 @@
ether_addr_copy(MLX5_ADDR_OF(fte_match_set_lyr_2_4, headers_v,
smac_47_16),
key->src);
+
+ if (!is_zero_ether_addr(mask->src) || !is_zero_ether_addr(mask->dst))
+ *match_level = MLX5_MATCH_L2;
}
if (dissector_uses_key(f->dissector, FLOW_DISSECTOR_KEY_VLAN)) {
@@ -1344,9 +1307,79 @@
MLX5_SET(fte_match_set_lyr_2_4, headers_c, first_prio, mask->vlan_priority);
MLX5_SET(fte_match_set_lyr_2_4, headers_v, first_prio, key->vlan_priority);
+
+ *match_level = MLX5_MATCH_L2;
}
}
+ if (dissector_uses_key(f->dissector, FLOW_DISSECTOR_KEY_BASIC)) {
+ struct flow_dissector_key_basic *key =
+ skb_flow_dissector_target(f->dissector,
+ FLOW_DISSECTOR_KEY_BASIC,
+ f->key);
+ struct flow_dissector_key_basic *mask =
+ skb_flow_dissector_target(f->dissector,
+ FLOW_DISSECTOR_KEY_BASIC,
+ f->mask);
+ MLX5_SET(fte_match_set_lyr_2_4, headers_c, ethertype,
+ ntohs(mask->n_proto));
+ MLX5_SET(fte_match_set_lyr_2_4, headers_v, ethertype,
+ ntohs(key->n_proto));
+
+ if (mask->n_proto)
+ *match_level = MLX5_MATCH_L2;
+ }
+
+ if (dissector_uses_key(f->dissector, FLOW_DISSECTOR_KEY_CONTROL)) {
+ struct flow_dissector_key_control *key =
+ skb_flow_dissector_target(f->dissector,
+ FLOW_DISSECTOR_KEY_CONTROL,
+ f->key);
+
+ struct flow_dissector_key_control *mask =
+ skb_flow_dissector_target(f->dissector,
+ FLOW_DISSECTOR_KEY_CONTROL,
+ f->mask);
+ addr_type = key->addr_type;
+
+ /* the HW doesn't support frag first/later */
+ if (mask->flags & FLOW_DIS_FIRST_FRAG)
+ return -EOPNOTSUPP;
+
+ if (mask->flags & FLOW_DIS_IS_FRAGMENT) {
+ MLX5_SET(fte_match_set_lyr_2_4, headers_c, frag, 1);
+ MLX5_SET(fte_match_set_lyr_2_4, headers_v, frag,
+ key->flags & FLOW_DIS_IS_FRAGMENT);
+
+ /* the HW doesn't need L3 inline to match on frag=no */
+ if (!(key->flags & FLOW_DIS_IS_FRAGMENT))
+ *match_level = MLX5_INLINE_MODE_L2;
+ /* *** L2 attributes parsing up to here *** */
+ else
+ *match_level = MLX5_INLINE_MODE_IP;
+ }
+ }
+
+ if (dissector_uses_key(f->dissector, FLOW_DISSECTOR_KEY_BASIC)) {
+ struct flow_dissector_key_basic *key =
+ skb_flow_dissector_target(f->dissector,
+ FLOW_DISSECTOR_KEY_BASIC,
+ f->key);
+ struct flow_dissector_key_basic *mask =
+ skb_flow_dissector_target(f->dissector,
+ FLOW_DISSECTOR_KEY_BASIC,
+ f->mask);
+ ip_proto = key->ip_proto;
+
+ MLX5_SET(fte_match_set_lyr_2_4, headers_c, ip_protocol,
+ mask->ip_proto);
+ MLX5_SET(fte_match_set_lyr_2_4, headers_v, ip_protocol,
+ key->ip_proto);
+
+ if (mask->ip_proto)
+ *match_level = MLX5_MATCH_L3;
+ }
+
if (addr_type == FLOW_DISSECTOR_KEY_IPV4_ADDRS) {
struct flow_dissector_key_ipv4_addrs *key =
skb_flow_dissector_target(f->dissector,
@@ -1371,7 +1404,7 @@
&key->dst, sizeof(key->dst));
if (mask->src || mask->dst)
- *min_inline = MLX5_INLINE_MODE_IP;
+ *match_level = MLX5_MATCH_L3;
}
if (addr_type == FLOW_DISSECTOR_KEY_IPV6_ADDRS) {
@@ -1400,7 +1433,7 @@
if (ipv6_addr_type(&mask->src) != IPV6_ADDR_ANY ||
ipv6_addr_type(&mask->dst) != IPV6_ADDR_ANY)
- *min_inline = MLX5_INLINE_MODE_IP;
+ *match_level = MLX5_MATCH_L3;
}
if (dissector_uses_key(f->dissector, FLOW_DISSECTOR_KEY_IP)) {
@@ -1428,9 +1461,11 @@
return -EOPNOTSUPP;
if (mask->tos || mask->ttl)
- *min_inline = MLX5_INLINE_MODE_IP;
+ *match_level = MLX5_MATCH_L3;
}
+ /* *** L3 attributes parsing up to here *** */
+
if (dissector_uses_key(f->dissector, FLOW_DISSECTOR_KEY_PORTS)) {
struct flow_dissector_key_ports *key =
skb_flow_dissector_target(f->dissector,
@@ -1471,7 +1506,7 @@
}
if (mask->src || mask->dst)
- *min_inline = MLX5_INLINE_MODE_TCP_UDP;
+ *match_level = MLX5_MATCH_L4;
}
if (dissector_uses_key(f->dissector, FLOW_DISSECTOR_KEY_TCP)) {
@@ -1490,7 +1525,7 @@
ntohs(key->flags));
if (mask->flags)
- *min_inline = MLX5_INLINE_MODE_TCP_UDP;
+ *match_level = MLX5_MATCH_L4;
}
return 0;
@@ -1505,23 +1540,28 @@
struct mlx5_eswitch *esw = dev->priv.eswitch;
struct mlx5e_rep_priv *rpriv = priv->ppriv;
struct mlx5_eswitch_rep *rep;
- u8 min_inline;
+ u8 match_level;
int err;
- err = __parse_cls_flower(priv, spec, f, &min_inline);
+ err = __parse_cls_flower(priv, spec, f, &match_level);
if (!err && (flow->flags & MLX5E_TC_FLOW_ESWITCH)) {
rep = rpriv->rep;
if (rep->vport != FDB_UPLINK_VPORT &&
(esw->offloads.inline_mode != MLX5_INLINE_MODE_NONE &&
- esw->offloads.inline_mode < min_inline)) {
+ esw->offloads.inline_mode < match_level)) {
netdev_warn(priv->netdev,
"Flow is not offloaded due to min inline setting, required %d actual %d\n",
- min_inline, esw->offloads.inline_mode);
+ match_level, esw->offloads.inline_mode);
return -EOPNOTSUPP;
}
}
+ if (flow->flags & MLX5E_TC_FLOW_ESWITCH)
+ flow->esw_attr->match_level = match_level;
+ else
+ flow->nic_attr->match_level = match_level;
+
return err;
}
@@ -1578,7 +1618,6 @@
static struct mlx5_fields fields[] = {
OFFLOAD(DMAC_47_16, 4, eth.h_dest[0], 0),
- OFFLOAD(DMAC_47_16, 4, eth.h_dest[0], 0),
OFFLOAD(DMAC_15_0, 2, eth.h_dest[4], 0),
OFFLOAD(SMAC_47_16, 4, eth.h_source[0], 0),
OFFLOAD(SMAC_15_0, 2, eth.h_source[4], 0),
@@ -1764,12 +1803,12 @@
err = -EOPNOTSUPP; /* can't be all optimistic */
if (htype == TCA_PEDIT_KEY_EX_HDR_TYPE_NETWORK) {
- printk(KERN_WARNING "mlx5: legacy pedit isn't offloaded\n");
+ netdev_warn(priv->netdev, "legacy pedit isn't offloaded\n");
goto out_err;
}
if (cmd != TCA_PEDIT_KEY_EX_CMD_SET && cmd != TCA_PEDIT_KEY_EX_CMD_ADD) {
- printk(KERN_WARNING "mlx5: pedit cmd %d isn't offloaded\n", cmd);
+ netdev_warn(priv->netdev, "pedit cmd %d isn't offloaded\n", cmd);
goto out_err;
}
@@ -1793,8 +1832,7 @@
for (cmd = 0; cmd < __PEDIT_CMD_MAX; cmd++) {
cmd_masks = &masks[cmd];
if (memcmp(cmd_masks, &zero_masks, sizeof(zero_masks))) {
- printk(KERN_WARNING "mlx5: attempt to offload an unsupported field (cmd %d)\n",
- cmd);
+ netdev_warn(priv->netdev, "attempt to offload an unsupported field (cmd %d)\n", cmd);
print_hex_dump(KERN_WARNING, "mask: ", DUMP_PREFIX_ADDRESS,
16, 1, cmd_masks, sizeof(zero_masks), true);
err = -EOPNOTSUPP;
@@ -1917,21 +1955,21 @@
struct mlx5_nic_flow_attr *attr = flow->nic_attr;
const struct tc_action *a;
LIST_HEAD(actions);
+ u32 action = 0;
int err;
if (!tcf_exts_has_actions(exts))
return -EINVAL;
attr->flow_tag = MLX5_FS_DEFAULT_FLOW_TAG;
- attr->action = 0;
tcf_exts_to_list(exts, &actions);
list_for_each_entry(a, &actions, list) {
if (is_tcf_gact_shot(a)) {
- attr->action |= MLX5_FLOW_CONTEXT_ACTION_DROP;
+ action |= MLX5_FLOW_CONTEXT_ACTION_DROP;
if (MLX5_CAP_FLOWTABLE(priv->mdev,
flow_table_properties_nic_receive.flow_counter))
- attr->action |= MLX5_FLOW_CONTEXT_ACTION_COUNT;
+ action |= MLX5_FLOW_CONTEXT_ACTION_COUNT;
continue;
}
@@ -1941,13 +1979,13 @@
if (err)
return err;
- attr->action |= MLX5_FLOW_CONTEXT_ACTION_MOD_HDR |
- MLX5_FLOW_CONTEXT_ACTION_FWD_DEST;
+ action |= MLX5_FLOW_CONTEXT_ACTION_MOD_HDR |
+ MLX5_FLOW_CONTEXT_ACTION_FWD_DEST;
continue;
}
if (is_tcf_csum(a)) {
- if (csum_offload_supported(priv, attr->action,
+ if (csum_offload_supported(priv, action,
tcf_csum_update_flags(a)))
continue;
@@ -1961,8 +1999,8 @@
same_hw_devs(priv, netdev_priv(peer_dev))) {
parse_attr->mirred_ifindex = peer_dev->ifindex;
flow->flags |= MLX5E_TC_FLOW_HAIRPIN;
- attr->action |= MLX5_FLOW_CONTEXT_ACTION_FWD_DEST |
- MLX5_FLOW_CONTEXT_ACTION_COUNT;
+ action |= MLX5_FLOW_CONTEXT_ACTION_FWD_DEST |
+ MLX5_FLOW_CONTEXT_ACTION_COUNT;
} else {
netdev_warn(priv->netdev, "device %s not on same HW, can't offload\n",
peer_dev->name);
@@ -1981,13 +2019,14 @@
}
attr->flow_tag = mark;
- attr->action |= MLX5_FLOW_CONTEXT_ACTION_FWD_DEST;
+ action |= MLX5_FLOW_CONTEXT_ACTION_FWD_DEST;
continue;
}
return -EINVAL;
}
+ attr->action = action;
if (!actions_match_supported(priv, exts, parse_attr, flow))
return -EOPNOTSUPP;
@@ -2044,6 +2083,20 @@
return 0;
}
+static bool is_merged_eswitch_dev(struct mlx5e_priv *priv,
+ struct net_device *peer_netdev)
+{
+ struct mlx5e_priv *peer_priv;
+
+ peer_priv = netdev_priv(peer_netdev);
+
+ return (MLX5_CAP_ESW(priv->mdev, merged_eswitch) &&
+ (priv->netdev->netdev_ops == peer_netdev->netdev_ops) &&
+ same_hw_devs(priv, peer_priv) &&
+ MLX5_VPORT_MANAGER(peer_priv->mdev) &&
+ (peer_priv->mdev->priv.eswitch->mode == SRIOV_OFFLOADS));
+}
+
static int mlx5e_route_lookup_ipv6(struct mlx5e_priv *priv,
struct net_device *mirred_dev,
struct net_device **out_dev,
@@ -2459,34 +2512,36 @@
const struct tc_action *a;
LIST_HEAD(actions);
bool encap = false;
- int err = 0;
+ u32 action = 0;
if (!tcf_exts_has_actions(exts))
return -EINVAL;
- memset(attr, 0, sizeof(*attr));
attr->in_rep = rpriv->rep;
+ attr->in_mdev = priv->mdev;
tcf_exts_to_list(exts, &actions);
list_for_each_entry(a, &actions, list) {
if (is_tcf_gact_shot(a)) {
- attr->action |= MLX5_FLOW_CONTEXT_ACTION_DROP |
- MLX5_FLOW_CONTEXT_ACTION_COUNT;
+ action |= MLX5_FLOW_CONTEXT_ACTION_DROP |
+ MLX5_FLOW_CONTEXT_ACTION_COUNT;
continue;
}
if (is_tcf_pedit(a)) {
+ int err;
+
err = parse_tc_pedit_action(priv, a, MLX5_FLOW_NAMESPACE_FDB,
parse_attr);
if (err)
return err;
- attr->action |= MLX5_FLOW_CONTEXT_ACTION_MOD_HDR;
+ action |= MLX5_FLOW_CONTEXT_ACTION_MOD_HDR;
continue;
}
if (is_tcf_csum(a)) {
- if (csum_offload_supported(priv, attr->action,
+ if (csum_offload_supported(priv, action,
tcf_csum_update_flags(a)))
continue;
@@ -2500,19 +2555,21 @@
out_dev = tcf_mirred_dev(a);
if (switchdev_port_same_parent_id(priv->netdev,
- out_dev)) {
- attr->action |= MLX5_FLOW_CONTEXT_ACTION_FWD_DEST |
- MLX5_FLOW_CONTEXT_ACTION_COUNT;
+ out_dev) ||
+ is_merged_eswitch_dev(priv, out_dev)) {
+ action |= MLX5_FLOW_CONTEXT_ACTION_FWD_DEST |
+ MLX5_FLOW_CONTEXT_ACTION_COUNT;
out_priv = netdev_priv(out_dev);
rpriv = out_priv->ppriv;
attr->out_rep = rpriv->rep;
+ attr->out_mdev = out_priv->mdev;
} else if (encap) {
parse_attr->mirred_ifindex = out_dev->ifindex;
parse_attr->tun_info = *info;
attr->parse_attr = parse_attr;
- attr->action |= MLX5_FLOW_CONTEXT_ACTION_ENCAP |
- MLX5_FLOW_CONTEXT_ACTION_FWD_DEST |
- MLX5_FLOW_CONTEXT_ACTION_COUNT;
+ action |= MLX5_FLOW_CONTEXT_ACTION_ENCAP |
+ MLX5_FLOW_CONTEXT_ACTION_FWD_DEST |
+ MLX5_FLOW_CONTEXT_ACTION_COUNT;
/* attr->out_rep is resolved when we handle encap */
} else {
pr_err("devices %s %s not on same switch HW, can't offload forwarding\n",
@@ -2533,9 +2590,9 @@
if (is_tcf_vlan(a)) {
if (tcf_vlan_action(a) == TCA_VLAN_ACT_POP) {
- attr->action |= MLX5_FLOW_CONTEXT_ACTION_VLAN_POP;
+ action |= MLX5_FLOW_CONTEXT_ACTION_VLAN_POP;
} else if (tcf_vlan_action(a) == TCA_VLAN_ACT_PUSH) {
- attr->action |= MLX5_FLOW_CONTEXT_ACTION_VLAN_PUSH;
+ action |= MLX5_FLOW_CONTEXT_ACTION_VLAN_PUSH;
attr->vlan_vid = tcf_vlan_push_vid(a);
if (mlx5_eswitch_vlan_actions_supported(priv->mdev)) {
attr->vlan_prio = tcf_vlan_push_prio(a);
@@ -2553,34 +2610,74 @@
}
if (is_tcf_tunnel_release(a)) {
- attr->action |= MLX5_FLOW_CONTEXT_ACTION_DECAP;
+ action |= MLX5_FLOW_CONTEXT_ACTION_DECAP;
continue;
}
return -EINVAL;
}
+ attr->action = action;
if (!actions_match_supported(priv, exts, parse_attr, flow))
return -EOPNOTSUPP;
- return err;
+ return 0;
+}
+
+static void get_flags(int flags, u8 *flow_flags)
+{
+ u8 __flow_flags = 0;
+
+ if (flags & MLX5E_TC_INGRESS)
+ __flow_flags |= MLX5E_TC_FLOW_INGRESS;
+ if (flags & MLX5E_TC_EGRESS)
+ __flow_flags |= MLX5E_TC_FLOW_EGRESS;
+
+ *flow_flags = __flow_flags;
+}
+
+static const struct rhashtable_params tc_ht_params = {
+ .head_offset = offsetof(struct mlx5e_tc_flow, node),
+ .key_offset = offsetof(struct mlx5e_tc_flow, cookie),
+ .key_len = sizeof(((struct mlx5e_tc_flow *)0)->cookie),
+ .automatic_shrinking = true,
+};
+
+static struct rhashtable *get_tc_ht(struct mlx5e_priv *priv)
+{
+ struct mlx5_eswitch *esw = priv->mdev->priv.eswitch;
+ struct mlx5e_rep_priv *uplink_rpriv;
+
+ if (MLX5_VPORT_MANAGER(priv->mdev) && esw->mode == SRIOV_OFFLOADS) {
+ uplink_rpriv = mlx5_eswitch_get_uplink_priv(esw, REP_ETH);
+ return &uplink_rpriv->tc_ht;
+ } else
+ return &priv->fs.tc.ht;
}
int mlx5e_configure_flower(struct mlx5e_priv *priv,
- struct tc_cls_flower_offload *f)
+ struct tc_cls_flower_offload *f, int flags)
{
struct mlx5_eswitch *esw = priv->mdev->priv.eswitch;
struct mlx5e_tc_flow_parse_attr *parse_attr;
- struct mlx5e_tc_table *tc = &priv->fs.tc;
+ struct rhashtable *tc_ht = get_tc_ht(priv);
struct mlx5e_tc_flow *flow;
int attr_size, err = 0;
u8 flow_flags = 0;
+ get_flags(flags, &flow_flags);
+
+ flow = rhashtable_lookup_fast(tc_ht, &f->cookie, tc_ht_params);
+ if (flow) {
+ netdev_warn_once(priv->netdev, "flow cookie %lx already exists, ignoring\n", f->cookie);
+ return 0;
+ }
+
if (esw && esw->mode == SRIOV_OFFLOADS) {
- flow_flags = MLX5E_TC_FLOW_ESWITCH;
+ flow_flags |= MLX5E_TC_FLOW_ESWITCH;
attr_size = sizeof(struct mlx5_esw_flow_attr);
} else {
- flow_flags = MLX5E_TC_FLOW_NIC;
+ flow_flags |= MLX5E_TC_FLOW_NIC;
attr_size = sizeof(struct mlx5_nic_flow_attr);
}
@@ -2593,6 +2690,7 @@
flow->cookie = f->cookie;
flow->flags = flow_flags;
+ flow->priv = priv;
err = parse_cls_flower(priv, flow, &parse_attr->spec, f);
if (err < 0)
@@ -2623,8 +2721,7 @@
!(flow->esw_attr->action & MLX5_FLOW_CONTEXT_ACTION_ENCAP))
kvfree(parse_attr);
- err = rhashtable_insert_fast(&tc->ht, &flow->node,
- tc->ht_params);
+ err = rhashtable_insert_fast(tc_ht, &flow->node, tc_ht_params);
if (err) {
mlx5e_tc_del_flow(priv, flow);
kfree(flow);
@@ -2638,18 +2735,28 @@
return err;
}
-int mlx5e_delete_flower(struct mlx5e_priv *priv,
- struct tc_cls_flower_offload *f)
-{
- struct mlx5e_tc_flow *flow;
- struct mlx5e_tc_table *tc = &priv->fs.tc;
+#define DIRECTION_MASK (MLX5E_TC_INGRESS | MLX5E_TC_EGRESS)
+#define FLOW_DIRECTION_MASK (MLX5E_TC_FLOW_INGRESS | MLX5E_TC_FLOW_EGRESS)
- flow = rhashtable_lookup_fast(&tc->ht, &f->cookie,
- tc->ht_params);
- if (!flow)
+static bool same_flow_direction(struct mlx5e_tc_flow *flow, int flags)
+{
+ if ((flow->flags & FLOW_DIRECTION_MASK) == (flags & DIRECTION_MASK))
+ return true;
+
+ return false;
+}
+
+int mlx5e_delete_flower(struct mlx5e_priv *priv,
+ struct tc_cls_flower_offload *f, int flags)
+{
+ struct rhashtable *tc_ht = get_tc_ht(priv);
+ struct mlx5e_tc_flow *flow;
+
+ flow = rhashtable_lookup_fast(tc_ht, &f->cookie, tc_ht_params);
+ if (!flow || !same_flow_direction(flow, flags))
return -EINVAL;
- rhashtable_remove_fast(&tc->ht, &flow->node, tc->ht_params);
+ rhashtable_remove_fast(tc_ht, &flow->node, tc_ht_params);
mlx5e_tc_del_flow(priv, flow);
@@ -2659,18 +2766,17 @@
}
int mlx5e_stats_flower(struct mlx5e_priv *priv,
- struct tc_cls_flower_offload *f)
+ struct tc_cls_flower_offload *f, int flags)
{
- struct mlx5e_tc_table *tc = &priv->fs.tc;
+ struct rhashtable *tc_ht = get_tc_ht(priv);
struct mlx5e_tc_flow *flow;
struct mlx5_fc *counter;
u64 bytes;
u64 packets;
u64 lastuse;
- flow = rhashtable_lookup_fast(&tc->ht, &f->cookie,
- tc->ht_params);
- if (!flow)
+ flow = rhashtable_lookup_fast(tc_ht, &f->cookie, tc_ht_params);
+ if (!flow || !same_flow_direction(flow, flags))
return -EINVAL;
if (!(flow->flags & MLX5E_TC_FLOW_OFFLOADED))
@@ -2687,41 +2793,43 @@
return 0;
}
-static const struct rhashtable_params mlx5e_tc_flow_ht_params = {
- .head_offset = offsetof(struct mlx5e_tc_flow, node),
- .key_offset = offsetof(struct mlx5e_tc_flow, cookie),
- .key_len = sizeof(((struct mlx5e_tc_flow *)0)->cookie),
- .automatic_shrinking = true,
-};
-
-int mlx5e_tc_init(struct mlx5e_priv *priv)
+int mlx5e_tc_nic_init(struct mlx5e_priv *priv)
{
struct mlx5e_tc_table *tc = &priv->fs.tc;
hash_init(tc->mod_hdr_tbl);
hash_init(tc->hairpin_tbl);
- tc->ht_params = mlx5e_tc_flow_ht_params;
- return rhashtable_init(&tc->ht, &tc->ht_params);
+ return rhashtable_init(&tc->ht, &tc_ht_params);
}
static void _mlx5e_tc_del_flow(void *ptr, void *arg)
{
struct mlx5e_tc_flow *flow = ptr;
- struct mlx5e_priv *priv = arg;
+ struct mlx5e_priv *priv = flow->priv;
mlx5e_tc_del_flow(priv, flow);
kfree(flow);
}
-void mlx5e_tc_cleanup(struct mlx5e_priv *priv)
+void mlx5e_tc_nic_cleanup(struct mlx5e_priv *priv)
{
struct mlx5e_tc_table *tc = &priv->fs.tc;
- rhashtable_free_and_destroy(&tc->ht, _mlx5e_tc_del_flow, priv);
+ rhashtable_free_and_destroy(&tc->ht, _mlx5e_tc_del_flow, NULL);
if (!IS_ERR_OR_NULL(tc->t)) {
mlx5_destroy_flow_table(tc->t);
tc->t = NULL;
}
}
+
+int mlx5e_tc_esw_init(struct rhashtable *tc_ht)
+{
+ return rhashtable_init(tc_ht, &tc_ht_params);
+}
+
+void mlx5e_tc_esw_cleanup(struct rhashtable *tc_ht)
+{
+ rhashtable_free_and_destroy(tc_ht, _mlx5e_tc_del_flow, NULL);
+}
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.h b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.h
index c14c263..59e52b8 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.h
@@ -38,16 +38,26 @@
#define MLX5E_TC_FLOW_ID_MASK 0x0000ffff
#ifdef CONFIG_MLX5_ESWITCH
-int mlx5e_tc_init(struct mlx5e_priv *priv);
-void mlx5e_tc_cleanup(struct mlx5e_priv *priv);
+
+enum {
+ MLX5E_TC_INGRESS = BIT(0),
+ MLX5E_TC_EGRESS = BIT(1),
+ MLX5E_TC_LAST_EXPORTED_BIT = 1,
+};
+
+int mlx5e_tc_nic_init(struct mlx5e_priv *priv);
+void mlx5e_tc_nic_cleanup(struct mlx5e_priv *priv);
+
+int mlx5e_tc_esw_init(struct rhashtable *tc_ht);
+void mlx5e_tc_esw_cleanup(struct rhashtable *tc_ht);
int mlx5e_configure_flower(struct mlx5e_priv *priv,
- struct tc_cls_flower_offload *f);
+ struct tc_cls_flower_offload *f, int flags);
int mlx5e_delete_flower(struct mlx5e_priv *priv,
- struct tc_cls_flower_offload *f);
+ struct tc_cls_flower_offload *f, int flags);
int mlx5e_stats_flower(struct mlx5e_priv *priv,
- struct tc_cls_flower_offload *f);
+ struct tc_cls_flower_offload *f, int flags);
struct mlx5e_encap_entry;
void mlx5e_tc_encap_flows_add(struct mlx5e_priv *priv,
@@ -64,8 +74,8 @@
}
#else /* CONFIG_MLX5_ESWITCH */
-static inline int mlx5e_tc_init(struct mlx5e_priv *priv) { return 0; }
-static inline void mlx5e_tc_cleanup(struct mlx5e_priv *priv) {}
+static inline int mlx5e_tc_nic_init(struct mlx5e_priv *priv) { return 0; }
+static inline void mlx5e_tc_nic_cleanup(struct mlx5e_priv *priv) {}
static inline int mlx5e_tc_num_filters(struct mlx5e_priv *priv) { return 0; }
#endif
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c
index 5532aa3..2d3f17d 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c
@@ -35,12 +35,21 @@
#include <net/dsfield.h>
#include "en.h"
#include "ipoib/ipoib.h"
-#include "en_accel/ipsec_rxtx.h"
+#include "en_accel/en_accel.h"
#include "lib/clock.h"
#define MLX5E_SQ_NOPS_ROOM MLX5_SEND_WQE_MAX_WQEBBS
+
+#ifndef CONFIG_MLX5_EN_TLS
#define MLX5E_SQ_STOP_ROOM (MLX5_SEND_WQE_MAX_WQEBBS +\
MLX5E_SQ_NOPS_ROOM)
+#else
+/* TLS offload requires MLX5E_SQ_STOP_ROOM to have
+ * enough room for a resync SKB, a normal SKB and a NOP
+ */
+#define MLX5E_SQ_STOP_ROOM (2 * MLX5_SEND_WQE_MAX_WQEBBS +\
+ MLX5E_SQ_NOPS_ROOM)
+#endif
static inline void mlx5e_tx_dma_unmap(struct device *pdev,
struct mlx5e_sq_dma *dma)
@@ -329,8 +338,8 @@
}
}
-static netdev_tx_t mlx5e_sq_xmit(struct mlx5e_txqsq *sq, struct sk_buff *skb,
- struct mlx5e_tx_wqe *wqe, u16 pi)
+netdev_tx_t mlx5e_sq_xmit(struct mlx5e_txqsq *sq, struct sk_buff *skb,
+ struct mlx5e_tx_wqe *wqe, u16 pi)
{
struct mlx5e_tx_wqe_info *wi = &sq->db.wqe_info[pi];
@@ -401,21 +410,19 @@
netdev_tx_t mlx5e_xmit(struct sk_buff *skb, struct net_device *dev)
{
struct mlx5e_priv *priv = netdev_priv(dev);
- struct mlx5e_txqsq *sq = priv->txq2sq[skb_get_queue_mapping(skb)];
- struct mlx5_wq_cyc *wq = &sq->wq;
- u16 pi = sq->pc & wq->sz_m1;
- struct mlx5e_tx_wqe *wqe = mlx5_wq_cyc_get_wqe(wq, pi);
+ struct mlx5e_tx_wqe *wqe;
+ struct mlx5e_txqsq *sq;
+ u16 pi;
- memset(wqe, 0, sizeof(*wqe));
+ sq = priv->txq2sq[skb_get_queue_mapping(skb)];
+ mlx5e_sq_fetch_wqe(sq, &wqe, &pi);
-#ifdef CONFIG_MLX5_EN_IPSEC
- if (sq->state & BIT(MLX5E_SQ_STATE_IPSEC)) {
- skb = mlx5e_ipsec_handle_tx_skb(dev, wqe, skb);
- if (unlikely(!skb))
- return NETDEV_TX_OK;
- }
+#ifdef CONFIG_MLX5_ACCEL
+ /* might send skbs and update wqe and pi */
+ skb = mlx5e_accel_handle_tx(skb, sq, dev, &wqe, &pi);
+ if (unlikely(!skb))
+ return NETDEV_TX_OK;
#endif
-
return mlx5e_sq_xmit(sq, skb, wqe, pi);
}
@@ -443,7 +450,7 @@
sq = container_of(cq, struct mlx5e_txqsq, cq);
- if (unlikely(!MLX5E_TEST_BIT(sq->state, MLX5E_SQ_STATE_ENABLED)))
+ if (unlikely(!test_bit(MLX5E_SQ_STATE_ENABLED, &sq->state)))
return false;
cqe = mlx5_cqwq_get_cqe(&cq->wq);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c
index f292bb3..5d6f9ce 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c
@@ -44,6 +44,30 @@
return cpumask_test_cpu(current_cpu, aff);
}
+static void mlx5e_handle_tx_dim(struct mlx5e_txqsq *sq)
+{
+ struct net_dim_sample dim_sample;
+
+ if (unlikely(!test_bit(MLX5E_SQ_STATE_AM, &sq->state)))
+ return;
+
+ net_dim_sample(sq->cq.event_ctr, sq->stats.packets, sq->stats.bytes,
+ &dim_sample);
+ net_dim(&sq->dim, dim_sample);
+}
+
+static void mlx5e_handle_rx_dim(struct mlx5e_rq *rq)
+{
+ struct net_dim_sample dim_sample;
+
+ if (unlikely(!test_bit(MLX5E_RQ_STATE_AM, &rq->state)))
+ return;
+
+ net_dim_sample(rq->cq.event_ctr, rq->stats.packets, rq->stats.bytes,
+ &dim_sample);
+ net_dim(&rq->dim, dim_sample);
+}
+
int mlx5e_napi_poll(struct napi_struct *napi, int budget)
{
struct mlx5e_channel *c = container_of(napi, struct mlx5e_channel,
@@ -75,18 +99,13 @@
if (unlikely(!napi_complete_done(napi, work_done)))
return work_done;
- for (i = 0; i < c->num_tc; i++)
+ for (i = 0; i < c->num_tc; i++) {
+ mlx5e_handle_tx_dim(&c->sq[i]);
mlx5e_cq_arm(&c->sq[i].cq);
-
- if (MLX5E_TEST_BIT(c->rq.state, MLX5E_RQ_STATE_AM)) {
- struct net_dim_sample dim_sample;
- net_dim_sample(c->rq.cq.event_ctr,
- c->rq.stats.packets,
- c->rq.stats.bytes,
- &dim_sample);
- net_dim(&c->rq.dim, dim_sample);
}
+ mlx5e_handle_rx_dim(&c->rq);
+
mlx5e_cq_arm(&c->rq.cq);
mlx5e_cq_arm(&c->icosq.cq);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
index 1352d13..09f0e11c 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
@@ -192,7 +192,7 @@
}
dest.type = MLX5_FLOW_DESTINATION_TYPE_VPORT;
- dest.vport_num = vport;
+ dest.vport.num = vport;
esw_debug(esw->dev,
"\tFDB add rule dmac_v(%pM) dmac_c(%pM) -> vport(%d)\n",
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h
index 4cd773f..f47a14e 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h
@@ -227,9 +227,18 @@
SET_VLAN_INSERT = BIT(1)
};
+enum mlx5_flow_match_level {
+ MLX5_MATCH_NONE = MLX5_INLINE_MODE_NONE,
+ MLX5_MATCH_L2 = MLX5_INLINE_MODE_L2,
+ MLX5_MATCH_L3 = MLX5_INLINE_MODE_IP,
+ MLX5_MATCH_L4 = MLX5_INLINE_MODE_TCP_UDP,
+};
+
struct mlx5_esw_flow_attr {
struct mlx5_eswitch_rep *in_rep;
struct mlx5_eswitch_rep *out_rep;
+ struct mlx5_core_dev *out_mdev;
+ struct mlx5_core_dev *in_mdev;
int action;
__be16 vlan_proto;
@@ -238,6 +247,7 @@
bool vlan_handled;
u32 encap_id;
u32 mod_hdr_id;
+ u8 match_level;
struct mlx5e_tc_flow_parse_attr *parse_attr;
};
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
index 35e256e..b9ea464 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
@@ -71,7 +71,12 @@
if (flow_act.action & MLX5_FLOW_CONTEXT_ACTION_FWD_DEST) {
dest[i].type = MLX5_FLOW_DESTINATION_TYPE_VPORT;
- dest[i].vport_num = attr->out_rep->vport;
+ dest[i].vport.num = attr->out_rep->vport;
+ if (MLX5_CAP_ESW(esw->dev, merged_eswitch)) {
+ dest[i].vport.vhca_id =
+ MLX5_CAP_GEN(attr->out_mdev, vhca_id);
+ dest[i].vport.vhca_id_valid = 1;
+ }
i++;
}
if (flow_act.action & MLX5_FLOW_CONTEXT_ACTION_COUNT) {
@@ -88,11 +93,23 @@
misc = MLX5_ADDR_OF(fte_match_param, spec->match_value, misc_parameters);
MLX5_SET(fte_match_set_misc, misc, source_port, attr->in_rep->vport);
+ if (MLX5_CAP_ESW(esw->dev, merged_eswitch))
+ MLX5_SET(fte_match_set_misc, misc,
+ source_eswitch_owner_vhca_id,
+ MLX5_CAP_GEN(attr->in_mdev, vhca_id));
+
misc = MLX5_ADDR_OF(fte_match_param, spec->match_criteria, misc_parameters);
MLX5_SET_TO_ONES(fte_match_set_misc, misc, source_port);
+ if (MLX5_CAP_ESW(esw->dev, merged_eswitch))
+ MLX5_SET_TO_ONES(fte_match_set_misc, misc,
+ source_eswitch_owner_vhca_id);
- spec->match_criteria_enable = MLX5_MATCH_OUTER_HEADERS |
- MLX5_MATCH_MISC_PARAMETERS;
+ if (attr->match_level == MLX5_MATCH_NONE)
+ spec->match_criteria_enable = MLX5_MATCH_MISC_PARAMETERS;
+ else
+ spec->match_criteria_enable = MLX5_MATCH_OUTER_HEADERS |
+ MLX5_MATCH_MISC_PARAMETERS;
+
if (flow_act.action & MLX5_FLOW_CONTEXT_ACTION_DECAP)
spec->match_criteria_enable |= MLX5_MATCH_INNER_HEADERS;
@@ -343,7 +360,7 @@
spec->match_criteria_enable = MLX5_MATCH_MISC_PARAMETERS;
dest.type = MLX5_FLOW_DESTINATION_TYPE_VPORT;
- dest.vport_num = vport;
+ dest.vport.num = vport;
flow_act.action = MLX5_FLOW_CONTEXT_ACTION_FWD_DEST;
flow_rule = mlx5_add_flow_rules(esw->fdb_table.offloads.fdb, spec,
@@ -387,7 +404,7 @@
dmac_c[0] = 0x01;
dest.type = MLX5_FLOW_DESTINATION_TYPE_VPORT;
- dest.vport_num = 0;
+ dest.vport.num = 0;
flow_act.action = MLX5_FLOW_CONTEXT_ACTION_FWD_DEST;
flow_rule = mlx5_add_flow_rules(esw->fdb_table.offloads.fdb, spec,
@@ -663,7 +680,7 @@
esw->offloads.vport_rx_group = g;
out:
- kfree(flow_group_in);
+ kvfree(flow_group_in);
return err;
}
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fpga/core.h b/drivers/net/ethernet/mellanox/mlx5/core/fpga/core.h
index 82405ed..3e2355c 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/fpga/core.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/fpga/core.h
@@ -53,6 +53,7 @@
} conn_res;
struct mlx5_fpga_ipsec *ipsec;
+ struct mlx5_fpga_tls *tls;
};
#define mlx5_fpga_dbg(__adev, format, ...) \
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fpga/ipsec.c b/drivers/net/ethernet/mellanox/mlx5/core/fpga/ipsec.c
index fad8c2e..a0433b4 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/fpga/ipsec.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/fpga/ipsec.c
@@ -43,9 +43,6 @@
#include "fpga/sdk.h"
#include "fpga/core.h"
-#define SBU_QP_QUEUE_SIZE 8
-#define MLX5_FPGA_IPSEC_CMD_TIMEOUT_MSEC (60 * 1000)
-
enum mlx5_fpga_ipsec_cmd_status {
MLX5_FPGA_IPSEC_CMD_PENDING,
MLX5_FPGA_IPSEC_CMD_SEND_FAIL,
@@ -256,7 +253,7 @@
{
struct mlx5_fpga_ipsec_cmd_context *context = ctx;
unsigned long timeout =
- msecs_to_jiffies(MLX5_FPGA_IPSEC_CMD_TIMEOUT_MSEC);
+ msecs_to_jiffies(MLX5_FPGA_CMD_TIMEOUT_MSEC);
int res;
res = wait_for_completion_timeout(&context->complete, timeout);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fpga/sdk.h b/drivers/net/ethernet/mellanox/mlx5/core/fpga/sdk.h
index baa537e..a0573cc 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/fpga/sdk.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/fpga/sdk.h
@@ -41,6 +41,8 @@
* DOC: Innova SDK
* This header defines the in-kernel API for Innova FPGA client drivers.
*/
+#define SBU_QP_QUEUE_SIZE 8
+#define MLX5_FPGA_CMD_TIMEOUT_MSEC (60 * 1000)
enum mlx5_fpga_access_type {
MLX5_FPGA_ACCESS_TYPE_I2C = 0x0,
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fpga/tls.c b/drivers/net/ethernet/mellanox/mlx5/core/fpga/tls.c
new file mode 100644
index 0000000..2104801
--- /dev/null
+++ b/drivers/net/ethernet/mellanox/mlx5/core/fpga/tls.c
@@ -0,0 +1,562 @@
+/*
+ * Copyright (c) 2018 Mellanox Technologies. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses. You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ * - Redistributions of source code must retain the above
+ * copyright notice, this list of conditions and the following
+ * disclaimer.
+ *
+ * - Redistributions in binary form must reproduce the above
+ * copyright notice, this list of conditions and the following
+ * disclaimer in the documentation and/or other materials
+ * provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ *
+ */
+
+#include <linux/mlx5/device.h>
+#include "fpga/tls.h"
+#include "fpga/cmd.h"
+#include "fpga/sdk.h"
+#include "fpga/core.h"
+#include "accel/tls.h"
+
+struct mlx5_fpga_tls_command_context;
+
+typedef void (*mlx5_fpga_tls_command_complete)
+ (struct mlx5_fpga_conn *conn, struct mlx5_fpga_device *fdev,
+ struct mlx5_fpga_tls_command_context *ctx,
+ struct mlx5_fpga_dma_buf *resp);
+
+struct mlx5_fpga_tls_command_context {
+ struct list_head list;
+ /* There is no guarantee on the order between the TX completion
+ * and the command response.
+ * The TX completion is going to touch cmd->buf even in
+ * the case of successful transmission.
+ * So instead of requiring separate allocations for cmd
+ * and cmd->buf we've decided to use a reference counter
+ */
+ refcount_t ref;
+ struct mlx5_fpga_dma_buf buf;
+ mlx5_fpga_tls_command_complete complete;
+};
+
+static void
+mlx5_fpga_tls_put_command_ctx(struct mlx5_fpga_tls_command_context *ctx)
+{
+ if (refcount_dec_and_test(&ctx->ref))
+ kfree(ctx);
+}
+
+static void mlx5_fpga_tls_cmd_complete(struct mlx5_fpga_device *fdev,
+ struct mlx5_fpga_dma_buf *resp)
+{
+ struct mlx5_fpga_conn *conn = fdev->tls->conn;
+ struct mlx5_fpga_tls_command_context *ctx;
+ struct mlx5_fpga_tls *tls = fdev->tls;
+ unsigned long flags;
+
+ spin_lock_irqsave(&tls->pending_cmds_lock, flags);
+ ctx = list_first_entry(&tls->pending_cmds,
+ struct mlx5_fpga_tls_command_context, list);
+ list_del(&ctx->list);
+ spin_unlock_irqrestore(&tls->pending_cmds_lock, flags);
+ ctx->complete(conn, fdev, ctx, resp);
+}
+
+static void mlx5_fpga_cmd_send_complete(struct mlx5_fpga_conn *conn,
+ struct mlx5_fpga_device *fdev,
+ struct mlx5_fpga_dma_buf *buf,
+ u8 status)
+{
+ struct mlx5_fpga_tls_command_context *ctx =
+ container_of(buf, struct mlx5_fpga_tls_command_context, buf);
+
+ mlx5_fpga_tls_put_command_ctx(ctx);
+
+ if (unlikely(status))
+ mlx5_fpga_tls_cmd_complete(fdev, NULL);
+}
+
+static void mlx5_fpga_tls_cmd_send(struct mlx5_fpga_device *fdev,
+ struct mlx5_fpga_tls_command_context *cmd,
+ mlx5_fpga_tls_command_complete complete)
+{
+ struct mlx5_fpga_tls *tls = fdev->tls;
+ unsigned long flags;
+ int ret;
+
+ refcount_set(&cmd->ref, 2);
+ cmd->complete = complete;
+ cmd->buf.complete = mlx5_fpga_cmd_send_complete;
+
+ spin_lock_irqsave(&tls->pending_cmds_lock, flags);
+ /* mlx5_fpga_sbu_conn_sendmsg is called under pending_cmds_lock
+ * to make sure commands are inserted to the tls->pending_cmds list
+ * and the command QP in the same order.
+ */
+ ret = mlx5_fpga_sbu_conn_sendmsg(tls->conn, &cmd->buf);
+ if (likely(!ret))
+ list_add_tail(&cmd->list, &tls->pending_cmds);
+ else
+ complete(tls->conn, fdev, cmd, NULL);
+ spin_unlock_irqrestore(&tls->pending_cmds_lock, flags);
+}
+
+/* Start of context identifiers range (inclusive) */
+#define SWID_START 0
+/* End of context identifiers range (exclusive) */
+#define SWID_END BIT(24)
+
+static int mlx5_fpga_tls_alloc_swid(struct idr *idr, spinlock_t *idr_spinlock,
+ void *ptr)
+{
+ int ret;
+
+ /* TLS metadata format is 1 byte for syndrome followed
+ * by 3 bytes of swid (software ID)
+ * swid must not exceed 3 bytes.
+ * See tls_rxtx.c:insert_pet() for details
+ */
+ BUILD_BUG_ON((SWID_END - 1) & 0xFF000000);
+
+ idr_preload(GFP_KERNEL);
+ spin_lock_irq(idr_spinlock);
+ ret = idr_alloc(idr, ptr, SWID_START, SWID_END, GFP_ATOMIC);
+ spin_unlock_irq(idr_spinlock);
+ idr_preload_end();
+
+ return ret;
+}
+
+static void mlx5_fpga_tls_release_swid(struct idr *idr,
+ spinlock_t *idr_spinlock, u32 swid)
+{
+ unsigned long flags;
+
+ spin_lock_irqsave(idr_spinlock, flags);
+ idr_remove(idr, swid);
+ spin_unlock_irqrestore(idr_spinlock, flags);
+}
+
+struct mlx5_teardown_stream_context {
+ struct mlx5_fpga_tls_command_context cmd;
+ u32 swid;
+};
+
+static void
+mlx5_fpga_tls_teardown_completion(struct mlx5_fpga_conn *conn,
+ struct mlx5_fpga_device *fdev,
+ struct mlx5_fpga_tls_command_context *cmd,
+ struct mlx5_fpga_dma_buf *resp)
+{
+ struct mlx5_teardown_stream_context *ctx =
+ container_of(cmd, struct mlx5_teardown_stream_context, cmd);
+
+ if (resp) {
+ u32 syndrome = MLX5_GET(tls_resp, resp->sg[0].data, syndrome);
+
+ if (syndrome)
+ mlx5_fpga_err(fdev,
+ "Teardown stream failed with syndrome = %d",
+ syndrome);
+ else
+ mlx5_fpga_tls_release_swid(&fdev->tls->tx_idr,
+ &fdev->tls->idr_spinlock,
+ ctx->swid);
+ }
+ mlx5_fpga_tls_put_command_ctx(cmd);
+}
+
+static void mlx5_fpga_tls_flow_to_cmd(void *flow, void *cmd)
+{
+ memcpy(MLX5_ADDR_OF(tls_cmd, cmd, src_port), flow,
+ MLX5_BYTE_OFF(tls_flow, ipv6));
+
+ MLX5_SET(tls_cmd, cmd, ipv6, MLX5_GET(tls_flow, flow, ipv6));
+ MLX5_SET(tls_cmd, cmd, direction_sx,
+ MLX5_GET(tls_flow, flow, direction_sx));
+}
+
+void mlx5_fpga_tls_send_teardown_cmd(struct mlx5_core_dev *mdev, void *flow,
+ u32 swid, gfp_t flags)
+{
+ struct mlx5_teardown_stream_context *ctx;
+ struct mlx5_fpga_dma_buf *buf;
+ void *cmd;
+
+ ctx = kzalloc(sizeof(*ctx) + MLX5_TLS_COMMAND_SIZE, flags);
+ if (!ctx)
+ return;
+
+ buf = &ctx->cmd.buf;
+ cmd = (ctx + 1);
+ MLX5_SET(tls_cmd, cmd, command_type, CMD_TEARDOWN_STREAM);
+ MLX5_SET(tls_cmd, cmd, swid, swid);
+
+ mlx5_fpga_tls_flow_to_cmd(flow, cmd);
+ kfree(flow);
+
+ buf->sg[0].data = cmd;
+ buf->sg[0].size = MLX5_TLS_COMMAND_SIZE;
+
+ ctx->swid = swid;
+ mlx5_fpga_tls_cmd_send(mdev->fpga, &ctx->cmd,
+ mlx5_fpga_tls_teardown_completion);
+}
+
+void mlx5_fpga_tls_del_tx_flow(struct mlx5_core_dev *mdev, u32 swid,
+ gfp_t flags)
+{
+ struct mlx5_fpga_tls *tls = mdev->fpga->tls;
+ void *flow;
+
+ rcu_read_lock();
+ flow = idr_find(&tls->tx_idr, swid);
+ rcu_read_unlock();
+
+ if (!flow) {
+ mlx5_fpga_err(mdev->fpga, "No flow information for swid %u\n",
+ swid);
+ return;
+ }
+
+ mlx5_fpga_tls_send_teardown_cmd(mdev, flow, swid, flags);
+}
+
+enum mlx5_fpga_setup_stream_status {
+ MLX5_FPGA_CMD_PENDING,
+ MLX5_FPGA_CMD_SEND_FAILED,
+ MLX5_FPGA_CMD_RESPONSE_RECEIVED,
+ MLX5_FPGA_CMD_ABANDONED,
+};
+
+struct mlx5_setup_stream_context {
+ struct mlx5_fpga_tls_command_context cmd;
+ atomic_t status;
+ u32 syndrome;
+ struct completion comp;
+};
+
+static void
+mlx5_fpga_tls_setup_completion(struct mlx5_fpga_conn *conn,
+ struct mlx5_fpga_device *fdev,
+ struct mlx5_fpga_tls_command_context *cmd,
+ struct mlx5_fpga_dma_buf *resp)
+{
+ struct mlx5_setup_stream_context *ctx =
+ container_of(cmd, struct mlx5_setup_stream_context, cmd);
+ int status = MLX5_FPGA_CMD_SEND_FAILED;
+ void *tls_cmd = ctx + 1;
+
+ /* If we failed to send to command resp == NULL */
+ if (resp) {
+ ctx->syndrome = MLX5_GET(tls_resp, resp->sg[0].data, syndrome);
+ status = MLX5_FPGA_CMD_RESPONSE_RECEIVED;
+ }
+
+ status = atomic_xchg_release(&ctx->status, status);
+ if (likely(status != MLX5_FPGA_CMD_ABANDONED)) {
+ complete(&ctx->comp);
+ return;
+ }
+
+ mlx5_fpga_err(fdev, "Command was abandoned, syndrome = %u\n",
+ ctx->syndrome);
+
+ if (!ctx->syndrome) {
+ /* The process was killed while waiting for the context to be
+ * added, and the add completed successfully.
+ * We need to destroy the HW context, and we can't can't reuse
+ * the command context because we might not have received
+ * the tx completion yet.
+ */
+ mlx5_fpga_tls_del_tx_flow(fdev->mdev,
+ MLX5_GET(tls_cmd, tls_cmd, swid),
+ GFP_ATOMIC);
+ }
+
+ mlx5_fpga_tls_put_command_ctx(cmd);
+}
+
+static int mlx5_fpga_tls_setup_stream_cmd(struct mlx5_core_dev *mdev,
+ struct mlx5_setup_stream_context *ctx)
+{
+ struct mlx5_fpga_dma_buf *buf;
+ void *cmd = ctx + 1;
+ int status, ret = 0;
+
+ buf = &ctx->cmd.buf;
+ buf->sg[0].data = cmd;
+ buf->sg[0].size = MLX5_TLS_COMMAND_SIZE;
+ MLX5_SET(tls_cmd, cmd, command_type, CMD_SETUP_STREAM);
+
+ init_completion(&ctx->comp);
+ atomic_set(&ctx->status, MLX5_FPGA_CMD_PENDING);
+ ctx->syndrome = -1;
+
+ mlx5_fpga_tls_cmd_send(mdev->fpga, &ctx->cmd,
+ mlx5_fpga_tls_setup_completion);
+ wait_for_completion_killable(&ctx->comp);
+
+ status = atomic_xchg_acquire(&ctx->status, MLX5_FPGA_CMD_ABANDONED);
+ if (unlikely(status == MLX5_FPGA_CMD_PENDING))
+ /* ctx is going to be released in mlx5_fpga_tls_setup_completion */
+ return -EINTR;
+
+ if (unlikely(ctx->syndrome))
+ ret = -ENOMEM;
+
+ mlx5_fpga_tls_put_command_ctx(&ctx->cmd);
+ return ret;
+}
+
+static void mlx5_fpga_tls_hw_qp_recv_cb(void *cb_arg,
+ struct mlx5_fpga_dma_buf *buf)
+{
+ struct mlx5_fpga_device *fdev = (struct mlx5_fpga_device *)cb_arg;
+
+ mlx5_fpga_tls_cmd_complete(fdev, buf);
+}
+
+bool mlx5_fpga_is_tls_device(struct mlx5_core_dev *mdev)
+{
+ if (!mdev->fpga || !MLX5_CAP_GEN(mdev, fpga))
+ return false;
+
+ if (MLX5_CAP_FPGA(mdev, ieee_vendor_id) !=
+ MLX5_FPGA_CAP_SANDBOX_VENDOR_ID_MLNX)
+ return false;
+
+ if (MLX5_CAP_FPGA(mdev, sandbox_product_id) !=
+ MLX5_FPGA_CAP_SANDBOX_PRODUCT_ID_TLS)
+ return false;
+
+ if (MLX5_CAP_FPGA(mdev, sandbox_product_version) != 0)
+ return false;
+
+ return true;
+}
+
+static int mlx5_fpga_tls_get_caps(struct mlx5_fpga_device *fdev,
+ u32 *p_caps)
+{
+ int err, cap_size = MLX5_ST_SZ_BYTES(tls_extended_cap);
+ u32 caps = 0;
+ void *buf;
+
+ buf = kzalloc(cap_size, GFP_KERNEL);
+ if (!buf)
+ return -ENOMEM;
+
+ err = mlx5_fpga_get_sbu_caps(fdev, cap_size, buf);
+ if (err)
+ goto out;
+
+ if (MLX5_GET(tls_extended_cap, buf, tx))
+ caps |= MLX5_ACCEL_TLS_TX;
+ if (MLX5_GET(tls_extended_cap, buf, rx))
+ caps |= MLX5_ACCEL_TLS_RX;
+ if (MLX5_GET(tls_extended_cap, buf, tls_v12))
+ caps |= MLX5_ACCEL_TLS_V12;
+ if (MLX5_GET(tls_extended_cap, buf, tls_v13))
+ caps |= MLX5_ACCEL_TLS_V13;
+ if (MLX5_GET(tls_extended_cap, buf, lro))
+ caps |= MLX5_ACCEL_TLS_LRO;
+ if (MLX5_GET(tls_extended_cap, buf, ipv6))
+ caps |= MLX5_ACCEL_TLS_IPV6;
+
+ if (MLX5_GET(tls_extended_cap, buf, aes_gcm_128))
+ caps |= MLX5_ACCEL_TLS_AES_GCM128;
+ if (MLX5_GET(tls_extended_cap, buf, aes_gcm_256))
+ caps |= MLX5_ACCEL_TLS_AES_GCM256;
+
+ *p_caps = caps;
+ err = 0;
+out:
+ kfree(buf);
+ return err;
+}
+
+int mlx5_fpga_tls_init(struct mlx5_core_dev *mdev)
+{
+ struct mlx5_fpga_device *fdev = mdev->fpga;
+ struct mlx5_fpga_conn_attr init_attr = {0};
+ struct mlx5_fpga_conn *conn;
+ struct mlx5_fpga_tls *tls;
+ int err = 0;
+
+ if (!mlx5_fpga_is_tls_device(mdev) || !fdev)
+ return 0;
+
+ tls = kzalloc(sizeof(*tls), GFP_KERNEL);
+ if (!tls)
+ return -ENOMEM;
+
+ err = mlx5_fpga_tls_get_caps(fdev, &tls->caps);
+ if (err)
+ goto error;
+
+ if (!(tls->caps & (MLX5_ACCEL_TLS_TX | MLX5_ACCEL_TLS_V12 |
+ MLX5_ACCEL_TLS_AES_GCM128))) {
+ err = -ENOTSUPP;
+ goto error;
+ }
+
+ init_attr.rx_size = SBU_QP_QUEUE_SIZE;
+ init_attr.tx_size = SBU_QP_QUEUE_SIZE;
+ init_attr.recv_cb = mlx5_fpga_tls_hw_qp_recv_cb;
+ init_attr.cb_arg = fdev;
+ conn = mlx5_fpga_sbu_conn_create(fdev, &init_attr);
+ if (IS_ERR(conn)) {
+ err = PTR_ERR(conn);
+ mlx5_fpga_err(fdev, "Error creating TLS command connection %d\n",
+ err);
+ goto error;
+ }
+
+ tls->conn = conn;
+ spin_lock_init(&tls->pending_cmds_lock);
+ INIT_LIST_HEAD(&tls->pending_cmds);
+
+ idr_init(&tls->tx_idr);
+ spin_lock_init(&tls->idr_spinlock);
+ fdev->tls = tls;
+ return 0;
+
+error:
+ kfree(tls);
+ return err;
+}
+
+void mlx5_fpga_tls_cleanup(struct mlx5_core_dev *mdev)
+{
+ struct mlx5_fpga_device *fdev = mdev->fpga;
+
+ if (!fdev || !fdev->tls)
+ return;
+
+ mlx5_fpga_sbu_conn_destroy(fdev->tls->conn);
+ kfree(fdev->tls);
+ fdev->tls = NULL;
+}
+
+static void mlx5_fpga_tls_set_aes_gcm128_ctx(void *cmd,
+ struct tls_crypto_info *info,
+ __be64 *rcd_sn)
+{
+ struct tls12_crypto_info_aes_gcm_128 *crypto_info =
+ (struct tls12_crypto_info_aes_gcm_128 *)info;
+
+ memcpy(MLX5_ADDR_OF(tls_cmd, cmd, tls_rcd_sn), crypto_info->rec_seq,
+ TLS_CIPHER_AES_GCM_128_REC_SEQ_SIZE);
+
+ memcpy(MLX5_ADDR_OF(tls_cmd, cmd, tls_implicit_iv),
+ crypto_info->salt, TLS_CIPHER_AES_GCM_128_SALT_SIZE);
+ memcpy(MLX5_ADDR_OF(tls_cmd, cmd, encryption_key),
+ crypto_info->key, TLS_CIPHER_AES_GCM_128_KEY_SIZE);
+
+ /* in AES-GCM 128 we need to write the key twice */
+ memcpy(MLX5_ADDR_OF(tls_cmd, cmd, encryption_key) +
+ TLS_CIPHER_AES_GCM_128_KEY_SIZE,
+ crypto_info->key, TLS_CIPHER_AES_GCM_128_KEY_SIZE);
+
+ MLX5_SET(tls_cmd, cmd, alg, MLX5_TLS_ALG_AES_GCM_128);
+}
+
+static int mlx5_fpga_tls_set_key_material(void *cmd, u32 caps,
+ struct tls_crypto_info *crypto_info)
+{
+ __be64 rcd_sn;
+
+ switch (crypto_info->cipher_type) {
+ case TLS_CIPHER_AES_GCM_128:
+ if (!(caps & MLX5_ACCEL_TLS_AES_GCM128))
+ return -EINVAL;
+ mlx5_fpga_tls_set_aes_gcm128_ctx(cmd, crypto_info, &rcd_sn);
+ break;
+ default:
+ return -EINVAL;
+ }
+
+ return 0;
+}
+
+static int mlx5_fpga_tls_add_flow(struct mlx5_core_dev *mdev, void *flow,
+ struct tls_crypto_info *crypto_info, u32 swid,
+ u32 tcp_sn)
+{
+ u32 caps = mlx5_fpga_tls_device_caps(mdev);
+ struct mlx5_setup_stream_context *ctx;
+ int ret = -ENOMEM;
+ size_t cmd_size;
+ void *cmd;
+
+ cmd_size = MLX5_TLS_COMMAND_SIZE + sizeof(*ctx);
+ ctx = kzalloc(cmd_size, GFP_KERNEL);
+ if (!ctx)
+ goto out;
+
+ cmd = ctx + 1;
+ ret = mlx5_fpga_tls_set_key_material(cmd, caps, crypto_info);
+ if (ret)
+ goto free_ctx;
+
+ mlx5_fpga_tls_flow_to_cmd(flow, cmd);
+
+ MLX5_SET(tls_cmd, cmd, swid, swid);
+ MLX5_SET(tls_cmd, cmd, tcp_sn, tcp_sn);
+
+ return mlx5_fpga_tls_setup_stream_cmd(mdev, ctx);
+
+free_ctx:
+ kfree(ctx);
+out:
+ return ret;
+}
+
+int mlx5_fpga_tls_add_tx_flow(struct mlx5_core_dev *mdev, void *flow,
+ struct tls_crypto_info *crypto_info,
+ u32 start_offload_tcp_sn, u32 *p_swid)
+{
+ struct mlx5_fpga_tls *tls = mdev->fpga->tls;
+ int ret = -ENOMEM;
+ u32 swid;
+
+ ret = mlx5_fpga_tls_alloc_swid(&tls->tx_idr, &tls->idr_spinlock, flow);
+ if (ret < 0)
+ return ret;
+
+ swid = ret;
+ MLX5_SET(tls_flow, flow, direction_sx, 1);
+
+ ret = mlx5_fpga_tls_add_flow(mdev, flow, crypto_info, swid,
+ start_offload_tcp_sn);
+ if (ret && ret != -EINTR)
+ goto free_swid;
+
+ *p_swid = swid;
+ return 0;
+free_swid:
+ mlx5_fpga_tls_release_swid(&tls->tx_idr, &tls->idr_spinlock, swid);
+
+ return ret;
+}
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fpga/tls.h b/drivers/net/ethernet/mellanox/mlx5/core/fpga/tls.h
new file mode 100644
index 0000000..800a214
--- /dev/null
+++ b/drivers/net/ethernet/mellanox/mlx5/core/fpga/tls.h
@@ -0,0 +1,68 @@
+/*
+ * Copyright (c) 2018 Mellanox Technologies. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses. You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ * - Redistributions of source code must retain the above
+ * copyright notice, this list of conditions and the following
+ * disclaimer.
+ *
+ * - Redistributions in binary form must reproduce the above
+ * copyright notice, this list of conditions and the following
+ * disclaimer in the documentation and/or other materials
+ * provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ *
+ */
+
+#ifndef __MLX5_FPGA_TLS_H__
+#define __MLX5_FPGA_TLS_H__
+
+#include <linux/mlx5/driver.h>
+
+#include <net/tls.h>
+#include "fpga/core.h"
+
+struct mlx5_fpga_tls {
+ struct list_head pending_cmds;
+ spinlock_t pending_cmds_lock; /* Protects pending_cmds */
+ u32 caps;
+ struct mlx5_fpga_conn *conn;
+
+ struct idr tx_idr;
+ spinlock_t idr_spinlock; /* protects the IDR */
+};
+
+int mlx5_fpga_tls_add_tx_flow(struct mlx5_core_dev *mdev, void *flow,
+ struct tls_crypto_info *crypto_info,
+ u32 start_offload_tcp_sn, u32 *p_swid);
+
+void mlx5_fpga_tls_del_tx_flow(struct mlx5_core_dev *mdev, u32 swid,
+ gfp_t flags);
+
+bool mlx5_fpga_is_tls_device(struct mlx5_core_dev *mdev);
+int mlx5_fpga_tls_init(struct mlx5_core_dev *mdev);
+void mlx5_fpga_tls_cleanup(struct mlx5_core_dev *mdev);
+
+static inline u32 mlx5_fpga_tls_device_caps(struct mlx5_core_dev *mdev)
+{
+ return mdev->fpga->tls->caps;
+}
+
+#endif /* __MLX5_FPGA_TLS_H__ */
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.c b/drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.c
index ef5afd7..5a00def 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.c
@@ -372,6 +372,15 @@
if (dst->dest_attr.type ==
MLX5_FLOW_DESTINATION_TYPE_FLOW_TABLE) {
id = dst->dest_attr.ft->id;
+ } else if (dst->dest_attr.type ==
+ MLX5_FLOW_DESTINATION_TYPE_VPORT) {
+ id = dst->dest_attr.vport.num;
+ MLX5_SET(dest_format_struct, in_dests,
+ destination_eswitch_owner_vhca_id_valid,
+ dst->dest_attr.vport.vhca_id_valid);
+ MLX5_SET(dest_format_struct, in_dests,
+ destination_eswitch_owner_vhca_id,
+ dst->dest_attr.vport.vhca_id);
} else {
id = dst->dest_attr.tir_num;
}
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
index c39c169..806e9552 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
@@ -1374,6 +1374,8 @@
struct mlx5_core_dev *dev = get_dev(&ft->node);
int inlen = MLX5_ST_SZ_BYTES(create_flow_group_in);
void *match_criteria_addr;
+ u8 src_esw_owner_mask_on;
+ void *misc;
int err;
u32 *in;
@@ -1386,6 +1388,14 @@
MLX5_SET(create_flow_group_in, in, start_flow_index, fg->start_index);
MLX5_SET(create_flow_group_in, in, end_flow_index, fg->start_index +
fg->max_ftes - 1);
+
+ misc = MLX5_ADDR_OF(fte_match_param, fg->mask.match_criteria,
+ misc_parameters);
+ src_esw_owner_mask_on = !!MLX5_GET(fte_match_set_misc, misc,
+ source_eswitch_owner_vhca_id);
+ MLX5_SET(create_flow_group_in, in,
+ source_eswitch_owner_vhca_id_valid, src_esw_owner_mask_on);
+
match_criteria_addr = MLX5_ADDR_OF(create_flow_group_in,
in, match_criteria);
memcpy(match_criteria_addr, fg->mask.match_criteria,
@@ -1406,7 +1416,7 @@
{
if (d1->type == d2->type) {
if ((d1->type == MLX5_FLOW_DESTINATION_TYPE_VPORT &&
- d1->vport_num == d2->vport_num) ||
+ d1->vport.num == d2->vport.num) ||
(d1->type == MLX5_FLOW_DESTINATION_TYPE_FLOW_TABLE &&
d1->ft == d2->ft) ||
(d1->type == MLX5_FLOW_DESTINATION_TYPE_TIR &&
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c b/drivers/net/ethernet/mellanox/mlx5/core/main.c
index e2c465b..615005e 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c
@@ -60,6 +60,7 @@
#include "fpga/core.h"
#include "fpga/ipsec.h"
#include "accel/ipsec.h"
+#include "accel/tls.h"
#include "lib/clock.h"
MODULE_AUTHOR("Eli Cohen <eli@mellanox.com>");
@@ -1190,6 +1191,12 @@
goto err_ipsec_start;
}
+ err = mlx5_accel_tls_init(dev);
+ if (err) {
+ dev_err(&pdev->dev, "TLS device start failed %d\n", err);
+ goto err_tls_start;
+ }
+
err = mlx5_init_fs(dev);
if (err) {
dev_err(&pdev->dev, "Failed to init flow steering\n");
@@ -1231,6 +1238,9 @@
mlx5_cleanup_fs(dev);
err_fs:
+ mlx5_accel_tls_cleanup(dev);
+
+err_tls_start:
mlx5_accel_ipsec_cleanup(dev);
err_ipsec_start:
@@ -1306,6 +1316,7 @@
mlx5_sriov_detach(dev);
mlx5_cleanup_fs(dev);
mlx5_accel_ipsec_cleanup(dev);
+ mlx5_accel_tls_cleanup(dev);
mlx5_fpga_device_stop(dev);
mlx5_irq_clear_affinity_hints(dev);
free_comp_eqs(dev);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/mr.c b/drivers/net/ethernet/mellanox/mlx5/core/mr.c
index b9736f5..f4f02f7 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/mr.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/mr.c
@@ -123,8 +123,8 @@
deleted_mkey = radix_tree_delete(&table->tree, mlx5_base_mkey(mkey->key));
write_unlock_irqrestore(&table->lock, flags);
if (!deleted_mkey) {
- mlx5_core_warn(dev, "failed radix tree delete of mkey 0x%x\n",
- mlx5_base_mkey(mkey->key));
+ mlx5_core_dbg(dev, "failed radix tree delete of mkey 0x%x\n",
+ mlx5_base_mkey(mkey->key));
return -ENOENT;
}
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/qp.c b/drivers/net/ethernet/mellanox/mlx5/core/qp.c
index 02d6c5b..4ca07bf 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/qp.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/qp.c
@@ -407,21 +407,21 @@
case MLX5_CMD_OP_RST2INIT_QP:
if (MBOX_ALLOC(mbox, rst2init_qp))
return -ENOMEM;
- MOD_QP_IN_SET_QPC(rst2init_qp, mbox->in, opcode, qpn,
- opt_param_mask, qpc);
- break;
+ MOD_QP_IN_SET_QPC(rst2init_qp, mbox->in, opcode, qpn,
+ opt_param_mask, qpc);
+ break;
case MLX5_CMD_OP_INIT2RTR_QP:
if (MBOX_ALLOC(mbox, init2rtr_qp))
return -ENOMEM;
- MOD_QP_IN_SET_QPC(init2rtr_qp, mbox->in, opcode, qpn,
- opt_param_mask, qpc);
- break;
+ MOD_QP_IN_SET_QPC(init2rtr_qp, mbox->in, opcode, qpn,
+ opt_param_mask, qpc);
+ break;
case MLX5_CMD_OP_RTR2RTS_QP:
if (MBOX_ALLOC(mbox, rtr2rts_qp))
return -ENOMEM;
- MOD_QP_IN_SET_QPC(rtr2rts_qp, mbox->in, opcode, qpn,
- opt_param_mask, qpc);
- break;
+ MOD_QP_IN_SET_QPC(rtr2rts_qp, mbox->in, opcode, qpn,
+ opt_param_mask, qpc);
+ break;
case MLX5_CMD_OP_RTS2RTS_QP:
if (MBOX_ALLOC(mbox, rts2rts_qp))
return -ENOMEM;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/vport.c b/drivers/net/ethernet/mellanox/mlx5/core/vport.c
index 177e076..719cecb 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/vport.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/vport.c
@@ -511,7 +511,7 @@
*system_image_guid = MLX5_GET64(query_nic_vport_context_out, out,
nic_vport_context.system_image_guid);
- kfree(out);
+ kvfree(out);
return 0;
}
@@ -531,7 +531,7 @@
*node_guid = MLX5_GET64(query_nic_vport_context_out, out,
nic_vport_context.node_guid);
- kfree(out);
+ kvfree(out);
return 0;
}
@@ -587,7 +587,7 @@
*qkey_viol_cntr = MLX5_GET(query_nic_vport_context_out, out,
nic_vport_context.qkey_violation_counter);
- kfree(out);
+ kvfree(out);
return 0;
}
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/wq.h b/drivers/net/ethernet/mellanox/mlx5/core/wq.h
index fca90b9..f3dfa0ca 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/wq.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/wq.h
@@ -38,7 +38,6 @@
#include <linux/mlx5/qp.h>
struct mlx5_wq_param {
- int linear;
int buf_numa_node;
int db_numa_node;
};
diff --git a/drivers/net/ethernet/mellanox/mlxsw/cmd.h b/drivers/net/ethernet/mellanox/mlxsw/cmd.h
index 479511c..8da91b0 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/cmd.h
+++ b/drivers/net/ethernet/mellanox/mlxsw/cmd.h
@@ -424,10 +424,15 @@
MLXSW_ITEM32(cmd_mbox, query_aq_cap, max_num_rdqs, 0x04, 0, 8);
/* cmd_mbox_query_aq_cap_log_max_cq_sz
- * Log (base 2) of max CQEs allowed on CQ.
+ * Log (base 2) of the Maximum CQEs allowed in a CQ for CQEv0 and CQEv1.
*/
MLXSW_ITEM32(cmd_mbox, query_aq_cap, log_max_cq_sz, 0x08, 24, 8);
+/* cmd_mbox_query_aq_cap_log_max_cqv2_sz
+ * Log (base 2) of the Maximum CQEs allowed in a CQ for CQEv2.
+ */
+MLXSW_ITEM32(cmd_mbox, query_aq_cap, log_max_cqv2_sz, 0x08, 16, 8);
+
/* cmd_mbox_query_aq_cap_max_num_cqs
* Maximum number of CQs.
*/
@@ -662,6 +667,12 @@
*/
MLXSW_ITEM32(cmd_mbox, config_profile, set_kvd_hash_double_size, 0x0C, 26, 1);
+/* cmd_mbox_config_set_cqe_version
+ * Capability bit. Setting a bit to 1 configures the profile
+ * according to the mailbox contents.
+ */
+MLXSW_ITEM32(cmd_mbox, config_profile, set_cqe_version, 0x08, 0, 1);
+
/* cmd_mbox_config_profile_max_vepa_channels
* Maximum number of VEPA channels per port (0 through 16)
* 0 - multi-channel VEPA is disabled
@@ -841,6 +852,14 @@
MLXSW_ITEM32_INDEXED(cmd_mbox, config_profile, swid_config_properties,
0x60, 0, 8, 0x08, 0x00, false);
+/* cmd_mbox_config_profile_cqe_version
+ * CQE version:
+ * 0: CQE version is 0
+ * 1: CQE version is either 1 or 2
+ * CQE ver 1 or 2 is configured by Completion Queue Context field cqe_ver.
+ */
+MLXSW_ITEM32(cmd_mbox, config_profile, cqe_version, 0xB0, 0, 8);
+
/* ACCESS_REG - Access EMAD Supported Register
* ----------------------------------
* OpMod == 0 (N/A), INMmod == 0 (N/A)
@@ -1032,11 +1051,15 @@
0, cq_number, in_mbox, MLXSW_CMD_MBOX_SIZE);
}
-/* cmd_mbox_sw2hw_cq_cv
+enum mlxsw_cmd_mbox_sw2hw_cq_cqe_ver {
+ MLXSW_CMD_MBOX_SW2HW_CQ_CQE_VER_1,
+ MLXSW_CMD_MBOX_SW2HW_CQ_CQE_VER_2,
+};
+
+/* cmd_mbox_sw2hw_cq_cqe_ver
* CQE Version.
- * 0 - CQE Version 0, 1 - CQE Version 1
*/
-MLXSW_ITEM32(cmd_mbox, sw2hw_cq, cv, 0x00, 28, 4);
+MLXSW_ITEM32(cmd_mbox, sw2hw_cq, cqe_ver, 0x00, 28, 4);
/* cmd_mbox_sw2hw_cq_c_eqn
* Event Queue this CQ reports completion events to.
diff --git a/drivers/net/ethernet/mellanox/mlxsw/core.c b/drivers/net/ethernet/mellanox/mlxsw/core.c
index e13ac3b..a38faec 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/core.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/core.c
@@ -1714,15 +1714,16 @@
void mlxsw_core_port_eth_set(struct mlxsw_core *mlxsw_core, u8 local_port,
void *port_driver_priv, struct net_device *dev,
- bool split, u32 split_group)
+ u32 port_number, bool split,
+ u32 split_port_subnumber)
{
struct mlxsw_core_port *mlxsw_core_port =
&mlxsw_core->ports[local_port];
struct devlink_port *devlink_port = &mlxsw_core_port->devlink_port;
mlxsw_core_port->port_driver_priv = port_driver_priv;
- if (split)
- devlink_port_split_set(devlink_port, split_group);
+ devlink_port_attrs_set(devlink_port, DEVLINK_PORT_FLAVOUR_PHYSICAL,
+ port_number, split, split_port_subnumber);
devlink_port_type_eth_set(devlink_port, dev);
}
EXPORT_SYMBOL(mlxsw_core_port_eth_set);
@@ -1762,6 +1763,17 @@
}
EXPORT_SYMBOL(mlxsw_core_port_type_get);
+int mlxsw_core_port_get_phys_port_name(struct mlxsw_core *mlxsw_core,
+ u8 local_port, char *name, size_t len)
+{
+ struct mlxsw_core_port *mlxsw_core_port =
+ &mlxsw_core->ports[local_port];
+ struct devlink_port *devlink_port = &mlxsw_core_port->devlink_port;
+
+ return devlink_port_get_phys_port_name(devlink_port, name, len);
+}
+EXPORT_SYMBOL(mlxsw_core_port_get_phys_port_name);
+
static void mlxsw_core_buf_dump_dbg(struct mlxsw_core *mlxsw_core,
const char *buf, size_t size)
{
diff --git a/drivers/net/ethernet/mellanox/mlxsw/core.h b/drivers/net/ethernet/mellanox/mlxsw/core.h
index 092d393..4eac7fb 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/core.h
+++ b/drivers/net/ethernet/mellanox/mlxsw/core.h
@@ -201,13 +201,16 @@
void mlxsw_core_port_fini(struct mlxsw_core *mlxsw_core, u8 local_port);
void mlxsw_core_port_eth_set(struct mlxsw_core *mlxsw_core, u8 local_port,
void *port_driver_priv, struct net_device *dev,
- bool split, u32 split_group);
+ u32 port_number, bool split,
+ u32 split_port_subnumber);
void mlxsw_core_port_ib_set(struct mlxsw_core *mlxsw_core, u8 local_port,
void *port_driver_priv);
void mlxsw_core_port_clear(struct mlxsw_core *mlxsw_core, u8 local_port,
void *port_driver_priv);
enum devlink_port_type mlxsw_core_port_type_get(struct mlxsw_core *mlxsw_core,
u8 local_port);
+int mlxsw_core_port_get_phys_port_name(struct mlxsw_core *mlxsw_core,
+ u8 local_port, char *name, size_t len);
int mlxsw_core_schedule_dw(struct delayed_work *dwork, unsigned long delay);
bool mlxsw_core_schedule_work(struct work_struct *work);
diff --git a/drivers/net/ethernet/mellanox/mlxsw/pci.c b/drivers/net/ethernet/mellanox/mlxsw/pci.c
index 3a93819..db794a1 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/pci.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/pci.c
@@ -117,6 +117,7 @@
struct {
u32 comp_sdq_count;
u32 comp_rdq_count;
+ enum mlxsw_pci_cqe_v v;
} cq;
struct {
u32 ev_cmd_count;
@@ -155,6 +156,8 @@
} cmd;
struct mlxsw_bus_info bus_info;
const struct pci_device_id *id;
+ enum mlxsw_pci_cqe_v max_cqe_ver; /* Maximal supported CQE version */
+ u8 num_sdq_cqs; /* Number of CQs used for SDQs */
};
static void mlxsw_pci_queue_tasklet_schedule(struct mlxsw_pci_queue *q)
@@ -202,24 +205,6 @@
return owner_bit != !!(q->consumer_counter & q->count);
}
-static char *
-mlxsw_pci_queue_sw_elem_get(struct mlxsw_pci_queue *q,
- u32 (*get_elem_owner_func)(const char *))
-{
- struct mlxsw_pci_queue_elem_info *elem_info;
- char *elem;
- bool owner_bit;
-
- elem_info = mlxsw_pci_queue_elem_info_consumer_get(q);
- elem = elem_info->elem;
- owner_bit = get_elem_owner_func(elem);
- if (mlxsw_pci_elem_hw_owned(q, owner_bit))
- return NULL;
- q->consumer_counter++;
- rmb(); /* make sure we read owned bit before the rest of elem */
- return elem;
-}
-
static struct mlxsw_pci_queue_type_group *
mlxsw_pci_queue_type_group_get(struct mlxsw_pci *mlxsw_pci,
enum mlxsw_pci_queue_type q_type)
@@ -494,6 +479,17 @@
}
}
+static void mlxsw_pci_cq_pre_init(struct mlxsw_pci *mlxsw_pci,
+ struct mlxsw_pci_queue *q)
+{
+ q->u.cq.v = mlxsw_pci->max_cqe_ver;
+
+ /* For SDQ it is pointless to use CQEv2, so use CQEv1 instead */
+ if (q->u.cq.v == MLXSW_PCI_CQE_V2 &&
+ q->num < mlxsw_pci->num_sdq_cqs)
+ q->u.cq.v = MLXSW_PCI_CQE_V1;
+}
+
static int mlxsw_pci_cq_init(struct mlxsw_pci *mlxsw_pci, char *mbox,
struct mlxsw_pci_queue *q)
{
@@ -505,10 +501,16 @@
for (i = 0; i < q->count; i++) {
char *elem = mlxsw_pci_queue_elem_get(q, i);
- mlxsw_pci_cqe_owner_set(elem, 1);
+ mlxsw_pci_cqe_owner_set(q->u.cq.v, elem, 1);
}
- mlxsw_cmd_mbox_sw2hw_cq_cv_set(mbox, 0); /* CQE ver 0 */
+ if (q->u.cq.v == MLXSW_PCI_CQE_V1)
+ mlxsw_cmd_mbox_sw2hw_cq_cqe_ver_set(mbox,
+ MLXSW_CMD_MBOX_SW2HW_CQ_CQE_VER_1);
+ else if (q->u.cq.v == MLXSW_PCI_CQE_V2)
+ mlxsw_cmd_mbox_sw2hw_cq_cqe_ver_set(mbox,
+ MLXSW_CMD_MBOX_SW2HW_CQ_CQE_VER_2);
+
mlxsw_cmd_mbox_sw2hw_cq_c_eqn_set(mbox, MLXSW_PCI_EQ_COMP_NUM);
mlxsw_cmd_mbox_sw2hw_cq_st_set(mbox, 0);
mlxsw_cmd_mbox_sw2hw_cq_log_cq_size_set(mbox, ilog2(q->count));
@@ -559,7 +561,7 @@
static void mlxsw_pci_cqe_rdq_handle(struct mlxsw_pci *mlxsw_pci,
struct mlxsw_pci_queue *q,
u16 consumer_counter_limit,
- char *cqe)
+ enum mlxsw_pci_cqe_v cqe_v, char *cqe)
{
struct pci_dev *pdev = mlxsw_pci->pdev;
struct mlxsw_pci_queue_elem_info *elem_info;
@@ -579,10 +581,11 @@
if (q->consumer_counter++ != consumer_counter_limit)
dev_dbg_ratelimited(&pdev->dev, "Consumer counter does not match limit in RDQ\n");
- if (mlxsw_pci_cqe_lag_get(cqe)) {
+ if (mlxsw_pci_cqe_lag_get(cqe_v, cqe)) {
rx_info.is_lag = true;
- rx_info.u.lag_id = mlxsw_pci_cqe_lag_id_get(cqe);
- rx_info.lag_port_index = mlxsw_pci_cqe_lag_port_index_get(cqe);
+ rx_info.u.lag_id = mlxsw_pci_cqe_lag_id_get(cqe_v, cqe);
+ rx_info.lag_port_index =
+ mlxsw_pci_cqe_lag_subport_get(cqe_v, cqe);
} else {
rx_info.is_lag = false;
rx_info.u.sys_port = mlxsw_pci_cqe_system_port_get(cqe);
@@ -591,7 +594,7 @@
rx_info.trap_id = mlxsw_pci_cqe_trap_id_get(cqe);
byte_count = mlxsw_pci_cqe_byte_count_get(cqe);
- if (mlxsw_pci_cqe_crc_get(cqe))
+ if (mlxsw_pci_cqe_crc_get(cqe_v, cqe))
byte_count -= ETH_FCS_LEN;
skb_put(skb, byte_count);
mlxsw_core_skb_receive(mlxsw_pci->core, skb, &rx_info);
@@ -608,7 +611,18 @@
static char *mlxsw_pci_cq_sw_cqe_get(struct mlxsw_pci_queue *q)
{
- return mlxsw_pci_queue_sw_elem_get(q, mlxsw_pci_cqe_owner_get);
+ struct mlxsw_pci_queue_elem_info *elem_info;
+ char *elem;
+ bool owner_bit;
+
+ elem_info = mlxsw_pci_queue_elem_info_consumer_get(q);
+ elem = elem_info->elem;
+ owner_bit = mlxsw_pci_cqe_owner_get(q->u.cq.v, elem);
+ if (mlxsw_pci_elem_hw_owned(q, owner_bit))
+ return NULL;
+ q->consumer_counter++;
+ rmb(); /* make sure we read owned bit before the rest of elem */
+ return elem;
}
static void mlxsw_pci_cq_tasklet(unsigned long data)
@@ -621,8 +635,8 @@
while ((cqe = mlxsw_pci_cq_sw_cqe_get(q))) {
u16 wqe_counter = mlxsw_pci_cqe_wqe_counter_get(cqe);
- u8 sendq = mlxsw_pci_cqe_sr_get(cqe);
- u8 dqn = mlxsw_pci_cqe_dqn_get(cqe);
+ u8 sendq = mlxsw_pci_cqe_sr_get(q->u.cq.v, cqe);
+ u8 dqn = mlxsw_pci_cqe_dqn_get(q->u.cq.v, cqe);
if (sendq) {
struct mlxsw_pci_queue *sdq;
@@ -636,7 +650,7 @@
rdq = mlxsw_pci_rdq_get(mlxsw_pci, dqn);
mlxsw_pci_cqe_rdq_handle(mlxsw_pci, rdq,
- wqe_counter, cqe);
+ wqe_counter, q->u.cq.v, cqe);
q->u.cq.comp_rdq_count++;
}
if (++items == credits)
@@ -648,6 +662,18 @@
}
}
+static u16 mlxsw_pci_cq_elem_count(const struct mlxsw_pci_queue *q)
+{
+ return q->u.cq.v == MLXSW_PCI_CQE_V2 ? MLXSW_PCI_CQE2_COUNT :
+ MLXSW_PCI_CQE01_COUNT;
+}
+
+static u8 mlxsw_pci_cq_elem_size(const struct mlxsw_pci_queue *q)
+{
+ return q->u.cq.v == MLXSW_PCI_CQE_V2 ? MLXSW_PCI_CQE2_SIZE :
+ MLXSW_PCI_CQE01_SIZE;
+}
+
static int mlxsw_pci_eq_init(struct mlxsw_pci *mlxsw_pci, char *mbox,
struct mlxsw_pci_queue *q)
{
@@ -696,7 +722,18 @@
static char *mlxsw_pci_eq_sw_eqe_get(struct mlxsw_pci_queue *q)
{
- return mlxsw_pci_queue_sw_elem_get(q, mlxsw_pci_eqe_owner_get);
+ struct mlxsw_pci_queue_elem_info *elem_info;
+ char *elem;
+ bool owner_bit;
+
+ elem_info = mlxsw_pci_queue_elem_info_consumer_get(q);
+ elem = elem_info->elem;
+ owner_bit = mlxsw_pci_eqe_owner_get(elem);
+ if (mlxsw_pci_elem_hw_owned(q, owner_bit))
+ return NULL;
+ q->consumer_counter++;
+ rmb(); /* make sure we read owned bit before the rest of elem */
+ return elem;
}
static void mlxsw_pci_eq_tasklet(unsigned long data)
@@ -749,11 +786,15 @@
struct mlxsw_pci_queue_ops {
const char *name;
enum mlxsw_pci_queue_type type;
+ void (*pre_init)(struct mlxsw_pci *mlxsw_pci,
+ struct mlxsw_pci_queue *q);
int (*init)(struct mlxsw_pci *mlxsw_pci, char *mbox,
struct mlxsw_pci_queue *q);
void (*fini)(struct mlxsw_pci *mlxsw_pci,
struct mlxsw_pci_queue *q);
void (*tasklet)(unsigned long data);
+ u16 (*elem_count_f)(const struct mlxsw_pci_queue *q);
+ u8 (*elem_size_f)(const struct mlxsw_pci_queue *q);
u16 elem_count;
u8 elem_size;
};
@@ -776,11 +817,12 @@
static const struct mlxsw_pci_queue_ops mlxsw_pci_cq_ops = {
.type = MLXSW_PCI_QUEUE_TYPE_CQ,
+ .pre_init = mlxsw_pci_cq_pre_init,
.init = mlxsw_pci_cq_init,
.fini = mlxsw_pci_cq_fini,
.tasklet = mlxsw_pci_cq_tasklet,
- .elem_count = MLXSW_PCI_CQE_COUNT,
- .elem_size = MLXSW_PCI_CQE_SIZE
+ .elem_count_f = mlxsw_pci_cq_elem_count,
+ .elem_size_f = mlxsw_pci_cq_elem_size
};
static const struct mlxsw_pci_queue_ops mlxsw_pci_eq_ops = {
@@ -800,10 +842,15 @@
int i;
int err;
- spin_lock_init(&q->lock);
q->num = q_num;
- q->count = q_ops->elem_count;
- q->elem_size = q_ops->elem_size;
+ if (q_ops->pre_init)
+ q_ops->pre_init(mlxsw_pci, q);
+
+ spin_lock_init(&q->lock);
+ q->count = q_ops->elem_count_f ? q_ops->elem_count_f(q) :
+ q_ops->elem_count;
+ q->elem_size = q_ops->elem_size_f ? q_ops->elem_size_f(q) :
+ q_ops->elem_size;
q->type = q_ops->type;
q->pci = mlxsw_pci;
@@ -832,7 +879,7 @@
elem_info = mlxsw_pci_queue_elem_info_get(q, i);
elem_info->elem =
- __mlxsw_pci_queue_elem_get(q, q_ops->elem_size, i);
+ __mlxsw_pci_queue_elem_get(q, q->elem_size, i);
}
mlxsw_cmd_mbox_zero(mbox);
@@ -912,6 +959,7 @@
u8 rdq_log2sz;
u8 num_cqs;
u8 cq_log2sz;
+ u8 cqv2_log2sz;
u8 num_eqs;
u8 eq_log2sz;
int err;
@@ -927,6 +975,7 @@
rdq_log2sz = mlxsw_cmd_mbox_query_aq_cap_log_max_rdq_sz_get(mbox);
num_cqs = mlxsw_cmd_mbox_query_aq_cap_max_num_cqs_get(mbox);
cq_log2sz = mlxsw_cmd_mbox_query_aq_cap_log_max_cq_sz_get(mbox);
+ cqv2_log2sz = mlxsw_cmd_mbox_query_aq_cap_log_max_cqv2_sz_get(mbox);
num_eqs = mlxsw_cmd_mbox_query_aq_cap_max_num_eqs_get(mbox);
eq_log2sz = mlxsw_cmd_mbox_query_aq_cap_log_max_eq_sz_get(mbox);
@@ -938,12 +987,16 @@
if ((1 << sdq_log2sz != MLXSW_PCI_WQE_COUNT) ||
(1 << rdq_log2sz != MLXSW_PCI_WQE_COUNT) ||
- (1 << cq_log2sz != MLXSW_PCI_CQE_COUNT) ||
+ (1 << cq_log2sz != MLXSW_PCI_CQE01_COUNT) ||
+ (mlxsw_pci->max_cqe_ver == MLXSW_PCI_CQE_V2 &&
+ (1 << cqv2_log2sz != MLXSW_PCI_CQE2_COUNT)) ||
(1 << eq_log2sz != MLXSW_PCI_EQE_COUNT)) {
dev_err(&pdev->dev, "Unsupported number of async queue descriptors\n");
return -EINVAL;
}
+ mlxsw_pci->num_sdq_cqs = num_sdqs;
+
err = mlxsw_pci_queue_group_init(mlxsw_pci, mbox, &mlxsw_pci_eq_ops,
num_eqs);
if (err) {
@@ -1184,6 +1237,11 @@
mlxsw_pci_config_profile_swid_config(mlxsw_pci, mbox, i,
&profile->swid_config[i]);
+ if (mlxsw_pci->max_cqe_ver > MLXSW_PCI_CQE_V0) {
+ mlxsw_cmd_mbox_config_profile_set_cqe_version_set(mbox, 1);
+ mlxsw_cmd_mbox_config_profile_cqe_version_set(mbox, 1);
+ }
+
return mlxsw_cmd_config_profile_set(mlxsw_pci->core, mbox);
}
@@ -1378,6 +1436,21 @@
if (err)
goto err_query_resources;
+ if (MLXSW_CORE_RES_VALID(mlxsw_core, CQE_V2) &&
+ MLXSW_CORE_RES_GET(mlxsw_core, CQE_V2))
+ mlxsw_pci->max_cqe_ver = MLXSW_PCI_CQE_V2;
+ else if (MLXSW_CORE_RES_VALID(mlxsw_core, CQE_V1) &&
+ MLXSW_CORE_RES_GET(mlxsw_core, CQE_V1))
+ mlxsw_pci->max_cqe_ver = MLXSW_PCI_CQE_V1;
+ else if ((MLXSW_CORE_RES_VALID(mlxsw_core, CQE_V0) &&
+ MLXSW_CORE_RES_GET(mlxsw_core, CQE_V0)) ||
+ !MLXSW_CORE_RES_VALID(mlxsw_core, CQE_V0)) {
+ mlxsw_pci->max_cqe_ver = MLXSW_PCI_CQE_V0;
+ } else {
+ dev_err(&pdev->dev, "Invalid supported CQE version combination reported\n");
+ goto err_cqe_v_check;
+ }
+
err = mlxsw_pci_config_profile(mlxsw_pci, mbox, profile, res);
if (err)
goto err_config_profile;
@@ -1400,6 +1473,7 @@
mlxsw_pci_aqs_fini(mlxsw_pci);
err_aqs_init:
err_config_profile:
+err_cqe_v_check:
err_query_resources:
err_boardinfo:
mlxsw_pci_fw_area_fini(mlxsw_pci);
diff --git a/drivers/net/ethernet/mellanox/mlxsw/pci_hw.h b/drivers/net/ethernet/mellanox/mlxsw/pci_hw.h
index fb082ad2..963155f 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/pci_hw.h
+++ b/drivers/net/ethernet/mellanox/mlxsw/pci_hw.h
@@ -82,10 +82,12 @@
#define MLXSW_PCI_AQ_PAGES 8
#define MLXSW_PCI_AQ_SIZE (MLXSW_PCI_PAGE_SIZE * MLXSW_PCI_AQ_PAGES)
#define MLXSW_PCI_WQE_SIZE 32 /* 32 bytes per element */
-#define MLXSW_PCI_CQE_SIZE 16 /* 16 bytes per element */
+#define MLXSW_PCI_CQE01_SIZE 16 /* 16 bytes per element */
+#define MLXSW_PCI_CQE2_SIZE 32 /* 32 bytes per element */
#define MLXSW_PCI_EQE_SIZE 16 /* 16 bytes per element */
#define MLXSW_PCI_WQE_COUNT (MLXSW_PCI_AQ_SIZE / MLXSW_PCI_WQE_SIZE)
-#define MLXSW_PCI_CQE_COUNT (MLXSW_PCI_AQ_SIZE / MLXSW_PCI_CQE_SIZE)
+#define MLXSW_PCI_CQE01_COUNT (MLXSW_PCI_AQ_SIZE / MLXSW_PCI_CQE01_SIZE)
+#define MLXSW_PCI_CQE2_COUNT (MLXSW_PCI_AQ_SIZE / MLXSW_PCI_CQE2_SIZE)
#define MLXSW_PCI_EQE_COUNT (MLXSW_PCI_AQ_SIZE / MLXSW_PCI_EQE_SIZE)
#define MLXSW_PCI_EQE_UPDATE_COUNT 0x80
@@ -126,10 +128,48 @@
*/
MLXSW_ITEM64_INDEXED(pci, wqe, address, 0x08, 0, 64, 0x8, 0x0, false);
+enum mlxsw_pci_cqe_v {
+ MLXSW_PCI_CQE_V0,
+ MLXSW_PCI_CQE_V1,
+ MLXSW_PCI_CQE_V2,
+};
+
+#define mlxsw_pci_cqe_item_helpers(name, v0, v1, v2) \
+static inline u32 mlxsw_pci_cqe_##name##_get(enum mlxsw_pci_cqe_v v, char *cqe) \
+{ \
+ switch (v) { \
+ default: \
+ case MLXSW_PCI_CQE_V0: \
+ return mlxsw_pci_cqe##v0##_##name##_get(cqe); \
+ case MLXSW_PCI_CQE_V1: \
+ return mlxsw_pci_cqe##v1##_##name##_get(cqe); \
+ case MLXSW_PCI_CQE_V2: \
+ return mlxsw_pci_cqe##v2##_##name##_get(cqe); \
+ } \
+} \
+static inline void mlxsw_pci_cqe_##name##_set(enum mlxsw_pci_cqe_v v, \
+ char *cqe, u32 val) \
+{ \
+ switch (v) { \
+ default: \
+ case MLXSW_PCI_CQE_V0: \
+ mlxsw_pci_cqe##v0##_##name##_set(cqe, val); \
+ break; \
+ case MLXSW_PCI_CQE_V1: \
+ mlxsw_pci_cqe##v1##_##name##_set(cqe, val); \
+ break; \
+ case MLXSW_PCI_CQE_V2: \
+ mlxsw_pci_cqe##v2##_##name##_set(cqe, val); \
+ break; \
+ } \
+}
+
/* pci_cqe_lag
* Packet arrives from a port which is a LAG
*/
-MLXSW_ITEM32(pci, cqe, lag, 0x00, 23, 1);
+MLXSW_ITEM32(pci, cqe0, lag, 0x00, 23, 1);
+MLXSW_ITEM32(pci, cqe12, lag, 0x00, 24, 1);
+mlxsw_pci_cqe_item_helpers(lag, 0, 12, 12);
/* pci_cqe_system_port/lag_id
* When lag=0: System port on which the packet was received
@@ -138,8 +178,12 @@
* bits [3:0] sub_port on which the packet was received
*/
MLXSW_ITEM32(pci, cqe, system_port, 0x00, 0, 16);
-MLXSW_ITEM32(pci, cqe, lag_id, 0x00, 4, 12);
-MLXSW_ITEM32(pci, cqe, lag_port_index, 0x00, 0, 4);
+MLXSW_ITEM32(pci, cqe0, lag_id, 0x00, 4, 12);
+MLXSW_ITEM32(pci, cqe12, lag_id, 0x00, 0, 16);
+mlxsw_pci_cqe_item_helpers(lag_id, 0, 12, 12);
+MLXSW_ITEM32(pci, cqe0, lag_subport, 0x00, 0, 4);
+MLXSW_ITEM32(pci, cqe12, lag_subport, 0x00, 16, 8);
+mlxsw_pci_cqe_item_helpers(lag_subport, 0, 12, 12);
/* pci_cqe_wqe_counter
* WQE count of the WQEs completed on the associated dqn
@@ -162,28 +206,38 @@
* Length include CRC. Indicates the length field includes
* the packet's CRC.
*/
-MLXSW_ITEM32(pci, cqe, crc, 0x0C, 8, 1);
+MLXSW_ITEM32(pci, cqe0, crc, 0x0C, 8, 1);
+MLXSW_ITEM32(pci, cqe12, crc, 0x0C, 9, 1);
+mlxsw_pci_cqe_item_helpers(crc, 0, 12, 12);
/* pci_cqe_e
* CQE with Error.
*/
-MLXSW_ITEM32(pci, cqe, e, 0x0C, 7, 1);
+MLXSW_ITEM32(pci, cqe0, e, 0x0C, 7, 1);
+MLXSW_ITEM32(pci, cqe12, e, 0x00, 27, 1);
+mlxsw_pci_cqe_item_helpers(e, 0, 12, 12);
/* pci_cqe_sr
* 1 - Send Queue
* 0 - Receive Queue
*/
-MLXSW_ITEM32(pci, cqe, sr, 0x0C, 6, 1);
+MLXSW_ITEM32(pci, cqe0, sr, 0x0C, 6, 1);
+MLXSW_ITEM32(pci, cqe12, sr, 0x00, 26, 1);
+mlxsw_pci_cqe_item_helpers(sr, 0, 12, 12);
/* pci_cqe_dqn
* Descriptor Queue (DQ) Number.
*/
-MLXSW_ITEM32(pci, cqe, dqn, 0x0C, 1, 5);
+MLXSW_ITEM32(pci, cqe0, dqn, 0x0C, 1, 5);
+MLXSW_ITEM32(pci, cqe12, dqn, 0x0C, 1, 6);
+mlxsw_pci_cqe_item_helpers(dqn, 0, 12, 12);
/* pci_cqe_owner
* Ownership bit.
*/
-MLXSW_ITEM32(pci, cqe, owner, 0x0C, 0, 1);
+MLXSW_ITEM32(pci, cqe01, owner, 0x0C, 0, 1);
+MLXSW_ITEM32(pci, cqe2, owner, 0x1C, 0, 1);
+mlxsw_pci_cqe_item_helpers(owner, 01, 01, 2);
/* pci_eqe_event_type
* Event type.
diff --git a/drivers/net/ethernet/mellanox/mlxsw/reg.h b/drivers/net/ethernet/mellanox/mlxsw/reg.h
index 6218231..3f4d7e2 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/reg.h
+++ b/drivers/net/ethernet/mellanox/mlxsw/reg.h
@@ -6833,6 +6833,12 @@
*/
MLXSW_REG_MPAT_SPAN_TYPE_LOCAL_ETH = 0x0,
+ /* Remote SPAN Ethernet VLAN.
+ * The packet is forwarded to the monitoring port on the monitoring
+ * VLAN.
+ */
+ MLXSW_REG_MPAT_SPAN_TYPE_REMOTE_ETH = 0x1,
+
/* Encapsulated Remote SPAN Ethernet L3 GRE.
* The packet is encapsulated with GRE header.
*/
diff --git a/drivers/net/ethernet/mellanox/mlxsw/resources.h b/drivers/net/ethernet/mellanox/mlxsw/resources.h
index 087aad5..fd9299c 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/resources.h
+++ b/drivers/net/ethernet/mellanox/mlxsw/resources.h
@@ -43,6 +43,9 @@
MLXSW_RES_ID_KVD_SINGLE_MIN_SIZE,
MLXSW_RES_ID_KVD_DOUBLE_MIN_SIZE,
MLXSW_RES_ID_MAX_TRAP_GROUPS,
+ MLXSW_RES_ID_CQE_V0,
+ MLXSW_RES_ID_CQE_V1,
+ MLXSW_RES_ID_CQE_V2,
MLXSW_RES_ID_COUNTER_POOL_SIZE,
MLXSW_RES_ID_MAX_SPAN,
MLXSW_RES_ID_COUNTER_SIZE_PACKETS_BYTES,
@@ -81,6 +84,9 @@
[MLXSW_RES_ID_KVD_SINGLE_MIN_SIZE] = 0x1002,
[MLXSW_RES_ID_KVD_DOUBLE_MIN_SIZE] = 0x1003,
[MLXSW_RES_ID_MAX_TRAP_GROUPS] = 0x2201,
+ [MLXSW_RES_ID_CQE_V0] = 0x2210,
+ [MLXSW_RES_ID_CQE_V1] = 0x2211,
+ [MLXSW_RES_ID_CQE_V2] = 0x2212,
[MLXSW_RES_ID_COUNTER_POOL_SIZE] = 0x2410,
[MLXSW_RES_ID_MAX_SPAN] = 0x2420,
[MLXSW_RES_ID_COUNTER_SIZE_PACKETS_BYTES] = 0x2443,
diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum.c
index ca38a30..bb252b3 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum.c
@@ -441,29 +441,29 @@
mlxsw_tx_hdr_type_set(txhdr, MLXSW_TXHDR_TYPE_CONTROL);
}
-int mlxsw_sp_port_vid_stp_set(struct mlxsw_sp_port *mlxsw_sp_port, u16 vid,
- u8 state)
+enum mlxsw_reg_spms_state mlxsw_sp_stp_spms_state(u8 state)
{
- struct mlxsw_sp *mlxsw_sp = mlxsw_sp_port->mlxsw_sp;
- enum mlxsw_reg_spms_state spms_state;
- char *spms_pl;
- int err;
-
switch (state) {
case BR_STATE_FORWARDING:
- spms_state = MLXSW_REG_SPMS_STATE_FORWARDING;
- break;
+ return MLXSW_REG_SPMS_STATE_FORWARDING;
case BR_STATE_LEARNING:
- spms_state = MLXSW_REG_SPMS_STATE_LEARNING;
- break;
+ return MLXSW_REG_SPMS_STATE_LEARNING;
case BR_STATE_LISTENING: /* fall-through */
case BR_STATE_DISABLED: /* fall-through */
case BR_STATE_BLOCKING:
- spms_state = MLXSW_REG_SPMS_STATE_DISCARDING;
- break;
+ return MLXSW_REG_SPMS_STATE_DISCARDING;
default:
BUG();
}
+}
+
+int mlxsw_sp_port_vid_stp_set(struct mlxsw_sp_port *mlxsw_sp_port, u16 vid,
+ u8 state)
+{
+ enum mlxsw_reg_spms_state spms_state = mlxsw_sp_stp_spms_state(state);
+ struct mlxsw_sp *mlxsw_sp = mlxsw_sp_port->mlxsw_sp;
+ char *spms_pl;
+ int err;
spms_pl = kmalloc(MLXSW_REG_SPMS_LEN, GFP_KERNEL);
if (!spms_pl)
@@ -1238,21 +1238,10 @@
size_t len)
{
struct mlxsw_sp_port *mlxsw_sp_port = netdev_priv(dev);
- u8 module = mlxsw_sp_port->mapping.module;
- u8 width = mlxsw_sp_port->mapping.width;
- u8 lane = mlxsw_sp_port->mapping.lane;
- int err;
- if (!mlxsw_sp_port->split)
- err = snprintf(name, len, "p%d", module + 1);
- else
- err = snprintf(name, len, "p%ds%d", module + 1,
- lane / width);
-
- if (err >= len)
- return -EINVAL;
-
- return 0;
+ return mlxsw_core_port_get_phys_port_name(mlxsw_sp_port->mlxsw_sp->core,
+ mlxsw_sp_port->local_port,
+ name, len);
}
static struct mlxsw_sp_port_mall_tc_entry *
@@ -2927,8 +2916,8 @@
}
mlxsw_core_port_eth_set(mlxsw_sp->core, mlxsw_sp_port->local_port,
- mlxsw_sp_port, dev, mlxsw_sp_port->split,
- module);
+ mlxsw_sp_port, dev, module + 1,
+ mlxsw_sp_port->split, lane / width);
mlxsw_core_schedule_dw(&mlxsw_sp_port->periodic_hw_stats.update_dw, 0);
return 0;
@@ -3666,6 +3655,15 @@
goto err_lag_init;
}
+ /* Initialize SPAN before router and switchdev, so that those components
+ * can call mlxsw_sp_span_respin().
+ */
+ err = mlxsw_sp_span_init(mlxsw_sp);
+ if (err) {
+ dev_err(mlxsw_sp->bus_info->dev, "Failed to init span system\n");
+ goto err_span_init;
+ }
+
err = mlxsw_sp_switchdev_init(mlxsw_sp);
if (err) {
dev_err(mlxsw_sp->bus_info->dev, "Failed to initialize switchdev\n");
@@ -3684,15 +3682,6 @@
goto err_afa_init;
}
- err = mlxsw_sp_span_init(mlxsw_sp);
- if (err) {
- dev_err(mlxsw_sp->bus_info->dev, "Failed to init span system\n");
- goto err_span_init;
- }
-
- /* Initialize router after SPAN is initialized, so that the FIB and
- * neighbor event handlers can issue SPAN respin.
- */
err = mlxsw_sp_router_init(mlxsw_sp);
if (err) {
dev_err(mlxsw_sp->bus_info->dev, "Failed to initialize router\n");
@@ -3739,14 +3728,14 @@
err_netdev_notifier:
mlxsw_sp_router_fini(mlxsw_sp);
err_router_init:
- mlxsw_sp_span_fini(mlxsw_sp);
-err_span_init:
mlxsw_sp_afa_fini(mlxsw_sp);
err_afa_init:
mlxsw_sp_counter_pool_fini(mlxsw_sp);
err_counter_pool_init:
mlxsw_sp_switchdev_fini(mlxsw_sp);
err_switchdev_init:
+ mlxsw_sp_span_fini(mlxsw_sp);
+err_span_init:
mlxsw_sp_lag_fini(mlxsw_sp);
err_lag_init:
mlxsw_sp_buffers_fini(mlxsw_sp);
@@ -3768,10 +3757,10 @@
mlxsw_sp_acl_fini(mlxsw_sp);
unregister_netdevice_notifier(&mlxsw_sp->netdevice_nb);
mlxsw_sp_router_fini(mlxsw_sp);
- mlxsw_sp_span_fini(mlxsw_sp);
mlxsw_sp_afa_fini(mlxsw_sp);
mlxsw_sp_counter_pool_fini(mlxsw_sp);
mlxsw_sp_switchdev_fini(mlxsw_sp);
+ mlxsw_sp_span_fini(mlxsw_sp);
mlxsw_sp_lag_fini(mlxsw_sp);
mlxsw_sp_buffers_fini(mlxsw_sp);
mlxsw_sp_traps_fini(mlxsw_sp);
diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum.h b/drivers/net/ethernet/mellanox/mlxsw/spectrum.h
index 804d4d2..4a519d8 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum.h
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum.h
@@ -364,6 +364,7 @@
int mlxsw_sp_port_ets_maxrate_set(struct mlxsw_sp_port *mlxsw_sp_port,
enum mlxsw_reg_qeec_hr hr, u8 index,
u8 next_index, u32 maxrate);
+enum mlxsw_reg_spms_state mlxsw_sp_stp_spms_state(u8 stp_state);
int mlxsw_sp_port_vid_stp_set(struct mlxsw_sp_port *mlxsw_sp_port, u16 vid,
u8 state);
int mlxsw_sp_port_vp_mode_set(struct mlxsw_sp_port *mlxsw_sp_port, bool enable);
diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c
index 1904c03..77b2adb 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c
@@ -442,7 +442,7 @@
struct mlxsw_sp_rt6 {
struct list_head list;
- struct rt6_info *rt;
+ struct fib6_info *rt;
};
struct mlxsw_sp_lpm_tree {
@@ -2770,9 +2770,9 @@
struct in6_addr *gw;
int ifindex, weight;
- ifindex = mlxsw_sp_rt6->rt->dst.dev->ifindex;
- weight = mlxsw_sp_rt6->rt->rt6i_nh_weight;
- gw = &mlxsw_sp_rt6->rt->rt6i_gateway;
+ ifindex = mlxsw_sp_rt6->rt->fib6_nh.nh_dev->ifindex;
+ weight = mlxsw_sp_rt6->rt->fib6_nh.nh_weight;
+ gw = &mlxsw_sp_rt6->rt->fib6_nh.nh_gw;
if (!mlxsw_sp_nexthop6_group_has_nexthop(nh_grp, gw, ifindex,
weight))
return false;
@@ -2838,7 +2838,7 @@
struct net_device *dev;
list_for_each_entry(mlxsw_sp_rt6, &fib6_entry->rt6_list, list) {
- dev = mlxsw_sp_rt6->rt->dst.dev;
+ dev = mlxsw_sp_rt6->rt->fib6_nh.nh_dev;
val ^= dev->ifindex;
}
@@ -3834,11 +3834,11 @@
for (i = 0; i < nh_grp->count; i++) {
struct mlxsw_sp_nexthop *nh = &nh_grp->nexthops[i];
- struct rt6_info *rt = mlxsw_sp_rt6->rt;
+ struct fib6_info *rt = mlxsw_sp_rt6->rt;
- if (nh->rif && nh->rif->dev == rt->dst.dev &&
+ if (nh->rif && nh->rif->dev == rt->fib6_nh.nh_dev &&
ipv6_addr_equal((const struct in6_addr *) &nh->gw_addr,
- &rt->rt6i_gateway))
+ &rt->fib6_nh.nh_gw))
return nh;
continue;
}
@@ -3895,7 +3895,7 @@
if (fib_entry->type == MLXSW_SP_FIB_ENTRY_TYPE_LOCAL) {
list_first_entry(&fib6_entry->rt6_list, struct mlxsw_sp_rt6,
- list)->rt->rt6i_nh_flags |= RTNH_F_OFFLOAD;
+ list)->rt->fib6_nh.nh_flags |= RTNH_F_OFFLOAD;
return;
}
@@ -3905,9 +3905,9 @@
nh = mlxsw_sp_rt6_nexthop(nh_grp, mlxsw_sp_rt6);
if (nh && nh->offloaded)
- mlxsw_sp_rt6->rt->rt6i_nh_flags |= RTNH_F_OFFLOAD;
+ mlxsw_sp_rt6->rt->fib6_nh.nh_flags |= RTNH_F_OFFLOAD;
else
- mlxsw_sp_rt6->rt->rt6i_nh_flags &= ~RTNH_F_OFFLOAD;
+ mlxsw_sp_rt6->rt->fib6_nh.nh_flags &= ~RTNH_F_OFFLOAD;
}
}
@@ -3920,9 +3920,9 @@
fib6_entry = container_of(fib_entry, struct mlxsw_sp_fib6_entry,
common);
list_for_each_entry(mlxsw_sp_rt6, &fib6_entry->rt6_list, list) {
- struct rt6_info *rt = mlxsw_sp_rt6->rt;
+ struct fib6_info *rt = mlxsw_sp_rt6->rt;
- rt->rt6i_nh_flags &= ~RTNH_F_OFFLOAD;
+ rt->fib6_nh.nh_flags &= ~RTNH_F_OFFLOAD;
}
}
@@ -4699,29 +4699,29 @@
mlxsw_sp_fib_node_put(mlxsw_sp, fib_node);
}
-static bool mlxsw_sp_fib6_rt_should_ignore(const struct rt6_info *rt)
+static bool mlxsw_sp_fib6_rt_should_ignore(const struct fib6_info *rt)
{
/* Packets with link-local destination IP arriving to the router
* are trapped to the CPU, so no need to program specific routes
* for them.
*/
- if (ipv6_addr_type(&rt->rt6i_dst.addr) & IPV6_ADDR_LINKLOCAL)
+ if (ipv6_addr_type(&rt->fib6_dst.addr) & IPV6_ADDR_LINKLOCAL)
return true;
/* Multicast routes aren't supported, so ignore them. Neighbour
* Discovery packets are specifically trapped.
*/
- if (ipv6_addr_type(&rt->rt6i_dst.addr) & IPV6_ADDR_MULTICAST)
+ if (ipv6_addr_type(&rt->fib6_dst.addr) & IPV6_ADDR_MULTICAST)
return true;
/* Cloned routes are irrelevant in the forwarding path. */
- if (rt->rt6i_flags & RTF_CACHE)
+ if (rt->fib6_flags & RTF_CACHE)
return true;
return false;
}
-static struct mlxsw_sp_rt6 *mlxsw_sp_rt6_create(struct rt6_info *rt)
+static struct mlxsw_sp_rt6 *mlxsw_sp_rt6_create(struct fib6_info *rt)
{
struct mlxsw_sp_rt6 *mlxsw_sp_rt6;
@@ -4734,18 +4734,18 @@
* memory.
*/
mlxsw_sp_rt6->rt = rt;
- rt6_hold(rt);
+ fib6_info_hold(rt);
return mlxsw_sp_rt6;
}
#if IS_ENABLED(CONFIG_IPV6)
-static void mlxsw_sp_rt6_release(struct rt6_info *rt)
+static void mlxsw_sp_rt6_release(struct fib6_info *rt)
{
- rt6_release(rt);
+ fib6_info_release(rt);
}
#else
-static void mlxsw_sp_rt6_release(struct rt6_info *rt)
+static void mlxsw_sp_rt6_release(struct fib6_info *rt)
{
}
#endif
@@ -4756,13 +4756,13 @@
kfree(mlxsw_sp_rt6);
}
-static bool mlxsw_sp_fib6_rt_can_mp(const struct rt6_info *rt)
+static bool mlxsw_sp_fib6_rt_can_mp(const struct fib6_info *rt)
{
/* RTF_CACHE routes are ignored */
- return (rt->rt6i_flags & (RTF_GATEWAY | RTF_ADDRCONF)) == RTF_GATEWAY;
+ return (rt->fib6_flags & (RTF_GATEWAY | RTF_ADDRCONF)) == RTF_GATEWAY;
}
-static struct rt6_info *
+static struct fib6_info *
mlxsw_sp_fib6_entry_rt(const struct mlxsw_sp_fib6_entry *fib6_entry)
{
return list_first_entry(&fib6_entry->rt6_list, struct mlxsw_sp_rt6,
@@ -4771,7 +4771,7 @@
static struct mlxsw_sp_fib6_entry *
mlxsw_sp_fib6_node_mp_entry_find(const struct mlxsw_sp_fib_node *fib_node,
- const struct rt6_info *nrt, bool replace)
+ const struct fib6_info *nrt, bool replace)
{
struct mlxsw_sp_fib6_entry *fib6_entry;
@@ -4779,21 +4779,21 @@
return NULL;
list_for_each_entry(fib6_entry, &fib_node->entry_list, common.list) {
- struct rt6_info *rt = mlxsw_sp_fib6_entry_rt(fib6_entry);
+ struct fib6_info *rt = mlxsw_sp_fib6_entry_rt(fib6_entry);
/* RT6_TABLE_LOCAL and RT6_TABLE_MAIN share the same
* virtual router.
*/
- if (rt->rt6i_table->tb6_id > nrt->rt6i_table->tb6_id)
+ if (rt->fib6_table->tb6_id > nrt->fib6_table->tb6_id)
continue;
- if (rt->rt6i_table->tb6_id != nrt->rt6i_table->tb6_id)
+ if (rt->fib6_table->tb6_id != nrt->fib6_table->tb6_id)
break;
- if (rt->rt6i_metric < nrt->rt6i_metric)
+ if (rt->fib6_metric < nrt->fib6_metric)
continue;
- if (rt->rt6i_metric == nrt->rt6i_metric &&
+ if (rt->fib6_metric == nrt->fib6_metric &&
mlxsw_sp_fib6_rt_can_mp(rt))
return fib6_entry;
- if (rt->rt6i_metric > nrt->rt6i_metric)
+ if (rt->fib6_metric > nrt->fib6_metric)
break;
}
@@ -4802,7 +4802,7 @@
static struct mlxsw_sp_rt6 *
mlxsw_sp_fib6_entry_rt_find(const struct mlxsw_sp_fib6_entry *fib6_entry,
- const struct rt6_info *rt)
+ const struct fib6_info *rt)
{
struct mlxsw_sp_rt6 *mlxsw_sp_rt6;
@@ -4815,21 +4815,21 @@
}
static bool mlxsw_sp_nexthop6_ipip_type(const struct mlxsw_sp *mlxsw_sp,
- const struct rt6_info *rt,
+ const struct fib6_info *rt,
enum mlxsw_sp_ipip_type *ret)
{
- return rt->dst.dev &&
- mlxsw_sp_netdev_ipip_type(mlxsw_sp, rt->dst.dev, ret);
+ return rt->fib6_nh.nh_dev &&
+ mlxsw_sp_netdev_ipip_type(mlxsw_sp, rt->fib6_nh.nh_dev, ret);
}
static int mlxsw_sp_nexthop6_type_init(struct mlxsw_sp *mlxsw_sp,
struct mlxsw_sp_nexthop_group *nh_grp,
struct mlxsw_sp_nexthop *nh,
- const struct rt6_info *rt)
+ const struct fib6_info *rt)
{
const struct mlxsw_sp_ipip_ops *ipip_ops;
struct mlxsw_sp_ipip_entry *ipip_entry;
- struct net_device *dev = rt->dst.dev;
+ struct net_device *dev = rt->fib6_nh.nh_dev;
struct mlxsw_sp_rif *rif;
int err;
@@ -4870,13 +4870,13 @@
static int mlxsw_sp_nexthop6_init(struct mlxsw_sp *mlxsw_sp,
struct mlxsw_sp_nexthop_group *nh_grp,
struct mlxsw_sp_nexthop *nh,
- const struct rt6_info *rt)
+ const struct fib6_info *rt)
{
- struct net_device *dev = rt->dst.dev;
+ struct net_device *dev = rt->fib6_nh.nh_dev;
nh->nh_grp = nh_grp;
- nh->nh_weight = rt->rt6i_nh_weight;
- memcpy(&nh->gw_addr, &rt->rt6i_gateway, sizeof(nh->gw_addr));
+ nh->nh_weight = rt->fib6_nh.nh_weight;
+ memcpy(&nh->gw_addr, &rt->fib6_nh.nh_gw, sizeof(nh->gw_addr));
mlxsw_sp_nexthop_counter_alloc(mlxsw_sp, nh);
list_add_tail(&nh->router_list_node, &mlxsw_sp->router->nexthop_list);
@@ -4897,9 +4897,9 @@
}
static bool mlxsw_sp_rt6_is_gateway(const struct mlxsw_sp *mlxsw_sp,
- const struct rt6_info *rt)
+ const struct fib6_info *rt)
{
- return rt->rt6i_flags & RTF_GATEWAY ||
+ return rt->fib6_flags & RTF_GATEWAY ||
mlxsw_sp_nexthop6_ipip_type(mlxsw_sp, rt, NULL);
}
@@ -4928,7 +4928,7 @@
nh_grp->gateway = mlxsw_sp_rt6_is_gateway(mlxsw_sp, mlxsw_sp_rt6->rt);
nh_grp->count = fib6_entry->nrt6;
for (i = 0; i < nh_grp->count; i++) {
- struct rt6_info *rt = mlxsw_sp_rt6->rt;
+ struct fib6_info *rt = mlxsw_sp_rt6->rt;
nh = &nh_grp->nexthops[i];
err = mlxsw_sp_nexthop6_init(mlxsw_sp, nh_grp, nh, rt);
@@ -5040,7 +5040,7 @@
static int
mlxsw_sp_fib6_entry_nexthop_add(struct mlxsw_sp *mlxsw_sp,
struct mlxsw_sp_fib6_entry *fib6_entry,
- struct rt6_info *rt)
+ struct fib6_info *rt)
{
struct mlxsw_sp_rt6 *mlxsw_sp_rt6;
int err;
@@ -5068,7 +5068,7 @@
static void
mlxsw_sp_fib6_entry_nexthop_del(struct mlxsw_sp *mlxsw_sp,
struct mlxsw_sp_fib6_entry *fib6_entry,
- struct rt6_info *rt)
+ struct fib6_info *rt)
{
struct mlxsw_sp_rt6 *mlxsw_sp_rt6;
@@ -5084,7 +5084,7 @@
static void mlxsw_sp_fib6_entry_type_set(struct mlxsw_sp *mlxsw_sp,
struct mlxsw_sp_fib_entry *fib_entry,
- const struct rt6_info *rt)
+ const struct fib6_info *rt)
{
/* Packets hitting RTF_REJECT routes need to be discarded by the
* stack. We can rely on their destination device not having a
@@ -5092,9 +5092,9 @@
* local, which will cause them to be trapped with a lower
* priority than packets that need to be locally received.
*/
- if (rt->rt6i_flags & (RTF_LOCAL | RTF_ANYCAST))
+ if (rt->fib6_flags & (RTF_LOCAL | RTF_ANYCAST))
fib_entry->type = MLXSW_SP_FIB_ENTRY_TYPE_TRAP;
- else if (rt->rt6i_flags & RTF_REJECT)
+ else if (rt->fib6_flags & RTF_REJECT)
fib_entry->type = MLXSW_SP_FIB_ENTRY_TYPE_LOCAL;
else if (mlxsw_sp_rt6_is_gateway(mlxsw_sp, rt))
fib_entry->type = MLXSW_SP_FIB_ENTRY_TYPE_REMOTE;
@@ -5118,7 +5118,7 @@
static struct mlxsw_sp_fib6_entry *
mlxsw_sp_fib6_entry_create(struct mlxsw_sp *mlxsw_sp,
struct mlxsw_sp_fib_node *fib_node,
- struct rt6_info *rt)
+ struct fib6_info *rt)
{
struct mlxsw_sp_fib6_entry *fib6_entry;
struct mlxsw_sp_fib_entry *fib_entry;
@@ -5168,25 +5168,25 @@
static struct mlxsw_sp_fib6_entry *
mlxsw_sp_fib6_node_entry_find(const struct mlxsw_sp_fib_node *fib_node,
- const struct rt6_info *nrt, bool replace)
+ const struct fib6_info *nrt, bool replace)
{
struct mlxsw_sp_fib6_entry *fib6_entry, *fallback = NULL;
list_for_each_entry(fib6_entry, &fib_node->entry_list, common.list) {
- struct rt6_info *rt = mlxsw_sp_fib6_entry_rt(fib6_entry);
+ struct fib6_info *rt = mlxsw_sp_fib6_entry_rt(fib6_entry);
- if (rt->rt6i_table->tb6_id > nrt->rt6i_table->tb6_id)
+ if (rt->fib6_table->tb6_id > nrt->fib6_table->tb6_id)
continue;
- if (rt->rt6i_table->tb6_id != nrt->rt6i_table->tb6_id)
+ if (rt->fib6_table->tb6_id != nrt->fib6_table->tb6_id)
break;
- if (replace && rt->rt6i_metric == nrt->rt6i_metric) {
+ if (replace && rt->fib6_metric == nrt->fib6_metric) {
if (mlxsw_sp_fib6_rt_can_mp(rt) ==
mlxsw_sp_fib6_rt_can_mp(nrt))
return fib6_entry;
if (mlxsw_sp_fib6_rt_can_mp(nrt))
fallback = fallback ?: fib6_entry;
}
- if (rt->rt6i_metric > nrt->rt6i_metric)
+ if (rt->fib6_metric > nrt->fib6_metric)
return fallback ?: fib6_entry;
}
@@ -5198,7 +5198,7 @@
bool replace)
{
struct mlxsw_sp_fib_node *fib_node = new6_entry->common.fib_node;
- struct rt6_info *nrt = mlxsw_sp_fib6_entry_rt(new6_entry);
+ struct fib6_info *nrt = mlxsw_sp_fib6_entry_rt(new6_entry);
struct mlxsw_sp_fib6_entry *fib6_entry;
fib6_entry = mlxsw_sp_fib6_node_entry_find(fib_node, nrt, replace);
@@ -5213,9 +5213,9 @@
struct mlxsw_sp_fib6_entry *last;
list_for_each_entry(last, &fib_node->entry_list, common.list) {
- struct rt6_info *rt = mlxsw_sp_fib6_entry_rt(last);
+ struct fib6_info *rt = mlxsw_sp_fib6_entry_rt(last);
- if (nrt->rt6i_table->tb6_id > rt->rt6i_table->tb6_id)
+ if (nrt->fib6_table->tb6_id > rt->fib6_table->tb6_id)
break;
fib6_entry = last;
}
@@ -5268,29 +5268,29 @@
static struct mlxsw_sp_fib6_entry *
mlxsw_sp_fib6_entry_lookup(struct mlxsw_sp *mlxsw_sp,
- const struct rt6_info *rt)
+ const struct fib6_info *rt)
{
struct mlxsw_sp_fib6_entry *fib6_entry;
struct mlxsw_sp_fib_node *fib_node;
struct mlxsw_sp_fib *fib;
struct mlxsw_sp_vr *vr;
- vr = mlxsw_sp_vr_find(mlxsw_sp, rt->rt6i_table->tb6_id);
+ vr = mlxsw_sp_vr_find(mlxsw_sp, rt->fib6_table->tb6_id);
if (!vr)
return NULL;
fib = mlxsw_sp_vr_fib(vr, MLXSW_SP_L3_PROTO_IPV6);
- fib_node = mlxsw_sp_fib_node_lookup(fib, &rt->rt6i_dst.addr,
- sizeof(rt->rt6i_dst.addr),
- rt->rt6i_dst.plen);
+ fib_node = mlxsw_sp_fib_node_lookup(fib, &rt->fib6_dst.addr,
+ sizeof(rt->fib6_dst.addr),
+ rt->fib6_dst.plen);
if (!fib_node)
return NULL;
list_for_each_entry(fib6_entry, &fib_node->entry_list, common.list) {
- struct rt6_info *iter_rt = mlxsw_sp_fib6_entry_rt(fib6_entry);
+ struct fib6_info *iter_rt = mlxsw_sp_fib6_entry_rt(fib6_entry);
- if (rt->rt6i_table->tb6_id == iter_rt->rt6i_table->tb6_id &&
- rt->rt6i_metric == iter_rt->rt6i_metric &&
+ if (rt->fib6_table->tb6_id == iter_rt->fib6_table->tb6_id &&
+ rt->fib6_metric == iter_rt->fib6_metric &&
mlxsw_sp_fib6_entry_rt_find(fib6_entry, rt))
return fib6_entry;
}
@@ -5316,7 +5316,7 @@
}
static int mlxsw_sp_router_fib6_add(struct mlxsw_sp *mlxsw_sp,
- struct rt6_info *rt, bool replace)
+ struct fib6_info *rt, bool replace)
{
struct mlxsw_sp_fib6_entry *fib6_entry;
struct mlxsw_sp_fib_node *fib_node;
@@ -5325,16 +5325,16 @@
if (mlxsw_sp->router->aborted)
return 0;
- if (rt->rt6i_src.plen)
+ if (rt->fib6_src.plen)
return -EINVAL;
if (mlxsw_sp_fib6_rt_should_ignore(rt))
return 0;
- fib_node = mlxsw_sp_fib_node_get(mlxsw_sp, rt->rt6i_table->tb6_id,
- &rt->rt6i_dst.addr,
- sizeof(rt->rt6i_dst.addr),
- rt->rt6i_dst.plen,
+ fib_node = mlxsw_sp_fib_node_get(mlxsw_sp, rt->fib6_table->tb6_id,
+ &rt->fib6_dst.addr,
+ sizeof(rt->fib6_dst.addr),
+ rt->fib6_dst.plen,
MLXSW_SP_L3_PROTO_IPV6);
if (IS_ERR(fib_node))
return PTR_ERR(fib_node);
@@ -5373,7 +5373,7 @@
}
static void mlxsw_sp_router_fib6_del(struct mlxsw_sp *mlxsw_sp,
- struct rt6_info *rt)
+ struct fib6_info *rt)
{
struct mlxsw_sp_fib6_entry *fib6_entry;
struct mlxsw_sp_fib_node *fib_node;
@@ -5725,6 +5725,7 @@
switch (fib_work->event) {
case FIB_EVENT_ENTRY_REPLACE: /* fall through */
+ case FIB_EVENT_ENTRY_APPEND: /* fall through */
case FIB_EVENT_ENTRY_ADD:
replace = fib_work->event == FIB_EVENT_ENTRY_REPLACE;
err = mlxsw_sp_router_fib6_add(mlxsw_sp,
@@ -5831,12 +5832,13 @@
switch (fib_work->event) {
case FIB_EVENT_ENTRY_REPLACE: /* fall through */
+ case FIB_EVENT_ENTRY_APPEND: /* fall through */
case FIB_EVENT_ENTRY_ADD: /* fall through */
case FIB_EVENT_ENTRY_DEL:
fen6_info = container_of(info, struct fib6_entry_notifier_info,
info);
fib_work->fen6_info = *fen6_info;
- rt6_hold(fib_work->fen6_info.rt);
+ fib6_info_hold(fib_work->fen6_info.rt);
break;
}
}
@@ -5882,24 +5884,24 @@
switch (info->family) {
case AF_INET:
if (!fib4_rule_default(rule) && !rule->l3mdev)
- err = -1;
+ err = -EOPNOTSUPP;
break;
case AF_INET6:
if (!fib6_rule_default(rule) && !rule->l3mdev)
- err = -1;
+ err = -EOPNOTSUPP;
break;
case RTNL_FAMILY_IPMR:
if (!ipmr_rule_default(rule) && !rule->l3mdev)
- err = -1;
+ err = -EOPNOTSUPP;
break;
case RTNL_FAMILY_IP6MR:
if (!ip6mr_rule_default(rule) && !rule->l3mdev)
- err = -1;
+ err = -EOPNOTSUPP;
break;
}
if (err < 0)
- NL_SET_ERR_MSG_MOD(extack, "FIB rules not supported. Aborting offload");
+ NL_SET_ERR_MSG_MOD(extack, "FIB rules not supported");
return err;
}
@@ -5926,8 +5928,15 @@
case FIB_EVENT_RULE_DEL:
err = mlxsw_sp_router_fib_rule_event(event, info,
router->mlxsw_sp);
- if (!err)
- return NOTIFY_DONE;
+ if (!err || info->extack)
+ return notifier_from_errno(err);
+ break;
+ case FIB_EVENT_ENTRY_ADD:
+ if (router->aborted) {
+ NL_SET_ERR_MSG_MOD(info->extack, "FIB offload was aborted. Not configuring route");
+ return notifier_from_errno(-EINVAL);
+ }
+ break;
}
fib_work = kzalloc(sizeof(*fib_work), GFP_ATOMIC);
diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_span.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_span.c
index 65a7770..da3f7f5 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_span.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_span.c
@@ -32,6 +32,7 @@
* POSSIBILITY OF SUCH DAMAGE.
*/
+#include <linux/if_bridge.h>
#include <linux/list.h>
#include <net/arp.h>
#include <net/gre.h>
@@ -39,8 +40,9 @@
#include <net/ip6_tunnel.h>
#include "spectrum.h"
-#include "spectrum_span.h"
#include "spectrum_ipip.h"
+#include "spectrum_span.h"
+#include "spectrum_switchdev.h"
int mlxsw_sp_span_init(struct mlxsw_sp *mlxsw_sp)
{
@@ -135,14 +137,14 @@
static int mlxsw_sp_span_dmac(struct neigh_table *tbl,
const void *pkey,
- struct net_device *l3edev,
+ struct net_device *dev,
unsigned char dmac[ETH_ALEN])
{
- struct neighbour *neigh = neigh_lookup(tbl, pkey, l3edev);
+ struct neighbour *neigh = neigh_lookup(tbl, pkey, dev);
int err = 0;
if (!neigh) {
- neigh = neigh_create(tbl, pkey, l3edev);
+ neigh = neigh_create(tbl, pkey, dev);
if (IS_ERR(neigh))
return PTR_ERR(neigh);
}
@@ -167,8 +169,97 @@
return 0;
}
+static struct net_device *
+mlxsw_sp_span_entry_bridge_8021q(const struct net_device *br_dev,
+ unsigned char *dmac,
+ u16 *p_vid)
+{
+ struct bridge_vlan_info vinfo;
+ struct net_device *edev;
+ u16 vid = *p_vid;
+
+ if (!vid && WARN_ON(br_vlan_get_pvid(br_dev, &vid)))
+ return NULL;
+ if (!vid ||
+ br_vlan_get_info(br_dev, vid, &vinfo) ||
+ !(vinfo.flags & BRIDGE_VLAN_INFO_BRENTRY))
+ return NULL;
+
+ edev = br_fdb_find_port(br_dev, dmac, vid);
+ if (!edev)
+ return NULL;
+
+ if (br_vlan_get_info(edev, vid, &vinfo))
+ return NULL;
+ if (!(vinfo.flags & BRIDGE_VLAN_INFO_UNTAGGED))
+ *p_vid = vid;
+ return edev;
+}
+
+static struct net_device *
+mlxsw_sp_span_entry_bridge_8021d(const struct net_device *br_dev,
+ unsigned char *dmac)
+{
+ return br_fdb_find_port(br_dev, dmac, 0);
+}
+
+static struct net_device *
+mlxsw_sp_span_entry_bridge(const struct net_device *br_dev,
+ unsigned char dmac[ETH_ALEN],
+ u16 *p_vid)
+{
+ struct mlxsw_sp_bridge_port *bridge_port;
+ enum mlxsw_reg_spms_state spms_state;
+ struct net_device *dev = NULL;
+ struct mlxsw_sp_port *port;
+ u8 stp_state;
+
+ if (br_vlan_enabled(br_dev))
+ dev = mlxsw_sp_span_entry_bridge_8021q(br_dev, dmac, p_vid);
+ else if (!*p_vid)
+ dev = mlxsw_sp_span_entry_bridge_8021d(br_dev, dmac);
+ if (!dev)
+ return NULL;
+
+ port = mlxsw_sp_port_dev_lower_find(dev);
+ if (!port)
+ return NULL;
+
+ bridge_port = mlxsw_sp_bridge_port_find(port->mlxsw_sp->bridge, dev);
+ if (!bridge_port)
+ return NULL;
+
+ stp_state = mlxsw_sp_bridge_port_stp_state(bridge_port);
+ spms_state = mlxsw_sp_stp_spms_state(stp_state);
+ if (spms_state != MLXSW_REG_SPMS_STATE_FORWARDING)
+ return NULL;
+
+ return dev;
+}
+
+static struct net_device *
+mlxsw_sp_span_entry_vlan(const struct net_device *vlan_dev,
+ u16 *p_vid)
+{
+ *p_vid = vlan_dev_vlan_id(vlan_dev);
+ return vlan_dev_real_dev(vlan_dev);
+}
+
+static struct net_device *
+mlxsw_sp_span_entry_lag(struct net_device *lag_dev)
+{
+ struct net_device *dev;
+ struct list_head *iter;
+
+ netdev_for_each_lower_dev(lag_dev, dev, iter)
+ if ((dev->flags & IFF_UP) && mlxsw_sp_port_dev_check(dev))
+ return dev;
+
+ return NULL;
+}
+
static __maybe_unused int
-mlxsw_sp_span_entry_tunnel_parms_common(struct net_device *l3edev,
+mlxsw_sp_span_entry_tunnel_parms_common(struct net_device *edev,
union mlxsw_sp_l3addr saddr,
union mlxsw_sp_l3addr daddr,
union mlxsw_sp_l3addr gw,
@@ -177,21 +268,51 @@
struct mlxsw_sp_span_parms *sparmsp)
{
unsigned char dmac[ETH_ALEN];
+ u16 vid = 0;
if (mlxsw_sp_l3addr_is_zero(gw))
gw = daddr;
- if (!l3edev || !mlxsw_sp_port_dev_check(l3edev) ||
- mlxsw_sp_span_dmac(tbl, &gw, l3edev, dmac))
- return mlxsw_sp_span_entry_unoffloadable(sparmsp);
+ if (!edev || mlxsw_sp_span_dmac(tbl, &gw, edev, dmac))
+ goto unoffloadable;
- sparmsp->dest_port = netdev_priv(l3edev);
+ if (is_vlan_dev(edev))
+ edev = mlxsw_sp_span_entry_vlan(edev, &vid);
+
+ if (netif_is_bridge_master(edev)) {
+ edev = mlxsw_sp_span_entry_bridge(edev, dmac, &vid);
+ if (!edev)
+ goto unoffloadable;
+ }
+
+ if (is_vlan_dev(edev)) {
+ if (vid || !(edev->flags & IFF_UP))
+ goto unoffloadable;
+ edev = mlxsw_sp_span_entry_vlan(edev, &vid);
+ }
+
+ if (netif_is_lag_master(edev)) {
+ if (!(edev->flags & IFF_UP))
+ goto unoffloadable;
+ edev = mlxsw_sp_span_entry_lag(edev);
+ if (!edev)
+ goto unoffloadable;
+ }
+
+ if (!mlxsw_sp_port_dev_check(edev))
+ goto unoffloadable;
+
+ sparmsp->dest_port = netdev_priv(edev);
sparmsp->ttl = ttl;
memcpy(sparmsp->dmac, dmac, ETH_ALEN);
- memcpy(sparmsp->smac, l3edev->dev_addr, ETH_ALEN);
+ memcpy(sparmsp->smac, edev->dev_addr, ETH_ALEN);
sparmsp->saddr = saddr;
sparmsp->daddr = daddr;
+ sparmsp->vid = vid;
return 0;
+
+unoffloadable:
+ return mlxsw_sp_span_entry_unoffloadable(sparmsp);
}
#if IS_ENABLED(CONFIG_NET_IPGRE)
@@ -268,9 +389,10 @@
/* Create a new port analayzer entry for local_port. */
mlxsw_reg_mpat_pack(mpat_pl, pa_id, local_port, true,
MLXSW_REG_MPAT_SPAN_TYPE_REMOTE_ETH_L3);
+ mlxsw_reg_mpat_eth_rspan_pack(mpat_pl, sparms.vid);
mlxsw_reg_mpat_eth_rspan_l2_pack(mpat_pl,
MLXSW_REG_MPAT_ETH_RSPAN_VERSION_NO_HEADER,
- sparms.dmac, false);
+ sparms.dmac, !!sparms.vid);
mlxsw_reg_mpat_eth_rspan_l3_ipv4_pack(mpat_pl,
sparms.ttl, sparms.smac,
be32_to_cpu(sparms.saddr.addr4),
@@ -368,9 +490,10 @@
/* Create a new port analayzer entry for local_port. */
mlxsw_reg_mpat_pack(mpat_pl, pa_id, local_port, true,
MLXSW_REG_MPAT_SPAN_TYPE_REMOTE_ETH_L3);
+ mlxsw_reg_mpat_eth_rspan_pack(mpat_pl, sparms.vid);
mlxsw_reg_mpat_eth_rspan_l2_pack(mpat_pl,
MLXSW_REG_MPAT_ETH_RSPAN_VERSION_NO_HEADER,
- sparms.dmac, false);
+ sparms.dmac, !!sparms.vid);
mlxsw_reg_mpat_eth_rspan_l3_ipv6_pack(mpat_pl, sparms.ttl, sparms.smac,
sparms.saddr.addr6,
sparms.daddr.addr6);
@@ -394,6 +517,61 @@
};
#endif
+static bool
+mlxsw_sp_span_vlan_can_handle(const struct net_device *dev)
+{
+ return is_vlan_dev(dev) &&
+ mlxsw_sp_port_dev_check(vlan_dev_real_dev(dev));
+}
+
+static int
+mlxsw_sp_span_entry_vlan_parms(const struct net_device *to_dev,
+ struct mlxsw_sp_span_parms *sparmsp)
+{
+ struct net_device *real_dev;
+ u16 vid;
+
+ if (!(to_dev->flags & IFF_UP))
+ return mlxsw_sp_span_entry_unoffloadable(sparmsp);
+
+ real_dev = mlxsw_sp_span_entry_vlan(to_dev, &vid);
+ sparmsp->dest_port = netdev_priv(real_dev);
+ sparmsp->vid = vid;
+ return 0;
+}
+
+static int
+mlxsw_sp_span_entry_vlan_configure(struct mlxsw_sp_span_entry *span_entry,
+ struct mlxsw_sp_span_parms sparms)
+{
+ struct mlxsw_sp_port *dest_port = sparms.dest_port;
+ struct mlxsw_sp *mlxsw_sp = dest_port->mlxsw_sp;
+ u8 local_port = dest_port->local_port;
+ char mpat_pl[MLXSW_REG_MPAT_LEN];
+ int pa_id = span_entry->id;
+
+ mlxsw_reg_mpat_pack(mpat_pl, pa_id, local_port, true,
+ MLXSW_REG_MPAT_SPAN_TYPE_REMOTE_ETH);
+ mlxsw_reg_mpat_eth_rspan_pack(mpat_pl, sparms.vid);
+
+ return mlxsw_reg_write(mlxsw_sp->core, MLXSW_REG(mpat), mpat_pl);
+}
+
+static void
+mlxsw_sp_span_entry_vlan_deconfigure(struct mlxsw_sp_span_entry *span_entry)
+{
+ mlxsw_sp_span_entry_deconfigure_common(span_entry,
+ MLXSW_REG_MPAT_SPAN_TYPE_REMOTE_ETH);
+}
+
+static const
+struct mlxsw_sp_span_entry_ops mlxsw_sp_span_entry_ops_vlan = {
+ .can_handle = mlxsw_sp_span_vlan_can_handle,
+ .parms = mlxsw_sp_span_entry_vlan_parms,
+ .configure = mlxsw_sp_span_entry_vlan_configure,
+ .deconfigure = mlxsw_sp_span_entry_vlan_deconfigure,
+};
+
static const
struct mlxsw_sp_span_entry_ops *const mlxsw_sp_span_entry_types[] = {
&mlxsw_sp_span_entry_ops_phys,
@@ -403,6 +581,7 @@
#if IS_ENABLED(CONFIG_IPV6_GRE)
&mlxsw_sp_span_entry_ops_gretap6,
#endif
+ &mlxsw_sp_span_entry_ops_vlan,
};
static int
@@ -766,7 +945,7 @@
span_entry = mlxsw_sp_span_entry_get(mlxsw_sp, to_dev, ops, sparms);
if (!span_entry)
- return -ENOENT;
+ return -ENOBUFS;
netdev_dbg(from->dev, "Adding inspected port to SPAN entry %d\n",
span_entry->id);
diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_span.h b/drivers/net/ethernet/mellanox/mlxsw/spectrum_span.h
index 4b87ec2..14a6de9 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_span.h
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_span.h
@@ -63,6 +63,7 @@
unsigned char smac[ETH_ALEN];
union mlxsw_sp_l3addr daddr;
union mlxsw_sp_l3addr saddr;
+ u16 vid;
};
struct mlxsw_sp_span_entry_ops;
diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c
index 4ed0118..8c9cf8e 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c
@@ -49,7 +49,9 @@
#include <linux/netlink.h>
#include <net/switchdev.h>
+#include "spectrum_span.h"
#include "spectrum_router.h"
+#include "spectrum_switchdev.h"
#include "spectrum.h"
#include "core.h"
#include "reg.h"
@@ -239,7 +241,7 @@
return NULL;
}
-static struct mlxsw_sp_bridge_port *
+struct mlxsw_sp_bridge_port *
mlxsw_sp_bridge_port_find(struct mlxsw_sp_bridge *bridge,
struct net_device *brport_dev)
{
@@ -922,6 +924,9 @@
break;
}
+ if (switchdev_trans_ph_commit(trans))
+ mlxsw_sp_span_respin(mlxsw_sp_port->mlxsw_sp);
+
return err;
}
@@ -1646,18 +1651,57 @@
}
}
+struct mlxsw_sp_span_respin_work {
+ struct work_struct work;
+ struct mlxsw_sp *mlxsw_sp;
+};
+
+static void mlxsw_sp_span_respin_work(struct work_struct *work)
+{
+ struct mlxsw_sp_span_respin_work *respin_work =
+ container_of(work, struct mlxsw_sp_span_respin_work, work);
+
+ rtnl_lock();
+ mlxsw_sp_span_respin(respin_work->mlxsw_sp);
+ rtnl_unlock();
+ kfree(respin_work);
+}
+
+static void mlxsw_sp_span_respin_schedule(struct mlxsw_sp *mlxsw_sp)
+{
+ struct mlxsw_sp_span_respin_work *respin_work;
+
+ respin_work = kzalloc(sizeof(*respin_work), GFP_ATOMIC);
+ if (!respin_work)
+ return;
+
+ INIT_WORK(&respin_work->work, mlxsw_sp_span_respin_work);
+ respin_work->mlxsw_sp = mlxsw_sp;
+
+ mlxsw_core_schedule_work(&respin_work->work);
+}
+
static int mlxsw_sp_port_obj_add(struct net_device *dev,
const struct switchdev_obj *obj,
struct switchdev_trans *trans)
{
struct mlxsw_sp_port *mlxsw_sp_port = netdev_priv(dev);
+ const struct switchdev_obj_port_vlan *vlan;
int err = 0;
switch (obj->id) {
case SWITCHDEV_OBJ_ID_PORT_VLAN:
- err = mlxsw_sp_port_vlans_add(mlxsw_sp_port,
- SWITCHDEV_OBJ_PORT_VLAN(obj),
- trans);
+ vlan = SWITCHDEV_OBJ_PORT_VLAN(obj);
+ err = mlxsw_sp_port_vlans_add(mlxsw_sp_port, vlan, trans);
+
+ if (switchdev_trans_ph_commit(trans)) {
+ /* The event is emitted before the changes are actually
+ * applied to the bridge. Therefore schedule the respin
+ * call for later, so that the respin logic sees the
+ * updated bridge state.
+ */
+ mlxsw_sp_span_respin_schedule(mlxsw_sp_port->mlxsw_sp);
+ }
break;
case SWITCHDEV_OBJ_ID_PORT_MDB:
err = mlxsw_sp_port_mdb_add(mlxsw_sp_port,
@@ -1806,6 +1850,8 @@
break;
}
+ mlxsw_sp_span_respin(mlxsw_sp_port->mlxsw_sp);
+
return err;
}
@@ -2222,6 +2268,8 @@
switch (switchdev_work->event) {
case SWITCHDEV_FDB_ADD_TO_DEVICE:
fdb_info = &switchdev_work->fdb_info;
+ if (!fdb_info->added_by_user)
+ break;
err = mlxsw_sp_port_fdb_set(mlxsw_sp_port, fdb_info, true);
if (err)
break;
@@ -2231,10 +2279,20 @@
break;
case SWITCHDEV_FDB_DEL_TO_DEVICE:
fdb_info = &switchdev_work->fdb_info;
+ if (!fdb_info->added_by_user)
+ break;
mlxsw_sp_port_fdb_set(mlxsw_sp_port, fdb_info, false);
break;
+ case SWITCHDEV_FDB_ADD_TO_BRIDGE: /* fall through */
+ case SWITCHDEV_FDB_DEL_TO_BRIDGE:
+ /* These events are only used to potentially update an existing
+ * SPAN mirror.
+ */
+ break;
}
+ mlxsw_sp_span_respin(mlxsw_sp_port->mlxsw_sp);
+
out:
rtnl_unlock();
kfree(switchdev_work->fdb_info.addr);
@@ -2263,7 +2321,9 @@
switch (event) {
case SWITCHDEV_FDB_ADD_TO_DEVICE: /* fall through */
- case SWITCHDEV_FDB_DEL_TO_DEVICE:
+ case SWITCHDEV_FDB_DEL_TO_DEVICE: /* fall through */
+ case SWITCHDEV_FDB_ADD_TO_BRIDGE: /* fall through */
+ case SWITCHDEV_FDB_DEL_TO_BRIDGE:
memcpy(&switchdev_work->fdb_info, ptr,
sizeof(switchdev_work->fdb_info));
switchdev_work->fdb_info.addr = kzalloc(ETH_ALEN, GFP_ATOMIC);
@@ -2295,6 +2355,12 @@
.notifier_call = mlxsw_sp_switchdev_event,
};
+u8
+mlxsw_sp_bridge_port_stp_state(struct mlxsw_sp_bridge_port *bridge_port)
+{
+ return bridge_port->stp_state;
+}
+
static int mlxsw_sp_fdb_init(struct mlxsw_sp *mlxsw_sp)
{
struct mlxsw_sp_bridge *bridge = mlxsw_sp->bridge;
diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.h b/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.h
new file mode 100644
index 0000000..bc44d5ef
--- /dev/null
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.h
@@ -0,0 +1,43 @@
+/* SPDX-License-Identifier: BSD-3-Clause OR GPL-2.0
+ * drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.h
+ * Copyright (c) 2018 Mellanox Technologies. All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions are met:
+ *
+ * 1. Redistributions of source code must retain the above copyright
+ * notice, this list of conditions and the following disclaimer.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ * notice, this list of conditions and the following disclaimer in the
+ * documentation and/or other materials provided with the distribution.
+ * 3. Neither the names of the copyright holders nor the names of its
+ * contributors may be used to endorse or promote products derived from
+ * this software without specific prior written permission.
+ *
+ * Alternatively, this software may be distributed under the terms of the
+ * GNU General Public License ("GPL") version 2 as published by the Free
+ * Software Foundation.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
+ * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+ * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+ * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
+ * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
+ * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
+ * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
+ * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
+ * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
+ * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+ * POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <linux/netdevice.h>
+
+struct mlxsw_sp_bridge;
+struct mlxsw_sp_bridge_port;
+
+struct mlxsw_sp_bridge_port *
+mlxsw_sp_bridge_port_find(struct mlxsw_sp_bridge *bridge,
+ struct net_device *brport_dev);
+
+u8 mlxsw_sp_bridge_port_stp_state(struct mlxsw_sp_bridge_port *bridge_port);
diff --git a/drivers/net/ethernet/mellanox/mlxsw/switchx2.c b/drivers/net/ethernet/mellanox/mlxsw/switchx2.c
index a655c58..3922c1c 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/switchx2.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/switchx2.c
@@ -417,13 +417,10 @@
size_t len)
{
struct mlxsw_sx_port *mlxsw_sx_port = netdev_priv(dev);
- int err;
- err = snprintf(name, len, "p%d", mlxsw_sx_port->mapping.module + 1);
- if (err >= len)
- return -EINVAL;
-
- return 0;
+ return mlxsw_core_port_get_phys_port_name(mlxsw_sx_port->mlxsw_sx->core,
+ mlxsw_sx_port->local_port,
+ name, len);
}
static const struct net_device_ops mlxsw_sx_port_netdev_ops = {
@@ -1149,7 +1146,7 @@
}
mlxsw_core_port_eth_set(mlxsw_sx->core, mlxsw_sx_port->local_port,
- mlxsw_sx_port, dev, false, 0);
+ mlxsw_sx_port, dev, module + 1, false, 0);
mlxsw_sx->ports[local_port] = mlxsw_sx_port;
return 0;
diff --git a/drivers/net/ethernet/mscc/Kconfig b/drivers/net/ethernet/mscc/Kconfig
new file mode 100644
index 0000000..36c8462
--- /dev/null
+++ b/drivers/net/ethernet/mscc/Kconfig
@@ -0,0 +1,30 @@
+# SPDX-License-Identifier: (GPL-2.0 OR MIT)
+config NET_VENDOR_MICROSEMI
+ bool "Microsemi devices"
+ default y
+ help
+ If you have a network (Ethernet) card belonging to this class, say Y.
+
+ Note that the answer to this question doesn't directly affect the
+ kernel: saying N will just cause the configurator to skip all
+ the questions about Microsemi devices.
+
+if NET_VENDOR_MICROSEMI
+
+config MSCC_OCELOT_SWITCH
+ tristate "Ocelot switch driver"
+ depends on NET_SWITCHDEV
+ depends on HAS_IOMEM
+ select PHYLIB
+ select REGMAP_MMIO
+ help
+ This driver supports the Ocelot network switch device.
+
+config MSCC_OCELOT_SWITCH_OCELOT
+ tristate "Ocelot switch driver on Ocelot"
+ depends on MSCC_OCELOT_SWITCH
+ help
+ This driver supports the Ocelot network switch device as present on
+ the Ocelot SoCs.
+
+endif # NET_VENDOR_MICROSEMI
diff --git a/drivers/net/ethernet/mscc/Makefile b/drivers/net/ethernet/mscc/Makefile
new file mode 100644
index 0000000..cb52a3b
--- /dev/null
+++ b/drivers/net/ethernet/mscc/Makefile
@@ -0,0 +1,5 @@
+# SPDX-License-Identifier: (GPL-2.0 OR MIT)
+obj-$(CONFIG_MSCC_OCELOT_SWITCH) += mscc_ocelot_common.o
+mscc_ocelot_common-y := ocelot.o ocelot_io.o
+mscc_ocelot_common-y += ocelot_regs.o
+obj-$(CONFIG_MSCC_OCELOT_SWITCH_OCELOT) += ocelot_board.o
diff --git a/drivers/net/ethernet/mscc/ocelot.c b/drivers/net/ethernet/mscc/ocelot.c
new file mode 100644
index 0000000..c8c74aa
--- /dev/null
+++ b/drivers/net/ethernet/mscc/ocelot.c
@@ -0,0 +1,1333 @@
+// SPDX-License-Identifier: (GPL-2.0 OR MIT)
+/*
+ * Microsemi Ocelot Switch driver
+ *
+ * Copyright (c) 2017 Microsemi Corporation
+ */
+#include <linux/etherdevice.h>
+#include <linux/ethtool.h>
+#include <linux/if_bridge.h>
+#include <linux/if_ether.h>
+#include <linux/if_vlan.h>
+#include <linux/interrupt.h>
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/netdevice.h>
+#include <linux/phy.h>
+#include <linux/skbuff.h>
+#include <net/arp.h>
+#include <net/netevent.h>
+#include <net/rtnetlink.h>
+#include <net/switchdev.h>
+
+#include "ocelot.h"
+
+/* MAC table entry types.
+ * ENTRYTYPE_NORMAL is subject to aging.
+ * ENTRYTYPE_LOCKED is not subject to aging.
+ * ENTRYTYPE_MACv4 is not subject to aging. For IPv4 multicast.
+ * ENTRYTYPE_MACv6 is not subject to aging. For IPv6 multicast.
+ */
+enum macaccess_entry_type {
+ ENTRYTYPE_NORMAL = 0,
+ ENTRYTYPE_LOCKED,
+ ENTRYTYPE_MACv4,
+ ENTRYTYPE_MACv6,
+};
+
+struct ocelot_mact_entry {
+ u8 mac[ETH_ALEN];
+ u16 vid;
+ enum macaccess_entry_type type;
+};
+
+static inline int ocelot_mact_wait_for_completion(struct ocelot *ocelot)
+{
+ unsigned int val, timeout = 10;
+
+ /* Wait for the issued mac table command to be completed, or timeout.
+ * When the command read from ANA_TABLES_MACACCESS is
+ * MACACCESS_CMD_IDLE, the issued command completed successfully.
+ */
+ do {
+ val = ocelot_read(ocelot, ANA_TABLES_MACACCESS);
+ val &= ANA_TABLES_MACACCESS_MAC_TABLE_CMD_M;
+ } while (val != MACACCESS_CMD_IDLE && timeout--);
+
+ if (!timeout)
+ return -ETIMEDOUT;
+
+ return 0;
+}
+
+static void ocelot_mact_select(struct ocelot *ocelot,
+ const unsigned char mac[ETH_ALEN],
+ unsigned int vid)
+{
+ u32 macl = 0, mach = 0;
+
+ /* Set the MAC address to handle and the vlan associated in a format
+ * understood by the hardware.
+ */
+ mach |= vid << 16;
+ mach |= mac[0] << 8;
+ mach |= mac[1] << 0;
+ macl |= mac[2] << 24;
+ macl |= mac[3] << 16;
+ macl |= mac[4] << 8;
+ macl |= mac[5] << 0;
+
+ ocelot_write(ocelot, macl, ANA_TABLES_MACLDATA);
+ ocelot_write(ocelot, mach, ANA_TABLES_MACHDATA);
+
+}
+
+static int ocelot_mact_learn(struct ocelot *ocelot, int port,
+ const unsigned char mac[ETH_ALEN],
+ unsigned int vid,
+ enum macaccess_entry_type type)
+{
+ ocelot_mact_select(ocelot, mac, vid);
+
+ /* Issue a write command */
+ ocelot_write(ocelot, ANA_TABLES_MACACCESS_VALID |
+ ANA_TABLES_MACACCESS_DEST_IDX(port) |
+ ANA_TABLES_MACACCESS_ENTRYTYPE(type) |
+ ANA_TABLES_MACACCESS_MAC_TABLE_CMD(MACACCESS_CMD_LEARN),
+ ANA_TABLES_MACACCESS);
+
+ return ocelot_mact_wait_for_completion(ocelot);
+}
+
+static int ocelot_mact_forget(struct ocelot *ocelot,
+ const unsigned char mac[ETH_ALEN],
+ unsigned int vid)
+{
+ ocelot_mact_select(ocelot, mac, vid);
+
+ /* Issue a forget command */
+ ocelot_write(ocelot,
+ ANA_TABLES_MACACCESS_MAC_TABLE_CMD(MACACCESS_CMD_FORGET),
+ ANA_TABLES_MACACCESS);
+
+ return ocelot_mact_wait_for_completion(ocelot);
+}
+
+static void ocelot_mact_init(struct ocelot *ocelot)
+{
+ /* Configure the learning mode entries attributes:
+ * - Do not copy the frame to the CPU extraction queues.
+ * - Use the vlan and mac_cpoy for dmac lookup.
+ */
+ ocelot_rmw(ocelot, 0,
+ ANA_AGENCTRL_LEARN_CPU_COPY | ANA_AGENCTRL_IGNORE_DMAC_FLAGS
+ | ANA_AGENCTRL_LEARN_FWD_KILL
+ | ANA_AGENCTRL_LEARN_IGNORE_VLAN,
+ ANA_AGENCTRL);
+
+ /* Clear the MAC table */
+ ocelot_write(ocelot, MACACCESS_CMD_INIT, ANA_TABLES_MACACCESS);
+}
+
+static inline int ocelot_vlant_wait_for_completion(struct ocelot *ocelot)
+{
+ unsigned int val, timeout = 10;
+
+ /* Wait for the issued mac table command to be completed, or timeout.
+ * When the command read from ANA_TABLES_MACACCESS is
+ * MACACCESS_CMD_IDLE, the issued command completed successfully.
+ */
+ do {
+ val = ocelot_read(ocelot, ANA_TABLES_VLANACCESS);
+ val &= ANA_TABLES_VLANACCESS_VLAN_TBL_CMD_M;
+ } while (val != ANA_TABLES_VLANACCESS_CMD_IDLE && timeout--);
+
+ if (!timeout)
+ return -ETIMEDOUT;
+
+ return 0;
+}
+
+static void ocelot_vlan_init(struct ocelot *ocelot)
+{
+ /* Clear VLAN table, by default all ports are members of all VLANs */
+ ocelot_write(ocelot, ANA_TABLES_VLANACCESS_CMD_INIT,
+ ANA_TABLES_VLANACCESS);
+ ocelot_vlant_wait_for_completion(ocelot);
+}
+
+/* Watermark encode
+ * Bit 8: Unit; 0:1, 1:16
+ * Bit 7-0: Value to be multiplied with unit
+ */
+static u16 ocelot_wm_enc(u16 value)
+{
+ if (value >= BIT(8))
+ return BIT(8) | (value / 16);
+
+ return value;
+}
+
+static void ocelot_port_adjust_link(struct net_device *dev)
+{
+ struct ocelot_port *port = netdev_priv(dev);
+ struct ocelot *ocelot = port->ocelot;
+ u8 p = port->chip_port;
+ int speed, atop_wm, mode = 0;
+
+ switch (dev->phydev->speed) {
+ case SPEED_10:
+ speed = OCELOT_SPEED_10;
+ break;
+ case SPEED_100:
+ speed = OCELOT_SPEED_100;
+ break;
+ case SPEED_1000:
+ speed = OCELOT_SPEED_1000;
+ mode = DEV_MAC_MODE_CFG_GIGA_MODE_ENA;
+ break;
+ case SPEED_2500:
+ speed = OCELOT_SPEED_2500;
+ mode = DEV_MAC_MODE_CFG_GIGA_MODE_ENA;
+ break;
+ default:
+ netdev_err(dev, "Unsupported PHY speed: %d\n",
+ dev->phydev->speed);
+ return;
+ }
+
+ phy_print_status(dev->phydev);
+
+ if (!dev->phydev->link)
+ return;
+
+ /* Only full duplex supported for now */
+ ocelot_port_writel(port, DEV_MAC_MODE_CFG_FDX_ENA |
+ mode, DEV_MAC_MODE_CFG);
+
+ /* Set MAC IFG Gaps
+ * FDX: TX_IFG = 5, RX_IFG1 = RX_IFG2 = 0
+ * !FDX: TX_IFG = 5, RX_IFG1 = RX_IFG2 = 5
+ */
+ ocelot_port_writel(port, DEV_MAC_IFG_CFG_TX_IFG(5), DEV_MAC_IFG_CFG);
+
+ /* Load seed (0) and set MAC HDX late collision */
+ ocelot_port_writel(port, DEV_MAC_HDX_CFG_LATE_COL_POS(67) |
+ DEV_MAC_HDX_CFG_SEED_LOAD,
+ DEV_MAC_HDX_CFG);
+ mdelay(1);
+ ocelot_port_writel(port, DEV_MAC_HDX_CFG_LATE_COL_POS(67),
+ DEV_MAC_HDX_CFG);
+
+ /* Disable HDX fast control */
+ ocelot_port_writel(port, DEV_PORT_MISC_HDX_FAST_DIS, DEV_PORT_MISC);
+
+ /* SGMII only for now */
+ ocelot_port_writel(port, PCS1G_MODE_CFG_SGMII_MODE_ENA, PCS1G_MODE_CFG);
+ ocelot_port_writel(port, PCS1G_SD_CFG_SD_SEL, PCS1G_SD_CFG);
+
+ /* Enable PCS */
+ ocelot_port_writel(port, PCS1G_CFG_PCS_ENA, PCS1G_CFG);
+
+ /* No aneg on SGMII */
+ ocelot_port_writel(port, 0, PCS1G_ANEG_CFG);
+
+ /* No loopback */
+ ocelot_port_writel(port, 0, PCS1G_LB_CFG);
+
+ /* Set Max Length and maximum tags allowed */
+ ocelot_port_writel(port, VLAN_ETH_FRAME_LEN, DEV_MAC_MAXLEN_CFG);
+ ocelot_port_writel(port, DEV_MAC_TAGS_CFG_TAG_ID(ETH_P_8021AD) |
+ DEV_MAC_TAGS_CFG_VLAN_AWR_ENA |
+ DEV_MAC_TAGS_CFG_VLAN_LEN_AWR_ENA,
+ DEV_MAC_TAGS_CFG);
+
+ /* Enable MAC module */
+ ocelot_port_writel(port, DEV_MAC_ENA_CFG_RX_ENA |
+ DEV_MAC_ENA_CFG_TX_ENA, DEV_MAC_ENA_CFG);
+
+ /* Take MAC, Port, Phy (intern) and PCS (SGMII/Serdes) clock out of
+ * reset */
+ ocelot_port_writel(port, DEV_CLOCK_CFG_LINK_SPEED(speed),
+ DEV_CLOCK_CFG);
+
+ /* Set SMAC of Pause frame (00:00:00:00:00:00) */
+ ocelot_port_writel(port, 0, DEV_MAC_FC_MAC_HIGH_CFG);
+ ocelot_port_writel(port, 0, DEV_MAC_FC_MAC_LOW_CFG);
+
+ /* No PFC */
+ ocelot_write_gix(ocelot, ANA_PFC_PFC_CFG_FC_LINK_SPEED(speed),
+ ANA_PFC_PFC_CFG, p);
+
+ /* Set Pause WM hysteresis
+ * 152 = 6 * VLAN_ETH_FRAME_LEN / OCELOT_BUFFER_CELL_SZ
+ * 101 = 4 * VLAN_ETH_FRAME_LEN / OCELOT_BUFFER_CELL_SZ
+ */
+ ocelot_write_rix(ocelot, SYS_PAUSE_CFG_PAUSE_ENA |
+ SYS_PAUSE_CFG_PAUSE_STOP(101) |
+ SYS_PAUSE_CFG_PAUSE_START(152), SYS_PAUSE_CFG, p);
+
+ /* Core: Enable port for frame transfer */
+ ocelot_write_rix(ocelot, QSYS_SWITCH_PORT_MODE_INGRESS_DROP_MODE |
+ QSYS_SWITCH_PORT_MODE_SCH_NEXT_CFG(1) |
+ QSYS_SWITCH_PORT_MODE_PORT_ENA,
+ QSYS_SWITCH_PORT_MODE, p);
+
+ /* Flow control */
+ ocelot_write_rix(ocelot, SYS_MAC_FC_CFG_PAUSE_VAL_CFG(0xffff) |
+ SYS_MAC_FC_CFG_RX_FC_ENA | SYS_MAC_FC_CFG_TX_FC_ENA |
+ SYS_MAC_FC_CFG_ZERO_PAUSE_ENA |
+ SYS_MAC_FC_CFG_FC_LATENCY_CFG(0x7) |
+ SYS_MAC_FC_CFG_FC_LINK_SPEED(speed),
+ SYS_MAC_FC_CFG, p);
+ ocelot_write_rix(ocelot, 0, ANA_POL_FLOWC, p);
+
+ /* Tail dropping watermark */
+ atop_wm = (ocelot->shared_queue_sz - 9 * VLAN_ETH_FRAME_LEN) / OCELOT_BUFFER_CELL_SZ;
+ ocelot_write_rix(ocelot, ocelot_wm_enc(9 * VLAN_ETH_FRAME_LEN),
+ SYS_ATOP, p);
+ ocelot_write(ocelot, ocelot_wm_enc(atop_wm), SYS_ATOP_TOT_CFG);
+}
+
+static int ocelot_port_open(struct net_device *dev)
+{
+ struct ocelot_port *port = netdev_priv(dev);
+ struct ocelot *ocelot = port->ocelot;
+ int err;
+
+ /* Enable receiving frames on the port, and activate auto-learning of
+ * MAC addresses.
+ */
+ ocelot_write_gix(ocelot, ANA_PORT_PORT_CFG_LEARNAUTO |
+ ANA_PORT_PORT_CFG_RECV_ENA |
+ ANA_PORT_PORT_CFG_PORTID_VAL(port->chip_port),
+ ANA_PORT_PORT_CFG, port->chip_port);
+
+ err = phy_connect_direct(dev, port->phy, &ocelot_port_adjust_link,
+ PHY_INTERFACE_MODE_NA);
+ if (err) {
+ netdev_err(dev, "Could not attach to PHY\n");
+ return err;
+ }
+
+ dev->phydev = port->phy;
+
+ phy_attached_info(port->phy);
+ phy_start(port->phy);
+ return 0;
+}
+
+static int ocelot_port_stop(struct net_device *dev)
+{
+ struct ocelot_port *port = netdev_priv(dev);
+
+ phy_disconnect(port->phy);
+
+ dev->phydev = NULL;
+
+ ocelot_port_writel(port, 0, DEV_MAC_ENA_CFG);
+ ocelot_rmw_rix(port->ocelot, 0, QSYS_SWITCH_PORT_MODE_PORT_ENA,
+ QSYS_SWITCH_PORT_MODE, port->chip_port);
+ return 0;
+}
+
+/* Generate the IFH for frame injection
+ *
+ * The IFH is a 128bit-value
+ * bit 127: bypass the analyzer processing
+ * bit 56-67: destination mask
+ * bit 28-29: pop_cnt: 3 disables all rewriting of the frame
+ * bit 20-27: cpu extraction queue mask
+ * bit 16: tag type 0: C-tag, 1: S-tag
+ * bit 0-11: VID
+ */
+static int ocelot_gen_ifh(u32 *ifh, struct frame_info *info)
+{
+ ifh[0] = IFH_INJ_BYPASS;
+ ifh[1] = (0xff00 & info->port) >> 8;
+ ifh[2] = (0xff & info->port) << 24;
+ ifh[3] = IFH_INJ_POP_CNT_DISABLE | (info->cpuq << 20) |
+ (info->tag_type << 16) | info->vid;
+
+ return 0;
+}
+
+static int ocelot_port_xmit(struct sk_buff *skb, struct net_device *dev)
+{
+ struct ocelot_port *port = netdev_priv(dev);
+ struct ocelot *ocelot = port->ocelot;
+ u32 val, ifh[IFH_LEN];
+ struct frame_info info = {};
+ u8 grp = 0; /* Send everything on CPU group 0 */
+ unsigned int i, count, last;
+
+ val = ocelot_read(ocelot, QS_INJ_STATUS);
+ if (!(val & QS_INJ_STATUS_FIFO_RDY(BIT(grp))) ||
+ (val & QS_INJ_STATUS_WMARK_REACHED(BIT(grp))))
+ return NETDEV_TX_BUSY;
+
+ ocelot_write_rix(ocelot, QS_INJ_CTRL_GAP_SIZE(1) |
+ QS_INJ_CTRL_SOF, QS_INJ_CTRL, grp);
+
+ info.port = BIT(port->chip_port);
+ info.cpuq = 0xff;
+ ocelot_gen_ifh(ifh, &info);
+
+ for (i = 0; i < IFH_LEN; i++)
+ ocelot_write_rix(ocelot, ifh[i], QS_INJ_WR, grp);
+
+ count = (skb->len + 3) / 4;
+ last = skb->len % 4;
+ for (i = 0; i < count; i++) {
+ ocelot_write_rix(ocelot, ((u32 *)skb->data)[i], QS_INJ_WR, grp);
+ }
+
+ /* Add padding */
+ while (i < (OCELOT_BUFFER_CELL_SZ / 4)) {
+ ocelot_write_rix(ocelot, 0, QS_INJ_WR, grp);
+ i++;
+ }
+
+ /* Indicate EOF and valid bytes in last word */
+ ocelot_write_rix(ocelot, QS_INJ_CTRL_GAP_SIZE(1) |
+ QS_INJ_CTRL_VLD_BYTES(skb->len < OCELOT_BUFFER_CELL_SZ ? 0 : last) |
+ QS_INJ_CTRL_EOF,
+ QS_INJ_CTRL, grp);
+
+ /* Add dummy CRC */
+ ocelot_write_rix(ocelot, 0, QS_INJ_WR, grp);
+ skb_tx_timestamp(skb);
+
+ dev->stats.tx_packets++;
+ dev->stats.tx_bytes += skb->len;
+ dev_kfree_skb_any(skb);
+
+ return NETDEV_TX_OK;
+}
+
+static void ocelot_mact_mc_reset(struct ocelot_port *port)
+{
+ struct ocelot *ocelot = port->ocelot;
+ struct netdev_hw_addr *ha, *n;
+
+ /* Free and forget all the MAC addresses stored in the port private mc
+ * list. These are mc addresses that were previously added by calling
+ * ocelot_mact_mc_add().
+ */
+ list_for_each_entry_safe(ha, n, &port->mc, list) {
+ ocelot_mact_forget(ocelot, ha->addr, port->pvid);
+ list_del(&ha->list);
+ kfree(ha);
+ }
+}
+
+static int ocelot_mact_mc_add(struct ocelot_port *port,
+ struct netdev_hw_addr *hw_addr)
+{
+ struct ocelot *ocelot = port->ocelot;
+ struct netdev_hw_addr *ha = kzalloc(sizeof(*ha), GFP_KERNEL);
+
+ if (!ha)
+ return -ENOMEM;
+
+ memcpy(ha, hw_addr, sizeof(*ha));
+ list_add_tail(&ha->list, &port->mc);
+
+ ocelot_mact_learn(ocelot, PGID_CPU, ha->addr, port->pvid,
+ ENTRYTYPE_LOCKED);
+
+ return 0;
+}
+
+static void ocelot_set_rx_mode(struct net_device *dev)
+{
+ struct ocelot_port *port = netdev_priv(dev);
+ struct ocelot *ocelot = port->ocelot;
+ struct netdev_hw_addr *ha;
+ int i;
+ u32 val;
+
+ /* This doesn't handle promiscuous mode because the bridge core is
+ * setting IFF_PROMISC on all slave interfaces and all frames would be
+ * forwarded to the CPU port.
+ */
+ val = GENMASK(ocelot->num_phys_ports - 1, 0);
+ for (i = ocelot->num_phys_ports + 1; i < PGID_CPU; i++)
+ ocelot_write_rix(ocelot, val, ANA_PGID_PGID, i);
+
+ /* Handle the device multicast addresses. First remove all the
+ * previously installed addresses and then add the latest ones to the
+ * mac table.
+ */
+ ocelot_mact_mc_reset(port);
+ netdev_for_each_mc_addr(ha, dev)
+ ocelot_mact_mc_add(port, ha);
+}
+
+static int ocelot_port_get_phys_port_name(struct net_device *dev,
+ char *buf, size_t len)
+{
+ struct ocelot_port *port = netdev_priv(dev);
+ int ret;
+
+ ret = snprintf(buf, len, "p%d", port->chip_port);
+ if (ret >= len)
+ return -EINVAL;
+
+ return 0;
+}
+
+static int ocelot_port_set_mac_address(struct net_device *dev, void *p)
+{
+ struct ocelot_port *port = netdev_priv(dev);
+ struct ocelot *ocelot = port->ocelot;
+ const struct sockaddr *addr = p;
+
+ /* Learn the new net device MAC address in the mac table. */
+ ocelot_mact_learn(ocelot, PGID_CPU, addr->sa_data, port->pvid,
+ ENTRYTYPE_LOCKED);
+ /* Then forget the previous one. */
+ ocelot_mact_forget(ocelot, dev->dev_addr, port->pvid);
+
+ ether_addr_copy(dev->dev_addr, addr->sa_data);
+ return 0;
+}
+
+static void ocelot_get_stats64(struct net_device *dev,
+ struct rtnl_link_stats64 *stats)
+{
+ struct ocelot_port *port = netdev_priv(dev);
+ struct ocelot *ocelot = port->ocelot;
+
+ /* Configure the port to read the stats from */
+ ocelot_write(ocelot, SYS_STAT_CFG_STAT_VIEW(port->chip_port),
+ SYS_STAT_CFG);
+
+ /* Get Rx stats */
+ stats->rx_bytes = ocelot_read(ocelot, SYS_COUNT_RX_OCTETS);
+ stats->rx_packets = ocelot_read(ocelot, SYS_COUNT_RX_SHORTS) +
+ ocelot_read(ocelot, SYS_COUNT_RX_FRAGMENTS) +
+ ocelot_read(ocelot, SYS_COUNT_RX_JABBERS) +
+ ocelot_read(ocelot, SYS_COUNT_RX_LONGS) +
+ ocelot_read(ocelot, SYS_COUNT_RX_64) +
+ ocelot_read(ocelot, SYS_COUNT_RX_65_127) +
+ ocelot_read(ocelot, SYS_COUNT_RX_128_255) +
+ ocelot_read(ocelot, SYS_COUNT_RX_256_1023) +
+ ocelot_read(ocelot, SYS_COUNT_RX_1024_1526) +
+ ocelot_read(ocelot, SYS_COUNT_RX_1527_MAX);
+ stats->multicast = ocelot_read(ocelot, SYS_COUNT_RX_MULTICAST);
+ stats->rx_dropped = dev->stats.rx_dropped;
+
+ /* Get Tx stats */
+ stats->tx_bytes = ocelot_read(ocelot, SYS_COUNT_TX_OCTETS);
+ stats->tx_packets = ocelot_read(ocelot, SYS_COUNT_TX_64) +
+ ocelot_read(ocelot, SYS_COUNT_TX_65_127) +
+ ocelot_read(ocelot, SYS_COUNT_TX_128_511) +
+ ocelot_read(ocelot, SYS_COUNT_TX_512_1023) +
+ ocelot_read(ocelot, SYS_COUNT_TX_1024_1526) +
+ ocelot_read(ocelot, SYS_COUNT_TX_1527_MAX);
+ stats->tx_dropped = ocelot_read(ocelot, SYS_COUNT_TX_DROPS) +
+ ocelot_read(ocelot, SYS_COUNT_TX_AGING);
+ stats->collisions = ocelot_read(ocelot, SYS_COUNT_TX_COLLISION);
+}
+
+static int ocelot_fdb_add(struct ndmsg *ndm, struct nlattr *tb[],
+ struct net_device *dev, const unsigned char *addr,
+ u16 vid, u16 flags)
+{
+ struct ocelot_port *port = netdev_priv(dev);
+ struct ocelot *ocelot = port->ocelot;
+
+ return ocelot_mact_learn(ocelot, port->chip_port, addr, vid,
+ ENTRYTYPE_NORMAL);
+}
+
+static int ocelot_fdb_del(struct ndmsg *ndm, struct nlattr *tb[],
+ struct net_device *dev,
+ const unsigned char *addr, u16 vid)
+{
+ struct ocelot_port *port = netdev_priv(dev);
+ struct ocelot *ocelot = port->ocelot;
+
+ return ocelot_mact_forget(ocelot, addr, vid);
+}
+
+struct ocelot_dump_ctx {
+ struct net_device *dev;
+ struct sk_buff *skb;
+ struct netlink_callback *cb;
+ int idx;
+};
+
+static int ocelot_fdb_do_dump(struct ocelot_mact_entry *entry,
+ struct ocelot_dump_ctx *dump)
+{
+ u32 portid = NETLINK_CB(dump->cb->skb).portid;
+ u32 seq = dump->cb->nlh->nlmsg_seq;
+ struct nlmsghdr *nlh;
+ struct ndmsg *ndm;
+
+ if (dump->idx < dump->cb->args[2])
+ goto skip;
+
+ nlh = nlmsg_put(dump->skb, portid, seq, RTM_NEWNEIGH,
+ sizeof(*ndm), NLM_F_MULTI);
+ if (!nlh)
+ return -EMSGSIZE;
+
+ ndm = nlmsg_data(nlh);
+ ndm->ndm_family = AF_BRIDGE;
+ ndm->ndm_pad1 = 0;
+ ndm->ndm_pad2 = 0;
+ ndm->ndm_flags = NTF_SELF;
+ ndm->ndm_type = 0;
+ ndm->ndm_ifindex = dump->dev->ifindex;
+ ndm->ndm_state = NUD_REACHABLE;
+
+ if (nla_put(dump->skb, NDA_LLADDR, ETH_ALEN, entry->mac))
+ goto nla_put_failure;
+
+ if (entry->vid && nla_put_u16(dump->skb, NDA_VLAN, entry->vid))
+ goto nla_put_failure;
+
+ nlmsg_end(dump->skb, nlh);
+
+skip:
+ dump->idx++;
+ return 0;
+
+nla_put_failure:
+ nlmsg_cancel(dump->skb, nlh);
+ return -EMSGSIZE;
+}
+
+static inline int ocelot_mact_read(struct ocelot_port *port, int row, int col,
+ struct ocelot_mact_entry *entry)
+{
+ struct ocelot *ocelot = port->ocelot;
+ char mac[ETH_ALEN];
+ u32 val, dst, macl, mach;
+
+ /* Set row and column to read from */
+ ocelot_field_write(ocelot, ANA_TABLES_MACTINDX_M_INDEX, row);
+ ocelot_field_write(ocelot, ANA_TABLES_MACTINDX_BUCKET, col);
+
+ /* Issue a read command */
+ ocelot_write(ocelot,
+ ANA_TABLES_MACACCESS_MAC_TABLE_CMD(MACACCESS_CMD_READ),
+ ANA_TABLES_MACACCESS);
+
+ if (ocelot_mact_wait_for_completion(ocelot))
+ return -ETIMEDOUT;
+
+ /* Read the entry flags */
+ val = ocelot_read(ocelot, ANA_TABLES_MACACCESS);
+ if (!(val & ANA_TABLES_MACACCESS_VALID))
+ return -EINVAL;
+
+ /* If the entry read has another port configured as its destination,
+ * do not report it.
+ */
+ dst = (val & ANA_TABLES_MACACCESS_DEST_IDX_M) >> 3;
+ if (dst != port->chip_port)
+ return -EINVAL;
+
+ /* Get the entry's MAC address and VLAN id */
+ macl = ocelot_read(ocelot, ANA_TABLES_MACLDATA);
+ mach = ocelot_read(ocelot, ANA_TABLES_MACHDATA);
+
+ mac[0] = (mach >> 8) & 0xff;
+ mac[1] = (mach >> 0) & 0xff;
+ mac[2] = (macl >> 24) & 0xff;
+ mac[3] = (macl >> 16) & 0xff;
+ mac[4] = (macl >> 8) & 0xff;
+ mac[5] = (macl >> 0) & 0xff;
+
+ entry->vid = (mach >> 16) & 0xfff;
+ ether_addr_copy(entry->mac, mac);
+
+ return 0;
+}
+
+static int ocelot_fdb_dump(struct sk_buff *skb, struct netlink_callback *cb,
+ struct net_device *dev,
+ struct net_device *filter_dev, int *idx)
+{
+ struct ocelot_port *port = netdev_priv(dev);
+ int i, j, ret = 0;
+ struct ocelot_dump_ctx dump = {
+ .dev = dev,
+ .skb = skb,
+ .cb = cb,
+ .idx = *idx,
+ };
+
+ struct ocelot_mact_entry entry;
+
+ /* Loop through all the mac tables entries. There are 1024 rows of 4
+ * entries.
+ */
+ for (i = 0; i < 1024; i++) {
+ for (j = 0; j < 4; j++) {
+ ret = ocelot_mact_read(port, i, j, &entry);
+ /* If the entry is invalid (wrong port, invalid...),
+ * skip it.
+ */
+ if (ret == -EINVAL)
+ continue;
+ else if (ret)
+ goto end;
+
+ ret = ocelot_fdb_do_dump(&entry, &dump);
+ if (ret)
+ goto end;
+ }
+ }
+
+end:
+ *idx = dump.idx;
+ return ret;
+}
+
+static const struct net_device_ops ocelot_port_netdev_ops = {
+ .ndo_open = ocelot_port_open,
+ .ndo_stop = ocelot_port_stop,
+ .ndo_start_xmit = ocelot_port_xmit,
+ .ndo_set_rx_mode = ocelot_set_rx_mode,
+ .ndo_get_phys_port_name = ocelot_port_get_phys_port_name,
+ .ndo_set_mac_address = ocelot_port_set_mac_address,
+ .ndo_get_stats64 = ocelot_get_stats64,
+ .ndo_fdb_add = ocelot_fdb_add,
+ .ndo_fdb_del = ocelot_fdb_del,
+ .ndo_fdb_dump = ocelot_fdb_dump,
+};
+
+static void ocelot_get_strings(struct net_device *netdev, u32 sset, u8 *data)
+{
+ struct ocelot_port *port = netdev_priv(netdev);
+ struct ocelot *ocelot = port->ocelot;
+ int i;
+
+ if (sset != ETH_SS_STATS)
+ return;
+
+ for (i = 0; i < ocelot->num_stats; i++)
+ memcpy(data + i * ETH_GSTRING_LEN, ocelot->stats_layout[i].name,
+ ETH_GSTRING_LEN);
+}
+
+static void ocelot_check_stats(struct work_struct *work)
+{
+ struct delayed_work *del_work = to_delayed_work(work);
+ struct ocelot *ocelot = container_of(del_work, struct ocelot, stats_work);
+ int i, j;
+
+ mutex_lock(&ocelot->stats_lock);
+
+ for (i = 0; i < ocelot->num_phys_ports; i++) {
+ /* Configure the port to read the stats from */
+ ocelot_write(ocelot, SYS_STAT_CFG_STAT_VIEW(i), SYS_STAT_CFG);
+
+ for (j = 0; j < ocelot->num_stats; j++) {
+ u32 val;
+ unsigned int idx = i * ocelot->num_stats + j;
+
+ val = ocelot_read_rix(ocelot, SYS_COUNT_RX_OCTETS,
+ ocelot->stats_layout[j].offset);
+
+ if (val < (ocelot->stats[idx] & U32_MAX))
+ ocelot->stats[idx] += (u64)1 << 32;
+
+ ocelot->stats[idx] = (ocelot->stats[idx] &
+ ~(u64)U32_MAX) + val;
+ }
+ }
+
+ cancel_delayed_work(&ocelot->stats_work);
+ queue_delayed_work(ocelot->stats_queue, &ocelot->stats_work,
+ OCELOT_STATS_CHECK_DELAY);
+
+ mutex_unlock(&ocelot->stats_lock);
+}
+
+static void ocelot_get_ethtool_stats(struct net_device *dev,
+ struct ethtool_stats *stats, u64 *data)
+{
+ struct ocelot_port *port = netdev_priv(dev);
+ struct ocelot *ocelot = port->ocelot;
+ int i;
+
+ /* check and update now */
+ ocelot_check_stats(&ocelot->stats_work.work);
+
+ /* Copy all counters */
+ for (i = 0; i < ocelot->num_stats; i++)
+ *data++ = ocelot->stats[port->chip_port * ocelot->num_stats + i];
+}
+
+static int ocelot_get_sset_count(struct net_device *dev, int sset)
+{
+ struct ocelot_port *port = netdev_priv(dev);
+ struct ocelot *ocelot = port->ocelot;
+
+ if (sset != ETH_SS_STATS)
+ return -EOPNOTSUPP;
+ return ocelot->num_stats;
+}
+
+static const struct ethtool_ops ocelot_ethtool_ops = {
+ .get_strings = ocelot_get_strings,
+ .get_ethtool_stats = ocelot_get_ethtool_stats,
+ .get_sset_count = ocelot_get_sset_count,
+};
+
+static int ocelot_port_attr_get(struct net_device *dev,
+ struct switchdev_attr *attr)
+{
+ struct ocelot_port *ocelot_port = netdev_priv(dev);
+ struct ocelot *ocelot = ocelot_port->ocelot;
+
+ switch (attr->id) {
+ case SWITCHDEV_ATTR_ID_PORT_PARENT_ID:
+ attr->u.ppid.id_len = sizeof(ocelot->base_mac);
+ memcpy(&attr->u.ppid.id, &ocelot->base_mac,
+ attr->u.ppid.id_len);
+ break;
+ default:
+ return -EOPNOTSUPP;
+ }
+
+ return 0;
+}
+
+static int ocelot_port_attr_stp_state_set(struct ocelot_port *ocelot_port,
+ struct switchdev_trans *trans,
+ u8 state)
+{
+ struct ocelot *ocelot = ocelot_port->ocelot;
+ u32 port_cfg;
+ int port, i;
+
+ if (switchdev_trans_ph_prepare(trans))
+ return 0;
+
+ if (!(BIT(ocelot_port->chip_port) & ocelot->bridge_mask))
+ return 0;
+
+ port_cfg = ocelot_read_gix(ocelot, ANA_PORT_PORT_CFG,
+ ocelot_port->chip_port);
+
+ switch (state) {
+ case BR_STATE_FORWARDING:
+ ocelot->bridge_fwd_mask |= BIT(ocelot_port->chip_port);
+ /* Fallthrough */
+ case BR_STATE_LEARNING:
+ port_cfg |= ANA_PORT_PORT_CFG_LEARN_ENA;
+ break;
+
+ default:
+ port_cfg &= ~ANA_PORT_PORT_CFG_LEARN_ENA;
+ ocelot->bridge_fwd_mask &= ~BIT(ocelot_port->chip_port);
+ break;
+ }
+
+ ocelot_write_gix(ocelot, port_cfg, ANA_PORT_PORT_CFG,
+ ocelot_port->chip_port);
+
+ /* Apply FWD mask. The loop is needed to add/remove the current port as
+ * a source for the other ports.
+ */
+ for (port = 0; port < ocelot->num_phys_ports; port++) {
+ if (ocelot->bridge_fwd_mask & BIT(port)) {
+ unsigned long mask = ocelot->bridge_fwd_mask & ~BIT(port);
+
+ for (i = 0; i < ocelot->num_phys_ports; i++) {
+ unsigned long bond_mask = ocelot->lags[i];
+
+ if (!bond_mask)
+ continue;
+
+ if (bond_mask & BIT(port)) {
+ mask &= ~bond_mask;
+ break;
+ }
+ }
+
+ ocelot_write_rix(ocelot,
+ BIT(ocelot->num_phys_ports) | mask,
+ ANA_PGID_PGID, PGID_SRC + port);
+ } else {
+ /* Only the CPU port, this is compatible with link
+ * aggregation.
+ */
+ ocelot_write_rix(ocelot,
+ BIT(ocelot->num_phys_ports),
+ ANA_PGID_PGID, PGID_SRC + port);
+ }
+ }
+
+ return 0;
+}
+
+static void ocelot_port_attr_ageing_set(struct ocelot_port *ocelot_port,
+ unsigned long ageing_clock_t)
+{
+ struct ocelot *ocelot = ocelot_port->ocelot;
+ unsigned long ageing_jiffies = clock_t_to_jiffies(ageing_clock_t);
+ u32 ageing_time = jiffies_to_msecs(ageing_jiffies) / 1000;
+
+ ocelot_write(ocelot, ANA_AUTOAGE_AGE_PERIOD(ageing_time / 2),
+ ANA_AUTOAGE);
+}
+
+static void ocelot_port_attr_mc_set(struct ocelot_port *port, bool mc)
+{
+ struct ocelot *ocelot = port->ocelot;
+ u32 val = ocelot_read_gix(ocelot, ANA_PORT_CPU_FWD_CFG,
+ port->chip_port);
+
+ if (mc)
+ val |= ANA_PORT_CPU_FWD_CFG_CPU_IGMP_REDIR_ENA |
+ ANA_PORT_CPU_FWD_CFG_CPU_MLD_REDIR_ENA |
+ ANA_PORT_CPU_FWD_CFG_CPU_IPMC_CTRL_COPY_ENA;
+ else
+ val &= ~(ANA_PORT_CPU_FWD_CFG_CPU_IGMP_REDIR_ENA |
+ ANA_PORT_CPU_FWD_CFG_CPU_MLD_REDIR_ENA |
+ ANA_PORT_CPU_FWD_CFG_CPU_IPMC_CTRL_COPY_ENA);
+
+ ocelot_write_gix(ocelot, val, ANA_PORT_CPU_FWD_CFG, port->chip_port);
+}
+
+static int ocelot_port_attr_set(struct net_device *dev,
+ const struct switchdev_attr *attr,
+ struct switchdev_trans *trans)
+{
+ struct ocelot_port *ocelot_port = netdev_priv(dev);
+ int err = 0;
+
+ switch (attr->id) {
+ case SWITCHDEV_ATTR_ID_PORT_STP_STATE:
+ ocelot_port_attr_stp_state_set(ocelot_port, trans,
+ attr->u.stp_state);
+ break;
+ case SWITCHDEV_ATTR_ID_BRIDGE_AGEING_TIME:
+ ocelot_port_attr_ageing_set(ocelot_port, attr->u.ageing_time);
+ break;
+ case SWITCHDEV_ATTR_ID_BRIDGE_MC_DISABLED:
+ ocelot_port_attr_mc_set(ocelot_port, !attr->u.mc_disabled);
+ break;
+ default:
+ err = -EOPNOTSUPP;
+ break;
+ }
+
+ return err;
+}
+
+static struct ocelot_multicast *ocelot_multicast_get(struct ocelot *ocelot,
+ const unsigned char *addr,
+ u16 vid)
+{
+ struct ocelot_multicast *mc;
+
+ list_for_each_entry(mc, &ocelot->multicast, list) {
+ if (ether_addr_equal(mc->addr, addr) && mc->vid == vid)
+ return mc;
+ }
+
+ return NULL;
+}
+
+static int ocelot_port_obj_add_mdb(struct net_device *dev,
+ const struct switchdev_obj_port_mdb *mdb,
+ struct switchdev_trans *trans)
+{
+ struct ocelot_port *port = netdev_priv(dev);
+ struct ocelot *ocelot = port->ocelot;
+ struct ocelot_multicast *mc;
+ unsigned char addr[ETH_ALEN];
+ u16 vid = mdb->vid;
+ bool new = false;
+
+ if (!vid)
+ vid = 1;
+
+ mc = ocelot_multicast_get(ocelot, mdb->addr, vid);
+ if (!mc) {
+ mc = devm_kzalloc(ocelot->dev, sizeof(*mc), GFP_KERNEL);
+ if (!mc)
+ return -ENOMEM;
+
+ memcpy(mc->addr, mdb->addr, ETH_ALEN);
+ mc->vid = vid;
+
+ list_add_tail(&mc->list, &ocelot->multicast);
+ new = true;
+ }
+
+ memcpy(addr, mc->addr, ETH_ALEN);
+ addr[0] = 0;
+
+ if (!new) {
+ addr[2] = mc->ports << 0;
+ addr[1] = mc->ports << 8;
+ ocelot_mact_forget(ocelot, addr, vid);
+ }
+
+ mc->ports |= BIT(port->chip_port);
+ addr[2] = mc->ports << 0;
+ addr[1] = mc->ports << 8;
+
+ return ocelot_mact_learn(ocelot, 0, addr, vid, ENTRYTYPE_MACv4);
+}
+
+static int ocelot_port_obj_del_mdb(struct net_device *dev,
+ const struct switchdev_obj_port_mdb *mdb)
+{
+ struct ocelot_port *port = netdev_priv(dev);
+ struct ocelot *ocelot = port->ocelot;
+ struct ocelot_multicast *mc;
+ unsigned char addr[ETH_ALEN];
+ u16 vid = mdb->vid;
+
+ if (!vid)
+ vid = 1;
+
+ mc = ocelot_multicast_get(ocelot, mdb->addr, vid);
+ if (!mc)
+ return -ENOENT;
+
+ memcpy(addr, mc->addr, ETH_ALEN);
+ addr[2] = mc->ports << 0;
+ addr[1] = mc->ports << 8;
+ addr[0] = 0;
+ ocelot_mact_forget(ocelot, addr, vid);
+
+ mc->ports &= ~BIT(port->chip_port);
+ if (!mc->ports) {
+ list_del(&mc->list);
+ devm_kfree(ocelot->dev, mc);
+ return 0;
+ }
+
+ addr[2] = mc->ports << 0;
+ addr[1] = mc->ports << 8;
+
+ return ocelot_mact_learn(ocelot, 0, addr, vid, ENTRYTYPE_MACv4);
+}
+
+static int ocelot_port_obj_add(struct net_device *dev,
+ const struct switchdev_obj *obj,
+ struct switchdev_trans *trans)
+{
+ int ret = 0;
+
+ switch (obj->id) {
+ case SWITCHDEV_OBJ_ID_PORT_MDB:
+ ret = ocelot_port_obj_add_mdb(dev, SWITCHDEV_OBJ_PORT_MDB(obj),
+ trans);
+ break;
+ default:
+ return -EOPNOTSUPP;
+ }
+
+ return ret;
+}
+
+static int ocelot_port_obj_del(struct net_device *dev,
+ const struct switchdev_obj *obj)
+{
+ int ret = 0;
+
+ switch (obj->id) {
+ case SWITCHDEV_OBJ_ID_PORT_MDB:
+ ret = ocelot_port_obj_del_mdb(dev, SWITCHDEV_OBJ_PORT_MDB(obj));
+ break;
+ default:
+ return -EOPNOTSUPP;
+ }
+
+ return ret;
+}
+
+static const struct switchdev_ops ocelot_port_switchdev_ops = {
+ .switchdev_port_attr_get = ocelot_port_attr_get,
+ .switchdev_port_attr_set = ocelot_port_attr_set,
+ .switchdev_port_obj_add = ocelot_port_obj_add,
+ .switchdev_port_obj_del = ocelot_port_obj_del,
+};
+
+static int ocelot_port_bridge_join(struct ocelot_port *ocelot_port,
+ struct net_device *bridge)
+{
+ struct ocelot *ocelot = ocelot_port->ocelot;
+
+ if (!ocelot->bridge_mask) {
+ ocelot->hw_bridge_dev = bridge;
+ } else {
+ if (ocelot->hw_bridge_dev != bridge)
+ /* This is adding the port to a second bridge, this is
+ * unsupported */
+ return -ENODEV;
+ }
+
+ ocelot->bridge_mask |= BIT(ocelot_port->chip_port);
+
+ return 0;
+}
+
+static void ocelot_port_bridge_leave(struct ocelot_port *ocelot_port,
+ struct net_device *bridge)
+{
+ struct ocelot *ocelot = ocelot_port->ocelot;
+
+ ocelot->bridge_mask &= ~BIT(ocelot_port->chip_port);
+
+ if (!ocelot->bridge_mask)
+ ocelot->hw_bridge_dev = NULL;
+}
+
+/* Checks if the net_device instance given to us originate from our driver. */
+static bool ocelot_netdevice_dev_check(const struct net_device *dev)
+{
+ return dev->netdev_ops == &ocelot_port_netdev_ops;
+}
+
+static int ocelot_netdevice_port_event(struct net_device *dev,
+ unsigned long event,
+ struct netdev_notifier_changeupper_info *info)
+{
+ struct ocelot_port *ocelot_port = netdev_priv(dev);
+ int err = 0;
+
+ if (!ocelot_netdevice_dev_check(dev))
+ return 0;
+
+ switch (event) {
+ case NETDEV_CHANGEUPPER:
+ if (netif_is_bridge_master(info->upper_dev)) {
+ if (info->linking)
+ err = ocelot_port_bridge_join(ocelot_port,
+ info->upper_dev);
+ else
+ ocelot_port_bridge_leave(ocelot_port,
+ info->upper_dev);
+ }
+ break;
+ default:
+ break;
+ }
+
+ return err;
+}
+
+static int ocelot_netdevice_event(struct notifier_block *unused,
+ unsigned long event, void *ptr)
+{
+ struct netdev_notifier_changeupper_info *info = ptr;
+ struct net_device *dev = netdev_notifier_info_to_dev(ptr);
+ int ret;
+
+ if (netif_is_lag_master(dev)) {
+ struct net_device *slave;
+ struct list_head *iter;
+
+ netdev_for_each_lower_dev(dev, slave, iter) {
+ ret = ocelot_netdevice_port_event(slave, event, info);
+ if (ret)
+ goto notify;
+ }
+ } else {
+ ret = ocelot_netdevice_port_event(dev, event, info);
+ }
+
+notify:
+ return notifier_from_errno(ret);
+}
+
+struct notifier_block ocelot_netdevice_nb __read_mostly = {
+ .notifier_call = ocelot_netdevice_event,
+};
+EXPORT_SYMBOL(ocelot_netdevice_nb);
+
+int ocelot_probe_port(struct ocelot *ocelot, u8 port,
+ void __iomem *regs,
+ struct phy_device *phy)
+{
+ struct ocelot_port *ocelot_port;
+ struct net_device *dev;
+ int err;
+
+ dev = alloc_etherdev(sizeof(struct ocelot_port));
+ if (!dev)
+ return -ENOMEM;
+ SET_NETDEV_DEV(dev, ocelot->dev);
+ ocelot_port = netdev_priv(dev);
+ ocelot_port->dev = dev;
+ ocelot_port->ocelot = ocelot;
+ ocelot_port->regs = regs;
+ ocelot_port->chip_port = port;
+ ocelot_port->phy = phy;
+ INIT_LIST_HEAD(&ocelot_port->mc);
+ ocelot->ports[port] = ocelot_port;
+
+ dev->netdev_ops = &ocelot_port_netdev_ops;
+ dev->ethtool_ops = &ocelot_ethtool_ops;
+ dev->switchdev_ops = &ocelot_port_switchdev_ops;
+
+ memcpy(dev->dev_addr, ocelot->base_mac, ETH_ALEN);
+ dev->dev_addr[ETH_ALEN - 1] += port;
+ ocelot_mact_learn(ocelot, PGID_CPU, dev->dev_addr, ocelot_port->pvid,
+ ENTRYTYPE_LOCKED);
+
+ err = register_netdev(dev);
+ if (err) {
+ dev_err(ocelot->dev, "register_netdev failed\n");
+ goto err_register_netdev;
+ }
+
+ return 0;
+
+err_register_netdev:
+ free_netdev(dev);
+ return err;
+}
+EXPORT_SYMBOL(ocelot_probe_port);
+
+int ocelot_init(struct ocelot *ocelot)
+{
+ u32 port;
+ int i, cpu = ocelot->num_phys_ports;
+ char queue_name[32];
+
+ ocelot->stats = devm_kcalloc(ocelot->dev,
+ ocelot->num_phys_ports * ocelot->num_stats,
+ sizeof(u64), GFP_KERNEL);
+ if (!ocelot->stats)
+ return -ENOMEM;
+
+ mutex_init(&ocelot->stats_lock);
+ snprintf(queue_name, sizeof(queue_name), "%s-stats",
+ dev_name(ocelot->dev));
+ ocelot->stats_queue = create_singlethread_workqueue(queue_name);
+ if (!ocelot->stats_queue)
+ return -ENOMEM;
+
+ ocelot_mact_init(ocelot);
+ ocelot_vlan_init(ocelot);
+
+ for (port = 0; port < ocelot->num_phys_ports; port++) {
+ /* Clear all counters (5 groups) */
+ ocelot_write(ocelot, SYS_STAT_CFG_STAT_VIEW(port) |
+ SYS_STAT_CFG_STAT_CLEAR_SHOT(0x7f),
+ SYS_STAT_CFG);
+ }
+
+ /* Only use S-Tag */
+ ocelot_write(ocelot, ETH_P_8021AD, SYS_VLAN_ETYPE_CFG);
+
+ /* Aggregation mode */
+ ocelot_write(ocelot, ANA_AGGR_CFG_AC_SMAC_ENA |
+ ANA_AGGR_CFG_AC_DMAC_ENA |
+ ANA_AGGR_CFG_AC_IP4_SIPDIP_ENA |
+ ANA_AGGR_CFG_AC_IP4_TCPUDP_ENA, ANA_AGGR_CFG);
+
+ /* Set MAC age time to default value. The entry is aged after
+ * 2*AGE_PERIOD
+ */
+ ocelot_write(ocelot,
+ ANA_AUTOAGE_AGE_PERIOD(BR_DEFAULT_AGEING_TIME / 2 / HZ),
+ ANA_AUTOAGE);
+
+ /* Disable learning for frames discarded by VLAN ingress filtering */
+ regmap_field_write(ocelot->regfields[ANA_ADVLEARN_VLAN_CHK], 1);
+
+ /* Setup frame ageing - fixed value "2 sec" - in 6.5 us units */
+ ocelot_write(ocelot, SYS_FRM_AGING_AGE_TX_ENA |
+ SYS_FRM_AGING_MAX_AGE(307692), SYS_FRM_AGING);
+
+ /* Setup flooding PGIDs */
+ ocelot_write_rix(ocelot, ANA_FLOODING_FLD_MULTICAST(PGID_MC) |
+ ANA_FLOODING_FLD_BROADCAST(PGID_MC) |
+ ANA_FLOODING_FLD_UNICAST(PGID_UC),
+ ANA_FLOODING, 0);
+ ocelot_write(ocelot, ANA_FLOODING_IPMC_FLD_MC6_DATA(PGID_MCIPV6) |
+ ANA_FLOODING_IPMC_FLD_MC6_CTRL(PGID_MC) |
+ ANA_FLOODING_IPMC_FLD_MC4_DATA(PGID_MCIPV4) |
+ ANA_FLOODING_IPMC_FLD_MC4_CTRL(PGID_MC),
+ ANA_FLOODING_IPMC);
+
+ for (port = 0; port < ocelot->num_phys_ports; port++) {
+ /* Transmit the frame to the local port. */
+ ocelot_write_rix(ocelot, BIT(port), ANA_PGID_PGID, port);
+ /* Do not forward BPDU frames to the front ports. */
+ ocelot_write_gix(ocelot,
+ ANA_PORT_CPU_FWD_BPDU_CFG_BPDU_REDIR_ENA(0xffff),
+ ANA_PORT_CPU_FWD_BPDU_CFG,
+ port);
+ /* Ensure bridging is disabled */
+ ocelot_write_rix(ocelot, 0, ANA_PGID_PGID, PGID_SRC + port);
+ }
+
+ /* Configure and enable the CPU port. */
+ ocelot_write_rix(ocelot, 0, ANA_PGID_PGID, cpu);
+ ocelot_write_rix(ocelot, BIT(cpu), ANA_PGID_PGID, PGID_CPU);
+ ocelot_write_gix(ocelot, ANA_PORT_PORT_CFG_RECV_ENA |
+ ANA_PORT_PORT_CFG_PORTID_VAL(cpu),
+ ANA_PORT_PORT_CFG, cpu);
+
+ /* Allow broadcast MAC frames. */
+ for (i = ocelot->num_phys_ports + 1; i < PGID_CPU; i++) {
+ u32 val = ANA_PGID_PGID_PGID(GENMASK(ocelot->num_phys_ports - 1, 0));
+
+ ocelot_write_rix(ocelot, val, ANA_PGID_PGID, i);
+ }
+ ocelot_write_rix(ocelot,
+ ANA_PGID_PGID_PGID(GENMASK(ocelot->num_phys_ports, 0)),
+ ANA_PGID_PGID, PGID_MC);
+ ocelot_write_rix(ocelot, 0, ANA_PGID_PGID, PGID_MCIPV4);
+ ocelot_write_rix(ocelot, 0, ANA_PGID_PGID, PGID_MCIPV6);
+
+ /* CPU port Injection/Extraction configuration */
+ ocelot_write_rix(ocelot, QSYS_SWITCH_PORT_MODE_INGRESS_DROP_MODE |
+ QSYS_SWITCH_PORT_MODE_SCH_NEXT_CFG(1) |
+ QSYS_SWITCH_PORT_MODE_PORT_ENA,
+ QSYS_SWITCH_PORT_MODE, cpu);
+ ocelot_write_rix(ocelot, SYS_PORT_MODE_INCL_XTR_HDR(1) |
+ SYS_PORT_MODE_INCL_INJ_HDR(1), SYS_PORT_MODE, cpu);
+ /* Allow manual injection via DEVCPU_QS registers, and byte swap these
+ * registers endianness.
+ */
+ ocelot_write_rix(ocelot, QS_INJ_GRP_CFG_BYTE_SWAP |
+ QS_INJ_GRP_CFG_MODE(1), QS_INJ_GRP_CFG, 0);
+ ocelot_write_rix(ocelot, QS_XTR_GRP_CFG_BYTE_SWAP |
+ QS_XTR_GRP_CFG_MODE(1), QS_XTR_GRP_CFG, 0);
+ ocelot_write(ocelot, ANA_CPUQ_CFG_CPUQ_MIRROR(2) |
+ ANA_CPUQ_CFG_CPUQ_LRN(2) |
+ ANA_CPUQ_CFG_CPUQ_MAC_COPY(2) |
+ ANA_CPUQ_CFG_CPUQ_SRC_COPY(2) |
+ ANA_CPUQ_CFG_CPUQ_LOCKED_PORTMOVE(2) |
+ ANA_CPUQ_CFG_CPUQ_ALLBRIDGE(6) |
+ ANA_CPUQ_CFG_CPUQ_IPMC_CTRL(6) |
+ ANA_CPUQ_CFG_CPUQ_IGMP(6) |
+ ANA_CPUQ_CFG_CPUQ_MLD(6), ANA_CPUQ_CFG);
+ for (i = 0; i < 16; i++)
+ ocelot_write_rix(ocelot, ANA_CPUQ_8021_CFG_CPUQ_GARP_VAL(6) |
+ ANA_CPUQ_8021_CFG_CPUQ_BPDU_VAL(6),
+ ANA_CPUQ_8021_CFG, i);
+
+ INIT_DELAYED_WORK(&ocelot->stats_work, ocelot_check_stats);
+ queue_delayed_work(ocelot->stats_queue, &ocelot->stats_work,
+ OCELOT_STATS_CHECK_DELAY);
+ return 0;
+}
+EXPORT_SYMBOL(ocelot_init);
+
+void ocelot_deinit(struct ocelot *ocelot)
+{
+ destroy_workqueue(ocelot->stats_queue);
+ mutex_destroy(&ocelot->stats_lock);
+}
+EXPORT_SYMBOL(ocelot_deinit);
+
+MODULE_LICENSE("Dual MIT/GPL");
diff --git a/drivers/net/ethernet/mscc/ocelot.h b/drivers/net/ethernet/mscc/ocelot.h
new file mode 100644
index 0000000..097bd12
--- /dev/null
+++ b/drivers/net/ethernet/mscc/ocelot.h
@@ -0,0 +1,572 @@
+/* SPDX-License-Identifier: (GPL-2.0 OR MIT) */
+/*
+ * Microsemi Ocelot Switch driver
+ *
+ * Copyright (c) 2017 Microsemi Corporation
+ */
+
+#ifndef _MSCC_OCELOT_H_
+#define _MSCC_OCELOT_H_
+
+#include <linux/bitops.h>
+#include <linux/etherdevice.h>
+#include <linux/if_vlan.h>
+#include <linux/platform_device.h>
+#include <linux/regmap.h>
+
+#include "ocelot_ana.h"
+#include "ocelot_dev.h"
+#include "ocelot_hsio.h"
+#include "ocelot_qsys.h"
+#include "ocelot_rew.h"
+#include "ocelot_sys.h"
+#include "ocelot_qs.h"
+
+#define PGID_AGGR 64
+#define PGID_SRC 80
+
+/* Reserved PGIDs */
+#define PGID_CPU (PGID_AGGR - 5)
+#define PGID_UC (PGID_AGGR - 4)
+#define PGID_MC (PGID_AGGR - 3)
+#define PGID_MCIPV4 (PGID_AGGR - 2)
+#define PGID_MCIPV6 (PGID_AGGR - 1)
+
+#define OCELOT_BUFFER_CELL_SZ 60
+
+#define OCELOT_STATS_CHECK_DELAY (2 * HZ)
+
+#define IFH_LEN 4
+
+struct frame_info {
+ u32 len;
+ u16 port;
+ u16 vid;
+ u8 cpuq;
+ u8 tag_type;
+};
+
+#define IFH_INJ_BYPASS BIT(31)
+#define IFH_INJ_POP_CNT_DISABLE (3 << 28)
+
+#define IFH_TAG_TYPE_C 0
+#define IFH_TAG_TYPE_S 1
+
+#define OCELOT_SPEED_2500 0
+#define OCELOT_SPEED_1000 1
+#define OCELOT_SPEED_100 2
+#define OCELOT_SPEED_10 3
+
+#define TARGET_OFFSET 24
+#define REG_MASK GENMASK(TARGET_OFFSET - 1, 0)
+#define REG(reg, offset) [reg & REG_MASK] = offset
+
+enum ocelot_target {
+ ANA = 1,
+ QS,
+ QSYS,
+ REW,
+ SYS,
+ HSIO,
+ TARGET_MAX,
+};
+
+enum ocelot_reg {
+ ANA_ADVLEARN = ANA << TARGET_OFFSET,
+ ANA_VLANMASK,
+ ANA_PORT_B_DOMAIN,
+ ANA_ANAGEFIL,
+ ANA_ANEVENTS,
+ ANA_STORMLIMIT_BURST,
+ ANA_STORMLIMIT_CFG,
+ ANA_ISOLATED_PORTS,
+ ANA_COMMUNITY_PORTS,
+ ANA_AUTOAGE,
+ ANA_MACTOPTIONS,
+ ANA_LEARNDISC,
+ ANA_AGENCTRL,
+ ANA_MIRRORPORTS,
+ ANA_EMIRRORPORTS,
+ ANA_FLOODING,
+ ANA_FLOODING_IPMC,
+ ANA_SFLOW_CFG,
+ ANA_PORT_MODE,
+ ANA_CUT_THRU_CFG,
+ ANA_PGID_PGID,
+ ANA_TABLES_ANMOVED,
+ ANA_TABLES_MACHDATA,
+ ANA_TABLES_MACLDATA,
+ ANA_TABLES_STREAMDATA,
+ ANA_TABLES_MACACCESS,
+ ANA_TABLES_MACTINDX,
+ ANA_TABLES_VLANACCESS,
+ ANA_TABLES_VLANTIDX,
+ ANA_TABLES_ISDXACCESS,
+ ANA_TABLES_ISDXTIDX,
+ ANA_TABLES_ENTRYLIM,
+ ANA_TABLES_PTP_ID_HIGH,
+ ANA_TABLES_PTP_ID_LOW,
+ ANA_TABLES_STREAMACCESS,
+ ANA_TABLES_STREAMTIDX,
+ ANA_TABLES_SEQ_HISTORY,
+ ANA_TABLES_SEQ_MASK,
+ ANA_TABLES_SFID_MASK,
+ ANA_TABLES_SFIDACCESS,
+ ANA_TABLES_SFIDTIDX,
+ ANA_MSTI_STATE,
+ ANA_OAM_UPM_LM_CNT,
+ ANA_SG_ACCESS_CTRL,
+ ANA_SG_CONFIG_REG_1,
+ ANA_SG_CONFIG_REG_2,
+ ANA_SG_CONFIG_REG_3,
+ ANA_SG_CONFIG_REG_4,
+ ANA_SG_CONFIG_REG_5,
+ ANA_SG_GCL_GS_CONFIG,
+ ANA_SG_GCL_TI_CONFIG,
+ ANA_SG_STATUS_REG_1,
+ ANA_SG_STATUS_REG_2,
+ ANA_SG_STATUS_REG_3,
+ ANA_PORT_VLAN_CFG,
+ ANA_PORT_DROP_CFG,
+ ANA_PORT_QOS_CFG,
+ ANA_PORT_VCAP_CFG,
+ ANA_PORT_VCAP_S1_KEY_CFG,
+ ANA_PORT_VCAP_S2_CFG,
+ ANA_PORT_PCP_DEI_MAP,
+ ANA_PORT_CPU_FWD_CFG,
+ ANA_PORT_CPU_FWD_BPDU_CFG,
+ ANA_PORT_CPU_FWD_GARP_CFG,
+ ANA_PORT_CPU_FWD_CCM_CFG,
+ ANA_PORT_PORT_CFG,
+ ANA_PORT_POL_CFG,
+ ANA_PORT_PTP_CFG,
+ ANA_PORT_PTP_DLY1_CFG,
+ ANA_PORT_PTP_DLY2_CFG,
+ ANA_PORT_SFID_CFG,
+ ANA_PFC_PFC_CFG,
+ ANA_PFC_PFC_TIMER,
+ ANA_IPT_OAM_MEP_CFG,
+ ANA_IPT_IPT,
+ ANA_PPT_PPT,
+ ANA_FID_MAP_FID_MAP,
+ ANA_AGGR_CFG,
+ ANA_CPUQ_CFG,
+ ANA_CPUQ_CFG2,
+ ANA_CPUQ_8021_CFG,
+ ANA_DSCP_CFG,
+ ANA_DSCP_REWR_CFG,
+ ANA_VCAP_RNG_TYPE_CFG,
+ ANA_VCAP_RNG_VAL_CFG,
+ ANA_VRAP_CFG,
+ ANA_VRAP_HDR_DATA,
+ ANA_VRAP_HDR_MASK,
+ ANA_DISCARD_CFG,
+ ANA_FID_CFG,
+ ANA_POL_PIR_CFG,
+ ANA_POL_CIR_CFG,
+ ANA_POL_MODE_CFG,
+ ANA_POL_PIR_STATE,
+ ANA_POL_CIR_STATE,
+ ANA_POL_STATE,
+ ANA_POL_FLOWC,
+ ANA_POL_HYST,
+ ANA_POL_MISC_CFG,
+ QS_XTR_GRP_CFG = QS << TARGET_OFFSET,
+ QS_XTR_RD,
+ QS_XTR_FRM_PRUNING,
+ QS_XTR_FLUSH,
+ QS_XTR_DATA_PRESENT,
+ QS_XTR_CFG,
+ QS_INJ_GRP_CFG,
+ QS_INJ_WR,
+ QS_INJ_CTRL,
+ QS_INJ_STATUS,
+ QS_INJ_ERR,
+ QS_INH_DBG,
+ QSYS_PORT_MODE = QSYS << TARGET_OFFSET,
+ QSYS_SWITCH_PORT_MODE,
+ QSYS_STAT_CNT_CFG,
+ QSYS_EEE_CFG,
+ QSYS_EEE_THRES,
+ QSYS_IGR_NO_SHARING,
+ QSYS_EGR_NO_SHARING,
+ QSYS_SW_STATUS,
+ QSYS_EXT_CPU_CFG,
+ QSYS_PAD_CFG,
+ QSYS_CPU_GROUP_MAP,
+ QSYS_QMAP,
+ QSYS_ISDX_SGRP,
+ QSYS_TIMED_FRAME_ENTRY,
+ QSYS_TFRM_MISC,
+ QSYS_TFRM_PORT_DLY,
+ QSYS_TFRM_TIMER_CFG_1,
+ QSYS_TFRM_TIMER_CFG_2,
+ QSYS_TFRM_TIMER_CFG_3,
+ QSYS_TFRM_TIMER_CFG_4,
+ QSYS_TFRM_TIMER_CFG_5,
+ QSYS_TFRM_TIMER_CFG_6,
+ QSYS_TFRM_TIMER_CFG_7,
+ QSYS_TFRM_TIMER_CFG_8,
+ QSYS_RED_PROFILE,
+ QSYS_RES_QOS_MODE,
+ QSYS_RES_CFG,
+ QSYS_RES_STAT,
+ QSYS_EGR_DROP_MODE,
+ QSYS_EQ_CTRL,
+ QSYS_EVENTS_CORE,
+ QSYS_QMAXSDU_CFG_0,
+ QSYS_QMAXSDU_CFG_1,
+ QSYS_QMAXSDU_CFG_2,
+ QSYS_QMAXSDU_CFG_3,
+ QSYS_QMAXSDU_CFG_4,
+ QSYS_QMAXSDU_CFG_5,
+ QSYS_QMAXSDU_CFG_6,
+ QSYS_QMAXSDU_CFG_7,
+ QSYS_PREEMPTION_CFG,
+ QSYS_CIR_CFG,
+ QSYS_EIR_CFG,
+ QSYS_SE_CFG,
+ QSYS_SE_DWRR_CFG,
+ QSYS_SE_CONNECT,
+ QSYS_SE_DLB_SENSE,
+ QSYS_CIR_STATE,
+ QSYS_EIR_STATE,
+ QSYS_SE_STATE,
+ QSYS_HSCH_MISC_CFG,
+ QSYS_TAG_CONFIG,
+ QSYS_TAS_PARAM_CFG_CTRL,
+ QSYS_PORT_MAX_SDU,
+ QSYS_PARAM_CFG_REG_1,
+ QSYS_PARAM_CFG_REG_2,
+ QSYS_PARAM_CFG_REG_3,
+ QSYS_PARAM_CFG_REG_4,
+ QSYS_PARAM_CFG_REG_5,
+ QSYS_GCL_CFG_REG_1,
+ QSYS_GCL_CFG_REG_2,
+ QSYS_PARAM_STATUS_REG_1,
+ QSYS_PARAM_STATUS_REG_2,
+ QSYS_PARAM_STATUS_REG_3,
+ QSYS_PARAM_STATUS_REG_4,
+ QSYS_PARAM_STATUS_REG_5,
+ QSYS_PARAM_STATUS_REG_6,
+ QSYS_PARAM_STATUS_REG_7,
+ QSYS_PARAM_STATUS_REG_8,
+ QSYS_PARAM_STATUS_REG_9,
+ QSYS_GCL_STATUS_REG_1,
+ QSYS_GCL_STATUS_REG_2,
+ REW_PORT_VLAN_CFG = REW << TARGET_OFFSET,
+ REW_TAG_CFG,
+ REW_PORT_CFG,
+ REW_DSCP_CFG,
+ REW_PCP_DEI_QOS_MAP_CFG,
+ REW_PTP_CFG,
+ REW_PTP_DLY1_CFG,
+ REW_RED_TAG_CFG,
+ REW_DSCP_REMAP_DP1_CFG,
+ REW_DSCP_REMAP_CFG,
+ REW_STAT_CFG,
+ REW_REW_STICKY,
+ REW_PPT,
+ SYS_COUNT_RX_OCTETS = SYS << TARGET_OFFSET,
+ SYS_COUNT_RX_UNICAST,
+ SYS_COUNT_RX_MULTICAST,
+ SYS_COUNT_RX_BROADCAST,
+ SYS_COUNT_RX_SHORTS,
+ SYS_COUNT_RX_FRAGMENTS,
+ SYS_COUNT_RX_JABBERS,
+ SYS_COUNT_RX_CRC_ALIGN_ERRS,
+ SYS_COUNT_RX_SYM_ERRS,
+ SYS_COUNT_RX_64,
+ SYS_COUNT_RX_65_127,
+ SYS_COUNT_RX_128_255,
+ SYS_COUNT_RX_256_1023,
+ SYS_COUNT_RX_1024_1526,
+ SYS_COUNT_RX_1527_MAX,
+ SYS_COUNT_RX_PAUSE,
+ SYS_COUNT_RX_CONTROL,
+ SYS_COUNT_RX_LONGS,
+ SYS_COUNT_RX_CLASSIFIED_DROPS,
+ SYS_COUNT_TX_OCTETS,
+ SYS_COUNT_TX_UNICAST,
+ SYS_COUNT_TX_MULTICAST,
+ SYS_COUNT_TX_BROADCAST,
+ SYS_COUNT_TX_COLLISION,
+ SYS_COUNT_TX_DROPS,
+ SYS_COUNT_TX_PAUSE,
+ SYS_COUNT_TX_64,
+ SYS_COUNT_TX_65_127,
+ SYS_COUNT_TX_128_511,
+ SYS_COUNT_TX_512_1023,
+ SYS_COUNT_TX_1024_1526,
+ SYS_COUNT_TX_1527_MAX,
+ SYS_COUNT_TX_AGING,
+ SYS_RESET_CFG,
+ SYS_SR_ETYPE_CFG,
+ SYS_VLAN_ETYPE_CFG,
+ SYS_PORT_MODE,
+ SYS_FRONT_PORT_MODE,
+ SYS_FRM_AGING,
+ SYS_STAT_CFG,
+ SYS_SW_STATUS,
+ SYS_MISC_CFG,
+ SYS_REW_MAC_HIGH_CFG,
+ SYS_REW_MAC_LOW_CFG,
+ SYS_TIMESTAMP_OFFSET,
+ SYS_CMID,
+ SYS_PAUSE_CFG,
+ SYS_PAUSE_TOT_CFG,
+ SYS_ATOP,
+ SYS_ATOP_TOT_CFG,
+ SYS_MAC_FC_CFG,
+ SYS_MMGT,
+ SYS_MMGT_FAST,
+ SYS_EVENTS_DIF,
+ SYS_EVENTS_CORE,
+ SYS_CNT,
+ SYS_PTP_STATUS,
+ SYS_PTP_TXSTAMP,
+ SYS_PTP_NXT,
+ SYS_PTP_CFG,
+ SYS_RAM_INIT,
+ SYS_CM_ADDR,
+ SYS_CM_DATA_WR,
+ SYS_CM_DATA_RD,
+ SYS_CM_OP,
+ SYS_CM_DATA,
+ HSIO_PLL5G_CFG0 = HSIO << TARGET_OFFSET,
+ HSIO_PLL5G_CFG1,
+ HSIO_PLL5G_CFG2,
+ HSIO_PLL5G_CFG3,
+ HSIO_PLL5G_CFG4,
+ HSIO_PLL5G_CFG5,
+ HSIO_PLL5G_CFG6,
+ HSIO_PLL5G_STATUS0,
+ HSIO_PLL5G_STATUS1,
+ HSIO_PLL5G_BIST_CFG0,
+ HSIO_PLL5G_BIST_CFG1,
+ HSIO_PLL5G_BIST_CFG2,
+ HSIO_PLL5G_BIST_STAT0,
+ HSIO_PLL5G_BIST_STAT1,
+ HSIO_RCOMP_CFG0,
+ HSIO_RCOMP_STATUS,
+ HSIO_SYNC_ETH_CFG,
+ HSIO_SYNC_ETH_PLL_CFG,
+ HSIO_S1G_DES_CFG,
+ HSIO_S1G_IB_CFG,
+ HSIO_S1G_OB_CFG,
+ HSIO_S1G_SER_CFG,
+ HSIO_S1G_COMMON_CFG,
+ HSIO_S1G_PLL_CFG,
+ HSIO_S1G_PLL_STATUS,
+ HSIO_S1G_DFT_CFG0,
+ HSIO_S1G_DFT_CFG1,
+ HSIO_S1G_DFT_CFG2,
+ HSIO_S1G_TP_CFG,
+ HSIO_S1G_RC_PLL_BIST_CFG,
+ HSIO_S1G_MISC_CFG,
+ HSIO_S1G_DFT_STATUS,
+ HSIO_S1G_MISC_STATUS,
+ HSIO_MCB_S1G_ADDR_CFG,
+ HSIO_S6G_DIG_CFG,
+ HSIO_S6G_DFT_CFG0,
+ HSIO_S6G_DFT_CFG1,
+ HSIO_S6G_DFT_CFG2,
+ HSIO_S6G_TP_CFG0,
+ HSIO_S6G_TP_CFG1,
+ HSIO_S6G_RC_PLL_BIST_CFG,
+ HSIO_S6G_MISC_CFG,
+ HSIO_S6G_OB_ANEG_CFG,
+ HSIO_S6G_DFT_STATUS,
+ HSIO_S6G_ERR_CNT,
+ HSIO_S6G_MISC_STATUS,
+ HSIO_S6G_DES_CFG,
+ HSIO_S6G_IB_CFG,
+ HSIO_S6G_IB_CFG1,
+ HSIO_S6G_IB_CFG2,
+ HSIO_S6G_IB_CFG3,
+ HSIO_S6G_IB_CFG4,
+ HSIO_S6G_IB_CFG5,
+ HSIO_S6G_OB_CFG,
+ HSIO_S6G_OB_CFG1,
+ HSIO_S6G_SER_CFG,
+ HSIO_S6G_COMMON_CFG,
+ HSIO_S6G_PLL_CFG,
+ HSIO_S6G_ACJTAG_CFG,
+ HSIO_S6G_GP_CFG,
+ HSIO_S6G_IB_STATUS0,
+ HSIO_S6G_IB_STATUS1,
+ HSIO_S6G_ACJTAG_STATUS,
+ HSIO_S6G_PLL_STATUS,
+ HSIO_S6G_REVID,
+ HSIO_MCB_S6G_ADDR_CFG,
+ HSIO_HW_CFG,
+ HSIO_HW_QSGMII_CFG,
+ HSIO_HW_QSGMII_STAT,
+ HSIO_CLK_CFG,
+ HSIO_TEMP_SENSOR_CTRL,
+ HSIO_TEMP_SENSOR_CFG,
+ HSIO_TEMP_SENSOR_STAT,
+};
+
+enum ocelot_regfield {
+ ANA_ADVLEARN_VLAN_CHK,
+ ANA_ADVLEARN_LEARN_MIRROR,
+ ANA_ANEVENTS_FLOOD_DISCARD,
+ ANA_ANEVENTS_MSTI_DROP,
+ ANA_ANEVENTS_ACLKILL,
+ ANA_ANEVENTS_ACLUSED,
+ ANA_ANEVENTS_AUTOAGE,
+ ANA_ANEVENTS_VS2TTL1,
+ ANA_ANEVENTS_STORM_DROP,
+ ANA_ANEVENTS_LEARN_DROP,
+ ANA_ANEVENTS_AGED_ENTRY,
+ ANA_ANEVENTS_CPU_LEARN_FAILED,
+ ANA_ANEVENTS_AUTO_LEARN_FAILED,
+ ANA_ANEVENTS_LEARN_REMOVE,
+ ANA_ANEVENTS_AUTO_LEARNED,
+ ANA_ANEVENTS_AUTO_MOVED,
+ ANA_ANEVENTS_DROPPED,
+ ANA_ANEVENTS_CLASSIFIED_DROP,
+ ANA_ANEVENTS_CLASSIFIED_COPY,
+ ANA_ANEVENTS_VLAN_DISCARD,
+ ANA_ANEVENTS_FWD_DISCARD,
+ ANA_ANEVENTS_MULTICAST_FLOOD,
+ ANA_ANEVENTS_UNICAST_FLOOD,
+ ANA_ANEVENTS_DEST_KNOWN,
+ ANA_ANEVENTS_BUCKET3_MATCH,
+ ANA_ANEVENTS_BUCKET2_MATCH,
+ ANA_ANEVENTS_BUCKET1_MATCH,
+ ANA_ANEVENTS_BUCKET0_MATCH,
+ ANA_ANEVENTS_CPU_OPERATION,
+ ANA_ANEVENTS_DMAC_LOOKUP,
+ ANA_ANEVENTS_SMAC_LOOKUP,
+ ANA_ANEVENTS_SEQ_GEN_ERR_0,
+ ANA_ANEVENTS_SEQ_GEN_ERR_1,
+ ANA_TABLES_MACACCESS_B_DOM,
+ ANA_TABLES_MACTINDX_BUCKET,
+ ANA_TABLES_MACTINDX_M_INDEX,
+ QSYS_TIMED_FRAME_ENTRY_TFRM_VLD,
+ QSYS_TIMED_FRAME_ENTRY_TFRM_FP,
+ QSYS_TIMED_FRAME_ENTRY_TFRM_PORTNO,
+ QSYS_TIMED_FRAME_ENTRY_TFRM_TM_SEL,
+ QSYS_TIMED_FRAME_ENTRY_TFRM_TM_T,
+ SYS_RESET_CFG_CORE_ENA,
+ SYS_RESET_CFG_MEM_ENA,
+ SYS_RESET_CFG_MEM_INIT,
+ REGFIELD_MAX
+};
+
+struct ocelot_multicast {
+ struct list_head list;
+ unsigned char addr[ETH_ALEN];
+ u16 vid;
+ u16 ports;
+};
+
+struct ocelot_port;
+
+struct ocelot_stat_layout {
+ u32 offset;
+ char name[ETH_GSTRING_LEN];
+};
+
+struct ocelot {
+ struct device *dev;
+
+ struct regmap *targets[TARGET_MAX];
+ struct regmap_field *regfields[REGFIELD_MAX];
+ const u32 *const *map;
+ const struct ocelot_stat_layout *stats_layout;
+ unsigned int num_stats;
+
+ u8 base_mac[ETH_ALEN];
+
+ struct net_device *hw_bridge_dev;
+ u16 bridge_mask;
+ u16 bridge_fwd_mask;
+
+ struct workqueue_struct *ocelot_owq;
+
+ int shared_queue_sz;
+
+ u8 num_phys_ports;
+ u8 num_cpu_ports;
+ struct ocelot_port **ports;
+
+ u16 lags[16];
+
+ /* Keep track of the vlan port masks */
+ u32 vlan_mask[VLAN_N_VID];
+
+ struct list_head multicast;
+
+ /* Workqueue to check statistics for overflow with its lock */
+ struct mutex stats_lock;
+ u64 *stats;
+ struct delayed_work stats_work;
+ struct workqueue_struct *stats_queue;
+};
+
+struct ocelot_port {
+ struct net_device *dev;
+ struct ocelot *ocelot;
+ struct phy_device *phy;
+ void __iomem *regs;
+ u8 chip_port;
+ /* Keep a track of the mc addresses added to the mac table, so that they
+ * can be removed when needed.
+ */
+ struct list_head mc;
+
+ /* Ingress default VLAN (pvid) */
+ u16 pvid;
+
+ /* Egress default VLAN (vid) */
+ u16 vid;
+
+ u8 vlan_aware;
+
+ u64 *stats;
+};
+
+u32 __ocelot_read_ix(struct ocelot *ocelot, u32 reg, u32 offset);
+#define ocelot_read_ix(ocelot, reg, gi, ri) __ocelot_read_ix(ocelot, reg, reg##_GSZ * (gi) + reg##_RSZ * (ri))
+#define ocelot_read_gix(ocelot, reg, gi) __ocelot_read_ix(ocelot, reg, reg##_GSZ * (gi))
+#define ocelot_read_rix(ocelot, reg, ri) __ocelot_read_ix(ocelot, reg, reg##_RSZ * (ri))
+#define ocelot_read(ocelot, reg) __ocelot_read_ix(ocelot, reg, 0)
+
+void __ocelot_write_ix(struct ocelot *ocelot, u32 val, u32 reg, u32 offset);
+#define ocelot_write_ix(ocelot, val, reg, gi, ri) __ocelot_write_ix(ocelot, val, reg, reg##_GSZ * (gi) + reg##_RSZ * (ri))
+#define ocelot_write_gix(ocelot, val, reg, gi) __ocelot_write_ix(ocelot, val, reg, reg##_GSZ * (gi))
+#define ocelot_write_rix(ocelot, val, reg, ri) __ocelot_write_ix(ocelot, val, reg, reg##_RSZ * (ri))
+#define ocelot_write(ocelot, val, reg) __ocelot_write_ix(ocelot, val, reg, 0)
+
+void __ocelot_rmw_ix(struct ocelot *ocelot, u32 val, u32 reg, u32 mask,
+ u32 offset);
+#define ocelot_rmw_ix(ocelot, val, m, reg, gi, ri) __ocelot_rmw_ix(ocelot, val, m, reg, reg##_GSZ * (gi) + reg##_RSZ * (ri))
+#define ocelot_rmw_gix(ocelot, val, m, reg, gi) __ocelot_rmw_ix(ocelot, val, m, reg, reg##_GSZ * (gi))
+#define ocelot_rmw_rix(ocelot, val, m, reg, ri) __ocelot_rmw_ix(ocelot, val, m, reg, reg##_RSZ * (ri))
+#define ocelot_rmw(ocelot, val, m, reg) __ocelot_rmw_ix(ocelot, val, m, reg, 0)
+
+u32 ocelot_port_readl(struct ocelot_port *port, u32 reg);
+void ocelot_port_writel(struct ocelot_port *port, u32 val, u32 reg);
+
+int ocelot_regfields_init(struct ocelot *ocelot,
+ const struct reg_field *const regfields);
+struct regmap *ocelot_io_platform_init(struct ocelot *ocelot,
+ struct platform_device *pdev,
+ const char *name);
+
+#define ocelot_field_write(ocelot, reg, val) regmap_field_write((ocelot)->regfields[(reg)], (val))
+#define ocelot_field_read(ocelot, reg, val) regmap_field_read((ocelot)->regfields[(reg)], (val))
+
+int ocelot_init(struct ocelot *ocelot);
+void ocelot_deinit(struct ocelot *ocelot);
+int ocelot_chip_init(struct ocelot *ocelot);
+int ocelot_probe_port(struct ocelot *ocelot, u8 port,
+ void __iomem *regs,
+ struct phy_device *phy);
+
+extern struct notifier_block ocelot_netdevice_nb;
+
+#endif
diff --git a/drivers/net/ethernet/mscc/ocelot_ana.h b/drivers/net/ethernet/mscc/ocelot_ana.h
new file mode 100644
index 0000000..841c6ec
--- /dev/null
+++ b/drivers/net/ethernet/mscc/ocelot_ana.h
@@ -0,0 +1,625 @@
+/* SPDX-License-Identifier: (GPL-2.0 OR MIT) */
+/*
+ * Microsemi Ocelot Switch driver
+ *
+ * Copyright (c) 2017 Microsemi Corporation
+ */
+
+#ifndef _MSCC_OCELOT_ANA_H_
+#define _MSCC_OCELOT_ANA_H_
+
+#define ANA_ANAGEFIL_B_DOM_EN BIT(22)
+#define ANA_ANAGEFIL_B_DOM_VAL BIT(21)
+#define ANA_ANAGEFIL_AGE_LOCKED BIT(20)
+#define ANA_ANAGEFIL_PID_EN BIT(19)
+#define ANA_ANAGEFIL_PID_VAL(x) (((x) << 14) & GENMASK(18, 14))
+#define ANA_ANAGEFIL_PID_VAL_M GENMASK(18, 14)
+#define ANA_ANAGEFIL_PID_VAL_X(x) (((x) & GENMASK(18, 14)) >> 14)
+#define ANA_ANAGEFIL_VID_EN BIT(13)
+#define ANA_ANAGEFIL_VID_VAL(x) ((x) & GENMASK(12, 0))
+#define ANA_ANAGEFIL_VID_VAL_M GENMASK(12, 0)
+
+#define ANA_STORMLIMIT_CFG_RSZ 0x4
+
+#define ANA_STORMLIMIT_CFG_STORM_RATE(x) (((x) << 3) & GENMASK(6, 3))
+#define ANA_STORMLIMIT_CFG_STORM_RATE_M GENMASK(6, 3)
+#define ANA_STORMLIMIT_CFG_STORM_RATE_X(x) (((x) & GENMASK(6, 3)) >> 3)
+#define ANA_STORMLIMIT_CFG_STORM_UNIT BIT(2)
+#define ANA_STORMLIMIT_CFG_STORM_MODE(x) ((x) & GENMASK(1, 0))
+#define ANA_STORMLIMIT_CFG_STORM_MODE_M GENMASK(1, 0)
+
+#define ANA_AUTOAGE_AGE_FAST BIT(21)
+#define ANA_AUTOAGE_AGE_PERIOD(x) (((x) << 1) & GENMASK(20, 1))
+#define ANA_AUTOAGE_AGE_PERIOD_M GENMASK(20, 1)
+#define ANA_AUTOAGE_AGE_PERIOD_X(x) (((x) & GENMASK(20, 1)) >> 1)
+#define ANA_AUTOAGE_AUTOAGE_LOCKED BIT(0)
+
+#define ANA_MACTOPTIONS_REDUCED_TABLE BIT(1)
+#define ANA_MACTOPTIONS_SHADOW BIT(0)
+
+#define ANA_AGENCTRL_FID_MASK(x) (((x) << 12) & GENMASK(23, 12))
+#define ANA_AGENCTRL_FID_MASK_M GENMASK(23, 12)
+#define ANA_AGENCTRL_FID_MASK_X(x) (((x) & GENMASK(23, 12)) >> 12)
+#define ANA_AGENCTRL_IGNORE_DMAC_FLAGS BIT(11)
+#define ANA_AGENCTRL_IGNORE_SMAC_FLAGS BIT(10)
+#define ANA_AGENCTRL_FLOOD_SPECIAL BIT(9)
+#define ANA_AGENCTRL_FLOOD_IGNORE_VLAN BIT(8)
+#define ANA_AGENCTRL_MIRROR_CPU BIT(7)
+#define ANA_AGENCTRL_LEARN_CPU_COPY BIT(6)
+#define ANA_AGENCTRL_LEARN_FWD_KILL BIT(5)
+#define ANA_AGENCTRL_LEARN_IGNORE_VLAN BIT(4)
+#define ANA_AGENCTRL_CPU_CPU_KILL_ENA BIT(3)
+#define ANA_AGENCTRL_GREEN_COUNT_MODE BIT(2)
+#define ANA_AGENCTRL_YELLOW_COUNT_MODE BIT(1)
+#define ANA_AGENCTRL_RED_COUNT_MODE BIT(0)
+
+#define ANA_FLOODING_RSZ 0x4
+
+#define ANA_FLOODING_FLD_UNICAST(x) (((x) << 12) & GENMASK(17, 12))
+#define ANA_FLOODING_FLD_UNICAST_M GENMASK(17, 12)
+#define ANA_FLOODING_FLD_UNICAST_X(x) (((x) & GENMASK(17, 12)) >> 12)
+#define ANA_FLOODING_FLD_BROADCAST(x) (((x) << 6) & GENMASK(11, 6))
+#define ANA_FLOODING_FLD_BROADCAST_M GENMASK(11, 6)
+#define ANA_FLOODING_FLD_BROADCAST_X(x) (((x) & GENMASK(11, 6)) >> 6)
+#define ANA_FLOODING_FLD_MULTICAST(x) ((x) & GENMASK(5, 0))
+#define ANA_FLOODING_FLD_MULTICAST_M GENMASK(5, 0)
+
+#define ANA_FLOODING_IPMC_FLD_MC4_CTRL(x) (((x) << 18) & GENMASK(23, 18))
+#define ANA_FLOODING_IPMC_FLD_MC4_CTRL_M GENMASK(23, 18)
+#define ANA_FLOODING_IPMC_FLD_MC4_CTRL_X(x) (((x) & GENMASK(23, 18)) >> 18)
+#define ANA_FLOODING_IPMC_FLD_MC4_DATA(x) (((x) << 12) & GENMASK(17, 12))
+#define ANA_FLOODING_IPMC_FLD_MC4_DATA_M GENMASK(17, 12)
+#define ANA_FLOODING_IPMC_FLD_MC4_DATA_X(x) (((x) & GENMASK(17, 12)) >> 12)
+#define ANA_FLOODING_IPMC_FLD_MC6_CTRL(x) (((x) << 6) & GENMASK(11, 6))
+#define ANA_FLOODING_IPMC_FLD_MC6_CTRL_M GENMASK(11, 6)
+#define ANA_FLOODING_IPMC_FLD_MC6_CTRL_X(x) (((x) & GENMASK(11, 6)) >> 6)
+#define ANA_FLOODING_IPMC_FLD_MC6_DATA(x) ((x) & GENMASK(5, 0))
+#define ANA_FLOODING_IPMC_FLD_MC6_DATA_M GENMASK(5, 0)
+
+#define ANA_SFLOW_CFG_RSZ 0x4
+
+#define ANA_SFLOW_CFG_SF_RATE(x) (((x) << 2) & GENMASK(13, 2))
+#define ANA_SFLOW_CFG_SF_RATE_M GENMASK(13, 2)
+#define ANA_SFLOW_CFG_SF_RATE_X(x) (((x) & GENMASK(13, 2)) >> 2)
+#define ANA_SFLOW_CFG_SF_SAMPLE_RX BIT(1)
+#define ANA_SFLOW_CFG_SF_SAMPLE_TX BIT(0)
+
+#define ANA_PORT_MODE_RSZ 0x4
+
+#define ANA_PORT_MODE_REDTAG_PARSE_CFG BIT(3)
+#define ANA_PORT_MODE_VLAN_PARSE_CFG(x) (((x) << 1) & GENMASK(2, 1))
+#define ANA_PORT_MODE_VLAN_PARSE_CFG_M GENMASK(2, 1)
+#define ANA_PORT_MODE_VLAN_PARSE_CFG_X(x) (((x) & GENMASK(2, 1)) >> 1)
+#define ANA_PORT_MODE_L3_PARSE_CFG BIT(0)
+
+#define ANA_CUT_THRU_CFG_RSZ 0x4
+
+#define ANA_PGID_PGID_RSZ 0x4
+
+#define ANA_PGID_PGID_PGID(x) ((x) & GENMASK(11, 0))
+#define ANA_PGID_PGID_PGID_M GENMASK(11, 0)
+#define ANA_PGID_PGID_CPUQ_DST_PGID(x) (((x) << 27) & GENMASK(29, 27))
+#define ANA_PGID_PGID_CPUQ_DST_PGID_M GENMASK(29, 27)
+#define ANA_PGID_PGID_CPUQ_DST_PGID_X(x) (((x) & GENMASK(29, 27)) >> 27)
+
+#define ANA_TABLES_MACHDATA_VID(x) (((x) << 16) & GENMASK(28, 16))
+#define ANA_TABLES_MACHDATA_VID_M GENMASK(28, 16)
+#define ANA_TABLES_MACHDATA_VID_X(x) (((x) & GENMASK(28, 16)) >> 16)
+#define ANA_TABLES_MACHDATA_MACHDATA(x) ((x) & GENMASK(15, 0))
+#define ANA_TABLES_MACHDATA_MACHDATA_M GENMASK(15, 0)
+
+#define ANA_TABLES_STREAMDATA_SSID_VALID BIT(16)
+#define ANA_TABLES_STREAMDATA_SSID(x) (((x) << 9) & GENMASK(15, 9))
+#define ANA_TABLES_STREAMDATA_SSID_M GENMASK(15, 9)
+#define ANA_TABLES_STREAMDATA_SSID_X(x) (((x) & GENMASK(15, 9)) >> 9)
+#define ANA_TABLES_STREAMDATA_SFID_VALID BIT(8)
+#define ANA_TABLES_STREAMDATA_SFID(x) ((x) & GENMASK(7, 0))
+#define ANA_TABLES_STREAMDATA_SFID_M GENMASK(7, 0)
+
+#define ANA_TABLES_MACACCESS_MAC_CPU_COPY BIT(15)
+#define ANA_TABLES_MACACCESS_SRC_KILL BIT(14)
+#define ANA_TABLES_MACACCESS_IGNORE_VLAN BIT(13)
+#define ANA_TABLES_MACACCESS_AGED_FLAG BIT(12)
+#define ANA_TABLES_MACACCESS_VALID BIT(11)
+#define ANA_TABLES_MACACCESS_ENTRYTYPE(x) (((x) << 9) & GENMASK(10, 9))
+#define ANA_TABLES_MACACCESS_ENTRYTYPE_M GENMASK(10, 9)
+#define ANA_TABLES_MACACCESS_ENTRYTYPE_X(x) (((x) & GENMASK(10, 9)) >> 9)
+#define ANA_TABLES_MACACCESS_DEST_IDX(x) (((x) << 3) & GENMASK(8, 3))
+#define ANA_TABLES_MACACCESS_DEST_IDX_M GENMASK(8, 3)
+#define ANA_TABLES_MACACCESS_DEST_IDX_X(x) (((x) & GENMASK(8, 3)) >> 3)
+#define ANA_TABLES_MACACCESS_MAC_TABLE_CMD(x) ((x) & GENMASK(2, 0))
+#define ANA_TABLES_MACACCESS_MAC_TABLE_CMD_M GENMASK(2, 0)
+#define MACACCESS_CMD_IDLE 0
+#define MACACCESS_CMD_LEARN 1
+#define MACACCESS_CMD_FORGET 2
+#define MACACCESS_CMD_AGE 3
+#define MACACCESS_CMD_GET_NEXT 4
+#define MACACCESS_CMD_INIT 5
+#define MACACCESS_CMD_READ 6
+#define MACACCESS_CMD_WRITE 7
+
+#define ANA_TABLES_VLANACCESS_VLAN_PORT_MASK(x) (((x) << 2) & GENMASK(13, 2))
+#define ANA_TABLES_VLANACCESS_VLAN_PORT_MASK_M GENMASK(13, 2)
+#define ANA_TABLES_VLANACCESS_VLAN_PORT_MASK_X(x) (((x) & GENMASK(13, 2)) >> 2)
+#define ANA_TABLES_VLANACCESS_VLAN_TBL_CMD(x) ((x) & GENMASK(1, 0))
+#define ANA_TABLES_VLANACCESS_VLAN_TBL_CMD_M GENMASK(1, 0)
+#define ANA_TABLES_VLANACCESS_CMD_IDLE 0x0
+#define ANA_TABLES_VLANACCESS_CMD_WRITE 0x2
+#define ANA_TABLES_VLANACCESS_CMD_INIT 0x3
+
+#define ANA_TABLES_VLANTIDX_VLAN_SEC_FWD_ENA BIT(17)
+#define ANA_TABLES_VLANTIDX_VLAN_FLOOD_DIS BIT(16)
+#define ANA_TABLES_VLANTIDX_VLAN_PRIV_VLAN BIT(15)
+#define ANA_TABLES_VLANTIDX_VLAN_LEARN_DISABLED BIT(14)
+#define ANA_TABLES_VLANTIDX_VLAN_MIRROR BIT(13)
+#define ANA_TABLES_VLANTIDX_VLAN_SRC_CHK BIT(12)
+#define ANA_TABLES_VLANTIDX_V_INDEX(x) ((x) & GENMASK(11, 0))
+#define ANA_TABLES_VLANTIDX_V_INDEX_M GENMASK(11, 0)
+
+#define ANA_TABLES_ISDXACCESS_ISDX_PORT_MASK(x) (((x) << 2) & GENMASK(8, 2))
+#define ANA_TABLES_ISDXACCESS_ISDX_PORT_MASK_M GENMASK(8, 2)
+#define ANA_TABLES_ISDXACCESS_ISDX_PORT_MASK_X(x) (((x) & GENMASK(8, 2)) >> 2)
+#define ANA_TABLES_ISDXACCESS_ISDX_TBL_CMD(x) ((x) & GENMASK(1, 0))
+#define ANA_TABLES_ISDXACCESS_ISDX_TBL_CMD_M GENMASK(1, 0)
+
+#define ANA_TABLES_ISDXTIDX_ISDX_SDLBI(x) (((x) << 21) & GENMASK(28, 21))
+#define ANA_TABLES_ISDXTIDX_ISDX_SDLBI_M GENMASK(28, 21)
+#define ANA_TABLES_ISDXTIDX_ISDX_SDLBI_X(x) (((x) & GENMASK(28, 21)) >> 21)
+#define ANA_TABLES_ISDXTIDX_ISDX_MSTI(x) (((x) << 15) & GENMASK(20, 15))
+#define ANA_TABLES_ISDXTIDX_ISDX_MSTI_M GENMASK(20, 15)
+#define ANA_TABLES_ISDXTIDX_ISDX_MSTI_X(x) (((x) & GENMASK(20, 15)) >> 15)
+#define ANA_TABLES_ISDXTIDX_ISDX_ES0_KEY_ENA BIT(14)
+#define ANA_TABLES_ISDXTIDX_ISDX_FORCE_ENA BIT(10)
+#define ANA_TABLES_ISDXTIDX_ISDX_INDEX(x) ((x) & GENMASK(7, 0))
+#define ANA_TABLES_ISDXTIDX_ISDX_INDEX_M GENMASK(7, 0)
+
+#define ANA_TABLES_ENTRYLIM_RSZ 0x4
+
+#define ANA_TABLES_ENTRYLIM_ENTRYLIM(x) (((x) << 14) & GENMASK(17, 14))
+#define ANA_TABLES_ENTRYLIM_ENTRYLIM_M GENMASK(17, 14)
+#define ANA_TABLES_ENTRYLIM_ENTRYLIM_X(x) (((x) & GENMASK(17, 14)) >> 14)
+#define ANA_TABLES_ENTRYLIM_ENTRYSTAT(x) ((x) & GENMASK(13, 0))
+#define ANA_TABLES_ENTRYLIM_ENTRYSTAT_M GENMASK(13, 0)
+
+#define ANA_TABLES_STREAMACCESS_GEN_REC_SEQ_NUM(x) (((x) << 4) & GENMASK(31, 4))
+#define ANA_TABLES_STREAMACCESS_GEN_REC_SEQ_NUM_M GENMASK(31, 4)
+#define ANA_TABLES_STREAMACCESS_GEN_REC_SEQ_NUM_X(x) (((x) & GENMASK(31, 4)) >> 4)
+#define ANA_TABLES_STREAMACCESS_SEQ_GEN_REC_ENA BIT(3)
+#define ANA_TABLES_STREAMACCESS_GEN_REC_TYPE BIT(2)
+#define ANA_TABLES_STREAMACCESS_STREAM_TBL_CMD(x) ((x) & GENMASK(1, 0))
+#define ANA_TABLES_STREAMACCESS_STREAM_TBL_CMD_M GENMASK(1, 0)
+
+#define ANA_TABLES_STREAMTIDX_SEQ_GEN_ERR_STATUS(x) (((x) << 30) & GENMASK(31, 30))
+#define ANA_TABLES_STREAMTIDX_SEQ_GEN_ERR_STATUS_M GENMASK(31, 30)
+#define ANA_TABLES_STREAMTIDX_SEQ_GEN_ERR_STATUS_X(x) (((x) & GENMASK(31, 30)) >> 30)
+#define ANA_TABLES_STREAMTIDX_S_INDEX(x) (((x) << 16) & GENMASK(22, 16))
+#define ANA_TABLES_STREAMTIDX_S_INDEX_M GENMASK(22, 16)
+#define ANA_TABLES_STREAMTIDX_S_INDEX_X(x) (((x) & GENMASK(22, 16)) >> 16)
+#define ANA_TABLES_STREAMTIDX_FORCE_SF_BEHAVIOUR BIT(14)
+#define ANA_TABLES_STREAMTIDX_SEQ_HISTORY_LEN(x) (((x) << 8) & GENMASK(13, 8))
+#define ANA_TABLES_STREAMTIDX_SEQ_HISTORY_LEN_M GENMASK(13, 8)
+#define ANA_TABLES_STREAMTIDX_SEQ_HISTORY_LEN_X(x) (((x) & GENMASK(13, 8)) >> 8)
+#define ANA_TABLES_STREAMTIDX_RESET_ON_ROGUE BIT(7)
+#define ANA_TABLES_STREAMTIDX_REDTAG_POP BIT(6)
+#define ANA_TABLES_STREAMTIDX_STREAM_SPLIT BIT(5)
+#define ANA_TABLES_STREAMTIDX_SEQ_SPACE_LOG2(x) ((x) & GENMASK(4, 0))
+#define ANA_TABLES_STREAMTIDX_SEQ_SPACE_LOG2_M GENMASK(4, 0)
+
+#define ANA_TABLES_SEQ_MASK_SPLIT_MASK(x) (((x) << 16) & GENMASK(22, 16))
+#define ANA_TABLES_SEQ_MASK_SPLIT_MASK_M GENMASK(22, 16)
+#define ANA_TABLES_SEQ_MASK_SPLIT_MASK_X(x) (((x) & GENMASK(22, 16)) >> 16)
+#define ANA_TABLES_SEQ_MASK_INPUT_PORT_MASK(x) ((x) & GENMASK(6, 0))
+#define ANA_TABLES_SEQ_MASK_INPUT_PORT_MASK_M GENMASK(6, 0)
+
+#define ANA_TABLES_SFID_MASK_IGR_PORT_MASK(x) (((x) << 1) & GENMASK(7, 1))
+#define ANA_TABLES_SFID_MASK_IGR_PORT_MASK_M GENMASK(7, 1)
+#define ANA_TABLES_SFID_MASK_IGR_PORT_MASK_X(x) (((x) & GENMASK(7, 1)) >> 1)
+#define ANA_TABLES_SFID_MASK_IGR_SRCPORT_MATCH_ENA BIT(0)
+
+#define ANA_TABLES_SFIDACCESS_IGR_PRIO_MATCH_ENA BIT(22)
+#define ANA_TABLES_SFIDACCESS_IGR_PRIO(x) (((x) << 19) & GENMASK(21, 19))
+#define ANA_TABLES_SFIDACCESS_IGR_PRIO_M GENMASK(21, 19)
+#define ANA_TABLES_SFIDACCESS_IGR_PRIO_X(x) (((x) & GENMASK(21, 19)) >> 19)
+#define ANA_TABLES_SFIDACCESS_FORCE_BLOCK BIT(18)
+#define ANA_TABLES_SFIDACCESS_MAX_SDU_LEN(x) (((x) << 2) & GENMASK(17, 2))
+#define ANA_TABLES_SFIDACCESS_MAX_SDU_LEN_M GENMASK(17, 2)
+#define ANA_TABLES_SFIDACCESS_MAX_SDU_LEN_X(x) (((x) & GENMASK(17, 2)) >> 2)
+#define ANA_TABLES_SFIDACCESS_SFID_TBL_CMD(x) ((x) & GENMASK(1, 0))
+#define ANA_TABLES_SFIDACCESS_SFID_TBL_CMD_M GENMASK(1, 0)
+
+#define ANA_TABLES_SFIDTIDX_SGID_VALID BIT(26)
+#define ANA_TABLES_SFIDTIDX_SGID(x) (((x) << 18) & GENMASK(25, 18))
+#define ANA_TABLES_SFIDTIDX_SGID_M GENMASK(25, 18)
+#define ANA_TABLES_SFIDTIDX_SGID_X(x) (((x) & GENMASK(25, 18)) >> 18)
+#define ANA_TABLES_SFIDTIDX_POL_ENA BIT(17)
+#define ANA_TABLES_SFIDTIDX_POL_IDX(x) (((x) << 8) & GENMASK(16, 8))
+#define ANA_TABLES_SFIDTIDX_POL_IDX_M GENMASK(16, 8)
+#define ANA_TABLES_SFIDTIDX_POL_IDX_X(x) (((x) & GENMASK(16, 8)) >> 8)
+#define ANA_TABLES_SFIDTIDX_SFID_INDEX(x) ((x) & GENMASK(7, 0))
+#define ANA_TABLES_SFIDTIDX_SFID_INDEX_M GENMASK(7, 0)
+
+#define ANA_MSTI_STATE_RSZ 0x4
+
+#define ANA_OAM_UPM_LM_CNT_RSZ 0x4
+
+#define ANA_SG_ACCESS_CTRL_SGID(x) ((x) & GENMASK(7, 0))
+#define ANA_SG_ACCESS_CTRL_SGID_M GENMASK(7, 0)
+#define ANA_SG_ACCESS_CTRL_CONFIG_CHANGE BIT(28)
+
+#define ANA_SG_CONFIG_REG_3_BASE_TIME_SEC_MSB(x) ((x) & GENMASK(15, 0))
+#define ANA_SG_CONFIG_REG_3_BASE_TIME_SEC_MSB_M GENMASK(15, 0)
+#define ANA_SG_CONFIG_REG_3_LIST_LENGTH(x) (((x) << 16) & GENMASK(18, 16))
+#define ANA_SG_CONFIG_REG_3_LIST_LENGTH_M GENMASK(18, 16)
+#define ANA_SG_CONFIG_REG_3_LIST_LENGTH_X(x) (((x) & GENMASK(18, 16)) >> 16)
+#define ANA_SG_CONFIG_REG_3_GATE_ENABLE BIT(20)
+#define ANA_SG_CONFIG_REG_3_INIT_IPS(x) (((x) << 24) & GENMASK(27, 24))
+#define ANA_SG_CONFIG_REG_3_INIT_IPS_M GENMASK(27, 24)
+#define ANA_SG_CONFIG_REG_3_INIT_IPS_X(x) (((x) & GENMASK(27, 24)) >> 24)
+#define ANA_SG_CONFIG_REG_3_INIT_GATE_STATE BIT(28)
+
+#define ANA_SG_GCL_GS_CONFIG_RSZ 0x4
+
+#define ANA_SG_GCL_GS_CONFIG_IPS(x) ((x) & GENMASK(3, 0))
+#define ANA_SG_GCL_GS_CONFIG_IPS_M GENMASK(3, 0)
+#define ANA_SG_GCL_GS_CONFIG_GATE_STATE BIT(4)
+
+#define ANA_SG_GCL_TI_CONFIG_RSZ 0x4
+
+#define ANA_SG_STATUS_REG_3_CFG_CHG_TIME_SEC_MSB(x) ((x) & GENMASK(15, 0))
+#define ANA_SG_STATUS_REG_3_CFG_CHG_TIME_SEC_MSB_M GENMASK(15, 0)
+#define ANA_SG_STATUS_REG_3_GATE_STATE BIT(16)
+#define ANA_SG_STATUS_REG_3_IPS(x) (((x) << 20) & GENMASK(23, 20))
+#define ANA_SG_STATUS_REG_3_IPS_M GENMASK(23, 20)
+#define ANA_SG_STATUS_REG_3_IPS_X(x) (((x) & GENMASK(23, 20)) >> 20)
+#define ANA_SG_STATUS_REG_3_CONFIG_PENDING BIT(24)
+
+#define ANA_PORT_VLAN_CFG_GSZ 0x100
+
+#define ANA_PORT_VLAN_CFG_VLAN_VID_AS_ISDX BIT(21)
+#define ANA_PORT_VLAN_CFG_VLAN_AWARE_ENA BIT(20)
+#define ANA_PORT_VLAN_CFG_VLAN_POP_CNT(x) (((x) << 18) & GENMASK(19, 18))
+#define ANA_PORT_VLAN_CFG_VLAN_POP_CNT_M GENMASK(19, 18)
+#define ANA_PORT_VLAN_CFG_VLAN_POP_CNT_X(x) (((x) & GENMASK(19, 18)) >> 18)
+#define ANA_PORT_VLAN_CFG_VLAN_INNER_TAG_ENA BIT(17)
+#define ANA_PORT_VLAN_CFG_VLAN_TAG_TYPE BIT(16)
+#define ANA_PORT_VLAN_CFG_VLAN_DEI BIT(15)
+#define ANA_PORT_VLAN_CFG_VLAN_PCP(x) (((x) << 12) & GENMASK(14, 12))
+#define ANA_PORT_VLAN_CFG_VLAN_PCP_M GENMASK(14, 12)
+#define ANA_PORT_VLAN_CFG_VLAN_PCP_X(x) (((x) & GENMASK(14, 12)) >> 12)
+#define ANA_PORT_VLAN_CFG_VLAN_VID(x) ((x) & GENMASK(11, 0))
+#define ANA_PORT_VLAN_CFG_VLAN_VID_M GENMASK(11, 0)
+
+#define ANA_PORT_DROP_CFG_GSZ 0x100
+
+#define ANA_PORT_DROP_CFG_DROP_UNTAGGED_ENA BIT(6)
+#define ANA_PORT_DROP_CFG_DROP_S_TAGGED_ENA BIT(5)
+#define ANA_PORT_DROP_CFG_DROP_C_TAGGED_ENA BIT(4)
+#define ANA_PORT_DROP_CFG_DROP_PRIO_S_TAGGED_ENA BIT(3)
+#define ANA_PORT_DROP_CFG_DROP_PRIO_C_TAGGED_ENA BIT(2)
+#define ANA_PORT_DROP_CFG_DROP_NULL_MAC_ENA BIT(1)
+#define ANA_PORT_DROP_CFG_DROP_MC_SMAC_ENA BIT(0)
+
+#define ANA_PORT_QOS_CFG_GSZ 0x100
+
+#define ANA_PORT_QOS_CFG_DP_DEFAULT_VAL BIT(8)
+#define ANA_PORT_QOS_CFG_QOS_DEFAULT_VAL(x) (((x) << 5) & GENMASK(7, 5))
+#define ANA_PORT_QOS_CFG_QOS_DEFAULT_VAL_M GENMASK(7, 5)
+#define ANA_PORT_QOS_CFG_QOS_DEFAULT_VAL_X(x) (((x) & GENMASK(7, 5)) >> 5)
+#define ANA_PORT_QOS_CFG_QOS_DSCP_ENA BIT(4)
+#define ANA_PORT_QOS_CFG_QOS_PCP_ENA BIT(3)
+#define ANA_PORT_QOS_CFG_DSCP_TRANSLATE_ENA BIT(2)
+#define ANA_PORT_QOS_CFG_DSCP_REWR_CFG(x) ((x) & GENMASK(1, 0))
+#define ANA_PORT_QOS_CFG_DSCP_REWR_CFG_M GENMASK(1, 0)
+
+#define ANA_PORT_VCAP_CFG_GSZ 0x100
+
+#define ANA_PORT_VCAP_CFG_S1_ENA BIT(14)
+#define ANA_PORT_VCAP_CFG_S1_DMAC_DIP_ENA(x) (((x) << 11) & GENMASK(13, 11))
+#define ANA_PORT_VCAP_CFG_S1_DMAC_DIP_ENA_M GENMASK(13, 11)
+#define ANA_PORT_VCAP_CFG_S1_DMAC_DIP_ENA_X(x) (((x) & GENMASK(13, 11)) >> 11)
+#define ANA_PORT_VCAP_CFG_S1_VLAN_INNER_TAG_ENA(x) (((x) << 8) & GENMASK(10, 8))
+#define ANA_PORT_VCAP_CFG_S1_VLAN_INNER_TAG_ENA_M GENMASK(10, 8)
+#define ANA_PORT_VCAP_CFG_S1_VLAN_INNER_TAG_ENA_X(x) (((x) & GENMASK(10, 8)) >> 8)
+#define ANA_PORT_VCAP_CFG_PAG_VAL(x) ((x) & GENMASK(7, 0))
+#define ANA_PORT_VCAP_CFG_PAG_VAL_M GENMASK(7, 0)
+
+#define ANA_PORT_VCAP_S1_KEY_CFG_GSZ 0x100
+#define ANA_PORT_VCAP_S1_KEY_CFG_RSZ 0x4
+
+#define ANA_PORT_VCAP_S1_KEY_CFG_S1_KEY_IP6_CFG(x) (((x) << 4) & GENMASK(6, 4))
+#define ANA_PORT_VCAP_S1_KEY_CFG_S1_KEY_IP6_CFG_M GENMASK(6, 4)
+#define ANA_PORT_VCAP_S1_KEY_CFG_S1_KEY_IP6_CFG_X(x) (((x) & GENMASK(6, 4)) >> 4)
+#define ANA_PORT_VCAP_S1_KEY_CFG_S1_KEY_IP4_CFG(x) (((x) << 2) & GENMASK(3, 2))
+#define ANA_PORT_VCAP_S1_KEY_CFG_S1_KEY_IP4_CFG_M GENMASK(3, 2)
+#define ANA_PORT_VCAP_S1_KEY_CFG_S1_KEY_IP4_CFG_X(x) (((x) & GENMASK(3, 2)) >> 2)
+#define ANA_PORT_VCAP_S1_KEY_CFG_S1_KEY_OTHER_CFG(x) ((x) & GENMASK(1, 0))
+#define ANA_PORT_VCAP_S1_KEY_CFG_S1_KEY_OTHER_CFG_M GENMASK(1, 0)
+
+#define ANA_PORT_VCAP_S2_CFG_GSZ 0x100
+
+#define ANA_PORT_VCAP_S2_CFG_S2_UDP_PAYLOAD_ENA(x) (((x) << 17) & GENMASK(18, 17))
+#define ANA_PORT_VCAP_S2_CFG_S2_UDP_PAYLOAD_ENA_M GENMASK(18, 17)
+#define ANA_PORT_VCAP_S2_CFG_S2_UDP_PAYLOAD_ENA_X(x) (((x) & GENMASK(18, 17)) >> 17)
+#define ANA_PORT_VCAP_S2_CFG_S2_ETYPE_PAYLOAD_ENA(x) (((x) << 15) & GENMASK(16, 15))
+#define ANA_PORT_VCAP_S2_CFG_S2_ETYPE_PAYLOAD_ENA_M GENMASK(16, 15)
+#define ANA_PORT_VCAP_S2_CFG_S2_ETYPE_PAYLOAD_ENA_X(x) (((x) & GENMASK(16, 15)) >> 15)
+#define ANA_PORT_VCAP_S2_CFG_S2_ENA BIT(14)
+#define ANA_PORT_VCAP_S2_CFG_S2_SNAP_DIS(x) (((x) << 12) & GENMASK(13, 12))
+#define ANA_PORT_VCAP_S2_CFG_S2_SNAP_DIS_M GENMASK(13, 12)
+#define ANA_PORT_VCAP_S2_CFG_S2_SNAP_DIS_X(x) (((x) & GENMASK(13, 12)) >> 12)
+#define ANA_PORT_VCAP_S2_CFG_S2_ARP_DIS(x) (((x) << 10) & GENMASK(11, 10))
+#define ANA_PORT_VCAP_S2_CFG_S2_ARP_DIS_M GENMASK(11, 10)
+#define ANA_PORT_VCAP_S2_CFG_S2_ARP_DIS_X(x) (((x) & GENMASK(11, 10)) >> 10)
+#define ANA_PORT_VCAP_S2_CFG_S2_IP_TCPUDP_DIS(x) (((x) << 8) & GENMASK(9, 8))
+#define ANA_PORT_VCAP_S2_CFG_S2_IP_TCPUDP_DIS_M GENMASK(9, 8)
+#define ANA_PORT_VCAP_S2_CFG_S2_IP_TCPUDP_DIS_X(x) (((x) & GENMASK(9, 8)) >> 8)
+#define ANA_PORT_VCAP_S2_CFG_S2_IP_OTHER_DIS(x) (((x) << 6) & GENMASK(7, 6))
+#define ANA_PORT_VCAP_S2_CFG_S2_IP_OTHER_DIS_M GENMASK(7, 6)
+#define ANA_PORT_VCAP_S2_CFG_S2_IP_OTHER_DIS_X(x) (((x) & GENMASK(7, 6)) >> 6)
+#define ANA_PORT_VCAP_S2_CFG_S2_IP6_CFG(x) (((x) << 2) & GENMASK(5, 2))
+#define ANA_PORT_VCAP_S2_CFG_S2_IP6_CFG_M GENMASK(5, 2)
+#define ANA_PORT_VCAP_S2_CFG_S2_IP6_CFG_X(x) (((x) & GENMASK(5, 2)) >> 2)
+#define ANA_PORT_VCAP_S2_CFG_S2_OAM_DIS(x) ((x) & GENMASK(1, 0))
+#define ANA_PORT_VCAP_S2_CFG_S2_OAM_DIS_M GENMASK(1, 0)
+
+#define ANA_PORT_PCP_DEI_MAP_GSZ 0x100
+#define ANA_PORT_PCP_DEI_MAP_RSZ 0x4
+
+#define ANA_PORT_PCP_DEI_MAP_DP_PCP_DEI_VAL BIT(3)
+#define ANA_PORT_PCP_DEI_MAP_QOS_PCP_DEI_VAL(x) ((x) & GENMASK(2, 0))
+#define ANA_PORT_PCP_DEI_MAP_QOS_PCP_DEI_VAL_M GENMASK(2, 0)
+
+#define ANA_PORT_CPU_FWD_CFG_GSZ 0x100
+
+#define ANA_PORT_CPU_FWD_CFG_CPU_VRAP_REDIR_ENA BIT(7)
+#define ANA_PORT_CPU_FWD_CFG_CPU_MLD_REDIR_ENA BIT(6)
+#define ANA_PORT_CPU_FWD_CFG_CPU_IGMP_REDIR_ENA BIT(5)
+#define ANA_PORT_CPU_FWD_CFG_CPU_IPMC_CTRL_COPY_ENA BIT(4)
+#define ANA_PORT_CPU_FWD_CFG_CPU_SRC_COPY_ENA BIT(3)
+#define ANA_PORT_CPU_FWD_CFG_CPU_ALLBRIDGE_DROP_ENA BIT(2)
+#define ANA_PORT_CPU_FWD_CFG_CPU_ALLBRIDGE_REDIR_ENA BIT(1)
+#define ANA_PORT_CPU_FWD_CFG_CPU_OAM_ENA BIT(0)
+
+#define ANA_PORT_CPU_FWD_BPDU_CFG_GSZ 0x100
+
+#define ANA_PORT_CPU_FWD_BPDU_CFG_BPDU_DROP_ENA(x) (((x) << 16) & GENMASK(31, 16))
+#define ANA_PORT_CPU_FWD_BPDU_CFG_BPDU_DROP_ENA_M GENMASK(31, 16)
+#define ANA_PORT_CPU_FWD_BPDU_CFG_BPDU_DROP_ENA_X(x) (((x) & GENMASK(31, 16)) >> 16)
+#define ANA_PORT_CPU_FWD_BPDU_CFG_BPDU_REDIR_ENA(x) ((x) & GENMASK(15, 0))
+#define ANA_PORT_CPU_FWD_BPDU_CFG_BPDU_REDIR_ENA_M GENMASK(15, 0)
+
+#define ANA_PORT_CPU_FWD_GARP_CFG_GSZ 0x100
+
+#define ANA_PORT_CPU_FWD_GARP_CFG_GARP_DROP_ENA(x) (((x) << 16) & GENMASK(31, 16))
+#define ANA_PORT_CPU_FWD_GARP_CFG_GARP_DROP_ENA_M GENMASK(31, 16)
+#define ANA_PORT_CPU_FWD_GARP_CFG_GARP_DROP_ENA_X(x) (((x) & GENMASK(31, 16)) >> 16)
+#define ANA_PORT_CPU_FWD_GARP_CFG_GARP_REDIR_ENA(x) ((x) & GENMASK(15, 0))
+#define ANA_PORT_CPU_FWD_GARP_CFG_GARP_REDIR_ENA_M GENMASK(15, 0)
+
+#define ANA_PORT_CPU_FWD_CCM_CFG_GSZ 0x100
+
+#define ANA_PORT_CPU_FWD_CCM_CFG_CCM_DROP_ENA(x) (((x) << 16) & GENMASK(31, 16))
+#define ANA_PORT_CPU_FWD_CCM_CFG_CCM_DROP_ENA_M GENMASK(31, 16)
+#define ANA_PORT_CPU_FWD_CCM_CFG_CCM_DROP_ENA_X(x) (((x) & GENMASK(31, 16)) >> 16)
+#define ANA_PORT_CPU_FWD_CCM_CFG_CCM_REDIR_ENA(x) ((x) & GENMASK(15, 0))
+#define ANA_PORT_CPU_FWD_CCM_CFG_CCM_REDIR_ENA_M GENMASK(15, 0)
+
+#define ANA_PORT_PORT_CFG_GSZ 0x100
+
+#define ANA_PORT_PORT_CFG_SRC_MIRROR_ENA BIT(15)
+#define ANA_PORT_PORT_CFG_LIMIT_DROP BIT(14)
+#define ANA_PORT_PORT_CFG_LIMIT_CPU BIT(13)
+#define ANA_PORT_PORT_CFG_LOCKED_PORTMOVE_DROP BIT(12)
+#define ANA_PORT_PORT_CFG_LOCKED_PORTMOVE_CPU BIT(11)
+#define ANA_PORT_PORT_CFG_LEARNDROP BIT(10)
+#define ANA_PORT_PORT_CFG_LEARNCPU BIT(9)
+#define ANA_PORT_PORT_CFG_LEARNAUTO BIT(8)
+#define ANA_PORT_PORT_CFG_LEARN_ENA BIT(7)
+#define ANA_PORT_PORT_CFG_RECV_ENA BIT(6)
+#define ANA_PORT_PORT_CFG_PORTID_VAL(x) (((x) << 2) & GENMASK(5, 2))
+#define ANA_PORT_PORT_CFG_PORTID_VAL_M GENMASK(5, 2)
+#define ANA_PORT_PORT_CFG_PORTID_VAL_X(x) (((x) & GENMASK(5, 2)) >> 2)
+#define ANA_PORT_PORT_CFG_USE_B_DOM_TBL BIT(1)
+#define ANA_PORT_PORT_CFG_LSR_MODE BIT(0)
+
+#define ANA_PORT_POL_CFG_GSZ 0x100
+
+#define ANA_PORT_POL_CFG_POL_CPU_REDIR_8021 BIT(19)
+#define ANA_PORT_POL_CFG_POL_CPU_REDIR_IP BIT(18)
+#define ANA_PORT_POL_CFG_PORT_POL_ENA BIT(17)
+#define ANA_PORT_POL_CFG_QUEUE_POL_ENA(x) (((x) << 9) & GENMASK(16, 9))
+#define ANA_PORT_POL_CFG_QUEUE_POL_ENA_M GENMASK(16, 9)
+#define ANA_PORT_POL_CFG_QUEUE_POL_ENA_X(x) (((x) & GENMASK(16, 9)) >> 9)
+#define ANA_PORT_POL_CFG_POL_ORDER(x) ((x) & GENMASK(8, 0))
+#define ANA_PORT_POL_CFG_POL_ORDER_M GENMASK(8, 0)
+
+#define ANA_PORT_PTP_CFG_GSZ 0x100
+
+#define ANA_PORT_PTP_CFG_PTP_BACKPLANE_MODE BIT(0)
+
+#define ANA_PORT_PTP_DLY1_CFG_GSZ 0x100
+
+#define ANA_PORT_PTP_DLY2_CFG_GSZ 0x100
+
+#define ANA_PORT_SFID_CFG_GSZ 0x100
+#define ANA_PORT_SFID_CFG_RSZ 0x4
+
+#define ANA_PORT_SFID_CFG_SFID_VALID BIT(8)
+#define ANA_PORT_SFID_CFG_SFID(x) ((x) & GENMASK(7, 0))
+#define ANA_PORT_SFID_CFG_SFID_M GENMASK(7, 0)
+
+#define ANA_PFC_PFC_CFG_GSZ 0x40
+
+#define ANA_PFC_PFC_CFG_RX_PFC_ENA(x) (((x) << 2) & GENMASK(9, 2))
+#define ANA_PFC_PFC_CFG_RX_PFC_ENA_M GENMASK(9, 2)
+#define ANA_PFC_PFC_CFG_RX_PFC_ENA_X(x) (((x) & GENMASK(9, 2)) >> 2)
+#define ANA_PFC_PFC_CFG_FC_LINK_SPEED(x) ((x) & GENMASK(1, 0))
+#define ANA_PFC_PFC_CFG_FC_LINK_SPEED_M GENMASK(1, 0)
+
+#define ANA_PFC_PFC_TIMER_GSZ 0x40
+#define ANA_PFC_PFC_TIMER_RSZ 0x4
+
+#define ANA_IPT_OAM_MEP_CFG_GSZ 0x8
+
+#define ANA_IPT_OAM_MEP_CFG_MEP_IDX_P(x) (((x) << 6) & GENMASK(10, 6))
+#define ANA_IPT_OAM_MEP_CFG_MEP_IDX_P_M GENMASK(10, 6)
+#define ANA_IPT_OAM_MEP_CFG_MEP_IDX_P_X(x) (((x) & GENMASK(10, 6)) >> 6)
+#define ANA_IPT_OAM_MEP_CFG_MEP_IDX(x) (((x) << 1) & GENMASK(5, 1))
+#define ANA_IPT_OAM_MEP_CFG_MEP_IDX_M GENMASK(5, 1)
+#define ANA_IPT_OAM_MEP_CFG_MEP_IDX_X(x) (((x) & GENMASK(5, 1)) >> 1)
+#define ANA_IPT_OAM_MEP_CFG_MEP_IDX_ENA BIT(0)
+
+#define ANA_IPT_IPT_GSZ 0x8
+
+#define ANA_IPT_IPT_IPT_CFG(x) (((x) << 15) & GENMASK(16, 15))
+#define ANA_IPT_IPT_IPT_CFG_M GENMASK(16, 15)
+#define ANA_IPT_IPT_IPT_CFG_X(x) (((x) & GENMASK(16, 15)) >> 15)
+#define ANA_IPT_IPT_ISDX_P(x) (((x) << 7) & GENMASK(14, 7))
+#define ANA_IPT_IPT_ISDX_P_M GENMASK(14, 7)
+#define ANA_IPT_IPT_ISDX_P_X(x) (((x) & GENMASK(14, 7)) >> 7)
+#define ANA_IPT_IPT_PPT_IDX(x) ((x) & GENMASK(6, 0))
+#define ANA_IPT_IPT_PPT_IDX_M GENMASK(6, 0)
+
+#define ANA_PPT_PPT_RSZ 0x4
+
+#define ANA_FID_MAP_FID_MAP_RSZ 0x4
+
+#define ANA_FID_MAP_FID_MAP_FID_C_VAL(x) (((x) << 6) & GENMASK(11, 6))
+#define ANA_FID_MAP_FID_MAP_FID_C_VAL_M GENMASK(11, 6)
+#define ANA_FID_MAP_FID_MAP_FID_C_VAL_X(x) (((x) & GENMASK(11, 6)) >> 6)
+#define ANA_FID_MAP_FID_MAP_FID_B_VAL(x) ((x) & GENMASK(5, 0))
+#define ANA_FID_MAP_FID_MAP_FID_B_VAL_M GENMASK(5, 0)
+
+#define ANA_AGGR_CFG_AC_RND_ENA BIT(7)
+#define ANA_AGGR_CFG_AC_DMAC_ENA BIT(6)
+#define ANA_AGGR_CFG_AC_SMAC_ENA BIT(5)
+#define ANA_AGGR_CFG_AC_IP6_FLOW_LBL_ENA BIT(4)
+#define ANA_AGGR_CFG_AC_IP6_TCPUDP_ENA BIT(3)
+#define ANA_AGGR_CFG_AC_IP4_SIPDIP_ENA BIT(2)
+#define ANA_AGGR_CFG_AC_IP4_TCPUDP_ENA BIT(1)
+#define ANA_AGGR_CFG_AC_ISDX_ENA BIT(0)
+
+#define ANA_CPUQ_CFG_CPUQ_MLD(x) (((x) << 27) & GENMASK(29, 27))
+#define ANA_CPUQ_CFG_CPUQ_MLD_M GENMASK(29, 27)
+#define ANA_CPUQ_CFG_CPUQ_MLD_X(x) (((x) & GENMASK(29, 27)) >> 27)
+#define ANA_CPUQ_CFG_CPUQ_IGMP(x) (((x) << 24) & GENMASK(26, 24))
+#define ANA_CPUQ_CFG_CPUQ_IGMP_M GENMASK(26, 24)
+#define ANA_CPUQ_CFG_CPUQ_IGMP_X(x) (((x) & GENMASK(26, 24)) >> 24)
+#define ANA_CPUQ_CFG_CPUQ_IPMC_CTRL(x) (((x) << 21) & GENMASK(23, 21))
+#define ANA_CPUQ_CFG_CPUQ_IPMC_CTRL_M GENMASK(23, 21)
+#define ANA_CPUQ_CFG_CPUQ_IPMC_CTRL_X(x) (((x) & GENMASK(23, 21)) >> 21)
+#define ANA_CPUQ_CFG_CPUQ_ALLBRIDGE(x) (((x) << 18) & GENMASK(20, 18))
+#define ANA_CPUQ_CFG_CPUQ_ALLBRIDGE_M GENMASK(20, 18)
+#define ANA_CPUQ_CFG_CPUQ_ALLBRIDGE_X(x) (((x) & GENMASK(20, 18)) >> 18)
+#define ANA_CPUQ_CFG_CPUQ_LOCKED_PORTMOVE(x) (((x) << 15) & GENMASK(17, 15))
+#define ANA_CPUQ_CFG_CPUQ_LOCKED_PORTMOVE_M GENMASK(17, 15)
+#define ANA_CPUQ_CFG_CPUQ_LOCKED_PORTMOVE_X(x) (((x) & GENMASK(17, 15)) >> 15)
+#define ANA_CPUQ_CFG_CPUQ_SRC_COPY(x) (((x) << 12) & GENMASK(14, 12))
+#define ANA_CPUQ_CFG_CPUQ_SRC_COPY_M GENMASK(14, 12)
+#define ANA_CPUQ_CFG_CPUQ_SRC_COPY_X(x) (((x) & GENMASK(14, 12)) >> 12)
+#define ANA_CPUQ_CFG_CPUQ_MAC_COPY(x) (((x) << 9) & GENMASK(11, 9))
+#define ANA_CPUQ_CFG_CPUQ_MAC_COPY_M GENMASK(11, 9)
+#define ANA_CPUQ_CFG_CPUQ_MAC_COPY_X(x) (((x) & GENMASK(11, 9)) >> 9)
+#define ANA_CPUQ_CFG_CPUQ_LRN(x) (((x) << 6) & GENMASK(8, 6))
+#define ANA_CPUQ_CFG_CPUQ_LRN_M GENMASK(8, 6)
+#define ANA_CPUQ_CFG_CPUQ_LRN_X(x) (((x) & GENMASK(8, 6)) >> 6)
+#define ANA_CPUQ_CFG_CPUQ_MIRROR(x) (((x) << 3) & GENMASK(5, 3))
+#define ANA_CPUQ_CFG_CPUQ_MIRROR_M GENMASK(5, 3)
+#define ANA_CPUQ_CFG_CPUQ_MIRROR_X(x) (((x) & GENMASK(5, 3)) >> 3)
+#define ANA_CPUQ_CFG_CPUQ_SFLOW(x) ((x) & GENMASK(2, 0))
+#define ANA_CPUQ_CFG_CPUQ_SFLOW_M GENMASK(2, 0)
+
+#define ANA_CPUQ_8021_CFG_RSZ 0x4
+
+#define ANA_CPUQ_8021_CFG_CPUQ_BPDU_VAL(x) (((x) << 6) & GENMASK(8, 6))
+#define ANA_CPUQ_8021_CFG_CPUQ_BPDU_VAL_M GENMASK(8, 6)
+#define ANA_CPUQ_8021_CFG_CPUQ_BPDU_VAL_X(x) (((x) & GENMASK(8, 6)) >> 6)
+#define ANA_CPUQ_8021_CFG_CPUQ_GARP_VAL(x) (((x) << 3) & GENMASK(5, 3))
+#define ANA_CPUQ_8021_CFG_CPUQ_GARP_VAL_M GENMASK(5, 3)
+#define ANA_CPUQ_8021_CFG_CPUQ_GARP_VAL_X(x) (((x) & GENMASK(5, 3)) >> 3)
+#define ANA_CPUQ_8021_CFG_CPUQ_CCM_VAL(x) ((x) & GENMASK(2, 0))
+#define ANA_CPUQ_8021_CFG_CPUQ_CCM_VAL_M GENMASK(2, 0)
+
+#define ANA_DSCP_CFG_RSZ 0x4
+
+#define ANA_DSCP_CFG_DP_DSCP_VAL BIT(11)
+#define ANA_DSCP_CFG_QOS_DSCP_VAL(x) (((x) << 8) & GENMASK(10, 8))
+#define ANA_DSCP_CFG_QOS_DSCP_VAL_M GENMASK(10, 8)
+#define ANA_DSCP_CFG_QOS_DSCP_VAL_X(x) (((x) & GENMASK(10, 8)) >> 8)
+#define ANA_DSCP_CFG_DSCP_TRANSLATE_VAL(x) (((x) << 2) & GENMASK(7, 2))
+#define ANA_DSCP_CFG_DSCP_TRANSLATE_VAL_M GENMASK(7, 2)
+#define ANA_DSCP_CFG_DSCP_TRANSLATE_VAL_X(x) (((x) & GENMASK(7, 2)) >> 2)
+#define ANA_DSCP_CFG_DSCP_TRUST_ENA BIT(1)
+#define ANA_DSCP_CFG_DSCP_REWR_ENA BIT(0)
+
+#define ANA_DSCP_REWR_CFG_RSZ 0x4
+
+#define ANA_VCAP_RNG_TYPE_CFG_RSZ 0x4
+
+#define ANA_VCAP_RNG_VAL_CFG_RSZ 0x4
+
+#define ANA_VCAP_RNG_VAL_CFG_VCAP_RNG_MIN_VAL(x) (((x) << 16) & GENMASK(31, 16))
+#define ANA_VCAP_RNG_VAL_CFG_VCAP_RNG_MIN_VAL_M GENMASK(31, 16)
+#define ANA_VCAP_RNG_VAL_CFG_VCAP_RNG_MIN_VAL_X(x) (((x) & GENMASK(31, 16)) >> 16)
+#define ANA_VCAP_RNG_VAL_CFG_VCAP_RNG_MAX_VAL(x) ((x) & GENMASK(15, 0))
+#define ANA_VCAP_RNG_VAL_CFG_VCAP_RNG_MAX_VAL_M GENMASK(15, 0)
+
+#define ANA_VRAP_CFG_VRAP_VLAN_AWARE_ENA BIT(12)
+#define ANA_VRAP_CFG_VRAP_VID(x) ((x) & GENMASK(11, 0))
+#define ANA_VRAP_CFG_VRAP_VID_M GENMASK(11, 0)
+
+#define ANA_DISCARD_CFG_DROP_TAGGING_ISDX0 BIT(3)
+#define ANA_DISCARD_CFG_DROP_CTRLPROT_ISDX0 BIT(2)
+#define ANA_DISCARD_CFG_DROP_TAGGING_S2_ENA BIT(1)
+#define ANA_DISCARD_CFG_DROP_CTRLPROT_S2_ENA BIT(0)
+
+#define ANA_FID_CFG_VID_MC_ENA BIT(0)
+
+#define ANA_POL_PIR_CFG_GSZ 0x20
+
+#define ANA_POL_PIR_CFG_PIR_RATE(x) (((x) << 6) & GENMASK(20, 6))
+#define ANA_POL_PIR_CFG_PIR_RATE_M GENMASK(20, 6)
+#define ANA_POL_PIR_CFG_PIR_RATE_X(x) (((x) & GENMASK(20, 6)) >> 6)
+#define ANA_POL_PIR_CFG_PIR_BURST(x) ((x) & GENMASK(5, 0))
+#define ANA_POL_PIR_CFG_PIR_BURST_M GENMASK(5, 0)
+
+#define ANA_POL_CIR_CFG_GSZ 0x20
+
+#define ANA_POL_CIR_CFG_CIR_RATE(x) (((x) << 6) & GENMASK(20, 6))
+#define ANA_POL_CIR_CFG_CIR_RATE_M GENMASK(20, 6)
+#define ANA_POL_CIR_CFG_CIR_RATE_X(x) (((x) & GENMASK(20, 6)) >> 6)
+#define ANA_POL_CIR_CFG_CIR_BURST(x) ((x) & GENMASK(5, 0))
+#define ANA_POL_CIR_CFG_CIR_BURST_M GENMASK(5, 0)
+
+#define ANA_POL_MODE_CFG_GSZ 0x20
+
+#define ANA_POL_MODE_CFG_IPG_SIZE(x) (((x) << 5) & GENMASK(9, 5))
+#define ANA_POL_MODE_CFG_IPG_SIZE_M GENMASK(9, 5)
+#define ANA_POL_MODE_CFG_IPG_SIZE_X(x) (((x) & GENMASK(9, 5)) >> 5)
+#define ANA_POL_MODE_CFG_FRM_MODE(x) (((x) << 3) & GENMASK(4, 3))
+#define ANA_POL_MODE_CFG_FRM_MODE_M GENMASK(4, 3)
+#define ANA_POL_MODE_CFG_FRM_MODE_X(x) (((x) & GENMASK(4, 3)) >> 3)
+#define ANA_POL_MODE_CFG_DLB_COUPLED BIT(2)
+#define ANA_POL_MODE_CFG_CIR_ENA BIT(1)
+#define ANA_POL_MODE_CFG_OVERSHOOT_ENA BIT(0)
+
+#define ANA_POL_PIR_STATE_GSZ 0x20
+
+#define ANA_POL_CIR_STATE_GSZ 0x20
+
+#define ANA_POL_STATE_GSZ 0x20
+
+#define ANA_POL_FLOWC_RSZ 0x4
+
+#define ANA_POL_FLOWC_POL_FLOWC BIT(0)
+
+#define ANA_POL_HYST_POL_FC_HYST(x) (((x) << 4) & GENMASK(9, 4))
+#define ANA_POL_HYST_POL_FC_HYST_M GENMASK(9, 4)
+#define ANA_POL_HYST_POL_FC_HYST_X(x) (((x) & GENMASK(9, 4)) >> 4)
+#define ANA_POL_HYST_POL_STOP_HYST(x) ((x) & GENMASK(3, 0))
+#define ANA_POL_HYST_POL_STOP_HYST_M GENMASK(3, 0)
+
+#define ANA_POL_MISC_CFG_POL_CLOSE_ALL BIT(1)
+#define ANA_POL_MISC_CFG_POL_LEAK_DIS BIT(0)
+
+#endif
diff --git a/drivers/net/ethernet/mscc/ocelot_board.c b/drivers/net/ethernet/mscc/ocelot_board.c
new file mode 100644
index 0000000..18df7d9
--- /dev/null
+++ b/drivers/net/ethernet/mscc/ocelot_board.c
@@ -0,0 +1,316 @@
+// SPDX-License-Identifier: (GPL-2.0 OR MIT)
+/*
+ * Microsemi Ocelot Switch driver
+ *
+ * Copyright (c) 2017 Microsemi Corporation
+ */
+#include <linux/interrupt.h>
+#include <linux/module.h>
+#include <linux/netdevice.h>
+#include <linux/of_mdio.h>
+#include <linux/of_platform.h>
+#include <linux/skbuff.h>
+
+#include "ocelot.h"
+
+static int ocelot_parse_ifh(u32 *ifh, struct frame_info *info)
+{
+ int i;
+ u8 llen, wlen;
+
+ /* The IFH is in network order, switch to CPU order */
+ for (i = 0; i < IFH_LEN; i++)
+ ifh[i] = ntohl((__force __be32)ifh[i]);
+
+ wlen = (ifh[1] >> 7) & 0xff;
+ llen = (ifh[1] >> 15) & 0x3f;
+ info->len = OCELOT_BUFFER_CELL_SZ * wlen + llen - 80;
+
+ info->port = (ifh[2] & GENMASK(14, 11)) >> 11;
+
+ info->cpuq = (ifh[3] & GENMASK(27, 20)) >> 20;
+ info->tag_type = (ifh[3] & GENMASK(16, 16)) >> 16;
+ info->vid = ifh[3] & GENMASK(11, 0);
+
+ return 0;
+}
+
+static int ocelot_rx_frame_word(struct ocelot *ocelot, u8 grp, bool ifh,
+ u32 *rval)
+{
+ u32 val;
+ u32 bytes_valid;
+
+ val = ocelot_read_rix(ocelot, QS_XTR_RD, grp);
+ if (val == XTR_NOT_READY) {
+ if (ifh)
+ return -EIO;
+
+ do {
+ val = ocelot_read_rix(ocelot, QS_XTR_RD, grp);
+ } while (val == XTR_NOT_READY);
+ }
+
+ switch (val) {
+ case XTR_ABORT:
+ return -EIO;
+ case XTR_EOF_0:
+ case XTR_EOF_1:
+ case XTR_EOF_2:
+ case XTR_EOF_3:
+ case XTR_PRUNED:
+ bytes_valid = XTR_VALID_BYTES(val);
+ val = ocelot_read_rix(ocelot, QS_XTR_RD, grp);
+ if (val == XTR_ESCAPE)
+ *rval = ocelot_read_rix(ocelot, QS_XTR_RD, grp);
+ else
+ *rval = val;
+
+ return bytes_valid;
+ case XTR_ESCAPE:
+ *rval = ocelot_read_rix(ocelot, QS_XTR_RD, grp);
+
+ return 4;
+ default:
+ *rval = val;
+
+ return 4;
+ }
+}
+
+static irqreturn_t ocelot_xtr_irq_handler(int irq, void *arg)
+{
+ struct ocelot *ocelot = arg;
+ int i = 0, grp = 0;
+ int err = 0;
+
+ if (!(ocelot_read(ocelot, QS_XTR_DATA_PRESENT) & BIT(grp)))
+ return IRQ_NONE;
+
+ do {
+ struct sk_buff *skb;
+ struct net_device *dev;
+ u32 *buf;
+ int sz, len;
+ u32 ifh[4];
+ u32 val;
+ struct frame_info info;
+
+ for (i = 0; i < IFH_LEN; i++) {
+ err = ocelot_rx_frame_word(ocelot, grp, true, &ifh[i]);
+ if (err != 4)
+ break;
+ }
+
+ if (err != 4)
+ break;
+
+ ocelot_parse_ifh(ifh, &info);
+
+ dev = ocelot->ports[info.port]->dev;
+
+ skb = netdev_alloc_skb(dev, info.len);
+
+ if (unlikely(!skb)) {
+ netdev_err(dev, "Unable to allocate sk_buff\n");
+ err = -ENOMEM;
+ break;
+ }
+ buf = (u32 *)skb_put(skb, info.len);
+
+ len = 0;
+ do {
+ sz = ocelot_rx_frame_word(ocelot, grp, false, &val);
+ *buf++ = val;
+ len += sz;
+ } while ((sz == 4) && (len < info.len));
+
+ if (sz < 0) {
+ err = sz;
+ break;
+ }
+
+ /* Everything we see on an interface that is in the HW bridge
+ * has already been forwarded.
+ */
+ if (ocelot->bridge_mask & BIT(info.port))
+ skb->offload_fwd_mark = 1;
+
+ skb->protocol = eth_type_trans(skb, dev);
+ netif_rx(skb);
+ dev->stats.rx_bytes += len;
+ dev->stats.rx_packets++;
+ } while (ocelot_read(ocelot, QS_XTR_DATA_PRESENT) & BIT(grp));
+
+ if (err)
+ while (ocelot_read(ocelot, QS_XTR_DATA_PRESENT) & BIT(grp))
+ ocelot_read_rix(ocelot, QS_XTR_RD, grp);
+
+ return IRQ_HANDLED;
+}
+
+static const struct of_device_id mscc_ocelot_match[] = {
+ { .compatible = "mscc,vsc7514-switch" },
+ { }
+};
+MODULE_DEVICE_TABLE(of, mscc_ocelot_match);
+
+static int mscc_ocelot_probe(struct platform_device *pdev)
+{
+ int err, irq;
+ unsigned int i;
+ struct device_node *np = pdev->dev.of_node;
+ struct device_node *ports, *portnp;
+ struct ocelot *ocelot;
+ u32 val;
+
+ struct {
+ enum ocelot_target id;
+ char *name;
+ } res[] = {
+ { SYS, "sys" },
+ { REW, "rew" },
+ { QSYS, "qsys" },
+ { ANA, "ana" },
+ { QS, "qs" },
+ { HSIO, "hsio" },
+ };
+
+ if (!np && !pdev->dev.platform_data)
+ return -ENODEV;
+
+ ocelot = devm_kzalloc(&pdev->dev, sizeof(*ocelot), GFP_KERNEL);
+ if (!ocelot)
+ return -ENOMEM;
+
+ platform_set_drvdata(pdev, ocelot);
+ ocelot->dev = &pdev->dev;
+
+ for (i = 0; i < ARRAY_SIZE(res); i++) {
+ struct regmap *target;
+
+ target = ocelot_io_platform_init(ocelot, pdev, res[i].name);
+ if (IS_ERR(target))
+ return PTR_ERR(target);
+
+ ocelot->targets[res[i].id] = target;
+ }
+
+ err = ocelot_chip_init(ocelot);
+ if (err)
+ return err;
+
+ irq = platform_get_irq_byname(pdev, "xtr");
+ if (irq < 0)
+ return -ENODEV;
+
+ err = devm_request_threaded_irq(&pdev->dev, irq, NULL,
+ ocelot_xtr_irq_handler, IRQF_ONESHOT,
+ "frame extraction", ocelot);
+ if (err)
+ return err;
+
+ regmap_field_write(ocelot->regfields[SYS_RESET_CFG_MEM_INIT], 1);
+ regmap_field_write(ocelot->regfields[SYS_RESET_CFG_MEM_ENA], 1);
+
+ do {
+ msleep(1);
+ regmap_field_read(ocelot->regfields[SYS_RESET_CFG_MEM_INIT],
+ &val);
+ } while (val);
+
+ regmap_field_write(ocelot->regfields[SYS_RESET_CFG_MEM_ENA], 1);
+ regmap_field_write(ocelot->regfields[SYS_RESET_CFG_CORE_ENA], 1);
+
+ ocelot->num_cpu_ports = 1; /* 1 port on the switch, two groups */
+
+ ports = of_get_child_by_name(np, "ethernet-ports");
+ if (!ports) {
+ dev_err(&pdev->dev, "no ethernet-ports child node found\n");
+ return -ENODEV;
+ }
+
+ ocelot->num_phys_ports = of_get_child_count(ports);
+
+ ocelot->ports = devm_kcalloc(&pdev->dev, ocelot->num_phys_ports,
+ sizeof(struct ocelot_port *), GFP_KERNEL);
+
+ INIT_LIST_HEAD(&ocelot->multicast);
+ ocelot_init(ocelot);
+
+ ocelot_rmw(ocelot, HSIO_HW_CFG_DEV1G_4_MODE |
+ HSIO_HW_CFG_DEV1G_6_MODE |
+ HSIO_HW_CFG_DEV1G_9_MODE,
+ HSIO_HW_CFG_DEV1G_4_MODE |
+ HSIO_HW_CFG_DEV1G_6_MODE |
+ HSIO_HW_CFG_DEV1G_9_MODE,
+ HSIO_HW_CFG);
+
+ for_each_available_child_of_node(ports, portnp) {
+ struct device_node *phy_node;
+ struct phy_device *phy;
+ struct resource *res;
+ void __iomem *regs;
+ char res_name[8];
+ u32 port;
+
+ if (of_property_read_u32(portnp, "reg", &port))
+ continue;
+
+ snprintf(res_name, sizeof(res_name), "port%d", port);
+
+ res = platform_get_resource_byname(pdev, IORESOURCE_MEM,
+ res_name);
+ regs = devm_ioremap_resource(&pdev->dev, res);
+ if (IS_ERR(regs))
+ continue;
+
+ phy_node = of_parse_phandle(portnp, "phy-handle", 0);
+ if (!phy_node)
+ continue;
+
+ phy = of_phy_find_device(phy_node);
+ if (!phy)
+ continue;
+
+ err = ocelot_probe_port(ocelot, port, regs, phy);
+ if (err) {
+ dev_err(&pdev->dev, "failed to probe ports\n");
+ goto err_probe_ports;
+ }
+ }
+
+ register_netdevice_notifier(&ocelot_netdevice_nb);
+
+ dev_info(&pdev->dev, "Ocelot switch probed\n");
+
+ return 0;
+
+err_probe_ports:
+ return err;
+}
+
+static int mscc_ocelot_remove(struct platform_device *pdev)
+{
+ struct ocelot *ocelot = platform_get_drvdata(pdev);
+
+ ocelot_deinit(ocelot);
+ unregister_netdevice_notifier(&ocelot_netdevice_nb);
+
+ return 0;
+}
+
+static struct platform_driver mscc_ocelot_driver = {
+ .probe = mscc_ocelot_probe,
+ .remove = mscc_ocelot_remove,
+ .driver = {
+ .name = "ocelot-switch",
+ .of_match_table = mscc_ocelot_match,
+ },
+};
+
+module_platform_driver(mscc_ocelot_driver);
+
+MODULE_DESCRIPTION("Microsemi Ocelot switch driver");
+MODULE_AUTHOR("Alexandre Belloni <alexandre.belloni@bootlin.com>");
+MODULE_LICENSE("Dual MIT/GPL");
diff --git a/drivers/net/ethernet/mscc/ocelot_dev.h b/drivers/net/ethernet/mscc/ocelot_dev.h
new file mode 100644
index 0000000..0a50d53
--- /dev/null
+++ b/drivers/net/ethernet/mscc/ocelot_dev.h
@@ -0,0 +1,275 @@
+/* SPDX-License-Identifier: (GPL-2.0 OR MIT) */
+/*
+ * Microsemi Ocelot Switch driver
+ *
+ * Copyright (c) 2017 Microsemi Corporation
+ */
+
+#ifndef _MSCC_OCELOT_DEV_H_
+#define _MSCC_OCELOT_DEV_H_
+
+#define DEV_CLOCK_CFG 0x0
+
+#define DEV_CLOCK_CFG_MAC_TX_RST BIT(7)
+#define DEV_CLOCK_CFG_MAC_RX_RST BIT(6)
+#define DEV_CLOCK_CFG_PCS_TX_RST BIT(5)
+#define DEV_CLOCK_CFG_PCS_RX_RST BIT(4)
+#define DEV_CLOCK_CFG_PORT_RST BIT(3)
+#define DEV_CLOCK_CFG_PHY_RST BIT(2)
+#define DEV_CLOCK_CFG_LINK_SPEED(x) ((x) & GENMASK(1, 0))
+#define DEV_CLOCK_CFG_LINK_SPEED_M GENMASK(1, 0)
+
+#define DEV_PORT_MISC 0x4
+
+#define DEV_PORT_MISC_FWD_ERROR_ENA BIT(4)
+#define DEV_PORT_MISC_FWD_PAUSE_ENA BIT(3)
+#define DEV_PORT_MISC_FWD_CTRL_ENA BIT(2)
+#define DEV_PORT_MISC_DEV_LOOP_ENA BIT(1)
+#define DEV_PORT_MISC_HDX_FAST_DIS BIT(0)
+
+#define DEV_EVENTS 0x8
+
+#define DEV_EEE_CFG 0xc
+
+#define DEV_EEE_CFG_EEE_ENA BIT(22)
+#define DEV_EEE_CFG_EEE_TIMER_AGE(x) (((x) << 15) & GENMASK(21, 15))
+#define DEV_EEE_CFG_EEE_TIMER_AGE_M GENMASK(21, 15)
+#define DEV_EEE_CFG_EEE_TIMER_AGE_X(x) (((x) & GENMASK(21, 15)) >> 15)
+#define DEV_EEE_CFG_EEE_TIMER_WAKEUP(x) (((x) << 8) & GENMASK(14, 8))
+#define DEV_EEE_CFG_EEE_TIMER_WAKEUP_M GENMASK(14, 8)
+#define DEV_EEE_CFG_EEE_TIMER_WAKEUP_X(x) (((x) & GENMASK(14, 8)) >> 8)
+#define DEV_EEE_CFG_EEE_TIMER_HOLDOFF(x) (((x) << 1) & GENMASK(7, 1))
+#define DEV_EEE_CFG_EEE_TIMER_HOLDOFF_M GENMASK(7, 1)
+#define DEV_EEE_CFG_EEE_TIMER_HOLDOFF_X(x) (((x) & GENMASK(7, 1)) >> 1)
+#define DEV_EEE_CFG_PORT_LPI BIT(0)
+
+#define DEV_RX_PATH_DELAY 0x10
+
+#define DEV_TX_PATH_DELAY 0x14
+
+#define DEV_PTP_PREDICT_CFG 0x18
+
+#define DEV_PTP_PREDICT_CFG_PTP_PHY_PREDICT_CFG(x) (((x) << 4) & GENMASK(11, 4))
+#define DEV_PTP_PREDICT_CFG_PTP_PHY_PREDICT_CFG_M GENMASK(11, 4)
+#define DEV_PTP_PREDICT_CFG_PTP_PHY_PREDICT_CFG_X(x) (((x) & GENMASK(11, 4)) >> 4)
+#define DEV_PTP_PREDICT_CFG_PTP_PHASE_PREDICT_CFG(x) ((x) & GENMASK(3, 0))
+#define DEV_PTP_PREDICT_CFG_PTP_PHASE_PREDICT_CFG_M GENMASK(3, 0)
+
+#define DEV_MAC_ENA_CFG 0x1c
+
+#define DEV_MAC_ENA_CFG_RX_ENA BIT(4)
+#define DEV_MAC_ENA_CFG_TX_ENA BIT(0)
+
+#define DEV_MAC_MODE_CFG 0x20
+
+#define DEV_MAC_MODE_CFG_FC_WORD_SYNC_ENA BIT(8)
+#define DEV_MAC_MODE_CFG_GIGA_MODE_ENA BIT(4)
+#define DEV_MAC_MODE_CFG_FDX_ENA BIT(0)
+
+#define DEV_MAC_MAXLEN_CFG 0x24
+
+#define DEV_MAC_TAGS_CFG 0x28
+
+#define DEV_MAC_TAGS_CFG_TAG_ID(x) (((x) << 16) & GENMASK(31, 16))
+#define DEV_MAC_TAGS_CFG_TAG_ID_M GENMASK(31, 16)
+#define DEV_MAC_TAGS_CFG_TAG_ID_X(x) (((x) & GENMASK(31, 16)) >> 16)
+#define DEV_MAC_TAGS_CFG_VLAN_LEN_AWR_ENA BIT(2)
+#define DEV_MAC_TAGS_CFG_PB_ENA BIT(1)
+#define DEV_MAC_TAGS_CFG_VLAN_AWR_ENA BIT(0)
+
+#define DEV_MAC_ADV_CHK_CFG 0x2c
+
+#define DEV_MAC_ADV_CHK_CFG_LEN_DROP_ENA BIT(0)
+
+#define DEV_MAC_IFG_CFG 0x30
+
+#define DEV_MAC_IFG_CFG_RESTORE_OLD_IPG_CHECK BIT(17)
+#define DEV_MAC_IFG_CFG_REDUCED_TX_IFG BIT(16)
+#define DEV_MAC_IFG_CFG_TX_IFG(x) (((x) << 8) & GENMASK(12, 8))
+#define DEV_MAC_IFG_CFG_TX_IFG_M GENMASK(12, 8)
+#define DEV_MAC_IFG_CFG_TX_IFG_X(x) (((x) & GENMASK(12, 8)) >> 8)
+#define DEV_MAC_IFG_CFG_RX_IFG2(x) (((x) << 4) & GENMASK(7, 4))
+#define DEV_MAC_IFG_CFG_RX_IFG2_M GENMASK(7, 4)
+#define DEV_MAC_IFG_CFG_RX_IFG2_X(x) (((x) & GENMASK(7, 4)) >> 4)
+#define DEV_MAC_IFG_CFG_RX_IFG1(x) ((x) & GENMASK(3, 0))
+#define DEV_MAC_IFG_CFG_RX_IFG1_M GENMASK(3, 0)
+
+#define DEV_MAC_HDX_CFG 0x34
+
+#define DEV_MAC_HDX_CFG_BYPASS_COL_SYNC BIT(26)
+#define DEV_MAC_HDX_CFG_OB_ENA BIT(25)
+#define DEV_MAC_HDX_CFG_WEXC_DIS BIT(24)
+#define DEV_MAC_HDX_CFG_SEED(x) (((x) << 16) & GENMASK(23, 16))
+#define DEV_MAC_HDX_CFG_SEED_M GENMASK(23, 16)
+#define DEV_MAC_HDX_CFG_SEED_X(x) (((x) & GENMASK(23, 16)) >> 16)
+#define DEV_MAC_HDX_CFG_SEED_LOAD BIT(12)
+#define DEV_MAC_HDX_CFG_RETRY_AFTER_EXC_COL_ENA BIT(8)
+#define DEV_MAC_HDX_CFG_LATE_COL_POS(x) ((x) & GENMASK(6, 0))
+#define DEV_MAC_HDX_CFG_LATE_COL_POS_M GENMASK(6, 0)
+
+#define DEV_MAC_DBG_CFG 0x38
+
+#define DEV_MAC_DBG_CFG_TBI_MODE BIT(4)
+#define DEV_MAC_DBG_CFG_IFG_CRS_EXT_CHK_ENA BIT(0)
+
+#define DEV_MAC_FC_MAC_LOW_CFG 0x3c
+
+#define DEV_MAC_FC_MAC_HIGH_CFG 0x40
+
+#define DEV_MAC_STICKY 0x44
+
+#define DEV_MAC_STICKY_RX_IPG_SHRINK_STICKY BIT(9)
+#define DEV_MAC_STICKY_RX_PREAM_SHRINK_STICKY BIT(8)
+#define DEV_MAC_STICKY_RX_CARRIER_EXT_STICKY BIT(7)
+#define DEV_MAC_STICKY_RX_CARRIER_EXT_ERR_STICKY BIT(6)
+#define DEV_MAC_STICKY_RX_JUNK_STICKY BIT(5)
+#define DEV_MAC_STICKY_TX_RETRANSMIT_STICKY BIT(4)
+#define DEV_MAC_STICKY_TX_JAM_STICKY BIT(3)
+#define DEV_MAC_STICKY_TX_FIFO_OFLW_STICKY BIT(2)
+#define DEV_MAC_STICKY_TX_FRM_LEN_OVR_STICKY BIT(1)
+#define DEV_MAC_STICKY_TX_ABORT_STICKY BIT(0)
+
+#define PCS1G_CFG 0x48
+
+#define PCS1G_CFG_LINK_STATUS_TYPE BIT(4)
+#define PCS1G_CFG_AN_LINK_CTRL_ENA BIT(1)
+#define PCS1G_CFG_PCS_ENA BIT(0)
+
+#define PCS1G_MODE_CFG 0x4c
+
+#define PCS1G_MODE_CFG_UNIDIR_MODE_ENA BIT(4)
+#define PCS1G_MODE_CFG_SGMII_MODE_ENA BIT(0)
+
+#define PCS1G_SD_CFG 0x50
+
+#define PCS1G_SD_CFG_SD_SEL BIT(8)
+#define PCS1G_SD_CFG_SD_POL BIT(4)
+#define PCS1G_SD_CFG_SD_ENA BIT(0)
+
+#define PCS1G_ANEG_CFG 0x54
+
+#define PCS1G_ANEG_CFG_ADV_ABILITY(x) (((x) << 16) & GENMASK(31, 16))
+#define PCS1G_ANEG_CFG_ADV_ABILITY_M GENMASK(31, 16)
+#define PCS1G_ANEG_CFG_ADV_ABILITY_X(x) (((x) & GENMASK(31, 16)) >> 16)
+#define PCS1G_ANEG_CFG_SW_RESOLVE_ENA BIT(8)
+#define PCS1G_ANEG_CFG_ANEG_RESTART_ONE_SHOT BIT(1)
+#define PCS1G_ANEG_CFG_ANEG_ENA BIT(0)
+
+#define PCS1G_ANEG_NP_CFG 0x58
+
+#define PCS1G_ANEG_NP_CFG_NP_TX(x) (((x) << 16) & GENMASK(31, 16))
+#define PCS1G_ANEG_NP_CFG_NP_TX_M GENMASK(31, 16)
+#define PCS1G_ANEG_NP_CFG_NP_TX_X(x) (((x) & GENMASK(31, 16)) >> 16)
+#define PCS1G_ANEG_NP_CFG_NP_LOADED_ONE_SHOT BIT(0)
+
+#define PCS1G_LB_CFG 0x5c
+
+#define PCS1G_LB_CFG_RA_ENA BIT(4)
+#define PCS1G_LB_CFG_GMII_PHY_LB_ENA BIT(1)
+#define PCS1G_LB_CFG_TBI_HOST_LB_ENA BIT(0)
+
+#define PCS1G_DBG_CFG 0x60
+
+#define PCS1G_DBG_CFG_UDLT BIT(0)
+
+#define PCS1G_CDET_CFG 0x64
+
+#define PCS1G_CDET_CFG_CDET_ENA BIT(0)
+
+#define PCS1G_ANEG_STATUS 0x68
+
+#define PCS1G_ANEG_STATUS_LP_ADV_ABILITY(x) (((x) << 16) & GENMASK(31, 16))
+#define PCS1G_ANEG_STATUS_LP_ADV_ABILITY_M GENMASK(31, 16)
+#define PCS1G_ANEG_STATUS_LP_ADV_ABILITY_X(x) (((x) & GENMASK(31, 16)) >> 16)
+#define PCS1G_ANEG_STATUS_PR BIT(4)
+#define PCS1G_ANEG_STATUS_PAGE_RX_STICKY BIT(3)
+#define PCS1G_ANEG_STATUS_ANEG_COMPLETE BIT(0)
+
+#define PCS1G_ANEG_NP_STATUS 0x6c
+
+#define PCS1G_LINK_STATUS 0x70
+
+#define PCS1G_LINK_STATUS_DELAY_VAR(x) (((x) << 12) & GENMASK(15, 12))
+#define PCS1G_LINK_STATUS_DELAY_VAR_M GENMASK(15, 12)
+#define PCS1G_LINK_STATUS_DELAY_VAR_X(x) (((x) & GENMASK(15, 12)) >> 12)
+#define PCS1G_LINK_STATUS_SIGNAL_DETECT BIT(8)
+#define PCS1G_LINK_STATUS_LINK_STATUS BIT(4)
+#define PCS1G_LINK_STATUS_SYNC_STATUS BIT(0)
+
+#define PCS1G_LINK_DOWN_CNT 0x74
+
+#define PCS1G_STICKY 0x78
+
+#define PCS1G_STICKY_LINK_DOWN_STICKY BIT(4)
+#define PCS1G_STICKY_OUT_OF_SYNC_STICKY BIT(0)
+
+#define PCS1G_DEBUG_STATUS 0x7c
+
+#define PCS1G_LPI_CFG 0x80
+
+#define PCS1G_LPI_CFG_QSGMII_MS_SEL BIT(20)
+#define PCS1G_LPI_CFG_RX_LPI_OUT_DIS BIT(17)
+#define PCS1G_LPI_CFG_LPI_TESTMODE BIT(16)
+#define PCS1G_LPI_CFG_LPI_RX_WTIM(x) (((x) << 4) & GENMASK(5, 4))
+#define PCS1G_LPI_CFG_LPI_RX_WTIM_M GENMASK(5, 4)
+#define PCS1G_LPI_CFG_LPI_RX_WTIM_X(x) (((x) & GENMASK(5, 4)) >> 4)
+#define PCS1G_LPI_CFG_TX_ASSERT_LPIDLE BIT(0)
+
+#define PCS1G_LPI_WAKE_ERROR_CNT 0x84
+
+#define PCS1G_LPI_STATUS 0x88
+
+#define PCS1G_LPI_STATUS_RX_LPI_FAIL BIT(16)
+#define PCS1G_LPI_STATUS_RX_LPI_EVENT_STICKY BIT(12)
+#define PCS1G_LPI_STATUS_RX_QUIET BIT(9)
+#define PCS1G_LPI_STATUS_RX_LPI_MODE BIT(8)
+#define PCS1G_LPI_STATUS_TX_LPI_EVENT_STICKY BIT(4)
+#define PCS1G_LPI_STATUS_TX_QUIET BIT(1)
+#define PCS1G_LPI_STATUS_TX_LPI_MODE BIT(0)
+
+#define PCS1G_TSTPAT_MODE_CFG 0x8c
+
+#define PCS1G_TSTPAT_STATUS 0x90
+
+#define PCS1G_TSTPAT_STATUS_JTP_ERR_CNT(x) (((x) << 8) & GENMASK(15, 8))
+#define PCS1G_TSTPAT_STATUS_JTP_ERR_CNT_M GENMASK(15, 8)
+#define PCS1G_TSTPAT_STATUS_JTP_ERR_CNT_X(x) (((x) & GENMASK(15, 8)) >> 8)
+#define PCS1G_TSTPAT_STATUS_JTP_ERR BIT(4)
+#define PCS1G_TSTPAT_STATUS_JTP_LOCK BIT(0)
+
+#define DEV_PCS_FX100_CFG 0x94
+
+#define DEV_PCS_FX100_CFG_SD_SEL BIT(26)
+#define DEV_PCS_FX100_CFG_SD_POL BIT(25)
+#define DEV_PCS_FX100_CFG_SD_ENA BIT(24)
+#define DEV_PCS_FX100_CFG_LOOPBACK_ENA BIT(20)
+#define DEV_PCS_FX100_CFG_SWAP_MII_ENA BIT(16)
+#define DEV_PCS_FX100_CFG_RXBITSEL(x) (((x) << 12) & GENMASK(15, 12))
+#define DEV_PCS_FX100_CFG_RXBITSEL_M GENMASK(15, 12)
+#define DEV_PCS_FX100_CFG_RXBITSEL_X(x) (((x) & GENMASK(15, 12)) >> 12)
+#define DEV_PCS_FX100_CFG_SIGDET_CFG(x) (((x) << 9) & GENMASK(10, 9))
+#define DEV_PCS_FX100_CFG_SIGDET_CFG_M GENMASK(10, 9)
+#define DEV_PCS_FX100_CFG_SIGDET_CFG_X(x) (((x) & GENMASK(10, 9)) >> 9)
+#define DEV_PCS_FX100_CFG_LINKHYST_TM_ENA BIT(8)
+#define DEV_PCS_FX100_CFG_LINKHYSTTIMER(x) (((x) << 4) & GENMASK(7, 4))
+#define DEV_PCS_FX100_CFG_LINKHYSTTIMER_M GENMASK(7, 4)
+#define DEV_PCS_FX100_CFG_LINKHYSTTIMER_X(x) (((x) & GENMASK(7, 4)) >> 4)
+#define DEV_PCS_FX100_CFG_UNIDIR_MODE_ENA BIT(3)
+#define DEV_PCS_FX100_CFG_FEFCHK_ENA BIT(2)
+#define DEV_PCS_FX100_CFG_FEFGEN_ENA BIT(1)
+#define DEV_PCS_FX100_CFG_PCS_ENA BIT(0)
+
+#define DEV_PCS_FX100_STATUS 0x98
+
+#define DEV_PCS_FX100_STATUS_EDGE_POS_PTP(x) (((x) << 8) & GENMASK(11, 8))
+#define DEV_PCS_FX100_STATUS_EDGE_POS_PTP_M GENMASK(11, 8)
+#define DEV_PCS_FX100_STATUS_EDGE_POS_PTP_X(x) (((x) & GENMASK(11, 8)) >> 8)
+#define DEV_PCS_FX100_STATUS_PCS_ERROR_STICKY BIT(7)
+#define DEV_PCS_FX100_STATUS_FEF_FOUND_STICKY BIT(6)
+#define DEV_PCS_FX100_STATUS_SSD_ERROR_STICKY BIT(5)
+#define DEV_PCS_FX100_STATUS_SYNC_LOST_STICKY BIT(4)
+#define DEV_PCS_FX100_STATUS_FEF_STATUS BIT(2)
+#define DEV_PCS_FX100_STATUS_SIGNAL_DETECT BIT(1)
+#define DEV_PCS_FX100_STATUS_SYNC_STATUS BIT(0)
+
+#endif
diff --git a/drivers/net/ethernet/mscc/ocelot_dev_gmii.h b/drivers/net/ethernet/mscc/ocelot_dev_gmii.h
new file mode 100644
index 0000000..6aa40ea
--- /dev/null
+++ b/drivers/net/ethernet/mscc/ocelot_dev_gmii.h
@@ -0,0 +1,154 @@
+/* SPDX-License-Identifier: (GPL-2.0 OR MIT) */
+/*
+ * Microsemi Ocelot Switch driver
+ *
+ * Copyright (c) 2017 Microsemi Corporation
+ */
+
+#ifndef _MSCC_OCELOT_DEV_GMII_H_
+#define _MSCC_OCELOT_DEV_GMII_H_
+
+#define DEV_GMII_PORT_MODE_CLOCK_CFG 0x0
+
+#define DEV_GMII_PORT_MODE_CLOCK_CFG_MAC_TX_RST BIT(5)
+#define DEV_GMII_PORT_MODE_CLOCK_CFG_MAC_RX_RST BIT(4)
+#define DEV_GMII_PORT_MODE_CLOCK_CFG_PORT_RST BIT(3)
+#define DEV_GMII_PORT_MODE_CLOCK_CFG_PHY_RST BIT(2)
+#define DEV_GMII_PORT_MODE_CLOCK_CFG_LINK_SPEED(x) ((x) & GENMASK(1, 0))
+#define DEV_GMII_PORT_MODE_CLOCK_CFG_LINK_SPEED_M GENMASK(1, 0)
+
+#define DEV_GMII_PORT_MODE_PORT_MISC 0x4
+
+#define DEV_GMII_PORT_MODE_PORT_MISC_MPLS_RX_ENA BIT(5)
+#define DEV_GMII_PORT_MODE_PORT_MISC_FWD_ERROR_ENA BIT(4)
+#define DEV_GMII_PORT_MODE_PORT_MISC_FWD_PAUSE_ENA BIT(3)
+#define DEV_GMII_PORT_MODE_PORT_MISC_FWD_CTRL_ENA BIT(2)
+#define DEV_GMII_PORT_MODE_PORT_MISC_GMII_LOOP_ENA BIT(1)
+#define DEV_GMII_PORT_MODE_PORT_MISC_DEV_LOOP_ENA BIT(0)
+
+#define DEV_GMII_PORT_MODE_EVENTS 0x8
+
+#define DEV_GMII_PORT_MODE_EEE_CFG 0xc
+
+#define DEV_GMII_PORT_MODE_EEE_CFG_EEE_ENA BIT(22)
+#define DEV_GMII_PORT_MODE_EEE_CFG_EEE_TIMER_AGE(x) (((x) << 15) & GENMASK(21, 15))
+#define DEV_GMII_PORT_MODE_EEE_CFG_EEE_TIMER_AGE_M GENMASK(21, 15)
+#define DEV_GMII_PORT_MODE_EEE_CFG_EEE_TIMER_AGE_X(x) (((x) & GENMASK(21, 15)) >> 15)
+#define DEV_GMII_PORT_MODE_EEE_CFG_EEE_TIMER_WAKEUP(x) (((x) << 8) & GENMASK(14, 8))
+#define DEV_GMII_PORT_MODE_EEE_CFG_EEE_TIMER_WAKEUP_M GENMASK(14, 8)
+#define DEV_GMII_PORT_MODE_EEE_CFG_EEE_TIMER_WAKEUP_X(x) (((x) & GENMASK(14, 8)) >> 8)
+#define DEV_GMII_PORT_MODE_EEE_CFG_EEE_TIMER_HOLDOFF(x) (((x) << 1) & GENMASK(7, 1))
+#define DEV_GMII_PORT_MODE_EEE_CFG_EEE_TIMER_HOLDOFF_M GENMASK(7, 1)
+#define DEV_GMII_PORT_MODE_EEE_CFG_EEE_TIMER_HOLDOFF_X(x) (((x) & GENMASK(7, 1)) >> 1)
+#define DEV_GMII_PORT_MODE_EEE_CFG_PORT_LPI BIT(0)
+
+#define DEV_GMII_PORT_MODE_RX_PATH_DELAY 0x10
+
+#define DEV_GMII_PORT_MODE_TX_PATH_DELAY 0x14
+
+#define DEV_GMII_PORT_MODE_PTP_PREDICT_CFG 0x18
+
+#define DEV_GMII_MAC_CFG_STATUS_MAC_ENA_CFG 0x1c
+
+#define DEV_GMII_MAC_CFG_STATUS_MAC_ENA_CFG_RX_ENA BIT(4)
+#define DEV_GMII_MAC_CFG_STATUS_MAC_ENA_CFG_TX_ENA BIT(0)
+
+#define DEV_GMII_MAC_CFG_STATUS_MAC_MODE_CFG 0x20
+
+#define DEV_GMII_MAC_CFG_STATUS_MAC_MODE_CFG_FC_WORD_SYNC_ENA BIT(8)
+#define DEV_GMII_MAC_CFG_STATUS_MAC_MODE_CFG_GIGA_MODE_ENA BIT(4)
+#define DEV_GMII_MAC_CFG_STATUS_MAC_MODE_CFG_FDX_ENA BIT(0)
+
+#define DEV_GMII_MAC_CFG_STATUS_MAC_MAXLEN_CFG 0x24
+
+#define DEV_GMII_MAC_CFG_STATUS_MAC_TAGS_CFG 0x28
+
+#define DEV_GMII_MAC_CFG_STATUS_MAC_TAGS_CFG_TAG_ID(x) (((x) << 16) & GENMASK(31, 16))
+#define DEV_GMII_MAC_CFG_STATUS_MAC_TAGS_CFG_TAG_ID_M GENMASK(31, 16)
+#define DEV_GMII_MAC_CFG_STATUS_MAC_TAGS_CFG_TAG_ID_X(x) (((x) & GENMASK(31, 16)) >> 16)
+#define DEV_GMII_MAC_CFG_STATUS_MAC_TAGS_CFG_PB_ENA BIT(1)
+#define DEV_GMII_MAC_CFG_STATUS_MAC_TAGS_CFG_VLAN_AWR_ENA BIT(0)
+#define DEV_GMII_MAC_CFG_STATUS_MAC_TAGS_CFG_VLAN_LEN_AWR_ENA BIT(2)
+
+#define DEV_GMII_MAC_CFG_STATUS_MAC_ADV_CHK_CFG 0x2c
+
+#define DEV_GMII_MAC_CFG_STATUS_MAC_ADV_CHK_CFG_LEN_DROP_ENA BIT(0)
+
+#define DEV_GMII_MAC_CFG_STATUS_MAC_IFG_CFG 0x30
+
+#define DEV_GMII_MAC_CFG_STATUS_MAC_IFG_CFG_RESTORE_OLD_IPG_CHECK BIT(17)
+#define DEV_GMII_MAC_CFG_STATUS_MAC_IFG_CFG_REDUCED_TX_IFG BIT(16)
+#define DEV_GMII_MAC_CFG_STATUS_MAC_IFG_CFG_TX_IFG(x) (((x) << 8) & GENMASK(12, 8))
+#define DEV_GMII_MAC_CFG_STATUS_MAC_IFG_CFG_TX_IFG_M GENMASK(12, 8)
+#define DEV_GMII_MAC_CFG_STATUS_MAC_IFG_CFG_TX_IFG_X(x) (((x) & GENMASK(12, 8)) >> 8)
+#define DEV_GMII_MAC_CFG_STATUS_MAC_IFG_CFG_RX_IFG2(x) (((x) << 4) & GENMASK(7, 4))
+#define DEV_GMII_MAC_CFG_STATUS_MAC_IFG_CFG_RX_IFG2_M GENMASK(7, 4)
+#define DEV_GMII_MAC_CFG_STATUS_MAC_IFG_CFG_RX_IFG2_X(x) (((x) & GENMASK(7, 4)) >> 4)
+#define DEV_GMII_MAC_CFG_STATUS_MAC_IFG_CFG_RX_IFG1(x) ((x) & GENMASK(3, 0))
+#define DEV_GMII_MAC_CFG_STATUS_MAC_IFG_CFG_RX_IFG1_M GENMASK(3, 0)
+
+#define DEV_GMII_MAC_CFG_STATUS_MAC_HDX_CFG 0x34
+
+#define DEV_GMII_MAC_CFG_STATUS_MAC_HDX_CFG_BYPASS_COL_SYNC BIT(26)
+#define DEV_GMII_MAC_CFG_STATUS_MAC_HDX_CFG_OB_ENA BIT(25)
+#define DEV_GMII_MAC_CFG_STATUS_MAC_HDX_CFG_WEXC_DIS BIT(24)
+#define DEV_GMII_MAC_CFG_STATUS_MAC_HDX_CFG_SEED(x) (((x) << 16) & GENMASK(23, 16))
+#define DEV_GMII_MAC_CFG_STATUS_MAC_HDX_CFG_SEED_M GENMASK(23, 16)
+#define DEV_GMII_MAC_CFG_STATUS_MAC_HDX_CFG_SEED_X(x) (((x) & GENMASK(23, 16)) >> 16)
+#define DEV_GMII_MAC_CFG_STATUS_MAC_HDX_CFG_SEED_LOAD BIT(12)
+#define DEV_GMII_MAC_CFG_STATUS_MAC_HDX_CFG_RETRY_AFTER_EXC_COL_ENA BIT(8)
+#define DEV_GMII_MAC_CFG_STATUS_MAC_HDX_CFG_LATE_COL_POS(x) ((x) & GENMASK(6, 0))
+#define DEV_GMII_MAC_CFG_STATUS_MAC_HDX_CFG_LATE_COL_POS_M GENMASK(6, 0)
+
+#define DEV_GMII_MAC_CFG_STATUS_MAC_DBG_CFG 0x38
+
+#define DEV_GMII_MAC_CFG_STATUS_MAC_DBG_CFG_TBI_MODE BIT(4)
+#define DEV_GMII_MAC_CFG_STATUS_MAC_DBG_CFG_IFG_CRS_EXT_CHK_ENA BIT(0)
+
+#define DEV_GMII_MAC_CFG_STATUS_MAC_FC_MAC_LOW_CFG 0x3c
+
+#define DEV_GMII_MAC_CFG_STATUS_MAC_FC_MAC_HIGH_CFG 0x40
+
+#define DEV_GMII_MAC_CFG_STATUS_MAC_STICKY 0x44
+
+#define DEV_GMII_MAC_CFG_STATUS_MAC_STICKY_RX_IPG_SHRINK_STICKY BIT(9)
+#define DEV_GMII_MAC_CFG_STATUS_MAC_STICKY_RX_PREAM_SHRINK_STICKY BIT(8)
+#define DEV_GMII_MAC_CFG_STATUS_MAC_STICKY_RX_CARRIER_EXT_STICKY BIT(7)
+#define DEV_GMII_MAC_CFG_STATUS_MAC_STICKY_RX_CARRIER_EXT_ERR_STICKY BIT(6)
+#define DEV_GMII_MAC_CFG_STATUS_MAC_STICKY_RX_JUNK_STICKY BIT(5)
+#define DEV_GMII_MAC_CFG_STATUS_MAC_STICKY_TX_RETRANSMIT_STICKY BIT(4)
+#define DEV_GMII_MAC_CFG_STATUS_MAC_STICKY_TX_JAM_STICKY BIT(3)
+#define DEV_GMII_MAC_CFG_STATUS_MAC_STICKY_TX_FIFO_OFLW_STICKY BIT(2)
+#define DEV_GMII_MAC_CFG_STATUS_MAC_STICKY_TX_FRM_LEN_OVR_STICKY BIT(1)
+#define DEV_GMII_MAC_CFG_STATUS_MAC_STICKY_TX_ABORT_STICKY BIT(0)
+
+#define DEV_GMII_MM_CONFIG_ENABLE_CONFIG 0x48
+
+#define DEV_GMII_MM_CONFIG_ENABLE_CONFIG_MM_RX_ENA BIT(0)
+#define DEV_GMII_MM_CONFIG_ENABLE_CONFIG_MM_TX_ENA BIT(4)
+#define DEV_GMII_MM_CONFIG_ENABLE_CONFIG_KEEP_S_AFTER_D BIT(8)
+
+#define DEV_GMII_MM_CONFIG_VERIF_CONFIG 0x4c
+
+#define DEV_GMII_MM_CONFIG_VERIF_CONFIG_PRM_VERIFY_DIS BIT(0)
+#define DEV_GMII_MM_CONFIG_VERIF_CONFIG_PRM_VERIFY_TIME(x) (((x) << 4) & GENMASK(11, 4))
+#define DEV_GMII_MM_CONFIG_VERIF_CONFIG_PRM_VERIFY_TIME_M GENMASK(11, 4)
+#define DEV_GMII_MM_CONFIG_VERIF_CONFIG_PRM_VERIFY_TIME_X(x) (((x) & GENMASK(11, 4)) >> 4)
+#define DEV_GMII_MM_CONFIG_VERIF_CONFIG_VERIF_TIMER_UNITS(x) (((x) << 12) & GENMASK(13, 12))
+#define DEV_GMII_MM_CONFIG_VERIF_CONFIG_VERIF_TIMER_UNITS_M GENMASK(13, 12)
+#define DEV_GMII_MM_CONFIG_VERIF_CONFIG_VERIF_TIMER_UNITS_X(x) (((x) & GENMASK(13, 12)) >> 12)
+
+#define DEV_GMII_MM_STATISTICS_MM_STATUS 0x50
+
+#define DEV_GMII_MM_STATISTICS_MM_STATUS_PRMPT_ACTIVE_STATUS BIT(0)
+#define DEV_GMII_MM_STATISTICS_MM_STATUS_PRMPT_ACTIVE_STICKY BIT(4)
+#define DEV_GMII_MM_STATISTICS_MM_STATUS_PRMPT_VERIFY_STATE(x) (((x) << 8) & GENMASK(10, 8))
+#define DEV_GMII_MM_STATISTICS_MM_STATUS_PRMPT_VERIFY_STATE_M GENMASK(10, 8)
+#define DEV_GMII_MM_STATISTICS_MM_STATUS_PRMPT_VERIFY_STATE_X(x) (((x) & GENMASK(10, 8)) >> 8)
+#define DEV_GMII_MM_STATISTICS_MM_STATUS_UNEXP_RX_PFRM_STICKY BIT(12)
+#define DEV_GMII_MM_STATISTICS_MM_STATUS_UNEXP_TX_PFRM_STICKY BIT(16)
+#define DEV_GMII_MM_STATISTICS_MM_STATUS_MM_RX_FRAME_STATUS BIT(20)
+#define DEV_GMII_MM_STATISTICS_MM_STATUS_MM_TX_FRAME_STATUS BIT(24)
+#define DEV_GMII_MM_STATISTICS_MM_STATUS_MM_TX_PRMPT_STATUS BIT(28)
+
+#endif
diff --git a/drivers/net/ethernet/mscc/ocelot_hsio.h b/drivers/net/ethernet/mscc/ocelot_hsio.h
new file mode 100644
index 0000000..d93ddec3
--- /dev/null
+++ b/drivers/net/ethernet/mscc/ocelot_hsio.h
@@ -0,0 +1,785 @@
+/* SPDX-License-Identifier: (GPL-2.0 OR MIT) */
+/*
+ * Microsemi Ocelot Switch driver
+ *
+ * Copyright (c) 2017 Microsemi Corporation
+ */
+
+#ifndef _MSCC_OCELOT_HSIO_H_
+#define _MSCC_OCELOT_HSIO_H_
+
+#define HSIO_PLL5G_CFG0_ENA_ROT BIT(31)
+#define HSIO_PLL5G_CFG0_ENA_LANE BIT(30)
+#define HSIO_PLL5G_CFG0_ENA_CLKTREE BIT(29)
+#define HSIO_PLL5G_CFG0_DIV4 BIT(28)
+#define HSIO_PLL5G_CFG0_ENA_LOCK_FINE BIT(27)
+#define HSIO_PLL5G_CFG0_SELBGV820(x) (((x) << 23) & GENMASK(26, 23))
+#define HSIO_PLL5G_CFG0_SELBGV820_M GENMASK(26, 23)
+#define HSIO_PLL5G_CFG0_SELBGV820_X(x) (((x) & GENMASK(26, 23)) >> 23)
+#define HSIO_PLL5G_CFG0_LOOP_BW_RES(x) (((x) << 18) & GENMASK(22, 18))
+#define HSIO_PLL5G_CFG0_LOOP_BW_RES_M GENMASK(22, 18)
+#define HSIO_PLL5G_CFG0_LOOP_BW_RES_X(x) (((x) & GENMASK(22, 18)) >> 18)
+#define HSIO_PLL5G_CFG0_SELCPI(x) (((x) << 16) & GENMASK(17, 16))
+#define HSIO_PLL5G_CFG0_SELCPI_M GENMASK(17, 16)
+#define HSIO_PLL5G_CFG0_SELCPI_X(x) (((x) & GENMASK(17, 16)) >> 16)
+#define HSIO_PLL5G_CFG0_ENA_VCO_CONTRH BIT(15)
+#define HSIO_PLL5G_CFG0_ENA_CP1 BIT(14)
+#define HSIO_PLL5G_CFG0_ENA_VCO_BUF BIT(13)
+#define HSIO_PLL5G_CFG0_ENA_BIAS BIT(12)
+#define HSIO_PLL5G_CFG0_CPU_CLK_DIV(x) (((x) << 6) & GENMASK(11, 6))
+#define HSIO_PLL5G_CFG0_CPU_CLK_DIV_M GENMASK(11, 6)
+#define HSIO_PLL5G_CFG0_CPU_CLK_DIV_X(x) (((x) & GENMASK(11, 6)) >> 6)
+#define HSIO_PLL5G_CFG0_CORE_CLK_DIV(x) ((x) & GENMASK(5, 0))
+#define HSIO_PLL5G_CFG0_CORE_CLK_DIV_M GENMASK(5, 0)
+
+#define HSIO_PLL5G_CFG1_ENA_DIRECT BIT(18)
+#define HSIO_PLL5G_CFG1_ROT_SPEED BIT(17)
+#define HSIO_PLL5G_CFG1_ROT_DIR BIT(16)
+#define HSIO_PLL5G_CFG1_READBACK_DATA_SEL BIT(15)
+#define HSIO_PLL5G_CFG1_RC_ENABLE BIT(14)
+#define HSIO_PLL5G_CFG1_RC_CTRL_DATA(x) (((x) << 6) & GENMASK(13, 6))
+#define HSIO_PLL5G_CFG1_RC_CTRL_DATA_M GENMASK(13, 6)
+#define HSIO_PLL5G_CFG1_RC_CTRL_DATA_X(x) (((x) & GENMASK(13, 6)) >> 6)
+#define HSIO_PLL5G_CFG1_QUARTER_RATE BIT(5)
+#define HSIO_PLL5G_CFG1_PWD_TX BIT(4)
+#define HSIO_PLL5G_CFG1_PWD_RX BIT(3)
+#define HSIO_PLL5G_CFG1_OUT_OF_RANGE_RECAL_ENA BIT(2)
+#define HSIO_PLL5G_CFG1_HALF_RATE BIT(1)
+#define HSIO_PLL5G_CFG1_FORCE_SET_ENA BIT(0)
+
+#define HSIO_PLL5G_CFG2_ENA_TEST_MODE BIT(30)
+#define HSIO_PLL5G_CFG2_ENA_PFD_IN_FLIP BIT(29)
+#define HSIO_PLL5G_CFG2_ENA_VCO_NREF_TESTOUT BIT(28)
+#define HSIO_PLL5G_CFG2_ENA_FBTESTOUT BIT(27)
+#define HSIO_PLL5G_CFG2_ENA_RCPLL BIT(26)
+#define HSIO_PLL5G_CFG2_ENA_CP2 BIT(25)
+#define HSIO_PLL5G_CFG2_ENA_CLK_BYPASS1 BIT(24)
+#define HSIO_PLL5G_CFG2_AMPC_SEL(x) (((x) << 16) & GENMASK(23, 16))
+#define HSIO_PLL5G_CFG2_AMPC_SEL_M GENMASK(23, 16)
+#define HSIO_PLL5G_CFG2_AMPC_SEL_X(x) (((x) & GENMASK(23, 16)) >> 16)
+#define HSIO_PLL5G_CFG2_ENA_CLK_BYPASS BIT(15)
+#define HSIO_PLL5G_CFG2_PWD_AMPCTRL_N BIT(14)
+#define HSIO_PLL5G_CFG2_ENA_AMPCTRL BIT(13)
+#define HSIO_PLL5G_CFG2_ENA_AMP_CTRL_FORCE BIT(12)
+#define HSIO_PLL5G_CFG2_FRC_FSM_POR BIT(11)
+#define HSIO_PLL5G_CFG2_DISABLE_FSM_POR BIT(10)
+#define HSIO_PLL5G_CFG2_GAIN_TEST(x) (((x) << 5) & GENMASK(9, 5))
+#define HSIO_PLL5G_CFG2_GAIN_TEST_M GENMASK(9, 5)
+#define HSIO_PLL5G_CFG2_GAIN_TEST_X(x) (((x) & GENMASK(9, 5)) >> 5)
+#define HSIO_PLL5G_CFG2_EN_RESET_OVERRUN BIT(4)
+#define HSIO_PLL5G_CFG2_EN_RESET_LIM_DET BIT(3)
+#define HSIO_PLL5G_CFG2_EN_RESET_FRQ_DET BIT(2)
+#define HSIO_PLL5G_CFG2_DISABLE_FSM BIT(1)
+#define HSIO_PLL5G_CFG2_ENA_GAIN_TEST BIT(0)
+
+#define HSIO_PLL5G_CFG3_TEST_ANA_OUT_SEL(x) (((x) << 22) & GENMASK(23, 22))
+#define HSIO_PLL5G_CFG3_TEST_ANA_OUT_SEL_M GENMASK(23, 22)
+#define HSIO_PLL5G_CFG3_TEST_ANA_OUT_SEL_X(x) (((x) & GENMASK(23, 22)) >> 22)
+#define HSIO_PLL5G_CFG3_TESTOUT_SEL(x) (((x) << 19) & GENMASK(21, 19))
+#define HSIO_PLL5G_CFG3_TESTOUT_SEL_M GENMASK(21, 19)
+#define HSIO_PLL5G_CFG3_TESTOUT_SEL_X(x) (((x) & GENMASK(21, 19)) >> 19)
+#define HSIO_PLL5G_CFG3_ENA_ANA_TEST_OUT BIT(18)
+#define HSIO_PLL5G_CFG3_ENA_TEST_OUT BIT(17)
+#define HSIO_PLL5G_CFG3_SEL_FBDCLK BIT(16)
+#define HSIO_PLL5G_CFG3_SEL_CML_CMOS_PFD BIT(15)
+#define HSIO_PLL5G_CFG3_RST_FB_N BIT(14)
+#define HSIO_PLL5G_CFG3_FORCE_VCO_CONTRH BIT(13)
+#define HSIO_PLL5G_CFG3_FORCE_LO BIT(12)
+#define HSIO_PLL5G_CFG3_FORCE_HI BIT(11)
+#define HSIO_PLL5G_CFG3_FORCE_ENA BIT(10)
+#define HSIO_PLL5G_CFG3_FORCE_CP BIT(9)
+#define HSIO_PLL5G_CFG3_FBDIVSEL_TST_ENA BIT(8)
+#define HSIO_PLL5G_CFG3_FBDIVSEL(x) ((x) & GENMASK(7, 0))
+#define HSIO_PLL5G_CFG3_FBDIVSEL_M GENMASK(7, 0)
+
+#define HSIO_PLL5G_CFG4_IB_BIAS_CTRL(x) (((x) << 16) & GENMASK(23, 16))
+#define HSIO_PLL5G_CFG4_IB_BIAS_CTRL_M GENMASK(23, 16)
+#define HSIO_PLL5G_CFG4_IB_BIAS_CTRL_X(x) (((x) & GENMASK(23, 16)) >> 16)
+#define HSIO_PLL5G_CFG4_IB_CTRL(x) ((x) & GENMASK(15, 0))
+#define HSIO_PLL5G_CFG4_IB_CTRL_M GENMASK(15, 0)
+
+#define HSIO_PLL5G_CFG5_OB_BIAS_CTRL(x) (((x) << 16) & GENMASK(23, 16))
+#define HSIO_PLL5G_CFG5_OB_BIAS_CTRL_M GENMASK(23, 16)
+#define HSIO_PLL5G_CFG5_OB_BIAS_CTRL_X(x) (((x) & GENMASK(23, 16)) >> 16)
+#define HSIO_PLL5G_CFG5_OB_CTRL(x) ((x) & GENMASK(15, 0))
+#define HSIO_PLL5G_CFG5_OB_CTRL_M GENMASK(15, 0)
+
+#define HSIO_PLL5G_CFG6_REFCLK_SEL_SRC BIT(23)
+#define HSIO_PLL5G_CFG6_REFCLK_SEL(x) (((x) << 20) & GENMASK(22, 20))
+#define HSIO_PLL5G_CFG6_REFCLK_SEL_M GENMASK(22, 20)
+#define HSIO_PLL5G_CFG6_REFCLK_SEL_X(x) (((x) & GENMASK(22, 20)) >> 20)
+#define HSIO_PLL5G_CFG6_REFCLK_SRC BIT(19)
+#define HSIO_PLL5G_CFG6_POR_DEL_SEL(x) (((x) << 16) & GENMASK(17, 16))
+#define HSIO_PLL5G_CFG6_POR_DEL_SEL_M GENMASK(17, 16)
+#define HSIO_PLL5G_CFG6_POR_DEL_SEL_X(x) (((x) & GENMASK(17, 16)) >> 16)
+#define HSIO_PLL5G_CFG6_DIV125REF_SEL(x) (((x) << 8) & GENMASK(15, 8))
+#define HSIO_PLL5G_CFG6_DIV125REF_SEL_M GENMASK(15, 8)
+#define HSIO_PLL5G_CFG6_DIV125REF_SEL_X(x) (((x) & GENMASK(15, 8)) >> 8)
+#define HSIO_PLL5G_CFG6_ENA_REFCLKC2 BIT(7)
+#define HSIO_PLL5G_CFG6_ENA_FBCLKC2 BIT(6)
+#define HSIO_PLL5G_CFG6_DDR_CLK_DIV(x) ((x) & GENMASK(5, 0))
+#define HSIO_PLL5G_CFG6_DDR_CLK_DIV_M GENMASK(5, 0)
+
+#define HSIO_PLL5G_STATUS0_RANGE_LIM BIT(12)
+#define HSIO_PLL5G_STATUS0_OUT_OF_RANGE_ERR BIT(11)
+#define HSIO_PLL5G_STATUS0_CALIBRATION_ERR BIT(10)
+#define HSIO_PLL5G_STATUS0_CALIBRATION_DONE BIT(9)
+#define HSIO_PLL5G_STATUS0_READBACK_DATA(x) (((x) << 1) & GENMASK(8, 1))
+#define HSIO_PLL5G_STATUS0_READBACK_DATA_M GENMASK(8, 1)
+#define HSIO_PLL5G_STATUS0_READBACK_DATA_X(x) (((x) & GENMASK(8, 1)) >> 1)
+#define HSIO_PLL5G_STATUS0_LOCK_STATUS BIT(0)
+
+#define HSIO_PLL5G_STATUS1_SIG_DEL(x) (((x) << 21) & GENMASK(28, 21))
+#define HSIO_PLL5G_STATUS1_SIG_DEL_M GENMASK(28, 21)
+#define HSIO_PLL5G_STATUS1_SIG_DEL_X(x) (((x) & GENMASK(28, 21)) >> 21)
+#define HSIO_PLL5G_STATUS1_GAIN_STAT(x) (((x) << 16) & GENMASK(20, 16))
+#define HSIO_PLL5G_STATUS1_GAIN_STAT_M GENMASK(20, 16)
+#define HSIO_PLL5G_STATUS1_GAIN_STAT_X(x) (((x) & GENMASK(20, 16)) >> 16)
+#define HSIO_PLL5G_STATUS1_FBCNT_DIF(x) (((x) << 4) & GENMASK(13, 4))
+#define HSIO_PLL5G_STATUS1_FBCNT_DIF_M GENMASK(13, 4)
+#define HSIO_PLL5G_STATUS1_FBCNT_DIF_X(x) (((x) & GENMASK(13, 4)) >> 4)
+#define HSIO_PLL5G_STATUS1_FSM_STAT(x) (((x) << 1) & GENMASK(3, 1))
+#define HSIO_PLL5G_STATUS1_FSM_STAT_M GENMASK(3, 1)
+#define HSIO_PLL5G_STATUS1_FSM_STAT_X(x) (((x) & GENMASK(3, 1)) >> 1)
+#define HSIO_PLL5G_STATUS1_FSM_LOCK BIT(0)
+
+#define HSIO_PLL5G_BIST_CFG0_PLLB_START_BIST BIT(31)
+#define HSIO_PLL5G_BIST_CFG0_PLLB_MEAS_MODE BIT(30)
+#define HSIO_PLL5G_BIST_CFG0_PLLB_LOCK_REPEAT(x) (((x) << 20) & GENMASK(23, 20))
+#define HSIO_PLL5G_BIST_CFG0_PLLB_LOCK_REPEAT_M GENMASK(23, 20)
+#define HSIO_PLL5G_BIST_CFG0_PLLB_LOCK_REPEAT_X(x) (((x) & GENMASK(23, 20)) >> 20)
+#define HSIO_PLL5G_BIST_CFG0_PLLB_LOCK_UNCERT(x) (((x) << 16) & GENMASK(19, 16))
+#define HSIO_PLL5G_BIST_CFG0_PLLB_LOCK_UNCERT_M GENMASK(19, 16)
+#define HSIO_PLL5G_BIST_CFG0_PLLB_LOCK_UNCERT_X(x) (((x) & GENMASK(19, 16)) >> 16)
+#define HSIO_PLL5G_BIST_CFG0_PLLB_DIV_FACTOR_PRE(x) ((x) & GENMASK(15, 0))
+#define HSIO_PLL5G_BIST_CFG0_PLLB_DIV_FACTOR_PRE_M GENMASK(15, 0)
+
+#define HSIO_PLL5G_BIST_STAT0_PLLB_FSM_STAT(x) (((x) << 4) & GENMASK(7, 4))
+#define HSIO_PLL5G_BIST_STAT0_PLLB_FSM_STAT_M GENMASK(7, 4)
+#define HSIO_PLL5G_BIST_STAT0_PLLB_FSM_STAT_X(x) (((x) & GENMASK(7, 4)) >> 4)
+#define HSIO_PLL5G_BIST_STAT0_PLLB_BUSY BIT(2)
+#define HSIO_PLL5G_BIST_STAT0_PLLB_DONE_N BIT(1)
+#define HSIO_PLL5G_BIST_STAT0_PLLB_FAIL BIT(0)
+
+#define HSIO_PLL5G_BIST_STAT1_PLLB_CNT_OUT(x) (((x) << 16) & GENMASK(31, 16))
+#define HSIO_PLL5G_BIST_STAT1_PLLB_CNT_OUT_M GENMASK(31, 16)
+#define HSIO_PLL5G_BIST_STAT1_PLLB_CNT_OUT_X(x) (((x) & GENMASK(31, 16)) >> 16)
+#define HSIO_PLL5G_BIST_STAT1_PLLB_CNT_REF_DIFF(x) ((x) & GENMASK(15, 0))
+#define HSIO_PLL5G_BIST_STAT1_PLLB_CNT_REF_DIFF_M GENMASK(15, 0)
+
+#define HSIO_RCOMP_CFG0_PWD_ENA BIT(13)
+#define HSIO_RCOMP_CFG0_RUN_CAL BIT(12)
+#define HSIO_RCOMP_CFG0_SPEED_SEL(x) (((x) << 10) & GENMASK(11, 10))
+#define HSIO_RCOMP_CFG0_SPEED_SEL_M GENMASK(11, 10)
+#define HSIO_RCOMP_CFG0_SPEED_SEL_X(x) (((x) & GENMASK(11, 10)) >> 10)
+#define HSIO_RCOMP_CFG0_MODE_SEL(x) (((x) << 8) & GENMASK(9, 8))
+#define HSIO_RCOMP_CFG0_MODE_SEL_M GENMASK(9, 8)
+#define HSIO_RCOMP_CFG0_MODE_SEL_X(x) (((x) & GENMASK(9, 8)) >> 8)
+#define HSIO_RCOMP_CFG0_FORCE_ENA BIT(4)
+#define HSIO_RCOMP_CFG0_RCOMP_VAL(x) ((x) & GENMASK(3, 0))
+#define HSIO_RCOMP_CFG0_RCOMP_VAL_M GENMASK(3, 0)
+
+#define HSIO_RCOMP_STATUS_BUSY BIT(12)
+#define HSIO_RCOMP_STATUS_DELTA_ALERT BIT(7)
+#define HSIO_RCOMP_STATUS_RCOMP(x) ((x) & GENMASK(3, 0))
+#define HSIO_RCOMP_STATUS_RCOMP_M GENMASK(3, 0)
+
+#define HSIO_SYNC_ETH_CFG_RSZ 0x4
+
+#define HSIO_SYNC_ETH_CFG_SEL_RECO_CLK_SRC(x) (((x) << 4) & GENMASK(7, 4))
+#define HSIO_SYNC_ETH_CFG_SEL_RECO_CLK_SRC_M GENMASK(7, 4)
+#define HSIO_SYNC_ETH_CFG_SEL_RECO_CLK_SRC_X(x) (((x) & GENMASK(7, 4)) >> 4)
+#define HSIO_SYNC_ETH_CFG_SEL_RECO_CLK_DIV(x) (((x) << 1) & GENMASK(3, 1))
+#define HSIO_SYNC_ETH_CFG_SEL_RECO_CLK_DIV_M GENMASK(3, 1)
+#define HSIO_SYNC_ETH_CFG_SEL_RECO_CLK_DIV_X(x) (((x) & GENMASK(3, 1)) >> 1)
+#define HSIO_SYNC_ETH_CFG_RECO_CLK_ENA BIT(0)
+
+#define HSIO_SYNC_ETH_PLL_CFG_PLL_AUTO_SQUELCH_ENA BIT(0)
+
+#define HSIO_S1G_DES_CFG_DES_PHS_CTRL(x) (((x) << 13) & GENMASK(16, 13))
+#define HSIO_S1G_DES_CFG_DES_PHS_CTRL_M GENMASK(16, 13)
+#define HSIO_S1G_DES_CFG_DES_PHS_CTRL_X(x) (((x) & GENMASK(16, 13)) >> 13)
+#define HSIO_S1G_DES_CFG_DES_CPMD_SEL(x) (((x) << 11) & GENMASK(12, 11))
+#define HSIO_S1G_DES_CFG_DES_CPMD_SEL_M GENMASK(12, 11)
+#define HSIO_S1G_DES_CFG_DES_CPMD_SEL_X(x) (((x) & GENMASK(12, 11)) >> 11)
+#define HSIO_S1G_DES_CFG_DES_MBTR_CTRL(x) (((x) << 8) & GENMASK(10, 8))
+#define HSIO_S1G_DES_CFG_DES_MBTR_CTRL_M GENMASK(10, 8)
+#define HSIO_S1G_DES_CFG_DES_MBTR_CTRL_X(x) (((x) & GENMASK(10, 8)) >> 8)
+#define HSIO_S1G_DES_CFG_DES_BW_ANA(x) (((x) << 5) & GENMASK(7, 5))
+#define HSIO_S1G_DES_CFG_DES_BW_ANA_M GENMASK(7, 5)
+#define HSIO_S1G_DES_CFG_DES_BW_ANA_X(x) (((x) & GENMASK(7, 5)) >> 5)
+#define HSIO_S1G_DES_CFG_DES_SWAP_ANA BIT(4)
+#define HSIO_S1G_DES_CFG_DES_BW_HYST(x) (((x) << 1) & GENMASK(3, 1))
+#define HSIO_S1G_DES_CFG_DES_BW_HYST_M GENMASK(3, 1)
+#define HSIO_S1G_DES_CFG_DES_BW_HYST_X(x) (((x) & GENMASK(3, 1)) >> 1)
+#define HSIO_S1G_DES_CFG_DES_SWAP_HYST BIT(0)
+
+#define HSIO_S1G_IB_CFG_IB_FX100_ENA BIT(27)
+#define HSIO_S1G_IB_CFG_ACJTAG_HYST(x) (((x) << 24) & GENMASK(26, 24))
+#define HSIO_S1G_IB_CFG_ACJTAG_HYST_M GENMASK(26, 24)
+#define HSIO_S1G_IB_CFG_ACJTAG_HYST_X(x) (((x) & GENMASK(26, 24)) >> 24)
+#define HSIO_S1G_IB_CFG_IB_DET_LEV(x) (((x) << 19) & GENMASK(21, 19))
+#define HSIO_S1G_IB_CFG_IB_DET_LEV_M GENMASK(21, 19)
+#define HSIO_S1G_IB_CFG_IB_DET_LEV_X(x) (((x) & GENMASK(21, 19)) >> 19)
+#define HSIO_S1G_IB_CFG_IB_HYST_LEV BIT(14)
+#define HSIO_S1G_IB_CFG_IB_ENA_CMV_TERM BIT(13)
+#define HSIO_S1G_IB_CFG_IB_ENA_DC_COUPLING BIT(12)
+#define HSIO_S1G_IB_CFG_IB_ENA_DETLEV BIT(11)
+#define HSIO_S1G_IB_CFG_IB_ENA_HYST BIT(10)
+#define HSIO_S1G_IB_CFG_IB_ENA_OFFSET_COMP BIT(9)
+#define HSIO_S1G_IB_CFG_IB_EQ_GAIN(x) (((x) << 6) & GENMASK(8, 6))
+#define HSIO_S1G_IB_CFG_IB_EQ_GAIN_M GENMASK(8, 6)
+#define HSIO_S1G_IB_CFG_IB_EQ_GAIN_X(x) (((x) & GENMASK(8, 6)) >> 6)
+#define HSIO_S1G_IB_CFG_IB_SEL_CORNER_FREQ(x) (((x) << 4) & GENMASK(5, 4))
+#define HSIO_S1G_IB_CFG_IB_SEL_CORNER_FREQ_M GENMASK(5, 4)
+#define HSIO_S1G_IB_CFG_IB_SEL_CORNER_FREQ_X(x) (((x) & GENMASK(5, 4)) >> 4)
+#define HSIO_S1G_IB_CFG_IB_RESISTOR_CTRL(x) ((x) & GENMASK(3, 0))
+#define HSIO_S1G_IB_CFG_IB_RESISTOR_CTRL_M GENMASK(3, 0)
+
+#define HSIO_S1G_OB_CFG_OB_SLP(x) (((x) << 17) & GENMASK(18, 17))
+#define HSIO_S1G_OB_CFG_OB_SLP_M GENMASK(18, 17)
+#define HSIO_S1G_OB_CFG_OB_SLP_X(x) (((x) & GENMASK(18, 17)) >> 17)
+#define HSIO_S1G_OB_CFG_OB_AMP_CTRL(x) (((x) << 13) & GENMASK(16, 13))
+#define HSIO_S1G_OB_CFG_OB_AMP_CTRL_M GENMASK(16, 13)
+#define HSIO_S1G_OB_CFG_OB_AMP_CTRL_X(x) (((x) & GENMASK(16, 13)) >> 13)
+#define HSIO_S1G_OB_CFG_OB_CMM_BIAS_CTRL(x) (((x) << 10) & GENMASK(12, 10))
+#define HSIO_S1G_OB_CFG_OB_CMM_BIAS_CTRL_M GENMASK(12, 10)
+#define HSIO_S1G_OB_CFG_OB_CMM_BIAS_CTRL_X(x) (((x) & GENMASK(12, 10)) >> 10)
+#define HSIO_S1G_OB_CFG_OB_DIS_VCM_CTRL BIT(9)
+#define HSIO_S1G_OB_CFG_OB_EN_MEAS_VREG BIT(8)
+#define HSIO_S1G_OB_CFG_OB_VCM_CTRL(x) (((x) << 4) & GENMASK(7, 4))
+#define HSIO_S1G_OB_CFG_OB_VCM_CTRL_M GENMASK(7, 4)
+#define HSIO_S1G_OB_CFG_OB_VCM_CTRL_X(x) (((x) & GENMASK(7, 4)) >> 4)
+#define HSIO_S1G_OB_CFG_OB_RESISTOR_CTRL(x) ((x) & GENMASK(3, 0))
+#define HSIO_S1G_OB_CFG_OB_RESISTOR_CTRL_M GENMASK(3, 0)
+
+#define HSIO_S1G_SER_CFG_SER_IDLE BIT(9)
+#define HSIO_S1G_SER_CFG_SER_DEEMPH BIT(8)
+#define HSIO_S1G_SER_CFG_SER_CPMD_SEL BIT(7)
+#define HSIO_S1G_SER_CFG_SER_SWAP_CPMD BIT(6)
+#define HSIO_S1G_SER_CFG_SER_ALISEL(x) (((x) << 4) & GENMASK(5, 4))
+#define HSIO_S1G_SER_CFG_SER_ALISEL_M GENMASK(5, 4)
+#define HSIO_S1G_SER_CFG_SER_ALISEL_X(x) (((x) & GENMASK(5, 4)) >> 4)
+#define HSIO_S1G_SER_CFG_SER_ENHYS BIT(3)
+#define HSIO_S1G_SER_CFG_SER_BIG_WIN BIT(2)
+#define HSIO_S1G_SER_CFG_SER_EN_WIN BIT(1)
+#define HSIO_S1G_SER_CFG_SER_ENALI BIT(0)
+
+#define HSIO_S1G_COMMON_CFG_SYS_RST BIT(31)
+#define HSIO_S1G_COMMON_CFG_SE_AUTO_SQUELCH_ENA BIT(21)
+#define HSIO_S1G_COMMON_CFG_ENA_LANE BIT(18)
+#define HSIO_S1G_COMMON_CFG_PWD_RX BIT(17)
+#define HSIO_S1G_COMMON_CFG_PWD_TX BIT(16)
+#define HSIO_S1G_COMMON_CFG_LANE_CTRL(x) (((x) << 13) & GENMASK(15, 13))
+#define HSIO_S1G_COMMON_CFG_LANE_CTRL_M GENMASK(15, 13)
+#define HSIO_S1G_COMMON_CFG_LANE_CTRL_X(x) (((x) & GENMASK(15, 13)) >> 13)
+#define HSIO_S1G_COMMON_CFG_ENA_DIRECT BIT(12)
+#define HSIO_S1G_COMMON_CFG_ENA_ELOOP BIT(11)
+#define HSIO_S1G_COMMON_CFG_ENA_FLOOP BIT(10)
+#define HSIO_S1G_COMMON_CFG_ENA_ILOOP BIT(9)
+#define HSIO_S1G_COMMON_CFG_ENA_PLOOP BIT(8)
+#define HSIO_S1G_COMMON_CFG_HRATE BIT(7)
+#define HSIO_S1G_COMMON_CFG_IF_MODE BIT(0)
+
+#define HSIO_S1G_PLL_CFG_PLL_ENA_FB_DIV2 BIT(22)
+#define HSIO_S1G_PLL_CFG_PLL_ENA_RC_DIV2 BIT(21)
+#define HSIO_S1G_PLL_CFG_PLL_FSM_CTRL_DATA(x) (((x) << 8) & GENMASK(15, 8))
+#define HSIO_S1G_PLL_CFG_PLL_FSM_CTRL_DATA_M GENMASK(15, 8)
+#define HSIO_S1G_PLL_CFG_PLL_FSM_CTRL_DATA_X(x) (((x) & GENMASK(15, 8)) >> 8)
+#define HSIO_S1G_PLL_CFG_PLL_FSM_ENA BIT(7)
+#define HSIO_S1G_PLL_CFG_PLL_FSM_FORCE_SET_ENA BIT(6)
+#define HSIO_S1G_PLL_CFG_PLL_FSM_OOR_RECAL_ENA BIT(5)
+#define HSIO_S1G_PLL_CFG_PLL_RB_DATA_SEL BIT(3)
+
+#define HSIO_S1G_PLL_STATUS_PLL_CAL_NOT_DONE BIT(12)
+#define HSIO_S1G_PLL_STATUS_PLL_CAL_ERR BIT(11)
+#define HSIO_S1G_PLL_STATUS_PLL_OUT_OF_RANGE_ERR BIT(10)
+#define HSIO_S1G_PLL_STATUS_PLL_RB_DATA(x) ((x) & GENMASK(7, 0))
+#define HSIO_S1G_PLL_STATUS_PLL_RB_DATA_M GENMASK(7, 0)
+
+#define HSIO_S1G_DFT_CFG0_LAZYBIT BIT(31)
+#define HSIO_S1G_DFT_CFG0_INV_DIS BIT(23)
+#define HSIO_S1G_DFT_CFG0_PRBS_SEL(x) (((x) << 20) & GENMASK(21, 20))
+#define HSIO_S1G_DFT_CFG0_PRBS_SEL_M GENMASK(21, 20)
+#define HSIO_S1G_DFT_CFG0_PRBS_SEL_X(x) (((x) & GENMASK(21, 20)) >> 20)
+#define HSIO_S1G_DFT_CFG0_TEST_MODE(x) (((x) << 16) & GENMASK(18, 16))
+#define HSIO_S1G_DFT_CFG0_TEST_MODE_M GENMASK(18, 16)
+#define HSIO_S1G_DFT_CFG0_TEST_MODE_X(x) (((x) & GENMASK(18, 16)) >> 16)
+#define HSIO_S1G_DFT_CFG0_RX_PHS_CORR_DIS BIT(4)
+#define HSIO_S1G_DFT_CFG0_RX_PDSENS_ENA BIT(3)
+#define HSIO_S1G_DFT_CFG0_RX_DFT_ENA BIT(2)
+#define HSIO_S1G_DFT_CFG0_TX_DFT_ENA BIT(0)
+
+#define HSIO_S1G_DFT_CFG1_TX_JITTER_AMPL(x) (((x) << 8) & GENMASK(17, 8))
+#define HSIO_S1G_DFT_CFG1_TX_JITTER_AMPL_M GENMASK(17, 8)
+#define HSIO_S1G_DFT_CFG1_TX_JITTER_AMPL_X(x) (((x) & GENMASK(17, 8)) >> 8)
+#define HSIO_S1G_DFT_CFG1_TX_STEP_FREQ(x) (((x) << 4) & GENMASK(7, 4))
+#define HSIO_S1G_DFT_CFG1_TX_STEP_FREQ_M GENMASK(7, 4)
+#define HSIO_S1G_DFT_CFG1_TX_STEP_FREQ_X(x) (((x) & GENMASK(7, 4)) >> 4)
+#define HSIO_S1G_DFT_CFG1_TX_JI_ENA BIT(3)
+#define HSIO_S1G_DFT_CFG1_TX_WAVEFORM_SEL BIT(2)
+#define HSIO_S1G_DFT_CFG1_TX_FREQOFF_DIR BIT(1)
+#define HSIO_S1G_DFT_CFG1_TX_FREQOFF_ENA BIT(0)
+
+#define HSIO_S1G_DFT_CFG2_RX_JITTER_AMPL(x) (((x) << 8) & GENMASK(17, 8))
+#define HSIO_S1G_DFT_CFG2_RX_JITTER_AMPL_M GENMASK(17, 8)
+#define HSIO_S1G_DFT_CFG2_RX_JITTER_AMPL_X(x) (((x) & GENMASK(17, 8)) >> 8)
+#define HSIO_S1G_DFT_CFG2_RX_STEP_FREQ(x) (((x) << 4) & GENMASK(7, 4))
+#define HSIO_S1G_DFT_CFG2_RX_STEP_FREQ_M GENMASK(7, 4)
+#define HSIO_S1G_DFT_CFG2_RX_STEP_FREQ_X(x) (((x) & GENMASK(7, 4)) >> 4)
+#define HSIO_S1G_DFT_CFG2_RX_JI_ENA BIT(3)
+#define HSIO_S1G_DFT_CFG2_RX_WAVEFORM_SEL BIT(2)
+#define HSIO_S1G_DFT_CFG2_RX_FREQOFF_DIR BIT(1)
+#define HSIO_S1G_DFT_CFG2_RX_FREQOFF_ENA BIT(0)
+
+#define HSIO_S1G_RC_PLL_BIST_CFG_PLL_BIST_ENA BIT(20)
+#define HSIO_S1G_RC_PLL_BIST_CFG_PLL_BIST_FBS_HIGH(x) (((x) << 16) & GENMASK(17, 16))
+#define HSIO_S1G_RC_PLL_BIST_CFG_PLL_BIST_FBS_HIGH_M GENMASK(17, 16)
+#define HSIO_S1G_RC_PLL_BIST_CFG_PLL_BIST_FBS_HIGH_X(x) (((x) & GENMASK(17, 16)) >> 16)
+#define HSIO_S1G_RC_PLL_BIST_CFG_PLL_BIST_HIGH(x) (((x) << 8) & GENMASK(15, 8))
+#define HSIO_S1G_RC_PLL_BIST_CFG_PLL_BIST_HIGH_M GENMASK(15, 8)
+#define HSIO_S1G_RC_PLL_BIST_CFG_PLL_BIST_HIGH_X(x) (((x) & GENMASK(15, 8)) >> 8)
+#define HSIO_S1G_RC_PLL_BIST_CFG_PLL_BIST_LOW(x) ((x) & GENMASK(7, 0))
+#define HSIO_S1G_RC_PLL_BIST_CFG_PLL_BIST_LOW_M GENMASK(7, 0)
+
+#define HSIO_S1G_MISC_CFG_DES_100FX_KICK_MODE(x) (((x) << 11) & GENMASK(12, 11))
+#define HSIO_S1G_MISC_CFG_DES_100FX_KICK_MODE_M GENMASK(12, 11)
+#define HSIO_S1G_MISC_CFG_DES_100FX_KICK_MODE_X(x) (((x) & GENMASK(12, 11)) >> 11)
+#define HSIO_S1G_MISC_CFG_DES_100FX_CPMD_SWAP BIT(10)
+#define HSIO_S1G_MISC_CFG_DES_100FX_CPMD_MODE BIT(9)
+#define HSIO_S1G_MISC_CFG_DES_100FX_CPMD_ENA BIT(8)
+#define HSIO_S1G_MISC_CFG_RX_LPI_MODE_ENA BIT(5)
+#define HSIO_S1G_MISC_CFG_TX_LPI_MODE_ENA BIT(4)
+#define HSIO_S1G_MISC_CFG_RX_DATA_INV_ENA BIT(3)
+#define HSIO_S1G_MISC_CFG_TX_DATA_INV_ENA BIT(2)
+#define HSIO_S1G_MISC_CFG_LANE_RST BIT(0)
+
+#define HSIO_S1G_DFT_STATUS_PLL_BIST_NOT_DONE BIT(7)
+#define HSIO_S1G_DFT_STATUS_PLL_BIST_FAILED BIT(6)
+#define HSIO_S1G_DFT_STATUS_PLL_BIST_TIMEOUT_ERR BIT(5)
+#define HSIO_S1G_DFT_STATUS_BIST_ACTIVE BIT(3)
+#define HSIO_S1G_DFT_STATUS_BIST_NOSYNC BIT(2)
+#define HSIO_S1G_DFT_STATUS_BIST_COMPLETE_N BIT(1)
+#define HSIO_S1G_DFT_STATUS_BIST_ERROR BIT(0)
+
+#define HSIO_S1G_MISC_STATUS_DES_100FX_PHASE_SEL BIT(0)
+
+#define HSIO_MCB_S1G_ADDR_CFG_SERDES1G_WR_ONE_SHOT BIT(31)
+#define HSIO_MCB_S1G_ADDR_CFG_SERDES1G_RD_ONE_SHOT BIT(30)
+#define HSIO_MCB_S1G_ADDR_CFG_SERDES1G_ADDR(x) ((x) & GENMASK(8, 0))
+#define HSIO_MCB_S1G_ADDR_CFG_SERDES1G_ADDR_M GENMASK(8, 0)
+
+#define HSIO_S6G_DIG_CFG_GP(x) (((x) << 16) & GENMASK(18, 16))
+#define HSIO_S6G_DIG_CFG_GP_M GENMASK(18, 16)
+#define HSIO_S6G_DIG_CFG_GP_X(x) (((x) & GENMASK(18, 16)) >> 16)
+#define HSIO_S6G_DIG_CFG_TX_BIT_DOUBLING_MODE_ENA BIT(7)
+#define HSIO_S6G_DIG_CFG_SIGDET_TESTMODE BIT(6)
+#define HSIO_S6G_DIG_CFG_SIGDET_AST(x) (((x) << 3) & GENMASK(5, 3))
+#define HSIO_S6G_DIG_CFG_SIGDET_AST_M GENMASK(5, 3)
+#define HSIO_S6G_DIG_CFG_SIGDET_AST_X(x) (((x) & GENMASK(5, 3)) >> 3)
+#define HSIO_S6G_DIG_CFG_SIGDET_DST(x) ((x) & GENMASK(2, 0))
+#define HSIO_S6G_DIG_CFG_SIGDET_DST_M GENMASK(2, 0)
+
+#define HSIO_S6G_DFT_CFG0_LAZYBIT BIT(31)
+#define HSIO_S6G_DFT_CFG0_INV_DIS BIT(23)
+#define HSIO_S6G_DFT_CFG0_PRBS_SEL(x) (((x) << 20) & GENMASK(21, 20))
+#define HSIO_S6G_DFT_CFG0_PRBS_SEL_M GENMASK(21, 20)
+#define HSIO_S6G_DFT_CFG0_PRBS_SEL_X(x) (((x) & GENMASK(21, 20)) >> 20)
+#define HSIO_S6G_DFT_CFG0_TEST_MODE(x) (((x) << 16) & GENMASK(18, 16))
+#define HSIO_S6G_DFT_CFG0_TEST_MODE_M GENMASK(18, 16)
+#define HSIO_S6G_DFT_CFG0_TEST_MODE_X(x) (((x) & GENMASK(18, 16)) >> 16)
+#define HSIO_S6G_DFT_CFG0_RX_PHS_CORR_DIS BIT(4)
+#define HSIO_S6G_DFT_CFG0_RX_PDSENS_ENA BIT(3)
+#define HSIO_S6G_DFT_CFG0_RX_DFT_ENA BIT(2)
+#define HSIO_S6G_DFT_CFG0_TX_DFT_ENA BIT(0)
+
+#define HSIO_S6G_DFT_CFG1_TX_JITTER_AMPL(x) (((x) << 8) & GENMASK(17, 8))
+#define HSIO_S6G_DFT_CFG1_TX_JITTER_AMPL_M GENMASK(17, 8)
+#define HSIO_S6G_DFT_CFG1_TX_JITTER_AMPL_X(x) (((x) & GENMASK(17, 8)) >> 8)
+#define HSIO_S6G_DFT_CFG1_TX_STEP_FREQ(x) (((x) << 4) & GENMASK(7, 4))
+#define HSIO_S6G_DFT_CFG1_TX_STEP_FREQ_M GENMASK(7, 4)
+#define HSIO_S6G_DFT_CFG1_TX_STEP_FREQ_X(x) (((x) & GENMASK(7, 4)) >> 4)
+#define HSIO_S6G_DFT_CFG1_TX_JI_ENA BIT(3)
+#define HSIO_S6G_DFT_CFG1_TX_WAVEFORM_SEL BIT(2)
+#define HSIO_S6G_DFT_CFG1_TX_FREQOFF_DIR BIT(1)
+#define HSIO_S6G_DFT_CFG1_TX_FREQOFF_ENA BIT(0)
+
+#define HSIO_S6G_DFT_CFG2_RX_JITTER_AMPL(x) (((x) << 8) & GENMASK(17, 8))
+#define HSIO_S6G_DFT_CFG2_RX_JITTER_AMPL_M GENMASK(17, 8)
+#define HSIO_S6G_DFT_CFG2_RX_JITTER_AMPL_X(x) (((x) & GENMASK(17, 8)) >> 8)
+#define HSIO_S6G_DFT_CFG2_RX_STEP_FREQ(x) (((x) << 4) & GENMASK(7, 4))
+#define HSIO_S6G_DFT_CFG2_RX_STEP_FREQ_M GENMASK(7, 4)
+#define HSIO_S6G_DFT_CFG2_RX_STEP_FREQ_X(x) (((x) & GENMASK(7, 4)) >> 4)
+#define HSIO_S6G_DFT_CFG2_RX_JI_ENA BIT(3)
+#define HSIO_S6G_DFT_CFG2_RX_WAVEFORM_SEL BIT(2)
+#define HSIO_S6G_DFT_CFG2_RX_FREQOFF_DIR BIT(1)
+#define HSIO_S6G_DFT_CFG2_RX_FREQOFF_ENA BIT(0)
+
+#define HSIO_S6G_RC_PLL_BIST_CFG_PLL_BIST_ENA BIT(20)
+#define HSIO_S6G_RC_PLL_BIST_CFG_PLL_BIST_FBS_HIGH(x) (((x) << 16) & GENMASK(19, 16))
+#define HSIO_S6G_RC_PLL_BIST_CFG_PLL_BIST_FBS_HIGH_M GENMASK(19, 16)
+#define HSIO_S6G_RC_PLL_BIST_CFG_PLL_BIST_FBS_HIGH_X(x) (((x) & GENMASK(19, 16)) >> 16)
+#define HSIO_S6G_RC_PLL_BIST_CFG_PLL_BIST_HIGH(x) (((x) << 8) & GENMASK(15, 8))
+#define HSIO_S6G_RC_PLL_BIST_CFG_PLL_BIST_HIGH_M GENMASK(15, 8)
+#define HSIO_S6G_RC_PLL_BIST_CFG_PLL_BIST_HIGH_X(x) (((x) & GENMASK(15, 8)) >> 8)
+#define HSIO_S6G_RC_PLL_BIST_CFG_PLL_BIST_LOW(x) ((x) & GENMASK(7, 0))
+#define HSIO_S6G_RC_PLL_BIST_CFG_PLL_BIST_LOW_M GENMASK(7, 0)
+
+#define HSIO_S6G_MISC_CFG_SEL_RECO_CLK(x) (((x) << 13) & GENMASK(14, 13))
+#define HSIO_S6G_MISC_CFG_SEL_RECO_CLK_M GENMASK(14, 13)
+#define HSIO_S6G_MISC_CFG_SEL_RECO_CLK_X(x) (((x) & GENMASK(14, 13)) >> 13)
+#define HSIO_S6G_MISC_CFG_DES_100FX_KICK_MODE(x) (((x) << 11) & GENMASK(12, 11))
+#define HSIO_S6G_MISC_CFG_DES_100FX_KICK_MODE_M GENMASK(12, 11)
+#define HSIO_S6G_MISC_CFG_DES_100FX_KICK_MODE_X(x) (((x) & GENMASK(12, 11)) >> 11)
+#define HSIO_S6G_MISC_CFG_DES_100FX_CPMD_SWAP BIT(10)
+#define HSIO_S6G_MISC_CFG_DES_100FX_CPMD_MODE BIT(9)
+#define HSIO_S6G_MISC_CFG_DES_100FX_CPMD_ENA BIT(8)
+#define HSIO_S6G_MISC_CFG_RX_BUS_FLIP_ENA BIT(7)
+#define HSIO_S6G_MISC_CFG_TX_BUS_FLIP_ENA BIT(6)
+#define HSIO_S6G_MISC_CFG_RX_LPI_MODE_ENA BIT(5)
+#define HSIO_S6G_MISC_CFG_TX_LPI_MODE_ENA BIT(4)
+#define HSIO_S6G_MISC_CFG_RX_DATA_INV_ENA BIT(3)
+#define HSIO_S6G_MISC_CFG_TX_DATA_INV_ENA BIT(2)
+#define HSIO_S6G_MISC_CFG_LANE_RST BIT(0)
+
+#define HSIO_S6G_OB_ANEG_CFG_AN_OB_POST0(x) (((x) << 23) & GENMASK(28, 23))
+#define HSIO_S6G_OB_ANEG_CFG_AN_OB_POST0_M GENMASK(28, 23)
+#define HSIO_S6G_OB_ANEG_CFG_AN_OB_POST0_X(x) (((x) & GENMASK(28, 23)) >> 23)
+#define HSIO_S6G_OB_ANEG_CFG_AN_OB_POST1(x) (((x) << 18) & GENMASK(22, 18))
+#define HSIO_S6G_OB_ANEG_CFG_AN_OB_POST1_M GENMASK(22, 18)
+#define HSIO_S6G_OB_ANEG_CFG_AN_OB_POST1_X(x) (((x) & GENMASK(22, 18)) >> 18)
+#define HSIO_S6G_OB_ANEG_CFG_AN_OB_PREC(x) (((x) << 13) & GENMASK(17, 13))
+#define HSIO_S6G_OB_ANEG_CFG_AN_OB_PREC_M GENMASK(17, 13)
+#define HSIO_S6G_OB_ANEG_CFG_AN_OB_PREC_X(x) (((x) & GENMASK(17, 13)) >> 13)
+#define HSIO_S6G_OB_ANEG_CFG_AN_OB_ENA_CAS(x) (((x) << 6) & GENMASK(8, 6))
+#define HSIO_S6G_OB_ANEG_CFG_AN_OB_ENA_CAS_M GENMASK(8, 6)
+#define HSIO_S6G_OB_ANEG_CFG_AN_OB_ENA_CAS_X(x) (((x) & GENMASK(8, 6)) >> 6)
+#define HSIO_S6G_OB_ANEG_CFG_AN_OB_LEV(x) ((x) & GENMASK(5, 0))
+#define HSIO_S6G_OB_ANEG_CFG_AN_OB_LEV_M GENMASK(5, 0)
+
+#define HSIO_S6G_DFT_STATUS_PRBS_SYNC_STAT BIT(8)
+#define HSIO_S6G_DFT_STATUS_PLL_BIST_NOT_DONE BIT(7)
+#define HSIO_S6G_DFT_STATUS_PLL_BIST_FAILED BIT(6)
+#define HSIO_S6G_DFT_STATUS_PLL_BIST_TIMEOUT_ERR BIT(5)
+#define HSIO_S6G_DFT_STATUS_BIST_ACTIVE BIT(3)
+#define HSIO_S6G_DFT_STATUS_BIST_NOSYNC BIT(2)
+#define HSIO_S6G_DFT_STATUS_BIST_COMPLETE_N BIT(1)
+#define HSIO_S6G_DFT_STATUS_BIST_ERROR BIT(0)
+
+#define HSIO_S6G_MISC_STATUS_DES_100FX_PHASE_SEL BIT(0)
+
+#define HSIO_S6G_DES_CFG_DES_PHS_CTRL(x) (((x) << 13) & GENMASK(16, 13))
+#define HSIO_S6G_DES_CFG_DES_PHS_CTRL_M GENMASK(16, 13)
+#define HSIO_S6G_DES_CFG_DES_PHS_CTRL_X(x) (((x) & GENMASK(16, 13)) >> 13)
+#define HSIO_S6G_DES_CFG_DES_MBTR_CTRL(x) (((x) << 10) & GENMASK(12, 10))
+#define HSIO_S6G_DES_CFG_DES_MBTR_CTRL_M GENMASK(12, 10)
+#define HSIO_S6G_DES_CFG_DES_MBTR_CTRL_X(x) (((x) & GENMASK(12, 10)) >> 10)
+#define HSIO_S6G_DES_CFG_DES_CPMD_SEL(x) (((x) << 8) & GENMASK(9, 8))
+#define HSIO_S6G_DES_CFG_DES_CPMD_SEL_M GENMASK(9, 8)
+#define HSIO_S6G_DES_CFG_DES_CPMD_SEL_X(x) (((x) & GENMASK(9, 8)) >> 8)
+#define HSIO_S6G_DES_CFG_DES_BW_HYST(x) (((x) << 5) & GENMASK(7, 5))
+#define HSIO_S6G_DES_CFG_DES_BW_HYST_M GENMASK(7, 5)
+#define HSIO_S6G_DES_CFG_DES_BW_HYST_X(x) (((x) & GENMASK(7, 5)) >> 5)
+#define HSIO_S6G_DES_CFG_DES_SWAP_HYST BIT(4)
+#define HSIO_S6G_DES_CFG_DES_BW_ANA(x) (((x) << 1) & GENMASK(3, 1))
+#define HSIO_S6G_DES_CFG_DES_BW_ANA_M GENMASK(3, 1)
+#define HSIO_S6G_DES_CFG_DES_BW_ANA_X(x) (((x) & GENMASK(3, 1)) >> 1)
+#define HSIO_S6G_DES_CFG_DES_SWAP_ANA BIT(0)
+
+#define HSIO_S6G_IB_CFG_IB_SOFSI(x) (((x) << 29) & GENMASK(30, 29))
+#define HSIO_S6G_IB_CFG_IB_SOFSI_M GENMASK(30, 29)
+#define HSIO_S6G_IB_CFG_IB_SOFSI_X(x) (((x) & GENMASK(30, 29)) >> 29)
+#define HSIO_S6G_IB_CFG_IB_VBULK_SEL BIT(28)
+#define HSIO_S6G_IB_CFG_IB_RTRM_ADJ(x) (((x) << 24) & GENMASK(27, 24))
+#define HSIO_S6G_IB_CFG_IB_RTRM_ADJ_M GENMASK(27, 24)
+#define HSIO_S6G_IB_CFG_IB_RTRM_ADJ_X(x) (((x) & GENMASK(27, 24)) >> 24)
+#define HSIO_S6G_IB_CFG_IB_ICML_ADJ(x) (((x) << 20) & GENMASK(23, 20))
+#define HSIO_S6G_IB_CFG_IB_ICML_ADJ_M GENMASK(23, 20)
+#define HSIO_S6G_IB_CFG_IB_ICML_ADJ_X(x) (((x) & GENMASK(23, 20)) >> 20)
+#define HSIO_S6G_IB_CFG_IB_TERM_MODE_SEL(x) (((x) << 18) & GENMASK(19, 18))
+#define HSIO_S6G_IB_CFG_IB_TERM_MODE_SEL_M GENMASK(19, 18)
+#define HSIO_S6G_IB_CFG_IB_TERM_MODE_SEL_X(x) (((x) & GENMASK(19, 18)) >> 18)
+#define HSIO_S6G_IB_CFG_IB_SIG_DET_CLK_SEL(x) (((x) << 15) & GENMASK(17, 15))
+#define HSIO_S6G_IB_CFG_IB_SIG_DET_CLK_SEL_M GENMASK(17, 15)
+#define HSIO_S6G_IB_CFG_IB_SIG_DET_CLK_SEL_X(x) (((x) & GENMASK(17, 15)) >> 15)
+#define HSIO_S6G_IB_CFG_IB_REG_PAT_SEL_HP(x) (((x) << 13) & GENMASK(14, 13))
+#define HSIO_S6G_IB_CFG_IB_REG_PAT_SEL_HP_M GENMASK(14, 13)
+#define HSIO_S6G_IB_CFG_IB_REG_PAT_SEL_HP_X(x) (((x) & GENMASK(14, 13)) >> 13)
+#define HSIO_S6G_IB_CFG_IB_REG_PAT_SEL_MID(x) (((x) << 11) & GENMASK(12, 11))
+#define HSIO_S6G_IB_CFG_IB_REG_PAT_SEL_MID_M GENMASK(12, 11)
+#define HSIO_S6G_IB_CFG_IB_REG_PAT_SEL_MID_X(x) (((x) & GENMASK(12, 11)) >> 11)
+#define HSIO_S6G_IB_CFG_IB_REG_PAT_SEL_LP(x) (((x) << 9) & GENMASK(10, 9))
+#define HSIO_S6G_IB_CFG_IB_REG_PAT_SEL_LP_M GENMASK(10, 9)
+#define HSIO_S6G_IB_CFG_IB_REG_PAT_SEL_LP_X(x) (((x) & GENMASK(10, 9)) >> 9)
+#define HSIO_S6G_IB_CFG_IB_REG_PAT_SEL_OFFSET(x) (((x) << 7) & GENMASK(8, 7))
+#define HSIO_S6G_IB_CFG_IB_REG_PAT_SEL_OFFSET_M GENMASK(8, 7)
+#define HSIO_S6G_IB_CFG_IB_REG_PAT_SEL_OFFSET_X(x) (((x) & GENMASK(8, 7)) >> 7)
+#define HSIO_S6G_IB_CFG_IB_ANA_TEST_ENA BIT(6)
+#define HSIO_S6G_IB_CFG_IB_SIG_DET_ENA BIT(5)
+#define HSIO_S6G_IB_CFG_IB_CONCUR BIT(4)
+#define HSIO_S6G_IB_CFG_IB_CAL_ENA BIT(3)
+#define HSIO_S6G_IB_CFG_IB_SAM_ENA BIT(2)
+#define HSIO_S6G_IB_CFG_IB_EQZ_ENA BIT(1)
+#define HSIO_S6G_IB_CFG_IB_REG_ENA BIT(0)
+
+#define HSIO_S6G_IB_CFG1_IB_TJTAG(x) (((x) << 17) & GENMASK(21, 17))
+#define HSIO_S6G_IB_CFG1_IB_TJTAG_M GENMASK(21, 17)
+#define HSIO_S6G_IB_CFG1_IB_TJTAG_X(x) (((x) & GENMASK(21, 17)) >> 17)
+#define HSIO_S6G_IB_CFG1_IB_TSDET(x) (((x) << 12) & GENMASK(16, 12))
+#define HSIO_S6G_IB_CFG1_IB_TSDET_M GENMASK(16, 12)
+#define HSIO_S6G_IB_CFG1_IB_TSDET_X(x) (((x) & GENMASK(16, 12)) >> 12)
+#define HSIO_S6G_IB_CFG1_IB_SCALY(x) (((x) << 8) & GENMASK(11, 8))
+#define HSIO_S6G_IB_CFG1_IB_SCALY_M GENMASK(11, 8)
+#define HSIO_S6G_IB_CFG1_IB_SCALY_X(x) (((x) & GENMASK(11, 8)) >> 8)
+#define HSIO_S6G_IB_CFG1_IB_FILT_HP BIT(7)
+#define HSIO_S6G_IB_CFG1_IB_FILT_MID BIT(6)
+#define HSIO_S6G_IB_CFG1_IB_FILT_LP BIT(5)
+#define HSIO_S6G_IB_CFG1_IB_FILT_OFFSET BIT(4)
+#define HSIO_S6G_IB_CFG1_IB_FRC_HP BIT(3)
+#define HSIO_S6G_IB_CFG1_IB_FRC_MID BIT(2)
+#define HSIO_S6G_IB_CFG1_IB_FRC_LP BIT(1)
+#define HSIO_S6G_IB_CFG1_IB_FRC_OFFSET BIT(0)
+
+#define HSIO_S6G_IB_CFG2_IB_TINFV(x) (((x) << 27) & GENMASK(29, 27))
+#define HSIO_S6G_IB_CFG2_IB_TINFV_M GENMASK(29, 27)
+#define HSIO_S6G_IB_CFG2_IB_TINFV_X(x) (((x) & GENMASK(29, 27)) >> 27)
+#define HSIO_S6G_IB_CFG2_IB_OINFI(x) (((x) << 22) & GENMASK(26, 22))
+#define HSIO_S6G_IB_CFG2_IB_OINFI_M GENMASK(26, 22)
+#define HSIO_S6G_IB_CFG2_IB_OINFI_X(x) (((x) & GENMASK(26, 22)) >> 22)
+#define HSIO_S6G_IB_CFG2_IB_TAUX(x) (((x) << 19) & GENMASK(21, 19))
+#define HSIO_S6G_IB_CFG2_IB_TAUX_M GENMASK(21, 19)
+#define HSIO_S6G_IB_CFG2_IB_TAUX_X(x) (((x) & GENMASK(21, 19)) >> 19)
+#define HSIO_S6G_IB_CFG2_IB_OINFS(x) (((x) << 16) & GENMASK(18, 16))
+#define HSIO_S6G_IB_CFG2_IB_OINFS_M GENMASK(18, 16)
+#define HSIO_S6G_IB_CFG2_IB_OINFS_X(x) (((x) & GENMASK(18, 16)) >> 16)
+#define HSIO_S6G_IB_CFG2_IB_OCALS(x) (((x) << 10) & GENMASK(15, 10))
+#define HSIO_S6G_IB_CFG2_IB_OCALS_M GENMASK(15, 10)
+#define HSIO_S6G_IB_CFG2_IB_OCALS_X(x) (((x) & GENMASK(15, 10)) >> 10)
+#define HSIO_S6G_IB_CFG2_IB_TCALV(x) (((x) << 5) & GENMASK(9, 5))
+#define HSIO_S6G_IB_CFG2_IB_TCALV_M GENMASK(9, 5)
+#define HSIO_S6G_IB_CFG2_IB_TCALV_X(x) (((x) & GENMASK(9, 5)) >> 5)
+#define HSIO_S6G_IB_CFG2_IB_UMAX(x) (((x) << 3) & GENMASK(4, 3))
+#define HSIO_S6G_IB_CFG2_IB_UMAX_M GENMASK(4, 3)
+#define HSIO_S6G_IB_CFG2_IB_UMAX_X(x) (((x) & GENMASK(4, 3)) >> 3)
+#define HSIO_S6G_IB_CFG2_IB_UREG(x) ((x) & GENMASK(2, 0))
+#define HSIO_S6G_IB_CFG2_IB_UREG_M GENMASK(2, 0)
+
+#define HSIO_S6G_IB_CFG3_IB_INI_HP(x) (((x) << 18) & GENMASK(23, 18))
+#define HSIO_S6G_IB_CFG3_IB_INI_HP_M GENMASK(23, 18)
+#define HSIO_S6G_IB_CFG3_IB_INI_HP_X(x) (((x) & GENMASK(23, 18)) >> 18)
+#define HSIO_S6G_IB_CFG3_IB_INI_MID(x) (((x) << 12) & GENMASK(17, 12))
+#define HSIO_S6G_IB_CFG3_IB_INI_MID_M GENMASK(17, 12)
+#define HSIO_S6G_IB_CFG3_IB_INI_MID_X(x) (((x) & GENMASK(17, 12)) >> 12)
+#define HSIO_S6G_IB_CFG3_IB_INI_LP(x) (((x) << 6) & GENMASK(11, 6))
+#define HSIO_S6G_IB_CFG3_IB_INI_LP_M GENMASK(11, 6)
+#define HSIO_S6G_IB_CFG3_IB_INI_LP_X(x) (((x) & GENMASK(11, 6)) >> 6)
+#define HSIO_S6G_IB_CFG3_IB_INI_OFFSET(x) ((x) & GENMASK(5, 0))
+#define HSIO_S6G_IB_CFG3_IB_INI_OFFSET_M GENMASK(5, 0)
+
+#define HSIO_S6G_IB_CFG4_IB_MAX_HP(x) (((x) << 18) & GENMASK(23, 18))
+#define HSIO_S6G_IB_CFG4_IB_MAX_HP_M GENMASK(23, 18)
+#define HSIO_S6G_IB_CFG4_IB_MAX_HP_X(x) (((x) & GENMASK(23, 18)) >> 18)
+#define HSIO_S6G_IB_CFG4_IB_MAX_MID(x) (((x) << 12) & GENMASK(17, 12))
+#define HSIO_S6G_IB_CFG4_IB_MAX_MID_M GENMASK(17, 12)
+#define HSIO_S6G_IB_CFG4_IB_MAX_MID_X(x) (((x) & GENMASK(17, 12)) >> 12)
+#define HSIO_S6G_IB_CFG4_IB_MAX_LP(x) (((x) << 6) & GENMASK(11, 6))
+#define HSIO_S6G_IB_CFG4_IB_MAX_LP_M GENMASK(11, 6)
+#define HSIO_S6G_IB_CFG4_IB_MAX_LP_X(x) (((x) & GENMASK(11, 6)) >> 6)
+#define HSIO_S6G_IB_CFG4_IB_MAX_OFFSET(x) ((x) & GENMASK(5, 0))
+#define HSIO_S6G_IB_CFG4_IB_MAX_OFFSET_M GENMASK(5, 0)
+
+#define HSIO_S6G_IB_CFG5_IB_MIN_HP(x) (((x) << 18) & GENMASK(23, 18))
+#define HSIO_S6G_IB_CFG5_IB_MIN_HP_M GENMASK(23, 18)
+#define HSIO_S6G_IB_CFG5_IB_MIN_HP_X(x) (((x) & GENMASK(23, 18)) >> 18)
+#define HSIO_S6G_IB_CFG5_IB_MIN_MID(x) (((x) << 12) & GENMASK(17, 12))
+#define HSIO_S6G_IB_CFG5_IB_MIN_MID_M GENMASK(17, 12)
+#define HSIO_S6G_IB_CFG5_IB_MIN_MID_X(x) (((x) & GENMASK(17, 12)) >> 12)
+#define HSIO_S6G_IB_CFG5_IB_MIN_LP(x) (((x) << 6) & GENMASK(11, 6))
+#define HSIO_S6G_IB_CFG5_IB_MIN_LP_M GENMASK(11, 6)
+#define HSIO_S6G_IB_CFG5_IB_MIN_LP_X(x) (((x) & GENMASK(11, 6)) >> 6)
+#define HSIO_S6G_IB_CFG5_IB_MIN_OFFSET(x) ((x) & GENMASK(5, 0))
+#define HSIO_S6G_IB_CFG5_IB_MIN_OFFSET_M GENMASK(5, 0)
+
+#define HSIO_S6G_OB_CFG_OB_IDLE BIT(31)
+#define HSIO_S6G_OB_CFG_OB_ENA1V_MODE BIT(30)
+#define HSIO_S6G_OB_CFG_OB_POL BIT(29)
+#define HSIO_S6G_OB_CFG_OB_POST0(x) (((x) << 23) & GENMASK(28, 23))
+#define HSIO_S6G_OB_CFG_OB_POST0_M GENMASK(28, 23)
+#define HSIO_S6G_OB_CFG_OB_POST0_X(x) (((x) & GENMASK(28, 23)) >> 23)
+#define HSIO_S6G_OB_CFG_OB_PREC(x) (((x) << 18) & GENMASK(22, 18))
+#define HSIO_S6G_OB_CFG_OB_PREC_M GENMASK(22, 18)
+#define HSIO_S6G_OB_CFG_OB_PREC_X(x) (((x) & GENMASK(22, 18)) >> 18)
+#define HSIO_S6G_OB_CFG_OB_R_ADJ_MUX BIT(17)
+#define HSIO_S6G_OB_CFG_OB_R_ADJ_PDR BIT(16)
+#define HSIO_S6G_OB_CFG_OB_POST1(x) (((x) << 11) & GENMASK(15, 11))
+#define HSIO_S6G_OB_CFG_OB_POST1_M GENMASK(15, 11)
+#define HSIO_S6G_OB_CFG_OB_POST1_X(x) (((x) & GENMASK(15, 11)) >> 11)
+#define HSIO_S6G_OB_CFG_OB_R_COR BIT(10)
+#define HSIO_S6G_OB_CFG_OB_SEL_RCTRL BIT(9)
+#define HSIO_S6G_OB_CFG_OB_SR_H BIT(8)
+#define HSIO_S6G_OB_CFG_OB_SR(x) (((x) << 4) & GENMASK(7, 4))
+#define HSIO_S6G_OB_CFG_OB_SR_M GENMASK(7, 4)
+#define HSIO_S6G_OB_CFG_OB_SR_X(x) (((x) & GENMASK(7, 4)) >> 4)
+#define HSIO_S6G_OB_CFG_OB_RESISTOR_CTRL(x) ((x) & GENMASK(3, 0))
+#define HSIO_S6G_OB_CFG_OB_RESISTOR_CTRL_M GENMASK(3, 0)
+
+#define HSIO_S6G_OB_CFG1_OB_ENA_CAS(x) (((x) << 6) & GENMASK(8, 6))
+#define HSIO_S6G_OB_CFG1_OB_ENA_CAS_M GENMASK(8, 6)
+#define HSIO_S6G_OB_CFG1_OB_ENA_CAS_X(x) (((x) & GENMASK(8, 6)) >> 6)
+#define HSIO_S6G_OB_CFG1_OB_LEV(x) ((x) & GENMASK(5, 0))
+#define HSIO_S6G_OB_CFG1_OB_LEV_M GENMASK(5, 0)
+
+#define HSIO_S6G_SER_CFG_SER_4TAP_ENA BIT(8)
+#define HSIO_S6G_SER_CFG_SER_CPMD_SEL BIT(7)
+#define HSIO_S6G_SER_CFG_SER_SWAP_CPMD BIT(6)
+#define HSIO_S6G_SER_CFG_SER_ALISEL(x) (((x) << 4) & GENMASK(5, 4))
+#define HSIO_S6G_SER_CFG_SER_ALISEL_M GENMASK(5, 4)
+#define HSIO_S6G_SER_CFG_SER_ALISEL_X(x) (((x) & GENMASK(5, 4)) >> 4)
+#define HSIO_S6G_SER_CFG_SER_ENHYS BIT(3)
+#define HSIO_S6G_SER_CFG_SER_BIG_WIN BIT(2)
+#define HSIO_S6G_SER_CFG_SER_EN_WIN BIT(1)
+#define HSIO_S6G_SER_CFG_SER_ENALI BIT(0)
+
+#define HSIO_S6G_COMMON_CFG_SYS_RST BIT(17)
+#define HSIO_S6G_COMMON_CFG_SE_DIV2_ENA BIT(16)
+#define HSIO_S6G_COMMON_CFG_SE_AUTO_SQUELCH_ENA BIT(15)
+#define HSIO_S6G_COMMON_CFG_ENA_LANE BIT(14)
+#define HSIO_S6G_COMMON_CFG_PWD_RX BIT(13)
+#define HSIO_S6G_COMMON_CFG_PWD_TX BIT(12)
+#define HSIO_S6G_COMMON_CFG_LANE_CTRL(x) (((x) << 9) & GENMASK(11, 9))
+#define HSIO_S6G_COMMON_CFG_LANE_CTRL_M GENMASK(11, 9)
+#define HSIO_S6G_COMMON_CFG_LANE_CTRL_X(x) (((x) & GENMASK(11, 9)) >> 9)
+#define HSIO_S6G_COMMON_CFG_ENA_DIRECT BIT(8)
+#define HSIO_S6G_COMMON_CFG_ENA_ELOOP BIT(7)
+#define HSIO_S6G_COMMON_CFG_ENA_FLOOP BIT(6)
+#define HSIO_S6G_COMMON_CFG_ENA_ILOOP BIT(5)
+#define HSIO_S6G_COMMON_CFG_ENA_PLOOP BIT(4)
+#define HSIO_S6G_COMMON_CFG_HRATE BIT(3)
+#define HSIO_S6G_COMMON_CFG_QRATE BIT(2)
+#define HSIO_S6G_COMMON_CFG_IF_MODE(x) ((x) & GENMASK(1, 0))
+#define HSIO_S6G_COMMON_CFG_IF_MODE_M GENMASK(1, 0)
+
+#define HSIO_S6G_PLL_CFG_PLL_ENA_OFFS(x) (((x) << 16) & GENMASK(17, 16))
+#define HSIO_S6G_PLL_CFG_PLL_ENA_OFFS_M GENMASK(17, 16)
+#define HSIO_S6G_PLL_CFG_PLL_ENA_OFFS_X(x) (((x) & GENMASK(17, 16)) >> 16)
+#define HSIO_S6G_PLL_CFG_PLL_DIV4 BIT(15)
+#define HSIO_S6G_PLL_CFG_PLL_ENA_ROT BIT(14)
+#define HSIO_S6G_PLL_CFG_PLL_FSM_CTRL_DATA(x) (((x) << 6) & GENMASK(13, 6))
+#define HSIO_S6G_PLL_CFG_PLL_FSM_CTRL_DATA_M GENMASK(13, 6)
+#define HSIO_S6G_PLL_CFG_PLL_FSM_CTRL_DATA_X(x) (((x) & GENMASK(13, 6)) >> 6)
+#define HSIO_S6G_PLL_CFG_PLL_FSM_ENA BIT(5)
+#define HSIO_S6G_PLL_CFG_PLL_FSM_FORCE_SET_ENA BIT(4)
+#define HSIO_S6G_PLL_CFG_PLL_FSM_OOR_RECAL_ENA BIT(3)
+#define HSIO_S6G_PLL_CFG_PLL_RB_DATA_SEL BIT(2)
+#define HSIO_S6G_PLL_CFG_PLL_ROT_DIR BIT(1)
+#define HSIO_S6G_PLL_CFG_PLL_ROT_FRQ BIT(0)
+
+#define HSIO_S6G_ACJTAG_CFG_ACJTAG_INIT_DATA_N BIT(5)
+#define HSIO_S6G_ACJTAG_CFG_ACJTAG_INIT_DATA_P BIT(4)
+#define HSIO_S6G_ACJTAG_CFG_ACJTAG_INIT_CLK BIT(3)
+#define HSIO_S6G_ACJTAG_CFG_OB_DIRECT BIT(2)
+#define HSIO_S6G_ACJTAG_CFG_ACJTAG_ENA BIT(1)
+#define HSIO_S6G_ACJTAG_CFG_JTAG_CTRL_ENA BIT(0)
+
+#define HSIO_S6G_GP_CFG_GP_MSB(x) (((x) << 16) & GENMASK(31, 16))
+#define HSIO_S6G_GP_CFG_GP_MSB_M GENMASK(31, 16)
+#define HSIO_S6G_GP_CFG_GP_MSB_X(x) (((x) & GENMASK(31, 16)) >> 16)
+#define HSIO_S6G_GP_CFG_GP_LSB(x) ((x) & GENMASK(15, 0))
+#define HSIO_S6G_GP_CFG_GP_LSB_M GENMASK(15, 0)
+
+#define HSIO_S6G_IB_STATUS0_IB_CAL_DONE BIT(8)
+#define HSIO_S6G_IB_STATUS0_IB_HP_GAIN_ACT BIT(7)
+#define HSIO_S6G_IB_STATUS0_IB_MID_GAIN_ACT BIT(6)
+#define HSIO_S6G_IB_STATUS0_IB_LP_GAIN_ACT BIT(5)
+#define HSIO_S6G_IB_STATUS0_IB_OFFSET_ACT BIT(4)
+#define HSIO_S6G_IB_STATUS0_IB_OFFSET_VLD BIT(3)
+#define HSIO_S6G_IB_STATUS0_IB_OFFSET_ERR BIT(2)
+#define HSIO_S6G_IB_STATUS0_IB_OFFSDIR BIT(1)
+#define HSIO_S6G_IB_STATUS0_IB_SIG_DET BIT(0)
+
+#define HSIO_S6G_IB_STATUS1_IB_HP_GAIN_STAT(x) (((x) << 18) & GENMASK(23, 18))
+#define HSIO_S6G_IB_STATUS1_IB_HP_GAIN_STAT_M GENMASK(23, 18)
+#define HSIO_S6G_IB_STATUS1_IB_HP_GAIN_STAT_X(x) (((x) & GENMASK(23, 18)) >> 18)
+#define HSIO_S6G_IB_STATUS1_IB_MID_GAIN_STAT(x) (((x) << 12) & GENMASK(17, 12))
+#define HSIO_S6G_IB_STATUS1_IB_MID_GAIN_STAT_M GENMASK(17, 12)
+#define HSIO_S6G_IB_STATUS1_IB_MID_GAIN_STAT_X(x) (((x) & GENMASK(17, 12)) >> 12)
+#define HSIO_S6G_IB_STATUS1_IB_LP_GAIN_STAT(x) (((x) << 6) & GENMASK(11, 6))
+#define HSIO_S6G_IB_STATUS1_IB_LP_GAIN_STAT_M GENMASK(11, 6)
+#define HSIO_S6G_IB_STATUS1_IB_LP_GAIN_STAT_X(x) (((x) & GENMASK(11, 6)) >> 6)
+#define HSIO_S6G_IB_STATUS1_IB_OFFSET_STAT(x) ((x) & GENMASK(5, 0))
+#define HSIO_S6G_IB_STATUS1_IB_OFFSET_STAT_M GENMASK(5, 0)
+
+#define HSIO_S6G_ACJTAG_STATUS_ACJTAG_CAPT_DATA_N BIT(2)
+#define HSIO_S6G_ACJTAG_STATUS_ACJTAG_CAPT_DATA_P BIT(1)
+#define HSIO_S6G_ACJTAG_STATUS_IB_DIRECT BIT(0)
+
+#define HSIO_S6G_PLL_STATUS_PLL_CAL_NOT_DONE BIT(10)
+#define HSIO_S6G_PLL_STATUS_PLL_CAL_ERR BIT(9)
+#define HSIO_S6G_PLL_STATUS_PLL_OUT_OF_RANGE_ERR BIT(8)
+#define HSIO_S6G_PLL_STATUS_PLL_RB_DATA(x) ((x) & GENMASK(7, 0))
+#define HSIO_S6G_PLL_STATUS_PLL_RB_DATA_M GENMASK(7, 0)
+
+#define HSIO_S6G_REVID_SERDES_REV(x) (((x) << 26) & GENMASK(31, 26))
+#define HSIO_S6G_REVID_SERDES_REV_M GENMASK(31, 26)
+#define HSIO_S6G_REVID_SERDES_REV_X(x) (((x) & GENMASK(31, 26)) >> 26)
+#define HSIO_S6G_REVID_RCPLL_REV(x) (((x) << 21) & GENMASK(25, 21))
+#define HSIO_S6G_REVID_RCPLL_REV_M GENMASK(25, 21)
+#define HSIO_S6G_REVID_RCPLL_REV_X(x) (((x) & GENMASK(25, 21)) >> 21)
+#define HSIO_S6G_REVID_SER_REV(x) (((x) << 16) & GENMASK(20, 16))
+#define HSIO_S6G_REVID_SER_REV_M GENMASK(20, 16)
+#define HSIO_S6G_REVID_SER_REV_X(x) (((x) & GENMASK(20, 16)) >> 16)
+#define HSIO_S6G_REVID_DES_REV(x) (((x) << 10) & GENMASK(15, 10))
+#define HSIO_S6G_REVID_DES_REV_M GENMASK(15, 10)
+#define HSIO_S6G_REVID_DES_REV_X(x) (((x) & GENMASK(15, 10)) >> 10)
+#define HSIO_S6G_REVID_OB_REV(x) (((x) << 5) & GENMASK(9, 5))
+#define HSIO_S6G_REVID_OB_REV_M GENMASK(9, 5)
+#define HSIO_S6G_REVID_OB_REV_X(x) (((x) & GENMASK(9, 5)) >> 5)
+#define HSIO_S6G_REVID_IB_REV(x) ((x) & GENMASK(4, 0))
+#define HSIO_S6G_REVID_IB_REV_M GENMASK(4, 0)
+
+#define HSIO_MCB_S6G_ADDR_CFG_SERDES6G_WR_ONE_SHOT BIT(31)
+#define HSIO_MCB_S6G_ADDR_CFG_SERDES6G_RD_ONE_SHOT BIT(30)
+#define HSIO_MCB_S6G_ADDR_CFG_SERDES6G_ADDR(x) ((x) & GENMASK(24, 0))
+#define HSIO_MCB_S6G_ADDR_CFG_SERDES6G_ADDR_M GENMASK(24, 0)
+
+#define HSIO_HW_CFG_DEV2G5_10_MODE BIT(6)
+#define HSIO_HW_CFG_DEV1G_9_MODE BIT(5)
+#define HSIO_HW_CFG_DEV1G_6_MODE BIT(4)
+#define HSIO_HW_CFG_DEV1G_5_MODE BIT(3)
+#define HSIO_HW_CFG_DEV1G_4_MODE BIT(2)
+#define HSIO_HW_CFG_PCIE_ENA BIT(1)
+#define HSIO_HW_CFG_QSGMII_ENA BIT(0)
+
+#define HSIO_HW_QSGMII_CFG_SHYST_DIS BIT(3)
+#define HSIO_HW_QSGMII_CFG_E_DET_ENA BIT(2)
+#define HSIO_HW_QSGMII_CFG_USE_I1_ENA BIT(1)
+#define HSIO_HW_QSGMII_CFG_FLIP_LANES BIT(0)
+
+#define HSIO_HW_QSGMII_STAT_DELAY_VAR_X200PS(x) (((x) << 1) & GENMASK(6, 1))
+#define HSIO_HW_QSGMII_STAT_DELAY_VAR_X200PS_M GENMASK(6, 1)
+#define HSIO_HW_QSGMII_STAT_DELAY_VAR_X200PS_X(x) (((x) & GENMASK(6, 1)) >> 1)
+#define HSIO_HW_QSGMII_STAT_SYNC BIT(0)
+
+#define HSIO_CLK_CFG_CLKDIV_PHY(x) (((x) << 1) & GENMASK(8, 1))
+#define HSIO_CLK_CFG_CLKDIV_PHY_M GENMASK(8, 1)
+#define HSIO_CLK_CFG_CLKDIV_PHY_X(x) (((x) & GENMASK(8, 1)) >> 1)
+#define HSIO_CLK_CFG_CLKDIV_PHY_DIS BIT(0)
+
+#define HSIO_TEMP_SENSOR_CTRL_FORCE_TEMP_RD BIT(5)
+#define HSIO_TEMP_SENSOR_CTRL_FORCE_RUN BIT(4)
+#define HSIO_TEMP_SENSOR_CTRL_FORCE_NO_RST BIT(3)
+#define HSIO_TEMP_SENSOR_CTRL_FORCE_POWER_UP BIT(2)
+#define HSIO_TEMP_SENSOR_CTRL_FORCE_CLK BIT(1)
+#define HSIO_TEMP_SENSOR_CTRL_SAMPLE_ENA BIT(0)
+
+#define HSIO_TEMP_SENSOR_CFG_RUN_WID(x) (((x) << 8) & GENMASK(15, 8))
+#define HSIO_TEMP_SENSOR_CFG_RUN_WID_M GENMASK(15, 8)
+#define HSIO_TEMP_SENSOR_CFG_RUN_WID_X(x) (((x) & GENMASK(15, 8)) >> 8)
+#define HSIO_TEMP_SENSOR_CFG_SAMPLE_PER(x) ((x) & GENMASK(7, 0))
+#define HSIO_TEMP_SENSOR_CFG_SAMPLE_PER_M GENMASK(7, 0)
+
+#define HSIO_TEMP_SENSOR_STAT_TEMP_VALID BIT(8)
+#define HSIO_TEMP_SENSOR_STAT_TEMP(x) ((x) & GENMASK(7, 0))
+#define HSIO_TEMP_SENSOR_STAT_TEMP_M GENMASK(7, 0)
+
+#endif
diff --git a/drivers/net/ethernet/mscc/ocelot_io.c b/drivers/net/ethernet/mscc/ocelot_io.c
new file mode 100644
index 0000000..c6db8ad
--- /dev/null
+++ b/drivers/net/ethernet/mscc/ocelot_io.c
@@ -0,0 +1,116 @@
+// SPDX-License-Identifier: (GPL-2.0 OR MIT)
+/*
+ * Microsemi Ocelot Switch driver
+ *
+ * Copyright (c) 2017 Microsemi Corporation
+ */
+#include <linux/io.h>
+#include <linux/kernel.h>
+#include <linux/platform_device.h>
+
+#include "ocelot.h"
+
+u32 __ocelot_read_ix(struct ocelot *ocelot, u32 reg, u32 offset)
+{
+ u16 target = reg >> TARGET_OFFSET;
+ u32 val;
+
+ WARN_ON(!target);
+
+ regmap_read(ocelot->targets[target],
+ ocelot->map[target][reg & REG_MASK] + offset, &val);
+ return val;
+}
+EXPORT_SYMBOL(__ocelot_read_ix);
+
+void __ocelot_write_ix(struct ocelot *ocelot, u32 val, u32 reg, u32 offset)
+{
+ u16 target = reg >> TARGET_OFFSET;
+
+ WARN_ON(!target);
+
+ regmap_write(ocelot->targets[target],
+ ocelot->map[target][reg & REG_MASK] + offset, val);
+}
+EXPORT_SYMBOL(__ocelot_write_ix);
+
+void __ocelot_rmw_ix(struct ocelot *ocelot, u32 val, u32 mask, u32 reg,
+ u32 offset)
+{
+ u16 target = reg >> TARGET_OFFSET;
+
+ WARN_ON(!target);
+
+ regmap_update_bits(ocelot->targets[target],
+ ocelot->map[target][reg & REG_MASK] + offset,
+ mask, val);
+}
+EXPORT_SYMBOL(__ocelot_rmw_ix);
+
+u32 ocelot_port_readl(struct ocelot_port *port, u32 reg)
+{
+ return readl(port->regs + reg);
+}
+EXPORT_SYMBOL(ocelot_port_readl);
+
+void ocelot_port_writel(struct ocelot_port *port, u32 val, u32 reg)
+{
+ writel(val, port->regs + reg);
+}
+EXPORT_SYMBOL(ocelot_port_writel);
+
+int ocelot_regfields_init(struct ocelot *ocelot,
+ const struct reg_field *const regfields)
+{
+ unsigned int i;
+ u16 target;
+
+ for (i = 0; i < REGFIELD_MAX; i++) {
+ struct reg_field regfield = {};
+ u32 reg = regfields[i].reg;
+
+ if (!reg)
+ continue;
+
+ target = regfields[i].reg >> TARGET_OFFSET;
+
+ regfield.reg = ocelot->map[target][reg & REG_MASK];
+ regfield.lsb = regfields[i].lsb;
+ regfield.msb = regfields[i].msb;
+
+ ocelot->regfields[i] =
+ devm_regmap_field_alloc(ocelot->dev,
+ ocelot->targets[target],
+ regfield);
+
+ if (IS_ERR(ocelot->regfields[i]))
+ return PTR_ERR(ocelot->regfields[i]);
+ }
+
+ return 0;
+}
+EXPORT_SYMBOL(ocelot_regfields_init);
+
+static struct regmap_config ocelot_regmap_config = {
+ .reg_bits = 32,
+ .val_bits = 32,
+ .reg_stride = 4,
+};
+
+struct regmap *ocelot_io_platform_init(struct ocelot *ocelot,
+ struct platform_device *pdev,
+ const char *name)
+{
+ struct resource *res;
+ void __iomem *regs;
+
+ res = platform_get_resource_byname(pdev, IORESOURCE_MEM, name);
+ regs = devm_ioremap_resource(ocelot->dev, res);
+ if (IS_ERR(regs))
+ return ERR_CAST(regs);
+
+ ocelot_regmap_config.name = name;
+ return devm_regmap_init_mmio(ocelot->dev, regs,
+ &ocelot_regmap_config);
+}
+EXPORT_SYMBOL(ocelot_io_platform_init);
diff --git a/drivers/net/ethernet/mscc/ocelot_qs.h b/drivers/net/ethernet/mscc/ocelot_qs.h
new file mode 100644
index 0000000..d18ae72
--- /dev/null
+++ b/drivers/net/ethernet/mscc/ocelot_qs.h
@@ -0,0 +1,78 @@
+/* SPDX-License-Identifier: (GPL-2.0 OR MIT) */
+/*
+ * Microsemi Ocelot Switch driver
+ *
+ * Copyright (c) 2017 Microsemi Corporation
+ */
+
+#ifndef _MSCC_OCELOT_QS_H_
+#define _MSCC_OCELOT_QS_H_
+
+/* TODO handle BE */
+#define XTR_EOF_0 0x00000080U
+#define XTR_EOF_1 0x01000080U
+#define XTR_EOF_2 0x02000080U
+#define XTR_EOF_3 0x03000080U
+#define XTR_PRUNED 0x04000080U
+#define XTR_ABORT 0x05000080U
+#define XTR_ESCAPE 0x06000080U
+#define XTR_NOT_READY 0x07000080U
+#define XTR_VALID_BYTES(x) (4 - (((x) >> 24) & 3))
+
+#define QS_XTR_GRP_CFG_RSZ 0x4
+
+#define QS_XTR_GRP_CFG_MODE(x) (((x) << 2) & GENMASK(3, 2))
+#define QS_XTR_GRP_CFG_MODE_M GENMASK(3, 2)
+#define QS_XTR_GRP_CFG_MODE_X(x) (((x) & GENMASK(3, 2)) >> 2)
+#define QS_XTR_GRP_CFG_STATUS_WORD_POS BIT(1)
+#define QS_XTR_GRP_CFG_BYTE_SWAP BIT(0)
+
+#define QS_XTR_RD_RSZ 0x4
+
+#define QS_XTR_FRM_PRUNING_RSZ 0x4
+
+#define QS_XTR_CFG_DP_WM(x) (((x) << 5) & GENMASK(7, 5))
+#define QS_XTR_CFG_DP_WM_M GENMASK(7, 5)
+#define QS_XTR_CFG_DP_WM_X(x) (((x) & GENMASK(7, 5)) >> 5)
+#define QS_XTR_CFG_SCH_WM(x) (((x) << 2) & GENMASK(4, 2))
+#define QS_XTR_CFG_SCH_WM_M GENMASK(4, 2)
+#define QS_XTR_CFG_SCH_WM_X(x) (((x) & GENMASK(4, 2)) >> 2)
+#define QS_XTR_CFG_OFLW_ERR_STICKY(x) ((x) & GENMASK(1, 0))
+#define QS_XTR_CFG_OFLW_ERR_STICKY_M GENMASK(1, 0)
+
+#define QS_INJ_GRP_CFG_RSZ 0x4
+
+#define QS_INJ_GRP_CFG_MODE(x) (((x) << 2) & GENMASK(3, 2))
+#define QS_INJ_GRP_CFG_MODE_M GENMASK(3, 2)
+#define QS_INJ_GRP_CFG_MODE_X(x) (((x) & GENMASK(3, 2)) >> 2)
+#define QS_INJ_GRP_CFG_BYTE_SWAP BIT(0)
+
+#define QS_INJ_WR_RSZ 0x4
+
+#define QS_INJ_CTRL_RSZ 0x4
+
+#define QS_INJ_CTRL_GAP_SIZE(x) (((x) << 21) & GENMASK(24, 21))
+#define QS_INJ_CTRL_GAP_SIZE_M GENMASK(24, 21)
+#define QS_INJ_CTRL_GAP_SIZE_X(x) (((x) & GENMASK(24, 21)) >> 21)
+#define QS_INJ_CTRL_ABORT BIT(20)
+#define QS_INJ_CTRL_EOF BIT(19)
+#define QS_INJ_CTRL_SOF BIT(18)
+#define QS_INJ_CTRL_VLD_BYTES(x) (((x) << 16) & GENMASK(17, 16))
+#define QS_INJ_CTRL_VLD_BYTES_M GENMASK(17, 16)
+#define QS_INJ_CTRL_VLD_BYTES_X(x) (((x) & GENMASK(17, 16)) >> 16)
+
+#define QS_INJ_STATUS_WMARK_REACHED(x) (((x) << 4) & GENMASK(5, 4))
+#define QS_INJ_STATUS_WMARK_REACHED_M GENMASK(5, 4)
+#define QS_INJ_STATUS_WMARK_REACHED_X(x) (((x) & GENMASK(5, 4)) >> 4)
+#define QS_INJ_STATUS_FIFO_RDY(x) (((x) << 2) & GENMASK(3, 2))
+#define QS_INJ_STATUS_FIFO_RDY_M GENMASK(3, 2)
+#define QS_INJ_STATUS_FIFO_RDY_X(x) (((x) & GENMASK(3, 2)) >> 2)
+#define QS_INJ_STATUS_INJ_IN_PROGRESS(x) ((x) & GENMASK(1, 0))
+#define QS_INJ_STATUS_INJ_IN_PROGRESS_M GENMASK(1, 0)
+
+#define QS_INJ_ERR_RSZ 0x4
+
+#define QS_INJ_ERR_ABORT_ERR_STICKY BIT(1)
+#define QS_INJ_ERR_WR_ERR_STICKY BIT(0)
+
+#endif
diff --git a/drivers/net/ethernet/mscc/ocelot_qsys.h b/drivers/net/ethernet/mscc/ocelot_qsys.h
new file mode 100644
index 0000000..d8c63aa
--- /dev/null
+++ b/drivers/net/ethernet/mscc/ocelot_qsys.h
@@ -0,0 +1,270 @@
+/* SPDX-License-Identifier: (GPL-2.0 OR MIT) */
+/*
+ * Microsemi Ocelot Switch driver
+ *
+ * Copyright (c) 2017 Microsemi Corporation
+ */
+
+#ifndef _MSCC_OCELOT_QSYS_H_
+#define _MSCC_OCELOT_QSYS_H_
+
+#define QSYS_PORT_MODE_RSZ 0x4
+
+#define QSYS_PORT_MODE_DEQUEUE_DIS BIT(1)
+#define QSYS_PORT_MODE_DEQUEUE_LATE BIT(0)
+
+#define QSYS_SWITCH_PORT_MODE_RSZ 0x4
+
+#define QSYS_SWITCH_PORT_MODE_PORT_ENA BIT(14)
+#define QSYS_SWITCH_PORT_MODE_SCH_NEXT_CFG(x) (((x) << 11) & GENMASK(13, 11))
+#define QSYS_SWITCH_PORT_MODE_SCH_NEXT_CFG_M GENMASK(13, 11)
+#define QSYS_SWITCH_PORT_MODE_SCH_NEXT_CFG_X(x) (((x) & GENMASK(13, 11)) >> 11)
+#define QSYS_SWITCH_PORT_MODE_YEL_RSRVD BIT(10)
+#define QSYS_SWITCH_PORT_MODE_INGRESS_DROP_MODE BIT(9)
+#define QSYS_SWITCH_PORT_MODE_TX_PFC_ENA(x) (((x) << 1) & GENMASK(8, 1))
+#define QSYS_SWITCH_PORT_MODE_TX_PFC_ENA_M GENMASK(8, 1)
+#define QSYS_SWITCH_PORT_MODE_TX_PFC_ENA_X(x) (((x) & GENMASK(8, 1)) >> 1)
+#define QSYS_SWITCH_PORT_MODE_TX_PFC_MODE BIT(0)
+
+#define QSYS_STAT_CNT_CFG_TX_GREEN_CNT_MODE BIT(5)
+#define QSYS_STAT_CNT_CFG_TX_YELLOW_CNT_MODE BIT(4)
+#define QSYS_STAT_CNT_CFG_DROP_GREEN_CNT_MODE BIT(3)
+#define QSYS_STAT_CNT_CFG_DROP_YELLOW_CNT_MODE BIT(2)
+#define QSYS_STAT_CNT_CFG_DROP_COUNT_ONCE BIT(1)
+#define QSYS_STAT_CNT_CFG_DROP_COUNT_EGRESS BIT(0)
+
+#define QSYS_EEE_CFG_RSZ 0x4
+
+#define QSYS_EEE_THRES_EEE_HIGH_BYTES(x) (((x) << 8) & GENMASK(15, 8))
+#define QSYS_EEE_THRES_EEE_HIGH_BYTES_M GENMASK(15, 8)
+#define QSYS_EEE_THRES_EEE_HIGH_BYTES_X(x) (((x) & GENMASK(15, 8)) >> 8)
+#define QSYS_EEE_THRES_EEE_HIGH_FRAMES(x) ((x) & GENMASK(7, 0))
+#define QSYS_EEE_THRES_EEE_HIGH_FRAMES_M GENMASK(7, 0)
+
+#define QSYS_SW_STATUS_RSZ 0x4
+
+#define QSYS_EXT_CPU_CFG_EXT_CPU_PORT(x) (((x) << 8) & GENMASK(12, 8))
+#define QSYS_EXT_CPU_CFG_EXT_CPU_PORT_M GENMASK(12, 8)
+#define QSYS_EXT_CPU_CFG_EXT_CPU_PORT_X(x) (((x) & GENMASK(12, 8)) >> 8)
+#define QSYS_EXT_CPU_CFG_EXT_CPUQ_MSK(x) ((x) & GENMASK(7, 0))
+#define QSYS_EXT_CPU_CFG_EXT_CPUQ_MSK_M GENMASK(7, 0)
+
+#define QSYS_QMAP_GSZ 0x4
+
+#define QSYS_QMAP_SE_BASE(x) (((x) << 5) & GENMASK(12, 5))
+#define QSYS_QMAP_SE_BASE_M GENMASK(12, 5)
+#define QSYS_QMAP_SE_BASE_X(x) (((x) & GENMASK(12, 5)) >> 5)
+#define QSYS_QMAP_SE_IDX_SEL(x) (((x) << 2) & GENMASK(4, 2))
+#define QSYS_QMAP_SE_IDX_SEL_M GENMASK(4, 2)
+#define QSYS_QMAP_SE_IDX_SEL_X(x) (((x) & GENMASK(4, 2)) >> 2)
+#define QSYS_QMAP_SE_INP_SEL(x) ((x) & GENMASK(1, 0))
+#define QSYS_QMAP_SE_INP_SEL_M GENMASK(1, 0)
+
+#define QSYS_ISDX_SGRP_GSZ 0x4
+
+#define QSYS_TIMED_FRAME_ENTRY_GSZ 0x4
+
+#define QSYS_TFRM_MISC_TIMED_CANCEL_SLOT(x) (((x) << 9) & GENMASK(18, 9))
+#define QSYS_TFRM_MISC_TIMED_CANCEL_SLOT_M GENMASK(18, 9)
+#define QSYS_TFRM_MISC_TIMED_CANCEL_SLOT_X(x) (((x) & GENMASK(18, 9)) >> 9)
+#define QSYS_TFRM_MISC_TIMED_CANCEL_1SHOT BIT(8)
+#define QSYS_TFRM_MISC_TIMED_SLOT_MODE_MC BIT(7)
+#define QSYS_TFRM_MISC_TIMED_ENTRY_FAST_CNT(x) ((x) & GENMASK(6, 0))
+#define QSYS_TFRM_MISC_TIMED_ENTRY_FAST_CNT_M GENMASK(6, 0)
+
+#define QSYS_RED_PROFILE_RSZ 0x4
+
+#define QSYS_RED_PROFILE_WM_RED_LOW(x) (((x) << 8) & GENMASK(15, 8))
+#define QSYS_RED_PROFILE_WM_RED_LOW_M GENMASK(15, 8)
+#define QSYS_RED_PROFILE_WM_RED_LOW_X(x) (((x) & GENMASK(15, 8)) >> 8)
+#define QSYS_RED_PROFILE_WM_RED_HIGH(x) ((x) & GENMASK(7, 0))
+#define QSYS_RED_PROFILE_WM_RED_HIGH_M GENMASK(7, 0)
+
+#define QSYS_RES_CFG_GSZ 0x8
+
+#define QSYS_RES_STAT_GSZ 0x8
+
+#define QSYS_RES_STAT_INUSE(x) (((x) << 12) & GENMASK(23, 12))
+#define QSYS_RES_STAT_INUSE_M GENMASK(23, 12)
+#define QSYS_RES_STAT_INUSE_X(x) (((x) & GENMASK(23, 12)) >> 12)
+#define QSYS_RES_STAT_MAXUSE(x) ((x) & GENMASK(11, 0))
+#define QSYS_RES_STAT_MAXUSE_M GENMASK(11, 0)
+
+#define QSYS_EVENTS_CORE_EV_FDC(x) (((x) << 2) & GENMASK(4, 2))
+#define QSYS_EVENTS_CORE_EV_FDC_M GENMASK(4, 2)
+#define QSYS_EVENTS_CORE_EV_FDC_X(x) (((x) & GENMASK(4, 2)) >> 2)
+#define QSYS_EVENTS_CORE_EV_FRD(x) ((x) & GENMASK(1, 0))
+#define QSYS_EVENTS_CORE_EV_FRD_M GENMASK(1, 0)
+
+#define QSYS_QMAXSDU_CFG_0_RSZ 0x4
+
+#define QSYS_QMAXSDU_CFG_1_RSZ 0x4
+
+#define QSYS_QMAXSDU_CFG_2_RSZ 0x4
+
+#define QSYS_QMAXSDU_CFG_3_RSZ 0x4
+
+#define QSYS_QMAXSDU_CFG_4_RSZ 0x4
+
+#define QSYS_QMAXSDU_CFG_5_RSZ 0x4
+
+#define QSYS_QMAXSDU_CFG_6_RSZ 0x4
+
+#define QSYS_QMAXSDU_CFG_7_RSZ 0x4
+
+#define QSYS_PREEMPTION_CFG_RSZ 0x4
+
+#define QSYS_PREEMPTION_CFG_P_QUEUES(x) ((x) & GENMASK(7, 0))
+#define QSYS_PREEMPTION_CFG_P_QUEUES_M GENMASK(7, 0)
+#define QSYS_PREEMPTION_CFG_MM_ADD_FRAG_SIZE(x) (((x) << 8) & GENMASK(9, 8))
+#define QSYS_PREEMPTION_CFG_MM_ADD_FRAG_SIZE_M GENMASK(9, 8)
+#define QSYS_PREEMPTION_CFG_MM_ADD_FRAG_SIZE_X(x) (((x) & GENMASK(9, 8)) >> 8)
+#define QSYS_PREEMPTION_CFG_STRICT_IPG(x) (((x) << 12) & GENMASK(13, 12))
+#define QSYS_PREEMPTION_CFG_STRICT_IPG_M GENMASK(13, 12)
+#define QSYS_PREEMPTION_CFG_STRICT_IPG_X(x) (((x) & GENMASK(13, 12)) >> 12)
+#define QSYS_PREEMPTION_CFG_HOLD_ADVANCE(x) (((x) << 16) & GENMASK(31, 16))
+#define QSYS_PREEMPTION_CFG_HOLD_ADVANCE_M GENMASK(31, 16)
+#define QSYS_PREEMPTION_CFG_HOLD_ADVANCE_X(x) (((x) & GENMASK(31, 16)) >> 16)
+
+#define QSYS_CIR_CFG_GSZ 0x80
+
+#define QSYS_CIR_CFG_CIR_RATE(x) (((x) << 6) & GENMASK(20, 6))
+#define QSYS_CIR_CFG_CIR_RATE_M GENMASK(20, 6)
+#define QSYS_CIR_CFG_CIR_RATE_X(x) (((x) & GENMASK(20, 6)) >> 6)
+#define QSYS_CIR_CFG_CIR_BURST(x) ((x) & GENMASK(5, 0))
+#define QSYS_CIR_CFG_CIR_BURST_M GENMASK(5, 0)
+
+#define QSYS_EIR_CFG_GSZ 0x80
+
+#define QSYS_EIR_CFG_EIR_RATE(x) (((x) << 7) & GENMASK(21, 7))
+#define QSYS_EIR_CFG_EIR_RATE_M GENMASK(21, 7)
+#define QSYS_EIR_CFG_EIR_RATE_X(x) (((x) & GENMASK(21, 7)) >> 7)
+#define QSYS_EIR_CFG_EIR_BURST(x) (((x) << 1) & GENMASK(6, 1))
+#define QSYS_EIR_CFG_EIR_BURST_M GENMASK(6, 1)
+#define QSYS_EIR_CFG_EIR_BURST_X(x) (((x) & GENMASK(6, 1)) >> 1)
+#define QSYS_EIR_CFG_EIR_MARK_ENA BIT(0)
+
+#define QSYS_SE_CFG_GSZ 0x80
+
+#define QSYS_SE_CFG_SE_DWRR_CNT(x) (((x) << 6) & GENMASK(9, 6))
+#define QSYS_SE_CFG_SE_DWRR_CNT_M GENMASK(9, 6)
+#define QSYS_SE_CFG_SE_DWRR_CNT_X(x) (((x) & GENMASK(9, 6)) >> 6)
+#define QSYS_SE_CFG_SE_RR_ENA BIT(5)
+#define QSYS_SE_CFG_SE_AVB_ENA BIT(4)
+#define QSYS_SE_CFG_SE_FRM_MODE(x) (((x) << 2) & GENMASK(3, 2))
+#define QSYS_SE_CFG_SE_FRM_MODE_M GENMASK(3, 2)
+#define QSYS_SE_CFG_SE_FRM_MODE_X(x) (((x) & GENMASK(3, 2)) >> 2)
+#define QSYS_SE_CFG_SE_EXC_ENA BIT(1)
+#define QSYS_SE_CFG_SE_EXC_FWD BIT(0)
+
+#define QSYS_SE_DWRR_CFG_GSZ 0x80
+#define QSYS_SE_DWRR_CFG_RSZ 0x4
+
+#define QSYS_SE_CONNECT_GSZ 0x80
+
+#define QSYS_SE_CONNECT_SE_OUTP_IDX(x) (((x) << 17) & GENMASK(24, 17))
+#define QSYS_SE_CONNECT_SE_OUTP_IDX_M GENMASK(24, 17)
+#define QSYS_SE_CONNECT_SE_OUTP_IDX_X(x) (((x) & GENMASK(24, 17)) >> 17)
+#define QSYS_SE_CONNECT_SE_INP_IDX(x) (((x) << 9) & GENMASK(16, 9))
+#define QSYS_SE_CONNECT_SE_INP_IDX_M GENMASK(16, 9)
+#define QSYS_SE_CONNECT_SE_INP_IDX_X(x) (((x) & GENMASK(16, 9)) >> 9)
+#define QSYS_SE_CONNECT_SE_OUTP_CON(x) (((x) << 5) & GENMASK(8, 5))
+#define QSYS_SE_CONNECT_SE_OUTP_CON_M GENMASK(8, 5)
+#define QSYS_SE_CONNECT_SE_OUTP_CON_X(x) (((x) & GENMASK(8, 5)) >> 5)
+#define QSYS_SE_CONNECT_SE_INP_CNT(x) (((x) << 1) & GENMASK(4, 1))
+#define QSYS_SE_CONNECT_SE_INP_CNT_M GENMASK(4, 1)
+#define QSYS_SE_CONNECT_SE_INP_CNT_X(x) (((x) & GENMASK(4, 1)) >> 1)
+#define QSYS_SE_CONNECT_SE_TERMINAL BIT(0)
+
+#define QSYS_SE_DLB_SENSE_GSZ 0x80
+
+#define QSYS_SE_DLB_SENSE_SE_DLB_PRIO(x) (((x) << 11) & GENMASK(13, 11))
+#define QSYS_SE_DLB_SENSE_SE_DLB_PRIO_M GENMASK(13, 11)
+#define QSYS_SE_DLB_SENSE_SE_DLB_PRIO_X(x) (((x) & GENMASK(13, 11)) >> 11)
+#define QSYS_SE_DLB_SENSE_SE_DLB_SPORT(x) (((x) << 7) & GENMASK(10, 7))
+#define QSYS_SE_DLB_SENSE_SE_DLB_SPORT_M GENMASK(10, 7)
+#define QSYS_SE_DLB_SENSE_SE_DLB_SPORT_X(x) (((x) & GENMASK(10, 7)) >> 7)
+#define QSYS_SE_DLB_SENSE_SE_DLB_DPORT(x) (((x) << 3) & GENMASK(6, 3))
+#define QSYS_SE_DLB_SENSE_SE_DLB_DPORT_M GENMASK(6, 3)
+#define QSYS_SE_DLB_SENSE_SE_DLB_DPORT_X(x) (((x) & GENMASK(6, 3)) >> 3)
+#define QSYS_SE_DLB_SENSE_SE_DLB_PRIO_ENA BIT(2)
+#define QSYS_SE_DLB_SENSE_SE_DLB_SPORT_ENA BIT(1)
+#define QSYS_SE_DLB_SENSE_SE_DLB_DPORT_ENA BIT(0)
+
+#define QSYS_CIR_STATE_GSZ 0x80
+
+#define QSYS_CIR_STATE_CIR_LVL(x) (((x) << 4) & GENMASK(25, 4))
+#define QSYS_CIR_STATE_CIR_LVL_M GENMASK(25, 4)
+#define QSYS_CIR_STATE_CIR_LVL_X(x) (((x) & GENMASK(25, 4)) >> 4)
+#define QSYS_CIR_STATE_SHP_TIME(x) ((x) & GENMASK(3, 0))
+#define QSYS_CIR_STATE_SHP_TIME_M GENMASK(3, 0)
+
+#define QSYS_EIR_STATE_GSZ 0x80
+
+#define QSYS_SE_STATE_GSZ 0x80
+
+#define QSYS_SE_STATE_SE_OUTP_LVL(x) (((x) << 1) & GENMASK(2, 1))
+#define QSYS_SE_STATE_SE_OUTP_LVL_M GENMASK(2, 1)
+#define QSYS_SE_STATE_SE_OUTP_LVL_X(x) (((x) & GENMASK(2, 1)) >> 1)
+#define QSYS_SE_STATE_SE_WAS_YEL BIT(0)
+
+#define QSYS_HSCH_MISC_CFG_SE_CONNECT_VLD BIT(8)
+#define QSYS_HSCH_MISC_CFG_FRM_ADJ(x) (((x) << 3) & GENMASK(7, 3))
+#define QSYS_HSCH_MISC_CFG_FRM_ADJ_M GENMASK(7, 3)
+#define QSYS_HSCH_MISC_CFG_FRM_ADJ_X(x) (((x) & GENMASK(7, 3)) >> 3)
+#define QSYS_HSCH_MISC_CFG_LEAK_DIS BIT(2)
+#define QSYS_HSCH_MISC_CFG_QSHP_EXC_ENA BIT(1)
+#define QSYS_HSCH_MISC_CFG_PFC_BYP_UPD BIT(0)
+
+#define QSYS_TAG_CONFIG_RSZ 0x4
+
+#define QSYS_TAG_CONFIG_ENABLE BIT(0)
+#define QSYS_TAG_CONFIG_LINK_SPEED(x) (((x) << 4) & GENMASK(5, 4))
+#define QSYS_TAG_CONFIG_LINK_SPEED_M GENMASK(5, 4)
+#define QSYS_TAG_CONFIG_LINK_SPEED_X(x) (((x) & GENMASK(5, 4)) >> 4)
+#define QSYS_TAG_CONFIG_INIT_GATE_STATE(x) (((x) << 8) & GENMASK(15, 8))
+#define QSYS_TAG_CONFIG_INIT_GATE_STATE_M GENMASK(15, 8)
+#define QSYS_TAG_CONFIG_INIT_GATE_STATE_X(x) (((x) & GENMASK(15, 8)) >> 8)
+#define QSYS_TAG_CONFIG_SCH_TRAFFIC_QUEUES(x) (((x) << 16) & GENMASK(23, 16))
+#define QSYS_TAG_CONFIG_SCH_TRAFFIC_QUEUES_M GENMASK(23, 16)
+#define QSYS_TAG_CONFIG_SCH_TRAFFIC_QUEUES_X(x) (((x) & GENMASK(23, 16)) >> 16)
+
+#define QSYS_TAS_PARAM_CFG_CTRL_PORT_NUM(x) ((x) & GENMASK(7, 0))
+#define QSYS_TAS_PARAM_CFG_CTRL_PORT_NUM_M GENMASK(7, 0)
+#define QSYS_TAS_PARAM_CFG_CTRL_ALWAYS_GUARD_BAND_SCH_Q BIT(8)
+#define QSYS_TAS_PARAM_CFG_CTRL_CONFIG_CHANGE BIT(16)
+
+#define QSYS_PORT_MAX_SDU_RSZ 0x4
+
+#define QSYS_PARAM_CFG_REG_3_BASE_TIME_SEC_MSB(x) ((x) & GENMASK(15, 0))
+#define QSYS_PARAM_CFG_REG_3_BASE_TIME_SEC_MSB_M GENMASK(15, 0)
+#define QSYS_PARAM_CFG_REG_3_LIST_LENGTH(x) (((x) << 16) & GENMASK(31, 16))
+#define QSYS_PARAM_CFG_REG_3_LIST_LENGTH_M GENMASK(31, 16)
+#define QSYS_PARAM_CFG_REG_3_LIST_LENGTH_X(x) (((x) & GENMASK(31, 16)) >> 16)
+
+#define QSYS_GCL_CFG_REG_1_GCL_ENTRY_NUM(x) ((x) & GENMASK(5, 0))
+#define QSYS_GCL_CFG_REG_1_GCL_ENTRY_NUM_M GENMASK(5, 0)
+#define QSYS_GCL_CFG_REG_1_GATE_STATE(x) (((x) << 8) & GENMASK(15, 8))
+#define QSYS_GCL_CFG_REG_1_GATE_STATE_M GENMASK(15, 8)
+#define QSYS_GCL_CFG_REG_1_GATE_STATE_X(x) (((x) & GENMASK(15, 8)) >> 8)
+
+#define QSYS_PARAM_STATUS_REG_3_BASE_TIME_SEC_MSB(x) ((x) & GENMASK(15, 0))
+#define QSYS_PARAM_STATUS_REG_3_BASE_TIME_SEC_MSB_M GENMASK(15, 0)
+#define QSYS_PARAM_STATUS_REG_3_LIST_LENGTH(x) (((x) << 16) & GENMASK(31, 16))
+#define QSYS_PARAM_STATUS_REG_3_LIST_LENGTH_M GENMASK(31, 16)
+#define QSYS_PARAM_STATUS_REG_3_LIST_LENGTH_X(x) (((x) & GENMASK(31, 16)) >> 16)
+
+#define QSYS_PARAM_STATUS_REG_8_CFG_CHG_TIME_SEC_MSB(x) ((x) & GENMASK(15, 0))
+#define QSYS_PARAM_STATUS_REG_8_CFG_CHG_TIME_SEC_MSB_M GENMASK(15, 0)
+#define QSYS_PARAM_STATUS_REG_8_OPER_GATE_STATE(x) (((x) << 16) & GENMASK(23, 16))
+#define QSYS_PARAM_STATUS_REG_8_OPER_GATE_STATE_M GENMASK(23, 16)
+#define QSYS_PARAM_STATUS_REG_8_OPER_GATE_STATE_X(x) (((x) & GENMASK(23, 16)) >> 16)
+#define QSYS_PARAM_STATUS_REG_8_CONFIG_PENDING BIT(24)
+
+#define QSYS_GCL_STATUS_REG_1_GCL_ENTRY_NUM(x) ((x) & GENMASK(5, 0))
+#define QSYS_GCL_STATUS_REG_1_GCL_ENTRY_NUM_M GENMASK(5, 0)
+#define QSYS_GCL_STATUS_REG_1_GATE_STATE(x) (((x) << 8) & GENMASK(15, 8))
+#define QSYS_GCL_STATUS_REG_1_GATE_STATE_M GENMASK(15, 8)
+#define QSYS_GCL_STATUS_REG_1_GATE_STATE_X(x) (((x) & GENMASK(15, 8)) >> 8)
+
+#endif
diff --git a/drivers/net/ethernet/mscc/ocelot_regs.c b/drivers/net/ethernet/mscc/ocelot_regs.c
new file mode 100644
index 0000000..e334b40
--- /dev/null
+++ b/drivers/net/ethernet/mscc/ocelot_regs.c
@@ -0,0 +1,497 @@
+// SPDX-License-Identifier: (GPL-2.0 OR MIT)
+/*
+ * Microsemi Ocelot Switch driver
+ *
+ * Copyright (c) 2017 Microsemi Corporation
+ */
+#include "ocelot.h"
+
+static const u32 ocelot_ana_regmap[] = {
+ REG(ANA_ADVLEARN, 0x009000),
+ REG(ANA_VLANMASK, 0x009004),
+ REG(ANA_PORT_B_DOMAIN, 0x009008),
+ REG(ANA_ANAGEFIL, 0x00900c),
+ REG(ANA_ANEVENTS, 0x009010),
+ REG(ANA_STORMLIMIT_BURST, 0x009014),
+ REG(ANA_STORMLIMIT_CFG, 0x009018),
+ REG(ANA_ISOLATED_PORTS, 0x009028),
+ REG(ANA_COMMUNITY_PORTS, 0x00902c),
+ REG(ANA_AUTOAGE, 0x009030),
+ REG(ANA_MACTOPTIONS, 0x009034),
+ REG(ANA_LEARNDISC, 0x009038),
+ REG(ANA_AGENCTRL, 0x00903c),
+ REG(ANA_MIRRORPORTS, 0x009040),
+ REG(ANA_EMIRRORPORTS, 0x009044),
+ REG(ANA_FLOODING, 0x009048),
+ REG(ANA_FLOODING_IPMC, 0x00904c),
+ REG(ANA_SFLOW_CFG, 0x009050),
+ REG(ANA_PORT_MODE, 0x009080),
+ REG(ANA_PGID_PGID, 0x008c00),
+ REG(ANA_TABLES_ANMOVED, 0x008b30),
+ REG(ANA_TABLES_MACHDATA, 0x008b34),
+ REG(ANA_TABLES_MACLDATA, 0x008b38),
+ REG(ANA_TABLES_MACACCESS, 0x008b3c),
+ REG(ANA_TABLES_MACTINDX, 0x008b40),
+ REG(ANA_TABLES_VLANACCESS, 0x008b44),
+ REG(ANA_TABLES_VLANTIDX, 0x008b48),
+ REG(ANA_TABLES_ISDXACCESS, 0x008b4c),
+ REG(ANA_TABLES_ISDXTIDX, 0x008b50),
+ REG(ANA_TABLES_ENTRYLIM, 0x008b00),
+ REG(ANA_TABLES_PTP_ID_HIGH, 0x008b54),
+ REG(ANA_TABLES_PTP_ID_LOW, 0x008b58),
+ REG(ANA_MSTI_STATE, 0x008e00),
+ REG(ANA_PORT_VLAN_CFG, 0x007000),
+ REG(ANA_PORT_DROP_CFG, 0x007004),
+ REG(ANA_PORT_QOS_CFG, 0x007008),
+ REG(ANA_PORT_VCAP_CFG, 0x00700c),
+ REG(ANA_PORT_VCAP_S1_KEY_CFG, 0x007010),
+ REG(ANA_PORT_VCAP_S2_CFG, 0x00701c),
+ REG(ANA_PORT_PCP_DEI_MAP, 0x007020),
+ REG(ANA_PORT_CPU_FWD_CFG, 0x007060),
+ REG(ANA_PORT_CPU_FWD_BPDU_CFG, 0x007064),
+ REG(ANA_PORT_CPU_FWD_GARP_CFG, 0x007068),
+ REG(ANA_PORT_CPU_FWD_CCM_CFG, 0x00706c),
+ REG(ANA_PORT_PORT_CFG, 0x007070),
+ REG(ANA_PORT_POL_CFG, 0x007074),
+ REG(ANA_PORT_PTP_CFG, 0x007078),
+ REG(ANA_PORT_PTP_DLY1_CFG, 0x00707c),
+ REG(ANA_OAM_UPM_LM_CNT, 0x007c00),
+ REG(ANA_PORT_PTP_DLY2_CFG, 0x007080),
+ REG(ANA_PFC_PFC_CFG, 0x008800),
+ REG(ANA_PFC_PFC_TIMER, 0x008804),
+ REG(ANA_IPT_OAM_MEP_CFG, 0x008000),
+ REG(ANA_IPT_IPT, 0x008004),
+ REG(ANA_PPT_PPT, 0x008ac0),
+ REG(ANA_FID_MAP_FID_MAP, 0x000000),
+ REG(ANA_AGGR_CFG, 0x0090b4),
+ REG(ANA_CPUQ_CFG, 0x0090b8),
+ REG(ANA_CPUQ_CFG2, 0x0090bc),
+ REG(ANA_CPUQ_8021_CFG, 0x0090c0),
+ REG(ANA_DSCP_CFG, 0x009100),
+ REG(ANA_DSCP_REWR_CFG, 0x009200),
+ REG(ANA_VCAP_RNG_TYPE_CFG, 0x009240),
+ REG(ANA_VCAP_RNG_VAL_CFG, 0x009260),
+ REG(ANA_VRAP_CFG, 0x009280),
+ REG(ANA_VRAP_HDR_DATA, 0x009284),
+ REG(ANA_VRAP_HDR_MASK, 0x009288),
+ REG(ANA_DISCARD_CFG, 0x00928c),
+ REG(ANA_FID_CFG, 0x009290),
+ REG(ANA_POL_PIR_CFG, 0x004000),
+ REG(ANA_POL_CIR_CFG, 0x004004),
+ REG(ANA_POL_MODE_CFG, 0x004008),
+ REG(ANA_POL_PIR_STATE, 0x00400c),
+ REG(ANA_POL_CIR_STATE, 0x004010),
+ REG(ANA_POL_STATE, 0x004014),
+ REG(ANA_POL_FLOWC, 0x008b80),
+ REG(ANA_POL_HYST, 0x008bec),
+ REG(ANA_POL_MISC_CFG, 0x008bf0),
+};
+
+static const u32 ocelot_qs_regmap[] = {
+ REG(QS_XTR_GRP_CFG, 0x000000),
+ REG(QS_XTR_RD, 0x000008),
+ REG(QS_XTR_FRM_PRUNING, 0x000010),
+ REG(QS_XTR_FLUSH, 0x000018),
+ REG(QS_XTR_DATA_PRESENT, 0x00001c),
+ REG(QS_XTR_CFG, 0x000020),
+ REG(QS_INJ_GRP_CFG, 0x000024),
+ REG(QS_INJ_WR, 0x00002c),
+ REG(QS_INJ_CTRL, 0x000034),
+ REG(QS_INJ_STATUS, 0x00003c),
+ REG(QS_INJ_ERR, 0x000040),
+ REG(QS_INH_DBG, 0x000048),
+};
+
+static const u32 ocelot_hsio_regmap[] = {
+ REG(HSIO_PLL5G_CFG0, 0x000000),
+ REG(HSIO_PLL5G_CFG1, 0x000004),
+ REG(HSIO_PLL5G_CFG2, 0x000008),
+ REG(HSIO_PLL5G_CFG3, 0x00000c),
+ REG(HSIO_PLL5G_CFG4, 0x000010),
+ REG(HSIO_PLL5G_CFG5, 0x000014),
+ REG(HSIO_PLL5G_CFG6, 0x000018),
+ REG(HSIO_PLL5G_STATUS0, 0x00001c),
+ REG(HSIO_PLL5G_STATUS1, 0x000020),
+ REG(HSIO_PLL5G_BIST_CFG0, 0x000024),
+ REG(HSIO_PLL5G_BIST_CFG1, 0x000028),
+ REG(HSIO_PLL5G_BIST_CFG2, 0x00002c),
+ REG(HSIO_PLL5G_BIST_STAT0, 0x000030),
+ REG(HSIO_PLL5G_BIST_STAT1, 0x000034),
+ REG(HSIO_RCOMP_CFG0, 0x000038),
+ REG(HSIO_RCOMP_STATUS, 0x00003c),
+ REG(HSIO_SYNC_ETH_CFG, 0x000040),
+ REG(HSIO_SYNC_ETH_PLL_CFG, 0x000048),
+ REG(HSIO_S1G_DES_CFG, 0x00004c),
+ REG(HSIO_S1G_IB_CFG, 0x000050),
+ REG(HSIO_S1G_OB_CFG, 0x000054),
+ REG(HSIO_S1G_SER_CFG, 0x000058),
+ REG(HSIO_S1G_COMMON_CFG, 0x00005c),
+ REG(HSIO_S1G_PLL_CFG, 0x000060),
+ REG(HSIO_S1G_PLL_STATUS, 0x000064),
+ REG(HSIO_S1G_DFT_CFG0, 0x000068),
+ REG(HSIO_S1G_DFT_CFG1, 0x00006c),
+ REG(HSIO_S1G_DFT_CFG2, 0x000070),
+ REG(HSIO_S1G_TP_CFG, 0x000074),
+ REG(HSIO_S1G_RC_PLL_BIST_CFG, 0x000078),
+ REG(HSIO_S1G_MISC_CFG, 0x00007c),
+ REG(HSIO_S1G_DFT_STATUS, 0x000080),
+ REG(HSIO_S1G_MISC_STATUS, 0x000084),
+ REG(HSIO_MCB_S1G_ADDR_CFG, 0x000088),
+ REG(HSIO_S6G_DIG_CFG, 0x00008c),
+ REG(HSIO_S6G_DFT_CFG0, 0x000090),
+ REG(HSIO_S6G_DFT_CFG1, 0x000094),
+ REG(HSIO_S6G_DFT_CFG2, 0x000098),
+ REG(HSIO_S6G_TP_CFG0, 0x00009c),
+ REG(HSIO_S6G_TP_CFG1, 0x0000a0),
+ REG(HSIO_S6G_RC_PLL_BIST_CFG, 0x0000a4),
+ REG(HSIO_S6G_MISC_CFG, 0x0000a8),
+ REG(HSIO_S6G_OB_ANEG_CFG, 0x0000ac),
+ REG(HSIO_S6G_DFT_STATUS, 0x0000b0),
+ REG(HSIO_S6G_ERR_CNT, 0x0000b4),
+ REG(HSIO_S6G_MISC_STATUS, 0x0000b8),
+ REG(HSIO_S6G_DES_CFG, 0x0000bc),
+ REG(HSIO_S6G_IB_CFG, 0x0000c0),
+ REG(HSIO_S6G_IB_CFG1, 0x0000c4),
+ REG(HSIO_S6G_IB_CFG2, 0x0000c8),
+ REG(HSIO_S6G_IB_CFG3, 0x0000cc),
+ REG(HSIO_S6G_IB_CFG4, 0x0000d0),
+ REG(HSIO_S6G_IB_CFG5, 0x0000d4),
+ REG(HSIO_S6G_OB_CFG, 0x0000d8),
+ REG(HSIO_S6G_OB_CFG1, 0x0000dc),
+ REG(HSIO_S6G_SER_CFG, 0x0000e0),
+ REG(HSIO_S6G_COMMON_CFG, 0x0000e4),
+ REG(HSIO_S6G_PLL_CFG, 0x0000e8),
+ REG(HSIO_S6G_ACJTAG_CFG, 0x0000ec),
+ REG(HSIO_S6G_GP_CFG, 0x0000f0),
+ REG(HSIO_S6G_IB_STATUS0, 0x0000f4),
+ REG(HSIO_S6G_IB_STATUS1, 0x0000f8),
+ REG(HSIO_S6G_ACJTAG_STATUS, 0x0000fc),
+ REG(HSIO_S6G_PLL_STATUS, 0x000100),
+ REG(HSIO_S6G_REVID, 0x000104),
+ REG(HSIO_MCB_S6G_ADDR_CFG, 0x000108),
+ REG(HSIO_HW_CFG, 0x00010c),
+ REG(HSIO_HW_QSGMII_CFG, 0x000110),
+ REG(HSIO_HW_QSGMII_STAT, 0x000114),
+ REG(HSIO_CLK_CFG, 0x000118),
+ REG(HSIO_TEMP_SENSOR_CTRL, 0x00011c),
+ REG(HSIO_TEMP_SENSOR_CFG, 0x000120),
+ REG(HSIO_TEMP_SENSOR_STAT, 0x000124),
+};
+
+static const u32 ocelot_qsys_regmap[] = {
+ REG(QSYS_PORT_MODE, 0x011200),
+ REG(QSYS_SWITCH_PORT_MODE, 0x011234),
+ REG(QSYS_STAT_CNT_CFG, 0x011264),
+ REG(QSYS_EEE_CFG, 0x011268),
+ REG(QSYS_EEE_THRES, 0x011294),
+ REG(QSYS_IGR_NO_SHARING, 0x011298),
+ REG(QSYS_EGR_NO_SHARING, 0x01129c),
+ REG(QSYS_SW_STATUS, 0x0112a0),
+ REG(QSYS_EXT_CPU_CFG, 0x0112d0),
+ REG(QSYS_PAD_CFG, 0x0112d4),
+ REG(QSYS_CPU_GROUP_MAP, 0x0112d8),
+ REG(QSYS_QMAP, 0x0112dc),
+ REG(QSYS_ISDX_SGRP, 0x011400),
+ REG(QSYS_TIMED_FRAME_ENTRY, 0x014000),
+ REG(QSYS_TFRM_MISC, 0x011310),
+ REG(QSYS_TFRM_PORT_DLY, 0x011314),
+ REG(QSYS_TFRM_TIMER_CFG_1, 0x011318),
+ REG(QSYS_TFRM_TIMER_CFG_2, 0x01131c),
+ REG(QSYS_TFRM_TIMER_CFG_3, 0x011320),
+ REG(QSYS_TFRM_TIMER_CFG_4, 0x011324),
+ REG(QSYS_TFRM_TIMER_CFG_5, 0x011328),
+ REG(QSYS_TFRM_TIMER_CFG_6, 0x01132c),
+ REG(QSYS_TFRM_TIMER_CFG_7, 0x011330),
+ REG(QSYS_TFRM_TIMER_CFG_8, 0x011334),
+ REG(QSYS_RED_PROFILE, 0x011338),
+ REG(QSYS_RES_QOS_MODE, 0x011378),
+ REG(QSYS_RES_CFG, 0x012000),
+ REG(QSYS_RES_STAT, 0x012004),
+ REG(QSYS_EGR_DROP_MODE, 0x01137c),
+ REG(QSYS_EQ_CTRL, 0x011380),
+ REG(QSYS_EVENTS_CORE, 0x011384),
+ REG(QSYS_CIR_CFG, 0x000000),
+ REG(QSYS_EIR_CFG, 0x000004),
+ REG(QSYS_SE_CFG, 0x000008),
+ REG(QSYS_SE_DWRR_CFG, 0x00000c),
+ REG(QSYS_SE_CONNECT, 0x00003c),
+ REG(QSYS_SE_DLB_SENSE, 0x000040),
+ REG(QSYS_CIR_STATE, 0x000044),
+ REG(QSYS_EIR_STATE, 0x000048),
+ REG(QSYS_SE_STATE, 0x00004c),
+ REG(QSYS_HSCH_MISC_CFG, 0x011388),
+};
+
+static const u32 ocelot_rew_regmap[] = {
+ REG(REW_PORT_VLAN_CFG, 0x000000),
+ REG(REW_TAG_CFG, 0x000004),
+ REG(REW_PORT_CFG, 0x000008),
+ REG(REW_DSCP_CFG, 0x00000c),
+ REG(REW_PCP_DEI_QOS_MAP_CFG, 0x000010),
+ REG(REW_PTP_CFG, 0x000050),
+ REG(REW_PTP_DLY1_CFG, 0x000054),
+ REG(REW_DSCP_REMAP_DP1_CFG, 0x000690),
+ REG(REW_DSCP_REMAP_CFG, 0x000790),
+ REG(REW_STAT_CFG, 0x000890),
+ REG(REW_PPT, 0x000680),
+};
+
+static const u32 ocelot_sys_regmap[] = {
+ REG(SYS_COUNT_RX_OCTETS, 0x000000),
+ REG(SYS_COUNT_RX_UNICAST, 0x000004),
+ REG(SYS_COUNT_RX_MULTICAST, 0x000008),
+ REG(SYS_COUNT_RX_BROADCAST, 0x00000c),
+ REG(SYS_COUNT_RX_SHORTS, 0x000010),
+ REG(SYS_COUNT_RX_FRAGMENTS, 0x000014),
+ REG(SYS_COUNT_RX_JABBERS, 0x000018),
+ REG(SYS_COUNT_RX_CRC_ALIGN_ERRS, 0x00001c),
+ REG(SYS_COUNT_RX_SYM_ERRS, 0x000020),
+ REG(SYS_COUNT_RX_64, 0x000024),
+ REG(SYS_COUNT_RX_65_127, 0x000028),
+ REG(SYS_COUNT_RX_128_255, 0x00002c),
+ REG(SYS_COUNT_RX_256_1023, 0x000030),
+ REG(SYS_COUNT_RX_1024_1526, 0x000034),
+ REG(SYS_COUNT_RX_1527_MAX, 0x000038),
+ REG(SYS_COUNT_RX_PAUSE, 0x00003c),
+ REG(SYS_COUNT_RX_CONTROL, 0x000040),
+ REG(SYS_COUNT_RX_LONGS, 0x000044),
+ REG(SYS_COUNT_RX_CLASSIFIED_DROPS, 0x000048),
+ REG(SYS_COUNT_TX_OCTETS, 0x000100),
+ REG(SYS_COUNT_TX_UNICAST, 0x000104),
+ REG(SYS_COUNT_TX_MULTICAST, 0x000108),
+ REG(SYS_COUNT_TX_BROADCAST, 0x00010c),
+ REG(SYS_COUNT_TX_COLLISION, 0x000110),
+ REG(SYS_COUNT_TX_DROPS, 0x000114),
+ REG(SYS_COUNT_TX_PAUSE, 0x000118),
+ REG(SYS_COUNT_TX_64, 0x00011c),
+ REG(SYS_COUNT_TX_65_127, 0x000120),
+ REG(SYS_COUNT_TX_128_511, 0x000124),
+ REG(SYS_COUNT_TX_512_1023, 0x000128),
+ REG(SYS_COUNT_TX_1024_1526, 0x00012c),
+ REG(SYS_COUNT_TX_1527_MAX, 0x000130),
+ REG(SYS_COUNT_TX_AGING, 0x000170),
+ REG(SYS_RESET_CFG, 0x000508),
+ REG(SYS_CMID, 0x00050c),
+ REG(SYS_VLAN_ETYPE_CFG, 0x000510),
+ REG(SYS_PORT_MODE, 0x000514),
+ REG(SYS_FRONT_PORT_MODE, 0x000548),
+ REG(SYS_FRM_AGING, 0x000574),
+ REG(SYS_STAT_CFG, 0x000578),
+ REG(SYS_SW_STATUS, 0x00057c),
+ REG(SYS_MISC_CFG, 0x0005ac),
+ REG(SYS_REW_MAC_HIGH_CFG, 0x0005b0),
+ REG(SYS_REW_MAC_LOW_CFG, 0x0005dc),
+ REG(SYS_CM_ADDR, 0x000500),
+ REG(SYS_CM_DATA, 0x000504),
+ REG(SYS_PAUSE_CFG, 0x000608),
+ REG(SYS_PAUSE_TOT_CFG, 0x000638),
+ REG(SYS_ATOP, 0x00063c),
+ REG(SYS_ATOP_TOT_CFG, 0x00066c),
+ REG(SYS_MAC_FC_CFG, 0x000670),
+ REG(SYS_MMGT, 0x00069c),
+ REG(SYS_MMGT_FAST, 0x0006a0),
+ REG(SYS_EVENTS_DIF, 0x0006a4),
+ REG(SYS_EVENTS_CORE, 0x0006b4),
+ REG(SYS_CNT, 0x000000),
+ REG(SYS_PTP_STATUS, 0x0006b8),
+ REG(SYS_PTP_TXSTAMP, 0x0006bc),
+ REG(SYS_PTP_NXT, 0x0006c0),
+ REG(SYS_PTP_CFG, 0x0006c4),
+};
+
+static const u32 *ocelot_regmap[] = {
+ [ANA] = ocelot_ana_regmap,
+ [QS] = ocelot_qs_regmap,
+ [HSIO] = ocelot_hsio_regmap,
+ [QSYS] = ocelot_qsys_regmap,
+ [REW] = ocelot_rew_regmap,
+ [SYS] = ocelot_sys_regmap,
+};
+
+static const struct reg_field ocelot_regfields[] = {
+ [ANA_ADVLEARN_VLAN_CHK] = REG_FIELD(ANA_ADVLEARN, 11, 11),
+ [ANA_ADVLEARN_LEARN_MIRROR] = REG_FIELD(ANA_ADVLEARN, 0, 10),
+ [ANA_ANEVENTS_MSTI_DROP] = REG_FIELD(ANA_ANEVENTS, 27, 27),
+ [ANA_ANEVENTS_ACLKILL] = REG_FIELD(ANA_ANEVENTS, 26, 26),
+ [ANA_ANEVENTS_ACLUSED] = REG_FIELD(ANA_ANEVENTS, 25, 25),
+ [ANA_ANEVENTS_AUTOAGE] = REG_FIELD(ANA_ANEVENTS, 24, 24),
+ [ANA_ANEVENTS_VS2TTL1] = REG_FIELD(ANA_ANEVENTS, 23, 23),
+ [ANA_ANEVENTS_STORM_DROP] = REG_FIELD(ANA_ANEVENTS, 22, 22),
+ [ANA_ANEVENTS_LEARN_DROP] = REG_FIELD(ANA_ANEVENTS, 21, 21),
+ [ANA_ANEVENTS_AGED_ENTRY] = REG_FIELD(ANA_ANEVENTS, 20, 20),
+ [ANA_ANEVENTS_CPU_LEARN_FAILED] = REG_FIELD(ANA_ANEVENTS, 19, 19),
+ [ANA_ANEVENTS_AUTO_LEARN_FAILED] = REG_FIELD(ANA_ANEVENTS, 18, 18),
+ [ANA_ANEVENTS_LEARN_REMOVE] = REG_FIELD(ANA_ANEVENTS, 17, 17),
+ [ANA_ANEVENTS_AUTO_LEARNED] = REG_FIELD(ANA_ANEVENTS, 16, 16),
+ [ANA_ANEVENTS_AUTO_MOVED] = REG_FIELD(ANA_ANEVENTS, 15, 15),
+ [ANA_ANEVENTS_DROPPED] = REG_FIELD(ANA_ANEVENTS, 14, 14),
+ [ANA_ANEVENTS_CLASSIFIED_DROP] = REG_FIELD(ANA_ANEVENTS, 13, 13),
+ [ANA_ANEVENTS_CLASSIFIED_COPY] = REG_FIELD(ANA_ANEVENTS, 12, 12),
+ [ANA_ANEVENTS_VLAN_DISCARD] = REG_FIELD(ANA_ANEVENTS, 11, 11),
+ [ANA_ANEVENTS_FWD_DISCARD] = REG_FIELD(ANA_ANEVENTS, 10, 10),
+ [ANA_ANEVENTS_MULTICAST_FLOOD] = REG_FIELD(ANA_ANEVENTS, 9, 9),
+ [ANA_ANEVENTS_UNICAST_FLOOD] = REG_FIELD(ANA_ANEVENTS, 8, 8),
+ [ANA_ANEVENTS_DEST_KNOWN] = REG_FIELD(ANA_ANEVENTS, 7, 7),
+ [ANA_ANEVENTS_BUCKET3_MATCH] = REG_FIELD(ANA_ANEVENTS, 6, 6),
+ [ANA_ANEVENTS_BUCKET2_MATCH] = REG_FIELD(ANA_ANEVENTS, 5, 5),
+ [ANA_ANEVENTS_BUCKET1_MATCH] = REG_FIELD(ANA_ANEVENTS, 4, 4),
+ [ANA_ANEVENTS_BUCKET0_MATCH] = REG_FIELD(ANA_ANEVENTS, 3, 3),
+ [ANA_ANEVENTS_CPU_OPERATION] = REG_FIELD(ANA_ANEVENTS, 2, 2),
+ [ANA_ANEVENTS_DMAC_LOOKUP] = REG_FIELD(ANA_ANEVENTS, 1, 1),
+ [ANA_ANEVENTS_SMAC_LOOKUP] = REG_FIELD(ANA_ANEVENTS, 0, 0),
+ [ANA_TABLES_MACACCESS_B_DOM] = REG_FIELD(ANA_TABLES_MACACCESS, 18, 18),
+ [ANA_TABLES_MACTINDX_BUCKET] = REG_FIELD(ANA_TABLES_MACTINDX, 10, 11),
+ [ANA_TABLES_MACTINDX_M_INDEX] = REG_FIELD(ANA_TABLES_MACTINDX, 0, 9),
+ [QSYS_TIMED_FRAME_ENTRY_TFRM_VLD] = REG_FIELD(QSYS_TIMED_FRAME_ENTRY, 20, 20),
+ [QSYS_TIMED_FRAME_ENTRY_TFRM_FP] = REG_FIELD(QSYS_TIMED_FRAME_ENTRY, 8, 19),
+ [QSYS_TIMED_FRAME_ENTRY_TFRM_PORTNO] = REG_FIELD(QSYS_TIMED_FRAME_ENTRY, 4, 7),
+ [QSYS_TIMED_FRAME_ENTRY_TFRM_TM_SEL] = REG_FIELD(QSYS_TIMED_FRAME_ENTRY, 1, 3),
+ [QSYS_TIMED_FRAME_ENTRY_TFRM_TM_T] = REG_FIELD(QSYS_TIMED_FRAME_ENTRY, 0, 0),
+ [SYS_RESET_CFG_CORE_ENA] = REG_FIELD(SYS_RESET_CFG, 2, 2),
+ [SYS_RESET_CFG_MEM_ENA] = REG_FIELD(SYS_RESET_CFG, 1, 1),
+ [SYS_RESET_CFG_MEM_INIT] = REG_FIELD(SYS_RESET_CFG, 0, 0),
+};
+
+static const struct ocelot_stat_layout ocelot_stats_layout[] = {
+ { .name = "rx_octets", .offset = 0x00, },
+ { .name = "rx_unicast", .offset = 0x01, },
+ { .name = "rx_multicast", .offset = 0x02, },
+ { .name = "rx_broadcast", .offset = 0x03, },
+ { .name = "rx_shorts", .offset = 0x04, },
+ { .name = "rx_fragments", .offset = 0x05, },
+ { .name = "rx_jabbers", .offset = 0x06, },
+ { .name = "rx_crc_align_errs", .offset = 0x07, },
+ { .name = "rx_sym_errs", .offset = 0x08, },
+ { .name = "rx_frames_below_65_octets", .offset = 0x09, },
+ { .name = "rx_frames_65_to_127_octets", .offset = 0x0A, },
+ { .name = "rx_frames_128_to_255_octets", .offset = 0x0B, },
+ { .name = "rx_frames_256_to_511_octets", .offset = 0x0C, },
+ { .name = "rx_frames_512_to_1023_octets", .offset = 0x0D, },
+ { .name = "rx_frames_1024_to_1526_octets", .offset = 0x0E, },
+ { .name = "rx_frames_over_1526_octets", .offset = 0x0F, },
+ { .name = "rx_pause", .offset = 0x10, },
+ { .name = "rx_control", .offset = 0x11, },
+ { .name = "rx_longs", .offset = 0x12, },
+ { .name = "rx_classified_drops", .offset = 0x13, },
+ { .name = "rx_red_prio_0", .offset = 0x14, },
+ { .name = "rx_red_prio_1", .offset = 0x15, },
+ { .name = "rx_red_prio_2", .offset = 0x16, },
+ { .name = "rx_red_prio_3", .offset = 0x17, },
+ { .name = "rx_red_prio_4", .offset = 0x18, },
+ { .name = "rx_red_prio_5", .offset = 0x19, },
+ { .name = "rx_red_prio_6", .offset = 0x1A, },
+ { .name = "rx_red_prio_7", .offset = 0x1B, },
+ { .name = "rx_yellow_prio_0", .offset = 0x1C, },
+ { .name = "rx_yellow_prio_1", .offset = 0x1D, },
+ { .name = "rx_yellow_prio_2", .offset = 0x1E, },
+ { .name = "rx_yellow_prio_3", .offset = 0x1F, },
+ { .name = "rx_yellow_prio_4", .offset = 0x20, },
+ { .name = "rx_yellow_prio_5", .offset = 0x21, },
+ { .name = "rx_yellow_prio_6", .offset = 0x22, },
+ { .name = "rx_yellow_prio_7", .offset = 0x23, },
+ { .name = "rx_green_prio_0", .offset = 0x24, },
+ { .name = "rx_green_prio_1", .offset = 0x25, },
+ { .name = "rx_green_prio_2", .offset = 0x26, },
+ { .name = "rx_green_prio_3", .offset = 0x27, },
+ { .name = "rx_green_prio_4", .offset = 0x28, },
+ { .name = "rx_green_prio_5", .offset = 0x29, },
+ { .name = "rx_green_prio_6", .offset = 0x2A, },
+ { .name = "rx_green_prio_7", .offset = 0x2B, },
+ { .name = "tx_octets", .offset = 0x40, },
+ { .name = "tx_unicast", .offset = 0x41, },
+ { .name = "tx_multicast", .offset = 0x42, },
+ { .name = "tx_broadcast", .offset = 0x43, },
+ { .name = "tx_collision", .offset = 0x44, },
+ { .name = "tx_drops", .offset = 0x45, },
+ { .name = "tx_pause", .offset = 0x46, },
+ { .name = "tx_frames_below_65_octets", .offset = 0x47, },
+ { .name = "tx_frames_65_to_127_octets", .offset = 0x48, },
+ { .name = "tx_frames_128_255_octets", .offset = 0x49, },
+ { .name = "tx_frames_256_511_octets", .offset = 0x4A, },
+ { .name = "tx_frames_512_1023_octets", .offset = 0x4B, },
+ { .name = "tx_frames_1024_1526_octets", .offset = 0x4C, },
+ { .name = "tx_frames_over_1526_octets", .offset = 0x4D, },
+ { .name = "tx_yellow_prio_0", .offset = 0x4E, },
+ { .name = "tx_yellow_prio_1", .offset = 0x4F, },
+ { .name = "tx_yellow_prio_2", .offset = 0x50, },
+ { .name = "tx_yellow_prio_3", .offset = 0x51, },
+ { .name = "tx_yellow_prio_4", .offset = 0x52, },
+ { .name = "tx_yellow_prio_5", .offset = 0x53, },
+ { .name = "tx_yellow_prio_6", .offset = 0x54, },
+ { .name = "tx_yellow_prio_7", .offset = 0x55, },
+ { .name = "tx_green_prio_0", .offset = 0x56, },
+ { .name = "tx_green_prio_1", .offset = 0x57, },
+ { .name = "tx_green_prio_2", .offset = 0x58, },
+ { .name = "tx_green_prio_3", .offset = 0x59, },
+ { .name = "tx_green_prio_4", .offset = 0x5A, },
+ { .name = "tx_green_prio_5", .offset = 0x5B, },
+ { .name = "tx_green_prio_6", .offset = 0x5C, },
+ { .name = "tx_green_prio_7", .offset = 0x5D, },
+ { .name = "tx_aged", .offset = 0x5E, },
+ { .name = "drop_local", .offset = 0x80, },
+ { .name = "drop_tail", .offset = 0x81, },
+ { .name = "drop_yellow_prio_0", .offset = 0x82, },
+ { .name = "drop_yellow_prio_1", .offset = 0x83, },
+ { .name = "drop_yellow_prio_2", .offset = 0x84, },
+ { .name = "drop_yellow_prio_3", .offset = 0x85, },
+ { .name = "drop_yellow_prio_4", .offset = 0x86, },
+ { .name = "drop_yellow_prio_5", .offset = 0x87, },
+ { .name = "drop_yellow_prio_6", .offset = 0x88, },
+ { .name = "drop_yellow_prio_7", .offset = 0x89, },
+ { .name = "drop_green_prio_0", .offset = 0x8A, },
+ { .name = "drop_green_prio_1", .offset = 0x8B, },
+ { .name = "drop_green_prio_2", .offset = 0x8C, },
+ { .name = "drop_green_prio_3", .offset = 0x8D, },
+ { .name = "drop_green_prio_4", .offset = 0x8E, },
+ { .name = "drop_green_prio_5", .offset = 0x8F, },
+ { .name = "drop_green_prio_6", .offset = 0x90, },
+ { .name = "drop_green_prio_7", .offset = 0x91, },
+};
+
+static void ocelot_pll5_init(struct ocelot *ocelot)
+{
+ /* Configure PLL5. This will need a proper CCF driver
+ * The values are coming from the VTSS API for Ocelot
+ */
+ ocelot_write(ocelot, HSIO_PLL5G_CFG4_IB_CTRL(0x7600) |
+ HSIO_PLL5G_CFG4_IB_BIAS_CTRL(0x8), HSIO_PLL5G_CFG4);
+ ocelot_write(ocelot, HSIO_PLL5G_CFG0_CORE_CLK_DIV(0x11) |
+ HSIO_PLL5G_CFG0_CPU_CLK_DIV(2) |
+ HSIO_PLL5G_CFG0_ENA_BIAS |
+ HSIO_PLL5G_CFG0_ENA_VCO_BUF |
+ HSIO_PLL5G_CFG0_ENA_CP1 |
+ HSIO_PLL5G_CFG0_SELCPI(2) |
+ HSIO_PLL5G_CFG0_LOOP_BW_RES(0xe) |
+ HSIO_PLL5G_CFG0_SELBGV820(4) |
+ HSIO_PLL5G_CFG0_DIV4 |
+ HSIO_PLL5G_CFG0_ENA_CLKTREE |
+ HSIO_PLL5G_CFG0_ENA_LANE, HSIO_PLL5G_CFG0);
+ ocelot_write(ocelot, HSIO_PLL5G_CFG2_EN_RESET_FRQ_DET |
+ HSIO_PLL5G_CFG2_EN_RESET_OVERRUN |
+ HSIO_PLL5G_CFG2_GAIN_TEST(0x8) |
+ HSIO_PLL5G_CFG2_ENA_AMPCTRL |
+ HSIO_PLL5G_CFG2_PWD_AMPCTRL_N |
+ HSIO_PLL5G_CFG2_AMPC_SEL(0x10), HSIO_PLL5G_CFG2);
+}
+
+int ocelot_chip_init(struct ocelot *ocelot)
+{
+ int ret;
+
+ ocelot->map = ocelot_regmap;
+ ocelot->stats_layout = ocelot_stats_layout;
+ ocelot->num_stats = ARRAY_SIZE(ocelot_stats_layout);
+ ocelot->shared_queue_sz = 224 * 1024;
+
+ ret = ocelot_regfields_init(ocelot, ocelot_regfields);
+ if (ret)
+ return ret;
+
+ ocelot_pll5_init(ocelot);
+
+ eth_random_addr(ocelot->base_mac);
+ ocelot->base_mac[5] &= 0xf0;
+
+ return 0;
+}
+EXPORT_SYMBOL(ocelot_chip_init);
diff --git a/drivers/net/ethernet/mscc/ocelot_rew.h b/drivers/net/ethernet/mscc/ocelot_rew.h
new file mode 100644
index 0000000..210914b
--- /dev/null
+++ b/drivers/net/ethernet/mscc/ocelot_rew.h
@@ -0,0 +1,81 @@
+/* SPDX-License-Identifier: (GPL-2.0 OR MIT) */
+/*
+ * Microsemi Ocelot Switch driver
+ *
+ * Copyright (c) 2017 Microsemi Corporation
+ */
+
+#ifndef _MSCC_OCELOT_REW_H_
+#define _MSCC_OCELOT_REW_H_
+
+#define REW_PORT_VLAN_CFG_GSZ 0x80
+
+#define REW_PORT_VLAN_CFG_PORT_TPID(x) (((x) << 16) & GENMASK(31, 16))
+#define REW_PORT_VLAN_CFG_PORT_TPID_M GENMASK(31, 16)
+#define REW_PORT_VLAN_CFG_PORT_TPID_X(x) (((x) & GENMASK(31, 16)) >> 16)
+#define REW_PORT_VLAN_CFG_PORT_DEI BIT(15)
+#define REW_PORT_VLAN_CFG_PORT_PCP(x) (((x) << 12) & GENMASK(14, 12))
+#define REW_PORT_VLAN_CFG_PORT_PCP_M GENMASK(14, 12)
+#define REW_PORT_VLAN_CFG_PORT_PCP_X(x) (((x) & GENMASK(14, 12)) >> 12)
+#define REW_PORT_VLAN_CFG_PORT_VID(x) ((x) & GENMASK(11, 0))
+#define REW_PORT_VLAN_CFG_PORT_VID_M GENMASK(11, 0)
+
+#define REW_TAG_CFG_GSZ 0x80
+
+#define REW_TAG_CFG_TAG_CFG(x) (((x) << 7) & GENMASK(8, 7))
+#define REW_TAG_CFG_TAG_CFG_M GENMASK(8, 7)
+#define REW_TAG_CFG_TAG_CFG_X(x) (((x) & GENMASK(8, 7)) >> 7)
+#define REW_TAG_CFG_TAG_TPID_CFG(x) (((x) << 5) & GENMASK(6, 5))
+#define REW_TAG_CFG_TAG_TPID_CFG_M GENMASK(6, 5)
+#define REW_TAG_CFG_TAG_TPID_CFG_X(x) (((x) & GENMASK(6, 5)) >> 5)
+#define REW_TAG_CFG_TAG_VID_CFG BIT(4)
+#define REW_TAG_CFG_TAG_PCP_CFG(x) (((x) << 2) & GENMASK(3, 2))
+#define REW_TAG_CFG_TAG_PCP_CFG_M GENMASK(3, 2)
+#define REW_TAG_CFG_TAG_PCP_CFG_X(x) (((x) & GENMASK(3, 2)) >> 2)
+#define REW_TAG_CFG_TAG_DEI_CFG(x) ((x) & GENMASK(1, 0))
+#define REW_TAG_CFG_TAG_DEI_CFG_M GENMASK(1, 0)
+
+#define REW_PORT_CFG_GSZ 0x80
+
+#define REW_PORT_CFG_ES0_EN BIT(5)
+#define REW_PORT_CFG_FCS_UPDATE_NONCPU_CFG(x) (((x) << 3) & GENMASK(4, 3))
+#define REW_PORT_CFG_FCS_UPDATE_NONCPU_CFG_M GENMASK(4, 3)
+#define REW_PORT_CFG_FCS_UPDATE_NONCPU_CFG_X(x) (((x) & GENMASK(4, 3)) >> 3)
+#define REW_PORT_CFG_FCS_UPDATE_CPU_ENA BIT(2)
+#define REW_PORT_CFG_FLUSH_ENA BIT(1)
+#define REW_PORT_CFG_AGE_DIS BIT(0)
+
+#define REW_DSCP_CFG_GSZ 0x80
+
+#define REW_PCP_DEI_QOS_MAP_CFG_GSZ 0x80
+#define REW_PCP_DEI_QOS_MAP_CFG_RSZ 0x4
+
+#define REW_PCP_DEI_QOS_MAP_CFG_DEI_QOS_VAL BIT(3)
+#define REW_PCP_DEI_QOS_MAP_CFG_PCP_QOS_VAL(x) ((x) & GENMASK(2, 0))
+#define REW_PCP_DEI_QOS_MAP_CFG_PCP_QOS_VAL_M GENMASK(2, 0)
+
+#define REW_PTP_CFG_GSZ 0x80
+
+#define REW_PTP_CFG_PTP_BACKPLANE_MODE BIT(7)
+#define REW_PTP_CFG_GP_CFG_UNUSED(x) (((x) << 3) & GENMASK(6, 3))
+#define REW_PTP_CFG_GP_CFG_UNUSED_M GENMASK(6, 3)
+#define REW_PTP_CFG_GP_CFG_UNUSED_X(x) (((x) & GENMASK(6, 3)) >> 3)
+#define REW_PTP_CFG_PTP_1STEP_DIS BIT(2)
+#define REW_PTP_CFG_PTP_2STEP_DIS BIT(1)
+#define REW_PTP_CFG_PTP_UDP_KEEP BIT(0)
+
+#define REW_PTP_DLY1_CFG_GSZ 0x80
+
+#define REW_RED_TAG_CFG_GSZ 0x80
+
+#define REW_RED_TAG_CFG_RED_TAG_CFG BIT(0)
+
+#define REW_DSCP_REMAP_DP1_CFG_RSZ 0x4
+
+#define REW_DSCP_REMAP_CFG_RSZ 0x4
+
+#define REW_REW_STICKY_ES0_TAGB_PUSH_FAILED BIT(0)
+
+#define REW_PPT_RSZ 0x4
+
+#endif
diff --git a/drivers/net/ethernet/mscc/ocelot_sys.h b/drivers/net/ethernet/mscc/ocelot_sys.h
new file mode 100644
index 0000000..16f91e1
--- /dev/null
+++ b/drivers/net/ethernet/mscc/ocelot_sys.h
@@ -0,0 +1,144 @@
+/* SPDX-License-Identifier: (GPL-2.0 OR MIT) */
+/*
+ * Microsemi Ocelot Switch driver
+ *
+ * Copyright (c) 2017 Microsemi Corporation
+ */
+
+#ifndef _MSCC_OCELOT_SYS_H_
+#define _MSCC_OCELOT_SYS_H_
+
+#define SYS_COUNT_RX_OCTETS_RSZ 0x4
+
+#define SYS_COUNT_TX_OCTETS_RSZ 0x4
+
+#define SYS_PORT_MODE_RSZ 0x4
+
+#define SYS_PORT_MODE_DATA_WO_TS(x) (((x) << 5) & GENMASK(6, 5))
+#define SYS_PORT_MODE_DATA_WO_TS_M GENMASK(6, 5)
+#define SYS_PORT_MODE_DATA_WO_TS_X(x) (((x) & GENMASK(6, 5)) >> 5)
+#define SYS_PORT_MODE_INCL_INJ_HDR(x) (((x) << 3) & GENMASK(4, 3))
+#define SYS_PORT_MODE_INCL_INJ_HDR_M GENMASK(4, 3)
+#define SYS_PORT_MODE_INCL_INJ_HDR_X(x) (((x) & GENMASK(4, 3)) >> 3)
+#define SYS_PORT_MODE_INCL_XTR_HDR(x) (((x) << 1) & GENMASK(2, 1))
+#define SYS_PORT_MODE_INCL_XTR_HDR_M GENMASK(2, 1)
+#define SYS_PORT_MODE_INCL_XTR_HDR_X(x) (((x) & GENMASK(2, 1)) >> 1)
+#define SYS_PORT_MODE_INJ_HDR_ERR BIT(0)
+
+#define SYS_FRONT_PORT_MODE_RSZ 0x4
+
+#define SYS_FRONT_PORT_MODE_HDX_MODE BIT(0)
+
+#define SYS_FRM_AGING_AGE_TX_ENA BIT(20)
+#define SYS_FRM_AGING_MAX_AGE(x) ((x) & GENMASK(19, 0))
+#define SYS_FRM_AGING_MAX_AGE_M GENMASK(19, 0)
+
+#define SYS_STAT_CFG_STAT_CLEAR_SHOT(x) (((x) << 10) & GENMASK(16, 10))
+#define SYS_STAT_CFG_STAT_CLEAR_SHOT_M GENMASK(16, 10)
+#define SYS_STAT_CFG_STAT_CLEAR_SHOT_X(x) (((x) & GENMASK(16, 10)) >> 10)
+#define SYS_STAT_CFG_STAT_VIEW(x) ((x) & GENMASK(9, 0))
+#define SYS_STAT_CFG_STAT_VIEW_M GENMASK(9, 0)
+
+#define SYS_SW_STATUS_RSZ 0x4
+
+#define SYS_SW_STATUS_PORT_RX_PAUSED BIT(0)
+
+#define SYS_MISC_CFG_PTP_RSRV_CLR BIT(1)
+#define SYS_MISC_CFG_PTP_DIS_NEG_RO BIT(0)
+
+#define SYS_REW_MAC_HIGH_CFG_RSZ 0x4
+
+#define SYS_REW_MAC_LOW_CFG_RSZ 0x4
+
+#define SYS_TIMESTAMP_OFFSET_ETH_TYPE_CFG(x) (((x) << 6) & GENMASK(21, 6))
+#define SYS_TIMESTAMP_OFFSET_ETH_TYPE_CFG_M GENMASK(21, 6)
+#define SYS_TIMESTAMP_OFFSET_ETH_TYPE_CFG_X(x) (((x) & GENMASK(21, 6)) >> 6)
+#define SYS_TIMESTAMP_OFFSET_TIMESTAMP_OFFSET(x) ((x) & GENMASK(5, 0))
+#define SYS_TIMESTAMP_OFFSET_TIMESTAMP_OFFSET_M GENMASK(5, 0)
+
+#define SYS_PAUSE_CFG_RSZ 0x4
+
+#define SYS_PAUSE_CFG_PAUSE_START(x) (((x) << 10) & GENMASK(18, 10))
+#define SYS_PAUSE_CFG_PAUSE_START_M GENMASK(18, 10)
+#define SYS_PAUSE_CFG_PAUSE_START_X(x) (((x) & GENMASK(18, 10)) >> 10)
+#define SYS_PAUSE_CFG_PAUSE_STOP(x) (((x) << 1) & GENMASK(9, 1))
+#define SYS_PAUSE_CFG_PAUSE_STOP_M GENMASK(9, 1)
+#define SYS_PAUSE_CFG_PAUSE_STOP_X(x) (((x) & GENMASK(9, 1)) >> 1)
+#define SYS_PAUSE_CFG_PAUSE_ENA BIT(0)
+
+#define SYS_PAUSE_TOT_CFG_PAUSE_TOT_START(x) (((x) << 9) & GENMASK(17, 9))
+#define SYS_PAUSE_TOT_CFG_PAUSE_TOT_START_M GENMASK(17, 9)
+#define SYS_PAUSE_TOT_CFG_PAUSE_TOT_START_X(x) (((x) & GENMASK(17, 9)) >> 9)
+#define SYS_PAUSE_TOT_CFG_PAUSE_TOT_STOP(x) ((x) & GENMASK(8, 0))
+#define SYS_PAUSE_TOT_CFG_PAUSE_TOT_STOP_M GENMASK(8, 0)
+
+#define SYS_ATOP_RSZ 0x4
+
+#define SYS_MAC_FC_CFG_RSZ 0x4
+
+#define SYS_MAC_FC_CFG_FC_LINK_SPEED(x) (((x) << 26) & GENMASK(27, 26))
+#define SYS_MAC_FC_CFG_FC_LINK_SPEED_M GENMASK(27, 26)
+#define SYS_MAC_FC_CFG_FC_LINK_SPEED_X(x) (((x) & GENMASK(27, 26)) >> 26)
+#define SYS_MAC_FC_CFG_FC_LATENCY_CFG(x) (((x) << 20) & GENMASK(25, 20))
+#define SYS_MAC_FC_CFG_FC_LATENCY_CFG_M GENMASK(25, 20)
+#define SYS_MAC_FC_CFG_FC_LATENCY_CFG_X(x) (((x) & GENMASK(25, 20)) >> 20)
+#define SYS_MAC_FC_CFG_ZERO_PAUSE_ENA BIT(18)
+#define SYS_MAC_FC_CFG_TX_FC_ENA BIT(17)
+#define SYS_MAC_FC_CFG_RX_FC_ENA BIT(16)
+#define SYS_MAC_FC_CFG_PAUSE_VAL_CFG(x) ((x) & GENMASK(15, 0))
+#define SYS_MAC_FC_CFG_PAUSE_VAL_CFG_M GENMASK(15, 0)
+
+#define SYS_MMGT_RELCNT(x) (((x) << 16) & GENMASK(31, 16))
+#define SYS_MMGT_RELCNT_M GENMASK(31, 16)
+#define SYS_MMGT_RELCNT_X(x) (((x) & GENMASK(31, 16)) >> 16)
+#define SYS_MMGT_FREECNT(x) ((x) & GENMASK(15, 0))
+#define SYS_MMGT_FREECNT_M GENMASK(15, 0)
+
+#define SYS_MMGT_FAST_FREEVLD(x) (((x) << 4) & GENMASK(7, 4))
+#define SYS_MMGT_FAST_FREEVLD_M GENMASK(7, 4)
+#define SYS_MMGT_FAST_FREEVLD_X(x) (((x) & GENMASK(7, 4)) >> 4)
+#define SYS_MMGT_FAST_RELVLD(x) ((x) & GENMASK(3, 0))
+#define SYS_MMGT_FAST_RELVLD_M GENMASK(3, 0)
+
+#define SYS_EVENTS_DIF_RSZ 0x4
+
+#define SYS_EVENTS_DIF_EV_DRX(x) (((x) << 6) & GENMASK(8, 6))
+#define SYS_EVENTS_DIF_EV_DRX_M GENMASK(8, 6)
+#define SYS_EVENTS_DIF_EV_DRX_X(x) (((x) & GENMASK(8, 6)) >> 6)
+#define SYS_EVENTS_DIF_EV_DTX(x) ((x) & GENMASK(5, 0))
+#define SYS_EVENTS_DIF_EV_DTX_M GENMASK(5, 0)
+
+#define SYS_EVENTS_CORE_EV_FWR BIT(2)
+#define SYS_EVENTS_CORE_EV_ANA(x) ((x) & GENMASK(1, 0))
+#define SYS_EVENTS_CORE_EV_ANA_M GENMASK(1, 0)
+
+#define SYS_CNT_GSZ 0x4
+
+#define SYS_PTP_STATUS_PTP_TXSTAMP_OAM BIT(29)
+#define SYS_PTP_STATUS_PTP_OVFL BIT(28)
+#define SYS_PTP_STATUS_PTP_MESS_VLD BIT(27)
+#define SYS_PTP_STATUS_PTP_MESS_ID(x) (((x) << 21) & GENMASK(26, 21))
+#define SYS_PTP_STATUS_PTP_MESS_ID_M GENMASK(26, 21)
+#define SYS_PTP_STATUS_PTP_MESS_ID_X(x) (((x) & GENMASK(26, 21)) >> 21)
+#define SYS_PTP_STATUS_PTP_MESS_TXPORT(x) (((x) << 16) & GENMASK(20, 16))
+#define SYS_PTP_STATUS_PTP_MESS_TXPORT_M GENMASK(20, 16)
+#define SYS_PTP_STATUS_PTP_MESS_TXPORT_X(x) (((x) & GENMASK(20, 16)) >> 16)
+#define SYS_PTP_STATUS_PTP_MESS_SEQ_ID(x) ((x) & GENMASK(15, 0))
+#define SYS_PTP_STATUS_PTP_MESS_SEQ_ID_M GENMASK(15, 0)
+
+#define SYS_PTP_TXSTAMP_PTP_TXSTAMP(x) ((x) & GENMASK(29, 0))
+#define SYS_PTP_TXSTAMP_PTP_TXSTAMP_M GENMASK(29, 0)
+#define SYS_PTP_TXSTAMP_PTP_TXSTAMP_SEC BIT(31)
+
+#define SYS_PTP_NXT_PTP_NXT BIT(0)
+
+#define SYS_PTP_CFG_PTP_STAMP_WID(x) (((x) << 2) & GENMASK(7, 2))
+#define SYS_PTP_CFG_PTP_STAMP_WID_M GENMASK(7, 2)
+#define SYS_PTP_CFG_PTP_STAMP_WID_X(x) (((x) & GENMASK(7, 2)) >> 2)
+#define SYS_PTP_CFG_PTP_CF_ROLL_MODE(x) ((x) & GENMASK(1, 0))
+#define SYS_PTP_CFG_PTP_CF_ROLL_MODE_M GENMASK(1, 0)
+
+#define SYS_RAM_INIT_RAM_INIT BIT(1)
+#define SYS_RAM_INIT_RAM_CFG_HOOK BIT(0)
+
+#endif
diff --git a/drivers/net/ethernet/neterion/vxge/vxge-config.c b/drivers/net/ethernet/neterion/vxge/vxge-config.c
index 6223930..c60da9e 100644
--- a/drivers/net/ethernet/neterion/vxge/vxge-config.c
+++ b/drivers/net/ethernet/neterion/vxge/vxge-config.c
@@ -693,7 +693,7 @@
VXGE_HW_DEVICE_ACCESS_RIGHT_MRPCIM)
return VXGE_HW_OK;
else
- return VXGE_HW_ERR_PRIVILAGED_OPEARATION;
+ return VXGE_HW_ERR_PRIVILEGED_OPERATION;
}
/*
@@ -1920,7 +1920,7 @@
}
if (!(hldev->access_rights & VXGE_HW_DEVICE_ACCESS_RIGHT_MRPCIM)) {
- status = VXGE_HW_ERR_PRIVILAGED_OPEARATION;
+ status = VXGE_HW_ERR_PRIVILEGED_OPERATION;
goto exit;
}
@@ -3153,7 +3153,7 @@
case vxge_hw_mgmt_reg_type_mrpcim:
if (!(hldev->access_rights &
VXGE_HW_DEVICE_ACCESS_RIGHT_MRPCIM)) {
- status = VXGE_HW_ERR_PRIVILAGED_OPEARATION;
+ status = VXGE_HW_ERR_PRIVILEGED_OPERATION;
break;
}
if (offset > sizeof(struct vxge_hw_mrpcim_reg) - 8) {
@@ -3165,7 +3165,7 @@
case vxge_hw_mgmt_reg_type_srpcim:
if (!(hldev->access_rights &
VXGE_HW_DEVICE_ACCESS_RIGHT_SRPCIM)) {
- status = VXGE_HW_ERR_PRIVILAGED_OPEARATION;
+ status = VXGE_HW_ERR_PRIVILEGED_OPERATION;
break;
}
if (index > VXGE_HW_TITAN_SRPCIM_REG_SPACES - 1) {
@@ -3279,7 +3279,7 @@
case vxge_hw_mgmt_reg_type_mrpcim:
if (!(hldev->access_rights &
VXGE_HW_DEVICE_ACCESS_RIGHT_MRPCIM)) {
- status = VXGE_HW_ERR_PRIVILAGED_OPEARATION;
+ status = VXGE_HW_ERR_PRIVILEGED_OPERATION;
break;
}
if (offset > sizeof(struct vxge_hw_mrpcim_reg) - 8) {
@@ -3291,7 +3291,7 @@
case vxge_hw_mgmt_reg_type_srpcim:
if (!(hldev->access_rights &
VXGE_HW_DEVICE_ACCESS_RIGHT_SRPCIM)) {
- status = VXGE_HW_ERR_PRIVILAGED_OPEARATION;
+ status = VXGE_HW_ERR_PRIVILEGED_OPERATION;
break;
}
if (index > VXGE_HW_TITAN_SRPCIM_REG_SPACES - 1) {
diff --git a/drivers/net/ethernet/neterion/vxge/vxge-config.h b/drivers/net/ethernet/neterion/vxge/vxge-config.h
index cfa9704..d743a37 100644
--- a/drivers/net/ethernet/neterion/vxge/vxge-config.h
+++ b/drivers/net/ethernet/neterion/vxge/vxge-config.h
@@ -127,7 +127,7 @@
VXGE_HW_ERR_INVALID_TCODE = VXGE_HW_BASE_ERR + 14,
VXGE_HW_ERR_INVALID_BLOCK_SIZE = VXGE_HW_BASE_ERR + 15,
VXGE_HW_ERR_INVALID_STATE = VXGE_HW_BASE_ERR + 16,
- VXGE_HW_ERR_PRIVILAGED_OPEARATION = VXGE_HW_BASE_ERR + 17,
+ VXGE_HW_ERR_PRIVILEGED_OPERATION = VXGE_HW_BASE_ERR + 17,
VXGE_HW_ERR_INVALID_PORT = VXGE_HW_BASE_ERR + 18,
VXGE_HW_ERR_FIFO = VXGE_HW_BASE_ERR + 19,
VXGE_HW_ERR_VPATH = VXGE_HW_BASE_ERR + 20,
diff --git a/drivers/net/ethernet/neterion/vxge/vxge-ethtool.c b/drivers/net/ethernet/neterion/vxge/vxge-ethtool.c
index 0452848..03c3d12 100644
--- a/drivers/net/ethernet/neterion/vxge/vxge-ethtool.c
+++ b/drivers/net/ethernet/neterion/vxge/vxge-ethtool.c
@@ -276,7 +276,7 @@
*ptr++ = 0;
status = vxge_hw_device_xmac_stats_get(hldev, xmac_stats);
if (status != VXGE_HW_OK) {
- if (status != VXGE_HW_ERR_PRIVILAGED_OPEARATION) {
+ if (status != VXGE_HW_ERR_PRIVILEGED_OPERATION) {
vxge_debug_init(VXGE_ERR,
"%s : %d Failure in getting xmac stats",
__func__, __LINE__);
diff --git a/drivers/net/ethernet/neterion/vxge/vxge-main.c b/drivers/net/ethernet/neterion/vxge/vxge-main.c
index b2299f2..a8918bb 100644
--- a/drivers/net/ethernet/neterion/vxge/vxge-main.c
+++ b/drivers/net/ethernet/neterion/vxge/vxge-main.c
@@ -3484,11 +3484,11 @@
0,
&stat);
- if (status == VXGE_HW_ERR_PRIVILAGED_OPEARATION)
+ if (status == VXGE_HW_ERR_PRIVILEGED_OPERATION)
vxge_debug_init(
vxge_hw_device_trace_level_get(hldev),
"%s: device stats clear returns"
- "VXGE_HW_ERR_PRIVILAGED_OPEARATION", ndev->name);
+ "VXGE_HW_ERR_PRIVILEGED_OPERATION", ndev->name);
vxge_debug_entryexit(vxge_hw_device_trace_level_get(hldev),
"%s: %s:%d Exiting...",
diff --git a/drivers/net/ethernet/netronome/Kconfig b/drivers/net/ethernet/netronome/Kconfig
index ae0c46b..66f15b0 100644
--- a/drivers/net/ethernet/netronome/Kconfig
+++ b/drivers/net/ethernet/netronome/Kconfig
@@ -36,6 +36,19 @@
either directly, with Open vSwitch, or any other way. Note that
TC Flower offload requires specific FW to work.
+config NFP_APP_ABM_NIC
+ bool "NFP4000/NFP6000 Advanced buffer management NIC support"
+ depends on NFP
+ depends on NET_SWITCHDEV
+ default y
+ help
+ Enable driver support for Advanced buffer management NIC on NFP.
+ ABM NIC allows advanced configuration of queuing and scheduling
+ of packets, including ECN marking. Say Y, if you are planning to
+ use one of the NFP4000 and NFP6000 platforms which support this
+ functionality.
+ Code will be built into the nfp.ko driver.
+
config NFP_DEBUG
bool "Debug support for Netronome(R) NFP4000/NFP6000 NIC drivers"
depends on NFP
diff --git a/drivers/net/ethernet/netronome/nfp/Makefile b/drivers/net/ethernet/netronome/nfp/Makefile
index d5866d7..4afb103 100644
--- a/drivers/net/ethernet/netronome/nfp/Makefile
+++ b/drivers/net/ethernet/netronome/nfp/Makefile
@@ -30,12 +30,14 @@
nfp_net_sriov.o \
nfp_netvf_main.o \
nfp_port.o \
+ nfp_shared_buf.o \
nic/main.o
ifeq ($(CONFIG_NFP_APP_FLOWER),y)
nfp-objs += \
flower/action.o \
flower/cmsg.o \
+ flower/lag_conf.o \
flower/main.o \
flower/match.o \
flower/metadata.o \
@@ -52,4 +54,10 @@
bpf/jit.o
endif
+ifeq ($(CONFIG_NFP_APP_ABM_NIC),y)
+nfp-objs += \
+ abm/ctrl.o \
+ abm/main.o
+endif
+
nfp-$(CONFIG_NFP_DEBUG) += nfp_net_debugfs.o
diff --git a/drivers/net/ethernet/netronome/nfp/abm/ctrl.c b/drivers/net/ethernet/netronome/nfp/abm/ctrl.c
new file mode 100644
index 0000000..e40f6f0
--- /dev/null
+++ b/drivers/net/ethernet/netronome/nfp/abm/ctrl.c
@@ -0,0 +1,58 @@
+// SPDX-License-Identifier: (GPL-2.0 OR BSD-2-Clause)
+/*
+ * Copyright (C) 2018 Netronome Systems, Inc.
+ *
+ * This software is dual licensed under the GNU General License Version 2,
+ * June 1991 as shown in the file COPYING in the top-level directory of this
+ * source tree or the BSD 2-Clause License provided below. You have the
+ * option to license this software under the complete terms of either license.
+ *
+ * The BSD 2-Clause License:
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ * 1. Redistributions of source code must retain the above
+ * copyright notice, this list of conditions and the following
+ * disclaimer.
+ *
+ * 2. Redistributions in binary form must reproduce the above
+ * copyright notice, this list of conditions and the following
+ * disclaimer in the documentation and/or other materials
+ * provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#include <linux/kernel.h>
+
+#include "../nfpcore/nfp_cpp.h"
+#include "../nfp_app.h"
+#include "../nfp_main.h"
+#include "../nfp_net.h"
+#include "main.h"
+
+void nfp_abm_ctrl_read_params(struct nfp_abm_link *alink)
+{
+ alink->queue_base = nn_readl(alink->vnic, NFP_NET_CFG_START_RXQ);
+ alink->queue_base /= alink->vnic->stride_rx;
+}
+
+int nfp_abm_ctrl_find_addrs(struct nfp_abm *abm)
+{
+ struct nfp_pf *pf = abm->app->pf;
+ unsigned int pf_id;
+
+ pf_id = nfp_cppcore_pcie_unit(pf->cpp);
+ abm->pf_id = pf_id;
+
+ return 0;
+}
diff --git a/drivers/net/ethernet/netronome/nfp/abm/main.c b/drivers/net/ethernet/netronome/nfp/abm/main.c
new file mode 100644
index 0000000..5a12bb2
--- /dev/null
+++ b/drivers/net/ethernet/netronome/nfp/abm/main.c
@@ -0,0 +1,399 @@
+// SPDX-License-Identifier: (GPL-2.0 OR BSD-2-Clause)
+/*
+ * Copyright (C) 2018 Netronome Systems, Inc.
+ *
+ * This software is dual licensed under the GNU General License Version 2,
+ * June 1991 as shown in the file COPYING in the top-level directory of this
+ * source tree or the BSD 2-Clause License provided below. You have the
+ * option to license this software under the complete terms of either license.
+ *
+ * The BSD 2-Clause License:
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ * 1. Redistributions of source code must retain the above
+ * copyright notice, this list of conditions and the following
+ * disclaimer.
+ *
+ * 2. Redistributions in binary form must reproduce the above
+ * copyright notice, this list of conditions and the following
+ * disclaimer in the documentation and/or other materials
+ * provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#include <linux/bitfield.h>
+#include <linux/etherdevice.h>
+#include <linux/lockdep.h>
+#include <linux/netdevice.h>
+#include <linux/rcupdate.h>
+#include <linux/slab.h>
+
+#include "../nfpcore/nfp.h"
+#include "../nfpcore/nfp_cpp.h"
+#include "../nfpcore/nfp_nsp.h"
+#include "../nfp_app.h"
+#include "../nfp_main.h"
+#include "../nfp_net.h"
+#include "../nfp_net_repr.h"
+#include "../nfp_port.h"
+#include "main.h"
+
+static u32 nfp_abm_portid(enum nfp_repr_type rtype, unsigned int id)
+{
+ return FIELD_PREP(NFP_ABM_PORTID_TYPE, rtype) |
+ FIELD_PREP(NFP_ABM_PORTID_ID, id);
+}
+
+static struct net_device *nfp_abm_repr_get(struct nfp_app *app, u32 port_id)
+{
+ enum nfp_repr_type rtype;
+ struct nfp_reprs *reprs;
+ u8 port;
+
+ rtype = FIELD_GET(NFP_ABM_PORTID_TYPE, port_id);
+ port = FIELD_GET(NFP_ABM_PORTID_ID, port_id);
+
+ reprs = rcu_dereference(app->reprs[rtype]);
+ if (!reprs)
+ return NULL;
+
+ if (port >= reprs->num_reprs)
+ return NULL;
+
+ return rcu_dereference(reprs->reprs[port]);
+}
+
+static int
+nfp_abm_spawn_repr(struct nfp_app *app, struct nfp_abm_link *alink,
+ enum nfp_port_type ptype)
+{
+ struct net_device *netdev;
+ enum nfp_repr_type rtype;
+ struct nfp_reprs *reprs;
+ struct nfp_repr *repr;
+ struct nfp_port *port;
+ int err;
+
+ if (ptype == NFP_PORT_PHYS_PORT)
+ rtype = NFP_REPR_TYPE_PHYS_PORT;
+ else
+ rtype = NFP_REPR_TYPE_PF;
+
+ netdev = nfp_repr_alloc(app);
+ if (!netdev)
+ return -ENOMEM;
+ repr = netdev_priv(netdev);
+ repr->app_priv = alink;
+
+ port = nfp_port_alloc(app, ptype, netdev);
+ if (IS_ERR(port)) {
+ err = PTR_ERR(port);
+ goto err_free_repr;
+ }
+
+ if (ptype == NFP_PORT_PHYS_PORT) {
+ port->eth_forced = true;
+ err = nfp_port_init_phy_port(app->pf, app, port, alink->id);
+ if (err)
+ goto err_free_port;
+ } else {
+ port->pf_id = alink->abm->pf_id;
+ port->pf_split = app->pf->max_data_vnics > 1;
+ port->pf_split_id = alink->id;
+ port->vnic = alink->vnic->dp.ctrl_bar;
+ }
+
+ SET_NETDEV_DEV(netdev, &alink->vnic->pdev->dev);
+ eth_hw_addr_random(netdev);
+
+ err = nfp_repr_init(app, netdev, nfp_abm_portid(rtype, alink->id),
+ port, alink->vnic->dp.netdev);
+ if (err)
+ goto err_free_port;
+
+ reprs = nfp_reprs_get_locked(app, rtype);
+ WARN(nfp_repr_get_locked(app, reprs, alink->id), "duplicate repr");
+ rcu_assign_pointer(reprs->reprs[alink->id], netdev);
+
+ nfp_info(app->cpp, "%s Port %d Representor(%s) created\n",
+ ptype == NFP_PORT_PF_PORT ? "PCIe" : "Phys",
+ alink->id, netdev->name);
+
+ return 0;
+
+err_free_port:
+ nfp_port_free(port);
+err_free_repr:
+ nfp_repr_free(netdev);
+ return err;
+}
+
+static void
+nfp_abm_kill_repr(struct nfp_app *app, struct nfp_abm_link *alink,
+ enum nfp_repr_type rtype)
+{
+ struct net_device *netdev;
+ struct nfp_reprs *reprs;
+
+ reprs = nfp_reprs_get_locked(app, rtype);
+ netdev = nfp_repr_get_locked(app, reprs, alink->id);
+ if (!netdev)
+ return;
+ rcu_assign_pointer(reprs->reprs[alink->id], NULL);
+ synchronize_rcu();
+ /* Cast to make sure nfp_repr_clean_and_free() takes a nfp_repr */
+ nfp_repr_clean_and_free((struct nfp_repr *)netdev_priv(netdev));
+}
+
+static void
+nfp_abm_kill_reprs(struct nfp_abm *abm, struct nfp_abm_link *alink)
+{
+ nfp_abm_kill_repr(abm->app, alink, NFP_REPR_TYPE_PF);
+ nfp_abm_kill_repr(abm->app, alink, NFP_REPR_TYPE_PHYS_PORT);
+}
+
+static void nfp_abm_kill_reprs_all(struct nfp_abm *abm)
+{
+ struct nfp_pf *pf = abm->app->pf;
+ struct nfp_net *nn;
+
+ list_for_each_entry(nn, &pf->vnics, vnic_list)
+ nfp_abm_kill_reprs(abm, (struct nfp_abm_link *)nn->app_priv);
+}
+
+static enum devlink_eswitch_mode nfp_abm_eswitch_mode_get(struct nfp_app *app)
+{
+ struct nfp_abm *abm = app->priv;
+
+ return abm->eswitch_mode;
+}
+
+static int nfp_abm_eswitch_set_legacy(struct nfp_abm *abm)
+{
+ nfp_abm_kill_reprs_all(abm);
+
+ abm->eswitch_mode = DEVLINK_ESWITCH_MODE_LEGACY;
+ return 0;
+}
+
+static void nfp_abm_eswitch_clean_up(struct nfp_abm *abm)
+{
+ if (abm->eswitch_mode != DEVLINK_ESWITCH_MODE_LEGACY)
+ WARN_ON(nfp_abm_eswitch_set_legacy(abm));
+}
+
+static int nfp_abm_eswitch_set_switchdev(struct nfp_abm *abm)
+{
+ struct nfp_app *app = abm->app;
+ struct nfp_pf *pf = app->pf;
+ struct nfp_net *nn;
+ int err;
+
+ list_for_each_entry(nn, &pf->vnics, vnic_list) {
+ struct nfp_abm_link *alink = nn->app_priv;
+
+ err = nfp_abm_spawn_repr(app, alink, NFP_PORT_PHYS_PORT);
+ if (err)
+ goto err_kill_all_reprs;
+
+ err = nfp_abm_spawn_repr(app, alink, NFP_PORT_PF_PORT);
+ if (err)
+ goto err_kill_all_reprs;
+ }
+
+ abm->eswitch_mode = DEVLINK_ESWITCH_MODE_SWITCHDEV;
+ return 0;
+
+err_kill_all_reprs:
+ nfp_abm_kill_reprs_all(abm);
+ return err;
+}
+
+static int nfp_abm_eswitch_mode_set(struct nfp_app *app, u16 mode)
+{
+ struct nfp_abm *abm = app->priv;
+
+ if (abm->eswitch_mode == mode)
+ return 0;
+
+ switch (mode) {
+ case DEVLINK_ESWITCH_MODE_LEGACY:
+ return nfp_abm_eswitch_set_legacy(abm);
+ case DEVLINK_ESWITCH_MODE_SWITCHDEV:
+ return nfp_abm_eswitch_set_switchdev(abm);
+ default:
+ return -EINVAL;
+ }
+}
+
+static void
+nfp_abm_vnic_set_mac(struct nfp_pf *pf, struct nfp_abm *abm, struct nfp_net *nn,
+ unsigned int id)
+{
+ struct nfp_eth_table_port *eth_port = &pf->eth_tbl->ports[id];
+ u8 mac_addr[ETH_ALEN];
+ const char *mac_str;
+ char name[32];
+
+ if (id > pf->eth_tbl->count) {
+ nfp_warn(pf->cpp, "No entry for persistent MAC address\n");
+ eth_hw_addr_random(nn->dp.netdev);
+ return;
+ }
+
+ snprintf(name, sizeof(name), "eth%u.mac.pf%u",
+ eth_port->eth_index, abm->pf_id);
+
+ mac_str = nfp_hwinfo_lookup(pf->hwinfo, name);
+ if (!mac_str) {
+ nfp_warn(pf->cpp, "Can't lookup persistent MAC address (%s)\n",
+ name);
+ eth_hw_addr_random(nn->dp.netdev);
+ return;
+ }
+
+ if (sscanf(mac_str, "%02hhx:%02hhx:%02hhx:%02hhx:%02hhx:%02hhx",
+ &mac_addr[0], &mac_addr[1], &mac_addr[2],
+ &mac_addr[3], &mac_addr[4], &mac_addr[5]) != 6) {
+ nfp_warn(pf->cpp, "Can't parse persistent MAC address (%s)\n",
+ mac_str);
+ eth_hw_addr_random(nn->dp.netdev);
+ return;
+ }
+
+ ether_addr_copy(nn->dp.netdev->dev_addr, mac_addr);
+ ether_addr_copy(nn->dp.netdev->perm_addr, mac_addr);
+}
+
+static int
+nfp_abm_vnic_alloc(struct nfp_app *app, struct nfp_net *nn, unsigned int id)
+{
+ struct nfp_eth_table_port *eth_port = &app->pf->eth_tbl->ports[id];
+ struct nfp_abm *abm = app->priv;
+ struct nfp_abm_link *alink;
+ int err;
+
+ alink = kzalloc(sizeof(*alink), GFP_KERNEL);
+ if (!alink)
+ return -ENOMEM;
+ nn->app_priv = alink;
+ alink->abm = abm;
+ alink->vnic = nn;
+ alink->id = id;
+
+ /* This is a multi-host app, make sure MAC/PHY is up, but don't
+ * make the MAC/PHY state follow the state of any of the ports.
+ */
+ err = nfp_eth_set_configured(app->cpp, eth_port->index, true);
+ if (err < 0)
+ goto err_free_alink;
+
+ netif_keep_dst(nn->dp.netdev);
+
+ nfp_abm_vnic_set_mac(app->pf, abm, nn, id);
+ nfp_abm_ctrl_read_params(alink);
+
+ return 0;
+
+err_free_alink:
+ kfree(alink);
+ return err;
+}
+
+static void nfp_abm_vnic_free(struct nfp_app *app, struct nfp_net *nn)
+{
+ struct nfp_abm_link *alink = nn->app_priv;
+
+ nfp_abm_kill_reprs(alink->abm, alink);
+ kfree(alink);
+}
+
+static int nfp_abm_init(struct nfp_app *app)
+{
+ struct nfp_pf *pf = app->pf;
+ struct nfp_reprs *reprs;
+ struct nfp_abm *abm;
+ int err;
+
+ if (!pf->eth_tbl) {
+ nfp_err(pf->cpp, "ABM NIC requires ETH table\n");
+ return -EINVAL;
+ }
+ if (pf->max_data_vnics != pf->eth_tbl->count) {
+ nfp_err(pf->cpp, "ETH entries don't match vNICs (%d vs %d)\n",
+ pf->max_data_vnics, pf->eth_tbl->count);
+ return -EINVAL;
+ }
+ if (!pf->mac_stats_bar) {
+ nfp_warn(app->cpp, "ABM NIC requires mac_stats symbol\n");
+ return -EINVAL;
+ }
+
+ abm = kzalloc(sizeof(*abm), GFP_KERNEL);
+ if (!abm)
+ return -ENOMEM;
+ app->priv = abm;
+ abm->app = app;
+
+ err = nfp_abm_ctrl_find_addrs(abm);
+ if (err)
+ goto err_free_abm;
+
+ err = -ENOMEM;
+ reprs = nfp_reprs_alloc(pf->max_data_vnics);
+ if (!reprs)
+ goto err_free_abm;
+ RCU_INIT_POINTER(app->reprs[NFP_REPR_TYPE_PHYS_PORT], reprs);
+
+ reprs = nfp_reprs_alloc(pf->max_data_vnics);
+ if (!reprs)
+ goto err_free_phys;
+ RCU_INIT_POINTER(app->reprs[NFP_REPR_TYPE_PF], reprs);
+
+ return 0;
+
+err_free_phys:
+ nfp_reprs_clean_and_free_by_type(app, NFP_REPR_TYPE_PHYS_PORT);
+err_free_abm:
+ kfree(abm);
+ app->priv = NULL;
+ return err;
+}
+
+static void nfp_abm_clean(struct nfp_app *app)
+{
+ struct nfp_abm *abm = app->priv;
+
+ nfp_abm_eswitch_clean_up(abm);
+ nfp_reprs_clean_and_free_by_type(app, NFP_REPR_TYPE_PF);
+ nfp_reprs_clean_and_free_by_type(app, NFP_REPR_TYPE_PHYS_PORT);
+ kfree(abm);
+ app->priv = NULL;
+}
+
+const struct nfp_app_type app_abm = {
+ .id = NFP_APP_ACTIVE_BUFFER_MGMT_NIC,
+ .name = "abm",
+
+ .init = nfp_abm_init,
+ .clean = nfp_abm_clean,
+
+ .vnic_alloc = nfp_abm_vnic_alloc,
+ .vnic_free = nfp_abm_vnic_free,
+
+ .eswitch_mode_get = nfp_abm_eswitch_mode_get,
+ .eswitch_mode_set = nfp_abm_eswitch_mode_set,
+
+ .repr_get = nfp_abm_repr_get,
+};
diff --git a/drivers/net/ethernet/netronome/nfp/abm/main.h b/drivers/net/ethernet/netronome/nfp/abm/main.h
new file mode 100644
index 0000000..5938b69
--- /dev/null
+++ b/drivers/net/ethernet/netronome/nfp/abm/main.h
@@ -0,0 +1,75 @@
+/* SPDX-License-Identifier: (GPL-2.0 OR BSD-2-Clause) */
+/*
+ * Copyright (C) 2018 Netronome Systems, Inc.
+ *
+ * This software is dual licensed under the GNU General License Version 2,
+ * June 1991 as shown in the file COPYING in the top-level directory of this
+ * source tree or the BSD 2-Clause License provided below. You have the
+ * option to license this software under the complete terms of either license.
+ *
+ * The BSD 2-Clause License:
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ * 1. Redistributions of source code must retain the above
+ * copyright notice, this list of conditions and the following
+ * disclaimer.
+ *
+ * 2. Redistributions in binary form must reproduce the above
+ * copyright notice, this list of conditions and the following
+ * disclaimer in the documentation and/or other materials
+ * provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#ifndef __NFP_ABM_H__
+#define __NFP_ABM_H__ 1
+
+#include <net/devlink.h>
+
+struct nfp_app;
+struct nfp_net;
+
+#define NFP_ABM_PORTID_TYPE GENMASK(23, 16)
+#define NFP_ABM_PORTID_ID GENMASK(7, 0)
+
+/**
+ * struct nfp_abm - ABM NIC app structure
+ * @app: back pointer to nfp_app
+ * @pf_id: ID of our PF link
+ * @eswitch_mode: devlink eswitch mode, advanced functions only visible
+ * in switchdev mode
+ */
+struct nfp_abm {
+ struct nfp_app *app;
+ unsigned int pf_id;
+ enum devlink_eswitch_mode eswitch_mode;
+};
+
+/**
+ * struct nfp_abm_link - port tuple of a ABM NIC
+ * @abm: back pointer to nfp_abm
+ * @vnic: data vNIC
+ * @id: id of the data vNIC
+ * @queue_base: id of base to host queue within PCIe (not QC idx)
+ */
+struct nfp_abm_link {
+ struct nfp_abm *abm;
+ struct nfp_net *vnic;
+ unsigned int id;
+ unsigned int queue_base;
+};
+
+void nfp_abm_ctrl_read_params(struct nfp_abm_link *alink);
+int nfp_abm_ctrl_find_addrs(struct nfp_abm *abm);
+#endif
diff --git a/drivers/net/ethernet/netronome/nfp/bpf/cmsg.c b/drivers/net/ethernet/netronome/nfp/bpf/cmsg.c
index 7e29814..cb87fcc 100644
--- a/drivers/net/ethernet/netronome/nfp/bpf/cmsg.c
+++ b/drivers/net/ethernet/netronome/nfp/bpf/cmsg.c
@@ -1,5 +1,5 @@
/*
- * Copyright (C) 2017 Netronome Systems, Inc.
+ * Copyright (C) 2017-2018 Netronome Systems, Inc.
*
* This software is dual licensed under the GNU General License Version 2,
* June 1991 as shown in the file COPYING in the top-level directory of this
@@ -102,6 +102,15 @@
return nfp_bpf_cmsg_alloc(bpf, size);
}
+static u8 nfp_bpf_cmsg_get_type(struct sk_buff *skb)
+{
+ struct cmsg_hdr *hdr;
+
+ hdr = (struct cmsg_hdr *)skb->data;
+
+ return hdr->type;
+}
+
static unsigned int nfp_bpf_cmsg_get_tag(struct sk_buff *skb)
{
struct cmsg_hdr *hdr;
@@ -431,6 +440,11 @@
goto err_free;
}
+ if (nfp_bpf_cmsg_get_type(skb) == CMSG_TYPE_BPF_EVENT) {
+ nfp_bpf_event_output(bpf, skb);
+ return;
+ }
+
nfp_ctrl_lock(bpf->app->ctrl);
tag = nfp_bpf_cmsg_get_tag(skb);
diff --git a/drivers/net/ethernet/netronome/nfp/bpf/fw.h b/drivers/net/ethernet/netronome/nfp/bpf/fw.h
index 39639ac..4c7972e 100644
--- a/drivers/net/ethernet/netronome/nfp/bpf/fw.h
+++ b/drivers/net/ethernet/netronome/nfp/bpf/fw.h
@@ -1,5 +1,5 @@
/*
- * Copyright (C) 2017 Netronome Systems, Inc.
+ * Copyright (C) 2017-2018 Netronome Systems, Inc.
*
* This software is dual licensed under the GNU General License Version 2,
* June 1991 as shown in the file COPYING in the top-level directory of this
@@ -37,11 +37,20 @@
#include <linux/bitops.h>
#include <linux/types.h>
+/* Kernel's enum bpf_reg_type is not uABI so people may change it breaking
+ * our FW ABI. In that case we will do translation in the driver.
+ */
+#define NFP_BPF_SCALAR_VALUE 1
+#define NFP_BPF_MAP_VALUE 4
+#define NFP_BPF_STACK 6
+#define NFP_BPF_PACKET_DATA 8
+
enum bpf_cap_tlv_type {
NFP_BPF_CAP_TYPE_FUNC = 1,
NFP_BPF_CAP_TYPE_ADJUST_HEAD = 2,
NFP_BPF_CAP_TYPE_MAPS = 3,
NFP_BPF_CAP_TYPE_RANDOM = 4,
+ NFP_BPF_CAP_TYPE_QUEUE_SELECT = 5,
};
struct nfp_bpf_cap_tlv_func {
@@ -81,6 +90,7 @@
CMSG_TYPE_MAP_DELETE = 5,
CMSG_TYPE_MAP_GETNEXT = 6,
CMSG_TYPE_MAP_GETFIRST = 7,
+ CMSG_TYPE_BPF_EVENT = 8,
__CMSG_TYPE_MAP_MAX,
};
@@ -155,4 +165,13 @@
__be32 resv;
struct cmsg_key_value_pair elem[0];
};
+
+struct cmsg_bpf_event {
+ struct cmsg_hdr hdr;
+ __be32 cpu_id;
+ __be64 map_ptr;
+ __be32 data_size;
+ __be32 pkt_size;
+ u8 data[0];
+};
#endif
diff --git a/drivers/net/ethernet/netronome/nfp/bpf/jit.c b/drivers/net/ethernet/netronome/nfp/bpf/jit.c
index 29b4e5f..8a92088 100644
--- a/drivers/net/ethernet/netronome/nfp/bpf/jit.c
+++ b/drivers/net/ethernet/netronome/nfp/bpf/jit.c
@@ -1,5 +1,5 @@
/*
- * Copyright (C) 2016-2017 Netronome Systems, Inc.
+ * Copyright (C) 2016-2018 Netronome Systems, Inc.
*
* This software is dual licensed under the GNU General License Version 2,
* June 1991 as shown in the file COPYING in the top-level directory of this
@@ -42,6 +42,7 @@
#include "main.h"
#include "../nfp_asm.h"
+#include "../nfp_net_ctrl.h"
/* --- NFP prog --- */
/* Foreach "multiple" entries macros provide pos and next<n> pointers.
@@ -211,6 +212,60 @@
}
static void
+__emit_br_bit(struct nfp_prog *nfp_prog, u16 areg, u16 breg, u16 addr, u8 defer,
+ bool set, bool src_lmextn)
+{
+ u16 addr_lo, addr_hi;
+ u64 insn;
+
+ addr_lo = addr & (OP_BR_BIT_ADDR_LO >> __bf_shf(OP_BR_BIT_ADDR_LO));
+ addr_hi = addr != addr_lo;
+
+ insn = OP_BR_BIT_BASE |
+ FIELD_PREP(OP_BR_BIT_A_SRC, areg) |
+ FIELD_PREP(OP_BR_BIT_B_SRC, breg) |
+ FIELD_PREP(OP_BR_BIT_BV, set) |
+ FIELD_PREP(OP_BR_BIT_DEFBR, defer) |
+ FIELD_PREP(OP_BR_BIT_ADDR_LO, addr_lo) |
+ FIELD_PREP(OP_BR_BIT_ADDR_HI, addr_hi) |
+ FIELD_PREP(OP_BR_BIT_SRC_LMEXTN, src_lmextn);
+
+ nfp_prog_push(nfp_prog, insn);
+}
+
+static void
+emit_br_bit_relo(struct nfp_prog *nfp_prog, swreg src, u8 bit, u16 addr,
+ u8 defer, bool set, enum nfp_relo_type relo)
+{
+ struct nfp_insn_re_regs reg;
+ int err;
+
+ /* NOTE: The bit to test is specified as an rotation amount, such that
+ * the bit to test will be placed on the MSB of the result when
+ * doing a rotate right. For bit X, we need right rotate X + 1.
+ */
+ bit += 1;
+
+ err = swreg_to_restricted(reg_none(), src, reg_imm(bit), ®, false);
+ if (err) {
+ nfp_prog->error = err;
+ return;
+ }
+
+ __emit_br_bit(nfp_prog, reg.areg, reg.breg, addr, defer, set,
+ reg.src_lmextn);
+
+ nfp_prog->prog[nfp_prog->prog_len - 1] |=
+ FIELD_PREP(OP_RELO_TYPE, relo);
+}
+
+static void
+emit_br_bset(struct nfp_prog *nfp_prog, swreg src, u8 bit, u16 addr, u8 defer)
+{
+ emit_br_bit_relo(nfp_prog, src, bit, addr, defer, true, RELO_BR_REL);
+}
+
+static void
__emit_immed(struct nfp_prog *nfp_prog, u16 areg, u16 breg, u16 imm_hi,
enum immed_width width, bool invert,
enum immed_shift shift, bool wr_both,
@@ -309,6 +364,19 @@
}
static void
+emit_shf_indir(struct nfp_prog *nfp_prog, swreg dst,
+ swreg lreg, enum shf_op op, swreg rreg, enum shf_sc sc)
+{
+ if (sc == SHF_SC_R_ROT) {
+ pr_err("indirect shift is not allowed on rotation\n");
+ nfp_prog->error = -EFAULT;
+ return;
+ }
+
+ emit_shf(nfp_prog, dst, lreg, op, rreg, sc, 0);
+}
+
+static void
__emit_alu(struct nfp_prog *nfp_prog, u16 dst, enum alu_dst_ab dst_ab,
u16 areg, enum alu_op op, u16 breg, bool swap, bool wr_both,
bool dst_lmextn, bool src_lmextn)
@@ -1214,45 +1282,83 @@
return 0;
}
-static int
-wrp_cmp_imm(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta,
- enum br_mask br_mask, bool swap)
+static const struct jmp_code_map {
+ enum br_mask br_mask;
+ bool swap;
+} jmp_code_map[] = {
+ [BPF_JGT >> 4] = { BR_BLO, true },
+ [BPF_JGE >> 4] = { BR_BHS, false },
+ [BPF_JLT >> 4] = { BR_BLO, false },
+ [BPF_JLE >> 4] = { BR_BHS, true },
+ [BPF_JSGT >> 4] = { BR_BLT, true },
+ [BPF_JSGE >> 4] = { BR_BGE, false },
+ [BPF_JSLT >> 4] = { BR_BLT, false },
+ [BPF_JSLE >> 4] = { BR_BGE, true },
+};
+
+static const struct jmp_code_map *nfp_jmp_code_get(struct nfp_insn_meta *meta)
+{
+ unsigned int op;
+
+ op = BPF_OP(meta->insn.code) >> 4;
+ /* br_mask of 0 is BR_BEQ which we don't use in jump code table */
+ if (WARN_ONCE(op >= ARRAY_SIZE(jmp_code_map) ||
+ !jmp_code_map[op].br_mask,
+ "no code found for jump instruction"))
+ return NULL;
+
+ return &jmp_code_map[op];
+}
+
+static int cmp_imm(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta)
{
const struct bpf_insn *insn = &meta->insn;
u64 imm = insn->imm; /* sign extend */
+ const struct jmp_code_map *code;
+ enum alu_op alu_op, carry_op;
u8 reg = insn->dst_reg * 2;
swreg tmp_reg;
+ code = nfp_jmp_code_get(meta);
+ if (!code)
+ return -EINVAL;
+
+ alu_op = meta->jump_neg_op ? ALU_OP_ADD : ALU_OP_SUB;
+ carry_op = meta->jump_neg_op ? ALU_OP_ADD_C : ALU_OP_SUB_C;
+
tmp_reg = ur_load_imm_any(nfp_prog, imm & ~0U, imm_b(nfp_prog));
- if (!swap)
- emit_alu(nfp_prog, reg_none(), reg_a(reg), ALU_OP_SUB, tmp_reg);
+ if (!code->swap)
+ emit_alu(nfp_prog, reg_none(), reg_a(reg), alu_op, tmp_reg);
else
- emit_alu(nfp_prog, reg_none(), tmp_reg, ALU_OP_SUB, reg_a(reg));
+ emit_alu(nfp_prog, reg_none(), tmp_reg, alu_op, reg_a(reg));
tmp_reg = ur_load_imm_any(nfp_prog, imm >> 32, imm_b(nfp_prog));
- if (!swap)
+ if (!code->swap)
emit_alu(nfp_prog, reg_none(),
- reg_a(reg + 1), ALU_OP_SUB_C, tmp_reg);
+ reg_a(reg + 1), carry_op, tmp_reg);
else
emit_alu(nfp_prog, reg_none(),
- tmp_reg, ALU_OP_SUB_C, reg_a(reg + 1));
+ tmp_reg, carry_op, reg_a(reg + 1));
- emit_br(nfp_prog, br_mask, insn->off, 0);
+ emit_br(nfp_prog, code->br_mask, insn->off, 0);
return 0;
}
-static int
-wrp_cmp_reg(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta,
- enum br_mask br_mask, bool swap)
+static int cmp_reg(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta)
{
const struct bpf_insn *insn = &meta->insn;
+ const struct jmp_code_map *code;
u8 areg, breg;
+ code = nfp_jmp_code_get(meta);
+ if (!code)
+ return -EINVAL;
+
areg = insn->dst_reg * 2;
breg = insn->src_reg * 2;
- if (swap) {
+ if (code->swap) {
areg ^= breg;
breg ^= areg;
areg ^= breg;
@@ -1261,7 +1367,7 @@
emit_alu(nfp_prog, reg_none(), reg_a(areg), ALU_OP_SUB, reg_b(breg));
emit_alu(nfp_prog, reg_none(),
reg_a(areg + 1), ALU_OP_SUB_C, reg_b(breg + 1));
- emit_br(nfp_prog, br_mask, insn->off, 0);
+ emit_br(nfp_prog, code->br_mask, insn->off, 0);
return 0;
}
@@ -1357,15 +1463,9 @@
static int
map_call_stack_common(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta)
{
- struct bpf_offloaded_map *offmap;
- struct nfp_bpf_map *nfp_map;
bool load_lm_ptr;
u32 ret_tgt;
s64 lm_off;
- swreg tid;
-
- offmap = (struct bpf_offloaded_map *)meta->arg1.map_ptr;
- nfp_map = offmap->dev_priv;
/* We only have to reload LM0 if the key is not at start of stack */
lm_off = nfp_prog->stack_depth;
@@ -1378,17 +1478,12 @@
if (meta->func_id == BPF_FUNC_map_update_elem)
emit_csr_wr(nfp_prog, reg_b(3 * 2), NFP_CSR_ACT_LM_ADDR2);
- /* Load map ID into a register, it should actually fit as an immediate
- * but in case it doesn't deal with it here, not in the delay slots.
- */
- tid = ur_load_imm_any(nfp_prog, nfp_map->tid, imm_a(nfp_prog));
-
emit_br_relo(nfp_prog, BR_UNC, BR_OFF_RELO + meta->func_id,
2, RELO_BR_HELPER);
ret_tgt = nfp_prog_current_offset(nfp_prog) + 2;
/* Load map ID into A0 */
- wrp_mov(nfp_prog, reg_a(0), tid);
+ wrp_mov(nfp_prog, reg_a(0), reg_a(2));
/* Load the return address into B0 */
wrp_immed_relo(nfp_prog, reg_b(0), ret_tgt, RELO_IMMED_REL);
@@ -1400,7 +1495,7 @@
if (!load_lm_ptr)
return 0;
- emit_csr_wr(nfp_prog, stack_reg(nfp_prog), NFP_CSR_ACT_LM_ADDR0);
+ emit_csr_wr(nfp_prog, stack_reg(nfp_prog), NFP_CSR_ACT_LM_ADDR0);
wrp_nops(nfp_prog, 3);
return 0;
@@ -1418,6 +1513,63 @@
return 0;
}
+static int
+nfp_perf_event_output(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta)
+{
+ swreg ptr_type;
+ u32 ret_tgt;
+
+ ptr_type = ur_load_imm_any(nfp_prog, meta->arg1.type, imm_a(nfp_prog));
+
+ ret_tgt = nfp_prog_current_offset(nfp_prog) + 3;
+
+ emit_br_relo(nfp_prog, BR_UNC, BR_OFF_RELO + meta->func_id,
+ 2, RELO_BR_HELPER);
+
+ /* Load ptr type into A1 */
+ wrp_mov(nfp_prog, reg_a(1), ptr_type);
+
+ /* Load the return address into B0 */
+ wrp_immed_relo(nfp_prog, reg_b(0), ret_tgt, RELO_IMMED_REL);
+
+ if (!nfp_prog_confirm_current_offset(nfp_prog, ret_tgt))
+ return -EINVAL;
+
+ return 0;
+}
+
+static int
+nfp_queue_select(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta)
+{
+ u32 jmp_tgt;
+
+ jmp_tgt = nfp_prog_current_offset(nfp_prog) + 5;
+
+ /* Make sure the queue id fits into FW field */
+ emit_alu(nfp_prog, reg_none(), reg_a(meta->insn.src_reg * 2),
+ ALU_OP_AND_NOT_B, reg_imm(0xff));
+ emit_br(nfp_prog, BR_BEQ, jmp_tgt, 2);
+
+ /* Set the 'queue selected' bit and the queue value */
+ emit_shf(nfp_prog, pv_qsel_set(nfp_prog),
+ pv_qsel_set(nfp_prog), SHF_OP_OR, reg_imm(1),
+ SHF_SC_L_SHF, PKT_VEL_QSEL_SET_BIT);
+ emit_ld_field(nfp_prog,
+ pv_qsel_val(nfp_prog), 0x1, reg_b(meta->insn.src_reg * 2),
+ SHF_SC_NONE, 0);
+ /* Delay slots end here, we will jump over next instruction if queue
+ * value fits into the field.
+ */
+ emit_ld_field(nfp_prog,
+ pv_qsel_val(nfp_prog), 0x1, reg_imm(NFP_NET_RXR_MAX),
+ SHF_SC_NONE, 0);
+
+ if (!nfp_prog_confirm_current_offset(nfp_prog, jmp_tgt))
+ return -EINVAL;
+
+ return 0;
+}
+
/* --- Callbacks --- */
static int mov_reg64(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta)
{
@@ -1544,26 +1696,142 @@
return 0;
}
+/* Pseudo code:
+ * if shift_amt >= 32
+ * dst_high = dst_low << shift_amt[4:0]
+ * dst_low = 0;
+ * else
+ * dst_high = (dst_high, dst_low) >> (32 - shift_amt)
+ * dst_low = dst_low << shift_amt
+ *
+ * The indirect shift will use the same logic at runtime.
+ */
+static int __shl_imm64(struct nfp_prog *nfp_prog, u8 dst, u8 shift_amt)
+{
+ if (shift_amt < 32) {
+ emit_shf(nfp_prog, reg_both(dst + 1), reg_a(dst + 1),
+ SHF_OP_NONE, reg_b(dst), SHF_SC_R_DSHF,
+ 32 - shift_amt);
+ emit_shf(nfp_prog, reg_both(dst), reg_none(), SHF_OP_NONE,
+ reg_b(dst), SHF_SC_L_SHF, shift_amt);
+ } else if (shift_amt == 32) {
+ wrp_reg_mov(nfp_prog, dst + 1, dst);
+ wrp_immed(nfp_prog, reg_both(dst), 0);
+ } else if (shift_amt > 32) {
+ emit_shf(nfp_prog, reg_both(dst + 1), reg_none(), SHF_OP_NONE,
+ reg_b(dst), SHF_SC_L_SHF, shift_amt - 32);
+ wrp_immed(nfp_prog, reg_both(dst), 0);
+ }
+
+ return 0;
+}
+
static int shl_imm64(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta)
{
const struct bpf_insn *insn = &meta->insn;
u8 dst = insn->dst_reg * 2;
- if (insn->imm < 32) {
- emit_shf(nfp_prog, reg_both(dst + 1),
- reg_a(dst + 1), SHF_OP_NONE, reg_b(dst),
- SHF_SC_R_DSHF, 32 - insn->imm);
- emit_shf(nfp_prog, reg_both(dst),
- reg_none(), SHF_OP_NONE, reg_b(dst),
- SHF_SC_L_SHF, insn->imm);
- } else if (insn->imm == 32) {
- wrp_reg_mov(nfp_prog, dst + 1, dst);
- wrp_immed(nfp_prog, reg_both(dst), 0);
- } else if (insn->imm > 32) {
- emit_shf(nfp_prog, reg_both(dst + 1),
- reg_none(), SHF_OP_NONE, reg_b(dst),
- SHF_SC_L_SHF, insn->imm - 32);
- wrp_immed(nfp_prog, reg_both(dst), 0);
+ return __shl_imm64(nfp_prog, dst, insn->imm);
+}
+
+static void shl_reg64_lt32_high(struct nfp_prog *nfp_prog, u8 dst, u8 src)
+{
+ emit_alu(nfp_prog, imm_both(nfp_prog), reg_imm(32), ALU_OP_SUB,
+ reg_b(src));
+ emit_alu(nfp_prog, reg_none(), imm_a(nfp_prog), ALU_OP_OR, reg_imm(0));
+ emit_shf_indir(nfp_prog, reg_both(dst + 1), reg_a(dst + 1), SHF_OP_NONE,
+ reg_b(dst), SHF_SC_R_DSHF);
+}
+
+/* NOTE: for indirect left shift, HIGH part should be calculated first. */
+static void shl_reg64_lt32_low(struct nfp_prog *nfp_prog, u8 dst, u8 src)
+{
+ emit_alu(nfp_prog, reg_none(), reg_a(src), ALU_OP_OR, reg_imm(0));
+ emit_shf_indir(nfp_prog, reg_both(dst), reg_none(), SHF_OP_NONE,
+ reg_b(dst), SHF_SC_L_SHF);
+}
+
+static void shl_reg64_lt32(struct nfp_prog *nfp_prog, u8 dst, u8 src)
+{
+ shl_reg64_lt32_high(nfp_prog, dst, src);
+ shl_reg64_lt32_low(nfp_prog, dst, src);
+}
+
+static void shl_reg64_ge32(struct nfp_prog *nfp_prog, u8 dst, u8 src)
+{
+ emit_alu(nfp_prog, reg_none(), reg_a(src), ALU_OP_OR, reg_imm(0));
+ emit_shf_indir(nfp_prog, reg_both(dst + 1), reg_none(), SHF_OP_NONE,
+ reg_b(dst), SHF_SC_L_SHF);
+ wrp_immed(nfp_prog, reg_both(dst), 0);
+}
+
+static int shl_reg64(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta)
+{
+ const struct bpf_insn *insn = &meta->insn;
+ u64 umin, umax;
+ u8 dst, src;
+
+ dst = insn->dst_reg * 2;
+ umin = meta->umin;
+ umax = meta->umax;
+ if (umin == umax)
+ return __shl_imm64(nfp_prog, dst, umin);
+
+ src = insn->src_reg * 2;
+ if (umax < 32) {
+ shl_reg64_lt32(nfp_prog, dst, src);
+ } else if (umin >= 32) {
+ shl_reg64_ge32(nfp_prog, dst, src);
+ } else {
+ /* Generate different instruction sequences depending on runtime
+ * value of shift amount.
+ */
+ u16 label_ge32, label_end;
+
+ label_ge32 = nfp_prog_current_offset(nfp_prog) + 7;
+ emit_br_bset(nfp_prog, reg_a(src), 5, label_ge32, 0);
+
+ shl_reg64_lt32_high(nfp_prog, dst, src);
+ label_end = nfp_prog_current_offset(nfp_prog) + 6;
+ emit_br(nfp_prog, BR_UNC, label_end, 2);
+ /* shl_reg64_lt32_low packed in delay slot. */
+ shl_reg64_lt32_low(nfp_prog, dst, src);
+
+ if (!nfp_prog_confirm_current_offset(nfp_prog, label_ge32))
+ return -EINVAL;
+ shl_reg64_ge32(nfp_prog, dst, src);
+
+ if (!nfp_prog_confirm_current_offset(nfp_prog, label_end))
+ return -EINVAL;
+ }
+
+ return 0;
+}
+
+/* Pseudo code:
+ * if shift_amt >= 32
+ * dst_high = 0;
+ * dst_low = dst_high >> shift_amt[4:0]
+ * else
+ * dst_high = dst_high >> shift_amt
+ * dst_low = (dst_high, dst_low) >> shift_amt
+ *
+ * The indirect shift will use the same logic at runtime.
+ */
+static int __shr_imm64(struct nfp_prog *nfp_prog, u8 dst, u8 shift_amt)
+{
+ if (shift_amt < 32) {
+ emit_shf(nfp_prog, reg_both(dst), reg_a(dst + 1), SHF_OP_NONE,
+ reg_b(dst), SHF_SC_R_DSHF, shift_amt);
+ emit_shf(nfp_prog, reg_both(dst + 1), reg_none(), SHF_OP_NONE,
+ reg_b(dst + 1), SHF_SC_R_SHF, shift_amt);
+ } else if (shift_amt == 32) {
+ wrp_reg_mov(nfp_prog, dst, dst + 1);
+ wrp_immed(nfp_prog, reg_both(dst + 1), 0);
+ } else if (shift_amt > 32) {
+ emit_shf(nfp_prog, reg_both(dst), reg_none(), SHF_OP_NONE,
+ reg_b(dst + 1), SHF_SC_R_SHF, shift_amt - 32);
+ wrp_immed(nfp_prog, reg_both(dst + 1), 0);
}
return 0;
@@ -1574,21 +1842,186 @@
const struct bpf_insn *insn = &meta->insn;
u8 dst = insn->dst_reg * 2;
- if (insn->imm < 32) {
- emit_shf(nfp_prog, reg_both(dst),
- reg_a(dst + 1), SHF_OP_NONE, reg_b(dst),
- SHF_SC_R_DSHF, insn->imm);
- emit_shf(nfp_prog, reg_both(dst + 1),
- reg_none(), SHF_OP_NONE, reg_b(dst + 1),
- SHF_SC_R_SHF, insn->imm);
- } else if (insn->imm == 32) {
+ return __shr_imm64(nfp_prog, dst, insn->imm);
+}
+
+/* NOTE: for indirect right shift, LOW part should be calculated first. */
+static void shr_reg64_lt32_high(struct nfp_prog *nfp_prog, u8 dst, u8 src)
+{
+ emit_alu(nfp_prog, reg_none(), reg_a(src), ALU_OP_OR, reg_imm(0));
+ emit_shf_indir(nfp_prog, reg_both(dst + 1), reg_none(), SHF_OP_NONE,
+ reg_b(dst + 1), SHF_SC_R_SHF);
+}
+
+static void shr_reg64_lt32_low(struct nfp_prog *nfp_prog, u8 dst, u8 src)
+{
+ emit_alu(nfp_prog, reg_none(), reg_a(src), ALU_OP_OR, reg_imm(0));
+ emit_shf_indir(nfp_prog, reg_both(dst), reg_a(dst + 1), SHF_OP_NONE,
+ reg_b(dst), SHF_SC_R_DSHF);
+}
+
+static void shr_reg64_lt32(struct nfp_prog *nfp_prog, u8 dst, u8 src)
+{
+ shr_reg64_lt32_low(nfp_prog, dst, src);
+ shr_reg64_lt32_high(nfp_prog, dst, src);
+}
+
+static void shr_reg64_ge32(struct nfp_prog *nfp_prog, u8 dst, u8 src)
+{
+ emit_alu(nfp_prog, reg_none(), reg_a(src), ALU_OP_OR, reg_imm(0));
+ emit_shf_indir(nfp_prog, reg_both(dst), reg_none(), SHF_OP_NONE,
+ reg_b(dst + 1), SHF_SC_R_SHF);
+ wrp_immed(nfp_prog, reg_both(dst + 1), 0);
+}
+
+static int shr_reg64(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta)
+{
+ const struct bpf_insn *insn = &meta->insn;
+ u64 umin, umax;
+ u8 dst, src;
+
+ dst = insn->dst_reg * 2;
+ umin = meta->umin;
+ umax = meta->umax;
+ if (umin == umax)
+ return __shr_imm64(nfp_prog, dst, umin);
+
+ src = insn->src_reg * 2;
+ if (umax < 32) {
+ shr_reg64_lt32(nfp_prog, dst, src);
+ } else if (umin >= 32) {
+ shr_reg64_ge32(nfp_prog, dst, src);
+ } else {
+ /* Generate different instruction sequences depending on runtime
+ * value of shift amount.
+ */
+ u16 label_ge32, label_end;
+
+ label_ge32 = nfp_prog_current_offset(nfp_prog) + 6;
+ emit_br_bset(nfp_prog, reg_a(src), 5, label_ge32, 0);
+ shr_reg64_lt32_low(nfp_prog, dst, src);
+ label_end = nfp_prog_current_offset(nfp_prog) + 6;
+ emit_br(nfp_prog, BR_UNC, label_end, 2);
+ /* shr_reg64_lt32_high packed in delay slot. */
+ shr_reg64_lt32_high(nfp_prog, dst, src);
+
+ if (!nfp_prog_confirm_current_offset(nfp_prog, label_ge32))
+ return -EINVAL;
+ shr_reg64_ge32(nfp_prog, dst, src);
+
+ if (!nfp_prog_confirm_current_offset(nfp_prog, label_end))
+ return -EINVAL;
+ }
+
+ return 0;
+}
+
+/* Code logic is the same as __shr_imm64 except ashr requires signedness bit
+ * told through PREV_ALU result.
+ */
+static int __ashr_imm64(struct nfp_prog *nfp_prog, u8 dst, u8 shift_amt)
+{
+ if (shift_amt < 32) {
+ emit_shf(nfp_prog, reg_both(dst), reg_a(dst + 1), SHF_OP_NONE,
+ reg_b(dst), SHF_SC_R_DSHF, shift_amt);
+ /* Set signedness bit. */
+ emit_alu(nfp_prog, reg_none(), reg_a(dst + 1), ALU_OP_OR,
+ reg_imm(0));
+ emit_shf(nfp_prog, reg_both(dst + 1), reg_none(), SHF_OP_ASHR,
+ reg_b(dst + 1), SHF_SC_R_SHF, shift_amt);
+ } else if (shift_amt == 32) {
+ /* NOTE: this also helps setting signedness bit. */
wrp_reg_mov(nfp_prog, dst, dst + 1);
- wrp_immed(nfp_prog, reg_both(dst + 1), 0);
- } else if (insn->imm > 32) {
- emit_shf(nfp_prog, reg_both(dst),
- reg_none(), SHF_OP_NONE, reg_b(dst + 1),
- SHF_SC_R_SHF, insn->imm - 32);
- wrp_immed(nfp_prog, reg_both(dst + 1), 0);
+ emit_shf(nfp_prog, reg_both(dst + 1), reg_none(), SHF_OP_ASHR,
+ reg_b(dst + 1), SHF_SC_R_SHF, 31);
+ } else if (shift_amt > 32) {
+ emit_alu(nfp_prog, reg_none(), reg_a(dst + 1), ALU_OP_OR,
+ reg_imm(0));
+ emit_shf(nfp_prog, reg_both(dst), reg_none(), SHF_OP_ASHR,
+ reg_b(dst + 1), SHF_SC_R_SHF, shift_amt - 32);
+ emit_shf(nfp_prog, reg_both(dst + 1), reg_none(), SHF_OP_ASHR,
+ reg_b(dst + 1), SHF_SC_R_SHF, 31);
+ }
+
+ return 0;
+}
+
+static int ashr_imm64(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta)
+{
+ const struct bpf_insn *insn = &meta->insn;
+ u8 dst = insn->dst_reg * 2;
+
+ return __ashr_imm64(nfp_prog, dst, insn->imm);
+}
+
+static void ashr_reg64_lt32_high(struct nfp_prog *nfp_prog, u8 dst, u8 src)
+{
+ /* NOTE: the first insn will set both indirect shift amount (source A)
+ * and signedness bit (MSB of result).
+ */
+ emit_alu(nfp_prog, reg_none(), reg_a(src), ALU_OP_OR, reg_b(dst + 1));
+ emit_shf_indir(nfp_prog, reg_both(dst + 1), reg_none(), SHF_OP_ASHR,
+ reg_b(dst + 1), SHF_SC_R_SHF);
+}
+
+static void ashr_reg64_lt32_low(struct nfp_prog *nfp_prog, u8 dst, u8 src)
+{
+ /* NOTE: it is the same as logic shift because we don't need to shift in
+ * signedness bit when the shift amount is less than 32.
+ */
+ return shr_reg64_lt32_low(nfp_prog, dst, src);
+}
+
+static void ashr_reg64_lt32(struct nfp_prog *nfp_prog, u8 dst, u8 src)
+{
+ ashr_reg64_lt32_low(nfp_prog, dst, src);
+ ashr_reg64_lt32_high(nfp_prog, dst, src);
+}
+
+static void ashr_reg64_ge32(struct nfp_prog *nfp_prog, u8 dst, u8 src)
+{
+ emit_alu(nfp_prog, reg_none(), reg_a(src), ALU_OP_OR, reg_b(dst + 1));
+ emit_shf_indir(nfp_prog, reg_both(dst), reg_none(), SHF_OP_ASHR,
+ reg_b(dst + 1), SHF_SC_R_SHF);
+ emit_shf(nfp_prog, reg_both(dst + 1), reg_none(), SHF_OP_ASHR,
+ reg_b(dst + 1), SHF_SC_R_SHF, 31);
+}
+
+/* Like ashr_imm64, but need to use indirect shift. */
+static int ashr_reg64(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta)
+{
+ const struct bpf_insn *insn = &meta->insn;
+ u64 umin, umax;
+ u8 dst, src;
+
+ dst = insn->dst_reg * 2;
+ umin = meta->umin;
+ umax = meta->umax;
+ if (umin == umax)
+ return __ashr_imm64(nfp_prog, dst, umin);
+
+ src = insn->src_reg * 2;
+ if (umax < 32) {
+ ashr_reg64_lt32(nfp_prog, dst, src);
+ } else if (umin >= 32) {
+ ashr_reg64_ge32(nfp_prog, dst, src);
+ } else {
+ u16 label_ge32, label_end;
+
+ label_ge32 = nfp_prog_current_offset(nfp_prog) + 6;
+ emit_br_bset(nfp_prog, reg_a(src), 5, label_ge32, 0);
+ ashr_reg64_lt32_low(nfp_prog, dst, src);
+ label_end = nfp_prog_current_offset(nfp_prog) + 6;
+ emit_br(nfp_prog, BR_UNC, label_end, 2);
+ /* ashr_reg64_lt32_high packed in delay slot. */
+ ashr_reg64_lt32_high(nfp_prog, dst, src);
+
+ if (!nfp_prog_confirm_current_offset(nfp_prog, label_ge32))
+ return -EINVAL;
+ ashr_reg64_ge32(nfp_prog, dst, src);
+
+ if (!nfp_prog_confirm_current_offset(nfp_prog, label_end))
+ return -EINVAL;
}
return 0;
@@ -2108,6 +2541,17 @@
false, wrp_lmem_store);
}
+static int mem_stx_xdp(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta)
+{
+ switch (meta->insn.off) {
+ case offsetof(struct xdp_md, rx_queue_index):
+ return nfp_queue_select(nfp_prog, meta);
+ }
+
+ WARN_ON_ONCE(1); /* verifier should have rejected bad accesses */
+ return -EOPNOTSUPP;
+}
+
static int
mem_stx(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta,
unsigned int size)
@@ -2134,6 +2578,9 @@
static int mem_stx4(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta)
{
+ if (meta->ptr.type == PTR_TO_CTX)
+ if (nfp_prog->type == BPF_PROG_TYPE_XDP)
+ return mem_stx_xdp(nfp_prog, meta);
return mem_stx(nfp_prog, meta, 4);
}
@@ -2283,46 +2730,6 @@
return 0;
}
-static int jgt_imm(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta)
-{
- return wrp_cmp_imm(nfp_prog, meta, BR_BLO, true);
-}
-
-static int jge_imm(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta)
-{
- return wrp_cmp_imm(nfp_prog, meta, BR_BHS, false);
-}
-
-static int jlt_imm(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta)
-{
- return wrp_cmp_imm(nfp_prog, meta, BR_BLO, false);
-}
-
-static int jle_imm(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta)
-{
- return wrp_cmp_imm(nfp_prog, meta, BR_BHS, true);
-}
-
-static int jsgt_imm(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta)
-{
- return wrp_cmp_imm(nfp_prog, meta, BR_BLT, true);
-}
-
-static int jsge_imm(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta)
-{
- return wrp_cmp_imm(nfp_prog, meta, BR_BGE, false);
-}
-
-static int jslt_imm(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta)
-{
- return wrp_cmp_imm(nfp_prog, meta, BR_BLT, false);
-}
-
-static int jsle_imm(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta)
-{
- return wrp_cmp_imm(nfp_prog, meta, BR_BGE, true);
-}
-
static int jset_imm(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta)
{
const struct bpf_insn *insn = &meta->insn;
@@ -2392,46 +2799,6 @@
return 0;
}
-static int jgt_reg(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta)
-{
- return wrp_cmp_reg(nfp_prog, meta, BR_BLO, true);
-}
-
-static int jge_reg(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta)
-{
- return wrp_cmp_reg(nfp_prog, meta, BR_BHS, false);
-}
-
-static int jlt_reg(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta)
-{
- return wrp_cmp_reg(nfp_prog, meta, BR_BLO, false);
-}
-
-static int jle_reg(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta)
-{
- return wrp_cmp_reg(nfp_prog, meta, BR_BHS, true);
-}
-
-static int jsgt_reg(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta)
-{
- return wrp_cmp_reg(nfp_prog, meta, BR_BLT, true);
-}
-
-static int jsge_reg(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta)
-{
- return wrp_cmp_reg(nfp_prog, meta, BR_BGE, false);
-}
-
-static int jslt_reg(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta)
-{
- return wrp_cmp_reg(nfp_prog, meta, BR_BLT, false);
-}
-
-static int jsle_reg(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta)
-{
- return wrp_cmp_reg(nfp_prog, meta, BR_BGE, true);
-}
-
static int jset_reg(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta)
{
return wrp_test_reg(nfp_prog, meta, ALU_OP_AND, BR_BNE);
@@ -2453,6 +2820,8 @@
return map_call_stack_common(nfp_prog, meta);
case BPF_FUNC_get_prandom_u32:
return nfp_get_prandom_u32(nfp_prog, meta);
+ case BPF_FUNC_perf_event_output:
+ return nfp_perf_event_output(nfp_prog, meta);
default:
WARN_ONCE(1, "verifier allowed unsupported function\n");
return -EOPNOTSUPP;
@@ -2480,8 +2849,12 @@
[BPF_ALU64 | BPF_SUB | BPF_X] = sub_reg64,
[BPF_ALU64 | BPF_SUB | BPF_K] = sub_imm64,
[BPF_ALU64 | BPF_NEG] = neg_reg64,
+ [BPF_ALU64 | BPF_LSH | BPF_X] = shl_reg64,
[BPF_ALU64 | BPF_LSH | BPF_K] = shl_imm64,
+ [BPF_ALU64 | BPF_RSH | BPF_X] = shr_reg64,
[BPF_ALU64 | BPF_RSH | BPF_K] = shr_imm64,
+ [BPF_ALU64 | BPF_ARSH | BPF_X] = ashr_reg64,
+ [BPF_ALU64 | BPF_ARSH | BPF_K] = ashr_imm64,
[BPF_ALU | BPF_MOV | BPF_X] = mov_reg,
[BPF_ALU | BPF_MOV | BPF_K] = mov_imm,
[BPF_ALU | BPF_XOR | BPF_X] = xor_reg,
@@ -2520,25 +2893,25 @@
[BPF_ST | BPF_MEM | BPF_DW] = mem_st8,
[BPF_JMP | BPF_JA | BPF_K] = jump,
[BPF_JMP | BPF_JEQ | BPF_K] = jeq_imm,
- [BPF_JMP | BPF_JGT | BPF_K] = jgt_imm,
- [BPF_JMP | BPF_JGE | BPF_K] = jge_imm,
- [BPF_JMP | BPF_JLT | BPF_K] = jlt_imm,
- [BPF_JMP | BPF_JLE | BPF_K] = jle_imm,
- [BPF_JMP | BPF_JSGT | BPF_K] = jsgt_imm,
- [BPF_JMP | BPF_JSGE | BPF_K] = jsge_imm,
- [BPF_JMP | BPF_JSLT | BPF_K] = jslt_imm,
- [BPF_JMP | BPF_JSLE | BPF_K] = jsle_imm,
+ [BPF_JMP | BPF_JGT | BPF_K] = cmp_imm,
+ [BPF_JMP | BPF_JGE | BPF_K] = cmp_imm,
+ [BPF_JMP | BPF_JLT | BPF_K] = cmp_imm,
+ [BPF_JMP | BPF_JLE | BPF_K] = cmp_imm,
+ [BPF_JMP | BPF_JSGT | BPF_K] = cmp_imm,
+ [BPF_JMP | BPF_JSGE | BPF_K] = cmp_imm,
+ [BPF_JMP | BPF_JSLT | BPF_K] = cmp_imm,
+ [BPF_JMP | BPF_JSLE | BPF_K] = cmp_imm,
[BPF_JMP | BPF_JSET | BPF_K] = jset_imm,
[BPF_JMP | BPF_JNE | BPF_K] = jne_imm,
[BPF_JMP | BPF_JEQ | BPF_X] = jeq_reg,
- [BPF_JMP | BPF_JGT | BPF_X] = jgt_reg,
- [BPF_JMP | BPF_JGE | BPF_X] = jge_reg,
- [BPF_JMP | BPF_JLT | BPF_X] = jlt_reg,
- [BPF_JMP | BPF_JLE | BPF_X] = jle_reg,
- [BPF_JMP | BPF_JSGT | BPF_X] = jsgt_reg,
- [BPF_JMP | BPF_JSGE | BPF_X] = jsge_reg,
- [BPF_JMP | BPF_JSLT | BPF_X] = jslt_reg,
- [BPF_JMP | BPF_JSLE | BPF_X] = jsle_reg,
+ [BPF_JMP | BPF_JGT | BPF_X] = cmp_reg,
+ [BPF_JMP | BPF_JGE | BPF_X] = cmp_reg,
+ [BPF_JMP | BPF_JLT | BPF_X] = cmp_reg,
+ [BPF_JMP | BPF_JLE | BPF_X] = cmp_reg,
+ [BPF_JMP | BPF_JSGT | BPF_X] = cmp_reg,
+ [BPF_JMP | BPF_JSGE | BPF_X] = cmp_reg,
+ [BPF_JMP | BPF_JSLT | BPF_X] = cmp_reg,
+ [BPF_JMP | BPF_JSLE | BPF_X] = cmp_reg,
[BPF_JMP | BPF_JSET | BPF_X] = jset_reg,
[BPF_JMP | BPF_JNE | BPF_X] = jne_reg,
[BPF_JMP | BPF_CALL] = call,
@@ -2777,6 +3150,54 @@
}
}
+/* abs(insn.imm) will fit better into unrestricted reg immediate -
+ * convert add/sub of a negative number into a sub/add of a positive one.
+ */
+static void nfp_bpf_opt_neg_add_sub(struct nfp_prog *nfp_prog)
+{
+ struct nfp_insn_meta *meta;
+
+ list_for_each_entry(meta, &nfp_prog->insns, l) {
+ struct bpf_insn insn = meta->insn;
+
+ if (meta->skip)
+ continue;
+
+ if (BPF_CLASS(insn.code) != BPF_ALU &&
+ BPF_CLASS(insn.code) != BPF_ALU64 &&
+ BPF_CLASS(insn.code) != BPF_JMP)
+ continue;
+ if (BPF_SRC(insn.code) != BPF_K)
+ continue;
+ if (insn.imm >= 0)
+ continue;
+
+ if (BPF_CLASS(insn.code) == BPF_JMP) {
+ switch (BPF_OP(insn.code)) {
+ case BPF_JGE:
+ case BPF_JSGE:
+ case BPF_JLT:
+ case BPF_JSLT:
+ meta->jump_neg_op = true;
+ break;
+ default:
+ continue;
+ }
+ } else {
+ if (BPF_OP(insn.code) == BPF_ADD)
+ insn.code = BPF_CLASS(insn.code) | BPF_SUB;
+ else if (BPF_OP(insn.code) == BPF_SUB)
+ insn.code = BPF_CLASS(insn.code) | BPF_ADD;
+ else
+ continue;
+
+ meta->insn.code = insn.code | BPF_K;
+ }
+
+ meta->insn.imm = -insn.imm;
+ }
+}
+
/* Remove masking after load since our load guarantees this is not needed */
static void nfp_bpf_opt_ld_mask(struct nfp_prog *nfp_prog)
{
@@ -3212,6 +3633,7 @@
{
nfp_bpf_opt_reg_init(nfp_prog);
+ nfp_bpf_opt_neg_add_sub(nfp_prog);
nfp_bpf_opt_ld_mask(nfp_prog);
nfp_bpf_opt_ld_shift(nfp_prog);
nfp_bpf_opt_ldst_gather(nfp_prog);
@@ -3220,6 +3642,33 @@
return 0;
}
+static int nfp_bpf_replace_map_ptrs(struct nfp_prog *nfp_prog)
+{
+ struct nfp_insn_meta *meta1, *meta2;
+ struct nfp_bpf_map *nfp_map;
+ struct bpf_map *map;
+
+ nfp_for_each_insn_walk2(nfp_prog, meta1, meta2) {
+ if (meta1->skip || meta2->skip)
+ continue;
+
+ if (meta1->insn.code != (BPF_LD | BPF_IMM | BPF_DW) ||
+ meta1->insn.src_reg != BPF_PSEUDO_MAP_FD)
+ continue;
+
+ map = (void *)(unsigned long)((u32)meta1->insn.imm |
+ (u64)meta2->insn.imm << 32);
+ if (bpf_map_offload_neutral(map))
+ continue;
+ nfp_map = map_to_offmap(map)->dev_priv;
+
+ meta1->insn.imm = nfp_map->tid;
+ meta2->insn.imm = 0;
+ }
+
+ return 0;
+}
+
static int nfp_bpf_ustore_calc(u64 *prog, unsigned int len)
{
__le64 *ustore = (__force __le64 *)prog;
@@ -3256,6 +3705,10 @@
{
int ret;
+ ret = nfp_bpf_replace_map_ptrs(nfp_prog);
+ if (ret)
+ return ret;
+
ret = nfp_bpf_optimize(nfp_prog);
if (ret)
return ret;
@@ -3346,6 +3799,9 @@
case BPF_FUNC_map_delete_elem:
val = nfp_prog->bpf->helpers.map_delete;
break;
+ case BPF_FUNC_perf_event_output:
+ val = nfp_prog->bpf->helpers.perf_event_output;
+ break;
default:
pr_err("relocation of unknown helper %d\n",
val);
diff --git a/drivers/net/ethernet/netronome/nfp/bpf/main.c b/drivers/net/ethernet/netronome/nfp/bpf/main.c
index 35fb31f..fcdfb8e 100644
--- a/drivers/net/ethernet/netronome/nfp/bpf/main.c
+++ b/drivers/net/ethernet/netronome/nfp/bpf/main.c
@@ -1,5 +1,5 @@
/*
- * Copyright (C) 2017 Netronome Systems, Inc.
+ * Copyright (C) 2017-2018 Netronome Systems, Inc.
*
* This software is dual licensed under the GNU General License Version 2,
* June 1991 as shown in the file COPYING in the top-level directory of this
@@ -43,6 +43,14 @@
#include "fw.h"
#include "main.h"
+const struct rhashtable_params nfp_bpf_maps_neutral_params = {
+ .nelem_hint = 4,
+ .key_len = FIELD_SIZEOF(struct nfp_bpf_neutral_map, ptr),
+ .key_offset = offsetof(struct nfp_bpf_neutral_map, ptr),
+ .head_offset = offsetof(struct nfp_bpf_neutral_map, l),
+ .automatic_shrinking = true,
+};
+
static bool nfp_net_ebpf_capable(struct nfp_net *nn)
{
#ifdef __LITTLE_ENDIAN
@@ -290,6 +298,9 @@
case BPF_FUNC_map_delete_elem:
bpf->helpers.map_delete = readl(&cap->func_addr);
break;
+ case BPF_FUNC_perf_event_output:
+ bpf->helpers.perf_event_output = readl(&cap->func_addr);
+ break;
}
return 0;
@@ -323,6 +334,13 @@
return 0;
}
+static int
+nfp_bpf_parse_cap_qsel(struct nfp_app_bpf *bpf, void __iomem *value, u32 length)
+{
+ bpf->queue_select = true;
+ return 0;
+}
+
static int nfp_bpf_parse_capabilities(struct nfp_app *app)
{
struct nfp_cpp *cpp = app->pf->cpp;
@@ -365,6 +383,10 @@
if (nfp_bpf_parse_cap_random(app->priv, value, length))
goto err_release_free;
break;
+ case NFP_BPF_CAP_TYPE_QUEUE_SELECT:
+ if (nfp_bpf_parse_cap_qsel(app->priv, value, length))
+ goto err_release_free;
+ break;
default:
nfp_dbg(cpp, "unknown BPF capability: %d\n", type);
break;
@@ -401,17 +423,28 @@
init_waitqueue_head(&bpf->cmsg_wq);
INIT_LIST_HEAD(&bpf->map_list);
- err = nfp_bpf_parse_capabilities(app);
+ err = rhashtable_init(&bpf->maps_neutral, &nfp_bpf_maps_neutral_params);
if (err)
goto err_free_bpf;
+ err = nfp_bpf_parse_capabilities(app);
+ if (err)
+ goto err_free_neutral_maps;
+
return 0;
+err_free_neutral_maps:
+ rhashtable_destroy(&bpf->maps_neutral);
err_free_bpf:
kfree(bpf);
return err;
}
+static void nfp_check_rhashtable_empty(void *ptr, void *arg)
+{
+ WARN_ON_ONCE(1);
+}
+
static void nfp_bpf_clean(struct nfp_app *app)
{
struct nfp_app_bpf *bpf = app->priv;
@@ -419,6 +452,8 @@
WARN_ON(!skb_queue_empty(&bpf->cmsg_replies));
WARN_ON(!list_empty(&bpf->map_list));
WARN_ON(bpf->maps_in_use || bpf->map_elems_in_use);
+ rhashtable_free_and_destroy(&bpf->maps_neutral,
+ nfp_check_rhashtable_empty, NULL);
kfree(bpf);
}
diff --git a/drivers/net/ethernet/netronome/nfp/bpf/main.h b/drivers/net/ethernet/netronome/nfp/bpf/main.h
index 4981c89..654fe78 100644
--- a/drivers/net/ethernet/netronome/nfp/bpf/main.h
+++ b/drivers/net/ethernet/netronome/nfp/bpf/main.h
@@ -1,5 +1,5 @@
/*
- * Copyright (C) 2016-2017 Netronome Systems, Inc.
+ * Copyright (C) 2016-2018 Netronome Systems, Inc.
*
* This software is dual licensed under the GNU General License Version 2,
* June 1991 as shown in the file COPYING in the top-level directory of this
@@ -39,6 +39,7 @@
#include <linux/bpf_verifier.h>
#include <linux/kernel.h>
#include <linux/list.h>
+#include <linux/rhashtable.h>
#include <linux/skbuff.h>
#include <linux/types.h>
#include <linux/wait.h>
@@ -81,10 +82,16 @@
enum pkt_vec {
PKT_VEC_PKT_LEN = 0,
PKT_VEC_PKT_PTR = 2,
+ PKT_VEC_QSEL_SET = 4,
+ PKT_VEC_QSEL_VAL = 6,
};
+#define PKT_VEL_QSEL_SET_BIT 4
+
#define pv_len(np) reg_lm(1, PKT_VEC_PKT_LEN)
#define pv_ctm_ptr(np) reg_lm(1, PKT_VEC_PKT_PTR)
+#define pv_qsel_set(np) reg_lm(1, PKT_VEC_QSEL_SET)
+#define pv_qsel_val(np) reg_lm(1, PKT_VEC_QSEL_VAL)
#define stack_reg(np) reg_a(STATIC_REG_STACK)
#define stack_imm(np) imm_b(np)
@@ -114,6 +121,8 @@
* @maps_in_use: number of currently offloaded maps
* @map_elems_in_use: number of elements allocated to offloaded maps
*
+ * @maps_neutral: hash table of offload-neutral maps (on pointer)
+ *
* @adjust_head: adjust head capability
* @adjust_head.flags: extra flags for adjust head
* @adjust_head.off_min: minimal packet offset within buffer required
@@ -133,8 +142,10 @@
* @helpers.map_lookup: map lookup helper address
* @helpers.map_update: map update helper address
* @helpers.map_delete: map delete helper address
+ * @helpers.perf_event_output: output perf event to a ring buffer
*
* @pseudo_random: FW initialized the pseudo-random machinery (CSRs)
+ * @queue_select: BPF can set the RX queue ID in packet vector
*/
struct nfp_app_bpf {
struct nfp_app *app;
@@ -150,6 +161,8 @@
unsigned int maps_in_use;
unsigned int map_elems_in_use;
+ struct rhashtable maps_neutral;
+
struct nfp_bpf_cap_adjust_head {
u32 flags;
int off_min;
@@ -171,9 +184,11 @@
u32 map_lookup;
u32 map_update;
u32 map_delete;
+ u32 perf_event_output;
} helpers;
bool pseudo_random;
+ bool queue_select;
};
enum nfp_bpf_map_use {
@@ -199,6 +214,14 @@
enum nfp_bpf_map_use use_map[];
};
+struct nfp_bpf_neutral_map {
+ struct rhash_head l;
+ struct bpf_map *ptr;
+ u32 count;
+};
+
+extern const struct rhashtable_params nfp_bpf_maps_neutral_params;
+
struct nfp_prog;
struct nfp_insn_meta;
typedef int (*instr_cb_t)(struct nfp_prog *, struct nfp_insn_meta *);
@@ -236,9 +259,12 @@
* @xadd_over_16bit: 16bit immediate is not guaranteed
* @xadd_maybe_16bit: 16bit immediate is possible
* @jmp_dst: destination info for jump instructions
+ * @jump_neg_op: jump instruction has inverted immediate, use ADD instead of SUB
* @func_id: function id for call instructions
* @arg1: arg1 for call instructions
* @arg2: arg2 for call instructions
+ * @umin: copy of core verifier umin_value.
+ * @umax: copy of core verifier umax_value.
* @off: index of first generated machine instruction (in nfp_prog.prog)
* @n: eBPF instruction number
* @flags: eBPF instruction extra optimization flags
@@ -264,13 +290,23 @@
bool xadd_maybe_16bit;
};
/* jump */
- struct nfp_insn_meta *jmp_dst;
+ struct {
+ struct nfp_insn_meta *jmp_dst;
+ bool jump_neg_op;
+ };
/* function calls */
struct {
u32 func_id;
struct bpf_reg_state arg1;
struct nfp_bpf_reg_state arg2;
};
+ /* We are interested in range info for some operands,
+ * for example, the shift amount.
+ */
+ struct {
+ u64 umin;
+ u64 umax;
+ };
};
unsigned int off;
unsigned short n;
@@ -348,6 +384,25 @@
return (meta->insn.code & ~BPF_SIZE_MASK) == (BPF_STX | BPF_XADD);
}
+static inline bool is_mbpf_indir_shift(const struct nfp_insn_meta *meta)
+{
+ u8 code = meta->insn.code;
+ bool is_alu, is_shift;
+ u8 opclass, opcode;
+
+ opclass = BPF_CLASS(code);
+ is_alu = opclass == BPF_ALU64 || opclass == BPF_ALU;
+ if (!is_alu)
+ return false;
+
+ opcode = BPF_OP(code);
+ is_shift = opcode == BPF_LSH || opcode == BPF_RSH || opcode == BPF_ARSH;
+ if (!is_shift)
+ return false;
+
+ return BPF_SRC(code) == BPF_X;
+}
+
/**
* struct nfp_prog - nfp BPF program
* @bpf: backpointer to the bpf app priv structure
@@ -363,6 +418,8 @@
* @error: error code if something went wrong
* @stack_depth: max stack depth from the verifier
* @adjust_head_location: if program has single adjust head call - the insn no.
+ * @map_records_cnt: the number of map pointers recorded for this prog
+ * @map_records: the map record pointers from bpf->maps_neutral
* @insns: list of BPF instruction wrappers (struct nfp_insn_meta)
*/
struct nfp_prog {
@@ -386,6 +443,9 @@
unsigned int stack_depth;
unsigned int adjust_head_location;
+ unsigned int map_records_cnt;
+ struct nfp_bpf_neutral_map **map_records;
+
struct list_head insns;
};
@@ -436,5 +496,7 @@
int nfp_bpf_ctrl_getnext_entry(struct bpf_offloaded_map *offmap,
void *key, void *next_key);
+int nfp_bpf_event_output(struct nfp_app_bpf *bpf, struct sk_buff *skb);
+
void nfp_bpf_ctrl_msg_rx(struct nfp_app *app, struct sk_buff *skb);
#endif
diff --git a/drivers/net/ethernet/netronome/nfp/bpf/offload.c b/drivers/net/ethernet/netronome/nfp/bpf/offload.c
index 42d98792..7eae4c0 100644
--- a/drivers/net/ethernet/netronome/nfp/bpf/offload.c
+++ b/drivers/net/ethernet/netronome/nfp/bpf/offload.c
@@ -1,5 +1,5 @@
/*
- * Copyright (C) 2016-2017 Netronome Systems, Inc.
+ * Copyright (C) 2016-2018 Netronome Systems, Inc.
*
* This software is dual licensed under the GNU General License Version 2,
* June 1991 as shown in the file COPYING in the top-level directory of this
@@ -57,6 +57,126 @@
#include "../nfp_net.h"
static int
+nfp_map_ptr_record(struct nfp_app_bpf *bpf, struct nfp_prog *nfp_prog,
+ struct bpf_map *map)
+{
+ struct nfp_bpf_neutral_map *record;
+ int err;
+
+ /* Map record paths are entered via ndo, update side is protected. */
+ ASSERT_RTNL();
+
+ /* Reuse path - other offloaded program is already tracking this map. */
+ record = rhashtable_lookup_fast(&bpf->maps_neutral, &map,
+ nfp_bpf_maps_neutral_params);
+ if (record) {
+ nfp_prog->map_records[nfp_prog->map_records_cnt++] = record;
+ record->count++;
+ return 0;
+ }
+
+ /* Grab a single ref to the map for our record. The prog destroy ndo
+ * happens after free_used_maps().
+ */
+ map = bpf_map_inc(map, false);
+ if (IS_ERR(map))
+ return PTR_ERR(map);
+
+ record = kmalloc(sizeof(*record), GFP_KERNEL);
+ if (!record) {
+ err = -ENOMEM;
+ goto err_map_put;
+ }
+
+ record->ptr = map;
+ record->count = 1;
+
+ err = rhashtable_insert_fast(&bpf->maps_neutral, &record->l,
+ nfp_bpf_maps_neutral_params);
+ if (err)
+ goto err_free_rec;
+
+ nfp_prog->map_records[nfp_prog->map_records_cnt++] = record;
+
+ return 0;
+
+err_free_rec:
+ kfree(record);
+err_map_put:
+ bpf_map_put(map);
+ return err;
+}
+
+static void
+nfp_map_ptrs_forget(struct nfp_app_bpf *bpf, struct nfp_prog *nfp_prog)
+{
+ bool freed = false;
+ int i;
+
+ ASSERT_RTNL();
+
+ for (i = 0; i < nfp_prog->map_records_cnt; i++) {
+ if (--nfp_prog->map_records[i]->count) {
+ nfp_prog->map_records[i] = NULL;
+ continue;
+ }
+
+ WARN_ON(rhashtable_remove_fast(&bpf->maps_neutral,
+ &nfp_prog->map_records[i]->l,
+ nfp_bpf_maps_neutral_params));
+ freed = true;
+ }
+
+ if (freed) {
+ synchronize_rcu();
+
+ for (i = 0; i < nfp_prog->map_records_cnt; i++)
+ if (nfp_prog->map_records[i]) {
+ bpf_map_put(nfp_prog->map_records[i]->ptr);
+ kfree(nfp_prog->map_records[i]);
+ }
+ }
+
+ kfree(nfp_prog->map_records);
+ nfp_prog->map_records = NULL;
+ nfp_prog->map_records_cnt = 0;
+}
+
+static int
+nfp_map_ptrs_record(struct nfp_app_bpf *bpf, struct nfp_prog *nfp_prog,
+ struct bpf_prog *prog)
+{
+ int i, cnt, err;
+
+ /* Quickly count the maps we will have to remember */
+ cnt = 0;
+ for (i = 0; i < prog->aux->used_map_cnt; i++)
+ if (bpf_map_offload_neutral(prog->aux->used_maps[i]))
+ cnt++;
+ if (!cnt)
+ return 0;
+
+ nfp_prog->map_records = kmalloc_array(cnt,
+ sizeof(nfp_prog->map_records[0]),
+ GFP_KERNEL);
+ if (!nfp_prog->map_records)
+ return -ENOMEM;
+
+ for (i = 0; i < prog->aux->used_map_cnt; i++)
+ if (bpf_map_offload_neutral(prog->aux->used_maps[i])) {
+ err = nfp_map_ptr_record(bpf, nfp_prog,
+ prog->aux->used_maps[i]);
+ if (err) {
+ nfp_map_ptrs_forget(bpf, nfp_prog);
+ return err;
+ }
+ }
+ WARN_ON(cnt != nfp_prog->map_records_cnt);
+
+ return 0;
+}
+
+static int
nfp_prog_prepare(struct nfp_prog *nfp_prog, const struct bpf_insn *prog,
unsigned int cnt)
{
@@ -70,6 +190,8 @@
meta->insn = prog[i];
meta->n = i;
+ if (is_mbpf_indir_shift(meta))
+ meta->umin = U64_MAX;
list_add_tail(&meta->l, &nfp_prog->insns);
}
@@ -151,7 +273,7 @@
prog->aux->offload->jited_len = nfp_prog->prog_len * sizeof(u64);
prog->aux->offload->jited_image = nfp_prog->prog;
- return 0;
+ return nfp_map_ptrs_record(nfp_prog->bpf, nfp_prog, prog);
}
static int nfp_bpf_destroy(struct nfp_net *nn, struct bpf_prog *prog)
@@ -159,6 +281,7 @@
struct nfp_prog *nfp_prog = prog->aux->offload->dev_priv;
kvfree(nfp_prog->prog);
+ nfp_map_ptrs_forget(nfp_prog->bpf, nfp_prog);
nfp_prog_free(nfp_prog);
return 0;
@@ -320,6 +443,53 @@
}
}
+static unsigned long
+nfp_bpf_perf_event_copy(void *dst, const void *src,
+ unsigned long off, unsigned long len)
+{
+ memcpy(dst, src + off, len);
+ return 0;
+}
+
+int nfp_bpf_event_output(struct nfp_app_bpf *bpf, struct sk_buff *skb)
+{
+ struct cmsg_bpf_event *cbe = (void *)skb->data;
+ u32 pkt_size, data_size;
+ struct bpf_map *map;
+
+ if (skb->len < sizeof(struct cmsg_bpf_event))
+ goto err_drop;
+
+ pkt_size = be32_to_cpu(cbe->pkt_size);
+ data_size = be32_to_cpu(cbe->data_size);
+ map = (void *)(unsigned long)be64_to_cpu(cbe->map_ptr);
+
+ if (skb->len < sizeof(struct cmsg_bpf_event) + pkt_size + data_size)
+ goto err_drop;
+ if (cbe->hdr.ver != CMSG_MAP_ABI_VERSION)
+ goto err_drop;
+
+ rcu_read_lock();
+ if (!rhashtable_lookup_fast(&bpf->maps_neutral, &map,
+ nfp_bpf_maps_neutral_params)) {
+ rcu_read_unlock();
+ pr_warn("perf event: dest map pointer %px not recognized, dropping event\n",
+ map);
+ goto err_drop;
+ }
+
+ bpf_event_output(map, be32_to_cpu(cbe->cpu_id),
+ &cbe->data[round_up(pkt_size, 4)], data_size,
+ cbe->data, pkt_size, nfp_bpf_perf_event_copy);
+ rcu_read_unlock();
+
+ dev_consume_skb_any(skb);
+ return 0;
+err_drop:
+ dev_kfree_skb_any(skb);
+ return -EINVAL;
+}
+
static int
nfp_net_bpf_load(struct nfp_net *nn, struct bpf_prog *prog,
struct netlink_ext_ack *extack)
diff --git a/drivers/net/ethernet/netronome/nfp/bpf/verifier.c b/drivers/net/ethernet/netronome/nfp/bpf/verifier.c
index 06ad53c..4bfeba7 100644
--- a/drivers/net/ethernet/netronome/nfp/bpf/verifier.c
+++ b/drivers/net/ethernet/netronome/nfp/bpf/verifier.c
@@ -1,5 +1,5 @@
/*
- * Copyright (C) 2016-2017 Netronome Systems, Inc.
+ * Copyright (C) 2016-2018 Netronome Systems, Inc.
*
* This software is dual licensed under the GNU General License Version 2,
* June 1991 as shown in the file COPYING in the top-level directory of this
@@ -36,6 +36,8 @@
#include <linux/kernel.h>
#include <linux/pkt_cls.h>
+#include "../nfp_app.h"
+#include "../nfp_main.h"
#include "fw.h"
#include "main.h"
@@ -149,15 +151,6 @@
return false;
}
- /* Rest of the checks is only if we re-parse the same insn */
- if (!meta->func_id)
- return true;
-
- if (meta->arg1.map_ptr != reg1->map_ptr) {
- pr_vlog(env, "%s: called for different map\n", fname);
- return false;
- }
-
return true;
}
@@ -216,6 +209,71 @@
pr_vlog(env, "bpf_get_prandom_u32(): FW doesn't support random number generation\n");
return -EOPNOTSUPP;
+ case BPF_FUNC_perf_event_output:
+ BUILD_BUG_ON(NFP_BPF_SCALAR_VALUE != SCALAR_VALUE ||
+ NFP_BPF_MAP_VALUE != PTR_TO_MAP_VALUE ||
+ NFP_BPF_STACK != PTR_TO_STACK ||
+ NFP_BPF_PACKET_DATA != PTR_TO_PACKET);
+
+ if (!bpf->helpers.perf_event_output) {
+ pr_vlog(env, "event_output: not supported by FW\n");
+ return -EOPNOTSUPP;
+ }
+
+ /* Force current CPU to make sure we can report the event
+ * wherever we get the control message from FW.
+ */
+ if (reg3->var_off.mask & BPF_F_INDEX_MASK ||
+ (reg3->var_off.value & BPF_F_INDEX_MASK) !=
+ BPF_F_CURRENT_CPU) {
+ char tn_buf[48];
+
+ tnum_strn(tn_buf, sizeof(tn_buf), reg3->var_off);
+ pr_vlog(env, "event_output: must use BPF_F_CURRENT_CPU, var_off: %s\n",
+ tn_buf);
+ return -EOPNOTSUPP;
+ }
+
+ /* Save space in meta, we don't care about arguments other
+ * than 4th meta, shove it into arg1.
+ */
+ reg1 = cur_regs(env) + BPF_REG_4;
+
+ if (reg1->type != SCALAR_VALUE /* NULL ptr */ &&
+ reg1->type != PTR_TO_STACK &&
+ reg1->type != PTR_TO_MAP_VALUE &&
+ reg1->type != PTR_TO_PACKET) {
+ pr_vlog(env, "event_output: unsupported ptr type: %d\n",
+ reg1->type);
+ return -EOPNOTSUPP;
+ }
+
+ if (reg1->type == PTR_TO_STACK &&
+ !nfp_bpf_stack_arg_ok("event_output", env, reg1, NULL))
+ return -EOPNOTSUPP;
+
+ /* Warn user that on offload NFP may return success even if map
+ * is not going to accept the event, since the event output is
+ * fully async and device won't know the state of the map.
+ * There is also FW limitation on the event length.
+ *
+ * Lost events will not show up on the perf ring, driver
+ * won't see them at all. Events may also get reordered.
+ */
+ dev_warn_once(&nfp_prog->bpf->app->pf->pdev->dev,
+ "bpf: note: return codes and behavior of bpf_event_output() helper differs for offloaded programs!\n");
+ pr_vlog(env, "warning: return codes and behavior of event_output helper differ for offload!\n");
+
+ if (!meta->func_id)
+ break;
+
+ if (reg1->type != meta->arg1.type) {
+ pr_vlog(env, "event_output: ptr type changed: %d %d\n",
+ meta->arg1.type, reg1->type);
+ return -EINVAL;
+ }
+ break;
+
default:
pr_vlog(env, "unsupported function id: %d\n", func_id);
return -EOPNOTSUPP;
@@ -410,6 +468,30 @@
}
static int
+nfp_bpf_check_store(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta,
+ struct bpf_verifier_env *env)
+{
+ const struct bpf_reg_state *reg = cur_regs(env) + meta->insn.dst_reg;
+
+ if (reg->type == PTR_TO_CTX) {
+ if (nfp_prog->type == BPF_PROG_TYPE_XDP) {
+ /* XDP ctx accesses must be 4B in size */
+ switch (meta->insn.off) {
+ case offsetof(struct xdp_md, rx_queue_index):
+ if (nfp_prog->bpf->queue_select)
+ goto exit_check_ptr;
+ pr_vlog(env, "queue selection not supported by FW\n");
+ return -EOPNOTSUPP;
+ }
+ }
+ pr_vlog(env, "unsupported store to context field\n");
+ return -EOPNOTSUPP;
+ }
+exit_check_ptr:
+ return nfp_bpf_check_ptr(nfp_prog, meta, env, meta->insn.dst_reg);
+}
+
+static int
nfp_bpf_check_xadd(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta,
struct bpf_verifier_env *env)
{
@@ -464,11 +546,19 @@
return nfp_bpf_check_ptr(nfp_prog, meta, env,
meta->insn.src_reg);
if (is_mbpf_store(meta))
- return nfp_bpf_check_ptr(nfp_prog, meta, env,
- meta->insn.dst_reg);
+ return nfp_bpf_check_store(nfp_prog, meta, env);
+
if (is_mbpf_xadd(meta))
return nfp_bpf_check_xadd(nfp_prog, meta, env);
+ if (is_mbpf_indir_shift(meta)) {
+ const struct bpf_reg_state *sreg =
+ cur_regs(env) + meta->insn.src_reg;
+
+ meta->umin = min(meta->umin, sreg->umin_value);
+ meta->umax = max(meta->umax, sreg->umax_value);
+ }
+
return 0;
}
diff --git a/drivers/net/ethernet/netronome/nfp/flower/action.c b/drivers/net/ethernet/netronome/nfp/flower/action.c
index 80df9a5..4a6d2db 100644
--- a/drivers/net/ethernet/netronome/nfp/flower/action.c
+++ b/drivers/net/ethernet/netronome/nfp/flower/action.c
@@ -72,6 +72,42 @@
push_vlan->vlan_tci = cpu_to_be16(tmp_push_vlan_tci);
}
+static int
+nfp_fl_pre_lag(struct nfp_app *app, const struct tc_action *action,
+ struct nfp_fl_payload *nfp_flow, int act_len)
+{
+ size_t act_size = sizeof(struct nfp_fl_pre_lag);
+ struct nfp_fl_pre_lag *pre_lag;
+ struct net_device *out_dev;
+ int err;
+
+ out_dev = tcf_mirred_dev(action);
+ if (!out_dev || !netif_is_lag_master(out_dev))
+ return 0;
+
+ if (act_len + act_size > NFP_FL_MAX_A_SIZ)
+ return -EOPNOTSUPP;
+
+ /* Pre_lag action must be first on action list.
+ * If other actions already exist they need pushed forward.
+ */
+ if (act_len)
+ memmove(nfp_flow->action_data + act_size,
+ nfp_flow->action_data, act_len);
+
+ pre_lag = (struct nfp_fl_pre_lag *)nfp_flow->action_data;
+ err = nfp_flower_lag_populate_pre_action(app, out_dev, pre_lag);
+ if (err)
+ return err;
+
+ pre_lag->head.jump_id = NFP_FL_ACTION_OPCODE_PRE_LAG;
+ pre_lag->head.len_lw = act_size >> NFP_FL_LW_SIZ;
+
+ nfp_flow->meta.shortcut = cpu_to_be32(NFP_FL_SC_ACT_NULL);
+
+ return act_size;
+}
+
static bool nfp_fl_netdev_is_tunnel_type(struct net_device *out_dev,
enum nfp_flower_tun_type tun_type)
{
@@ -88,12 +124,13 @@
}
static int
-nfp_fl_output(struct nfp_fl_output *output, const struct tc_action *action,
- struct nfp_fl_payload *nfp_flow, bool last,
- struct net_device *in_dev, enum nfp_flower_tun_type tun_type,
- int *tun_out_cnt)
+nfp_fl_output(struct nfp_app *app, struct nfp_fl_output *output,
+ const struct tc_action *action, struct nfp_fl_payload *nfp_flow,
+ bool last, struct net_device *in_dev,
+ enum nfp_flower_tun_type tun_type, int *tun_out_cnt)
{
size_t act_size = sizeof(struct nfp_fl_output);
+ struct nfp_flower_priv *priv = app->priv;
struct net_device *out_dev;
u16 tmp_flags;
@@ -118,6 +155,15 @@
output->flags = cpu_to_be16(tmp_flags |
NFP_FL_OUT_FLAGS_USE_TUN);
output->port = cpu_to_be32(NFP_FL_PORT_TYPE_TUN | tun_type);
+ } else if (netif_is_lag_master(out_dev) &&
+ priv->flower_ext_feats & NFP_FL_FEATS_LAG) {
+ int gid;
+
+ output->flags = cpu_to_be16(tmp_flags);
+ gid = nfp_flower_lag_get_output_id(app, out_dev);
+ if (gid < 0)
+ return gid;
+ output->port = cpu_to_be32(NFP_FL_LAG_OUT | gid);
} else {
/* Set action output parameters. */
output->flags = cpu_to_be16(tmp_flags);
@@ -164,7 +210,7 @@
struct nfp_fl_pre_tunnel *pre_tun_act;
/* Pre_tunnel action must be first on action list.
- * If other actions already exist they need pushed forward.
+ * If other actions already exist they need to be pushed forward.
*/
if (act_len)
memmove(act_data + act_size, act_data, act_len);
@@ -443,42 +489,73 @@
}
static int
-nfp_flower_loop_action(const struct tc_action *a,
+nfp_flower_output_action(struct nfp_app *app, const struct tc_action *a,
+ struct nfp_fl_payload *nfp_fl, int *a_len,
+ struct net_device *netdev, bool last,
+ enum nfp_flower_tun_type *tun_type, int *tun_out_cnt,
+ int *out_cnt)
+{
+ struct nfp_flower_priv *priv = app->priv;
+ struct nfp_fl_output *output;
+ int err, prelag_size;
+
+ if (*a_len + sizeof(struct nfp_fl_output) > NFP_FL_MAX_A_SIZ)
+ return -EOPNOTSUPP;
+
+ output = (struct nfp_fl_output *)&nfp_fl->action_data[*a_len];
+ err = nfp_fl_output(app, output, a, nfp_fl, last, netdev, *tun_type,
+ tun_out_cnt);
+ if (err)
+ return err;
+
+ *a_len += sizeof(struct nfp_fl_output);
+
+ if (priv->flower_ext_feats & NFP_FL_FEATS_LAG) {
+ /* nfp_fl_pre_lag returns -err or size of prelag action added.
+ * This will be 0 if it is not egressing to a lag dev.
+ */
+ prelag_size = nfp_fl_pre_lag(app, a, nfp_fl, *a_len);
+ if (prelag_size < 0)
+ return prelag_size;
+ else if (prelag_size > 0 && (!last || *out_cnt))
+ return -EOPNOTSUPP;
+
+ *a_len += prelag_size;
+ }
+ (*out_cnt)++;
+
+ return 0;
+}
+
+static int
+nfp_flower_loop_action(struct nfp_app *app, const struct tc_action *a,
struct nfp_fl_payload *nfp_fl, int *a_len,
struct net_device *netdev,
- enum nfp_flower_tun_type *tun_type, int *tun_out_cnt)
+ enum nfp_flower_tun_type *tun_type, int *tun_out_cnt,
+ int *out_cnt)
{
struct nfp_fl_set_ipv4_udp_tun *set_tun;
struct nfp_fl_pre_tunnel *pre_tun;
struct nfp_fl_push_vlan *psh_v;
struct nfp_fl_pop_vlan *pop_v;
- struct nfp_fl_output *output;
int err;
if (is_tcf_gact_shot(a)) {
nfp_fl->meta.shortcut = cpu_to_be32(NFP_FL_SC_ACT_DROP);
} else if (is_tcf_mirred_egress_redirect(a)) {
- if (*a_len + sizeof(struct nfp_fl_output) > NFP_FL_MAX_A_SIZ)
- return -EOPNOTSUPP;
-
- output = (struct nfp_fl_output *)&nfp_fl->action_data[*a_len];
- err = nfp_fl_output(output, a, nfp_fl, true, netdev, *tun_type,
- tun_out_cnt);
+ err = nfp_flower_output_action(app, a, nfp_fl, a_len, netdev,
+ true, tun_type, tun_out_cnt,
+ out_cnt);
if (err)
return err;
- *a_len += sizeof(struct nfp_fl_output);
} else if (is_tcf_mirred_egress_mirror(a)) {
- if (*a_len + sizeof(struct nfp_fl_output) > NFP_FL_MAX_A_SIZ)
- return -EOPNOTSUPP;
-
- output = (struct nfp_fl_output *)&nfp_fl->action_data[*a_len];
- err = nfp_fl_output(output, a, nfp_fl, false, netdev, *tun_type,
- tun_out_cnt);
+ err = nfp_flower_output_action(app, a, nfp_fl, a_len, netdev,
+ false, tun_type, tun_out_cnt,
+ out_cnt);
if (err)
return err;
- *a_len += sizeof(struct nfp_fl_output);
} else if (is_tcf_vlan(a) && tcf_vlan_action(a) == TCA_VLAN_ACT_POP) {
if (*a_len + sizeof(struct nfp_fl_pop_vlan) > NFP_FL_MAX_A_SIZ)
return -EOPNOTSUPP;
@@ -535,11 +612,12 @@
return 0;
}
-int nfp_flower_compile_action(struct tc_cls_flower_offload *flow,
+int nfp_flower_compile_action(struct nfp_app *app,
+ struct tc_cls_flower_offload *flow,
struct net_device *netdev,
struct nfp_fl_payload *nfp_flow)
{
- int act_len, act_cnt, err, tun_out_cnt;
+ int act_len, act_cnt, err, tun_out_cnt, out_cnt;
enum nfp_flower_tun_type tun_type;
const struct tc_action *a;
LIST_HEAD(actions);
@@ -550,11 +628,12 @@
act_len = 0;
act_cnt = 0;
tun_out_cnt = 0;
+ out_cnt = 0;
tcf_exts_to_list(flow->exts, &actions);
list_for_each_entry(a, &actions, list) {
- err = nfp_flower_loop_action(a, nfp_flow, &act_len, netdev,
- &tun_type, &tun_out_cnt);
+ err = nfp_flower_loop_action(app, a, nfp_flow, &act_len, netdev,
+ &tun_type, &tun_out_cnt, &out_cnt);
if (err)
return err;
act_cnt++;
diff --git a/drivers/net/ethernet/netronome/nfp/flower/cmsg.c b/drivers/net/ethernet/netronome/nfp/flower/cmsg.c
index 577659f..cb85652 100644
--- a/drivers/net/ethernet/netronome/nfp/flower/cmsg.c
+++ b/drivers/net/ethernet/netronome/nfp/flower/cmsg.c
@@ -239,8 +239,10 @@
static void
nfp_flower_cmsg_process_one_rx(struct nfp_app *app, struct sk_buff *skb)
{
+ struct nfp_flower_priv *app_priv = app->priv;
struct nfp_flower_cmsg_hdr *cmsg_hdr;
enum nfp_flower_cmsg_type_port type;
+ bool skb_stored = false;
cmsg_hdr = nfp_flower_cmsg_get_hdr(skb);
@@ -258,13 +260,20 @@
case NFP_FLOWER_CMSG_TYPE_ACTIVE_TUNS:
nfp_tunnel_keep_alive(app, skb);
break;
+ case NFP_FLOWER_CMSG_TYPE_LAG_CONFIG:
+ if (app_priv->flower_ext_feats & NFP_FL_FEATS_LAG) {
+ skb_stored = nfp_flower_lag_unprocessed_msg(app, skb);
+ break;
+ }
+ /* fall through */
default:
nfp_flower_cmsg_warn(app, "Cannot handle invalid repr control type %u\n",
type);
goto out;
}
- dev_consume_skb_any(skb);
+ if (!skb_stored)
+ dev_consume_skb_any(skb);
return;
out:
dev_kfree_skb_any(skb);
diff --git a/drivers/net/ethernet/netronome/nfp/flower/cmsg.h b/drivers/net/ethernet/netronome/nfp/flower/cmsg.h
index bee4367..4a7f351 100644
--- a/drivers/net/ethernet/netronome/nfp/flower/cmsg.h
+++ b/drivers/net/ethernet/netronome/nfp/flower/cmsg.h
@@ -92,6 +92,7 @@
#define NFP_FL_ACTION_OPCODE_SET_IPV6_DST 12
#define NFP_FL_ACTION_OPCODE_SET_UDP 14
#define NFP_FL_ACTION_OPCODE_SET_TCP 15
+#define NFP_FL_ACTION_OPCODE_PRE_LAG 16
#define NFP_FL_ACTION_OPCODE_PRE_TUNNEL 17
#define NFP_FL_ACTION_OPCODE_NUM 32
@@ -103,6 +104,9 @@
#define NFP_FL_PUSH_VLAN_CFI BIT(12)
#define NFP_FL_PUSH_VLAN_VID GENMASK(11, 0)
+/* LAG ports */
+#define NFP_FL_LAG_OUT 0xC0DE0000
+
/* Tunnel ports */
#define NFP_FL_PORT_TYPE_TUN 0x50000000
#define NFP_FL_IPV4_TUNNEL_TYPE GENMASK(7, 4)
@@ -177,6 +181,15 @@
__be16 reserved;
};
+struct nfp_fl_pre_lag {
+ struct nfp_fl_act_head head;
+ __be16 group_id;
+ u8 lag_version[3];
+ u8 instance;
+};
+
+#define NFP_FL_PRE_LAG_VER_OFF 8
+
struct nfp_fl_pre_tunnel {
struct nfp_fl_act_head head;
__be16 reserved;
@@ -366,6 +379,7 @@
enum nfp_flower_cmsg_type_port {
NFP_FLOWER_CMSG_TYPE_FLOW_ADD = 0,
NFP_FLOWER_CMSG_TYPE_FLOW_DEL = 2,
+ NFP_FLOWER_CMSG_TYPE_LAG_CONFIG = 4,
NFP_FLOWER_CMSG_TYPE_PORT_REIFY = 6,
NFP_FLOWER_CMSG_TYPE_MAC_REPR = 7,
NFP_FLOWER_CMSG_TYPE_PORT_MOD = 8,
diff --git a/drivers/net/ethernet/netronome/nfp/flower/lag_conf.c b/drivers/net/ethernet/netronome/nfp/flower/lag_conf.c
new file mode 100644
index 0000000..0c4c957
--- /dev/null
+++ b/drivers/net/ethernet/netronome/nfp/flower/lag_conf.c
@@ -0,0 +1,726 @@
+/*
+ * Copyright (C) 2018 Netronome Systems, Inc.
+ *
+ * This software is dual licensed under the GNU General License Version 2,
+ * June 1991 as shown in the file COPYING in the top-level directory of this
+ * source tree or the BSD 2-Clause License provided below. You have the
+ * option to license this software under the complete terms of either license.
+ *
+ * The BSD 2-Clause License:
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ * 1. Redistributions of source code must retain the above
+ * copyright notice, this list of conditions and the following
+ * disclaimer.
+ *
+ * 2. Redistributions in binary form must reproduce the above
+ * copyright notice, this list of conditions and the following
+ * disclaimer in the documentation and/or other materials
+ * provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#include "main.h"
+
+/* LAG group config flags. */
+#define NFP_FL_LAG_LAST BIT(1)
+#define NFP_FL_LAG_FIRST BIT(2)
+#define NFP_FL_LAG_DATA BIT(3)
+#define NFP_FL_LAG_XON BIT(4)
+#define NFP_FL_LAG_SYNC BIT(5)
+#define NFP_FL_LAG_SWITCH BIT(6)
+#define NFP_FL_LAG_RESET BIT(7)
+
+/* LAG port state flags. */
+#define NFP_PORT_LAG_LINK_UP BIT(0)
+#define NFP_PORT_LAG_TX_ENABLED BIT(1)
+#define NFP_PORT_LAG_CHANGED BIT(2)
+
+enum nfp_fl_lag_batch {
+ NFP_FL_LAG_BATCH_FIRST,
+ NFP_FL_LAG_BATCH_MEMBER,
+ NFP_FL_LAG_BATCH_FINISHED
+};
+
+/**
+ * struct nfp_flower_cmsg_lag_config - control message payload for LAG config
+ * @ctrl_flags: Configuration flags
+ * @reserved: Reserved for future use
+ * @ttl: Time to live of packet - host always sets to 0xff
+ * @pkt_number: Config message packet number - increment for each message
+ * @batch_ver: Batch version of messages - increment for each batch of messages
+ * @group_id: Group ID applicable
+ * @group_inst: Group instance number - increment when group is reused
+ * @members: Array of 32-bit words listing all active group members
+ */
+struct nfp_flower_cmsg_lag_config {
+ u8 ctrl_flags;
+ u8 reserved[2];
+ u8 ttl;
+ __be32 pkt_number;
+ __be32 batch_ver;
+ __be32 group_id;
+ __be32 group_inst;
+ __be32 members[];
+};
+
+/**
+ * struct nfp_fl_lag_group - list entry for each LAG group
+ * @group_id: Assigned group ID for host/kernel sync
+ * @group_inst: Group instance in case of ID reuse
+ * @list: List entry
+ * @master_ndev: Group master Netdev
+ * @dirty: Marked if the group needs synced to HW
+ * @offloaded: Marked if the group is currently offloaded to NIC
+ * @to_remove: Marked if the group should be removed from NIC
+ * @to_destroy: Marked if the group should be removed from driver
+ * @slave_cnt: Number of slaves in group
+ */
+struct nfp_fl_lag_group {
+ unsigned int group_id;
+ u8 group_inst;
+ struct list_head list;
+ struct net_device *master_ndev;
+ bool dirty;
+ bool offloaded;
+ bool to_remove;
+ bool to_destroy;
+ unsigned int slave_cnt;
+};
+
+#define NFP_FL_LAG_PKT_NUMBER_MASK GENMASK(30, 0)
+#define NFP_FL_LAG_VERSION_MASK GENMASK(22, 0)
+#define NFP_FL_LAG_HOST_TTL 0xff
+
+/* Use this ID with zero members to ack a batch config */
+#define NFP_FL_LAG_SYNC_ID 0
+#define NFP_FL_LAG_GROUP_MIN 1 /* ID 0 reserved */
+#define NFP_FL_LAG_GROUP_MAX 32 /* IDs 1 to 31 are valid */
+
+/* wait for more config */
+#define NFP_FL_LAG_DELAY (msecs_to_jiffies(2))
+
+#define NFP_FL_LAG_RETRANS_LIMIT 100 /* max retrans cmsgs to store */
+
+static unsigned int nfp_fl_get_next_pkt_number(struct nfp_fl_lag *lag)
+{
+ lag->pkt_num++;
+ lag->pkt_num &= NFP_FL_LAG_PKT_NUMBER_MASK;
+
+ return lag->pkt_num;
+}
+
+static void nfp_fl_increment_version(struct nfp_fl_lag *lag)
+{
+ /* LSB is not considered by firmware so add 2 for each increment. */
+ lag->batch_ver += 2;
+ lag->batch_ver &= NFP_FL_LAG_VERSION_MASK;
+
+ /* Zero is reserved by firmware. */
+ if (!lag->batch_ver)
+ lag->batch_ver += 2;
+}
+
+static struct nfp_fl_lag_group *
+nfp_fl_lag_group_create(struct nfp_fl_lag *lag, struct net_device *master)
+{
+ struct nfp_fl_lag_group *group;
+ struct nfp_flower_priv *priv;
+ int id;
+
+ priv = container_of(lag, struct nfp_flower_priv, nfp_lag);
+
+ id = ida_simple_get(&lag->ida_handle, NFP_FL_LAG_GROUP_MIN,
+ NFP_FL_LAG_GROUP_MAX, GFP_KERNEL);
+ if (id < 0) {
+ nfp_flower_cmsg_warn(priv->app,
+ "No more bonding groups available\n");
+ return ERR_PTR(id);
+ }
+
+ group = kmalloc(sizeof(*group), GFP_KERNEL);
+ if (!group) {
+ ida_simple_remove(&lag->ida_handle, id);
+ return ERR_PTR(-ENOMEM);
+ }
+
+ group->group_id = id;
+ group->master_ndev = master;
+ group->dirty = true;
+ group->offloaded = false;
+ group->to_remove = false;
+ group->to_destroy = false;
+ group->slave_cnt = 0;
+ group->group_inst = ++lag->global_inst;
+ list_add_tail(&group->list, &lag->group_list);
+
+ return group;
+}
+
+static struct nfp_fl_lag_group *
+nfp_fl_lag_find_group_for_master_with_lag(struct nfp_fl_lag *lag,
+ struct net_device *master)
+{
+ struct nfp_fl_lag_group *entry;
+
+ if (!master)
+ return NULL;
+
+ list_for_each_entry(entry, &lag->group_list, list)
+ if (entry->master_ndev == master)
+ return entry;
+
+ return NULL;
+}
+
+int nfp_flower_lag_populate_pre_action(struct nfp_app *app,
+ struct net_device *master,
+ struct nfp_fl_pre_lag *pre_act)
+{
+ struct nfp_flower_priv *priv = app->priv;
+ struct nfp_fl_lag_group *group = NULL;
+ __be32 temp_vers;
+
+ mutex_lock(&priv->nfp_lag.lock);
+ group = nfp_fl_lag_find_group_for_master_with_lag(&priv->nfp_lag,
+ master);
+ if (!group) {
+ mutex_unlock(&priv->nfp_lag.lock);
+ return -ENOENT;
+ }
+
+ pre_act->group_id = cpu_to_be16(group->group_id);
+ temp_vers = cpu_to_be32(priv->nfp_lag.batch_ver <<
+ NFP_FL_PRE_LAG_VER_OFF);
+ memcpy(pre_act->lag_version, &temp_vers, 3);
+ pre_act->instance = group->group_inst;
+ mutex_unlock(&priv->nfp_lag.lock);
+
+ return 0;
+}
+
+int nfp_flower_lag_get_output_id(struct nfp_app *app, struct net_device *master)
+{
+ struct nfp_flower_priv *priv = app->priv;
+ struct nfp_fl_lag_group *group = NULL;
+ int group_id = -ENOENT;
+
+ mutex_lock(&priv->nfp_lag.lock);
+ group = nfp_fl_lag_find_group_for_master_with_lag(&priv->nfp_lag,
+ master);
+ if (group)
+ group_id = group->group_id;
+ mutex_unlock(&priv->nfp_lag.lock);
+
+ return group_id;
+}
+
+static int
+nfp_fl_lag_config_group(struct nfp_fl_lag *lag, struct nfp_fl_lag_group *group,
+ struct net_device **active_members,
+ unsigned int member_cnt, enum nfp_fl_lag_batch *batch)
+{
+ struct nfp_flower_cmsg_lag_config *cmsg_payload;
+ struct nfp_flower_priv *priv;
+ unsigned long int flags;
+ unsigned int size, i;
+ struct sk_buff *skb;
+
+ priv = container_of(lag, struct nfp_flower_priv, nfp_lag);
+ size = sizeof(*cmsg_payload) + sizeof(__be32) * member_cnt;
+ skb = nfp_flower_cmsg_alloc(priv->app, size,
+ NFP_FLOWER_CMSG_TYPE_LAG_CONFIG,
+ GFP_KERNEL);
+ if (!skb)
+ return -ENOMEM;
+
+ cmsg_payload = nfp_flower_cmsg_get_data(skb);
+ flags = 0;
+
+ /* Increment batch version for each new batch of config messages. */
+ if (*batch == NFP_FL_LAG_BATCH_FIRST) {
+ flags |= NFP_FL_LAG_FIRST;
+ nfp_fl_increment_version(lag);
+ *batch = NFP_FL_LAG_BATCH_MEMBER;
+ }
+
+ /* If it is a reset msg then it is also the end of the batch. */
+ if (lag->rst_cfg) {
+ flags |= NFP_FL_LAG_RESET;
+ *batch = NFP_FL_LAG_BATCH_FINISHED;
+ }
+
+ /* To signal the end of a batch, both the switch and last flags are set
+ * and the the reserved SYNC group ID is used.
+ */
+ if (*batch == NFP_FL_LAG_BATCH_FINISHED) {
+ flags |= NFP_FL_LAG_SWITCH | NFP_FL_LAG_LAST;
+ lag->rst_cfg = false;
+ cmsg_payload->group_id = cpu_to_be32(NFP_FL_LAG_SYNC_ID);
+ cmsg_payload->group_inst = 0;
+ } else {
+ cmsg_payload->group_id = cpu_to_be32(group->group_id);
+ cmsg_payload->group_inst = cpu_to_be32(group->group_inst);
+ }
+
+ cmsg_payload->reserved[0] = 0;
+ cmsg_payload->reserved[1] = 0;
+ cmsg_payload->ttl = NFP_FL_LAG_HOST_TTL;
+ cmsg_payload->ctrl_flags = flags;
+ cmsg_payload->batch_ver = cpu_to_be32(lag->batch_ver);
+ cmsg_payload->pkt_number = cpu_to_be32(nfp_fl_get_next_pkt_number(lag));
+
+ for (i = 0; i < member_cnt; i++)
+ cmsg_payload->members[i] =
+ cpu_to_be32(nfp_repr_get_port_id(active_members[i]));
+
+ nfp_ctrl_tx(priv->app->ctrl, skb);
+ return 0;
+}
+
+static void nfp_fl_lag_do_work(struct work_struct *work)
+{
+ enum nfp_fl_lag_batch batch = NFP_FL_LAG_BATCH_FIRST;
+ struct nfp_fl_lag_group *entry, *storage;
+ struct delayed_work *delayed_work;
+ struct nfp_flower_priv *priv;
+ struct nfp_fl_lag *lag;
+ int err;
+
+ delayed_work = to_delayed_work(work);
+ lag = container_of(delayed_work, struct nfp_fl_lag, work);
+ priv = container_of(lag, struct nfp_flower_priv, nfp_lag);
+
+ mutex_lock(&lag->lock);
+ list_for_each_entry_safe(entry, storage, &lag->group_list, list) {
+ struct net_device *iter_netdev, **acti_netdevs;
+ struct nfp_flower_repr_priv *repr_priv;
+ int active_count = 0, slaves = 0;
+ struct nfp_repr *repr;
+ unsigned long *flags;
+
+ if (entry->to_remove) {
+ /* Active count of 0 deletes group on hw. */
+ err = nfp_fl_lag_config_group(lag, entry, NULL, 0,
+ &batch);
+ if (!err) {
+ entry->to_remove = false;
+ entry->offloaded = false;
+ } else {
+ nfp_flower_cmsg_warn(priv->app,
+ "group delete failed\n");
+ schedule_delayed_work(&lag->work,
+ NFP_FL_LAG_DELAY);
+ continue;
+ }
+
+ if (entry->to_destroy) {
+ ida_simple_remove(&lag->ida_handle,
+ entry->group_id);
+ list_del(&entry->list);
+ kfree(entry);
+ }
+ continue;
+ }
+
+ acti_netdevs = kmalloc_array(entry->slave_cnt,
+ sizeof(*acti_netdevs), GFP_KERNEL);
+
+ /* Include sanity check in the loop. It may be that a bond has
+ * changed between processing the last notification and the
+ * work queue triggering. If the number of slaves has changed
+ * or it now contains netdevs that cannot be offloaded, ignore
+ * the group until pending notifications are processed.
+ */
+ rcu_read_lock();
+ for_each_netdev_in_bond_rcu(entry->master_ndev, iter_netdev) {
+ if (!nfp_netdev_is_nfp_repr(iter_netdev)) {
+ slaves = 0;
+ break;
+ }
+
+ repr = netdev_priv(iter_netdev);
+
+ if (repr->app != priv->app) {
+ slaves = 0;
+ break;
+ }
+
+ slaves++;
+ if (slaves > entry->slave_cnt)
+ break;
+
+ /* Check the ports for state changes. */
+ repr_priv = repr->app_priv;
+ flags = &repr_priv->lag_port_flags;
+
+ if (*flags & NFP_PORT_LAG_CHANGED) {
+ *flags &= ~NFP_PORT_LAG_CHANGED;
+ entry->dirty = true;
+ }
+
+ if ((*flags & NFP_PORT_LAG_TX_ENABLED) &&
+ (*flags & NFP_PORT_LAG_LINK_UP))
+ acti_netdevs[active_count++] = iter_netdev;
+ }
+ rcu_read_unlock();
+
+ if (slaves != entry->slave_cnt || !entry->dirty) {
+ kfree(acti_netdevs);
+ continue;
+ }
+
+ err = nfp_fl_lag_config_group(lag, entry, acti_netdevs,
+ active_count, &batch);
+ if (!err) {
+ entry->offloaded = true;
+ entry->dirty = false;
+ } else {
+ nfp_flower_cmsg_warn(priv->app,
+ "group offload failed\n");
+ schedule_delayed_work(&lag->work, NFP_FL_LAG_DELAY);
+ }
+
+ kfree(acti_netdevs);
+ }
+
+ /* End the config batch if at least one packet has been batched. */
+ if (batch == NFP_FL_LAG_BATCH_MEMBER) {
+ batch = NFP_FL_LAG_BATCH_FINISHED;
+ err = nfp_fl_lag_config_group(lag, NULL, NULL, 0, &batch);
+ if (err)
+ nfp_flower_cmsg_warn(priv->app,
+ "group batch end cmsg failed\n");
+ }
+
+ mutex_unlock(&lag->lock);
+}
+
+static int
+nfp_fl_lag_put_unprocessed(struct nfp_fl_lag *lag, struct sk_buff *skb)
+{
+ struct nfp_flower_cmsg_lag_config *cmsg_payload;
+
+ cmsg_payload = nfp_flower_cmsg_get_data(skb);
+ if (be32_to_cpu(cmsg_payload->group_id) >= NFP_FL_LAG_GROUP_MAX)
+ return -EINVAL;
+
+ /* Drop cmsg retrans if storage limit is exceeded to prevent
+ * overloading. If the fw notices that expected messages have not been
+ * received in a given time block, it will request a full resync.
+ */
+ if (skb_queue_len(&lag->retrans_skbs) >= NFP_FL_LAG_RETRANS_LIMIT)
+ return -ENOSPC;
+
+ __skb_queue_tail(&lag->retrans_skbs, skb);
+
+ return 0;
+}
+
+static void nfp_fl_send_unprocessed(struct nfp_fl_lag *lag)
+{
+ struct nfp_flower_priv *priv;
+ struct sk_buff *skb;
+
+ priv = container_of(lag, struct nfp_flower_priv, nfp_lag);
+
+ while ((skb = __skb_dequeue(&lag->retrans_skbs)))
+ nfp_ctrl_tx(priv->app->ctrl, skb);
+}
+
+bool nfp_flower_lag_unprocessed_msg(struct nfp_app *app, struct sk_buff *skb)
+{
+ struct nfp_flower_cmsg_lag_config *cmsg_payload;
+ struct nfp_flower_priv *priv = app->priv;
+ struct nfp_fl_lag_group *group_entry;
+ unsigned long int flags;
+ bool store_skb = false;
+ int err;
+
+ cmsg_payload = nfp_flower_cmsg_get_data(skb);
+ flags = cmsg_payload->ctrl_flags;
+
+ /* Note the intentional fall through below. If DATA and XON are both
+ * set, the message will stored and sent again with the rest of the
+ * unprocessed messages list.
+ */
+
+ /* Store */
+ if (flags & NFP_FL_LAG_DATA)
+ if (!nfp_fl_lag_put_unprocessed(&priv->nfp_lag, skb))
+ store_skb = true;
+
+ /* Send stored */
+ if (flags & NFP_FL_LAG_XON)
+ nfp_fl_send_unprocessed(&priv->nfp_lag);
+
+ /* Resend all */
+ if (flags & NFP_FL_LAG_SYNC) {
+ /* To resend all config:
+ * 1) Clear all unprocessed messages
+ * 2) Mark all groups dirty
+ * 3) Reset NFP group config
+ * 4) Schedule a LAG config update
+ */
+
+ __skb_queue_purge(&priv->nfp_lag.retrans_skbs);
+
+ mutex_lock(&priv->nfp_lag.lock);
+ list_for_each_entry(group_entry, &priv->nfp_lag.group_list,
+ list)
+ group_entry->dirty = true;
+
+ err = nfp_flower_lag_reset(&priv->nfp_lag);
+ if (err)
+ nfp_flower_cmsg_warn(priv->app,
+ "mem err in group reset msg\n");
+ mutex_unlock(&priv->nfp_lag.lock);
+
+ schedule_delayed_work(&priv->nfp_lag.work, 0);
+ }
+
+ return store_skb;
+}
+
+static void
+nfp_fl_lag_schedule_group_remove(struct nfp_fl_lag *lag,
+ struct nfp_fl_lag_group *group)
+{
+ group->to_remove = true;
+
+ schedule_delayed_work(&lag->work, NFP_FL_LAG_DELAY);
+}
+
+static int
+nfp_fl_lag_schedule_group_delete(struct nfp_fl_lag *lag,
+ struct net_device *master)
+{
+ struct nfp_fl_lag_group *group;
+
+ mutex_lock(&lag->lock);
+ group = nfp_fl_lag_find_group_for_master_with_lag(lag, master);
+ if (!group) {
+ mutex_unlock(&lag->lock);
+ return -ENOENT;
+ }
+
+ group->to_remove = true;
+ group->to_destroy = true;
+ mutex_unlock(&lag->lock);
+
+ schedule_delayed_work(&lag->work, NFP_FL_LAG_DELAY);
+ return 0;
+}
+
+static int
+nfp_fl_lag_changeupper_event(struct nfp_fl_lag *lag,
+ struct netdev_notifier_changeupper_info *info)
+{
+ struct net_device *upper = info->upper_dev, *iter_netdev;
+ struct netdev_lag_upper_info *lag_upper_info;
+ struct nfp_fl_lag_group *group;
+ struct nfp_flower_priv *priv;
+ unsigned int slave_count = 0;
+ bool can_offload = true;
+ struct nfp_repr *repr;
+
+ if (!netif_is_lag_master(upper))
+ return 0;
+
+ priv = container_of(lag, struct nfp_flower_priv, nfp_lag);
+
+ rcu_read_lock();
+ for_each_netdev_in_bond_rcu(upper, iter_netdev) {
+ if (!nfp_netdev_is_nfp_repr(iter_netdev)) {
+ can_offload = false;
+ break;
+ }
+ repr = netdev_priv(iter_netdev);
+
+ /* Ensure all ports are created by the same app/on same card. */
+ if (repr->app != priv->app) {
+ can_offload = false;
+ break;
+ }
+
+ slave_count++;
+ }
+ rcu_read_unlock();
+
+ lag_upper_info = info->upper_info;
+
+ /* Firmware supports active/backup and L3/L4 hash bonds. */
+ if (lag_upper_info &&
+ lag_upper_info->tx_type != NETDEV_LAG_TX_TYPE_ACTIVEBACKUP &&
+ (lag_upper_info->tx_type != NETDEV_LAG_TX_TYPE_HASH ||
+ (lag_upper_info->hash_type != NETDEV_LAG_HASH_L34 &&
+ lag_upper_info->hash_type != NETDEV_LAG_HASH_E34))) {
+ can_offload = false;
+ nfp_flower_cmsg_warn(priv->app,
+ "Unable to offload tx_type %u hash %u\n",
+ lag_upper_info->tx_type,
+ lag_upper_info->hash_type);
+ }
+
+ mutex_lock(&lag->lock);
+ group = nfp_fl_lag_find_group_for_master_with_lag(lag, upper);
+
+ if (slave_count == 0 || !can_offload) {
+ /* Cannot offload the group - remove if previously offloaded. */
+ if (group && group->offloaded)
+ nfp_fl_lag_schedule_group_remove(lag, group);
+
+ mutex_unlock(&lag->lock);
+ return 0;
+ }
+
+ if (!group) {
+ group = nfp_fl_lag_group_create(lag, upper);
+ if (IS_ERR(group)) {
+ mutex_unlock(&lag->lock);
+ return PTR_ERR(group);
+ }
+ }
+
+ group->dirty = true;
+ group->slave_cnt = slave_count;
+
+ /* Group may have been on queue for removal but is now offfloable. */
+ group->to_remove = false;
+ mutex_unlock(&lag->lock);
+
+ schedule_delayed_work(&lag->work, NFP_FL_LAG_DELAY);
+ return 0;
+}
+
+static int
+nfp_fl_lag_changels_event(struct nfp_fl_lag *lag, struct net_device *netdev,
+ struct netdev_notifier_changelowerstate_info *info)
+{
+ struct netdev_lag_lower_state_info *lag_lower_info;
+ struct nfp_flower_repr_priv *repr_priv;
+ struct nfp_flower_priv *priv;
+ struct nfp_repr *repr;
+ unsigned long *flags;
+
+ if (!netif_is_lag_port(netdev) || !nfp_netdev_is_nfp_repr(netdev))
+ return 0;
+
+ lag_lower_info = info->lower_state_info;
+ if (!lag_lower_info)
+ return 0;
+
+ priv = container_of(lag, struct nfp_flower_priv, nfp_lag);
+ repr = netdev_priv(netdev);
+
+ /* Verify that the repr is associated with this app. */
+ if (repr->app != priv->app)
+ return 0;
+
+ repr_priv = repr->app_priv;
+ flags = &repr_priv->lag_port_flags;
+
+ mutex_lock(&lag->lock);
+ if (lag_lower_info->link_up)
+ *flags |= NFP_PORT_LAG_LINK_UP;
+ else
+ *flags &= ~NFP_PORT_LAG_LINK_UP;
+
+ if (lag_lower_info->tx_enabled)
+ *flags |= NFP_PORT_LAG_TX_ENABLED;
+ else
+ *flags &= ~NFP_PORT_LAG_TX_ENABLED;
+
+ *flags |= NFP_PORT_LAG_CHANGED;
+ mutex_unlock(&lag->lock);
+
+ schedule_delayed_work(&lag->work, NFP_FL_LAG_DELAY);
+ return 0;
+}
+
+static int
+nfp_fl_lag_netdev_event(struct notifier_block *nb, unsigned long event,
+ void *ptr)
+{
+ struct net_device *netdev;
+ struct nfp_fl_lag *lag;
+ int err;
+
+ netdev = netdev_notifier_info_to_dev(ptr);
+ lag = container_of(nb, struct nfp_fl_lag, lag_nb);
+
+ switch (event) {
+ case NETDEV_CHANGEUPPER:
+ err = nfp_fl_lag_changeupper_event(lag, ptr);
+ if (err)
+ return NOTIFY_BAD;
+ return NOTIFY_OK;
+ case NETDEV_CHANGELOWERSTATE:
+ err = nfp_fl_lag_changels_event(lag, netdev, ptr);
+ if (err)
+ return NOTIFY_BAD;
+ return NOTIFY_OK;
+ case NETDEV_UNREGISTER:
+ if (netif_is_bond_master(netdev)) {
+ err = nfp_fl_lag_schedule_group_delete(lag, netdev);
+ if (err)
+ return NOTIFY_BAD;
+ return NOTIFY_OK;
+ }
+ }
+
+ return NOTIFY_DONE;
+}
+
+int nfp_flower_lag_reset(struct nfp_fl_lag *lag)
+{
+ enum nfp_fl_lag_batch batch = NFP_FL_LAG_BATCH_FIRST;
+
+ lag->rst_cfg = true;
+ return nfp_fl_lag_config_group(lag, NULL, NULL, 0, &batch);
+}
+
+void nfp_flower_lag_init(struct nfp_fl_lag *lag)
+{
+ INIT_DELAYED_WORK(&lag->work, nfp_fl_lag_do_work);
+ INIT_LIST_HEAD(&lag->group_list);
+ mutex_init(&lag->lock);
+ ida_init(&lag->ida_handle);
+
+ __skb_queue_head_init(&lag->retrans_skbs);
+
+ /* 0 is a reserved batch version so increment to first valid value. */
+ nfp_fl_increment_version(lag);
+
+ lag->lag_nb.notifier_call = nfp_fl_lag_netdev_event;
+}
+
+void nfp_flower_lag_cleanup(struct nfp_fl_lag *lag)
+{
+ struct nfp_fl_lag_group *entry, *storage;
+
+ cancel_delayed_work_sync(&lag->work);
+
+ __skb_queue_purge(&lag->retrans_skbs);
+
+ /* Remove all groups. */
+ mutex_lock(&lag->lock);
+ list_for_each_entry_safe(entry, storage, &lag->group_list, list) {
+ list_del(&entry->list);
+ kfree(entry);
+ }
+ mutex_unlock(&lag->lock);
+ mutex_destroy(&lag->lock);
+ ida_destroy(&lag->ida_handle);
+}
diff --git a/drivers/net/ethernet/netronome/nfp/flower/main.c b/drivers/net/ethernet/netronome/nfp/flower/main.c
index 84e3b9f..19cfa16 100644
--- a/drivers/net/ethernet/netronome/nfp/flower/main.c
+++ b/drivers/net/ethernet/netronome/nfp/flower/main.c
@@ -185,6 +185,10 @@
static void
nfp_flower_repr_netdev_clean(struct nfp_app *app, struct net_device *netdev)
{
+ struct nfp_repr *repr = netdev_priv(netdev);
+
+ kfree(repr->app_priv);
+
tc_setup_cb_egdev_unregister(netdev, nfp_flower_setup_tc_egress_cb,
netdev_priv(netdev));
}
@@ -225,7 +229,9 @@
u8 nfp_pcie = nfp_cppcore_pcie_unit(app->pf->cpp);
struct nfp_flower_priv *priv = app->priv;
atomic_t *replies = &priv->reify_replies;
+ struct nfp_flower_repr_priv *repr_priv;
enum nfp_port_type port_type;
+ struct nfp_repr *nfp_repr;
struct nfp_reprs *reprs;
int i, err, reify_cnt;
const u8 queue = 0;
@@ -247,12 +253,25 @@
err = -ENOMEM;
goto err_reprs_clean;
}
- RCU_INIT_POINTER(reprs->reprs[i], repr);
+
+ repr_priv = kzalloc(sizeof(*repr_priv), GFP_KERNEL);
+ if (!repr_priv) {
+ err = -ENOMEM;
+ goto err_reprs_clean;
+ }
+
+ nfp_repr = netdev_priv(repr);
+ nfp_repr->app_priv = repr_priv;
/* For now we only support 1 PF */
WARN_ON(repr_type == NFP_REPR_TYPE_PF && i);
port = nfp_port_alloc(app, port_type, repr);
+ if (IS_ERR(port)) {
+ err = PTR_ERR(port);
+ nfp_repr_free(repr);
+ goto err_reprs_clean;
+ }
if (repr_type == NFP_REPR_TYPE_PF) {
port->pf_id = i;
port->vnic = priv->nn->dp.ctrl_bar;
@@ -271,9 +290,11 @@
port_id, port, priv->nn->dp.netdev);
if (err) {
nfp_port_free(port);
+ nfp_repr_free(repr);
goto err_reprs_clean;
}
+ RCU_INIT_POINTER(reprs->reprs[i], repr);
nfp_info(app->cpp, "%s%d Representor(%s) created\n",
repr_type == NFP_REPR_TYPE_PF ? "PF" : "VF", i,
repr->name);
@@ -318,6 +339,8 @@
{
struct nfp_eth_table *eth_tbl = app->pf->eth_tbl;
atomic_t *replies = &priv->reify_replies;
+ struct nfp_flower_repr_priv *repr_priv;
+ struct nfp_repr *nfp_repr;
struct sk_buff *ctrl_skb;
struct nfp_reprs *reprs;
int err, reify_cnt;
@@ -344,16 +367,26 @@
err = -ENOMEM;
goto err_reprs_clean;
}
- RCU_INIT_POINTER(reprs->reprs[phys_port], repr);
+
+ repr_priv = kzalloc(sizeof(*repr_priv), GFP_KERNEL);
+ if (!repr_priv) {
+ err = -ENOMEM;
+ goto err_reprs_clean;
+ }
+
+ nfp_repr = netdev_priv(repr);
+ nfp_repr->app_priv = repr_priv;
port = nfp_port_alloc(app, NFP_PORT_PHYS_PORT, repr);
if (IS_ERR(port)) {
err = PTR_ERR(port);
+ nfp_repr_free(repr);
goto err_reprs_clean;
}
err = nfp_port_init_phy_port(app->pf, app, port, i);
if (err) {
nfp_port_free(port);
+ nfp_repr_free(repr);
goto err_reprs_clean;
}
@@ -365,6 +398,7 @@
cmsg_port_id, port, priv->nn->dp.netdev);
if (err) {
nfp_port_free(port);
+ nfp_repr_free(repr);
goto err_reprs_clean;
}
@@ -373,6 +407,7 @@
eth_tbl->ports[i].base,
phys_port);
+ RCU_INIT_POINTER(reprs->reprs[phys_port], repr);
nfp_info(app->cpp, "Phys Port %d Representor(%s) created\n",
phys_port, repr->name);
}
@@ -537,8 +572,22 @@
else
app_priv->flower_ext_feats = features;
+ /* Tell the firmware that the driver supports lag. */
+ err = nfp_rtsym_write_le(app->pf->rtbl,
+ "_abi_flower_balance_sync_enable", 1);
+ if (!err) {
+ app_priv->flower_ext_feats |= NFP_FL_FEATS_LAG;
+ nfp_flower_lag_init(&app_priv->nfp_lag);
+ } else if (err == -ENOENT) {
+ nfp_warn(app->cpp, "LAG not supported by FW.\n");
+ } else {
+ goto err_cleanup_metadata;
+ }
+
return 0;
+err_cleanup_metadata:
+ nfp_flower_metadata_cleanup(app);
err_free_app_priv:
vfree(app->priv);
return err;
@@ -552,6 +601,9 @@
skb_queue_purge(&app_priv->cmsg_skbs_low);
flush_work(&app_priv->cmsg_work);
+ if (app_priv->flower_ext_feats & NFP_FL_FEATS_LAG)
+ nfp_flower_lag_cleanup(&app_priv->nfp_lag);
+
nfp_flower_metadata_cleanup(app);
vfree(app->priv);
app->priv = NULL;
@@ -618,11 +670,29 @@
static int nfp_flower_start(struct nfp_app *app)
{
+ struct nfp_flower_priv *app_priv = app->priv;
+ int err;
+
+ if (app_priv->flower_ext_feats & NFP_FL_FEATS_LAG) {
+ err = nfp_flower_lag_reset(&app_priv->nfp_lag);
+ if (err)
+ return err;
+
+ err = register_netdevice_notifier(&app_priv->nfp_lag.lag_nb);
+ if (err)
+ return err;
+ }
+
return nfp_tunnel_config_start(app);
}
static void nfp_flower_stop(struct nfp_app *app)
{
+ struct nfp_flower_priv *app_priv = app->priv;
+
+ if (app_priv->flower_ext_feats & NFP_FL_FEATS_LAG)
+ unregister_netdevice_notifier(&app_priv->nfp_lag.lag_nb);
+
nfp_tunnel_config_stop(app);
}
diff --git a/drivers/net/ethernet/netronome/nfp/flower/main.h b/drivers/net/ethernet/netronome/nfp/flower/main.h
index c67e1b5..bbe5764 100644
--- a/drivers/net/ethernet/netronome/nfp/flower/main.h
+++ b/drivers/net/ethernet/netronome/nfp/flower/main.h
@@ -43,10 +43,13 @@
#include <net/pkt_cls.h>
#include <net/tcp.h>
#include <linux/workqueue.h>
+#include <linux/idr.h>
+struct nfp_fl_pre_lag;
struct net_device;
struct nfp_app;
+#define NFP_FL_STATS_CTX_DONT_CARE cpu_to_be32(0xffffffff)
#define NFP_FL_STATS_ENTRY_RS BIT(20)
#define NFP_FL_STATS_ELEM_RS 4
#define NFP_FL_REPEATED_HASH_MAX BIT(17)
@@ -66,6 +69,7 @@
/* Extra features bitmap. */
#define NFP_FL_FEATS_GENEVE BIT(0)
#define NFP_FL_NBI_MTU_SETTING BIT(1)
+#define NFP_FL_FEATS_LAG BIT(31)
struct nfp_fl_mask_id {
struct circ_buf mask_id_free_list;
@@ -96,6 +100,33 @@
};
/**
+ * struct nfp_fl_lag - Flower APP priv data for link aggregation
+ * @lag_nb: Notifier to track master/slave events
+ * @work: Work queue for writing configs to the HW
+ * @lock: Lock to protect lag_group_list
+ * @group_list: List of all master/slave groups offloaded
+ * @ida_handle: IDA to handle group ids
+ * @pkt_num: Incremented for each config packet sent
+ * @batch_ver: Incremented for each batch of config packets
+ * @global_inst: Instance allocator for groups
+ * @rst_cfg: Marker to reset HW LAG config
+ * @retrans_skbs: Cmsgs that could not be processed by HW and require
+ * retransmission
+ */
+struct nfp_fl_lag {
+ struct notifier_block lag_nb;
+ struct delayed_work work;
+ struct mutex lock;
+ struct list_head group_list;
+ struct ida ida_handle;
+ unsigned int pkt_num;
+ unsigned int batch_ver;
+ u8 global_inst;
+ bool rst_cfg;
+ struct sk_buff_head retrans_skbs;
+};
+
+/**
* struct nfp_flower_priv - Flower APP per-vNIC priv data
* @app: Back pointer to app
* @nn: Pointer to vNIC
@@ -127,6 +158,7 @@
* from firmware for repr reify
* @reify_wait_queue: wait queue for repr reify response counting
* @mtu_conf: Configuration of repr MTU value
+ * @nfp_lag: Link aggregation data block
*/
struct nfp_flower_priv {
struct nfp_app *app;
@@ -156,6 +188,15 @@
atomic_t reify_replies;
wait_queue_head_t reify_wait_queue;
struct nfp_mtu_conf mtu_conf;
+ struct nfp_fl_lag nfp_lag;
+};
+
+/**
+ * struct nfp_flower_repr_priv - Flower APP per-repr priv data
+ * @lag_port_flags: Extended port flags to record lag state of repr
+ */
+struct nfp_flower_repr_priv {
+ unsigned long lag_port_flags;
};
struct nfp_fl_key_ls {
@@ -189,9 +230,11 @@
spinlock_t lock; /* lock stats */
struct nfp_fl_stats stats;
__be32 nfp_tun_ipv4_addr;
+ struct net_device *ingress_dev;
char *unmasked_data;
char *mask_data;
char *action_data;
+ bool ingress_offload;
};
struct nfp_fl_stats_frame {
@@ -211,17 +254,20 @@
struct net_device *netdev,
struct nfp_fl_payload *nfp_flow,
enum nfp_flower_tun_type tun_type);
-int nfp_flower_compile_action(struct tc_cls_flower_offload *flow,
+int nfp_flower_compile_action(struct nfp_app *app,
+ struct tc_cls_flower_offload *flow,
struct net_device *netdev,
struct nfp_fl_payload *nfp_flow);
int nfp_compile_flow_metadata(struct nfp_app *app,
struct tc_cls_flower_offload *flow,
- struct nfp_fl_payload *nfp_flow);
+ struct nfp_fl_payload *nfp_flow,
+ struct net_device *netdev);
int nfp_modify_flow_metadata(struct nfp_app *app,
struct nfp_fl_payload *nfp_flow);
struct nfp_fl_payload *
-nfp_flower_search_fl_table(struct nfp_app *app, unsigned long tc_flower_cookie);
+nfp_flower_search_fl_table(struct nfp_app *app, unsigned long tc_flower_cookie,
+ struct net_device *netdev, __be32 host_ctx);
struct nfp_fl_payload *
nfp_flower_remove_fl_table(struct nfp_app *app, unsigned long tc_flower_cookie);
@@ -236,5 +282,14 @@
void nfp_tunnel_keep_alive(struct nfp_app *app, struct sk_buff *skb);
int nfp_flower_setup_tc_egress_cb(enum tc_setup_type type, void *type_data,
void *cb_priv);
+void nfp_flower_lag_init(struct nfp_fl_lag *lag);
+void nfp_flower_lag_cleanup(struct nfp_fl_lag *lag);
+int nfp_flower_lag_reset(struct nfp_fl_lag *lag);
+bool nfp_flower_lag_unprocessed_msg(struct nfp_app *app, struct sk_buff *skb);
+int nfp_flower_lag_populate_pre_action(struct nfp_app *app,
+ struct net_device *master,
+ struct nfp_fl_pre_lag *pre_act);
+int nfp_flower_lag_get_output_id(struct nfp_app *app,
+ struct net_device *master);
#endif
diff --git a/drivers/net/ethernet/netronome/nfp/flower/metadata.c b/drivers/net/ethernet/netronome/nfp/flower/metadata.c
index db977cf..21668aa 100644
--- a/drivers/net/ethernet/netronome/nfp/flower/metadata.c
+++ b/drivers/net/ethernet/netronome/nfp/flower/metadata.c
@@ -99,14 +99,18 @@
/* Must be called with either RTNL or rcu_read_lock */
struct nfp_fl_payload *
-nfp_flower_search_fl_table(struct nfp_app *app, unsigned long tc_flower_cookie)
+nfp_flower_search_fl_table(struct nfp_app *app, unsigned long tc_flower_cookie,
+ struct net_device *netdev, __be32 host_ctx)
{
struct nfp_flower_priv *priv = app->priv;
struct nfp_fl_payload *flower_entry;
hash_for_each_possible_rcu(priv->flow_table, flower_entry, link,
tc_flower_cookie)
- if (flower_entry->tc_flower_cookie == tc_flower_cookie)
+ if (flower_entry->tc_flower_cookie == tc_flower_cookie &&
+ (!netdev || flower_entry->ingress_dev == netdev) &&
+ (host_ctx == NFP_FL_STATS_CTX_DONT_CARE ||
+ flower_entry->meta.host_ctx_id == host_ctx))
return flower_entry;
return NULL;
@@ -121,13 +125,11 @@
flower_cookie = be64_to_cpu(stats->stats_cookie);
rcu_read_lock();
- nfp_flow = nfp_flower_search_fl_table(app, flower_cookie);
+ nfp_flow = nfp_flower_search_fl_table(app, flower_cookie, NULL,
+ stats->stats_con_id);
if (!nfp_flow)
goto exit_rcu_unlock;
- if (nfp_flow->meta.host_ctx_id != stats->stats_con_id)
- goto exit_rcu_unlock;
-
spin_lock(&nfp_flow->lock);
nfp_flow->stats.pkts += be32_to_cpu(stats->pkt_count);
nfp_flow->stats.bytes += be64_to_cpu(stats->byte_count);
@@ -317,7 +319,8 @@
int nfp_compile_flow_metadata(struct nfp_app *app,
struct tc_cls_flower_offload *flow,
- struct nfp_fl_payload *nfp_flow)
+ struct nfp_fl_payload *nfp_flow,
+ struct net_device *netdev)
{
struct nfp_flower_priv *priv = app->priv;
struct nfp_fl_payload *check_entry;
@@ -348,7 +351,8 @@
nfp_flow->stats.bytes = 0;
nfp_flow->stats.used = jiffies;
- check_entry = nfp_flower_search_fl_table(app, flow->cookie);
+ check_entry = nfp_flower_search_fl_table(app, flow->cookie, netdev,
+ NFP_FL_STATS_CTX_DONT_CARE);
if (check_entry) {
if (nfp_release_stats_entry(app, stats_cxt))
return -EINVAL;
diff --git a/drivers/net/ethernet/netronome/nfp/flower/offload.c b/drivers/net/ethernet/netronome/nfp/flower/offload.c
index 114d2ab..c42e64f 100644
--- a/drivers/net/ethernet/netronome/nfp/flower/offload.c
+++ b/drivers/net/ethernet/netronome/nfp/flower/offload.c
@@ -345,7 +345,7 @@
}
static struct nfp_fl_payload *
-nfp_flower_allocate_new(struct nfp_fl_key_ls *key_layer)
+nfp_flower_allocate_new(struct nfp_fl_key_ls *key_layer, bool egress)
{
struct nfp_fl_payload *flow_pay;
@@ -371,6 +371,8 @@
flow_pay->meta.flags = 0;
spin_lock_init(&flow_pay->lock);
+ flow_pay->ingress_offload = !egress;
+
return flow_pay;
err_free_mask:
@@ -402,8 +404,20 @@
struct nfp_flower_priv *priv = app->priv;
struct nfp_fl_payload *flow_pay;
struct nfp_fl_key_ls *key_layer;
+ struct net_device *ingr_dev;
int err;
+ ingr_dev = egress ? NULL : netdev;
+ flow_pay = nfp_flower_search_fl_table(app, flow->cookie, ingr_dev,
+ NFP_FL_STATS_CTX_DONT_CARE);
+ if (flow_pay) {
+ /* Ignore as duplicate if it has been added by different cb. */
+ if (flow_pay->ingress_offload && egress)
+ return 0;
+ else
+ return -EOPNOTSUPP;
+ }
+
key_layer = kmalloc(sizeof(*key_layer), GFP_KERNEL);
if (!key_layer)
return -ENOMEM;
@@ -413,22 +427,25 @@
if (err)
goto err_free_key_ls;
- flow_pay = nfp_flower_allocate_new(key_layer);
+ flow_pay = nfp_flower_allocate_new(key_layer, egress);
if (!flow_pay) {
err = -ENOMEM;
goto err_free_key_ls;
}
+ flow_pay->ingress_dev = egress ? NULL : netdev;
+
err = nfp_flower_compile_flow_match(flow, key_layer, netdev, flow_pay,
tun_type);
if (err)
goto err_destroy_flow;
- err = nfp_flower_compile_action(flow, netdev, flow_pay);
+ err = nfp_flower_compile_action(app, flow, netdev, flow_pay);
if (err)
goto err_destroy_flow;
- err = nfp_compile_flow_metadata(app, flow, flow_pay);
+ err = nfp_compile_flow_metadata(app, flow, flow_pay,
+ flow_pay->ingress_dev);
if (err)
goto err_destroy_flow;
@@ -462,6 +479,7 @@
* @app: Pointer to the APP handle
* @netdev: netdev structure.
* @flow: TC flower classifier offload structure
+ * @egress: Netdev is the egress dev.
*
* Removes a flow from the repeated hash structure and clears the
* action payload.
@@ -470,15 +488,18 @@
*/
static int
nfp_flower_del_offload(struct nfp_app *app, struct net_device *netdev,
- struct tc_cls_flower_offload *flow)
+ struct tc_cls_flower_offload *flow, bool egress)
{
struct nfp_port *port = nfp_port_from_netdev(netdev);
struct nfp_fl_payload *nfp_flow;
+ struct net_device *ingr_dev;
int err;
- nfp_flow = nfp_flower_search_fl_table(app, flow->cookie);
+ ingr_dev = egress ? NULL : netdev;
+ nfp_flow = nfp_flower_search_fl_table(app, flow->cookie, ingr_dev,
+ NFP_FL_STATS_CTX_DONT_CARE);
if (!nfp_flow)
- return -ENOENT;
+ return egress ? 0 : -ENOENT;
err = nfp_modify_flow_metadata(app, nfp_flow);
if (err)
@@ -505,7 +526,9 @@
/**
* nfp_flower_get_stats() - Populates flow stats obtained from hardware.
* @app: Pointer to the APP handle
+ * @netdev: Netdev structure.
* @flow: TC flower classifier offload structure
+ * @egress: Netdev is the egress dev.
*
* Populates a flow statistics structure which which corresponds to a
* specific flow.
@@ -513,14 +536,21 @@
* Return: negative value on error, 0 if stats populated successfully.
*/
static int
-nfp_flower_get_stats(struct nfp_app *app, struct tc_cls_flower_offload *flow)
+nfp_flower_get_stats(struct nfp_app *app, struct net_device *netdev,
+ struct tc_cls_flower_offload *flow, bool egress)
{
struct nfp_fl_payload *nfp_flow;
+ struct net_device *ingr_dev;
- nfp_flow = nfp_flower_search_fl_table(app, flow->cookie);
+ ingr_dev = egress ? NULL : netdev;
+ nfp_flow = nfp_flower_search_fl_table(app, flow->cookie, ingr_dev,
+ NFP_FL_STATS_CTX_DONT_CARE);
if (!nfp_flow)
return -EINVAL;
+ if (nfp_flow->ingress_offload && egress)
+ return 0;
+
spin_lock_bh(&nfp_flow->lock);
tcf_exts_stats_update(flow->exts, nfp_flow->stats.bytes,
nfp_flow->stats.pkts, nfp_flow->stats.used);
@@ -543,9 +573,9 @@
case TC_CLSFLOWER_REPLACE:
return nfp_flower_add_offload(app, netdev, flower, egress);
case TC_CLSFLOWER_DESTROY:
- return nfp_flower_del_offload(app, netdev, flower);
+ return nfp_flower_del_offload(app, netdev, flower, egress);
case TC_CLSFLOWER_STATS:
- return nfp_flower_get_stats(app, flower);
+ return nfp_flower_get_stats(app, netdev, flower, egress);
}
return -EOPNOTSUPP;
diff --git a/drivers/net/ethernet/netronome/nfp/nfp_abi.h b/drivers/net/ethernet/netronome/nfp/nfp_abi.h
new file mode 100644
index 0000000..7ffa6e6
--- /dev/null
+++ b/drivers/net/ethernet/netronome/nfp/nfp_abi.h
@@ -0,0 +1,129 @@
+/* SPDX-License-Identifier: (GPL-2.0 OR BSD-2-Clause) */
+/*
+ * Copyright (C) 2018 Netronome Systems, Inc.
+ *
+ * This software is dual licensed under the GNU General License Version 2,
+ * June 1991 as shown in the file COPYING in the top-level directory of this
+ * source tree or the BSD 2-Clause License provided below. You have the
+ * option to license this software under the complete terms of either license.
+ *
+ * The BSD 2-Clause License:
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ * 1. Redistributions of source code must retain the above
+ * copyright notice, this list of conditions and the following
+ * disclaimer.
+ *
+ * 2. Redistributions in binary form must reproduce the above
+ * copyright notice, this list of conditions and the following
+ * disclaimer in the documentation and/or other materials
+ * provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#ifndef __NFP_ABI__
+#define __NFP_ABI__ 1
+
+#include <linux/types.h>
+
+#define NFP_MBOX_SYM_NAME "_abi_nfd_pf%u_mbox"
+#define NFP_MBOX_SYM_MIN_SIZE 16 /* When no data needed */
+
+#define NFP_MBOX_CMD 0x00
+#define NFP_MBOX_RET 0x04
+#define NFP_MBOX_DATA_LEN 0x08
+#define NFP_MBOX_RESERVED 0x0c
+#define NFP_MBOX_DATA 0x10
+
+/**
+ * enum nfp_mbox_cmd - PF mailbox commands
+ *
+ * @NFP_MBOX_NO_CMD: null command
+ * Used to indicate previous command has finished.
+ *
+ * @NFP_MBOX_POOL_GET: get shared buffer pool info/config
+ * Input - struct nfp_shared_buf_pool_id
+ * Output - struct nfp_shared_buf_pool_info_get
+ *
+ * @NFP_MBOX_POOL_SET: set shared buffer pool info/config
+ * Input - struct nfp_shared_buf_pool_info_set
+ * Output - None
+ */
+enum nfp_mbox_cmd {
+ NFP_MBOX_NO_CMD = 0x00,
+
+ NFP_MBOX_POOL_GET = 0x01,
+ NFP_MBOX_POOL_SET = 0x02,
+};
+
+#define NFP_SHARED_BUF_COUNT_SYM_NAME "_abi_nfd_pf%u_sb_cnt"
+#define NFP_SHARED_BUF_TABLE_SYM_NAME "_abi_nfd_pf%u_sb_tbl"
+
+/**
+ * struct nfp_shared_buf - NFP shared buffer description
+ * @id: numerical user-visible id of the shared buffer
+ * @size: size in bytes of the buffer
+ * @ingress_pools_count: number of ingress pools
+ * @egress_pools_count: number of egress pools
+ * @ingress_tc_count: number of ingress trafic classes
+ * @egress_tc_count: number of egress trafic classes
+ * @pool_size_unit: pool size may be in credits, each credit is
+ * @pool_size_unit bytes
+ */
+struct nfp_shared_buf {
+ __le32 id;
+ __le32 size;
+ __le16 ingress_pools_count;
+ __le16 egress_pools_count;
+ __le16 ingress_tc_count;
+ __le16 egress_tc_count;
+
+ __le32 pool_size_unit;
+};
+
+/**
+ * struct nfp_shared_buf_pool_id - shared buffer pool identification
+ * @shared_buf: shared buffer id
+ * @pool: pool index
+ */
+struct nfp_shared_buf_pool_id {
+ __le32 shared_buf;
+ __le32 pool;
+};
+
+/**
+ * struct nfp_shared_buf_pool_info_get - struct devlink_sb_pool_info mirror
+ * @pool_type: one of enum devlink_sb_pool_type
+ * @size: pool size in units of SB's @pool_size_unit
+ * @threshold_type: one of enum devlink_sb_threshold_type
+ */
+struct nfp_shared_buf_pool_info_get {
+ __le32 pool_type;
+ __le32 size;
+ __le32 threshold_type;
+};
+
+/**
+ * struct nfp_shared_buf_pool_info_set - packed args of sb_pool_set
+ * @id: pool identification info
+ * @size: pool size in units of SB's @pool_size_unit
+ * @threshold_type: one of enum devlink_sb_threshold_type
+ */
+struct nfp_shared_buf_pool_info_set {
+ struct nfp_shared_buf_pool_id id;
+ __le32 size;
+ __le32 threshold_type;
+};
+
+#endif
diff --git a/drivers/net/ethernet/netronome/nfp/nfp_app.c b/drivers/net/ethernet/netronome/nfp/nfp_app.c
index 6aedef0..c9d8a7a 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_app.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_app.c
@@ -1,5 +1,5 @@
/*
- * Copyright (C) 2017 Netronome Systems, Inc.
+ * Copyright (C) 2017-2018 Netronome Systems, Inc.
*
* This software is dual licensed under the GNU General License Version 2,
* June 1991 as shown in the file COPYING in the top-level directory of this
@@ -54,6 +54,9 @@
#ifdef CONFIG_NFP_APP_FLOWER
[NFP_APP_FLOWER_NIC] = &app_flower,
#endif
+#ifdef CONFIG_NFP_APP_ABM_NIC
+ [NFP_APP_ACTIVE_BUFFER_MGMT_NIC] = &app_abm,
+#endif
};
struct nfp_app *nfp_app_from_netdev(struct net_device *netdev)
diff --git a/drivers/net/ethernet/netronome/nfp/nfp_app.h b/drivers/net/ethernet/netronome/nfp/nfp_app.h
index 2d9cb25..23b99a4 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_app.h
+++ b/drivers/net/ethernet/netronome/nfp/nfp_app.h
@@ -57,11 +57,13 @@
NFP_APP_CORE_NIC = 0x1,
NFP_APP_BPF_NIC = 0x2,
NFP_APP_FLOWER_NIC = 0x3,
+ NFP_APP_ACTIVE_BUFFER_MGMT_NIC = 0x4,
};
extern const struct nfp_app_type app_nic;
extern const struct nfp_app_type app_bpf;
extern const struct nfp_app_type app_flower;
+extern const struct nfp_app_type app_abm;
/**
* struct nfp_app_type - application definition
@@ -95,6 +97,7 @@
* @bpf: BPF ndo offload-related calls
* @xdp_offload: offload an XDP program
* @eswitch_mode_get: get SR-IOV eswitch mode
+ * @eswitch_mode_set: set SR-IOV eswitch mode (under pf->lock)
* @sriov_enable: app-specific sriov initialisation
* @sriov_disable: app-specific sriov clean-up
* @repr_get: get representor netdev
@@ -146,6 +149,7 @@
void (*sriov_disable)(struct nfp_app *app);
enum devlink_eswitch_mode (*eswitch_mode_get)(struct nfp_app *app);
+ int (*eswitch_mode_set)(struct nfp_app *app, u16 mode);
struct net_device *(*repr_get)(struct nfp_app *app, u32 id);
};
@@ -370,6 +374,13 @@
return 0;
}
+static inline int nfp_app_eswitch_mode_set(struct nfp_app *app, u16 mode)
+{
+ if (!app->type->eswitch_mode_set)
+ return -EOPNOTSUPP;
+ return app->type->eswitch_mode_set(app, mode);
+}
+
static inline int nfp_app_sriov_enable(struct nfp_app *app, int num_vfs)
{
if (!app || !app->type->sriov_enable)
@@ -410,5 +421,7 @@
int nfp_app_nic_vnic_alloc(struct nfp_app *app, struct nfp_net *nn,
unsigned int id);
+int nfp_app_nic_vnic_init_phy_port(struct nfp_pf *pf, struct nfp_app *app,
+ struct nfp_net *nn, unsigned int id);
#endif
diff --git a/drivers/net/ethernet/netronome/nfp/nfp_app_nic.c b/drivers/net/ethernet/netronome/nfp/nfp_app_nic.c
index b9618c3..e2dfe4f 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_app_nic.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_app_nic.c
@@ -38,9 +38,8 @@
#include "nfp_net.h"
#include "nfp_port.h"
-static int
-nfp_app_nic_vnic_init_phy_port(struct nfp_pf *pf, struct nfp_app *app,
- struct nfp_net *nn, unsigned int id)
+int nfp_app_nic_vnic_init_phy_port(struct nfp_pf *pf, struct nfp_app *app,
+ struct nfp_net *nn, unsigned int id)
{
int err;
diff --git a/drivers/net/ethernet/netronome/nfp/nfp_asm.h b/drivers/net/ethernet/netronome/nfp/nfp_asm.h
index 5f2b2f2..f6677bc 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_asm.h
+++ b/drivers/net/ethernet/netronome/nfp/nfp_asm.h
@@ -72,8 +72,21 @@
#define OP_BR_ADDR_LO 0x007ffc00000ULL
#define OP_BR_ADDR_HI 0x10000000000ULL
-#define nfp_is_br(_insn) \
- (((_insn) & OP_BR_BASE_MASK) == OP_BR_BASE)
+#define OP_BR_BIT_BASE 0x0d000000000ULL
+#define OP_BR_BIT_BASE_MASK 0x0f800080300ULL
+#define OP_BR_BIT_A_SRC 0x000000000ffULL
+#define OP_BR_BIT_B_SRC 0x0000003fc00ULL
+#define OP_BR_BIT_BV 0x00000040000ULL
+#define OP_BR_BIT_SRC_LMEXTN 0x40000000000ULL
+#define OP_BR_BIT_DEFBR OP_BR_DEFBR
+#define OP_BR_BIT_ADDR_LO OP_BR_ADDR_LO
+#define OP_BR_BIT_ADDR_HI OP_BR_ADDR_HI
+
+static inline bool nfp_is_br(u64 insn)
+{
+ return (insn & OP_BR_BASE_MASK) == OP_BR_BASE ||
+ (insn & OP_BR_BIT_BASE_MASK) == OP_BR_BIT_BASE;
+}
enum br_mask {
BR_BEQ = 0x00,
@@ -161,6 +174,7 @@
SHF_OP_NONE = 0,
SHF_OP_AND = 2,
SHF_OP_OR = 5,
+ SHF_OP_ASHR = 6,
};
enum shf_sc {
@@ -183,16 +197,18 @@
#define OP_ALU_DST_LMEXTN 0x80000000000ULL
enum alu_op {
- ALU_OP_NONE = 0x00,
- ALU_OP_ADD = 0x01,
- ALU_OP_NOT = 0x04,
- ALU_OP_ADD_2B = 0x05,
- ALU_OP_AND = 0x08,
- ALU_OP_SUB_C = 0x0d,
- ALU_OP_ADD_C = 0x11,
- ALU_OP_OR = 0x14,
- ALU_OP_SUB = 0x15,
- ALU_OP_XOR = 0x18,
+ ALU_OP_NONE = 0x00,
+ ALU_OP_ADD = 0x01,
+ ALU_OP_NOT = 0x04,
+ ALU_OP_ADD_2B = 0x05,
+ ALU_OP_AND = 0x08,
+ ALU_OP_AND_NOT_A = 0x0c,
+ ALU_OP_SUB_C = 0x0d,
+ ALU_OP_AND_NOT_B = 0x10,
+ ALU_OP_ADD_C = 0x11,
+ ALU_OP_OR = 0x14,
+ ALU_OP_SUB = 0x15,
+ ALU_OP_XOR = 0x18,
};
enum alu_dst_ab {
diff --git a/drivers/net/ethernet/netronome/nfp/nfp_devlink.c b/drivers/net/ethernet/netronome/nfp/nfp_devlink.c
index eb0fc61..71c2edd 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_devlink.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_devlink.c
@@ -149,6 +149,26 @@
return ret;
}
+static int
+nfp_devlink_sb_pool_get(struct devlink *devlink, unsigned int sb_index,
+ u16 pool_index, struct devlink_sb_pool_info *pool_info)
+{
+ struct nfp_pf *pf = devlink_priv(devlink);
+
+ return nfp_shared_buf_pool_get(pf, sb_index, pool_index, pool_info);
+}
+
+static int
+nfp_devlink_sb_pool_set(struct devlink *devlink, unsigned int sb_index,
+ u16 pool_index,
+ u32 size, enum devlink_sb_threshold_type threshold_type)
+{
+ struct nfp_pf *pf = devlink_priv(devlink);
+
+ return nfp_shared_buf_pool_set(pf, sb_index, pool_index,
+ size, threshold_type);
+}
+
static int nfp_devlink_eswitch_mode_get(struct devlink *devlink, u16 *mode)
{
struct nfp_pf *pf = devlink_priv(devlink);
@@ -156,10 +176,25 @@
return nfp_app_eswitch_mode_get(pf->app, mode);
}
+static int nfp_devlink_eswitch_mode_set(struct devlink *devlink, u16 mode)
+{
+ struct nfp_pf *pf = devlink_priv(devlink);
+ int ret;
+
+ mutex_lock(&pf->lock);
+ ret = nfp_app_eswitch_mode_set(pf->app, mode);
+ mutex_unlock(&pf->lock);
+
+ return ret;
+}
+
const struct devlink_ops nfp_devlink_ops = {
.port_split = nfp_devlink_port_split,
.port_unsplit = nfp_devlink_port_unsplit,
+ .sb_pool_get = nfp_devlink_sb_pool_get,
+ .sb_pool_set = nfp_devlink_sb_pool_set,
.eswitch_mode_get = nfp_devlink_eswitch_mode_get,
+ .eswitch_mode_set = nfp_devlink_eswitch_mode_set,
};
int nfp_devlink_port_register(struct nfp_app *app, struct nfp_port *port)
@@ -175,8 +210,9 @@
return ret;
devlink_port_type_eth_set(&port->dl_port, port->netdev);
- if (eth_port.is_split)
- devlink_port_split_set(&port->dl_port, eth_port.label_port);
+ devlink_port_attrs_set(&port->dl_port, DEVLINK_PORT_FLAVOUR_PHYSICAL,
+ eth_port.label_port, eth_port.is_split,
+ eth_port.label_subport);
devlink = priv_to_devlink(app->pf);
diff --git a/drivers/net/ethernet/netronome/nfp/nfp_main.c b/drivers/net/ethernet/netronome/nfp/nfp_main.c
index c4b1f34..46b76d5 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_main.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_main.c
@@ -55,6 +55,7 @@
#include "nfpcore/nfp6000_pcie.h"
+#include "nfp_abi.h"
#include "nfp_app.h"
#include "nfp_main.h"
#include "nfp_net.h"
@@ -75,6 +76,122 @@
};
MODULE_DEVICE_TABLE(pci, nfp_pci_device_ids);
+int nfp_pf_rtsym_read_optional(struct nfp_pf *pf, const char *format,
+ unsigned int default_val)
+{
+ char name[256];
+ int err = 0;
+ u64 val;
+
+ snprintf(name, sizeof(name), format, nfp_cppcore_pcie_unit(pf->cpp));
+
+ val = nfp_rtsym_read_le(pf->rtbl, name, &err);
+ if (err) {
+ if (err == -ENOENT)
+ return default_val;
+ nfp_err(pf->cpp, "Unable to read symbol %s\n", name);
+ return err;
+ }
+
+ return val;
+}
+
+u8 __iomem *
+nfp_pf_map_rtsym(struct nfp_pf *pf, const char *name, const char *sym_fmt,
+ unsigned int min_size, struct nfp_cpp_area **area)
+{
+ char pf_symbol[256];
+
+ snprintf(pf_symbol, sizeof(pf_symbol), sym_fmt,
+ nfp_cppcore_pcie_unit(pf->cpp));
+
+ return nfp_rtsym_map(pf->rtbl, pf_symbol, name, min_size, area);
+}
+
+/* Callers should hold the devlink instance lock */
+int nfp_mbox_cmd(struct nfp_pf *pf, u32 cmd, void *in_data, u64 in_length,
+ void *out_data, u64 out_length)
+{
+ unsigned long long addr;
+ unsigned long err_at;
+ u64 max_data_sz;
+ u32 val = 0;
+ u32 cpp_id;
+ int n, err;
+
+ if (!pf->mbox)
+ return -EOPNOTSUPP;
+
+ cpp_id = NFP_CPP_ISLAND_ID(pf->mbox->target, NFP_CPP_ACTION_RW, 0,
+ pf->mbox->domain);
+ addr = pf->mbox->addr;
+ max_data_sz = pf->mbox->size - NFP_MBOX_SYM_MIN_SIZE;
+
+ /* Check if cmd field is clear */
+ err = nfp_cpp_readl(pf->cpp, cpp_id, addr + NFP_MBOX_CMD, &val);
+ if (err || val) {
+ nfp_warn(pf->cpp, "failed to issue command (%u): %u, err: %d\n",
+ cmd, val, err);
+ return err ?: -EBUSY;
+ }
+
+ in_length = min(in_length, max_data_sz);
+ n = nfp_cpp_write(pf->cpp, cpp_id, addr + NFP_MBOX_DATA,
+ in_data, in_length);
+ if (n != in_length)
+ return -EIO;
+ /* Write data_len and wipe reserved */
+ err = nfp_cpp_writeq(pf->cpp, cpp_id, addr + NFP_MBOX_DATA_LEN,
+ in_length);
+ if (err)
+ return err;
+
+ /* Read back for ordering */
+ err = nfp_cpp_readl(pf->cpp, cpp_id, addr + NFP_MBOX_DATA_LEN, &val);
+ if (err)
+ return err;
+
+ /* Write cmd and wipe return value */
+ err = nfp_cpp_writeq(pf->cpp, cpp_id, addr + NFP_MBOX_CMD, cmd);
+ if (err)
+ return err;
+
+ err_at = jiffies + 5 * HZ;
+ while (true) {
+ /* Wait for command to go to 0 (NFP_MBOX_NO_CMD) */
+ err = nfp_cpp_readl(pf->cpp, cpp_id, addr + NFP_MBOX_CMD, &val);
+ if (err)
+ return err;
+ if (!val)
+ break;
+
+ if (time_is_before_eq_jiffies(err_at))
+ return -ETIMEDOUT;
+
+ msleep(5);
+ }
+
+ /* Copy output if any (could be error info, do it before reading ret) */
+ err = nfp_cpp_readl(pf->cpp, cpp_id, addr + NFP_MBOX_DATA_LEN, &val);
+ if (err)
+ return err;
+
+ out_length = min_t(u32, val, min(out_length, max_data_sz));
+ n = nfp_cpp_read(pf->cpp, cpp_id, addr + NFP_MBOX_DATA,
+ out_data, out_length);
+ if (n != out_length)
+ return -EIO;
+
+ /* Check if there is an error */
+ err = nfp_cpp_readl(pf->cpp, cpp_id, addr + NFP_MBOX_RET, &val);
+ if (err)
+ return err;
+ if (val)
+ return -val;
+
+ return out_length;
+}
+
static bool nfp_board_ready(struct nfp_pf *pf)
{
const char *cp;
@@ -436,6 +553,25 @@
nfp_nsp_close(nsp);
}
+static int nfp_pf_find_rtsyms(struct nfp_pf *pf)
+{
+ char pf_symbol[256];
+ unsigned int pf_id;
+
+ pf_id = nfp_cppcore_pcie_unit(pf->cpp);
+
+ /* Optional per-PCI PF mailbox */
+ snprintf(pf_symbol, sizeof(pf_symbol), NFP_MBOX_SYM_NAME, pf_id);
+ pf->mbox = nfp_rtsym_lookup(pf->rtbl, pf_symbol);
+ if (pf->mbox && pf->mbox->size < NFP_MBOX_SYM_MIN_SIZE) {
+ nfp_err(pf->cpp, "PF mailbox symbol too small: %llu < %d\n",
+ pf->mbox->size, NFP_MBOX_SYM_MIN_SIZE);
+ return -EINVAL;
+ }
+
+ return 0;
+}
+
static int nfp_pci_probe(struct pci_dev *pdev,
const struct pci_device_id *pci_id)
{
@@ -486,6 +622,10 @@
goto err_disable_msix;
}
+ err = nfp_resource_table_init(pf->cpp);
+ if (err)
+ goto err_cpp_free;
+
pf->hwinfo = nfp_hwinfo_read(pf->cpp);
dev_info(&pdev->dev, "Assembly: %s%s%s-%s CPLD: %s\n",
@@ -506,6 +646,10 @@
pf->mip = nfp_mip_open(pf->cpp);
pf->rtbl = __nfp_rtsym_table_read(pf->cpp, pf->mip);
+ err = nfp_pf_find_rtsyms(pf);
+ if (err)
+ goto err_fw_unload;
+
pf->dump_flag = NFP_DUMP_NSP_DIAG;
pf->dumpspec = nfp_net_dump_load_dumpspec(pf->cpp, pf->rtbl);
@@ -548,6 +692,7 @@
vfree(pf->dumpspec);
err_hwinfo_free:
kfree(pf->hwinfo);
+err_cpp_free:
nfp_cpp_free(pf->cpp);
err_disable_msix:
destroy_workqueue(pf->wq);
diff --git a/drivers/net/ethernet/netronome/nfp/nfp_main.h b/drivers/net/ethernet/netronome/nfp/nfp_main.h
index 4221108..595b3dc 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_main.h
+++ b/drivers/net/ethernet/netronome/nfp/nfp_main.h
@@ -46,10 +46,10 @@
#include <linux/mutex.h>
#include <linux/pci.h>
#include <linux/workqueue.h>
+#include <net/devlink.h>
struct dentry;
struct device;
-struct devlink_ops;
struct pci_dev;
struct nfp_cpp;
@@ -60,7 +60,9 @@
struct nfp_net;
struct nfp_nsp_identify;
struct nfp_port;
+struct nfp_rtsym;
struct nfp_rtsym_table;
+struct nfp_shared_buf;
/**
* struct nfp_dumpspec - NFP FW dump specification structure
@@ -87,6 +89,7 @@
* @vf_cfg_mem: Pointer to mapped VF configuration area
* @vfcfg_tbl2_area: Pointer to the CPP area for the VF config table
* @vfcfg_tbl2: Pointer to mapped VF config table
+ * @mbox: RTSym of per-PCI PF mailbox (under devlink lock)
* @irq_entries: Array of MSI-X entries for all vNICs
* @limit_vfs: Number of VFs supported by firmware (~0 for PCI limit)
* @num_vfs: Number of SR-IOV VFs enabled
@@ -108,6 +111,8 @@
* @ports: Linked list of port structures (struct nfp_port)
* @wq: Workqueue for running works which need to grab @lock
* @port_refresh_work: Work entry for taking netdevs out
+ * @shared_bufs: Array of shared buffer structures if FW has any SBs
+ * @num_shared_bufs: Number of elements in @shared_bufs
* @lock: Protects all fields which may change after probe
*/
struct nfp_pf {
@@ -127,6 +132,8 @@
struct nfp_cpp_area *vfcfg_tbl2_area;
u8 __iomem *vfcfg_tbl2;
+ const struct nfp_rtsym *mbox;
+
struct msix_entry *irq_entries;
unsigned int limit_vfs;
@@ -158,6 +165,9 @@
struct workqueue_struct *wq;
struct work_struct port_refresh_work;
+ struct nfp_shared_buf *shared_bufs;
+ unsigned int num_shared_bufs;
+
struct mutex lock;
};
@@ -177,6 +187,14 @@
bool nfp_ctrl_tx(struct nfp_net *nn, struct sk_buff *skb);
+int nfp_pf_rtsym_read_optional(struct nfp_pf *pf, const char *format,
+ unsigned int default_val);
+u8 __iomem *
+nfp_pf_map_rtsym(struct nfp_pf *pf, const char *name, const char *sym_fmt,
+ unsigned int min_size, struct nfp_cpp_area **area);
+int nfp_mbox_cmd(struct nfp_pf *pf, u32 cmd, void *in_data, u64 in_length,
+ void *out_data, u64 out_length);
+
enum nfp_dump_diag {
NFP_DUMP_NSP_DIAG = 0,
};
@@ -188,4 +206,11 @@
int nfp_net_dump_populate_buffer(struct nfp_pf *pf, struct nfp_dumpspec *spec,
struct ethtool_dump *dump_param, void *dest);
+int nfp_shared_buf_register(struct nfp_pf *pf);
+void nfp_shared_buf_unregister(struct nfp_pf *pf);
+int nfp_shared_buf_pool_get(struct nfp_pf *pf, unsigned int sb, u16 pool_index,
+ struct devlink_sb_pool_info *pool_info);
+int nfp_shared_buf_pool_set(struct nfp_pf *pf, unsigned int sb,
+ u16 pool_index, u32 size,
+ enum devlink_sb_threshold_type threshold_type);
#endif /* NFP_MAIN_H */
diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net.h b/drivers/net/ethernet/netronome/nfp/nfp_net.h
index bd7d8ae..57cb035d 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_net.h
+++ b/drivers/net/ethernet/netronome/nfp/nfp_net.h
@@ -545,6 +545,7 @@
/**
* struct nfp_net - NFP network device structure
* @dp: Datapath structure
+ * @id: vNIC id within the PF (0 for VFs)
* @fw_ver: Firmware version
* @cap: Capabilities advertised by the Firmware
* @max_mtu: Maximum support MTU advertised by the Firmware
@@ -597,6 +598,8 @@
struct nfp_net_fw_version fw_ver;
+ u32 id;
+
u32 cap;
u32 max_mtu;
@@ -909,7 +912,7 @@
void nfp_net_debugfs_create(void);
void nfp_net_debugfs_destroy(void);
struct dentry *nfp_net_debugfs_device_add(struct pci_dev *pdev);
-void nfp_net_debugfs_vnic_add(struct nfp_net *nn, struct dentry *ddir, int id);
+void nfp_net_debugfs_vnic_add(struct nfp_net *nn, struct dentry *ddir);
void nfp_net_debugfs_dir_clean(struct dentry **dir);
#else
static inline void nfp_net_debugfs_create(void)
@@ -926,7 +929,7 @@
}
static inline void
-nfp_net_debugfs_vnic_add(struct nfp_net *nn, struct dentry *ddir, int id)
+nfp_net_debugfs_vnic_add(struct nfp_net *nn, struct dentry *ddir)
{
}
diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
index 1eb6549..eea11e8 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
@@ -1722,7 +1722,7 @@
act = bpf_prog_run_xdp(xdp_prog, &xdp);
- pkt_len -= xdp.data - orig_data;
+ pkt_len = xdp.data_end - xdp.data;
pkt_off += xdp.data - orig_data;
switch (act) {
@@ -3277,6 +3277,24 @@
return features;
}
+static int
+nfp_net_get_phys_port_name(struct net_device *netdev, char *name, size_t len)
+{
+ struct nfp_net *nn = netdev_priv(netdev);
+ int n;
+
+ if (nn->port)
+ return nfp_port_get_phys_port_name(netdev, name, len);
+
+ if (!nn->dp.is_vf) {
+ n = snprintf(name, len, "%d", nn->id);
+ if (n >= len)
+ return -EINVAL;
+ }
+
+ return 0;
+}
+
/**
* nfp_net_set_vxlan_port() - set vxlan port in SW and reconfigure HW
* @nn: NFP Net device to reconfigure
@@ -3475,7 +3493,7 @@
.ndo_set_mac_address = nfp_net_set_mac_address,
.ndo_set_features = nfp_net_set_features,
.ndo_features_check = nfp_net_features_check,
- .ndo_get_phys_port_name = nfp_port_get_phys_port_name,
+ .ndo_get_phys_port_name = nfp_net_get_phys_port_name,
.ndo_udp_tunnel_add = nfp_net_add_vxlan_port,
.ndo_udp_tunnel_del = nfp_net_del_vxlan_port,
.ndo_bpf = nfp_net_xdp,
diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_debugfs.c b/drivers/net/ethernet/netronome/nfp/nfp_net_debugfs.c
index 67cdd833..099b63d 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_net_debugfs.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_net_debugfs.c
@@ -201,7 +201,7 @@
.llseek = seq_lseek
};
-void nfp_net_debugfs_vnic_add(struct nfp_net *nn, struct dentry *ddir, int id)
+void nfp_net_debugfs_vnic_add(struct nfp_net *nn, struct dentry *ddir)
{
struct dentry *queues, *tx, *rx, *xdp;
char name[20];
@@ -211,7 +211,7 @@
return;
if (nfp_net_is_data_vnic(nn))
- sprintf(name, "vnic%d", id);
+ sprintf(name, "vnic%d", nn->id);
else
strcpy(name, "ctrl-vnic");
nn->debugfs_dir = debugfs_create_dir(name, ddir);
diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_main.c b/drivers/net/ethernet/netronome/nfp/nfp_net_main.c
index 45cd209..28516ee 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_net_main.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_net_main.c
@@ -101,48 +101,15 @@
return NULL;
}
-static int
-nfp_net_pf_rtsym_read_optional(struct nfp_pf *pf, const char *format,
- unsigned int default_val)
-{
- char name[256];
- int err = 0;
- u64 val;
-
- snprintf(name, sizeof(name), format, nfp_cppcore_pcie_unit(pf->cpp));
-
- val = nfp_rtsym_read_le(pf->rtbl, name, &err);
- if (err) {
- if (err == -ENOENT)
- return default_val;
- nfp_err(pf->cpp, "Unable to read symbol %s\n", name);
- return err;
- }
-
- return val;
-}
-
static int nfp_net_pf_get_num_ports(struct nfp_pf *pf)
{
- return nfp_net_pf_rtsym_read_optional(pf, "nfd_cfg_pf%u_num_ports", 1);
+ return nfp_pf_rtsym_read_optional(pf, "nfd_cfg_pf%u_num_ports", 1);
}
static int nfp_net_pf_get_app_id(struct nfp_pf *pf)
{
- return nfp_net_pf_rtsym_read_optional(pf, "_pf%u_net_app_id",
- NFP_APP_CORE_NIC);
-}
-
-static u8 __iomem *
-nfp_net_pf_map_rtsym(struct nfp_pf *pf, const char *name, const char *sym_fmt,
- unsigned int min_size, struct nfp_cpp_area **area)
-{
- char pf_symbol[256];
-
- snprintf(pf_symbol, sizeof(pf_symbol), sym_fmt,
- nfp_cppcore_pcie_unit(pf->cpp));
-
- return nfp_rtsym_map(pf->rtbl, pf_symbol, name, min_size, area);
+ return nfp_pf_rtsym_read_optional(pf, "_pf%u_net_app_id",
+ NFP_APP_CORE_NIC);
}
static void nfp_net_pf_free_vnic(struct nfp_pf *pf, struct nfp_net *nn)
@@ -211,11 +178,13 @@
{
int err;
+ nn->id = id;
+
err = nfp_net_init(nn);
if (err)
return err;
- nfp_net_debugfs_vnic_add(nn, pf->ddir, id);
+ nfp_net_debugfs_vnic_add(nn, pf->ddir);
if (nn->port) {
err = nfp_devlink_port_register(pf->app, nn->port);
@@ -379,9 +348,8 @@
if (!nfp_app_needs_ctrl_vnic(pf->app))
return 0;
- ctrl_bar = nfp_net_pf_map_rtsym(pf, "net.ctrl", "_pf%u_net_ctrl_bar",
- NFP_PF_CSR_SLICE_SIZE,
- &pf->ctrl_vnic_bar);
+ ctrl_bar = nfp_pf_map_rtsym(pf, "net.ctrl", "_pf%u_net_ctrl_bar",
+ NFP_PF_CSR_SLICE_SIZE, &pf->ctrl_vnic_bar);
if (IS_ERR(ctrl_bar)) {
nfp_err(pf->cpp, "Failed to find ctrl vNIC memory symbol\n");
err = PTR_ERR(ctrl_bar);
@@ -507,8 +475,8 @@
int err;
min_size = pf->max_data_vnics * NFP_PF_CSR_SLICE_SIZE;
- mem = nfp_net_pf_map_rtsym(pf, "net.bar0", "_pf%d_net_bar0",
- min_size, &pf->data_vnic_bar);
+ mem = nfp_pf_map_rtsym(pf, "net.bar0", "_pf%d_net_bar0",
+ min_size, &pf->data_vnic_bar);
if (IS_ERR(mem)) {
nfp_err(pf->cpp, "Failed to find data vNIC memory symbol\n");
return PTR_ERR(mem);
@@ -528,10 +496,9 @@
}
}
- pf->vf_cfg_mem = nfp_net_pf_map_rtsym(pf, "net.vfcfg",
- "_pf%d_net_vf_bar",
- NFP_NET_CFG_BAR_SZ *
- pf->limit_vfs, &pf->vf_cfg_bar);
+ pf->vf_cfg_mem = nfp_pf_map_rtsym(pf, "net.vfcfg", "_pf%d_net_vf_bar",
+ NFP_NET_CFG_BAR_SZ * pf->limit_vfs,
+ &pf->vf_cfg_bar);
if (IS_ERR(pf->vf_cfg_mem)) {
if (PTR_ERR(pf->vf_cfg_mem) != -ENOENT) {
err = PTR_ERR(pf->vf_cfg_mem);
@@ -541,9 +508,9 @@
}
min_size = NFP_NET_VF_CFG_SZ * pf->limit_vfs + NFP_NET_VF_CFG_MB_SZ;
- pf->vfcfg_tbl2 = nfp_net_pf_map_rtsym(pf, "net.vfcfg_tbl2",
- "_pf%d_net_vf_cfg2",
- min_size, &pf->vfcfg_tbl2_area);
+ pf->vfcfg_tbl2 = nfp_pf_map_rtsym(pf, "net.vfcfg_tbl2",
+ "_pf%d_net_vf_cfg2",
+ min_size, &pf->vfcfg_tbl2_area);
if (IS_ERR(pf->vfcfg_tbl2)) {
if (PTR_ERR(pf->vfcfg_tbl2) != -ENOENT) {
err = PTR_ERR(pf->vfcfg_tbl2);
@@ -763,6 +730,10 @@
if (err)
goto err_app_clean;
+ err = nfp_shared_buf_register(pf);
+ if (err)
+ goto err_devlink_unreg;
+
mutex_lock(&pf->lock);
pf->ddir = nfp_net_debugfs_device_add(pf->pdev);
@@ -796,6 +767,8 @@
err_clean_ddir:
nfp_net_debugfs_dir_clean(&pf->ddir);
mutex_unlock(&pf->lock);
+ nfp_shared_buf_unregister(pf);
+err_devlink_unreg:
cancel_work_sync(&pf->port_refresh_work);
devlink_unregister(devlink);
err_app_clean:
@@ -823,6 +796,7 @@
mutex_unlock(&pf->lock);
+ nfp_shared_buf_unregister(pf);
devlink_unregister(priv_to_devlink(pf));
nfp_net_pf_free_irqs(pf);
diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_repr.c b/drivers/net/ethernet/netronome/nfp/nfp_net_repr.c
index 0cd077a..117eca6 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_net_repr.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_net_repr.c
@@ -277,6 +277,7 @@
.ndo_get_vf_config = nfp_app_get_vf_config,
.ndo_set_vf_link_state = nfp_app_set_vf_link_state,
.ndo_set_features = nfp_port_set_features,
+ .ndo_set_mac_address = eth_mac_addr,
};
static void nfp_repr_clean(struct nfp_repr *repr)
@@ -348,12 +349,17 @@
return err;
}
-static void nfp_repr_free(struct nfp_repr *repr)
+static void __nfp_repr_free(struct nfp_repr *repr)
{
free_percpu(repr->stats);
free_netdev(repr->netdev);
}
+void nfp_repr_free(struct net_device *netdev)
+{
+ __nfp_repr_free(netdev_priv(netdev));
+}
+
struct net_device *nfp_repr_alloc(struct nfp_app *app)
{
struct net_device *netdev;
@@ -380,12 +386,12 @@
return NULL;
}
-static void nfp_repr_clean_and_free(struct nfp_repr *repr)
+void nfp_repr_clean_and_free(struct nfp_repr *repr)
{
nfp_info(repr->app->cpp, "Destroying Representor(%s)\n",
repr->netdev->name);
nfp_repr_clean(repr);
- nfp_repr_free(repr);
+ __nfp_repr_free(repr);
}
void nfp_reprs_clean_and_free(struct nfp_app *app, struct nfp_reprs *reprs)
diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_repr.h b/drivers/net/ethernet/netronome/nfp/nfp_net_repr.h
index a621e8f..8366e4f 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_net_repr.h
+++ b/drivers/net/ethernet/netronome/nfp/nfp_net_repr.h
@@ -76,6 +76,7 @@
* @port: Port of representor
* @app: APP handle
* @stats: Statistic of packets hitting CPU
+ * @app_priv: Pointer for APP data
*/
struct nfp_repr {
struct net_device *netdev;
@@ -83,6 +84,7 @@
struct nfp_port *port;
struct nfp_app *app;
struct nfp_repr_pcpu_stats __percpu *stats;
+ void *app_priv;
};
/**
@@ -123,7 +125,9 @@
int nfp_repr_init(struct nfp_app *app, struct net_device *netdev,
u32 cmsg_port_id, struct nfp_port *port,
struct net_device *pf_netdev);
+void nfp_repr_free(struct net_device *netdev);
struct net_device *nfp_repr_alloc(struct nfp_app *app);
+void nfp_repr_clean_and_free(struct nfp_repr *repr);
void nfp_reprs_clean_and_free(struct nfp_app *app, struct nfp_reprs *reprs);
void nfp_reprs_clean_and_free_by_type(struct nfp_app *app,
enum nfp_repr_type type);
diff --git a/drivers/net/ethernet/netronome/nfp/nfp_netvf_main.c b/drivers/net/ethernet/netronome/nfp/nfp_netvf_main.c
index b802a1d..68928c8 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_netvf_main.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_netvf_main.c
@@ -283,7 +283,7 @@
nfp_net_info(nn);
vf->ddir = nfp_net_debugfs_device_add(pdev);
- nfp_net_debugfs_vnic_add(nn, vf->ddir, 0);
+ nfp_net_debugfs_vnic_add(nn, vf->ddir);
return 0;
diff --git a/drivers/net/ethernet/netronome/nfp/nfp_port.c b/drivers/net/ethernet/netronome/nfp/nfp_port.c
index 7bd8be5..9c12981 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_port.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_port.c
@@ -181,7 +181,11 @@
eth_port->label_subport);
break;
case NFP_PORT_PF_PORT:
- n = snprintf(name, len, "pf%d", port->pf_id);
+ if (!port->pf_split)
+ n = snprintf(name, len, "pf%d", port->pf_id);
+ else
+ n = snprintf(name, len, "pf%ds%d", port->pf_id,
+ port->pf_split_id);
break;
case NFP_PORT_VF_PORT:
n = snprintf(name, len, "pf%dvf%d", port->pf_id, port->vf_id);
@@ -218,6 +222,8 @@
eth_port = __nfp_port_get_eth_port(port);
if (!eth_port)
return 0;
+ if (port->eth_forced)
+ return 0;
err = nfp_eth_set_configured(port->app->cpp, eth_port->index, configed);
return err < 0 && err != -EOPNOTSUPP ? err : 0;
diff --git a/drivers/net/ethernet/netronome/nfp/nfp_port.h b/drivers/net/ethernet/netronome/nfp/nfp_port.h
index fa7e669..1866675 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_port.h
+++ b/drivers/net/ethernet/netronome/nfp/nfp_port.h
@@ -77,10 +77,13 @@
* @app: backpointer to the app structure
* @dl_port: devlink port structure
* @eth_id: for %NFP_PORT_PHYS_PORT port ID in NFP enumeration scheme
+ * @eth_forced: for %NFP_PORT_PHYS_PORT port is forced UP or DOWN, don't change
* @eth_port: for %NFP_PORT_PHYS_PORT translated ETH Table port entry
* @eth_stats: for %NFP_PORT_PHYS_PORT MAC stats if available
* @pf_id: for %NFP_PORT_PF_PORT, %NFP_PORT_VF_PORT ID of the PCI PF (0-3)
* @vf_id: for %NFP_PORT_VF_PORT ID of the PCI VF within @pf_id
+ * @pf_split: for %NFP_PORT_PF_PORT %true if PCI PF has more than one vNIC
+ * @pf_split_id:for %NFP_PORT_PF_PORT ID of PCI PF vNIC (valid if @pf_split)
* @vnic: for %NFP_PORT_PF_PORT, %NFP_PORT_VF_PORT vNIC ctrl memory
* @port_list: entry on pf's list of ports
*/
@@ -99,6 +102,7 @@
/* NFP_PORT_PHYS_PORT */
struct {
unsigned int eth_id;
+ bool eth_forced;
struct nfp_eth_table_port *eth_port;
u8 __iomem *eth_stats;
};
@@ -106,6 +110,8 @@
struct {
unsigned int pf_id;
unsigned int vf_id;
+ bool pf_split;
+ unsigned int pf_split_id;
u8 __iomem *vnic;
};
};
diff --git a/drivers/net/ethernet/netronome/nfp/nfp_shared_buf.c b/drivers/net/ethernet/netronome/nfp/nfp_shared_buf.c
new file mode 100644
index 0000000..0ecd837
--- /dev/null
+++ b/drivers/net/ethernet/netronome/nfp/nfp_shared_buf.c
@@ -0,0 +1,180 @@
+// SPDX-License-Identifier: (GPL-2.0 OR BSD-2-Clause)
+/*
+ * Copyright (C) 2018 Netronome Systems, Inc.
+ *
+ * This software is dual licensed under the GNU General License Version 2,
+ * June 1991 as shown in the file COPYING in the top-level directory of this
+ * source tree or the BSD 2-Clause License provided below. You have the
+ * option to license this software under the complete terms of either license.
+ *
+ * The BSD 2-Clause License:
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ * 1. Redistributions of source code must retain the above
+ * copyright notice, this list of conditions and the following
+ * disclaimer.
+ *
+ * 2. Redistributions in binary form must reproduce the above
+ * copyright notice, this list of conditions and the following
+ * disclaimer in the documentation and/or other materials
+ * provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#include <linux/kernel.h>
+#include <net/devlink.h>
+
+#include "nfpcore/nfp_cpp.h"
+#include "nfpcore/nfp_nffw.h"
+#include "nfp_abi.h"
+#include "nfp_app.h"
+#include "nfp_main.h"
+
+static u32 nfp_shared_buf_pool_unit(struct nfp_pf *pf, unsigned int sb)
+{
+ __le32 sb_id = cpu_to_le32(sb);
+ unsigned int i;
+
+ for (i = 0; i < pf->num_shared_bufs; i++)
+ if (pf->shared_bufs[i].id == sb_id)
+ return le32_to_cpu(pf->shared_bufs[i].pool_size_unit);
+
+ WARN_ON_ONCE(1);
+ return 0;
+}
+
+int nfp_shared_buf_pool_get(struct nfp_pf *pf, unsigned int sb, u16 pool_index,
+ struct devlink_sb_pool_info *pool_info)
+{
+ struct nfp_shared_buf_pool_info_get get_data;
+ struct nfp_shared_buf_pool_id id = {
+ .shared_buf = cpu_to_le32(sb),
+ .pool = cpu_to_le32(pool_index),
+ };
+ unsigned int unit_size;
+ int n;
+
+ unit_size = nfp_shared_buf_pool_unit(pf, sb);
+ if (!unit_size)
+ return -EINVAL;
+
+ n = nfp_mbox_cmd(pf, NFP_MBOX_POOL_GET, &id, sizeof(id),
+ &get_data, sizeof(get_data));
+ if (n < 0)
+ return n;
+ if (n < sizeof(get_data))
+ return -EIO;
+
+ pool_info->pool_type = le32_to_cpu(get_data.pool_type);
+ pool_info->threshold_type = le32_to_cpu(get_data.threshold_type);
+ pool_info->size = le32_to_cpu(get_data.size) * unit_size;
+
+ return 0;
+}
+
+int nfp_shared_buf_pool_set(struct nfp_pf *pf, unsigned int sb,
+ u16 pool_index, u32 size,
+ enum devlink_sb_threshold_type threshold_type)
+{
+ struct nfp_shared_buf_pool_info_set set_data = {
+ .id = {
+ .shared_buf = cpu_to_le32(sb),
+ .pool = cpu_to_le32(pool_index),
+ },
+ .threshold_type = cpu_to_le32(threshold_type),
+ };
+ unsigned int unit_size;
+
+ unit_size = nfp_shared_buf_pool_unit(pf, sb);
+ if (!unit_size || size % unit_size)
+ return -EINVAL;
+ set_data.size = cpu_to_le32(size / unit_size);
+
+ return nfp_mbox_cmd(pf, NFP_MBOX_POOL_SET, &set_data, sizeof(set_data),
+ NULL, 0);
+}
+
+int nfp_shared_buf_register(struct nfp_pf *pf)
+{
+ struct devlink *devlink = priv_to_devlink(pf);
+ unsigned int i, num_entries, entry_sz;
+ struct nfp_cpp_area *sb_desc_area;
+ u8 __iomem *sb_desc;
+ int n, err;
+
+ if (!pf->mbox)
+ return 0;
+
+ n = nfp_pf_rtsym_read_optional(pf, NFP_SHARED_BUF_COUNT_SYM_NAME, 0);
+ if (n <= 0)
+ return n;
+ num_entries = n;
+
+ sb_desc = nfp_pf_map_rtsym(pf, "sb_tbl", NFP_SHARED_BUF_TABLE_SYM_NAME,
+ num_entries * sizeof(pf->shared_bufs[0]),
+ &sb_desc_area);
+ if (IS_ERR(sb_desc))
+ return PTR_ERR(sb_desc);
+
+ entry_sz = nfp_cpp_area_size(sb_desc_area) / num_entries;
+
+ pf->shared_bufs = kmalloc_array(num_entries, sizeof(pf->shared_bufs[0]),
+ GFP_KERNEL);
+ if (!pf->shared_bufs) {
+ err = -ENOMEM;
+ goto err_release_area;
+ }
+
+ for (i = 0; i < num_entries; i++) {
+ struct nfp_shared_buf *sb = &pf->shared_bufs[i];
+
+ /* Entries may be larger in future FW */
+ memcpy_fromio(sb, sb_desc + i * entry_sz, sizeof(*sb));
+
+ err = devlink_sb_register(devlink,
+ le32_to_cpu(sb->id),
+ le32_to_cpu(sb->size),
+ le16_to_cpu(sb->ingress_pools_count),
+ le16_to_cpu(sb->egress_pools_count),
+ le16_to_cpu(sb->ingress_tc_count),
+ le16_to_cpu(sb->egress_tc_count));
+ if (err)
+ goto err_unreg_prev;
+ }
+ pf->num_shared_bufs = num_entries;
+
+ nfp_cpp_area_release_free(sb_desc_area);
+
+ return 0;
+
+err_unreg_prev:
+ while (i--)
+ devlink_sb_unregister(devlink,
+ le32_to_cpu(pf->shared_bufs[i].id));
+ kfree(pf->shared_bufs);
+err_release_area:
+ nfp_cpp_area_release_free(sb_desc_area);
+ return err;
+}
+
+void nfp_shared_buf_unregister(struct nfp_pf *pf)
+{
+ struct devlink *devlink = priv_to_devlink(pf);
+ unsigned int i;
+
+ for (i = 0; i < pf->num_shared_bufs; i++)
+ devlink_sb_unregister(devlink,
+ le32_to_cpu(pf->shared_bufs[i].id));
+ kfree(pf->shared_bufs);
+}
diff --git a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp.h b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp.h
index ced62d1..f44d0a8 100644
--- a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp.h
+++ b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp.h
@@ -94,6 +94,8 @@
/* MAC Statistics Accumulator */
#define NFP_RESOURCE_MAC_STATISTICS "mac.stat"
+int nfp_resource_table_init(struct nfp_cpp *cpp);
+
struct nfp_resource *
nfp_resource_acquire(struct nfp_cpp *cpp, const char *name);
diff --git a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp6000_pcie.c b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp6000_pcie.c
index cd67832..749655c 100644
--- a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp6000_pcie.c
+++ b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp6000_pcie.c
@@ -933,7 +933,6 @@
u32 *wrptr32 = kernel_vaddr;
const u32 __iomem *rdptr32;
int n, width;
- bool is_64;
priv = nfp_cpp_area_priv(area);
rdptr64 = priv->iomem + offset;
@@ -943,10 +942,15 @@
return -EFAULT;
width = priv->width.read;
-
if (width <= 0)
return -EINVAL;
+ /* MU reads via a PCIe2CPP BAR support 32bit (and other) lengths */
+ if (priv->target == (NFP_CPP_TARGET_MU & NFP_CPP_TARGET_ID_MASK) &&
+ priv->action == NFP_CPP_ACTION_RW &&
+ (offset % sizeof(u64) == 4 || length % sizeof(u64) == 4))
+ width = TARGET_WIDTH_32;
+
/* Unaligned? Translate to an explicit access */
if ((priv->offset + offset) & (width - 1))
return nfp_cpp_explicit_read(nfp_cpp_area_cpp(area),
@@ -956,36 +960,29 @@
priv->offset + offset,
kernel_vaddr, length, width);
- is_64 = width == TARGET_WIDTH_64;
-
- /* MU reads via a PCIe2CPP BAR supports 32bit (and other) lengths */
- if (priv->target == (NFP_CPP_TARGET_ID_MASK & NFP_CPP_TARGET_MU) &&
- priv->action == NFP_CPP_ACTION_RW)
- is_64 = false;
-
- if (is_64) {
- if (offset % sizeof(u64) != 0 || length % sizeof(u64) != 0)
- return -EINVAL;
- } else {
- if (offset % sizeof(u32) != 0 || length % sizeof(u32) != 0)
- return -EINVAL;
- }
-
if (WARN_ON(!priv->bar))
return -EFAULT;
- if (is_64)
-#ifndef __raw_readq
- return -EINVAL;
-#else
- for (n = 0; n < length; n += sizeof(u64))
- *wrptr64++ = __raw_readq(rdptr64++);
-#endif
- else
+ switch (width) {
+ case TARGET_WIDTH_32:
+ if (offset % sizeof(u32) != 0 || length % sizeof(u32) != 0)
+ return -EINVAL;
+
for (n = 0; n < length; n += sizeof(u32))
*wrptr32++ = __raw_readl(rdptr32++);
+ return n;
+#ifdef __raw_readq
+ case TARGET_WIDTH_64:
+ if (offset % sizeof(u64) != 0 || length % sizeof(u64) != 0)
+ return -EINVAL;
- return n;
+ for (n = 0; n < length; n += sizeof(u64))
+ *wrptr64++ = __raw_readq(rdptr64++);
+ return n;
+#endif
+ default:
+ return -EINVAL;
+ }
}
static int
@@ -999,7 +996,6 @@
struct nfp6000_area_priv *priv;
u32 __iomem *wrptr32;
int n, width;
- bool is_64;
priv = nfp_cpp_area_priv(area);
wrptr64 = priv->iomem + offset;
@@ -1009,10 +1005,15 @@
return -EFAULT;
width = priv->width.write;
-
if (width <= 0)
return -EINVAL;
+ /* MU writes via a PCIe2CPP BAR support 32bit (and other) lengths */
+ if (priv->target == (NFP_CPP_TARGET_ID_MASK & NFP_CPP_TARGET_MU) &&
+ priv->action == NFP_CPP_ACTION_RW &&
+ (offset % sizeof(u64) == 4 || length % sizeof(u64) == 4))
+ width = TARGET_WIDTH_32;
+
/* Unaligned? Translate to an explicit access */
if ((priv->offset + offset) & (width - 1))
return nfp_cpp_explicit_write(nfp_cpp_area_cpp(area),
@@ -1022,40 +1023,33 @@
priv->offset + offset,
kernel_vaddr, length, width);
- is_64 = width == TARGET_WIDTH_64;
-
- /* MU writes via a PCIe2CPP BAR supports 32bit (and other) lengths */
- if (priv->target == (NFP_CPP_TARGET_ID_MASK & NFP_CPP_TARGET_MU) &&
- priv->action == NFP_CPP_ACTION_RW)
- is_64 = false;
-
- if (is_64) {
- if (offset % sizeof(u64) != 0 || length % sizeof(u64) != 0)
- return -EINVAL;
- } else {
- if (offset % sizeof(u32) != 0 || length % sizeof(u32) != 0)
- return -EINVAL;
- }
-
if (WARN_ON(!priv->bar))
return -EFAULT;
- if (is_64)
-#ifndef __raw_writeq
- return -EINVAL;
-#else
- for (n = 0; n < length; n += sizeof(u64)) {
- __raw_writeq(*rdptr64++, wrptr64++);
- wmb();
- }
-#endif
- else
+ switch (width) {
+ case TARGET_WIDTH_32:
+ if (offset % sizeof(u32) != 0 || length % sizeof(u32) != 0)
+ return -EINVAL;
+
for (n = 0; n < length; n += sizeof(u32)) {
__raw_writel(*rdptr32++, wrptr32++);
wmb();
}
+ return n;
+#ifdef __raw_writeq
+ case TARGET_WIDTH_64:
+ if (offset % sizeof(u64) != 0 || length % sizeof(u64) != 0)
+ return -EINVAL;
- return n;
+ for (n = 0; n < length; n += sizeof(u64)) {
+ __raw_writeq(*rdptr64++, wrptr64++);
+ wmb();
+ }
+ return n;
+#endif
+ default:
+ return -EINVAL;
+ }
}
struct nfp6000_explicit_priv {
@@ -1330,6 +1324,7 @@
/* Finished with card initialization. */
dev_info(&pdev->dev,
"Netronome Flow Processor NFP4000/NFP6000 PCIe Card Probe\n");
+ pcie_print_link_status(pdev);
nfp = kzalloc(sizeof(*nfp), GFP_KERNEL);
if (!nfp) {
diff --git a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_cpp.h b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_cpp.h
index c8f2c06..4e19add 100644
--- a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_cpp.h
+++ b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_cpp.h
@@ -295,6 +295,8 @@
int nfp_cpp_mutex_lock(struct nfp_cpp_mutex *mutex);
int nfp_cpp_mutex_unlock(struct nfp_cpp_mutex *mutex);
int nfp_cpp_mutex_trylock(struct nfp_cpp_mutex *mutex);
+int nfp_cpp_mutex_reclaim(struct nfp_cpp *cpp, int target,
+ unsigned long long address);
/**
* nfp_cppcore_pcie_unit() - Get PCI Unit of a CPP handle
diff --git a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_mutex.c b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_mutex.c
index cb28ac0..c88bf67 100644
--- a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_mutex.c
+++ b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_mutex.c
@@ -59,6 +59,11 @@
return (u32)interface << 16 | 0x0000;
}
+static u32 nfp_mutex_owner(u32 val)
+{
+ return val >> 16;
+}
+
static bool nfp_mutex_is_locked(u32 val)
{
return (val & 0xffff) == 0x000f;
@@ -351,3 +356,43 @@
return nfp_mutex_is_locked(tmp) ? -EBUSY : -EINVAL;
}
+
+/**
+ * nfp_cpp_mutex_reclaim() - Unlock mutex if held by local endpoint
+ * @cpp: NFP CPP handle
+ * @target: NFP CPP target ID (ie NFP_CPP_TARGET_CLS or NFP_CPP_TARGET_MU)
+ * @address: Offset into the address space of the NFP CPP target ID
+ *
+ * Release lock if held by local system. Extreme care is advised, call only
+ * when no local lock users can exist.
+ *
+ * Return: 0 if the lock was OK, 1 if locked by us, -errno on invalid mutex
+ */
+int nfp_cpp_mutex_reclaim(struct nfp_cpp *cpp, int target,
+ unsigned long long address)
+{
+ const u32 mur = NFP_CPP_ID(target, 3, 0); /* atomic_read */
+ const u32 muw = NFP_CPP_ID(target, 4, 0); /* atomic_write */
+ u16 interface = nfp_cpp_interface(cpp);
+ int err;
+ u32 tmp;
+
+ err = nfp_cpp_mutex_validate(interface, &target, address);
+ if (err)
+ return err;
+
+ /* Check lock */
+ err = nfp_cpp_readl(cpp, mur, address, &tmp);
+ if (err < 0)
+ return err;
+
+ if (nfp_mutex_is_unlocked(tmp) || nfp_mutex_owner(tmp) != interface)
+ return 0;
+
+ /* Bust the lock */
+ err = nfp_cpp_writel(cpp, muw, address, nfp_mutex_unlocked(interface));
+ if (err < 0)
+ return err;
+
+ return 1;
+}
diff --git a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nffw.h b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nffw.h
index c9724fb..df599d5 100644
--- a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nffw.h
+++ b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nffw.h
@@ -100,6 +100,8 @@
u64 nfp_rtsym_read_le(struct nfp_rtsym_table *rtbl, const char *name,
int *error);
+int nfp_rtsym_write_le(struct nfp_rtsym_table *rtbl, const char *name,
+ u64 value);
u8 __iomem *
nfp_rtsym_map(struct nfp_rtsym_table *rtbl, const char *name, const char *id,
unsigned int min_size, struct nfp_cpp_area **area);
diff --git a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_resource.c b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_resource.c
index 7e14725..2dd89db 100644
--- a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_resource.c
+++ b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_resource.c
@@ -338,3 +338,62 @@
{
return res->size;
}
+
+/**
+ * nfp_resource_table_init() - Run initial checks on the resource table
+ * @cpp: NFP CPP handle
+ *
+ * Start-of-day init procedure for resource table. Must be called before
+ * any local resource table users may exist.
+ *
+ * Return: 0 on success, -errno on failure
+ */
+int nfp_resource_table_init(struct nfp_cpp *cpp)
+{
+ struct nfp_cpp_mutex *dev_mutex;
+ int i, err;
+
+ err = nfp_cpp_mutex_reclaim(cpp, NFP_RESOURCE_TBL_TARGET,
+ NFP_RESOURCE_TBL_BASE);
+ if (err < 0) {
+ nfp_err(cpp, "Error: failed to reclaim resource table mutex\n");
+ return err;
+ }
+ if (err)
+ nfp_warn(cpp, "Warning: busted main resource table mutex\n");
+
+ dev_mutex = nfp_cpp_mutex_alloc(cpp, NFP_RESOURCE_TBL_TARGET,
+ NFP_RESOURCE_TBL_BASE,
+ NFP_RESOURCE_TBL_KEY);
+ if (!dev_mutex)
+ return -ENOMEM;
+
+ if (nfp_cpp_mutex_lock(dev_mutex)) {
+ nfp_err(cpp, "Error: failed to claim resource table mutex\n");
+ nfp_cpp_mutex_free(dev_mutex);
+ return -EINVAL;
+ }
+
+ /* Resource 0 is the dev_mutex, start from 1 */
+ for (i = 1; i < NFP_RESOURCE_TBL_ENTRIES; i++) {
+ u64 addr = NFP_RESOURCE_TBL_BASE +
+ sizeof(struct nfp_resource_entry) * i;
+
+ err = nfp_cpp_mutex_reclaim(cpp, NFP_RESOURCE_TBL_TARGET, addr);
+ if (err < 0) {
+ nfp_err(cpp,
+ "Error: failed to reclaim resource %d mutex\n",
+ i);
+ goto err_unlock;
+ }
+ if (err)
+ nfp_warn(cpp, "Warning: busted resource %d mutex\n", i);
+ }
+
+ err = 0;
+err_unlock:
+ nfp_cpp_mutex_unlock(dev_mutex);
+ nfp_cpp_mutex_free(dev_mutex);
+
+ return err;
+}
diff --git a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_rtsym.c b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_rtsym.c
index 46107ae..9e34216 100644
--- a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_rtsym.c
+++ b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_rtsym.c
@@ -286,6 +286,49 @@
return val;
}
+/**
+ * nfp_rtsym_write_le() - Write an unsigned scalar value to a symbol
+ * @rtbl: NFP RTsym table
+ * @name: Symbol name
+ * @value: Value to write
+ *
+ * Lookup a symbol and write a value to it. Symbol can be 4 or 8 bytes in size.
+ * If 4 bytes then the lower 32-bits of 'value' are used. Value will be
+ * written as simple little-endian unsigned value.
+ *
+ * Return: 0 on success or error code.
+ */
+int nfp_rtsym_write_le(struct nfp_rtsym_table *rtbl, const char *name,
+ u64 value)
+{
+ const struct nfp_rtsym *sym;
+ int err;
+ u32 id;
+
+ sym = nfp_rtsym_lookup(rtbl, name);
+ if (!sym)
+ return -ENOENT;
+
+ id = NFP_CPP_ISLAND_ID(sym->target, NFP_CPP_ACTION_RW, 0, sym->domain);
+
+ switch (sym->size) {
+ case 4:
+ err = nfp_cpp_writel(rtbl->cpp, id, sym->addr, value);
+ break;
+ case 8:
+ err = nfp_cpp_writeq(rtbl->cpp, id, sym->addr, value);
+ break;
+ default:
+ nfp_err(rtbl->cpp,
+ "rtsym '%s' unsupported or non-scalar size: %lld\n",
+ name, sym->size);
+ err = -EINVAL;
+ break;
+ }
+
+ return err;
+}
+
u8 __iomem *
nfp_rtsym_map(struct nfp_rtsym_table *rtbl, const char *name, const char *id,
unsigned int min_size, struct nfp_cpp_area **area)
diff --git a/drivers/net/ethernet/qlogic/qed/Makefile b/drivers/net/ethernet/qlogic/qed/Makefile
index c70cf2a..a0acb94 100644
--- a/drivers/net/ethernet/qlogic/qed/Makefile
+++ b/drivers/net/ethernet/qlogic/qed/Makefile
@@ -3,7 +3,7 @@
qed-y := qed_cxt.o qed_dev.o qed_hw.o qed_init_fw_funcs.o qed_init_ops.o \
qed_int.o qed_main.o qed_mcp.o qed_sp_commands.o qed_spq.o qed_l2.o \
- qed_selftest.o qed_dcbx.o qed_debug.o qed_ptp.o
+ qed_selftest.o qed_dcbx.o qed_debug.o qed_ptp.o qed_mng_tlv.o
qed-$(CONFIG_QED_SRIOV) += qed_sriov.o qed_vf.o
qed-$(CONFIG_QED_LL2) += qed_ll2.o
qed-$(CONFIG_QED_RDMA) += qed_roce.o qed_rdma.o qed_iwarp.o
diff --git a/drivers/net/ethernet/qlogic/qed/qed.h b/drivers/net/ethernet/qlogic/qed/qed.h
index e07460a..00db340 100644
--- a/drivers/net/ethernet/qlogic/qed/qed.h
+++ b/drivers/net/ethernet/qlogic/qed/qed.h
@@ -92,6 +92,8 @@
struct qed_dev_info;
union qed_mcp_protocol_stats;
enum qed_mcp_protocol_type;
+enum qed_mfw_tlv_type;
+union qed_mfw_tlv_data;
/* helpers */
#define QED_MFW_GET_FIELD(name, field) \
@@ -439,6 +441,59 @@
u32 init_ops_size;
};
+enum qed_mf_mode_bit {
+ /* Supports PF-classification based on tag */
+ QED_MF_OVLAN_CLSS,
+
+ /* Supports PF-classification based on MAC */
+ QED_MF_LLH_MAC_CLSS,
+
+ /* Supports PF-classification based on protocol type */
+ QED_MF_LLH_PROTO_CLSS,
+
+ /* Requires a default PF to be set */
+ QED_MF_NEED_DEF_PF,
+
+ /* Allow LL2 to multicast/broadcast */
+ QED_MF_LL2_NON_UNICAST,
+
+ /* Allow Cross-PF [& child VFs] Tx-switching */
+ QED_MF_INTER_PF_SWITCH,
+
+ /* Unified Fabtic Port support enabled */
+ QED_MF_UFP_SPECIFIC,
+
+ /* Disable Accelerated Receive Flow Steering (aRFS) */
+ QED_MF_DISABLE_ARFS,
+
+ /* Use vlan for steering */
+ QED_MF_8021Q_TAGGING,
+
+ /* Use stag for steering */
+ QED_MF_8021AD_TAGGING,
+
+ /* Allow DSCP to TC mapping */
+ QED_MF_DSCP_TO_TC_MAP,
+};
+
+enum qed_ufp_mode {
+ QED_UFP_MODE_ETS,
+ QED_UFP_MODE_VNIC_BW,
+ QED_UFP_MODE_UNKNOWN
+};
+
+enum qed_ufp_pri_type {
+ QED_UFP_PRI_OS,
+ QED_UFP_PRI_VNIC,
+ QED_UFP_PRI_UNKNOWN
+};
+
+struct qed_ufp_info {
+ enum qed_ufp_pri_type pri_type;
+ enum qed_ufp_mode mode;
+ u8 tc;
+};
+
enum BAR_ID {
BAR_ID_0, /* used for GRC */
BAR_ID_1 /* Used for doorbells */
@@ -460,6 +515,10 @@
void (*func)(void *);
};
+enum qed_slowpath_wq_flag {
+ QED_SLOWPATH_MFW_TLV_REQ,
+};
+
struct qed_hwfn {
struct qed_dev *cdev;
u8 my_id; /* ID inside the PF */
@@ -547,6 +606,8 @@
struct qed_dcbx_info *p_dcbx_info;
+ struct qed_ufp_info ufp_info;
+
struct qed_dmae_info dmae_info;
/* QM init */
@@ -587,6 +648,9 @@
#endif
struct z_stream_s *stream;
+ struct workqueue_struct *slowpath_wq;
+ struct delayed_work slowpath_task;
+ unsigned long slowpath_task_flags;
};
struct pci_params {
@@ -669,10 +733,8 @@
u8 num_funcs_in_port;
u8 path_id;
- enum qed_mf_mode mf_mode;
-#define IS_MF_DEFAULT(_p_hwfn) (((_p_hwfn)->cdev)->mf_mode == QED_MF_DEFAULT)
-#define IS_MF_SI(_p_hwfn) (((_p_hwfn)->cdev)->mf_mode == QED_MF_NPAR)
-#define IS_MF_SD(_p_hwfn) (((_p_hwfn)->cdev)->mf_mode == QED_MF_OVLAN)
+
+ unsigned long mf_bits;
int pcie_width;
int pcie_speed;
@@ -853,5 +915,9 @@
union qed_mcp_protocol_stats *stats);
int qed_slowpath_irq_req(struct qed_hwfn *hwfn);
void qed_slowpath_irq_sync(struct qed_hwfn *p_hwfn);
+int qed_mfw_tlv_req(struct qed_hwfn *hwfn);
+int qed_mfw_fill_tlv_data(struct qed_hwfn *hwfn,
+ enum qed_mfw_tlv_type type,
+ union qed_mfw_tlv_data *tlv_data);
#endif /* _QED_H */
diff --git a/drivers/net/ethernet/qlogic/qed/qed_dcbx.c b/drivers/net/ethernet/qlogic/qed/qed_dcbx.c
index 449777f..8f31406 100644
--- a/drivers/net/ethernet/qlogic/qed/qed_dcbx.c
+++ b/drivers/net/ethernet/qlogic/qed/qed_dcbx.c
@@ -274,8 +274,8 @@
u32 pri_tc_tbl, int count, u8 dcbx_version)
{
enum dcbx_protocol_type type;
+ bool enable, ieee, eth_tlv;
u8 tc, priority_map;
- bool enable, ieee;
u16 protocol_id;
int priority;
int i;
@@ -283,6 +283,7 @@
DP_VERBOSE(p_hwfn, QED_MSG_DCB, "Num APP entries = %d\n", count);
ieee = (dcbx_version == DCBX_CONFIG_VERSION_IEEE);
+ eth_tlv = false;
/* Parse APP TLV */
for (i = 0; i < count; i++) {
protocol_id = QED_MFW_GET_FIELD(p_tbl[i].entry,
@@ -304,13 +305,22 @@
* indication, but we only got here if there was an
* app tlv for the protocol, so dcbx must be enabled.
*/
- enable = !(type == DCBX_PROTOCOL_ETH);
+ if (type == DCBX_PROTOCOL_ETH) {
+ enable = false;
+ eth_tlv = true;
+ } else {
+ enable = true;
+ }
qed_dcbx_update_app_info(p_data, p_hwfn, enable,
priority, tc, type);
}
}
+ /* If Eth TLV is not detected, use UFP TC as default TC */
+ if (test_bit(QED_MF_UFP_SPECIFIC, &p_hwfn->cdev->mf_bits) && !eth_tlv)
+ p_data->arr[DCBX_PROTOCOL_ETH].tc = p_hwfn->ufp_info.tc;
+
/* Update ramrod protocol data and hw_info fields
* with default info when corresponding APP TLV's are not detected.
* The enabled field has a different logic for ethernet as only for
diff --git a/drivers/net/ethernet/qlogic/qed/qed_debug.c b/drivers/net/ethernet/qlogic/qed/qed_debug.c
index 4926c55..39124b5 100644
--- a/drivers/net/ethernet/qlogic/qed/qed_debug.c
+++ b/drivers/net/ethernet/qlogic/qed/qed_debug.c
@@ -419,6 +419,7 @@
#define NUM_RSS_MEM_TYPES 5
#define NUM_BIG_RAM_TYPES 3
+#define BIG_RAM_NAME_LEN 3
#define NUM_PHY_TBUS_ADDRESSES 2048
#define PHY_DUMP_SIZE_DWORDS (NUM_PHY_TBUS_ADDRESSES / 2)
@@ -3650,8 +3651,8 @@
BIT(big_ram->is_256b_bit_offset[dev_data->chip_id]) ? 256
: 128;
- strscpy(type_name, big_ram->instance_name, sizeof(type_name));
- strscpy(mem_name, big_ram->instance_name, sizeof(mem_name));
+ strncpy(type_name, big_ram->instance_name, BIG_RAM_NAME_LEN);
+ strncpy(mem_name, big_ram->instance_name, BIG_RAM_NAME_LEN);
/* Dump memory header */
offset += qed_grc_dump_mem_hdr(p_hwfn,
@@ -7778,6 +7779,57 @@
return qed_dbg_feature_size(cdev, DBG_FEATURE_IGU_FIFO);
}
+int qed_dbg_nvm_image_length(struct qed_hwfn *p_hwfn,
+ enum qed_nvm_images image_id, u32 *length)
+{
+ struct qed_nvm_image_att image_att;
+ int rc;
+
+ *length = 0;
+ rc = qed_mcp_get_nvm_image_att(p_hwfn, image_id, &image_att);
+ if (rc)
+ return rc;
+
+ *length = image_att.length;
+
+ return rc;
+}
+
+int qed_dbg_nvm_image(struct qed_dev *cdev, void *buffer,
+ u32 *num_dumped_bytes, enum qed_nvm_images image_id)
+{
+ struct qed_hwfn *p_hwfn =
+ &cdev->hwfns[cdev->dbg_params.engine_for_debug];
+ u32 len_rounded, i;
+ __be32 val;
+ int rc;
+
+ *num_dumped_bytes = 0;
+ rc = qed_dbg_nvm_image_length(p_hwfn, image_id, &len_rounded);
+ if (rc)
+ return rc;
+
+ DP_NOTICE(p_hwfn->cdev,
+ "Collecting a debug feature [\"nvram image %d\"]\n",
+ image_id);
+
+ len_rounded = roundup(len_rounded, sizeof(u32));
+ rc = qed_mcp_get_nvm_image(p_hwfn, image_id, buffer, len_rounded);
+ if (rc)
+ return rc;
+
+ /* QED_NVM_IMAGE_NVM_META image is not swapped like other images */
+ if (image_id != QED_NVM_IMAGE_NVM_META)
+ for (i = 0; i < len_rounded; i += 4) {
+ val = cpu_to_be32(*(u32 *)(buffer + i));
+ *(u32 *)(buffer + i) = val;
+ }
+
+ *num_dumped_bytes = len_rounded;
+
+ return rc;
+}
+
int qed_dbg_protection_override(struct qed_dev *cdev, void *buffer,
u32 *num_dumped_bytes)
{
@@ -7831,6 +7883,9 @@
IGU_FIFO = 6,
PHY = 7,
FW_ASSERTS = 8,
+ NVM_CFG1 = 9,
+ DEFAULT_CFG = 10,
+ NVM_META = 11,
};
static u32 qed_calc_regdump_header(enum debug_print_features feature,
@@ -7965,13 +8020,61 @@
DP_ERR(cdev, "qed_dbg_mcp_trace failed. rc = %d\n", rc);
}
+ /* nvm cfg1 */
+ rc = qed_dbg_nvm_image(cdev,
+ (u8 *)buffer + offset + REGDUMP_HEADER_SIZE,
+ &feature_size, QED_NVM_IMAGE_NVM_CFG1);
+ if (!rc) {
+ *(u32 *)((u8 *)buffer + offset) =
+ qed_calc_regdump_header(NVM_CFG1, cur_engine,
+ feature_size, omit_engine);
+ offset += (feature_size + REGDUMP_HEADER_SIZE);
+ } else if (rc != -ENOENT) {
+ DP_ERR(cdev,
+ "qed_dbg_nvm_image failed for image %d (%s), rc = %d\n",
+ QED_NVM_IMAGE_NVM_CFG1, "QED_NVM_IMAGE_NVM_CFG1", rc);
+ }
+
+ /* nvm default */
+ rc = qed_dbg_nvm_image(cdev,
+ (u8 *)buffer + offset + REGDUMP_HEADER_SIZE,
+ &feature_size, QED_NVM_IMAGE_DEFAULT_CFG);
+ if (!rc) {
+ *(u32 *)((u8 *)buffer + offset) =
+ qed_calc_regdump_header(DEFAULT_CFG, cur_engine,
+ feature_size, omit_engine);
+ offset += (feature_size + REGDUMP_HEADER_SIZE);
+ } else if (rc != -ENOENT) {
+ DP_ERR(cdev,
+ "qed_dbg_nvm_image failed for image %d (%s), rc = %d\n",
+ QED_NVM_IMAGE_DEFAULT_CFG, "QED_NVM_IMAGE_DEFAULT_CFG",
+ rc);
+ }
+
+ /* nvm meta */
+ rc = qed_dbg_nvm_image(cdev,
+ (u8 *)buffer + offset + REGDUMP_HEADER_SIZE,
+ &feature_size, QED_NVM_IMAGE_NVM_META);
+ if (!rc) {
+ *(u32 *)((u8 *)buffer + offset) =
+ qed_calc_regdump_header(NVM_META, cur_engine,
+ feature_size, omit_engine);
+ offset += (feature_size + REGDUMP_HEADER_SIZE);
+ } else if (rc != -ENOENT) {
+ DP_ERR(cdev,
+ "qed_dbg_nvm_image failed for image %d (%s), rc = %d\n",
+ QED_NVM_IMAGE_NVM_META, "QED_NVM_IMAGE_NVM_META", rc);
+ }
+
return 0;
}
int qed_dbg_all_data_size(struct qed_dev *cdev)
{
+ struct qed_hwfn *p_hwfn =
+ &cdev->hwfns[cdev->dbg_params.engine_for_debug];
+ u32 regs_len = 0, image_len = 0;
u8 cur_engine, org_engine;
- u32 regs_len = 0;
org_engine = qed_get_debug_engine(cdev);
for (cur_engine = 0; cur_engine < cdev->num_hwfns; cur_engine++) {
@@ -7993,6 +8096,15 @@
/* Engine common */
regs_len += REGDUMP_HEADER_SIZE + qed_dbg_mcp_trace_size(cdev);
+ qed_dbg_nvm_image_length(p_hwfn, QED_NVM_IMAGE_NVM_CFG1, &image_len);
+ if (image_len)
+ regs_len += REGDUMP_HEADER_SIZE + image_len;
+ qed_dbg_nvm_image_length(p_hwfn, QED_NVM_IMAGE_DEFAULT_CFG, &image_len);
+ if (image_len)
+ regs_len += REGDUMP_HEADER_SIZE + image_len;
+ qed_dbg_nvm_image_length(p_hwfn, QED_NVM_IMAGE_NVM_META, &image_len);
+ if (image_len)
+ regs_len += REGDUMP_HEADER_SIZE + image_len;
return regs_len;
}
diff --git a/drivers/net/ethernet/qlogic/qed/qed_dev.c b/drivers/net/ethernet/qlogic/qed/qed_dev.c
index d2ad5e9..5605289 100644
--- a/drivers/net/ethernet/qlogic/qed/qed_dev.c
+++ b/drivers/net/ethernet/qlogic/qed/qed_dev.c
@@ -1149,18 +1149,10 @@
return -EINVAL;
}
- switch (p_hwfn->cdev->mf_mode) {
- case QED_MF_DEFAULT:
- case QED_MF_NPAR:
- hw_mode |= 1 << MODE_MF_SI;
- break;
- case QED_MF_OVLAN:
+ if (test_bit(QED_MF_OVLAN_CLSS, &p_hwfn->cdev->mf_bits))
hw_mode |= 1 << MODE_MF_SD;
- break;
- default:
- DP_NOTICE(p_hwfn, "Unsupported MF mode, init as DEFAULT\n");
+ else
hw_mode |= 1 << MODE_MF_SI;
- }
hw_mode |= 1 << MODE_ASIC;
@@ -1507,6 +1499,11 @@
STORE_RT_REG(p_hwfn, NIG_REG_LLH_FUNC_TAG_EN_RT_OFFSET, 1);
STORE_RT_REG(p_hwfn, NIG_REG_LLH_FUNC_TAG_VALUE_RT_OFFSET,
p_hwfn->hw_info.ovlan);
+
+ DP_VERBOSE(p_hwfn, NETIF_MSG_HW,
+ "Configuring LLH_FUNC_FILTER_HDR_SEL\n");
+ STORE_RT_REG(p_hwfn, NIG_REG_LLH_FUNC_FILTER_HDR_SEL_RT_OFFSET,
+ 1);
}
/* Enable classification by MAC if needed */
@@ -1557,7 +1554,6 @@
/* send function start command */
rc = qed_sp_pf_start(p_hwfn, p_ptt, p_tunn,
- p_hwfn->cdev->mf_mode,
allow_npar_tx_switch);
if (rc) {
DP_NOTICE(p_hwfn, "Function start ramrod failed\n");
@@ -1644,6 +1640,7 @@
bool b_default_mtu = true;
struct qed_hwfn *p_hwfn;
int rc = 0, mfw_rc, i;
+ u16 ether_type;
if ((p_params->int_mode == QED_INT_MODE_MSI) && (cdev->num_hwfns > 1)) {
DP_NOTICE(cdev, "MSI mode is not supported for CMT devices\n");
@@ -1677,6 +1674,24 @@
if (rc)
return rc;
+ if (IS_PF(cdev) && (test_bit(QED_MF_8021Q_TAGGING,
+ &cdev->mf_bits) ||
+ test_bit(QED_MF_8021AD_TAGGING,
+ &cdev->mf_bits))) {
+ if (test_bit(QED_MF_8021Q_TAGGING, &cdev->mf_bits))
+ ether_type = ETH_P_8021Q;
+ else
+ ether_type = ETH_P_8021AD;
+ STORE_RT_REG(p_hwfn, PRS_REG_TAG_ETHERTYPE_0_RT_OFFSET,
+ ether_type);
+ STORE_RT_REG(p_hwfn, NIG_REG_TAG_ETHERTYPE_0_RT_OFFSET,
+ ether_type);
+ STORE_RT_REG(p_hwfn, PBF_REG_TAG_ETHERTYPE_0_RT_OFFSET,
+ ether_type);
+ STORE_RT_REG(p_hwfn, DORQ_REG_TAG1_ETHERTYPE_RT_OFFSET,
+ ether_type);
+ }
+
qed_fill_load_req_params(&load_req_params,
p_params->p_drv_load_params);
rc = qed_mcp_load_req(p_hwfn, p_hwfn->p_main_ptt,
@@ -2639,31 +2654,57 @@
link->pause.autoneg,
p_caps->default_eee, p_caps->eee_lpi_timer);
- /* Read Multi-function information from shmem */
- addr = MCP_REG_SCRATCH + nvm_cfg1_offset +
- offsetof(struct nvm_cfg1, glob) +
- offsetof(struct nvm_cfg1_glob, generic_cont0);
+ if (IS_LEAD_HWFN(p_hwfn)) {
+ struct qed_dev *cdev = p_hwfn->cdev;
- generic_cont0 = qed_rd(p_hwfn, p_ptt, addr);
+ /* Read Multi-function information from shmem */
+ addr = MCP_REG_SCRATCH + nvm_cfg1_offset +
+ offsetof(struct nvm_cfg1, glob) +
+ offsetof(struct nvm_cfg1_glob, generic_cont0);
- mf_mode = (generic_cont0 & NVM_CFG1_GLOB_MF_MODE_MASK) >>
- NVM_CFG1_GLOB_MF_MODE_OFFSET;
+ generic_cont0 = qed_rd(p_hwfn, p_ptt, addr);
- switch (mf_mode) {
- case NVM_CFG1_GLOB_MF_MODE_MF_ALLOWED:
- p_hwfn->cdev->mf_mode = QED_MF_OVLAN;
- break;
- case NVM_CFG1_GLOB_MF_MODE_NPAR1_0:
- p_hwfn->cdev->mf_mode = QED_MF_NPAR;
- break;
- case NVM_CFG1_GLOB_MF_MODE_DEFAULT:
- p_hwfn->cdev->mf_mode = QED_MF_DEFAULT;
- break;
+ mf_mode = (generic_cont0 & NVM_CFG1_GLOB_MF_MODE_MASK) >>
+ NVM_CFG1_GLOB_MF_MODE_OFFSET;
+
+ switch (mf_mode) {
+ case NVM_CFG1_GLOB_MF_MODE_MF_ALLOWED:
+ cdev->mf_bits = BIT(QED_MF_OVLAN_CLSS);
+ break;
+ case NVM_CFG1_GLOB_MF_MODE_UFP:
+ cdev->mf_bits = BIT(QED_MF_OVLAN_CLSS) |
+ BIT(QED_MF_LLH_PROTO_CLSS) |
+ BIT(QED_MF_UFP_SPECIFIC) |
+ BIT(QED_MF_8021Q_TAGGING);
+ break;
+ case NVM_CFG1_GLOB_MF_MODE_BD:
+ cdev->mf_bits = BIT(QED_MF_OVLAN_CLSS) |
+ BIT(QED_MF_LLH_PROTO_CLSS) |
+ BIT(QED_MF_8021AD_TAGGING);
+ break;
+ case NVM_CFG1_GLOB_MF_MODE_NPAR1_0:
+ cdev->mf_bits = BIT(QED_MF_LLH_MAC_CLSS) |
+ BIT(QED_MF_LLH_PROTO_CLSS) |
+ BIT(QED_MF_LL2_NON_UNICAST) |
+ BIT(QED_MF_INTER_PF_SWITCH);
+ break;
+ case NVM_CFG1_GLOB_MF_MODE_DEFAULT:
+ cdev->mf_bits = BIT(QED_MF_LLH_MAC_CLSS) |
+ BIT(QED_MF_LLH_PROTO_CLSS) |
+ BIT(QED_MF_LL2_NON_UNICAST);
+ if (QED_IS_BB(p_hwfn->cdev))
+ cdev->mf_bits |= BIT(QED_MF_NEED_DEF_PF);
+ break;
+ }
+
+ DP_INFO(p_hwfn, "Multi function mode is 0x%lx\n",
+ cdev->mf_bits);
}
- DP_INFO(p_hwfn, "Multi function mode is %08x\n",
- p_hwfn->cdev->mf_mode);
- /* Read Multi-function information from shmem */
+ DP_INFO(p_hwfn, "Multi function mode is 0x%lx\n",
+ p_hwfn->cdev->mf_bits);
+
+ /* Read device capabilities information from shmem */
addr = MCP_REG_SCRATCH + nvm_cfg1_offset +
offsetof(struct nvm_cfg1, glob) +
offsetof(struct nvm_cfg1_glob, device_capabilities);
@@ -2856,6 +2897,8 @@
qed_mcp_cmd_port_init(p_hwfn, p_ptt);
qed_get_eee_caps(p_hwfn, p_ptt);
+
+ qed_mcp_read_ufp_config(p_hwfn, p_ptt);
}
if (qed_mcp_is_init(p_hwfn)) {
@@ -3462,7 +3505,7 @@
u32 high = 0, low = 0, en;
int i;
- if (!(IS_MF_SI(p_hwfn) || IS_MF_DEFAULT(p_hwfn)))
+ if (!test_bit(QED_MF_LLH_MAC_CLSS, &p_hwfn->cdev->mf_bits))
return 0;
qed_llh_mac_to_filter(&high, &low, p_filter);
@@ -3507,7 +3550,7 @@
u32 high = 0, low = 0;
int i;
- if (!(IS_MF_SI(p_hwfn) || IS_MF_DEFAULT(p_hwfn)))
+ if (!test_bit(QED_MF_LLH_MAC_CLSS, &p_hwfn->cdev->mf_bits))
return;
qed_llh_mac_to_filter(&high, &low, p_filter);
@@ -3549,7 +3592,7 @@
u32 high = 0, low = 0, en;
int i;
- if (!(IS_MF_SI(p_hwfn) || IS_MF_DEFAULT(p_hwfn)))
+ if (!test_bit(QED_MF_LLH_PROTO_CLSS, &p_hwfn->cdev->mf_bits))
return 0;
switch (type) {
@@ -3647,7 +3690,7 @@
u32 high = 0, low = 0;
int i;
- if (!(IS_MF_SI(p_hwfn) || IS_MF_DEFAULT(p_hwfn)))
+ if (!test_bit(QED_MF_LLH_PROTO_CLSS, &p_hwfn->cdev->mf_bits))
return;
switch (type) {
diff --git a/drivers/net/ethernet/qlogic/qed/qed_fcoe.c b/drivers/net/ethernet/qlogic/qed/qed_fcoe.c
index 2dc9b31..cc1b373 100644
--- a/drivers/net/ethernet/qlogic/qed/qed_fcoe.c
+++ b/drivers/net/ethernet/qlogic/qed/qed_fcoe.c
@@ -313,6 +313,9 @@
p_data->d_id.addr_mid = p_conn->d_id.addr_mid;
p_data->d_id.addr_lo = p_conn->d_id.addr_lo;
p_data->flags = p_conn->flags;
+ if (test_bit(QED_MF_UFP_SPECIFIC, &p_hwfn->cdev->mf_bits))
+ SET_FIELD(p_data->flags,
+ FCOE_CONN_OFFLOAD_RAMROD_DATA_B_SINGLE_VLAN, 1);
p_data->def_q_idx = p_conn->def_q_idx;
return qed_spq_post(p_hwfn, p_ent, NULL);
diff --git a/drivers/net/ethernet/qlogic/qed/qed_hsi.h b/drivers/net/ethernet/qlogic/qed/qed_hsi.h
index 7f5ec42..8e1e6e1 100644
--- a/drivers/net/ethernet/qlogic/qed/qed_hsi.h
+++ b/drivers/net/ethernet/qlogic/qed/qed_hsi.h
@@ -11863,6 +11863,8 @@
u32 running_bundle_id;
s32 external_temperature;
u32 mdump_reason;
+ u32 data_ptr;
+ u32 data_size;
};
struct fw_flr_mb {
@@ -11993,6 +11995,16 @@
#define EEE_REMOTE_TW_TX_OFFSET 0
#define EEE_REMOTE_TW_RX_MASK 0xffff0000
#define EEE_REMOTE_TW_RX_OFFSET 16
+
+ u32 oem_cfg_port;
+#define OEM_CFG_CHANNEL_TYPE_MASK 0x00000003
+#define OEM_CFG_CHANNEL_TYPE_OFFSET 0
+#define OEM_CFG_CHANNEL_TYPE_VLAN_PARTITION 0x1
+#define OEM_CFG_CHANNEL_TYPE_STAGGED 0x2
+#define OEM_CFG_SCHED_TYPE_MASK 0x0000000C
+#define OEM_CFG_SCHED_TYPE_OFFSET 2
+#define OEM_CFG_SCHED_TYPE_ETS 0x1
+#define OEM_CFG_SCHED_TYPE_VNIC_BW 0x2
};
struct public_func {
@@ -12069,6 +12081,23 @@
#define DRV_ID_DRV_INIT_HW_MASK 0x80000000
#define DRV_ID_DRV_INIT_HW_SHIFT 31
#define DRV_ID_DRV_INIT_HW_FLAG (1 << DRV_ID_DRV_INIT_HW_SHIFT)
+
+ u32 oem_cfg_func;
+#define OEM_CFG_FUNC_TC_MASK 0x0000000F
+#define OEM_CFG_FUNC_TC_OFFSET 0
+#define OEM_CFG_FUNC_TC_0 0x0
+#define OEM_CFG_FUNC_TC_1 0x1
+#define OEM_CFG_FUNC_TC_2 0x2
+#define OEM_CFG_FUNC_TC_3 0x3
+#define OEM_CFG_FUNC_TC_4 0x4
+#define OEM_CFG_FUNC_TC_5 0x5
+#define OEM_CFG_FUNC_TC_6 0x6
+#define OEM_CFG_FUNC_TC_7 0x7
+
+#define OEM_CFG_FUNC_HOST_PRI_CTRL_MASK 0x00000030
+#define OEM_CFG_FUNC_HOST_PRI_CTRL_OFFSET 4
+#define OEM_CFG_FUNC_HOST_PRI_CTRL_VNIC 0x1
+#define OEM_CFG_FUNC_HOST_PRI_CTRL_OS 0x2
};
struct mcp_mac {
@@ -12295,6 +12324,7 @@
#define DRV_MSG_CODE_BIST_TEST 0x001e0000
#define DRV_MSG_CODE_SET_LED_MODE 0x00200000
#define DRV_MSG_CODE_RESOURCE_CMD 0x00230000
+#define DRV_MSG_CODE_GET_TLV_DONE 0x002f0000
#define RESOURCE_CMD_REQ_RESC_MASK 0x0000001F
#define RESOURCE_CMD_REQ_RESC_SHIFT 0
@@ -12495,6 +12525,8 @@
MFW_DRV_MSG_BW_UPDATE10,
MFW_DRV_MSG_TRANSCEIVER_STATE_CHANGE,
MFW_DRV_MSG_BW_UPDATE11,
+ MFW_DRV_MSG_OEM_CFG_UPDATE,
+ MFW_DRV_MSG_GET_TLV_REQ,
MFW_DRV_MSG_MAX
};
@@ -12530,6 +12562,233 @@
struct public_func func[MCP_GLOB_FUNC_MAX];
};
+/* OCBB definitions */
+enum tlvs {
+ /* Category 1: Device Properties */
+ DRV_TLV_CLP_STR,
+ DRV_TLV_CLP_STR_CTD,
+ /* Category 6: Device Configuration */
+ DRV_TLV_SCSI_TO,
+ DRV_TLV_R_T_TOV,
+ DRV_TLV_R_A_TOV,
+ DRV_TLV_E_D_TOV,
+ DRV_TLV_CR_TOV,
+ DRV_TLV_BOOT_TYPE,
+ /* Category 8: Port Configuration */
+ DRV_TLV_NPIV_ENABLED,
+ /* Category 10: Function Configuration */
+ DRV_TLV_FEATURE_FLAGS,
+ DRV_TLV_LOCAL_ADMIN_ADDR,
+ DRV_TLV_ADDITIONAL_MAC_ADDR_1,
+ DRV_TLV_ADDITIONAL_MAC_ADDR_2,
+ DRV_TLV_LSO_MAX_OFFLOAD_SIZE,
+ DRV_TLV_LSO_MIN_SEGMENT_COUNT,
+ DRV_TLV_PROMISCUOUS_MODE,
+ DRV_TLV_TX_DESCRIPTORS_QUEUE_SIZE,
+ DRV_TLV_RX_DESCRIPTORS_QUEUE_SIZE,
+ DRV_TLV_NUM_OF_NET_QUEUE_VMQ_CFG,
+ DRV_TLV_FLEX_NIC_OUTER_VLAN_ID,
+ DRV_TLV_OS_DRIVER_STATES,
+ DRV_TLV_PXE_BOOT_PROGRESS,
+ /* Category 12: FC/FCoE Configuration */
+ DRV_TLV_NPIV_STATE,
+ DRV_TLV_NUM_OF_NPIV_IDS,
+ DRV_TLV_SWITCH_NAME,
+ DRV_TLV_SWITCH_PORT_NUM,
+ DRV_TLV_SWITCH_PORT_ID,
+ DRV_TLV_VENDOR_NAME,
+ DRV_TLV_SWITCH_MODEL,
+ DRV_TLV_SWITCH_FW_VER,
+ DRV_TLV_QOS_PRIORITY_PER_802_1P,
+ DRV_TLV_PORT_ALIAS,
+ DRV_TLV_PORT_STATE,
+ DRV_TLV_FIP_TX_DESCRIPTORS_QUEUE_SIZE,
+ DRV_TLV_FCOE_RX_DESCRIPTORS_QUEUE_SIZE,
+ DRV_TLV_LINK_FAILURE_COUNT,
+ DRV_TLV_FCOE_BOOT_PROGRESS,
+ /* Category 13: iSCSI Configuration */
+ DRV_TLV_TARGET_LLMNR_ENABLED,
+ DRV_TLV_HEADER_DIGEST_FLAG_ENABLED,
+ DRV_TLV_DATA_DIGEST_FLAG_ENABLED,
+ DRV_TLV_AUTHENTICATION_METHOD,
+ DRV_TLV_ISCSI_BOOT_TARGET_PORTAL,
+ DRV_TLV_MAX_FRAME_SIZE,
+ DRV_TLV_PDU_TX_DESCRIPTORS_QUEUE_SIZE,
+ DRV_TLV_PDU_RX_DESCRIPTORS_QUEUE_SIZE,
+ DRV_TLV_ISCSI_BOOT_PROGRESS,
+ /* Category 20: Device Data */
+ DRV_TLV_PCIE_BUS_RX_UTILIZATION,
+ DRV_TLV_PCIE_BUS_TX_UTILIZATION,
+ DRV_TLV_DEVICE_CPU_CORES_UTILIZATION,
+ DRV_TLV_LAST_VALID_DCC_TLV_RECEIVED,
+ DRV_TLV_NCSI_RX_BYTES_RECEIVED,
+ DRV_TLV_NCSI_TX_BYTES_SENT,
+ /* Category 22: Base Port Data */
+ DRV_TLV_RX_DISCARDS,
+ DRV_TLV_RX_ERRORS,
+ DRV_TLV_TX_ERRORS,
+ DRV_TLV_TX_DISCARDS,
+ DRV_TLV_RX_FRAMES_RECEIVED,
+ DRV_TLV_TX_FRAMES_SENT,
+ /* Category 23: FC/FCoE Port Data */
+ DRV_TLV_RX_BROADCAST_PACKETS,
+ DRV_TLV_TX_BROADCAST_PACKETS,
+ /* Category 28: Base Function Data */
+ DRV_TLV_NUM_OFFLOADED_CONNECTIONS_TCP_IPV4,
+ DRV_TLV_NUM_OFFLOADED_CONNECTIONS_TCP_IPV6,
+ DRV_TLV_TX_DESCRIPTOR_QUEUE_AVG_DEPTH,
+ DRV_TLV_RX_DESCRIPTORS_QUEUE_AVG_DEPTH,
+ DRV_TLV_PF_RX_FRAMES_RECEIVED,
+ DRV_TLV_RX_BYTES_RECEIVED,
+ DRV_TLV_PF_TX_FRAMES_SENT,
+ DRV_TLV_TX_BYTES_SENT,
+ DRV_TLV_IOV_OFFLOAD,
+ DRV_TLV_PCI_ERRORS_CAP_ID,
+ DRV_TLV_UNCORRECTABLE_ERROR_STATUS,
+ DRV_TLV_UNCORRECTABLE_ERROR_MASK,
+ DRV_TLV_CORRECTABLE_ERROR_STATUS,
+ DRV_TLV_CORRECTABLE_ERROR_MASK,
+ DRV_TLV_PCI_ERRORS_AECC_REGISTER,
+ DRV_TLV_TX_QUEUES_EMPTY,
+ DRV_TLV_RX_QUEUES_EMPTY,
+ DRV_TLV_TX_QUEUES_FULL,
+ DRV_TLV_RX_QUEUES_FULL,
+ /* Category 29: FC/FCoE Function Data */
+ DRV_TLV_FCOE_TX_DESCRIPTOR_QUEUE_AVG_DEPTH,
+ DRV_TLV_FCOE_RX_DESCRIPTORS_QUEUE_AVG_DEPTH,
+ DRV_TLV_FCOE_RX_FRAMES_RECEIVED,
+ DRV_TLV_FCOE_RX_BYTES_RECEIVED,
+ DRV_TLV_FCOE_TX_FRAMES_SENT,
+ DRV_TLV_FCOE_TX_BYTES_SENT,
+ DRV_TLV_CRC_ERROR_COUNT,
+ DRV_TLV_CRC_ERROR_1_RECEIVED_SOURCE_FC_ID,
+ DRV_TLV_CRC_ERROR_1_TIMESTAMP,
+ DRV_TLV_CRC_ERROR_2_RECEIVED_SOURCE_FC_ID,
+ DRV_TLV_CRC_ERROR_2_TIMESTAMP,
+ DRV_TLV_CRC_ERROR_3_RECEIVED_SOURCE_FC_ID,
+ DRV_TLV_CRC_ERROR_3_TIMESTAMP,
+ DRV_TLV_CRC_ERROR_4_RECEIVED_SOURCE_FC_ID,
+ DRV_TLV_CRC_ERROR_4_TIMESTAMP,
+ DRV_TLV_CRC_ERROR_5_RECEIVED_SOURCE_FC_ID,
+ DRV_TLV_CRC_ERROR_5_TIMESTAMP,
+ DRV_TLV_LOSS_OF_SYNC_ERROR_COUNT,
+ DRV_TLV_LOSS_OF_SIGNAL_ERRORS,
+ DRV_TLV_PRIMITIVE_SEQUENCE_PROTOCOL_ERROR_COUNT,
+ DRV_TLV_DISPARITY_ERROR_COUNT,
+ DRV_TLV_CODE_VIOLATION_ERROR_COUNT,
+ DRV_TLV_LAST_FLOGI_ISSUED_COMMON_PARAMETERS_WORD_1,
+ DRV_TLV_LAST_FLOGI_ISSUED_COMMON_PARAMETERS_WORD_2,
+ DRV_TLV_LAST_FLOGI_ISSUED_COMMON_PARAMETERS_WORD_3,
+ DRV_TLV_LAST_FLOGI_ISSUED_COMMON_PARAMETERS_WORD_4,
+ DRV_TLV_LAST_FLOGI_TIMESTAMP,
+ DRV_TLV_LAST_FLOGI_ACC_COMMON_PARAMETERS_WORD_1,
+ DRV_TLV_LAST_FLOGI_ACC_COMMON_PARAMETERS_WORD_2,
+ DRV_TLV_LAST_FLOGI_ACC_COMMON_PARAMETERS_WORD_3,
+ DRV_TLV_LAST_FLOGI_ACC_COMMON_PARAMETERS_WORD_4,
+ DRV_TLV_LAST_FLOGI_ACC_TIMESTAMP,
+ DRV_TLV_LAST_FLOGI_RJT,
+ DRV_TLV_LAST_FLOGI_RJT_TIMESTAMP,
+ DRV_TLV_FDISCS_SENT_COUNT,
+ DRV_TLV_FDISC_ACCS_RECEIVED,
+ DRV_TLV_FDISC_RJTS_RECEIVED,
+ DRV_TLV_PLOGI_SENT_COUNT,
+ DRV_TLV_PLOGI_ACCS_RECEIVED,
+ DRV_TLV_PLOGI_RJTS_RECEIVED,
+ DRV_TLV_PLOGI_1_SENT_DESTINATION_FC_ID,
+ DRV_TLV_PLOGI_1_TIMESTAMP,
+ DRV_TLV_PLOGI_2_SENT_DESTINATION_FC_ID,
+ DRV_TLV_PLOGI_2_TIMESTAMP,
+ DRV_TLV_PLOGI_3_SENT_DESTINATION_FC_ID,
+ DRV_TLV_PLOGI_3_TIMESTAMP,
+ DRV_TLV_PLOGI_4_SENT_DESTINATION_FC_ID,
+ DRV_TLV_PLOGI_4_TIMESTAMP,
+ DRV_TLV_PLOGI_5_SENT_DESTINATION_FC_ID,
+ DRV_TLV_PLOGI_5_TIMESTAMP,
+ DRV_TLV_PLOGI_1_ACC_RECEIVED_SOURCE_FC_ID,
+ DRV_TLV_PLOGI_1_ACC_TIMESTAMP,
+ DRV_TLV_PLOGI_2_ACC_RECEIVED_SOURCE_FC_ID,
+ DRV_TLV_PLOGI_2_ACC_TIMESTAMP,
+ DRV_TLV_PLOGI_3_ACC_RECEIVED_SOURCE_FC_ID,
+ DRV_TLV_PLOGI_3_ACC_TIMESTAMP,
+ DRV_TLV_PLOGI_4_ACC_RECEIVED_SOURCE_FC_ID,
+ DRV_TLV_PLOGI_4_ACC_TIMESTAMP,
+ DRV_TLV_PLOGI_5_ACC_RECEIVED_SOURCE_FC_ID,
+ DRV_TLV_PLOGI_5_ACC_TIMESTAMP,
+ DRV_TLV_LOGOS_ISSUED,
+ DRV_TLV_LOGO_ACCS_RECEIVED,
+ DRV_TLV_LOGO_RJTS_RECEIVED,
+ DRV_TLV_LOGO_1_RECEIVED_SOURCE_FC_ID,
+ DRV_TLV_LOGO_1_TIMESTAMP,
+ DRV_TLV_LOGO_2_RECEIVED_SOURCE_FC_ID,
+ DRV_TLV_LOGO_2_TIMESTAMP,
+ DRV_TLV_LOGO_3_RECEIVED_SOURCE_FC_ID,
+ DRV_TLV_LOGO_3_TIMESTAMP,
+ DRV_TLV_LOGO_4_RECEIVED_SOURCE_FC_ID,
+ DRV_TLV_LOGO_4_TIMESTAMP,
+ DRV_TLV_LOGO_5_RECEIVED_SOURCE_FC_ID,
+ DRV_TLV_LOGO_5_TIMESTAMP,
+ DRV_TLV_LOGOS_RECEIVED,
+ DRV_TLV_ACCS_ISSUED,
+ DRV_TLV_PRLIS_ISSUED,
+ DRV_TLV_ACCS_RECEIVED,
+ DRV_TLV_ABTS_SENT_COUNT,
+ DRV_TLV_ABTS_ACCS_RECEIVED,
+ DRV_TLV_ABTS_RJTS_RECEIVED,
+ DRV_TLV_ABTS_1_SENT_DESTINATION_FC_ID,
+ DRV_TLV_ABTS_1_TIMESTAMP,
+ DRV_TLV_ABTS_2_SENT_DESTINATION_FC_ID,
+ DRV_TLV_ABTS_2_TIMESTAMP,
+ DRV_TLV_ABTS_3_SENT_DESTINATION_FC_ID,
+ DRV_TLV_ABTS_3_TIMESTAMP,
+ DRV_TLV_ABTS_4_SENT_DESTINATION_FC_ID,
+ DRV_TLV_ABTS_4_TIMESTAMP,
+ DRV_TLV_ABTS_5_SENT_DESTINATION_FC_ID,
+ DRV_TLV_ABTS_5_TIMESTAMP,
+ DRV_TLV_RSCNS_RECEIVED,
+ DRV_TLV_LAST_RSCN_RECEIVED_N_PORT_1,
+ DRV_TLV_LAST_RSCN_RECEIVED_N_PORT_2,
+ DRV_TLV_LAST_RSCN_RECEIVED_N_PORT_3,
+ DRV_TLV_LAST_RSCN_RECEIVED_N_PORT_4,
+ DRV_TLV_LUN_RESETS_ISSUED,
+ DRV_TLV_ABORT_TASK_SETS_ISSUED,
+ DRV_TLV_TPRLOS_SENT,
+ DRV_TLV_NOS_SENT_COUNT,
+ DRV_TLV_NOS_RECEIVED_COUNT,
+ DRV_TLV_OLS_COUNT,
+ DRV_TLV_LR_COUNT,
+ DRV_TLV_LRR_COUNT,
+ DRV_TLV_LIP_SENT_COUNT,
+ DRV_TLV_LIP_RECEIVED_COUNT,
+ DRV_TLV_EOFA_COUNT,
+ DRV_TLV_EOFNI_COUNT,
+ DRV_TLV_SCSI_STATUS_CHECK_CONDITION_COUNT,
+ DRV_TLV_SCSI_STATUS_CONDITION_MET_COUNT,
+ DRV_TLV_SCSI_STATUS_BUSY_COUNT,
+ DRV_TLV_SCSI_STATUS_INTERMEDIATE_COUNT,
+ DRV_TLV_SCSI_STATUS_INTERMEDIATE_CONDITION_MET_COUNT,
+ DRV_TLV_SCSI_STATUS_RESERVATION_CONFLICT_COUNT,
+ DRV_TLV_SCSI_STATUS_TASK_SET_FULL_COUNT,
+ DRV_TLV_SCSI_STATUS_ACA_ACTIVE_COUNT,
+ DRV_TLV_SCSI_STATUS_TASK_ABORTED_COUNT,
+ DRV_TLV_SCSI_CHECK_CONDITION_1_RECEIVED_SK_ASC_ASCQ,
+ DRV_TLV_SCSI_CHECK_1_TIMESTAMP,
+ DRV_TLV_SCSI_CHECK_CONDITION_2_RECEIVED_SK_ASC_ASCQ,
+ DRV_TLV_SCSI_CHECK_2_TIMESTAMP,
+ DRV_TLV_SCSI_CHECK_CONDITION_3_RECEIVED_SK_ASC_ASCQ,
+ DRV_TLV_SCSI_CHECK_3_TIMESTAMP,
+ DRV_TLV_SCSI_CHECK_CONDITION_4_RECEIVED_SK_ASC_ASCQ,
+ DRV_TLV_SCSI_CHECK_4_TIMESTAMP,
+ DRV_TLV_SCSI_CHECK_CONDITION_5_RECEIVED_SK_ASC_ASCQ,
+ DRV_TLV_SCSI_CHECK_5_TIMESTAMP,
+ /* Category 30: iSCSI Function Data */
+ DRV_TLV_PDU_TX_DESCRIPTOR_QUEUE_AVG_DEPTH,
+ DRV_TLV_PDU_RX_DESCRIPTORS_QUEUE_AVG_DEPTH,
+ DRV_TLV_ISCSI_PDU_RX_FRAMES_RECEIVED,
+ DRV_TLV_ISCSI_PDU_RX_BYTES_RECEIVED,
+ DRV_TLV_ISCSI_PDU_TX_FRAMES_SENT,
+ DRV_TLV_ISCSI_PDU_TX_BYTES_SENT
+};
+
struct nvm_cfg_mac_address {
u32 mac_addr_hi;
#define NVM_CFG_MAC_ADDRESS_HI_MASK 0x0000FFFF
diff --git a/drivers/net/ethernet/qlogic/qed/qed_l2.c b/drivers/net/ethernet/qlogic/qed/qed_l2.c
index 8667799d..1c0d0c2 100644
--- a/drivers/net/ethernet/qlogic/qed/qed_l2.c
+++ b/drivers/net/ethernet/qlogic/qed/qed_l2.c
@@ -1677,6 +1677,8 @@
HILO_64_REGPAIR(tstats.mftag_filter_discard);
p_stats->common.mac_filter_discards +=
HILO_64_REGPAIR(tstats.eth_mac_filter_discard);
+ p_stats->common.gft_filter_drop +=
+ HILO_64_REGPAIR(tstats.eth_gft_drop_pkt);
}
static void __qed_get_vport_ustats_addrlen(struct qed_hwfn *p_hwfn,
@@ -1973,6 +1975,8 @@
return GFT_PROFILE_TYPE_4_TUPLE;
if (mode == QED_FILTER_CONFIG_MODE_IP_DEST)
return GFT_PROFILE_TYPE_IP_DST_ADDR;
+ if (mode == QED_FILTER_CONFIG_MODE_IP_SRC)
+ return GFT_PROFILE_TYPE_IP_SRC_ADDR;
return GFT_PROFILE_TYPE_L4_DST_PORT;
}
@@ -2013,16 +2017,6 @@
u8 abs_vport_id = 0;
int rc = -EINVAL;
- rc = qed_fw_vport(p_hwfn, p_params->vport_id, &abs_vport_id);
- if (rc)
- return rc;
-
- if (p_params->qid != QED_RFS_NTUPLE_QID_RSS) {
- rc = qed_fw_l2_queue(p_hwfn, p_params->qid, &abs_rx_q_id);
- if (rc)
- return rc;
- }
-
/* Get SPQ entry */
memset(&init_data, 0, sizeof(init_data));
init_data.cid = qed_spq_get_cid(p_hwfn);
@@ -2047,15 +2041,28 @@
DMA_REGPAIR_LE(p_ramrod->pkt_hdr_addr, p_params->addr);
p_ramrod->pkt_hdr_length = cpu_to_le16(p_params->length);
- if (p_params->qid != QED_RFS_NTUPLE_QID_RSS) {
- p_ramrod->rx_qid_valid = 1;
- p_ramrod->rx_qid = cpu_to_le16(abs_rx_q_id);
+ if (p_params->b_is_drop) {
+ p_ramrod->vport_id = cpu_to_le16(ETH_GFT_TRASHCAN_VPORT);
+ } else {
+ rc = qed_fw_vport(p_hwfn, p_params->vport_id, &abs_vport_id);
+ if (rc)
+ return rc;
+
+ if (p_params->qid != QED_RFS_NTUPLE_QID_RSS) {
+ rc = qed_fw_l2_queue(p_hwfn, p_params->qid,
+ &abs_rx_q_id);
+ if (rc)
+ return rc;
+
+ p_ramrod->rx_qid_valid = 1;
+ p_ramrod->rx_qid = cpu_to_le16(abs_rx_q_id);
+ }
+
+ p_ramrod->vport_id = cpu_to_le16((u16)abs_vport_id);
}
p_ramrod->flow_id_valid = 0;
p_ramrod->flow_id = 0;
-
- p_ramrod->vport_id = cpu_to_le16((u16)abs_vport_id);
p_ramrod->filter_action = p_params->b_is_add ? GFT_ADD_FILTER
: GFT_DELETE_FILTER;
@@ -2848,6 +2855,24 @@
cqe);
}
+static int qed_req_bulletin_update_mac(struct qed_dev *cdev, u8 *mac)
+{
+ int i, ret;
+
+ if (IS_PF(cdev))
+ return 0;
+
+ for_each_hwfn(cdev, i) {
+ struct qed_hwfn *p_hwfn = &cdev->hwfns[i];
+
+ ret = qed_vf_pf_bulletin_update_mac(p_hwfn, mac);
+ if (ret)
+ return ret;
+ }
+
+ return 0;
+}
+
#ifdef CONFIG_QED_SRIOV
extern const struct qed_iov_hv_ops qed_iov_ops_pass;
#endif
@@ -2885,6 +2910,7 @@
.ntuple_filter_config = &qed_ntuple_arfs_filter_config,
.configure_arfs_searcher = &qed_configure_arfs_searcher,
.get_coalesce = &qed_get_coalesce,
+ .req_bulletin_update_mac = &qed_req_bulletin_update_mac,
};
const struct qed_eth_ops *qed_get_eth_ops(void)
diff --git a/drivers/net/ethernet/qlogic/qed/qed_ll2.c b/drivers/net/ethernet/qlogic/qed/qed_ll2.c
index 468c59d..c97ebd6 100644
--- a/drivers/net/ethernet/qlogic/qed/qed_ll2.c
+++ b/drivers/net/ethernet/qlogic/qed/qed_ll2.c
@@ -960,12 +960,16 @@
p_ramrod->drop_ttl0_flg = p_ll2_conn->input.rx_drop_ttl0_flg;
p_ramrod->inner_vlan_stripping_en =
p_ll2_conn->input.rx_vlan_removal_en;
+
+ if (test_bit(QED_MF_UFP_SPECIFIC, &p_hwfn->cdev->mf_bits) &&
+ p_ll2_conn->input.conn_type == QED_LL2_TYPE_FCOE)
+ p_ramrod->report_outer_vlan = 1;
p_ramrod->queue_id = p_ll2_conn->queue_id;
p_ramrod->main_func_queue = p_ll2_conn->main_func_queue ? 1 : 0;
- if ((IS_MF_DEFAULT(p_hwfn) || IS_MF_SI(p_hwfn)) &&
- p_ramrod->main_func_queue && (conn_type != QED_LL2_TYPE_ROCE) &&
- (conn_type != QED_LL2_TYPE_IWARP)) {
+ if (test_bit(QED_MF_LL2_NON_UNICAST, &p_hwfn->cdev->mf_bits) &&
+ p_ramrod->main_func_queue && conn_type != QED_LL2_TYPE_ROCE &&
+ conn_type != QED_LL2_TYPE_IWARP) {
p_ramrod->mf_si_bcast_accept_all = 1;
p_ramrod->mf_si_mcast_accept_all = 1;
} else {
@@ -1534,11 +1538,12 @@
qed_ll2_establish_connection_ooo(p_hwfn, p_ll2_conn);
if (p_ll2_conn->input.conn_type == QED_LL2_TYPE_FCOE) {
+ if (!test_bit(QED_MF_UFP_SPECIFIC, &p_hwfn->cdev->mf_bits))
+ qed_llh_add_protocol_filter(p_hwfn, p_ptt,
+ ETH_P_FCOE, 0,
+ QED_LLH_FILTER_ETHERTYPE);
qed_llh_add_protocol_filter(p_hwfn, p_ptt,
- 0x8906, 0,
- QED_LLH_FILTER_ETHERTYPE);
- qed_llh_add_protocol_filter(p_hwfn, p_ptt,
- 0x8914, 0,
+ ETH_P_FIP, 0,
QED_LLH_FILTER_ETHERTYPE);
}
@@ -1694,11 +1699,16 @@
start_bd = (struct core_tx_bd *)qed_chain_produce(p_tx_chain);
if (QED_IS_IWARP_PERSONALITY(p_hwfn) &&
- p_ll2->input.conn_type == QED_LL2_TYPE_OOO)
+ p_ll2->input.conn_type == QED_LL2_TYPE_OOO) {
start_bd->nw_vlan_or_lb_echo =
cpu_to_le16(IWARP_LL2_IN_ORDER_TX_QUEUE);
- else
+ } else {
start_bd->nw_vlan_or_lb_echo = cpu_to_le16(pkt->vlan);
+ if (test_bit(QED_MF_UFP_SPECIFIC, &p_hwfn->cdev->mf_bits) &&
+ p_ll2->input.conn_type == QED_LL2_TYPE_FCOE)
+ pkt->remove_stag = true;
+ }
+
SET_FIELD(start_bd->bitfield1, CORE_TX_BD_L4_HDR_OFFSET_W,
cpu_to_le16(pkt->l4_hdr_offset_w));
SET_FIELD(start_bd->bitfield1, CORE_TX_BD_TX_DST, tx_dest);
@@ -1709,6 +1719,9 @@
SET_FIELD(bd_data, CORE_TX_BD_DATA_IP_CSUM, !!(pkt->enable_ip_cksum));
SET_FIELD(bd_data, CORE_TX_BD_DATA_L4_CSUM, !!(pkt->enable_l4_cksum));
SET_FIELD(bd_data, CORE_TX_BD_DATA_IP_LEN, !!(pkt->calc_ip_len));
+ SET_FIELD(bd_data, CORE_TX_BD_DATA_DISABLE_STAG_INSERTION,
+ !!(pkt->remove_stag));
+
start_bd->bd_data.as_bitfield = cpu_to_le16(bd_data);
DMA_REGPAIR_LE(start_bd->addr, pkt->first_frag);
start_bd->nbytes = cpu_to_le16(pkt->first_frag_len);
@@ -1933,11 +1946,12 @@
qed_ooo_release_all_isles(p_hwfn, p_hwfn->p_ooo_info);
if (p_ll2_conn->input.conn_type == QED_LL2_TYPE_FCOE) {
+ if (!test_bit(QED_MF_UFP_SPECIFIC, &p_hwfn->cdev->mf_bits))
+ qed_llh_remove_protocol_filter(p_hwfn, p_ptt,
+ ETH_P_FCOE, 0,
+ QED_LLH_FILTER_ETHERTYPE);
qed_llh_remove_protocol_filter(p_hwfn, p_ptt,
- 0x8906, 0,
- QED_LLH_FILTER_ETHERTYPE);
- qed_llh_remove_protocol_filter(p_hwfn, p_ptt,
- 0x8914, 0,
+ ETH_P_FIP, 0,
QED_LLH_FILTER_ETHERTYPE);
}
@@ -2399,7 +2413,8 @@
return -EINVAL;
}
-static int qed_ll2_start_xmit(struct qed_dev *cdev, struct sk_buff *skb)
+static int qed_ll2_start_xmit(struct qed_dev *cdev, struct sk_buff *skb,
+ unsigned long xmit_flags)
{
struct qed_ll2_tx_pkt_info pkt;
const skb_frag_t *frag;
@@ -2444,6 +2459,9 @@
pkt.first_frag = mapping;
pkt.first_frag_len = skb->len;
pkt.cookie = skb;
+ if (test_bit(QED_MF_UFP_SPECIFIC, &cdev->mf_bits) &&
+ test_bit(QED_LL2_XMIT_FLAGS_FIP_DISCOVERY, &xmit_flags))
+ pkt.remove_stag = true;
rc = qed_ll2_prepare_tx_packet(&cdev->hwfns[0], cdev->ll2->handle,
&pkt, 1);
diff --git a/drivers/net/ethernet/qlogic/qed/qed_main.c b/drivers/net/ethernet/qlogic/qed/qed_main.c
index 7870ae2..68c4399 100644
--- a/drivers/net/ethernet/qlogic/qed/qed_main.c
+++ b/drivers/net/ethernet/qlogic/qed/qed_main.c
@@ -264,7 +264,6 @@
dev_info->pci_mem_end = cdev->pci_params.mem_end;
dev_info->pci_irq = cdev->pci_params.irq;
dev_info->rdma_supported = QED_IS_RDMA_PERSONALITY(p_hwfn);
- dev_info->is_mf_default = IS_MF_DEFAULT(&cdev->hwfns[0]);
dev_info->dev_type = cdev->type;
ether_addr_copy(dev_info->hw_mac, hw_info->hw_mac_addr);
@@ -273,7 +272,8 @@
dev_info->fw_minor = FW_MINOR_VERSION;
dev_info->fw_rev = FW_REVISION_VERSION;
dev_info->fw_eng = FW_ENGINEERING_VERSION;
- dev_info->mf_mode = cdev->mf_mode;
+ dev_info->b_inter_pf_switch = test_bit(QED_MF_INTER_PF_SWITCH,
+ &cdev->mf_bits);
dev_info->tx_switching = true;
if (hw_info->b_wol_support == QED_WOL_SUPPORT_PME)
@@ -946,6 +946,68 @@
}
}
+static void qed_slowpath_wq_stop(struct qed_dev *cdev)
+{
+ int i;
+
+ if (IS_VF(cdev))
+ return;
+
+ for_each_hwfn(cdev, i) {
+ if (!cdev->hwfns[i].slowpath_wq)
+ continue;
+
+ flush_workqueue(cdev->hwfns[i].slowpath_wq);
+ destroy_workqueue(cdev->hwfns[i].slowpath_wq);
+ }
+}
+
+static void qed_slowpath_task(struct work_struct *work)
+{
+ struct qed_hwfn *hwfn = container_of(work, struct qed_hwfn,
+ slowpath_task.work);
+ struct qed_ptt *ptt = qed_ptt_acquire(hwfn);
+
+ if (!ptt) {
+ queue_delayed_work(hwfn->slowpath_wq, &hwfn->slowpath_task, 0);
+ return;
+ }
+
+ if (test_and_clear_bit(QED_SLOWPATH_MFW_TLV_REQ,
+ &hwfn->slowpath_task_flags))
+ qed_mfw_process_tlv_req(hwfn, ptt);
+
+ qed_ptt_release(hwfn, ptt);
+}
+
+static int qed_slowpath_wq_start(struct qed_dev *cdev)
+{
+ struct qed_hwfn *hwfn;
+ char name[NAME_SIZE];
+ int i;
+
+ if (IS_VF(cdev))
+ return 0;
+
+ for_each_hwfn(cdev, i) {
+ hwfn = &cdev->hwfns[i];
+
+ snprintf(name, NAME_SIZE, "slowpath-%02x:%02x.%02x",
+ cdev->pdev->bus->number,
+ PCI_SLOT(cdev->pdev->devfn), hwfn->abs_pf_id);
+
+ hwfn->slowpath_wq = alloc_workqueue(name, 0, 0);
+ if (!hwfn->slowpath_wq) {
+ DP_NOTICE(hwfn, "Cannot create slowpath workqueue\n");
+ return -ENOMEM;
+ }
+
+ INIT_DELAYED_WORK(&hwfn->slowpath_task, qed_slowpath_task);
+ }
+
+ return 0;
+}
+
static int qed_slowpath_start(struct qed_dev *cdev,
struct qed_slowpath_params *params)
{
@@ -961,6 +1023,9 @@
if (qed_iov_wq_start(cdev))
goto err;
+ if (qed_slowpath_wq_start(cdev))
+ goto err;
+
if (IS_PF(cdev)) {
rc = request_firmware(&cdev->firmware, QED_FW_FILE_NAME,
&cdev->pdev->dev);
@@ -1095,6 +1160,8 @@
qed_iov_wq_stop(cdev, false);
+ qed_slowpath_wq_stop(cdev);
+
return rc;
}
@@ -1103,6 +1170,8 @@
if (!cdev)
return -ENODEV;
+ qed_slowpath_wq_stop(cdev);
+
qed_ll2_dealloc_if(cdev);
if (IS_PF(cdev)) {
@@ -1894,15 +1963,8 @@
u8 *buf, u16 len)
{
struct qed_hwfn *hwfn = QED_LEADING_HWFN(cdev);
- struct qed_ptt *ptt = qed_ptt_acquire(hwfn);
- int rc;
- if (!ptt)
- return -EAGAIN;
-
- rc = qed_mcp_get_nvm_image(hwfn, ptt, type, buf, len);
- qed_ptt_release(hwfn, ptt);
- return rc;
+ return qed_mcp_get_nvm_image(hwfn, type, buf, len);
}
static int qed_set_coalesce(struct qed_dev *cdev, u16 rx_coal, u16 tx_coal,
@@ -2095,3 +2157,89 @@
return;
}
}
+
+int qed_mfw_tlv_req(struct qed_hwfn *hwfn)
+{
+ DP_VERBOSE(hwfn->cdev, NETIF_MSG_DRV,
+ "Scheduling slowpath task [Flag: %d]\n",
+ QED_SLOWPATH_MFW_TLV_REQ);
+ smp_mb__before_atomic();
+ set_bit(QED_SLOWPATH_MFW_TLV_REQ, &hwfn->slowpath_task_flags);
+ smp_mb__after_atomic();
+ queue_delayed_work(hwfn->slowpath_wq, &hwfn->slowpath_task, 0);
+
+ return 0;
+}
+
+static void
+qed_fill_generic_tlv_data(struct qed_dev *cdev, struct qed_mfw_tlv_generic *tlv)
+{
+ struct qed_common_cb_ops *op = cdev->protocol_ops.common;
+ struct qed_eth_stats_common *p_common;
+ struct qed_generic_tlvs gen_tlvs;
+ struct qed_eth_stats stats;
+ int i;
+
+ memset(&gen_tlvs, 0, sizeof(gen_tlvs));
+ op->get_generic_tlv_data(cdev->ops_cookie, &gen_tlvs);
+
+ if (gen_tlvs.feat_flags & QED_TLV_IP_CSUM)
+ tlv->flags.ipv4_csum_offload = true;
+ if (gen_tlvs.feat_flags & QED_TLV_LSO)
+ tlv->flags.lso_supported = true;
+ tlv->flags.b_set = true;
+
+ for (i = 0; i < QED_TLV_MAC_COUNT; i++) {
+ if (is_valid_ether_addr(gen_tlvs.mac[i])) {
+ ether_addr_copy(tlv->mac[i], gen_tlvs.mac[i]);
+ tlv->mac_set[i] = true;
+ }
+ }
+
+ qed_get_vport_stats(cdev, &stats);
+ p_common = &stats.common;
+ tlv->rx_frames = p_common->rx_ucast_pkts + p_common->rx_mcast_pkts +
+ p_common->rx_bcast_pkts;
+ tlv->rx_frames_set = true;
+ tlv->rx_bytes = p_common->rx_ucast_bytes + p_common->rx_mcast_bytes +
+ p_common->rx_bcast_bytes;
+ tlv->rx_bytes_set = true;
+ tlv->tx_frames = p_common->tx_ucast_pkts + p_common->tx_mcast_pkts +
+ p_common->tx_bcast_pkts;
+ tlv->tx_frames_set = true;
+ tlv->tx_bytes = p_common->tx_ucast_bytes + p_common->tx_mcast_bytes +
+ p_common->tx_bcast_bytes;
+ tlv->rx_bytes_set = true;
+}
+
+int qed_mfw_fill_tlv_data(struct qed_hwfn *hwfn, enum qed_mfw_tlv_type type,
+ union qed_mfw_tlv_data *tlv_buf)
+{
+ struct qed_dev *cdev = hwfn->cdev;
+ struct qed_common_cb_ops *ops;
+
+ ops = cdev->protocol_ops.common;
+ if (!ops || !ops->get_protocol_tlv_data || !ops->get_generic_tlv_data) {
+ DP_NOTICE(hwfn, "Can't collect TLV management info\n");
+ return -EINVAL;
+ }
+
+ switch (type) {
+ case QED_MFW_TLV_GENERIC:
+ qed_fill_generic_tlv_data(hwfn->cdev, &tlv_buf->generic);
+ break;
+ case QED_MFW_TLV_ETH:
+ ops->get_protocol_tlv_data(cdev->ops_cookie, &tlv_buf->eth);
+ break;
+ case QED_MFW_TLV_FCOE:
+ ops->get_protocol_tlv_data(cdev->ops_cookie, &tlv_buf->fcoe);
+ break;
+ case QED_MFW_TLV_ISCSI:
+ ops->get_protocol_tlv_data(cdev->ops_cookie, &tlv_buf->iscsi);
+ break;
+ default:
+ break;
+ }
+
+ return 0;
+}
diff --git a/drivers/net/ethernet/qlogic/qed/qed_mcp.c b/drivers/net/ethernet/qlogic/qed/qed_mcp.c
index ec0d425..2612e3e 100644
--- a/drivers/net/ethernet/qlogic/qed/qed_mcp.c
+++ b/drivers/net/ethernet/qlogic/qed/qed_mcp.c
@@ -40,6 +40,7 @@
#include <linux/string.h>
#include <linux/etherdevice.h>
#include "qed.h"
+#include "qed_cxt.h"
#include "qed_dcbx.h"
#include "qed_hsi.h"
#include "qed_hw.h"
@@ -1486,6 +1487,80 @@
&resp, ¶m);
}
+void qed_mcp_read_ufp_config(struct qed_hwfn *p_hwfn, struct qed_ptt *p_ptt)
+{
+ struct public_func shmem_info;
+ u32 port_cfg, val;
+
+ if (!test_bit(QED_MF_UFP_SPECIFIC, &p_hwfn->cdev->mf_bits))
+ return;
+
+ memset(&p_hwfn->ufp_info, 0, sizeof(p_hwfn->ufp_info));
+ port_cfg = qed_rd(p_hwfn, p_ptt, p_hwfn->mcp_info->port_addr +
+ offsetof(struct public_port, oem_cfg_port));
+ val = (port_cfg & OEM_CFG_CHANNEL_TYPE_MASK) >>
+ OEM_CFG_CHANNEL_TYPE_OFFSET;
+ if (val != OEM_CFG_CHANNEL_TYPE_STAGGED)
+ DP_NOTICE(p_hwfn, "Incorrect UFP Channel type %d\n", val);
+
+ val = (port_cfg & OEM_CFG_SCHED_TYPE_MASK) >> OEM_CFG_SCHED_TYPE_OFFSET;
+ if (val == OEM_CFG_SCHED_TYPE_ETS) {
+ p_hwfn->ufp_info.mode = QED_UFP_MODE_ETS;
+ } else if (val == OEM_CFG_SCHED_TYPE_VNIC_BW) {
+ p_hwfn->ufp_info.mode = QED_UFP_MODE_VNIC_BW;
+ } else {
+ p_hwfn->ufp_info.mode = QED_UFP_MODE_UNKNOWN;
+ DP_NOTICE(p_hwfn, "Unknown UFP scheduling mode %d\n", val);
+ }
+
+ qed_mcp_get_shmem_func(p_hwfn, p_ptt, &shmem_info, MCP_PF_ID(p_hwfn));
+ val = (port_cfg & OEM_CFG_FUNC_TC_MASK) >> OEM_CFG_FUNC_TC_OFFSET;
+ p_hwfn->ufp_info.tc = (u8)val;
+ val = (port_cfg & OEM_CFG_FUNC_HOST_PRI_CTRL_MASK) >>
+ OEM_CFG_FUNC_HOST_PRI_CTRL_OFFSET;
+ if (val == OEM_CFG_FUNC_HOST_PRI_CTRL_VNIC) {
+ p_hwfn->ufp_info.pri_type = QED_UFP_PRI_VNIC;
+ } else if (val == OEM_CFG_FUNC_HOST_PRI_CTRL_OS) {
+ p_hwfn->ufp_info.pri_type = QED_UFP_PRI_OS;
+ } else {
+ p_hwfn->ufp_info.pri_type = QED_UFP_PRI_UNKNOWN;
+ DP_NOTICE(p_hwfn, "Unknown Host priority control %d\n", val);
+ }
+
+ DP_NOTICE(p_hwfn,
+ "UFP shmem config: mode = %d tc = %d pri_type = %d\n",
+ p_hwfn->ufp_info.mode,
+ p_hwfn->ufp_info.tc, p_hwfn->ufp_info.pri_type);
+}
+
+static int
+qed_mcp_handle_ufp_event(struct qed_hwfn *p_hwfn, struct qed_ptt *p_ptt)
+{
+ qed_mcp_read_ufp_config(p_hwfn, p_ptt);
+
+ if (p_hwfn->ufp_info.mode == QED_UFP_MODE_VNIC_BW) {
+ p_hwfn->qm_info.ooo_tc = p_hwfn->ufp_info.tc;
+ p_hwfn->hw_info.offload_tc = p_hwfn->ufp_info.tc;
+
+ qed_qm_reconf(p_hwfn, p_ptt);
+ } else if (p_hwfn->ufp_info.mode == QED_UFP_MODE_ETS) {
+ /* Merge UFP TC with the dcbx TC data */
+ qed_dcbx_mib_update_event(p_hwfn, p_ptt,
+ QED_DCBX_OPERATIONAL_MIB);
+ } else {
+ DP_ERR(p_hwfn, "Invalid sched type, discard the UFP config\n");
+ return -EINVAL;
+ }
+
+ /* update storm FW with negotiation results */
+ qed_sp_pf_update_ufp(p_hwfn);
+
+ /* update stag pcp value */
+ qed_sp_pf_update_stag(p_hwfn);
+
+ return 0;
+}
+
int qed_mcp_handle_events(struct qed_hwfn *p_hwfn,
struct qed_ptt *p_ptt)
{
@@ -1529,6 +1604,9 @@
qed_dcbx_mib_update_event(p_hwfn, p_ptt,
QED_DCBX_OPERATIONAL_MIB);
break;
+ case MFW_DRV_MSG_OEM_CFG_UPDATE:
+ qed_mcp_handle_ufp_event(p_hwfn, p_ptt);
+ break;
case MFW_DRV_MSG_TRANSCEIVER_STATE_CHANGE:
qed_mcp_handle_transceiver_change(p_hwfn, p_ptt);
break;
@@ -1544,6 +1622,8 @@
case MFW_DRV_MSG_S_TAG_UPDATE:
qed_mcp_update_stag(p_hwfn, p_ptt);
break;
+ case MFW_DRV_MSG_GET_TLV_REQ:
+ qed_mfw_tlv_req(p_hwfn);
break;
default:
DP_INFO(p_hwfn, "Unimplemented MFW message %d\n", i);
@@ -2529,9 +2609,8 @@
return rc;
}
-static int
+int
qed_mcp_get_nvm_image_att(struct qed_hwfn *p_hwfn,
- struct qed_ptt *p_ptt,
enum qed_nvm_images image_id,
struct qed_nvm_image_att *p_image_att)
{
@@ -2546,6 +2625,15 @@
case QED_NVM_IMAGE_FCOE_CFG:
type = NVM_TYPE_FCOE_CFG;
break;
+ case QED_NVM_IMAGE_NVM_CFG1:
+ type = NVM_TYPE_NVM_CFG1;
+ break;
+ case QED_NVM_IMAGE_DEFAULT_CFG:
+ type = NVM_TYPE_DEFAULT_CFG;
+ break;
+ case QED_NVM_IMAGE_NVM_META:
+ type = NVM_TYPE_META;
+ break;
default:
DP_NOTICE(p_hwfn, "Unknown request of image_id %08x\n",
image_id);
@@ -2569,7 +2657,6 @@
}
int qed_mcp_get_nvm_image(struct qed_hwfn *p_hwfn,
- struct qed_ptt *p_ptt,
enum qed_nvm_images image_id,
u8 *p_buffer, u32 buffer_len)
{
@@ -2578,7 +2665,7 @@
memset(p_buffer, 0, buffer_len);
- rc = qed_mcp_get_nvm_image_att(p_hwfn, p_ptt, image_id, &image_att);
+ rc = qed_mcp_get_nvm_image_att(p_hwfn, image_id, &image_att);
if (rc)
return rc;
@@ -2590,9 +2677,6 @@
return -EINVAL;
}
- /* Each NVM image is suffixed by CRC; Upper-layer has no need for it */
- image_att.length -= 4;
-
if (image_att.length > buffer_len) {
DP_VERBOSE(p_hwfn,
QED_MSG_STORAGE,
diff --git a/drivers/net/ethernet/qlogic/qed/qed_mcp.h b/drivers/net/ethernet/qlogic/qed/qed_mcp.h
index 8a5c988..632a838 100644
--- a/drivers/net/ethernet/qlogic/qed/qed_mcp.h
+++ b/drivers/net/ethernet/qlogic/qed/qed_mcp.h
@@ -213,6 +213,44 @@
QED_OV_WOL_ENABLED
};
+enum qed_mfw_tlv_type {
+ QED_MFW_TLV_GENERIC = 0x1, /* Core driver TLVs */
+ QED_MFW_TLV_ETH = 0x2, /* L2 driver TLVs */
+ QED_MFW_TLV_FCOE = 0x4, /* FCoE protocol TLVs */
+ QED_MFW_TLV_ISCSI = 0x8, /* SCSI protocol TLVs */
+ QED_MFW_TLV_MAX = 0x16,
+};
+
+struct qed_mfw_tlv_generic {
+#define QED_MFW_TLV_FLAGS_SIZE 2
+ struct {
+ u8 ipv4_csum_offload;
+ u8 lso_supported;
+ bool b_set;
+ } flags;
+
+#define QED_MFW_TLV_MAC_COUNT 3
+ /* First entry for primary MAC, 2 secondary MACs possible */
+ u8 mac[QED_MFW_TLV_MAC_COUNT][6];
+ bool mac_set[QED_MFW_TLV_MAC_COUNT];
+
+ u64 rx_frames;
+ bool rx_frames_set;
+ u64 rx_bytes;
+ bool rx_bytes_set;
+ u64 tx_frames;
+ bool tx_frames_set;
+ u64 tx_bytes;
+ bool tx_bytes_set;
+};
+
+union qed_mfw_tlv_data {
+ struct qed_mfw_tlv_generic generic;
+ struct qed_mfw_tlv_eth eth;
+ struct qed_mfw_tlv_fcoe fcoe;
+ struct qed_mfw_tlv_iscsi iscsi;
+};
+
/**
* @brief - returns the link params of the hw function
*
@@ -486,7 +524,20 @@
* @brief Allows reading a whole nvram image
*
* @param p_hwfn
- * @param p_ptt
+ * @param image_id - image to get attributes for
+ * @param p_image_att - image attributes structure into which to fill data
+ *
+ * @return int - 0 - operation was successful.
+ */
+int
+qed_mcp_get_nvm_image_att(struct qed_hwfn *p_hwfn,
+ enum qed_nvm_images image_id,
+ struct qed_nvm_image_att *p_image_att);
+
+/**
+ * @brief Allows reading a whole nvram image
+ *
+ * @param p_hwfn
* @param image_id - image requested for reading
* @param p_buffer - allocated buffer into which to fill data
* @param buffer_len - length of the allocated buffer.
@@ -494,7 +545,6 @@
* @return 0 iff p_buffer now contains the nvram image.
*/
int qed_mcp_get_nvm_image(struct qed_hwfn *p_hwfn,
- struct qed_ptt *p_ptt,
enum qed_nvm_images image_id,
u8 *p_buffer, u32 buffer_len);
@@ -549,6 +599,17 @@
struct bist_nvm_image_att *p_image_att,
u32 image_index);
+/**
+ * @brief - Processes the TLV request from MFW i.e., get the required TLV info
+ * from the qed client and send it to the MFW.
+ *
+ * @param p_hwfn
+ * @param p_ptt
+ *
+ * @param return 0 upon success.
+ */
+int qed_mfw_process_tlv_req(struct qed_hwfn *p_hwfn, struct qed_ptt *p_ptt);
+
/* Using hwfn number (and not pf_num) is required since in CMT mode,
* same pf_num may be used by two different hwfn
* TODO - this shouldn't really be in .h file, but until all fields
@@ -609,6 +670,14 @@
u32 mcp_param;
};
+struct qed_drv_tlv_hdr {
+ u8 tlv_type;
+ u8 tlv_length; /* In dwords - not including this header */
+ u8 tlv_reserved;
+#define QED_DRV_TLV_FLAGS_CHANGED 0x01
+ u8 tlv_flags;
+};
+
/**
* @brief Initialize the interface with the MCP
*
@@ -993,6 +1062,14 @@
int qed_mcp_set_capabilities(struct qed_hwfn *p_hwfn, struct qed_ptt *p_ptt);
/**
+ * @brief Read ufp config from the shared memory.
+ *
+ * @param p_hwfn
+ * @param p_ptt
+ */
+void qed_mcp_read_ufp_config(struct qed_hwfn *p_hwfn, struct qed_ptt *p_ptt);
+
+/**
* @brief Populate the nvm info shadow in the given hardware function
*
* @param p_hwfn
diff --git a/drivers/net/ethernet/qlogic/qed/qed_mng_tlv.c b/drivers/net/ethernet/qlogic/qed/qed_mng_tlv.c
new file mode 100644
index 0000000..6c16158
--- /dev/null
+++ b/drivers/net/ethernet/qlogic/qed/qed_mng_tlv.c
@@ -0,0 +1,1337 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <linux/types.h>
+#include <asm/byteorder.h>
+#include <linux/bug.h>
+#include <linux/errno.h>
+#include <linux/kernel.h>
+#include <linux/slab.h>
+#include <linux/string.h>
+#include <linux/vmalloc.h>
+#include "qed.h"
+#include "qed_hw.h"
+#include "qed_mcp.h"
+#include "qed_reg_addr.h"
+
+#define TLV_TYPE(p) (p[0])
+#define TLV_LENGTH(p) (p[1])
+#define TLV_FLAGS(p) (p[3])
+
+#define QED_TLV_DATA_MAX (14)
+struct qed_tlv_parsed_buf {
+ /* To be filled with the address to set in Value field */
+ void *p_val;
+
+ /* To be used internally in case the value has to be modified */
+ u8 data[QED_TLV_DATA_MAX];
+};
+
+static int qed_mfw_get_tlv_group(u8 tlv_type, u8 *tlv_group)
+{
+ switch (tlv_type) {
+ case DRV_TLV_FEATURE_FLAGS:
+ case DRV_TLV_LOCAL_ADMIN_ADDR:
+ case DRV_TLV_ADDITIONAL_MAC_ADDR_1:
+ case DRV_TLV_ADDITIONAL_MAC_ADDR_2:
+ case DRV_TLV_OS_DRIVER_STATES:
+ case DRV_TLV_PXE_BOOT_PROGRESS:
+ case DRV_TLV_RX_FRAMES_RECEIVED:
+ case DRV_TLV_RX_BYTES_RECEIVED:
+ case DRV_TLV_TX_FRAMES_SENT:
+ case DRV_TLV_TX_BYTES_SENT:
+ case DRV_TLV_NPIV_ENABLED:
+ case DRV_TLV_PCIE_BUS_RX_UTILIZATION:
+ case DRV_TLV_PCIE_BUS_TX_UTILIZATION:
+ case DRV_TLV_DEVICE_CPU_CORES_UTILIZATION:
+ case DRV_TLV_LAST_VALID_DCC_TLV_RECEIVED:
+ case DRV_TLV_NCSI_RX_BYTES_RECEIVED:
+ case DRV_TLV_NCSI_TX_BYTES_SENT:
+ *tlv_group |= QED_MFW_TLV_GENERIC;
+ break;
+ case DRV_TLV_LSO_MAX_OFFLOAD_SIZE:
+ case DRV_TLV_LSO_MIN_SEGMENT_COUNT:
+ case DRV_TLV_PROMISCUOUS_MODE:
+ case DRV_TLV_TX_DESCRIPTORS_QUEUE_SIZE:
+ case DRV_TLV_RX_DESCRIPTORS_QUEUE_SIZE:
+ case DRV_TLV_NUM_OF_NET_QUEUE_VMQ_CFG:
+ case DRV_TLV_NUM_OFFLOADED_CONNECTIONS_TCP_IPV4:
+ case DRV_TLV_NUM_OFFLOADED_CONNECTIONS_TCP_IPV6:
+ case DRV_TLV_TX_DESCRIPTOR_QUEUE_AVG_DEPTH:
+ case DRV_TLV_RX_DESCRIPTORS_QUEUE_AVG_DEPTH:
+ case DRV_TLV_IOV_OFFLOAD:
+ case DRV_TLV_TX_QUEUES_EMPTY:
+ case DRV_TLV_RX_QUEUES_EMPTY:
+ case DRV_TLV_TX_QUEUES_FULL:
+ case DRV_TLV_RX_QUEUES_FULL:
+ *tlv_group |= QED_MFW_TLV_ETH;
+ break;
+ case DRV_TLV_SCSI_TO:
+ case DRV_TLV_R_T_TOV:
+ case DRV_TLV_R_A_TOV:
+ case DRV_TLV_E_D_TOV:
+ case DRV_TLV_CR_TOV:
+ case DRV_TLV_BOOT_TYPE:
+ case DRV_TLV_NPIV_STATE:
+ case DRV_TLV_NUM_OF_NPIV_IDS:
+ case DRV_TLV_SWITCH_NAME:
+ case DRV_TLV_SWITCH_PORT_NUM:
+ case DRV_TLV_SWITCH_PORT_ID:
+ case DRV_TLV_VENDOR_NAME:
+ case DRV_TLV_SWITCH_MODEL:
+ case DRV_TLV_SWITCH_FW_VER:
+ case DRV_TLV_QOS_PRIORITY_PER_802_1P:
+ case DRV_TLV_PORT_ALIAS:
+ case DRV_TLV_PORT_STATE:
+ case DRV_TLV_FIP_TX_DESCRIPTORS_QUEUE_SIZE:
+ case DRV_TLV_FCOE_RX_DESCRIPTORS_QUEUE_SIZE:
+ case DRV_TLV_LINK_FAILURE_COUNT:
+ case DRV_TLV_FCOE_BOOT_PROGRESS:
+ case DRV_TLV_RX_BROADCAST_PACKETS:
+ case DRV_TLV_TX_BROADCAST_PACKETS:
+ case DRV_TLV_FCOE_TX_DESCRIPTOR_QUEUE_AVG_DEPTH:
+ case DRV_TLV_FCOE_RX_DESCRIPTORS_QUEUE_AVG_DEPTH:
+ case DRV_TLV_FCOE_RX_FRAMES_RECEIVED:
+ case DRV_TLV_FCOE_RX_BYTES_RECEIVED:
+ case DRV_TLV_FCOE_TX_FRAMES_SENT:
+ case DRV_TLV_FCOE_TX_BYTES_SENT:
+ case DRV_TLV_CRC_ERROR_COUNT:
+ case DRV_TLV_CRC_ERROR_1_RECEIVED_SOURCE_FC_ID:
+ case DRV_TLV_CRC_ERROR_1_TIMESTAMP:
+ case DRV_TLV_CRC_ERROR_2_RECEIVED_SOURCE_FC_ID:
+ case DRV_TLV_CRC_ERROR_2_TIMESTAMP:
+ case DRV_TLV_CRC_ERROR_3_RECEIVED_SOURCE_FC_ID:
+ case DRV_TLV_CRC_ERROR_3_TIMESTAMP:
+ case DRV_TLV_CRC_ERROR_4_RECEIVED_SOURCE_FC_ID:
+ case DRV_TLV_CRC_ERROR_4_TIMESTAMP:
+ case DRV_TLV_CRC_ERROR_5_RECEIVED_SOURCE_FC_ID:
+ case DRV_TLV_CRC_ERROR_5_TIMESTAMP:
+ case DRV_TLV_LOSS_OF_SYNC_ERROR_COUNT:
+ case DRV_TLV_LOSS_OF_SIGNAL_ERRORS:
+ case DRV_TLV_PRIMITIVE_SEQUENCE_PROTOCOL_ERROR_COUNT:
+ case DRV_TLV_DISPARITY_ERROR_COUNT:
+ case DRV_TLV_CODE_VIOLATION_ERROR_COUNT:
+ case DRV_TLV_LAST_FLOGI_ISSUED_COMMON_PARAMETERS_WORD_1:
+ case DRV_TLV_LAST_FLOGI_ISSUED_COMMON_PARAMETERS_WORD_2:
+ case DRV_TLV_LAST_FLOGI_ISSUED_COMMON_PARAMETERS_WORD_3:
+ case DRV_TLV_LAST_FLOGI_ISSUED_COMMON_PARAMETERS_WORD_4:
+ case DRV_TLV_LAST_FLOGI_TIMESTAMP:
+ case DRV_TLV_LAST_FLOGI_ACC_COMMON_PARAMETERS_WORD_1:
+ case DRV_TLV_LAST_FLOGI_ACC_COMMON_PARAMETERS_WORD_2:
+ case DRV_TLV_LAST_FLOGI_ACC_COMMON_PARAMETERS_WORD_3:
+ case DRV_TLV_LAST_FLOGI_ACC_COMMON_PARAMETERS_WORD_4:
+ case DRV_TLV_LAST_FLOGI_ACC_TIMESTAMP:
+ case DRV_TLV_LAST_FLOGI_RJT:
+ case DRV_TLV_LAST_FLOGI_RJT_TIMESTAMP:
+ case DRV_TLV_FDISCS_SENT_COUNT:
+ case DRV_TLV_FDISC_ACCS_RECEIVED:
+ case DRV_TLV_FDISC_RJTS_RECEIVED:
+ case DRV_TLV_PLOGI_SENT_COUNT:
+ case DRV_TLV_PLOGI_ACCS_RECEIVED:
+ case DRV_TLV_PLOGI_RJTS_RECEIVED:
+ case DRV_TLV_PLOGI_1_SENT_DESTINATION_FC_ID:
+ case DRV_TLV_PLOGI_1_TIMESTAMP:
+ case DRV_TLV_PLOGI_2_SENT_DESTINATION_FC_ID:
+ case DRV_TLV_PLOGI_2_TIMESTAMP:
+ case DRV_TLV_PLOGI_3_SENT_DESTINATION_FC_ID:
+ case DRV_TLV_PLOGI_3_TIMESTAMP:
+ case DRV_TLV_PLOGI_4_SENT_DESTINATION_FC_ID:
+ case DRV_TLV_PLOGI_4_TIMESTAMP:
+ case DRV_TLV_PLOGI_5_SENT_DESTINATION_FC_ID:
+ case DRV_TLV_PLOGI_5_TIMESTAMP:
+ case DRV_TLV_PLOGI_1_ACC_RECEIVED_SOURCE_FC_ID:
+ case DRV_TLV_PLOGI_1_ACC_TIMESTAMP:
+ case DRV_TLV_PLOGI_2_ACC_RECEIVED_SOURCE_FC_ID:
+ case DRV_TLV_PLOGI_2_ACC_TIMESTAMP:
+ case DRV_TLV_PLOGI_3_ACC_RECEIVED_SOURCE_FC_ID:
+ case DRV_TLV_PLOGI_3_ACC_TIMESTAMP:
+ case DRV_TLV_PLOGI_4_ACC_RECEIVED_SOURCE_FC_ID:
+ case DRV_TLV_PLOGI_4_ACC_TIMESTAMP:
+ case DRV_TLV_PLOGI_5_ACC_RECEIVED_SOURCE_FC_ID:
+ case DRV_TLV_PLOGI_5_ACC_TIMESTAMP:
+ case DRV_TLV_LOGOS_ISSUED:
+ case DRV_TLV_LOGO_ACCS_RECEIVED:
+ case DRV_TLV_LOGO_RJTS_RECEIVED:
+ case DRV_TLV_LOGO_1_RECEIVED_SOURCE_FC_ID:
+ case DRV_TLV_LOGO_1_TIMESTAMP:
+ case DRV_TLV_LOGO_2_RECEIVED_SOURCE_FC_ID:
+ case DRV_TLV_LOGO_2_TIMESTAMP:
+ case DRV_TLV_LOGO_3_RECEIVED_SOURCE_FC_ID:
+ case DRV_TLV_LOGO_3_TIMESTAMP:
+ case DRV_TLV_LOGO_4_RECEIVED_SOURCE_FC_ID:
+ case DRV_TLV_LOGO_4_TIMESTAMP:
+ case DRV_TLV_LOGO_5_RECEIVED_SOURCE_FC_ID:
+ case DRV_TLV_LOGO_5_TIMESTAMP:
+ case DRV_TLV_LOGOS_RECEIVED:
+ case DRV_TLV_ACCS_ISSUED:
+ case DRV_TLV_PRLIS_ISSUED:
+ case DRV_TLV_ACCS_RECEIVED:
+ case DRV_TLV_ABTS_SENT_COUNT:
+ case DRV_TLV_ABTS_ACCS_RECEIVED:
+ case DRV_TLV_ABTS_RJTS_RECEIVED:
+ case DRV_TLV_ABTS_1_SENT_DESTINATION_FC_ID:
+ case DRV_TLV_ABTS_1_TIMESTAMP:
+ case DRV_TLV_ABTS_2_SENT_DESTINATION_FC_ID:
+ case DRV_TLV_ABTS_2_TIMESTAMP:
+ case DRV_TLV_ABTS_3_SENT_DESTINATION_FC_ID:
+ case DRV_TLV_ABTS_3_TIMESTAMP:
+ case DRV_TLV_ABTS_4_SENT_DESTINATION_FC_ID:
+ case DRV_TLV_ABTS_4_TIMESTAMP:
+ case DRV_TLV_ABTS_5_SENT_DESTINATION_FC_ID:
+ case DRV_TLV_ABTS_5_TIMESTAMP:
+ case DRV_TLV_RSCNS_RECEIVED:
+ case DRV_TLV_LAST_RSCN_RECEIVED_N_PORT_1:
+ case DRV_TLV_LAST_RSCN_RECEIVED_N_PORT_2:
+ case DRV_TLV_LAST_RSCN_RECEIVED_N_PORT_3:
+ case DRV_TLV_LAST_RSCN_RECEIVED_N_PORT_4:
+ case DRV_TLV_LUN_RESETS_ISSUED:
+ case DRV_TLV_ABORT_TASK_SETS_ISSUED:
+ case DRV_TLV_TPRLOS_SENT:
+ case DRV_TLV_NOS_SENT_COUNT:
+ case DRV_TLV_NOS_RECEIVED_COUNT:
+ case DRV_TLV_OLS_COUNT:
+ case DRV_TLV_LR_COUNT:
+ case DRV_TLV_LRR_COUNT:
+ case DRV_TLV_LIP_SENT_COUNT:
+ case DRV_TLV_LIP_RECEIVED_COUNT:
+ case DRV_TLV_EOFA_COUNT:
+ case DRV_TLV_EOFNI_COUNT:
+ case DRV_TLV_SCSI_STATUS_CHECK_CONDITION_COUNT:
+ case DRV_TLV_SCSI_STATUS_CONDITION_MET_COUNT:
+ case DRV_TLV_SCSI_STATUS_BUSY_COUNT:
+ case DRV_TLV_SCSI_STATUS_INTERMEDIATE_COUNT:
+ case DRV_TLV_SCSI_STATUS_INTERMEDIATE_CONDITION_MET_COUNT:
+ case DRV_TLV_SCSI_STATUS_RESERVATION_CONFLICT_COUNT:
+ case DRV_TLV_SCSI_STATUS_TASK_SET_FULL_COUNT:
+ case DRV_TLV_SCSI_STATUS_ACA_ACTIVE_COUNT:
+ case DRV_TLV_SCSI_STATUS_TASK_ABORTED_COUNT:
+ case DRV_TLV_SCSI_CHECK_CONDITION_1_RECEIVED_SK_ASC_ASCQ:
+ case DRV_TLV_SCSI_CHECK_1_TIMESTAMP:
+ case DRV_TLV_SCSI_CHECK_CONDITION_2_RECEIVED_SK_ASC_ASCQ:
+ case DRV_TLV_SCSI_CHECK_2_TIMESTAMP:
+ case DRV_TLV_SCSI_CHECK_CONDITION_3_RECEIVED_SK_ASC_ASCQ:
+ case DRV_TLV_SCSI_CHECK_3_TIMESTAMP:
+ case DRV_TLV_SCSI_CHECK_CONDITION_4_RECEIVED_SK_ASC_ASCQ:
+ case DRV_TLV_SCSI_CHECK_4_TIMESTAMP:
+ case DRV_TLV_SCSI_CHECK_CONDITION_5_RECEIVED_SK_ASC_ASCQ:
+ case DRV_TLV_SCSI_CHECK_5_TIMESTAMP:
+ *tlv_group = QED_MFW_TLV_FCOE;
+ break;
+ case DRV_TLV_TARGET_LLMNR_ENABLED:
+ case DRV_TLV_HEADER_DIGEST_FLAG_ENABLED:
+ case DRV_TLV_DATA_DIGEST_FLAG_ENABLED:
+ case DRV_TLV_AUTHENTICATION_METHOD:
+ case DRV_TLV_ISCSI_BOOT_TARGET_PORTAL:
+ case DRV_TLV_MAX_FRAME_SIZE:
+ case DRV_TLV_PDU_TX_DESCRIPTORS_QUEUE_SIZE:
+ case DRV_TLV_PDU_RX_DESCRIPTORS_QUEUE_SIZE:
+ case DRV_TLV_ISCSI_BOOT_PROGRESS:
+ case DRV_TLV_PDU_TX_DESCRIPTOR_QUEUE_AVG_DEPTH:
+ case DRV_TLV_PDU_RX_DESCRIPTORS_QUEUE_AVG_DEPTH:
+ case DRV_TLV_ISCSI_PDU_RX_FRAMES_RECEIVED:
+ case DRV_TLV_ISCSI_PDU_RX_BYTES_RECEIVED:
+ case DRV_TLV_ISCSI_PDU_TX_FRAMES_SENT:
+ case DRV_TLV_ISCSI_PDU_TX_BYTES_SENT:
+ *tlv_group |= QED_MFW_TLV_ISCSI;
+ break;
+ default:
+ return -EINVAL;
+ }
+
+ return 0;
+}
+
+/* Returns size of the data buffer or, -1 in case TLV data is not available. */
+static int
+qed_mfw_get_gen_tlv_value(struct qed_drv_tlv_hdr *p_tlv,
+ struct qed_mfw_tlv_generic *p_drv_buf,
+ struct qed_tlv_parsed_buf *p_buf)
+{
+ switch (p_tlv->tlv_type) {
+ case DRV_TLV_FEATURE_FLAGS:
+ if (p_drv_buf->flags.b_set) {
+ memset(p_buf->data, 0, sizeof(u8) * QED_TLV_DATA_MAX);
+ p_buf->data[0] = p_drv_buf->flags.ipv4_csum_offload ?
+ 1 : 0;
+ p_buf->data[0] |= (p_drv_buf->flags.lso_supported ?
+ 1 : 0) << 1;
+ p_buf->p_val = p_buf->data;
+ return QED_MFW_TLV_FLAGS_SIZE;
+ }
+ break;
+
+ case DRV_TLV_LOCAL_ADMIN_ADDR:
+ case DRV_TLV_ADDITIONAL_MAC_ADDR_1:
+ case DRV_TLV_ADDITIONAL_MAC_ADDR_2:
+ {
+ int idx = p_tlv->tlv_type - DRV_TLV_LOCAL_ADMIN_ADDR;
+
+ if (p_drv_buf->mac_set[idx]) {
+ p_buf->p_val = p_drv_buf->mac[idx];
+ return ETH_ALEN;
+ }
+ break;
+ }
+
+ case DRV_TLV_RX_FRAMES_RECEIVED:
+ if (p_drv_buf->rx_frames_set) {
+ p_buf->p_val = &p_drv_buf->rx_frames;
+ return sizeof(p_drv_buf->rx_frames);
+ }
+ break;
+ case DRV_TLV_RX_BYTES_RECEIVED:
+ if (p_drv_buf->rx_bytes_set) {
+ p_buf->p_val = &p_drv_buf->rx_bytes;
+ return sizeof(p_drv_buf->rx_bytes);
+ }
+ break;
+ case DRV_TLV_TX_FRAMES_SENT:
+ if (p_drv_buf->tx_frames_set) {
+ p_buf->p_val = &p_drv_buf->tx_frames;
+ return sizeof(p_drv_buf->tx_frames);
+ }
+ break;
+ case DRV_TLV_TX_BYTES_SENT:
+ if (p_drv_buf->tx_bytes_set) {
+ p_buf->p_val = &p_drv_buf->tx_bytes;
+ return sizeof(p_drv_buf->tx_bytes);
+ }
+ break;
+ default:
+ break;
+ }
+
+ return -1;
+}
+
+static int
+qed_mfw_get_eth_tlv_value(struct qed_drv_tlv_hdr *p_tlv,
+ struct qed_mfw_tlv_eth *p_drv_buf,
+ struct qed_tlv_parsed_buf *p_buf)
+{
+ switch (p_tlv->tlv_type) {
+ case DRV_TLV_LSO_MAX_OFFLOAD_SIZE:
+ if (p_drv_buf->lso_maxoff_size_set) {
+ p_buf->p_val = &p_drv_buf->lso_maxoff_size;
+ return sizeof(p_drv_buf->lso_maxoff_size);
+ }
+ break;
+ case DRV_TLV_LSO_MIN_SEGMENT_COUNT:
+ if (p_drv_buf->lso_minseg_size_set) {
+ p_buf->p_val = &p_drv_buf->lso_minseg_size;
+ return sizeof(p_drv_buf->lso_minseg_size);
+ }
+ break;
+ case DRV_TLV_PROMISCUOUS_MODE:
+ if (p_drv_buf->prom_mode_set) {
+ p_buf->p_val = &p_drv_buf->prom_mode;
+ return sizeof(p_drv_buf->prom_mode);
+ }
+ break;
+ case DRV_TLV_TX_DESCRIPTORS_QUEUE_SIZE:
+ if (p_drv_buf->tx_descr_size_set) {
+ p_buf->p_val = &p_drv_buf->tx_descr_size;
+ return sizeof(p_drv_buf->tx_descr_size);
+ }
+ break;
+ case DRV_TLV_RX_DESCRIPTORS_QUEUE_SIZE:
+ if (p_drv_buf->rx_descr_size_set) {
+ p_buf->p_val = &p_drv_buf->rx_descr_size;
+ return sizeof(p_drv_buf->rx_descr_size);
+ }
+ break;
+ case DRV_TLV_NUM_OF_NET_QUEUE_VMQ_CFG:
+ if (p_drv_buf->netq_count_set) {
+ p_buf->p_val = &p_drv_buf->netq_count;
+ return sizeof(p_drv_buf->netq_count);
+ }
+ break;
+ case DRV_TLV_NUM_OFFLOADED_CONNECTIONS_TCP_IPV4:
+ if (p_drv_buf->tcp4_offloads_set) {
+ p_buf->p_val = &p_drv_buf->tcp4_offloads;
+ return sizeof(p_drv_buf->tcp4_offloads);
+ }
+ break;
+ case DRV_TLV_NUM_OFFLOADED_CONNECTIONS_TCP_IPV6:
+ if (p_drv_buf->tcp6_offloads_set) {
+ p_buf->p_val = &p_drv_buf->tcp6_offloads;
+ return sizeof(p_drv_buf->tcp6_offloads);
+ }
+ break;
+ case DRV_TLV_TX_DESCRIPTOR_QUEUE_AVG_DEPTH:
+ if (p_drv_buf->tx_descr_qdepth_set) {
+ p_buf->p_val = &p_drv_buf->tx_descr_qdepth;
+ return sizeof(p_drv_buf->tx_descr_qdepth);
+ }
+ break;
+ case DRV_TLV_RX_DESCRIPTORS_QUEUE_AVG_DEPTH:
+ if (p_drv_buf->rx_descr_qdepth_set) {
+ p_buf->p_val = &p_drv_buf->rx_descr_qdepth;
+ return sizeof(p_drv_buf->rx_descr_qdepth);
+ }
+ break;
+ case DRV_TLV_IOV_OFFLOAD:
+ if (p_drv_buf->iov_offload_set) {
+ p_buf->p_val = &p_drv_buf->iov_offload;
+ return sizeof(p_drv_buf->iov_offload);
+ }
+ break;
+ case DRV_TLV_TX_QUEUES_EMPTY:
+ if (p_drv_buf->txqs_empty_set) {
+ p_buf->p_val = &p_drv_buf->txqs_empty;
+ return sizeof(p_drv_buf->txqs_empty);
+ }
+ break;
+ case DRV_TLV_RX_QUEUES_EMPTY:
+ if (p_drv_buf->rxqs_empty_set) {
+ p_buf->p_val = &p_drv_buf->rxqs_empty;
+ return sizeof(p_drv_buf->rxqs_empty);
+ }
+ break;
+ case DRV_TLV_TX_QUEUES_FULL:
+ if (p_drv_buf->num_txqs_full_set) {
+ p_buf->p_val = &p_drv_buf->num_txqs_full;
+ return sizeof(p_drv_buf->num_txqs_full);
+ }
+ break;
+ case DRV_TLV_RX_QUEUES_FULL:
+ if (p_drv_buf->num_rxqs_full_set) {
+ p_buf->p_val = &p_drv_buf->num_rxqs_full;
+ return sizeof(p_drv_buf->num_rxqs_full);
+ }
+ break;
+ default:
+ break;
+ }
+
+ return -1;
+}
+
+static int
+qed_mfw_get_tlv_time_value(struct qed_mfw_tlv_time *p_time,
+ struct qed_tlv_parsed_buf *p_buf)
+{
+ if (!p_time->b_set)
+ return -1;
+
+ /* Validate numbers */
+ if (p_time->month > 12)
+ p_time->month = 0;
+ if (p_time->day > 31)
+ p_time->day = 0;
+ if (p_time->hour > 23)
+ p_time->hour = 0;
+ if (p_time->min > 59)
+ p_time->hour = 0;
+ if (p_time->msec > 999)
+ p_time->msec = 0;
+ if (p_time->usec > 999)
+ p_time->usec = 0;
+
+ memset(p_buf->data, 0, sizeof(u8) * QED_TLV_DATA_MAX);
+ snprintf(p_buf->data, 14, "%d%d%d%d%d%d",
+ p_time->month, p_time->day,
+ p_time->hour, p_time->min, p_time->msec, p_time->usec);
+
+ p_buf->p_val = p_buf->data;
+
+ return QED_MFW_TLV_TIME_SIZE;
+}
+
+static int
+qed_mfw_get_fcoe_tlv_value(struct qed_drv_tlv_hdr *p_tlv,
+ struct qed_mfw_tlv_fcoe *p_drv_buf,
+ struct qed_tlv_parsed_buf *p_buf)
+{
+ struct qed_mfw_tlv_time *p_time;
+ u8 idx;
+
+ switch (p_tlv->tlv_type) {
+ case DRV_TLV_SCSI_TO:
+ if (p_drv_buf->scsi_timeout_set) {
+ p_buf->p_val = &p_drv_buf->scsi_timeout;
+ return sizeof(p_drv_buf->scsi_timeout);
+ }
+ break;
+ case DRV_TLV_R_T_TOV:
+ if (p_drv_buf->rt_tov_set) {
+ p_buf->p_val = &p_drv_buf->rt_tov;
+ return sizeof(p_drv_buf->rt_tov);
+ }
+ break;
+ case DRV_TLV_R_A_TOV:
+ if (p_drv_buf->ra_tov_set) {
+ p_buf->p_val = &p_drv_buf->ra_tov;
+ return sizeof(p_drv_buf->ra_tov);
+ }
+ break;
+ case DRV_TLV_E_D_TOV:
+ if (p_drv_buf->ed_tov_set) {
+ p_buf->p_val = &p_drv_buf->ed_tov;
+ return sizeof(p_drv_buf->ed_tov);
+ }
+ break;
+ case DRV_TLV_CR_TOV:
+ if (p_drv_buf->cr_tov_set) {
+ p_buf->p_val = &p_drv_buf->cr_tov;
+ return sizeof(p_drv_buf->cr_tov);
+ }
+ break;
+ case DRV_TLV_BOOT_TYPE:
+ if (p_drv_buf->boot_type_set) {
+ p_buf->p_val = &p_drv_buf->boot_type;
+ return sizeof(p_drv_buf->boot_type);
+ }
+ break;
+ case DRV_TLV_NPIV_STATE:
+ if (p_drv_buf->npiv_state_set) {
+ p_buf->p_val = &p_drv_buf->npiv_state;
+ return sizeof(p_drv_buf->npiv_state);
+ }
+ break;
+ case DRV_TLV_NUM_OF_NPIV_IDS:
+ if (p_drv_buf->num_npiv_ids_set) {
+ p_buf->p_val = &p_drv_buf->num_npiv_ids;
+ return sizeof(p_drv_buf->num_npiv_ids);
+ }
+ break;
+ case DRV_TLV_SWITCH_NAME:
+ if (p_drv_buf->switch_name_set) {
+ p_buf->p_val = &p_drv_buf->switch_name;
+ return sizeof(p_drv_buf->switch_name);
+ }
+ break;
+ case DRV_TLV_SWITCH_PORT_NUM:
+ if (p_drv_buf->switch_portnum_set) {
+ p_buf->p_val = &p_drv_buf->switch_portnum;
+ return sizeof(p_drv_buf->switch_portnum);
+ }
+ break;
+ case DRV_TLV_SWITCH_PORT_ID:
+ if (p_drv_buf->switch_portid_set) {
+ p_buf->p_val = &p_drv_buf->switch_portid;
+ return sizeof(p_drv_buf->switch_portid);
+ }
+ break;
+ case DRV_TLV_VENDOR_NAME:
+ if (p_drv_buf->vendor_name_set) {
+ p_buf->p_val = &p_drv_buf->vendor_name;
+ return sizeof(p_drv_buf->vendor_name);
+ }
+ break;
+ case DRV_TLV_SWITCH_MODEL:
+ if (p_drv_buf->switch_model_set) {
+ p_buf->p_val = &p_drv_buf->switch_model;
+ return sizeof(p_drv_buf->switch_model);
+ }
+ break;
+ case DRV_TLV_SWITCH_FW_VER:
+ if (p_drv_buf->switch_fw_version_set) {
+ p_buf->p_val = &p_drv_buf->switch_fw_version;
+ return sizeof(p_drv_buf->switch_fw_version);
+ }
+ break;
+ case DRV_TLV_QOS_PRIORITY_PER_802_1P:
+ if (p_drv_buf->qos_pri_set) {
+ p_buf->p_val = &p_drv_buf->qos_pri;
+ return sizeof(p_drv_buf->qos_pri);
+ }
+ break;
+ case DRV_TLV_PORT_ALIAS:
+ if (p_drv_buf->port_alias_set) {
+ p_buf->p_val = &p_drv_buf->port_alias;
+ return sizeof(p_drv_buf->port_alias);
+ }
+ break;
+ case DRV_TLV_PORT_STATE:
+ if (p_drv_buf->port_state_set) {
+ p_buf->p_val = &p_drv_buf->port_state;
+ return sizeof(p_drv_buf->port_state);
+ }
+ break;
+ case DRV_TLV_FIP_TX_DESCRIPTORS_QUEUE_SIZE:
+ if (p_drv_buf->fip_tx_descr_size_set) {
+ p_buf->p_val = &p_drv_buf->fip_tx_descr_size;
+ return sizeof(p_drv_buf->fip_tx_descr_size);
+ }
+ break;
+ case DRV_TLV_FCOE_RX_DESCRIPTORS_QUEUE_SIZE:
+ if (p_drv_buf->fip_rx_descr_size_set) {
+ p_buf->p_val = &p_drv_buf->fip_rx_descr_size;
+ return sizeof(p_drv_buf->fip_rx_descr_size);
+ }
+ break;
+ case DRV_TLV_LINK_FAILURE_COUNT:
+ if (p_drv_buf->link_failures_set) {
+ p_buf->p_val = &p_drv_buf->link_failures;
+ return sizeof(p_drv_buf->link_failures);
+ }
+ break;
+ case DRV_TLV_FCOE_BOOT_PROGRESS:
+ if (p_drv_buf->fcoe_boot_progress_set) {
+ p_buf->p_val = &p_drv_buf->fcoe_boot_progress;
+ return sizeof(p_drv_buf->fcoe_boot_progress);
+ }
+ break;
+ case DRV_TLV_RX_BROADCAST_PACKETS:
+ if (p_drv_buf->rx_bcast_set) {
+ p_buf->p_val = &p_drv_buf->rx_bcast;
+ return sizeof(p_drv_buf->rx_bcast);
+ }
+ break;
+ case DRV_TLV_TX_BROADCAST_PACKETS:
+ if (p_drv_buf->tx_bcast_set) {
+ p_buf->p_val = &p_drv_buf->tx_bcast;
+ return sizeof(p_drv_buf->tx_bcast);
+ }
+ break;
+ case DRV_TLV_FCOE_TX_DESCRIPTOR_QUEUE_AVG_DEPTH:
+ if (p_drv_buf->fcoe_txq_depth_set) {
+ p_buf->p_val = &p_drv_buf->fcoe_txq_depth;
+ return sizeof(p_drv_buf->fcoe_txq_depth);
+ }
+ break;
+ case DRV_TLV_FCOE_RX_DESCRIPTORS_QUEUE_AVG_DEPTH:
+ if (p_drv_buf->fcoe_rxq_depth_set) {
+ p_buf->p_val = &p_drv_buf->fcoe_rxq_depth;
+ return sizeof(p_drv_buf->fcoe_rxq_depth);
+ }
+ break;
+ case DRV_TLV_FCOE_RX_FRAMES_RECEIVED:
+ if (p_drv_buf->fcoe_rx_frames_set) {
+ p_buf->p_val = &p_drv_buf->fcoe_rx_frames;
+ return sizeof(p_drv_buf->fcoe_rx_frames);
+ }
+ break;
+ case DRV_TLV_FCOE_RX_BYTES_RECEIVED:
+ if (p_drv_buf->fcoe_rx_bytes_set) {
+ p_buf->p_val = &p_drv_buf->fcoe_rx_bytes;
+ return sizeof(p_drv_buf->fcoe_rx_bytes);
+ }
+ break;
+ case DRV_TLV_FCOE_TX_FRAMES_SENT:
+ if (p_drv_buf->fcoe_tx_frames_set) {
+ p_buf->p_val = &p_drv_buf->fcoe_tx_frames;
+ return sizeof(p_drv_buf->fcoe_tx_frames);
+ }
+ break;
+ case DRV_TLV_FCOE_TX_BYTES_SENT:
+ if (p_drv_buf->fcoe_tx_bytes_set) {
+ p_buf->p_val = &p_drv_buf->fcoe_tx_bytes;
+ return sizeof(p_drv_buf->fcoe_tx_bytes);
+ }
+ break;
+ case DRV_TLV_CRC_ERROR_COUNT:
+ if (p_drv_buf->crc_count_set) {
+ p_buf->p_val = &p_drv_buf->crc_count;
+ return sizeof(p_drv_buf->crc_count);
+ }
+ break;
+ case DRV_TLV_CRC_ERROR_1_RECEIVED_SOURCE_FC_ID:
+ case DRV_TLV_CRC_ERROR_2_RECEIVED_SOURCE_FC_ID:
+ case DRV_TLV_CRC_ERROR_3_RECEIVED_SOURCE_FC_ID:
+ case DRV_TLV_CRC_ERROR_4_RECEIVED_SOURCE_FC_ID:
+ case DRV_TLV_CRC_ERROR_5_RECEIVED_SOURCE_FC_ID:
+ idx = (p_tlv->tlv_type -
+ DRV_TLV_CRC_ERROR_1_RECEIVED_SOURCE_FC_ID) / 2;
+
+ if (p_drv_buf->crc_err_src_fcid_set[idx]) {
+ p_buf->p_val = &p_drv_buf->crc_err_src_fcid[idx];
+ return sizeof(p_drv_buf->crc_err_src_fcid[idx]);
+ }
+ break;
+ case DRV_TLV_CRC_ERROR_1_TIMESTAMP:
+ case DRV_TLV_CRC_ERROR_2_TIMESTAMP:
+ case DRV_TLV_CRC_ERROR_3_TIMESTAMP:
+ case DRV_TLV_CRC_ERROR_4_TIMESTAMP:
+ case DRV_TLV_CRC_ERROR_5_TIMESTAMP:
+ idx = (p_tlv->tlv_type - DRV_TLV_CRC_ERROR_1_TIMESTAMP) / 2;
+
+ return qed_mfw_get_tlv_time_value(&p_drv_buf->crc_err[idx],
+ p_buf);
+ case DRV_TLV_LOSS_OF_SYNC_ERROR_COUNT:
+ if (p_drv_buf->losync_err_set) {
+ p_buf->p_val = &p_drv_buf->losync_err;
+ return sizeof(p_drv_buf->losync_err);
+ }
+ break;
+ case DRV_TLV_LOSS_OF_SIGNAL_ERRORS:
+ if (p_drv_buf->losig_err_set) {
+ p_buf->p_val = &p_drv_buf->losig_err;
+ return sizeof(p_drv_buf->losig_err);
+ }
+ break;
+ case DRV_TLV_PRIMITIVE_SEQUENCE_PROTOCOL_ERROR_COUNT:
+ if (p_drv_buf->primtive_err_set) {
+ p_buf->p_val = &p_drv_buf->primtive_err;
+ return sizeof(p_drv_buf->primtive_err);
+ }
+ break;
+ case DRV_TLV_DISPARITY_ERROR_COUNT:
+ if (p_drv_buf->disparity_err_set) {
+ p_buf->p_val = &p_drv_buf->disparity_err;
+ return sizeof(p_drv_buf->disparity_err);
+ }
+ break;
+ case DRV_TLV_CODE_VIOLATION_ERROR_COUNT:
+ if (p_drv_buf->code_violation_err_set) {
+ p_buf->p_val = &p_drv_buf->code_violation_err;
+ return sizeof(p_drv_buf->code_violation_err);
+ }
+ break;
+ case DRV_TLV_LAST_FLOGI_ISSUED_COMMON_PARAMETERS_WORD_1:
+ case DRV_TLV_LAST_FLOGI_ISSUED_COMMON_PARAMETERS_WORD_2:
+ case DRV_TLV_LAST_FLOGI_ISSUED_COMMON_PARAMETERS_WORD_3:
+ case DRV_TLV_LAST_FLOGI_ISSUED_COMMON_PARAMETERS_WORD_4:
+ idx = p_tlv->tlv_type -
+ DRV_TLV_LAST_FLOGI_ISSUED_COMMON_PARAMETERS_WORD_1;
+ if (p_drv_buf->flogi_param_set[idx]) {
+ p_buf->p_val = &p_drv_buf->flogi_param[idx];
+ return sizeof(p_drv_buf->flogi_param[idx]);
+ }
+ break;
+ case DRV_TLV_LAST_FLOGI_TIMESTAMP:
+ return qed_mfw_get_tlv_time_value(&p_drv_buf->flogi_tstamp,
+ p_buf);
+ case DRV_TLV_LAST_FLOGI_ACC_COMMON_PARAMETERS_WORD_1:
+ case DRV_TLV_LAST_FLOGI_ACC_COMMON_PARAMETERS_WORD_2:
+ case DRV_TLV_LAST_FLOGI_ACC_COMMON_PARAMETERS_WORD_3:
+ case DRV_TLV_LAST_FLOGI_ACC_COMMON_PARAMETERS_WORD_4:
+ idx = p_tlv->tlv_type -
+ DRV_TLV_LAST_FLOGI_ACC_COMMON_PARAMETERS_WORD_1;
+
+ if (p_drv_buf->flogi_acc_param_set[idx]) {
+ p_buf->p_val = &p_drv_buf->flogi_acc_param[idx];
+ return sizeof(p_drv_buf->flogi_acc_param[idx]);
+ }
+ break;
+ case DRV_TLV_LAST_FLOGI_ACC_TIMESTAMP:
+ return qed_mfw_get_tlv_time_value(&p_drv_buf->flogi_acc_tstamp,
+ p_buf);
+ case DRV_TLV_LAST_FLOGI_RJT:
+ if (p_drv_buf->flogi_rjt_set) {
+ p_buf->p_val = &p_drv_buf->flogi_rjt;
+ return sizeof(p_drv_buf->flogi_rjt);
+ }
+ break;
+ case DRV_TLV_LAST_FLOGI_RJT_TIMESTAMP:
+ return qed_mfw_get_tlv_time_value(&p_drv_buf->flogi_rjt_tstamp,
+ p_buf);
+ case DRV_TLV_FDISCS_SENT_COUNT:
+ if (p_drv_buf->fdiscs_set) {
+ p_buf->p_val = &p_drv_buf->fdiscs;
+ return sizeof(p_drv_buf->fdiscs);
+ }
+ break;
+ case DRV_TLV_FDISC_ACCS_RECEIVED:
+ if (p_drv_buf->fdisc_acc_set) {
+ p_buf->p_val = &p_drv_buf->fdisc_acc;
+ return sizeof(p_drv_buf->fdisc_acc);
+ }
+ break;
+ case DRV_TLV_FDISC_RJTS_RECEIVED:
+ if (p_drv_buf->fdisc_rjt_set) {
+ p_buf->p_val = &p_drv_buf->fdisc_rjt;
+ return sizeof(p_drv_buf->fdisc_rjt);
+ }
+ break;
+ case DRV_TLV_PLOGI_SENT_COUNT:
+ if (p_drv_buf->plogi_set) {
+ p_buf->p_val = &p_drv_buf->plogi;
+ return sizeof(p_drv_buf->plogi);
+ }
+ break;
+ case DRV_TLV_PLOGI_ACCS_RECEIVED:
+ if (p_drv_buf->plogi_acc_set) {
+ p_buf->p_val = &p_drv_buf->plogi_acc;
+ return sizeof(p_drv_buf->plogi_acc);
+ }
+ break;
+ case DRV_TLV_PLOGI_RJTS_RECEIVED:
+ if (p_drv_buf->plogi_rjt_set) {
+ p_buf->p_val = &p_drv_buf->plogi_rjt;
+ return sizeof(p_drv_buf->plogi_rjt);
+ }
+ break;
+ case DRV_TLV_PLOGI_1_SENT_DESTINATION_FC_ID:
+ case DRV_TLV_PLOGI_2_SENT_DESTINATION_FC_ID:
+ case DRV_TLV_PLOGI_3_SENT_DESTINATION_FC_ID:
+ case DRV_TLV_PLOGI_4_SENT_DESTINATION_FC_ID:
+ case DRV_TLV_PLOGI_5_SENT_DESTINATION_FC_ID:
+ idx = (p_tlv->tlv_type -
+ DRV_TLV_PLOGI_1_SENT_DESTINATION_FC_ID) / 2;
+
+ if (p_drv_buf->plogi_dst_fcid_set[idx]) {
+ p_buf->p_val = &p_drv_buf->plogi_dst_fcid[idx];
+ return sizeof(p_drv_buf->plogi_dst_fcid[idx]);
+ }
+ break;
+ case DRV_TLV_PLOGI_1_TIMESTAMP:
+ case DRV_TLV_PLOGI_2_TIMESTAMP:
+ case DRV_TLV_PLOGI_3_TIMESTAMP:
+ case DRV_TLV_PLOGI_4_TIMESTAMP:
+ case DRV_TLV_PLOGI_5_TIMESTAMP:
+ idx = (p_tlv->tlv_type - DRV_TLV_PLOGI_1_TIMESTAMP) / 2;
+
+ return qed_mfw_get_tlv_time_value(&p_drv_buf->plogi_tstamp[idx],
+ p_buf);
+ case DRV_TLV_PLOGI_1_ACC_RECEIVED_SOURCE_FC_ID:
+ case DRV_TLV_PLOGI_2_ACC_RECEIVED_SOURCE_FC_ID:
+ case DRV_TLV_PLOGI_3_ACC_RECEIVED_SOURCE_FC_ID:
+ case DRV_TLV_PLOGI_4_ACC_RECEIVED_SOURCE_FC_ID:
+ case DRV_TLV_PLOGI_5_ACC_RECEIVED_SOURCE_FC_ID:
+ idx = (p_tlv->tlv_type -
+ DRV_TLV_PLOGI_1_ACC_RECEIVED_SOURCE_FC_ID) / 2;
+
+ if (p_drv_buf->plogi_acc_src_fcid_set[idx]) {
+ p_buf->p_val = &p_drv_buf->plogi_acc_src_fcid[idx];
+ return sizeof(p_drv_buf->plogi_acc_src_fcid[idx]);
+ }
+ break;
+ case DRV_TLV_PLOGI_1_ACC_TIMESTAMP:
+ case DRV_TLV_PLOGI_2_ACC_TIMESTAMP:
+ case DRV_TLV_PLOGI_3_ACC_TIMESTAMP:
+ case DRV_TLV_PLOGI_4_ACC_TIMESTAMP:
+ case DRV_TLV_PLOGI_5_ACC_TIMESTAMP:
+ idx = (p_tlv->tlv_type - DRV_TLV_PLOGI_1_ACC_TIMESTAMP) / 2;
+ p_time = &p_drv_buf->plogi_acc_tstamp[idx];
+
+ return qed_mfw_get_tlv_time_value(p_time, p_buf);
+ case DRV_TLV_LOGOS_ISSUED:
+ if (p_drv_buf->tx_plogos_set) {
+ p_buf->p_val = &p_drv_buf->tx_plogos;
+ return sizeof(p_drv_buf->tx_plogos);
+ }
+ break;
+ case DRV_TLV_LOGO_ACCS_RECEIVED:
+ if (p_drv_buf->plogo_acc_set) {
+ p_buf->p_val = &p_drv_buf->plogo_acc;
+ return sizeof(p_drv_buf->plogo_acc);
+ }
+ break;
+ case DRV_TLV_LOGO_RJTS_RECEIVED:
+ if (p_drv_buf->plogo_rjt_set) {
+ p_buf->p_val = &p_drv_buf->plogo_rjt;
+ return sizeof(p_drv_buf->plogo_rjt);
+ }
+ break;
+ case DRV_TLV_LOGO_1_RECEIVED_SOURCE_FC_ID:
+ case DRV_TLV_LOGO_2_RECEIVED_SOURCE_FC_ID:
+ case DRV_TLV_LOGO_3_RECEIVED_SOURCE_FC_ID:
+ case DRV_TLV_LOGO_4_RECEIVED_SOURCE_FC_ID:
+ case DRV_TLV_LOGO_5_RECEIVED_SOURCE_FC_ID:
+ idx = (p_tlv->tlv_type - DRV_TLV_LOGO_1_RECEIVED_SOURCE_FC_ID) /
+ 2;
+
+ if (p_drv_buf->plogo_src_fcid_set[idx]) {
+ p_buf->p_val = &p_drv_buf->plogo_src_fcid[idx];
+ return sizeof(p_drv_buf->plogo_src_fcid[idx]);
+ }
+ break;
+ case DRV_TLV_LOGO_1_TIMESTAMP:
+ case DRV_TLV_LOGO_2_TIMESTAMP:
+ case DRV_TLV_LOGO_3_TIMESTAMP:
+ case DRV_TLV_LOGO_4_TIMESTAMP:
+ case DRV_TLV_LOGO_5_TIMESTAMP:
+ idx = (p_tlv->tlv_type - DRV_TLV_LOGO_1_TIMESTAMP) / 2;
+
+ return qed_mfw_get_tlv_time_value(&p_drv_buf->plogo_tstamp[idx],
+ p_buf);
+ case DRV_TLV_LOGOS_RECEIVED:
+ if (p_drv_buf->rx_logos_set) {
+ p_buf->p_val = &p_drv_buf->rx_logos;
+ return sizeof(p_drv_buf->rx_logos);
+ }
+ break;
+ case DRV_TLV_ACCS_ISSUED:
+ if (p_drv_buf->tx_accs_set) {
+ p_buf->p_val = &p_drv_buf->tx_accs;
+ return sizeof(p_drv_buf->tx_accs);
+ }
+ break;
+ case DRV_TLV_PRLIS_ISSUED:
+ if (p_drv_buf->tx_prlis_set) {
+ p_buf->p_val = &p_drv_buf->tx_prlis;
+ return sizeof(p_drv_buf->tx_prlis);
+ }
+ break;
+ case DRV_TLV_ACCS_RECEIVED:
+ if (p_drv_buf->rx_accs_set) {
+ p_buf->p_val = &p_drv_buf->rx_accs;
+ return sizeof(p_drv_buf->rx_accs);
+ }
+ break;
+ case DRV_TLV_ABTS_SENT_COUNT:
+ if (p_drv_buf->tx_abts_set) {
+ p_buf->p_val = &p_drv_buf->tx_abts;
+ return sizeof(p_drv_buf->tx_abts);
+ }
+ break;
+ case DRV_TLV_ABTS_ACCS_RECEIVED:
+ if (p_drv_buf->rx_abts_acc_set) {
+ p_buf->p_val = &p_drv_buf->rx_abts_acc;
+ return sizeof(p_drv_buf->rx_abts_acc);
+ }
+ break;
+ case DRV_TLV_ABTS_RJTS_RECEIVED:
+ if (p_drv_buf->rx_abts_rjt_set) {
+ p_buf->p_val = &p_drv_buf->rx_abts_rjt;
+ return sizeof(p_drv_buf->rx_abts_rjt);
+ }
+ break;
+ case DRV_TLV_ABTS_1_SENT_DESTINATION_FC_ID:
+ case DRV_TLV_ABTS_2_SENT_DESTINATION_FC_ID:
+ case DRV_TLV_ABTS_3_SENT_DESTINATION_FC_ID:
+ case DRV_TLV_ABTS_4_SENT_DESTINATION_FC_ID:
+ case DRV_TLV_ABTS_5_SENT_DESTINATION_FC_ID:
+ idx = (p_tlv->tlv_type -
+ DRV_TLV_ABTS_1_SENT_DESTINATION_FC_ID) / 2;
+
+ if (p_drv_buf->abts_dst_fcid_set[idx]) {
+ p_buf->p_val = &p_drv_buf->abts_dst_fcid[idx];
+ return sizeof(p_drv_buf->abts_dst_fcid[idx]);
+ }
+ break;
+ case DRV_TLV_ABTS_1_TIMESTAMP:
+ case DRV_TLV_ABTS_2_TIMESTAMP:
+ case DRV_TLV_ABTS_3_TIMESTAMP:
+ case DRV_TLV_ABTS_4_TIMESTAMP:
+ case DRV_TLV_ABTS_5_TIMESTAMP:
+ idx = (p_tlv->tlv_type - DRV_TLV_ABTS_1_TIMESTAMP) / 2;
+
+ return qed_mfw_get_tlv_time_value(&p_drv_buf->abts_tstamp[idx],
+ p_buf);
+ case DRV_TLV_RSCNS_RECEIVED:
+ if (p_drv_buf->rx_rscn_set) {
+ p_buf->p_val = &p_drv_buf->rx_rscn;
+ return sizeof(p_drv_buf->rx_rscn);
+ }
+ break;
+ case DRV_TLV_LAST_RSCN_RECEIVED_N_PORT_1:
+ case DRV_TLV_LAST_RSCN_RECEIVED_N_PORT_2:
+ case DRV_TLV_LAST_RSCN_RECEIVED_N_PORT_3:
+ case DRV_TLV_LAST_RSCN_RECEIVED_N_PORT_4:
+ idx = p_tlv->tlv_type - DRV_TLV_LAST_RSCN_RECEIVED_N_PORT_1;
+
+ if (p_drv_buf->rx_rscn_nport_set[idx]) {
+ p_buf->p_val = &p_drv_buf->rx_rscn_nport[idx];
+ return sizeof(p_drv_buf->rx_rscn_nport[idx]);
+ }
+ break;
+ case DRV_TLV_LUN_RESETS_ISSUED:
+ if (p_drv_buf->tx_lun_rst_set) {
+ p_buf->p_val = &p_drv_buf->tx_lun_rst;
+ return sizeof(p_drv_buf->tx_lun_rst);
+ }
+ break;
+ case DRV_TLV_ABORT_TASK_SETS_ISSUED:
+ if (p_drv_buf->abort_task_sets_set) {
+ p_buf->p_val = &p_drv_buf->abort_task_sets;
+ return sizeof(p_drv_buf->abort_task_sets);
+ }
+ break;
+ case DRV_TLV_TPRLOS_SENT:
+ if (p_drv_buf->tx_tprlos_set) {
+ p_buf->p_val = &p_drv_buf->tx_tprlos;
+ return sizeof(p_drv_buf->tx_tprlos);
+ }
+ break;
+ case DRV_TLV_NOS_SENT_COUNT:
+ if (p_drv_buf->tx_nos_set) {
+ p_buf->p_val = &p_drv_buf->tx_nos;
+ return sizeof(p_drv_buf->tx_nos);
+ }
+ break;
+ case DRV_TLV_NOS_RECEIVED_COUNT:
+ if (p_drv_buf->rx_nos_set) {
+ p_buf->p_val = &p_drv_buf->rx_nos;
+ return sizeof(p_drv_buf->rx_nos);
+ }
+ break;
+ case DRV_TLV_OLS_COUNT:
+ if (p_drv_buf->ols_set) {
+ p_buf->p_val = &p_drv_buf->ols;
+ return sizeof(p_drv_buf->ols);
+ }
+ break;
+ case DRV_TLV_LR_COUNT:
+ if (p_drv_buf->lr_set) {
+ p_buf->p_val = &p_drv_buf->lr;
+ return sizeof(p_drv_buf->lr);
+ }
+ break;
+ case DRV_TLV_LRR_COUNT:
+ if (p_drv_buf->lrr_set) {
+ p_buf->p_val = &p_drv_buf->lrr;
+ return sizeof(p_drv_buf->lrr);
+ }
+ break;
+ case DRV_TLV_LIP_SENT_COUNT:
+ if (p_drv_buf->tx_lip_set) {
+ p_buf->p_val = &p_drv_buf->tx_lip;
+ return sizeof(p_drv_buf->tx_lip);
+ }
+ break;
+ case DRV_TLV_LIP_RECEIVED_COUNT:
+ if (p_drv_buf->rx_lip_set) {
+ p_buf->p_val = &p_drv_buf->rx_lip;
+ return sizeof(p_drv_buf->rx_lip);
+ }
+ break;
+ case DRV_TLV_EOFA_COUNT:
+ if (p_drv_buf->eofa_set) {
+ p_buf->p_val = &p_drv_buf->eofa;
+ return sizeof(p_drv_buf->eofa);
+ }
+ break;
+ case DRV_TLV_EOFNI_COUNT:
+ if (p_drv_buf->eofni_set) {
+ p_buf->p_val = &p_drv_buf->eofni;
+ return sizeof(p_drv_buf->eofni);
+ }
+ break;
+ case DRV_TLV_SCSI_STATUS_CHECK_CONDITION_COUNT:
+ if (p_drv_buf->scsi_chks_set) {
+ p_buf->p_val = &p_drv_buf->scsi_chks;
+ return sizeof(p_drv_buf->scsi_chks);
+ }
+ break;
+ case DRV_TLV_SCSI_STATUS_CONDITION_MET_COUNT:
+ if (p_drv_buf->scsi_cond_met_set) {
+ p_buf->p_val = &p_drv_buf->scsi_cond_met;
+ return sizeof(p_drv_buf->scsi_cond_met);
+ }
+ break;
+ case DRV_TLV_SCSI_STATUS_BUSY_COUNT:
+ if (p_drv_buf->scsi_busy_set) {
+ p_buf->p_val = &p_drv_buf->scsi_busy;
+ return sizeof(p_drv_buf->scsi_busy);
+ }
+ break;
+ case DRV_TLV_SCSI_STATUS_INTERMEDIATE_COUNT:
+ if (p_drv_buf->scsi_inter_set) {
+ p_buf->p_val = &p_drv_buf->scsi_inter;
+ return sizeof(p_drv_buf->scsi_inter);
+ }
+ break;
+ case DRV_TLV_SCSI_STATUS_INTERMEDIATE_CONDITION_MET_COUNT:
+ if (p_drv_buf->scsi_inter_cond_met_set) {
+ p_buf->p_val = &p_drv_buf->scsi_inter_cond_met;
+ return sizeof(p_drv_buf->scsi_inter_cond_met);
+ }
+ break;
+ case DRV_TLV_SCSI_STATUS_RESERVATION_CONFLICT_COUNT:
+ if (p_drv_buf->scsi_rsv_conflicts_set) {
+ p_buf->p_val = &p_drv_buf->scsi_rsv_conflicts;
+ return sizeof(p_drv_buf->scsi_rsv_conflicts);
+ }
+ break;
+ case DRV_TLV_SCSI_STATUS_TASK_SET_FULL_COUNT:
+ if (p_drv_buf->scsi_tsk_full_set) {
+ p_buf->p_val = &p_drv_buf->scsi_tsk_full;
+ return sizeof(p_drv_buf->scsi_tsk_full);
+ }
+ break;
+ case DRV_TLV_SCSI_STATUS_ACA_ACTIVE_COUNT:
+ if (p_drv_buf->scsi_aca_active_set) {
+ p_buf->p_val = &p_drv_buf->scsi_aca_active;
+ return sizeof(p_drv_buf->scsi_aca_active);
+ }
+ break;
+ case DRV_TLV_SCSI_STATUS_TASK_ABORTED_COUNT:
+ if (p_drv_buf->scsi_tsk_abort_set) {
+ p_buf->p_val = &p_drv_buf->scsi_tsk_abort;
+ return sizeof(p_drv_buf->scsi_tsk_abort);
+ }
+ break;
+ case DRV_TLV_SCSI_CHECK_CONDITION_1_RECEIVED_SK_ASC_ASCQ:
+ case DRV_TLV_SCSI_CHECK_CONDITION_2_RECEIVED_SK_ASC_ASCQ:
+ case DRV_TLV_SCSI_CHECK_CONDITION_3_RECEIVED_SK_ASC_ASCQ:
+ case DRV_TLV_SCSI_CHECK_CONDITION_4_RECEIVED_SK_ASC_ASCQ:
+ case DRV_TLV_SCSI_CHECK_CONDITION_5_RECEIVED_SK_ASC_ASCQ:
+ idx = (p_tlv->tlv_type -
+ DRV_TLV_SCSI_CHECK_CONDITION_1_RECEIVED_SK_ASC_ASCQ) / 2;
+
+ if (p_drv_buf->scsi_rx_chk_set[idx]) {
+ p_buf->p_val = &p_drv_buf->scsi_rx_chk[idx];
+ return sizeof(p_drv_buf->scsi_rx_chk[idx]);
+ }
+ break;
+ case DRV_TLV_SCSI_CHECK_1_TIMESTAMP:
+ case DRV_TLV_SCSI_CHECK_2_TIMESTAMP:
+ case DRV_TLV_SCSI_CHECK_3_TIMESTAMP:
+ case DRV_TLV_SCSI_CHECK_4_TIMESTAMP:
+ case DRV_TLV_SCSI_CHECK_5_TIMESTAMP:
+ idx = (p_tlv->tlv_type - DRV_TLV_SCSI_CHECK_1_TIMESTAMP) / 2;
+ p_time = &p_drv_buf->scsi_chk_tstamp[idx];
+
+ return qed_mfw_get_tlv_time_value(p_time, p_buf);
+ default:
+ break;
+ }
+
+ return -1;
+}
+
+static int
+qed_mfw_get_iscsi_tlv_value(struct qed_drv_tlv_hdr *p_tlv,
+ struct qed_mfw_tlv_iscsi *p_drv_buf,
+ struct qed_tlv_parsed_buf *p_buf)
+{
+ switch (p_tlv->tlv_type) {
+ case DRV_TLV_TARGET_LLMNR_ENABLED:
+ if (p_drv_buf->target_llmnr_set) {
+ p_buf->p_val = &p_drv_buf->target_llmnr;
+ return sizeof(p_drv_buf->target_llmnr);
+ }
+ break;
+ case DRV_TLV_HEADER_DIGEST_FLAG_ENABLED:
+ if (p_drv_buf->header_digest_set) {
+ p_buf->p_val = &p_drv_buf->header_digest;
+ return sizeof(p_drv_buf->header_digest);
+ }
+ break;
+ case DRV_TLV_DATA_DIGEST_FLAG_ENABLED:
+ if (p_drv_buf->data_digest_set) {
+ p_buf->p_val = &p_drv_buf->data_digest;
+ return sizeof(p_drv_buf->data_digest);
+ }
+ break;
+ case DRV_TLV_AUTHENTICATION_METHOD:
+ if (p_drv_buf->auth_method_set) {
+ p_buf->p_val = &p_drv_buf->auth_method;
+ return sizeof(p_drv_buf->auth_method);
+ }
+ break;
+ case DRV_TLV_ISCSI_BOOT_TARGET_PORTAL:
+ if (p_drv_buf->boot_taget_portal_set) {
+ p_buf->p_val = &p_drv_buf->boot_taget_portal;
+ return sizeof(p_drv_buf->boot_taget_portal);
+ }
+ break;
+ case DRV_TLV_MAX_FRAME_SIZE:
+ if (p_drv_buf->frame_size_set) {
+ p_buf->p_val = &p_drv_buf->frame_size;
+ return sizeof(p_drv_buf->frame_size);
+ }
+ break;
+ case DRV_TLV_PDU_TX_DESCRIPTORS_QUEUE_SIZE:
+ if (p_drv_buf->tx_desc_size_set) {
+ p_buf->p_val = &p_drv_buf->tx_desc_size;
+ return sizeof(p_drv_buf->tx_desc_size);
+ }
+ break;
+ case DRV_TLV_PDU_RX_DESCRIPTORS_QUEUE_SIZE:
+ if (p_drv_buf->rx_desc_size_set) {
+ p_buf->p_val = &p_drv_buf->rx_desc_size;
+ return sizeof(p_drv_buf->rx_desc_size);
+ }
+ break;
+ case DRV_TLV_ISCSI_BOOT_PROGRESS:
+ if (p_drv_buf->boot_progress_set) {
+ p_buf->p_val = &p_drv_buf->boot_progress;
+ return sizeof(p_drv_buf->boot_progress);
+ }
+ break;
+ case DRV_TLV_PDU_TX_DESCRIPTOR_QUEUE_AVG_DEPTH:
+ if (p_drv_buf->tx_desc_qdepth_set) {
+ p_buf->p_val = &p_drv_buf->tx_desc_qdepth;
+ return sizeof(p_drv_buf->tx_desc_qdepth);
+ }
+ break;
+ case DRV_TLV_PDU_RX_DESCRIPTORS_QUEUE_AVG_DEPTH:
+ if (p_drv_buf->rx_desc_qdepth_set) {
+ p_buf->p_val = &p_drv_buf->rx_desc_qdepth;
+ return sizeof(p_drv_buf->rx_desc_qdepth);
+ }
+ break;
+ case DRV_TLV_ISCSI_PDU_RX_FRAMES_RECEIVED:
+ if (p_drv_buf->rx_frames_set) {
+ p_buf->p_val = &p_drv_buf->rx_frames;
+ return sizeof(p_drv_buf->rx_frames);
+ }
+ break;
+ case DRV_TLV_ISCSI_PDU_RX_BYTES_RECEIVED:
+ if (p_drv_buf->rx_bytes_set) {
+ p_buf->p_val = &p_drv_buf->rx_bytes;
+ return sizeof(p_drv_buf->rx_bytes);
+ }
+ break;
+ case DRV_TLV_ISCSI_PDU_TX_FRAMES_SENT:
+ if (p_drv_buf->tx_frames_set) {
+ p_buf->p_val = &p_drv_buf->tx_frames;
+ return sizeof(p_drv_buf->tx_frames);
+ }
+ break;
+ case DRV_TLV_ISCSI_PDU_TX_BYTES_SENT:
+ if (p_drv_buf->tx_bytes_set) {
+ p_buf->p_val = &p_drv_buf->tx_bytes;
+ return sizeof(p_drv_buf->tx_bytes);
+ }
+ break;
+ default:
+ break;
+ }
+
+ return -1;
+}
+
+static int qed_mfw_update_tlvs(struct qed_hwfn *p_hwfn,
+ u8 tlv_group, u8 *p_mfw_buf, u32 size)
+{
+ union qed_mfw_tlv_data *p_tlv_data;
+ struct qed_tlv_parsed_buf buffer;
+ struct qed_drv_tlv_hdr tlv;
+ int len = 0;
+ u32 offset;
+ u8 *p_tlv;
+
+ p_tlv_data = vzalloc(sizeof(*p_tlv_data));
+ if (!p_tlv_data)
+ return -ENOMEM;
+
+ if (qed_mfw_fill_tlv_data(p_hwfn, tlv_group, p_tlv_data)) {
+ vfree(p_tlv_data);
+ return -EINVAL;
+ }
+
+ memset(&tlv, 0, sizeof(tlv));
+ for (offset = 0; offset < size;
+ offset += sizeof(tlv) + sizeof(u32) * tlv.tlv_length) {
+ p_tlv = &p_mfw_buf[offset];
+ tlv.tlv_type = TLV_TYPE(p_tlv);
+ tlv.tlv_length = TLV_LENGTH(p_tlv);
+ tlv.tlv_flags = TLV_FLAGS(p_tlv);
+
+ DP_VERBOSE(p_hwfn, QED_MSG_SP,
+ "Type %d length = %d flags = 0x%x\n", tlv.tlv_type,
+ tlv.tlv_length, tlv.tlv_flags);
+
+ if (tlv_group == QED_MFW_TLV_GENERIC)
+ len = qed_mfw_get_gen_tlv_value(&tlv,
+ &p_tlv_data->generic,
+ &buffer);
+ else if (tlv_group == QED_MFW_TLV_ETH)
+ len = qed_mfw_get_eth_tlv_value(&tlv,
+ &p_tlv_data->eth,
+ &buffer);
+ else if (tlv_group == QED_MFW_TLV_FCOE)
+ len = qed_mfw_get_fcoe_tlv_value(&tlv,
+ &p_tlv_data->fcoe,
+ &buffer);
+ else
+ len = qed_mfw_get_iscsi_tlv_value(&tlv,
+ &p_tlv_data->iscsi,
+ &buffer);
+
+ if (len > 0) {
+ WARN(len > 4 * tlv.tlv_length,
+ "Incorrect MFW TLV length %d, it shouldn't be greater than %d\n",
+ len, 4 * tlv.tlv_length);
+ len = min_t(int, len, 4 * tlv.tlv_length);
+ tlv.tlv_flags |= QED_DRV_TLV_FLAGS_CHANGED;
+ TLV_FLAGS(p_tlv) = tlv.tlv_flags;
+ memcpy(p_mfw_buf + offset + sizeof(tlv),
+ buffer.p_val, len);
+ }
+ }
+
+ vfree(p_tlv_data);
+
+ return 0;
+}
+
+int qed_mfw_process_tlv_req(struct qed_hwfn *p_hwfn, struct qed_ptt *p_ptt)
+{
+ u32 addr, size, offset, resp, param, val, global_offsize, global_addr;
+ u8 tlv_group = 0, id, *p_mfw_buf = NULL, *p_temp;
+ struct qed_drv_tlv_hdr tlv;
+ int rc;
+
+ addr = SECTION_OFFSIZE_ADDR(p_hwfn->mcp_info->public_base,
+ PUBLIC_GLOBAL);
+ global_offsize = qed_rd(p_hwfn, p_ptt, addr);
+ global_addr = SECTION_ADDR(global_offsize, 0);
+ addr = global_addr + offsetof(struct public_global, data_ptr);
+ addr = qed_rd(p_hwfn, p_ptt, addr);
+ size = qed_rd(p_hwfn, p_ptt, global_addr +
+ offsetof(struct public_global, data_size));
+
+ if (!size) {
+ DP_NOTICE(p_hwfn, "Invalid TLV req size = %d\n", size);
+ goto drv_done;
+ }
+
+ p_mfw_buf = vzalloc(size);
+ if (!p_mfw_buf) {
+ DP_NOTICE(p_hwfn, "Failed allocate memory for p_mfw_buf\n");
+ goto drv_done;
+ }
+
+ /* Read the TLV request to local buffer. MFW represents the TLV in
+ * little endian format and mcp returns it bigendian format. Hence
+ * driver need to convert data to little endian first and then do the
+ * memcpy (casting) to preserve the MFW TLV format in the driver buffer.
+ *
+ */
+ for (offset = 0; offset < size; offset += sizeof(u32)) {
+ val = qed_rd(p_hwfn, p_ptt, addr + offset);
+ val = be32_to_cpu(val);
+ memcpy(&p_mfw_buf[offset], &val, sizeof(u32));
+ }
+
+ /* Parse the headers to enumerate the requested TLV groups */
+ for (offset = 0; offset < size;
+ offset += sizeof(tlv) + sizeof(u32) * tlv.tlv_length) {
+ p_temp = &p_mfw_buf[offset];
+ tlv.tlv_type = TLV_TYPE(p_temp);
+ tlv.tlv_length = TLV_LENGTH(p_temp);
+ if (qed_mfw_get_tlv_group(tlv.tlv_type, &tlv_group))
+ DP_VERBOSE(p_hwfn, NETIF_MSG_DRV,
+ "Un recognized TLV %d\n", tlv.tlv_type);
+ }
+
+ /* Sanitize the TLV groups according to personality */
+ if ((tlv_group & QED_MFW_TLV_ETH) && !QED_IS_L2_PERSONALITY(p_hwfn)) {
+ DP_VERBOSE(p_hwfn, QED_MSG_SP,
+ "Skipping L2 TLVs for non-L2 function\n");
+ tlv_group &= ~QED_MFW_TLV_ETH;
+ }
+
+ if ((tlv_group & QED_MFW_TLV_FCOE) &&
+ p_hwfn->hw_info.personality != QED_PCI_FCOE) {
+ DP_VERBOSE(p_hwfn, QED_MSG_SP,
+ "Skipping FCoE TLVs for non-FCoE function\n");
+ tlv_group &= ~QED_MFW_TLV_FCOE;
+ }
+
+ if ((tlv_group & QED_MFW_TLV_ISCSI) &&
+ p_hwfn->hw_info.personality != QED_PCI_ISCSI) {
+ DP_VERBOSE(p_hwfn, QED_MSG_SP,
+ "Skipping iSCSI TLVs for non-iSCSI function\n");
+ tlv_group &= ~QED_MFW_TLV_ISCSI;
+ }
+
+ /* Update the TLV values in the local buffer */
+ for (id = QED_MFW_TLV_GENERIC; id < QED_MFW_TLV_MAX; id <<= 1) {
+ if (tlv_group & id)
+ if (qed_mfw_update_tlvs(p_hwfn, id, p_mfw_buf, size))
+ goto drv_done;
+ }
+
+ /* Write the TLV data to shared memory. The stream of 4 bytes first need
+ * to be mem-copied to u32 element to make it as LSB format. And then
+ * converted to big endian as required by mcp-write.
+ */
+ for (offset = 0; offset < size; offset += sizeof(u32)) {
+ memcpy(&val, &p_mfw_buf[offset], sizeof(u32));
+ val = cpu_to_be32(val);
+ qed_wr(p_hwfn, p_ptt, addr + offset, val);
+ }
+
+drv_done:
+ rc = qed_mcp_cmd(p_hwfn, p_ptt, DRV_MSG_CODE_GET_TLV_DONE, 0, &resp,
+ ¶m);
+
+ vfree(p_mfw_buf);
+
+ return rc;
+}
diff --git a/drivers/net/ethernet/qlogic/qed/qed_sp.h b/drivers/net/ethernet/qlogic/qed/qed_sp.h
index ab4ad8a..e95431f 100644
--- a/drivers/net/ethernet/qlogic/qed/qed_sp.h
+++ b/drivers/net/ethernet/qlogic/qed/qed_sp.h
@@ -416,7 +416,6 @@
* @param p_hwfn
* @param p_ptt
* @param p_tunn
- * @param mode
* @param allow_npar_tx_switch
*
* @return int
@@ -425,7 +424,7 @@
int qed_sp_pf_start(struct qed_hwfn *p_hwfn,
struct qed_ptt *p_ptt,
struct qed_tunnel_info *p_tunn,
- enum qed_mf_mode mode, bool allow_npar_tx_switch);
+ bool allow_npar_tx_switch);
/**
* @brief qed_sp_pf_update - PF Function Update Ramrod
@@ -463,6 +462,15 @@
* @return int
*/
+/**
+ * @brief qed_sp_pf_update_ufp - PF ufp update Ramrod
+ *
+ * @param p_hwfn
+ *
+ * @return int
+ */
+int qed_sp_pf_update_ufp(struct qed_hwfn *p_hwfn);
+
int qed_sp_pf_stop(struct qed_hwfn *p_hwfn);
int qed_sp_pf_update_tunn_cfg(struct qed_hwfn *p_hwfn,
diff --git a/drivers/net/ethernet/qlogic/qed/qed_sp_commands.c b/drivers/net/ethernet/qlogic/qed/qed_sp_commands.c
index 5e927b6..8de644b4 100644
--- a/drivers/net/ethernet/qlogic/qed/qed_sp_commands.c
+++ b/drivers/net/ethernet/qlogic/qed/qed_sp_commands.c
@@ -306,7 +306,7 @@
int qed_sp_pf_start(struct qed_hwfn *p_hwfn,
struct qed_ptt *p_ptt,
struct qed_tunnel_info *p_tunn,
- enum qed_mf_mode mode, bool allow_npar_tx_switch)
+ bool allow_npar_tx_switch)
{
struct pf_start_ramrod_data *p_ramrod = NULL;
u16 sb = qed_int_get_sp_sb_id(p_hwfn);
@@ -314,7 +314,7 @@
struct qed_spq_entry *p_ent = NULL;
struct qed_sp_init_data init_data;
int rc = -EINVAL;
- u8 page_cnt;
+ u8 page_cnt, i;
/* update initial eq producer */
qed_eq_prod_update(p_hwfn,
@@ -339,21 +339,36 @@
p_ramrod->dont_log_ramrods = 0;
p_ramrod->log_type_mask = cpu_to_le16(0xf);
- switch (mode) {
- case QED_MF_DEFAULT:
- case QED_MF_NPAR:
- p_ramrod->mf_mode = MF_NPAR;
- break;
- case QED_MF_OVLAN:
+ if (test_bit(QED_MF_OVLAN_CLSS, &p_hwfn->cdev->mf_bits))
p_ramrod->mf_mode = MF_OVLAN;
- break;
- default:
- DP_NOTICE(p_hwfn, "Unsupported MF mode, init as DEFAULT\n");
+ else
p_ramrod->mf_mode = MF_NPAR;
- }
p_ramrod->outer_tag_config.outer_tag.tci =
- cpu_to_le16(p_hwfn->hw_info.ovlan);
+ cpu_to_le16(p_hwfn->hw_info.ovlan);
+ if (test_bit(QED_MF_8021Q_TAGGING, &p_hwfn->cdev->mf_bits)) {
+ p_ramrod->outer_tag_config.outer_tag.tpid = ETH_P_8021Q;
+ } else if (test_bit(QED_MF_8021AD_TAGGING, &p_hwfn->cdev->mf_bits)) {
+ p_ramrod->outer_tag_config.outer_tag.tpid = ETH_P_8021AD;
+ p_ramrod->outer_tag_config.enable_stag_pri_change = 1;
+ }
+
+ p_ramrod->outer_tag_config.pri_map_valid = 1;
+ for (i = 0; i < QED_MAX_PFC_PRIORITIES; i++)
+ p_ramrod->outer_tag_config.inner_to_outer_pri_map[i] = i;
+
+ /* enable_stag_pri_change should be set if port is in BD mode or,
+ * UFP with Host Control mode.
+ */
+ if (test_bit(QED_MF_UFP_SPECIFIC, &p_hwfn->cdev->mf_bits)) {
+ if (p_hwfn->ufp_info.pri_type == QED_UFP_PRI_OS)
+ p_ramrod->outer_tag_config.enable_stag_pri_change = 1;
+ else
+ p_ramrod->outer_tag_config.enable_stag_pri_change = 0;
+
+ p_ramrod->outer_tag_config.outer_tag.tci |=
+ cpu_to_le16(((u16)p_hwfn->ufp_info.tc << 13));
+ }
/* Place EQ address in RAMROD */
DMA_REGPAIR_LE(p_ramrod->event_ring_pbl_addr,
@@ -365,7 +380,7 @@
qed_tunn_set_pf_start_params(p_hwfn, p_tunn, &p_ramrod->tunnel_config);
- if (IS_MF_SI(p_hwfn))
+ if (test_bit(QED_MF_INTER_PF_SWITCH, &p_hwfn->cdev->mf_bits))
p_ramrod->allow_npar_tx_switching = allow_npar_tx_switch;
switch (p_hwfn->hw_info.personality) {
@@ -434,6 +449,39 @@
return qed_spq_post(p_hwfn, p_ent, NULL);
}
+int qed_sp_pf_update_ufp(struct qed_hwfn *p_hwfn)
+{
+ struct qed_spq_entry *p_ent = NULL;
+ struct qed_sp_init_data init_data;
+ int rc = -EOPNOTSUPP;
+
+ if (p_hwfn->ufp_info.pri_type == QED_UFP_PRI_UNKNOWN) {
+ DP_INFO(p_hwfn, "Invalid priority type %d\n",
+ p_hwfn->ufp_info.pri_type);
+ return -EINVAL;
+ }
+
+ /* Get SPQ entry */
+ memset(&init_data, 0, sizeof(init_data));
+ init_data.cid = qed_spq_get_cid(p_hwfn);
+ init_data.opaque_fid = p_hwfn->hw_info.opaque_fid;
+ init_data.comp_mode = QED_SPQ_MODE_CB;
+
+ rc = qed_sp_init_request(p_hwfn, &p_ent,
+ COMMON_RAMROD_PF_UPDATE, PROTOCOLID_COMMON,
+ &init_data);
+ if (rc)
+ return rc;
+
+ p_ent->ramrod.pf_update.update_enable_stag_pri_change = true;
+ if (p_hwfn->ufp_info.pri_type == QED_UFP_PRI_OS)
+ p_ent->ramrod.pf_update.enable_stag_pri_change = 1;
+ else
+ p_ent->ramrod.pf_update.enable_stag_pri_change = 0;
+
+ return qed_spq_post(p_hwfn, p_ent, NULL);
+}
+
/* Set pf update ramrod command params */
int qed_sp_pf_update_tunn_cfg(struct qed_hwfn *p_hwfn,
struct qed_ptt *p_ptt,
diff --git a/drivers/net/ethernet/qlogic/qed/qed_sriov.c b/drivers/net/ethernet/qlogic/qed/qed_sriov.c
index 5acb91b..f01bf52 100644
--- a/drivers/net/ethernet/qlogic/qed/qed_sriov.c
+++ b/drivers/net/ethernet/qlogic/qed/qed_sriov.c
@@ -48,7 +48,7 @@
u8 opcode,
__le16 echo,
union event_ring_data *data, u8 fw_return_code);
-
+static int qed_iov_bulletin_set_mac(struct qed_hwfn *p_hwfn, u8 *mac, int vfid);
static u8 qed_vf_calculate_legacy(struct qed_vf_info *p_vf)
{
@@ -1790,7 +1790,8 @@
if (!p_vf->vport_instance)
return -EINVAL;
- if (events & BIT(MAC_ADDR_FORCED)) {
+ if ((events & BIT(MAC_ADDR_FORCED)) ||
+ p_vf->p_vf_info.is_trusted_configured) {
/* Since there's no way [currently] of removing the MAC,
* we can always assume this means we need to force it.
*/
@@ -1809,8 +1810,12 @@
"PF failed to configure MAC for VF\n");
return rc;
}
-
- p_vf->configured_features |= 1 << MAC_ADDR_FORCED;
+ if (p_vf->p_vf_info.is_trusted_configured)
+ p_vf->configured_features |=
+ BIT(VFPF_BULLETIN_MAC_ADDR);
+ else
+ p_vf->configured_features |=
+ BIT(MAC_ADDR_FORCED);
}
if (events & BIT(VLAN_ADDR_FORCED)) {
@@ -3170,6 +3175,10 @@
if (p_vf->bulletin.p_virt->valid_bitmap & BIT(MAC_ADDR_FORCED))
return 0;
+ /* Don't keep track of shadow copy since we don't intend to restore. */
+ if (p_vf->p_vf_info.is_trusted_configured)
+ return 0;
+
/* First remove entries and then add new ones */
if (p_params->opcode == QED_FILTER_REMOVE) {
for (i = 0; i < QED_ETH_VF_NUM_MAC_FILTERS; i++) {
@@ -3244,9 +3253,17 @@
/* No real decision to make; Store the configured MAC */
if (params->type == QED_FILTER_MAC ||
- params->type == QED_FILTER_MAC_VLAN)
+ params->type == QED_FILTER_MAC_VLAN) {
ether_addr_copy(vf->mac, params->mac);
+ if (vf->is_trusted_configured) {
+ qed_iov_bulletin_set_mac(hwfn, vf->mac, vfid);
+
+ /* Update and post bulleitin again */
+ qed_schedule_iov(hwfn, QED_IOV_WQ_BULLETIN_UPDATE_FLAG);
+ }
+ }
+
return 0;
}
@@ -3803,6 +3820,40 @@
__qed_vf_get_link_caps(p_hwfn, p_caps, p_bulletin);
}
+static int
+qed_iov_vf_pf_bulletin_update_mac(struct qed_hwfn *p_hwfn,
+ struct qed_ptt *p_ptt,
+ struct qed_vf_info *p_vf)
+{
+ struct qed_bulletin_content *p_bulletin = p_vf->bulletin.p_virt;
+ struct qed_iov_vf_mbx *mbx = &p_vf->vf_mbx;
+ struct vfpf_bulletin_update_mac_tlv *p_req;
+ u8 status = PFVF_STATUS_SUCCESS;
+ int rc = 0;
+
+ if (!p_vf->p_vf_info.is_trusted_configured) {
+ DP_VERBOSE(p_hwfn,
+ QED_MSG_IOV,
+ "Blocking bulletin update request from untrusted VF[%d]\n",
+ p_vf->abs_vf_id);
+ status = PFVF_STATUS_NOT_SUPPORTED;
+ rc = -EINVAL;
+ goto send_status;
+ }
+
+ p_req = &mbx->req_virt->bulletin_update_mac;
+ ether_addr_copy(p_bulletin->mac, p_req->mac);
+ DP_VERBOSE(p_hwfn, QED_MSG_IOV,
+ "Updated bulletin of VF[%d] with requested MAC[%pM]\n",
+ p_vf->abs_vf_id, p_req->mac);
+
+send_status:
+ qed_iov_prepare_resp(p_hwfn, p_ptt, p_vf,
+ CHANNEL_TLV_BULLETIN_UPDATE_MAC,
+ sizeof(struct pfvf_def_resp_tlv), status);
+ return rc;
+}
+
static void qed_iov_process_mbx_req(struct qed_hwfn *p_hwfn,
struct qed_ptt *p_ptt, int vfid)
{
@@ -3882,6 +3933,9 @@
case CHANNEL_TLV_COALESCE_READ:
qed_iov_vf_pf_get_coalesce(p_hwfn, p_ptt, p_vf);
break;
+ case CHANNEL_TLV_BULLETIN_UPDATE_MAC:
+ qed_iov_vf_pf_bulletin_update_mac(p_hwfn, p_ptt, p_vf);
+ break;
}
} else if (qed_iov_tlv_supported(mbx->first_tlv.tl.type)) {
DP_VERBOSE(p_hwfn, QED_MSG_IOV,
@@ -4081,16 +4135,60 @@
return;
}
- feature = 1 << MAC_ADDR_FORCED;
+ if (vf_info->p_vf_info.is_trusted_configured) {
+ feature = BIT(VFPF_BULLETIN_MAC_ADDR);
+ /* Trust mode will disable Forced MAC */
+ vf_info->bulletin.p_virt->valid_bitmap &=
+ ~BIT(MAC_ADDR_FORCED);
+ } else {
+ feature = BIT(MAC_ADDR_FORCED);
+ /* Forced MAC will disable MAC_ADDR */
+ vf_info->bulletin.p_virt->valid_bitmap &=
+ ~BIT(VFPF_BULLETIN_MAC_ADDR);
+ }
+
memcpy(vf_info->bulletin.p_virt->mac, mac, ETH_ALEN);
vf_info->bulletin.p_virt->valid_bitmap |= feature;
- /* Forced MAC will disable MAC_ADDR */
- vf_info->bulletin.p_virt->valid_bitmap &= ~BIT(VFPF_BULLETIN_MAC_ADDR);
qed_iov_configure_vport_forced(p_hwfn, vf_info, feature);
}
+static int qed_iov_bulletin_set_mac(struct qed_hwfn *p_hwfn, u8 *mac, int vfid)
+{
+ struct qed_vf_info *vf_info;
+ u64 feature;
+
+ vf_info = qed_iov_get_vf_info(p_hwfn, (u16)vfid, true);
+ if (!vf_info) {
+ DP_NOTICE(p_hwfn->cdev, "Can not set MAC, invalid vfid [%d]\n",
+ vfid);
+ return -EINVAL;
+ }
+
+ if (vf_info->b_malicious) {
+ DP_NOTICE(p_hwfn->cdev, "Can't set MAC to malicious VF [%d]\n",
+ vfid);
+ return -EINVAL;
+ }
+
+ if (vf_info->bulletin.p_virt->valid_bitmap & BIT(MAC_ADDR_FORCED)) {
+ DP_VERBOSE(p_hwfn, QED_MSG_IOV,
+ "Can not set MAC, Forced MAC is configured\n");
+ return -EINVAL;
+ }
+
+ feature = BIT(VFPF_BULLETIN_MAC_ADDR);
+ ether_addr_copy(vf_info->bulletin.p_virt->mac, mac);
+
+ vf_info->bulletin.p_virt->valid_bitmap |= feature;
+
+ if (vf_info->p_vf_info.is_trusted_configured)
+ qed_iov_configure_vport_forced(p_hwfn, vf_info, feature);
+
+ return 0;
+}
+
static void qed_iov_bulletin_set_forced_vlan(struct qed_hwfn *p_hwfn,
u16 pvid, int vfid)
{
@@ -4204,6 +4302,21 @@
return rc;
}
+static u8 *qed_iov_bulletin_get_mac(struct qed_hwfn *p_hwfn, u16 rel_vf_id)
+{
+ struct qed_vf_info *p_vf;
+
+ p_vf = qed_iov_get_vf_info(p_hwfn, rel_vf_id, true);
+ if (!p_vf || !p_vf->bulletin.p_virt)
+ return NULL;
+
+ if (!(p_vf->bulletin.p_virt->valid_bitmap &
+ BIT(VFPF_BULLETIN_MAC_ADDR)))
+ return NULL;
+
+ return p_vf->bulletin.p_virt->mac;
+}
+
static u8 *qed_iov_bulletin_get_forced_mac(struct qed_hwfn *p_hwfn,
u16 rel_vf_id)
{
@@ -4493,8 +4606,12 @@
if (!vf_info)
continue;
- /* Set the forced MAC, and schedule the IOV task */
- ether_addr_copy(vf_info->forced_mac, mac);
+ /* Set the MAC, and schedule the IOV task */
+ if (vf_info->is_trusted_configured)
+ ether_addr_copy(vf_info->mac, mac);
+ else
+ ether_addr_copy(vf_info->forced_mac, mac);
+
qed_schedule_iov(hwfn, QED_IOV_WQ_SET_UNICAST_FILTER_FLAG);
}
@@ -4802,6 +4919,33 @@
qed_ptt_release(hwfn, ptt);
}
+static bool qed_pf_validate_req_vf_mac(struct qed_hwfn *hwfn,
+ u8 *mac,
+ struct qed_public_vf_info *info)
+{
+ if (info->is_trusted_configured) {
+ if (is_valid_ether_addr(info->mac) &&
+ (!mac || !ether_addr_equal(mac, info->mac)))
+ return true;
+ } else {
+ if (is_valid_ether_addr(info->forced_mac) &&
+ (!mac || !ether_addr_equal(mac, info->forced_mac)))
+ return true;
+ }
+
+ return false;
+}
+
+static void qed_set_bulletin_mac(struct qed_hwfn *hwfn,
+ struct qed_public_vf_info *info,
+ int vfid)
+{
+ if (info->is_trusted_configured)
+ qed_iov_bulletin_set_mac(hwfn, info->mac, vfid);
+ else
+ qed_iov_bulletin_set_forced_mac(hwfn, info->forced_mac, vfid);
+}
+
static void qed_handle_pf_set_vf_unicast(struct qed_hwfn *hwfn)
{
int i;
@@ -4816,18 +4960,20 @@
continue;
/* Update data on bulletin board */
- mac = qed_iov_bulletin_get_forced_mac(hwfn, i);
- if (is_valid_ether_addr(info->forced_mac) &&
- (!mac || !ether_addr_equal(mac, info->forced_mac))) {
+ if (info->is_trusted_configured)
+ mac = qed_iov_bulletin_get_mac(hwfn, i);
+ else
+ mac = qed_iov_bulletin_get_forced_mac(hwfn, i);
+
+ if (qed_pf_validate_req_vf_mac(hwfn, mac, info)) {
DP_VERBOSE(hwfn,
QED_MSG_IOV,
"Handling PF setting of VF MAC to VF 0x%02x [Abs 0x%02x]\n",
i,
hwfn->cdev->p_iov_info->first_vf_in_pf + i);
- /* Update bulletin board with forced MAC */
- qed_iov_bulletin_set_forced_mac(hwfn,
- info->forced_mac, i);
+ /* Update bulletin board with MAC */
+ qed_set_bulletin_mac(hwfn, info, i);
update = true;
}
@@ -4867,6 +5013,72 @@
qed_ptt_release(hwfn, ptt);
}
+static void qed_update_mac_for_vf_trust_change(struct qed_hwfn *hwfn, int vf_id)
+{
+ struct qed_public_vf_info *vf_info;
+ struct qed_vf_info *vf;
+ u8 *force_mac;
+ int i;
+
+ vf_info = qed_iov_get_public_vf_info(hwfn, vf_id, true);
+ vf = qed_iov_get_vf_info(hwfn, vf_id, true);
+
+ if (!vf_info || !vf)
+ return;
+
+ /* Force MAC converted to generic MAC in case of VF trust on */
+ if (vf_info->is_trusted_configured &&
+ (vf->bulletin.p_virt->valid_bitmap & BIT(MAC_ADDR_FORCED))) {
+ force_mac = qed_iov_bulletin_get_forced_mac(hwfn, vf_id);
+
+ if (force_mac) {
+ /* Clear existing shadow copy of MAC to have a clean
+ * slate.
+ */
+ for (i = 0; i < QED_ETH_VF_NUM_MAC_FILTERS; i++) {
+ if (ether_addr_equal(vf->shadow_config.macs[i],
+ vf_info->mac)) {
+ memset(vf->shadow_config.macs[i], 0,
+ ETH_ALEN);
+ DP_VERBOSE(hwfn, QED_MSG_IOV,
+ "Shadow MAC %pM removed for VF 0x%02x, VF trust mode is ON\n",
+ vf_info->mac, vf_id);
+ break;
+ }
+ }
+
+ ether_addr_copy(vf_info->mac, force_mac);
+ memset(vf_info->forced_mac, 0, ETH_ALEN);
+ vf->bulletin.p_virt->valid_bitmap &=
+ ~BIT(MAC_ADDR_FORCED);
+ qed_schedule_iov(hwfn, QED_IOV_WQ_BULLETIN_UPDATE_FLAG);
+ }
+ }
+
+ /* Update shadow copy with VF MAC when trust mode is turned off */
+ if (!vf_info->is_trusted_configured) {
+ u8 empty_mac[ETH_ALEN];
+
+ memset(empty_mac, 0, ETH_ALEN);
+ for (i = 0; i < QED_ETH_VF_NUM_MAC_FILTERS; i++) {
+ if (ether_addr_equal(vf->shadow_config.macs[i],
+ empty_mac)) {
+ ether_addr_copy(vf->shadow_config.macs[i],
+ vf_info->mac);
+ DP_VERBOSE(hwfn, QED_MSG_IOV,
+ "Shadow is updated with %pM for VF 0x%02x, VF trust mode is OFF\n",
+ vf_info->mac, vf_id);
+ break;
+ }
+ }
+ /* Clear bulletin when trust mode is turned off,
+ * to have a clean slate for next (normal) operations.
+ */
+ qed_iov_bulletin_set_mac(hwfn, empty_mac, vf_id);
+ qed_schedule_iov(hwfn, QED_IOV_WQ_BULLETIN_UPDATE_FLAG);
+ }
+}
+
static void qed_iov_handle_trust_change(struct qed_hwfn *hwfn)
{
struct qed_sp_vport_update_params params;
@@ -4890,6 +5102,9 @@
continue;
vf_info->is_trusted_configured = vf_info->is_trusted_request;
+ /* Handle forced MAC mode */
+ qed_update_mac_for_vf_trust_change(hwfn, i);
+
/* Validate that the VF has a configured vport */
vf = qed_iov_get_vf_info(hwfn, i, true);
if (!vf->vport_instance)
diff --git a/drivers/net/ethernet/qlogic/qed/qed_vf.c b/drivers/net/ethernet/qlogic/qed/qed_vf.c
index 91b5e9f..2d7fcd6 100644
--- a/drivers/net/ethernet/qlogic/qed/qed_vf.c
+++ b/drivers/net/ethernet/qlogic/qed/qed_vf.c
@@ -1375,6 +1375,35 @@
}
int
+qed_vf_pf_bulletin_update_mac(struct qed_hwfn *p_hwfn,
+ u8 *p_mac)
+{
+ struct qed_vf_iov *p_iov = p_hwfn->vf_iov_info;
+ struct vfpf_bulletin_update_mac_tlv *p_req;
+ struct pfvf_def_resp_tlv *p_resp;
+ int rc;
+
+ if (!p_mac)
+ return -EINVAL;
+
+ /* clear mailbox and prep header tlv */
+ p_req = qed_vf_pf_prep(p_hwfn, CHANNEL_TLV_BULLETIN_UPDATE_MAC,
+ sizeof(*p_req));
+ ether_addr_copy(p_req->mac, p_mac);
+ DP_VERBOSE(p_hwfn, QED_MSG_IOV,
+ "Requesting bulletin update for MAC[%pM]\n", p_mac);
+
+ /* add list termination tlv */
+ qed_add_tlv(p_hwfn, &p_iov->offset, CHANNEL_TLV_LIST_END,
+ sizeof(struct channel_list_end_tlv));
+
+ p_resp = &p_iov->pf2vf_reply->default_resp;
+ rc = qed_send_msg2pf(p_hwfn, &p_resp->hdr.status, sizeof(*p_resp));
+ qed_vf_pf_req_end(p_hwfn, rc);
+ return rc;
+}
+
+int
qed_vf_pf_set_coalesce(struct qed_hwfn *p_hwfn,
u16 rx_coal, u16 tx_coal, struct qed_queue_cid *p_cid)
{
diff --git a/drivers/net/ethernet/qlogic/qed/qed_vf.h b/drivers/net/ethernet/qlogic/qed/qed_vf.h
index 97d44df..4f05d5e 100644
--- a/drivers/net/ethernet/qlogic/qed/qed_vf.h
+++ b/drivers/net/ethernet/qlogic/qed/qed_vf.h
@@ -518,6 +518,12 @@
u8 padding[6];
};
+struct vfpf_bulletin_update_mac_tlv {
+ struct vfpf_first_tlv first_tlv;
+ u8 mac[ETH_ALEN];
+ u8 padding[2];
+};
+
union vfpf_tlvs {
struct vfpf_first_tlv first_tlv;
struct vfpf_acquire_tlv acquire;
@@ -532,6 +538,7 @@
struct vfpf_update_tunn_param_tlv tunn_param_update;
struct vfpf_update_coalesce update_coalesce;
struct vfpf_read_coal_req_tlv read_coal_req;
+ struct vfpf_bulletin_update_mac_tlv bulletin_update_mac;
struct tlv_buffer_size tlv_buf_size;
};
@@ -650,6 +657,7 @@
CHANNEL_TLV_COALESCE_UPDATE,
CHANNEL_TLV_QID,
CHANNEL_TLV_COALESCE_READ,
+ CHANNEL_TLV_BULLETIN_UPDATE_MAC,
CHANNEL_TLV_MAX,
/* Required for iterating over vport-update tlvs.
@@ -1042,6 +1050,13 @@
struct qed_tunnel_info *p_tunn);
u32 qed_vf_hw_bar_size(struct qed_hwfn *p_hwfn, enum BAR_ID bar_id);
+/**
+ * @brief - Ask PF to update the MAC address in it's bulletin board
+ *
+ * @param p_mac - mac address to be updated in bulletin board
+ */
+int qed_vf_pf_bulletin_update_mac(struct qed_hwfn *p_hwfn, u8 *p_mac);
+
#else
static inline void qed_vf_get_link_params(struct qed_hwfn *p_hwfn,
struct qed_mcp_link_params *params)
@@ -1228,6 +1243,12 @@
return -EINVAL;
}
+static inline int qed_vf_pf_bulletin_update_mac(struct qed_hwfn *p_hwfn,
+ u8 *p_mac)
+{
+ return -EINVAL;
+}
+
static inline u32
qed_vf_hw_bar_size(struct qed_hwfn *p_hwfn,
enum BAR_ID bar_id)
diff --git a/drivers/net/ethernet/qlogic/qede/qede.h b/drivers/net/ethernet/qlogic/qede/qede.h
index 9935978c..81c5c8df 100644
--- a/drivers/net/ethernet/qlogic/qede/qede.h
+++ b/drivers/net/ethernet/qlogic/qede/qede.h
@@ -75,6 +75,7 @@
u64 rx_bcast_pkts;
u64 mftag_filter_discards;
u64 mac_filter_discards;
+ u64 gft_filter_drop;
u64 tx_ucast_bytes;
u64 tx_mcast_bytes;
u64 tx_bcast_bytes;
@@ -290,15 +291,12 @@
* aggregation.
*/
struct sw_rx_data buffer;
- dma_addr_t buffer_mapping;
-
struct sk_buff *skb;
/* We need some structs from the start cookie until termination */
u16 vlan_tag;
- u16 start_cqe_bd_len;
- u8 start_cqe_placement_offset;
+ bool tpa_start_fail;
u8 state;
u8 frag_id;
diff --git a/drivers/net/ethernet/qlogic/qede/qede_ethtool.c b/drivers/net/ethernet/qlogic/qede/qede_ethtool.c
index ecbf1de..6906e04 100644
--- a/drivers/net/ethernet/qlogic/qede/qede_ethtool.c
+++ b/drivers/net/ethernet/qlogic/qede/qede_ethtool.c
@@ -161,6 +161,7 @@
QEDE_STAT(no_buff_discards),
QEDE_PF_STAT(mftag_filter_discards),
QEDE_PF_STAT(mac_filter_discards),
+ QEDE_PF_STAT(gft_filter_drop),
QEDE_STAT(tx_err_drop_pkts),
QEDE_STAT(ttl0_discard),
QEDE_STAT(packet_too_big_discard),
@@ -1508,7 +1509,8 @@
len = le16_to_cpu(fp_cqe->len_on_first_bd);
data_ptr = (u8 *)(page_address(sw_rx_data->data) +
fp_cqe->placement_offset +
- sw_rx_data->page_offset);
+ sw_rx_data->page_offset +
+ rxq->rx_headroom);
if (ether_addr_equal(data_ptr, edev->ndev->dev_addr) &&
ether_addr_equal(data_ptr + ETH_ALEN,
edev->ndev->dev_addr)) {
diff --git a/drivers/net/ethernet/qlogic/qede/qede_filter.c b/drivers/net/ethernet/qlogic/qede/qede_filter.c
index 6687e04..e9e088d 100644
--- a/drivers/net/ethernet/qlogic/qede/qede_filter.c
+++ b/drivers/net/ethernet/qlogic/qede/qede_filter.c
@@ -38,6 +38,7 @@
#include <linux/qed/qed_if.h>
#include "qede.h"
+#define QEDE_FILTER_PRINT_MAX_LEN (64)
struct qede_arfs_tuple {
union {
__be32 src_ipv4;
@@ -51,6 +52,18 @@
__be16 dst_port;
__be16 eth_proto;
u8 ip_proto;
+
+ /* Describe filtering mode needed for this kind of filter */
+ enum qed_filter_config_mode mode;
+
+ /* Used to compare new/old filters. Return true if IPs match */
+ bool (*ip_comp)(struct qede_arfs_tuple *a, struct qede_arfs_tuple *b);
+
+ /* Given an address into ethhdr build a header from tuple info */
+ void (*build_hdr)(struct qede_arfs_tuple *t, void *header);
+
+ /* Stringify the tuple for a print into the provided buffer */
+ void (*stringify)(struct qede_arfs_tuple *t, void *buffer);
};
struct qede_arfs_fltr_node {
@@ -73,9 +86,11 @@
u16 sw_id;
u16 rxq_id;
u16 next_rxq_id;
+ u8 vfid;
bool filter_op;
bool used;
u8 fw_rc;
+ bool b_is_drop;
struct hlist_node node;
};
@@ -90,7 +105,9 @@
spinlock_t arfs_list_lock;
unsigned long *arfs_fltr_bmap;
int filter_count;
- bool enable;
+
+ /* Currently configured filtering mode */
+ enum qed_filter_config_mode mode;
};
static void qede_configure_arfs_fltr(struct qede_dev *edev,
@@ -109,12 +126,22 @@
params.length = n->buf_len;
params.qid = rxq_id;
params.b_is_add = add_fltr;
+ params.b_is_drop = n->b_is_drop;
- DP_VERBOSE(edev, NETIF_MSG_RX_STATUS,
- "%s arfs filter flow_id=%d, sw_id=%d, src_port=%d, dst_port=%d, rxq=%d\n",
- add_fltr ? "Adding" : "Deleting",
- n->flow_id, n->sw_id, ntohs(n->tuple.src_port),
- ntohs(n->tuple.dst_port), rxq_id);
+ if (n->vfid) {
+ params.b_is_vf = true;
+ params.vf_id = n->vfid - 1;
+ }
+
+ if (n->tuple.stringify) {
+ char tuple_buffer[QEDE_FILTER_PRINT_MAX_LEN];
+
+ n->tuple.stringify(&n->tuple, tuple_buffer);
+ DP_VERBOSE(edev, NETIF_MSG_RX_STATUS,
+ "%s sw_id[0x%x]: %s [vf %u queue %d]\n",
+ add_fltr ? "Adding" : "Deleting",
+ n->sw_id, tuple_buffer, n->vfid, rxq_id);
+ }
n->used = true;
n->filter_op = add_fltr;
@@ -145,14 +172,13 @@
INIT_HLIST_NODE(&fltr->node);
hlist_add_head(&fltr->node,
QEDE_ARFS_BUCKET_HEAD(edev, bucket_idx));
+
edev->arfs->filter_count++;
-
- if (edev->arfs->filter_count == 1 && !edev->arfs->enable) {
- enum qed_filter_config_mode mode;
-
- mode = QED_FILTER_CONFIG_MODE_5_TUPLE;
- edev->ops->configure_arfs_searcher(edev->cdev, mode);
- edev->arfs->enable = true;
+ if (edev->arfs->filter_count == 1 &&
+ edev->arfs->mode == QED_FILTER_CONFIG_MODE_DISABLE) {
+ edev->ops->configure_arfs_searcher(edev->cdev,
+ fltr->tuple.mode);
+ edev->arfs->mode = fltr->tuple.mode;
}
return 0;
@@ -167,14 +193,15 @@
fltr->buf_len, DMA_TO_DEVICE);
qede_free_arfs_filter(edev, fltr);
- edev->arfs->filter_count--;
- if (!edev->arfs->filter_count && edev->arfs->enable) {
+ edev->arfs->filter_count--;
+ if (!edev->arfs->filter_count &&
+ edev->arfs->mode != QED_FILTER_CONFIG_MODE_DISABLE) {
enum qed_filter_config_mode mode;
mode = QED_FILTER_CONFIG_MODE_DISABLE;
- edev->arfs->enable = false;
edev->ops->configure_arfs_searcher(edev->cdev, mode);
+ edev->arfs->mode = QED_FILTER_CONFIG_MODE_DISABLE;
}
}
@@ -264,25 +291,17 @@
}
}
+#ifdef CONFIG_RFS_ACCEL
spin_lock_bh(&edev->arfs->arfs_list_lock);
- if (!edev->arfs->filter_count) {
- if (edev->arfs->enable) {
- enum qed_filter_config_mode mode;
-
- mode = QED_FILTER_CONFIG_MODE_DISABLE;
- edev->arfs->enable = false;
- edev->ops->configure_arfs_searcher(edev->cdev, mode);
- }
-#ifdef CONFIG_RFS_ACCEL
- } else {
+ if (edev->arfs->filter_count) {
set_bit(QEDE_SP_ARFS_CONFIG, &edev->sp_flags);
schedule_delayed_work(&edev->sp_task,
QEDE_SP_TASK_POLL_DELAY);
-#endif
}
spin_unlock_bh(&edev->arfs->arfs_list_lock);
+#endif
}
/* This function waits until all aRFS filters get deleted and freed.
@@ -512,6 +531,7 @@
eth->h_proto = skb->protocol;
n->tuple.eth_proto = skb->protocol;
n->tuple.ip_proto = ip_proto;
+ n->tuple.mode = QED_FILTER_CONFIG_MODE_5_TUPLE;
memcpy(n->data + ETH_HLEN, skb->data, skb_headlen(skb));
rc = qede_enqueue_fltr_and_config_searcher(edev, n, tbl_idx);
@@ -550,8 +570,7 @@
__qede_lock(edev);
- /* MAC hints take effect only if we haven't set one already */
- if (is_valid_ether_addr(edev->ndev->dev_addr) && !forced) {
+ if (!is_valid_ether_addr(mac)) {
__qede_unlock(edev);
return;
}
@@ -1161,6 +1180,10 @@
if (edev->state != QEDE_STATE_OPEN) {
DP_VERBOSE(edev, NETIF_MSG_IFDOWN,
"The device is currently down\n");
+ /* Ask PF to explicitly update a copy in bulletin board */
+ if (IS_VF(edev) && edev->ops->req_bulletin_update_mac)
+ edev->ops->req_bulletin_update_mac(edev->cdev,
+ ndev->dev_addr);
goto out;
}
@@ -1336,38 +1359,6 @@
return NULL;
}
-static bool
-qede_compare_user_flow_ips(struct qede_arfs_fltr_node *tpos,
- struct ethtool_rx_flow_spec *fsp,
- __be16 proto)
-{
- if (proto == htons(ETH_P_IP)) {
- struct ethtool_tcpip4_spec *ip;
-
- ip = &fsp->h_u.tcp_ip4_spec;
-
- if (tpos->tuple.src_ipv4 == ip->ip4src &&
- tpos->tuple.dst_ipv4 == ip->ip4dst)
- return true;
- else
- return false;
- } else {
- struct ethtool_tcpip6_spec *ip6;
- struct in6_addr *src;
-
- ip6 = &fsp->h_u.tcp_ip6_spec;
- src = &tpos->tuple.src_ipv6;
-
- if (!memcmp(src, &ip6->ip6src, sizeof(struct in6_addr)) &&
- !memcmp(&tpos->tuple.dst_ipv6, &ip6->ip6dst,
- sizeof(struct in6_addr)))
- return true;
- else
- return false;
- }
- return false;
-}
-
int qede_get_cls_rule_all(struct qede_dev *edev, struct ethtool_rxnfc *info,
u32 *rule_locs)
{
@@ -1452,76 +1443,19 @@
fsp->ring_cookie = fltr->rxq_id;
+ if (fltr->vfid) {
+ fsp->ring_cookie |= ((u64)fltr->vfid) <<
+ ETHTOOL_RX_FLOW_SPEC_RING_VF_OFF;
+ }
+
+ if (fltr->b_is_drop)
+ fsp->ring_cookie = RX_CLS_FLOW_DISC;
unlock:
__qede_unlock(edev);
return rc;
}
static int
-qede_validate_and_check_flow_exist(struct qede_dev *edev,
- struct ethtool_rx_flow_spec *fsp,
- int *min_hlen)
-{
- __be16 src_port = 0x0, dst_port = 0x0;
- struct qede_arfs_fltr_node *fltr;
- struct hlist_node *temp;
- struct hlist_head *head;
- __be16 eth_proto;
- u8 ip_proto;
-
- if (fsp->location >= QEDE_RFS_MAX_FLTR ||
- fsp->ring_cookie >= QEDE_RSS_COUNT(edev))
- return -EINVAL;
-
- if (fsp->flow_type == TCP_V4_FLOW) {
- *min_hlen += sizeof(struct iphdr) +
- sizeof(struct tcphdr);
- eth_proto = htons(ETH_P_IP);
- ip_proto = IPPROTO_TCP;
- } else if (fsp->flow_type == UDP_V4_FLOW) {
- *min_hlen += sizeof(struct iphdr) +
- sizeof(struct udphdr);
- eth_proto = htons(ETH_P_IP);
- ip_proto = IPPROTO_UDP;
- } else if (fsp->flow_type == TCP_V6_FLOW) {
- *min_hlen += sizeof(struct ipv6hdr) +
- sizeof(struct tcphdr);
- eth_proto = htons(ETH_P_IPV6);
- ip_proto = IPPROTO_TCP;
- } else if (fsp->flow_type == UDP_V6_FLOW) {
- *min_hlen += sizeof(struct ipv6hdr) +
- sizeof(struct udphdr);
- eth_proto = htons(ETH_P_IPV6);
- ip_proto = IPPROTO_UDP;
- } else {
- DP_NOTICE(edev, "Unsupported flow type = 0x%x\n",
- fsp->flow_type);
- return -EPROTONOSUPPORT;
- }
-
- if (eth_proto == htons(ETH_P_IP)) {
- src_port = fsp->h_u.tcp_ip4_spec.psrc;
- dst_port = fsp->h_u.tcp_ip4_spec.pdst;
- } else {
- src_port = fsp->h_u.tcp_ip6_spec.psrc;
- dst_port = fsp->h_u.tcp_ip6_spec.pdst;
- }
-
- head = QEDE_ARFS_BUCKET_HEAD(edev, 0);
- hlist_for_each_entry_safe(fltr, temp, head, node) {
- if ((fltr->tuple.ip_proto == ip_proto &&
- fltr->tuple.eth_proto == eth_proto &&
- qede_compare_user_flow_ips(fltr, fsp, eth_proto) &&
- fltr->tuple.src_port == src_port &&
- fltr->tuple.dst_port == dst_port) ||
- fltr->sw_id == fsp->location)
- return -EEXIST;
- }
-
- return 0;
-}
-
-static int
qede_poll_arfs_filter_config(struct qede_dev *edev,
struct qede_arfs_fltr_node *fltr)
{
@@ -1533,6 +1467,7 @@
}
if (count == 0 || fltr->fw_rc) {
+ DP_NOTICE(edev, "Timeout in polling filter config\n");
qede_dequeue_fltr_and_config_searcher(edev, fltr);
return -EIO;
}
@@ -1540,14 +1475,412 @@
return fltr->fw_rc;
}
+static int qede_flow_get_min_header_size(struct qede_arfs_tuple *t)
+{
+ int size = ETH_HLEN;
+
+ if (t->eth_proto == htons(ETH_P_IP))
+ size += sizeof(struct iphdr);
+ else
+ size += sizeof(struct ipv6hdr);
+
+ if (t->ip_proto == IPPROTO_TCP)
+ size += sizeof(struct tcphdr);
+ else
+ size += sizeof(struct udphdr);
+
+ return size;
+}
+
+static bool qede_flow_spec_ipv4_cmp(struct qede_arfs_tuple *a,
+ struct qede_arfs_tuple *b)
+{
+ if (a->eth_proto != htons(ETH_P_IP) ||
+ b->eth_proto != htons(ETH_P_IP))
+ return false;
+
+ return (a->src_ipv4 == b->src_ipv4) &&
+ (a->dst_ipv4 == b->dst_ipv4);
+}
+
+static void qede_flow_build_ipv4_hdr(struct qede_arfs_tuple *t,
+ void *header)
+{
+ __be16 *ports = (__be16 *)(header + ETH_HLEN + sizeof(struct iphdr));
+ struct iphdr *ip = (struct iphdr *)(header + ETH_HLEN);
+ struct ethhdr *eth = (struct ethhdr *)header;
+
+ eth->h_proto = t->eth_proto;
+ ip->saddr = t->src_ipv4;
+ ip->daddr = t->dst_ipv4;
+ ip->version = 0x4;
+ ip->ihl = 0x5;
+ ip->protocol = t->ip_proto;
+ ip->tot_len = cpu_to_be16(qede_flow_get_min_header_size(t) - ETH_HLEN);
+
+ /* ports is weakly typed to suit both TCP and UDP ports */
+ ports[0] = t->src_port;
+ ports[1] = t->dst_port;
+}
+
+static void qede_flow_stringify_ipv4_hdr(struct qede_arfs_tuple *t,
+ void *buffer)
+{
+ const char *prefix = t->ip_proto == IPPROTO_TCP ? "TCP" : "UDP";
+
+ snprintf(buffer, QEDE_FILTER_PRINT_MAX_LEN,
+ "%s %pI4 (%04x) -> %pI4 (%04x)",
+ prefix, &t->src_ipv4, t->src_port,
+ &t->dst_ipv4, t->dst_port);
+}
+
+static bool qede_flow_spec_ipv6_cmp(struct qede_arfs_tuple *a,
+ struct qede_arfs_tuple *b)
+{
+ if (a->eth_proto != htons(ETH_P_IPV6) ||
+ b->eth_proto != htons(ETH_P_IPV6))
+ return false;
+
+ if (memcmp(&a->src_ipv6, &b->src_ipv6, sizeof(struct in6_addr)))
+ return false;
+
+ if (memcmp(&a->dst_ipv6, &b->dst_ipv6, sizeof(struct in6_addr)))
+ return false;
+
+ return true;
+}
+
+static void qede_flow_build_ipv6_hdr(struct qede_arfs_tuple *t,
+ void *header)
+{
+ __be16 *ports = (__be16 *)(header + ETH_HLEN + sizeof(struct ipv6hdr));
+ struct ipv6hdr *ip6 = (struct ipv6hdr *)(header + ETH_HLEN);
+ struct ethhdr *eth = (struct ethhdr *)header;
+
+ eth->h_proto = t->eth_proto;
+ memcpy(&ip6->saddr, &t->src_ipv6, sizeof(struct in6_addr));
+ memcpy(&ip6->daddr, &t->dst_ipv6, sizeof(struct in6_addr));
+ ip6->version = 0x6;
+
+ if (t->ip_proto == IPPROTO_TCP) {
+ ip6->nexthdr = NEXTHDR_TCP;
+ ip6->payload_len = cpu_to_be16(sizeof(struct tcphdr));
+ } else {
+ ip6->nexthdr = NEXTHDR_UDP;
+ ip6->payload_len = cpu_to_be16(sizeof(struct udphdr));
+ }
+
+ /* ports is weakly typed to suit both TCP and UDP ports */
+ ports[0] = t->src_port;
+ ports[1] = t->dst_port;
+}
+
+/* Validate fields which are set and not accepted by the driver */
+static int qede_flow_spec_validate_unused(struct qede_dev *edev,
+ struct ethtool_rx_flow_spec *fs)
+{
+ if (fs->flow_type & FLOW_MAC_EXT) {
+ DP_INFO(edev, "Don't support MAC extensions\n");
+ return -EOPNOTSUPP;
+ }
+
+ if ((fs->flow_type & FLOW_EXT) &&
+ (fs->h_ext.vlan_etype || fs->h_ext.vlan_tci)) {
+ DP_INFO(edev, "Don't support vlan-based classification\n");
+ return -EOPNOTSUPP;
+ }
+
+ if ((fs->flow_type & FLOW_EXT) &&
+ (fs->h_ext.data[0] || fs->h_ext.data[1])) {
+ DP_INFO(edev, "Don't support user defined data\n");
+ return -EOPNOTSUPP;
+ }
+
+ return 0;
+}
+
+static int qede_flow_spec_to_tuple_ipv4_common(struct qede_dev *edev,
+ struct qede_arfs_tuple *t,
+ struct ethtool_rx_flow_spec *fs)
+{
+ if ((fs->h_u.tcp_ip4_spec.ip4src &
+ fs->m_u.tcp_ip4_spec.ip4src) != fs->h_u.tcp_ip4_spec.ip4src) {
+ DP_INFO(edev, "Don't support IP-masks\n");
+ return -EOPNOTSUPP;
+ }
+
+ if ((fs->h_u.tcp_ip4_spec.ip4dst &
+ fs->m_u.tcp_ip4_spec.ip4dst) != fs->h_u.tcp_ip4_spec.ip4dst) {
+ DP_INFO(edev, "Don't support IP-masks\n");
+ return -EOPNOTSUPP;
+ }
+
+ if ((fs->h_u.tcp_ip4_spec.psrc &
+ fs->m_u.tcp_ip4_spec.psrc) != fs->h_u.tcp_ip4_spec.psrc) {
+ DP_INFO(edev, "Don't support port-masks\n");
+ return -EOPNOTSUPP;
+ }
+
+ if ((fs->h_u.tcp_ip4_spec.pdst &
+ fs->m_u.tcp_ip4_spec.pdst) != fs->h_u.tcp_ip4_spec.pdst) {
+ DP_INFO(edev, "Don't support port-masks\n");
+ return -EOPNOTSUPP;
+ }
+
+ if (fs->h_u.tcp_ip4_spec.tos) {
+ DP_INFO(edev, "Don't support tos\n");
+ return -EOPNOTSUPP;
+ }
+
+ t->eth_proto = htons(ETH_P_IP);
+ t->src_ipv4 = fs->h_u.tcp_ip4_spec.ip4src;
+ t->dst_ipv4 = fs->h_u.tcp_ip4_spec.ip4dst;
+ t->src_port = fs->h_u.tcp_ip4_spec.psrc;
+ t->dst_port = fs->h_u.tcp_ip4_spec.pdst;
+
+ /* We must either have a valid 4-tuple or only dst port
+ * or only src ip as an input
+ */
+ if (t->src_port && t->dst_port && t->src_ipv4 && t->dst_ipv4) {
+ t->mode = QED_FILTER_CONFIG_MODE_5_TUPLE;
+ } else if (!t->src_port && t->dst_port &&
+ !t->src_ipv4 && !t->dst_ipv4) {
+ t->mode = QED_FILTER_CONFIG_MODE_L4_PORT;
+ } else if (!t->src_port && !t->dst_port &&
+ !t->dst_ipv4 && t->src_ipv4) {
+ t->mode = QED_FILTER_CONFIG_MODE_IP_SRC;
+ } else {
+ DP_INFO(edev, "Invalid N-tuple\n");
+ return -EOPNOTSUPP;
+ }
+
+ t->ip_comp = qede_flow_spec_ipv4_cmp;
+ t->build_hdr = qede_flow_build_ipv4_hdr;
+ t->stringify = qede_flow_stringify_ipv4_hdr;
+
+ return 0;
+}
+
+static int qede_flow_spec_to_tuple_tcpv4(struct qede_dev *edev,
+ struct qede_arfs_tuple *t,
+ struct ethtool_rx_flow_spec *fs)
+{
+ t->ip_proto = IPPROTO_TCP;
+
+ if (qede_flow_spec_to_tuple_ipv4_common(edev, t, fs))
+ return -EINVAL;
+
+ return 0;
+}
+
+static int qede_flow_spec_to_tuple_udpv4(struct qede_dev *edev,
+ struct qede_arfs_tuple *t,
+ struct ethtool_rx_flow_spec *fs)
+{
+ t->ip_proto = IPPROTO_UDP;
+
+ if (qede_flow_spec_to_tuple_ipv4_common(edev, t, fs))
+ return -EINVAL;
+
+ return 0;
+}
+
+static int qede_flow_spec_to_tuple_ipv6_common(struct qede_dev *edev,
+ struct qede_arfs_tuple *t,
+ struct ethtool_rx_flow_spec *fs)
+{
+ struct in6_addr zero_addr;
+ void *p;
+
+ p = &zero_addr;
+ memset(p, 0, sizeof(zero_addr));
+
+ if ((fs->h_u.tcp_ip6_spec.psrc &
+ fs->m_u.tcp_ip6_spec.psrc) != fs->h_u.tcp_ip6_spec.psrc) {
+ DP_INFO(edev, "Don't support port-masks\n");
+ return -EOPNOTSUPP;
+ }
+
+ if ((fs->h_u.tcp_ip6_spec.pdst &
+ fs->m_u.tcp_ip6_spec.pdst) != fs->h_u.tcp_ip6_spec.pdst) {
+ DP_INFO(edev, "Don't support port-masks\n");
+ return -EOPNOTSUPP;
+ }
+
+ if (fs->h_u.tcp_ip6_spec.tclass) {
+ DP_INFO(edev, "Don't support tclass\n");
+ return -EOPNOTSUPP;
+ }
+
+ t->eth_proto = htons(ETH_P_IPV6);
+ memcpy(&t->src_ipv6, &fs->h_u.tcp_ip6_spec.ip6src,
+ sizeof(struct in6_addr));
+ memcpy(&t->dst_ipv6, &fs->h_u.tcp_ip6_spec.ip6dst,
+ sizeof(struct in6_addr));
+ t->src_port = fs->h_u.tcp_ip6_spec.psrc;
+ t->dst_port = fs->h_u.tcp_ip6_spec.pdst;
+
+ /* We must make sure we have a valid 4-tuple or only dest port
+ * or only src ip as an input
+ */
+ if (t->src_port && t->dst_port &&
+ memcmp(&t->src_ipv6, p, sizeof(struct in6_addr)) &&
+ memcmp(&t->dst_ipv6, p, sizeof(struct in6_addr))) {
+ t->mode = QED_FILTER_CONFIG_MODE_5_TUPLE;
+ } else if (!t->src_port && t->dst_port &&
+ !memcmp(&t->src_ipv6, p, sizeof(struct in6_addr)) &&
+ !memcmp(&t->dst_ipv6, p, sizeof(struct in6_addr))) {
+ t->mode = QED_FILTER_CONFIG_MODE_L4_PORT;
+ } else if (!t->src_port && !t->dst_port &&
+ !memcmp(&t->dst_ipv6, p, sizeof(struct in6_addr)) &&
+ memcmp(&t->src_ipv6, p, sizeof(struct in6_addr))) {
+ t->mode = QED_FILTER_CONFIG_MODE_IP_SRC;
+ } else {
+ DP_INFO(edev, "Invalid N-tuple\n");
+ return -EOPNOTSUPP;
+ }
+
+ t->ip_comp = qede_flow_spec_ipv6_cmp;
+ t->build_hdr = qede_flow_build_ipv6_hdr;
+
+ return 0;
+}
+
+static int qede_flow_spec_to_tuple_tcpv6(struct qede_dev *edev,
+ struct qede_arfs_tuple *t,
+ struct ethtool_rx_flow_spec *fs)
+{
+ t->ip_proto = IPPROTO_TCP;
+
+ if (qede_flow_spec_to_tuple_ipv6_common(edev, t, fs))
+ return -EINVAL;
+
+ return 0;
+}
+
+static int qede_flow_spec_to_tuple_udpv6(struct qede_dev *edev,
+ struct qede_arfs_tuple *t,
+ struct ethtool_rx_flow_spec *fs)
+{
+ t->ip_proto = IPPROTO_UDP;
+
+ if (qede_flow_spec_to_tuple_ipv6_common(edev, t, fs))
+ return -EINVAL;
+
+ return 0;
+}
+
+static int qede_flow_spec_to_tuple(struct qede_dev *edev,
+ struct qede_arfs_tuple *t,
+ struct ethtool_rx_flow_spec *fs)
+{
+ memset(t, 0, sizeof(*t));
+
+ if (qede_flow_spec_validate_unused(edev, fs))
+ return -EOPNOTSUPP;
+
+ switch ((fs->flow_type & ~FLOW_EXT)) {
+ case TCP_V4_FLOW:
+ return qede_flow_spec_to_tuple_tcpv4(edev, t, fs);
+ case UDP_V4_FLOW:
+ return qede_flow_spec_to_tuple_udpv4(edev, t, fs);
+ case TCP_V6_FLOW:
+ return qede_flow_spec_to_tuple_tcpv6(edev, t, fs);
+ case UDP_V6_FLOW:
+ return qede_flow_spec_to_tuple_udpv6(edev, t, fs);
+ default:
+ DP_VERBOSE(edev, NETIF_MSG_IFUP,
+ "Can't support flow of type %08x\n", fs->flow_type);
+ return -EOPNOTSUPP;
+ }
+
+ return 0;
+}
+
+static int qede_flow_spec_validate(struct qede_dev *edev,
+ struct ethtool_rx_flow_spec *fs,
+ struct qede_arfs_tuple *t)
+{
+ if (fs->location >= QEDE_RFS_MAX_FLTR) {
+ DP_INFO(edev, "Location out-of-bounds\n");
+ return -EINVAL;
+ }
+
+ /* Check location isn't already in use */
+ if (test_bit(fs->location, edev->arfs->arfs_fltr_bmap)) {
+ DP_INFO(edev, "Location already in use\n");
+ return -EINVAL;
+ }
+
+ /* Check if the filtering-mode could support the filter */
+ if (edev->arfs->filter_count &&
+ edev->arfs->mode != t->mode) {
+ DP_INFO(edev,
+ "flow_spec would require filtering mode %08x, but %08x is configured\n",
+ t->mode, edev->arfs->filter_count);
+ return -EINVAL;
+ }
+
+ /* If drop requested then no need to validate other data */
+ if (fs->ring_cookie == RX_CLS_FLOW_DISC)
+ return 0;
+
+ if (ethtool_get_flow_spec_ring_vf(fs->ring_cookie))
+ return 0;
+
+ if (fs->ring_cookie >= QEDE_RSS_COUNT(edev)) {
+ DP_INFO(edev, "Queue out-of-bounds\n");
+ return -EINVAL;
+ }
+
+ return 0;
+}
+
+/* Must be called while qede lock is held */
+static struct qede_arfs_fltr_node *
+qede_flow_find_fltr(struct qede_dev *edev, struct qede_arfs_tuple *t)
+{
+ struct qede_arfs_fltr_node *fltr;
+ struct hlist_node *temp;
+ struct hlist_head *head;
+
+ head = QEDE_ARFS_BUCKET_HEAD(edev, 0);
+
+ hlist_for_each_entry_safe(fltr, temp, head, node) {
+ if (fltr->tuple.ip_proto == t->ip_proto &&
+ fltr->tuple.src_port == t->src_port &&
+ fltr->tuple.dst_port == t->dst_port &&
+ t->ip_comp(&fltr->tuple, t))
+ return fltr;
+ }
+
+ return NULL;
+}
+
+static void qede_flow_set_destination(struct qede_dev *edev,
+ struct qede_arfs_fltr_node *n,
+ struct ethtool_rx_flow_spec *fs)
+{
+ if (fs->ring_cookie == RX_CLS_FLOW_DISC) {
+ n->b_is_drop = true;
+ return;
+ }
+
+ n->vfid = ethtool_get_flow_spec_ring_vf(fs->ring_cookie);
+ n->rxq_id = ethtool_get_flow_spec_ring(fs->ring_cookie);
+ n->next_rxq_id = n->rxq_id;
+
+ if (n->vfid)
+ DP_VERBOSE(edev, QED_MSG_SP,
+ "Configuring N-tuple for VF 0x%02x\n", n->vfid - 1);
+}
+
int qede_add_cls_rule(struct qede_dev *edev, struct ethtool_rxnfc *info)
{
struct ethtool_rx_flow_spec *fsp = &info->fs;
struct qede_arfs_fltr_node *n;
- int min_hlen = ETH_HLEN, rc;
- struct ethhdr *eth;
- struct iphdr *ip;
- __be16 *ports;
+ struct qede_arfs_tuple t;
+ int min_hlen, rc;
__qede_lock(edev);
@@ -1556,16 +1889,28 @@
goto unlock;
}
- rc = qede_validate_and_check_flow_exist(edev, fsp, &min_hlen);
+ /* Translate the flow specification into something fittign our DB */
+ rc = qede_flow_spec_to_tuple(edev, &t, fsp);
if (rc)
goto unlock;
+ /* Make sure location is valid and filter isn't already set */
+ rc = qede_flow_spec_validate(edev, fsp, &t);
+ if (rc)
+ goto unlock;
+
+ if (qede_flow_find_fltr(edev, &t)) {
+ rc = -EINVAL;
+ goto unlock;
+ }
+
n = kzalloc(sizeof(*n), GFP_KERNEL);
if (!n) {
rc = -ENOMEM;
goto unlock;
}
+ min_hlen = qede_flow_get_min_header_size(&t);
n->data = kzalloc(min_hlen, GFP_KERNEL);
if (!n->data) {
kfree(n);
@@ -1576,68 +1921,13 @@
n->sw_id = fsp->location;
set_bit(n->sw_id, edev->arfs->arfs_fltr_bmap);
n->buf_len = min_hlen;
- n->rxq_id = fsp->ring_cookie;
- n->next_rxq_id = n->rxq_id;
- eth = (struct ethhdr *)n->data;
- if (info->fs.flow_type == TCP_V4_FLOW ||
- info->fs.flow_type == UDP_V4_FLOW) {
- ports = (__be16 *)(n->data + ETH_HLEN +
- sizeof(struct iphdr));
- eth->h_proto = htons(ETH_P_IP);
- n->tuple.eth_proto = htons(ETH_P_IP);
- n->tuple.src_ipv4 = info->fs.h_u.tcp_ip4_spec.ip4src;
- n->tuple.dst_ipv4 = info->fs.h_u.tcp_ip4_spec.ip4dst;
- n->tuple.src_port = info->fs.h_u.tcp_ip4_spec.psrc;
- n->tuple.dst_port = info->fs.h_u.tcp_ip4_spec.pdst;
- ports[0] = n->tuple.src_port;
- ports[1] = n->tuple.dst_port;
- ip = (struct iphdr *)(n->data + ETH_HLEN);
- ip->saddr = info->fs.h_u.tcp_ip4_spec.ip4src;
- ip->daddr = info->fs.h_u.tcp_ip4_spec.ip4dst;
- ip->version = 0x4;
- ip->ihl = 0x5;
+ memcpy(&n->tuple, &t, sizeof(n->tuple));
- if (info->fs.flow_type == TCP_V4_FLOW) {
- n->tuple.ip_proto = IPPROTO_TCP;
- ip->protocol = IPPROTO_TCP;
- } else {
- n->tuple.ip_proto = IPPROTO_UDP;
- ip->protocol = IPPROTO_UDP;
- }
- ip->tot_len = cpu_to_be16(min_hlen - ETH_HLEN);
- } else {
- struct ipv6hdr *ip6;
+ qede_flow_set_destination(edev, n, fsp);
- ip6 = (struct ipv6hdr *)(n->data + ETH_HLEN);
- ports = (__be16 *)(n->data + ETH_HLEN +
- sizeof(struct ipv6hdr));
- eth->h_proto = htons(ETH_P_IPV6);
- n->tuple.eth_proto = htons(ETH_P_IPV6);
- memcpy(&n->tuple.src_ipv6, &info->fs.h_u.tcp_ip6_spec.ip6src,
- sizeof(struct in6_addr));
- memcpy(&n->tuple.dst_ipv6, &info->fs.h_u.tcp_ip6_spec.ip6dst,
- sizeof(struct in6_addr));
- n->tuple.src_port = info->fs.h_u.tcp_ip6_spec.psrc;
- n->tuple.dst_port = info->fs.h_u.tcp_ip6_spec.pdst;
- ports[0] = n->tuple.src_port;
- ports[1] = n->tuple.dst_port;
- memcpy(&ip6->saddr, &n->tuple.src_ipv6,
- sizeof(struct in6_addr));
- memcpy(&ip6->daddr, &n->tuple.dst_ipv6,
- sizeof(struct in6_addr));
- ip6->version = 0x6;
-
- if (info->fs.flow_type == TCP_V6_FLOW) {
- n->tuple.ip_proto = IPPROTO_TCP;
- ip6->nexthdr = NEXTHDR_TCP;
- ip6->payload_len = cpu_to_be16(sizeof(struct tcphdr));
- } else {
- n->tuple.ip_proto = IPPROTO_UDP;
- ip6->nexthdr = NEXTHDR_UDP;
- ip6->payload_len = cpu_to_be16(sizeof(struct udphdr));
- }
- }
+ /* Build a minimal header according to the flow */
+ n->tuple.build_hdr(&n->tuple, n->data);
rc = qede_enqueue_fltr_and_config_searcher(edev, n, 0);
if (rc)
@@ -1647,6 +1937,7 @@
rc = qede_poll_arfs_filter_config(edev, n);
unlock:
__qede_unlock(edev);
+
return rc;
}
diff --git a/drivers/net/ethernet/qlogic/qede/qede_fp.c b/drivers/net/ethernet/qlogic/qede/qede_fp.c
index 1494130..6c70239 100644
--- a/drivers/net/ethernet/qlogic/qede/qede_fp.c
+++ b/drivers/net/ethernet/qlogic/qede/qede_fp.c
@@ -660,7 +660,8 @@
/* Add one frag and update the appropriate fields in the skb */
skb_fill_page_desc(skb, tpa_info->frag_id++,
- current_bd->data, current_bd->page_offset,
+ current_bd->data,
+ current_bd->page_offset + rxq->rx_headroom,
len_on_bd);
if (unlikely(qede_realloc_rx_buffer(rxq, current_bd))) {
@@ -671,8 +672,7 @@
goto out;
}
- qed_chain_consume(&rxq->rx_bd_ring);
- rxq->sw_rx_cons++;
+ qede_rx_bd_ring_consume(rxq);
skb->data_len += len_on_bd;
skb->truesize += rxq->rx_buf_seg_size;
@@ -721,64 +721,129 @@
return QEDE_CSUM_UNNECESSARY | tcsum;
}
+static inline struct sk_buff *
+qede_build_skb(struct qede_rx_queue *rxq,
+ struct sw_rx_data *bd, u16 len, u16 pad)
+{
+ struct sk_buff *skb;
+ void *buf;
+
+ buf = page_address(bd->data) + bd->page_offset;
+ skb = build_skb(buf, rxq->rx_buf_seg_size);
+
+ skb_reserve(skb, pad);
+ skb_put(skb, len);
+
+ return skb;
+}
+
+static struct sk_buff *
+qede_tpa_rx_build_skb(struct qede_dev *edev,
+ struct qede_rx_queue *rxq,
+ struct sw_rx_data *bd, u16 len, u16 pad,
+ bool alloc_skb)
+{
+ struct sk_buff *skb;
+
+ skb = qede_build_skb(rxq, bd, len, pad);
+ bd->page_offset += rxq->rx_buf_seg_size;
+
+ if (bd->page_offset == PAGE_SIZE) {
+ if (unlikely(qede_alloc_rx_buffer(rxq, true))) {
+ DP_NOTICE(edev,
+ "Failed to allocate RX buffer for tpa start\n");
+ bd->page_offset -= rxq->rx_buf_seg_size;
+ page_ref_inc(bd->data);
+ dev_kfree_skb_any(skb);
+ return NULL;
+ }
+ } else {
+ page_ref_inc(bd->data);
+ qede_reuse_page(rxq, bd);
+ }
+
+ /* We've consumed the first BD and prepared an SKB */
+ qede_rx_bd_ring_consume(rxq);
+
+ return skb;
+}
+
+static struct sk_buff *
+qede_rx_build_skb(struct qede_dev *edev,
+ struct qede_rx_queue *rxq,
+ struct sw_rx_data *bd, u16 len, u16 pad)
+{
+ struct sk_buff *skb = NULL;
+
+ /* For smaller frames still need to allocate skb, memcpy
+ * data and benefit in reusing the page segment instead of
+ * un-mapping it.
+ */
+ if ((len + pad <= edev->rx_copybreak)) {
+ unsigned int offset = bd->page_offset + pad;
+
+ skb = netdev_alloc_skb(edev->ndev, QEDE_RX_HDR_SIZE);
+ if (unlikely(!skb))
+ return NULL;
+
+ skb_reserve(skb, pad);
+ memcpy(skb_put(skb, len),
+ page_address(bd->data) + offset, len);
+ qede_reuse_page(rxq, bd);
+ goto out;
+ }
+
+ skb = qede_build_skb(rxq, bd, len, pad);
+
+ if (unlikely(qede_realloc_rx_buffer(rxq, bd))) {
+ /* Incr page ref count to reuse on allocation failure so
+ * that it doesn't get freed while freeing SKB [as its
+ * already mapped there].
+ */
+ page_ref_inc(bd->data);
+ dev_kfree_skb_any(skb);
+ return NULL;
+ }
+out:
+ /* We've consumed the first BD and prepared an SKB */
+ qede_rx_bd_ring_consume(rxq);
+
+ return skb;
+}
+
static void qede_tpa_start(struct qede_dev *edev,
struct qede_rx_queue *rxq,
struct eth_fast_path_rx_tpa_start_cqe *cqe)
{
struct qede_agg_info *tpa_info = &rxq->tpa_info[cqe->tpa_agg_index];
- struct eth_rx_bd *rx_bd_cons = qed_chain_consume(&rxq->rx_bd_ring);
- struct eth_rx_bd *rx_bd_prod = qed_chain_produce(&rxq->rx_bd_ring);
- struct sw_rx_data *replace_buf = &tpa_info->buffer;
- dma_addr_t mapping = tpa_info->buffer_mapping;
struct sw_rx_data *sw_rx_data_cons;
- struct sw_rx_data *sw_rx_data_prod;
+ u16 pad;
sw_rx_data_cons = &rxq->sw_rx_ring[rxq->sw_rx_cons & NUM_RX_BDS_MAX];
- sw_rx_data_prod = &rxq->sw_rx_ring[rxq->sw_rx_prod & NUM_RX_BDS_MAX];
+ pad = cqe->placement_offset + rxq->rx_headroom;
- /* Use pre-allocated replacement buffer - we can't release the agg.
- * start until its over and we don't want to risk allocation failing
- * here, so re-allocate when aggregation will be over.
- */
- sw_rx_data_prod->mapping = replace_buf->mapping;
+ tpa_info->skb = qede_tpa_rx_build_skb(edev, rxq, sw_rx_data_cons,
+ le16_to_cpu(cqe->len_on_first_bd),
+ pad, false);
+ tpa_info->buffer.page_offset = sw_rx_data_cons->page_offset;
+ tpa_info->buffer.mapping = sw_rx_data_cons->mapping;
- sw_rx_data_prod->data = replace_buf->data;
- rx_bd_prod->addr.hi = cpu_to_le32(upper_32_bits(mapping));
- rx_bd_prod->addr.lo = cpu_to_le32(lower_32_bits(mapping));
- sw_rx_data_prod->page_offset = replace_buf->page_offset;
-
- rxq->sw_rx_prod++;
-
- /* move partial skb from cons to pool (don't unmap yet)
- * save mapping, incase we drop the packet later on.
- */
- tpa_info->buffer = *sw_rx_data_cons;
- mapping = HILO_U64(le32_to_cpu(rx_bd_cons->addr.hi),
- le32_to_cpu(rx_bd_cons->addr.lo));
-
- tpa_info->buffer_mapping = mapping;
- rxq->sw_rx_cons++;
-
- /* set tpa state to start only if we are able to allocate skb
- * for this aggregation, otherwise mark as error and aggregation will
- * be dropped
- */
- tpa_info->skb = netdev_alloc_skb(edev->ndev,
- le16_to_cpu(cqe->len_on_first_bd));
if (unlikely(!tpa_info->skb)) {
DP_NOTICE(edev, "Failed to allocate SKB for gro\n");
+
+ /* Consume from ring but do not produce since
+ * this might be used by FW still, it will be re-used
+ * at TPA end.
+ */
+ tpa_info->tpa_start_fail = true;
+ qede_rx_bd_ring_consume(rxq);
tpa_info->state = QEDE_AGG_STATE_ERROR;
goto cons_buf;
}
- /* Start filling in the aggregation info */
- skb_put(tpa_info->skb, le16_to_cpu(cqe->len_on_first_bd));
tpa_info->frag_id = 0;
tpa_info->state = QEDE_AGG_STATE_START;
- /* Store some information from first CQE */
- tpa_info->start_cqe_placement_offset = cqe->placement_offset;
- tpa_info->start_cqe_bd_len = le16_to_cpu(cqe->len_on_first_bd);
if ((le16_to_cpu(cqe->pars_flags.flags) >>
PARSING_AND_ERR_FLAGS_TAG8021QEXIST_SHIFT) &
PARSING_AND_ERR_FLAGS_TAG8021QEXIST_MASK)
@@ -899,6 +964,10 @@
tpa_info = &rxq->tpa_info[cqe->tpa_agg_index];
skb = tpa_info->skb;
+ if (tpa_info->buffer.page_offset == PAGE_SIZE)
+ dma_unmap_page(rxq->dev, tpa_info->buffer.mapping,
+ PAGE_SIZE, rxq->data_direction);
+
for (i = 0; cqe->len_list[i]; i++)
qede_fill_frag_skb(edev, rxq, cqe->tpa_agg_index,
le16_to_cpu(cqe->len_list[i]));
@@ -919,11 +988,6 @@
"Strange - total packet len [cqe] is %4x but SKB has len %04x\n",
le16_to_cpu(cqe->total_packet_len), skb->len);
- memcpy(skb->data,
- page_address(tpa_info->buffer.data) +
- tpa_info->start_cqe_placement_offset +
- tpa_info->buffer.page_offset, tpa_info->start_cqe_bd_len);
-
/* Finalize the SKB */
skb->protocol = eth_type_trans(skb, edev->ndev);
skb->ip_summed = CHECKSUM_UNNECESSARY;
@@ -940,6 +1004,12 @@
return 1;
err:
tpa_info->state = QEDE_AGG_STATE_NONE;
+
+ if (tpa_info->tpa_start_fail) {
+ qede_reuse_page(rxq, &tpa_info->buffer);
+ tpa_info->tpa_start_fail = false;
+ }
+
dev_kfree_skb_any(tpa_info->skb);
tpa_info->skb = NULL;
return 0;
@@ -1058,65 +1128,6 @@
return false;
}
-static struct sk_buff *qede_rx_allocate_skb(struct qede_dev *edev,
- struct qede_rx_queue *rxq,
- struct sw_rx_data *bd, u16 len,
- u16 pad)
-{
- unsigned int offset = bd->page_offset + pad;
- struct skb_frag_struct *frag;
- struct page *page = bd->data;
- unsigned int pull_len;
- struct sk_buff *skb;
- unsigned char *va;
-
- /* Allocate a new SKB with a sufficient large header len */
- skb = netdev_alloc_skb(edev->ndev, QEDE_RX_HDR_SIZE);
- if (unlikely(!skb))
- return NULL;
-
- /* Copy data into SKB - if it's small, we can simply copy it and
- * re-use the already allcoated & mapped memory.
- */
- if (len + pad <= edev->rx_copybreak) {
- skb_put_data(skb, page_address(page) + offset, len);
- qede_reuse_page(rxq, bd);
- goto out;
- }
-
- frag = &skb_shinfo(skb)->frags[0];
-
- skb_add_rx_frag(skb, skb_shinfo(skb)->nr_frags,
- page, offset, len, rxq->rx_buf_seg_size);
-
- va = skb_frag_address(frag);
- pull_len = eth_get_headlen(va, QEDE_RX_HDR_SIZE);
-
- /* Align the pull_len to optimize memcpy */
- memcpy(skb->data, va, ALIGN(pull_len, sizeof(long)));
-
- /* Correct the skb & frag sizes offset after the pull */
- skb_frag_size_sub(frag, pull_len);
- frag->page_offset += pull_len;
- skb->data_len -= pull_len;
- skb->tail += pull_len;
-
- if (unlikely(qede_realloc_rx_buffer(rxq, bd))) {
- /* Incr page ref count to reuse on allocation failure so
- * that it doesn't get freed while freeing SKB [as its
- * already mapped there].
- */
- page_ref_inc(page);
- dev_kfree_skb_any(skb);
- return NULL;
- }
-
-out:
- /* We've consumed the first BD and prepared an SKB */
- qede_rx_bd_ring_consume(rxq);
- return skb;
-}
-
static int qede_rx_build_jumbo(struct qede_dev *edev,
struct qede_rx_queue *rxq,
struct sk_buff *skb,
@@ -1157,7 +1168,7 @@
PAGE_SIZE, DMA_FROM_DEVICE);
skb_fill_page_desc(skb, skb_shinfo(skb)->nr_frags++,
- bd->data, 0, cur_size);
+ bd->data, rxq->rx_headroom, cur_size);
skb->truesize += PAGE_SIZE;
skb->data_len += cur_size;
@@ -1256,7 +1267,7 @@
/* Basic validation passed; Need to prepare an SKB. This would also
* guarantee to finally consume the first BD upon success.
*/
- skb = qede_rx_allocate_skb(edev, rxq, bd, len, pad);
+ skb = qede_rx_build_skb(edev, rxq, bd, len, pad);
if (!skb) {
rxq->rx_alloc_errors++;
qede_recycle_rx_bd_ring(rxq, fp_cqe->bd_num);
diff --git a/drivers/net/ethernet/qlogic/qede/qede_main.c b/drivers/net/ethernet/qlogic/qede/qede_main.c
index f6655e2..d118771 100644
--- a/drivers/net/ethernet/qlogic/qede/qede_main.c
+++ b/drivers/net/ethernet/qlogic/qede/qede_main.c
@@ -133,6 +133,9 @@
static void qede_remove(struct pci_dev *pdev);
static void qede_shutdown(struct pci_dev *pdev);
static void qede_link_update(void *dev, struct qed_link_output *link);
+static void qede_get_eth_tlv_data(void *edev, void *data);
+static void qede_get_generic_tlv_data(void *edev,
+ struct qed_generic_tlvs *data);
/* The qede lock is used to protect driver state change and driver flows that
* are not reentrant.
@@ -199,7 +202,7 @@
/* Enable/Disable Tx switching for PF */
if ((rc == num_vfs_param) && netif_running(edev->ndev) &&
- qed_info->mf_mode != QED_MF_NPAR && qed_info->tx_switching) {
+ !qed_info->b_inter_pf_switch && qed_info->tx_switching) {
vport_params->vport_id = 0;
vport_params->update_tx_switching_flg = 1;
vport_params->tx_switching_flg = num_vfs_param ? 1 : 0;
@@ -228,6 +231,8 @@
.arfs_filter_op = qede_arfs_filter_op,
#endif
.link_update = qede_link_update,
+ .get_generic_tlv_data = qede_get_generic_tlv_data,
+ .get_protocol_tlv_data = qede_get_eth_tlv_data,
},
.force_mac = qede_force_mac,
.ports_update = qede_udp_ports_update,
@@ -342,6 +347,7 @@
p_common->rx_bcast_pkts = stats.common.rx_bcast_pkts;
p_common->mftag_filter_discards = stats.common.mftag_filter_discards;
p_common->mac_filter_discards = stats.common.mac_filter_discards;
+ p_common->gft_filter_drop = stats.common.gft_filter_drop;
p_common->tx_ucast_bytes = stats.common.tx_ucast_bytes;
p_common->tx_mcast_bytes = stats.common.tx_mcast_bytes;
@@ -1196,30 +1202,8 @@
}
}
-static void qede_free_sge_mem(struct qede_dev *edev, struct qede_rx_queue *rxq)
-{
- int i;
-
- if (edev->gro_disable)
- return;
-
- for (i = 0; i < ETH_TPA_MAX_AGGS_NUM; i++) {
- struct qede_agg_info *tpa_info = &rxq->tpa_info[i];
- struct sw_rx_data *replace_buf = &tpa_info->buffer;
-
- if (replace_buf->data) {
- dma_unmap_page(&edev->pdev->dev,
- replace_buf->mapping,
- PAGE_SIZE, DMA_FROM_DEVICE);
- __free_page(replace_buf->data);
- }
- }
-}
-
static void qede_free_mem_rxq(struct qede_dev *edev, struct qede_rx_queue *rxq)
{
- qede_free_sge_mem(edev, rxq);
-
/* Free rx buffers */
qede_free_rx_buffers(edev, rxq);
@@ -1231,45 +1215,15 @@
edev->ops->common->chain_free(edev->cdev, &rxq->rx_comp_ring);
}
-static int qede_alloc_sge_mem(struct qede_dev *edev, struct qede_rx_queue *rxq)
+static void qede_set_tpa_param(struct qede_rx_queue *rxq)
{
- dma_addr_t mapping;
int i;
- if (edev->gro_disable)
- return 0;
-
for (i = 0; i < ETH_TPA_MAX_AGGS_NUM; i++) {
struct qede_agg_info *tpa_info = &rxq->tpa_info[i];
- struct sw_rx_data *replace_buf = &tpa_info->buffer;
- replace_buf->data = alloc_pages(GFP_ATOMIC, 0);
- if (unlikely(!replace_buf->data)) {
- DP_NOTICE(edev,
- "Failed to allocate TPA skb pool [replacement buffer]\n");
- goto err;
- }
-
- mapping = dma_map_page(&edev->pdev->dev, replace_buf->data, 0,
- PAGE_SIZE, DMA_FROM_DEVICE);
- if (unlikely(dma_mapping_error(&edev->pdev->dev, mapping))) {
- DP_NOTICE(edev,
- "Failed to map TPA replacement buffer\n");
- goto err;
- }
-
- replace_buf->mapping = mapping;
- tpa_info->buffer.page_offset = 0;
- tpa_info->buffer_mapping = mapping;
tpa_info->state = QEDE_AGG_STATE_NONE;
}
-
- return 0;
-err:
- qede_free_sge_mem(edev, rxq);
- edev->gro_disable = 1;
- edev->ndev->features &= ~NETIF_F_GRO_HW;
- return -ENOMEM;
}
/* This function allocates all memory needed per Rx queue */
@@ -1280,19 +1234,24 @@
rxq->num_rx_buffers = edev->q_num_rx_buffers;
rxq->rx_buf_size = NET_IP_ALIGN + ETH_OVERHEAD + edev->ndev->mtu;
- rxq->rx_headroom = edev->xdp_prog ? XDP_PACKET_HEADROOM : 0;
+
+ rxq->rx_headroom = edev->xdp_prog ? XDP_PACKET_HEADROOM : NET_SKB_PAD;
+ size = rxq->rx_headroom +
+ SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
/* Make sure that the headroom and payload fit in a single page */
- if (rxq->rx_buf_size + rxq->rx_headroom > PAGE_SIZE)
- rxq->rx_buf_size = PAGE_SIZE - rxq->rx_headroom;
+ if (rxq->rx_buf_size + size > PAGE_SIZE)
+ rxq->rx_buf_size = PAGE_SIZE - size;
- /* Segment size to spilt a page in multiple equal parts,
+ /* Segment size to spilt a page in multiple equal parts ,
* unless XDP is used in which case we'd use the entire page.
*/
- if (!edev->xdp_prog)
- rxq->rx_buf_seg_size = roundup_pow_of_two(rxq->rx_buf_size);
- else
+ if (!edev->xdp_prog) {
+ size = size + rxq->rx_buf_size;
+ rxq->rx_buf_seg_size = roundup_pow_of_two(size);
+ } else {
rxq->rx_buf_seg_size = PAGE_SIZE;
+ }
/* Allocate the parallel driver ring for Rx buffers */
size = sizeof(*rxq->sw_rx_ring) * RX_RING_SIZE;
@@ -1336,7 +1295,8 @@
}
}
- rc = qede_alloc_sge_mem(edev, rxq);
+ if (!edev->gro_disable)
+ qede_set_tpa_param(rxq);
err:
return rc;
}
@@ -1927,7 +1887,7 @@
vport_update_params->update_vport_active_flg = 1;
vport_update_params->vport_active_flg = 1;
- if ((qed_info->mf_mode == QED_MF_NPAR || pci_num_vf(edev->pdev)) &&
+ if ((qed_info->b_inter_pf_switch || pci_num_vf(edev->pdev)) &&
qed_info->tx_switching) {
vport_update_params->update_tx_switching_flg = 1;
vport_update_params->tx_switching_flg = 1;
@@ -2177,3 +2137,99 @@
}
}
}
+
+static bool qede_is_txq_full(struct qede_dev *edev, struct qede_tx_queue *txq)
+{
+ struct netdev_queue *netdev_txq;
+
+ netdev_txq = netdev_get_tx_queue(edev->ndev, txq->index);
+ if (netif_xmit_stopped(netdev_txq))
+ return true;
+
+ return false;
+}
+
+static void qede_get_generic_tlv_data(void *dev, struct qed_generic_tlvs *data)
+{
+ struct qede_dev *edev = dev;
+ struct netdev_hw_addr *ha;
+ int i;
+
+ if (edev->ndev->features & NETIF_F_IP_CSUM)
+ data->feat_flags |= QED_TLV_IP_CSUM;
+ if (edev->ndev->features & NETIF_F_TSO)
+ data->feat_flags |= QED_TLV_LSO;
+
+ ether_addr_copy(data->mac[0], edev->ndev->dev_addr);
+ memset(data->mac[1], 0, ETH_ALEN);
+ memset(data->mac[2], 0, ETH_ALEN);
+ /* Copy the first two UC macs */
+ netif_addr_lock_bh(edev->ndev);
+ i = 1;
+ netdev_for_each_uc_addr(ha, edev->ndev) {
+ ether_addr_copy(data->mac[i++], ha->addr);
+ if (i == QED_TLV_MAC_COUNT)
+ break;
+ }
+
+ netif_addr_unlock_bh(edev->ndev);
+}
+
+static void qede_get_eth_tlv_data(void *dev, void *data)
+{
+ struct qed_mfw_tlv_eth *etlv = data;
+ struct qede_dev *edev = dev;
+ struct qede_fastpath *fp;
+ int i;
+
+ etlv->lso_maxoff_size = 0XFFFF;
+ etlv->lso_maxoff_size_set = true;
+ etlv->lso_minseg_size = (u16)ETH_TX_LSO_WINDOW_MIN_LEN;
+ etlv->lso_minseg_size_set = true;
+ etlv->prom_mode = !!(edev->ndev->flags & IFF_PROMISC);
+ etlv->prom_mode_set = true;
+ etlv->tx_descr_size = QEDE_TSS_COUNT(edev);
+ etlv->tx_descr_size_set = true;
+ etlv->rx_descr_size = QEDE_RSS_COUNT(edev);
+ etlv->rx_descr_size_set = true;
+ etlv->iov_offload = QED_MFW_TLV_IOV_OFFLOAD_VEB;
+ etlv->iov_offload_set = true;
+
+ /* Fill information regarding queues; Should be done under the qede
+ * lock to guarantee those don't change beneath our feet.
+ */
+ etlv->txqs_empty = true;
+ etlv->rxqs_empty = true;
+ etlv->num_txqs_full = 0;
+ etlv->num_rxqs_full = 0;
+
+ __qede_lock(edev);
+ for_each_queue(i) {
+ fp = &edev->fp_array[i];
+ if (fp->type & QEDE_FASTPATH_TX) {
+ if (fp->txq->sw_tx_cons != fp->txq->sw_tx_prod)
+ etlv->txqs_empty = false;
+ if (qede_is_txq_full(edev, fp->txq))
+ etlv->num_txqs_full++;
+ }
+ if (fp->type & QEDE_FASTPATH_RX) {
+ if (qede_has_rx_work(fp->rxq))
+ etlv->rxqs_empty = false;
+
+ /* This one is a bit tricky; Firmware might stop
+ * placing packets if ring is not yet full.
+ * Give an approximation.
+ */
+ if (le16_to_cpu(*fp->rxq->hw_cons_ptr) -
+ qed_chain_get_cons_idx(&fp->rxq->rx_comp_ring) >
+ RX_RING_SIZE - 100)
+ etlv->num_rxqs_full++;
+ }
+ }
+ __qede_unlock(edev);
+
+ etlv->txqs_empty_set = true;
+ etlv->rxqs_empty_set = true;
+ etlv->num_txqs_full_set = true;
+ etlv->num_rxqs_full_set = true;
+}
diff --git a/drivers/net/ethernet/qualcomm/emac/emac-mac.c b/drivers/net/ethernet/qualcomm/emac/emac-mac.c
index d5a32b7..031f6e6 100644
--- a/drivers/net/ethernet/qualcomm/emac/emac-mac.c
+++ b/drivers/net/ethernet/qualcomm/emac/emac-mac.c
@@ -683,10 +683,11 @@
struct emac_tx_queue *tx_q)
{
struct emac_ring_header *ring_header = &adpt->ring_header;
+ int node = dev_to_node(adpt->netdev->dev.parent);
size_t size;
size = sizeof(struct emac_buffer) * tx_q->tpd.count;
- tx_q->tpd.tpbuff = kzalloc(size, GFP_KERNEL);
+ tx_q->tpd.tpbuff = kzalloc_node(size, GFP_KERNEL, node);
if (!tx_q->tpd.tpbuff)
return -ENOMEM;
@@ -723,11 +724,12 @@
static int emac_rx_descs_alloc(struct emac_adapter *adpt)
{
struct emac_ring_header *ring_header = &adpt->ring_header;
+ int node = dev_to_node(adpt->netdev->dev.parent);
struct emac_rx_queue *rx_q = &adpt->rx_q;
size_t size;
size = sizeof(struct emac_buffer) * rx_q->rfd.count;
- rx_q->rfd.rfbuff = kzalloc(size, GFP_KERNEL);
+ rx_q->rfd.rfbuff = kzalloc_node(size, GFP_KERNEL, node);
if (!rx_q->rfd.rfbuff)
return -ENOMEM;
@@ -920,14 +922,13 @@
static void emac_adjust_link(struct net_device *netdev)
{
struct emac_adapter *adpt = netdev_priv(netdev);
- struct emac_sgmii *sgmii = &adpt->phy;
struct phy_device *phydev = netdev->phydev;
if (phydev->link) {
emac_mac_start(adpt);
- sgmii->link_up(adpt);
+ emac_sgmii_link_change(adpt, true);
} else {
- sgmii->link_down(adpt);
+ emac_sgmii_link_change(adpt, false);
emac_mac_stop(adpt);
}
diff --git a/drivers/net/ethernet/qualcomm/emac/emac-sgmii.c b/drivers/net/ethernet/qualcomm/emac/emac-sgmii.c
index e8ab512..562420b 100644
--- a/drivers/net/ethernet/qualcomm/emac/emac-sgmii.c
+++ b/drivers/net/ethernet/qualcomm/emac/emac-sgmii.c
@@ -53,6 +53,46 @@
#define SERDES_START_WAIT_TIMES 100
+int emac_sgmii_init(struct emac_adapter *adpt)
+{
+ if (!(adpt->phy.sgmii_ops && adpt->phy.sgmii_ops->init))
+ return 0;
+
+ return adpt->phy.sgmii_ops->init(adpt);
+}
+
+int emac_sgmii_open(struct emac_adapter *adpt)
+{
+ if (!(adpt->phy.sgmii_ops && adpt->phy.sgmii_ops->open))
+ return 0;
+
+ return adpt->phy.sgmii_ops->open(adpt);
+}
+
+void emac_sgmii_close(struct emac_adapter *adpt)
+{
+ if (!(adpt->phy.sgmii_ops && adpt->phy.sgmii_ops->close))
+ return;
+
+ adpt->phy.sgmii_ops->close(adpt);
+}
+
+int emac_sgmii_link_change(struct emac_adapter *adpt, bool link_state)
+{
+ if (!(adpt->phy.sgmii_ops && adpt->phy.sgmii_ops->link_change))
+ return 0;
+
+ return adpt->phy.sgmii_ops->link_change(adpt, link_state);
+}
+
+void emac_sgmii_reset(struct emac_adapter *adpt)
+{
+ if (!(adpt->phy.sgmii_ops && adpt->phy.sgmii_ops->reset))
+ return;
+
+ adpt->phy.sgmii_ops->reset(adpt);
+}
+
/* Initialize the SGMII link between the internal and external PHYs. */
static void emac_sgmii_link_init(struct emac_adapter *adpt)
{
@@ -163,21 +203,21 @@
msleep(50);
}
-void emac_sgmii_reset(struct emac_adapter *adpt)
+static void emac_sgmii_common_reset(struct emac_adapter *adpt)
{
int ret;
emac_sgmii_reset_prepare(adpt);
emac_sgmii_link_init(adpt);
- ret = adpt->phy.initialize(adpt);
+ ret = emac_sgmii_init(adpt);
if (ret)
netdev_err(adpt->netdev,
"could not reinitialize internal PHY (error=%i)\n",
ret);
}
-static int emac_sgmii_open(struct emac_adapter *adpt)
+static int emac_sgmii_common_open(struct emac_adapter *adpt)
{
struct emac_sgmii *sgmii = &adpt->phy;
int ret;
@@ -201,43 +241,53 @@
return 0;
}
-static int emac_sgmii_close(struct emac_adapter *adpt)
+static void emac_sgmii_common_close(struct emac_adapter *adpt)
{
struct emac_sgmii *sgmii = &adpt->phy;
/* Make sure interrupts are disabled */
writel(0, sgmii->base + EMAC_SGMII_PHY_INTERRUPT_MASK);
free_irq(sgmii->irq, adpt);
-
- return 0;
}
/* The error interrupts are only valid after the link is up */
-static int emac_sgmii_link_up(struct emac_adapter *adpt)
+static int emac_sgmii_common_link_change(struct emac_adapter *adpt, bool linkup)
{
struct emac_sgmii *sgmii = &adpt->phy;
int ret;
- /* Clear and enable interrupts */
- ret = emac_sgmii_irq_clear(adpt, 0xff);
- if (ret)
- return ret;
+ if (linkup) {
+ /* Clear and enable interrupts */
+ ret = emac_sgmii_irq_clear(adpt, 0xff);
+ if (ret)
+ return ret;
- writel(SGMII_ISR_MASK, sgmii->base + EMAC_SGMII_PHY_INTERRUPT_MASK);
+ writel(SGMII_ISR_MASK,
+ sgmii->base + EMAC_SGMII_PHY_INTERRUPT_MASK);
+ } else {
+ /* Disable interrupts */
+ writel(0, sgmii->base + EMAC_SGMII_PHY_INTERRUPT_MASK);
+ synchronize_irq(sgmii->irq);
+ }
return 0;
}
-static int emac_sgmii_link_down(struct emac_adapter *adpt)
-{
- struct emac_sgmii *sgmii = &adpt->phy;
+static struct sgmii_ops qdf2432_ops = {
+ .init = emac_sgmii_init_qdf2432,
+ .open = emac_sgmii_common_open,
+ .close = emac_sgmii_common_close,
+ .link_change = emac_sgmii_common_link_change,
+ .reset = emac_sgmii_common_reset,
+};
- /* Disable interrupts */
- writel(0, sgmii->base + EMAC_SGMII_PHY_INTERRUPT_MASK);
- synchronize_irq(sgmii->irq);
-
- return 0;
-}
+static struct sgmii_ops qdf2400_ops = {
+ .init = emac_sgmii_init_qdf2400,
+ .open = emac_sgmii_common_open,
+ .close = emac_sgmii_common_close,
+ .link_change = emac_sgmii_common_link_change,
+ .reset = emac_sgmii_common_reset,
+};
static int emac_sgmii_acpi_match(struct device *dev, void *data)
{
@@ -249,7 +299,7 @@
{}
};
const struct acpi_device_id *id = acpi_match_device(match_table, dev);
- emac_sgmii_function *initialize = data;
+ struct sgmii_ops **ops = data;
if (id) {
acpi_handle handle = ACPI_HANDLE(dev);
@@ -270,10 +320,10 @@
switch (hrv) {
case 1:
- *initialize = emac_sgmii_init_qdf2432;
+ *ops = &qdf2432_ops;
return 1;
case 2:
- *initialize = emac_sgmii_init_qdf2400;
+ *ops = &qdf2400_ops;
return 1;
}
}
@@ -294,14 +344,6 @@
{}
};
-/* Dummy function for systems without an internal PHY. This avoids having
- * to check for NULL pointers before calling the functions.
- */
-static int emac_sgmii_dummy(struct emac_adapter *adpt)
-{
- return 0;
-}
-
int emac_sgmii_config(struct platform_device *pdev, struct emac_adapter *adpt)
{
struct platform_device *sgmii_pdev = NULL;
@@ -312,22 +354,11 @@
if (has_acpi_companion(&pdev->dev)) {
struct device *dev;
- dev = device_find_child(&pdev->dev, &phy->initialize,
+ dev = device_find_child(&pdev->dev, &phy->sgmii_ops,
emac_sgmii_acpi_match);
if (!dev) {
dev_warn(&pdev->dev, "cannot find internal phy node\n");
- /* There is typically no internal PHY on emulation
- * systems, so if we can't find the node, assume
- * we are on an emulation system and stub-out
- * support for the internal PHY. These systems only
- * use ACPI.
- */
- phy->open = emac_sgmii_dummy;
- phy->close = emac_sgmii_dummy;
- phy->link_up = emac_sgmii_dummy;
- phy->link_down = emac_sgmii_dummy;
-
return 0;
}
@@ -355,14 +386,9 @@
goto error_put_device;
}
- phy->initialize = (emac_sgmii_function)match->data;
+ phy->sgmii_ops->init = match->data;
}
- phy->open = emac_sgmii_open;
- phy->close = emac_sgmii_close;
- phy->link_up = emac_sgmii_link_up;
- phy->link_down = emac_sgmii_link_down;
-
/* Base address is the first address */
res = platform_get_resource(sgmii_pdev, IORESOURCE_MEM, 0);
if (!res) {
@@ -386,7 +412,7 @@
}
}
- ret = phy->initialize(adpt);
+ ret = emac_sgmii_init(adpt);
if (ret)
goto error;
diff --git a/drivers/net/ethernet/qualcomm/emac/emac-sgmii.h b/drivers/net/ethernet/qualcomm/emac/emac-sgmii.h
index e7c0c3b..31ba21e 100644
--- a/drivers/net/ethernet/qualcomm/emac/emac-sgmii.h
+++ b/drivers/net/ethernet/qualcomm/emac/emac-sgmii.h
@@ -16,36 +16,44 @@
struct emac_adapter;
struct platform_device;
-typedef int (*emac_sgmii_function)(struct emac_adapter *adpt);
+/** emac_sgmii - internal emac phy
+ * @init initialization function
+ * @open called when the driver is opened
+ * @close called when the driver is closed
+ * @link_change called when the link state changes
+ */
+struct sgmii_ops {
+ int (*init)(struct emac_adapter *adpt);
+ int (*open)(struct emac_adapter *adpt);
+ void (*close)(struct emac_adapter *adpt);
+ int (*link_change)(struct emac_adapter *adpt, bool link_state);
+ void (*reset)(struct emac_adapter *adpt);
+};
/** emac_sgmii - internal emac phy
* @base base address
* @digital per-lane digital block
* @irq the interrupt number
* @decode_error_count reference count of consecutive decode errors
- * @initialize initialization function
- * @open called when the driver is opened
- * @close called when the driver is closed
- * @link_up called when the link comes up
- * @link_down called when the link comes down
+ * @sgmii_ops sgmii ops
*/
struct emac_sgmii {
void __iomem *base;
void __iomem *digital;
unsigned int irq;
atomic_t decode_error_count;
- emac_sgmii_function initialize;
- emac_sgmii_function open;
- emac_sgmii_function close;
- emac_sgmii_function link_up;
- emac_sgmii_function link_down;
+ struct sgmii_ops *sgmii_ops;
};
int emac_sgmii_config(struct platform_device *pdev, struct emac_adapter *adpt);
-void emac_sgmii_reset(struct emac_adapter *adpt);
int emac_sgmii_init_fsm9900(struct emac_adapter *adpt);
int emac_sgmii_init_qdf2432(struct emac_adapter *adpt);
int emac_sgmii_init_qdf2400(struct emac_adapter *adpt);
+int emac_sgmii_init(struct emac_adapter *adpt);
+int emac_sgmii_open(struct emac_adapter *adpt);
+void emac_sgmii_close(struct emac_adapter *adpt);
+int emac_sgmii_link_change(struct emac_adapter *adpt, bool link_state);
+void emac_sgmii_reset(struct emac_adapter *adpt);
#endif
diff --git a/drivers/net/ethernet/qualcomm/emac/emac.c b/drivers/net/ethernet/qualcomm/emac/emac.c
index 13235ba..2a0cbc5 100644
--- a/drivers/net/ethernet/qualcomm/emac/emac.c
+++ b/drivers/net/ethernet/qualcomm/emac/emac.c
@@ -253,7 +253,7 @@
return ret;
}
- ret = adpt->phy.open(adpt);
+ ret = emac_sgmii_open(adpt);
if (ret) {
emac_mac_rx_tx_rings_free_all(adpt);
free_irq(irq->irq, irq);
@@ -264,7 +264,7 @@
if (ret) {
emac_mac_rx_tx_rings_free_all(adpt);
free_irq(irq->irq, irq);
- adpt->phy.close(adpt);
+ emac_sgmii_close(adpt);
return ret;
}
@@ -278,7 +278,7 @@
mutex_lock(&adpt->reset_lock);
- adpt->phy.close(adpt);
+ emac_sgmii_close(adpt);
emac_mac_down(adpt);
emac_mac_rx_tx_rings_free_all(adpt);
@@ -761,11 +761,10 @@
{
struct net_device *netdev = dev_get_drvdata(&pdev->dev);
struct emac_adapter *adpt = netdev_priv(netdev);
- struct emac_sgmii *sgmii = &adpt->phy;
if (netdev->flags & IFF_UP) {
/* Closing the SGMII turns off its interrupts */
- sgmii->close(adpt);
+ emac_sgmii_close(adpt);
/* Resetting the MAC turns off all DMA and its interrupts */
emac_mac_reset(adpt);
diff --git a/drivers/net/ethernet/qualcomm/rmnet/rmnet_config.h b/drivers/net/ethernet/qualcomm/rmnet/rmnet_config.h
index 0b5b5da..34ac45a 100644
--- a/drivers/net/ethernet/qualcomm/rmnet/rmnet_config.h
+++ b/drivers/net/ethernet/qualcomm/rmnet/rmnet_config.h
@@ -54,11 +54,24 @@
struct u64_stats_sync syncp;
};
+struct rmnet_priv_stats {
+ u64 csum_ok;
+ u64 csum_valid_unset;
+ u64 csum_validation_failed;
+ u64 csum_err_bad_buffer;
+ u64 csum_err_invalid_ip_version;
+ u64 csum_err_invalid_transport;
+ u64 csum_fragmented_pkt;
+ u64 csum_skipped;
+ u64 csum_sw;
+};
+
struct rmnet_priv {
u8 mux_id;
struct net_device *real_dev;
struct rmnet_pcpu_stats __percpu *pcpu_stats;
struct gro_cells gro_cells;
+ struct rmnet_priv_stats stats;
};
struct rmnet_port *rmnet_get_port(struct net_device *real_dev);
diff --git a/drivers/net/ethernet/qualcomm/rmnet/rmnet_handlers.c b/drivers/net/ethernet/qualcomm/rmnet/rmnet_handlers.c
index 6fcd586..7fd86d4 100644
--- a/drivers/net/ethernet/qualcomm/rmnet/rmnet_handlers.c
+++ b/drivers/net/ethernet/qualcomm/rmnet/rmnet_handlers.c
@@ -148,7 +148,7 @@
if (skb_headroom(skb) < required_headroom) {
if (pskb_expand_head(skb, required_headroom, 0, GFP_KERNEL))
- goto fail;
+ return -ENOMEM;
}
if (port->data_format & RMNET_FLAGS_EGRESS_MAP_CKSUMV4)
@@ -156,17 +156,13 @@
map_header = rmnet_map_add_map_header(skb, additional_header_len, 0);
if (!map_header)
- goto fail;
+ return -ENOMEM;
map_header->mux_id = mux_id;
skb->protocol = htons(ETH_P_MAP);
return 0;
-
-fail:
- kfree_skb(skb);
- return -ENOMEM;
}
static void
@@ -228,15 +224,18 @@
mux_id = priv->mux_id;
port = rmnet_get_port(skb->dev);
- if (!port) {
- kfree_skb(skb);
- return;
- }
+ if (!port)
+ goto drop;
if (rmnet_map_egress_handler(skb, port, mux_id, orig_dev))
- return;
+ goto drop;
rmnet_vnd_tx_fixup(skb, orig_dev);
dev_queue_xmit(skb);
+ return;
+
+drop:
+ this_cpu_inc(priv->pcpu_stats->stats.tx_drops);
+ kfree_skb(skb);
}
diff --git a/drivers/net/ethernet/qualcomm/rmnet/rmnet_map_command.c b/drivers/net/ethernet/qualcomm/rmnet/rmnet_map_command.c
index 78fdad0..56a93df 100644
--- a/drivers/net/ethernet/qualcomm/rmnet/rmnet_map_command.c
+++ b/drivers/net/ethernet/qualcomm/rmnet/rmnet_map_command.c
@@ -69,17 +69,9 @@
struct rmnet_map_control_command *cmd;
int xmit_status;
- if (port->data_format & RMNET_FLAGS_INGRESS_MAP_CKSUMV4) {
- if (skb->len < sizeof(struct rmnet_map_header) +
- RMNET_MAP_GET_LENGTH(skb) +
- sizeof(struct rmnet_map_dl_csum_trailer)) {
- kfree_skb(skb);
- return;
- }
-
- skb_trim(skb, skb->len -
- sizeof(struct rmnet_map_dl_csum_trailer));
- }
+ if (port->data_format & RMNET_FLAGS_INGRESS_MAP_CKSUMV4)
+ skb_trim(skb,
+ skb->len - sizeof(struct rmnet_map_dl_csum_trailer));
skb->protocol = htons(ETH_P_MAP);
diff --git a/drivers/net/ethernet/qualcomm/rmnet/rmnet_map_data.c b/drivers/net/ethernet/qualcomm/rmnet/rmnet_map_data.c
index a6ea094..57a9c31 100644
--- a/drivers/net/ethernet/qualcomm/rmnet/rmnet_map_data.c
+++ b/drivers/net/ethernet/qualcomm/rmnet/rmnet_map_data.c
@@ -48,7 +48,8 @@
static int
rmnet_map_ipv4_dl_csum_trailer(struct sk_buff *skb,
- struct rmnet_map_dl_csum_trailer *csum_trailer)
+ struct rmnet_map_dl_csum_trailer *csum_trailer,
+ struct rmnet_priv *priv)
{
__sum16 *csum_field, csum_temp, pseudo_csum, hdr_csum, ip_payload_csum;
u16 csum_value, csum_value_final;
@@ -58,19 +59,25 @@
ip4h = (struct iphdr *)(skb->data);
if ((ntohs(ip4h->frag_off) & IP_MF) ||
- ((ntohs(ip4h->frag_off) & IP_OFFSET) > 0))
+ ((ntohs(ip4h->frag_off) & IP_OFFSET) > 0)) {
+ priv->stats.csum_fragmented_pkt++;
return -EOPNOTSUPP;
+ }
txporthdr = skb->data + ip4h->ihl * 4;
csum_field = rmnet_map_get_csum_field(ip4h->protocol, txporthdr);
- if (!csum_field)
+ if (!csum_field) {
+ priv->stats.csum_err_invalid_transport++;
return -EPROTONOSUPPORT;
+ }
/* RFC 768 - Skip IPv4 UDP packets where sender checksum field is 0 */
- if (*csum_field == 0 && ip4h->protocol == IPPROTO_UDP)
+ if (*csum_field == 0 && ip4h->protocol == IPPROTO_UDP) {
+ priv->stats.csum_skipped++;
return 0;
+ }
csum_value = ~ntohs(csum_trailer->csum_value);
hdr_csum = ~ip_fast_csum(ip4h, (int)ip4h->ihl);
@@ -102,16 +109,20 @@
}
}
- if (csum_value_final == ntohs((__force __be16)*csum_field))
+ if (csum_value_final == ntohs((__force __be16)*csum_field)) {
+ priv->stats.csum_ok++;
return 0;
- else
+ } else {
+ priv->stats.csum_validation_failed++;
return -EINVAL;
+ }
}
#if IS_ENABLED(CONFIG_IPV6)
static int
rmnet_map_ipv6_dl_csum_trailer(struct sk_buff *skb,
- struct rmnet_map_dl_csum_trailer *csum_trailer)
+ struct rmnet_map_dl_csum_trailer *csum_trailer,
+ struct rmnet_priv *priv)
{
__sum16 *csum_field, ip6_payload_csum, pseudo_csum, csum_temp;
u16 csum_value, csum_value_final;
@@ -125,8 +136,10 @@
txporthdr = skb->data + sizeof(struct ipv6hdr);
csum_field = rmnet_map_get_csum_field(ip6h->nexthdr, txporthdr);
- if (!csum_field)
+ if (!csum_field) {
+ priv->stats.csum_err_invalid_transport++;
return -EPROTONOSUPPORT;
+ }
csum_value = ~ntohs(csum_trailer->csum_value);
ip6_hdr_csum = (__force __be16)
@@ -164,10 +177,13 @@
}
}
- if (csum_value_final == ntohs((__force __be16)*csum_field))
+ if (csum_value_final == ntohs((__force __be16)*csum_field)) {
+ priv->stats.csum_ok++;
return 0;
- else
+ } else {
+ priv->stats.csum_validation_failed++;
return -EINVAL;
+ }
}
#endif
@@ -339,24 +355,34 @@
*/
int rmnet_map_checksum_downlink_packet(struct sk_buff *skb, u16 len)
{
+ struct rmnet_priv *priv = netdev_priv(skb->dev);
struct rmnet_map_dl_csum_trailer *csum_trailer;
- if (unlikely(!(skb->dev->features & NETIF_F_RXCSUM)))
+ if (unlikely(!(skb->dev->features & NETIF_F_RXCSUM))) {
+ priv->stats.csum_sw++;
return -EOPNOTSUPP;
+ }
csum_trailer = (struct rmnet_map_dl_csum_trailer *)(skb->data + len);
- if (!csum_trailer->valid)
+ if (!csum_trailer->valid) {
+ priv->stats.csum_valid_unset++;
return -EINVAL;
+ }
- if (skb->protocol == htons(ETH_P_IP))
- return rmnet_map_ipv4_dl_csum_trailer(skb, csum_trailer);
- else if (skb->protocol == htons(ETH_P_IPV6))
+ if (skb->protocol == htons(ETH_P_IP)) {
+ return rmnet_map_ipv4_dl_csum_trailer(skb, csum_trailer, priv);
+ } else if (skb->protocol == htons(ETH_P_IPV6)) {
#if IS_ENABLED(CONFIG_IPV6)
- return rmnet_map_ipv6_dl_csum_trailer(skb, csum_trailer);
+ return rmnet_map_ipv6_dl_csum_trailer(skb, csum_trailer, priv);
#else
+ priv->stats.csum_err_invalid_ip_version++;
return -EPROTONOSUPPORT;
#endif
+ } else {
+ priv->stats.csum_err_invalid_ip_version++;
+ return -EPROTONOSUPPORT;
+ }
return 0;
}
@@ -367,6 +393,7 @@
void rmnet_map_checksum_uplink_packet(struct sk_buff *skb,
struct net_device *orig_dev)
{
+ struct rmnet_priv *priv = netdev_priv(orig_dev);
struct rmnet_map_ul_csum_header *ul_header;
void *iphdr;
@@ -389,8 +416,11 @@
rmnet_map_ipv6_ul_csum_header(iphdr, ul_header, skb);
return;
#else
+ priv->stats.csum_err_invalid_ip_version++;
goto sw_csum;
#endif
+ } else {
+ priv->stats.csum_err_invalid_ip_version++;
}
}
@@ -399,4 +429,6 @@
ul_header->csum_insert_offset = 0;
ul_header->csum_enabled = 0;
ul_header->udp_ip4_ind = 0;
+
+ priv->stats.csum_sw++;
}
diff --git a/drivers/net/ethernet/qualcomm/rmnet/rmnet_vnd.c b/drivers/net/ethernet/qualcomm/rmnet/rmnet_vnd.c
index 2ea16a0..cb02e1a 100644
--- a/drivers/net/ethernet/qualcomm/rmnet/rmnet_vnd.c
+++ b/drivers/net/ethernet/qualcomm/rmnet/rmnet_vnd.c
@@ -152,6 +152,56 @@
.ndo_get_stats64 = rmnet_get_stats64,
};
+static const char rmnet_gstrings_stats[][ETH_GSTRING_LEN] = {
+ "Checksum ok",
+ "Checksum valid bit not set",
+ "Checksum validation failed",
+ "Checksum error bad buffer",
+ "Checksum error bad ip version",
+ "Checksum error bad transport",
+ "Checksum skipped on ip fragment",
+ "Checksum skipped",
+ "Checksum computed in software",
+};
+
+static void rmnet_get_strings(struct net_device *dev, u32 stringset, u8 *buf)
+{
+ switch (stringset) {
+ case ETH_SS_STATS:
+ memcpy(buf, &rmnet_gstrings_stats,
+ sizeof(rmnet_gstrings_stats));
+ break;
+ }
+}
+
+static int rmnet_get_sset_count(struct net_device *dev, int sset)
+{
+ switch (sset) {
+ case ETH_SS_STATS:
+ return ARRAY_SIZE(rmnet_gstrings_stats);
+ default:
+ return -EOPNOTSUPP;
+ }
+}
+
+static void rmnet_get_ethtool_stats(struct net_device *dev,
+ struct ethtool_stats *stats, u64 *data)
+{
+ struct rmnet_priv *priv = netdev_priv(dev);
+ struct rmnet_priv_stats *st = &priv->stats;
+
+ if (!data)
+ return;
+
+ memcpy(data, st, ARRAY_SIZE(rmnet_gstrings_stats) * sizeof(u64));
+}
+
+static const struct ethtool_ops rmnet_ethtool_ops = {
+ .get_ethtool_stats = rmnet_get_ethtool_stats,
+ .get_strings = rmnet_get_strings,
+ .get_sset_count = rmnet_get_sset_count,
+};
+
/* Called by kernel whenever a new rmnet<n> device is created. Sets MTU,
* flags, ARP type, needed headroom, etc...
*/
@@ -170,6 +220,7 @@
rmnet_dev->flags &= ~(IFF_BROADCAST | IFF_MULTICAST);
rmnet_dev->needs_free_netdev = true;
+ rmnet_dev->ethtool_ops = &rmnet_ethtool_ops;
}
/* Exposed API */
diff --git a/drivers/net/ethernet/realtek/8139too.c b/drivers/net/ethernet/realtek/8139too.c
index d118da5..ffd68a7 100644
--- a/drivers/net/ethernet/realtek/8139too.c
+++ b/drivers/net/ethernet/realtek/8139too.c
@@ -1104,7 +1104,6 @@
return 0;
err_out:
- netif_napi_del(&tp->napi);
__rtl8139_cleanup_dev (dev);
pci_disable_device (pdev);
return i;
@@ -1119,7 +1118,6 @@
assert (dev != NULL);
cancel_delayed_work_sync(&tp->thread);
- netif_napi_del(&tp->napi);
unregister_netdev (dev);
diff --git a/drivers/net/ethernet/realtek/r8169.c b/drivers/net/ethernet/realtek/r8169.c
index c7aac1f..75dfac0 100644
--- a/drivers/net/ethernet/realtek/r8169.c
+++ b/drivers/net/ethernet/realtek/r8169.c
@@ -84,12 +84,11 @@
The RTL chips use a 64 element hash table based on the Ethernet CRC. */
static const int multicast_filter_limit = 32;
-#define MAX_READ_REQUEST_SHIFT 12
#define TX_DMA_BURST 7 /* Maximum PCI burst, '7' is unlimited */
#define InterFrameGap 0x03 /* 3 means InterFrameGap = the shortest one */
#define R8169_REGS_SIZE 256
-#define R8169_NAPI_WEIGHT 64
+#define R8169_RX_BUF_SIZE (SZ_16K - 1)
#define NUM_TX_DESC 64 /* Number of Tx descriptor registers */
#define NUM_RX_DESC 256U /* Number of Rx descriptor registers */
#define R8169_TX_RING_BYTES (NUM_TX_DESC * sizeof(struct TxDesc))
@@ -172,12 +171,11 @@
#define JUMBO_7K (7*1024 - ETH_HLEN - 2)
#define JUMBO_9K (9*1024 - ETH_HLEN - 2)
-#define _R(NAME,TD,FW,SZ,B) { \
+#define _R(NAME,TD,FW,SZ) { \
.name = NAME, \
.txd_version = TD, \
.fw_name = FW, \
.jumbo_max = SZ, \
- .jumbo_tx_csum = B \
}
static const struct {
@@ -185,135 +183,111 @@
enum rtl_tx_desc_version txd_version;
const char *fw_name;
u16 jumbo_max;
- bool jumbo_tx_csum;
} rtl_chip_infos[] = {
/* PCI devices. */
[RTL_GIGA_MAC_VER_01] =
- _R("RTL8169", RTL_TD_0, NULL, JUMBO_7K, true),
+ _R("RTL8169", RTL_TD_0, NULL, JUMBO_7K),
[RTL_GIGA_MAC_VER_02] =
- _R("RTL8169s", RTL_TD_0, NULL, JUMBO_7K, true),
+ _R("RTL8169s", RTL_TD_0, NULL, JUMBO_7K),
[RTL_GIGA_MAC_VER_03] =
- _R("RTL8110s", RTL_TD_0, NULL, JUMBO_7K, true),
+ _R("RTL8110s", RTL_TD_0, NULL, JUMBO_7K),
[RTL_GIGA_MAC_VER_04] =
- _R("RTL8169sb/8110sb", RTL_TD_0, NULL, JUMBO_7K, true),
+ _R("RTL8169sb/8110sb", RTL_TD_0, NULL, JUMBO_7K),
[RTL_GIGA_MAC_VER_05] =
- _R("RTL8169sc/8110sc", RTL_TD_0, NULL, JUMBO_7K, true),
+ _R("RTL8169sc/8110sc", RTL_TD_0, NULL, JUMBO_7K),
[RTL_GIGA_MAC_VER_06] =
- _R("RTL8169sc/8110sc", RTL_TD_0, NULL, JUMBO_7K, true),
+ _R("RTL8169sc/8110sc", RTL_TD_0, NULL, JUMBO_7K),
/* PCI-E devices. */
[RTL_GIGA_MAC_VER_07] =
- _R("RTL8102e", RTL_TD_1, NULL, JUMBO_1K, true),
+ _R("RTL8102e", RTL_TD_1, NULL, JUMBO_1K),
[RTL_GIGA_MAC_VER_08] =
- _R("RTL8102e", RTL_TD_1, NULL, JUMBO_1K, true),
+ _R("RTL8102e", RTL_TD_1, NULL, JUMBO_1K),
[RTL_GIGA_MAC_VER_09] =
- _R("RTL8102e", RTL_TD_1, NULL, JUMBO_1K, true),
+ _R("RTL8102e", RTL_TD_1, NULL, JUMBO_1K),
[RTL_GIGA_MAC_VER_10] =
- _R("RTL8101e", RTL_TD_0, NULL, JUMBO_1K, true),
+ _R("RTL8101e", RTL_TD_0, NULL, JUMBO_1K),
[RTL_GIGA_MAC_VER_11] =
- _R("RTL8168b/8111b", RTL_TD_0, NULL, JUMBO_4K, false),
+ _R("RTL8168b/8111b", RTL_TD_0, NULL, JUMBO_4K),
[RTL_GIGA_MAC_VER_12] =
- _R("RTL8168b/8111b", RTL_TD_0, NULL, JUMBO_4K, false),
+ _R("RTL8168b/8111b", RTL_TD_0, NULL, JUMBO_4K),
[RTL_GIGA_MAC_VER_13] =
- _R("RTL8101e", RTL_TD_0, NULL, JUMBO_1K, true),
+ _R("RTL8101e", RTL_TD_0, NULL, JUMBO_1K),
[RTL_GIGA_MAC_VER_14] =
- _R("RTL8100e", RTL_TD_0, NULL, JUMBO_1K, true),
+ _R("RTL8100e", RTL_TD_0, NULL, JUMBO_1K),
[RTL_GIGA_MAC_VER_15] =
- _R("RTL8100e", RTL_TD_0, NULL, JUMBO_1K, true),
+ _R("RTL8100e", RTL_TD_0, NULL, JUMBO_1K),
[RTL_GIGA_MAC_VER_16] =
- _R("RTL8101e", RTL_TD_0, NULL, JUMBO_1K, true),
+ _R("RTL8101e", RTL_TD_0, NULL, JUMBO_1K),
[RTL_GIGA_MAC_VER_17] =
- _R("RTL8168b/8111b", RTL_TD_0, NULL, JUMBO_4K, false),
+ _R("RTL8168b/8111b", RTL_TD_0, NULL, JUMBO_4K),
[RTL_GIGA_MAC_VER_18] =
- _R("RTL8168cp/8111cp", RTL_TD_1, NULL, JUMBO_6K, false),
+ _R("RTL8168cp/8111cp", RTL_TD_1, NULL, JUMBO_6K),
[RTL_GIGA_MAC_VER_19] =
- _R("RTL8168c/8111c", RTL_TD_1, NULL, JUMBO_6K, false),
+ _R("RTL8168c/8111c", RTL_TD_1, NULL, JUMBO_6K),
[RTL_GIGA_MAC_VER_20] =
- _R("RTL8168c/8111c", RTL_TD_1, NULL, JUMBO_6K, false),
+ _R("RTL8168c/8111c", RTL_TD_1, NULL, JUMBO_6K),
[RTL_GIGA_MAC_VER_21] =
- _R("RTL8168c/8111c", RTL_TD_1, NULL, JUMBO_6K, false),
+ _R("RTL8168c/8111c", RTL_TD_1, NULL, JUMBO_6K),
[RTL_GIGA_MAC_VER_22] =
- _R("RTL8168c/8111c", RTL_TD_1, NULL, JUMBO_6K, false),
+ _R("RTL8168c/8111c", RTL_TD_1, NULL, JUMBO_6K),
[RTL_GIGA_MAC_VER_23] =
- _R("RTL8168cp/8111cp", RTL_TD_1, NULL, JUMBO_6K, false),
+ _R("RTL8168cp/8111cp", RTL_TD_1, NULL, JUMBO_6K),
[RTL_GIGA_MAC_VER_24] =
- _R("RTL8168cp/8111cp", RTL_TD_1, NULL, JUMBO_6K, false),
+ _R("RTL8168cp/8111cp", RTL_TD_1, NULL, JUMBO_6K),
[RTL_GIGA_MAC_VER_25] =
- _R("RTL8168d/8111d", RTL_TD_1, FIRMWARE_8168D_1,
- JUMBO_9K, false),
+ _R("RTL8168d/8111d", RTL_TD_1, FIRMWARE_8168D_1, JUMBO_9K),
[RTL_GIGA_MAC_VER_26] =
- _R("RTL8168d/8111d", RTL_TD_1, FIRMWARE_8168D_2,
- JUMBO_9K, false),
+ _R("RTL8168d/8111d", RTL_TD_1, FIRMWARE_8168D_2, JUMBO_9K),
[RTL_GIGA_MAC_VER_27] =
- _R("RTL8168dp/8111dp", RTL_TD_1, NULL, JUMBO_9K, false),
+ _R("RTL8168dp/8111dp", RTL_TD_1, NULL, JUMBO_9K),
[RTL_GIGA_MAC_VER_28] =
- _R("RTL8168dp/8111dp", RTL_TD_1, NULL, JUMBO_9K, false),
+ _R("RTL8168dp/8111dp", RTL_TD_1, NULL, JUMBO_9K),
[RTL_GIGA_MAC_VER_29] =
- _R("RTL8105e", RTL_TD_1, FIRMWARE_8105E_1,
- JUMBO_1K, true),
+ _R("RTL8105e", RTL_TD_1, FIRMWARE_8105E_1, JUMBO_1K),
[RTL_GIGA_MAC_VER_30] =
- _R("RTL8105e", RTL_TD_1, FIRMWARE_8105E_1,
- JUMBO_1K, true),
+ _R("RTL8105e", RTL_TD_1, FIRMWARE_8105E_1, JUMBO_1K),
[RTL_GIGA_MAC_VER_31] =
- _R("RTL8168dp/8111dp", RTL_TD_1, NULL, JUMBO_9K, false),
+ _R("RTL8168dp/8111dp", RTL_TD_1, NULL, JUMBO_9K),
[RTL_GIGA_MAC_VER_32] =
- _R("RTL8168e/8111e", RTL_TD_1, FIRMWARE_8168E_1,
- JUMBO_9K, false),
+ _R("RTL8168e/8111e", RTL_TD_1, FIRMWARE_8168E_1, JUMBO_9K),
[RTL_GIGA_MAC_VER_33] =
- _R("RTL8168e/8111e", RTL_TD_1, FIRMWARE_8168E_2,
- JUMBO_9K, false),
+ _R("RTL8168e/8111e", RTL_TD_1, FIRMWARE_8168E_2, JUMBO_9K),
[RTL_GIGA_MAC_VER_34] =
- _R("RTL8168evl/8111evl",RTL_TD_1, FIRMWARE_8168E_3,
- JUMBO_9K, false),
+ _R("RTL8168evl/8111evl",RTL_TD_1, FIRMWARE_8168E_3, JUMBO_9K),
[RTL_GIGA_MAC_VER_35] =
- _R("RTL8168f/8111f", RTL_TD_1, FIRMWARE_8168F_1,
- JUMBO_9K, false),
+ _R("RTL8168f/8111f", RTL_TD_1, FIRMWARE_8168F_1, JUMBO_9K),
[RTL_GIGA_MAC_VER_36] =
- _R("RTL8168f/8111f", RTL_TD_1, FIRMWARE_8168F_2,
- JUMBO_9K, false),
+ _R("RTL8168f/8111f", RTL_TD_1, FIRMWARE_8168F_2, JUMBO_9K),
[RTL_GIGA_MAC_VER_37] =
- _R("RTL8402", RTL_TD_1, FIRMWARE_8402_1,
- JUMBO_1K, true),
+ _R("RTL8402", RTL_TD_1, FIRMWARE_8402_1, JUMBO_1K),
[RTL_GIGA_MAC_VER_38] =
- _R("RTL8411", RTL_TD_1, FIRMWARE_8411_1,
- JUMBO_9K, false),
+ _R("RTL8411", RTL_TD_1, FIRMWARE_8411_1, JUMBO_9K),
[RTL_GIGA_MAC_VER_39] =
- _R("RTL8106e", RTL_TD_1, FIRMWARE_8106E_1,
- JUMBO_1K, true),
+ _R("RTL8106e", RTL_TD_1, FIRMWARE_8106E_1, JUMBO_1K),
[RTL_GIGA_MAC_VER_40] =
- _R("RTL8168g/8111g", RTL_TD_1, FIRMWARE_8168G_2,
- JUMBO_9K, false),
+ _R("RTL8168g/8111g", RTL_TD_1, FIRMWARE_8168G_2, JUMBO_9K),
[RTL_GIGA_MAC_VER_41] =
- _R("RTL8168g/8111g", RTL_TD_1, NULL, JUMBO_9K, false),
+ _R("RTL8168g/8111g", RTL_TD_1, NULL, JUMBO_9K),
[RTL_GIGA_MAC_VER_42] =
- _R("RTL8168g/8111g", RTL_TD_1, FIRMWARE_8168G_3,
- JUMBO_9K, false),
+ _R("RTL8168g/8111g", RTL_TD_1, FIRMWARE_8168G_3, JUMBO_9K),
[RTL_GIGA_MAC_VER_43] =
- _R("RTL8106e", RTL_TD_1, FIRMWARE_8106E_2,
- JUMBO_1K, true),
+ _R("RTL8106e", RTL_TD_1, FIRMWARE_8106E_2, JUMBO_1K),
[RTL_GIGA_MAC_VER_44] =
- _R("RTL8411", RTL_TD_1, FIRMWARE_8411_2,
- JUMBO_9K, false),
+ _R("RTL8411", RTL_TD_1, FIRMWARE_8411_2, JUMBO_9K),
[RTL_GIGA_MAC_VER_45] =
- _R("RTL8168h/8111h", RTL_TD_1, FIRMWARE_8168H_1,
- JUMBO_9K, false),
+ _R("RTL8168h/8111h", RTL_TD_1, FIRMWARE_8168H_1, JUMBO_9K),
[RTL_GIGA_MAC_VER_46] =
- _R("RTL8168h/8111h", RTL_TD_1, FIRMWARE_8168H_2,
- JUMBO_9K, false),
+ _R("RTL8168h/8111h", RTL_TD_1, FIRMWARE_8168H_2, JUMBO_9K),
[RTL_GIGA_MAC_VER_47] =
- _R("RTL8107e", RTL_TD_1, FIRMWARE_8107E_1,
- JUMBO_1K, false),
+ _R("RTL8107e", RTL_TD_1, FIRMWARE_8107E_1, JUMBO_1K),
[RTL_GIGA_MAC_VER_48] =
- _R("RTL8107e", RTL_TD_1, FIRMWARE_8107E_2,
- JUMBO_1K, false),
+ _R("RTL8107e", RTL_TD_1, FIRMWARE_8107E_2, JUMBO_1K),
[RTL_GIGA_MAC_VER_49] =
- _R("RTL8168ep/8111ep", RTL_TD_1, NULL,
- JUMBO_9K, false),
+ _R("RTL8168ep/8111ep", RTL_TD_1, NULL, JUMBO_9K),
[RTL_GIGA_MAC_VER_50] =
- _R("RTL8168ep/8111ep", RTL_TD_1, NULL,
- JUMBO_9K, false),
+ _R("RTL8168ep/8111ep", RTL_TD_1, NULL, JUMBO_9K),
[RTL_GIGA_MAC_VER_51] =
- _R("RTL8168ep/8111ep", RTL_TD_1, NULL,
- JUMBO_9K, false),
+ _R("RTL8168ep/8111ep", RTL_TD_1, NULL, JUMBO_9K),
};
#undef _R
@@ -345,7 +319,6 @@
MODULE_DEVICE_TABLE(pci, rtl8169_pci_tbl);
-static int rx_buf_sz = 16383;
static int use_dac = -1;
static struct {
u32 msg_enable;
@@ -437,13 +410,8 @@
CSIAR = 0x68,
#define CSIAR_FLAG 0x80000000
#define CSIAR_WRITE_CMD 0x80000000
-#define CSIAR_BYTE_ENABLE 0x0f
-#define CSIAR_BYTE_ENABLE_SHIFT 12
-#define CSIAR_ADDR_MASK 0x0fff
-#define CSIAR_FUNC_CARD 0x00000000
-#define CSIAR_FUNC_SDIO 0x00010000
-#define CSIAR_FUNC_NIC 0x00020000
-#define CSIAR_FUNC_NIC2 0x00010000
+#define CSIAR_BYTE_ENABLE 0x0000f000
+#define CSIAR_ADDR_MASK 0x00000fff
PMCH = 0x6f,
EPHYAR = 0x80,
#define EPHYAR_FLAG 0x80000000
@@ -626,6 +594,7 @@
RxChkSum = (1 << 5),
PCIDAC = (1 << 4),
PCIMulRW = (1 << 3),
+#define INTT_MASK GENMASK(1, 0)
INTT_0 = 0x0000, // 8168
INTT_1 = 0x0001, // 8168
INTT_2 = 0x0002, // 8168
@@ -716,6 +685,7 @@
};
#define RsvdMask 0x3fffc000
+#define CPCMD_QUIRK_MASK (Normal_mode | RxVlan | RxChkSum | INTT_MASK)
struct TxDesc {
__le32 opts1;
@@ -778,7 +748,6 @@
struct net_device *dev;
struct napi_struct napi;
u32 msg_enable;
- u16 txd_version;
u16 mac_version;
u32 cur_rx; /* Index into the Rx descriptor buffer of next Rx pkt. */
u32 cur_tx; /* Index into the Tx descriptor buffer of next Rx pkt. */
@@ -802,26 +771,16 @@
int (*read)(struct rtl8169_private *, int);
} mdio_ops;
- struct pll_power_ops {
- void (*down)(struct rtl8169_private *);
- void (*up)(struct rtl8169_private *);
- } pll_power_ops;
-
struct jumbo_ops {
void (*enable)(struct rtl8169_private *);
void (*disable)(struct rtl8169_private *);
} jumbo_ops;
- struct csi_ops {
- void (*write)(struct rtl8169_private *, int, int);
- u32 (*read)(struct rtl8169_private *, int);
- } csi_ops;
-
int (*set_speed)(struct net_device *, u8 aneg, u16 sp, u8 dpx, u32 adv);
int (*get_link_ksettings)(struct net_device *,
struct ethtool_link_ksettings *);
void (*phy_reset_enable)(struct rtl8169_private *tp);
- void (*hw_start)(struct net_device *);
+ void (*hw_start)(struct rtl8169_private *tp);
unsigned int (*phy_reset_pending)(struct rtl8169_private *tp);
unsigned int (*link_ok)(struct rtl8169_private *tp);
int (*do_ioctl)(struct rtl8169_private *tp, struct mii_ioctl_data *data, int cmd);
@@ -833,14 +792,11 @@
struct work_struct work;
} wk;
- unsigned features;
-
struct mii_if_info mii;
dma_addr_t counters_phys_addr;
struct rtl8169_counters *counters;
struct rtl8169_tc_offsets tc_offset;
u32 saved_wolopts;
- u32 opts1_mask;
struct rtl_fw {
const struct firmware *fw;
@@ -1645,23 +1601,8 @@
if (options & LinkUp)
wolopts |= WAKE_PHY;
switch (tp->mac_version) {
- case RTL_GIGA_MAC_VER_34:
- case RTL_GIGA_MAC_VER_35:
- case RTL_GIGA_MAC_VER_36:
- case RTL_GIGA_MAC_VER_37:
- case RTL_GIGA_MAC_VER_38:
- case RTL_GIGA_MAC_VER_40:
- case RTL_GIGA_MAC_VER_41:
- case RTL_GIGA_MAC_VER_42:
- case RTL_GIGA_MAC_VER_43:
- case RTL_GIGA_MAC_VER_44:
- case RTL_GIGA_MAC_VER_45:
- case RTL_GIGA_MAC_VER_46:
- case RTL_GIGA_MAC_VER_47:
- case RTL_GIGA_MAC_VER_48:
- case RTL_GIGA_MAC_VER_49:
- case RTL_GIGA_MAC_VER_50:
- case RTL_GIGA_MAC_VER_51:
+ case RTL_GIGA_MAC_VER_34 ... RTL_GIGA_MAC_VER_38:
+ case RTL_GIGA_MAC_VER_40 ... RTL_GIGA_MAC_VER_51:
if (rtl_eri_read(tp, 0xdc, ERIAR_EXGMAC) & MagicPacket_v2)
wolopts |= WAKE_MAGIC;
break;
@@ -1722,23 +1663,8 @@
RTL_W8(tp, Cfg9346, Cfg9346_Unlock);
switch (tp->mac_version) {
- case RTL_GIGA_MAC_VER_34:
- case RTL_GIGA_MAC_VER_35:
- case RTL_GIGA_MAC_VER_36:
- case RTL_GIGA_MAC_VER_37:
- case RTL_GIGA_MAC_VER_38:
- case RTL_GIGA_MAC_VER_40:
- case RTL_GIGA_MAC_VER_41:
- case RTL_GIGA_MAC_VER_42:
- case RTL_GIGA_MAC_VER_43:
- case RTL_GIGA_MAC_VER_44:
- case RTL_GIGA_MAC_VER_45:
- case RTL_GIGA_MAC_VER_46:
- case RTL_GIGA_MAC_VER_47:
- case RTL_GIGA_MAC_VER_48:
- case RTL_GIGA_MAC_VER_49:
- case RTL_GIGA_MAC_VER_50:
- case RTL_GIGA_MAC_VER_51:
+ case RTL_GIGA_MAC_VER_34 ... RTL_GIGA_MAC_VER_38:
+ case RTL_GIGA_MAC_VER_40 ... RTL_GIGA_MAC_VER_51:
tmp = ARRAY_SIZE(cfg) - 1;
if (wolopts & WAKE_MAGIC)
rtl_w0w1_eri(tp,
@@ -1960,18 +1886,20 @@
features &= ~NETIF_F_ALL_TSO;
if (dev->mtu > JUMBO_1K &&
- !rtl_chip_infos[tp->mac_version].jumbo_tx_csum)
+ tp->mac_version > RTL_GIGA_MAC_VER_06)
features &= ~NETIF_F_IP_CSUM;
return features;
}
-static void __rtl8169_set_features(struct net_device *dev,
- netdev_features_t features)
+static int rtl8169_set_features(struct net_device *dev,
+ netdev_features_t features)
{
struct rtl8169_private *tp = netdev_priv(dev);
u32 rx_config;
+ rtl_lock_work(tp);
+
rx_config = RTL_R32(tp, RxConfig);
if (features & NETIF_F_RXALL)
rx_config |= (AcceptErr | AcceptRunt);
@@ -1990,28 +1918,14 @@
else
tp->cp_cmd &= ~RxVlan;
- tp->cp_cmd |= RTL_R16(tp, CPlusCmd) & ~(RxVlan | RxChkSum);
-
RTL_W16(tp, CPlusCmd, tp->cp_cmd);
RTL_R16(tp, CPlusCmd);
-}
-static int rtl8169_set_features(struct net_device *dev,
- netdev_features_t features)
-{
- struct rtl8169_private *tp = netdev_priv(dev);
-
- features &= NETIF_F_RXALL | NETIF_F_RXCSUM | NETIF_F_HW_VLAN_CTAG_RX;
-
- rtl_lock_work(tp);
- if (features ^ dev->features)
- __rtl8169_set_features(dev, features);
rtl_unlock_work(tp);
return 0;
}
-
static inline u32 rtl8169_tx_vlan_tag(struct sk_buff *skb)
{
return (skb_vlan_tag_present(skb)) ?
@@ -2155,9 +2069,8 @@
return RTL_R32(tp, CounterAddrLow) & (CounterReset | CounterDump);
}
-static bool rtl8169_do_counters(struct net_device *dev, u32 counter_cmd)
+static bool rtl8169_do_counters(struct rtl8169_private *tp, u32 counter_cmd)
{
- struct rtl8169_private *tp = netdev_priv(dev);
dma_addr_t paddr = tp->counters_phys_addr;
u32 cmd;
@@ -2170,10 +2083,8 @@
return rtl_udelay_loop_wait_low(tp, &rtl_counters_cond, 10, 1000);
}
-static bool rtl8169_reset_counters(struct net_device *dev)
+static bool rtl8169_reset_counters(struct rtl8169_private *tp)
{
- struct rtl8169_private *tp = netdev_priv(dev);
-
/*
* Versions prior to RTL_GIGA_MAC_VER_19 don't support resetting the
* tally counters.
@@ -2181,13 +2092,11 @@
if (tp->mac_version < RTL_GIGA_MAC_VER_19)
return true;
- return rtl8169_do_counters(dev, CounterReset);
+ return rtl8169_do_counters(tp, CounterReset);
}
-static bool rtl8169_update_counters(struct net_device *dev)
+static bool rtl8169_update_counters(struct rtl8169_private *tp)
{
- struct rtl8169_private *tp = netdev_priv(dev);
-
/*
* Some chips are unable to dump tally counters when the receiver
* is disabled.
@@ -2195,12 +2104,11 @@
if ((RTL_R8(tp, ChipCmd) & CmdRxEnb) == 0)
return true;
- return rtl8169_do_counters(dev, CounterDump);
+ return rtl8169_do_counters(tp, CounterDump);
}
-static bool rtl8169_init_counter_offsets(struct net_device *dev)
+static bool rtl8169_init_counter_offsets(struct rtl8169_private *tp)
{
- struct rtl8169_private *tp = netdev_priv(dev);
struct rtl8169_counters *counters = tp->counters;
bool ret = false;
@@ -2223,10 +2131,10 @@
return true;
/* If both, reset and update fail, propagate to caller. */
- if (rtl8169_reset_counters(dev))
+ if (rtl8169_reset_counters(tp))
ret = true;
- if (rtl8169_update_counters(dev))
+ if (rtl8169_update_counters(tp))
ret = true;
tp->tc_offset.tx_errors = counters->tx_errors;
@@ -2249,7 +2157,7 @@
pm_runtime_get_noresume(d);
if (pm_runtime_active(d))
- rtl8169_update_counters(dev);
+ rtl8169_update_counters(tp);
pm_runtime_put_noidle(d);
@@ -2391,7 +2299,7 @@
if (IS_ERR(ci))
return PTR_ERR(ci);
- scale = &ci->scalev[RTL_R16(tp, CPlusCmd) & 3];
+ scale = &ci->scalev[tp->cp_cmd & INTT_MASK];
/* read IntrMitigate and adjust according to scale */
for (w = RTL_R16(tp, IntrMitigate); w; w >>= RTL_COALESCE_SHIFT, p++) {
@@ -2490,7 +2398,7 @@
RTL_W16(tp, IntrMitigate, swab16(w));
- tp->cp_cmd = (tp->cp_cmd & ~3) | cp01;
+ tp->cp_cmd = (tp->cp_cmd & ~INTT_MASK) | cp01;
RTL_W16(tp, CPlusCmd, tp->cp_cmd);
RTL_R16(tp, CPlusCmd);
@@ -2520,7 +2428,7 @@
};
static void rtl8169_get_mac_version(struct rtl8169_private *tp,
- struct net_device *dev, u8 default_version)
+ u8 default_version)
{
/*
* The driver currently handles the 8168Bf and the 8168Be identically
@@ -2560,12 +2468,10 @@
/* 8168E family. */
{ 0x7c800000, 0x2c800000, RTL_GIGA_MAC_VER_34 },
- { 0x7cf00000, 0x2c200000, RTL_GIGA_MAC_VER_33 },
{ 0x7cf00000, 0x2c100000, RTL_GIGA_MAC_VER_32 },
{ 0x7c800000, 0x2c000000, RTL_GIGA_MAC_VER_33 },
/* 8168D family. */
- { 0x7cf00000, 0x28300000, RTL_GIGA_MAC_VER_26 },
{ 0x7cf00000, 0x28100000, RTL_GIGA_MAC_VER_25 },
{ 0x7c800000, 0x28000000, RTL_GIGA_MAC_VER_26 },
@@ -2575,32 +2481,24 @@
{ 0x7cf00000, 0x28b00000, RTL_GIGA_MAC_VER_31 },
/* 8168C family. */
- { 0x7cf00000, 0x3cb00000, RTL_GIGA_MAC_VER_24 },
{ 0x7cf00000, 0x3c900000, RTL_GIGA_MAC_VER_23 },
{ 0x7cf00000, 0x3c800000, RTL_GIGA_MAC_VER_18 },
{ 0x7c800000, 0x3c800000, RTL_GIGA_MAC_VER_24 },
{ 0x7cf00000, 0x3c000000, RTL_GIGA_MAC_VER_19 },
{ 0x7cf00000, 0x3c200000, RTL_GIGA_MAC_VER_20 },
{ 0x7cf00000, 0x3c300000, RTL_GIGA_MAC_VER_21 },
- { 0x7cf00000, 0x3c400000, RTL_GIGA_MAC_VER_22 },
{ 0x7c800000, 0x3c000000, RTL_GIGA_MAC_VER_22 },
/* 8168B family. */
{ 0x7cf00000, 0x38000000, RTL_GIGA_MAC_VER_12 },
- { 0x7cf00000, 0x38500000, RTL_GIGA_MAC_VER_17 },
{ 0x7c800000, 0x38000000, RTL_GIGA_MAC_VER_17 },
{ 0x7c800000, 0x30000000, RTL_GIGA_MAC_VER_11 },
/* 8101 family. */
- { 0x7cf00000, 0x44900000, RTL_GIGA_MAC_VER_39 },
{ 0x7c800000, 0x44800000, RTL_GIGA_MAC_VER_39 },
{ 0x7c800000, 0x44000000, RTL_GIGA_MAC_VER_37 },
- { 0x7cf00000, 0x40b00000, RTL_GIGA_MAC_VER_30 },
- { 0x7cf00000, 0x40a00000, RTL_GIGA_MAC_VER_30 },
{ 0x7cf00000, 0x40900000, RTL_GIGA_MAC_VER_29 },
{ 0x7c800000, 0x40800000, RTL_GIGA_MAC_VER_30 },
- { 0x7cf00000, 0x34a00000, RTL_GIGA_MAC_VER_09 },
- { 0x7cf00000, 0x24a00000, RTL_GIGA_MAC_VER_09 },
{ 0x7cf00000, 0x34900000, RTL_GIGA_MAC_VER_08 },
{ 0x7cf00000, 0x24900000, RTL_GIGA_MAC_VER_08 },
{ 0x7cf00000, 0x34800000, RTL_GIGA_MAC_VER_07 },
@@ -2635,8 +2533,8 @@
tp->mac_version = p->mac_version;
if (tp->mac_version == RTL_GIGA_MAC_NONE) {
- netif_notice(tp, probe, dev,
- "unknown MAC, using family default\n");
+ dev_notice(tp_to_dev(tp),
+ "unknown MAC, using family default\n");
tp->mac_version = default_version;
} else if (tp->mac_version == RTL_GIGA_MAC_VER_42) {
tp->mac_version = tp->mii.supports_gmii ?
@@ -4685,18 +4583,7 @@
ops->write = r8168dp_2_mdio_write;
ops->read = r8168dp_2_mdio_read;
break;
- case RTL_GIGA_MAC_VER_40:
- case RTL_GIGA_MAC_VER_41:
- case RTL_GIGA_MAC_VER_42:
- case RTL_GIGA_MAC_VER_43:
- case RTL_GIGA_MAC_VER_44:
- case RTL_GIGA_MAC_VER_45:
- case RTL_GIGA_MAC_VER_46:
- case RTL_GIGA_MAC_VER_47:
- case RTL_GIGA_MAC_VER_48:
- case RTL_GIGA_MAC_VER_49:
- case RTL_GIGA_MAC_VER_50:
- case RTL_GIGA_MAC_VER_51:
+ case RTL_GIGA_MAC_VER_40 ... RTL_GIGA_MAC_VER_51:
ops->write = r8168g_mdio_write;
ops->read = r8168g_mdio_read;
break;
@@ -4741,21 +4628,7 @@
case RTL_GIGA_MAC_VER_32:
case RTL_GIGA_MAC_VER_33:
case RTL_GIGA_MAC_VER_34:
- case RTL_GIGA_MAC_VER_37:
- case RTL_GIGA_MAC_VER_38:
- case RTL_GIGA_MAC_VER_39:
- case RTL_GIGA_MAC_VER_40:
- case RTL_GIGA_MAC_VER_41:
- case RTL_GIGA_MAC_VER_42:
- case RTL_GIGA_MAC_VER_43:
- case RTL_GIGA_MAC_VER_44:
- case RTL_GIGA_MAC_VER_45:
- case RTL_GIGA_MAC_VER_46:
- case RTL_GIGA_MAC_VER_47:
- case RTL_GIGA_MAC_VER_48:
- case RTL_GIGA_MAC_VER_49:
- case RTL_GIGA_MAC_VER_50:
- case RTL_GIGA_MAC_VER_51:
+ case RTL_GIGA_MAC_VER_37 ... RTL_GIGA_MAC_VER_51:
RTL_W32(tp, RxConfig, RTL_R32(tp, RxConfig) |
AcceptBroadcast | AcceptMulticast | AcceptMyPhys);
break;
@@ -4775,79 +4648,13 @@
return true;
}
-static void r810x_phy_power_down(struct rtl8169_private *tp)
-{
- rtl_writephy(tp, 0x1f, 0x0000);
- rtl_writephy(tp, MII_BMCR, BMCR_PDOWN);
-}
-
-static void r810x_phy_power_up(struct rtl8169_private *tp)
-{
- rtl_writephy(tp, 0x1f, 0x0000);
- rtl_writephy(tp, MII_BMCR, BMCR_ANENABLE);
-}
-
-static void r810x_pll_power_down(struct rtl8169_private *tp)
-{
- if (rtl_wol_pll_power_down(tp))
- return;
-
- r810x_phy_power_down(tp);
-
- switch (tp->mac_version) {
- case RTL_GIGA_MAC_VER_07:
- case RTL_GIGA_MAC_VER_08:
- case RTL_GIGA_MAC_VER_09:
- case RTL_GIGA_MAC_VER_10:
- case RTL_GIGA_MAC_VER_13:
- case RTL_GIGA_MAC_VER_16:
- break;
- default:
- RTL_W8(tp, PMCH, RTL_R8(tp, PMCH) & ~0x80);
- break;
- }
-}
-
-static void r810x_pll_power_up(struct rtl8169_private *tp)
-{
- r810x_phy_power_up(tp);
-
- switch (tp->mac_version) {
- case RTL_GIGA_MAC_VER_07:
- case RTL_GIGA_MAC_VER_08:
- case RTL_GIGA_MAC_VER_09:
- case RTL_GIGA_MAC_VER_10:
- case RTL_GIGA_MAC_VER_13:
- case RTL_GIGA_MAC_VER_16:
- break;
- case RTL_GIGA_MAC_VER_47:
- case RTL_GIGA_MAC_VER_48:
- RTL_W8(tp, PMCH, RTL_R8(tp, PMCH) | 0xc0);
- break;
- default:
- RTL_W8(tp, PMCH, RTL_R8(tp, PMCH) | 0x80);
- break;
- }
-}
-
static void r8168_phy_power_up(struct rtl8169_private *tp)
{
rtl_writephy(tp, 0x1f, 0x0000);
switch (tp->mac_version) {
case RTL_GIGA_MAC_VER_11:
case RTL_GIGA_MAC_VER_12:
- case RTL_GIGA_MAC_VER_17:
- case RTL_GIGA_MAC_VER_18:
- case RTL_GIGA_MAC_VER_19:
- case RTL_GIGA_MAC_VER_20:
- case RTL_GIGA_MAC_VER_21:
- case RTL_GIGA_MAC_VER_22:
- case RTL_GIGA_MAC_VER_23:
- case RTL_GIGA_MAC_VER_24:
- case RTL_GIGA_MAC_VER_25:
- case RTL_GIGA_MAC_VER_26:
- case RTL_GIGA_MAC_VER_27:
- case RTL_GIGA_MAC_VER_28:
+ case RTL_GIGA_MAC_VER_17 ... RTL_GIGA_MAC_VER_28:
case RTL_GIGA_MAC_VER_31:
rtl_writephy(tp, 0x0e, 0x0000);
break;
@@ -4855,6 +4662,9 @@
break;
}
rtl_writephy(tp, MII_BMCR, BMCR_ANENABLE);
+
+ /* give MAC/PHY some time to resume */
+ msleep(20);
}
static void r8168_phy_power_down(struct rtl8169_private *tp)
@@ -4870,18 +4680,7 @@
case RTL_GIGA_MAC_VER_11:
case RTL_GIGA_MAC_VER_12:
- case RTL_GIGA_MAC_VER_17:
- case RTL_GIGA_MAC_VER_18:
- case RTL_GIGA_MAC_VER_19:
- case RTL_GIGA_MAC_VER_20:
- case RTL_GIGA_MAC_VER_21:
- case RTL_GIGA_MAC_VER_22:
- case RTL_GIGA_MAC_VER_23:
- case RTL_GIGA_MAC_VER_24:
- case RTL_GIGA_MAC_VER_25:
- case RTL_GIGA_MAC_VER_26:
- case RTL_GIGA_MAC_VER_27:
- case RTL_GIGA_MAC_VER_28:
+ case RTL_GIGA_MAC_VER_17 ... RTL_GIGA_MAC_VER_28:
case RTL_GIGA_MAC_VER_31:
rtl_writephy(tp, 0x0e, 0x0200);
default:
@@ -4895,12 +4694,6 @@
if (r8168_check_dash(tp))
return;
- if ((tp->mac_version == RTL_GIGA_MAC_VER_23 ||
- tp->mac_version == RTL_GIGA_MAC_VER_24) &&
- (RTL_R16(tp, CPlusCmd) & ASF)) {
- return;
- }
-
if (tp->mac_version == RTL_GIGA_MAC_VER_32 ||
tp->mac_version == RTL_GIGA_MAC_VER_33)
rtl_ephy_write(tp, 0x19, 0xff64);
@@ -4911,16 +4704,15 @@
r8168_phy_power_down(tp);
switch (tp->mac_version) {
- case RTL_GIGA_MAC_VER_25:
- case RTL_GIGA_MAC_VER_26:
- case RTL_GIGA_MAC_VER_27:
- case RTL_GIGA_MAC_VER_28:
- case RTL_GIGA_MAC_VER_31:
- case RTL_GIGA_MAC_VER_32:
- case RTL_GIGA_MAC_VER_33:
+ case RTL_GIGA_MAC_VER_25 ... RTL_GIGA_MAC_VER_33:
+ case RTL_GIGA_MAC_VER_37:
+ case RTL_GIGA_MAC_VER_39:
+ case RTL_GIGA_MAC_VER_43:
case RTL_GIGA_MAC_VER_44:
case RTL_GIGA_MAC_VER_45:
case RTL_GIGA_MAC_VER_46:
+ case RTL_GIGA_MAC_VER_47:
+ case RTL_GIGA_MAC_VER_48:
case RTL_GIGA_MAC_VER_50:
case RTL_GIGA_MAC_VER_51:
RTL_W8(tp, PMCH, RTL_R8(tp, PMCH) & ~0x80);
@@ -4938,18 +4730,17 @@
static void r8168_pll_power_up(struct rtl8169_private *tp)
{
switch (tp->mac_version) {
- case RTL_GIGA_MAC_VER_25:
- case RTL_GIGA_MAC_VER_26:
- case RTL_GIGA_MAC_VER_27:
- case RTL_GIGA_MAC_VER_28:
- case RTL_GIGA_MAC_VER_31:
- case RTL_GIGA_MAC_VER_32:
- case RTL_GIGA_MAC_VER_33:
+ case RTL_GIGA_MAC_VER_25 ... RTL_GIGA_MAC_VER_33:
+ case RTL_GIGA_MAC_VER_37:
+ case RTL_GIGA_MAC_VER_39:
+ case RTL_GIGA_MAC_VER_43:
RTL_W8(tp, PMCH, RTL_R8(tp, PMCH) | 0x80);
break;
case RTL_GIGA_MAC_VER_44:
case RTL_GIGA_MAC_VER_45:
case RTL_GIGA_MAC_VER_46:
+ case RTL_GIGA_MAC_VER_47:
+ case RTL_GIGA_MAC_VER_48:
case RTL_GIGA_MAC_VER_50:
case RTL_GIGA_MAC_VER_51:
RTL_W8(tp, PMCH, RTL_R8(tp, PMCH) | 0xc0);
@@ -4966,130 +4757,41 @@
r8168_phy_power_up(tp);
}
-static void rtl_generic_op(struct rtl8169_private *tp,
- void (*op)(struct rtl8169_private *))
-{
- if (op)
- op(tp);
-}
-
static void rtl_pll_power_down(struct rtl8169_private *tp)
{
- rtl_generic_op(tp, tp->pll_power_ops.down);
+ switch (tp->mac_version) {
+ case RTL_GIGA_MAC_VER_01 ... RTL_GIGA_MAC_VER_06:
+ case RTL_GIGA_MAC_VER_13 ... RTL_GIGA_MAC_VER_15:
+ break;
+ default:
+ r8168_pll_power_down(tp);
+ }
}
static void rtl_pll_power_up(struct rtl8169_private *tp)
{
- rtl_generic_op(tp, tp->pll_power_ops.up);
-
- /* give MAC/PHY some time to resume */
- msleep(20);
-}
-
-static void rtl_init_pll_power_ops(struct rtl8169_private *tp)
-{
- struct pll_power_ops *ops = &tp->pll_power_ops;
-
switch (tp->mac_version) {
- case RTL_GIGA_MAC_VER_07:
- case RTL_GIGA_MAC_VER_08:
- case RTL_GIGA_MAC_VER_09:
- case RTL_GIGA_MAC_VER_10:
- case RTL_GIGA_MAC_VER_16:
- case RTL_GIGA_MAC_VER_29:
- case RTL_GIGA_MAC_VER_30:
- case RTL_GIGA_MAC_VER_37:
- case RTL_GIGA_MAC_VER_39:
- case RTL_GIGA_MAC_VER_43:
- case RTL_GIGA_MAC_VER_47:
- case RTL_GIGA_MAC_VER_48:
- ops->down = r810x_pll_power_down;
- ops->up = r810x_pll_power_up;
+ case RTL_GIGA_MAC_VER_01 ... RTL_GIGA_MAC_VER_06:
+ case RTL_GIGA_MAC_VER_13 ... RTL_GIGA_MAC_VER_15:
break;
-
- case RTL_GIGA_MAC_VER_11:
- case RTL_GIGA_MAC_VER_12:
- case RTL_GIGA_MAC_VER_17:
- case RTL_GIGA_MAC_VER_18:
- case RTL_GIGA_MAC_VER_19:
- case RTL_GIGA_MAC_VER_20:
- case RTL_GIGA_MAC_VER_21:
- case RTL_GIGA_MAC_VER_22:
- case RTL_GIGA_MAC_VER_23:
- case RTL_GIGA_MAC_VER_24:
- case RTL_GIGA_MAC_VER_25:
- case RTL_GIGA_MAC_VER_26:
- case RTL_GIGA_MAC_VER_27:
- case RTL_GIGA_MAC_VER_28:
- case RTL_GIGA_MAC_VER_31:
- case RTL_GIGA_MAC_VER_32:
- case RTL_GIGA_MAC_VER_33:
- case RTL_GIGA_MAC_VER_34:
- case RTL_GIGA_MAC_VER_35:
- case RTL_GIGA_MAC_VER_36:
- case RTL_GIGA_MAC_VER_38:
- case RTL_GIGA_MAC_VER_40:
- case RTL_GIGA_MAC_VER_41:
- case RTL_GIGA_MAC_VER_42:
- case RTL_GIGA_MAC_VER_44:
- case RTL_GIGA_MAC_VER_45:
- case RTL_GIGA_MAC_VER_46:
- case RTL_GIGA_MAC_VER_49:
- case RTL_GIGA_MAC_VER_50:
- case RTL_GIGA_MAC_VER_51:
- ops->down = r8168_pll_power_down;
- ops->up = r8168_pll_power_up;
- break;
-
default:
- ops->down = NULL;
- ops->up = NULL;
- break;
+ r8168_pll_power_up(tp);
}
}
static void rtl_init_rxcfg(struct rtl8169_private *tp)
{
switch (tp->mac_version) {
- case RTL_GIGA_MAC_VER_01:
- case RTL_GIGA_MAC_VER_02:
- case RTL_GIGA_MAC_VER_03:
- case RTL_GIGA_MAC_VER_04:
- case RTL_GIGA_MAC_VER_05:
- case RTL_GIGA_MAC_VER_06:
- case RTL_GIGA_MAC_VER_10:
- case RTL_GIGA_MAC_VER_11:
- case RTL_GIGA_MAC_VER_12:
- case RTL_GIGA_MAC_VER_13:
- case RTL_GIGA_MAC_VER_14:
- case RTL_GIGA_MAC_VER_15:
- case RTL_GIGA_MAC_VER_16:
- case RTL_GIGA_MAC_VER_17:
+ case RTL_GIGA_MAC_VER_01 ... RTL_GIGA_MAC_VER_06:
+ case RTL_GIGA_MAC_VER_10 ... RTL_GIGA_MAC_VER_17:
RTL_W32(tp, RxConfig, RX_FIFO_THRESH | RX_DMA_BURST);
break;
- case RTL_GIGA_MAC_VER_18:
- case RTL_GIGA_MAC_VER_19:
- case RTL_GIGA_MAC_VER_20:
- case RTL_GIGA_MAC_VER_21:
- case RTL_GIGA_MAC_VER_22:
- case RTL_GIGA_MAC_VER_23:
- case RTL_GIGA_MAC_VER_24:
+ case RTL_GIGA_MAC_VER_18 ... RTL_GIGA_MAC_VER_24:
case RTL_GIGA_MAC_VER_34:
case RTL_GIGA_MAC_VER_35:
RTL_W32(tp, RxConfig, RX128_INT_EN | RX_MULTI_EN | RX_DMA_BURST);
break;
- case RTL_GIGA_MAC_VER_40:
- case RTL_GIGA_MAC_VER_41:
- case RTL_GIGA_MAC_VER_42:
- case RTL_GIGA_MAC_VER_43:
- case RTL_GIGA_MAC_VER_44:
- case RTL_GIGA_MAC_VER_45:
- case RTL_GIGA_MAC_VER_46:
- case RTL_GIGA_MAC_VER_47:
- case RTL_GIGA_MAC_VER_48:
- case RTL_GIGA_MAC_VER_49:
- case RTL_GIGA_MAC_VER_50:
- case RTL_GIGA_MAC_VER_51:
+ case RTL_GIGA_MAC_VER_40 ... RTL_GIGA_MAC_VER_51:
RTL_W32(tp, RxConfig, RX128_INT_EN | RX_MULTI_EN | RX_DMA_BURST | RX_EARLY_OFF);
break;
default:
@@ -5105,16 +4807,20 @@
static void rtl_hw_jumbo_enable(struct rtl8169_private *tp)
{
- RTL_W8(tp, Cfg9346, Cfg9346_Unlock);
- rtl_generic_op(tp, tp->jumbo_ops.enable);
- RTL_W8(tp, Cfg9346, Cfg9346_Lock);
+ if (tp->jumbo_ops.enable) {
+ RTL_W8(tp, Cfg9346, Cfg9346_Unlock);
+ tp->jumbo_ops.enable(tp);
+ RTL_W8(tp, Cfg9346, Cfg9346_Lock);
+ }
}
static void rtl_hw_jumbo_disable(struct rtl8169_private *tp)
{
- RTL_W8(tp, Cfg9346, Cfg9346_Unlock);
- rtl_generic_op(tp, tp->jumbo_ops.disable);
- RTL_W8(tp, Cfg9346, Cfg9346_Lock);
+ if (tp->jumbo_ops.disable) {
+ RTL_W8(tp, Cfg9346, Cfg9346_Unlock);
+ tp->jumbo_ops.disable(tp);
+ RTL_W8(tp, Cfg9346, Cfg9346_Lock);
+ }
}
static void r8168c_hw_jumbo_enable(struct rtl8169_private *tp)
@@ -5128,7 +4834,7 @@
{
RTL_W8(tp, Config3, RTL_R8(tp, Config3) & ~Jumbo_En0);
RTL_W8(tp, Config4, RTL_R8(tp, Config4) & ~Jumbo_En1);
- rtl_tx_performance_tweak(tp, 0x5 << MAX_READ_REQUEST_SHIFT);
+ rtl_tx_performance_tweak(tp, PCI_EXP_DEVCTL_READRQ_4096B);
}
static void r8168dp_hw_jumbo_enable(struct rtl8169_private *tp)
@@ -5154,7 +4860,7 @@
RTL_W8(tp, MaxTxPacketSize, 0x0c);
RTL_W8(tp, Config3, RTL_R8(tp, Config3) & ~Jumbo_En0);
RTL_W8(tp, Config4, RTL_R8(tp, Config4) & ~0x01);
- rtl_tx_performance_tweak(tp, 0x5 << MAX_READ_REQUEST_SHIFT);
+ rtl_tx_performance_tweak(tp, PCI_EXP_DEVCTL_READRQ_4096B);
}
static void r8168b_0_hw_jumbo_enable(struct rtl8169_private *tp)
@@ -5166,7 +4872,7 @@
static void r8168b_0_hw_jumbo_disable(struct rtl8169_private *tp)
{
rtl_tx_performance_tweak(tp,
- (0x5 << MAX_READ_REQUEST_SHIFT) | PCI_EXP_DEVCTL_NOSNOOP_EN);
+ PCI_EXP_DEVCTL_READRQ_4096B | PCI_EXP_DEVCTL_NOSNOOP_EN);
}
static void r8168b_1_hw_jumbo_enable(struct rtl8169_private *tp)
@@ -5226,18 +4932,7 @@
* No action needed for jumbo frames with 8169.
* No jumbo for 810x at all.
*/
- case RTL_GIGA_MAC_VER_40:
- case RTL_GIGA_MAC_VER_41:
- case RTL_GIGA_MAC_VER_42:
- case RTL_GIGA_MAC_VER_43:
- case RTL_GIGA_MAC_VER_44:
- case RTL_GIGA_MAC_VER_45:
- case RTL_GIGA_MAC_VER_46:
- case RTL_GIGA_MAC_VER_47:
- case RTL_GIGA_MAC_VER_48:
- case RTL_GIGA_MAC_VER_49:
- case RTL_GIGA_MAC_VER_50:
- case RTL_GIGA_MAC_VER_51:
+ case RTL_GIGA_MAC_VER_40 ... RTL_GIGA_MAC_VER_51:
default:
ops->disable = NULL;
ops->enable = NULL;
@@ -5323,32 +5018,21 @@
rtl_rx_close(tp);
- if (tp->mac_version == RTL_GIGA_MAC_VER_27 ||
- tp->mac_version == RTL_GIGA_MAC_VER_28 ||
- tp->mac_version == RTL_GIGA_MAC_VER_31) {
+ switch (tp->mac_version) {
+ case RTL_GIGA_MAC_VER_27:
+ case RTL_GIGA_MAC_VER_28:
+ case RTL_GIGA_MAC_VER_31:
rtl_udelay_loop_wait_low(tp, &rtl_npq_cond, 20, 42*42);
- } else if (tp->mac_version == RTL_GIGA_MAC_VER_34 ||
- tp->mac_version == RTL_GIGA_MAC_VER_35 ||
- tp->mac_version == RTL_GIGA_MAC_VER_36 ||
- tp->mac_version == RTL_GIGA_MAC_VER_37 ||
- tp->mac_version == RTL_GIGA_MAC_VER_38 ||
- tp->mac_version == RTL_GIGA_MAC_VER_40 ||
- tp->mac_version == RTL_GIGA_MAC_VER_41 ||
- tp->mac_version == RTL_GIGA_MAC_VER_42 ||
- tp->mac_version == RTL_GIGA_MAC_VER_43 ||
- tp->mac_version == RTL_GIGA_MAC_VER_44 ||
- tp->mac_version == RTL_GIGA_MAC_VER_45 ||
- tp->mac_version == RTL_GIGA_MAC_VER_46 ||
- tp->mac_version == RTL_GIGA_MAC_VER_47 ||
- tp->mac_version == RTL_GIGA_MAC_VER_48 ||
- tp->mac_version == RTL_GIGA_MAC_VER_49 ||
- tp->mac_version == RTL_GIGA_MAC_VER_50 ||
- tp->mac_version == RTL_GIGA_MAC_VER_51) {
+ break;
+ case RTL_GIGA_MAC_VER_34 ... RTL_GIGA_MAC_VER_38:
+ case RTL_GIGA_MAC_VER_40 ... RTL_GIGA_MAC_VER_51:
RTL_W8(tp, ChipCmd, RTL_R8(tp, ChipCmd) | StopReq);
rtl_udelay_loop_wait_high(tp, &rtl_txcfg_empty_cond, 100, 666);
- } else {
+ break;
+ default:
RTL_W8(tp, ChipCmd, RTL_R8(tp, ChipCmd) | StopReq);
udelay(100);
+ break;
}
rtl_hw_reset(tp);
@@ -5361,13 +5045,10 @@
(InterFrameGap << TxInterFrameGapShift));
}
-static void rtl_hw_start(struct net_device *dev)
+static void rtl_set_rx_max_size(struct rtl8169_private *tp)
{
- struct rtl8169_private *tp = netdev_priv(dev);
-
- tp->hw_start(dev);
-
- rtl_irq_enable_all(tp);
+ /* Low hurts. Let's disable the filtering. */
+ RTL_W16(tp, RxMaxSize, R8169_RX_BUF_SIZE + 1);
}
static void rtl_set_rx_tx_desc_registers(struct rtl8169_private *tp)
@@ -5383,21 +5064,6 @@
RTL_W32(tp, RxDescAddrLow, ((u64) tp->RxPhyAddr) & DMA_BIT_MASK(32));
}
-static u16 rtl_rw_cpluscmd(struct rtl8169_private *tp)
-{
- u16 cmd;
-
- cmd = RTL_R16(tp, CPlusCmd);
- RTL_W16(tp, CPlusCmd, cmd);
- return cmd;
-}
-
-static void rtl_set_rx_max_size(struct rtl8169_private *tp, unsigned int rx_buf_sz)
-{
- /* Low hurts. Let's disable the filtering. */
- RTL_W16(tp, RxMaxSize, rx_buf_sz + 1);
-}
-
static void rtl8169_set_magic_reg(struct rtl8169_private *tp, unsigned mac_version)
{
static const struct rtl_cfg2_info {
@@ -5475,36 +5141,34 @@
RTL_W32(tp, RxConfig, tmp);
}
-static void rtl_hw_start_8169(struct net_device *dev)
+static void rtl_hw_start(struct rtl8169_private *tp)
{
- struct rtl8169_private *tp = netdev_priv(dev);
- struct pci_dev *pdev = tp->pci_dev;
-
- if (tp->mac_version == RTL_GIGA_MAC_VER_05) {
- RTL_W16(tp, CPlusCmd, RTL_R16(tp, CPlusCmd) | PCIMulRW);
- pci_write_config_byte(pdev, PCI_CACHE_LINE_SIZE, 0x08);
- }
-
RTL_W8(tp, Cfg9346, Cfg9346_Unlock);
- if (tp->mac_version == RTL_GIGA_MAC_VER_01 ||
- tp->mac_version == RTL_GIGA_MAC_VER_02 ||
- tp->mac_version == RTL_GIGA_MAC_VER_03 ||
- tp->mac_version == RTL_GIGA_MAC_VER_04)
- RTL_W8(tp, ChipCmd, CmdTxEnb | CmdRxEnb);
- rtl_init_rxcfg(tp);
+ tp->hw_start(tp);
+
+ rtl_set_rx_max_size(tp);
+ rtl_set_rx_tx_desc_registers(tp);
+ rtl_set_rx_tx_config_registers(tp);
+ RTL_W8(tp, Cfg9346, Cfg9346_Lock);
+
+ /* Initially a 10 us delay. Turned it into a PCI commit. - FR */
+ RTL_R8(tp, IntrMask);
+ RTL_W8(tp, ChipCmd, CmdTxEnb | CmdRxEnb);
+ rtl_set_rx_mode(tp->dev);
+ /* no early-rx interrupts */
+ RTL_W16(tp, MultiIntr, RTL_R16(tp, MultiIntr) & 0xf000);
+ rtl_irq_enable_all(tp);
+}
+
+static void rtl_hw_start_8169(struct rtl8169_private *tp)
+{
+ if (tp->mac_version == RTL_GIGA_MAC_VER_05)
+ pci_write_config_byte(tp->pci_dev, PCI_CACHE_LINE_SIZE, 0x08);
RTL_W8(tp, EarlyTxThres, NoEarlyTx);
- rtl_set_rx_max_size(tp, rx_buf_sz);
-
- if (tp->mac_version == RTL_GIGA_MAC_VER_01 ||
- tp->mac_version == RTL_GIGA_MAC_VER_02 ||
- tp->mac_version == RTL_GIGA_MAC_VER_03 ||
- tp->mac_version == RTL_GIGA_MAC_VER_04)
- rtl_set_rx_tx_config_registers(tp);
-
- tp->cp_cmd |= rtl_rw_cpluscmd(tp) | PCIMulRW;
+ tp->cp_cmd |= PCIMulRW;
if (tp->mac_version == RTL_GIGA_MAC_VER_02 ||
tp->mac_version == RTL_GIGA_MAC_VER_03) {
@@ -5523,56 +5187,7 @@
*/
RTL_W16(tp, IntrMitigate, 0x0000);
- rtl_set_rx_tx_desc_registers(tp);
-
- if (tp->mac_version != RTL_GIGA_MAC_VER_01 &&
- tp->mac_version != RTL_GIGA_MAC_VER_02 &&
- tp->mac_version != RTL_GIGA_MAC_VER_03 &&
- tp->mac_version != RTL_GIGA_MAC_VER_04) {
- RTL_W8(tp, ChipCmd, CmdTxEnb | CmdRxEnb);
- rtl_set_rx_tx_config_registers(tp);
- }
-
- RTL_W8(tp, Cfg9346, Cfg9346_Lock);
-
- /* Initially a 10 us delay. Turned it into a PCI commit. - FR */
- RTL_R8(tp, IntrMask);
-
RTL_W32(tp, RxMissed, 0);
-
- rtl_set_rx_mode(dev);
-
- /* no early-rx interrupts */
- RTL_W16(tp, MultiIntr, RTL_R16(tp, MultiIntr) & 0xf000);
-}
-
-static void rtl_csi_write(struct rtl8169_private *tp, int addr, int value)
-{
- if (tp->csi_ops.write)
- tp->csi_ops.write(tp, addr, value);
-}
-
-static u32 rtl_csi_read(struct rtl8169_private *tp, int addr)
-{
- return tp->csi_ops.read ? tp->csi_ops.read(tp, addr) : ~0;
-}
-
-static void rtl_csi_access_enable(struct rtl8169_private *tp, u32 bits)
-{
- u32 csi;
-
- csi = rtl_csi_read(tp, 0x070c) & 0x00ffffff;
- rtl_csi_write(tp, 0x070c, csi | bits);
-}
-
-static void rtl_csi_access_enable_1(struct rtl8169_private *tp)
-{
- rtl_csi_access_enable(tp, 0x17000000);
-}
-
-static void rtl_csi_access_enable_2(struct rtl8169_private *tp)
-{
- rtl_csi_access_enable(tp, 0x27000000);
}
DECLARE_RTL_COND(rtl_csiar_cond)
@@ -5580,101 +5195,55 @@
return RTL_R32(tp, CSIAR) & CSIAR_FLAG;
}
-static void r8169_csi_write(struct rtl8169_private *tp, int addr, int value)
+static void rtl_csi_write(struct rtl8169_private *tp, int addr, int value)
{
+ u32 func = PCI_FUNC(tp->pci_dev->devfn);
+
RTL_W32(tp, CSIDR, value);
RTL_W32(tp, CSIAR, CSIAR_WRITE_CMD | (addr & CSIAR_ADDR_MASK) |
- CSIAR_BYTE_ENABLE << CSIAR_BYTE_ENABLE_SHIFT);
+ CSIAR_BYTE_ENABLE | func << 16);
rtl_udelay_loop_wait_low(tp, &rtl_csiar_cond, 10, 100);
}
-static u32 r8169_csi_read(struct rtl8169_private *tp, int addr)
+static u32 rtl_csi_read(struct rtl8169_private *tp, int addr)
{
- RTL_W32(tp, CSIAR, (addr & CSIAR_ADDR_MASK) |
- CSIAR_BYTE_ENABLE << CSIAR_BYTE_ENABLE_SHIFT);
+ u32 func = PCI_FUNC(tp->pci_dev->devfn);
+
+ RTL_W32(tp, CSIAR, (addr & CSIAR_ADDR_MASK) | func << 16 |
+ CSIAR_BYTE_ENABLE);
return rtl_udelay_loop_wait_high(tp, &rtl_csiar_cond, 10, 100) ?
RTL_R32(tp, CSIDR) : ~0;
}
-static void r8402_csi_write(struct rtl8169_private *tp, int addr, int value)
+static void rtl_csi_access_enable(struct rtl8169_private *tp, u8 val)
{
- RTL_W32(tp, CSIDR, value);
- RTL_W32(tp, CSIAR, CSIAR_WRITE_CMD | (addr & CSIAR_ADDR_MASK) |
- CSIAR_BYTE_ENABLE << CSIAR_BYTE_ENABLE_SHIFT |
- CSIAR_FUNC_NIC);
+ struct pci_dev *pdev = tp->pci_dev;
+ u32 csi;
- rtl_udelay_loop_wait_low(tp, &rtl_csiar_cond, 10, 100);
+ /* According to Realtek the value at config space address 0x070f
+ * controls the L0s/L1 entrance latency. We try standard ECAM access
+ * first and if it fails fall back to CSI.
+ */
+ if (pdev->cfg_size > 0x070f &&
+ pci_write_config_byte(pdev, 0x070f, val) == PCIBIOS_SUCCESSFUL)
+ return;
+
+ netdev_notice_once(tp->dev,
+ "No native access to PCI extended config space, falling back to CSI\n");
+ csi = rtl_csi_read(tp, 0x070c) & 0x00ffffff;
+ rtl_csi_write(tp, 0x070c, csi | val << 24);
}
-static u32 r8402_csi_read(struct rtl8169_private *tp, int addr)
+static void rtl_csi_access_enable_1(struct rtl8169_private *tp)
{
- RTL_W32(tp, CSIAR, (addr & CSIAR_ADDR_MASK) | CSIAR_FUNC_NIC |
- CSIAR_BYTE_ENABLE << CSIAR_BYTE_ENABLE_SHIFT);
-
- return rtl_udelay_loop_wait_high(tp, &rtl_csiar_cond, 10, 100) ?
- RTL_R32(tp, CSIDR) : ~0;
+ rtl_csi_access_enable(tp, 0x17);
}
-static void r8411_csi_write(struct rtl8169_private *tp, int addr, int value)
+static void rtl_csi_access_enable_2(struct rtl8169_private *tp)
{
- RTL_W32(tp, CSIDR, value);
- RTL_W32(tp, CSIAR, CSIAR_WRITE_CMD | (addr & CSIAR_ADDR_MASK) |
- CSIAR_BYTE_ENABLE << CSIAR_BYTE_ENABLE_SHIFT |
- CSIAR_FUNC_NIC2);
-
- rtl_udelay_loop_wait_low(tp, &rtl_csiar_cond, 10, 100);
-}
-
-static u32 r8411_csi_read(struct rtl8169_private *tp, int addr)
-{
- RTL_W32(tp, CSIAR, (addr & CSIAR_ADDR_MASK) | CSIAR_FUNC_NIC2 |
- CSIAR_BYTE_ENABLE << CSIAR_BYTE_ENABLE_SHIFT);
-
- return rtl_udelay_loop_wait_high(tp, &rtl_csiar_cond, 10, 100) ?
- RTL_R32(tp, CSIDR) : ~0;
-}
-
-static void rtl_init_csi_ops(struct rtl8169_private *tp)
-{
- struct csi_ops *ops = &tp->csi_ops;
-
- switch (tp->mac_version) {
- case RTL_GIGA_MAC_VER_01:
- case RTL_GIGA_MAC_VER_02:
- case RTL_GIGA_MAC_VER_03:
- case RTL_GIGA_MAC_VER_04:
- case RTL_GIGA_MAC_VER_05:
- case RTL_GIGA_MAC_VER_06:
- case RTL_GIGA_MAC_VER_10:
- case RTL_GIGA_MAC_VER_11:
- case RTL_GIGA_MAC_VER_12:
- case RTL_GIGA_MAC_VER_13:
- case RTL_GIGA_MAC_VER_14:
- case RTL_GIGA_MAC_VER_15:
- case RTL_GIGA_MAC_VER_16:
- case RTL_GIGA_MAC_VER_17:
- ops->write = NULL;
- ops->read = NULL;
- break;
-
- case RTL_GIGA_MAC_VER_37:
- case RTL_GIGA_MAC_VER_38:
- ops->write = r8402_csi_write;
- ops->read = r8402_csi_read;
- break;
-
- case RTL_GIGA_MAC_VER_44:
- ops->write = r8411_csi_write;
- ops->read = r8411_csi_read;
- break;
-
- default:
- ops->write = r8169_csi_write;
- ops->read = r8169_csi_read;
- break;
- }
+ rtl_csi_access_enable(tp, 0x27);
}
struct ephy_info {
@@ -5721,25 +5290,15 @@
RTL_W8(tp, Config3, data);
}
-#define R8168_CPCMD_QUIRK_MASK (\
- EnableBist | \
- Mac_dbgo_oe | \
- Force_half_dup | \
- Force_rxflow_en | \
- Force_txflow_en | \
- Cxpl_dbg_sel | \
- ASF | \
- PktCntrDisable | \
- Mac_dbgo_sel)
-
static void rtl_hw_start_8168bb(struct rtl8169_private *tp)
{
RTL_W8(tp, Config3, RTL_R8(tp, Config3) & ~Beacon_en);
- RTL_W16(tp, CPlusCmd, RTL_R16(tp, CPlusCmd) & ~R8168_CPCMD_QUIRK_MASK);
+ tp->cp_cmd &= CPCMD_QUIRK_MASK;
+ RTL_W16(tp, CPlusCmd, tp->cp_cmd);
if (tp->dev->mtu <= ETH_DATA_LEN) {
- rtl_tx_performance_tweak(tp, (0x5 << MAX_READ_REQUEST_SHIFT) |
+ rtl_tx_performance_tweak(tp, PCI_EXP_DEVCTL_READRQ_4096B |
PCI_EXP_DEVCTL_NOSNOOP_EN);
}
}
@@ -5760,11 +5319,12 @@
RTL_W8(tp, Config3, RTL_R8(tp, Config3) & ~Beacon_en);
if (tp->dev->mtu <= ETH_DATA_LEN)
- rtl_tx_performance_tweak(tp, 0x5 << MAX_READ_REQUEST_SHIFT);
+ rtl_tx_performance_tweak(tp, PCI_EXP_DEVCTL_READRQ_4096B);
rtl_disable_clock_request(tp);
- RTL_W16(tp, CPlusCmd, RTL_R16(tp, CPlusCmd) & ~R8168_CPCMD_QUIRK_MASK);
+ tp->cp_cmd &= CPCMD_QUIRK_MASK;
+ RTL_W16(tp, CPlusCmd, tp->cp_cmd);
}
static void rtl_hw_start_8168cp_1(struct rtl8169_private *tp)
@@ -5791,9 +5351,10 @@
RTL_W8(tp, Config3, RTL_R8(tp, Config3) & ~Beacon_en);
if (tp->dev->mtu <= ETH_DATA_LEN)
- rtl_tx_performance_tweak(tp, 0x5 << MAX_READ_REQUEST_SHIFT);
+ rtl_tx_performance_tweak(tp, PCI_EXP_DEVCTL_READRQ_4096B);
- RTL_W16(tp, CPlusCmd, RTL_R16(tp, CPlusCmd) & ~R8168_CPCMD_QUIRK_MASK);
+ tp->cp_cmd &= CPCMD_QUIRK_MASK;
+ RTL_W16(tp, CPlusCmd, tp->cp_cmd);
}
static void rtl_hw_start_8168cp_3(struct rtl8169_private *tp)
@@ -5808,9 +5369,10 @@
RTL_W8(tp, MaxTxPacketSize, TxPacketMax);
if (tp->dev->mtu <= ETH_DATA_LEN)
- rtl_tx_performance_tweak(tp, 0x5 << MAX_READ_REQUEST_SHIFT);
+ rtl_tx_performance_tweak(tp, PCI_EXP_DEVCTL_READRQ_4096B);
- RTL_W16(tp, CPlusCmd, RTL_R16(tp, CPlusCmd) & ~R8168_CPCMD_QUIRK_MASK);
+ tp->cp_cmd &= CPCMD_QUIRK_MASK;
+ RTL_W16(tp, CPlusCmd, tp->cp_cmd);
}
static void rtl_hw_start_8168c_1(struct rtl8169_private *tp)
@@ -5865,9 +5427,10 @@
RTL_W8(tp, MaxTxPacketSize, TxPacketMax);
if (tp->dev->mtu <= ETH_DATA_LEN)
- rtl_tx_performance_tweak(tp, 0x5 << MAX_READ_REQUEST_SHIFT);
+ rtl_tx_performance_tweak(tp, PCI_EXP_DEVCTL_READRQ_4096B);
- RTL_W16(tp, CPlusCmd, RTL_R16(tp, CPlusCmd) & ~R8168_CPCMD_QUIRK_MASK);
+ tp->cp_cmd &= CPCMD_QUIRK_MASK;
+ RTL_W16(tp, CPlusCmd, tp->cp_cmd);
}
static void rtl_hw_start_8168dp(struct rtl8169_private *tp)
@@ -5875,7 +5438,7 @@
rtl_csi_access_enable_1(tp);
if (tp->dev->mtu <= ETH_DATA_LEN)
- rtl_tx_performance_tweak(tp, 0x5 << MAX_READ_REQUEST_SHIFT);
+ rtl_tx_performance_tweak(tp, PCI_EXP_DEVCTL_READRQ_4096B);
RTL_W8(tp, MaxTxPacketSize, TxPacketMax);
@@ -5892,7 +5455,7 @@
rtl_csi_access_enable_1(tp);
- rtl_tx_performance_tweak(tp, 0x5 << MAX_READ_REQUEST_SHIFT);
+ rtl_tx_performance_tweak(tp, PCI_EXP_DEVCTL_READRQ_4096B);
RTL_W8(tp, MaxTxPacketSize, TxPacketMax);
@@ -5924,7 +5487,7 @@
rtl_ephy_init(tp, e_info_8168e_1, ARRAY_SIZE(e_info_8168e_1));
if (tp->dev->mtu <= ETH_DATA_LEN)
- rtl_tx_performance_tweak(tp, 0x5 << MAX_READ_REQUEST_SHIFT);
+ rtl_tx_performance_tweak(tp, PCI_EXP_DEVCTL_READRQ_4096B);
RTL_W8(tp, MaxTxPacketSize, TxPacketMax);
@@ -5949,7 +5512,7 @@
rtl_ephy_init(tp, e_info_8168e_2, ARRAY_SIZE(e_info_8168e_2));
if (tp->dev->mtu <= ETH_DATA_LEN)
- rtl_tx_performance_tweak(tp, 0x5 << MAX_READ_REQUEST_SHIFT);
+ rtl_tx_performance_tweak(tp, PCI_EXP_DEVCTL_READRQ_4096B);
rtl_eri_write(tp, 0xc0, ERIAR_MASK_0011, 0x0000, ERIAR_EXGMAC);
rtl_eri_write(tp, 0xb8, ERIAR_MASK_0011, 0x0000, ERIAR_EXGMAC);
@@ -5979,7 +5542,7 @@
{
rtl_csi_access_enable_2(tp);
- rtl_tx_performance_tweak(tp, 0x5 << MAX_READ_REQUEST_SHIFT);
+ rtl_tx_performance_tweak(tp, PCI_EXP_DEVCTL_READRQ_4096B);
rtl_eri_write(tp, 0xc0, ERIAR_MASK_0011, 0x0000, ERIAR_EXGMAC);
rtl_eri_write(tp, 0xb8, ERIAR_MASK_0011, 0x0000, ERIAR_EXGMAC);
@@ -6050,7 +5613,7 @@
rtl_csi_access_enable_1(tp);
- rtl_tx_performance_tweak(tp, 0x5 << MAX_READ_REQUEST_SHIFT);
+ rtl_tx_performance_tweak(tp, PCI_EXP_DEVCTL_READRQ_4096B);
rtl_w0w1_eri(tp, 0xdc, ERIAR_MASK_0001, 0x00, 0x01, ERIAR_EXGMAC);
rtl_w0w1_eri(tp, 0xdc, ERIAR_MASK_0001, 0x01, 0x00, ERIAR_EXGMAC);
@@ -6150,7 +5713,7 @@
rtl_csi_access_enable_1(tp);
- rtl_tx_performance_tweak(tp, 0x5 << MAX_READ_REQUEST_SHIFT);
+ rtl_tx_performance_tweak(tp, PCI_EXP_DEVCTL_READRQ_4096B);
rtl_w0w1_eri(tp, 0xdc, ERIAR_MASK_0001, 0x00, 0x01, ERIAR_EXGMAC);
rtl_w0w1_eri(tp, 0xdc, ERIAR_MASK_0001, 0x01, 0x00, ERIAR_EXGMAC);
@@ -6232,7 +5795,7 @@
rtl_csi_access_enable_1(tp);
- rtl_tx_performance_tweak(tp, 0x5 << MAX_READ_REQUEST_SHIFT);
+ rtl_tx_performance_tweak(tp, PCI_EXP_DEVCTL_READRQ_4096B);
rtl_w0w1_eri(tp, 0xdc, ERIAR_MASK_0001, 0x00, 0x01, ERIAR_EXGMAC);
rtl_w0w1_eri(tp, 0xdc, ERIAR_MASK_0001, 0x01, 0x00, ERIAR_EXGMAC);
@@ -6328,18 +5891,12 @@
r8168_mac_ocp_write(tp, 0xe860, data);
}
-static void rtl_hw_start_8168(struct net_device *dev)
+static void rtl_hw_start_8168(struct rtl8169_private *tp)
{
- struct rtl8169_private *tp = netdev_priv(dev);
-
- RTL_W8(tp, Cfg9346, Cfg9346_Unlock);
-
RTL_W8(tp, MaxTxPacketSize, TxPacketMax);
- rtl_set_rx_max_size(tp, rx_buf_sz);
-
- tp->cp_cmd |= RTL_R16(tp, CPlusCmd) | PktCntrDisable | INTT_1;
-
+ tp->cp_cmd &= ~INTT_MASK;
+ tp->cp_cmd |= PktCntrDisable | INTT_1;
RTL_W16(tp, CPlusCmd, tp->cp_cmd);
RTL_W16(tp, IntrMitigate, 0x5151);
@@ -6350,12 +5907,6 @@
tp->event_slow &= ~RxOverflow;
}
- rtl_set_rx_tx_desc_registers(tp);
-
- rtl_set_rx_tx_config_registers(tp);
-
- RTL_R8(tp, IntrMask);
-
switch (tp->mac_version) {
case RTL_GIGA_MAC_VER_11:
rtl_hw_start_8168bb(tp);
@@ -6456,30 +6007,11 @@
default:
printk(KERN_ERR PFX "%s: unknown chipset (mac_version = %d).\n",
- dev->name, tp->mac_version);
+ tp->dev->name, tp->mac_version);
break;
}
-
- RTL_W8(tp, Cfg9346, Cfg9346_Lock);
-
- RTL_W8(tp, ChipCmd, CmdTxEnb | CmdRxEnb);
-
- rtl_set_rx_mode(dev);
-
- RTL_W16(tp, MultiIntr, RTL_R16(tp, MultiIntr) & 0xf000);
}
-#define R810X_CPCMD_QUIRK_MASK (\
- EnableBist | \
- Mac_dbgo_oe | \
- Force_half_dup | \
- Force_rxflow_en | \
- Force_txflow_en | \
- Cxpl_dbg_sel | \
- ASF | \
- PktCntrDisable | \
- Mac_dbgo_sel)
-
static void rtl_hw_start_8102e_1(struct rtl8169_private *tp)
{
static const struct ephy_info e_info_8102e_1[] = {
@@ -6498,7 +6030,7 @@
RTL_W8(tp, DBG_REG, FIX_NAK_1);
- rtl_tx_performance_tweak(tp, 0x5 << MAX_READ_REQUEST_SHIFT);
+ rtl_tx_performance_tweak(tp, PCI_EXP_DEVCTL_READRQ_4096B);
RTL_W8(tp, Config1,
LEDS1 | LEDS0 | Speed_down | MEMMAP | IOMAP | VPD | PMEnable);
@@ -6515,7 +6047,7 @@
{
rtl_csi_access_enable_2(tp);
- rtl_tx_performance_tweak(tp, 0x5 << MAX_READ_REQUEST_SHIFT);
+ rtl_tx_performance_tweak(tp, PCI_EXP_DEVCTL_READRQ_4096B);
RTL_W8(tp, Config1, MEMMAP | IOMAP | VPD | PMEnable);
RTL_W8(tp, Config3, RTL_R8(tp, Config3) & ~Beacon_en);
@@ -6578,7 +6110,7 @@
rtl_ephy_init(tp, e_info_8402, ARRAY_SIZE(e_info_8402));
- rtl_tx_performance_tweak(tp, 0x5 << MAX_READ_REQUEST_SHIFT);
+ rtl_tx_performance_tweak(tp, PCI_EXP_DEVCTL_READRQ_4096B);
rtl_eri_write(tp, 0xc8, ERIAR_MASK_1111, 0x00000002, ERIAR_EXGMAC);
rtl_eri_write(tp, 0xe8, ERIAR_MASK_1111, 0x00000006, ERIAR_EXGMAC);
@@ -6603,32 +6135,21 @@
rtl_pcie_state_l2l3_enable(tp, false);
}
-static void rtl_hw_start_8101(struct net_device *dev)
+static void rtl_hw_start_8101(struct rtl8169_private *tp)
{
- struct rtl8169_private *tp = netdev_priv(dev);
- struct pci_dev *pdev = tp->pci_dev;
-
if (tp->mac_version >= RTL_GIGA_MAC_VER_30)
tp->event_slow &= ~RxFIFOOver;
if (tp->mac_version == RTL_GIGA_MAC_VER_13 ||
tp->mac_version == RTL_GIGA_MAC_VER_16)
- pcie_capability_set_word(pdev, PCI_EXP_DEVCTL,
+ pcie_capability_set_word(tp->pci_dev, PCI_EXP_DEVCTL,
PCI_EXP_DEVCTL_NOSNOOP_EN);
- RTL_W8(tp, Cfg9346, Cfg9346_Unlock);
-
RTL_W8(tp, MaxTxPacketSize, TxPacketMax);
- rtl_set_rx_max_size(tp, rx_buf_sz);
-
- tp->cp_cmd &= ~R810X_CPCMD_QUIRK_MASK;
+ tp->cp_cmd &= CPCMD_QUIRK_MASK;
RTL_W16(tp, CPlusCmd, tp->cp_cmd);
- rtl_set_rx_tx_desc_registers(tp);
-
- rtl_set_rx_tx_config_registers(tp);
-
switch (tp->mac_version) {
case RTL_GIGA_MAC_VER_07:
rtl_hw_start_8102e_1(tp);
@@ -6665,17 +6186,7 @@
break;
}
- RTL_W8(tp, Cfg9346, Cfg9346_Lock);
-
RTL_W16(tp, IntrMitigate, 0x0000);
-
- RTL_W8(tp, ChipCmd, CmdTxEnb | CmdRxEnb);
-
- rtl_set_rx_mode(dev);
-
- RTL_R8(tp, IntrMask);
-
- RTL_W16(tp, MultiIntr, RTL_R16(tp, MultiIntr) & 0xf000);
}
static int rtl8169_change_mtu(struct net_device *dev, int new_mtu)
@@ -6702,29 +6213,22 @@
static void rtl8169_free_rx_databuff(struct rtl8169_private *tp,
void **data_buff, struct RxDesc *desc)
{
- dma_unmap_single(tp_to_dev(tp), le64_to_cpu(desc->addr), rx_buf_sz,
- DMA_FROM_DEVICE);
+ dma_unmap_single(tp_to_dev(tp), le64_to_cpu(desc->addr),
+ R8169_RX_BUF_SIZE, DMA_FROM_DEVICE);
kfree(*data_buff);
*data_buff = NULL;
rtl8169_make_unusable_by_asic(desc);
}
-static inline void rtl8169_mark_to_asic(struct RxDesc *desc, u32 rx_buf_sz)
+static inline void rtl8169_mark_to_asic(struct RxDesc *desc)
{
u32 eor = le32_to_cpu(desc->opts1) & RingEnd;
/* Force memory writes to complete before releasing descriptor */
dma_wmb();
- desc->opts1 = cpu_to_le32(DescOwn | eor | rx_buf_sz);
-}
-
-static inline void rtl8169_map_to_asic(struct RxDesc *desc, dma_addr_t mapping,
- u32 rx_buf_sz)
-{
- desc->addr = cpu_to_le64(mapping);
- rtl8169_mark_to_asic(desc, rx_buf_sz);
+ desc->opts1 = cpu_to_le32(DescOwn | eor | R8169_RX_BUF_SIZE);
}
static inline void *rtl8169_align(void *data)
@@ -6738,21 +6242,20 @@
void *data;
dma_addr_t mapping;
struct device *d = tp_to_dev(tp);
- struct net_device *dev = tp->dev;
- int node = dev->dev.parent ? dev_to_node(dev->dev.parent) : -1;
+ int node = dev_to_node(d);
- data = kmalloc_node(rx_buf_sz, GFP_KERNEL, node);
+ data = kmalloc_node(R8169_RX_BUF_SIZE, GFP_KERNEL, node);
if (!data)
return NULL;
if (rtl8169_align(data) != data) {
kfree(data);
- data = kmalloc_node(rx_buf_sz + 15, GFP_KERNEL, node);
+ data = kmalloc_node(R8169_RX_BUF_SIZE + 15, GFP_KERNEL, node);
if (!data)
return NULL;
}
- mapping = dma_map_single(d, rtl8169_align(data), rx_buf_sz,
+ mapping = dma_map_single(d, rtl8169_align(data), R8169_RX_BUF_SIZE,
DMA_FROM_DEVICE);
if (unlikely(dma_mapping_error(d, mapping))) {
if (net_ratelimit())
@@ -6760,7 +6263,8 @@
goto err_out;
}
- rtl8169_map_to_asic(desc, mapping, rx_buf_sz);
+ desc->addr = cpu_to_le64(mapping);
+ rtl8169_mark_to_asic(desc);
return data;
err_out:
@@ -6792,9 +6296,6 @@
for (i = 0; i < NUM_RX_DESC; i++) {
void *data;
- if (tp->Rx_databuff[i])
- continue;
-
data = rtl8169_alloc_rx_data(tp, tp->RxDescArray + i);
if (!data) {
rtl8169_make_unusable_by_asic(tp->RxDescArray + i);
@@ -6811,14 +6312,12 @@
return -ENOMEM;
}
-static int rtl8169_init_ring(struct net_device *dev)
+static int rtl8169_init_ring(struct rtl8169_private *tp)
{
- struct rtl8169_private *tp = netdev_priv(dev);
-
rtl8169_init_ring_indexes(tp);
- memset(tp->tx_skb, 0x0, NUM_TX_DESC * sizeof(struct ring_info));
- memset(tp->Rx_databuff, 0x0, NUM_RX_DESC * sizeof(void *));
+ memset(tp->tx_skb, 0, sizeof(tp->tx_skb));
+ memset(tp->Rx_databuff, 0, sizeof(tp->Rx_databuff));
return rtl8169_rx_fill(tp);
}
@@ -6877,13 +6376,13 @@
rtl8169_hw_reset(tp);
for (i = 0; i < NUM_RX_DESC; i++)
- rtl8169_mark_to_asic(tp->RxDescArray + i, rx_buf_sz);
+ rtl8169_mark_to_asic(tp->RxDescArray + i);
rtl8169_tx_clear(tp);
rtl8169_init_ring_indexes(tp);
napi_enable(&tp->napi);
- rtl_hw_start(dev);
+ rtl_hw_start(tp);
netif_wake_queue(dev);
rtl8169_check_link_status(dev, tp);
}
@@ -7015,18 +6514,6 @@
return ret;
}
-static inline __be16 get_protocol(struct sk_buff *skb)
-{
- __be16 protocol;
-
- if (skb->protocol == htons(ETH_P_8021Q))
- protocol = vlan_eth_hdr(skb)->h_vlan_encapsulated_proto;
- else
- protocol = skb->protocol;
-
- return protocol;
-}
-
static bool rtl8169_tso_csum_v1(struct rtl8169_private *tp,
struct sk_buff *skb, u32 *opts)
{
@@ -7063,7 +6550,7 @@
return false;
}
- switch (get_protocol(skb)) {
+ switch (vlan_get_protocol(skb)) {
case htons(ETH_P_IP):
opts[0] |= TD1_GTSENV4;
break;
@@ -7095,7 +6582,7 @@
return false;
}
- switch (get_protocol(skb)) {
+ switch (vlan_get_protocol(skb)) {
case htons(ETH_P_IP):
opts[1] |= TD1_IPv4_CS;
ip_protocol = ip_hdr(skb)->protocol;
@@ -7365,7 +6852,7 @@
prefetch(data);
skb = napi_alloc_skb(&tp->napi, pkt_size);
if (skb)
- memcpy(skb->data, data, pkt_size);
+ skb_copy_to_linear_data(skb, data, pkt_size);
dma_sync_single_for_device(d, addr, pkt_size, DMA_FROM_DEVICE);
return skb;
@@ -7383,7 +6870,7 @@
struct RxDesc *desc = tp->RxDescArray + entry;
u32 status;
- status = le32_to_cpu(desc->opts1) & tp->opts1_mask;
+ status = le32_to_cpu(desc->opts1);
if (status & DescOwn)
break;
@@ -7401,14 +6888,16 @@
dev->stats.rx_length_errors++;
if (status & RxCRC)
dev->stats.rx_crc_errors++;
- if (status & RxFOVF) {
+ /* RxFOVF is a reserved bit on later chip versions */
+ if (tp->mac_version == RTL_GIGA_MAC_VER_01 &&
+ status & RxFOVF) {
rtl_schedule_task(tp, RTL_FLAG_TASK_RESET_PENDING);
dev->stats.rx_fifo_errors++;
- }
- if ((status & (RxRUNT | RxCRC)) &&
- !(status & (RxRWT | RxFOVF)) &&
- (dev->features & NETIF_F_RXALL))
+ } else if (status & (RxRUNT | RxCRC) &&
+ !(status & RxRWT) &&
+ dev->features & NETIF_F_RXALL) {
goto process_pkt;
+ }
} else {
struct sk_buff *skb;
dma_addr_t addr;
@@ -7457,7 +6946,7 @@
}
release_descriptor:
desc->opts2 = 0;
- rtl8169_mark_to_asic(desc, rx_buf_sz);
+ rtl8169_mark_to_asic(desc);
}
count = cur_rx - tp->cur_rx;
@@ -7468,8 +6957,7 @@
static irqreturn_t rtl8169_interrupt(int irq, void *dev_instance)
{
- struct net_device *dev = dev_instance;
- struct rtl8169_private *tp = netdev_priv(dev);
+ struct rtl8169_private *tp = dev_instance;
int handled = 0;
u16 status;
@@ -7480,7 +6968,7 @@
handled = 1;
rtl_irq_disable(tp);
- napi_schedule(&tp->napi);
+ napi_schedule_irqoff(&tp->napi);
}
}
return IRQ_RETVAL(handled);
@@ -7631,7 +7119,7 @@
pm_runtime_get_sync(&pdev->dev);
/* Update counters before going down */
- rtl8169_update_counters(dev);
+ rtl8169_update_counters(tp);
rtl_lock_work(tp);
clear_bit(RTL_FLAG_TASK_ENABLED, tp->wk.flags);
@@ -7641,7 +7129,7 @@
cancel_work_sync(&tp->wk.work);
- pci_free_irq(pdev, 0, dev);
+ pci_free_irq(pdev, 0, tp);
dma_free_coherent(&pdev->dev, R8169_RX_RING_BYTES, tp->RxDescArray,
tp->RxPhyAddr);
@@ -7686,7 +7174,7 @@
if (!tp->RxDescArray)
goto err_free_tx_0;
- retval = rtl8169_init_ring(dev);
+ retval = rtl8169_init_ring(tp);
if (retval < 0)
goto err_free_rx_1;
@@ -7696,7 +7184,7 @@
rtl_request_firmware(tp);
- retval = pci_request_irq(pdev, 0, rtl8169_interrupt, NULL, dev,
+ retval = pci_request_irq(pdev, 0, rtl8169_interrupt, NULL, tp,
dev->name);
if (retval < 0)
goto err_release_fw_2;
@@ -7709,13 +7197,11 @@
rtl8169_init_phy(dev, tp);
- __rtl8169_set_features(dev, dev->features);
-
rtl_pll_power_up(tp);
- rtl_hw_start(dev);
+ rtl_hw_start(tp);
- if (!rtl8169_init_counter_offsets(dev))
+ if (!rtl8169_init_counter_offsets(tp))
netif_warn(tp, hw, dev, "counter reset/update failed\n");
netif_start_queue(dev);
@@ -7784,7 +7270,7 @@
* from tally counters.
*/
if (pm_runtime_active(&pdev->dev))
- rtl8169_update_counters(dev);
+ rtl8169_update_counters(tp);
/*
* Subtract values fetched during initalization.
@@ -7880,7 +7366,7 @@
/* Update counters before going runtime suspend */
rtl8169_rx_missed(dev);
- rtl8169_update_counters(dev);
+ rtl8169_update_counters(tp);
return 0;
}
@@ -8020,9 +7506,7 @@
};
static const struct rtl_cfg_info {
- void (*hw_start)(struct net_device *);
- unsigned int region;
- unsigned int align;
+ void (*hw_start)(struct rtl8169_private *tp);
u16 event_slow;
unsigned int has_gmii:1;
const struct rtl_coalesce_info *coalesce_info;
@@ -8030,8 +7514,6 @@
} rtl_cfg_infos [] = {
[RTL_CFG_0] = {
.hw_start = rtl_hw_start_8169,
- .region = 1,
- .align = 0,
.event_slow = SYSErr | LinkChg | RxOverflow | RxFIFOOver,
.has_gmii = 1,
.coalesce_info = rtl_coalesce_info_8169,
@@ -8039,8 +7521,6 @@
},
[RTL_CFG_1] = {
.hw_start = rtl_hw_start_8168,
- .region = 2,
- .align = 8,
.event_slow = SYSErr | LinkChg | RxOverflow,
.has_gmii = 1,
.coalesce_info = rtl_coalesce_info_8168_8136,
@@ -8048,8 +7528,6 @@
},
[RTL_CFG_2] = {
.hw_start = rtl_hw_start_8101,
- .region = 2,
- .align = 8,
.event_slow = SYSErr | LinkChg | RxOverflow | RxFIFOOver |
PCSTimeout,
.coalesce_info = rtl_coalesce_info_8168_8136,
@@ -8125,20 +7603,10 @@
static void rtl_hw_initialize(struct rtl8169_private *tp)
{
switch (tp->mac_version) {
- case RTL_GIGA_MAC_VER_40:
- case RTL_GIGA_MAC_VER_41:
- case RTL_GIGA_MAC_VER_42:
- case RTL_GIGA_MAC_VER_43:
- case RTL_GIGA_MAC_VER_44:
- case RTL_GIGA_MAC_VER_45:
- case RTL_GIGA_MAC_VER_46:
- case RTL_GIGA_MAC_VER_47:
- case RTL_GIGA_MAC_VER_48:
+ case RTL_GIGA_MAC_VER_40 ... RTL_GIGA_MAC_VER_48:
rtl_hw_init_8168g(tp);
break;
- case RTL_GIGA_MAC_VER_49:
- case RTL_GIGA_MAC_VER_50:
- case RTL_GIGA_MAC_VER_51:
+ case RTL_GIGA_MAC_VER_49 ... RTL_GIGA_MAC_VER_51:
rtl_hw_init_8168ep(tp);
break;
default:
@@ -8149,11 +7617,10 @@
static int rtl_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
{
const struct rtl_cfg_info *cfg = rtl_cfg_infos + ent->driver_data;
- const unsigned int region = cfg->region;
struct rtl8169_private *tp;
struct mii_if_info *mii;
struct net_device *dev;
- int chipset, i;
+ int chipset, region, i;
int rc;
if (netif_msg_drv(&debug)) {
@@ -8188,43 +7655,41 @@
/* enable device (incl. PCI PM wakeup and hotplug setup) */
rc = pcim_enable_device(pdev);
if (rc < 0) {
- netif_err(tp, probe, dev, "enable failure\n");
+ dev_err(&pdev->dev, "enable failure\n");
return rc;
}
if (pcim_set_mwi(pdev) < 0)
- netif_info(tp, probe, dev, "Mem-Wr-Inval unavailable\n");
+ dev_info(&pdev->dev, "Mem-Wr-Inval unavailable\n");
- /* make sure PCI base addr 1 is MMIO */
- if (!(pci_resource_flags(pdev, region) & IORESOURCE_MEM)) {
- netif_err(tp, probe, dev,
- "region #%d not an MMIO resource, aborting\n",
- region);
+ /* use first MMIO region */
+ region = ffs(pci_select_bars(pdev, IORESOURCE_MEM)) - 1;
+ if (region < 0) {
+ dev_err(&pdev->dev, "no MMIO resource found\n");
return -ENODEV;
}
/* check for weird/broken PCI region reporting */
if (pci_resource_len(pdev, region) < R8169_REGS_SIZE) {
- netif_err(tp, probe, dev,
- "Invalid PCI region size(s), aborting\n");
+ dev_err(&pdev->dev, "Invalid PCI region size(s), aborting\n");
return -ENODEV;
}
rc = pcim_iomap_regions(pdev, BIT(region), MODULENAME);
if (rc < 0) {
- netif_err(tp, probe, dev, "cannot remap MMIO, aborting\n");
+ dev_err(&pdev->dev, "cannot remap MMIO, aborting\n");
return rc;
}
tp->mmio_addr = pcim_iomap_table(pdev)[region];
if (!pci_is_pcie(pdev))
- netif_info(tp, probe, dev, "not PCI Express\n");
+ dev_info(&pdev->dev, "not PCI Express\n");
/* Identify chip attached to board */
- rtl8169_get_mac_version(tp, dev, cfg->default_ver);
+ rtl8169_get_mac_version(tp, cfg->default_ver);
- tp->cp_cmd = 0;
+ tp->cp_cmd = RTL_R16(tp, CPlusCmd);
if ((sizeof(dma_addr_t) > 4) &&
(use_dac == 1 || (use_dac == -1 && pci_is_pcie(pdev) &&
@@ -8239,7 +7704,7 @@
} else {
rc = pci_set_dma_mask(pdev, DMA_BIT_MASK(32));
if (rc < 0) {
- netif_err(tp, probe, dev, "DMA configuration failed\n");
+ dev_err(&pdev->dev, "DMA configuration failed\n");
return rc;
}
}
@@ -8257,18 +7722,15 @@
pci_set_master(pdev);
rtl_init_mdio_ops(tp);
- rtl_init_pll_power_ops(tp);
rtl_init_jumbo_ops(tp);
- rtl_init_csi_ops(tp);
rtl8169_print_mac_version(tp);
chipset = tp->mac_version;
- tp->txd_version = rtl_chip_infos[chipset].txd_version;
rc = rtl_alloc_irq(tp);
if (rc < 0) {
- netif_err(tp, probe, dev, "Can't allocate interrupt\n");
+ dev_err(&pdev->dev, "Can't allocate interrupt\n");
return rc;
}
@@ -8296,29 +7758,18 @@
u64_stats_init(&tp->tx_stats.syncp);
/* Get MAC address */
- if (tp->mac_version == RTL_GIGA_MAC_VER_35 ||
- tp->mac_version == RTL_GIGA_MAC_VER_36 ||
- tp->mac_version == RTL_GIGA_MAC_VER_37 ||
- tp->mac_version == RTL_GIGA_MAC_VER_38 ||
- tp->mac_version == RTL_GIGA_MAC_VER_40 ||
- tp->mac_version == RTL_GIGA_MAC_VER_41 ||
- tp->mac_version == RTL_GIGA_MAC_VER_42 ||
- tp->mac_version == RTL_GIGA_MAC_VER_43 ||
- tp->mac_version == RTL_GIGA_MAC_VER_44 ||
- tp->mac_version == RTL_GIGA_MAC_VER_45 ||
- tp->mac_version == RTL_GIGA_MAC_VER_46 ||
- tp->mac_version == RTL_GIGA_MAC_VER_47 ||
- tp->mac_version == RTL_GIGA_MAC_VER_48 ||
- tp->mac_version == RTL_GIGA_MAC_VER_49 ||
- tp->mac_version == RTL_GIGA_MAC_VER_50 ||
- tp->mac_version == RTL_GIGA_MAC_VER_51) {
- u16 mac_addr[3];
-
+ switch (tp->mac_version) {
+ u8 mac_addr[ETH_ALEN] __aligned(4);
+ case RTL_GIGA_MAC_VER_35 ... RTL_GIGA_MAC_VER_38:
+ case RTL_GIGA_MAC_VER_40 ... RTL_GIGA_MAC_VER_51:
*(u32 *)&mac_addr[0] = rtl_eri_read(tp, 0xe0, ERIAR_EXGMAC);
- *(u16 *)&mac_addr[2] = rtl_eri_read(tp, 0xe4, ERIAR_EXGMAC);
+ *(u16 *)&mac_addr[4] = rtl_eri_read(tp, 0xe4, ERIAR_EXGMAC);
- if (is_valid_ether_addr((u8 *)mac_addr))
- rtl_rar_set(tp, (u8 *)mac_addr);
+ if (is_valid_ether_addr(mac_addr))
+ rtl_rar_set(tp, mac_addr);
+ break;
+ default:
+ break;
}
for (i = 0; i < ETH_ALEN; i++)
dev->dev_addr[i] = RTL_R8(tp, MAC0 + i);
@@ -8326,7 +7777,7 @@
dev->ethtool_ops = &rtl8169_ethtool_ops;
dev->watchdog_timeo = RTL8169_TX_TIMEOUT;
- netif_napi_add(dev, &tp->napi, rtl8169_poll, R8169_NAPI_WEIGHT);
+ netif_napi_add(dev, &tp->napi, rtl8169_poll, NAPI_POLL_WEIGHT);
/* don't enable SG, IP_CSUM and TSO by default - it might not work
* properly for all devices */
@@ -8349,13 +7800,17 @@
/* Disallow toggling */
dev->hw_features &= ~NETIF_F_HW_VLAN_CTAG_RX;
- if (tp->txd_version == RTL_TD_0)
+ switch (rtl_chip_infos[chipset].txd_version) {
+ case RTL_TD_0:
tp->tso_csum = rtl8169_tso_csum_v1;
- else if (tp->txd_version == RTL_TD_1) {
+ break;
+ case RTL_TD_1:
tp->tso_csum = rtl8169_tso_csum_v2;
dev->hw_features |= NETIF_F_IPV6_CSUM | NETIF_F_TSO6;
- } else
+ break;
+ default:
WARN_ON_ONCE(1);
+ }
dev->hw_features |= NETIF_F_RXALL;
dev->hw_features |= NETIF_F_RXFCS;
@@ -8368,9 +7823,6 @@
tp->event_slow = cfg->event_slow;
tp->coalesce_info = cfg->coalesce_info;
- tp->opts1_mask = (tp->mac_version != RTL_GIGA_MAC_VER_01) ?
- ~(RxBOVF | RxFOVF) : ~0;
-
timer_setup(&tp->timer, rtl8169_phy_timer, 0);
tp->rtl_fw = RTL_FIRMWARE_UNKNOWN;
@@ -8387,15 +7839,15 @@
if (rc < 0)
return rc;
- netif_info(tp, probe, dev, "%s at 0x%p, %pM, XID %08x IRQ %d\n",
- rtl_chip_infos[chipset].name, tp->mmio_addr, dev->dev_addr,
- (u32)(RTL_R32(tp, TxConfig) & 0x9cf0f8ff),
+ netif_info(tp, probe, dev, "%s, %pM, XID %08x, IRQ %d\n",
+ rtl_chip_infos[chipset].name, dev->dev_addr,
+ (u32)(RTL_R32(tp, TxConfig) & 0xfcf0f8ff),
pci_irq_vector(pdev, 0));
if (rtl_chip_infos[chipset].jumbo_max != JUMBO_1K) {
netif_info(tp, probe, dev, "jumbo features [frames: %d bytes, "
"tx checksumming: %s]\n",
rtl_chip_infos[chipset].jumbo_max,
- rtl_chip_infos[chipset].jumbo_tx_csum ? "ok" : "ko");
+ tp->mac_version <= RTL_GIGA_MAC_VER_06 ? "ok" : "ko");
}
if (r8168_check_dash(tp))
diff --git a/drivers/net/ethernet/renesas/sh_eth.c b/drivers/net/ethernet/renesas/sh_eth.c
index b6b90a6..d9cadfb 100644
--- a/drivers/net/ethernet/renesas/sh_eth.c
+++ b/drivers/net/ethernet/renesas/sh_eth.c
@@ -442,12 +442,22 @@
static void sh_eth_tsu_write(struct sh_eth_private *mdp, u32 data,
int enum_index)
{
- iowrite32(data, mdp->tsu_addr + mdp->reg_offset[enum_index]);
+ u16 offset = mdp->reg_offset[enum_index];
+
+ if (WARN_ON(offset == SH_ETH_OFFSET_INVALID))
+ return;
+
+ iowrite32(data, mdp->tsu_addr + offset);
}
static u32 sh_eth_tsu_read(struct sh_eth_private *mdp, int enum_index)
{
- return ioread32(mdp->tsu_addr + mdp->reg_offset[enum_index]);
+ u16 offset = mdp->reg_offset[enum_index];
+
+ if (WARN_ON(offset == SH_ETH_OFFSET_INVALID))
+ return ~0U;
+
+ return ioread32(mdp->tsu_addr + offset);
}
static void sh_eth_select_mii(struct net_device *ndev)
@@ -456,6 +466,9 @@
u32 value;
switch (mdp->phy_interface) {
+ case PHY_INTERFACE_MODE_RGMII ... PHY_INTERFACE_MODE_RGMII_TXID:
+ value = 0x3;
+ break;
case PHY_INTERFACE_MODE_GMII:
value = 0x2;
break;
@@ -693,7 +706,7 @@
EESIPR_RTLFIP | EESIPR_RTSFIP |
EESIPR_PREIP | EESIPR_CERFIP,
- .tx_check = EESR_FTC | EESR_CND | EESR_DLC | EESR_CD | EESR_RTO,
+ .tx_check = EESR_FTC | EESR_CND | EESR_DLC | EESR_CD | EESR_TRO,
.eesr_err_check = EESR_TWB | EESR_TABT | EESR_RABT | EESR_RFE |
EESR_RDE | EESR_RFRMER | EESR_TFE | EESR_TDE,
.fdr_value = 0x00000f0f,
@@ -725,7 +738,7 @@
EESIPR_RTLFIP | EESIPR_RTSFIP |
EESIPR_PREIP | EESIPR_CERFIP,
- .tx_check = EESR_FTC | EESR_CND | EESR_DLC | EESR_CD | EESR_RTO,
+ .tx_check = EESR_FTC | EESR_CND | EESR_DLC | EESR_CD | EESR_TRO,
.eesr_err_check = EESR_TWB | EESR_TABT | EESR_RABT | EESR_RFE |
EESR_RDE | EESR_RFRMER | EESR_TFE | EESR_TDE,
.fdr_value = 0x00000f0f,
@@ -740,6 +753,49 @@
.rmiimode = 1,
.magic = 1,
};
+
+/* R8A77980 */
+static struct sh_eth_cpu_data r8a77980_data = {
+ .soft_reset = sh_eth_soft_reset_gether,
+
+ .set_duplex = sh_eth_set_duplex,
+ .set_rate = sh_eth_set_rate_gether,
+
+ .register_type = SH_ETH_REG_GIGABIT,
+
+ .edtrr_trns = EDTRR_TRNS_GETHER,
+ .ecsr_value = ECSR_PSRTO | ECSR_LCHNG | ECSR_ICD | ECSR_MPD,
+ .ecsipr_value = ECSIPR_PSRTOIP | ECSIPR_LCHNGIP | ECSIPR_ICDIP |
+ ECSIPR_MPDIP,
+ .eesipr_value = EESIPR_RFCOFIP | EESIPR_ECIIP |
+ EESIPR_FTCIP | EESIPR_TDEIP | EESIPR_TFUFIP |
+ EESIPR_FRIP | EESIPR_RDEIP | EESIPR_RFOFIP |
+ EESIPR_RMAFIP | EESIPR_RRFIP |
+ EESIPR_RTLFIP | EESIPR_RTSFIP |
+ EESIPR_PREIP | EESIPR_CERFIP,
+
+ .tx_check = EESR_FTC | EESR_CD | EESR_TRO,
+ .eesr_err_check = EESR_TWB1 | EESR_TWB | EESR_TABT | EESR_RABT |
+ EESR_RFE | EESR_RDE | EESR_RFRMER |
+ EESR_TFE | EESR_TDE | EESR_ECI,
+ .fdr_value = 0x0000070f,
+
+ .apr = 1,
+ .mpr = 1,
+ .tpauser = 1,
+ .bculr = 1,
+ .hw_swap = 1,
+ .nbst = 1,
+ .rpadir = 1,
+ .rpadir_value = 2 << 16,
+ .no_trimd = 1,
+ .no_ade = 1,
+ .xdfar_rw = 1,
+ .hw_checksum = 1,
+ .select_mii = 1,
+ .magic = 1,
+ .cexcr = 1,
+};
#endif /* CONFIG_OF */
static void sh_eth_set_rate_sh7724(struct net_device *ndev)
@@ -775,7 +831,7 @@
EESIPR_RTLFIP | EESIPR_RTSFIP |
EESIPR_PREIP | EESIPR_CERFIP,
- .tx_check = EESR_FTC | EESR_CND | EESR_DLC | EESR_CD | EESR_RTO,
+ .tx_check = EESR_FTC | EESR_CND | EESR_DLC | EESR_CD | EESR_TRO,
.eesr_err_check = EESR_TWB | EESR_TABT | EESR_RABT | EESR_RFE |
EESR_RDE | EESR_RFRMER | EESR_TFE | EESR_TDE,
@@ -820,7 +876,7 @@
EESIPR_RRFIP | EESIPR_RTLFIP | EESIPR_RTSFIP |
EESIPR_PREIP | EESIPR_CERFIP,
- .tx_check = EESR_FTC | EESR_CND | EESR_DLC | EESR_CD | EESR_RTO,
+ .tx_check = EESR_FTC | EESR_CND | EESR_DLC | EESR_CD | EESR_TRO,
.eesr_err_check = EESR_TWB | EESR_TABT | EESR_RABT | EESR_RFE |
EESR_RDE | EESR_RFRMER | EESR_TFE | EESR_TDE,
@@ -1421,8 +1477,13 @@
sh_eth_write(ndev, mdp->cd->trscer_err_mask, TRSCER);
+ /* DMA transfer burst mode */
+ if (mdp->cd->nbst)
+ sh_eth_modify(ndev, EDMR, EDMR_NBST, EDMR_NBST);
+
+ /* Burst cycle count upper-limit */
if (mdp->cd->bculr)
- sh_eth_write(ndev, 0x800, BCULR); /* Burst sycle set */
+ sh_eth_write(ndev, 0x800, BCULR);
sh_eth_write(ndev, mdp->cd->fcftr_value, FCFTR);
@@ -2610,12 +2671,6 @@
}
/* For TSU_POSTn. Please refer to the manual about this (strange) bitfields */
-static void *sh_eth_tsu_get_post_reg_offset(struct sh_eth_private *mdp,
- int entry)
-{
- return sh_eth_tsu_get_offset(mdp, TSU_POST1) + (entry / 8 * 4);
-}
-
static u32 sh_eth_tsu_get_post_mask(int entry)
{
return 0x0f << (28 - ((entry % 8) * 4));
@@ -2630,27 +2685,25 @@
int entry)
{
struct sh_eth_private *mdp = netdev_priv(ndev);
+ int reg = TSU_POST1 + entry / 8;
u32 tmp;
- void *reg_offset;
- reg_offset = sh_eth_tsu_get_post_reg_offset(mdp, entry);
- tmp = ioread32(reg_offset);
- iowrite32(tmp | sh_eth_tsu_get_post_bit(mdp, entry), reg_offset);
+ tmp = sh_eth_tsu_read(mdp, reg);
+ sh_eth_tsu_write(mdp, tmp | sh_eth_tsu_get_post_bit(mdp, entry), reg);
}
static bool sh_eth_tsu_disable_cam_entry_post(struct net_device *ndev,
int entry)
{
struct sh_eth_private *mdp = netdev_priv(ndev);
+ int reg = TSU_POST1 + entry / 8;
u32 post_mask, ref_mask, tmp;
- void *reg_offset;
- reg_offset = sh_eth_tsu_get_post_reg_offset(mdp, entry);
post_mask = sh_eth_tsu_get_post_mask(entry);
ref_mask = sh_eth_tsu_get_post_bit(mdp, entry) & ~post_mask;
- tmp = ioread32(reg_offset);
- iowrite32(tmp & ~post_mask, reg_offset);
+ tmp = sh_eth_tsu_read(mdp, reg);
+ sh_eth_tsu_write(mdp, tmp & ~post_mask, reg);
/* If other port enables, the function returns "true" */
return tmp & ref_mask;
@@ -3023,15 +3076,10 @@
pdev->name, pdev->id);
/* register MDIO bus */
- if (dev->of_node) {
- ret = of_mdiobus_register(mdp->mii_bus, dev->of_node);
- } else {
- if (pd->phy_irq > 0)
- mdp->mii_bus->irq[pd->phy] = pd->phy_irq;
+ if (pd->phy_irq > 0)
+ mdp->mii_bus->irq[pd->phy] = pd->phy_irq;
- ret = mdiobus_register(mdp->mii_bus);
- }
-
+ ret = of_mdiobus_register(mdp->mii_bus, dev->of_node);
if (ret)
goto out_free_bus;
@@ -3130,6 +3178,7 @@
{ .compatible = "renesas,ether-r8a7791", .data = &rcar_gen2_data },
{ .compatible = "renesas,ether-r8a7793", .data = &rcar_gen2_data },
{ .compatible = "renesas,ether-r8a7794", .data = &rcar_gen2_data },
+ { .compatible = "renesas,gether-r8a77980", .data = &r8a77980_data },
{ .compatible = "renesas,ether-r7s72100", .data = &r7s72100_data },
{ .compatible = "renesas,rcar-gen1-ether", .data = &rcar_gen1_data },
{ .compatible = "renesas,rcar-gen2-ether", .data = &rcar_gen2_data },
diff --git a/drivers/net/ethernet/renesas/sh_eth.h b/drivers/net/ethernet/renesas/sh_eth.h
index 1bf930d..5dee19b 100644
--- a/drivers/net/ethernet/renesas/sh_eth.h
+++ b/drivers/net/ethernet/renesas/sh_eth.h
@@ -184,6 +184,7 @@
/* EDMR */
enum DMAC_M_BIT {
+ EDMR_NBST = 0x80,
EDMR_EL = 0x40, /* Litte endian */
EDMR_DL1 = 0x20, EDMR_DL0 = 0x10,
EDMR_SRST_GETHER = 0x03,
@@ -242,7 +243,7 @@
EESR_CND = 0x00000800,
EESR_DLC = 0x00000400,
EESR_CD = 0x00000200,
- EESR_RTO = 0x00000100,
+ EESR_TRO = 0x00000100,
EESR_RMAF = 0x00000080,
EESR_CEEF = 0x00000040,
EESR_CELF = 0x00000020,
@@ -262,7 +263,7 @@
EESR_CERF) /* Recv frame CRC error */
#define DEFAULT_TX_CHECK (EESR_FTC | EESR_CND | EESR_DLC | EESR_CD | \
- EESR_RTO)
+ EESR_TRO)
#define DEFAULT_EESR_ERR_CHECK (EESR_TWB | EESR_TABT | EESR_RABT | EESR_RFE | \
EESR_RDE | EESR_RFRMER | EESR_ADE | \
EESR_TFE | EESR_TDE)
@@ -498,20 +499,21 @@
/* hardware features */
unsigned long irq_flags; /* IRQ configuration flags */
- unsigned no_psr:1; /* EtherC DO NOT have PSR */
- unsigned apr:1; /* EtherC have APR */
- unsigned mpr:1; /* EtherC have MPR */
- unsigned tpauser:1; /* EtherC have TPAUSER */
- unsigned bculr:1; /* EtherC have BCULR */
- unsigned tsu:1; /* EtherC have TSU */
- unsigned hw_swap:1; /* E-DMAC have DE bit in EDMR */
- unsigned rpadir:1; /* E-DMAC have RPADIR */
- unsigned no_trimd:1; /* E-DMAC DO NOT have TRIMD */
- unsigned no_ade:1; /* E-DMAC DO NOT have ADE bit in EESR */
+ unsigned no_psr:1; /* EtherC DOES NOT have PSR */
+ unsigned apr:1; /* EtherC has APR */
+ unsigned mpr:1; /* EtherC has MPR */
+ unsigned tpauser:1; /* EtherC has TPAUSER */
+ unsigned bculr:1; /* EtherC has BCULR */
+ unsigned tsu:1; /* EtherC has TSU */
+ unsigned hw_swap:1; /* E-DMAC has DE bit in EDMR */
+ unsigned nbst:1; /* E-DMAC has NBST bit in EDMR */
+ unsigned rpadir:1; /* E-DMAC has RPADIR */
+ unsigned no_trimd:1; /* E-DMAC DOES NOT have TRIMD */
+ unsigned no_ade:1; /* E-DMAC DOES NOT have ADE bit in EESR */
unsigned no_xdfar:1; /* E-DMAC DOES NOT have RDFAR/TDFAR */
unsigned xdfar_rw:1; /* E-DMAC has writeable RDFAR/TDFAR */
unsigned hw_checksum:1; /* E-DMAC has CSMR */
- unsigned select_mii:1; /* EtherC have RMII_MII (MII select register) */
+ unsigned select_mii:1; /* EtherC has RMII_MII (MII select register) */
unsigned rmiimode:1; /* EtherC has RMIIMODE register */
unsigned rtrate:1; /* EtherC has RTRATE register */
unsigned magic:1; /* EtherC has ECMR.MPDE and ECSR.MPD */
diff --git a/drivers/net/ethernet/rocker/rocker_main.c b/drivers/net/ethernet/rocker/rocker_main.c
index 056cb60..e73e4fe 100644
--- a/drivers/net/ethernet/rocker/rocker_main.c
+++ b/drivers/net/ethernet/rocker/rocker_main.c
@@ -2738,6 +2738,8 @@
switch (switchdev_work->event) {
case SWITCHDEV_FDB_ADD_TO_DEVICE:
fdb_info = &switchdev_work->fdb_info;
+ if (!fdb_info->added_by_user)
+ break;
err = rocker_world_port_fdb_add(rocker_port, fdb_info);
if (err) {
netdev_dbg(rocker_port->dev, "fdb add failed err=%d\n", err);
@@ -2747,6 +2749,8 @@
break;
case SWITCHDEV_FDB_DEL_TO_DEVICE:
fdb_info = &switchdev_work->fdb_info;
+ if (!fdb_info->added_by_user)
+ break;
err = rocker_world_port_fdb_del(rocker_port, fdb_info);
if (err)
netdev_dbg(rocker_port->dev, "fdb add failed err=%d\n", err);
diff --git a/drivers/net/ethernet/sfc/efx.c b/drivers/net/ethernet/sfc/efx.c
index a4ebd87..88c1eee 100644
--- a/drivers/net/ethernet/sfc/efx.c
+++ b/drivers/net/ethernet/sfc/efx.c
@@ -1551,6 +1551,38 @@
return 0;
}
+#if defined(CONFIG_SMP)
+static void efx_set_interrupt_affinity(struct efx_nic *efx)
+{
+ struct efx_channel *channel;
+ unsigned int cpu;
+
+ efx_for_each_channel(channel, efx) {
+ cpu = cpumask_local_spread(channel->channel,
+ pcibus_to_node(efx->pci_dev->bus));
+ irq_set_affinity_hint(channel->irq, cpumask_of(cpu));
+ }
+}
+
+static void efx_clear_interrupt_affinity(struct efx_nic *efx)
+{
+ struct efx_channel *channel;
+
+ efx_for_each_channel(channel, efx)
+ irq_set_affinity_hint(channel->irq, NULL);
+}
+#else
+static void
+efx_set_interrupt_affinity(struct efx_nic *efx __attribute__ ((unused)))
+{
+}
+
+static void
+efx_clear_interrupt_affinity(struct efx_nic *efx __attribute__ ((unused)))
+{
+}
+#endif /* CONFIG_SMP */
+
static int efx_soft_enable_interrupts(struct efx_nic *efx)
{
struct efx_channel *channel, *end_channel;
@@ -3308,6 +3340,7 @@
cancel_work_sync(&efx->reset_work);
efx_disable_interrupts(efx);
+ efx_clear_interrupt_affinity(efx);
efx_nic_fini_interrupt(efx);
efx_fini_port(efx);
efx->type->fini(efx);
@@ -3457,6 +3490,8 @@
rc = efx_nic_init_interrupt(efx);
if (rc)
goto fail5;
+
+ efx_set_interrupt_affinity(efx);
rc = efx_enable_interrupts(efx);
if (rc)
goto fail6;
@@ -3464,6 +3499,7 @@
return 0;
fail6:
+ efx_clear_interrupt_affinity(efx);
efx_nic_fini_interrupt(efx);
fail5:
efx_fini_port(efx);
diff --git a/drivers/net/ethernet/sfc/tx.c b/drivers/net/ethernet/sfc/tx.c
index cece961..c3ad564 100644
--- a/drivers/net/ethernet/sfc/tx.c
+++ b/drivers/net/ethernet/sfc/tx.c
@@ -435,17 +435,18 @@
} while (1);
}
-/* Remove buffers put into a tx_queue. None of the buffers must have
- * an skb attached.
+/* Remove buffers put into a tx_queue for the current packet.
+ * None of the buffers must have an skb attached.
*/
-static void efx_enqueue_unwind(struct efx_tx_queue *tx_queue)
+static void efx_enqueue_unwind(struct efx_tx_queue *tx_queue,
+ unsigned int insert_count)
{
struct efx_tx_buffer *buffer;
unsigned int bytes_compl = 0;
unsigned int pkts_compl = 0;
/* Work backwards until we hit the original insert pointer value */
- while (tx_queue->insert_count != tx_queue->write_count) {
+ while (tx_queue->insert_count != insert_count) {
--tx_queue->insert_count;
buffer = __efx_tx_queue_get_insert_buffer(tx_queue);
efx_dequeue_buffer(tx_queue, buffer, &pkts_compl, &bytes_compl);
@@ -504,6 +505,8 @@
*/
netdev_tx_t efx_enqueue_skb(struct efx_tx_queue *tx_queue, struct sk_buff *skb)
{
+ unsigned int old_insert_count = tx_queue->insert_count;
+ bool xmit_more = skb->xmit_more;
bool data_mapped = false;
unsigned int segments;
unsigned int skb_len;
@@ -553,8 +556,10 @@
/* Update BQL */
netdev_tx_sent_queue(tx_queue->core_txq, skb_len);
+ efx_tx_maybe_stop_queue(tx_queue);
+
/* Pass off to hardware */
- if (!skb->xmit_more || netif_xmit_stopped(tx_queue->core_txq)) {
+ if (!xmit_more || netif_xmit_stopped(tx_queue->core_txq)) {
struct efx_tx_queue *txq2 = efx_tx_queue_partner(tx_queue);
/* There could be packets left on the partner queue if those
@@ -577,14 +582,26 @@
tx_queue->tx_packets++;
}
- efx_tx_maybe_stop_queue(tx_queue);
-
return NETDEV_TX_OK;
err:
- efx_enqueue_unwind(tx_queue);
+ efx_enqueue_unwind(tx_queue, old_insert_count);
dev_kfree_skb_any(skb);
+
+ /* If we're not expecting another transmit and we had something to push
+ * on this queue or a partner queue then we need to push here to get the
+ * previous packets out.
+ */
+ if (!xmit_more) {
+ struct efx_tx_queue *txq2 = efx_tx_queue_partner(tx_queue);
+
+ if (txq2->xmit_more_available)
+ efx_nic_push_buffers(txq2);
+
+ efx_nic_push_buffers(tx_queue);
+ }
+
return NETDEV_TX_OK;
}
diff --git a/drivers/net/ethernet/socionext/Kconfig b/drivers/net/ethernet/socionext/Kconfig
index 6bcfe27..b80048c 100644
--- a/drivers/net/ethernet/socionext/Kconfig
+++ b/drivers/net/ethernet/socionext/Kconfig
@@ -14,6 +14,8 @@
config SNI_AVE
tristate "Socionext AVE ethernet support"
depends on (ARCH_UNIPHIER || COMPILE_TEST) && OF
+ depends on HAS_IOMEM
+ select MFD_SYSCON
select PHYLIB
---help---
Driver for gigabit ethernet MACs, called AVE, in the
diff --git a/drivers/net/ethernet/socionext/netsec.c b/drivers/net/ethernet/socionext/netsec.c
index f4c0b02..aa50331 100644
--- a/drivers/net/ethernet/socionext/netsec.c
+++ b/drivers/net/ethernet/socionext/netsec.c
@@ -1057,7 +1057,8 @@
return 0;
}
-static int netsec_reset_hardware(struct netsec_priv *priv)
+static int netsec_reset_hardware(struct netsec_priv *priv,
+ bool load_ucode)
{
u32 value;
int err;
@@ -1102,11 +1103,14 @@
netsec_write(priv, NETSEC_REG_NRM_RX_CONFIG,
1 << NETSEC_REG_DESC_ENDIAN);
- err = netsec_netdev_load_microcode(priv);
- if (err) {
- netif_err(priv, probe, priv->ndev,
- "%s: failed to load microcode (%d)\n", __func__, err);
- return err;
+ if (load_ucode) {
+ err = netsec_netdev_load_microcode(priv);
+ if (err) {
+ netif_err(priv, probe, priv->ndev,
+ "%s: failed to load microcode (%d)\n",
+ __func__, err);
+ return err;
+ }
}
/* start DMA engines */
@@ -1313,8 +1317,8 @@
napi_enable(&priv->napi);
netif_start_queue(ndev);
- /* Enable RX intr. */
- netsec_write(priv, NETSEC_REG_INTEN_SET, NETSEC_IRQ_RX);
+ /* Enable TX+RX intr. */
+ netsec_write(priv, NETSEC_REG_INTEN_SET, NETSEC_IRQ_RX | NETSEC_IRQ_TX);
return 0;
err3:
@@ -1328,6 +1332,7 @@
static int netsec_netdev_stop(struct net_device *ndev)
{
+ int ret;
struct netsec_priv *priv = netdev_priv(ndev);
netif_stop_queue(priv->ndev);
@@ -1343,12 +1348,14 @@
netsec_uninit_pkt_dring(priv, NETSEC_RING_TX);
netsec_uninit_pkt_dring(priv, NETSEC_RING_RX);
+ ret = netsec_reset_hardware(priv, false);
+
phy_stop(ndev->phydev);
phy_disconnect(ndev->phydev);
pm_runtime_put_sync(priv->dev);
- return 0;
+ return ret;
}
static int netsec_netdev_init(struct net_device *ndev)
@@ -1364,7 +1371,7 @@
if (ret)
goto err1;
- ret = netsec_reset_hardware(priv);
+ ret = netsec_reset_hardware(priv, true);
if (ret)
goto err2;
diff --git a/drivers/net/ethernet/socionext/sni_ave.c b/drivers/net/ethernet/socionext/sni_ave.c
index 0b3b7a4..f7eccee 100644
--- a/drivers/net/ethernet/socionext/sni_ave.c
+++ b/drivers/net/ethernet/socionext/sni_ave.c
@@ -11,6 +11,7 @@
#include <linux/interrupt.h>
#include <linux/io.h>
#include <linux/iopoll.h>
+#include <linux/mfd/syscon.h>
#include <linux/mii.h>
#include <linux/module.h>
#include <linux/netdevice.h>
@@ -18,6 +19,7 @@
#include <linux/of_mdio.h>
#include <linux/of_platform.h>
#include <linux/phy.h>
+#include <linux/regmap.h>
#include <linux/reset.h>
#include <linux/types.h>
#include <linux/u64_stats_sync.h>
@@ -197,8 +199,16 @@
#define AVE_INTM_COUNT 20
#define AVE_FORCE_TXINTCNT 1
+/* SG */
+#define SG_ETPINMODE 0x540
+#define SG_ETPINMODE_EXTPHY BIT(1) /* for LD11 */
+#define SG_ETPINMODE_RMII(ins) BIT(ins)
+
#define IS_DESC_64BIT(p) ((p)->data->is_desc_64bit)
+#define AVE_MAX_CLKS 4
+#define AVE_MAX_RSTS 2
+
enum desc_id {
AVE_DESCID_RX,
AVE_DESCID_TX,
@@ -225,10 +235,6 @@
struct ave_desc *desc; /* skb info related descriptor */
};
-struct ave_soc_data {
- bool is_desc_64bit;
-};
-
struct ave_stats {
struct u64_stats_sync syncp;
u64 packets;
@@ -245,11 +251,16 @@
int phy_id;
unsigned int desc_size;
u32 msg_enable;
- struct clk *clk;
- struct reset_control *rst;
+ int nclks;
+ struct clk *clk[AVE_MAX_CLKS];
+ int nrsts;
+ struct reset_control *rst[AVE_MAX_RSTS];
phy_interface_t phy_mode;
struct phy_device *phydev;
struct mii_bus *mdio;
+ struct regmap *regmap;
+ unsigned int pinmode_mask;
+ unsigned int pinmode_val;
/* stats */
struct ave_stats stats_rx;
@@ -272,6 +283,14 @@
const struct ave_soc_data *data;
};
+struct ave_soc_data {
+ bool is_desc_64bit;
+ const char *clock_names[AVE_MAX_CLKS];
+ const char *reset_names[AVE_MAX_RSTS];
+ int (*get_pinmode)(struct ave_private *priv,
+ phy_interface_t phy_mode, u32 arg);
+};
+
static u32 ave_desc_read(struct net_device *ndev, enum desc_id id, int entry,
int offset)
{
@@ -1153,19 +1172,29 @@
struct device_node *np = dev->of_node;
struct device_node *mdio_np;
struct phy_device *phydev;
- int ret;
+ int nc, nr, ret;
/* enable clk because of hw access until ndo_open */
- ret = clk_prepare_enable(priv->clk);
- if (ret) {
- dev_err(dev, "can't enable clock\n");
+ for (nc = 0; nc < priv->nclks; nc++) {
+ ret = clk_prepare_enable(priv->clk[nc]);
+ if (ret) {
+ dev_err(dev, "can't enable clock\n");
+ goto out_clk_disable;
+ }
+ }
+
+ for (nr = 0; nr < priv->nrsts; nr++) {
+ ret = reset_control_deassert(priv->rst[nr]);
+ if (ret) {
+ dev_err(dev, "can't deassert reset\n");
+ goto out_reset_assert;
+ }
+ }
+
+ ret = regmap_update_bits(priv->regmap, SG_ETPINMODE,
+ priv->pinmode_mask, priv->pinmode_val);
+ if (ret)
return ret;
- }
- ret = reset_control_deassert(priv->rst);
- if (ret) {
- dev_err(dev, "can't deassert reset\n");
- goto out_clk_disable;
- }
ave_global_reset(ndev);
@@ -1207,9 +1236,11 @@
out_mdio_unregister:
mdiobus_unregister(priv->mdio);
out_reset_assert:
- reset_control_assert(priv->rst);
+ while (--nr >= 0)
+ reset_control_assert(priv->rst[nr]);
out_clk_disable:
- clk_disable_unprepare(priv->clk);
+ while (--nc >= 0)
+ clk_disable_unprepare(priv->clk[nc]);
return ret;
}
@@ -1217,13 +1248,16 @@
static void ave_uninit(struct net_device *ndev)
{
struct ave_private *priv = netdev_priv(ndev);
+ int i;
phy_disconnect(priv->phydev);
mdiobus_unregister(priv->mdio);
/* disable clk because of hw access after ndo_stop */
- reset_control_assert(priv->rst);
- clk_disable_unprepare(priv->clk);
+ for (i = 0; i < priv->nrsts; i++)
+ reset_control_assert(priv->rst[i]);
+ for (i = 0; i < priv->nclks; i++)
+ clk_disable_unprepare(priv->clk[i]);
}
static int ave_open(struct net_device *ndev)
@@ -1520,6 +1554,7 @@
const struct ave_soc_data *data;
struct device *dev = &pdev->dev;
char buf[ETHTOOL_FWVERS_LEN];
+ struct of_phandle_args args;
phy_interface_t phy_mode;
struct ave_private *priv;
struct net_device *ndev;
@@ -1527,8 +1562,9 @@
struct resource *res;
const void *mac_addr;
void __iomem *base;
+ const char *name;
+ int i, irq, ret;
u64 dma_mask;
- int irq, ret;
u32 ave_id;
data = of_device_get_match_data(dev);
@@ -1541,12 +1577,6 @@
dev_err(dev, "phy-mode not found\n");
return -EINVAL;
}
- if ((!phy_interface_mode_is_rgmii(phy_mode)) &&
- phy_mode != PHY_INTERFACE_MODE_RMII &&
- phy_mode != PHY_INTERFACE_MODE_MII) {
- dev_err(dev, "phy-mode is invalid\n");
- return -EINVAL;
- }
irq = platform_get_irq(pdev, 0);
if (irq < 0) {
@@ -1614,15 +1644,47 @@
u64_stats_init(&priv->stats_tx.syncp);
u64_stats_init(&priv->stats_rx.syncp);
- priv->clk = devm_clk_get(dev, NULL);
- if (IS_ERR(priv->clk)) {
- ret = PTR_ERR(priv->clk);
- goto out_free_netdev;
+ for (i = 0; i < AVE_MAX_CLKS; i++) {
+ name = priv->data->clock_names[i];
+ if (!name)
+ break;
+ priv->clk[i] = devm_clk_get(dev, name);
+ if (IS_ERR(priv->clk[i])) {
+ ret = PTR_ERR(priv->clk[i]);
+ goto out_free_netdev;
+ }
+ priv->nclks++;
}
- priv->rst = devm_reset_control_get_optional_shared(dev, NULL);
- if (IS_ERR(priv->rst)) {
- ret = PTR_ERR(priv->rst);
+ for (i = 0; i < AVE_MAX_RSTS; i++) {
+ name = priv->data->reset_names[i];
+ if (!name)
+ break;
+ priv->rst[i] = devm_reset_control_get_shared(dev, name);
+ if (IS_ERR(priv->rst[i])) {
+ ret = PTR_ERR(priv->rst[i]);
+ goto out_free_netdev;
+ }
+ priv->nrsts++;
+ }
+
+ ret = of_parse_phandle_with_fixed_args(np,
+ "socionext,syscon-phy-mode",
+ 1, 0, &args);
+ if (ret) {
+ netdev_err(ndev, "can't get syscon-phy-mode property\n");
+ goto out_free_netdev;
+ }
+ priv->regmap = syscon_node_to_regmap(args.np);
+ of_node_put(args.np);
+ if (IS_ERR(priv->regmap)) {
+ netdev_err(ndev, "can't map syscon-phy-mode\n");
+ ret = PTR_ERR(priv->regmap);
+ goto out_free_netdev;
+ }
+ ret = priv->data->get_pinmode(priv, phy_mode, args.args[0]);
+ if (ret) {
+ netdev_err(ndev, "invalid phy-mode setting\n");
goto out_free_netdev;
}
@@ -1685,24 +1747,148 @@
return 0;
}
+static int ave_pro4_get_pinmode(struct ave_private *priv,
+ phy_interface_t phy_mode, u32 arg)
+{
+ if (arg > 0)
+ return -EINVAL;
+
+ priv->pinmode_mask = SG_ETPINMODE_RMII(0);
+
+ switch (phy_mode) {
+ case PHY_INTERFACE_MODE_RMII:
+ priv->pinmode_val = SG_ETPINMODE_RMII(0);
+ break;
+ case PHY_INTERFACE_MODE_MII:
+ case PHY_INTERFACE_MODE_RGMII:
+ priv->pinmode_val = 0;
+ break;
+ default:
+ return -EINVAL;
+ }
+
+ return 0;
+}
+
+static int ave_ld11_get_pinmode(struct ave_private *priv,
+ phy_interface_t phy_mode, u32 arg)
+{
+ if (arg > 0)
+ return -EINVAL;
+
+ priv->pinmode_mask = SG_ETPINMODE_EXTPHY | SG_ETPINMODE_RMII(0);
+
+ switch (phy_mode) {
+ case PHY_INTERFACE_MODE_INTERNAL:
+ priv->pinmode_val = 0;
+ break;
+ case PHY_INTERFACE_MODE_RMII:
+ priv->pinmode_val = SG_ETPINMODE_EXTPHY | SG_ETPINMODE_RMII(0);
+ break;
+ default:
+ return -EINVAL;
+ }
+
+ return 0;
+}
+
+static int ave_ld20_get_pinmode(struct ave_private *priv,
+ phy_interface_t phy_mode, u32 arg)
+{
+ if (arg > 0)
+ return -EINVAL;
+
+ priv->pinmode_mask = SG_ETPINMODE_RMII(0);
+
+ switch (phy_mode) {
+ case PHY_INTERFACE_MODE_RMII:
+ priv->pinmode_val = SG_ETPINMODE_RMII(0);
+ break;
+ case PHY_INTERFACE_MODE_RGMII:
+ priv->pinmode_val = 0;
+ break;
+ default:
+ return -EINVAL;
+ }
+
+ return 0;
+}
+
+static int ave_pxs3_get_pinmode(struct ave_private *priv,
+ phy_interface_t phy_mode, u32 arg)
+{
+ if (arg > 1)
+ return -EINVAL;
+
+ priv->pinmode_mask = SG_ETPINMODE_RMII(arg);
+
+ switch (phy_mode) {
+ case PHY_INTERFACE_MODE_RMII:
+ priv->pinmode_val = SG_ETPINMODE_RMII(arg);
+ break;
+ case PHY_INTERFACE_MODE_RGMII:
+ priv->pinmode_val = 0;
+ break;
+ default:
+ return -EINVAL;
+ }
+
+ return 0;
+}
+
static const struct ave_soc_data ave_pro4_data = {
.is_desc_64bit = false,
+ .clock_names = {
+ "gio", "ether", "ether-gb", "ether-phy",
+ },
+ .reset_names = {
+ "gio", "ether",
+ },
+ .get_pinmode = ave_pro4_get_pinmode,
};
static const struct ave_soc_data ave_pxs2_data = {
.is_desc_64bit = false,
+ .clock_names = {
+ "ether",
+ },
+ .reset_names = {
+ "ether",
+ },
+ .get_pinmode = ave_pro4_get_pinmode,
};
static const struct ave_soc_data ave_ld11_data = {
.is_desc_64bit = false,
+ .clock_names = {
+ "ether",
+ },
+ .reset_names = {
+ "ether",
+ },
+ .get_pinmode = ave_ld11_get_pinmode,
};
static const struct ave_soc_data ave_ld20_data = {
.is_desc_64bit = true,
+ .clock_names = {
+ "ether",
+ },
+ .reset_names = {
+ "ether",
+ },
+ .get_pinmode = ave_ld20_get_pinmode,
};
static const struct ave_soc_data ave_pxs3_data = {
.is_desc_64bit = false,
+ .clock_names = {
+ "ether",
+ },
+ .reset_names = {
+ "ether",
+ },
+ .get_pinmode = ave_pxs3_get_pinmode,
};
static const struct of_device_id of_ave_match[] = {
diff --git a/drivers/net/ethernet/stmicro/stmmac/Makefile b/drivers/net/ethernet/stmicro/stmmac/Makefile
index 972e4ef..68e9e26 100644
--- a/drivers/net/ethernet/stmicro/stmmac/Makefile
+++ b/drivers/net/ethernet/stmicro/stmmac/Makefile
@@ -4,7 +4,8 @@
chain_mode.o dwmac_lib.o dwmac1000_core.o dwmac1000_dma.o \
dwmac100_core.o dwmac100_dma.o enh_desc.o norm_desc.o \
mmc_core.o stmmac_hwtstamp.o stmmac_ptp.o dwmac4_descs.o \
- dwmac4_dma.o dwmac4_lib.o dwmac4_core.o dwmac5.o $(stmmac-y)
+ dwmac4_dma.o dwmac4_lib.o dwmac4_core.o dwmac5.o hwif.o \
+ stmmac_tc.o $(stmmac-y)
# Ordering matters. Generic driver must be last.
obj-$(CONFIG_STMMAC_PLATFORM) += stmmac-platform.o
diff --git a/drivers/net/ethernet/stmicro/stmmac/chain_mode.c b/drivers/net/ethernet/stmicro/stmmac/chain_mode.c
index e93c40b..b9c9003 100644
--- a/drivers/net/ethernet/stmicro/stmmac/chain_mode.c
+++ b/drivers/net/ethernet/stmicro/stmmac/chain_mode.c
@@ -24,7 +24,7 @@
#include "stmmac.h"
-static int stmmac_jumbo_frm(void *p, struct sk_buff *skb, int csum)
+static int jumbo_frm(void *p, struct sk_buff *skb, int csum)
{
struct stmmac_tx_queue *tx_q = (struct stmmac_tx_queue *)p;
unsigned int nopaged_len = skb_headlen(skb);
@@ -51,8 +51,8 @@
tx_q->tx_skbuff_dma[entry].buf = des2;
tx_q->tx_skbuff_dma[entry].len = bmax;
/* do not close the descriptor and do not set own bit */
- priv->hw->desc->prepare_tx_desc(desc, 1, bmax, csum, STMMAC_CHAIN_MODE,
- 0, false, skb->len);
+ stmmac_prepare_tx_desc(priv, desc, 1, bmax, csum, STMMAC_CHAIN_MODE,
+ 0, false, skb->len);
while (len != 0) {
tx_q->tx_skbuff[entry] = NULL;
@@ -68,9 +68,8 @@
return -1;
tx_q->tx_skbuff_dma[entry].buf = des2;
tx_q->tx_skbuff_dma[entry].len = bmax;
- priv->hw->desc->prepare_tx_desc(desc, 0, bmax, csum,
- STMMAC_CHAIN_MODE, 1,
- false, skb->len);
+ stmmac_prepare_tx_desc(priv, desc, 0, bmax, csum,
+ STMMAC_CHAIN_MODE, 1, false, skb->len);
len -= bmax;
i++;
} else {
@@ -83,9 +82,8 @@
tx_q->tx_skbuff_dma[entry].buf = des2;
tx_q->tx_skbuff_dma[entry].len = len;
/* last descriptor can be set now */
- priv->hw->desc->prepare_tx_desc(desc, 0, len, csum,
- STMMAC_CHAIN_MODE, 1,
- true, skb->len);
+ stmmac_prepare_tx_desc(priv, desc, 0, len, csum,
+ STMMAC_CHAIN_MODE, 1, true, skb->len);
len = 0;
}
}
@@ -95,7 +93,7 @@
return entry;
}
-static unsigned int stmmac_is_jumbo_frm(int len, int enh_desc)
+static unsigned int is_jumbo_frm(int len, int enh_desc)
{
unsigned int ret = 0;
@@ -107,7 +105,7 @@
return ret;
}
-static void stmmac_init_dma_chain(void *des, dma_addr_t phy_addr,
+static void init_dma_chain(void *des, dma_addr_t phy_addr,
unsigned int size, unsigned int extend_desc)
{
/*
@@ -137,7 +135,7 @@
}
}
-static void stmmac_refill_desc3(void *priv_ptr, struct dma_desc *p)
+static void refill_desc3(void *priv_ptr, struct dma_desc *p)
{
struct stmmac_rx_queue *rx_q = (struct stmmac_rx_queue *)priv_ptr;
struct stmmac_priv *priv = rx_q->priv_data;
@@ -153,7 +151,7 @@
sizeof(struct dma_desc)));
}
-static void stmmac_clean_desc3(void *priv_ptr, struct dma_desc *p)
+static void clean_desc3(void *priv_ptr, struct dma_desc *p)
{
struct stmmac_tx_queue *tx_q = (struct stmmac_tx_queue *)priv_ptr;
struct stmmac_priv *priv = tx_q->priv_data;
@@ -171,9 +169,9 @@
}
const struct stmmac_mode_ops chain_mode_ops = {
- .init = stmmac_init_dma_chain,
- .is_jumbo_frm = stmmac_is_jumbo_frm,
- .jumbo_frm = stmmac_jumbo_frm,
- .refill_desc3 = stmmac_refill_desc3,
- .clean_desc3 = stmmac_clean_desc3,
+ .init = init_dma_chain,
+ .is_jumbo_frm = is_jumbo_frm,
+ .jumbo_frm = jumbo_frm,
+ .refill_desc3 = refill_desc3,
+ .clean_desc3 = clean_desc3,
};
diff --git a/drivers/net/ethernet/stmicro/stmmac/common.h b/drivers/net/ethernet/stmicro/stmmac/common.h
index ad2388a..a679cb7 100644
--- a/drivers/net/ethernet/stmicro/stmmac/common.h
+++ b/drivers/net/ethernet/stmicro/stmmac/common.h
@@ -32,12 +32,14 @@
#endif
#include "descs.h"
+#include "hwif.h"
#include "mmc.h"
/* Synopsys Core versions */
#define DWMAC_CORE_3_40 0x34
#define DWMAC_CORE_3_50 0x35
#define DWMAC_CORE_4_00 0x40
+#define DWMAC_CORE_4_10 0x41
#define DWMAC_CORE_5_00 0x50
#define DWMAC_CORE_5_10 0x51
#define STMMAC_CHAN0 0 /* Always supported and default for all chips */
@@ -351,6 +353,10 @@
unsigned int rx_fifo_size;
/* Automotive Safety Package */
unsigned int asp;
+ /* RX Parser */
+ unsigned int frpsel;
+ unsigned int frpbs;
+ unsigned int frpes;
};
/* GMAC TX FIFO is 8K, Rx FIFO is 16K */
@@ -377,197 +383,11 @@
#define JUMBO_LEN 9000
-/* Descriptors helpers */
-struct stmmac_desc_ops {
- /* DMA RX descriptor ring initialization */
- void (*init_rx_desc) (struct dma_desc *p, int disable_rx_ic, int mode,
- int end);
- /* DMA TX descriptor ring initialization */
- void (*init_tx_desc) (struct dma_desc *p, int mode, int end);
-
- /* Invoked by the xmit function to prepare the tx descriptor */
- void (*prepare_tx_desc) (struct dma_desc *p, int is_fs, int len,
- bool csum_flag, int mode, bool tx_own,
- bool ls, unsigned int tot_pkt_len);
- void (*prepare_tso_tx_desc)(struct dma_desc *p, int is_fs, int len1,
- int len2, bool tx_own, bool ls,
- unsigned int tcphdrlen,
- unsigned int tcppayloadlen);
- /* Set/get the owner of the descriptor */
- void (*set_tx_owner) (struct dma_desc *p);
- int (*get_tx_owner) (struct dma_desc *p);
- /* Clean the tx descriptor as soon as the tx irq is received */
- void (*release_tx_desc) (struct dma_desc *p, int mode);
- /* Clear interrupt on tx frame completion. When this bit is
- * set an interrupt happens as soon as the frame is transmitted */
- void (*set_tx_ic)(struct dma_desc *p);
- /* Last tx segment reports the transmit status */
- int (*get_tx_ls) (struct dma_desc *p);
- /* Return the transmit status looking at the TDES1 */
- int (*tx_status) (void *data, struct stmmac_extra_stats *x,
- struct dma_desc *p, void __iomem *ioaddr);
- /* Get the buffer size from the descriptor */
- int (*get_tx_len) (struct dma_desc *p);
- /* Handle extra events on specific interrupts hw dependent */
- void (*set_rx_owner) (struct dma_desc *p);
- /* Get the receive frame size */
- int (*get_rx_frame_len) (struct dma_desc *p, int rx_coe_type);
- /* Return the reception status looking at the RDES1 */
- int (*rx_status) (void *data, struct stmmac_extra_stats *x,
- struct dma_desc *p);
- void (*rx_extended_status) (void *data, struct stmmac_extra_stats *x,
- struct dma_extended_desc *p);
- /* Set tx timestamp enable bit */
- void (*enable_tx_timestamp) (struct dma_desc *p);
- /* get tx timestamp status */
- int (*get_tx_timestamp_status) (struct dma_desc *p);
- /* get timestamp value */
- u64(*get_timestamp) (void *desc, u32 ats);
- /* get rx timestamp status */
- int (*get_rx_timestamp_status)(void *desc, void *next_desc, u32 ats);
- /* Display ring */
- void (*display_ring)(void *head, unsigned int size, bool rx);
- /* set MSS via context descriptor */
- void (*set_mss)(struct dma_desc *p, unsigned int mss);
-};
-
extern const struct stmmac_desc_ops enh_desc_ops;
extern const struct stmmac_desc_ops ndesc_ops;
-/* Specific DMA helpers */
-struct stmmac_dma_ops {
- /* DMA core initialization */
- int (*reset)(void __iomem *ioaddr);
- void (*init)(void __iomem *ioaddr, struct stmmac_dma_cfg *dma_cfg,
- u32 dma_tx, u32 dma_rx, int atds);
- void (*init_chan)(void __iomem *ioaddr,
- struct stmmac_dma_cfg *dma_cfg, u32 chan);
- void (*init_rx_chan)(void __iomem *ioaddr,
- struct stmmac_dma_cfg *dma_cfg,
- u32 dma_rx_phy, u32 chan);
- void (*init_tx_chan)(void __iomem *ioaddr,
- struct stmmac_dma_cfg *dma_cfg,
- u32 dma_tx_phy, u32 chan);
- /* Configure the AXI Bus Mode Register */
- void (*axi)(void __iomem *ioaddr, struct stmmac_axi *axi);
- /* Dump DMA registers */
- void (*dump_regs)(void __iomem *ioaddr, u32 *reg_space);
- /* Set tx/rx threshold in the csr6 register
- * An invalid value enables the store-and-forward mode */
- void (*dma_mode)(void __iomem *ioaddr, int txmode, int rxmode,
- int rxfifosz);
- void (*dma_rx_mode)(void __iomem *ioaddr, int mode, u32 channel,
- int fifosz, u8 qmode);
- void (*dma_tx_mode)(void __iomem *ioaddr, int mode, u32 channel,
- int fifosz, u8 qmode);
- /* To track extra statistic (if supported) */
- void (*dma_diagnostic_fr) (void *data, struct stmmac_extra_stats *x,
- void __iomem *ioaddr);
- void (*enable_dma_transmission) (void __iomem *ioaddr);
- void (*enable_dma_irq)(void __iomem *ioaddr, u32 chan);
- void (*disable_dma_irq)(void __iomem *ioaddr, u32 chan);
- void (*start_tx)(void __iomem *ioaddr, u32 chan);
- void (*stop_tx)(void __iomem *ioaddr, u32 chan);
- void (*start_rx)(void __iomem *ioaddr, u32 chan);
- void (*stop_rx)(void __iomem *ioaddr, u32 chan);
- int (*dma_interrupt) (void __iomem *ioaddr,
- struct stmmac_extra_stats *x, u32 chan);
- /* If supported then get the optional core features */
- void (*get_hw_feature)(void __iomem *ioaddr,
- struct dma_features *dma_cap);
- /* Program the HW RX Watchdog */
- void (*rx_watchdog)(void __iomem *ioaddr, u32 riwt, u32 number_chan);
- void (*set_tx_ring_len)(void __iomem *ioaddr, u32 len, u32 chan);
- void (*set_rx_ring_len)(void __iomem *ioaddr, u32 len, u32 chan);
- void (*set_rx_tail_ptr)(void __iomem *ioaddr, u32 tail_ptr, u32 chan);
- void (*set_tx_tail_ptr)(void __iomem *ioaddr, u32 tail_ptr, u32 chan);
- void (*enable_tso)(void __iomem *ioaddr, bool en, u32 chan);
-};
-
struct mac_device_info;
-/* Helpers to program the MAC core */
-struct stmmac_ops {
- /* MAC core initialization */
- void (*core_init)(struct mac_device_info *hw, struct net_device *dev);
- /* Enable the MAC RX/TX */
- void (*set_mac)(void __iomem *ioaddr, bool enable);
- /* Enable and verify that the IPC module is supported */
- int (*rx_ipc)(struct mac_device_info *hw);
- /* Enable RX Queues */
- void (*rx_queue_enable)(struct mac_device_info *hw, u8 mode, u32 queue);
- /* RX Queues Priority */
- void (*rx_queue_prio)(struct mac_device_info *hw, u32 prio, u32 queue);
- /* TX Queues Priority */
- void (*tx_queue_prio)(struct mac_device_info *hw, u32 prio, u32 queue);
- /* RX Queues Routing */
- void (*rx_queue_routing)(struct mac_device_info *hw, u8 packet,
- u32 queue);
- /* Program RX Algorithms */
- void (*prog_mtl_rx_algorithms)(struct mac_device_info *hw, u32 rx_alg);
- /* Program TX Algorithms */
- void (*prog_mtl_tx_algorithms)(struct mac_device_info *hw, u32 tx_alg);
- /* Set MTL TX queues weight */
- void (*set_mtl_tx_queue_weight)(struct mac_device_info *hw,
- u32 weight, u32 queue);
- /* RX MTL queue to RX dma mapping */
- void (*map_mtl_to_dma)(struct mac_device_info *hw, u32 queue, u32 chan);
- /* Configure AV Algorithm */
- void (*config_cbs)(struct mac_device_info *hw, u32 send_slope,
- u32 idle_slope, u32 high_credit, u32 low_credit,
- u32 queue);
- /* Dump MAC registers */
- void (*dump_regs)(struct mac_device_info *hw, u32 *reg_space);
- /* Handle extra events on specific interrupts hw dependent */
- int (*host_irq_status)(struct mac_device_info *hw,
- struct stmmac_extra_stats *x);
- /* Handle MTL interrupts */
- int (*host_mtl_irq_status)(struct mac_device_info *hw, u32 chan);
- /* Multicast filter setting */
- void (*set_filter)(struct mac_device_info *hw, struct net_device *dev);
- /* Flow control setting */
- void (*flow_ctrl)(struct mac_device_info *hw, unsigned int duplex,
- unsigned int fc, unsigned int pause_time, u32 tx_cnt);
- /* Set power management mode (e.g. magic frame) */
- void (*pmt)(struct mac_device_info *hw, unsigned long mode);
- /* Set/Get Unicast MAC addresses */
- void (*set_umac_addr)(struct mac_device_info *hw, unsigned char *addr,
- unsigned int reg_n);
- void (*get_umac_addr)(struct mac_device_info *hw, unsigned char *addr,
- unsigned int reg_n);
- void (*set_eee_mode)(struct mac_device_info *hw,
- bool en_tx_lpi_clockgating);
- void (*reset_eee_mode)(struct mac_device_info *hw);
- void (*set_eee_timer)(struct mac_device_info *hw, int ls, int tw);
- void (*set_eee_pls)(struct mac_device_info *hw, int link);
- void (*debug)(void __iomem *ioaddr, struct stmmac_extra_stats *x,
- u32 rx_queues, u32 tx_queues);
- /* PCS calls */
- void (*pcs_ctrl_ane)(void __iomem *ioaddr, bool ane, bool srgmi_ral,
- bool loopback);
- void (*pcs_rane)(void __iomem *ioaddr, bool restart);
- void (*pcs_get_adv_lp)(void __iomem *ioaddr, struct rgmii_adv *adv);
- /* Safety Features */
- int (*safety_feat_config)(void __iomem *ioaddr, unsigned int asp);
- bool (*safety_feat_irq_status)(struct net_device *ndev,
- void __iomem *ioaddr, unsigned int asp,
- struct stmmac_safety_stats *stats);
- const char *(*safety_feat_dump)(struct stmmac_safety_stats *stats,
- int index, unsigned long *count);
-};
-
-/* PTP and HW Timer helpers */
-struct stmmac_hwtimestamp {
- void (*config_hw_tstamping) (void __iomem *ioaddr, u32 data);
- u32 (*config_sub_second_increment)(void __iomem *ioaddr, u32 ptp_clock,
- int gmac4);
- int (*init_systime) (void __iomem *ioaddr, u32 sec, u32 nsec);
- int (*config_addend) (void __iomem *ioaddr, u32 addend);
- int (*adjust_systime) (void __iomem *ioaddr, u32 sec, u32 nsec,
- int add_sub, int gmac4);
- u64(*get_systime) (void __iomem *ioaddr);
-};
-
extern const struct stmmac_hwtimestamp stmmac_ptp;
extern const struct stmmac_mode_ops dwmac4_ring_mode_ops;
@@ -590,24 +410,13 @@
unsigned int clk_csr_mask;
};
-/* Helpers to manage the descriptors for chain and ring modes */
-struct stmmac_mode_ops {
- void (*init) (void *des, dma_addr_t phy_addr, unsigned int size,
- unsigned int extend_desc);
- unsigned int (*is_jumbo_frm) (int len, int ehn_desc);
- int (*jumbo_frm)(void *priv, struct sk_buff *skb, int csum);
- int (*set_16kib_bfsize)(int mtu);
- void (*init_desc3)(struct dma_desc *p);
- void (*refill_desc3) (void *priv, struct dma_desc *p);
- void (*clean_desc3) (void *priv, struct dma_desc *p);
-};
-
struct mac_device_info {
const struct stmmac_ops *mac;
const struct stmmac_desc_ops *desc;
const struct stmmac_dma_ops *dma;
const struct stmmac_mode_ops *mode;
const struct stmmac_hwtimestamp *ptp;
+ const struct stmmac_tc_ops *tc;
struct mii_regs mii; /* MII register Addresses */
struct mac_link link;
void __iomem *pcsr; /* vpointer to device CSRs */
@@ -625,12 +434,9 @@
u32 reg_shift;
};
-struct mac_device_info *dwmac1000_setup(void __iomem *ioaddr, int mcbins,
- int perfect_uc_entries,
- int *synopsys_id);
-struct mac_device_info *dwmac100_setup(void __iomem *ioaddr, int *synopsys_id);
-struct mac_device_info *dwmac4_setup(void __iomem *ioaddr, int mcbins,
- int perfect_uc_entries, int *synopsys_id);
+int dwmac100_setup(struct stmmac_priv *priv);
+int dwmac1000_setup(struct stmmac_priv *priv);
+int dwmac4_setup(struct stmmac_priv *priv);
void stmmac_set_mac_addr(void __iomem *ioaddr, u8 addr[6],
unsigned int high, unsigned int low);
@@ -650,24 +456,4 @@
extern const struct stmmac_mode_ops chain_mode_ops;
extern const struct stmmac_desc_ops dwmac4_desc_ops;
-/**
- * stmmac_get_synopsys_id - return the SYINID.
- * @priv: driver private structure
- * Description: this simple function is to decode and return the SYINID
- * starting from the HW core register.
- */
-static inline u32 stmmac_get_synopsys_id(u32 hwid)
-{
- /* Check Synopsys Id (not available on old chips) */
- if (likely(hwid)) {
- u32 uid = ((hwid & 0x0000ff00) >> 8);
- u32 synid = (hwid & 0x000000ff);
-
- pr_info("stmmac - user ID: 0x%x, Synopsys ID: 0x%x\n",
- uid, synid);
-
- return synid;
- }
- return 0;
-}
#endif /* __COMMON_H__ */
diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac-meson8b.c b/drivers/net/ethernet/stmicro/stmmac/dwmac-meson8b.c
index 7cb7940..4ff231d 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac-meson8b.c
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac-meson8b.c
@@ -18,6 +18,7 @@
#include <linux/io.h>
#include <linux/ioport.h>
#include <linux/module.h>
+#include <linux/of_device.h>
#include <linux/of_net.h>
#include <linux/mfd/syscon.h>
#include <linux/platform_device.h>
@@ -29,6 +30,10 @@
#define PRG_ETH0_RGMII_MODE BIT(0)
+#define PRG_ETH0_EXT_PHY_MODE_MASK GENMASK(2, 0)
+#define PRG_ETH0_EXT_RGMII_MODE 1
+#define PRG_ETH0_EXT_RMII_MODE 4
+
/* mux to choose between fclk_div2 (bit unset) and mpll2 (bit set) */
#define PRG_ETH0_CLK_M250_SEL_SHIFT 4
#define PRG_ETH0_CLK_M250_SEL_MASK GENMASK(4, 4)
@@ -47,12 +52,20 @@
#define MUX_CLK_NUM_PARENTS 2
+struct meson8b_dwmac;
+
+struct meson8b_dwmac_data {
+ int (*set_phy_mode)(struct meson8b_dwmac *dwmac);
+};
+
struct meson8b_dwmac {
- struct device *dev;
- void __iomem *regs;
- phy_interface_t phy_mode;
- struct clk *rgmii_tx_clk;
- u32 tx_delay_ns;
+ struct device *dev;
+ void __iomem *regs;
+
+ const struct meson8b_dwmac_data *data;
+ phy_interface_t phy_mode;
+ struct clk *rgmii_tx_clk;
+ u32 tx_delay_ns;
};
struct meson8b_dwmac_clk_configs {
@@ -171,6 +184,59 @@
return 0;
}
+static int meson8b_set_phy_mode(struct meson8b_dwmac *dwmac)
+{
+ switch (dwmac->phy_mode) {
+ case PHY_INTERFACE_MODE_RGMII:
+ case PHY_INTERFACE_MODE_RGMII_RXID:
+ case PHY_INTERFACE_MODE_RGMII_ID:
+ case PHY_INTERFACE_MODE_RGMII_TXID:
+ /* enable RGMII mode */
+ meson8b_dwmac_mask_bits(dwmac, PRG_ETH0,
+ PRG_ETH0_RGMII_MODE,
+ PRG_ETH0_RGMII_MODE);
+ break;
+ case PHY_INTERFACE_MODE_RMII:
+ /* disable RGMII mode -> enables RMII mode */
+ meson8b_dwmac_mask_bits(dwmac, PRG_ETH0,
+ PRG_ETH0_RGMII_MODE, 0);
+ break;
+ default:
+ dev_err(dwmac->dev, "fail to set phy-mode %s\n",
+ phy_modes(dwmac->phy_mode));
+ return -EINVAL;
+ }
+
+ return 0;
+}
+
+static int meson_axg_set_phy_mode(struct meson8b_dwmac *dwmac)
+{
+ switch (dwmac->phy_mode) {
+ case PHY_INTERFACE_MODE_RGMII:
+ case PHY_INTERFACE_MODE_RGMII_RXID:
+ case PHY_INTERFACE_MODE_RGMII_ID:
+ case PHY_INTERFACE_MODE_RGMII_TXID:
+ /* enable RGMII mode */
+ meson8b_dwmac_mask_bits(dwmac, PRG_ETH0,
+ PRG_ETH0_EXT_PHY_MODE_MASK,
+ PRG_ETH0_EXT_RGMII_MODE);
+ break;
+ case PHY_INTERFACE_MODE_RMII:
+ /* disable RGMII mode -> enables RMII mode */
+ meson8b_dwmac_mask_bits(dwmac, PRG_ETH0,
+ PRG_ETH0_EXT_PHY_MODE_MASK,
+ PRG_ETH0_EXT_RMII_MODE);
+ break;
+ default:
+ dev_err(dwmac->dev, "fail to set phy-mode %s\n",
+ phy_modes(dwmac->phy_mode));
+ return -EINVAL;
+ }
+
+ return 0;
+}
+
static int meson8b_init_prg_eth(struct meson8b_dwmac *dwmac)
{
int ret;
@@ -188,10 +254,6 @@
case PHY_INTERFACE_MODE_RGMII_ID:
case PHY_INTERFACE_MODE_RGMII_TXID:
- /* enable RGMII mode */
- meson8b_dwmac_mask_bits(dwmac, PRG_ETH0, PRG_ETH0_RGMII_MODE,
- PRG_ETH0_RGMII_MODE);
-
/* only relevant for RMII mode -> disable in RGMII mode */
meson8b_dwmac_mask_bits(dwmac, PRG_ETH0,
PRG_ETH0_INVERTED_RMII_CLK, 0);
@@ -224,10 +286,6 @@
break;
case PHY_INTERFACE_MODE_RMII:
- /* disable RGMII mode -> enables RMII mode */
- meson8b_dwmac_mask_bits(dwmac, PRG_ETH0, PRG_ETH0_RGMII_MODE,
- 0);
-
/* invert internal clk_rmii_i to generate 25/2.5 tx_rx_clk */
meson8b_dwmac_mask_bits(dwmac, PRG_ETH0,
PRG_ETH0_INVERTED_RMII_CLK,
@@ -274,6 +332,11 @@
goto err_remove_config_dt;
}
+ dwmac->data = (const struct meson8b_dwmac_data *)
+ of_device_get_match_data(&pdev->dev);
+ if (!dwmac->data)
+ return -EINVAL;
+
res = platform_get_resource(pdev, IORESOURCE_MEM, 1);
dwmac->regs = devm_ioremap_resource(&pdev->dev, res);
if (IS_ERR(dwmac->regs)) {
@@ -298,6 +361,10 @@
if (ret)
goto err_remove_config_dt;
+ ret = dwmac->data->set_phy_mode(dwmac);
+ if (ret)
+ goto err_remove_config_dt;
+
ret = meson8b_init_prg_eth(dwmac);
if (ret)
goto err_remove_config_dt;
@@ -316,10 +383,31 @@
return ret;
}
+static const struct meson8b_dwmac_data meson8b_dwmac_data = {
+ .set_phy_mode = meson8b_set_phy_mode,
+};
+
+static const struct meson8b_dwmac_data meson_axg_dwmac_data = {
+ .set_phy_mode = meson_axg_set_phy_mode,
+};
+
static const struct of_device_id meson8b_dwmac_match[] = {
- { .compatible = "amlogic,meson8b-dwmac" },
- { .compatible = "amlogic,meson8m2-dwmac" },
- { .compatible = "amlogic,meson-gxbb-dwmac" },
+ {
+ .compatible = "amlogic,meson8b-dwmac",
+ .data = &meson8b_dwmac_data,
+ },
+ {
+ .compatible = "amlogic,meson8m2-dwmac",
+ .data = &meson8b_dwmac_data,
+ },
+ {
+ .compatible = "amlogic,meson-gxbb-dwmac",
+ .data = &meson8b_dwmac_data,
+ },
+ {
+ .compatible = "amlogic,meson-axg-dwmac",
+ .data = &meson_axg_dwmac_data,
+ },
{ }
};
MODULE_DEVICE_TABLE(of, meson8b_dwmac_match);
diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac-rk.c b/drivers/net/ethernet/stmicro/stmmac/dwmac-rk.c
index 13133b3..f08625a 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac-rk.c
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac-rk.c
@@ -1104,30 +1104,20 @@
} else {
if (bsp_priv->clk_enabled) {
if (phy_iface == PHY_INTERFACE_MODE_RMII) {
- if (!IS_ERR(bsp_priv->mac_clk_rx))
- clk_disable_unprepare(
- bsp_priv->mac_clk_rx);
+ clk_disable_unprepare(bsp_priv->mac_clk_rx);
- if (!IS_ERR(bsp_priv->clk_mac_ref))
- clk_disable_unprepare(
- bsp_priv->clk_mac_ref);
+ clk_disable_unprepare(bsp_priv->clk_mac_ref);
- if (!IS_ERR(bsp_priv->clk_mac_refout))
- clk_disable_unprepare(
- bsp_priv->clk_mac_refout);
+ clk_disable_unprepare(bsp_priv->clk_mac_refout);
}
- if (!IS_ERR(bsp_priv->clk_phy))
- clk_disable_unprepare(bsp_priv->clk_phy);
+ clk_disable_unprepare(bsp_priv->clk_phy);
- if (!IS_ERR(bsp_priv->aclk_mac))
- clk_disable_unprepare(bsp_priv->aclk_mac);
+ clk_disable_unprepare(bsp_priv->aclk_mac);
- if (!IS_ERR(bsp_priv->pclk_mac))
- clk_disable_unprepare(bsp_priv->pclk_mac);
+ clk_disable_unprepare(bsp_priv->pclk_mac);
- if (!IS_ERR(bsp_priv->mac_clk_tx))
- clk_disable_unprepare(bsp_priv->mac_clk_tx);
+ clk_disable_unprepare(bsp_priv->mac_clk_tx);
/**
* if (!IS_ERR(bsp_priv->clk_mac))
* clk_disable_unprepare(bsp_priv->clk_mac);
diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac-sun8i.c b/drivers/net/ethernet/stmicro/stmmac/dwmac-sun8i.c
index a3fa65b..2e6e2a9 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac-sun8i.c
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac-sun8i.c
@@ -42,17 +42,27 @@
* This value is used for disabling properly EMAC
* and used as a good starting value in case of the
* boot process(uboot) leave some stuff.
+ * @syscon_field reg_field for the syscon's gmac register
* @soc_has_internal_phy: Does the MAC embed an internal PHY
* @support_mii: Does the MAC handle MII
* @support_rmii: Does the MAC handle RMII
* @support_rgmii: Does the MAC handle RGMII
+ *
+ * @rx_delay_max: Maximum raw value for RX delay chain
+ * @tx_delay_max: Maximum raw value for TX delay chain
+ * These two also indicate the bitmask for
+ * the RX and TX delay chain registers. A
+ * value of zero indicates this is not supported.
*/
struct emac_variant {
u32 default_syscon_value;
+ const struct reg_field *syscon_field;
bool soc_has_internal_phy;
bool support_mii;
bool support_rmii;
bool support_rgmii;
+ u8 rx_delay_max;
+ u8 tx_delay_max;
};
/* struct sunxi_priv_data - hold all sunxi private data
@@ -71,38 +81,70 @@
struct regulator *regulator;
struct reset_control *rst_ephy;
const struct emac_variant *variant;
- struct regmap *regmap;
+ struct regmap_field *regmap_field;
bool internal_phy_powered;
void *mux_handle;
};
+/* EMAC clock register @ 0x30 in the "system control" address range */
+static const struct reg_field sun8i_syscon_reg_field = {
+ .reg = 0x30,
+ .lsb = 0,
+ .msb = 31,
+};
+
+/* EMAC clock register @ 0x164 in the CCU address range */
+static const struct reg_field sun8i_ccu_reg_field = {
+ .reg = 0x164,
+ .lsb = 0,
+ .msb = 31,
+};
+
static const struct emac_variant emac_variant_h3 = {
.default_syscon_value = 0x58000,
+ .syscon_field = &sun8i_syscon_reg_field,
.soc_has_internal_phy = true,
.support_mii = true,
.support_rmii = true,
- .support_rgmii = true
+ .support_rgmii = true,
+ .rx_delay_max = 31,
+ .tx_delay_max = 7,
};
static const struct emac_variant emac_variant_v3s = {
.default_syscon_value = 0x38000,
+ .syscon_field = &sun8i_syscon_reg_field,
.soc_has_internal_phy = true,
.support_mii = true
};
static const struct emac_variant emac_variant_a83t = {
.default_syscon_value = 0,
+ .syscon_field = &sun8i_syscon_reg_field,
.soc_has_internal_phy = false,
.support_mii = true,
- .support_rgmii = true
+ .support_rgmii = true,
+ .rx_delay_max = 31,
+ .tx_delay_max = 7,
+};
+
+static const struct emac_variant emac_variant_r40 = {
+ .default_syscon_value = 0,
+ .syscon_field = &sun8i_ccu_reg_field,
+ .support_mii = true,
+ .support_rgmii = true,
+ .rx_delay_max = 7,
};
static const struct emac_variant emac_variant_a64 = {
.default_syscon_value = 0,
+ .syscon_field = &sun8i_syscon_reg_field,
.soc_has_internal_phy = false,
.support_mii = true,
.support_rmii = true,
- .support_rgmii = true
+ .support_rgmii = true,
+ .rx_delay_max = 31,
+ .tx_delay_max = 7,
};
#define EMAC_BASIC_CTL0 0x00
@@ -206,9 +248,7 @@
#define SYSCON_RMII_EN BIT(13) /* 1: enable RMII (overrides EPIT) */
/* Generic system control EMAC_CLK bits */
-#define SYSCON_ETXDC_MASK GENMASK(2, 0)
#define SYSCON_ETXDC_SHIFT 10
-#define SYSCON_ERXDC_MASK GENMASK(4, 0)
#define SYSCON_ERXDC_SHIFT 5
/* EMAC PHY Interface Type */
#define SYSCON_EPIT BIT(2) /* 1: RGMII, 0: MII */
@@ -216,7 +256,6 @@
#define SYSCON_ETCS_MII 0x0
#define SYSCON_ETCS_EXT_GMII 0x1
#define SYSCON_ETCS_INT_GMII 0x2
-#define SYSCON_EMAC_REG 0x30
/* sun8i_dwmac_dma_reset() - reset the EMAC
* Called from stmmac via stmmac_dma_ops->reset
@@ -237,17 +276,28 @@
* Called from stmmac via stmmac_dma_ops->init
*/
static void sun8i_dwmac_dma_init(void __iomem *ioaddr,
- struct stmmac_dma_cfg *dma_cfg,
- u32 dma_tx, u32 dma_rx, int atds)
+ struct stmmac_dma_cfg *dma_cfg, int atds)
{
- /* Write TX and RX descriptors address */
- writel(dma_rx, ioaddr + EMAC_RX_DESC_LIST);
- writel(dma_tx, ioaddr + EMAC_TX_DESC_LIST);
-
writel(EMAC_RX_INT | EMAC_TX_INT, ioaddr + EMAC_INT_EN);
writel(0x1FFFFFF, ioaddr + EMAC_INT_STA);
}
+static void sun8i_dwmac_dma_init_rx(void __iomem *ioaddr,
+ struct stmmac_dma_cfg *dma_cfg,
+ u32 dma_rx_phy, u32 chan)
+{
+ /* Write RX descriptors address */
+ writel(dma_rx_phy, ioaddr + EMAC_RX_DESC_LIST);
+}
+
+static void sun8i_dwmac_dma_init_tx(void __iomem *ioaddr,
+ struct stmmac_dma_cfg *dma_cfg,
+ u32 dma_tx_phy, u32 chan)
+{
+ /* Write TX descriptors address */
+ writel(dma_tx_phy, ioaddr + EMAC_TX_DESC_LIST);
+}
+
/* sun8i_dwmac_dump_regs() - Dump EMAC address space
* Called from stmmac_dma_ops->dump_regs
* Used for ethtool
@@ -398,13 +448,36 @@
return ret;
}
-static void sun8i_dwmac_dma_operation_mode(void __iomem *ioaddr, int txmode,
- int rxmode, int rxfifosz)
+static void sun8i_dwmac_dma_operation_mode_rx(void __iomem *ioaddr, int mode,
+ u32 channel, int fifosz, u8 qmode)
+{
+ u32 v;
+
+ v = readl(ioaddr + EMAC_RX_CTL1);
+ if (mode == SF_DMA_MODE) {
+ v |= EMAC_RX_MD;
+ } else {
+ v &= ~EMAC_RX_MD;
+ v &= ~EMAC_RX_TH_MASK;
+ if (mode < 32)
+ v |= EMAC_RX_TH_32;
+ else if (mode < 64)
+ v |= EMAC_RX_TH_64;
+ else if (mode < 96)
+ v |= EMAC_RX_TH_96;
+ else if (mode < 128)
+ v |= EMAC_RX_TH_128;
+ }
+ writel(v, ioaddr + EMAC_RX_CTL1);
+}
+
+static void sun8i_dwmac_dma_operation_mode_tx(void __iomem *ioaddr, int mode,
+ u32 channel, int fifosz, u8 qmode)
{
u32 v;
v = readl(ioaddr + EMAC_TX_CTL1);
- if (txmode == SF_DMA_MODE) {
+ if (mode == SF_DMA_MODE) {
v |= EMAC_TX_MD;
/* Undocumented bit (called TX_NEXT_FRM in BSP), the original
* comment is
@@ -415,40 +488,26 @@
} else {
v &= ~EMAC_TX_MD;
v &= ~EMAC_TX_TH_MASK;
- if (txmode < 64)
+ if (mode < 64)
v |= EMAC_TX_TH_64;
- else if (txmode < 128)
+ else if (mode < 128)
v |= EMAC_TX_TH_128;
- else if (txmode < 192)
+ else if (mode < 192)
v |= EMAC_TX_TH_192;
- else if (txmode < 256)
+ else if (mode < 256)
v |= EMAC_TX_TH_256;
}
writel(v, ioaddr + EMAC_TX_CTL1);
-
- v = readl(ioaddr + EMAC_RX_CTL1);
- if (rxmode == SF_DMA_MODE) {
- v |= EMAC_RX_MD;
- } else {
- v &= ~EMAC_RX_MD;
- v &= ~EMAC_RX_TH_MASK;
- if (rxmode < 32)
- v |= EMAC_RX_TH_32;
- else if (rxmode < 64)
- v |= EMAC_RX_TH_64;
- else if (rxmode < 96)
- v |= EMAC_RX_TH_96;
- else if (rxmode < 128)
- v |= EMAC_RX_TH_128;
- }
- writel(v, ioaddr + EMAC_RX_CTL1);
}
static const struct stmmac_dma_ops sun8i_dwmac_dma_ops = {
.reset = sun8i_dwmac_dma_reset,
.init = sun8i_dwmac_dma_init,
+ .init_rx_chan = sun8i_dwmac_dma_init_rx,
+ .init_tx_chan = sun8i_dwmac_dma_init_tx,
.dump_regs = sun8i_dwmac_dump_regs,
- .dma_mode = sun8i_dwmac_dma_operation_mode,
+ .dma_rx_mode = sun8i_dwmac_dma_operation_mode_rx,
+ .dma_tx_mode = sun8i_dwmac_dma_operation_mode_tx,
.enable_dma_transmission = sun8i_dwmac_enable_dma_transmission,
.enable_dma_irq = sun8i_dwmac_enable_dma_irq,
.disable_dma_irq = sun8i_dwmac_disable_dma_irq,
@@ -745,7 +804,7 @@
bool need_power_ephy = false;
if (current_child ^ desired_child) {
- regmap_read(gmac->regmap, SYSCON_EMAC_REG, ®);
+ regmap_field_read(gmac->regmap_field, ®);
switch (desired_child) {
case DWMAC_SUN8I_MDIO_MUX_INTERNAL_ID:
dev_info(priv->device, "Switch mux to internal PHY");
@@ -763,7 +822,7 @@
desired_child);
return -EINVAL;
}
- regmap_write(gmac->regmap, SYSCON_EMAC_REG, val);
+ regmap_field_write(gmac->regmap_field, val);
if (need_power_ephy) {
ret = sun8i_dwmac_power_internal_phy(priv);
if (ret)
@@ -801,7 +860,7 @@
int ret;
u32 reg, val;
- regmap_read(gmac->regmap, SYSCON_EMAC_REG, &val);
+ regmap_field_read(gmac->regmap_field, &val);
reg = gmac->variant->default_syscon_value;
if (reg != val)
dev_warn(priv->device,
@@ -835,8 +894,9 @@
}
val /= 100;
dev_dbg(priv->device, "set tx-delay to %x\n", val);
- if (val <= SYSCON_ETXDC_MASK) {
- reg &= ~(SYSCON_ETXDC_MASK << SYSCON_ETXDC_SHIFT);
+ if (val <= gmac->variant->tx_delay_max) {
+ reg &= ~(gmac->variant->tx_delay_max <<
+ SYSCON_ETXDC_SHIFT);
reg |= (val << SYSCON_ETXDC_SHIFT);
} else {
dev_err(priv->device, "Invalid TX clock delay: %d\n",
@@ -852,8 +912,9 @@
}
val /= 100;
dev_dbg(priv->device, "set rx-delay to %x\n", val);
- if (val <= SYSCON_ERXDC_MASK) {
- reg &= ~(SYSCON_ERXDC_MASK << SYSCON_ERXDC_SHIFT);
+ if (val <= gmac->variant->rx_delay_max) {
+ reg &= ~(gmac->variant->rx_delay_max <<
+ SYSCON_ERXDC_SHIFT);
reg |= (val << SYSCON_ERXDC_SHIFT);
} else {
dev_err(priv->device, "Invalid RX clock delay: %d\n",
@@ -883,7 +944,7 @@
return -EINVAL;
}
- regmap_write(gmac->regmap, SYSCON_EMAC_REG, reg);
+ regmap_field_write(gmac->regmap_field, reg);
return 0;
}
@@ -892,7 +953,7 @@
{
u32 reg = gmac->variant->default_syscon_value;
- regmap_write(gmac->regmap, SYSCON_EMAC_REG, reg);
+ regmap_field_write(gmac->regmap_field, reg);
}
static void sun8i_dwmac_exit(struct platform_device *pdev, void *priv)
@@ -971,6 +1032,34 @@
return mac;
}
+static struct regmap *sun8i_dwmac_get_syscon_from_dev(struct device_node *node)
+{
+ struct device_node *syscon_node;
+ struct platform_device *syscon_pdev;
+ struct regmap *regmap = NULL;
+
+ syscon_node = of_parse_phandle(node, "syscon", 0);
+ if (!syscon_node)
+ return ERR_PTR(-ENODEV);
+
+ syscon_pdev = of_find_device_by_node(syscon_node);
+ if (!syscon_pdev) {
+ /* platform device might not be probed yet */
+ regmap = ERR_PTR(-EPROBE_DEFER);
+ goto out_put_node;
+ }
+
+ /* If no regmap is found then the other device driver is at fault */
+ regmap = dev_get_regmap(&syscon_pdev->dev, NULL);
+ if (!regmap)
+ regmap = ERR_PTR(-EINVAL);
+
+ platform_device_put(syscon_pdev);
+out_put_node:
+ of_node_put(syscon_node);
+ return regmap;
+}
+
static int sun8i_dwmac_probe(struct platform_device *pdev)
{
struct plat_stmmacenet_data *plat_dat;
@@ -980,6 +1069,7 @@
int ret;
struct stmmac_priv *priv;
struct net_device *ndev;
+ struct regmap *regmap;
ret = stmmac_get_platform_resources(pdev, &stmmac_res);
if (ret)
@@ -1014,14 +1104,41 @@
gmac->regulator = NULL;
}
- gmac->regmap = syscon_regmap_lookup_by_phandle(pdev->dev.of_node,
- "syscon");
- if (IS_ERR(gmac->regmap)) {
- ret = PTR_ERR(gmac->regmap);
+ /* The "GMAC clock control" register might be located in the
+ * CCU address range (on the R40), or the system control address
+ * range (on most other sun8i and later SoCs).
+ *
+ * The former controls most if not all clocks in the SoC. The
+ * latter has an SoC identification register, and on some SoCs,
+ * controls to map device specific SRAM to either the intended
+ * peripheral, or the CPU address space.
+ *
+ * In either case, there should be a coordinated and restricted
+ * method of accessing the register needed here. This is done by
+ * having the device export a custom regmap, instead of a generic
+ * syscon, which grants all access to all registers.
+ *
+ * To support old device trees, we fall back to using the syscon
+ * interface if possible.
+ */
+ regmap = sun8i_dwmac_get_syscon_from_dev(pdev->dev.of_node);
+ if (IS_ERR(regmap))
+ regmap = syscon_regmap_lookup_by_phandle(pdev->dev.of_node,
+ "syscon");
+ if (IS_ERR(regmap)) {
+ ret = PTR_ERR(regmap);
dev_err(&pdev->dev, "Unable to map syscon: %d\n", ret);
return ret;
}
+ gmac->regmap_field = devm_regmap_field_alloc(dev, regmap,
+ *gmac->variant->syscon_field);
+ if (IS_ERR(gmac->regmap_field)) {
+ ret = PTR_ERR(gmac->regmap_field);
+ dev_err(dev, "Unable to map syscon register: %d\n", ret);
+ return ret;
+ }
+
plat_dat->interface = of_get_phy_mode(dev->of_node);
/* platform data specifying hardware features and callbacks.
@@ -1078,6 +1195,8 @@
.data = &emac_variant_v3s },
{ .compatible = "allwinner,sun8i-a83t-emac",
.data = &emac_variant_a83t },
+ { .compatible = "allwinner,sun8i-r40-gmac",
+ .data = &emac_variant_r40 },
{ .compatible = "allwinner,sun50i-a64-emac",
.data = &emac_variant_a64 },
{ }
diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac1000.h b/drivers/net/ethernet/stmicro/stmmac/dwmac1000.h
index c02d366..184ca13 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac1000.h
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac1000.h
@@ -29,7 +29,6 @@
#define GMAC_MII_DATA 0x00000014 /* MII Data */
#define GMAC_FLOW_CTRL 0x00000018 /* Flow Control */
#define GMAC_VLAN_TAG 0x0000001c /* VLAN Tag */
-#define GMAC_VERSION 0x00000020 /* GMAC CORE Version */
#define GMAC_DEBUG 0x00000024 /* GMAC debug register */
#define GMAC_WAKEUP_FILTER 0x00000028 /* Wake-up Frame Filter */
diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac1000_core.c b/drivers/net/ethernet/stmicro/stmmac/dwmac1000_core.c
index ef10baf..0877bde 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac1000_core.c
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac1000_core.c
@@ -27,6 +27,7 @@
#include <linux/ethtool.h>
#include <net/dsa.h>
#include <asm/io.h>
+#include "stmmac.h"
#include "stmmac_pcs.h"
#include "dwmac1000.h"
@@ -498,7 +499,7 @@
x->mac_gmii_rx_proto_engine++;
}
-static const struct stmmac_ops dwmac1000_ops = {
+const struct stmmac_ops dwmac1000_ops = {
.core_init = dwmac1000_core_init,
.set_mac = stmmac_set_mac,
.rx_ipc = dwmac1000_rx_ipc_enable,
@@ -519,28 +520,21 @@
.pcs_get_adv_lp = dwmac1000_get_adv_lp,
};
-struct mac_device_info *dwmac1000_setup(void __iomem *ioaddr, int mcbins,
- int perfect_uc_entries,
- int *synopsys_id)
+int dwmac1000_setup(struct stmmac_priv *priv)
{
- struct mac_device_info *mac;
- u32 hwid = readl(ioaddr + GMAC_VERSION);
+ struct mac_device_info *mac = priv->hw;
- mac = kzalloc(sizeof(const struct mac_device_info), GFP_KERNEL);
- if (!mac)
- return NULL;
+ dev_info(priv->device, "\tDWMAC1000\n");
- mac->pcsr = ioaddr;
- mac->multicast_filter_bins = mcbins;
- mac->unicast_filter_entries = perfect_uc_entries;
+ priv->dev->priv_flags |= IFF_UNICAST_FLT;
+ mac->pcsr = priv->ioaddr;
+ mac->multicast_filter_bins = priv->plat->multicast_filter_bins;
+ mac->unicast_filter_entries = priv->plat->unicast_filter_entries;
mac->mcast_bits_log2 = 0;
if (mac->multicast_filter_bins)
mac->mcast_bits_log2 = ilog2(mac->multicast_filter_bins);
- mac->mac = &dwmac1000_ops;
- mac->dma = &dwmac1000_dma_ops;
-
mac->link.duplex = GMAC_CONTROL_DM;
mac->link.speed10 = GMAC_CONTROL_PS;
mac->link.speed100 = GMAC_CONTROL_PS | GMAC_CONTROL_FES;
@@ -555,8 +549,5 @@
mac->mii.clk_csr_shift = 2;
mac->mii.clk_csr_mask = GENMASK(5, 2);
- /* Get and dump the chip ID */
- *synopsys_id = stmmac_get_synopsys_id(hwid);
-
- return mac;
+ return 0;
}
diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac1000_dma.c b/drivers/net/ethernet/stmicro/stmmac/dwmac1000_dma.c
index 7ecf549..aacc4aa 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac1000_dma.c
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac1000_dma.c
@@ -81,8 +81,7 @@
}
static void dwmac1000_dma_init(void __iomem *ioaddr,
- struct stmmac_dma_cfg *dma_cfg,
- u32 dma_tx, u32 dma_rx, int atds)
+ struct stmmac_dma_cfg *dma_cfg, int atds)
{
u32 value = readl(ioaddr + DMA_BUS_MODE);
int txpbl = dma_cfg->txpbl ?: dma_cfg->pbl;
@@ -119,12 +118,22 @@
/* Mask interrupts by writing to CSR7 */
writel(DMA_INTR_DEFAULT_MASK, ioaddr + DMA_INTR_ENA);
+}
- /* RX/TX descriptor base address lists must be written into
- * DMA CSR3 and CSR4, respectively
- */
- writel(dma_tx, ioaddr + DMA_TX_BASE_ADDR);
- writel(dma_rx, ioaddr + DMA_RCV_BASE_ADDR);
+static void dwmac1000_dma_init_rx(void __iomem *ioaddr,
+ struct stmmac_dma_cfg *dma_cfg,
+ u32 dma_rx_phy, u32 chan)
+{
+ /* RX descriptor base address list must be written into DMA CSR3 */
+ writel(dma_rx_phy, ioaddr + DMA_RCV_BASE_ADDR);
+}
+
+static void dwmac1000_dma_init_tx(void __iomem *ioaddr,
+ struct stmmac_dma_cfg *dma_cfg,
+ u32 dma_tx_phy, u32 chan)
+{
+ /* TX descriptor base address list must be written into DMA CSR4 */
+ writel(dma_tx_phy, ioaddr + DMA_TX_BASE_ADDR);
}
static u32 dwmac1000_configure_fc(u32 csr6, int rxfifosz)
@@ -148,12 +157,40 @@
return csr6;
}
-static void dwmac1000_dma_operation_mode(void __iomem *ioaddr, int txmode,
- int rxmode, int rxfifosz)
+static void dwmac1000_dma_operation_mode_rx(void __iomem *ioaddr, int mode,
+ u32 channel, int fifosz, u8 qmode)
{
u32 csr6 = readl(ioaddr + DMA_CONTROL);
- if (txmode == SF_DMA_MODE) {
+ if (mode == SF_DMA_MODE) {
+ pr_debug("GMAC: enable RX store and forward mode\n");
+ csr6 |= DMA_CONTROL_RSF;
+ } else {
+ pr_debug("GMAC: disable RX SF mode (threshold %d)\n", mode);
+ csr6 &= ~DMA_CONTROL_RSF;
+ csr6 &= DMA_CONTROL_TC_RX_MASK;
+ if (mode <= 32)
+ csr6 |= DMA_CONTROL_RTC_32;
+ else if (mode <= 64)
+ csr6 |= DMA_CONTROL_RTC_64;
+ else if (mode <= 96)
+ csr6 |= DMA_CONTROL_RTC_96;
+ else
+ csr6 |= DMA_CONTROL_RTC_128;
+ }
+
+ /* Configure flow control based on rx fifo size */
+ csr6 = dwmac1000_configure_fc(csr6, fifosz);
+
+ writel(csr6, ioaddr + DMA_CONTROL);
+}
+
+static void dwmac1000_dma_operation_mode_tx(void __iomem *ioaddr, int mode,
+ u32 channel, int fifosz, u8 qmode)
+{
+ u32 csr6 = readl(ioaddr + DMA_CONTROL);
+
+ if (mode == SF_DMA_MODE) {
pr_debug("GMAC: enable TX store and forward mode\n");
/* Transmit COE type 2 cannot be done in cut-through mode. */
csr6 |= DMA_CONTROL_TSF;
@@ -162,42 +199,22 @@
*/
csr6 |= DMA_CONTROL_OSF;
} else {
- pr_debug("GMAC: disabling TX SF (threshold %d)\n", txmode);
+ pr_debug("GMAC: disabling TX SF (threshold %d)\n", mode);
csr6 &= ~DMA_CONTROL_TSF;
csr6 &= DMA_CONTROL_TC_TX_MASK;
/* Set the transmit threshold */
- if (txmode <= 32)
+ if (mode <= 32)
csr6 |= DMA_CONTROL_TTC_32;
- else if (txmode <= 64)
+ else if (mode <= 64)
csr6 |= DMA_CONTROL_TTC_64;
- else if (txmode <= 128)
+ else if (mode <= 128)
csr6 |= DMA_CONTROL_TTC_128;
- else if (txmode <= 192)
+ else if (mode <= 192)
csr6 |= DMA_CONTROL_TTC_192;
else
csr6 |= DMA_CONTROL_TTC_256;
}
- if (rxmode == SF_DMA_MODE) {
- pr_debug("GMAC: enable RX store and forward mode\n");
- csr6 |= DMA_CONTROL_RSF;
- } else {
- pr_debug("GMAC: disable RX SF mode (threshold %d)\n", rxmode);
- csr6 &= ~DMA_CONTROL_RSF;
- csr6 &= DMA_CONTROL_TC_RX_MASK;
- if (rxmode <= 32)
- csr6 |= DMA_CONTROL_RTC_32;
- else if (rxmode <= 64)
- csr6 |= DMA_CONTROL_RTC_64;
- else if (rxmode <= 96)
- csr6 |= DMA_CONTROL_RTC_96;
- else
- csr6 |= DMA_CONTROL_RTC_128;
- }
-
- /* Configure flow control based on rx fifo size */
- csr6 = dwmac1000_configure_fc(csr6, rxfifosz);
-
writel(csr6, ioaddr + DMA_CONTROL);
}
@@ -256,9 +273,12 @@
const struct stmmac_dma_ops dwmac1000_dma_ops = {
.reset = dwmac_dma_reset,
.init = dwmac1000_dma_init,
+ .init_rx_chan = dwmac1000_dma_init_rx,
+ .init_tx_chan = dwmac1000_dma_init_tx,
.axi = dwmac1000_dma_axi,
.dump_regs = dwmac1000_dump_dma_regs,
- .dma_mode = dwmac1000_dma_operation_mode,
+ .dma_rx_mode = dwmac1000_dma_operation_mode_rx,
+ .dma_tx_mode = dwmac1000_dma_operation_mode_tx,
.enable_dma_transmission = dwmac_enable_dma_transmission,
.enable_dma_irq = dwmac_enable_dma_irq,
.disable_dma_irq = dwmac_disable_dma_irq,
diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac100_core.c b/drivers/net/ethernet/stmicro/stmmac/dwmac100_core.c
index 91b23f9..b735143 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac100_core.c
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac100_core.c
@@ -27,6 +27,7 @@
#include <linux/crc32.h>
#include <net/dsa.h>
#include <asm/io.h>
+#include "stmmac.h"
#include "dwmac100.h"
static void dwmac100_core_init(struct mac_device_info *hw,
@@ -159,7 +160,7 @@
return;
}
-static const struct stmmac_ops dwmac100_ops = {
+const struct stmmac_ops dwmac100_ops = {
.core_init = dwmac100_core_init,
.set_mac = stmmac_set_mac,
.rx_ipc = dwmac100_rx_ipc_enable,
@@ -172,20 +173,13 @@
.get_umac_addr = dwmac100_get_umac_addr,
};
-struct mac_device_info *dwmac100_setup(void __iomem *ioaddr, int *synopsys_id)
+int dwmac100_setup(struct stmmac_priv *priv)
{
- struct mac_device_info *mac;
+ struct mac_device_info *mac = priv->hw;
- mac = kzalloc(sizeof(const struct mac_device_info), GFP_KERNEL);
- if (!mac)
- return NULL;
+ dev_info(priv->device, "\tDWMAC100\n");
- pr_info("\tDWMAC100\n");
-
- mac->pcsr = ioaddr;
- mac->mac = &dwmac100_ops;
- mac->dma = &dwmac100_dma_ops;
-
+ mac->pcsr = priv->ioaddr;
mac->link.duplex = MAC_CONTROL_F;
mac->link.speed10 = 0;
mac->link.speed100 = 0;
@@ -200,8 +194,5 @@
mac->mii.clk_csr_shift = 2;
mac->mii.clk_csr_mask = GENMASK(5, 2);
- /* Synopsys Id is not available on old chips */
- *synopsys_id = 0;
-
- return mac;
+ return 0;
}
diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac100_dma.c b/drivers/net/ethernet/stmicro/stmmac/dwmac100_dma.c
index 6502b9a..21dee25 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac100_dma.c
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac100_dma.c
@@ -29,8 +29,7 @@
#include "dwmac_dma.h"
static void dwmac100_dma_init(void __iomem *ioaddr,
- struct stmmac_dma_cfg *dma_cfg,
- u32 dma_tx, u32 dma_rx, int atds)
+ struct stmmac_dma_cfg *dma_cfg, int atds)
{
/* Enable Application Access by writing to DMA CSR0 */
writel(DMA_BUS_MODE_DEFAULT | (dma_cfg->pbl << DMA_BUS_MODE_PBL_SHIFT),
@@ -38,12 +37,22 @@
/* Mask interrupts by writing to CSR7 */
writel(DMA_INTR_DEFAULT_MASK, ioaddr + DMA_INTR_ENA);
+}
- /* RX/TX descriptor base addr lists must be written into
- * DMA CSR3 and CSR4, respectively
- */
- writel(dma_tx, ioaddr + DMA_TX_BASE_ADDR);
- writel(dma_rx, ioaddr + DMA_RCV_BASE_ADDR);
+static void dwmac100_dma_init_rx(void __iomem *ioaddr,
+ struct stmmac_dma_cfg *dma_cfg,
+ u32 dma_rx_phy, u32 chan)
+{
+ /* RX descriptor base addr lists must be written into DMA CSR3 */
+ writel(dma_rx_phy, ioaddr + DMA_RCV_BASE_ADDR);
+}
+
+static void dwmac100_dma_init_tx(void __iomem *ioaddr,
+ struct stmmac_dma_cfg *dma_cfg,
+ u32 dma_tx_phy, u32 chan)
+{
+ /* TX descriptor base addr lists must be written into DMA CSR4 */
+ writel(dma_tx_phy, ioaddr + DMA_TX_BASE_ADDR);
}
/* Store and Forward capability is not used at all.
@@ -51,14 +60,14 @@
* The transmit threshold can be programmed by setting the TTC bits in the DMA
* control register.
*/
-static void dwmac100_dma_operation_mode(void __iomem *ioaddr, int txmode,
- int rxmode, int rxfifosz)
+static void dwmac100_dma_operation_mode_tx(void __iomem *ioaddr, int mode,
+ u32 channel, int fifosz, u8 qmode)
{
u32 csr6 = readl(ioaddr + DMA_CONTROL);
- if (txmode <= 32)
+ if (mode <= 32)
csr6 |= DMA_CONTROL_TTC_32;
- else if (txmode <= 64)
+ else if (mode <= 64)
csr6 |= DMA_CONTROL_TTC_64;
else
csr6 |= DMA_CONTROL_TTC_128;
@@ -112,8 +121,10 @@
const struct stmmac_dma_ops dwmac100_dma_ops = {
.reset = dwmac_dma_reset,
.init = dwmac100_dma_init,
+ .init_rx_chan = dwmac100_dma_init_rx,
+ .init_tx_chan = dwmac100_dma_init_tx,
.dump_regs = dwmac100_dump_dma_regs,
- .dma_mode = dwmac100_dma_operation_mode,
+ .dma_tx_mode = dwmac100_dma_operation_mode_tx,
.dma_diagnostic_fr = dwmac100_dma_diagnostic_fr,
.enable_dma_transmission = dwmac_enable_dma_transmission,
.enable_dma_irq = dwmac_enable_dma_irq,
diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac4.h b/drivers/net/ethernet/stmicro/stmmac/dwmac4.h
index dedd406..6330a55 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac4.h
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac4.h
@@ -34,7 +34,6 @@
#define GMAC_PCS_BASE 0x000000e0
#define GMAC_PHYIF_CONTROL_STATUS 0x000000f8
#define GMAC_PMT 0x000000c0
-#define GMAC_VERSION 0x00000110
#define GMAC_DEBUG 0x00000114
#define GMAC_HW_FEATURE0 0x0000011c
#define GMAC_HW_FEATURE1 0x00000120
@@ -195,6 +194,9 @@
/* MAC HW features3 bitmap */
#define GMAC_HW_FEAT_ASP GENMASK(29, 28)
+#define GMAC_HW_FEAT_FRPES GENMASK(14, 13)
+#define GMAC_HW_FEAT_FRPBS GENMASK(12, 11)
+#define GMAC_HW_FEAT_FRPSEL BIT(10)
/* MAC HW ADDR regs */
#define GMAC_HI_DCS GENMASK(18, 16)
@@ -203,6 +205,7 @@
/* MTL registers */
#define MTL_OPERATION_MODE 0x00000c00
+#define MTL_FRPE BIT(15)
#define MTL_OPERATION_SCHALG_MASK GENMASK(6, 5)
#define MTL_OPERATION_SCHALG_WRR (0x0 << 5)
#define MTL_OPERATION_SCHALG_WFQ (0x1 << 5)
diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac4_core.c b/drivers/net/ethernet/stmicro/stmmac/dwmac4_core.c
index 517b1f6..a7121a7 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac4_core.c
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac4_core.c
@@ -18,6 +18,7 @@
#include <linux/ethtool.h>
#include <linux/io.h>
#include <net/dsa.h>
+#include "stmmac.h"
#include "stmmac_pcs.h"
#include "dwmac4.h"
#include "dwmac5.h"
@@ -700,7 +701,7 @@
x->mac_gmii_rx_proto_engine++;
}
-static const struct stmmac_ops dwmac4_ops = {
+const struct stmmac_ops dwmac4_ops = {
.core_init = dwmac4_core_init,
.set_mac = stmmac_set_mac,
.rx_ipc = dwmac4_rx_ipc_enable,
@@ -731,7 +732,7 @@
.set_filter = dwmac4_set_filter,
};
-static const struct stmmac_ops dwmac410_ops = {
+const struct stmmac_ops dwmac410_ops = {
.core_init = dwmac4_core_init,
.set_mac = stmmac_dwmac4_set_mac,
.rx_ipc = dwmac4_rx_ipc_enable,
@@ -762,7 +763,7 @@
.set_filter = dwmac4_set_filter,
};
-static const struct stmmac_ops dwmac510_ops = {
+const struct stmmac_ops dwmac510_ops = {
.core_init = dwmac4_core_init,
.set_mac = stmmac_dwmac4_set_mac,
.rx_ipc = dwmac4_rx_ipc_enable,
@@ -794,21 +795,19 @@
.safety_feat_config = dwmac5_safety_feat_config,
.safety_feat_irq_status = dwmac5_safety_feat_irq_status,
.safety_feat_dump = dwmac5_safety_feat_dump,
+ .rxp_config = dwmac5_rxp_config,
};
-struct mac_device_info *dwmac4_setup(void __iomem *ioaddr, int mcbins,
- int perfect_uc_entries, int *synopsys_id)
+int dwmac4_setup(struct stmmac_priv *priv)
{
- struct mac_device_info *mac;
- u32 hwid = readl(ioaddr + GMAC_VERSION);
+ struct mac_device_info *mac = priv->hw;
- mac = kzalloc(sizeof(const struct mac_device_info), GFP_KERNEL);
- if (!mac)
- return NULL;
+ dev_info(priv->device, "\tDWMAC4/5\n");
- mac->pcsr = ioaddr;
- mac->multicast_filter_bins = mcbins;
- mac->unicast_filter_entries = perfect_uc_entries;
+ priv->dev->priv_flags |= IFF_UNICAST_FLT;
+ mac->pcsr = priv->ioaddr;
+ mac->multicast_filter_bins = priv->plat->multicast_filter_bins;
+ mac->unicast_filter_entries = priv->plat->unicast_filter_entries;
mac->mcast_bits_log2 = 0;
if (mac->multicast_filter_bins)
@@ -828,20 +827,5 @@
mac->mii.clk_csr_shift = 8;
mac->mii.clk_csr_mask = GENMASK(11, 8);
- /* Get and dump the chip ID */
- *synopsys_id = stmmac_get_synopsys_id(hwid);
-
- if (*synopsys_id > DWMAC_CORE_4_00)
- mac->dma = &dwmac410_dma_ops;
- else
- mac->dma = &dwmac4_dma_ops;
-
- if (*synopsys_id >= DWMAC_CORE_5_10)
- mac->mac = &dwmac510_ops;
- else if (*synopsys_id >= DWMAC_CORE_4_00)
- mac->mac = &dwmac410_ops;
- else
- mac->mac = &dwmac4_ops;
-
- return mac;
+ return 0;
}
diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac4_descs.c b/drivers/net/ethernet/stmicro/stmmac/dwmac4_descs.c
index 2a6521d..20299f6 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac4_descs.c
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac4_descs.c
@@ -189,9 +189,12 @@
p->des3 |= cpu_to_le32(TDES3_OWN);
}
-static void dwmac4_set_rx_owner(struct dma_desc *p)
+static void dwmac4_set_rx_owner(struct dma_desc *p, int disable_rx_ic)
{
- p->des3 |= cpu_to_le32(RDES3_OWN);
+ p->des3 = cpu_to_le32(RDES3_OWN | RDES3_BUFFER1_VALID_ADDR);
+
+ if (!disable_rx_ic)
+ p->des3 |= cpu_to_le32(RDES3_INT_ON_COMPLETION_EN);
}
static int dwmac4_get_tx_ls(struct dma_desc *p)
@@ -223,7 +226,7 @@
return 0;
}
-static inline u64 dwmac4_get_timestamp(void *desc, u32 ats)
+static inline void dwmac4_get_timestamp(void *desc, u32 ats, u64 *ts)
{
struct dma_desc *p = (struct dma_desc *)desc;
u64 ns;
@@ -232,7 +235,7 @@
/* convert high/sec time stamp value to nanosecond */
ns += le32_to_cpu(p->des1) * 1000000000ULL;
- return ns;
+ *ts = ns;
}
static int dwmac4_rx_check_timestamp(void *desc)
@@ -292,10 +295,7 @@
static void dwmac4_rd_init_rx_desc(struct dma_desc *p, int disable_rx_ic,
int mode, int end)
{
- p->des3 = cpu_to_le32(RDES3_OWN | RDES3_BUFFER1_VALID_ADDR);
-
- if (!disable_rx_ic)
- p->des3 |= cpu_to_le32(RDES3_INT_ON_COMPLETION_EN);
+ dwmac4_set_rx_owner(p, disable_rx_ic);
}
static void dwmac4_rd_init_tx_desc(struct dma_desc *p, int mode, int end)
@@ -424,6 +424,25 @@
p->des3 = cpu_to_le32(TDES3_CONTEXT_TYPE | TDES3_CTXT_TCMSSV);
}
+static void dwmac4_get_addr(struct dma_desc *p, unsigned int *addr)
+{
+ *addr = le32_to_cpu(p->des0);
+}
+
+static void dwmac4_set_addr(struct dma_desc *p, dma_addr_t addr)
+{
+ p->des0 = cpu_to_le32(addr);
+ p->des1 = 0;
+}
+
+static void dwmac4_clear(struct dma_desc *p)
+{
+ p->des0 = 0;
+ p->des1 = 0;
+ p->des2 = 0;
+ p->des3 = 0;
+}
+
const struct stmmac_desc_ops dwmac4_desc_ops = {
.tx_status = dwmac4_wrback_get_tx_status,
.rx_status = dwmac4_wrback_get_rx_status,
@@ -445,6 +464,9 @@
.init_tx_desc = dwmac4_rd_init_tx_desc,
.display_ring = dwmac4_display_ring,
.set_mss = dwmac4_set_mss_ctxt,
+ .get_addr = dwmac4_get_addr,
+ .set_addr = dwmac4_set_addr,
+ .clear = dwmac4_clear,
};
const struct stmmac_mode_ops dwmac4_ring_mode_ops = { };
diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac4_dma.c b/drivers/net/ethernet/stmicro/stmmac/dwmac4_dma.c
index d37d457..bf8e5a1 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac4_dma.c
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac4_dma.c
@@ -94,6 +94,10 @@
value = readl(ioaddr + DMA_CHAN_TX_CONTROL(chan));
value = value | (txpbl << DMA_BUS_MODE_PBL_SHIFT);
+
+ /* Enable OSP to get best performance */
+ value |= DMA_CONTROL_OSP;
+
writel(value, ioaddr + DMA_CHAN_TX_CONTROL(chan));
writel(dma_tx_phy, ioaddr + DMA_CHAN_TX_BASE_ADDR(chan));
@@ -116,8 +120,7 @@
}
static void dwmac4_dma_init(void __iomem *ioaddr,
- struct stmmac_dma_cfg *dma_cfg,
- u32 dma_tx, u32 dma_rx, int atds)
+ struct stmmac_dma_cfg *dma_cfg, int atds)
{
u32 value = readl(ioaddr + DMA_SYS_BUS_MODE);
@@ -379,6 +382,9 @@
/* 5.10 Features */
dma_cap->asp = (hw_cap & GMAC_HW_FEAT_ASP) >> 28;
+ dma_cap->frpes = (hw_cap & GMAC_HW_FEAT_FRPES) >> 13;
+ dma_cap->frpbs = (hw_cap & GMAC_HW_FEAT_FRPBS) >> 11;
+ dma_cap->frpsel = (hw_cap & GMAC_HW_FEAT_FRPSEL) >> 10;
}
/* Enable/disable TSO feature and set MSS */
diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac4_dma.h b/drivers/net/ethernet/stmicro/stmmac/dwmac4_dma.h
index 8474bf9..c63c1fe 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac4_dma.h
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac4_dma.h
@@ -184,7 +184,6 @@
#define DMA_CHAN0_DBG_STAT_RPS_SHIFT 8
int dwmac4_dma_reset(void __iomem *ioaddr);
-void dwmac4_enable_dma_transmission(void __iomem *ioaddr, u32 tail_ptr);
void dwmac4_enable_dma_irq(void __iomem *ioaddr, u32 chan);
void dwmac410_enable_dma_irq(void __iomem *ioaddr, u32 chan);
void dwmac4_disable_dma_irq(void __iomem *ioaddr, u32 chan);
diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac5.c b/drivers/net/ethernet/stmicro/stmmac/dwmac5.c
index 860de39..b2becb80 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac5.c
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac5.c
@@ -7,6 +7,7 @@
#include "common.h"
#include "dwmac4.h"
#include "dwmac5.h"
+#include "stmmac.h"
struct dwmac5_error_desc {
bool valid;
@@ -237,15 +238,16 @@
return 0;
}
-bool dwmac5_safety_feat_irq_status(struct net_device *ndev,
+int dwmac5_safety_feat_irq_status(struct net_device *ndev,
void __iomem *ioaddr, unsigned int asp,
struct stmmac_safety_stats *stats)
{
- bool ret = false, err, corr;
+ bool err, corr;
u32 mtl, dma;
+ int ret = 0;
if (!asp)
- return false;
+ return -EINVAL;
mtl = readl(ioaddr + MTL_SAFETY_INT_STATUS);
dma = readl(ioaddr + DMA_SAFETY_INT_STATUS);
@@ -282,17 +284,213 @@
{ dwmac5_dma_errors },
};
-const char *dwmac5_safety_feat_dump(struct stmmac_safety_stats *stats,
- int index, unsigned long *count)
+int dwmac5_safety_feat_dump(struct stmmac_safety_stats *stats,
+ int index, unsigned long *count, const char **desc)
{
int module = index / 32, offset = index % 32;
unsigned long *ptr = (unsigned long *)stats;
if (module >= ARRAY_SIZE(dwmac5_all_errors))
- return NULL;
+ return -EINVAL;
if (!dwmac5_all_errors[module].desc[offset].valid)
- return NULL;
+ return -EINVAL;
if (count)
*count = *(ptr + index);
- return dwmac5_all_errors[module].desc[offset].desc;
+ if (desc)
+ *desc = dwmac5_all_errors[module].desc[offset].desc;
+ return 0;
+}
+
+static int dwmac5_rxp_disable(void __iomem *ioaddr)
+{
+ u32 val;
+ int ret;
+
+ val = readl(ioaddr + MTL_OPERATION_MODE);
+ val &= ~MTL_FRPE;
+ writel(val, ioaddr + MTL_OPERATION_MODE);
+
+ ret = readl_poll_timeout(ioaddr + MTL_RXP_CONTROL_STATUS, val,
+ val & RXPI, 1, 10000);
+ if (ret)
+ return ret;
+ return 0;
+}
+
+static void dwmac5_rxp_enable(void __iomem *ioaddr)
+{
+ u32 val;
+
+ val = readl(ioaddr + MTL_OPERATION_MODE);
+ val |= MTL_FRPE;
+ writel(val, ioaddr + MTL_OPERATION_MODE);
+}
+
+static int dwmac5_rxp_update_single_entry(void __iomem *ioaddr,
+ struct stmmac_tc_entry *entry,
+ int pos)
+{
+ int ret, i;
+
+ for (i = 0; i < (sizeof(entry->val) / sizeof(u32)); i++) {
+ int real_pos = pos * (sizeof(entry->val) / sizeof(u32)) + i;
+ u32 val;
+
+ /* Wait for ready */
+ ret = readl_poll_timeout(ioaddr + MTL_RXP_IACC_CTRL_STATUS,
+ val, !(val & STARTBUSY), 1, 10000);
+ if (ret)
+ return ret;
+
+ /* Write data */
+ val = *((u32 *)&entry->val + i);
+ writel(val, ioaddr + MTL_RXP_IACC_DATA);
+
+ /* Write pos */
+ val = real_pos & ADDR;
+ writel(val, ioaddr + MTL_RXP_IACC_CTRL_STATUS);
+
+ /* Write OP */
+ val |= WRRDN;
+ writel(val, ioaddr + MTL_RXP_IACC_CTRL_STATUS);
+
+ /* Start Write */
+ val |= STARTBUSY;
+ writel(val, ioaddr + MTL_RXP_IACC_CTRL_STATUS);
+
+ /* Wait for done */
+ ret = readl_poll_timeout(ioaddr + MTL_RXP_IACC_CTRL_STATUS,
+ val, !(val & STARTBUSY), 1, 10000);
+ if (ret)
+ return ret;
+ }
+
+ return 0;
+}
+
+static struct stmmac_tc_entry *
+dwmac5_rxp_get_next_entry(struct stmmac_tc_entry *entries, unsigned int count,
+ u32 curr_prio)
+{
+ struct stmmac_tc_entry *entry;
+ u32 min_prio = ~0x0;
+ int i, min_prio_idx;
+ bool found = false;
+
+ for (i = count - 1; i >= 0; i--) {
+ entry = &entries[i];
+
+ /* Do not update unused entries */
+ if (!entry->in_use)
+ continue;
+ /* Do not update already updated entries (i.e. fragments) */
+ if (entry->in_hw)
+ continue;
+ /* Let last entry be updated last */
+ if (entry->is_last)
+ continue;
+ /* Do not return fragments */
+ if (entry->is_frag)
+ continue;
+ /* Check if we already checked this prio */
+ if (entry->prio < curr_prio)
+ continue;
+ /* Check if this is the minimum prio */
+ if (entry->prio < min_prio) {
+ min_prio = entry->prio;
+ min_prio_idx = i;
+ found = true;
+ }
+ }
+
+ if (found)
+ return &entries[min_prio_idx];
+ return NULL;
+}
+
+int dwmac5_rxp_config(void __iomem *ioaddr, struct stmmac_tc_entry *entries,
+ unsigned int count)
+{
+ struct stmmac_tc_entry *entry, *frag;
+ int i, ret, nve = 0;
+ u32 curr_prio = 0;
+ u32 old_val, val;
+
+ /* Force disable RX */
+ old_val = readl(ioaddr + GMAC_CONFIG);
+ val = old_val & ~GMAC_CONFIG_RE;
+ writel(val, ioaddr + GMAC_CONFIG);
+
+ /* Disable RX Parser */
+ ret = dwmac5_rxp_disable(ioaddr);
+ if (ret)
+ goto re_enable;
+
+ /* Set all entries as NOT in HW */
+ for (i = 0; i < count; i++) {
+ entry = &entries[i];
+ entry->in_hw = false;
+ }
+
+ /* Update entries by reverse order */
+ while (1) {
+ entry = dwmac5_rxp_get_next_entry(entries, count, curr_prio);
+ if (!entry)
+ break;
+
+ curr_prio = entry->prio;
+ frag = entry->frag_ptr;
+
+ /* Set special fragment requirements */
+ if (frag) {
+ entry->val.af = 0;
+ entry->val.rf = 0;
+ entry->val.nc = 1;
+ entry->val.ok_index = nve + 2;
+ }
+
+ ret = dwmac5_rxp_update_single_entry(ioaddr, entry, nve);
+ if (ret)
+ goto re_enable;
+
+ entry->table_pos = nve++;
+ entry->in_hw = true;
+
+ if (frag && !frag->in_hw) {
+ ret = dwmac5_rxp_update_single_entry(ioaddr, frag, nve);
+ if (ret)
+ goto re_enable;
+ frag->table_pos = nve++;
+ frag->in_hw = true;
+ }
+ }
+
+ if (!nve)
+ goto re_enable;
+
+ /* Update all pass entry */
+ for (i = 0; i < count; i++) {
+ entry = &entries[i];
+ if (!entry->is_last)
+ continue;
+
+ ret = dwmac5_rxp_update_single_entry(ioaddr, entry, nve);
+ if (ret)
+ goto re_enable;
+
+ entry->table_pos = nve++;
+ }
+
+ /* Assume n. of parsable entries == n. of valid entries */
+ val = (nve << 16) & NPE;
+ val |= nve & NVE;
+ writel(val, ioaddr + MTL_RXP_CONTROL_STATUS);
+
+ /* Enable RX Parser */
+ dwmac5_rxp_enable(ioaddr);
+
+re_enable:
+ /* Re-enable RX */
+ writel(old_val, ioaddr + GMAC_CONFIG);
+ return ret;
}
diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac5.h b/drivers/net/ethernet/stmicro/stmmac/dwmac5.h
index a0d2c44..cc810af 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac5.h
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac5.h
@@ -11,6 +11,17 @@
#define PRTYEN BIT(1)
#define TMOUTEN BIT(0)
+#define MTL_RXP_CONTROL_STATUS 0x00000ca0
+#define RXPI BIT(31)
+#define NPE GENMASK(23, 16)
+#define NVE GENMASK(7, 0)
+#define MTL_RXP_IACC_CTRL_STATUS 0x00000cb0
+#define STARTBUSY BIT(31)
+#define RXPEIEC GENMASK(22, 21)
+#define RXPEIEE BIT(20)
+#define WRRDN BIT(16)
+#define ADDR GENMASK(15, 0)
+#define MTL_RXP_IACC_DATA 0x00000cb4
#define MTL_ECC_CONTROL 0x00000cc0
#define TSOEE BIT(4)
#define MRXPEE BIT(3)
@@ -43,10 +54,12 @@
#define DMA_ECC_INT_STATUS 0x00001088
int dwmac5_safety_feat_config(void __iomem *ioaddr, unsigned int asp);
-bool dwmac5_safety_feat_irq_status(struct net_device *ndev,
+int dwmac5_safety_feat_irq_status(struct net_device *ndev,
void __iomem *ioaddr, unsigned int asp,
struct stmmac_safety_stats *stats);
-const char *dwmac5_safety_feat_dump(struct stmmac_safety_stats *stats,
- int index, unsigned long *count);
+int dwmac5_safety_feat_dump(struct stmmac_safety_stats *stats,
+ int index, unsigned long *count, const char **desc);
+int dwmac5_rxp_config(void __iomem *ioaddr, struct stmmac_tc_entry *entries,
+ unsigned int count);
#endif /* __DWMAC5_H__ */
diff --git a/drivers/net/ethernet/stmicro/stmmac/enh_desc.c b/drivers/net/ethernet/stmicro/stmmac/enh_desc.c
index 6768a25..77914c8 100644
--- a/drivers/net/ethernet/stmicro/stmmac/enh_desc.c
+++ b/drivers/net/ethernet/stmicro/stmmac/enh_desc.c
@@ -292,7 +292,7 @@
p->des0 |= cpu_to_le32(ETDES0_OWN);
}
-static void enh_desc_set_rx_owner(struct dma_desc *p)
+static void enh_desc_set_rx_owner(struct dma_desc *p, int disable_rx_ic)
{
p->des0 |= cpu_to_le32(RDES0_OWN);
}
@@ -382,7 +382,7 @@
return (le32_to_cpu(p->des0) & ETDES0_TIME_STAMP_STATUS) >> 17;
}
-static u64 enh_desc_get_timestamp(void *desc, u32 ats)
+static void enh_desc_get_timestamp(void *desc, u32 ats, u64 *ts)
{
u64 ns;
@@ -397,7 +397,7 @@
ns += le32_to_cpu(p->des3) * 1000000000ULL;
}
- return ns;
+ *ts = ns;
}
static int enh_desc_get_rx_timestamp_status(void *desc, void *next_desc,
@@ -437,6 +437,21 @@
pr_info("\n");
}
+static void enh_desc_get_addr(struct dma_desc *p, unsigned int *addr)
+{
+ *addr = le32_to_cpu(p->des2);
+}
+
+static void enh_desc_set_addr(struct dma_desc *p, dma_addr_t addr)
+{
+ p->des2 = cpu_to_le32(addr);
+}
+
+static void enh_desc_clear(struct dma_desc *p)
+{
+ p->des2 = 0;
+}
+
const struct stmmac_desc_ops enh_desc_ops = {
.tx_status = enh_desc_get_tx_status,
.rx_status = enh_desc_get_rx_status,
@@ -457,4 +472,7 @@
.get_timestamp = enh_desc_get_timestamp,
.get_rx_timestamp_status = enh_desc_get_rx_timestamp_status,
.display_ring = enh_desc_display_ring,
+ .get_addr = enh_desc_get_addr,
+ .set_addr = enh_desc_set_addr,
+ .clear = enh_desc_clear,
};
diff --git a/drivers/net/ethernet/stmicro/stmmac/hwif.c b/drivers/net/ethernet/stmicro/stmmac/hwif.c
new file mode 100644
index 0000000..14770fc
--- /dev/null
+++ b/drivers/net/ethernet/stmicro/stmmac/hwif.c
@@ -0,0 +1,268 @@
+// SPDX-License-Identifier: (GPL-2.0 OR MIT)
+/*
+ * Copyright (c) 2018 Synopsys, Inc. and/or its affiliates.
+ * stmmac HW Interface Handling
+ */
+
+#include "common.h"
+#include "stmmac.h"
+#include "stmmac_ptp.h"
+
+static u32 stmmac_get_id(struct stmmac_priv *priv, u32 id_reg)
+{
+ u32 reg = readl(priv->ioaddr + id_reg);
+
+ if (!reg) {
+ dev_info(priv->device, "Version ID not available\n");
+ return 0x0;
+ }
+
+ dev_info(priv->device, "User ID: 0x%x, Synopsys ID: 0x%x\n",
+ (unsigned int)(reg & GENMASK(15, 8)) >> 8,
+ (unsigned int)(reg & GENMASK(7, 0)));
+ return reg & GENMASK(7, 0);
+}
+
+static void stmmac_dwmac_mode_quirk(struct stmmac_priv *priv)
+{
+ struct mac_device_info *mac = priv->hw;
+
+ if (priv->chain_mode) {
+ dev_info(priv->device, "Chain mode enabled\n");
+ priv->mode = STMMAC_CHAIN_MODE;
+ mac->mode = &chain_mode_ops;
+ } else {
+ dev_info(priv->device, "Ring mode enabled\n");
+ priv->mode = STMMAC_RING_MODE;
+ mac->mode = &ring_mode_ops;
+ }
+}
+
+static int stmmac_dwmac1_quirks(struct stmmac_priv *priv)
+{
+ struct mac_device_info *mac = priv->hw;
+
+ if (priv->plat->enh_desc) {
+ dev_info(priv->device, "Enhanced/Alternate descriptors\n");
+
+ /* GMAC older than 3.50 has no extended descriptors */
+ if (priv->synopsys_id >= DWMAC_CORE_3_50) {
+ dev_info(priv->device, "Enabled extended descriptors\n");
+ priv->extend_desc = 1;
+ } else {
+ dev_warn(priv->device, "Extended descriptors not supported\n");
+ }
+
+ mac->desc = &enh_desc_ops;
+ } else {
+ dev_info(priv->device, "Normal descriptors\n");
+ mac->desc = &ndesc_ops;
+ }
+
+ stmmac_dwmac_mode_quirk(priv);
+ return 0;
+}
+
+static int stmmac_dwmac4_quirks(struct stmmac_priv *priv)
+{
+ stmmac_dwmac_mode_quirk(priv);
+ return 0;
+}
+
+static const struct stmmac_hwif_entry {
+ bool gmac;
+ bool gmac4;
+ u32 min_id;
+ const struct stmmac_regs_off regs;
+ const void *desc;
+ const void *dma;
+ const void *mac;
+ const void *hwtimestamp;
+ const void *mode;
+ const void *tc;
+ int (*setup)(struct stmmac_priv *priv);
+ int (*quirks)(struct stmmac_priv *priv);
+} stmmac_hw[] = {
+ /* NOTE: New HW versions shall go to the end of this table */
+ {
+ .gmac = false,
+ .gmac4 = false,
+ .min_id = 0,
+ .regs = {
+ .ptp_off = PTP_GMAC3_X_OFFSET,
+ .mmc_off = MMC_GMAC3_X_OFFSET,
+ },
+ .desc = NULL,
+ .dma = &dwmac100_dma_ops,
+ .mac = &dwmac100_ops,
+ .hwtimestamp = &stmmac_ptp,
+ .mode = NULL,
+ .tc = NULL,
+ .setup = dwmac100_setup,
+ .quirks = stmmac_dwmac1_quirks,
+ }, {
+ .gmac = true,
+ .gmac4 = false,
+ .min_id = 0,
+ .regs = {
+ .ptp_off = PTP_GMAC3_X_OFFSET,
+ .mmc_off = MMC_GMAC3_X_OFFSET,
+ },
+ .desc = NULL,
+ .dma = &dwmac1000_dma_ops,
+ .mac = &dwmac1000_ops,
+ .hwtimestamp = &stmmac_ptp,
+ .mode = NULL,
+ .tc = NULL,
+ .setup = dwmac1000_setup,
+ .quirks = stmmac_dwmac1_quirks,
+ }, {
+ .gmac = false,
+ .gmac4 = true,
+ .min_id = 0,
+ .regs = {
+ .ptp_off = PTP_GMAC4_OFFSET,
+ .mmc_off = MMC_GMAC4_OFFSET,
+ },
+ .desc = &dwmac4_desc_ops,
+ .dma = &dwmac4_dma_ops,
+ .mac = &dwmac4_ops,
+ .hwtimestamp = &stmmac_ptp,
+ .mode = NULL,
+ .tc = NULL,
+ .setup = dwmac4_setup,
+ .quirks = stmmac_dwmac4_quirks,
+ }, {
+ .gmac = false,
+ .gmac4 = true,
+ .min_id = DWMAC_CORE_4_00,
+ .regs = {
+ .ptp_off = PTP_GMAC4_OFFSET,
+ .mmc_off = MMC_GMAC4_OFFSET,
+ },
+ .desc = &dwmac4_desc_ops,
+ .dma = &dwmac4_dma_ops,
+ .mac = &dwmac410_ops,
+ .hwtimestamp = &stmmac_ptp,
+ .mode = &dwmac4_ring_mode_ops,
+ .tc = NULL,
+ .setup = dwmac4_setup,
+ .quirks = NULL,
+ }, {
+ .gmac = false,
+ .gmac4 = true,
+ .min_id = DWMAC_CORE_4_10,
+ .regs = {
+ .ptp_off = PTP_GMAC4_OFFSET,
+ .mmc_off = MMC_GMAC4_OFFSET,
+ },
+ .desc = &dwmac4_desc_ops,
+ .dma = &dwmac410_dma_ops,
+ .mac = &dwmac410_ops,
+ .hwtimestamp = &stmmac_ptp,
+ .mode = &dwmac4_ring_mode_ops,
+ .tc = NULL,
+ .setup = dwmac4_setup,
+ .quirks = NULL,
+ }, {
+ .gmac = false,
+ .gmac4 = true,
+ .min_id = DWMAC_CORE_5_10,
+ .regs = {
+ .ptp_off = PTP_GMAC4_OFFSET,
+ .mmc_off = MMC_GMAC4_OFFSET,
+ },
+ .desc = &dwmac4_desc_ops,
+ .dma = &dwmac410_dma_ops,
+ .mac = &dwmac510_ops,
+ .hwtimestamp = &stmmac_ptp,
+ .mode = &dwmac4_ring_mode_ops,
+ .tc = &dwmac510_tc_ops,
+ .setup = dwmac4_setup,
+ .quirks = NULL,
+ }
+};
+
+int stmmac_hwif_init(struct stmmac_priv *priv)
+{
+ bool needs_gmac4 = priv->plat->has_gmac4;
+ bool needs_gmac = priv->plat->has_gmac;
+ const struct stmmac_hwif_entry *entry;
+ struct mac_device_info *mac;
+ bool needs_setup = true;
+ int i, ret;
+ u32 id;
+
+ if (needs_gmac) {
+ id = stmmac_get_id(priv, GMAC_VERSION);
+ } else if (needs_gmac4) {
+ id = stmmac_get_id(priv, GMAC4_VERSION);
+ } else {
+ id = 0;
+ }
+
+ /* Save ID for later use */
+ priv->synopsys_id = id;
+
+ /* Lets assume some safe values first */
+ priv->ptpaddr = priv->ioaddr +
+ (needs_gmac4 ? PTP_GMAC4_OFFSET : PTP_GMAC3_X_OFFSET);
+ priv->mmcaddr = priv->ioaddr +
+ (needs_gmac4 ? MMC_GMAC4_OFFSET : MMC_GMAC3_X_OFFSET);
+
+ /* Check for HW specific setup first */
+ if (priv->plat->setup) {
+ mac = priv->plat->setup(priv);
+ needs_setup = false;
+ } else {
+ mac = devm_kzalloc(priv->device, sizeof(*mac), GFP_KERNEL);
+ }
+
+ if (!mac)
+ return -ENOMEM;
+
+ /* Fallback to generic HW */
+ for (i = ARRAY_SIZE(stmmac_hw) - 1; i >= 0; i--) {
+ entry = &stmmac_hw[i];
+
+ if (needs_gmac ^ entry->gmac)
+ continue;
+ if (needs_gmac4 ^ entry->gmac4)
+ continue;
+ /* Use synopsys_id var because some setups can override this */
+ if (priv->synopsys_id < entry->min_id)
+ continue;
+
+ /* Only use generic HW helpers if needed */
+ mac->desc = mac->desc ? : entry->desc;
+ mac->dma = mac->dma ? : entry->dma;
+ mac->mac = mac->mac ? : entry->mac;
+ mac->ptp = mac->ptp ? : entry->hwtimestamp;
+ mac->mode = mac->mode ? : entry->mode;
+ mac->tc = mac->tc ? : entry->tc;
+
+ priv->hw = mac;
+ priv->ptpaddr = priv->ioaddr + entry->regs.ptp_off;
+ priv->mmcaddr = priv->ioaddr + entry->regs.mmc_off;
+
+ /* Entry found */
+ if (needs_setup) {
+ ret = entry->setup(priv);
+ if (ret)
+ return ret;
+ }
+
+ /* Run quirks, if needed */
+ if (entry->quirks) {
+ ret = entry->quirks(priv);
+ if (ret)
+ return ret;
+ }
+
+ return 0;
+ }
+
+ dev_err(priv->device, "Failed to find HW IF (id=0x%x, gmac=%d/%d)\n",
+ id, needs_gmac, needs_gmac4);
+ return -EINVAL;
+}
diff --git a/drivers/net/ethernet/stmicro/stmmac/hwif.h b/drivers/net/ethernet/stmicro/stmmac/hwif.h
new file mode 100644
index 0000000..f499a7f
--- /dev/null
+++ b/drivers/net/ethernet/stmicro/stmmac/hwif.h
@@ -0,0 +1,470 @@
+// SPDX-License-Identifier: (GPL-2.0 OR MIT)
+// Copyright (c) 2018 Synopsys, Inc. and/or its affiliates.
+// stmmac HW Interface Callbacks
+
+#ifndef __STMMAC_HWIF_H__
+#define __STMMAC_HWIF_H__
+
+#include <linux/netdevice.h>
+
+#define stmmac_do_void_callback(__priv, __module, __cname, __arg0, __args...) \
+({ \
+ int __result = -EINVAL; \
+ if ((__priv)->hw->__module && (__priv)->hw->__module->__cname) { \
+ (__priv)->hw->__module->__cname((__arg0), ##__args); \
+ __result = 0; \
+ } \
+ __result; \
+})
+#define stmmac_do_callback(__priv, __module, __cname, __arg0, __args...) \
+({ \
+ int __result = -EINVAL; \
+ if ((__priv)->hw->__module && (__priv)->hw->__module->__cname) \
+ __result = (__priv)->hw->__module->__cname((__arg0), ##__args); \
+ __result; \
+})
+
+struct stmmac_extra_stats;
+struct stmmac_safety_stats;
+struct dma_desc;
+struct dma_extended_desc;
+
+/* Descriptors helpers */
+struct stmmac_desc_ops {
+ /* DMA RX descriptor ring initialization */
+ void (*init_rx_desc)(struct dma_desc *p, int disable_rx_ic, int mode,
+ int end);
+ /* DMA TX descriptor ring initialization */
+ void (*init_tx_desc)(struct dma_desc *p, int mode, int end);
+ /* Invoked by the xmit function to prepare the tx descriptor */
+ void (*prepare_tx_desc)(struct dma_desc *p, int is_fs, int len,
+ bool csum_flag, int mode, bool tx_own, bool ls,
+ unsigned int tot_pkt_len);
+ void (*prepare_tso_tx_desc)(struct dma_desc *p, int is_fs, int len1,
+ int len2, bool tx_own, bool ls, unsigned int tcphdrlen,
+ unsigned int tcppayloadlen);
+ /* Set/get the owner of the descriptor */
+ void (*set_tx_owner)(struct dma_desc *p);
+ int (*get_tx_owner)(struct dma_desc *p);
+ /* Clean the tx descriptor as soon as the tx irq is received */
+ void (*release_tx_desc)(struct dma_desc *p, int mode);
+ /* Clear interrupt on tx frame completion. When this bit is
+ * set an interrupt happens as soon as the frame is transmitted */
+ void (*set_tx_ic)(struct dma_desc *p);
+ /* Last tx segment reports the transmit status */
+ int (*get_tx_ls)(struct dma_desc *p);
+ /* Return the transmit status looking at the TDES1 */
+ int (*tx_status)(void *data, struct stmmac_extra_stats *x,
+ struct dma_desc *p, void __iomem *ioaddr);
+ /* Get the buffer size from the descriptor */
+ int (*get_tx_len)(struct dma_desc *p);
+ /* Handle extra events on specific interrupts hw dependent */
+ void (*set_rx_owner)(struct dma_desc *p, int disable_rx_ic);
+ /* Get the receive frame size */
+ int (*get_rx_frame_len)(struct dma_desc *p, int rx_coe_type);
+ /* Return the reception status looking at the RDES1 */
+ int (*rx_status)(void *data, struct stmmac_extra_stats *x,
+ struct dma_desc *p);
+ void (*rx_extended_status)(void *data, struct stmmac_extra_stats *x,
+ struct dma_extended_desc *p);
+ /* Set tx timestamp enable bit */
+ void (*enable_tx_timestamp) (struct dma_desc *p);
+ /* get tx timestamp status */
+ int (*get_tx_timestamp_status) (struct dma_desc *p);
+ /* get timestamp value */
+ void (*get_timestamp)(void *desc, u32 ats, u64 *ts);
+ /* get rx timestamp status */
+ int (*get_rx_timestamp_status)(void *desc, void *next_desc, u32 ats);
+ /* Display ring */
+ void (*display_ring)(void *head, unsigned int size, bool rx);
+ /* set MSS via context descriptor */
+ void (*set_mss)(struct dma_desc *p, unsigned int mss);
+ /* get descriptor skbuff address */
+ void (*get_addr)(struct dma_desc *p, unsigned int *addr);
+ /* set descriptor skbuff address */
+ void (*set_addr)(struct dma_desc *p, dma_addr_t addr);
+ /* clear descriptor */
+ void (*clear)(struct dma_desc *p);
+};
+
+#define stmmac_init_rx_desc(__priv, __args...) \
+ stmmac_do_void_callback(__priv, desc, init_rx_desc, __args)
+#define stmmac_init_tx_desc(__priv, __args...) \
+ stmmac_do_void_callback(__priv, desc, init_tx_desc, __args)
+#define stmmac_prepare_tx_desc(__priv, __args...) \
+ stmmac_do_void_callback(__priv, desc, prepare_tx_desc, __args)
+#define stmmac_prepare_tso_tx_desc(__priv, __args...) \
+ stmmac_do_void_callback(__priv, desc, prepare_tso_tx_desc, __args)
+#define stmmac_set_tx_owner(__priv, __args...) \
+ stmmac_do_void_callback(__priv, desc, set_tx_owner, __args)
+#define stmmac_get_tx_owner(__priv, __args...) \
+ stmmac_do_callback(__priv, desc, get_tx_owner, __args)
+#define stmmac_release_tx_desc(__priv, __args...) \
+ stmmac_do_void_callback(__priv, desc, release_tx_desc, __args)
+#define stmmac_set_tx_ic(__priv, __args...) \
+ stmmac_do_void_callback(__priv, desc, set_tx_ic, __args)
+#define stmmac_get_tx_ls(__priv, __args...) \
+ stmmac_do_callback(__priv, desc, get_tx_ls, __args)
+#define stmmac_tx_status(__priv, __args...) \
+ stmmac_do_callback(__priv, desc, tx_status, __args)
+#define stmmac_get_tx_len(__priv, __args...) \
+ stmmac_do_callback(__priv, desc, get_tx_len, __args)
+#define stmmac_set_rx_owner(__priv, __args...) \
+ stmmac_do_void_callback(__priv, desc, set_rx_owner, __args)
+#define stmmac_get_rx_frame_len(__priv, __args...) \
+ stmmac_do_callback(__priv, desc, get_rx_frame_len, __args)
+#define stmmac_rx_status(__priv, __args...) \
+ stmmac_do_callback(__priv, desc, rx_status, __args)
+#define stmmac_rx_extended_status(__priv, __args...) \
+ stmmac_do_void_callback(__priv, desc, rx_extended_status, __args)
+#define stmmac_enable_tx_timestamp(__priv, __args...) \
+ stmmac_do_void_callback(__priv, desc, enable_tx_timestamp, __args)
+#define stmmac_get_tx_timestamp_status(__priv, __args...) \
+ stmmac_do_callback(__priv, desc, get_tx_timestamp_status, __args)
+#define stmmac_get_timestamp(__priv, __args...) \
+ stmmac_do_void_callback(__priv, desc, get_timestamp, __args)
+#define stmmac_get_rx_timestamp_status(__priv, __args...) \
+ stmmac_do_callback(__priv, desc, get_rx_timestamp_status, __args)
+#define stmmac_display_ring(__priv, __args...) \
+ stmmac_do_void_callback(__priv, desc, display_ring, __args)
+#define stmmac_set_mss(__priv, __args...) \
+ stmmac_do_void_callback(__priv, desc, set_mss, __args)
+#define stmmac_get_desc_addr(__priv, __args...) \
+ stmmac_do_void_callback(__priv, desc, get_addr, __args)
+#define stmmac_set_desc_addr(__priv, __args...) \
+ stmmac_do_void_callback(__priv, desc, set_addr, __args)
+#define stmmac_clear_desc(__priv, __args...) \
+ stmmac_do_void_callback(__priv, desc, clear, __args)
+
+struct stmmac_dma_cfg;
+struct dma_features;
+
+/* Specific DMA helpers */
+struct stmmac_dma_ops {
+ /* DMA core initialization */
+ int (*reset)(void __iomem *ioaddr);
+ void (*init)(void __iomem *ioaddr, struct stmmac_dma_cfg *dma_cfg,
+ int atds);
+ void (*init_chan)(void __iomem *ioaddr,
+ struct stmmac_dma_cfg *dma_cfg, u32 chan);
+ void (*init_rx_chan)(void __iomem *ioaddr,
+ struct stmmac_dma_cfg *dma_cfg,
+ u32 dma_rx_phy, u32 chan);
+ void (*init_tx_chan)(void __iomem *ioaddr,
+ struct stmmac_dma_cfg *dma_cfg,
+ u32 dma_tx_phy, u32 chan);
+ /* Configure the AXI Bus Mode Register */
+ void (*axi)(void __iomem *ioaddr, struct stmmac_axi *axi);
+ /* Dump DMA registers */
+ void (*dump_regs)(void __iomem *ioaddr, u32 *reg_space);
+ void (*dma_rx_mode)(void __iomem *ioaddr, int mode, u32 channel,
+ int fifosz, u8 qmode);
+ void (*dma_tx_mode)(void __iomem *ioaddr, int mode, u32 channel,
+ int fifosz, u8 qmode);
+ /* To track extra statistic (if supported) */
+ void (*dma_diagnostic_fr) (void *data, struct stmmac_extra_stats *x,
+ void __iomem *ioaddr);
+ void (*enable_dma_transmission) (void __iomem *ioaddr);
+ void (*enable_dma_irq)(void __iomem *ioaddr, u32 chan);
+ void (*disable_dma_irq)(void __iomem *ioaddr, u32 chan);
+ void (*start_tx)(void __iomem *ioaddr, u32 chan);
+ void (*stop_tx)(void __iomem *ioaddr, u32 chan);
+ void (*start_rx)(void __iomem *ioaddr, u32 chan);
+ void (*stop_rx)(void __iomem *ioaddr, u32 chan);
+ int (*dma_interrupt) (void __iomem *ioaddr,
+ struct stmmac_extra_stats *x, u32 chan);
+ /* If supported then get the optional core features */
+ void (*get_hw_feature)(void __iomem *ioaddr,
+ struct dma_features *dma_cap);
+ /* Program the HW RX Watchdog */
+ void (*rx_watchdog)(void __iomem *ioaddr, u32 riwt, u32 number_chan);
+ void (*set_tx_ring_len)(void __iomem *ioaddr, u32 len, u32 chan);
+ void (*set_rx_ring_len)(void __iomem *ioaddr, u32 len, u32 chan);
+ void (*set_rx_tail_ptr)(void __iomem *ioaddr, u32 tail_ptr, u32 chan);
+ void (*set_tx_tail_ptr)(void __iomem *ioaddr, u32 tail_ptr, u32 chan);
+ void (*enable_tso)(void __iomem *ioaddr, bool en, u32 chan);
+};
+
+#define stmmac_reset(__priv, __args...) \
+ stmmac_do_callback(__priv, dma, reset, __args)
+#define stmmac_dma_init(__priv, __args...) \
+ stmmac_do_void_callback(__priv, dma, init, __args)
+#define stmmac_init_chan(__priv, __args...) \
+ stmmac_do_void_callback(__priv, dma, init_chan, __args)
+#define stmmac_init_rx_chan(__priv, __args...) \
+ stmmac_do_void_callback(__priv, dma, init_rx_chan, __args)
+#define stmmac_init_tx_chan(__priv, __args...) \
+ stmmac_do_void_callback(__priv, dma, init_tx_chan, __args)
+#define stmmac_axi(__priv, __args...) \
+ stmmac_do_void_callback(__priv, dma, axi, __args)
+#define stmmac_dump_dma_regs(__priv, __args...) \
+ stmmac_do_void_callback(__priv, dma, dump_regs, __args)
+#define stmmac_dma_rx_mode(__priv, __args...) \
+ stmmac_do_void_callback(__priv, dma, dma_rx_mode, __args)
+#define stmmac_dma_tx_mode(__priv, __args...) \
+ stmmac_do_void_callback(__priv, dma, dma_tx_mode, __args)
+#define stmmac_dma_diagnostic_fr(__priv, __args...) \
+ stmmac_do_void_callback(__priv, dma, dma_diagnostic_fr, __args)
+#define stmmac_enable_dma_transmission(__priv, __args...) \
+ stmmac_do_void_callback(__priv, dma, enable_dma_transmission, __args)
+#define stmmac_enable_dma_irq(__priv, __args...) \
+ stmmac_do_void_callback(__priv, dma, enable_dma_irq, __args)
+#define stmmac_disable_dma_irq(__priv, __args...) \
+ stmmac_do_void_callback(__priv, dma, disable_dma_irq, __args)
+#define stmmac_start_tx(__priv, __args...) \
+ stmmac_do_void_callback(__priv, dma, start_tx, __args)
+#define stmmac_stop_tx(__priv, __args...) \
+ stmmac_do_void_callback(__priv, dma, stop_tx, __args)
+#define stmmac_start_rx(__priv, __args...) \
+ stmmac_do_void_callback(__priv, dma, start_rx, __args)
+#define stmmac_stop_rx(__priv, __args...) \
+ stmmac_do_void_callback(__priv, dma, stop_rx, __args)
+#define stmmac_dma_interrupt_status(__priv, __args...) \
+ stmmac_do_callback(__priv, dma, dma_interrupt, __args)
+#define stmmac_get_hw_feature(__priv, __args...) \
+ stmmac_do_void_callback(__priv, dma, get_hw_feature, __args)
+#define stmmac_rx_watchdog(__priv, __args...) \
+ stmmac_do_void_callback(__priv, dma, rx_watchdog, __args)
+#define stmmac_set_tx_ring_len(__priv, __args...) \
+ stmmac_do_void_callback(__priv, dma, set_tx_ring_len, __args)
+#define stmmac_set_rx_ring_len(__priv, __args...) \
+ stmmac_do_void_callback(__priv, dma, set_rx_ring_len, __args)
+#define stmmac_set_rx_tail_ptr(__priv, __args...) \
+ stmmac_do_void_callback(__priv, dma, set_rx_tail_ptr, __args)
+#define stmmac_set_tx_tail_ptr(__priv, __args...) \
+ stmmac_do_void_callback(__priv, dma, set_tx_tail_ptr, __args)
+#define stmmac_enable_tso(__priv, __args...) \
+ stmmac_do_void_callback(__priv, dma, enable_tso, __args)
+
+struct mac_device_info;
+struct net_device;
+struct rgmii_adv;
+struct stmmac_safety_stats;
+struct stmmac_tc_entry;
+
+/* Helpers to program the MAC core */
+struct stmmac_ops {
+ /* MAC core initialization */
+ void (*core_init)(struct mac_device_info *hw, struct net_device *dev);
+ /* Enable the MAC RX/TX */
+ void (*set_mac)(void __iomem *ioaddr, bool enable);
+ /* Enable and verify that the IPC module is supported */
+ int (*rx_ipc)(struct mac_device_info *hw);
+ /* Enable RX Queues */
+ void (*rx_queue_enable)(struct mac_device_info *hw, u8 mode, u32 queue);
+ /* RX Queues Priority */
+ void (*rx_queue_prio)(struct mac_device_info *hw, u32 prio, u32 queue);
+ /* TX Queues Priority */
+ void (*tx_queue_prio)(struct mac_device_info *hw, u32 prio, u32 queue);
+ /* RX Queues Routing */
+ void (*rx_queue_routing)(struct mac_device_info *hw, u8 packet,
+ u32 queue);
+ /* Program RX Algorithms */
+ void (*prog_mtl_rx_algorithms)(struct mac_device_info *hw, u32 rx_alg);
+ /* Program TX Algorithms */
+ void (*prog_mtl_tx_algorithms)(struct mac_device_info *hw, u32 tx_alg);
+ /* Set MTL TX queues weight */
+ void (*set_mtl_tx_queue_weight)(struct mac_device_info *hw,
+ u32 weight, u32 queue);
+ /* RX MTL queue to RX dma mapping */
+ void (*map_mtl_to_dma)(struct mac_device_info *hw, u32 queue, u32 chan);
+ /* Configure AV Algorithm */
+ void (*config_cbs)(struct mac_device_info *hw, u32 send_slope,
+ u32 idle_slope, u32 high_credit, u32 low_credit,
+ u32 queue);
+ /* Dump MAC registers */
+ void (*dump_regs)(struct mac_device_info *hw, u32 *reg_space);
+ /* Handle extra events on specific interrupts hw dependent */
+ int (*host_irq_status)(struct mac_device_info *hw,
+ struct stmmac_extra_stats *x);
+ /* Handle MTL interrupts */
+ int (*host_mtl_irq_status)(struct mac_device_info *hw, u32 chan);
+ /* Multicast filter setting */
+ void (*set_filter)(struct mac_device_info *hw, struct net_device *dev);
+ /* Flow control setting */
+ void (*flow_ctrl)(struct mac_device_info *hw, unsigned int duplex,
+ unsigned int fc, unsigned int pause_time, u32 tx_cnt);
+ /* Set power management mode (e.g. magic frame) */
+ void (*pmt)(struct mac_device_info *hw, unsigned long mode);
+ /* Set/Get Unicast MAC addresses */
+ void (*set_umac_addr)(struct mac_device_info *hw, unsigned char *addr,
+ unsigned int reg_n);
+ void (*get_umac_addr)(struct mac_device_info *hw, unsigned char *addr,
+ unsigned int reg_n);
+ void (*set_eee_mode)(struct mac_device_info *hw,
+ bool en_tx_lpi_clockgating);
+ void (*reset_eee_mode)(struct mac_device_info *hw);
+ void (*set_eee_timer)(struct mac_device_info *hw, int ls, int tw);
+ void (*set_eee_pls)(struct mac_device_info *hw, int link);
+ void (*debug)(void __iomem *ioaddr, struct stmmac_extra_stats *x,
+ u32 rx_queues, u32 tx_queues);
+ /* PCS calls */
+ void (*pcs_ctrl_ane)(void __iomem *ioaddr, bool ane, bool srgmi_ral,
+ bool loopback);
+ void (*pcs_rane)(void __iomem *ioaddr, bool restart);
+ void (*pcs_get_adv_lp)(void __iomem *ioaddr, struct rgmii_adv *adv);
+ /* Safety Features */
+ int (*safety_feat_config)(void __iomem *ioaddr, unsigned int asp);
+ int (*safety_feat_irq_status)(struct net_device *ndev,
+ void __iomem *ioaddr, unsigned int asp,
+ struct stmmac_safety_stats *stats);
+ int (*safety_feat_dump)(struct stmmac_safety_stats *stats,
+ int index, unsigned long *count, const char **desc);
+ /* Flexible RX Parser */
+ int (*rxp_config)(void __iomem *ioaddr, struct stmmac_tc_entry *entries,
+ unsigned int count);
+};
+
+#define stmmac_core_init(__priv, __args...) \
+ stmmac_do_void_callback(__priv, mac, core_init, __args)
+#define stmmac_mac_set(__priv, __args...) \
+ stmmac_do_void_callback(__priv, mac, set_mac, __args)
+#define stmmac_rx_ipc(__priv, __args...) \
+ stmmac_do_callback(__priv, mac, rx_ipc, __args)
+#define stmmac_rx_queue_enable(__priv, __args...) \
+ stmmac_do_void_callback(__priv, mac, rx_queue_enable, __args)
+#define stmmac_rx_queue_prio(__priv, __args...) \
+ stmmac_do_void_callback(__priv, mac, rx_queue_prio, __args)
+#define stmmac_tx_queue_prio(__priv, __args...) \
+ stmmac_do_void_callback(__priv, mac, tx_queue_prio, __args)
+#define stmmac_rx_queue_routing(__priv, __args...) \
+ stmmac_do_void_callback(__priv, mac, rx_queue_routing, __args)
+#define stmmac_prog_mtl_rx_algorithms(__priv, __args...) \
+ stmmac_do_void_callback(__priv, mac, prog_mtl_rx_algorithms, __args)
+#define stmmac_prog_mtl_tx_algorithms(__priv, __args...) \
+ stmmac_do_void_callback(__priv, mac, prog_mtl_tx_algorithms, __args)
+#define stmmac_set_mtl_tx_queue_weight(__priv, __args...) \
+ stmmac_do_void_callback(__priv, mac, set_mtl_tx_queue_weight, __args)
+#define stmmac_map_mtl_to_dma(__priv, __args...) \
+ stmmac_do_void_callback(__priv, mac, map_mtl_to_dma, __args)
+#define stmmac_config_cbs(__priv, __args...) \
+ stmmac_do_void_callback(__priv, mac, config_cbs, __args)
+#define stmmac_dump_mac_regs(__priv, __args...) \
+ stmmac_do_void_callback(__priv, mac, dump_regs, __args)
+#define stmmac_host_irq_status(__priv, __args...) \
+ stmmac_do_callback(__priv, mac, host_irq_status, __args)
+#define stmmac_host_mtl_irq_status(__priv, __args...) \
+ stmmac_do_callback(__priv, mac, host_mtl_irq_status, __args)
+#define stmmac_set_filter(__priv, __args...) \
+ stmmac_do_void_callback(__priv, mac, set_filter, __args)
+#define stmmac_flow_ctrl(__priv, __args...) \
+ stmmac_do_void_callback(__priv, mac, flow_ctrl, __args)
+#define stmmac_pmt(__priv, __args...) \
+ stmmac_do_void_callback(__priv, mac, pmt, __args)
+#define stmmac_set_umac_addr(__priv, __args...) \
+ stmmac_do_void_callback(__priv, mac, set_umac_addr, __args)
+#define stmmac_get_umac_addr(__priv, __args...) \
+ stmmac_do_void_callback(__priv, mac, get_umac_addr, __args)
+#define stmmac_set_eee_mode(__priv, __args...) \
+ stmmac_do_void_callback(__priv, mac, set_eee_mode, __args)
+#define stmmac_reset_eee_mode(__priv, __args...) \
+ stmmac_do_void_callback(__priv, mac, reset_eee_mode, __args)
+#define stmmac_set_eee_timer(__priv, __args...) \
+ stmmac_do_void_callback(__priv, mac, set_eee_timer, __args)
+#define stmmac_set_eee_pls(__priv, __args...) \
+ stmmac_do_void_callback(__priv, mac, set_eee_pls, __args)
+#define stmmac_mac_debug(__priv, __args...) \
+ stmmac_do_void_callback(__priv, mac, debug, __args)
+#define stmmac_pcs_ctrl_ane(__priv, __args...) \
+ stmmac_do_void_callback(__priv, mac, pcs_ctrl_ane, __args)
+#define stmmac_pcs_rane(__priv, __args...) \
+ stmmac_do_void_callback(__priv, mac, pcs_rane, __args)
+#define stmmac_pcs_get_adv_lp(__priv, __args...) \
+ stmmac_do_void_callback(__priv, mac, pcs_get_adv_lp, __args)
+#define stmmac_safety_feat_config(__priv, __args...) \
+ stmmac_do_callback(__priv, mac, safety_feat_config, __args)
+#define stmmac_safety_feat_irq_status(__priv, __args...) \
+ stmmac_do_callback(__priv, mac, safety_feat_irq_status, __args)
+#define stmmac_safety_feat_dump(__priv, __args...) \
+ stmmac_do_callback(__priv, mac, safety_feat_dump, __args)
+#define stmmac_rxp_config(__priv, __args...) \
+ stmmac_do_callback(__priv, mac, rxp_config, __args)
+
+/* PTP and HW Timer helpers */
+struct stmmac_hwtimestamp {
+ void (*config_hw_tstamping) (void __iomem *ioaddr, u32 data);
+ void (*config_sub_second_increment)(void __iomem *ioaddr, u32 ptp_clock,
+ int gmac4, u32 *ssinc);
+ int (*init_systime) (void __iomem *ioaddr, u32 sec, u32 nsec);
+ int (*config_addend) (void __iomem *ioaddr, u32 addend);
+ int (*adjust_systime) (void __iomem *ioaddr, u32 sec, u32 nsec,
+ int add_sub, int gmac4);
+ void (*get_systime) (void __iomem *ioaddr, u64 *systime);
+};
+
+#define stmmac_config_hw_tstamping(__priv, __args...) \
+ stmmac_do_void_callback(__priv, ptp, config_hw_tstamping, __args)
+#define stmmac_config_sub_second_increment(__priv, __args...) \
+ stmmac_do_void_callback(__priv, ptp, config_sub_second_increment, __args)
+#define stmmac_init_systime(__priv, __args...) \
+ stmmac_do_callback(__priv, ptp, init_systime, __args)
+#define stmmac_config_addend(__priv, __args...) \
+ stmmac_do_callback(__priv, ptp, config_addend, __args)
+#define stmmac_adjust_systime(__priv, __args...) \
+ stmmac_do_callback(__priv, ptp, adjust_systime, __args)
+#define stmmac_get_systime(__priv, __args...) \
+ stmmac_do_void_callback(__priv, ptp, get_systime, __args)
+
+/* Helpers to manage the descriptors for chain and ring modes */
+struct stmmac_mode_ops {
+ void (*init) (void *des, dma_addr_t phy_addr, unsigned int size,
+ unsigned int extend_desc);
+ unsigned int (*is_jumbo_frm) (int len, int ehn_desc);
+ int (*jumbo_frm)(void *priv, struct sk_buff *skb, int csum);
+ int (*set_16kib_bfsize)(int mtu);
+ void (*init_desc3)(struct dma_desc *p);
+ void (*refill_desc3) (void *priv, struct dma_desc *p);
+ void (*clean_desc3) (void *priv, struct dma_desc *p);
+};
+
+#define stmmac_mode_init(__priv, __args...) \
+ stmmac_do_void_callback(__priv, mode, init, __args)
+#define stmmac_is_jumbo_frm(__priv, __args...) \
+ stmmac_do_callback(__priv, mode, is_jumbo_frm, __args)
+#define stmmac_jumbo_frm(__priv, __args...) \
+ stmmac_do_callback(__priv, mode, jumbo_frm, __args)
+#define stmmac_set_16kib_bfsize(__priv, __args...) \
+ stmmac_do_callback(__priv, mode, set_16kib_bfsize, __args)
+#define stmmac_init_desc3(__priv, __args...) \
+ stmmac_do_void_callback(__priv, mode, init_desc3, __args)
+#define stmmac_refill_desc3(__priv, __args...) \
+ stmmac_do_void_callback(__priv, mode, refill_desc3, __args)
+#define stmmac_clean_desc3(__priv, __args...) \
+ stmmac_do_void_callback(__priv, mode, clean_desc3, __args)
+
+struct stmmac_priv;
+struct tc_cls_u32_offload;
+
+struct stmmac_tc_ops {
+ int (*init)(struct stmmac_priv *priv);
+ int (*setup_cls_u32)(struct stmmac_priv *priv,
+ struct tc_cls_u32_offload *cls);
+};
+
+#define stmmac_tc_init(__priv, __args...) \
+ stmmac_do_callback(__priv, tc, init, __args)
+#define stmmac_tc_setup_cls_u32(__priv, __args...) \
+ stmmac_do_callback(__priv, tc, setup_cls_u32, __args)
+
+struct stmmac_regs_off {
+ u32 ptp_off;
+ u32 mmc_off;
+};
+
+extern const struct stmmac_ops dwmac100_ops;
+extern const struct stmmac_dma_ops dwmac100_dma_ops;
+extern const struct stmmac_ops dwmac1000_ops;
+extern const struct stmmac_dma_ops dwmac1000_dma_ops;
+extern const struct stmmac_ops dwmac4_ops;
+extern const struct stmmac_dma_ops dwmac4_dma_ops;
+extern const struct stmmac_ops dwmac410_ops;
+extern const struct stmmac_dma_ops dwmac410_dma_ops;
+extern const struct stmmac_ops dwmac510_ops;
+extern const struct stmmac_tc_ops dwmac510_tc_ops;
+
+#define GMAC_VERSION 0x00000020 /* GMAC CORE Version */
+#define GMAC4_VERSION 0x00000110 /* GMAC4+ CORE Version */
+
+int stmmac_hwif_init(struct stmmac_priv *priv);
+
+#endif /* __STMMAC_HWIF_H__ */
diff --git a/drivers/net/ethernet/stmicro/stmmac/norm_desc.c b/drivers/net/ethernet/stmicro/stmmac/norm_desc.c
index ebd9e5e..de65bb2 100644
--- a/drivers/net/ethernet/stmicro/stmmac/norm_desc.c
+++ b/drivers/net/ethernet/stmicro/stmmac/norm_desc.c
@@ -168,7 +168,7 @@
p->des0 |= cpu_to_le32(TDES0_OWN);
}
-static void ndesc_set_rx_owner(struct dma_desc *p)
+static void ndesc_set_rx_owner(struct dma_desc *p, int disable_rx_ic)
{
p->des0 |= cpu_to_le32(RDES0_OWN);
}
@@ -253,7 +253,7 @@
return (le32_to_cpu(p->des0) & TDES0_TIME_STAMP_STATUS) >> 17;
}
-static u64 ndesc_get_timestamp(void *desc, u32 ats)
+static void ndesc_get_timestamp(void *desc, u32 ats, u64 *ts)
{
struct dma_desc *p = (struct dma_desc *)desc;
u64 ns;
@@ -262,7 +262,7 @@
/* convert high/sec time stamp value to nanosecond */
ns += le32_to_cpu(p->des3) * 1000000000ULL;
- return ns;
+ *ts = ns;
}
static int ndesc_get_rx_timestamp_status(void *desc, void *next_desc, u32 ats)
@@ -297,6 +297,21 @@
pr_info("\n");
}
+static void ndesc_get_addr(struct dma_desc *p, unsigned int *addr)
+{
+ *addr = le32_to_cpu(p->des2);
+}
+
+static void ndesc_set_addr(struct dma_desc *p, dma_addr_t addr)
+{
+ p->des2 = cpu_to_le32(addr);
+}
+
+static void ndesc_clear(struct dma_desc *p)
+{
+ p->des2 = 0;
+}
+
const struct stmmac_desc_ops ndesc_ops = {
.tx_status = ndesc_get_tx_status,
.rx_status = ndesc_get_rx_status,
@@ -316,4 +331,7 @@
.get_timestamp = ndesc_get_timestamp,
.get_rx_timestamp_status = ndesc_get_rx_timestamp_status,
.display_ring = ndesc_display_ring,
+ .get_addr = ndesc_get_addr,
+ .set_addr = ndesc_set_addr,
+ .clear = ndesc_clear,
};
diff --git a/drivers/net/ethernet/stmicro/stmmac/ring_mode.c b/drivers/net/ethernet/stmicro/stmmac/ring_mode.c
index 28e4b5d..a7ffc73 100644
--- a/drivers/net/ethernet/stmicro/stmmac/ring_mode.c
+++ b/drivers/net/ethernet/stmicro/stmmac/ring_mode.c
@@ -24,7 +24,7 @@
#include "stmmac.h"
-static int stmmac_jumbo_frm(void *p, struct sk_buff *skb, int csum)
+static int jumbo_frm(void *p, struct sk_buff *skb, int csum)
{
struct stmmac_tx_queue *tx_q = (struct stmmac_tx_queue *)p;
unsigned int nopaged_len = skb_headlen(skb);
@@ -58,9 +58,8 @@
tx_q->tx_skbuff_dma[entry].is_jumbo = true;
desc->des3 = cpu_to_le32(des2 + BUF_SIZE_4KiB);
- priv->hw->desc->prepare_tx_desc(desc, 1, bmax, csum,
- STMMAC_RING_MODE, 0,
- false, skb->len);
+ stmmac_prepare_tx_desc(priv, desc, 1, bmax, csum,
+ STMMAC_RING_MODE, 0, false, skb->len);
tx_q->tx_skbuff[entry] = NULL;
entry = STMMAC_GET_ENTRY(entry, DMA_TX_SIZE);
@@ -79,9 +78,8 @@
tx_q->tx_skbuff_dma[entry].is_jumbo = true;
desc->des3 = cpu_to_le32(des2 + BUF_SIZE_4KiB);
- priv->hw->desc->prepare_tx_desc(desc, 0, len, csum,
- STMMAC_RING_MODE, 1,
- true, skb->len);
+ stmmac_prepare_tx_desc(priv, desc, 0, len, csum,
+ STMMAC_RING_MODE, 1, true, skb->len);
} else {
des2 = dma_map_single(priv->device, skb->data,
nopaged_len, DMA_TO_DEVICE);
@@ -92,9 +90,8 @@
tx_q->tx_skbuff_dma[entry].len = nopaged_len;
tx_q->tx_skbuff_dma[entry].is_jumbo = true;
desc->des3 = cpu_to_le32(des2 + BUF_SIZE_4KiB);
- priv->hw->desc->prepare_tx_desc(desc, 1, nopaged_len, csum,
- STMMAC_RING_MODE, 0,
- true, skb->len);
+ stmmac_prepare_tx_desc(priv, desc, 1, nopaged_len, csum,
+ STMMAC_RING_MODE, 0, true, skb->len);
}
tx_q->cur_tx = entry;
@@ -102,7 +99,7 @@
return entry;
}
-static unsigned int stmmac_is_jumbo_frm(int len, int enh_desc)
+static unsigned int is_jumbo_frm(int len, int enh_desc)
{
unsigned int ret = 0;
@@ -112,7 +109,7 @@
return ret;
}
-static void stmmac_refill_desc3(void *priv_ptr, struct dma_desc *p)
+static void refill_desc3(void *priv_ptr, struct dma_desc *p)
{
struct stmmac_priv *priv = (struct stmmac_priv *)priv_ptr;
@@ -122,12 +119,12 @@
}
/* In ring mode we need to fill the desc3 because it is used as buffer */
-static void stmmac_init_desc3(struct dma_desc *p)
+static void init_desc3(struct dma_desc *p)
{
p->des3 = cpu_to_le32(le32_to_cpu(p->des2) + BUF_SIZE_8KiB);
}
-static void stmmac_clean_desc3(void *priv_ptr, struct dma_desc *p)
+static void clean_desc3(void *priv_ptr, struct dma_desc *p)
{
struct stmmac_tx_queue *tx_q = (struct stmmac_tx_queue *)priv_ptr;
struct stmmac_priv *priv = tx_q->priv_data;
@@ -140,7 +137,7 @@
p->des3 = 0;
}
-static int stmmac_set_16kib_bfsize(int mtu)
+static int set_16kib_bfsize(int mtu)
{
int ret = 0;
if (unlikely(mtu >= BUF_SIZE_8KiB))
@@ -149,10 +146,10 @@
}
const struct stmmac_mode_ops ring_mode_ops = {
- .is_jumbo_frm = stmmac_is_jumbo_frm,
- .jumbo_frm = stmmac_jumbo_frm,
- .refill_desc3 = stmmac_refill_desc3,
- .init_desc3 = stmmac_init_desc3,
- .clean_desc3 = stmmac_clean_desc3,
- .set_16kib_bfsize = stmmac_set_16kib_bfsize,
+ .is_jumbo_frm = is_jumbo_frm,
+ .jumbo_frm = jumbo_frm,
+ .refill_desc3 = refill_desc3,
+ .init_desc3 = init_desc3,
+ .clean_desc3 = clean_desc3,
+ .set_16kib_bfsize = set_16kib_bfsize,
};
diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac.h b/drivers/net/ethernet/stmicro/stmmac/stmmac.h
index da50451..4d425b1 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac.h
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac.h
@@ -76,11 +76,36 @@
struct napi_struct napi ____cacheline_aligned_in_smp;
};
+struct stmmac_tc_entry {
+ bool in_use;
+ bool in_hw;
+ bool is_last;
+ bool is_frag;
+ void *frag_ptr;
+ unsigned int table_pos;
+ u32 handle;
+ u32 prio;
+ struct {
+ u32 match_data;
+ u32 match_en;
+ u8 af:1;
+ u8 rf:1;
+ u8 im:1;
+ u8 nc:1;
+ u8 res1:4;
+ u8 frame_offset;
+ u8 ok_index;
+ u8 dma_ch_no;
+ u32 res2;
+ } __packed val;
+};
+
struct stmmac_priv {
/* Frequently used values are kept adjacent for cache effect */
u32 tx_count_frames;
u32 tx_coal_frames;
u32 tx_coal_timer;
+ bool tx_timer_armed;
int tx_coalesce;
int hwts_tx_en;
@@ -130,6 +155,7 @@
int eee_active;
int tx_lpi_timer;
unsigned int mode;
+ unsigned int chain_mode;
int extend_desc;
struct ptp_clock *ptp_clock;
struct ptp_clock_info ptp_clock_ops;
@@ -150,6 +176,11 @@
unsigned long state;
struct workqueue_struct *wq;
struct work_struct service_task;
+
+ /* TC Handling */
+ unsigned int tc_entries_max;
+ unsigned int tc_off_max;
+ struct stmmac_tc_entry *tc_entries;
};
enum stmmac_state {
diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_ethtool.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_ethtool.c
index 2c6ed47..6d82b3e 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_ethtool.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_ethtool.c
@@ -291,11 +291,9 @@
cmd->base.speed = priv->xstats.pcs_speed;
/* Get and convert ADV/LP_ADV from the HW AN registers */
- if (!priv->hw->mac->pcs_get_adv_lp)
+ if (stmmac_pcs_get_adv_lp(priv, priv->ioaddr, &adv))
return -EOPNOTSUPP; /* should never happen indeed */
- priv->hw->mac->pcs_get_adv_lp(priv->ioaddr, &adv);
-
/* Encoding of PSE bits is defined in 802.3z, 37.2.1.4 */
ethtool_convert_link_mode_to_legacy_u32(
@@ -393,11 +391,7 @@
ADVERTISED_10baseT_Full);
spin_lock(&priv->lock);
-
- if (priv->hw->mac->pcs_ctrl_ane)
- priv->hw->mac->pcs_ctrl_ane(priv->ioaddr, 1,
- priv->hw->ps, 0);
-
+ stmmac_pcs_ctrl_ane(priv, priv->ioaddr, 1, priv->hw->ps, 0);
spin_unlock(&priv->lock);
return 0;
@@ -442,8 +436,8 @@
memset(reg_space, 0x0, REG_SPACE_SIZE);
- priv->hw->mac->dump_regs(priv->hw, reg_space);
- priv->hw->dma->dump_regs(priv->ioaddr, reg_space);
+ stmmac_dump_mac_regs(priv, priv->hw, reg_space);
+ stmmac_dump_dma_regs(priv, priv->ioaddr, reg_space);
/* Copy DMA registers to where ethtool expects them */
memcpy(®_space[ETHTOOL_DMA_OFFSET], ®_space[DMA_BUS_MODE / 4],
NUM_DWMAC1000_DMA_REGS * 4);
@@ -454,15 +448,13 @@
struct ethtool_pauseparam *pause)
{
struct stmmac_priv *priv = netdev_priv(netdev);
+ struct rgmii_adv adv_lp;
pause->rx_pause = 0;
pause->tx_pause = 0;
- if (priv->hw->pcs && priv->hw->mac->pcs_get_adv_lp) {
- struct rgmii_adv adv_lp;
-
+ if (priv->hw->pcs && !stmmac_pcs_get_adv_lp(priv, priv->ioaddr, &adv_lp)) {
pause->autoneg = 1;
- priv->hw->mac->pcs_get_adv_lp(priv->ioaddr, &adv_lp);
if (!adv_lp.pause)
return;
} else {
@@ -488,12 +480,10 @@
u32 tx_cnt = priv->plat->tx_queues_to_use;
struct phy_device *phy = netdev->phydev;
int new_pause = FLOW_OFF;
+ struct rgmii_adv adv_lp;
- if (priv->hw->pcs && priv->hw->mac->pcs_get_adv_lp) {
- struct rgmii_adv adv_lp;
-
+ if (priv->hw->pcs && !stmmac_pcs_get_adv_lp(priv, priv->ioaddr, &adv_lp)) {
pause->autoneg = 1;
- priv->hw->mac->pcs_get_adv_lp(priv->ioaddr, &adv_lp);
if (!adv_lp.pause)
return -EOPNOTSUPP;
} else {
@@ -515,37 +505,32 @@
return phy_start_aneg(phy);
}
- priv->hw->mac->flow_ctrl(priv->hw, phy->duplex, priv->flow_ctrl,
- priv->pause, tx_cnt);
+ stmmac_flow_ctrl(priv, priv->hw, phy->duplex, priv->flow_ctrl,
+ priv->pause, tx_cnt);
return 0;
}
static void stmmac_get_ethtool_stats(struct net_device *dev,
struct ethtool_stats *dummy, u64 *data)
{
- const char *(*dump)(struct stmmac_safety_stats *stats, int index,
- unsigned long *count);
struct stmmac_priv *priv = netdev_priv(dev);
u32 rx_queues_count = priv->plat->rx_queues_to_use;
u32 tx_queues_count = priv->plat->tx_queues_to_use;
unsigned long count;
- int i, j = 0;
+ int i, j = 0, ret;
- if (priv->dma_cap.asp && priv->hw->mac->safety_feat_dump) {
- dump = priv->hw->mac->safety_feat_dump;
-
+ if (priv->dma_cap.asp) {
for (i = 0; i < STMMAC_SAFETY_FEAT_SIZE; i++) {
- if (dump(&priv->sstats, i, &count))
+ if (!stmmac_safety_feat_dump(priv, &priv->sstats, i,
+ &count, NULL))
data[j++] = count;
}
}
/* Update the DMA HW counters for dwmac10/100 */
- if (priv->hw->dma->dma_diagnostic_fr)
- priv->hw->dma->dma_diagnostic_fr(&dev->stats,
- (void *) &priv->xstats,
- priv->ioaddr);
- else {
+ ret = stmmac_dma_diagnostic_fr(priv, &dev->stats, (void *) &priv->xstats,
+ priv->ioaddr);
+ if (ret) {
/* If supported, for new GMAC chips expose the MMC counters */
if (priv->dma_cap.rmon) {
dwmac_mmc_read(priv->mmcaddr, &priv->mmc);
@@ -565,11 +550,10 @@
priv->xstats.phy_eee_wakeup_error_n = val;
}
- if ((priv->hw->mac->debug) &&
- (priv->synopsys_id >= DWMAC_CORE_3_50))
- priv->hw->mac->debug(priv->ioaddr,
- (void *)&priv->xstats,
- rx_queues_count, tx_queues_count);
+ if (priv->synopsys_id >= DWMAC_CORE_3_50)
+ stmmac_mac_debug(priv, priv->ioaddr,
+ (void *)&priv->xstats,
+ rx_queues_count, tx_queues_count);
}
for (i = 0; i < STMMAC_STATS_LEN; i++) {
char *p = (char *)priv + stmmac_gstrings_stats[i].stat_offset;
@@ -581,8 +565,6 @@
static int stmmac_get_sset_count(struct net_device *netdev, int sset)
{
struct stmmac_priv *priv = netdev_priv(netdev);
- const char *(*dump)(struct stmmac_safety_stats *stats, int index,
- unsigned long *count);
int i, len, safety_len = 0;
switch (sset) {
@@ -591,11 +573,11 @@
if (priv->dma_cap.rmon)
len += STMMAC_MMC_STATS_LEN;
- if (priv->dma_cap.asp && priv->hw->mac->safety_feat_dump) {
- dump = priv->hw->mac->safety_feat_dump;
-
+ if (priv->dma_cap.asp) {
for (i = 0; i < STMMAC_SAFETY_FEAT_SIZE; i++) {
- if (dump(&priv->sstats, i, NULL))
+ if (!stmmac_safety_feat_dump(priv,
+ &priv->sstats, i,
+ NULL, NULL))
safety_len++;
}
@@ -613,17 +595,15 @@
int i;
u8 *p = data;
struct stmmac_priv *priv = netdev_priv(dev);
- const char *(*dump)(struct stmmac_safety_stats *stats, int index,
- unsigned long *count);
switch (stringset) {
case ETH_SS_STATS:
- if (priv->dma_cap.asp && priv->hw->mac->safety_feat_dump) {
- dump = priv->hw->mac->safety_feat_dump;
+ if (priv->dma_cap.asp) {
for (i = 0; i < STMMAC_SAFETY_FEAT_SIZE; i++) {
- const char *desc = dump(&priv->sstats, i, NULL);
-
- if (desc) {
+ const char *desc;
+ if (!stmmac_safety_feat_dump(priv,
+ &priv->sstats, i,
+ NULL, &desc)) {
memcpy(p, desc, ETH_GSTRING_LEN);
p += ETH_GSTRING_LEN;
}
@@ -810,7 +790,7 @@
priv->tx_coal_frames = ec->tx_max_coalesced_frames;
priv->tx_coal_timer = ec->tx_coalesce_usecs;
priv->rx_riwt = rx_riwt;
- priv->hw->dma->rx_watchdog(priv->ioaddr, priv->rx_riwt, rx_cnt);
+ stmmac_rx_watchdog(priv, priv->ioaddr, priv->rx_riwt, rx_cnt);
return 0;
}
diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_hwtstamp.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_hwtstamp.c
index 08c19eb..8d9cc21 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_hwtstamp.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_hwtstamp.c
@@ -24,13 +24,13 @@
#include "common.h"
#include "stmmac_ptp.h"
-static void stmmac_config_hw_tstamping(void __iomem *ioaddr, u32 data)
+static void config_hw_tstamping(void __iomem *ioaddr, u32 data)
{
writel(data, ioaddr + PTP_TCR);
}
-static u32 stmmac_config_sub_second_increment(void __iomem *ioaddr,
- u32 ptp_clock, int gmac4)
+static void config_sub_second_increment(void __iomem *ioaddr,
+ u32 ptp_clock, int gmac4, u32 *ssinc)
{
u32 value = readl(ioaddr + PTP_TCR);
unsigned long data;
@@ -57,10 +57,11 @@
writel(reg_value, ioaddr + PTP_SSIR);
- return data;
+ if (ssinc)
+ *ssinc = data;
}
-static int stmmac_init_systime(void __iomem *ioaddr, u32 sec, u32 nsec)
+static int init_systime(void __iomem *ioaddr, u32 sec, u32 nsec)
{
int limit;
u32 value;
@@ -85,7 +86,7 @@
return 0;
}
-static int stmmac_config_addend(void __iomem *ioaddr, u32 addend)
+static int config_addend(void __iomem *ioaddr, u32 addend)
{
u32 value;
int limit;
@@ -109,8 +110,8 @@
return 0;
}
-static int stmmac_adjust_systime(void __iomem *ioaddr, u32 sec, u32 nsec,
- int add_sub, int gmac4)
+static int adjust_systime(void __iomem *ioaddr, u32 sec, u32 nsec,
+ int add_sub, int gmac4)
{
u32 value;
int limit;
@@ -152,7 +153,7 @@
return 0;
}
-static u64 stmmac_get_systime(void __iomem *ioaddr)
+static void get_systime(void __iomem *ioaddr, u64 *systime)
{
u64 ns;
@@ -161,14 +162,15 @@
/* Get the TSS and convert sec time value to nanosecond */
ns += readl(ioaddr + PTP_STSR) * 1000000000ULL;
- return ns;
+ if (systime)
+ *systime = ns;
}
const struct stmmac_hwtimestamp stmmac_ptp = {
- .config_hw_tstamping = stmmac_config_hw_tstamping,
- .init_systime = stmmac_init_systime,
- .config_sub_second_increment = stmmac_config_sub_second_increment,
- .config_addend = stmmac_config_addend,
- .adjust_systime = stmmac_adjust_systime,
- .get_systime = stmmac_get_systime,
+ .config_hw_tstamping = config_hw_tstamping,
+ .init_systime = init_systime,
+ .config_sub_second_increment = config_sub_second_increment,
+ .config_addend = config_addend,
+ .adjust_systime = adjust_systime,
+ .get_systime = get_systime,
};
diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
index b65e2d1..c32de53 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
@@ -45,11 +45,13 @@
#include <linux/seq_file.h>
#endif /* CONFIG_DEBUG_FS */
#include <linux/net_tstamp.h>
+#include <net/pkt_cls.h>
#include "stmmac_ptp.h"
#include "stmmac.h"
#include <linux/reset.h>
#include <linux/of_mdio.h>
#include "dwmac1000.h"
+#include "hwif.h"
#define STMMAC_ALIGN(x) L1_CACHE_ALIGN(x)
#define TSO_MAX_BUFF_SIZE (SZ_16K - 1)
@@ -335,8 +337,8 @@
/* Check and enter in LPI mode */
if (!priv->tx_path_in_lpi_mode)
- priv->hw->mac->set_eee_mode(priv->hw,
- priv->plat->en_tx_lpi_clockgating);
+ stmmac_set_eee_mode(priv, priv->hw,
+ priv->plat->en_tx_lpi_clockgating);
}
/**
@@ -347,7 +349,7 @@
*/
void stmmac_disable_eee_mode(struct stmmac_priv *priv)
{
- priv->hw->mac->reset_eee_mode(priv->hw);
+ stmmac_reset_eee_mode(priv, priv->hw);
del_timer_sync(&priv->eee_ctrl_timer);
priv->tx_path_in_lpi_mode = false;
}
@@ -410,8 +412,8 @@
if (priv->eee_active) {
netdev_dbg(priv->dev, "disable EEE\n");
del_timer_sync(&priv->eee_ctrl_timer);
- priv->hw->mac->set_eee_timer(priv->hw, 0,
- tx_lpi_timer);
+ stmmac_set_eee_timer(priv, priv->hw, 0,
+ tx_lpi_timer);
}
priv->eee_active = 0;
spin_unlock_irqrestore(&priv->lock, flags);
@@ -426,12 +428,11 @@
mod_timer(&priv->eee_ctrl_timer,
STMMAC_LPI_T(eee_timer));
- priv->hw->mac->set_eee_timer(priv->hw,
- STMMAC_DEFAULT_LIT_LS,
- tx_lpi_timer);
+ stmmac_set_eee_timer(priv, priv->hw,
+ STMMAC_DEFAULT_LIT_LS, tx_lpi_timer);
}
/* Set HW EEE according to the speed */
- priv->hw->mac->set_eee_pls(priv->hw, ndev->phydev->link);
+ stmmac_set_eee_pls(priv, priv->hw, ndev->phydev->link);
ret = true;
spin_unlock_irqrestore(&priv->lock, flags);
@@ -464,9 +465,9 @@
return;
/* check tx tstamp status */
- if (priv->hw->desc->get_tx_timestamp_status(p)) {
+ if (stmmac_get_tx_timestamp_status(priv, p)) {
/* get the valid tstamp */
- ns = priv->hw->desc->get_timestamp(p, priv->adv_ts);
+ stmmac_get_timestamp(priv, p, priv->adv_ts, &ns);
memset(&shhwtstamp, 0, sizeof(struct skb_shared_hwtstamps));
shhwtstamp.hwtstamp = ns_to_ktime(ns);
@@ -502,8 +503,8 @@
desc = np;
/* Check if timestamp is available */
- if (priv->hw->desc->get_rx_timestamp_status(p, np, priv->adv_ts)) {
- ns = priv->hw->desc->get_timestamp(desc, priv->adv_ts);
+ if (stmmac_get_rx_timestamp_status(priv, p, np, priv->adv_ts)) {
+ stmmac_get_timestamp(priv, desc, priv->adv_ts, &ns);
netdev_dbg(priv->dev, "get valid RX hw timestamp %llu\n", ns);
shhwtstamp = skb_hwtstamps(skb);
memset(shhwtstamp, 0, sizeof(struct skb_shared_hwtstamps));
@@ -707,18 +708,18 @@
priv->hwts_tx_en = config.tx_type == HWTSTAMP_TX_ON;
if (!priv->hwts_tx_en && !priv->hwts_rx_en)
- priv->hw->ptp->config_hw_tstamping(priv->ptpaddr, 0);
+ stmmac_config_hw_tstamping(priv, priv->ptpaddr, 0);
else {
value = (PTP_TCR_TSENA | PTP_TCR_TSCFUPDT | PTP_TCR_TSCTRLSSR |
tstamp_all | ptp_v2 | ptp_over_ethernet |
ptp_over_ipv6_udp | ptp_over_ipv4_udp | ts_event_en |
ts_master_en | snap_type_sel);
- priv->hw->ptp->config_hw_tstamping(priv->ptpaddr, value);
+ stmmac_config_hw_tstamping(priv, priv->ptpaddr, value);
/* program Sub Second Increment reg */
- sec_inc = priv->hw->ptp->config_sub_second_increment(
- priv->ptpaddr, priv->plat->clk_ptp_rate,
- priv->plat->has_gmac4);
+ stmmac_config_sub_second_increment(priv,
+ priv->ptpaddr, priv->plat->clk_ptp_rate,
+ priv->plat->has_gmac4, &sec_inc);
temp = div_u64(1000000000ULL, sec_inc);
/* calculate default added value:
@@ -728,15 +729,14 @@
*/
temp = (u64)(temp << 32);
priv->default_addend = div_u64(temp, priv->plat->clk_ptp_rate);
- priv->hw->ptp->config_addend(priv->ptpaddr,
- priv->default_addend);
+ stmmac_config_addend(priv, priv->ptpaddr, priv->default_addend);
/* initialize system time */
ktime_get_real_ts64(&now);
/* lower 32 bits of tv_sec are safe until y2106 */
- priv->hw->ptp->init_systime(priv->ptpaddr, (u32)now.tv_sec,
- now.tv_nsec);
+ stmmac_init_systime(priv, priv->ptpaddr,
+ (u32)now.tv_sec, now.tv_nsec);
}
return copy_to_user(ifr->ifr_data, &config,
@@ -770,7 +770,6 @@
netdev_info(priv->dev,
"IEEE 1588-2008 Advanced Timestamp supported\n");
- priv->hw->ptp = &stmmac_ptp;
priv->hwts_tx_en = 0;
priv->hwts_rx_en = 0;
@@ -795,8 +794,8 @@
{
u32 tx_cnt = priv->plat->tx_queues_to_use;
- priv->hw->mac->flow_ctrl(priv->hw, duplex, priv->flow_ctrl,
- priv->pause, tx_cnt);
+ stmmac_flow_ctrl(priv, priv->hw, duplex, priv->flow_ctrl,
+ priv->pause, tx_cnt);
}
/**
@@ -1008,7 +1007,7 @@
head_rx = (void *)rx_q->dma_rx;
/* Display RX ring */
- priv->hw->desc->display_ring(head_rx, DMA_RX_SIZE, true);
+ stmmac_display_ring(priv, head_rx, DMA_RX_SIZE, true);
}
}
@@ -1029,7 +1028,7 @@
else
head_tx = (void *)tx_q->dma_tx;
- priv->hw->desc->display_ring(head_tx, DMA_TX_SIZE, false);
+ stmmac_display_ring(priv, head_tx, DMA_TX_SIZE, false);
}
}
@@ -1073,13 +1072,13 @@
/* Clear the RX descriptors */
for (i = 0; i < DMA_RX_SIZE; i++)
if (priv->extend_desc)
- priv->hw->desc->init_rx_desc(&rx_q->dma_erx[i].basic,
- priv->use_riwt, priv->mode,
- (i == DMA_RX_SIZE - 1));
+ stmmac_init_rx_desc(priv, &rx_q->dma_erx[i].basic,
+ priv->use_riwt, priv->mode,
+ (i == DMA_RX_SIZE - 1));
else
- priv->hw->desc->init_rx_desc(&rx_q->dma_rx[i],
- priv->use_riwt, priv->mode,
- (i == DMA_RX_SIZE - 1));
+ stmmac_init_rx_desc(priv, &rx_q->dma_rx[i],
+ priv->use_riwt, priv->mode,
+ (i == DMA_RX_SIZE - 1));
}
/**
@@ -1097,13 +1096,11 @@
/* Clear the TX descriptors */
for (i = 0; i < DMA_TX_SIZE; i++)
if (priv->extend_desc)
- priv->hw->desc->init_tx_desc(&tx_q->dma_etx[i].basic,
- priv->mode,
- (i == DMA_TX_SIZE - 1));
+ stmmac_init_tx_desc(priv, &tx_q->dma_etx[i].basic,
+ priv->mode, (i == DMA_TX_SIZE - 1));
else
- priv->hw->desc->init_tx_desc(&tx_q->dma_tx[i],
- priv->mode,
- (i == DMA_TX_SIZE - 1));
+ stmmac_init_tx_desc(priv, &tx_q->dma_tx[i],
+ priv->mode, (i == DMA_TX_SIZE - 1));
}
/**
@@ -1159,14 +1156,10 @@
return -EINVAL;
}
- if (priv->synopsys_id >= DWMAC_CORE_4_00)
- p->des0 = cpu_to_le32(rx_q->rx_skbuff_dma[i]);
- else
- p->des2 = cpu_to_le32(rx_q->rx_skbuff_dma[i]);
+ stmmac_set_desc_addr(priv, p, rx_q->rx_skbuff_dma[i]);
- if ((priv->hw->mode->init_desc3) &&
- (priv->dma_buf_sz == BUF_SIZE_16KiB))
- priv->hw->mode->init_desc3(p);
+ if (priv->dma_buf_sz == BUF_SIZE_16KiB)
+ stmmac_init_desc3(priv, p);
return 0;
}
@@ -1232,13 +1225,14 @@
{
struct stmmac_priv *priv = netdev_priv(dev);
u32 rx_count = priv->plat->rx_queues_to_use;
- unsigned int bfsize = 0;
int ret = -ENOMEM;
+ int bfsize = 0;
int queue;
int i;
- if (priv->hw->mode->set_16kib_bfsize)
- bfsize = priv->hw->mode->set_16kib_bfsize(dev->mtu);
+ bfsize = stmmac_set_16kib_bfsize(priv, dev->mtu);
+ if (bfsize < 0)
+ bfsize = 0;
if (bfsize < BUF_SIZE_16KiB)
bfsize = stmmac_set_bfsize(dev->mtu, priv->dma_buf_sz);
@@ -1282,13 +1276,11 @@
/* Setup the chained descriptor addresses */
if (priv->mode == STMMAC_CHAIN_MODE) {
if (priv->extend_desc)
- priv->hw->mode->init(rx_q->dma_erx,
- rx_q->dma_rx_phy,
- DMA_RX_SIZE, 1);
+ stmmac_mode_init(priv, rx_q->dma_erx,
+ rx_q->dma_rx_phy, DMA_RX_SIZE, 1);
else
- priv->hw->mode->init(rx_q->dma_rx,
- rx_q->dma_rx_phy,
- DMA_RX_SIZE, 0);
+ stmmac_mode_init(priv, rx_q->dma_rx,
+ rx_q->dma_rx_phy, DMA_RX_SIZE, 0);
}
}
@@ -1335,13 +1327,11 @@
/* Setup the chained descriptor addresses */
if (priv->mode == STMMAC_CHAIN_MODE) {
if (priv->extend_desc)
- priv->hw->mode->init(tx_q->dma_etx,
- tx_q->dma_tx_phy,
- DMA_TX_SIZE, 1);
+ stmmac_mode_init(priv, tx_q->dma_etx,
+ tx_q->dma_tx_phy, DMA_TX_SIZE, 1);
else
- priv->hw->mode->init(tx_q->dma_tx,
- tx_q->dma_tx_phy,
- DMA_TX_SIZE, 0);
+ stmmac_mode_init(priv, tx_q->dma_tx,
+ tx_q->dma_tx_phy, DMA_TX_SIZE, 0);
}
for (i = 0; i < DMA_TX_SIZE; i++) {
@@ -1351,14 +1341,7 @@
else
p = tx_q->dma_tx + i;
- if (priv->synopsys_id >= DWMAC_CORE_4_00) {
- p->des0 = 0;
- p->des1 = 0;
- p->des2 = 0;
- p->des3 = 0;
- } else {
- p->des2 = 0;
- }
+ stmmac_clear_desc(priv, p);
tx_q->tx_skbuff_dma[i].buf = 0;
tx_q->tx_skbuff_dma[i].map_as_page = false;
@@ -1664,7 +1647,7 @@
for (queue = 0; queue < rx_queues_count; queue++) {
mode = priv->plat->rx_queues_cfg[queue].mode_to_use;
- priv->hw->mac->rx_queue_enable(priv->hw, mode, queue);
+ stmmac_rx_queue_enable(priv, priv->hw, mode, queue);
}
}
@@ -1678,7 +1661,7 @@
static void stmmac_start_rx_dma(struct stmmac_priv *priv, u32 chan)
{
netdev_dbg(priv->dev, "DMA RX processes started in channel %d\n", chan);
- priv->hw->dma->start_rx(priv->ioaddr, chan);
+ stmmac_start_rx(priv, priv->ioaddr, chan);
}
/**
@@ -1691,7 +1674,7 @@
static void stmmac_start_tx_dma(struct stmmac_priv *priv, u32 chan)
{
netdev_dbg(priv->dev, "DMA TX processes started in channel %d\n", chan);
- priv->hw->dma->start_tx(priv->ioaddr, chan);
+ stmmac_start_tx(priv, priv->ioaddr, chan);
}
/**
@@ -1704,7 +1687,7 @@
static void stmmac_stop_rx_dma(struct stmmac_priv *priv, u32 chan)
{
netdev_dbg(priv->dev, "DMA RX processes stopped in channel %d\n", chan);
- priv->hw->dma->stop_rx(priv->ioaddr, chan);
+ stmmac_stop_rx(priv, priv->ioaddr, chan);
}
/**
@@ -1717,7 +1700,7 @@
static void stmmac_stop_tx_dma(struct stmmac_priv *priv, u32 chan)
{
netdev_dbg(priv->dev, "DMA TX processes stopped in channel %d\n", chan);
- priv->hw->dma->stop_tx(priv->ioaddr, chan);
+ stmmac_stop_tx(priv, priv->ioaddr, chan);
}
/**
@@ -1804,23 +1787,18 @@
}
/* configure all channels */
- if (priv->synopsys_id >= DWMAC_CORE_4_00) {
- for (chan = 0; chan < rx_channels_count; chan++) {
- qmode = priv->plat->rx_queues_cfg[chan].mode_to_use;
+ for (chan = 0; chan < rx_channels_count; chan++) {
+ qmode = priv->plat->rx_queues_cfg[chan].mode_to_use;
- priv->hw->dma->dma_rx_mode(priv->ioaddr, rxmode, chan,
- rxfifosz, qmode);
- }
+ stmmac_dma_rx_mode(priv, priv->ioaddr, rxmode, chan,
+ rxfifosz, qmode);
+ }
- for (chan = 0; chan < tx_channels_count; chan++) {
- qmode = priv->plat->tx_queues_cfg[chan].mode_to_use;
+ for (chan = 0; chan < tx_channels_count; chan++) {
+ qmode = priv->plat->tx_queues_cfg[chan].mode_to_use;
- priv->hw->dma->dma_tx_mode(priv->ioaddr, txmode, chan,
- txfifosz, qmode);
- }
- } else {
- priv->hw->dma->dma_mode(priv->ioaddr, txmode, rxmode,
- rxfifosz);
+ stmmac_dma_tx_mode(priv, priv->ioaddr, txmode, chan,
+ txfifosz, qmode);
}
}
@@ -1851,9 +1829,8 @@
else
p = tx_q->dma_tx + entry;
- status = priv->hw->desc->tx_status(&priv->dev->stats,
- &priv->xstats, p,
- priv->ioaddr);
+ status = stmmac_tx_status(priv, &priv->dev->stats,
+ &priv->xstats, p, priv->ioaddr);
/* Check if the descriptor is owned by the DMA */
if (unlikely(status & tx_dma_own))
break;
@@ -1891,8 +1868,7 @@
tx_q->tx_skbuff_dma[entry].map_as_page = false;
}
- if (priv->hw->mode->clean_desc3)
- priv->hw->mode->clean_desc3(tx_q, p);
+ stmmac_clean_desc3(priv, tx_q, p);
tx_q->tx_skbuff_dma[entry].last_segment = false;
tx_q->tx_skbuff_dma[entry].is_jumbo = false;
@@ -1904,7 +1880,7 @@
tx_q->tx_skbuff[entry] = NULL;
}
- priv->hw->desc->release_tx_desc(p, priv->mode);
+ stmmac_release_tx_desc(priv, p, priv->mode);
entry = STMMAC_GET_ENTRY(entry, DMA_TX_SIZE);
}
@@ -1929,16 +1905,6 @@
netif_tx_unlock(priv->dev);
}
-static inline void stmmac_enable_dma_irq(struct stmmac_priv *priv, u32 chan)
-{
- priv->hw->dma->enable_dma_irq(priv->ioaddr, chan);
-}
-
-static inline void stmmac_disable_dma_irq(struct stmmac_priv *priv, u32 chan)
-{
- priv->hw->dma->disable_dma_irq(priv->ioaddr, chan);
-}
-
/**
* stmmac_tx_err - to manage the tx error
* @priv: driver private structure
@@ -1957,13 +1923,11 @@
dma_free_tx_skbufs(priv, chan);
for (i = 0; i < DMA_TX_SIZE; i++)
if (priv->extend_desc)
- priv->hw->desc->init_tx_desc(&tx_q->dma_etx[i].basic,
- priv->mode,
- (i == DMA_TX_SIZE - 1));
+ stmmac_init_tx_desc(priv, &tx_q->dma_etx[i].basic,
+ priv->mode, (i == DMA_TX_SIZE - 1));
else
- priv->hw->desc->init_tx_desc(&tx_q->dma_tx[i],
- priv->mode,
- (i == DMA_TX_SIZE - 1));
+ stmmac_init_tx_desc(priv, &tx_q->dma_tx[i],
+ priv->mode, (i == DMA_TX_SIZE - 1));
tx_q->dirty_tx = 0;
tx_q->cur_tx = 0;
tx_q->mss = 0;
@@ -2003,31 +1967,22 @@
rxfifosz /= rx_channels_count;
txfifosz /= tx_channels_count;
- if (priv->synopsys_id >= DWMAC_CORE_4_00) {
- priv->hw->dma->dma_rx_mode(priv->ioaddr, rxmode, chan,
- rxfifosz, rxqmode);
- priv->hw->dma->dma_tx_mode(priv->ioaddr, txmode, chan,
- txfifosz, txqmode);
- } else {
- priv->hw->dma->dma_mode(priv->ioaddr, txmode, rxmode,
- rxfifosz);
- }
+ stmmac_dma_rx_mode(priv, priv->ioaddr, rxmode, chan, rxfifosz, rxqmode);
+ stmmac_dma_tx_mode(priv, priv->ioaddr, txmode, chan, txfifosz, txqmode);
}
static bool stmmac_safety_feat_interrupt(struct stmmac_priv *priv)
{
- bool ret = false;
+ int ret;
- /* Safety features are only available in cores >= 5.10 */
- if (priv->synopsys_id < DWMAC_CORE_5_10)
- return ret;
- if (priv->hw->mac->safety_feat_irq_status)
- ret = priv->hw->mac->safety_feat_irq_status(priv->dev,
- priv->ioaddr, priv->dma_cap.asp, &priv->sstats);
-
- if (ret)
+ ret = stmmac_safety_feat_irq_status(priv, priv->dev,
+ priv->ioaddr, priv->dma_cap.asp, &priv->sstats);
+ if (ret && (ret != -EINVAL)) {
stmmac_global_err(priv);
- return ret;
+ return true;
+ }
+
+ return false;
}
/**
@@ -2045,7 +2000,11 @@
tx_channel_count : rx_channel_count;
u32 chan;
bool poll_scheduled = false;
- int status[channels_to_check];
+ int status[max_t(u32, MTL_MAX_TX_QUEUES, MTL_MAX_RX_QUEUES)];
+
+ /* Make sure we never check beyond our status buffer. */
+ if (WARN_ON_ONCE(channels_to_check > ARRAY_SIZE(status)))
+ channels_to_check = ARRAY_SIZE(status);
/* Each DMA channel can be used for rx and tx simultaneously, yet
* napi_struct is embedded in struct stmmac_rx_queue rather than in a
@@ -2054,16 +2013,15 @@
* all tx queues rather than just a single tx queue.
*/
for (chan = 0; chan < channels_to_check; chan++)
- status[chan] = priv->hw->dma->dma_interrupt(priv->ioaddr,
- &priv->xstats,
- chan);
+ status[chan] = stmmac_dma_interrupt_status(priv, priv->ioaddr,
+ &priv->xstats, chan);
for (chan = 0; chan < rx_channel_count; chan++) {
if (likely(status[chan] & handle_rx)) {
struct stmmac_rx_queue *rx_q = &priv->rx_queue[chan];
if (likely(napi_schedule_prep(&rx_q->napi))) {
- stmmac_disable_dma_irq(priv, chan);
+ stmmac_disable_dma_irq(priv, priv->ioaddr, chan);
__napi_schedule(&rx_q->napi);
poll_scheduled = true;
}
@@ -2084,7 +2042,8 @@
&priv->rx_queue[0];
if (likely(napi_schedule_prep(&rx_q->napi))) {
- stmmac_disable_dma_irq(priv, chan);
+ stmmac_disable_dma_irq(priv,
+ priv->ioaddr, chan);
__napi_schedule(&rx_q->napi);
}
break;
@@ -2126,14 +2085,6 @@
unsigned int mode = MMC_CNTRL_RESET_ON_READ | MMC_CNTRL_COUNTER_RESET |
MMC_CNTRL_PRESET | MMC_CNTRL_FULL_HALF_PRESET;
- if (priv->synopsys_id >= DWMAC_CORE_4_00) {
- priv->ptpaddr = priv->ioaddr + PTP_GMAC4_OFFSET;
- priv->mmcaddr = priv->ioaddr + MMC_GMAC4_OFFSET;
- } else {
- priv->ptpaddr = priv->ioaddr + PTP_GMAC3_X_OFFSET;
- priv->mmcaddr = priv->ioaddr + MMC_GMAC3_X_OFFSET;
- }
-
dwmac_mmc_intr_all_mask(priv->mmcaddr);
if (priv->dma_cap.rmon) {
@@ -2144,32 +2095,6 @@
}
/**
- * stmmac_selec_desc_mode - to select among: normal/alternate/extend descriptors
- * @priv: driver private structure
- * Description: select the Enhanced/Alternate or Normal descriptors.
- * In case of Enhanced/Alternate, it checks if the extended descriptors are
- * supported by the HW capability register.
- */
-static void stmmac_selec_desc_mode(struct stmmac_priv *priv)
-{
- if (priv->plat->enh_desc) {
- dev_info(priv->device, "Enhanced/Alternate descriptors\n");
-
- /* GMAC older than 3.50 has no extended descriptors */
- if (priv->synopsys_id >= DWMAC_CORE_3_50) {
- dev_info(priv->device, "Enabled extended descriptors\n");
- priv->extend_desc = 1;
- } else
- dev_warn(priv->device, "Extended descriptors not supported\n");
-
- priv->hw->desc = &enh_desc_ops;
- } else {
- dev_info(priv->device, "Normal descriptors\n");
- priv->hw->desc = &ndesc_ops;
- }
-}
-
-/**
* stmmac_get_hw_features - get MAC capabilities from the HW cap. register.
* @priv: driver private structure
* Description:
@@ -2180,15 +2105,7 @@
*/
static int stmmac_get_hw_features(struct stmmac_priv *priv)
{
- u32 ret = 0;
-
- if (priv->hw->dma->get_hw_feature) {
- priv->hw->dma->get_hw_feature(priv->ioaddr,
- &priv->dma_cap);
- ret = 1;
- }
-
- return ret;
+ return stmmac_get_hw_feature(priv, priv->ioaddr, &priv->dma_cap) == 0;
}
/**
@@ -2201,8 +2118,7 @@
static void stmmac_check_ether_addr(struct stmmac_priv *priv)
{
if (!is_valid_ether_addr(priv->dev->dev_addr)) {
- priv->hw->mac->get_umac_addr(priv->hw,
- priv->dev->dev_addr, 0);
+ stmmac_get_umac_addr(priv, priv->hw, priv->dev->dev_addr, 0);
if (!is_valid_ether_addr(priv->dev->dev_addr))
eth_hw_addr_random(priv->dev);
netdev_info(priv->dev, "device MAC address %pM\n",
@@ -2222,10 +2138,9 @@
{
u32 rx_channels_count = priv->plat->rx_queues_to_use;
u32 tx_channels_count = priv->plat->tx_queues_to_use;
+ u32 dma_csr_ch = max(rx_channels_count, tx_channels_count);
struct stmmac_rx_queue *rx_q;
struct stmmac_tx_queue *tx_q;
- u32 dummy_dma_rx_phy = 0;
- u32 dummy_dma_tx_phy = 0;
u32 chan = 0;
int atds = 0;
int ret = 0;
@@ -2238,59 +2153,47 @@
if (priv->extend_desc && (priv->mode == STMMAC_RING_MODE))
atds = 1;
- ret = priv->hw->dma->reset(priv->ioaddr);
+ ret = stmmac_reset(priv, priv->ioaddr);
if (ret) {
dev_err(priv->device, "Failed to reset the dma\n");
return ret;
}
- if (priv->synopsys_id >= DWMAC_CORE_4_00) {
- /* DMA Configuration */
- priv->hw->dma->init(priv->ioaddr, priv->plat->dma_cfg,
- dummy_dma_tx_phy, dummy_dma_rx_phy, atds);
-
- /* DMA RX Channel Configuration */
- for (chan = 0; chan < rx_channels_count; chan++) {
- rx_q = &priv->rx_queue[chan];
-
- priv->hw->dma->init_rx_chan(priv->ioaddr,
- priv->plat->dma_cfg,
- rx_q->dma_rx_phy, chan);
-
- rx_q->rx_tail_addr = rx_q->dma_rx_phy +
- (DMA_RX_SIZE * sizeof(struct dma_desc));
- priv->hw->dma->set_rx_tail_ptr(priv->ioaddr,
- rx_q->rx_tail_addr,
- chan);
- }
-
- /* DMA TX Channel Configuration */
- for (chan = 0; chan < tx_channels_count; chan++) {
- tx_q = &priv->tx_queue[chan];
-
- priv->hw->dma->init_chan(priv->ioaddr,
- priv->plat->dma_cfg,
- chan);
-
- priv->hw->dma->init_tx_chan(priv->ioaddr,
- priv->plat->dma_cfg,
- tx_q->dma_tx_phy, chan);
-
- tx_q->tx_tail_addr = tx_q->dma_tx_phy +
- (DMA_TX_SIZE * sizeof(struct dma_desc));
- priv->hw->dma->set_tx_tail_ptr(priv->ioaddr,
- tx_q->tx_tail_addr,
- chan);
- }
- } else {
+ /* DMA RX Channel Configuration */
+ for (chan = 0; chan < rx_channels_count; chan++) {
rx_q = &priv->rx_queue[chan];
- tx_q = &priv->tx_queue[chan];
- priv->hw->dma->init(priv->ioaddr, priv->plat->dma_cfg,
- tx_q->dma_tx_phy, rx_q->dma_rx_phy, atds);
+
+ stmmac_init_rx_chan(priv, priv->ioaddr, priv->plat->dma_cfg,
+ rx_q->dma_rx_phy, chan);
+
+ rx_q->rx_tail_addr = rx_q->dma_rx_phy +
+ (DMA_RX_SIZE * sizeof(struct dma_desc));
+ stmmac_set_rx_tail_ptr(priv, priv->ioaddr,
+ rx_q->rx_tail_addr, chan);
}
- if (priv->plat->axi && priv->hw->dma->axi)
- priv->hw->dma->axi(priv->ioaddr, priv->plat->axi);
+ /* DMA TX Channel Configuration */
+ for (chan = 0; chan < tx_channels_count; chan++) {
+ tx_q = &priv->tx_queue[chan];
+
+ stmmac_init_tx_chan(priv, priv->ioaddr, priv->plat->dma_cfg,
+ tx_q->dma_tx_phy, chan);
+
+ tx_q->tx_tail_addr = tx_q->dma_tx_phy +
+ (DMA_TX_SIZE * sizeof(struct dma_desc));
+ stmmac_set_tx_tail_ptr(priv, priv->ioaddr,
+ tx_q->tx_tail_addr, chan);
+ }
+
+ /* DMA CSR Channel configuration */
+ for (chan = 0; chan < dma_csr_ch; chan++)
+ stmmac_init_chan(priv, priv->ioaddr, priv->plat->dma_cfg, chan);
+
+ /* DMA Configuration */
+ stmmac_dma_init(priv, priv->ioaddr, priv->plat->dma_cfg, atds);
+
+ if (priv->plat->axi)
+ stmmac_axi(priv, priv->ioaddr, priv->plat->axi);
return ret;
}
@@ -2336,18 +2239,14 @@
u32 chan;
/* set TX ring length */
- if (priv->hw->dma->set_tx_ring_len) {
- for (chan = 0; chan < tx_channels_count; chan++)
- priv->hw->dma->set_tx_ring_len(priv->ioaddr,
- (DMA_TX_SIZE - 1), chan);
- }
+ for (chan = 0; chan < tx_channels_count; chan++)
+ stmmac_set_tx_ring_len(priv, priv->ioaddr,
+ (DMA_TX_SIZE - 1), chan);
/* set RX ring length */
- if (priv->hw->dma->set_rx_ring_len) {
- for (chan = 0; chan < rx_channels_count; chan++)
- priv->hw->dma->set_rx_ring_len(priv->ioaddr,
- (DMA_RX_SIZE - 1), chan);
- }
+ for (chan = 0; chan < rx_channels_count; chan++)
+ stmmac_set_rx_ring_len(priv, priv->ioaddr,
+ (DMA_RX_SIZE - 1), chan);
}
/**
@@ -2363,7 +2262,7 @@
for (queue = 0; queue < tx_queues_count; queue++) {
weight = priv->plat->tx_queues_cfg[queue].weight;
- priv->hw->mac->set_mtl_tx_queue_weight(priv->hw, weight, queue);
+ stmmac_set_mtl_tx_queue_weight(priv, priv->hw, weight, queue);
}
}
@@ -2384,7 +2283,7 @@
if (mode_to_use == MTL_QUEUE_DCB)
continue;
- priv->hw->mac->config_cbs(priv->hw,
+ stmmac_config_cbs(priv, priv->hw,
priv->plat->tx_queues_cfg[queue].send_slope,
priv->plat->tx_queues_cfg[queue].idle_slope,
priv->plat->tx_queues_cfg[queue].high_credit,
@@ -2406,7 +2305,7 @@
for (queue = 0; queue < rx_queues_count; queue++) {
chan = priv->plat->rx_queues_cfg[queue].chan;
- priv->hw->mac->map_mtl_to_dma(priv->hw, queue, chan);
+ stmmac_map_mtl_to_dma(priv, priv->hw, queue, chan);
}
}
@@ -2426,7 +2325,7 @@
continue;
prio = priv->plat->rx_queues_cfg[queue].prio;
- priv->hw->mac->rx_queue_prio(priv->hw, prio, queue);
+ stmmac_rx_queue_prio(priv, priv->hw, prio, queue);
}
}
@@ -2446,7 +2345,7 @@
continue;
prio = priv->plat->tx_queues_cfg[queue].prio;
- priv->hw->mac->tx_queue_prio(priv->hw, prio, queue);
+ stmmac_tx_queue_prio(priv, priv->hw, prio, queue);
}
}
@@ -2467,7 +2366,7 @@
continue;
packet = priv->plat->rx_queues_cfg[queue].pkt_route;
- priv->hw->mac->rx_queue_routing(priv->hw, packet, queue);
+ stmmac_rx_queue_routing(priv, priv->hw, packet, queue);
}
}
@@ -2481,50 +2380,47 @@
u32 rx_queues_count = priv->plat->rx_queues_to_use;
u32 tx_queues_count = priv->plat->tx_queues_to_use;
- if (tx_queues_count > 1 && priv->hw->mac->set_mtl_tx_queue_weight)
+ if (tx_queues_count > 1)
stmmac_set_tx_queue_weight(priv);
/* Configure MTL RX algorithms */
- if (rx_queues_count > 1 && priv->hw->mac->prog_mtl_rx_algorithms)
- priv->hw->mac->prog_mtl_rx_algorithms(priv->hw,
- priv->plat->rx_sched_algorithm);
+ if (rx_queues_count > 1)
+ stmmac_prog_mtl_rx_algorithms(priv, priv->hw,
+ priv->plat->rx_sched_algorithm);
/* Configure MTL TX algorithms */
- if (tx_queues_count > 1 && priv->hw->mac->prog_mtl_tx_algorithms)
- priv->hw->mac->prog_mtl_tx_algorithms(priv->hw,
- priv->plat->tx_sched_algorithm);
+ if (tx_queues_count > 1)
+ stmmac_prog_mtl_tx_algorithms(priv, priv->hw,
+ priv->plat->tx_sched_algorithm);
/* Configure CBS in AVB TX queues */
- if (tx_queues_count > 1 && priv->hw->mac->config_cbs)
+ if (tx_queues_count > 1)
stmmac_configure_cbs(priv);
/* Map RX MTL to DMA channels */
- if (priv->hw->mac->map_mtl_to_dma)
- stmmac_rx_queue_dma_chan_map(priv);
+ stmmac_rx_queue_dma_chan_map(priv);
/* Enable MAC RX Queues */
- if (priv->hw->mac->rx_queue_enable)
- stmmac_mac_enable_rx_queues(priv);
+ stmmac_mac_enable_rx_queues(priv);
/* Set RX priorities */
- if (rx_queues_count > 1 && priv->hw->mac->rx_queue_prio)
+ if (rx_queues_count > 1)
stmmac_mac_config_rx_queues_prio(priv);
/* Set TX priorities */
- if (tx_queues_count > 1 && priv->hw->mac->tx_queue_prio)
+ if (tx_queues_count > 1)
stmmac_mac_config_tx_queues_prio(priv);
/* Set RX routing */
- if (rx_queues_count > 1 && priv->hw->mac->rx_queue_routing)
+ if (rx_queues_count > 1)
stmmac_mac_config_rx_queues_routing(priv);
}
static void stmmac_safety_feat_configuration(struct stmmac_priv *priv)
{
- if (priv->hw->mac->safety_feat_config && priv->dma_cap.asp) {
+ if (priv->dma_cap.asp) {
netdev_info(priv->dev, "Enabling Safety Features\n");
- priv->hw->mac->safety_feat_config(priv->ioaddr,
- priv->dma_cap.asp);
+ stmmac_safety_feat_config(priv, priv->ioaddr, priv->dma_cap.asp);
} else {
netdev_info(priv->dev, "No Safety Features support found\n");
}
@@ -2559,7 +2455,7 @@
}
/* Copy the MAC addr into the HW */
- priv->hw->mac->set_umac_addr(priv->hw, dev->dev_addr, 0);
+ stmmac_set_umac_addr(priv, priv->hw, dev->dev_addr, 0);
/* PS and related bits will be programmed according to the speed */
if (priv->hw->pcs) {
@@ -2575,17 +2471,15 @@
}
/* Initialize the MAC Core */
- priv->hw->mac->core_init(priv->hw, dev);
+ stmmac_core_init(priv, priv->hw, dev);
/* Initialize MTL*/
- if (priv->synopsys_id >= DWMAC_CORE_4_00)
- stmmac_mtl_configuration(priv);
+ stmmac_mtl_configuration(priv);
/* Initialize Safety Features */
- if (priv->synopsys_id >= DWMAC_CORE_5_10)
- stmmac_safety_feat_configuration(priv);
+ stmmac_safety_feat_configuration(priv);
- ret = priv->hw->mac->rx_ipc(priv->hw);
+ ret = stmmac_rx_ipc(priv, priv->hw);
if (!ret) {
netdev_warn(priv->dev, "RX IPC Checksum Offload disabled\n");
priv->plat->rx_coe = STMMAC_RX_COE_NONE;
@@ -2593,7 +2487,7 @@
}
/* Enable the MAC Rx/Tx */
- priv->hw->mac->set_mac(priv->ioaddr, true);
+ stmmac_mac_set(priv, priv->ioaddr, true);
/* Set the HW DMA mode and the COE */
stmmac_dma_operation_mode(priv);
@@ -2623,13 +2517,14 @@
priv->tx_lpi_timer = STMMAC_DEFAULT_TWT_LS;
- if ((priv->use_riwt) && (priv->hw->dma->rx_watchdog)) {
- priv->rx_riwt = MAX_DMA_RIWT;
- priv->hw->dma->rx_watchdog(priv->ioaddr, MAX_DMA_RIWT, rx_cnt);
+ if (priv->use_riwt) {
+ ret = stmmac_rx_watchdog(priv, priv->ioaddr, MAX_DMA_RIWT, rx_cnt);
+ if (!ret)
+ priv->rx_riwt = MAX_DMA_RIWT;
}
- if (priv->hw->pcs && priv->hw->mac->pcs_ctrl_ane)
- priv->hw->mac->pcs_ctrl_ane(priv->hw, 1, priv->hw->ps, 0);
+ if (priv->hw->pcs)
+ stmmac_pcs_ctrl_ane(priv, priv->hw, 1, priv->hw->ps, 0);
/* set TX and RX rings length */
stmmac_set_rings_length(priv);
@@ -2637,7 +2532,7 @@
/* Enable TSO */
if (priv->tso) {
for (chan = 0; chan < tx_cnt; chan++)
- priv->hw->dma->enable_tso(priv->ioaddr, 1, chan);
+ stmmac_enable_tso(priv, priv->ioaddr, 1, chan);
}
return 0;
@@ -2808,7 +2703,7 @@
free_dma_desc_resources(priv);
/* Disable the MAC Rx/Tx */
- priv->hw->mac->set_mac(priv->ioaddr, false);
+ stmmac_mac_set(priv, priv->ioaddr, false);
netif_carrier_off(dev);
@@ -2851,10 +2746,10 @@
buff_size = tmp_len >= TSO_MAX_BUFF_SIZE ?
TSO_MAX_BUFF_SIZE : tmp_len;
- priv->hw->desc->prepare_tso_tx_desc(desc, 0, buff_size,
- 0, 1,
- (last_segment) && (tmp_len <= TSO_MAX_BUFF_SIZE),
- 0, 0);
+ stmmac_prepare_tso_tx_desc(priv, desc, 0, buff_size,
+ 0, 1,
+ (last_segment) && (tmp_len <= TSO_MAX_BUFF_SIZE),
+ 0, 0);
tmp_len -= TSO_MAX_BUFF_SIZE;
}
@@ -2926,7 +2821,7 @@
/* set new MSS value if needed */
if (mss != tx_q->mss) {
mss_desc = tx_q->dma_tx + tx_q->cur_tx;
- priv->hw->desc->set_mss(mss_desc, mss);
+ stmmac_set_mss(priv, mss_desc, mss);
tx_q->mss = mss;
tx_q->cur_tx = STMMAC_GET_ENTRY(tx_q->cur_tx, DMA_TX_SIZE);
WARN_ON(tx_q->tx_skbuff[tx_q->cur_tx]);
@@ -3012,7 +2907,7 @@
STMMAC_COAL_TIMER(priv->tx_coal_timer));
} else {
priv->tx_count_frames = 0;
- priv->hw->desc->set_tx_ic(desc);
+ stmmac_set_tx_ic(priv, desc);
priv->xstats.tx_set_ic_bit++;
}
@@ -3022,11 +2917,11 @@
priv->hwts_tx_en)) {
/* declare that device is doing timestamping */
skb_shinfo(skb)->tx_flags |= SKBTX_IN_PROGRESS;
- priv->hw->desc->enable_tx_timestamp(first);
+ stmmac_enable_tx_timestamp(priv, first);
}
/* Complete the first descriptor before granting the DMA */
- priv->hw->desc->prepare_tso_tx_desc(first, 1,
+ stmmac_prepare_tso_tx_desc(priv, first, 1,
proto_hdr_len,
pay_len,
1, tx_q->tx_skbuff_dma[first_entry].last_segment,
@@ -3040,7 +2935,7 @@
* sure that MSS's own bit is the last thing written.
*/
dma_wmb();
- priv->hw->desc->set_tx_owner(mss_desc);
+ stmmac_set_tx_owner(priv, mss_desc);
}
/* The own bit must be the latest setting done when prepare the
@@ -3054,8 +2949,7 @@
__func__, tx_q->cur_tx, tx_q->dirty_tx, first_entry,
tx_q->cur_tx, first, nfrags);
- priv->hw->desc->display_ring((void *)tx_q->dma_tx, DMA_TX_SIZE,
- 0);
+ stmmac_display_ring(priv, (void *)tx_q->dma_tx, DMA_TX_SIZE, 0);
pr_info(">>> frame to be transmitted: ");
print_pkt(skb->data, skb_headlen(skb));
@@ -3063,8 +2957,7 @@
netdev_tx_sent_queue(netdev_get_tx_queue(dev, queue), skb->len);
- priv->hw->dma->set_tx_tail_ptr(priv->ioaddr, tx_q->tx_tail_addr,
- queue);
+ stmmac_set_tx_tail_ptr(priv, priv->ioaddr, tx_q->tx_tail_addr, queue);
return NETDEV_TX_OK;
@@ -3136,12 +3029,11 @@
enh_desc = priv->plat->enh_desc;
/* To program the descriptors according to the size of the frame */
if (enh_desc)
- is_jumbo = priv->hw->mode->is_jumbo_frm(skb->len, enh_desc);
+ is_jumbo = stmmac_is_jumbo_frm(priv, skb->len, enh_desc);
- if (unlikely(is_jumbo) && likely(priv->synopsys_id <
- DWMAC_CORE_4_00)) {
- entry = priv->hw->mode->jumbo_frm(tx_q, skb, csum_insertion);
- if (unlikely(entry < 0))
+ if (unlikely(is_jumbo)) {
+ entry = stmmac_jumbo_frm(priv, tx_q, skb, csum_insertion);
+ if (unlikely(entry < 0) && (entry != -EINVAL))
goto dma_map_err;
}
@@ -3164,19 +3056,16 @@
goto dma_map_err; /* should reuse desc w/o issues */
tx_q->tx_skbuff_dma[entry].buf = des;
- if (unlikely(priv->synopsys_id >= DWMAC_CORE_4_00))
- desc->des0 = cpu_to_le32(des);
- else
- desc->des2 = cpu_to_le32(des);
+
+ stmmac_set_desc_addr(priv, desc, des);
tx_q->tx_skbuff_dma[entry].map_as_page = true;
tx_q->tx_skbuff_dma[entry].len = len;
tx_q->tx_skbuff_dma[entry].last_segment = last_segment;
/* Prepare the descriptor and set the own bit too */
- priv->hw->desc->prepare_tx_desc(desc, 0, len, csum_insertion,
- priv->mode, 1, last_segment,
- skb->len);
+ stmmac_prepare_tx_desc(priv, desc, 0, len, csum_insertion,
+ priv->mode, 1, last_segment, skb->len);
}
/* Only the last descriptor gets to point to the skb. */
@@ -3203,7 +3092,7 @@
else
tx_head = (void *)tx_q->dma_tx;
- priv->hw->desc->display_ring(tx_head, DMA_TX_SIZE, false);
+ stmmac_display_ring(priv, tx_head, DMA_TX_SIZE, false);
netdev_dbg(priv->dev, ">>> frame to be transmitted: ");
print_pkt(skb->data, skb->len);
@@ -3223,13 +3112,16 @@
* element in case of no SG.
*/
priv->tx_count_frames += nfrags + 1;
- if (likely(priv->tx_coal_frames > priv->tx_count_frames)) {
+ if (likely(priv->tx_coal_frames > priv->tx_count_frames) &&
+ !priv->tx_timer_armed) {
mod_timer(&priv->txtimer,
STMMAC_COAL_TIMER(priv->tx_coal_timer));
+ priv->tx_timer_armed = true;
} else {
priv->tx_count_frames = 0;
- priv->hw->desc->set_tx_ic(desc);
+ stmmac_set_tx_ic(priv, desc);
priv->xstats.tx_set_ic_bit++;
+ priv->tx_timer_armed = false;
}
skb_tx_timestamp(skb);
@@ -3247,10 +3139,8 @@
goto dma_map_err;
tx_q->tx_skbuff_dma[first_entry].buf = des;
- if (unlikely(priv->synopsys_id >= DWMAC_CORE_4_00))
- first->des0 = cpu_to_le32(des);
- else
- first->des2 = cpu_to_le32(des);
+
+ stmmac_set_desc_addr(priv, first, des);
tx_q->tx_skbuff_dma[first_entry].len = nopaged_len;
tx_q->tx_skbuff_dma[first_entry].last_segment = last_segment;
@@ -3259,13 +3149,13 @@
priv->hwts_tx_en)) {
/* declare that device is doing timestamping */
skb_shinfo(skb)->tx_flags |= SKBTX_IN_PROGRESS;
- priv->hw->desc->enable_tx_timestamp(first);
+ stmmac_enable_tx_timestamp(priv, first);
}
/* Prepare the first descriptor setting the OWN bit too */
- priv->hw->desc->prepare_tx_desc(first, 1, nopaged_len,
- csum_insertion, priv->mode, 1,
- last_segment, skb->len);
+ stmmac_prepare_tx_desc(priv, first, 1, nopaged_len,
+ csum_insertion, priv->mode, 1, last_segment,
+ skb->len);
/* The own bit must be the latest setting done when prepare the
* descriptor and then barrier is needed to make sure that
@@ -3276,11 +3166,8 @@
netdev_tx_sent_queue(netdev_get_tx_queue(dev, queue), skb->len);
- if (priv->synopsys_id < DWMAC_CORE_4_00)
- priv->hw->dma->enable_dma_transmission(priv->ioaddr);
- else
- priv->hw->dma->set_tx_tail_ptr(priv->ioaddr, tx_q->tx_tail_addr,
- queue);
+ stmmac_enable_dma_transmission(priv, priv->ioaddr);
+ stmmac_set_tx_tail_ptr(priv, priv->ioaddr, tx_q->tx_tail_addr, queue);
return NETDEV_TX_OK;
@@ -3364,14 +3251,8 @@
break;
}
- if (unlikely(priv->synopsys_id >= DWMAC_CORE_4_00)) {
- p->des0 = cpu_to_le32(rx_q->rx_skbuff_dma[entry]);
- p->des1 = 0;
- } else {
- p->des2 = cpu_to_le32(rx_q->rx_skbuff_dma[entry]);
- }
- if (priv->hw->mode->refill_desc3)
- priv->hw->mode->refill_desc3(rx_q, p);
+ stmmac_set_desc_addr(priv, p, rx_q->rx_skbuff_dma[entry]);
+ stmmac_refill_desc3(priv, rx_q, p);
if (rx_q->rx_zeroc_thresh > 0)
rx_q->rx_zeroc_thresh--;
@@ -3381,10 +3262,7 @@
}
dma_wmb();
- if (unlikely(priv->synopsys_id >= DWMAC_CORE_4_00))
- priv->hw->desc->init_rx_desc(p, priv->use_riwt, 0, 0);
- else
- priv->hw->desc->set_rx_owner(p);
+ stmmac_set_rx_owner(priv, p, priv->use_riwt);
dma_wmb();
@@ -3418,7 +3296,7 @@
else
rx_head = (void *)rx_q->dma_rx;
- priv->hw->desc->display_ring(rx_head, DMA_RX_SIZE, true);
+ stmmac_display_ring(priv, rx_head, DMA_RX_SIZE, true);
}
while (count < limit) {
int status;
@@ -3431,8 +3309,8 @@
p = rx_q->dma_rx + entry;
/* read the status of the incoming frame */
- status = priv->hw->desc->rx_status(&priv->dev->stats,
- &priv->xstats, p);
+ status = stmmac_rx_status(priv, &priv->dev->stats,
+ &priv->xstats, p);
/* check if managed by the DMA otherwise go ahead */
if (unlikely(status & dma_own))
break;
@@ -3449,11 +3327,9 @@
prefetch(np);
- if ((priv->extend_desc) && (priv->hw->desc->rx_extended_status))
- priv->hw->desc->rx_extended_status(&priv->dev->stats,
- &priv->xstats,
- rx_q->dma_erx +
- entry);
+ if (priv->extend_desc)
+ stmmac_rx_extended_status(priv, &priv->dev->stats,
+ &priv->xstats, rx_q->dma_erx + entry);
if (unlikely(status == discard_frame)) {
priv->dev->stats.rx_errors++;
if (priv->hwts_rx_en && !priv->extend_desc) {
@@ -3474,12 +3350,8 @@
int frame_len;
unsigned int des;
- if (unlikely(priv->synopsys_id >= DWMAC_CORE_4_00))
- des = le32_to_cpu(p->des0);
- else
- des = le32_to_cpu(p->des2);
-
- frame_len = priv->hw->desc->get_rx_frame_len(p, coe);
+ stmmac_get_desc_addr(priv, p, &des);
+ frame_len = stmmac_get_rx_frame_len(priv, p, coe);
/* If frame length is greater than skb buffer size
* (preallocated during init) then the packet is
@@ -3621,7 +3493,7 @@
work_done = stmmac_rx(priv, budget, rx_q->queue_index);
if (work_done < budget) {
napi_complete_done(napi, work_done);
- stmmac_enable_dma_irq(priv, chan);
+ stmmac_enable_dma_irq(priv, priv->ioaddr, chan);
}
return work_done;
}
@@ -3654,7 +3526,7 @@
{
struct stmmac_priv *priv = netdev_priv(dev);
- priv->hw->mac->set_filter(priv->hw, dev);
+ stmmac_set_filter(priv, priv->hw, dev);
}
/**
@@ -3727,7 +3599,7 @@
/* No check needed because rx_coe has been set before and it will be
* fixed in case of issue.
*/
- priv->hw->mac->rx_ipc(priv->hw);
+ stmmac_rx_ipc(priv, priv->hw);
return 0;
}
@@ -3771,8 +3643,8 @@
/* To handle GMAC own interrupts */
if ((priv->plat->has_gmac) || (priv->plat->has_gmac4)) {
- int status = priv->hw->mac->host_irq_status(priv->hw,
- &priv->xstats);
+ int status = stmmac_host_irq_status(priv, priv->hw, &priv->xstats);
+ int mtl_status;
if (unlikely(status)) {
/* For LPI we need to save the tx status */
@@ -3782,21 +3654,18 @@
priv->tx_path_in_lpi_mode = false;
}
- if (priv->synopsys_id >= DWMAC_CORE_4_00) {
- for (queue = 0; queue < queues_count; queue++) {
- struct stmmac_rx_queue *rx_q =
- &priv->rx_queue[queue];
+ for (queue = 0; queue < queues_count; queue++) {
+ struct stmmac_rx_queue *rx_q = &priv->rx_queue[queue];
- status |=
- priv->hw->mac->host_mtl_irq_status(priv->hw,
- queue);
-
- if (status & CORE_IRQ_MTL_RX_OVERFLOW &&
- priv->hw->dma->set_rx_tail_ptr)
- priv->hw->dma->set_rx_tail_ptr(priv->ioaddr,
- rx_q->rx_tail_addr,
+ mtl_status = stmmac_host_mtl_irq_status(priv, priv->hw,
queue);
- }
+ if (mtl_status != -EINVAL)
+ status |= mtl_status;
+
+ if (status & CORE_IRQ_MTL_RX_OVERFLOW)
+ stmmac_set_rx_tail_ptr(priv, priv->ioaddr,
+ rx_q->rx_tail_addr,
+ queue);
}
/* PCS link status */
@@ -3860,6 +3729,58 @@
return ret;
}
+static int stmmac_setup_tc_block_cb(enum tc_setup_type type, void *type_data,
+ void *cb_priv)
+{
+ struct stmmac_priv *priv = cb_priv;
+ int ret = -EOPNOTSUPP;
+
+ stmmac_disable_all_queues(priv);
+
+ switch (type) {
+ case TC_SETUP_CLSU32:
+ if (tc_cls_can_offload_and_chain0(priv->dev, type_data))
+ ret = stmmac_tc_setup_cls_u32(priv, priv, type_data);
+ break;
+ default:
+ break;
+ }
+
+ stmmac_enable_all_queues(priv);
+ return ret;
+}
+
+static int stmmac_setup_tc_block(struct stmmac_priv *priv,
+ struct tc_block_offload *f)
+{
+ if (f->binder_type != TCF_BLOCK_BINDER_TYPE_CLSACT_INGRESS)
+ return -EOPNOTSUPP;
+
+ switch (f->command) {
+ case TC_BLOCK_BIND:
+ return tcf_block_cb_register(f->block, stmmac_setup_tc_block_cb,
+ priv, priv);
+ case TC_BLOCK_UNBIND:
+ tcf_block_cb_unregister(f->block, stmmac_setup_tc_block_cb, priv);
+ return 0;
+ default:
+ return -EOPNOTSUPP;
+ }
+}
+
+static int stmmac_setup_tc(struct net_device *ndev, enum tc_setup_type type,
+ void *type_data)
+{
+ struct stmmac_priv *priv = netdev_priv(ndev);
+
+ switch (type) {
+ case TC_SETUP_BLOCK:
+ return stmmac_setup_tc_block(priv, type_data);
+ default:
+ return -EOPNOTSUPP;
+ }
+}
+
static int stmmac_set_mac_address(struct net_device *ndev, void *addr)
{
struct stmmac_priv *priv = netdev_priv(ndev);
@@ -3869,7 +3790,7 @@
if (ret)
return ret;
- priv->hw->mac->set_umac_addr(priv->hw, ndev->dev_addr, 0);
+ stmmac_set_umac_addr(priv, priv->hw, ndev->dev_addr, 0);
return ret;
}
@@ -4098,6 +4019,7 @@
.ndo_set_rx_mode = stmmac_set_rx_mode,
.ndo_tx_timeout = stmmac_tx_timeout,
.ndo_do_ioctl = stmmac_ioctl,
+ .ndo_setup_tc = stmmac_setup_tc,
#ifdef CONFIG_NET_POLL_CONTROLLER
.ndo_poll_controller = stmmac_poll_controller,
#endif
@@ -4145,49 +4067,17 @@
*/
static int stmmac_hw_init(struct stmmac_priv *priv)
{
- struct mac_device_info *mac;
-
- /* Identify the MAC HW device */
- if (priv->plat->setup) {
- mac = priv->plat->setup(priv);
- } else if (priv->plat->has_gmac) {
- priv->dev->priv_flags |= IFF_UNICAST_FLT;
- mac = dwmac1000_setup(priv->ioaddr,
- priv->plat->multicast_filter_bins,
- priv->plat->unicast_filter_entries,
- &priv->synopsys_id);
- } else if (priv->plat->has_gmac4) {
- priv->dev->priv_flags |= IFF_UNICAST_FLT;
- mac = dwmac4_setup(priv->ioaddr,
- priv->plat->multicast_filter_bins,
- priv->plat->unicast_filter_entries,
- &priv->synopsys_id);
- } else {
- mac = dwmac100_setup(priv->ioaddr, &priv->synopsys_id);
- }
- if (!mac)
- return -ENOMEM;
-
- priv->hw = mac;
+ int ret;
/* dwmac-sun8i only work in chain mode */
if (priv->plat->has_sun8i)
chain_mode = 1;
+ priv->chain_mode = chain_mode;
- /* To use the chained or ring mode */
- if (priv->synopsys_id >= DWMAC_CORE_4_00) {
- priv->hw->mode = &dwmac4_ring_mode_ops;
- } else {
- if (chain_mode) {
- priv->hw->mode = &chain_mode_ops;
- dev_info(priv->device, "Chain mode enabled\n");
- priv->mode = STMMAC_CHAIN_MODE;
- } else {
- priv->hw->mode = &ring_mode_ops;
- dev_info(priv->device, "Ring mode enabled\n");
- priv->mode = STMMAC_RING_MODE;
- }
- }
+ /* Initialize HW Interface */
+ ret = stmmac_hwif_init(priv);
+ if (ret)
+ return ret;
/* Get the HW capability (new GMAC newer than 3.50a) */
priv->hw_cap_support = stmmac_get_hw_features(priv);
@@ -4221,12 +4111,6 @@
dev_info(priv->device, "No HW DMA feature register supported\n");
}
- /* To use alternate (extended), normal or GMAC4 descriptor structures */
- if (priv->synopsys_id >= DWMAC_CORE_4_00)
- priv->hw->desc = &dwmac4_desc_ops;
- else
- stmmac_selec_desc_mode(priv);
-
if (priv->plat->rx_coe) {
priv->hw->rx_csum = priv->plat->rx_coe;
dev_info(priv->device, "RX Checksum Offload Engine supported\n");
@@ -4335,6 +4219,11 @@
ndev->hw_features = NETIF_F_SG | NETIF_F_IP_CSUM | NETIF_F_IPV6_CSUM |
NETIF_F_RXCSUM;
+ ret = stmmac_tc_init(priv, priv);
+ if (!ret) {
+ ndev->hw_features |= NETIF_F_HW_TC;
+ }
+
if ((priv->plat->tso_en) && (priv->dma_cap.tsoen)) {
ndev->hw_features |= NETIF_F_TSO | NETIF_F_TSO6;
priv->tso = true;
@@ -4458,7 +4347,7 @@
stmmac_stop_all_dma(priv);
- priv->hw->mac->set_mac(priv->ioaddr, false);
+ stmmac_mac_set(priv, priv->ioaddr, false);
netif_carrier_off(ndev);
unregister_netdev(ndev);
if (priv->plat->stmmac_rst)
@@ -4507,10 +4396,10 @@
/* Enable Power down mode by programming the PMT regs */
if (device_may_wakeup(priv->device)) {
- priv->hw->mac->pmt(priv->hw, priv->wolopts);
+ stmmac_pmt(priv, priv->hw, priv->wolopts);
priv->irq_wake = 1;
} else {
- priv->hw->mac->set_mac(priv->ioaddr, false);
+ stmmac_mac_set(priv, priv->ioaddr, false);
pinctrl_pm_select_sleep_state(priv->device);
/* Disable clock in case of PWM is off */
clk_disable(priv->plat->pclk);
@@ -4574,7 +4463,7 @@
*/
if (device_may_wakeup(priv->device)) {
spin_lock_irqsave(&priv->lock, flags);
- priv->hw->mac->pmt(priv->hw, 0);
+ stmmac_pmt(priv, priv->hw, 0);
spin_unlock_irqrestore(&priv->lock, flags);
priv->irq_wake = 0;
} else {
diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c
index f5f37bfa..5df1a60 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c
@@ -233,10 +233,7 @@
new_bus->phy_mask = mdio_bus_data->phy_mask;
new_bus->parent = priv->device;
- if (mdio_node)
- err = of_mdiobus_register(new_bus, mdio_node);
- else
- err = mdiobus_register(new_bus);
+ err = of_mdiobus_register(new_bus, mdio_node);
if (err != 0) {
dev_err(dev, "Cannot register the MDIO bus\n");
goto bus_register_fail;
diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_ptp.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_ptp.c
index e471a90..7d3a5c7 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_ptp.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_ptp.c
@@ -49,9 +49,7 @@
addend = neg_adj ? (addend - diff) : (addend + diff);
spin_lock_irqsave(&priv->ptp_lock, flags);
-
- priv->hw->ptp->config_addend(priv->ptpaddr, addend);
-
+ stmmac_config_addend(priv, priv->ptpaddr, addend);
spin_unlock_irqrestore(&priv->ptp_lock, flags);
return 0;
@@ -84,10 +82,8 @@
nsec = reminder;
spin_lock_irqsave(&priv->ptp_lock, flags);
-
- priv->hw->ptp->adjust_systime(priv->ptpaddr, sec, nsec, neg_adj,
- priv->plat->has_gmac4);
-
+ stmmac_adjust_systime(priv, priv->ptpaddr, sec, nsec, neg_adj,
+ priv->plat->has_gmac4);
spin_unlock_irqrestore(&priv->ptp_lock, flags);
return 0;
@@ -110,9 +106,7 @@
u64 ns;
spin_lock_irqsave(&priv->ptp_lock, flags);
-
- ns = priv->hw->ptp->get_systime(priv->ptpaddr);
-
+ stmmac_get_systime(priv, priv->ptpaddr, &ns);
spin_unlock_irqrestore(&priv->ptp_lock, flags);
*ts = ns_to_timespec64(ns);
@@ -137,9 +131,7 @@
unsigned long flags;
spin_lock_irqsave(&priv->ptp_lock, flags);
-
- priv->hw->ptp->init_systime(priv->ptpaddr, ts->tv_sec, ts->tv_nsec);
-
+ stmmac_init_systime(priv, priv->ptpaddr, ts->tv_sec, ts->tv_nsec);
spin_unlock_irqrestore(&priv->ptp_lock, flags);
return 0;
diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_tc.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_tc.c
new file mode 100644
index 0000000..881c94b
--- /dev/null
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_tc.c
@@ -0,0 +1,295 @@
+// SPDX-License-Identifier: (GPL-2.0 OR MIT)
+/*
+ * Copyright (c) 2018 Synopsys, Inc. and/or its affiliates.
+ * stmmac TC Handling (HW only)
+ */
+
+#include <net/pkt_cls.h>
+#include <net/tc_act/tc_gact.h>
+#include "common.h"
+#include "dwmac4.h"
+#include "dwmac5.h"
+#include "stmmac.h"
+
+static void tc_fill_all_pass_entry(struct stmmac_tc_entry *entry)
+{
+ memset(entry, 0, sizeof(*entry));
+ entry->in_use = true;
+ entry->is_last = true;
+ entry->is_frag = false;
+ entry->prio = ~0x0;
+ entry->handle = 0;
+ entry->val.match_data = 0x0;
+ entry->val.match_en = 0x0;
+ entry->val.af = 1;
+ entry->val.dma_ch_no = 0x0;
+}
+
+static struct stmmac_tc_entry *tc_find_entry(struct stmmac_priv *priv,
+ struct tc_cls_u32_offload *cls,
+ bool free)
+{
+ struct stmmac_tc_entry *entry, *first = NULL, *dup = NULL;
+ u32 loc = cls->knode.handle;
+ int i;
+
+ for (i = 0; i < priv->tc_entries_max; i++) {
+ entry = &priv->tc_entries[i];
+ if (!entry->in_use && !first && free)
+ first = entry;
+ if (entry->handle == loc && !free)
+ dup = entry;
+ }
+
+ if (dup)
+ return dup;
+ if (first) {
+ first->handle = loc;
+ first->in_use = true;
+
+ /* Reset HW values */
+ memset(&first->val, 0, sizeof(first->val));
+ }
+
+ return first;
+}
+
+static int tc_fill_actions(struct stmmac_tc_entry *entry,
+ struct stmmac_tc_entry *frag,
+ struct tc_cls_u32_offload *cls)
+{
+ struct stmmac_tc_entry *action_entry = entry;
+ const struct tc_action *act;
+ struct tcf_exts *exts;
+ LIST_HEAD(actions);
+
+ exts = cls->knode.exts;
+ if (!tcf_exts_has_actions(exts))
+ return -EINVAL;
+ if (frag)
+ action_entry = frag;
+
+ tcf_exts_to_list(exts, &actions);
+ list_for_each_entry(act, &actions, list) {
+ /* Accept */
+ if (is_tcf_gact_ok(act)) {
+ action_entry->val.af = 1;
+ break;
+ }
+ /* Drop */
+ if (is_tcf_gact_shot(act)) {
+ action_entry->val.rf = 1;
+ break;
+ }
+
+ /* Unsupported */
+ return -EINVAL;
+ }
+
+ return 0;
+}
+
+static int tc_fill_entry(struct stmmac_priv *priv,
+ struct tc_cls_u32_offload *cls)
+{
+ struct stmmac_tc_entry *entry, *frag = NULL;
+ struct tc_u32_sel *sel = cls->knode.sel;
+ u32 off, data, mask, real_off, rem;
+ u32 prio = cls->common.prio;
+ int ret;
+
+ /* Only 1 match per entry */
+ if (sel->nkeys <= 0 || sel->nkeys > 1)
+ return -EINVAL;
+
+ off = sel->keys[0].off << sel->offshift;
+ data = sel->keys[0].val;
+ mask = sel->keys[0].mask;
+
+ switch (ntohs(cls->common.protocol)) {
+ case ETH_P_ALL:
+ break;
+ case ETH_P_IP:
+ off += ETH_HLEN;
+ break;
+ default:
+ return -EINVAL;
+ }
+
+ if (off > priv->tc_off_max)
+ return -EINVAL;
+
+ real_off = off / 4;
+ rem = off % 4;
+
+ entry = tc_find_entry(priv, cls, true);
+ if (!entry)
+ return -EINVAL;
+
+ if (rem) {
+ frag = tc_find_entry(priv, cls, true);
+ if (!frag) {
+ ret = -EINVAL;
+ goto err_unuse;
+ }
+
+ entry->frag_ptr = frag;
+ entry->val.match_en = (mask << (rem * 8)) &
+ GENMASK(31, rem * 8);
+ entry->val.match_data = (data << (rem * 8)) &
+ GENMASK(31, rem * 8);
+ entry->val.frame_offset = real_off;
+ entry->prio = prio;
+
+ frag->val.match_en = (mask >> (rem * 8)) &
+ GENMASK(rem * 8 - 1, 0);
+ frag->val.match_data = (data >> (rem * 8)) &
+ GENMASK(rem * 8 - 1, 0);
+ frag->val.frame_offset = real_off + 1;
+ frag->prio = prio;
+ frag->is_frag = true;
+ } else {
+ entry->frag_ptr = NULL;
+ entry->val.match_en = mask;
+ entry->val.match_data = data;
+ entry->val.frame_offset = real_off;
+ entry->prio = prio;
+ }
+
+ ret = tc_fill_actions(entry, frag, cls);
+ if (ret)
+ goto err_unuse;
+
+ return 0;
+
+err_unuse:
+ if (frag)
+ frag->in_use = false;
+ entry->in_use = false;
+ return ret;
+}
+
+static void tc_unfill_entry(struct stmmac_priv *priv,
+ struct tc_cls_u32_offload *cls)
+{
+ struct stmmac_tc_entry *entry;
+
+ entry = tc_find_entry(priv, cls, false);
+ if (!entry)
+ return;
+
+ entry->in_use = false;
+ if (entry->frag_ptr) {
+ entry = entry->frag_ptr;
+ entry->is_frag = false;
+ entry->in_use = false;
+ }
+}
+
+static int tc_config_knode(struct stmmac_priv *priv,
+ struct tc_cls_u32_offload *cls)
+{
+ int ret;
+
+ ret = tc_fill_entry(priv, cls);
+ if (ret)
+ return ret;
+
+ ret = stmmac_rxp_config(priv, priv->hw->pcsr, priv->tc_entries,
+ priv->tc_entries_max);
+ if (ret)
+ goto err_unfill;
+
+ return 0;
+
+err_unfill:
+ tc_unfill_entry(priv, cls);
+ return ret;
+}
+
+static int tc_delete_knode(struct stmmac_priv *priv,
+ struct tc_cls_u32_offload *cls)
+{
+ int ret;
+
+ /* Set entry and fragments as not used */
+ tc_unfill_entry(priv, cls);
+
+ ret = stmmac_rxp_config(priv, priv->hw->pcsr, priv->tc_entries,
+ priv->tc_entries_max);
+ if (ret)
+ return ret;
+
+ return 0;
+}
+
+static int tc_setup_cls_u32(struct stmmac_priv *priv,
+ struct tc_cls_u32_offload *cls)
+{
+ switch (cls->command) {
+ case TC_CLSU32_REPLACE_KNODE:
+ tc_unfill_entry(priv, cls);
+ /* Fall through */
+ case TC_CLSU32_NEW_KNODE:
+ return tc_config_knode(priv, cls);
+ case TC_CLSU32_DELETE_KNODE:
+ return tc_delete_knode(priv, cls);
+ default:
+ return -EOPNOTSUPP;
+ }
+}
+
+static int tc_init(struct stmmac_priv *priv)
+{
+ struct dma_features *dma_cap = &priv->dma_cap;
+ unsigned int count;
+
+ if (!dma_cap->frpsel)
+ return -EINVAL;
+
+ switch (dma_cap->frpbs) {
+ case 0x0:
+ priv->tc_off_max = 64;
+ break;
+ case 0x1:
+ priv->tc_off_max = 128;
+ break;
+ case 0x2:
+ priv->tc_off_max = 256;
+ break;
+ default:
+ return -EINVAL;
+ }
+
+ switch (dma_cap->frpes) {
+ case 0x0:
+ count = 64;
+ break;
+ case 0x1:
+ count = 128;
+ break;
+ case 0x2:
+ count = 256;
+ break;
+ default:
+ return -EINVAL;
+ }
+
+ /* Reserve one last filter which lets all pass */
+ priv->tc_entries_max = count;
+ priv->tc_entries = devm_kzalloc(priv->device,
+ sizeof(*priv->tc_entries) * count, GFP_KERNEL);
+ if (!priv->tc_entries)
+ return -ENOMEM;
+
+ tc_fill_all_pass_entry(&priv->tc_entries[count - 1]);
+
+ dev_info(priv->device, "Enabling HW TC (entries=%d, max_off=%d)\n",
+ priv->tc_entries_max, priv->tc_off_max);
+ return 0;
+}
+
+const struct stmmac_tc_ops dwmac510_tc_ops = {
+ .init = tc_init,
+ .setup_cls_u32 = tc_setup_cls_u32,
+};
diff --git a/drivers/net/ethernet/ti/Kconfig b/drivers/net/ethernet/ti/Kconfig
index 48a541e..9263d63 100644
--- a/drivers/net/ethernet/ti/Kconfig
+++ b/drivers/net/ethernet/ti/Kconfig
@@ -18,7 +18,7 @@
config TI_DAVINCI_EMAC
tristate "TI DaVinci EMAC Support"
- depends on ARM && ( ARCH_DAVINCI || ARCH_OMAP3 )
+ depends on ARM && ( ARCH_DAVINCI || ARCH_OMAP3 ) || COMPILE_TEST
select TI_DAVINCI_MDIO
select TI_DAVINCI_CPDMA
select PHYLIB
@@ -30,7 +30,7 @@
config TI_DAVINCI_MDIO
tristate "TI DaVinci MDIO Support"
- depends on ARCH_DAVINCI || ARCH_OMAP2PLUS || ARCH_KEYSTONE
+ depends on ARCH_DAVINCI || ARCH_OMAP2PLUS || ARCH_KEYSTONE || COMPILE_TEST
select PHYLIB
---help---
This driver supports TI's DaVinci MDIO module.
@@ -40,7 +40,7 @@
config TI_DAVINCI_CPDMA
tristate "TI DaVinci CPDMA Support"
- depends on ARCH_DAVINCI || ARCH_OMAP2PLUS
+ depends on ARCH_DAVINCI || ARCH_OMAP2PLUS || COMPILE_TEST
---help---
This driver supports TI's DaVinci CPDMA dma engine.
@@ -60,7 +60,7 @@
config TI_CPSW
tristate "TI CPSW Switch Support"
- depends on ARCH_DAVINCI || ARCH_OMAP2PLUS
+ depends on ARCH_DAVINCI || ARCH_OMAP2PLUS || COMPILE_TEST
select TI_DAVINCI_CPDMA
select TI_DAVINCI_MDIO
select TI_CPSW_PHY_SEL
@@ -75,7 +75,7 @@
config TI_CPTS
bool "TI Common Platform Time Sync (CPTS) Support"
- depends on TI_CPSW || TI_KEYSTONE_NETCP
+ depends on TI_CPSW || TI_KEYSTONE_NETCP || COMPILE_TEST
depends on POSIX_TIMERS
---help---
This driver supports the Common Platform Time Sync unit of
diff --git a/drivers/net/ethernet/ti/cpsw-phy-sel.c b/drivers/net/ethernet/ti/cpsw-phy-sel.c
index 1801364..0c1adad 100644
--- a/drivers/net/ethernet/ti/cpsw-phy-sel.c
+++ b/drivers/net/ethernet/ti/cpsw-phy-sel.c
@@ -177,12 +177,18 @@
}
dev = bus_find_device(&platform_bus_type, NULL, node, match);
- of_node_put(node);
+ if (!dev) {
+ dev_err(dev, "unable to find platform device for %pOF\n", node);
+ goto out;
+ }
+
priv = dev_get_drvdata(dev);
priv->cpsw_phy_sel(priv, phy_mode, slave);
put_device(dev);
+out:
+ of_node_put(node);
}
EXPORT_SYMBOL_GPL(cpsw_phy_sel);
diff --git a/drivers/net/ethernet/ti/cpsw.c b/drivers/net/ethernet/ti/cpsw.c
index 28d893b..643cd2d 100644
--- a/drivers/net/ethernet/ti/cpsw.c
+++ b/drivers/net/ethernet/ti/cpsw.c
@@ -36,6 +36,7 @@
#include <linux/of_device.h>
#include <linux/if_vlan.h>
#include <linux/kmemleak.h>
+#include <linux/sys_soc.h>
#include <linux/pinctrl/consumer.h>
@@ -957,7 +958,7 @@
return IRQ_HANDLED;
}
-static int cpsw_tx_poll(struct napi_struct *napi_tx, int budget)
+static int cpsw_tx_mq_poll(struct napi_struct *napi_tx, int budget)
{
u32 ch_map;
int num_tx, cur_budget, ch;
@@ -984,7 +985,21 @@
if (num_tx < budget) {
napi_complete(napi_tx);
writel(0xff, &cpsw->wr_regs->tx_en);
- if (cpsw->quirk_irq && cpsw->tx_irq_disabled) {
+ }
+
+ return num_tx;
+}
+
+static int cpsw_tx_poll(struct napi_struct *napi_tx, int budget)
+{
+ struct cpsw_common *cpsw = napi_to_cpsw(napi_tx);
+ int num_tx;
+
+ num_tx = cpdma_chan_process(cpsw->txv[0].ch, budget);
+ if (num_tx < budget) {
+ napi_complete(napi_tx);
+ writel(0xff, &cpsw->wr_regs->tx_en);
+ if (cpsw->tx_irq_disabled) {
cpsw->tx_irq_disabled = false;
enable_irq(cpsw->irqs_table[1]);
}
@@ -993,7 +1008,7 @@
return num_tx;
}
-static int cpsw_rx_poll(struct napi_struct *napi_rx, int budget)
+static int cpsw_rx_mq_poll(struct napi_struct *napi_rx, int budget)
{
u32 ch_map;
int num_rx, cur_budget, ch;
@@ -1020,7 +1035,21 @@
if (num_rx < budget) {
napi_complete_done(napi_rx, num_rx);
writel(0xff, &cpsw->wr_regs->rx_en);
- if (cpsw->quirk_irq && cpsw->rx_irq_disabled) {
+ }
+
+ return num_rx;
+}
+
+static int cpsw_rx_poll(struct napi_struct *napi_rx, int budget)
+{
+ struct cpsw_common *cpsw = napi_to_cpsw(napi_rx);
+ int num_rx;
+
+ num_rx = cpdma_chan_process(cpsw->rxv[0].ch, budget);
+ if (num_rx < budget) {
+ napi_complete_done(napi_rx, num_rx);
+ writel(0xff, &cpsw->wr_regs->rx_en);
+ if (cpsw->rx_irq_disabled) {
cpsw->rx_irq_disabled = false;
enable_irq(cpsw->irqs_table[0]);
}
@@ -1252,8 +1281,8 @@
for (i = 0; i < ch_stats_len; i++) {
line = i % CPSW_STATS_CH_LEN;
snprintf(*p, ETH_GSTRING_LEN,
- "%s DMA chan %d: %s", rx_dir ? "Rx" : "Tx",
- i / CPSW_STATS_CH_LEN,
+ "%s DMA chan %ld: %s", rx_dir ? "Rx" : "Tx",
+ (long)(i / CPSW_STATS_CH_LEN),
cpsw_gstrings_ch_stats[line].stat_string);
*p += ETH_GSTRING_LEN;
}
@@ -2364,9 +2393,9 @@
{
struct cpsw_common *cpsw = ndev_to_cpsw(ndev);
+ ch->max_rx = cpsw->quirk_irq ? 1 : CPSW_MAX_QUEUES;
+ ch->max_tx = cpsw->quirk_irq ? 1 : CPSW_MAX_QUEUES;
ch->max_combined = 0;
- ch->max_rx = CPSW_MAX_QUEUES;
- ch->max_tx = CPSW_MAX_QUEUES;
ch->max_other = 0;
ch->other_count = 0;
ch->rx_count = cpsw->rx_ch_num;
@@ -2377,6 +2406,11 @@
static int cpsw_check_ch_settings(struct cpsw_common *cpsw,
struct ethtool_channels *ch)
{
+ if (cpsw->quirk_irq) {
+ dev_err(cpsw->dev, "Maximum one tx/rx queue is allowed");
+ return -EOPNOTSUPP;
+ }
+
if (ch->combined_count)
return -EINVAL;
@@ -2917,44 +2951,20 @@
return ret;
}
-#define CPSW_QUIRK_IRQ BIT(0)
-
-static const struct platform_device_id cpsw_devtype[] = {
- {
- /* keep it for existing comaptibles */
- .name = "cpsw",
- .driver_data = CPSW_QUIRK_IRQ,
- }, {
- .name = "am335x-cpsw",
- .driver_data = CPSW_QUIRK_IRQ,
- }, {
- .name = "am4372-cpsw",
- .driver_data = 0,
- }, {
- .name = "dra7-cpsw",
- .driver_data = 0,
- }, {
- /* sentinel */
- }
-};
-MODULE_DEVICE_TABLE(platform, cpsw_devtype);
-
-enum ti_cpsw_type {
- CPSW = 0,
- AM335X_CPSW,
- AM4372_CPSW,
- DRA7_CPSW,
-};
-
static const struct of_device_id cpsw_of_mtable[] = {
- { .compatible = "ti,cpsw", .data = &cpsw_devtype[CPSW], },
- { .compatible = "ti,am335x-cpsw", .data = &cpsw_devtype[AM335X_CPSW], },
- { .compatible = "ti,am4372-cpsw", .data = &cpsw_devtype[AM4372_CPSW], },
- { .compatible = "ti,dra7-cpsw", .data = &cpsw_devtype[DRA7_CPSW], },
+ { .compatible = "ti,cpsw"},
+ { .compatible = "ti,am335x-cpsw"},
+ { .compatible = "ti,am4372-cpsw"},
+ { .compatible = "ti,dra7-cpsw"},
{ /* sentinel */ },
};
MODULE_DEVICE_TABLE(of, cpsw_of_mtable);
+static const struct soc_device_attribute cpsw_soc_devices[] = {
+ { .family = "AM33xx", .revision = "ES1.0"},
+ { /* sentinel */ }
+};
+
static int cpsw_probe(struct platform_device *pdev)
{
struct clk *clk;
@@ -2966,9 +2976,9 @@
void __iomem *ss_regs;
void __iomem *cpts_regs;
struct resource *res, *ss_res;
- const struct of_device_id *of_id;
struct gpio_descs *mode;
u32 slave_offset, sliver_offset, slave_size;
+ const struct soc_device_attribute *soc;
struct cpsw_common *cpsw;
int ret = 0, i;
int irq;
@@ -3141,6 +3151,10 @@
goto clean_dt_ret;
}
+ soc = soc_device_match(cpsw_soc_devices);
+ if (soc)
+ cpsw->quirk_irq = 1;
+
cpsw->txv[0].ch = cpdma_chan_create(cpsw->dma, 0, cpsw_tx_handler, 0);
if (IS_ERR(cpsw->txv[0].ch)) {
dev_err(priv->dev, "error initializing tx dma channel\n");
@@ -3180,19 +3194,16 @@
goto clean_dma_ret;
}
- of_id = of_match_device(cpsw_of_mtable, &pdev->dev);
- if (of_id) {
- pdev->id_entry = of_id->data;
- if (pdev->id_entry->driver_data)
- cpsw->quirk_irq = true;
- }
-
ndev->features |= NETIF_F_HW_VLAN_CTAG_FILTER | NETIF_F_HW_VLAN_CTAG_RX;
ndev->netdev_ops = &cpsw_netdev_ops;
ndev->ethtool_ops = &cpsw_ethtool_ops;
- netif_napi_add(ndev, &cpsw->napi_rx, cpsw_rx_poll, CPSW_POLL_WEIGHT);
- netif_tx_napi_add(ndev, &cpsw->napi_tx, cpsw_tx_poll, CPSW_POLL_WEIGHT);
+ netif_napi_add(ndev, &cpsw->napi_rx,
+ cpsw->quirk_irq ? cpsw_rx_poll : cpsw_rx_mq_poll,
+ CPSW_POLL_WEIGHT);
+ netif_tx_napi_add(ndev, &cpsw->napi_tx,
+ cpsw->quirk_irq ? cpsw_tx_poll : cpsw_tx_mq_poll,
+ CPSW_POLL_WEIGHT);
cpsw_split_res(ndev);
/* register the network device */
diff --git a/drivers/net/ethernet/ti/cpts.c b/drivers/net/ethernet/ti/cpts.c
index e7b76f6..6f63c87 100644
--- a/drivers/net/ethernet/ti/cpts.c
+++ b/drivers/net/ethernet/ti/cpts.c
@@ -294,7 +294,8 @@
delay = CPTS_SKB_TX_WORK_TIMEOUT;
spin_unlock_irqrestore(&cpts->lock, flags);
- pr_debug("cpts overflow check at %lld.%09lu\n", ts.tv_sec, ts.tv_nsec);
+ pr_debug("cpts overflow check at %lld.%09ld\n",
+ (long long)ts.tv_sec, ts.tv_nsec);
return (long)delay;
}
@@ -564,7 +565,7 @@
cpts->refclk = devm_clk_get(dev, "cpts");
if (IS_ERR(cpts->refclk)) {
dev_err(dev, "Failed to get cpts refclk\n");
- return ERR_PTR(PTR_ERR(cpts->refclk));
+ return ERR_CAST(cpts->refclk);
}
clk_prepare(cpts->refclk);
diff --git a/drivers/net/ethernet/ti/davinci_cpdma.c b/drivers/net/ethernet/ti/davinci_cpdma.c
index 31ae041..cdbddf1 100644
--- a/drivers/net/ethernet/ti/davinci_cpdma.c
+++ b/drivers/net/ethernet/ti/davinci_cpdma.c
@@ -191,7 +191,7 @@
return;
WARN(gen_pool_size(pool->gen_pool) != gen_pool_avail(pool->gen_pool),
- "cpdma_desc_pool size %d != avail %d",
+ "cpdma_desc_pool size %zd != avail %zd",
gen_pool_size(pool->gen_pool),
gen_pool_avail(pool->gen_pool));
if (pool->cpumap)
@@ -1080,7 +1080,7 @@
writel_relaxed(buffer, &desc->hw_buffer);
writel_relaxed(len, &desc->hw_len);
writel_relaxed(mode | len, &desc->hw_mode);
- writel_relaxed(token, &desc->sw_token);
+ writel_relaxed((uintptr_t)token, &desc->sw_token);
writel_relaxed(buffer, &desc->sw_buffer);
writel_relaxed(len, &desc->sw_len);
desc_read(desc, sw_len);
@@ -1121,15 +1121,15 @@
struct cpdma_desc_pool *pool = ctlr->pool;
dma_addr_t buff_dma;
int origlen;
- void *token;
+ uintptr_t token;
- token = (void *)desc_read(desc, sw_token);
+ token = desc_read(desc, sw_token);
buff_dma = desc_read(desc, sw_buffer);
origlen = desc_read(desc, sw_len);
dma_unmap_single(ctlr->dev, buff_dma, origlen, chan->dir);
cpdma_desc_free(pool, desc, 1);
- (*chan->handler)(token, outlen, status);
+ (*chan->handler)((void *)token, outlen, status);
}
static int __cpdma_chan_process(struct cpdma_chan *chan)
diff --git a/drivers/net/ethernet/ti/davinci_emac.c b/drivers/net/ethernet/ti/davinci_emac.c
index abceea8..be0fec1 100644
--- a/drivers/net/ethernet/ti/davinci_emac.c
+++ b/drivers/net/ethernet/ti/davinci_emac.c
@@ -1930,8 +1930,8 @@
if (netif_msg_probe(priv)) {
dev_notice(&pdev->dev, "DaVinci EMAC Probe found device "
- "(regs: %p, irq: %d)\n",
- (void *)priv->emac_base_phys, ndev->irq);
+ "(regs: %pa, irq: %d)\n",
+ &priv->emac_base_phys, ndev->irq);
}
pm_runtime_put(&pdev->dev);
diff --git a/drivers/net/ethernet/ti/davinci_mdio.c b/drivers/net/ethernet/ti/davinci_mdio.c
index 3c33f45..8ac7283 100644
--- a/drivers/net/ethernet/ti/davinci_mdio.c
+++ b/drivers/net/ethernet/ti/davinci_mdio.c
@@ -34,6 +34,7 @@
#include <linux/clk.h>
#include <linux/err.h>
#include <linux/io.h>
+#include <linux/iopoll.h>
#include <linux/pm_runtime.h>
#include <linux/davinci_emac.h>
#include <linux/of.h>
@@ -227,14 +228,14 @@
static inline int wait_for_idle(struct davinci_mdio_data *data)
{
struct davinci_mdio_regs __iomem *regs = data->regs;
- unsigned long timeout = jiffies + msecs_to_jiffies(MDIO_TIMEOUT);
+ u32 val, ret;
- while (time_after(timeout, jiffies)) {
- if (__raw_readl(®s->control) & CONTROL_IDLE)
- return 0;
- }
- dev_err(data->dev, "timed out waiting for idle\n");
- return -ETIMEDOUT;
+ ret = readl_poll_timeout(®s->control, val, val & CONTROL_IDLE,
+ 0, MDIO_TIMEOUT * 1000);
+ if (ret)
+ dev_err(data->dev, "timed out waiting for idle\n");
+
+ return ret;
}
static int davinci_mdio_read(struct mii_bus *bus, int phy_id, int phy_reg)
@@ -428,12 +429,10 @@
* defined to support backward compatibility with DTs which assume that
* Davinci MDIO will always scan the bus for PHYs detection.
*/
- if (dev->of_node && of_get_child_count(dev->of_node)) {
+ if (dev->of_node && of_get_child_count(dev->of_node))
data->skip_scan = true;
- ret = of_mdiobus_register(data->bus, dev->of_node);
- } else {
- ret = mdiobus_register(data->bus);
- }
+
+ ret = of_mdiobus_register(data->bus, dev->of_node);
if (ret)
goto bail_out;
diff --git a/drivers/net/ethernet/ti/netcp.h b/drivers/net/ethernet/ti/netcp.h
index 8900a6f..c4ffdf4 100644
--- a/drivers/net/ethernet/ti/netcp.h
+++ b/drivers/net/ethernet/ti/netcp.h
@@ -33,6 +33,8 @@
#define SGMII_LINK_MAC_MAC_FORCED 2
#define SGMII_LINK_MAC_FIBER 3
#define SGMII_LINK_MAC_PHY_NO_MDIO 4
+#define RGMII_LINK_MAC_PHY 5
+#define RGMII_LINK_MAC_PHY_NO_MDIO 7
#define XGMII_LINK_MAC_PHY 10
#define XGMII_LINK_MAC_MAC_FORCED 11
@@ -212,6 +214,7 @@
int (*add_vid)(void *intf_priv, int vid);
int (*del_vid)(void *intf_priv, int vid);
int (*ioctl)(void *intf_priv, struct ifreq *req, int cmd);
+ int (*set_rx_mode)(void *intf_priv, bool promisc);
/* used internally */
struct list_head module_list;
diff --git a/drivers/net/ethernet/ti/netcp_core.c b/drivers/net/ethernet/ti/netcp_core.c
index f5a7eb2..e40aa3e 100644
--- a/drivers/net/ethernet/ti/netcp_core.c
+++ b/drivers/net/ethernet/ti/netcp_core.c
@@ -1509,6 +1509,24 @@
}
}
+static int netcp_set_promiscuous(struct netcp_intf *netcp, bool promisc)
+{
+ struct netcp_intf_modpriv *priv;
+ struct netcp_module *module;
+ int error;
+
+ for_each_module(netcp, priv) {
+ module = priv->netcp_module;
+ if (!module->set_rx_mode)
+ continue;
+
+ error = module->set_rx_mode(priv->module_priv, promisc);
+ if (error)
+ return error;
+ }
+ return 0;
+}
+
static void netcp_set_rx_mode(struct net_device *ndev)
{
struct netcp_intf *netcp = netdev_priv(ndev);
@@ -1538,6 +1556,7 @@
/* finally sweep and callout into modules */
netcp_addr_sweep_del(netcp);
netcp_addr_sweep_add(netcp);
+ netcp_set_promiscuous(netcp, promisc);
spin_unlock(&netcp->lock);
}
@@ -2155,8 +2174,13 @@
struct device_node *child, *interfaces;
struct netcp_device *netcp_device;
struct device *dev = &pdev->dev;
+ struct netcp_module *module;
int ret;
+ if (!knav_dma_device_ready() ||
+ !knav_qmss_device_ready())
+ return -EPROBE_DEFER;
+
if (!node) {
dev_err(dev, "could not find device info\n");
return -ENODEV;
@@ -2203,6 +2227,14 @@
/* Add the device instance to the list */
list_add_tail(&netcp_device->device_list, &netcp_devices);
+ /* Probe & attach any modules already registered */
+ mutex_lock(&netcp_modules_lock);
+ for_each_netcp_module(module) {
+ ret = netcp_module_probe(netcp_device, module);
+ if (ret < 0)
+ dev_err(dev, "module(%s) probe failed\n", module->name);
+ }
+ mutex_unlock(&netcp_modules_lock);
return 0;
probe_quit_interface:
diff --git a/drivers/net/ethernet/ti/netcp_ethss.c b/drivers/net/ethernet/ti/netcp_ethss.c
index 56dbc0b..6a728d3 100644
--- a/drivers/net/ethernet/ti/netcp_ethss.c
+++ b/drivers/net/ethernet/ti/netcp_ethss.c
@@ -21,6 +21,7 @@
#include <linux/io.h>
#include <linux/module.h>
#include <linux/of_mdio.h>
+#include <linux/of_net.h>
#include <linux/of_address.h>
#include <linux/if_vlan.h>
#include <linux/ptp_classify.h>
@@ -42,7 +43,7 @@
/* 1G Ethernet SS defines */
#define GBE_MODULE_NAME "netcp-gbe"
-#define GBE_SS_VERSION_14 0x4ed21104
+#define GBE_SS_VERSION_14 0x4ed2
#define GBE_SS_REG_INDEX 0
#define GBE_SGMII34_REG_INDEX 1
@@ -72,6 +73,11 @@
#define IS_SS_ID_NU(d) \
(GBE_IDENT((d)->ss_version) == GBE_SS_ID_NU)
+#define IS_SS_ID_VER_14(d) \
+ (GBE_IDENT((d)->ss_version) == GBE_SS_VERSION_14)
+#define IS_SS_ID_2U(d) \
+ (GBE_IDENT((d)->ss_version) == GBE_SS_ID_2U)
+
#define GBENU_SS_REG_INDEX 0
#define GBENU_SM_REG_INDEX 1
#define GBENU_SGMII_MODULE_OFFSET 0x100
@@ -86,7 +92,7 @@
/* 10G Ethernet SS defines */
#define XGBE_MODULE_NAME "netcp-xgbe"
-#define XGBE_SS_VERSION_10 0x4ee42100
+#define XGBE_SS_VERSION_10 0x4ee4
#define XGBE_SS_REG_INDEX 0
#define XGBE_SM_REG_INDEX 1
@@ -166,6 +172,11 @@
#define GBE_RXHOOK_ORDER 0
#define GBE_DEFAULT_ALE_AGEOUT 30
#define SLAVE_LINK_IS_XGMII(s) ((s)->link_interface >= XGMII_LINK_MAC_PHY)
+#define SLAVE_LINK_IS_RGMII(s) \
+ (((s)->link_interface >= RGMII_LINK_MAC_PHY) && \
+ ((s)->link_interface <= RGMII_LINK_MAC_PHY_NO_MDIO))
+#define SLAVE_LINK_IS_SGMII(s) \
+ ((s)->link_interface <= SGMII_LINK_MAC_PHY_NO_MDIO)
#define NETCP_LINK_STATE_INVALID -1
#define GBE_SET_REG_OFS(p, rb, rn) p->rb##_ofs.rn = \
@@ -549,6 +560,7 @@
struct gbe_ss_regs_ofs {
u16 id_ver;
u16 control;
+ u16 rgmii_status; /* 2U */
};
struct gbe_switch_regs {
@@ -591,6 +603,7 @@
struct gbe_port_regs_ofs {
u16 port_vlan;
u16 tx_pri_map;
+ u16 rx_pri_map;
u16 sa_lo;
u16 sa_hi;
u16 ts_ctl;
@@ -695,6 +708,7 @@
u32 link_interface;
u32 mac_control;
u8 phy_port_t;
+ struct device_node *node;
struct device_node *phy_node;
struct ts_ctl ts_ctl;
struct list_head slave_list;
@@ -1915,7 +1929,7 @@
gbe_dev = gbe_intf->gbe_dev;
spin_lock_bh(&gbe_dev->hw_stats_lock);
- if (gbe_dev->ss_version == GBE_SS_VERSION_14)
+ if (IS_SS_ID_VER_14(gbe_dev))
gbe_update_stats_ver14(gbe_dev, data);
else
gbe_update_stats(gbe_dev, data);
@@ -2091,8 +2105,9 @@
ALE_PORT_STATE_FORWARD);
if (ndev && slave->open &&
- slave->link_interface != SGMII_LINK_MAC_PHY &&
- slave->link_interface != XGMII_LINK_MAC_PHY)
+ ((slave->link_interface != SGMII_LINK_MAC_PHY) &&
+ (slave->link_interface != RGMII_LINK_MAC_PHY) &&
+ (slave->link_interface != XGMII_LINK_MAC_PHY)))
netif_carrier_on(ndev);
} else {
writel(mac_control, GBE_REG_ADDR(slave, emac_regs,
@@ -2101,8 +2116,9 @@
ALE_PORT_STATE,
ALE_PORT_STATE_DISABLE);
if (ndev &&
- slave->link_interface != SGMII_LINK_MAC_PHY &&
- slave->link_interface != XGMII_LINK_MAC_PHY)
+ ((slave->link_interface != SGMII_LINK_MAC_PHY) &&
+ (slave->link_interface != RGMII_LINK_MAC_PHY) &&
+ (slave->link_interface != XGMII_LINK_MAC_PHY)))
netif_carrier_off(ndev);
}
@@ -2115,23 +2131,35 @@
return !slave->phy || slave->phy->link;
}
+#define RGMII_REG_STATUS_LINK BIT(0)
+
+static void netcp_2u_rgmii_get_port_link(struct gbe_priv *gbe_dev, bool *status)
+{
+ u32 val = 0;
+
+ val = readl(GBE_REG_ADDR(gbe_dev, ss_regs, rgmii_status));
+ *status = !!(val & RGMII_REG_STATUS_LINK);
+}
+
static void netcp_ethss_update_link_state(struct gbe_priv *gbe_dev,
struct gbe_slave *slave,
struct net_device *ndev)
{
- int sp = slave->slave_num;
- int phy_link_state, sgmii_link_state = 1, link_state;
+ bool sw_link_state = true, phy_link_state;
+ int sp = slave->slave_num, link_state;
if (!slave->open)
return;
- if (!SLAVE_LINK_IS_XGMII(slave)) {
- sgmii_link_state =
- netcp_sgmii_get_port_link(SGMII_BASE(gbe_dev, sp), sp);
- }
+ if (SLAVE_LINK_IS_RGMII(slave))
+ netcp_2u_rgmii_get_port_link(gbe_dev,
+ &sw_link_state);
+ if (SLAVE_LINK_IS_SGMII(slave))
+ sw_link_state =
+ netcp_sgmii_get_port_link(SGMII_BASE(gbe_dev, sp), sp);
phy_link_state = gbe_phy_link_status(slave);
- link_state = phy_link_state & sgmii_link_state;
+ link_state = phy_link_state & sw_link_state;
if (atomic_xchg(&slave->link_state, link_state) != link_state)
netcp_ethss_link_state_action(gbe_dev, ndev, slave,
@@ -2205,7 +2233,7 @@
max_rx_len = NETCP_MAX_FRAME_SIZE;
/* Enable correct MII mode at SS level */
- if ((gbe_dev->ss_version == XGBE_SS_VERSION_10) &&
+ if (IS_SS_ID_XGBE(gbe_dev) &&
(slave->link_interface >= XGMII_LINK_MAC_PHY)) {
xgmii_mode = readl(GBE_REG_ADDR(gbe_dev, ss_regs, control));
xgmii_mode |= (1 << slave->slave_num);
@@ -2236,7 +2264,8 @@
struct gbe_priv *gbe_dev = intf->gbe_dev;
struct gbe_slave *slave = intf->slave;
- gbe_sgmii_rtreset(gbe_dev, slave, true);
+ if (!IS_SS_ID_2U(gbe_dev))
+ gbe_sgmii_rtreset(gbe_dev, slave, true);
gbe_port_reset(slave);
/* Disable forwarding */
cpsw_ale_control_set(gbe_dev->ale, slave->port_num,
@@ -2271,11 +2300,20 @@
void (*hndlr)(struct net_device *) = gbe_adjust_link;
- gbe_sgmii_config(priv, slave);
+ if (!IS_SS_ID_2U(priv))
+ gbe_sgmii_config(priv, slave);
gbe_port_reset(slave);
- gbe_sgmii_rtreset(priv, slave, false);
+ if (!IS_SS_ID_2U(priv))
+ gbe_sgmii_rtreset(priv, slave, false);
gbe_port_config(priv, slave, priv->rx_packet_max);
gbe_set_slave_mac(slave, gbe_intf);
+ /* For NU & 2U switch, map the vlan priorities to zero
+ * as we only configure to use priority 0
+ */
+ if (IS_SS_ID_MU(priv))
+ writel(HOST_TX_PRI_MAP_DEFAULT,
+ GBE_REG_ADDR(slave, port_regs, rx_pri_map));
+
/* enable forwarding */
cpsw_ale_control_set(priv->ale, slave->port_num,
ALE_PORT_STATE, ALE_PORT_STATE_FORWARD);
@@ -2286,6 +2324,21 @@
has_phy = true;
phy_mode = PHY_INTERFACE_MODE_SGMII;
slave->phy_port_t = PORT_MII;
+ } else if (slave->link_interface == RGMII_LINK_MAC_PHY) {
+ has_phy = true;
+ phy_mode = of_get_phy_mode(slave->node);
+ /* if phy-mode is not present, default to
+ * PHY_INTERFACE_MODE_RGMII
+ */
+ if (phy_mode < 0)
+ phy_mode = PHY_INTERFACE_MODE_RGMII;
+
+ if (!phy_interface_mode_is_rgmii(phy_mode)) {
+ dev_err(priv->dev,
+ "Unsupported phy mode %d\n", phy_mode);
+ return -EINVAL;
+ }
+ slave->phy_port_t = PORT_MII;
} else if (slave->link_interface == XGMII_LINK_MAC_PHY) {
has_phy = true;
phy_mode = PHY_INTERFACE_MODE_NA;
@@ -2293,7 +2346,7 @@
}
if (has_phy) {
- if (priv->ss_version == XGBE_SS_VERSION_10)
+ if (IS_SS_ID_XGBE(priv))
hndlr = xgbe_adjust_link;
slave->phy = of_phy_connect(gbe_intf->ndev,
@@ -2722,6 +2775,61 @@
}
#endif /* CONFIG_TI_CPTS */
+static int gbe_set_rx_mode(void *intf_priv, bool promisc)
+{
+ struct gbe_intf *gbe_intf = intf_priv;
+ struct gbe_priv *gbe_dev = gbe_intf->gbe_dev;
+ struct cpsw_ale *ale = gbe_dev->ale;
+ unsigned long timeout;
+ int i, ret = -ETIMEDOUT;
+
+ /* Disable(1)/Enable(0) Learn for all ports (host is port 0 and
+ * slaves are port 1 and up
+ */
+ for (i = 0; i <= gbe_dev->num_slaves; i++) {
+ cpsw_ale_control_set(ale, i,
+ ALE_PORT_NOLEARN, !!promisc);
+ cpsw_ale_control_set(ale, i,
+ ALE_PORT_NO_SA_UPDATE, !!promisc);
+ }
+
+ if (!promisc) {
+ /* Don't Flood All Unicast Packets to Host port */
+ cpsw_ale_control_set(ale, 0, ALE_P0_UNI_FLOOD, 0);
+ dev_vdbg(gbe_dev->dev, "promiscuous mode disabled\n");
+ return 0;
+ }
+
+ timeout = jiffies + HZ;
+
+ /* Clear All Untouched entries */
+ cpsw_ale_control_set(ale, 0, ALE_AGEOUT, 1);
+ do {
+ cpu_relax();
+ if (cpsw_ale_control_get(ale, 0, ALE_AGEOUT)) {
+ ret = 0;
+ break;
+ }
+
+ } while (time_after(timeout, jiffies));
+
+ /* Make sure it is not a false timeout */
+ if (ret && !cpsw_ale_control_get(ale, 0, ALE_AGEOUT))
+ return ret;
+
+ cpsw_ale_control_set(ale, 0, ALE_AGEOUT, 1);
+
+ /* Clear all mcast from ALE */
+ cpsw_ale_flush_multicast(ale,
+ GBE_PORT_MASK(gbe_dev->ale_ports),
+ -1);
+
+ /* Flood All Unicast Packets to Host port */
+ cpsw_ale_control_set(ale, 0, ALE_P0_UNI_FLOOD, 1);
+ dev_vdbg(gbe_dev->dev, "promiscuous mode enabled\n");
+ return ret;
+}
+
static int gbe_ioctl(void *intf_priv, struct ifreq *req, int cmd)
{
struct gbe_intf *gbe_intf = intf_priv;
@@ -2764,7 +2872,7 @@
/* A timer runs as a BH, no need to block them */
spin_lock(&gbe_dev->hw_stats_lock);
- if (gbe_dev->ss_version == GBE_SS_VERSION_14)
+ if (IS_SS_ID_VER_14(gbe_dev))
gbe_update_stats_ver14(gbe_dev, NULL);
else
gbe_update_stats(gbe_dev, NULL);
@@ -2807,7 +2915,7 @@
GBE_RTL_VERSION(reg), GBE_IDENT(reg));
/* For 10G and on NetCP 1.5, use directed to port */
- if ((gbe_dev->ss_version == XGBE_SS_VERSION_10) || IS_SS_ID_MU(gbe_dev))
+ if (IS_SS_ID_XGBE(gbe_dev) || IS_SS_ID_MU(gbe_dev))
gbe_intf->tx_pipe.flags = SWITCH_TO_PORT_IN_TAGINFO;
if (gbe_dev->enable_ale)
@@ -2911,8 +3019,10 @@
slave->link_interface = SGMII_LINK_MAC_PHY;
}
+ slave->node = node;
slave->open = false;
if ((slave->link_interface == SGMII_LINK_MAC_PHY) ||
+ (slave->link_interface == RGMII_LINK_MAC_PHY) ||
(slave->link_interface == XGMII_LINK_MAC_PHY))
slave->phy_node = of_parse_phandle(node, "phy-handle", 0);
slave->port_num = gbe_get_slave_port(gbe_dev, slave->slave_num);
@@ -2924,7 +3034,7 @@
/* Emac regs memmap are contiguous but port regs are not */
port_reg_num = slave->slave_num;
- if (gbe_dev->ss_version == GBE_SS_VERSION_14) {
+ if (IS_SS_ID_VER_14(gbe_dev)) {
if (slave->slave_num > 1) {
port_reg_ofs = GBE13_SLAVE_PORT2_OFFSET;
port_reg_num -= 2;
@@ -2939,7 +3049,7 @@
emac_reg_ofs = GBENU_EMAC_OFFSET;
port_reg_blk_sz = 0x1000;
emac_reg_blk_sz = 0x1000;
- } else if (gbe_dev->ss_version == XGBE_SS_VERSION_10) {
+ } else if (IS_SS_ID_XGBE(gbe_dev)) {
port_reg_ofs = XGBE10_SLAVE_PORT_OFFSET;
emac_reg_ofs = XGBE10_EMAC_OFFSET;
port_reg_blk_sz = 0x30;
@@ -2955,7 +3065,7 @@
slave->emac_regs = gbe_dev->switch_regs + emac_reg_ofs +
(emac_reg_blk_sz * slave->slave_num);
- if (gbe_dev->ss_version == GBE_SS_VERSION_14) {
+ if (IS_SS_ID_VER_14(gbe_dev)) {
/* Initialize slave port register offsets */
GBE_SET_REG_OFS(slave, port_regs, port_vlan);
GBE_SET_REG_OFS(slave, port_regs, tx_pri_map);
@@ -2976,6 +3086,7 @@
/* Initialize slave port register offsets */
GBENU_SET_REG_OFS(slave, port_regs, port_vlan);
GBENU_SET_REG_OFS(slave, port_regs, tx_pri_map);
+ GBENU_SET_REG_OFS(slave, port_regs, rx_pri_map);
GBENU_SET_REG_OFS(slave, port_regs, sa_lo);
GBENU_SET_REG_OFS(slave, port_regs, sa_hi);
GBENU_SET_REG_OFS(slave, port_regs, ts_ctl);
@@ -2989,7 +3100,7 @@
GBENU_SET_REG_OFS(slave, emac_regs, mac_control);
GBENU_SET_REG_OFS(slave, emac_regs, soft_reset);
- } else if (gbe_dev->ss_version == XGBE_SS_VERSION_10) {
+ } else if (IS_SS_ID_XGBE(gbe_dev)) {
/* Initialize slave port register offsets */
XGBE_SET_REG_OFS(slave, port_regs, port_vlan);
XGBE_SET_REG_OFS(slave, port_regs, tx_pri_map);
@@ -3039,7 +3150,8 @@
continue;
}
- gbe_sgmii_config(gbe_dev, slave);
+ if (!IS_SS_ID_2U(gbe_dev))
+ gbe_sgmii_config(gbe_dev, slave);
gbe_port_reset(slave);
gbe_port_config(gbe_dev, slave, gbe_dev->rx_packet_max);
list_add_tail(&slave->slave_list, &gbe_dev->secondary_slaves);
@@ -3073,6 +3185,9 @@
if (slave->link_interface == SGMII_LINK_MAC_PHY) {
phy_mode = PHY_INTERFACE_MODE_SGMII;
slave->phy_port_t = PORT_MII;
+ } else if (slave->link_interface == RGMII_LINK_MAC_PHY) {
+ phy_mode = PHY_INTERFACE_MODE_RGMII;
+ slave->phy_port_t = PORT_MII;
} else {
phy_mode = PHY_INTERFACE_MODE_NA;
slave->phy_port_t = PORT_FIBRE;
@@ -3080,6 +3195,7 @@
for_each_sec_slave(slave, gbe_dev) {
if ((slave->link_interface != SGMII_LINK_MAC_PHY) &&
+ (slave->link_interface != RGMII_LINK_MAC_PHY) &&
(slave->link_interface != XGMII_LINK_MAC_PHY))
continue;
slave->phy =
@@ -3355,7 +3471,7 @@
gbe_dev->num_stats_mods = gbe_dev->max_num_ports;
gbe_dev->et_stats = gbenu_et_stats;
- if (IS_SS_ID_NU(gbe_dev))
+ if (IS_SS_ID_MU(gbe_dev))
gbe_dev->num_et_stats = GBENU_ET_STATS_HOST_SIZE +
(gbe_dev->max_num_slaves * GBENU_ET_STATS_PORT_SIZE);
else
@@ -3396,7 +3512,9 @@
}
gbe_dev->switch_regs = regs;
- gbe_dev->sgmii_port_regs = gbe_dev->ss_regs + GBENU_SGMII_MODULE_OFFSET;
+ if (!IS_SS_ID_2U(gbe_dev))
+ gbe_dev->sgmii_port_regs =
+ gbe_dev->ss_regs + GBENU_SGMII_MODULE_OFFSET;
/* Although sgmii modules are mem mapped to one contiguous
* region on GBENU devices, setting sgmii_port34_regs allows
@@ -3419,6 +3537,8 @@
/* Subsystem registers */
GBENU_SET_REG_OFS(gbe_dev, ss_regs, id_ver);
+ /* ok to set for MU, but used by 2U only */
+ GBENU_SET_REG_OFS(gbe_dev, ss_regs, rgmii_status);
/* Switch module registers */
GBENU_SET_REG_OFS(gbe_dev, switch_regs, id_ver);
@@ -3464,6 +3584,7 @@
gbe_dev->max_num_slaves = 8;
} else if (of_device_is_compatible(node, "ti,netcp-gbe-2")) {
gbe_dev->max_num_slaves = 1;
+ gbe_module.set_rx_mode = gbe_set_rx_mode;
} else if (of_device_is_compatible(node, "ti,netcp-xgbe")) {
gbe_dev->max_num_slaves = 2;
} else {
@@ -3508,7 +3629,7 @@
dev_dbg(dev, "ss_version: 0x%08x\n", gbe_dev->ss_version);
- if (gbe_dev->ss_version == GBE_SS_VERSION_14)
+ if (IS_SS_ID_VER_14(gbe_dev))
ret = set_gbe_ethss14_priv(gbe_dev, node);
else if (IS_SS_ID_MU(gbe_dev))
ret = set_gbenu_ethss_priv(gbe_dev, node);
@@ -3606,7 +3727,7 @@
spin_lock_bh(&gbe_dev->hw_stats_lock);
for (i = 0; i < gbe_dev->num_stats_mods; i++) {
- if (gbe_dev->ss_version == GBE_SS_VERSION_14)
+ if (IS_SS_ID_VER_14(gbe_dev))
gbe_reset_mod_stats_ver14(gbe_dev, i);
else
gbe_reset_mod_stats(gbe_dev, i);
diff --git a/drivers/net/geneve.c b/drivers/net/geneve.c
index b919e89..750eaa5 100644
--- a/drivers/net/geneve.c
+++ b/drivers/net/geneve.c
@@ -36,6 +36,8 @@
#define GENEVE_VER 0
#define GENEVE_BASE_HLEN (sizeof(struct udphdr) + sizeof(struct genevehdr))
+#define GENEVE_IPV4_HLEN (ETH_HLEN + sizeof(struct iphdr) + GENEVE_BASE_HLEN)
+#define GENEVE_IPV6_HLEN (ETH_HLEN + sizeof(struct ipv6hdr) + GENEVE_BASE_HLEN)
/* per-network namespace private data for this module */
struct geneve_net {
@@ -826,8 +828,8 @@
return PTR_ERR(rt);
if (skb_dst(skb)) {
- int mtu = dst_mtu(&rt->dst) - sizeof(struct iphdr) -
- GENEVE_BASE_HLEN - info->options_len - 14;
+ int mtu = dst_mtu(&rt->dst) - GENEVE_IPV4_HLEN -
+ info->options_len;
skb_dst_update_pmtu(skb, mtu);
}
@@ -872,8 +874,7 @@
return PTR_ERR(dst);
if (skb_dst(skb)) {
- int mtu = dst_mtu(dst) - sizeof(struct ipv6hdr) -
- GENEVE_BASE_HLEN - info->options_len - 14;
+ int mtu = dst_mtu(dst) - GENEVE_IPV6_HLEN - info->options_len;
skb_dst_update_pmtu(skb, mtu);
}
@@ -941,11 +942,10 @@
static int geneve_change_mtu(struct net_device *dev, int new_mtu)
{
- /* Only possible if called internally, ndo_change_mtu path's new_mtu
- * is guaranteed to be between dev->min_mtu and dev->max_mtu.
- */
if (new_mtu > dev->max_mtu)
new_mtu = dev->max_mtu;
+ else if (new_mtu < dev->min_mtu)
+ new_mtu = dev->min_mtu;
dev->mtu = new_mtu;
return 0;
@@ -1261,7 +1261,7 @@
}
if (data[IFLA_GENEVE_REMOTE6]) {
- #if IS_ENABLED(CONFIG_IPV6)
+#if IS_ENABLED(CONFIG_IPV6)
if (changelink && (ip_tunnel_info_af(info) == AF_INET)) {
attrtype = IFLA_GENEVE_REMOTE6;
goto change_notsup;
@@ -1387,6 +1387,48 @@
return -EOPNOTSUPP;
}
+static void geneve_link_config(struct net_device *dev,
+ struct ip_tunnel_info *info, struct nlattr *tb[])
+{
+ struct geneve_dev *geneve = netdev_priv(dev);
+ int ldev_mtu = 0;
+
+ if (tb[IFLA_MTU]) {
+ geneve_change_mtu(dev, nla_get_u32(tb[IFLA_MTU]));
+ return;
+ }
+
+ switch (ip_tunnel_info_af(info)) {
+ case AF_INET: {
+ struct flowi4 fl4 = { .daddr = info->key.u.ipv4.dst };
+ struct rtable *rt = ip_route_output_key(geneve->net, &fl4);
+
+ if (!IS_ERR(rt) && rt->dst.dev) {
+ ldev_mtu = rt->dst.dev->mtu - GENEVE_IPV4_HLEN;
+ ip_rt_put(rt);
+ }
+ break;
+ }
+#if IS_ENABLED(CONFIG_IPV6)
+ case AF_INET6: {
+ struct rt6_info *rt = rt6_lookup(geneve->net,
+ &info->key.u.ipv6.dst, NULL, 0,
+ NULL, 0);
+
+ if (rt && rt->dst.dev)
+ ldev_mtu = rt->dst.dev->mtu - GENEVE_IPV6_HLEN;
+ ip6_rt_put(rt);
+ break;
+ }
+#endif
+ }
+
+ if (ldev_mtu <= 0)
+ return;
+
+ geneve_change_mtu(dev, ldev_mtu - info->options_len);
+}
+
static int geneve_newlink(struct net *net, struct net_device *dev,
struct nlattr *tb[], struct nlattr *data[],
struct netlink_ext_ack *extack)
@@ -1402,8 +1444,14 @@
if (err)
return err;
- return geneve_configure(net, dev, extack, &info, metadata,
- use_udp6_rx_checksums);
+ err = geneve_configure(net, dev, extack, &info, metadata,
+ use_udp6_rx_checksums);
+ if (err)
+ return err;
+
+ geneve_link_config(dev, &info, tb);
+
+ return 0;
}
/* Quiesces the geneve device data path for both TX and RX.
@@ -1477,8 +1525,10 @@
if (err)
return err;
- if (!geneve_dst_addr_equal(&geneve->info, &info))
+ if (!geneve_dst_addr_equal(&geneve->info, &info)) {
dst_cache_reset(&info.dst_cache);
+ geneve_link_config(dev, &info, tb);
+ }
geneve_quiesce(geneve, &gs4, &gs6);
geneve->info = info;
diff --git a/drivers/net/hamradio/mkiss.c b/drivers/net/hamradio/mkiss.c
index c180b48..13e4c1e 100644
--- a/drivers/net/hamradio/mkiss.c
+++ b/drivers/net/hamradio/mkiss.c
@@ -217,7 +217,7 @@
c = *s++;
else if (len > 1)
c = crc >> 8;
- else if (len > 0)
+ else
c = crc & 0xff;
len--;
diff --git a/drivers/net/hippi/rrunner.c b/drivers/net/hippi/rrunner.c
index 1ab97d9..f411164 100644
--- a/drivers/net/hippi/rrunner.c
+++ b/drivers/net/hippi/rrunner.c
@@ -867,7 +867,7 @@
dev->name);
goto drop;
case E_FRM_ERR:
- printk(KERN_WARNING "%s: Framming Error\n",
+ printk(KERN_WARNING "%s: Framing Error\n",
dev->name);
goto drop;
case E_FLG_SYN_ERR:
diff --git a/drivers/net/hyperv/Kconfig b/drivers/net/hyperv/Kconfig
index 936968d..0765d5f 100644
--- a/drivers/net/hyperv/Kconfig
+++ b/drivers/net/hyperv/Kconfig
@@ -1,5 +1,6 @@
config HYPERV_NET
tristate "Microsoft Hyper-V virtual network driver"
depends on HYPERV
+ select UCS2_STRING
help
Select this option to enable the Hyper-V virtual network driver.
diff --git a/drivers/net/hyperv/hyperv_net.h b/drivers/net/hyperv/hyperv_net.h
index 960f061..1be34d2 100644
--- a/drivers/net/hyperv/hyperv_net.h
+++ b/drivers/net/hyperv/hyperv_net.h
@@ -110,7 +110,7 @@
u16 hashkey_size;
/* The offset of the secret key from the beginning of this structure */
- u32 kashkey_offset;
+ u32 hashkey_offset;
u32 processor_masks_offset;
u32 num_processor_masks;
@@ -237,6 +237,8 @@
#define NVSP_PROTOCOL_VERSION_2 0x30002
#define NVSP_PROTOCOL_VERSION_4 0x40000
#define NVSP_PROTOCOL_VERSION_5 0x50000
+#define NVSP_PROTOCOL_VERSION_6 0x60000
+#define NVSP_PROTOCOL_VERSION_61 0x60001
enum {
NVSP_MSG_TYPE_NONE = 0,
@@ -308,6 +310,12 @@
NVSP_MSG5_TYPE_SEND_INDIRECTION_TABLE,
NVSP_MSG5_MAX = NVSP_MSG5_TYPE_SEND_INDIRECTION_TABLE,
+
+ /* Version 6 messages */
+ NVSP_MSG6_TYPE_PD_API,
+ NVSP_MSG6_TYPE_PD_POST_BATCH,
+
+ NVSP_MSG6_MAX = NVSP_MSG6_TYPE_PD_POST_BATCH
};
enum {
@@ -619,12 +627,168 @@
struct nvsp_5_send_indirect_table send_table;
} __packed;
+enum nvsp_6_pd_api_op {
+ PD_API_OP_CONFIG = 1,
+ PD_API_OP_SW_DATAPATH, /* Switch Datapath */
+ PD_API_OP_OPEN_PROVIDER,
+ PD_API_OP_CLOSE_PROVIDER,
+ PD_API_OP_CREATE_QUEUE,
+ PD_API_OP_FLUSH_QUEUE,
+ PD_API_OP_FREE_QUEUE,
+ PD_API_OP_ALLOC_COM_BUF, /* Allocate Common Buffer */
+ PD_API_OP_FREE_COM_BUF, /* Free Common Buffer */
+ PD_API_OP_MAX
+};
+
+struct grp_affinity {
+ u64 mask;
+ u16 grp;
+ u16 reserved[3];
+} __packed;
+
+struct nvsp_6_pd_api_req {
+ u32 op;
+
+ union {
+ /* MMIO information is sent from the VM to VSP */
+ struct __packed {
+ u64 mmio_pa; /* MMIO Physical Address */
+ u32 mmio_len;
+
+ /* Number of PD queues a VM can support */
+ u16 num_subchn;
+ } config;
+
+ /* Switch Datapath */
+ struct __packed {
+ /* Host Datapath Is PacketDirect */
+ u8 host_dpath_is_pd;
+
+ /* Guest PacketDirect Is Enabled */
+ u8 guest_pd_enabled;
+ } sw_dpath;
+
+ /* Open Provider*/
+ struct __packed {
+ u32 prov_id; /* Provider id */
+ u32 flag;
+ } open_prov;
+
+ /* Close Provider */
+ struct __packed {
+ u32 prov_id;
+ } cls_prov;
+
+ /* Create Queue*/
+ struct __packed {
+ u32 prov_id;
+ u16 q_id;
+ u16 q_size;
+ u8 is_recv_q;
+ u8 is_rss_q;
+ u32 recv_data_len;
+ struct grp_affinity affy;
+ } cr_q;
+
+ /* Delete Queue*/
+ struct __packed {
+ u32 prov_id;
+ u16 q_id;
+ } del_q;
+
+ /* Flush Queue */
+ struct __packed {
+ u32 prov_id;
+ u16 q_id;
+ } flush_q;
+
+ /* Allocate Common Buffer */
+ struct __packed {
+ u32 len;
+ u32 pf_node; /* Preferred Node */
+ u16 region_id;
+ } alloc_com_buf;
+
+ /* Free Common Buffer */
+ struct __packed {
+ u32 len;
+ u64 pa; /* Physical Address */
+ u32 pf_node; /* Preferred Node */
+ u16 region_id;
+ u8 cache_type;
+ } free_com_buf;
+ } __packed;
+} __packed;
+
+struct nvsp_6_pd_api_comp {
+ u32 op;
+ u32 status;
+
+ union {
+ struct __packed {
+ /* actual number of PD queues allocated to the VM */
+ u16 num_pd_q;
+
+ /* Num Receive Rss PD Queues */
+ u8 num_rss_q;
+
+ u8 is_supported; /* Is supported by VSP */
+ u8 is_enabled; /* Is enabled by VSP */
+ } config;
+
+ /* Open Provider */
+ struct __packed {
+ u32 prov_id;
+ } open_prov;
+
+ /* Create Queue */
+ struct __packed {
+ u32 prov_id;
+ u16 q_id;
+ u16 q_size;
+ u32 recv_data_len;
+ struct grp_affinity affy;
+ } cr_q;
+
+ /* Allocate Common Buffer */
+ struct __packed {
+ u64 pa; /* Physical Address */
+ u32 len;
+ u32 pf_node; /* Preferred Node */
+ u16 region_id;
+ u8 cache_type;
+ } alloc_com_buf;
+ } __packed;
+} __packed;
+
+struct nvsp_6_pd_buf {
+ u32 region_offset;
+ u16 region_id;
+ u16 is_partial:1;
+ u16 reserved:15;
+} __packed;
+
+struct nvsp_6_pd_batch_msg {
+ struct nvsp_message_header hdr;
+ u16 count;
+ u16 guest2host:1;
+ u16 is_recv:1;
+ u16 reserved:14;
+ struct nvsp_6_pd_buf pd_buf[0];
+} __packed;
+
+union nvsp_6_message_uber {
+ struct nvsp_6_pd_api_req pd_req;
+ struct nvsp_6_pd_api_comp pd_comp;
+} __packed;
+
union nvsp_all_messages {
union nvsp_message_init_uber init_msg;
union nvsp_1_message_uber v1_msg;
union nvsp_2_message_uber v2_msg;
union nvsp_4_message_uber v4_msg;
union nvsp_5_message_uber v5_msg;
+ union nvsp_6_message_uber v6_msg;
} __packed;
/* ALL Messages */
diff --git a/drivers/net/hyperv/netvsc.c b/drivers/net/hyperv/netvsc.c
index 04f611e..d2ee66c2 100644
--- a/drivers/net/hyperv/netvsc.c
+++ b/drivers/net/hyperv/netvsc.c
@@ -525,7 +525,8 @@
struct net_device *ndev = hv_get_drvdata(device);
static const u32 ver_list[] = {
NVSP_PROTOCOL_VERSION_1, NVSP_PROTOCOL_VERSION_2,
- NVSP_PROTOCOL_VERSION_4, NVSP_PROTOCOL_VERSION_5
+ NVSP_PROTOCOL_VERSION_4, NVSP_PROTOCOL_VERSION_5,
+ NVSP_PROTOCOL_VERSION_6, NVSP_PROTOCOL_VERSION_61
};
struct nvsp_message *init_packet;
int ndis_version, i, ret;
@@ -651,16 +652,14 @@
sync_change_bit(index, net_device->send_section_map);
}
-static void netvsc_send_tx_complete(struct netvsc_device *net_device,
- struct vmbus_channel *incoming_channel,
- struct hv_device *device,
+static void netvsc_send_tx_complete(struct net_device *ndev,
+ struct netvsc_device *net_device,
+ struct vmbus_channel *channel,
const struct vmpacket_descriptor *desc,
int budget)
{
struct sk_buff *skb = (struct sk_buff *)(unsigned long)desc->trans_id;
- struct net_device *ndev = hv_get_drvdata(device);
struct net_device_context *ndev_ctx = netdev_priv(ndev);
- struct vmbus_channel *channel = device->channel;
u16 q_idx = 0;
int queue_sends;
@@ -674,7 +673,6 @@
if (send_index != NETVSC_INVALID_INDEX)
netvsc_free_send_slot(net_device, send_index);
q_idx = packet->q_idx;
- channel = incoming_channel;
tx_stats = &net_device->chan_table[q_idx].tx_stats;
@@ -704,14 +702,13 @@
}
}
-static void netvsc_send_completion(struct netvsc_device *net_device,
+static void netvsc_send_completion(struct net_device *ndev,
+ struct netvsc_device *net_device,
struct vmbus_channel *incoming_channel,
- struct hv_device *device,
const struct vmpacket_descriptor *desc,
int budget)
{
- struct nvsp_message *nvsp_packet = hv_pkt_data(desc);
- struct net_device *ndev = hv_get_drvdata(device);
+ const struct nvsp_message *nvsp_packet = hv_pkt_data(desc);
switch (nvsp_packet->hdr.msg_type) {
case NVSP_MSG_TYPE_INIT_COMPLETE:
@@ -725,8 +722,8 @@
break;
case NVSP_MSG1_TYPE_SEND_RNDIS_PKT_COMPLETE:
- netvsc_send_tx_complete(net_device, incoming_channel,
- device, desc, budget);
+ netvsc_send_tx_complete(ndev, net_device, incoming_channel,
+ desc, budget);
break;
default:
@@ -1091,12 +1088,11 @@
static int netvsc_receive(struct net_device *ndev,
struct netvsc_device *net_device,
- struct net_device_context *net_device_ctx,
- struct hv_device *device,
struct vmbus_channel *channel,
const struct vmpacket_descriptor *desc,
- struct nvsp_message *nvsp)
+ const struct nvsp_message *nvsp)
{
+ struct net_device_context *net_device_ctx = netdev_priv(ndev);
const struct vmtransfer_page_packet_header *vmxferpage_packet
= container_of(desc, const struct vmtransfer_page_packet_header, d);
u16 q_idx = channel->offermsg.offer.sub_channel_index;
@@ -1157,13 +1153,12 @@
return count;
}
-static void netvsc_send_table(struct hv_device *hdev,
- struct nvsp_message *nvmsg)
+static void netvsc_send_table(struct net_device *ndev,
+ const struct nvsp_message *nvmsg)
{
- struct net_device *ndev = hv_get_drvdata(hdev);
struct net_device_context *net_device_ctx = netdev_priv(ndev);
- int i;
u32 count, *tab;
+ int i;
count = nvmsg->msg.v5_msg.send_table.count;
if (count != VRSS_SEND_TAB_SIZE) {
@@ -1178,24 +1173,25 @@
net_device_ctx->tx_table[i] = tab[i];
}
-static void netvsc_send_vf(struct net_device_context *net_device_ctx,
- struct nvsp_message *nvmsg)
+static void netvsc_send_vf(struct net_device *ndev,
+ const struct nvsp_message *nvmsg)
{
+ struct net_device_context *net_device_ctx = netdev_priv(ndev);
+
net_device_ctx->vf_alloc = nvmsg->msg.v4_msg.vf_assoc.allocated;
net_device_ctx->vf_serial = nvmsg->msg.v4_msg.vf_assoc.serial;
}
-static inline void netvsc_receive_inband(struct hv_device *hdev,
- struct net_device_context *net_device_ctx,
- struct nvsp_message *nvmsg)
+static void netvsc_receive_inband(struct net_device *ndev,
+ const struct nvsp_message *nvmsg)
{
switch (nvmsg->hdr.msg_type) {
case NVSP_MSG5_TYPE_SEND_INDIRECTION_TABLE:
- netvsc_send_table(hdev, nvmsg);
+ netvsc_send_table(ndev, nvmsg);
break;
case NVSP_MSG4_TYPE_SEND_VF_ASSOCIATION:
- netvsc_send_vf(net_device_ctx, nvmsg);
+ netvsc_send_vf(ndev, nvmsg);
break;
}
}
@@ -1207,24 +1203,23 @@
const struct vmpacket_descriptor *desc,
int budget)
{
- struct net_device_context *net_device_ctx = netdev_priv(ndev);
- struct nvsp_message *nvmsg = hv_pkt_data(desc);
+ const struct nvsp_message *nvmsg = hv_pkt_data(desc);
trace_nvsp_recv(ndev, channel, nvmsg);
switch (desc->type) {
case VM_PKT_COMP:
- netvsc_send_completion(net_device, channel, device,
+ netvsc_send_completion(ndev, net_device, channel,
desc, budget);
break;
case VM_PKT_DATA_USING_XFER_PAGES:
- return netvsc_receive(ndev, net_device, net_device_ctx,
- device, channel, desc, nvmsg);
+ return netvsc_receive(ndev, net_device, channel,
+ desc, nvmsg);
break;
case VM_PKT_DATA_INBAND:
- netvsc_receive_inband(device, net_device_ctx, nvmsg);
+ netvsc_receive_inband(ndev, nvmsg);
break;
default:
diff --git a/drivers/net/hyperv/netvsc_drv.c b/drivers/net/hyperv/netvsc_drv.c
index da07ccd..60a5769 100644
--- a/drivers/net/hyperv/netvsc_drv.c
+++ b/drivers/net/hyperv/netvsc_drv.c
@@ -1618,8 +1618,24 @@
return ret;
}
+static u32 netvsc_get_msglevel(struct net_device *ndev)
+{
+ struct net_device_context *ndev_ctx = netdev_priv(ndev);
+
+ return ndev_ctx->msg_enable;
+}
+
+static void netvsc_set_msglevel(struct net_device *ndev, u32 val)
+{
+ struct net_device_context *ndev_ctx = netdev_priv(ndev);
+
+ ndev_ctx->msg_enable = val;
+}
+
static const struct ethtool_ops ethtool_ops = {
.get_drvinfo = netvsc_get_drvinfo,
+ .get_msglevel = netvsc_get_msglevel,
+ .set_msglevel = netvsc_set_msglevel,
.get_link = ethtool_op_get_link,
.get_ethtool_stats = netvsc_get_ethtool_stats,
.get_sset_count = netvsc_get_sset_count,
diff --git a/drivers/net/hyperv/rndis_filter.c b/drivers/net/hyperv/rndis_filter.c
index e7ca5b5..5428bb2 100644
--- a/drivers/net/hyperv/rndis_filter.c
+++ b/drivers/net/hyperv/rndis_filter.c
@@ -29,6 +29,7 @@
#include <linux/nls.h>
#include <linux/vmalloc.h>
#include <linux/rtnetlink.h>
+#include <linux/ucs2_string.h>
#include "hyperv_net.h"
#include "netvsc_trace.h"
@@ -751,7 +752,7 @@
rssp->indirect_tabsize = 4*ITAB_NUM;
rssp->indirect_taboffset = sizeof(struct ndis_recv_scale_param);
rssp->hashkey_size = NETVSC_HASH_KEYLEN;
- rssp->kashkey_offset = rssp->indirect_taboffset +
+ rssp->hashkey_offset = rssp->indirect_taboffset +
rssp->indirect_tabsize;
/* Set indirection table entries */
@@ -760,7 +761,7 @@
itab[i] = rdev->rx_table[i];
/* Set hask key values */
- keyp = (u8 *)((unsigned long)rssp + rssp->kashkey_offset);
+ keyp = (u8 *)((unsigned long)rssp + rssp->hashkey_offset);
memcpy(keyp, rss_key, NETVSC_HASH_KEYLEN);
ret = rndis_filter_send_request(rdev, request);
@@ -1223,6 +1224,32 @@
return ret;
}
+static void rndis_get_friendly_name(struct net_device *net,
+ struct rndis_device *rndis_device,
+ struct netvsc_device *net_device)
+{
+ ucs2_char_t wname[256];
+ unsigned long len;
+ u8 ifalias[256];
+ u32 size;
+
+ size = sizeof(wname);
+ if (rndis_filter_query_device(rndis_device, net_device,
+ RNDIS_OID_GEN_FRIENDLY_NAME,
+ wname, &size) != 0)
+ return; /* ignore if host does not support */
+
+ if (size == 0)
+ return; /* name not set */
+
+ /* Convert Windows Unicode string to UTF-8 */
+ len = ucs2_as_utf8(ifalias, wname, sizeof(ifalias));
+
+ /* ignore the default value from host */
+ if (strcmp(ifalias, "Network Adapter") != 0)
+ dev_set_alias(net, ifalias, len);
+}
+
struct netvsc_device *rndis_filter_device_add(struct hv_device *dev,
struct netvsc_device_info *device_info)
{
@@ -1276,6 +1303,10 @@
memcpy(device_info->mac_adr, rndis_device->hw_mac_adr, ETH_ALEN);
+ /* Get friendly name as ifalias*/
+ if (!net->ifalias)
+ rndis_get_friendly_name(net, rndis_device, net_device);
+
/* Query and set hardware capabilities */
ret = rndis_netdev_set_hwcaps(rndis_device, net_device);
if (ret != 0)
diff --git a/drivers/net/macvlan.c b/drivers/net/macvlan.c
index 725f4b4..adde8fc 100644
--- a/drivers/net/macvlan.c
+++ b/drivers/net/macvlan.c
@@ -514,6 +514,7 @@
const struct macvlan_dev *vlan = netdev_priv(dev);
const struct macvlan_port *port = vlan->port;
const struct macvlan_dev *dest;
+ void *accel_priv = NULL;
if (vlan->mode == MACVLAN_MODE_BRIDGE) {
const struct ethhdr *eth = (void *)skb->data;
@@ -533,9 +534,14 @@
}
}
+ /* For packets that are non-multicast and not bridged we will pass
+ * the necessary information so that the lowerdev can distinguish
+ * the source of the packets via the accel_priv value.
+ */
+ accel_priv = vlan->accel_priv;
xmit_world:
skb->dev = vlan->lowerdev;
- return dev_queue_xmit(skb);
+ return dev_queue_xmit_accel(skb, accel_priv);
}
static inline netdev_tx_t macvlan_netpoll_send_skb(struct macvlan_dev *vlan, struct sk_buff *skb)
@@ -552,19 +558,14 @@
static netdev_tx_t macvlan_start_xmit(struct sk_buff *skb,
struct net_device *dev)
{
+ struct macvlan_dev *vlan = netdev_priv(dev);
unsigned int len = skb->len;
int ret;
- struct macvlan_dev *vlan = netdev_priv(dev);
if (unlikely(netpoll_tx_running(dev)))
return macvlan_netpoll_send_skb(vlan, skb);
- if (vlan->fwd_priv) {
- skb->dev = vlan->lowerdev;
- ret = dev_queue_xmit_accel(skb, vlan->fwd_priv);
- } else {
- ret = macvlan_queue_xmit(skb, dev);
- }
+ ret = macvlan_queue_xmit(skb, dev);
if (likely(ret == NET_XMIT_SUCCESS || ret == NET_XMIT_CN)) {
struct vlan_pcpu_stats *pcpu_stats;
@@ -613,26 +614,27 @@
goto hash_add;
}
- if (lowerdev->features & NETIF_F_HW_L2FW_DOFFLOAD) {
- vlan->fwd_priv =
- lowerdev->netdev_ops->ndo_dfwd_add_station(lowerdev, dev);
-
- /* If we get a NULL pointer back, or if we get an error
- * then we should just fall through to the non accelerated path
- */
- if (IS_ERR_OR_NULL(vlan->fwd_priv)) {
- vlan->fwd_priv = NULL;
- } else
- return 0;
- }
-
err = -EBUSY;
if (macvlan_addr_busy(vlan->port, dev->dev_addr))
goto out;
- err = dev_uc_add(lowerdev, dev->dev_addr);
- if (err < 0)
- goto out;
+ /* Attempt to populate accel_priv which is used to offload the L2
+ * forwarding requests for unicast packets.
+ */
+ if (lowerdev->features & NETIF_F_HW_L2FW_DOFFLOAD)
+ vlan->accel_priv =
+ lowerdev->netdev_ops->ndo_dfwd_add_station(lowerdev, dev);
+
+ /* If earlier attempt to offload failed, or accel_priv is not
+ * populated we must add the unicast address to the lower device.
+ */
+ if (IS_ERR_OR_NULL(vlan->accel_priv)) {
+ vlan->accel_priv = NULL;
+ err = dev_uc_add(lowerdev, dev->dev_addr);
+ if (err < 0)
+ goto out;
+ }
+
if (dev->flags & IFF_ALLMULTI) {
err = dev_set_allmulti(lowerdev, 1);
if (err < 0)
@@ -653,13 +655,14 @@
if (dev->flags & IFF_ALLMULTI)
dev_set_allmulti(lowerdev, -1);
del_unicast:
- dev_uc_del(lowerdev, dev->dev_addr);
-out:
- if (vlan->fwd_priv) {
+ if (vlan->accel_priv) {
lowerdev->netdev_ops->ndo_dfwd_del_station(lowerdev,
- vlan->fwd_priv);
- vlan->fwd_priv = NULL;
+ vlan->accel_priv);
+ vlan->accel_priv = NULL;
+ } else {
+ dev_uc_del(lowerdev, dev->dev_addr);
}
+out:
return err;
}
@@ -668,11 +671,10 @@
struct macvlan_dev *vlan = netdev_priv(dev);
struct net_device *lowerdev = vlan->lowerdev;
- if (vlan->fwd_priv) {
+ if (vlan->accel_priv) {
lowerdev->netdev_ops->ndo_dfwd_del_station(lowerdev,
- vlan->fwd_priv);
- vlan->fwd_priv = NULL;
- return 0;
+ vlan->accel_priv);
+ vlan->accel_priv = NULL;
}
dev_uc_unsync(lowerdev, dev);
diff --git a/drivers/net/phy/Kconfig b/drivers/net/phy/Kconfig
index bdfbabb..343989f 100644
--- a/drivers/net/phy/Kconfig
+++ b/drivers/net/phy/Kconfig
@@ -118,11 +118,18 @@
config MDIO_MOXART
tristate "MOXA ART MDIO interface support"
- depends on ARCH_MOXART
+ depends on ARCH_MOXART || COMPILE_TEST
help
This driver supports the MDIO interface found in the network
interface units of the MOXA ART SoC
+config MDIO_MSCC_MIIM
+ tristate "Microsemi MIIM interface support"
+ depends on HAS_IOMEM
+ help
+ This driver supports the MIIM (MDIO) interface found in the network
+ switches of the Microsemi SoCs
+
config MDIO_OCTEON
tristate "Octeon and some ThunderX SOCs MDIO buses"
depends on 64BIT
@@ -135,7 +142,7 @@
config MDIO_SUN4I
tristate "Allwinner sun4i MDIO interface support"
- depends on ARCH_SUNXI
+ depends on ARCH_SUNXI || COMPILE_TEST
help
This driver supports the MDIO interface found in the network
interface units of the Allwinner SoC that have an EMAC (A10,
@@ -218,6 +225,12 @@
---help---
Currently supports the Aquantia AQ1202, AQ2104, AQR105, AQR405
+config ASIX_PHY
+ tristate "Asix PHYs"
+ help
+ Currently supports the Asix Electronics PHY found in the X-Surf 100
+ AX88796B package.
+
config AT803X_PHY
tristate "AT803X PHYs"
---help---
@@ -285,6 +298,11 @@
---help---
Supports the DP83822 PHY.
+config DP83TC811_PHY
+ tristate "Texas Instruments DP83TC822 PHY"
+ ---help---
+ Supports the DP83TC822 PHY.
+
config DP83848_PHY
tristate "Texas Instruments DP83848 PHY"
---help---
@@ -354,6 +372,11 @@
help
Supports the LAN88XX PHYs.
+config MICROCHIP_T1_PHY
+ tristate "Microchip T1 PHYs"
+ ---help---
+ Supports the LAN87XX PHYs.
+
config MICROSEMI_PHY
tristate "Microsemi PHYs"
---help---
diff --git a/drivers/net/phy/Makefile b/drivers/net/phy/Makefile
index 01acbcb..5805c0b 100644
--- a/drivers/net/phy/Makefile
+++ b/drivers/net/phy/Makefile
@@ -34,6 +34,7 @@
obj-$(CONFIG_MDIO_HISI_FEMAC) += mdio-hisi-femac.o
obj-$(CONFIG_MDIO_I2C) += mdio-i2c.o
obj-$(CONFIG_MDIO_MOXART) += mdio-moxart.o
+obj-$(CONFIG_MDIO_MSCC_MIIM) += mdio-mscc-miim.o
obj-$(CONFIG_MDIO_OCTEON) += mdio-octeon.o
obj-$(CONFIG_MDIO_SUN4I) += mdio-sun4i.o
obj-$(CONFIG_MDIO_THUNDER) += mdio-thunder.o
@@ -45,6 +46,7 @@
obj-$(CONFIG_AMD_PHY) += amd.o
obj-$(CONFIG_AQUANTIA_PHY) += aquantia.o
+obj-$(CONFIG_ASIX_PHY) += asix.o
obj-$(CONFIG_AT803X_PHY) += at803x.o
obj-$(CONFIG_BCM63XX_PHY) += bcm63xx.o
obj-$(CONFIG_BCM7XXX_PHY) += bcm7xxx.o
@@ -57,6 +59,7 @@
obj-$(CONFIG_DAVICOM_PHY) += davicom.o
obj-$(CONFIG_DP83640_PHY) += dp83640.o
obj-$(CONFIG_DP83822_PHY) += dp83822.o
+obj-$(CONFIG_DP83TC811_PHY) += dp83tc811.o
obj-$(CONFIG_DP83848_PHY) += dp83848.o
obj-$(CONFIG_DP83867_PHY) += dp83867.o
obj-$(CONFIG_FIXED_PHY) += fixed_phy.o
@@ -70,6 +73,7 @@
obj-$(CONFIG_MICREL_KS8995MA) += spi_ks8995.o
obj-$(CONFIG_MICREL_PHY) += micrel.o
obj-$(CONFIG_MICROCHIP_PHY) += microchip.o
+obj-$(CONFIG_MICROCHIP_T1_PHY) += microchip_t1.o
obj-$(CONFIG_MICROSEMI_PHY) += mscc.o
obj-$(CONFIG_NATIONAL_PHY) += national.o
obj-$(CONFIG_QSEMI_PHY) += qsemi.o
diff --git a/drivers/net/phy/asix.c b/drivers/net/phy/asix.c
new file mode 100644
index 0000000..8ebe7f5
--- /dev/null
+++ b/drivers/net/phy/asix.c
@@ -0,0 +1,63 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Driver for Asix PHYs
+ *
+ * Author: Michael Schmitz <schmitzmic@gmail.com>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the
+ * Free Software Foundation; either version 2 of the License, or (at your
+ * option) any later version.
+ *
+ */
+#include <linux/kernel.h>
+#include <linux/errno.h>
+#include <linux/init.h>
+#include <linux/module.h>
+#include <linux/mii.h>
+#include <linux/phy.h>
+
+#define PHY_ID_ASIX_AX88796B 0x003b1841
+
+MODULE_DESCRIPTION("Asix PHY driver");
+MODULE_AUTHOR("Michael Schmitz <schmitzmic@gmail.com>");
+MODULE_LICENSE("GPL");
+
+/**
+ * asix_soft_reset - software reset the PHY via BMCR_RESET bit
+ * @phydev: target phy_device struct
+ *
+ * Description: Perform a software PHY reset using the standard
+ * BMCR_RESET bit and poll for the reset bit to be cleared.
+ * Toggle BMCR_RESET bit off to accommodate broken AX8796B PHY implementation
+ * such as used on the Individual Computers' X-Surf 100 Zorro card.
+ *
+ * Returns: 0 on success, < 0 on failure
+ */
+static int asix_soft_reset(struct phy_device *phydev)
+{
+ int ret;
+
+ /* Asix PHY won't reset unless reset bit toggles */
+ ret = phy_write(phydev, MII_BMCR, 0);
+ if (ret < 0)
+ return ret;
+
+ return genphy_soft_reset(phydev);
+}
+
+static struct phy_driver asix_driver[] = { {
+ .phy_id = PHY_ID_ASIX_AX88796B,
+ .name = "Asix Electronics AX88796B",
+ .phy_id_mask = 0xfffffff0,
+ .features = PHY_BASIC_FEATURES,
+ .soft_reset = asix_soft_reset,
+} };
+
+module_phy_driver(asix_driver);
+
+static struct mdio_device_id __maybe_unused asix_tbl[] = {
+ { PHY_ID_ASIX_AX88796B, 0xfffffff0 },
+ { }
+};
+
+MODULE_DEVICE_TABLE(mdio, asix_tbl);
diff --git a/drivers/net/phy/bcm-phy-lib.c b/drivers/net/phy/bcm-phy-lib.c
index d5e0833..e10e7b5 100644
--- a/drivers/net/phy/bcm-phy-lib.c
+++ b/drivers/net/phy/bcm-phy-lib.c
@@ -346,10 +346,6 @@
}
EXPORT_SYMBOL_GPL(bcm_phy_get_strings);
-#ifndef UINT64_MAX
-#define UINT64_MAX (u64)(~((u64)0))
-#endif
-
/* Caller is supposed to provide appropriate storage for the library code to
* access the shadow copy
*/
@@ -362,7 +358,7 @@
val = phy_read(phydev, stat.reg);
if (val < 0) {
- ret = UINT64_MAX;
+ ret = U64_MAX;
} else {
val >>= stat.shift;
val = val & ((1 << stat.bits) - 1);
diff --git a/drivers/net/phy/dp83tc811.c b/drivers/net/phy/dp83tc811.c
new file mode 100644
index 0000000..081d99a
--- /dev/null
+++ b/drivers/net/phy/dp83tc811.c
@@ -0,0 +1,347 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Driver for the Texas Instruments DP83TC811 PHY
+ *
+ * Copyright (C) 2018 Texas Instruments Incorporated - http://www.ti.com/
+ *
+ */
+
+#include <linux/ethtool.h>
+#include <linux/etherdevice.h>
+#include <linux/kernel.h>
+#include <linux/mii.h>
+#include <linux/module.h>
+#include <linux/of.h>
+#include <linux/phy.h>
+#include <linux/netdevice.h>
+
+#define DP83TC811_PHY_ID 0x2000a253
+#define DP83811_DEVADDR 0x1f
+
+#define MII_DP83811_SGMII_CTRL 0x09
+#define MII_DP83811_INT_STAT1 0x12
+#define MII_DP83811_INT_STAT2 0x13
+#define MII_DP83811_RESET_CTRL 0x1f
+
+#define DP83811_HW_RESET BIT(15)
+#define DP83811_SW_RESET BIT(14)
+
+/* INT_STAT1 bits */
+#define DP83811_RX_ERR_HF_INT_EN BIT(0)
+#define DP83811_MS_TRAINING_INT_EN BIT(1)
+#define DP83811_ANEG_COMPLETE_INT_EN BIT(2)
+#define DP83811_ESD_EVENT_INT_EN BIT(3)
+#define DP83811_WOL_INT_EN BIT(4)
+#define DP83811_LINK_STAT_INT_EN BIT(5)
+#define DP83811_ENERGY_DET_INT_EN BIT(6)
+#define DP83811_LINK_QUAL_INT_EN BIT(7)
+
+/* INT_STAT2 bits */
+#define DP83811_JABBER_DET_INT_EN BIT(0)
+#define DP83811_POLARITY_INT_EN BIT(1)
+#define DP83811_SLEEP_MODE_INT_EN BIT(2)
+#define DP83811_OVERTEMP_INT_EN BIT(3)
+#define DP83811_OVERVOLTAGE_INT_EN BIT(6)
+#define DP83811_UNDERVOLTAGE_INT_EN BIT(7)
+
+#define MII_DP83811_RXSOP1 0x04a5
+#define MII_DP83811_RXSOP2 0x04a6
+#define MII_DP83811_RXSOP3 0x04a7
+
+/* WoL Registers */
+#define MII_DP83811_WOL_CFG 0x04a0
+#define MII_DP83811_WOL_STAT 0x04a1
+#define MII_DP83811_WOL_DA1 0x04a2
+#define MII_DP83811_WOL_DA2 0x04a3
+#define MII_DP83811_WOL_DA3 0x04a4
+
+/* WoL bits */
+#define DP83811_WOL_MAGIC_EN BIT(0)
+#define DP83811_WOL_SECURE_ON BIT(5)
+#define DP83811_WOL_EN BIT(7)
+#define DP83811_WOL_INDICATION_SEL BIT(8)
+#define DP83811_WOL_CLR_INDICATION BIT(11)
+
+/* SGMII CTRL bits */
+#define DP83811_TDR_AUTO BIT(8)
+#define DP83811_SGMII_EN BIT(12)
+#define DP83811_SGMII_AUTO_NEG_EN BIT(13)
+#define DP83811_SGMII_TX_ERR_DIS BIT(14)
+#define DP83811_SGMII_SOFT_RESET BIT(15)
+
+static int dp83811_ack_interrupt(struct phy_device *phydev)
+{
+ int err;
+
+ err = phy_read(phydev, MII_DP83811_INT_STAT1);
+ if (err < 0)
+ return err;
+
+ err = phy_read(phydev, MII_DP83811_INT_STAT2);
+ if (err < 0)
+ return err;
+
+ return 0;
+}
+
+static int dp83811_set_wol(struct phy_device *phydev,
+ struct ethtool_wolinfo *wol)
+{
+ struct net_device *ndev = phydev->attached_dev;
+ const u8 *mac;
+ u16 value;
+
+ if (wol->wolopts & (WAKE_MAGIC | WAKE_MAGICSECURE)) {
+ mac = (const u8 *)ndev->dev_addr;
+
+ if (!is_valid_ether_addr(mac))
+ return -EINVAL;
+
+ /* MAC addresses start with byte 5, but stored in mac[0].
+ * 811 PHYs store bytes 4|5, 2|3, 0|1
+ */
+ phy_write_mmd(phydev, DP83811_DEVADDR, MII_DP83811_WOL_DA1,
+ (mac[1] << 8) | mac[0]);
+ phy_write_mmd(phydev, DP83811_DEVADDR, MII_DP83811_WOL_DA2,
+ (mac[3] << 8) | mac[2]);
+ phy_write_mmd(phydev, DP83811_DEVADDR, MII_DP83811_WOL_DA3,
+ (mac[5] << 8) | mac[4]);
+
+ value = phy_read_mmd(phydev, DP83811_DEVADDR,
+ MII_DP83811_WOL_CFG);
+ if (wol->wolopts & WAKE_MAGIC)
+ value |= DP83811_WOL_MAGIC_EN;
+ else
+ value &= ~DP83811_WOL_MAGIC_EN;
+
+ if (wol->wolopts & WAKE_MAGICSECURE) {
+ phy_write_mmd(phydev, DP83811_DEVADDR,
+ MII_DP83811_RXSOP1,
+ (wol->sopass[1] << 8) | wol->sopass[0]);
+ phy_write_mmd(phydev, DP83811_DEVADDR,
+ MII_DP83811_RXSOP2,
+ (wol->sopass[3] << 8) | wol->sopass[2]);
+ phy_write_mmd(phydev, DP83811_DEVADDR,
+ MII_DP83811_RXSOP3,
+ (wol->sopass[5] << 8) | wol->sopass[4]);
+ value |= DP83811_WOL_SECURE_ON;
+ } else {
+ value &= ~DP83811_WOL_SECURE_ON;
+ }
+
+ value |= (DP83811_WOL_EN | DP83811_WOL_INDICATION_SEL |
+ DP83811_WOL_CLR_INDICATION);
+ phy_write_mmd(phydev, DP83811_DEVADDR, MII_DP83811_WOL_CFG,
+ value);
+ } else {
+ value = phy_read_mmd(phydev, DP83811_DEVADDR,
+ MII_DP83811_WOL_CFG);
+ value &= ~DP83811_WOL_EN;
+ phy_write_mmd(phydev, DP83811_DEVADDR, MII_DP83811_WOL_CFG,
+ value);
+ }
+
+ return 0;
+}
+
+static void dp83811_get_wol(struct phy_device *phydev,
+ struct ethtool_wolinfo *wol)
+{
+ u16 sopass_val;
+ int value;
+
+ wol->supported = (WAKE_MAGIC | WAKE_MAGICSECURE);
+ wol->wolopts = 0;
+
+ value = phy_read_mmd(phydev, DP83811_DEVADDR, MII_DP83811_WOL_CFG);
+
+ if (value & DP83811_WOL_MAGIC_EN)
+ wol->wolopts |= WAKE_MAGIC;
+
+ if (value & DP83811_WOL_SECURE_ON) {
+ sopass_val = phy_read_mmd(phydev, DP83811_DEVADDR,
+ MII_DP83811_RXSOP1);
+ wol->sopass[0] = (sopass_val & 0xff);
+ wol->sopass[1] = (sopass_val >> 8);
+
+ sopass_val = phy_read_mmd(phydev, DP83811_DEVADDR,
+ MII_DP83811_RXSOP2);
+ wol->sopass[2] = (sopass_val & 0xff);
+ wol->sopass[3] = (sopass_val >> 8);
+
+ sopass_val = phy_read_mmd(phydev, DP83811_DEVADDR,
+ MII_DP83811_RXSOP3);
+ wol->sopass[4] = (sopass_val & 0xff);
+ wol->sopass[5] = (sopass_val >> 8);
+
+ wol->wolopts |= WAKE_MAGICSECURE;
+ }
+
+ /* WoL is not enabled so set wolopts to 0 */
+ if (!(value & DP83811_WOL_EN))
+ wol->wolopts = 0;
+}
+
+static int dp83811_config_intr(struct phy_device *phydev)
+{
+ int misr_status, err;
+
+ if (phydev->interrupts == PHY_INTERRUPT_ENABLED) {
+ misr_status = phy_read(phydev, MII_DP83811_INT_STAT1);
+ if (misr_status < 0)
+ return misr_status;
+
+ misr_status |= (DP83811_RX_ERR_HF_INT_EN |
+ DP83811_MS_TRAINING_INT_EN |
+ DP83811_ANEG_COMPLETE_INT_EN |
+ DP83811_ESD_EVENT_INT_EN |
+ DP83811_WOL_INT_EN |
+ DP83811_LINK_STAT_INT_EN |
+ DP83811_ENERGY_DET_INT_EN |
+ DP83811_LINK_QUAL_INT_EN);
+
+ err = phy_write(phydev, MII_DP83811_INT_STAT1, misr_status);
+ if (err < 0)
+ return err;
+
+ misr_status = phy_read(phydev, MII_DP83811_INT_STAT2);
+ if (misr_status < 0)
+ return misr_status;
+
+ misr_status |= (DP83811_JABBER_DET_INT_EN |
+ DP83811_POLARITY_INT_EN |
+ DP83811_SLEEP_MODE_INT_EN |
+ DP83811_OVERTEMP_INT_EN |
+ DP83811_OVERVOLTAGE_INT_EN |
+ DP83811_UNDERVOLTAGE_INT_EN);
+
+ err = phy_write(phydev, MII_DP83811_INT_STAT2, misr_status);
+
+ } else {
+ err = phy_write(phydev, MII_DP83811_INT_STAT1, 0);
+ if (err < 0)
+ return err;
+
+ err = phy_write(phydev, MII_DP83811_INT_STAT1, 0);
+ }
+
+ return err;
+}
+
+static int dp83811_config_aneg(struct phy_device *phydev)
+{
+ int value, err;
+
+ if (phydev->interface == PHY_INTERFACE_MODE_SGMII) {
+ value = phy_read(phydev, MII_DP83811_SGMII_CTRL);
+ if (phydev->autoneg == AUTONEG_ENABLE) {
+ err = phy_write(phydev, MII_DP83811_SGMII_CTRL,
+ (DP83811_SGMII_AUTO_NEG_EN | value));
+ if (err < 0)
+ return err;
+ } else {
+ err = phy_write(phydev, MII_DP83811_SGMII_CTRL,
+ (~DP83811_SGMII_AUTO_NEG_EN & value));
+ if (err < 0)
+ return err;
+ }
+ }
+
+ return genphy_config_aneg(phydev);
+}
+
+static int dp83811_config_init(struct phy_device *phydev)
+{
+ int value, err;
+
+ err = genphy_config_init(phydev);
+ if (err < 0)
+ return err;
+
+ if (phydev->interface == PHY_INTERFACE_MODE_SGMII) {
+ value = phy_read(phydev, MII_DP83811_SGMII_CTRL);
+ if (!(value & DP83811_SGMII_EN)) {
+ err = phy_write(phydev, MII_DP83811_SGMII_CTRL,
+ (DP83811_SGMII_EN | value));
+ if (err < 0)
+ return err;
+ } else {
+ err = phy_write(phydev, MII_DP83811_SGMII_CTRL,
+ (~DP83811_SGMII_EN & value));
+ if (err < 0)
+ return err;
+ }
+ }
+
+ value = DP83811_WOL_MAGIC_EN | DP83811_WOL_SECURE_ON | DP83811_WOL_EN;
+
+ return phy_write_mmd(phydev, DP83811_DEVADDR, MII_DP83811_WOL_CFG,
+ value);
+}
+
+static int dp83811_phy_reset(struct phy_device *phydev)
+{
+ int err;
+
+ err = phy_write(phydev, MII_DP83811_RESET_CTRL, DP83811_HW_RESET);
+ if (err < 0)
+ return err;
+
+ return 0;
+}
+
+static int dp83811_suspend(struct phy_device *phydev)
+{
+ int value;
+
+ value = phy_read_mmd(phydev, DP83811_DEVADDR, MII_DP83811_WOL_CFG);
+
+ if (!(value & DP83811_WOL_EN))
+ genphy_suspend(phydev);
+
+ return 0;
+}
+
+static int dp83811_resume(struct phy_device *phydev)
+{
+ int value;
+
+ genphy_resume(phydev);
+
+ value = phy_read_mmd(phydev, DP83811_DEVADDR, MII_DP83811_WOL_CFG);
+
+ phy_write_mmd(phydev, DP83811_DEVADDR, MII_DP83811_WOL_CFG, value |
+ DP83811_WOL_CLR_INDICATION);
+
+ return 0;
+}
+
+static struct phy_driver dp83811_driver[] = {
+ {
+ .phy_id = DP83TC811_PHY_ID,
+ .phy_id_mask = 0xfffffff0,
+ .name = "TI DP83TC811",
+ .features = PHY_BASIC_FEATURES,
+ .flags = PHY_HAS_INTERRUPT,
+ .config_init = dp83811_config_init,
+ .config_aneg = dp83811_config_aneg,
+ .soft_reset = dp83811_phy_reset,
+ .get_wol = dp83811_get_wol,
+ .set_wol = dp83811_set_wol,
+ .ack_interrupt = dp83811_ack_interrupt,
+ .config_intr = dp83811_config_intr,
+ .suspend = dp83811_suspend,
+ .resume = dp83811_resume,
+ },
+};
+module_phy_driver(dp83811_driver);
+
+static struct mdio_device_id __maybe_unused dp83811_tbl[] = {
+ { DP83TC811_PHY_ID, 0xfffffff0 },
+ { },
+};
+MODULE_DEVICE_TABLE(mdio, dp83811_tbl);
+
+MODULE_DESCRIPTION("Texas Instruments DP83TC811 PHY driver");
+MODULE_AUTHOR("Dan Murphy <dmurphy@ti.com");
+MODULE_LICENSE("GPL");
diff --git a/drivers/net/phy/marvell.c b/drivers/net/phy/marvell.c
index 25e2a09..b8f57e9 100644
--- a/drivers/net/phy/marvell.c
+++ b/drivers/net/phy/marvell.c
@@ -1482,9 +1482,6 @@
}
}
-#ifndef UINT64_MAX
-#define UINT64_MAX (u64)(~((u64)0))
-#endif
static u64 marvell_get_stat(struct phy_device *phydev, int i)
{
struct marvell_hw_stat stat = marvell_hw_stats[i];
@@ -1494,7 +1491,7 @@
val = phy_read_paged(phydev, stat.page, stat.reg);
if (val < 0) {
- ret = UINT64_MAX;
+ ret = U64_MAX;
} else {
val = val & ((1 << stat.bits) - 1);
priv->stats[i] += val;
diff --git a/drivers/net/phy/mdio-bitbang.c b/drivers/net/phy/mdio-bitbang.c
index 403b085..15352f9 100644
--- a/drivers/net/phy/mdio-bitbang.c
+++ b/drivers/net/phy/mdio-bitbang.c
@@ -205,14 +205,6 @@
return 0;
}
-static int mdiobb_reset(struct mii_bus *bus)
-{
- struct mdiobb_ctrl *ctrl = bus->priv;
- if (ctrl->reset)
- ctrl->reset(bus);
- return 0;
-}
-
struct mii_bus *alloc_mdio_bitbang(struct mdiobb_ctrl *ctrl)
{
struct mii_bus *bus;
@@ -225,7 +217,6 @@
bus->read = mdiobb_read;
bus->write = mdiobb_write;
- bus->reset = mdiobb_reset;
bus->priv = ctrl;
return bus;
diff --git a/drivers/net/phy/mdio-boardinfo.c b/drivers/net/phy/mdio-boardinfo.c
index 1861f38..863496f 100644
--- a/drivers/net/phy/mdio-boardinfo.c
+++ b/drivers/net/phy/mdio-boardinfo.c
@@ -30,17 +30,20 @@
struct mdio_board_info *bi))
{
struct mdio_board_entry *be;
+ struct mdio_board_entry *tmp;
struct mdio_board_info *bi;
int ret;
mutex_lock(&mdio_board_lock);
- list_for_each_entry(be, &mdio_board_list, list) {
+ list_for_each_entry_safe(be, tmp, &mdio_board_list, list) {
bi = &be->board_info;
if (strcmp(bus->id, bi->bus_id))
continue;
+ mutex_unlock(&mdio_board_lock);
ret = cb(bus, bi);
+ mutex_lock(&mdio_board_lock);
if (ret)
continue;
diff --git a/drivers/net/phy/mdio-gpio.c b/drivers/net/phy/mdio-gpio.c
index 4333c6e..4e4c8da 100644
--- a/drivers/net/phy/mdio-gpio.c
+++ b/drivers/net/phy/mdio-gpio.c
@@ -24,8 +24,10 @@
#include <linux/slab.h>
#include <linux/interrupt.h>
#include <linux/platform_device.h>
+#include <linux/mdio-bitbang.h>
+#include <linux/mdio-gpio.h>
#include <linux/gpio.h>
-#include <linux/platform_data/mdio-gpio.h>
+#include <linux/gpio/consumer.h>
#include <linux/of_gpio.h>
#include <linux/of_mdio.h>
@@ -35,37 +37,22 @@
struct gpio_desc *mdc, *mdio, *mdo;
};
-static void *mdio_gpio_of_get_data(struct platform_device *pdev)
+static int mdio_gpio_get_data(struct device *dev,
+ struct mdio_gpio_info *bitbang)
{
- struct device_node *np = pdev->dev.of_node;
- struct mdio_gpio_platform_data *pdata;
- enum of_gpio_flags flags;
- int ret;
+ bitbang->mdc = devm_gpiod_get_index(dev, NULL, MDIO_GPIO_MDC,
+ GPIOD_OUT_LOW);
+ if (IS_ERR(bitbang->mdc))
+ return PTR_ERR(bitbang->mdc);
- pdata = devm_kzalloc(&pdev->dev, sizeof(*pdata), GFP_KERNEL);
- if (!pdata)
- return NULL;
+ bitbang->mdio = devm_gpiod_get_index(dev, NULL, MDIO_GPIO_MDIO,
+ GPIOD_IN);
+ if (IS_ERR(bitbang->mdio))
+ return PTR_ERR(bitbang->mdio);
- ret = of_get_gpio_flags(np, 0, &flags);
- if (ret < 0)
- return NULL;
-
- pdata->mdc = ret;
- pdata->mdc_active_low = flags & OF_GPIO_ACTIVE_LOW;
-
- ret = of_get_gpio_flags(np, 1, &flags);
- if (ret < 0)
- return NULL;
- pdata->mdio = ret;
- pdata->mdio_active_low = flags & OF_GPIO_ACTIVE_LOW;
-
- ret = of_get_gpio_flags(np, 2, &flags);
- if (ret > 0) {
- pdata->mdo = ret;
- pdata->mdo_active_low = flags & OF_GPIO_ACTIVE_LOW;
- }
-
- return pdata;
+ bitbang->mdo = devm_gpiod_get_index_optional(dev, NULL, MDIO_GPIO_MDO,
+ GPIOD_OUT_LOW);
+ return PTR_ERR_OR_ZERO(bitbang->mdo);
}
static void mdio_dir(struct mdiobb_ctrl *ctrl, int dir)
@@ -125,78 +112,28 @@
};
static struct mii_bus *mdio_gpio_bus_init(struct device *dev,
- struct mdio_gpio_platform_data *pdata,
+ struct mdio_gpio_info *bitbang,
int bus_id)
{
struct mii_bus *new_bus;
- struct mdio_gpio_info *bitbang;
- int i;
- int mdc, mdio, mdo;
- unsigned long mdc_flags = GPIOF_OUT_INIT_LOW;
- unsigned long mdio_flags = GPIOF_DIR_IN;
- unsigned long mdo_flags = GPIOF_OUT_INIT_HIGH;
-
- bitbang = devm_kzalloc(dev, sizeof(*bitbang), GFP_KERNEL);
- if (!bitbang)
- goto out;
bitbang->ctrl.ops = &mdio_gpio_ops;
- bitbang->ctrl.reset = pdata->reset;
- mdc = pdata->mdc;
- bitbang->mdc = gpio_to_desc(mdc);
- if (pdata->mdc_active_low)
- mdc_flags = GPIOF_OUT_INIT_HIGH | GPIOF_ACTIVE_LOW;
- mdio = pdata->mdio;
- bitbang->mdio = gpio_to_desc(mdio);
- if (pdata->mdio_active_low)
- mdio_flags |= GPIOF_ACTIVE_LOW;
- mdo = pdata->mdo;
- if (mdo) {
- bitbang->mdo = gpio_to_desc(mdo);
- if (pdata->mdo_active_low)
- mdo_flags = GPIOF_OUT_INIT_LOW | GPIOF_ACTIVE_LOW;
- }
new_bus = alloc_mdio_bitbang(&bitbang->ctrl);
if (!new_bus)
- goto out;
+ return NULL;
- new_bus->name = "GPIO Bitbanged MDIO",
-
- new_bus->phy_mask = pdata->phy_mask;
- new_bus->phy_ignore_ta_mask = pdata->phy_ignore_ta_mask;
- memcpy(new_bus->irq, pdata->irqs, sizeof(new_bus->irq));
+ new_bus->name = "GPIO Bitbanged MDIO";
new_bus->parent = dev;
- if (new_bus->phy_mask == ~0)
- goto out_free_bus;
-
- for (i = 0; i < PHY_MAX_ADDR; i++)
- if (!new_bus->irq[i])
- new_bus->irq[i] = PHY_POLL;
-
if (bus_id != -1)
snprintf(new_bus->id, MII_BUS_ID_SIZE, "gpio-%x", bus_id);
else
strncpy(new_bus->id, "gpio", MII_BUS_ID_SIZE);
- if (devm_gpio_request_one(dev, mdc, mdc_flags, "mdc"))
- goto out_free_bus;
-
- if (devm_gpio_request_one(dev, mdio, mdio_flags, "mdio"))
- goto out_free_bus;
-
- if (mdo && devm_gpio_request_one(dev, mdo, mdo_flags, "mdo"))
- goto out_free_bus;
-
dev_set_drvdata(dev, new_bus);
return new_bus;
-
-out_free_bus:
- free_mdio_bitbang(new_bus);
-out:
- return NULL;
}
static void mdio_gpio_bus_deinit(struct device *dev)
@@ -216,34 +153,33 @@
static int mdio_gpio_probe(struct platform_device *pdev)
{
- struct mdio_gpio_platform_data *pdata;
+ struct mdio_gpio_info *bitbang;
struct mii_bus *new_bus;
int ret, bus_id;
+ bitbang = devm_kzalloc(&pdev->dev, sizeof(*bitbang), GFP_KERNEL);
+ if (!bitbang)
+ return -ENOMEM;
+
+ ret = mdio_gpio_get_data(&pdev->dev, bitbang);
+ if (ret)
+ return ret;
+
if (pdev->dev.of_node) {
- pdata = mdio_gpio_of_get_data(pdev);
bus_id = of_alias_get_id(pdev->dev.of_node, "mdio-gpio");
if (bus_id < 0) {
dev_warn(&pdev->dev, "failed to get alias id\n");
bus_id = 0;
}
} else {
- pdata = dev_get_platdata(&pdev->dev);
bus_id = pdev->id;
}
- if (!pdata)
- return -ENODEV;
-
- new_bus = mdio_gpio_bus_init(&pdev->dev, pdata, bus_id);
+ new_bus = mdio_gpio_bus_init(&pdev->dev, bitbang, bus_id);
if (!new_bus)
return -ENODEV;
- if (pdev->dev.of_node)
- ret = of_mdiobus_register(new_bus, pdev->dev.of_node);
- else
- ret = mdiobus_register(new_bus);
-
+ ret = of_mdiobus_register(new_bus, pdev->dev.of_node);
if (ret)
mdio_gpio_bus_deinit(&pdev->dev);
diff --git a/drivers/net/phy/mdio-mscc-miim.c b/drivers/net/phy/mdio-mscc-miim.c
new file mode 100644
index 0000000..badbc99
--- /dev/null
+++ b/drivers/net/phy/mdio-mscc-miim.c
@@ -0,0 +1,193 @@
+// SPDX-License-Identifier: (GPL-2.0 OR MIT)
+/*
+ * Driver for the MDIO interface of Microsemi network switches.
+ *
+ * Author: Alexandre Belloni <alexandre.belloni@bootlin.com>
+ * Copyright (c) 2017 Microsemi Corporation
+ */
+
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/phy.h>
+#include <linux/platform_device.h>
+#include <linux/bitops.h>
+#include <linux/io.h>
+#include <linux/iopoll.h>
+#include <linux/of_mdio.h>
+
+#define MSCC_MIIM_REG_STATUS 0x0
+#define MSCC_MIIM_STATUS_STAT_BUSY BIT(3)
+#define MSCC_MIIM_REG_CMD 0x8
+#define MSCC_MIIM_CMD_OPR_WRITE BIT(1)
+#define MSCC_MIIM_CMD_OPR_READ BIT(2)
+#define MSCC_MIIM_CMD_WRDATA_SHIFT 4
+#define MSCC_MIIM_CMD_REGAD_SHIFT 20
+#define MSCC_MIIM_CMD_PHYAD_SHIFT 25
+#define MSCC_MIIM_CMD_VLD BIT(31)
+#define MSCC_MIIM_REG_DATA 0xC
+#define MSCC_MIIM_DATA_ERROR (BIT(16) | BIT(17))
+
+#define MSCC_PHY_REG_PHY_CFG 0x0
+#define PHY_CFG_PHY_ENA (BIT(0) | BIT(1) | BIT(2) | BIT(3))
+#define PHY_CFG_PHY_COMMON_RESET BIT(4)
+#define PHY_CFG_PHY_RESET (BIT(5) | BIT(6) | BIT(7) | BIT(8))
+#define MSCC_PHY_REG_PHY_STATUS 0x4
+
+struct mscc_miim_dev {
+ void __iomem *regs;
+ void __iomem *phy_regs;
+};
+
+static int mscc_miim_wait_ready(struct mii_bus *bus)
+{
+ struct mscc_miim_dev *miim = bus->priv;
+ u32 val;
+
+ readl_poll_timeout(miim->regs + MSCC_MIIM_REG_STATUS, val,
+ !(val & MSCC_MIIM_STATUS_STAT_BUSY), 100, 250000);
+ if (val & MSCC_MIIM_STATUS_STAT_BUSY)
+ return -ETIMEDOUT;
+
+ return 0;
+}
+
+static int mscc_miim_read(struct mii_bus *bus, int mii_id, int regnum)
+{
+ struct mscc_miim_dev *miim = bus->priv;
+ u32 val;
+ int ret;
+
+ ret = mscc_miim_wait_ready(bus);
+ if (ret)
+ goto out;
+
+ writel(MSCC_MIIM_CMD_VLD | (mii_id << MSCC_MIIM_CMD_PHYAD_SHIFT) |
+ (regnum << MSCC_MIIM_CMD_REGAD_SHIFT) | MSCC_MIIM_CMD_OPR_READ,
+ miim->regs + MSCC_MIIM_REG_CMD);
+
+ ret = mscc_miim_wait_ready(bus);
+ if (ret)
+ goto out;
+
+ val = readl(miim->regs + MSCC_MIIM_REG_DATA);
+ if (val & MSCC_MIIM_DATA_ERROR) {
+ ret = -EIO;
+ goto out;
+ }
+
+ ret = val & 0xFFFF;
+out:
+ return ret;
+}
+
+static int mscc_miim_write(struct mii_bus *bus, int mii_id,
+ int regnum, u16 value)
+{
+ struct mscc_miim_dev *miim = bus->priv;
+ int ret;
+
+ ret = mscc_miim_wait_ready(bus);
+ if (ret < 0)
+ goto out;
+
+ writel(MSCC_MIIM_CMD_VLD | (mii_id << MSCC_MIIM_CMD_PHYAD_SHIFT) |
+ (regnum << MSCC_MIIM_CMD_REGAD_SHIFT) |
+ (value << MSCC_MIIM_CMD_WRDATA_SHIFT) |
+ MSCC_MIIM_CMD_OPR_WRITE,
+ miim->regs + MSCC_MIIM_REG_CMD);
+
+out:
+ return ret;
+}
+
+static int mscc_miim_reset(struct mii_bus *bus)
+{
+ struct mscc_miim_dev *miim = bus->priv;
+
+ if (miim->phy_regs) {
+ writel(0, miim->phy_regs + MSCC_PHY_REG_PHY_CFG);
+ writel(0x1ff, miim->phy_regs + MSCC_PHY_REG_PHY_CFG);
+ mdelay(500);
+ }
+
+ return 0;
+}
+
+static int mscc_miim_probe(struct platform_device *pdev)
+{
+ struct resource *res;
+ struct mii_bus *bus;
+ struct mscc_miim_dev *dev;
+ int ret;
+
+ res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
+ if (!res)
+ return -ENODEV;
+
+ bus = devm_mdiobus_alloc_size(&pdev->dev, sizeof(*dev));
+ if (!bus)
+ return -ENOMEM;
+
+ bus->name = "mscc_miim";
+ bus->read = mscc_miim_read;
+ bus->write = mscc_miim_write;
+ bus->reset = mscc_miim_reset;
+ snprintf(bus->id, MII_BUS_ID_SIZE, "%s-mii", dev_name(&pdev->dev));
+ bus->parent = &pdev->dev;
+
+ dev = bus->priv;
+ dev->regs = devm_ioremap_resource(&pdev->dev, res);
+ if (IS_ERR(dev->regs)) {
+ dev_err(&pdev->dev, "Unable to map MIIM registers\n");
+ return PTR_ERR(dev->regs);
+ }
+
+ res = platform_get_resource(pdev, IORESOURCE_MEM, 1);
+ if (res) {
+ dev->phy_regs = devm_ioremap_resource(&pdev->dev, res);
+ if (IS_ERR(dev->phy_regs)) {
+ dev_err(&pdev->dev, "Unable to map internal phy registers\n");
+ return PTR_ERR(dev->phy_regs);
+ }
+ }
+
+ ret = of_mdiobus_register(bus, pdev->dev.of_node);
+ if (ret < 0) {
+ dev_err(&pdev->dev, "Cannot register MDIO bus (%d)\n", ret);
+ return ret;
+ }
+
+ platform_set_drvdata(pdev, bus);
+
+ return 0;
+}
+
+static int mscc_miim_remove(struct platform_device *pdev)
+{
+ struct mii_bus *bus = platform_get_drvdata(pdev);
+
+ mdiobus_unregister(bus);
+
+ return 0;
+}
+
+static const struct of_device_id mscc_miim_match[] = {
+ { .compatible = "mscc,ocelot-miim" },
+ { }
+};
+MODULE_DEVICE_TABLE(of, mscc_miim_match);
+
+static struct platform_driver mscc_miim_driver = {
+ .probe = mscc_miim_probe,
+ .remove = mscc_miim_remove,
+ .driver = {
+ .name = "mscc-miim",
+ .of_match_table = mscc_miim_match,
+ },
+};
+
+module_platform_driver(mscc_miim_driver);
+
+MODULE_DESCRIPTION("Microsemi MIIM driver");
+MODULE_AUTHOR("Alexandre Belloni <alexandre.belloni@bootlin.com>");
+MODULE_LICENSE("Dual MIT/GPL");
diff --git a/drivers/net/phy/micrel.c b/drivers/net/phy/micrel.c
index ab195f0..3db06b4 100644
--- a/drivers/net/phy/micrel.c
+++ b/drivers/net/phy/micrel.c
@@ -681,9 +681,6 @@
}
}
-#ifndef UINT64_MAX
-#define UINT64_MAX (u64)(~((u64)0))
-#endif
static u64 kszphy_get_stat(struct phy_device *phydev, int i)
{
struct kszphy_hw_stat stat = kszphy_hw_stats[i];
@@ -693,7 +690,7 @@
val = phy_read(phydev, stat.reg);
if (val < 0) {
- ret = UINT64_MAX;
+ ret = U64_MAX;
} else {
val = val & ((1 << stat.bits) - 1);
priv->stats[i] += val;
diff --git a/drivers/net/phy/microchip.c b/drivers/net/phy/microchip.c
index a97ac8c..2d67937 100644
--- a/drivers/net/phy/microchip.c
+++ b/drivers/net/phy/microchip.c
@@ -21,6 +21,8 @@
#include <linux/phy.h>
#include <linux/microchipphy.h>
#include <linux/delay.h>
+#include <linux/of.h>
+#include <dt-bindings/net/microchip-lan78xx.h>
#define DRIVER_AUTHOR "WOOJUNG HUH <woojung.huh@microchip.com>"
#define DRIVER_DESC "Microchip LAN88XX PHY driver"
@@ -225,6 +227,8 @@
{
struct device *dev = &phydev->mdio.dev;
struct lan88xx_priv *priv;
+ u32 led_modes[4];
+ int len;
priv = devm_kzalloc(dev, sizeof(*priv), GFP_KERNEL);
if (!priv)
@@ -232,6 +236,27 @@
priv->wolopts = 0;
+ len = of_property_read_variable_u32_array(dev->of_node,
+ "microchip,led-modes",
+ led_modes,
+ 0,
+ ARRAY_SIZE(led_modes));
+ if (len >= 0) {
+ u32 reg = 0;
+ int i;
+
+ for (i = 0; i < len; i++) {
+ if (led_modes[i] > 15)
+ return -EINVAL;
+ reg |= led_modes[i] << (i * 4);
+ }
+ for (; i < ARRAY_SIZE(led_modes); i++)
+ reg |= LAN78XX_FORCE_LED_OFF << (i * 4);
+ (void)phy_write(phydev, LAN78XX_PHY_LED_MODE_SELECT, reg);
+ } else if (len == -EOVERFLOW) {
+ return -EINVAL;
+ }
+
/* these values can be used to identify internal PHY */
priv->chip_id = phy_read_mmd(phydev, 3, LAN88XX_MMD3_CHIP_ID);
priv->chip_rev = phy_read_mmd(phydev, 3, LAN88XX_MMD3_CHIP_REV);
diff --git a/drivers/net/phy/microchip_t1.c b/drivers/net/phy/microchip_t1.c
new file mode 100644
index 0000000..b1917dd
--- /dev/null
+++ b/drivers/net/phy/microchip_t1.c
@@ -0,0 +1,74 @@
+// SPDX-License-Identifier: GPL-2.0
+// Copyright (C) 2018 Microchip Technology
+
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/mii.h>
+#include <linux/phy.h>
+
+/* Interrupt Source Register */
+#define LAN87XX_INTERRUPT_SOURCE (0x18)
+
+/* Interrupt Mask Register */
+#define LAN87XX_INTERRUPT_MASK (0x19)
+#define LAN87XX_MASK_LINK_UP (0x0004)
+#define LAN87XX_MASK_LINK_DOWN (0x0002)
+
+#define DRIVER_AUTHOR "Nisar Sayed <nisar.sayed@microchip.com>"
+#define DRIVER_DESC "Microchip LAN87XX T1 PHY driver"
+
+static int lan87xx_phy_config_intr(struct phy_device *phydev)
+{
+ int rc, val = 0;
+
+ if (phydev->interrupts == PHY_INTERRUPT_ENABLED) {
+ /* unmask all source and clear them before enable */
+ rc = phy_write(phydev, LAN87XX_INTERRUPT_MASK, 0x7FFF);
+ rc = phy_read(phydev, LAN87XX_INTERRUPT_SOURCE);
+ val = LAN87XX_MASK_LINK_UP | LAN87XX_MASK_LINK_DOWN;
+ }
+
+ rc = phy_write(phydev, LAN87XX_INTERRUPT_MASK, val);
+
+ return rc < 0 ? rc : 0;
+}
+
+static int lan87xx_phy_ack_interrupt(struct phy_device *phydev)
+{
+ int rc = phy_read(phydev, LAN87XX_INTERRUPT_SOURCE);
+
+ return rc < 0 ? rc : 0;
+}
+
+static struct phy_driver microchip_t1_phy_driver[] = {
+ {
+ .phy_id = 0x0007c150,
+ .phy_id_mask = 0xfffffff0,
+ .name = "Microchip LAN87xx T1",
+
+ .features = SUPPORTED_100baseT_Full,
+ .flags = PHY_HAS_INTERRUPT,
+
+ .config_init = genphy_config_init,
+ .config_aneg = genphy_config_aneg,
+
+ .ack_interrupt = lan87xx_phy_ack_interrupt,
+ .config_intr = lan87xx_phy_config_intr,
+
+ .suspend = genphy_suspend,
+ .resume = genphy_resume,
+ }
+};
+
+module_phy_driver(microchip_t1_phy_driver);
+
+static struct mdio_device_id __maybe_unused microchip_t1_tbl[] = {
+ { 0x0007c150, 0xfffffff0 },
+ { }
+};
+
+MODULE_DEVICE_TABLE(mdio, microchip_t1_tbl);
+
+MODULE_AUTHOR(DRIVER_AUTHOR);
+MODULE_DESCRIPTION(DRIVER_DESC);
+MODULE_LICENSE("GPL");
diff --git a/drivers/net/phy/phylink.c b/drivers/net/phy/phylink.c
index c582b2d..af4dc44 100644
--- a/drivers/net/phy/phylink.c
+++ b/drivers/net/phy/phylink.c
@@ -19,6 +19,7 @@
#include <linux/phylink.h>
#include <linux/rtnetlink.h>
#include <linux/spinlock.h>
+#include <linux/timer.h>
#include <linux/workqueue.h>
#include "sfp.h"
@@ -54,6 +55,7 @@
/* The link configuration settings */
struct phylink_link_state link_config;
struct gpio_desc *link_gpio;
+ struct timer_list link_poll;
void (*get_fixed_state)(struct net_device *dev,
struct phylink_link_state *s);
@@ -360,7 +362,7 @@
if (pl->get_fixed_state)
pl->get_fixed_state(pl->netdev, state);
else if (pl->link_gpio)
- state->link = !!gpiod_get_value(pl->link_gpio);
+ state->link = !!gpiod_get_value_cansleep(pl->link_gpio);
}
/* Flow control is resolved according to our and the link partners
@@ -500,6 +502,15 @@
queue_work(system_power_efficient_wq, &pl->resolve);
}
+static void phylink_fixed_poll(struct timer_list *t)
+{
+ struct phylink *pl = container_of(t, struct phylink, link_poll);
+
+ mod_timer(t, jiffies + HZ);
+
+ phylink_run_resolve(pl);
+}
+
static const struct sfp_upstream_ops sfp_phylink_ops;
static int phylink_register_sfp(struct phylink *pl,
@@ -572,6 +583,7 @@
pl->link_config.an_enabled = true;
pl->ops = ops;
__set_bit(PHYLINK_DISABLE_STOPPED, &pl->phylink_disable_state);
+ timer_setup(&pl->link_poll, phylink_fixed_poll, 0);
bitmap_fill(pl->supported, __ETHTOOL_LINK_MODE_MASK_NBITS);
linkmode_copy(pl->link_config.advertising, pl->supported);
@@ -612,6 +624,8 @@
{
if (pl->sfp_bus)
sfp_unregister_upstream(pl->sfp_bus);
+ if (!IS_ERR_OR_NULL(pl->link_gpio))
+ gpiod_put(pl->link_gpio);
cancel_work_sync(&pl->resolve);
kfree(pl);
@@ -903,6 +917,8 @@
clear_bit(PHYLINK_DISABLE_STOPPED, &pl->phylink_disable_state);
phylink_run_resolve(pl);
+ if (pl->link_an_mode == MLO_AN_FIXED && !IS_ERR(pl->link_gpio))
+ mod_timer(&pl->link_poll, jiffies + HZ);
if (pl->sfp_bus)
sfp_upstream_start(pl->sfp_bus);
if (pl->phydev)
@@ -927,6 +943,8 @@
phy_stop(pl->phydev);
if (pl->sfp_bus)
sfp_upstream_stop(pl->sfp_bus);
+ if (pl->link_an_mode == MLO_AN_FIXED && !IS_ERR(pl->link_gpio))
+ del_timer_sync(&pl->link_poll);
set_bit(PHYLINK_DISABLE_STOPPED, &pl->phylink_disable_state);
queue_work(system_power_efficient_wq, &pl->resolve);
diff --git a/drivers/net/phy/sfp-bus.c b/drivers/net/phy/sfp-bus.c
index fd6c23f..d437f4f5 100644
--- a/drivers/net/phy/sfp-bus.c
+++ b/drivers/net/phy/sfp-bus.c
@@ -132,6 +132,13 @@
br_max = br_nom + br_nom * id->ext.br_min / 100;
br_min = br_nom - br_nom * id->ext.br_min / 100;
}
+
+ /* When using passive cables, in case neither BR,min nor BR,max
+ * are specified, set br_min to 0 as the nominal value is then
+ * used as the maximum.
+ */
+ if (br_min == br_max && id->base.sfp_ct_passive)
+ br_min = 0;
}
/* Set ethtool support from the compliance fields. */
diff --git a/drivers/net/phy/sfp.c b/drivers/net/phy/sfp.c
index 4ab6e9a..c4c92db 100644
--- a/drivers/net/phy/sfp.c
+++ b/drivers/net/phy/sfp.c
@@ -976,6 +976,7 @@
if (pdev->dev.of_node) {
struct device_node *node = pdev->dev.of_node;
const struct of_device_id *id;
+ struct i2c_adapter *i2c;
struct device_node *np;
id = of_match_node(sfp_of_match, node);
@@ -985,19 +986,20 @@
sff = sfp->type = id->data;
np = of_parse_phandle(node, "i2c-bus", 0);
- if (np) {
- struct i2c_adapter *i2c;
+ if (!np) {
+ dev_err(sfp->dev, "missing 'i2c-bus' property\n");
+ return -ENODEV;
+ }
- i2c = of_find_i2c_adapter_by_node(np);
- of_node_put(np);
- if (!i2c)
- return -EPROBE_DEFER;
+ i2c = of_find_i2c_adapter_by_node(np);
+ of_node_put(np);
+ if (!i2c)
+ return -EPROBE_DEFER;
- err = sfp_i2c_configure(sfp, i2c);
- if (err < 0) {
- i2c_put_adapter(i2c);
- return err;
- }
+ err = sfp_i2c_configure(sfp, i2c);
+ if (err < 0) {
+ i2c_put_adapter(i2c);
+ return err;
}
}
@@ -1065,6 +1067,15 @@
if (poll)
mod_delayed_work(system_wq, &sfp->poll, poll_jiffies);
+ /* We could have an issue in cases no Tx disable pin is available or
+ * wired as modules using a laser as their light source will continue to
+ * be active when the fiber is removed. This could be a safety issue and
+ * we should at least warn the user about that.
+ */
+ if (!sfp->gpio[GPIO_TX_DISABLE])
+ dev_warn(sfp->dev,
+ "No tx_disable pin: SFP modules will always be emitting.\n");
+
return 0;
}
diff --git a/drivers/net/phy/smsc.c b/drivers/net/phy/smsc.c
index be399d6..c3282083 100644
--- a/drivers/net/phy/smsc.c
+++ b/drivers/net/phy/smsc.c
@@ -168,9 +168,6 @@
}
}
-#ifndef UINT64_MAX
-#define UINT64_MAX (u64)(~((u64)0))
-#endif
static u64 smsc_get_stat(struct phy_device *phydev, int i)
{
struct smsc_hw_stat stat = smsc_hw_stats[i];
@@ -179,7 +176,7 @@
val = phy_read(phydev, stat.reg);
if (val < 0)
- ret = UINT64_MAX;
+ ret = U64_MAX;
else
ret = val;
diff --git a/drivers/net/team/team.c b/drivers/net/team/team.c
index ddb6bf8..e6730a0 100644
--- a/drivers/net/team/team.c
+++ b/drivers/net/team/team.c
@@ -1026,7 +1026,8 @@
}
team->dev->vlan_features = vlan_features;
- team->dev->hw_enc_features = enc_features | NETIF_F_GSO_ENCAP_ALL;
+ team->dev->hw_enc_features = enc_features | NETIF_F_GSO_ENCAP_ALL |
+ NETIF_F_GSO_UDP_L4;
team->dev->hard_header_len = max_hard_header_len;
team->dev->priv_flags &= ~IFF_XMIT_DST_RELEASE;
@@ -1128,6 +1129,7 @@
int err;
lag_upper_info.tx_type = team->mode->lag_tx_type;
+ lag_upper_info.hash_type = NETDEV_LAG_HASH_UNKNOWN;
err = netdev_master_upper_dev_link(port->dev, team->dev, NULL,
&lag_upper_info, extack);
if (err)
@@ -2117,7 +2119,7 @@
NETIF_F_HW_VLAN_CTAG_RX |
NETIF_F_HW_VLAN_CTAG_FILTER;
- dev->hw_features |= NETIF_F_GSO_ENCAP_ALL;
+ dev->hw_features |= NETIF_F_GSO_ENCAP_ALL | NETIF_F_GSO_UDP_L4;
dev->features |= dev->hw_features;
}
@@ -2942,7 +2944,7 @@
case NETDEV_CHANGE:
if (netif_running(port->dev))
team_port_change_check(port,
- !!netif_carrier_ok(port->dev));
+ !!netif_oper_up(port->dev));
break;
case NETDEV_UNREGISTER:
team_del_slave(port->team->dev, dev);
diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index 45d8077..2265d2c 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -70,6 +70,7 @@
#include <net/netns/generic.h>
#include <net/rtnetlink.h>
#include <net/sock.h>
+#include <net/xdp.h>
#include <linux/seq_file.h>
#include <linux/uio.h>
#include <linux/skb_array.h>
@@ -248,11 +249,11 @@
__be16 h_vlan_TCI;
};
-bool tun_is_xdp_buff(void *ptr)
+bool tun_is_xdp_frame(void *ptr)
{
return (unsigned long)ptr & TUN_XDP_FLAG;
}
-EXPORT_SYMBOL(tun_is_xdp_buff);
+EXPORT_SYMBOL(tun_is_xdp_frame);
void *tun_xdp_to_ptr(void *ptr)
{
@@ -525,11 +526,6 @@
rcu_read_lock();
- /* We may get a very small possibility of OOO during switching, not
- * worth to optimize.*/
- if (tun->numqueues == 1 || tfile->detached)
- goto unlock;
-
e = tun_flow_find(head, rxhash);
if (likely(e)) {
/* TODO: keep queueing to old queue until it's empty? */
@@ -548,7 +544,6 @@
spin_unlock_bh(&tun->lock);
}
-unlock:
rcu_read_unlock();
}
@@ -660,10 +655,10 @@
{
if (!ptr)
return;
- if (tun_is_xdp_buff(ptr)) {
- struct xdp_buff *xdp = tun_ptr_to_xdp(ptr);
+ if (tun_is_xdp_frame(ptr)) {
+ struct xdp_frame *xdpf = tun_ptr_to_xdp(ptr);
- put_page(virt_to_head_page(xdp->data));
+ xdp_return_frame(xdpf);
} else {
__skb_array_destroy_skb(ptr);
}
@@ -848,6 +843,12 @@
tun->dev, tfile->queue_index);
if (err < 0)
goto out;
+ err = xdp_rxq_info_reg_mem_model(&tfile->xdp_rxq,
+ MEM_TYPE_PAGE_SHARED, NULL);
+ if (err < 0) {
+ xdp_rxq_info_unreg(&tfile->xdp_rxq);
+ goto out;
+ }
err = 0;
}
@@ -1284,42 +1285,54 @@
.ndo_get_stats64 = tun_net_get_stats64,
};
-static int tun_xdp_xmit(struct net_device *dev, struct xdp_buff *xdp)
+static int tun_xdp_xmit(struct net_device *dev, int n, struct xdp_frame **frames)
{
struct tun_struct *tun = netdev_priv(dev);
- struct xdp_buff *buff = xdp->data_hard_start;
- int headroom = xdp->data - xdp->data_hard_start;
struct tun_file *tfile;
u32 numqueues;
- int ret = 0;
-
- /* Assure headroom is available and buff is properly aligned */
- if (unlikely(headroom < sizeof(*xdp) || tun_is_xdp_buff(xdp)))
- return -ENOSPC;
-
- *buff = *xdp;
+ int drops = 0;
+ int cnt = n;
+ int i;
rcu_read_lock();
numqueues = READ_ONCE(tun->numqueues);
if (!numqueues) {
- ret = -ENOSPC;
- goto out;
+ rcu_read_unlock();
+ return -ENXIO; /* Caller will free/return all frames */
}
tfile = rcu_dereference(tun->tfiles[smp_processor_id() %
numqueues]);
- /* Encode the XDP flag into lowest bit for consumer to differ
- * XDP buffer from sk_buff.
- */
- if (ptr_ring_produce(&tfile->tx_ring, tun_xdp_to_ptr(buff))) {
- this_cpu_inc(tun->pcpu_stats->tx_dropped);
- ret = -ENOSPC;
- }
-out:
+ spin_lock(&tfile->tx_ring.producer_lock);
+ for (i = 0; i < n; i++) {
+ struct xdp_frame *xdp = frames[i];
+ /* Encode the XDP flag into lowest bit for consumer to differ
+ * XDP buffer from sk_buff.
+ */
+ void *frame = tun_xdp_to_ptr(xdp);
+
+ if (__ptr_ring_produce(&tfile->tx_ring, frame)) {
+ this_cpu_inc(tun->pcpu_stats->tx_dropped);
+ xdp_return_frame_rx_napi(xdp);
+ drops++;
+ }
+ }
+ spin_unlock(&tfile->tx_ring.producer_lock);
+
rcu_read_unlock();
- return ret;
+ return cnt - drops;
+}
+
+static int tun_xdp_tx(struct net_device *dev, struct xdp_buff *xdp)
+{
+ struct xdp_frame *frame = convert_to_xdp_frame(xdp);
+
+ if (unlikely(!frame))
+ return -EOVERFLOW;
+
+ return tun_xdp_xmit(dev, 1, &frame);
}
static void tun_xdp_flush(struct net_device *dev)
@@ -1680,7 +1693,7 @@
case XDP_TX:
get_page(alloc_frag->page);
alloc_frag->offset += buflen;
- if (tun_xdp_xmit(tun->dev, &xdp))
+ if (tun_xdp_tx(tun->dev, &xdp))
goto err_redirect;
tun_xdp_flush(tun->dev);
rcu_read_unlock();
@@ -1688,6 +1701,7 @@
return NULL;
case XDP_PASS:
delta = orig_data - xdp.data;
+ len = xdp.data_end - xdp.data;
break;
default:
bpf_warn_invalid_xdp_action(act);
@@ -1708,7 +1722,7 @@
}
skb_reserve(skb, pad - delta);
- skb_put(skb, len + delta);
+ skb_put(skb, len);
get_page(alloc_frag->page);
alloc_frag->offset += buflen;
@@ -1929,10 +1943,13 @@
rcu_read_unlock();
}
- rcu_read_lock();
- if (!rcu_dereference(tun->steering_prog))
+ /* Compute the costly rx hash only if needed for flow updates.
+ * We may get a very small possibility of OOO during switching, not
+ * worth to optimize.
+ */
+ if (!rcu_access_pointer(tun->steering_prog) && tun->numqueues > 1 &&
+ !tfile->detached)
rxhash = __skb_get_hash_symmetric(skb);
- rcu_read_unlock();
if (frags) {
/* Exercise flow dissector code path. */
@@ -2001,11 +2018,11 @@
static ssize_t tun_put_user_xdp(struct tun_struct *tun,
struct tun_file *tfile,
- struct xdp_buff *xdp,
+ struct xdp_frame *xdp_frame,
struct iov_iter *iter)
{
int vnet_hdr_sz = 0;
- size_t size = xdp->data_end - xdp->data;
+ size_t size = xdp_frame->len;
struct tun_pcpu_stats *stats;
size_t ret;
@@ -2021,7 +2038,7 @@
iov_iter_advance(iter, vnet_hdr_sz - sizeof(gso));
}
- ret = copy_to_iter(xdp->data, size, iter) + vnet_hdr_sz;
+ ret = copy_to_iter(xdp_frame->data, size, iter) + vnet_hdr_sz;
stats = get_cpu_ptr(tun->pcpu_stats);
u64_stats_update_begin(&stats->syncp);
@@ -2189,11 +2206,11 @@
return err;
}
- if (tun_is_xdp_buff(ptr)) {
- struct xdp_buff *xdp = tun_ptr_to_xdp(ptr);
+ if (tun_is_xdp_frame(ptr)) {
+ struct xdp_frame *xdpf = tun_ptr_to_xdp(ptr);
- ret = tun_put_user_xdp(tun, tfile, xdp, to);
- put_page(virt_to_head_page(xdp->data));
+ ret = tun_put_user_xdp(tun, tfile, xdpf, to);
+ xdp_return_frame(xdpf);
} else {
struct sk_buff *skb = ptr;
@@ -2432,10 +2449,10 @@
static int tun_ptr_peek_len(void *ptr)
{
if (likely(ptr)) {
- if (tun_is_xdp_buff(ptr)) {
- struct xdp_buff *xdp = tun_ptr_to_xdp(ptr);
+ if (tun_is_xdp_frame(ptr)) {
+ struct xdp_frame *xdpf = tun_ptr_to_xdp(ptr);
- return xdp->data_end - xdp->data;
+ return xdpf->len;
}
return __skb_array_len_with_tag(ptr);
} else {
@@ -2849,10 +2866,10 @@
unsigned long arg, int ifreq_len)
{
struct tun_file *tfile = file->private_data;
+ struct net *net = sock_net(&tfile->sk);
struct tun_struct *tun;
void __user* argp = (void __user*)arg;
struct ifreq ifr;
- struct net *net;
kuid_t owner;
kgid_t group;
int sndbuf;
@@ -2876,14 +2893,18 @@
*/
return put_user(IFF_TUN | IFF_TAP | TUN_FEATURES,
(unsigned int __user*)argp);
- } else if (cmd == TUNSETQUEUE)
+ } else if (cmd == TUNSETQUEUE) {
return tun_set_queue(file, &ifr);
+ } else if (cmd == SIOCGSKNS) {
+ if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
+ return -EPERM;
+ return open_related_ns(&net->ns, get_net_ns);
+ }
ret = 0;
rtnl_lock();
tun = tun_get(tfile);
- net = sock_net(&tfile->sk);
if (cmd == TUNSETIFF) {
ret = -EEXIST;
if (tun)
@@ -2913,14 +2934,6 @@
tfile->ifindex = ifindex;
goto unlock;
}
- if (cmd == SIOCGSKNS) {
- ret = -EPERM;
- if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
- goto unlock;
-
- ret = open_related_ns(&net->ns, get_net_ns);
- goto unlock;
- }
ret = -EBADFD;
if (!tun)
diff --git a/drivers/net/usb/Kconfig b/drivers/net/usb/Kconfig
index f28bd74..418b090 100644
--- a/drivers/net/usb/Kconfig
+++ b/drivers/net/usb/Kconfig
@@ -111,6 +111,7 @@
select MII
select PHYLIB
select MICROCHIP_PHY
+ select FIXED_PHY
help
This option adds support for Microchip LAN78XX based USB 2
& USB 3 10/100/1000 Ethernet adapters.
diff --git a/drivers/net/usb/lan78xx.c b/drivers/net/usb/lan78xx.c
index 0867f72..8dff87e 100644
--- a/drivers/net/usb/lan78xx.c
+++ b/drivers/net/usb/lan78xx.c
@@ -36,13 +36,14 @@
#include <linux/irq.h>
#include <linux/irqchip/chained_irq.h>
#include <linux/microchipphy.h>
-#include <linux/phy.h>
+#include <linux/phy_fixed.h>
+#include <linux/of_mdio.h>
+#include <linux/of_net.h>
#include "lan78xx.h"
#define DRIVER_AUTHOR "WOOJUNG HUH <woojung.huh@microchip.com>"
#define DRIVER_DESC "LAN78XX USB 3.0 Gigabit Ethernet Devices"
#define DRIVER_NAME "lan78xx"
-#define DRIVER_VERSION "1.0.6"
#define TX_TIMEOUT_JIFFIES (5 * HZ)
#define THROTTLE_JIFFIES (HZ / 8)
@@ -278,6 +279,30 @@
u64 eee_tx_lpi_time;
};
+static u32 lan78xx_regs[] = {
+ ID_REV,
+ INT_STS,
+ HW_CFG,
+ PMT_CTL,
+ E2P_CMD,
+ E2P_DATA,
+ USB_STATUS,
+ VLAN_TYPE,
+ MAC_CR,
+ MAC_RX,
+ MAC_TX,
+ FLOW,
+ ERR_STS,
+ MII_ACC,
+ MII_DATA,
+ EEE_TX_LPI_REQ_DLY,
+ EEE_TW_TX_SYS,
+ EEE_TX_LPI_REM_DLY,
+ WUCSR
+};
+
+#define PHY_REG_SIZE (32 * sizeof(u32))
+
struct lan78xx_net;
struct lan78xx_priv {
@@ -1477,7 +1502,6 @@
struct lan78xx_net *dev = netdev_priv(net);
strncpy(info->driver, DRIVER_NAME, sizeof(info->driver));
- strncpy(info->version, DRIVER_VERSION, sizeof(info->version));
usb_make_path(dev->udev, info->bus_info, sizeof(info->bus_info));
}
@@ -1605,6 +1629,34 @@
return ret;
}
+static int lan78xx_get_regs_len(struct net_device *netdev)
+{
+ if (!netdev->phydev)
+ return (sizeof(lan78xx_regs));
+ else
+ return (sizeof(lan78xx_regs) + PHY_REG_SIZE);
+}
+
+static void
+lan78xx_get_regs(struct net_device *netdev, struct ethtool_regs *regs,
+ void *buf)
+{
+ u32 *data = buf;
+ int i, j;
+ struct lan78xx_net *dev = netdev_priv(netdev);
+
+ /* Read Device/MAC registers */
+ for (i = 0; i < (sizeof(lan78xx_regs) / sizeof(u32)); i++)
+ lan78xx_read_reg(dev, lan78xx_regs[i], &data[i]);
+
+ if (!netdev->phydev)
+ return;
+
+ /* Read PHY registers */
+ for (j = 0; j < 32; i++, j++)
+ data[i] = phy_read(netdev->phydev, j);
+}
+
static const struct ethtool_ops lan78xx_ethtool_ops = {
.get_link = lan78xx_get_link,
.nway_reset = phy_ethtool_nway_reset,
@@ -1625,6 +1677,8 @@
.set_pauseparam = lan78xx_set_pause,
.get_link_ksettings = lan78xx_get_link_ksettings,
.set_link_ksettings = lan78xx_set_link_ksettings,
+ .get_regs_len = lan78xx_get_regs_len,
+ .get_regs = lan78xx_get_regs,
};
static int lan78xx_ioctl(struct net_device *netdev, struct ifreq *rq, int cmd)
@@ -1652,34 +1706,31 @@
addr[5] = (addr_hi >> 8) & 0xFF;
if (!is_valid_ether_addr(addr)) {
- /* reading mac address from EEPROM or OTP */
- if ((lan78xx_read_eeprom(dev, EEPROM_MAC_OFFSET, ETH_ALEN,
- addr) == 0) ||
- (lan78xx_read_otp(dev, EEPROM_MAC_OFFSET, ETH_ALEN,
- addr) == 0)) {
- if (is_valid_ether_addr(addr)) {
- /* eeprom values are valid so use them */
- netif_dbg(dev, ifup, dev->net,
- "MAC address read from EEPROM");
- } else {
- /* generate random MAC */
- random_ether_addr(addr);
- netif_dbg(dev, ifup, dev->net,
- "MAC address set to random addr");
- }
-
- addr_lo = addr[0] | (addr[1] << 8) |
- (addr[2] << 16) | (addr[3] << 24);
- addr_hi = addr[4] | (addr[5] << 8);
-
- ret = lan78xx_write_reg(dev, RX_ADDRL, addr_lo);
- ret = lan78xx_write_reg(dev, RX_ADDRH, addr_hi);
+ if (!eth_platform_get_mac_address(&dev->udev->dev, addr)) {
+ /* valid address present in Device Tree */
+ netif_dbg(dev, ifup, dev->net,
+ "MAC address read from Device Tree");
+ } else if (((lan78xx_read_eeprom(dev, EEPROM_MAC_OFFSET,
+ ETH_ALEN, addr) == 0) ||
+ (lan78xx_read_otp(dev, EEPROM_MAC_OFFSET,
+ ETH_ALEN, addr) == 0)) &&
+ is_valid_ether_addr(addr)) {
+ /* eeprom values are valid so use them */
+ netif_dbg(dev, ifup, dev->net,
+ "MAC address read from EEPROM");
} else {
/* generate random MAC */
random_ether_addr(addr);
netif_dbg(dev, ifup, dev->net,
"MAC address set to random addr");
}
+
+ addr_lo = addr[0] | (addr[1] << 8) |
+ (addr[2] << 16) | (addr[3] << 24);
+ addr_hi = addr[4] | (addr[5] << 8);
+
+ ret = lan78xx_write_reg(dev, RX_ADDRL, addr_lo);
+ ret = lan78xx_write_reg(dev, RX_ADDRH, addr_hi);
}
ret = lan78xx_write_reg(dev, MAF_LO(0), addr_lo);
@@ -1762,6 +1813,7 @@
static int lan78xx_mdio_init(struct lan78xx_net *dev)
{
+ struct device_node *node;
int ret;
dev->mdiobus = mdiobus_alloc();
@@ -1790,7 +1842,10 @@
break;
}
- ret = mdiobus_register(dev->mdiobus);
+ node = of_get_child_by_name(dev->udev->dev.of_node, "mdio");
+ ret = of_mdiobus_register(dev->mdiobus, node);
+ if (node)
+ of_node_put(node);
if (ret) {
netdev_err(dev->net, "can't register MDIO bus\n");
goto exit1;
@@ -2003,52 +2058,91 @@
return 1;
}
+static struct phy_device *lan7801_phy_init(struct lan78xx_net *dev)
+{
+ u32 buf;
+ int ret;
+ struct fixed_phy_status fphy_status = {
+ .link = 1,
+ .speed = SPEED_1000,
+ .duplex = DUPLEX_FULL,
+ };
+ struct phy_device *phydev;
+
+ phydev = phy_find_first(dev->mdiobus);
+ if (!phydev) {
+ netdev_dbg(dev->net, "PHY Not Found!! Registering Fixed PHY\n");
+ phydev = fixed_phy_register(PHY_POLL, &fphy_status, -1,
+ NULL);
+ if (IS_ERR(phydev)) {
+ netdev_err(dev->net, "No PHY/fixed_PHY found\n");
+ return NULL;
+ }
+ netdev_dbg(dev->net, "Registered FIXED PHY\n");
+ dev->interface = PHY_INTERFACE_MODE_RGMII;
+ ret = lan78xx_write_reg(dev, MAC_RGMII_ID,
+ MAC_RGMII_ID_TXC_DELAY_EN_);
+ ret = lan78xx_write_reg(dev, RGMII_TX_BYP_DLL, 0x3D00);
+ ret = lan78xx_read_reg(dev, HW_CFG, &buf);
+ buf |= HW_CFG_CLK125_EN_;
+ buf |= HW_CFG_REFCLK25_EN_;
+ ret = lan78xx_write_reg(dev, HW_CFG, buf);
+ } else {
+ if (!phydev->drv) {
+ netdev_err(dev->net, "no PHY driver found\n");
+ return NULL;
+ }
+ dev->interface = PHY_INTERFACE_MODE_RGMII;
+ /* external PHY fixup for KSZ9031RNX */
+ ret = phy_register_fixup_for_uid(PHY_KSZ9031RNX, 0xfffffff0,
+ ksz9031rnx_fixup);
+ if (ret < 0) {
+ netdev_err(dev->net, "Failed to register fixup for PHY_KSZ9031RNX\n");
+ return NULL;
+ }
+ /* external PHY fixup for LAN8835 */
+ ret = phy_register_fixup_for_uid(PHY_LAN8835, 0xfffffff0,
+ lan8835_fixup);
+ if (ret < 0) {
+ netdev_err(dev->net, "Failed to register fixup for PHY_LAN8835\n");
+ return NULL;
+ }
+ /* add more external PHY fixup here if needed */
+
+ phydev->is_internal = false;
+ }
+ return phydev;
+}
+
static int lan78xx_phy_init(struct lan78xx_net *dev)
{
int ret;
u32 mii_adv;
struct phy_device *phydev;
- phydev = phy_find_first(dev->mdiobus);
- if (!phydev) {
- netdev_err(dev->net, "no PHY found\n");
- return -EIO;
- }
-
- if ((dev->chipid == ID_REV_CHIP_ID_7800_) ||
- (dev->chipid == ID_REV_CHIP_ID_7850_)) {
- phydev->is_internal = true;
- dev->interface = PHY_INTERFACE_MODE_GMII;
-
- } else if (dev->chipid == ID_REV_CHIP_ID_7801_) {
- if (!phydev->drv) {
- netdev_err(dev->net, "no PHY driver found\n");
+ switch (dev->chipid) {
+ case ID_REV_CHIP_ID_7801_:
+ phydev = lan7801_phy_init(dev);
+ if (!phydev) {
+ netdev_err(dev->net, "lan7801: PHY Init Failed");
return -EIO;
}
+ break;
- dev->interface = PHY_INTERFACE_MODE_RGMII;
-
- /* external PHY fixup for KSZ9031RNX */
- ret = phy_register_fixup_for_uid(PHY_KSZ9031RNX, 0xfffffff0,
- ksz9031rnx_fixup);
- if (ret < 0) {
- netdev_err(dev->net, "fail to register fixup\n");
- return ret;
+ case ID_REV_CHIP_ID_7800_:
+ case ID_REV_CHIP_ID_7850_:
+ phydev = phy_find_first(dev->mdiobus);
+ if (!phydev) {
+ netdev_err(dev->net, "no PHY found\n");
+ return -EIO;
}
- /* external PHY fixup for LAN8835 */
- ret = phy_register_fixup_for_uid(PHY_LAN8835, 0xfffffff0,
- lan8835_fixup);
- if (ret < 0) {
- netdev_err(dev->net, "fail to register fixup\n");
- return ret;
- }
- /* add more external PHY fixup here if needed */
+ phydev->is_internal = true;
+ dev->interface = PHY_INTERFACE_MODE_GMII;
+ break;
- phydev->is_internal = false;
- } else {
- netdev_err(dev->net, "unknown ID found\n");
- ret = -EIO;
- goto error;
+ default:
+ netdev_err(dev->net, "Unknown CHIP ID found\n");
+ return -EIO;
}
/* if phyirq is not set, use polling mode in phylib */
@@ -2067,6 +2161,16 @@
if (ret) {
netdev_err(dev->net, "can't attach PHY to %s\n",
dev->mdiobus->id);
+ if (dev->chipid == ID_REV_CHIP_ID_7801_) {
+ if (phy_is_pseudo_fixed_link(phydev)) {
+ fixed_phy_unregister(phydev);
+ } else {
+ phy_unregister_fixup_for_uid(PHY_KSZ9031RNX,
+ 0xfffffff0);
+ phy_unregister_fixup_for_uid(PHY_LAN8835,
+ 0xfffffff0);
+ }
+ }
return -EIO;
}
@@ -2079,17 +2183,33 @@
mii_adv = (u32)mii_advertise_flowctrl(dev->fc_request_control);
phydev->advertising |= mii_adv_to_ethtool_adv_t(mii_adv);
+ if (phydev->mdio.dev.of_node) {
+ u32 reg;
+ int len;
+
+ len = of_property_count_elems_of_size(phydev->mdio.dev.of_node,
+ "microchip,led-modes",
+ sizeof(u32));
+ if (len >= 0) {
+ /* Ensure the appropriate LEDs are enabled */
+ lan78xx_read_reg(dev, HW_CFG, ®);
+ reg &= ~(HW_CFG_LED0_EN_ |
+ HW_CFG_LED1_EN_ |
+ HW_CFG_LED2_EN_ |
+ HW_CFG_LED3_EN_);
+ reg |= (len > 0) * HW_CFG_LED0_EN_ |
+ (len > 1) * HW_CFG_LED1_EN_ |
+ (len > 2) * HW_CFG_LED2_EN_ |
+ (len > 3) * HW_CFG_LED3_EN_;
+ lan78xx_write_reg(dev, HW_CFG, reg);
+ }
+ }
+
genphy_config_aneg(phydev);
dev->fc_autoneg = phydev->autoneg;
return 0;
-
-error:
- phy_unregister_fixup_for_uid(PHY_KSZ9031RNX, 0xfffffff0);
- phy_unregister_fixup_for_uid(PHY_LAN8835, 0xfffffff0);
-
- return ret;
}
static int lan78xx_set_rx_max_frame_length(struct lan78xx_net *dev, int size)
@@ -3487,6 +3607,7 @@
struct lan78xx_net *dev;
struct usb_device *udev;
struct net_device *net;
+ struct phy_device *phydev;
dev = usb_get_intfdata(intf);
usb_set_intfdata(intf, NULL);
@@ -3495,12 +3616,16 @@
udev = interface_to_usbdev(intf);
net = dev->net;
+ phydev = net->phydev;
phy_unregister_fixup_for_uid(PHY_KSZ9031RNX, 0xfffffff0);
phy_unregister_fixup_for_uid(PHY_LAN8835, 0xfffffff0);
phy_disconnect(net->phydev);
+ if (phy_is_pseudo_fixed_link(phydev))
+ fixed_phy_unregister(phydev);
+
unregister_netdev(net);
cancel_delayed_work_sync(&dev->wq);
diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 032e1ac..b2647dd 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -419,46 +419,70 @@
virtqueue_kick(sq->vq);
}
-static bool __virtnet_xdp_xmit(struct virtnet_info *vi,
- struct xdp_buff *xdp)
+static int __virtnet_xdp_xmit_one(struct virtnet_info *vi,
+ struct send_queue *sq,
+ struct xdp_frame *xdpf)
{
struct virtio_net_hdr_mrg_rxbuf *hdr;
- unsigned int len;
- struct send_queue *sq;
- unsigned int qp;
- void *xdp_sent;
int err;
+ /* virtqueue want to use data area in-front of packet */
+ if (unlikely(xdpf->metasize > 0))
+ return -EOPNOTSUPP;
+
+ if (unlikely(xdpf->headroom < vi->hdr_len))
+ return -EOVERFLOW;
+
+ /* Make room for virtqueue hdr (also change xdpf->headroom?) */
+ xdpf->data -= vi->hdr_len;
+ /* Zero header and leave csum up to XDP layers */
+ hdr = xdpf->data;
+ memset(hdr, 0, vi->hdr_len);
+ xdpf->len += vi->hdr_len;
+
+ sg_init_one(sq->sg, xdpf->data, xdpf->len);
+
+ err = virtqueue_add_outbuf(sq->vq, sq->sg, 1, xdpf, GFP_ATOMIC);
+ if (unlikely(err))
+ return -ENOSPC; /* Caller handle free/refcnt */
+
+ return 0;
+}
+
+static int __virtnet_xdp_tx_xmit(struct virtnet_info *vi,
+ struct xdp_frame *xdpf)
+{
+ struct xdp_frame *xdpf_sent;
+ struct send_queue *sq;
+ unsigned int len;
+ unsigned int qp;
+
qp = vi->curr_queue_pairs - vi->xdp_queue_pairs + smp_processor_id();
sq = &vi->sq[qp];
/* Free up any pending old buffers before queueing new ones. */
- while ((xdp_sent = virtqueue_get_buf(sq->vq, &len)) != NULL) {
- struct page *sent_page = virt_to_head_page(xdp_sent);
+ while ((xdpf_sent = virtqueue_get_buf(sq->vq, &len)) != NULL)
+ xdp_return_frame(xdpf_sent);
- put_page(sent_page);
- }
-
- xdp->data -= vi->hdr_len;
- /* Zero header and leave csum up to XDP layers */
- hdr = xdp->data;
- memset(hdr, 0, vi->hdr_len);
-
- sg_init_one(sq->sg, xdp->data, xdp->data_end - xdp->data);
-
- err = virtqueue_add_outbuf(sq->vq, sq->sg, 1, xdp->data, GFP_ATOMIC);
- if (unlikely(err))
- return false; /* Caller handle free/refcnt */
-
- return true;
+ return __virtnet_xdp_xmit_one(vi, sq, xdpf);
}
-static int virtnet_xdp_xmit(struct net_device *dev, struct xdp_buff *xdp)
+static int virtnet_xdp_xmit(struct net_device *dev,
+ int n, struct xdp_frame **frames)
{
struct virtnet_info *vi = netdev_priv(dev);
struct receive_queue *rq = vi->rq;
+ struct xdp_frame *xdpf_sent;
struct bpf_prog *xdp_prog;
- bool sent;
+ struct send_queue *sq;
+ unsigned int len;
+ unsigned int qp;
+ int drops = 0;
+ int err;
+ int i;
+
+ qp = vi->curr_queue_pairs - vi->xdp_queue_pairs + smp_processor_id();
+ sq = &vi->sq[qp];
/* Only allow ndo_xdp_xmit if XDP is loaded on dev, as this
* indicate XDP resources have been successfully allocated.
@@ -467,10 +491,20 @@
if (!xdp_prog)
return -ENXIO;
- sent = __virtnet_xdp_xmit(vi, xdp);
- if (!sent)
- return -ENOSPC;
- return 0;
+ /* Free up any pending old buffers before queueing new ones. */
+ while ((xdpf_sent = virtqueue_get_buf(sq->vq, &len)) != NULL)
+ xdp_return_frame(xdpf_sent);
+
+ for (i = 0; i < n; i++) {
+ struct xdp_frame *xdpf = frames[i];
+
+ err = __virtnet_xdp_xmit_one(vi, sq, xdpf);
+ if (err) {
+ xdp_return_frame_rx_napi(xdpf);
+ drops++;
+ }
+ }
+ return n - drops;
}
static unsigned int virtnet_get_headroom(struct virtnet_info *vi)
@@ -559,7 +593,6 @@
struct page *page = virt_to_head_page(buf);
unsigned int delta = 0;
struct page *xdp_page;
- bool sent;
int err;
len -= vi->hdr_len;
@@ -568,6 +601,7 @@
xdp_prog = rcu_dereference(rq->xdp_prog);
if (xdp_prog) {
struct virtio_net_hdr_mrg_rxbuf *hdr = buf + header_offset;
+ struct xdp_frame *xdpf;
struct xdp_buff xdp;
void *orig_data;
u32 act;
@@ -608,10 +642,14 @@
case XDP_PASS:
/* Recalculate length in case bpf program changed it */
delta = orig_data - xdp.data;
+ len = xdp.data_end - xdp.data;
break;
case XDP_TX:
- sent = __virtnet_xdp_xmit(vi, &xdp);
- if (unlikely(!sent)) {
+ xdpf = convert_to_xdp_frame(&xdp);
+ if (unlikely(!xdpf))
+ goto err_xdp;
+ err = __virtnet_xdp_tx_xmit(vi, xdpf);
+ if (unlikely(err)) {
trace_xdp_exception(vi->dev, xdp_prog, act);
goto err_xdp;
}
@@ -641,7 +679,7 @@
goto err;
}
skb_reserve(skb, headroom - delta);
- skb_put(skb, len + delta);
+ skb_put(skb, len);
if (!delta) {
buf += header_offset;
memcpy(skb_vnet_hdr(skb), buf, vi->hdr_len);
@@ -694,7 +732,6 @@
struct bpf_prog *xdp_prog;
unsigned int truesize;
unsigned int headroom = mergeable_ctx_to_headroom(ctx);
- bool sent;
int err;
head_skb = NULL;
@@ -702,6 +739,7 @@
rcu_read_lock();
xdp_prog = rcu_dereference(rq->xdp_prog);
if (xdp_prog) {
+ struct xdp_frame *xdpf;
struct page *xdp_page;
struct xdp_buff xdp;
void *data;
@@ -755,6 +793,10 @@
offset = xdp.data -
page_address(xdp_page) - vi->hdr_len;
+ /* recalculate len if xdp.data or xdp.data_end were
+ * adjusted
+ */
+ len = xdp.data_end - xdp.data + vi->hdr_len;
/* We can only create skb based on xdp_page. */
if (unlikely(xdp_page != page)) {
rcu_read_unlock();
@@ -765,8 +807,11 @@
}
break;
case XDP_TX:
- sent = __virtnet_xdp_xmit(vi, &xdp);
- if (unlikely(!sent)) {
+ xdpf = convert_to_xdp_frame(&xdp);
+ if (unlikely(!xdpf))
+ goto err_xdp;
+ err = __virtnet_xdp_tx_xmit(vi, xdpf);
+ if (unlikely(err)) {
trace_xdp_exception(vi->dev, xdp_prog, act);
if (unlikely(xdp_page != page))
put_page(xdp_page);
@@ -1311,6 +1356,13 @@
if (err < 0)
return err;
+ err = xdp_rxq_info_reg_mem_model(&vi->rq[i].xdp_rxq,
+ MEM_TYPE_PAGE_SHARED, NULL);
+ if (err < 0) {
+ xdp_rxq_info_unreg(&vi->rq[i].xdp_rxq);
+ return err;
+ }
+
virtnet_napi_enable(vi->rq[i].vq, &vi->rq[i].napi);
virtnet_napi_tx_enable(vi, vi->sq[i].vq, &vi->sq[i].napi);
}
diff --git a/drivers/net/vmxnet3/vmxnet3_drv.c b/drivers/net/vmxnet3/vmxnet3_drv.c
index 27a9bb8..e454dfc 100644
--- a/drivers/net/vmxnet3/vmxnet3_drv.c
+++ b/drivers/net/vmxnet3/vmxnet3_drv.c
@@ -2949,7 +2949,7 @@
* completion.
*/
while (test_and_set_bit(VMXNET3_STATE_BIT_RESETTING, &adapter->state))
- msleep(1);
+ usleep_range(1000, 2000);
vmxnet3_quiesce_dev(adapter);
@@ -2999,7 +2999,7 @@
* completion.
*/
while (test_and_set_bit(VMXNET3_STATE_BIT_RESETTING, &adapter->state))
- msleep(1);
+ usleep_range(1000, 2000);
if (netif_running(netdev)) {
vmxnet3_quiesce_dev(adapter);
@@ -3589,7 +3589,7 @@
* completion.
*/
while (test_and_set_bit(VMXNET3_STATE_BIT_RESETTING, &adapter->state))
- msleep(1);
+ usleep_range(1000, 2000);
if (test_and_set_bit(VMXNET3_STATE_BIT_QUIESCED,
&adapter->state)) {
diff --git a/drivers/net/vmxnet3/vmxnet3_ethtool.c b/drivers/net/vmxnet3/vmxnet3_ethtool.c
index 2ff2731..559db05 100644
--- a/drivers/net/vmxnet3/vmxnet3_ethtool.c
+++ b/drivers/net/vmxnet3/vmxnet3_ethtool.c
@@ -600,7 +600,7 @@
* completion.
*/
while (test_and_set_bit(VMXNET3_STATE_BIT_RESETTING, &adapter->state))
- msleep(1);
+ usleep_range(1000, 2000);
if (netif_running(netdev)) {
vmxnet3_quiesce_dev(adapter);
diff --git a/drivers/net/vrf.c b/drivers/net/vrf.c
index 0a2b180..90b5f39 100644
--- a/drivers/net/vrf.c
+++ b/drivers/net/vrf.c
@@ -48,6 +48,9 @@
struct net_vrf {
struct rtable __rcu *rth;
struct rt6_info __rcu *rt6;
+#if IS_ENABLED(CONFIG_IPV6)
+ struct fib6_table *fib6_table;
+#endif
u32 tb_id;
};
@@ -496,7 +499,6 @@
int flags = DST_HOST | DST_NOPOLICY | DST_NOXFRM;
struct net_vrf *vrf = netdev_priv(dev);
struct net *net = dev_net(dev);
- struct fib6_table *rt6i_table;
struct rt6_info *rt6;
int rc = -ENOMEM;
@@ -504,8 +506,8 @@
if (!ipv6_mod_enabled())
return 0;
- rt6i_table = fib6_new_table(net, vrf->tb_id);
- if (!rt6i_table)
+ vrf->fib6_table = fib6_new_table(net, vrf->tb_id);
+ if (!vrf->fib6_table)
goto out;
/* create a dst for routing packets out a VRF device */
@@ -513,7 +515,6 @@
if (!rt6)
goto out;
- rt6->rt6i_table = rt6i_table;
rt6->dst.output = vrf_output6;
rcu_assign_pointer(vrf->rt6, rt6);
@@ -946,22 +947,8 @@
int flags)
{
struct net_vrf *vrf = netdev_priv(dev);
- struct fib6_table *table = NULL;
- struct rt6_info *rt6;
- rcu_read_lock();
-
- /* fib6_table does not have a refcnt and can not be freed */
- rt6 = rcu_dereference(vrf->rt6);
- if (likely(rt6))
- table = rt6->rt6i_table;
-
- rcu_read_unlock();
-
- if (!table)
- return NULL;
-
- return ip6_pol_route(net, table, ifindex, fl6, skb, flags);
+ return ip6_pol_route(net, vrf->fib6_table, ifindex, fl6, skb, flags);
}
static void vrf_ip6_input_dst(struct sk_buff *skb, struct net_device *vrf_dev,
diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index fab7a4d..aee0e60 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -2085,9 +2085,13 @@
local_ip = vxlan->cfg.saddr;
dst_cache = &rdst->dst_cache;
md->gbp = skb->mark;
- ttl = vxlan->cfg.ttl;
- if (!ttl && vxlan_addr_multicast(dst))
- ttl = 1;
+ if (flags & VXLAN_F_TTL_INHERIT) {
+ ttl = ip_tunnel_get_ttl(old_iph, skb);
+ } else {
+ ttl = vxlan->cfg.ttl;
+ if (!ttl && vxlan_addr_multicast(dst))
+ ttl = 1;
+ }
tos = vxlan->cfg.tos;
if (tos == 1)
@@ -2709,6 +2713,7 @@
[IFLA_VXLAN_GBP] = { .type = NLA_FLAG, },
[IFLA_VXLAN_GPE] = { .type = NLA_FLAG, },
[IFLA_VXLAN_REMCSUM_NOPARTIAL] = { .type = NLA_FLAG },
+ [IFLA_VXLAN_TTL_INHERIT] = { .type = NLA_FLAG },
};
static int vxlan_validate(struct nlattr *tb[], struct nlattr *data[],
@@ -3254,6 +3259,12 @@
if (data[IFLA_VXLAN_TTL])
conf->ttl = nla_get_u8(data[IFLA_VXLAN_TTL]);
+ if (data[IFLA_VXLAN_TTL_INHERIT]) {
+ if (changelink)
+ return -EOPNOTSUPP;
+ conf->flags |= VXLAN_F_TTL_INHERIT;
+ }
+
if (data[IFLA_VXLAN_LABEL])
conf->label = nla_get_be32(data[IFLA_VXLAN_LABEL]) &
IPV6_FLOWLABEL_MASK;
diff --git a/drivers/net/wireless/ath/ath10k/Kconfig b/drivers/net/wireless/ath/ath10k/Kconfig
index deb5ae2..84f071a 100644
--- a/drivers/net/wireless/ath/ath10k/Kconfig
+++ b/drivers/net/wireless/ath/ath10k/Kconfig
@@ -4,12 +4,16 @@
select ATH_COMMON
select CRC32
select WANT_DEV_COREDUMP
+ select ATH10K_CE
---help---
This module adds support for wireless adapters based on
Atheros IEEE 802.11ac family of chipsets.
If you choose to build a module, it'll be called ath10k.
+config ATH10K_CE
+ bool
+
config ATH10K_PCI
tristate "Atheros ath10k PCI support"
depends on ATH10K && PCI
@@ -36,6 +40,14 @@
This module adds experimental support for USB bus. Currently
work in progress and will not fully work.
+config ATH10K_SNOC
+ tristate "Qualcomm ath10k SNOC support (EXPERIMENTAL)"
+ depends on ATH10K && ARCH_QCOM
+ ---help---
+ This module adds support for integrated WCN3990 chip connected
+ to system NOC(SNOC). Currently work in progress and will not
+ fully work.
+
config ATH10K_DEBUG
bool "Atheros ath10k debugging"
depends on ATH10K
diff --git a/drivers/net/wireless/ath/ath10k/Makefile b/drivers/net/wireless/ath/ath10k/Makefile
index 6739ac2..44d60a6 100644
--- a/drivers/net/wireless/ath/ath10k/Makefile
+++ b/drivers/net/wireless/ath/ath10k/Makefile
@@ -22,10 +22,10 @@
ath10k_core-$(CONFIG_MAC80211_DEBUGFS) += debugfs_sta.o
ath10k_core-$(CONFIG_PM) += wow.o
ath10k_core-$(CONFIG_DEV_COREDUMP) += coredump.o
+ath10k_core-$(CONFIG_ATH10K_CE) += ce.o
obj-$(CONFIG_ATH10K_PCI) += ath10k_pci.o
-ath10k_pci-y += pci.o \
- ce.o
+ath10k_pci-y += pci.o
ath10k_pci-$(CONFIG_ATH10K_AHB) += ahb.o
@@ -35,5 +35,8 @@
obj-$(CONFIG_ATH10K_USB) += ath10k_usb.o
ath10k_usb-y += usb.o
+obj-$(CONFIG_ATH10K_SNOC) += ath10k_snoc.o
+ath10k_snoc-y += snoc.o
+
# for tracing framework to find trace.h
CFLAGS_trace.o := -I$(src)
diff --git a/drivers/net/wireless/ath/ath10k/ce.c b/drivers/net/wireless/ath/ath10k/ce.c
index b9def7b..3b96a43 100644
--- a/drivers/net/wireless/ath/ath10k/ce.c
+++ b/drivers/net/wireless/ath/ath10k/ce.c
@@ -1,6 +1,7 @@
/*
* Copyright (c) 2005-2011 Atheros Communications Inc.
* Copyright (c) 2011-2017 Qualcomm Atheros, Inc.
+ * Copyright (c) 2018 The Linux Foundation. All rights reserved.
*
* Permission to use, copy, modify, and/or distribute this software for any
* purpose with or without fee is hereby granted, provided that the above
@@ -58,6 +59,74 @@
* the buffer is sent/received.
*/
+static inline u32 shadow_sr_wr_ind_addr(struct ath10k *ar,
+ struct ath10k_ce_pipe *ce_state)
+{
+ u32 ce_id = ce_state->id;
+ u32 addr = 0;
+
+ switch (ce_id) {
+ case 0:
+ addr = 0x00032000;
+ break;
+ case 3:
+ addr = 0x0003200C;
+ break;
+ case 4:
+ addr = 0x00032010;
+ break;
+ case 5:
+ addr = 0x00032014;
+ break;
+ case 7:
+ addr = 0x0003201C;
+ break;
+ default:
+ ath10k_warn(ar, "invalid CE id: %d", ce_id);
+ break;
+ }
+ return addr;
+}
+
+static inline u32 shadow_dst_wr_ind_addr(struct ath10k *ar,
+ struct ath10k_ce_pipe *ce_state)
+{
+ u32 ce_id = ce_state->id;
+ u32 addr = 0;
+
+ switch (ce_id) {
+ case 1:
+ addr = 0x00032034;
+ break;
+ case 2:
+ addr = 0x00032038;
+ break;
+ case 5:
+ addr = 0x00032044;
+ break;
+ case 7:
+ addr = 0x0003204C;
+ break;
+ case 8:
+ addr = 0x00032050;
+ break;
+ case 9:
+ addr = 0x00032054;
+ break;
+ case 10:
+ addr = 0x00032058;
+ break;
+ case 11:
+ addr = 0x0003205C;
+ break;
+ default:
+ ath10k_warn(ar, "invalid CE id: %d", ce_id);
+ break;
+ }
+
+ return addr;
+}
+
static inline unsigned int
ath10k_set_ring_byte(unsigned int offset,
struct ath10k_hw_ce_regs_addr_map *addr_map)
@@ -116,11 +185,46 @@
ar->hw_ce_regs->sr_wr_index_addr);
}
+static inline u32 ath10k_ce_src_ring_read_index_from_ddr(struct ath10k *ar,
+ u32 ce_id)
+{
+ struct ath10k_ce *ce = ath10k_ce_priv(ar);
+
+ return ce->vaddr_rri[ce_id] & CE_DDR_RRI_MASK;
+}
+
static inline u32 ath10k_ce_src_ring_read_index_get(struct ath10k *ar,
u32 ce_ctrl_addr)
{
- return ath10k_ce_read32(ar, ce_ctrl_addr +
- ar->hw_ce_regs->current_srri_addr);
+ struct ath10k_ce *ce = ath10k_ce_priv(ar);
+ u32 ce_id = COPY_ENGINE_ID(ce_ctrl_addr);
+ struct ath10k_ce_pipe *ce_state = &ce->ce_states[ce_id];
+ u32 index;
+
+ if (ar->hw_params.rri_on_ddr &&
+ (ce_state->attr_flags & CE_ATTR_DIS_INTR))
+ index = ath10k_ce_src_ring_read_index_from_ddr(ar, ce_id);
+ else
+ index = ath10k_ce_read32(ar, ce_ctrl_addr +
+ ar->hw_ce_regs->current_srri_addr);
+
+ return index;
+}
+
+static inline void
+ath10k_ce_shadow_src_ring_write_index_set(struct ath10k *ar,
+ struct ath10k_ce_pipe *ce_state,
+ unsigned int value)
+{
+ ath10k_ce_write32(ar, shadow_sr_wr_ind_addr(ar, ce_state), value);
+}
+
+static inline void
+ath10k_ce_shadow_dest_ring_write_index_set(struct ath10k *ar,
+ struct ath10k_ce_pipe *ce_state,
+ unsigned int value)
+{
+ ath10k_ce_write32(ar, shadow_dst_wr_ind_addr(ar, ce_state), value);
}
static inline void ath10k_ce_src_ring_base_addr_set(struct ath10k *ar,
@@ -181,11 +285,31 @@
ath10k_set_ring_byte(n, ctrl_regs->dst_ring));
}
+static inline
+ u32 ath10k_ce_dest_ring_read_index_from_ddr(struct ath10k *ar, u32 ce_id)
+{
+ struct ath10k_ce *ce = ath10k_ce_priv(ar);
+
+ return (ce->vaddr_rri[ce_id] >> CE_DDR_DRRI_SHIFT) &
+ CE_DDR_RRI_MASK;
+}
+
static inline u32 ath10k_ce_dest_ring_read_index_get(struct ath10k *ar,
u32 ce_ctrl_addr)
{
- return ath10k_ce_read32(ar, ce_ctrl_addr +
- ar->hw_ce_regs->current_drri_addr);
+ struct ath10k_ce *ce = ath10k_ce_priv(ar);
+ u32 ce_id = COPY_ENGINE_ID(ce_ctrl_addr);
+ struct ath10k_ce_pipe *ce_state = &ce->ce_states[ce_id];
+ u32 index;
+
+ if (ar->hw_params.rri_on_ddr &&
+ (ce_state->attr_flags & CE_ATTR_DIS_INTR))
+ index = ath10k_ce_dest_ring_read_index_from_ddr(ar, ce_id);
+ else
+ index = ath10k_ce_read32(ar, ce_ctrl_addr +
+ ar->hw_ce_regs->current_drri_addr);
+
+ return index;
}
static inline void ath10k_ce_dest_ring_base_addr_set(struct ath10k *ar,
@@ -376,8 +500,14 @@
write_index = CE_RING_IDX_INCR(nentries_mask, write_index);
/* WORKAROUND */
- if (!(flags & CE_SEND_FLAG_GATHER))
- ath10k_ce_src_ring_write_index_set(ar, ctrl_addr, write_index);
+ if (!(flags & CE_SEND_FLAG_GATHER)) {
+ if (ar->hw_params.shadow_reg_support)
+ ath10k_ce_shadow_src_ring_write_index_set(ar, ce_state,
+ write_index);
+ else
+ ath10k_ce_src_ring_write_index_set(ar, ctrl_addr,
+ write_index);
+ }
src_ring->write_index = write_index;
exit:
@@ -395,7 +525,7 @@
struct ath10k_ce_ring *src_ring = ce_state->src_ring;
struct ce_desc_64 *desc, sdesc;
unsigned int nentries_mask = src_ring->nentries_mask;
- unsigned int sw_index = src_ring->sw_index;
+ unsigned int sw_index;
unsigned int write_index = src_ring->write_index;
u32 ctrl_addr = ce_state->ctrl_addr;
__le32 *addr;
@@ -409,6 +539,11 @@
ath10k_warn(ar, "%s: send more we can (nbytes: %d, max: %d)\n",
__func__, nbytes, ce_state->src_sz_max);
+ if (ar->hw_params.rri_on_ddr)
+ sw_index = ath10k_ce_src_ring_read_index_from_ddr(ar, ce_state->id);
+ else
+ sw_index = src_ring->sw_index;
+
if (unlikely(CE_RING_DELTA(nentries_mask,
write_index, sw_index - 1) <= 0)) {
ret = -ENOSR;
@@ -464,6 +599,7 @@
return ce_state->ops->ce_send_nolock(ce_state, per_transfer_context,
buffer, nbytes, transfer_id, flags);
}
+EXPORT_SYMBOL(ath10k_ce_send_nolock);
void __ath10k_ce_send_revert(struct ath10k_ce_pipe *pipe)
{
@@ -491,6 +627,7 @@
src_ring->per_transfer_context[src_ring->write_index] = NULL;
}
+EXPORT_SYMBOL(__ath10k_ce_send_revert);
int ath10k_ce_send(struct ath10k_ce_pipe *ce_state,
void *per_transfer_context,
@@ -510,6 +647,7 @@
return ret;
}
+EXPORT_SYMBOL(ath10k_ce_send);
int ath10k_ce_num_free_src_entries(struct ath10k_ce_pipe *pipe)
{
@@ -525,6 +663,7 @@
return delta;
}
+EXPORT_SYMBOL(ath10k_ce_num_free_src_entries);
int __ath10k_ce_rx_num_free_bufs(struct ath10k_ce_pipe *pipe)
{
@@ -539,6 +678,7 @@
return CE_RING_DELTA(nentries_mask, write_index, sw_index - 1);
}
+EXPORT_SYMBOL(__ath10k_ce_rx_num_free_bufs);
static int __ath10k_ce_rx_post_buf(struct ath10k_ce_pipe *pipe, void *ctx,
dma_addr_t paddr)
@@ -615,13 +755,14 @@
/* Prevent CE ring stuck issue that will occur when ring is full.
* Make sure that write index is 1 less than read index.
*/
- if ((cur_write_idx + nentries) == dest_ring->sw_index)
+ if (((cur_write_idx + nentries) & nentries_mask) == dest_ring->sw_index)
nentries -= 1;
write_index = CE_RING_IDX_ADD(nentries_mask, write_index, nentries);
ath10k_ce_dest_ring_write_index_set(ar, ctrl_addr, write_index);
dest_ring->write_index = write_index;
}
+EXPORT_SYMBOL(ath10k_ce_rx_update_write_idx);
int ath10k_ce_rx_post_buf(struct ath10k_ce_pipe *pipe, void *ctx,
dma_addr_t paddr)
@@ -636,6 +777,7 @@
return ret;
}
+EXPORT_SYMBOL(ath10k_ce_rx_post_buf);
/*
* Guts of ath10k_ce_completed_recv_next.
@@ -748,6 +890,7 @@
per_transfer_ctx,
nbytesp);
}
+EXPORT_SYMBOL(ath10k_ce_completed_recv_next_nolock);
int ath10k_ce_completed_recv_next(struct ath10k_ce_pipe *ce_state,
void **per_transfer_contextp,
@@ -766,6 +909,7 @@
return ret;
}
+EXPORT_SYMBOL(ath10k_ce_completed_recv_next);
static int _ath10k_ce_revoke_recv_next(struct ath10k_ce_pipe *ce_state,
void **per_transfer_contextp,
@@ -882,6 +1026,7 @@
per_transfer_contextp,
bufferp);
}
+EXPORT_SYMBOL(ath10k_ce_revoke_recv_next);
/*
* Guts of ath10k_ce_completed_send_next.
@@ -915,7 +1060,10 @@
src_ring->hw_index = read_index;
}
- read_index = src_ring->hw_index;
+ if (ar->hw_params.rri_on_ddr)
+ read_index = ath10k_ce_src_ring_read_index_get(ar, ctrl_addr);
+ else
+ read_index = src_ring->hw_index;
if (read_index == sw_index)
return -EIO;
@@ -936,6 +1084,7 @@
return 0;
}
+EXPORT_SYMBOL(ath10k_ce_completed_send_next_nolock);
static void ath10k_ce_extract_desc_data(struct ath10k *ar,
struct ath10k_ce_ring *src_ring,
@@ -1025,6 +1174,7 @@
return ret;
}
+EXPORT_SYMBOL(ath10k_ce_cancel_send_next);
int ath10k_ce_completed_send_next(struct ath10k_ce_pipe *ce_state,
void **per_transfer_contextp)
@@ -1040,6 +1190,7 @@
return ret;
}
+EXPORT_SYMBOL(ath10k_ce_completed_send_next);
/*
* Guts of interrupt handler for per-engine interrupts on a particular CE.
@@ -1078,6 +1229,7 @@
spin_unlock_bh(&ce->ce_lock);
}
+EXPORT_SYMBOL(ath10k_ce_per_engine_service);
/*
* Handler for per-engine interrupts on ALL active CEs.
@@ -1102,6 +1254,7 @@
ath10k_ce_per_engine_service(ar, ce_id);
}
}
+EXPORT_SYMBOL(ath10k_ce_per_engine_service_any);
/*
* Adjust interrupts for the copy complete handler.
@@ -1139,6 +1292,7 @@
return 0;
}
+EXPORT_SYMBOL(ath10k_ce_disable_interrupts);
void ath10k_ce_enable_interrupts(struct ath10k *ar)
{
@@ -1154,6 +1308,7 @@
ath10k_ce_per_engine_handler_adjust(ce_state);
}
}
+EXPORT_SYMBOL(ath10k_ce_enable_interrupts);
static int ath10k_ce_init_src_ring(struct ath10k *ar,
unsigned int ce_id,
@@ -1234,6 +1389,22 @@
return 0;
}
+static int ath10k_ce_alloc_shadow_base(struct ath10k *ar,
+ struct ath10k_ce_ring *src_ring,
+ u32 nentries)
+{
+ src_ring->shadow_base_unaligned = kcalloc(nentries,
+ sizeof(struct ce_desc),
+ GFP_KERNEL);
+ if (!src_ring->shadow_base_unaligned)
+ return -ENOMEM;
+
+ src_ring->shadow_base = (struct ce_desc *)
+ PTR_ALIGN(src_ring->shadow_base_unaligned,
+ CE_DESC_RING_ALIGN);
+ return 0;
+}
+
static struct ath10k_ce_ring *
ath10k_ce_alloc_src_ring(struct ath10k *ar, unsigned int ce_id,
const struct ce_attr *attr)
@@ -1241,6 +1412,7 @@
struct ath10k_ce_ring *src_ring;
u32 nentries = attr->src_nentries;
dma_addr_t base_addr;
+ int ret;
nentries = roundup_pow_of_two(nentries);
@@ -1277,6 +1449,19 @@
ALIGN(src_ring->base_addr_ce_space_unaligned,
CE_DESC_RING_ALIGN);
+ if (ar->hw_params.shadow_reg_support) {
+ ret = ath10k_ce_alloc_shadow_base(ar, src_ring, nentries);
+ if (ret) {
+ dma_free_coherent(ar->dev,
+ (nentries * sizeof(struct ce_desc) +
+ CE_DESC_RING_ALIGN),
+ src_ring->base_addr_owner_space_unaligned,
+ base_addr);
+ kfree(src_ring);
+ return ERR_PTR(ret);
+ }
+ }
+
return src_ring;
}
@@ -1287,6 +1472,7 @@
struct ath10k_ce_ring *src_ring;
u32 nentries = attr->src_nentries;
dma_addr_t base_addr;
+ int ret;
nentries = roundup_pow_of_two(nentries);
@@ -1322,6 +1508,19 @@
ALIGN(src_ring->base_addr_ce_space_unaligned,
CE_DESC_RING_ALIGN);
+ if (ar->hw_params.shadow_reg_support) {
+ ret = ath10k_ce_alloc_shadow_base(ar, src_ring, nentries);
+ if (ret) {
+ dma_free_coherent(ar->dev,
+ (nentries * sizeof(struct ce_desc) +
+ CE_DESC_RING_ALIGN),
+ src_ring->base_addr_owner_space_unaligned,
+ base_addr);
+ kfree(src_ring);
+ return ERR_PTR(ret);
+ }
+ }
+
return src_ring;
}
@@ -1454,6 +1653,7 @@
return 0;
}
+EXPORT_SYMBOL(ath10k_ce_init_pipe);
static void ath10k_ce_deinit_src_ring(struct ath10k *ar, unsigned int ce_id)
{
@@ -1479,6 +1679,7 @@
ath10k_ce_deinit_src_ring(ar, ce_id);
ath10k_ce_deinit_dest_ring(ar, ce_id);
}
+EXPORT_SYMBOL(ath10k_ce_deinit_pipe);
static void _ath10k_ce_free_pipe(struct ath10k *ar, int ce_id)
{
@@ -1486,6 +1687,8 @@
struct ath10k_ce_pipe *ce_state = &ce->ce_states[ce_id];
if (ce_state->src_ring) {
+ if (ar->hw_params.shadow_reg_support)
+ kfree(ce_state->src_ring->shadow_base_unaligned);
dma_free_coherent(ar->dev,
(ce_state->src_ring->nentries *
sizeof(struct ce_desc) +
@@ -1515,6 +1718,8 @@
struct ath10k_ce_pipe *ce_state = &ce->ce_states[ce_id];
if (ce_state->src_ring) {
+ if (ar->hw_params.shadow_reg_support)
+ kfree(ce_state->src_ring->shadow_base_unaligned);
dma_free_coherent(ar->dev,
(ce_state->src_ring->nentries *
sizeof(struct ce_desc_64) +
@@ -1545,6 +1750,7 @@
ce_state->ops->ce_free_pipe(ar, ce_id);
}
+EXPORT_SYMBOL(ath10k_ce_free_pipe);
void ath10k_ce_dump_registers(struct ath10k *ar,
struct ath10k_fw_crash_data *crash_data)
@@ -1584,6 +1790,7 @@
spin_unlock_bh(&ce->ce_lock);
}
+EXPORT_SYMBOL(ath10k_ce_dump_registers);
static const struct ath10k_ce_ops ce_ops = {
.ce_alloc_src_ring = ath10k_ce_alloc_src_ring,
@@ -1680,3 +1887,47 @@
return 0;
}
+EXPORT_SYMBOL(ath10k_ce_alloc_pipe);
+
+void ath10k_ce_alloc_rri(struct ath10k *ar)
+{
+ int i;
+ u32 value;
+ u32 ctrl1_regs;
+ u32 ce_base_addr;
+ struct ath10k_ce *ce = ath10k_ce_priv(ar);
+
+ ce->vaddr_rri = dma_alloc_coherent(ar->dev,
+ (CE_COUNT * sizeof(u32)),
+ &ce->paddr_rri, GFP_KERNEL);
+
+ if (!ce->vaddr_rri)
+ return;
+
+ ath10k_ce_write32(ar, ar->hw_ce_regs->ce_rri_low,
+ lower_32_bits(ce->paddr_rri));
+ ath10k_ce_write32(ar, ar->hw_ce_regs->ce_rri_high,
+ (upper_32_bits(ce->paddr_rri) &
+ CE_DESC_FLAGS_GET_MASK));
+
+ for (i = 0; i < CE_COUNT; i++) {
+ ctrl1_regs = ar->hw_ce_regs->ctrl1_regs->addr;
+ ce_base_addr = ath10k_ce_base_address(ar, i);
+ value = ath10k_ce_read32(ar, ce_base_addr + ctrl1_regs);
+ value |= ar->hw_ce_regs->upd->mask;
+ ath10k_ce_write32(ar, ce_base_addr + ctrl1_regs, value);
+ }
+
+ memset(ce->vaddr_rri, 0, CE_COUNT * sizeof(u32));
+}
+EXPORT_SYMBOL(ath10k_ce_alloc_rri);
+
+void ath10k_ce_free_rri(struct ath10k *ar)
+{
+ struct ath10k_ce *ce = ath10k_ce_priv(ar);
+
+ dma_free_coherent(ar->dev, (CE_COUNT * sizeof(u32)),
+ ce->vaddr_rri,
+ ce->paddr_rri);
+}
+EXPORT_SYMBOL(ath10k_ce_free_rri);
diff --git a/drivers/net/wireless/ath/ath10k/ce.h b/drivers/net/wireless/ath/ath10k/ce.h
index 2c3c8f5..dbeffae 100644
--- a/drivers/net/wireless/ath/ath10k/ce.h
+++ b/drivers/net/wireless/ath/ath10k/ce.h
@@ -1,6 +1,7 @@
/*
* Copyright (c) 2005-2011 Atheros Communications Inc.
* Copyright (c) 2011-2017 Qualcomm Atheros, Inc.
+ * Copyright (c) 2018 The Linux Foundation. All rights reserved.
*
* Permission to use, copy, modify, and/or distribute this software for any
* purpose with or without fee is hereby granted, provided that the above
@@ -48,6 +49,9 @@
#define CE_DESC_FLAGS_META_DATA_MASK ar->hw_values->ce_desc_meta_data_mask
#define CE_DESC_FLAGS_META_DATA_LSB ar->hw_values->ce_desc_meta_data_lsb
+#define CE_DDR_RRI_MASK GENMASK(15, 0)
+#define CE_DDR_DRRI_SHIFT 16
+
struct ce_desc {
__le32 addr;
__le16 nbytes;
@@ -113,6 +117,9 @@
/* CE address space */
u32 base_addr_ce_space;
+ char *shadow_base_unaligned;
+ struct ce_desc *shadow_base;
+
/* keep last */
void *per_transfer_context[0];
};
@@ -153,6 +160,8 @@
spinlock_t ce_lock;
const struct ath10k_bus_ops *bus_ops;
struct ath10k_ce_pipe ce_states[CE_COUNT_MAX];
+ u32 *vaddr_rri;
+ dma_addr_t paddr_rri;
};
/*==================Send====================*/
@@ -261,6 +270,8 @@
void ath10k_ce_enable_interrupts(struct ath10k *ar);
void ath10k_ce_dump_registers(struct ath10k *ar,
struct ath10k_fw_crash_data *crash_data);
+void ath10k_ce_alloc_rri(struct ath10k *ar);
+void ath10k_ce_free_rri(struct ath10k *ar);
/* ce_attr.flags values */
/* Use NonSnooping PCIe accesses? */
@@ -327,6 +338,9 @@
return CE0_BASE_ADDRESS + (CE1_BASE_ADDRESS - CE0_BASE_ADDRESS) * ce_id;
}
+#define COPY_ENGINE_ID(COPY_ENGINE_BASE_ADDRESS) (((COPY_ENGINE_BASE_ADDRESS) \
+ - CE0_BASE_ADDRESS) / (CE1_BASE_ADDRESS - CE0_BASE_ADDRESS))
+
#define CE_SRC_RING_TO_DESC(baddr, idx) \
(&(((struct ce_desc *)baddr)[idx]))
@@ -355,14 +369,18 @@
(((x) & CE_WRAPPER_INTERRUPT_SUMMARY_HOST_MSI_MASK) >> \
CE_WRAPPER_INTERRUPT_SUMMARY_HOST_MSI_LSB)
#define CE_WRAPPER_INTERRUPT_SUMMARY_ADDRESS 0x0000
+#define CE_INTERRUPT_SUMMARY (GENMASK(CE_COUNT_MAX - 1, 0))
static inline u32 ath10k_ce_interrupt_summary(struct ath10k *ar)
{
struct ath10k_ce *ce = ath10k_ce_priv(ar);
- return CE_WRAPPER_INTERRUPT_SUMMARY_HOST_MSI_GET(
- ce->bus_ops->read32((ar), CE_WRAPPER_BASE_ADDRESS +
- CE_WRAPPER_INTERRUPT_SUMMARY_ADDRESS));
+ if (!ar->hw_params.per_ce_irq)
+ return CE_WRAPPER_INTERRUPT_SUMMARY_HOST_MSI_GET(
+ ce->bus_ops->read32((ar), CE_WRAPPER_BASE_ADDRESS +
+ CE_WRAPPER_INTERRUPT_SUMMARY_ADDRESS));
+ else
+ return CE_INTERRUPT_SUMMARY;
}
#endif /* _CE_H_ */
diff --git a/drivers/net/wireless/ath/ath10k/core.c b/drivers/net/wireless/ath/ath10k/core.c
index 8a3020d..4cf54a7 100644
--- a/drivers/net/wireless/ath/ath10k/core.c
+++ b/drivers/net/wireless/ath/ath10k/core.c
@@ -90,6 +90,8 @@
.num_wds_entries = 0x20,
.target_64bit = false,
.rx_ring_fill_level = HTT_RX_RING_FILL_LEVEL,
+ .shadow_reg_support = false,
+ .rri_on_ddr = false,
},
{
.id = QCA988X_HW_2_0_VERSION,
@@ -119,6 +121,9 @@
.num_wds_entries = 0x20,
.target_64bit = false,
.rx_ring_fill_level = HTT_RX_RING_FILL_LEVEL,
+ .per_ce_irq = false,
+ .shadow_reg_support = false,
+ .rri_on_ddr = false,
},
{
.id = QCA9887_HW_1_0_VERSION,
@@ -148,6 +153,9 @@
.num_wds_entries = 0x20,
.target_64bit = false,
.rx_ring_fill_level = HTT_RX_RING_FILL_LEVEL,
+ .per_ce_irq = false,
+ .shadow_reg_support = false,
+ .rri_on_ddr = false,
},
{
.id = QCA6174_HW_2_1_VERSION,
@@ -176,6 +184,9 @@
.num_wds_entries = 0x20,
.target_64bit = false,
.rx_ring_fill_level = HTT_RX_RING_FILL_LEVEL,
+ .per_ce_irq = false,
+ .shadow_reg_support = false,
+ .rri_on_ddr = false,
},
{
.id = QCA6174_HW_2_1_VERSION,
@@ -204,6 +215,9 @@
.num_wds_entries = 0x20,
.target_64bit = false,
.rx_ring_fill_level = HTT_RX_RING_FILL_LEVEL,
+ .per_ce_irq = false,
+ .shadow_reg_support = false,
+ .rri_on_ddr = false,
},
{
.id = QCA6174_HW_3_0_VERSION,
@@ -232,6 +246,9 @@
.num_wds_entries = 0x20,
.target_64bit = false,
.rx_ring_fill_level = HTT_RX_RING_FILL_LEVEL,
+ .per_ce_irq = false,
+ .shadow_reg_support = false,
+ .rri_on_ddr = false,
},
{
.id = QCA6174_HW_3_2_VERSION,
@@ -263,6 +280,9 @@
.num_wds_entries = 0x20,
.target_64bit = false,
.rx_ring_fill_level = HTT_RX_RING_FILL_LEVEL,
+ .per_ce_irq = false,
+ .shadow_reg_support = false,
+ .rri_on_ddr = false,
},
{
.id = QCA99X0_HW_2_0_DEV_VERSION,
@@ -297,6 +317,9 @@
.num_wds_entries = 0x20,
.target_64bit = false,
.rx_ring_fill_level = HTT_RX_RING_FILL_LEVEL,
+ .per_ce_irq = false,
+ .shadow_reg_support = false,
+ .rri_on_ddr = false,
},
{
.id = QCA9984_HW_1_0_DEV_VERSION,
@@ -336,6 +359,9 @@
.num_wds_entries = 0x20,
.target_64bit = false,
.rx_ring_fill_level = HTT_RX_RING_FILL_LEVEL,
+ .per_ce_irq = false,
+ .shadow_reg_support = false,
+ .rri_on_ddr = false,
},
{
.id = QCA9888_HW_2_0_DEV_VERSION,
@@ -374,6 +400,9 @@
.num_wds_entries = 0x20,
.target_64bit = false,
.rx_ring_fill_level = HTT_RX_RING_FILL_LEVEL,
+ .per_ce_irq = false,
+ .shadow_reg_support = false,
+ .rri_on_ddr = false,
},
{
.id = QCA9377_HW_1_0_DEV_VERSION,
@@ -402,6 +431,9 @@
.num_wds_entries = 0x20,
.target_64bit = false,
.rx_ring_fill_level = HTT_RX_RING_FILL_LEVEL,
+ .per_ce_irq = false,
+ .shadow_reg_support = false,
+ .rri_on_ddr = false,
},
{
.id = QCA9377_HW_1_1_DEV_VERSION,
@@ -432,6 +464,9 @@
.num_wds_entries = 0x20,
.target_64bit = false,
.rx_ring_fill_level = HTT_RX_RING_FILL_LEVEL,
+ .per_ce_irq = false,
+ .shadow_reg_support = false,
+ .rri_on_ddr = false,
},
{
.id = QCA4019_HW_1_0_DEV_VERSION,
@@ -467,6 +502,9 @@
.num_wds_entries = 0x20,
.target_64bit = false,
.rx_ring_fill_level = HTT_RX_RING_FILL_LEVEL,
+ .per_ce_irq = false,
+ .shadow_reg_support = false,
+ .rri_on_ddr = false,
},
{
.id = WCN3990_HW_1_0_DEV_VERSION,
@@ -487,6 +525,9 @@
.num_wds_entries = TARGET_HL_10_TLV_NUM_WDS_ENTRIES,
.target_64bit = true,
.rx_ring_fill_level = HTT_RX_RING_FILL_LEVEL_DUAL_MAC,
+ .per_ce_irq = true,
+ .shadow_reg_support = true,
+ .rri_on_ddr = true,
},
};
@@ -1253,14 +1294,61 @@
return ret;
}
+static int ath10k_core_search_bd(struct ath10k *ar,
+ const char *boardname,
+ const u8 *data,
+ size_t len)
+{
+ size_t ie_len;
+ struct ath10k_fw_ie *hdr;
+ int ret = -ENOENT, ie_id;
+
+ while (len > sizeof(struct ath10k_fw_ie)) {
+ hdr = (struct ath10k_fw_ie *)data;
+ ie_id = le32_to_cpu(hdr->id);
+ ie_len = le32_to_cpu(hdr->len);
+
+ len -= sizeof(*hdr);
+ data = hdr->data;
+
+ if (len < ALIGN(ie_len, 4)) {
+ ath10k_err(ar, "invalid length for board ie_id %d ie_len %zu len %zu\n",
+ ie_id, ie_len, len);
+ return -EINVAL;
+ }
+
+ switch (ie_id) {
+ case ATH10K_BD_IE_BOARD:
+ ret = ath10k_core_parse_bd_ie_board(ar, data, ie_len,
+ boardname);
+ if (ret == -ENOENT)
+ /* no match found, continue */
+ break;
+
+ /* either found or error, so stop searching */
+ goto out;
+ }
+
+ /* jump over the padding */
+ ie_len = ALIGN(ie_len, 4);
+
+ len -= ie_len;
+ data += ie_len;
+ }
+
+out:
+ /* return result of parse_bd_ie_board() or -ENOENT */
+ return ret;
+}
+
static int ath10k_core_fetch_board_data_api_n(struct ath10k *ar,
const char *boardname,
+ const char *fallback_boardname,
const char *filename)
{
- size_t len, magic_len, ie_len;
- struct ath10k_fw_ie *hdr;
+ size_t len, magic_len;
const u8 *data;
- int ret, ie_id;
+ int ret;
ar->normal_mode_fw.board = ath10k_fetch_fw_file(ar,
ar->hw_params.fw.dir,
@@ -1298,69 +1386,23 @@
data += magic_len;
len -= magic_len;
- while (len > sizeof(struct ath10k_fw_ie)) {
- hdr = (struct ath10k_fw_ie *)data;
- ie_id = le32_to_cpu(hdr->id);
- ie_len = le32_to_cpu(hdr->len);
+ /* attempt to find boardname in the IE list */
+ ret = ath10k_core_search_bd(ar, boardname, data, len);
- len -= sizeof(*hdr);
- data = hdr->data;
+ /* if we didn't find it and have a fallback name, try that */
+ if (ret == -ENOENT && fallback_boardname)
+ ret = ath10k_core_search_bd(ar, fallback_boardname, data, len);
- if (len < ALIGN(ie_len, 4)) {
- ath10k_err(ar, "invalid length for board ie_id %d ie_len %zu len %zu\n",
- ie_id, ie_len, len);
- ret = -EINVAL;
- goto err;
- }
-
- switch (ie_id) {
- case ATH10K_BD_IE_BOARD:
- ret = ath10k_core_parse_bd_ie_board(ar, data, ie_len,
- boardname);
- if (ret == -ENOENT && ar->id.bdf_ext[0] != '\0') {
- /* try default bdf if variant was not found */
- char *s, *v = ",variant=";
- char boardname2[100];
-
- strlcpy(boardname2, boardname,
- sizeof(boardname2));
-
- s = strstr(boardname2, v);
- if (s)
- *s = '\0'; /* strip ",variant=%s" */
-
- ret = ath10k_core_parse_bd_ie_board(ar, data,
- ie_len,
- boardname2);
- }
-
- if (ret == -ENOENT)
- /* no match found, continue */
- break;
- else if (ret)
- /* there was an error, bail out */
- goto err;
-
- /* board data found */
- goto out;
- }
-
- /* jump over the padding */
- ie_len = ALIGN(ie_len, 4);
-
- len -= ie_len;
- data += ie_len;
- }
-
-out:
- if (!ar->normal_mode_fw.board_data || !ar->normal_mode_fw.board_len) {
+ if (ret == -ENOENT) {
ath10k_err(ar,
"failed to fetch board data for %s from %s/%s\n",
boardname, ar->hw_params.fw.dir, filename);
ret = -ENODATA;
- goto err;
}
+ if (ret)
+ goto err;
+
return 0;
err:
@@ -1369,12 +1411,12 @@
}
static int ath10k_core_create_board_name(struct ath10k *ar, char *name,
- size_t name_len)
+ size_t name_len, bool with_variant)
{
/* strlen(',variant=') + strlen(ar->id.bdf_ext) */
char variant[9 + ATH10K_SMBIOS_BDF_EXT_STR_LENGTH] = { 0 };
- if (ar->id.bdf_ext[0] != '\0')
+ if (with_variant && ar->id.bdf_ext[0] != '\0')
scnprintf(variant, sizeof(variant), ",variant=%s",
ar->id.bdf_ext);
@@ -1400,17 +1442,26 @@
static int ath10k_core_fetch_board_file(struct ath10k *ar)
{
- char boardname[100];
+ char boardname[100], fallback_boardname[100];
int ret;
- ret = ath10k_core_create_board_name(ar, boardname, sizeof(boardname));
+ ret = ath10k_core_create_board_name(ar, boardname,
+ sizeof(boardname), true);
if (ret) {
ath10k_err(ar, "failed to create board name: %d", ret);
return ret;
}
+ ret = ath10k_core_create_board_name(ar, fallback_boardname,
+ sizeof(boardname), false);
+ if (ret) {
+ ath10k_err(ar, "failed to create fallback board name: %d", ret);
+ return ret;
+ }
+
ar->bd_api = 2;
ret = ath10k_core_fetch_board_data_api_n(ar, boardname,
+ fallback_boardname,
ATH10K_BOARD_API2_FILE);
if (!ret)
goto success;
@@ -2472,6 +2523,14 @@
ar->hw->wiphy->hw_version = target_info.version;
break;
case ATH10K_BUS_SNOC:
+ memset(&target_info, 0, sizeof(target_info));
+ ret = ath10k_hif_get_target_info(ar, &target_info);
+ if (ret) {
+ ath10k_err(ar, "could not get target info (%d)\n", ret);
+ goto err_power_down;
+ }
+ ar->target_version = target_info.version;
+ ar->hw->wiphy->hw_version = target_info.version;
break;
default:
ath10k_err(ar, "incorrect hif bus type: %d\n", ar->hif.bus);
diff --git a/drivers/net/wireless/ath/ath10k/core.h b/drivers/net/wireless/ath/ath10k/core.h
index c17d805..e4ac8f2 100644
--- a/drivers/net/wireless/ath/ath10k/core.h
+++ b/drivers/net/wireless/ath/ath10k/core.h
@@ -52,6 +52,8 @@
/* Antenna noise floor */
#define ATH10K_DEFAULT_NOISE_FLOOR -95
+#define ATH10K_INVALID_RSSI 128
+
#define ATH10K_MAX_NUM_MGMT_PENDING 128
/* number of failed packets (20 packets with 16 sw reties each) */
diff --git a/drivers/net/wireless/ath/ath10k/hif.h b/drivers/net/wireless/ath/ath10k/hif.h
index 6da4e33..1a59ea0 100644
--- a/drivers/net/wireless/ath/ath10k/hif.h
+++ b/drivers/net/wireless/ath/ath10k/hif.h
@@ -20,13 +20,14 @@
#include <linux/kernel.h>
#include "core.h"
+#include "bmi.h"
#include "debug.h"
struct ath10k_hif_sg_item {
u16 transfer_id;
void *transfer_context; /* NULL = tx completion callback not called */
void *vaddr; /* for debugging mostly */
- u32 paddr;
+ dma_addr_t paddr;
u16 len;
};
@@ -93,6 +94,9 @@
/* fetch calibration data from target eeprom */
int (*fetch_cal_eeprom)(struct ath10k *ar, void **data,
size_t *data_len);
+
+ int (*get_target_info)(struct ath10k *ar,
+ struct bmi_target_info *target_info);
};
static inline int ath10k_hif_tx_sg(struct ath10k *ar, u8 pipe_id,
@@ -218,4 +222,13 @@
return ar->hif.ops->fetch_cal_eeprom(ar, data, data_len);
}
+static inline int ath10k_hif_get_target_info(struct ath10k *ar,
+ struct bmi_target_info *tgt_info)
+{
+ if (!ar->hif.ops->get_target_info)
+ return -EOPNOTSUPP;
+
+ return ar->hif.ops->get_target_info(ar, tgt_info);
+}
+
#endif /* _HIF_H_ */
diff --git a/drivers/net/wireless/ath/ath10k/htc.c b/drivers/net/wireless/ath/ath10k/htc.c
index 492dc5b..8902720 100644
--- a/drivers/net/wireless/ath/ath10k/htc.c
+++ b/drivers/net/wireless/ath/ath10k/htc.c
@@ -542,8 +542,14 @@
return "NMI Data";
case ATH10K_HTC_SVC_ID_HTT_DATA_MSG:
return "HTT Data";
+ case ATH10K_HTC_SVC_ID_HTT_DATA2_MSG:
+ return "HTT Data";
+ case ATH10K_HTC_SVC_ID_HTT_DATA3_MSG:
+ return "HTT Data";
case ATH10K_HTC_SVC_ID_TEST_RAW_STREAMS:
return "RAW";
+ case ATH10K_HTC_SVC_ID_HTT_LOG_MSG:
+ return "PKTLOG";
}
return "Unknown";
diff --git a/drivers/net/wireless/ath/ath10k/htc.h b/drivers/net/wireless/ath/ath10k/htc.h
index a2f8814..3487759 100644
--- a/drivers/net/wireless/ath/ath10k/htc.h
+++ b/drivers/net/wireless/ath/ath10k/htc.h
@@ -248,6 +248,7 @@
ATH10K_HTC_SVC_GRP_WMI = 1,
ATH10K_HTC_SVC_GRP_NMI = 2,
ATH10K_HTC_SVC_GRP_HTT = 3,
+ ATH10K_LOG_SERVICE_GROUP = 6,
ATH10K_HTC_SVC_GRP_TEST = 254,
ATH10K_HTC_SVC_GRP_LAST = 255,
@@ -273,6 +274,9 @@
ATH10K_HTC_SVC_ID_HTT_DATA_MSG = SVC(ATH10K_HTC_SVC_GRP_HTT, 0),
+ ATH10K_HTC_SVC_ID_HTT_DATA2_MSG = SVC(ATH10K_HTC_SVC_GRP_HTT, 1),
+ ATH10K_HTC_SVC_ID_HTT_DATA3_MSG = SVC(ATH10K_HTC_SVC_GRP_HTT, 2),
+ ATH10K_HTC_SVC_ID_HTT_LOG_MSG = SVC(ATH10K_LOG_SERVICE_GROUP, 0),
/* raw stream service (i.e. flash, tcmd, calibration apps) */
ATH10K_HTC_SVC_ID_TEST_RAW_STREAMS = SVC(ATH10K_HTC_SVC_GRP_TEST, 0),
};
diff --git a/drivers/net/wireless/ath/ath10k/htt.c b/drivers/net/wireless/ath/ath10k/htt.c
index 625198d..21a67f8 100644
--- a/drivers/net/wireless/ath/ath10k/htt.c
+++ b/drivers/net/wireless/ath/ath10k/htt.c
@@ -257,11 +257,11 @@
return status;
}
- status = htt->tx_ops->htt_send_frag_desc_bank_cfg(htt);
+ status = ath10k_htt_send_frag_desc_bank_cfg(htt);
if (status)
return status;
- status = htt->tx_ops->htt_send_rx_ring_cfg(htt);
+ status = ath10k_htt_send_rx_ring_cfg(htt);
if (status) {
ath10k_warn(ar, "failed to setup rx ring: %d\n",
status);
diff --git a/drivers/net/wireless/ath/ath10k/htt.h b/drivers/net/wireless/ath/ath10k/htt.h
index 8cc2a8b..5d3ff80 100644
--- a/drivers/net/wireless/ath/ath10k/htt.h
+++ b/drivers/net/wireless/ath/ath10k/htt.h
@@ -1,6 +1,7 @@
/*
* Copyright (c) 2005-2011 Atheros Communications Inc.
* Copyright (c) 2011-2017 Qualcomm Atheros, Inc.
+ * Copyright (c) 2018, The Linux Foundation. All rights reserved.
*
* Permission to use, copy, modify, and/or distribute this software for any
* purpose with or without fee is hereby granted, provided that the above
@@ -127,6 +128,19 @@
| HTT_MSDU_EXT_DESC_FLAG_TCP_IPV4_CSUM_ENABLE \
| HTT_MSDU_EXT_DESC_FLAG_TCP_IPV6_CSUM_ENABLE)
+#define HTT_MSDU_EXT_DESC_FLAG_IPV4_CSUM_ENABLE_64 BIT(16)
+#define HTT_MSDU_EXT_DESC_FLAG_UDP_IPV4_CSUM_ENABLE_64 BIT(17)
+#define HTT_MSDU_EXT_DESC_FLAG_UDP_IPV6_CSUM_ENABLE_64 BIT(18)
+#define HTT_MSDU_EXT_DESC_FLAG_TCP_IPV4_CSUM_ENABLE_64 BIT(19)
+#define HTT_MSDU_EXT_DESC_FLAG_TCP_IPV6_CSUM_ENABLE_64 BIT(20)
+#define HTT_MSDU_EXT_DESC_FLAG_PARTIAL_CSUM_ENABLE_64 BIT(21)
+
+#define HTT_MSDU_CHECKSUM_ENABLE_64 (HTT_MSDU_EXT_DESC_FLAG_IPV4_CSUM_ENABLE_64 \
+ | HTT_MSDU_EXT_DESC_FLAG_UDP_IPV4_CSUM_ENABLE_64 \
+ | HTT_MSDU_EXT_DESC_FLAG_UDP_IPV6_CSUM_ENABLE_64 \
+ | HTT_MSDU_EXT_DESC_FLAG_TCP_IPV4_CSUM_ENABLE_64 \
+ | HTT_MSDU_EXT_DESC_FLAG_TCP_IPV6_CSUM_ENABLE_64)
+
enum htt_data_tx_desc_flags0 {
HTT_DATA_TX_DESC_FLAGS0_MAC_HDR_PRESENT = 1 << 0,
HTT_DATA_TX_DESC_FLAGS0_NO_AGGR = 1 << 1,
@@ -533,12 +547,18 @@
u8 rsvd0;
} __packed;
+#define HTT_MGMT_TX_CMPL_FLAG_ACK_RSSI BIT(0)
+
+#define HTT_MGMT_TX_CMPL_INFO_ACK_RSSI_MASK GENMASK(7, 0)
+
struct htt_mgmt_tx_completion {
u8 rsvd0;
u8 rsvd1;
- u8 rsvd2;
+ u8 flags;
__le32 desc_id;
__le32 status;
+ __le32 ppdu_id;
+ __le32 info;
} __packed;
#define HTT_RX_INDICATION_INFO0_EXT_TID_MASK (0x1F)
@@ -1648,6 +1668,7 @@
struct htt_tx_done {
u16 msdu_id;
u16 status;
+ u8 ack_rssi;
};
enum htt_tx_compl_state {
@@ -1848,6 +1869,57 @@
void (*htt_free_txbuff)(struct ath10k_htt *htt);
};
+static inline int ath10k_htt_send_rx_ring_cfg(struct ath10k_htt *htt)
+{
+ if (!htt->tx_ops->htt_send_rx_ring_cfg)
+ return -EOPNOTSUPP;
+
+ return htt->tx_ops->htt_send_rx_ring_cfg(htt);
+}
+
+static inline int ath10k_htt_send_frag_desc_bank_cfg(struct ath10k_htt *htt)
+{
+ if (!htt->tx_ops->htt_send_frag_desc_bank_cfg)
+ return -EOPNOTSUPP;
+
+ return htt->tx_ops->htt_send_frag_desc_bank_cfg(htt);
+}
+
+static inline int ath10k_htt_alloc_frag_desc(struct ath10k_htt *htt)
+{
+ if (!htt->tx_ops->htt_alloc_frag_desc)
+ return -EOPNOTSUPP;
+
+ return htt->tx_ops->htt_alloc_frag_desc(htt);
+}
+
+static inline void ath10k_htt_free_frag_desc(struct ath10k_htt *htt)
+{
+ if (htt->tx_ops->htt_free_frag_desc)
+ htt->tx_ops->htt_free_frag_desc(htt);
+}
+
+static inline int ath10k_htt_tx(struct ath10k_htt *htt,
+ enum ath10k_hw_txrx_mode txmode,
+ struct sk_buff *msdu)
+{
+ return htt->tx_ops->htt_tx(htt, txmode, msdu);
+}
+
+static inline int ath10k_htt_alloc_txbuff(struct ath10k_htt *htt)
+{
+ if (!htt->tx_ops->htt_alloc_txbuff)
+ return -EOPNOTSUPP;
+
+ return htt->tx_ops->htt_alloc_txbuff(htt);
+}
+
+static inline void ath10k_htt_free_txbuff(struct ath10k_htt *htt)
+{
+ if (htt->tx_ops->htt_free_txbuff)
+ htt->tx_ops->htt_free_txbuff(htt);
+}
+
struct ath10k_htt_rx_ops {
size_t (*htt_get_rx_ring_size)(struct ath10k_htt *htt);
void (*htt_config_paddrs_ring)(struct ath10k_htt *htt, void *vaddr);
@@ -1857,6 +1929,43 @@
void (*htt_reset_paddrs_ring)(struct ath10k_htt *htt, int idx);
};
+static inline size_t ath10k_htt_get_rx_ring_size(struct ath10k_htt *htt)
+{
+ if (!htt->rx_ops->htt_get_rx_ring_size)
+ return 0;
+
+ return htt->rx_ops->htt_get_rx_ring_size(htt);
+}
+
+static inline void ath10k_htt_config_paddrs_ring(struct ath10k_htt *htt,
+ void *vaddr)
+{
+ if (htt->rx_ops->htt_config_paddrs_ring)
+ htt->rx_ops->htt_config_paddrs_ring(htt, vaddr);
+}
+
+static inline void ath10k_htt_set_paddrs_ring(struct ath10k_htt *htt,
+ dma_addr_t paddr,
+ int idx)
+{
+ if (htt->rx_ops->htt_set_paddrs_ring)
+ htt->rx_ops->htt_set_paddrs_ring(htt, paddr, idx);
+}
+
+static inline void *ath10k_htt_get_vaddr_ring(struct ath10k_htt *htt)
+{
+ if (!htt->rx_ops->htt_get_vaddr_ring)
+ return NULL;
+
+ return htt->rx_ops->htt_get_vaddr_ring(htt);
+}
+
+static inline void ath10k_htt_reset_paddrs_ring(struct ath10k_htt *htt, int idx)
+{
+ if (htt->rx_ops->htt_reset_paddrs_ring)
+ htt->rx_ops->htt_reset_paddrs_ring(htt, idx);
+}
+
#define RX_HTT_HDR_STATUS_LEN 64
/* This structure layout is programmed via rx ring setup
diff --git a/drivers/net/wireless/ath/ath10k/htt_rx.c b/drivers/net/wireless/ath/ath10k/htt_rx.c
index 5e02e26..bd23f69 100644
--- a/drivers/net/wireless/ath/ath10k/htt_rx.c
+++ b/drivers/net/wireless/ath/ath10k/htt_rx.c
@@ -25,6 +25,7 @@
#include "mac.h"
#include <linux/log2.h>
+#include <linux/bitfield.h>
/* when under memory pressure rx ring refill may fail and needs a retry */
#define HTT_RX_RING_REFILL_RETRY_MS 50
@@ -181,7 +182,7 @@
rxcb = ATH10K_SKB_RXCB(skb);
rxcb->paddr = paddr;
htt->rx_ring.netbufs_ring[idx] = skb;
- htt->rx_ops->htt_set_paddrs_ring(htt, paddr, idx);
+ ath10k_htt_set_paddrs_ring(htt, paddr, idx);
htt->rx_ring.fill_cnt++;
if (htt->rx_ring.in_ord_rx) {
@@ -286,8 +287,8 @@
ath10k_htt_rx_ring_free(htt);
dma_free_coherent(htt->ar->dev,
- htt->rx_ops->htt_get_rx_ring_size(htt),
- htt->rx_ops->htt_get_vaddr_ring(htt),
+ ath10k_htt_get_rx_ring_size(htt),
+ ath10k_htt_get_vaddr_ring(htt),
htt->rx_ring.base_paddr);
dma_free_coherent(htt->ar->dev,
@@ -314,7 +315,7 @@
idx = htt->rx_ring.sw_rd_idx.msdu_payld;
msdu = htt->rx_ring.netbufs_ring[idx];
htt->rx_ring.netbufs_ring[idx] = NULL;
- htt->rx_ops->htt_reset_paddrs_ring(htt, idx);
+ ath10k_htt_reset_paddrs_ring(htt, idx);
idx++;
idx &= htt->rx_ring.size_mask;
@@ -586,13 +587,13 @@
if (!htt->rx_ring.netbufs_ring)
goto err_netbuf;
- size = htt->rx_ops->htt_get_rx_ring_size(htt);
+ size = ath10k_htt_get_rx_ring_size(htt);
vaddr_ring = dma_alloc_coherent(htt->ar->dev, size, &paddr, GFP_KERNEL);
if (!vaddr_ring)
goto err_dma_ring;
- htt->rx_ops->htt_config_paddrs_ring(htt, vaddr_ring);
+ ath10k_htt_config_paddrs_ring(htt, vaddr_ring);
htt->rx_ring.base_paddr = paddr;
vaddr = dma_alloc_coherent(htt->ar->dev,
@@ -626,7 +627,7 @@
err_dma_idx:
dma_free_coherent(htt->ar->dev,
- htt->rx_ops->htt_get_rx_ring_size(htt),
+ ath10k_htt_get_rx_ring_size(htt),
vaddr_ring,
htt->rx_ring.base_paddr);
err_dma_ring:
@@ -2719,12 +2720,21 @@
case HTT_T2H_MSG_TYPE_MGMT_TX_COMPLETION: {
struct htt_tx_done tx_done = {};
int status = __le32_to_cpu(resp->mgmt_tx_completion.status);
+ int info = __le32_to_cpu(resp->mgmt_tx_completion.info);
tx_done.msdu_id = __le32_to_cpu(resp->mgmt_tx_completion.desc_id);
switch (status) {
case HTT_MGMT_TX_STATUS_OK:
tx_done.status = HTT_TX_COMPL_STATE_ACK;
+ if (test_bit(WMI_SERVICE_HTT_MGMT_TX_COMP_VALID_FLAGS,
+ ar->wmi.svc_map) &&
+ (resp->mgmt_tx_completion.flags &
+ HTT_MGMT_TX_CMPL_FLAG_ACK_RSSI)) {
+ tx_done.ack_rssi =
+ FIELD_GET(HTT_MGMT_TX_CMPL_INFO_ACK_RSSI_MASK,
+ info);
+ }
break;
case HTT_MGMT_TX_STATUS_RETRY:
tx_done.status = HTT_TX_COMPL_STATE_NOACK;
diff --git a/drivers/net/wireless/ath/ath10k/htt_tx.c b/drivers/net/wireless/ath/ath10k/htt_tx.c
index d334b7b..5d8b97a 100644
--- a/drivers/net/wireless/ath/ath10k/htt_tx.c
+++ b/drivers/net/wireless/ath/ath10k/htt_tx.c
@@ -443,13 +443,13 @@
struct ath10k *ar = htt->ar;
int ret;
- ret = htt->tx_ops->htt_alloc_txbuff(htt);
+ ret = ath10k_htt_alloc_txbuff(htt);
if (ret) {
ath10k_err(ar, "failed to alloc cont tx buffer: %d\n", ret);
return ret;
}
- ret = htt->tx_ops->htt_alloc_frag_desc(htt);
+ ret = ath10k_htt_alloc_frag_desc(htt);
if (ret) {
ath10k_err(ar, "failed to alloc cont frag desc: %d\n", ret);
goto free_txbuf;
@@ -473,10 +473,10 @@
ath10k_htt_tx_free_txq(htt);
free_frag_desc:
- htt->tx_ops->htt_free_frag_desc(htt);
+ ath10k_htt_free_frag_desc(htt);
free_txbuf:
- htt->tx_ops->htt_free_txbuff(htt);
+ ath10k_htt_free_txbuff(htt);
return ret;
}
@@ -530,9 +530,9 @@
if (!htt->tx_mem_allocated)
return;
- htt->tx_ops->htt_free_txbuff(htt);
+ ath10k_htt_free_txbuff(htt);
ath10k_htt_tx_free_txq(htt);
- htt->tx_ops->htt_free_frag_desc(htt);
+ ath10k_htt_free_frag_desc(htt);
ath10k_htt_tx_free_txdone_fifo(htt);
htt->tx_mem_allocated = false;
}
@@ -1475,8 +1475,11 @@
!test_bit(ATH10K_FLAG_RAW_MODE, &ar->dev_flags)) {
flags1 |= HTT_DATA_TX_DESC_FLAGS1_CKSUM_L3_OFFLOAD;
flags1 |= HTT_DATA_TX_DESC_FLAGS1_CKSUM_L4_OFFLOAD;
- if (ar->hw_params.continuous_frag_desc)
- ext_desc->flags |= HTT_MSDU_CHECKSUM_ENABLE;
+ if (ar->hw_params.continuous_frag_desc) {
+ memset(ext_desc->tso_flag, 0, sizeof(ext_desc->tso_flag));
+ ext_desc->tso_flag[3] |=
+ __cpu_to_le32(HTT_MSDU_CHECKSUM_ENABLE_64);
+ }
}
/* Prevent firmware from sending up tx inspection requests. There's
diff --git a/drivers/net/wireless/ath/ath10k/hw.c b/drivers/net/wireless/ath/ath10k/hw.c
index 497ac33..677535b 100644
--- a/drivers/net/wireless/ath/ath10k/hw.c
+++ b/drivers/net/wireless/ath/ath10k/hw.c
@@ -310,6 +310,12 @@
.wm_high = &wcn3990_dst_wm_high,
};
+static struct ath10k_hw_ce_ctrl1_upd wcn3990_ctrl1_upd = {
+ .shift = 19,
+ .mask = 0x00080000,
+ .enable = 0x00000000,
+};
+
const struct ath10k_hw_ce_regs wcn3990_ce_regs = {
.sr_base_addr = 0x00000000,
.sr_size_addr = 0x00000008,
@@ -320,8 +326,6 @@
.dst_wr_index_addr = 0x00000040,
.current_srri_addr = 0x00000044,
.current_drri_addr = 0x00000048,
- .ddr_addr_for_rri_low = 0x00000004,
- .ddr_addr_for_rri_high = 0x00000008,
.ce_rri_low = 0x0024C004,
.ce_rri_high = 0x0024C008,
.host_ie_addr = 0x0000002c,
@@ -331,6 +335,7 @@
.misc_regs = &wcn3990_misc_reg,
.wm_srcr = &wcn3990_wm_src_ring,
.wm_dstr = &wcn3990_wm_dst_ring,
+ .upd = &wcn3990_ctrl1_upd,
};
const struct ath10k_hw_values wcn3990_values = {
diff --git a/drivers/net/wireless/ath/ath10k/hw.h b/drivers/net/wireless/ath/ath10k/hw.h
index 413b1b4..b8bdabe 100644
--- a/drivers/net/wireless/ath/ath10k/hw.h
+++ b/drivers/net/wireless/ath/ath10k/hw.h
@@ -1,6 +1,7 @@
/*
* Copyright (c) 2005-2011 Atheros Communications Inc.
* Copyright (c) 2011-2017 Qualcomm Atheros, Inc.
+ * Copyright (c) 2018 The Linux Foundation. All rights reserved.
*
* Permission to use, copy, modify, and/or distribute this software for any
* purpose with or without fee is hereby granted, provided that the above
@@ -131,7 +132,7 @@
/* WCN3990 1.0 definitions */
#define WCN3990_HW_1_0_DEV_VERSION ATH10K_HW_WCN3990
-#define WCN3990_HW_1_0_FW_DIR ATH10K_FW_DIR "/WCN3990/hw3.0"
+#define WCN3990_HW_1_0_FW_DIR ATH10K_FW_DIR "/WCN3990/hw1.0"
#define ATH10K_FW_FILE_BASE "firmware"
#define ATH10K_FW_API_MAX 6
@@ -335,6 +336,12 @@
struct ath10k_hw_ce_regs_addr_map *wm_low;
struct ath10k_hw_ce_regs_addr_map *wm_high; };
+struct ath10k_hw_ce_ctrl1_upd {
+ u32 shift;
+ u32 mask;
+ u32 enable;
+};
+
struct ath10k_hw_ce_regs {
u32 sr_base_addr;
u32 sr_size_addr;
@@ -357,7 +364,9 @@
struct ath10k_hw_ce_cmd_halt *cmd_halt;
struct ath10k_hw_ce_host_ie *host_ie;
struct ath10k_hw_ce_dst_src_wm_regs *wm_srcr;
- struct ath10k_hw_ce_dst_src_wm_regs *wm_dstr; };
+ struct ath10k_hw_ce_dst_src_wm_regs *wm_dstr;
+ struct ath10k_hw_ce_ctrl1_upd *upd;
+};
struct ath10k_hw_values {
u32 rtc_state_val_on;
@@ -568,6 +577,15 @@
/* Target rx ring fill level */
u32 rx_ring_fill_level;
+
+ /* target supporting per ce IRQ */
+ bool per_ce_irq;
+
+ /* target supporting shadow register for ce write */
+ bool shadow_reg_support;
+
+ /* target supporting retention restore on ddr */
+ bool rri_on_ddr;
};
struct htt_rx_desc;
diff --git a/drivers/net/wireless/ath/ath10k/mac.c b/drivers/net/wireless/ath/ath10k/mac.c
index bf05a36..3d7119a 100644
--- a/drivers/net/wireless/ath/ath10k/mac.c
+++ b/drivers/net/wireless/ath/ath10k/mac.c
@@ -3598,7 +3598,7 @@
switch (txpath) {
case ATH10K_MAC_TX_HTT:
- ret = htt->tx_ops->htt_tx(htt, txmode, skb);
+ ret = ath10k_htt_tx(htt, txmode, skb);
break;
case ATH10K_MAC_TX_HTT_MGMT:
ret = ath10k_htt_mgmt_tx(htt, skb);
@@ -4679,6 +4679,13 @@
}
}
+ param = ar->wmi.pdev_param->idle_ps_config;
+ ret = ath10k_wmi_pdev_set_param(ar, param, 1);
+ if (ret && ret != -EOPNOTSUPP) {
+ ath10k_warn(ar, "failed to enable idle_ps_config: %d\n", ret);
+ goto err_core_stop;
+ }
+
__ath10k_set_antenna(ar, ar->cfg_tx_chainmask, ar->cfg_rx_chainmask);
/*
@@ -5717,6 +5724,12 @@
arg.scan_ctrl_flags |= WMI_SCAN_FLAG_PASSIVE;
}
+ if (req->flags & NL80211_SCAN_FLAG_RANDOM_ADDR) {
+ arg.scan_ctrl_flags |= WMI_SCAN_ADD_SPOOFED_MAC_IN_PROBE_REQ;
+ ether_addr_copy(arg.mac_addr.addr, req->mac_addr);
+ ether_addr_copy(arg.mac_mask.addr, req->mac_addr_mask);
+ }
+
if (req->n_channels) {
arg.n_channels = req->n_channels;
for (i = 0; i < arg.n_channels; i++)
@@ -8433,6 +8446,17 @@
goto err_dfs_detector_exit;
}
+ if (test_bit(WMI_SERVICE_SPOOF_MAC_SUPPORT, ar->wmi.svc_map)) {
+ ret = ath10k_wmi_scan_prob_req_oui(ar, ar->mac_addr);
+ if (ret) {
+ ath10k_err(ar, "failed to set prob req oui: %i\n", ret);
+ goto err_dfs_detector_exit;
+ }
+
+ ar->hw->wiphy->features |=
+ NL80211_FEATURE_SCAN_RANDOM_MAC_ADDR;
+ }
+
ar->hw->wiphy->cipher_suites = cipher_suites;
/* QCA988x and QCA6174 family chips do not support CCMP-256, GCMP-128
diff --git a/drivers/net/wireless/ath/ath10k/pci.c b/drivers/net/wireless/ath/ath10k/pci.c
index fd1566c..af2cf55 100644
--- a/drivers/net/wireless/ath/ath10k/pci.c
+++ b/drivers/net/wireless/ath/ath10k/pci.c
@@ -1383,8 +1383,8 @@
for (i = 0; i < n_items - 1; i++) {
ath10k_dbg(ar, ATH10K_DBG_PCI,
- "pci tx item %d paddr 0x%08x len %d n_items %d\n",
- i, items[i].paddr, items[i].len, n_items);
+ "pci tx item %d paddr %pad len %d n_items %d\n",
+ i, &items[i].paddr, items[i].len, n_items);
ath10k_dbg_dump(ar, ATH10K_DBG_PCI_DUMP, NULL, "pci tx data: ",
items[i].vaddr, items[i].len);
@@ -1401,8 +1401,8 @@
/* `i` is equal to `n_items -1` after for() */
ath10k_dbg(ar, ATH10K_DBG_PCI,
- "pci tx item %d paddr 0x%08x len %d n_items %d\n",
- i, items[i].paddr, items[i].len, n_items);
+ "pci tx item %d paddr %pad len %d n_items %d\n",
+ i, &items[i].paddr, items[i].len, n_items);
ath10k_dbg_dump(ar, ATH10K_DBG_PCI_DUMP, NULL, "pci tx data: ",
items[i].vaddr, items[i].len);
diff --git a/drivers/net/wireless/ath/ath10k/sdio.c b/drivers/net/wireless/ath/ath10k/sdio.c
index 03a69e5..2d04c54 100644
--- a/drivers/net/wireless/ath/ath10k/sdio.c
+++ b/drivers/net/wireless/ath/ath10k/sdio.c
@@ -1957,25 +1957,25 @@
ar_sdio = ath10k_sdio_priv(ar);
ar_sdio->irq_data.irq_proc_reg =
- kzalloc(sizeof(struct ath10k_sdio_irq_proc_regs),
- GFP_KERNEL);
+ devm_kzalloc(ar->dev, sizeof(struct ath10k_sdio_irq_proc_regs),
+ GFP_KERNEL);
if (!ar_sdio->irq_data.irq_proc_reg) {
ret = -ENOMEM;
goto err_core_destroy;
}
ar_sdio->irq_data.irq_en_reg =
- kzalloc(sizeof(struct ath10k_sdio_irq_enable_regs),
- GFP_KERNEL);
+ devm_kzalloc(ar->dev, sizeof(struct ath10k_sdio_irq_enable_regs),
+ GFP_KERNEL);
if (!ar_sdio->irq_data.irq_en_reg) {
ret = -ENOMEM;
- goto err_free_proc_reg;
+ goto err_core_destroy;
}
- ar_sdio->bmi_buf = kzalloc(BMI_MAX_CMDBUF_SIZE, GFP_KERNEL);
+ ar_sdio->bmi_buf = devm_kzalloc(ar->dev, BMI_MAX_CMDBUF_SIZE, GFP_KERNEL);
if (!ar_sdio->bmi_buf) {
ret = -ENOMEM;
- goto err_free_en_reg;
+ goto err_core_destroy;
}
ar_sdio->func = func;
@@ -1995,7 +1995,7 @@
ar_sdio->workqueue = create_singlethread_workqueue("ath10k_sdio_wq");
if (!ar_sdio->workqueue) {
ret = -ENOMEM;
- goto err_free_bmi_buf;
+ goto err_core_destroy;
}
for (i = 0; i < ATH10K_SDIO_BUS_REQUEST_MAX_NUM; i++)
@@ -2011,7 +2011,7 @@
ret = -ENODEV;
ath10k_err(ar, "unsupported device id %u (0x%x)\n",
dev_id_base, id->device);
- goto err_free_bmi_buf;
+ goto err_core_destroy;
}
ar->id.vendor = id->vendor;
@@ -2040,12 +2040,6 @@
err_free_wq:
destroy_workqueue(ar_sdio->workqueue);
-err_free_bmi_buf:
- kfree(ar_sdio->bmi_buf);
-err_free_en_reg:
- kfree(ar_sdio->irq_data.irq_en_reg);
-err_free_proc_reg:
- kfree(ar_sdio->irq_data.irq_proc_reg);
err_core_destroy:
ath10k_core_destroy(ar);
diff --git a/drivers/net/wireless/ath/ath10k/snoc.c b/drivers/net/wireless/ath/ath10k/snoc.c
new file mode 100644
index 0000000..47a4d2a
--- /dev/null
+++ b/drivers/net/wireless/ath/ath10k/snoc.c
@@ -0,0 +1,1414 @@
+/*
+ * Copyright (c) 2018 The Linux Foundation. All rights reserved.
+ *
+ * Permission to use, copy, modify, and/or distribute this software for any
+ * purpose with or without fee is hereby granted, provided that the above
+ * copyright notice and this permission notice appear in all copies.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
+ * WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
+ * MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
+ * ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
+ * WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
+ * ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
+ * OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
+ */
+
+#include <linux/module.h>
+#include <linux/kernel.h>
+#include "debug.h"
+#include "hif.h"
+#include "htc.h"
+#include "ce.h"
+#include "snoc.h"
+#include <linux/of.h>
+#include <linux/of_device.h>
+#include <linux/platform_device.h>
+#include <linux/regulator/consumer.h>
+#include <linux/clk.h>
+#define WCN3990_CE_ATTR_FLAGS 0
+#define ATH10K_SNOC_RX_POST_RETRY_MS 50
+#define CE_POLL_PIPE 4
+
+static char *const ce_name[] = {
+ "WLAN_CE_0",
+ "WLAN_CE_1",
+ "WLAN_CE_2",
+ "WLAN_CE_3",
+ "WLAN_CE_4",
+ "WLAN_CE_5",
+ "WLAN_CE_6",
+ "WLAN_CE_7",
+ "WLAN_CE_8",
+ "WLAN_CE_9",
+ "WLAN_CE_10",
+ "WLAN_CE_11",
+};
+
+static struct ath10k_wcn3990_vreg_info vreg_cfg[] = {
+ {NULL, "vdd-0.8-cx-mx", 800000, 800000, 0, 0, false},
+ {NULL, "vdd-1.8-xo", 1800000, 1800000, 0, 0, false},
+ {NULL, "vdd-1.3-rfa", 1304000, 1304000, 0, 0, false},
+ {NULL, "vdd-3.3-ch0", 3312000, 3312000, 0, 0, false},
+};
+
+static struct ath10k_wcn3990_clk_info clk_cfg[] = {
+ {NULL, "cxo_ref_clk_pin", 0, false},
+};
+
+static void ath10k_snoc_htc_tx_cb(struct ath10k_ce_pipe *ce_state);
+static void ath10k_snoc_htt_tx_cb(struct ath10k_ce_pipe *ce_state);
+static void ath10k_snoc_htc_rx_cb(struct ath10k_ce_pipe *ce_state);
+static void ath10k_snoc_htt_rx_cb(struct ath10k_ce_pipe *ce_state);
+static void ath10k_snoc_htt_htc_rx_cb(struct ath10k_ce_pipe *ce_state);
+
+static const struct ath10k_snoc_drv_priv drv_priv = {
+ .hw_rev = ATH10K_HW_WCN3990,
+ .dma_mask = DMA_BIT_MASK(37),
+};
+
+static struct ce_attr host_ce_config_wlan[] = {
+ /* CE0: host->target HTC control streams */
+ {
+ .flags = CE_ATTR_FLAGS,
+ .src_nentries = 16,
+ .src_sz_max = 2048,
+ .dest_nentries = 0,
+ .send_cb = ath10k_snoc_htc_tx_cb,
+ },
+
+ /* CE1: target->host HTT + HTC control */
+ {
+ .flags = CE_ATTR_FLAGS,
+ .src_nentries = 0,
+ .src_sz_max = 2048,
+ .dest_nentries = 512,
+ .recv_cb = ath10k_snoc_htt_htc_rx_cb,
+ },
+
+ /* CE2: target->host WMI */
+ {
+ .flags = CE_ATTR_FLAGS,
+ .src_nentries = 0,
+ .src_sz_max = 2048,
+ .dest_nentries = 64,
+ .recv_cb = ath10k_snoc_htc_rx_cb,
+ },
+
+ /* CE3: host->target WMI */
+ {
+ .flags = CE_ATTR_FLAGS,
+ .src_nentries = 32,
+ .src_sz_max = 2048,
+ .dest_nentries = 0,
+ .send_cb = ath10k_snoc_htc_tx_cb,
+ },
+
+ /* CE4: host->target HTT */
+ {
+ .flags = CE_ATTR_FLAGS | CE_ATTR_DIS_INTR,
+ .src_nentries = 256,
+ .src_sz_max = 256,
+ .dest_nentries = 0,
+ .send_cb = ath10k_snoc_htt_tx_cb,
+ },
+
+ /* CE5: target->host HTT (ipa_uc->target ) */
+ {
+ .flags = CE_ATTR_FLAGS,
+ .src_nentries = 0,
+ .src_sz_max = 512,
+ .dest_nentries = 512,
+ .recv_cb = ath10k_snoc_htt_rx_cb,
+ },
+
+ /* CE6: target autonomous hif_memcpy */
+ {
+ .flags = CE_ATTR_FLAGS,
+ .src_nentries = 0,
+ .src_sz_max = 0,
+ .dest_nentries = 0,
+ },
+
+ /* CE7: ce_diag, the Diagnostic Window */
+ {
+ .flags = CE_ATTR_FLAGS,
+ .src_nentries = 2,
+ .src_sz_max = 2048,
+ .dest_nentries = 2,
+ },
+
+ /* CE8: Target to uMC */
+ {
+ .flags = CE_ATTR_FLAGS,
+ .src_nentries = 0,
+ .src_sz_max = 2048,
+ .dest_nentries = 128,
+ },
+
+ /* CE9 target->host HTT */
+ {
+ .flags = CE_ATTR_FLAGS,
+ .src_nentries = 0,
+ .src_sz_max = 2048,
+ .dest_nentries = 512,
+ .recv_cb = ath10k_snoc_htt_htc_rx_cb,
+ },
+
+ /* CE10: target->host HTT */
+ {
+ .flags = CE_ATTR_FLAGS,
+ .src_nentries = 0,
+ .src_sz_max = 2048,
+ .dest_nentries = 512,
+ .recv_cb = ath10k_snoc_htt_htc_rx_cb,
+ },
+
+ /* CE11: target -> host PKTLOG */
+ {
+ .flags = CE_ATTR_FLAGS,
+ .src_nentries = 0,
+ .src_sz_max = 2048,
+ .dest_nentries = 512,
+ .recv_cb = ath10k_snoc_htt_htc_rx_cb,
+ },
+};
+
+static struct service_to_pipe target_service_to_ce_map_wlan[] = {
+ {
+ __cpu_to_le32(ATH10K_HTC_SVC_ID_WMI_DATA_VO),
+ __cpu_to_le32(PIPEDIR_OUT), /* out = UL = host -> target */
+ __cpu_to_le32(3),
+ },
+ {
+ __cpu_to_le32(ATH10K_HTC_SVC_ID_WMI_DATA_VO),
+ __cpu_to_le32(PIPEDIR_IN), /* in = DL = target -> host */
+ __cpu_to_le32(2),
+ },
+ {
+ __cpu_to_le32(ATH10K_HTC_SVC_ID_WMI_DATA_BK),
+ __cpu_to_le32(PIPEDIR_OUT), /* out = UL = host -> target */
+ __cpu_to_le32(3),
+ },
+ {
+ __cpu_to_le32(ATH10K_HTC_SVC_ID_WMI_DATA_BK),
+ __cpu_to_le32(PIPEDIR_IN), /* in = DL = target -> host */
+ __cpu_to_le32(2),
+ },
+ {
+ __cpu_to_le32(ATH10K_HTC_SVC_ID_WMI_DATA_BE),
+ __cpu_to_le32(PIPEDIR_OUT), /* out = UL = host -> target */
+ __cpu_to_le32(3),
+ },
+ {
+ __cpu_to_le32(ATH10K_HTC_SVC_ID_WMI_DATA_BE),
+ __cpu_to_le32(PIPEDIR_IN), /* in = DL = target -> host */
+ __cpu_to_le32(2),
+ },
+ {
+ __cpu_to_le32(ATH10K_HTC_SVC_ID_WMI_DATA_VI),
+ __cpu_to_le32(PIPEDIR_OUT), /* out = UL = host -> target */
+ __cpu_to_le32(3),
+ },
+ {
+ __cpu_to_le32(ATH10K_HTC_SVC_ID_WMI_DATA_VI),
+ __cpu_to_le32(PIPEDIR_IN), /* in = DL = target -> host */
+ __cpu_to_le32(2),
+ },
+ {
+ __cpu_to_le32(ATH10K_HTC_SVC_ID_WMI_CONTROL),
+ __cpu_to_le32(PIPEDIR_OUT), /* out = UL = host -> target */
+ __cpu_to_le32(3),
+ },
+ {
+ __cpu_to_le32(ATH10K_HTC_SVC_ID_WMI_CONTROL),
+ __cpu_to_le32(PIPEDIR_IN), /* in = DL = target -> host */
+ __cpu_to_le32(2),
+ },
+ {
+ __cpu_to_le32(ATH10K_HTC_SVC_ID_RSVD_CTRL),
+ __cpu_to_le32(PIPEDIR_OUT), /* out = UL = host -> target */
+ __cpu_to_le32(0),
+ },
+ {
+ __cpu_to_le32(ATH10K_HTC_SVC_ID_RSVD_CTRL),
+ __cpu_to_le32(PIPEDIR_IN), /* in = DL = target -> host */
+ __cpu_to_le32(2),
+ },
+ { /* not used */
+ __cpu_to_le32(ATH10K_HTC_SVC_ID_TEST_RAW_STREAMS),
+ __cpu_to_le32(PIPEDIR_OUT), /* out = UL = host -> target */
+ __cpu_to_le32(0),
+ },
+ { /* not used */
+ __cpu_to_le32(ATH10K_HTC_SVC_ID_TEST_RAW_STREAMS),
+ __cpu_to_le32(PIPEDIR_IN), /* in = DL = target -> host */
+ __cpu_to_le32(2),
+ },
+ {
+ __cpu_to_le32(ATH10K_HTC_SVC_ID_HTT_DATA_MSG),
+ __cpu_to_le32(PIPEDIR_OUT), /* out = UL = host -> target */
+ __cpu_to_le32(4),
+ },
+ {
+ __cpu_to_le32(ATH10K_HTC_SVC_ID_HTT_DATA_MSG),
+ __cpu_to_le32(PIPEDIR_IN), /* in = DL = target -> host */
+ __cpu_to_le32(1),
+ },
+ { /* not used */
+ __cpu_to_le32(ATH10K_HTC_SVC_ID_TEST_RAW_STREAMS),
+ __cpu_to_le32(PIPEDIR_OUT),
+ __cpu_to_le32(5),
+ },
+ { /* in = DL = target -> host */
+ __cpu_to_le32(ATH10K_HTC_SVC_ID_HTT_DATA2_MSG),
+ __cpu_to_le32(PIPEDIR_IN), /* in = DL = target -> host */
+ __cpu_to_le32(9),
+ },
+ { /* in = DL = target -> host */
+ __cpu_to_le32(ATH10K_HTC_SVC_ID_HTT_DATA3_MSG),
+ __cpu_to_le32(PIPEDIR_IN), /* in = DL = target -> host */
+ __cpu_to_le32(10),
+ },
+ { /* in = DL = target -> host pktlog */
+ __cpu_to_le32(ATH10K_HTC_SVC_ID_HTT_LOG_MSG),
+ __cpu_to_le32(PIPEDIR_IN), /* in = DL = target -> host */
+ __cpu_to_le32(11),
+ },
+ /* (Additions here) */
+
+ { /* must be last */
+ __cpu_to_le32(0),
+ __cpu_to_le32(0),
+ __cpu_to_le32(0),
+ },
+};
+
+void ath10k_snoc_write32(struct ath10k *ar, u32 offset, u32 value)
+{
+ struct ath10k_snoc *ar_snoc = ath10k_snoc_priv(ar);
+
+ iowrite32(value, ar_snoc->mem + offset);
+}
+
+u32 ath10k_snoc_read32(struct ath10k *ar, u32 offset)
+{
+ struct ath10k_snoc *ar_snoc = ath10k_snoc_priv(ar);
+ u32 val;
+
+ val = ioread32(ar_snoc->mem + offset);
+
+ return val;
+}
+
+static int __ath10k_snoc_rx_post_buf(struct ath10k_snoc_pipe *pipe)
+{
+ struct ath10k_ce_pipe *ce_pipe = pipe->ce_hdl;
+ struct ath10k *ar = pipe->hif_ce_state;
+ struct ath10k_ce *ce = ath10k_ce_priv(ar);
+ struct sk_buff *skb;
+ dma_addr_t paddr;
+ int ret;
+
+ skb = dev_alloc_skb(pipe->buf_sz);
+ if (!skb)
+ return -ENOMEM;
+
+ WARN_ONCE((unsigned long)skb->data & 3, "unaligned skb");
+
+ paddr = dma_map_single(ar->dev, skb->data,
+ skb->len + skb_tailroom(skb),
+ DMA_FROM_DEVICE);
+ if (unlikely(dma_mapping_error(ar->dev, paddr))) {
+ ath10k_warn(ar, "failed to dma map snoc rx buf\n");
+ dev_kfree_skb_any(skb);
+ return -EIO;
+ }
+
+ ATH10K_SKB_RXCB(skb)->paddr = paddr;
+
+ spin_lock_bh(&ce->ce_lock);
+ ret = ce_pipe->ops->ce_rx_post_buf(ce_pipe, skb, paddr);
+ spin_unlock_bh(&ce->ce_lock);
+ if (ret) {
+ dma_unmap_single(ar->dev, paddr, skb->len + skb_tailroom(skb),
+ DMA_FROM_DEVICE);
+ dev_kfree_skb_any(skb);
+ return ret;
+ }
+
+ return 0;
+}
+
+static void ath10k_snoc_rx_post_pipe(struct ath10k_snoc_pipe *pipe)
+{
+ struct ath10k *ar = pipe->hif_ce_state;
+ struct ath10k_ce *ce = ath10k_ce_priv(ar);
+ struct ath10k_snoc *ar_snoc = ath10k_snoc_priv(ar);
+ struct ath10k_ce_pipe *ce_pipe = pipe->ce_hdl;
+ int ret, num;
+
+ if (pipe->buf_sz == 0)
+ return;
+
+ if (!ce_pipe->dest_ring)
+ return;
+
+ spin_lock_bh(&ce->ce_lock);
+ num = __ath10k_ce_rx_num_free_bufs(ce_pipe);
+ spin_unlock_bh(&ce->ce_lock);
+ while (num--) {
+ ret = __ath10k_snoc_rx_post_buf(pipe);
+ if (ret) {
+ if (ret == -ENOSPC)
+ break;
+ ath10k_warn(ar, "failed to post rx buf: %d\n", ret);
+ mod_timer(&ar_snoc->rx_post_retry, jiffies +
+ ATH10K_SNOC_RX_POST_RETRY_MS);
+ break;
+ }
+ }
+}
+
+static void ath10k_snoc_rx_post(struct ath10k *ar)
+{
+ struct ath10k_snoc *ar_snoc = ath10k_snoc_priv(ar);
+ int i;
+
+ for (i = 0; i < CE_COUNT; i++)
+ ath10k_snoc_rx_post_pipe(&ar_snoc->pipe_info[i]);
+}
+
+static void ath10k_snoc_process_rx_cb(struct ath10k_ce_pipe *ce_state,
+ void (*callback)(struct ath10k *ar,
+ struct sk_buff *skb))
+{
+ struct ath10k *ar = ce_state->ar;
+ struct ath10k_snoc *ar_snoc = ath10k_snoc_priv(ar);
+ struct ath10k_snoc_pipe *pipe_info = &ar_snoc->pipe_info[ce_state->id];
+ struct sk_buff *skb;
+ struct sk_buff_head list;
+ void *transfer_context;
+ unsigned int nbytes, max_nbytes;
+
+ __skb_queue_head_init(&list);
+ while (ath10k_ce_completed_recv_next(ce_state, &transfer_context,
+ &nbytes) == 0) {
+ skb = transfer_context;
+ max_nbytes = skb->len + skb_tailroom(skb);
+ dma_unmap_single(ar->dev, ATH10K_SKB_RXCB(skb)->paddr,
+ max_nbytes, DMA_FROM_DEVICE);
+
+ if (unlikely(max_nbytes < nbytes)) {
+ ath10k_warn(ar, "rxed more than expected (nbytes %d, max %d)",
+ nbytes, max_nbytes);
+ dev_kfree_skb_any(skb);
+ continue;
+ }
+
+ skb_put(skb, nbytes);
+ __skb_queue_tail(&list, skb);
+ }
+
+ while ((skb = __skb_dequeue(&list))) {
+ ath10k_dbg(ar, ATH10K_DBG_SNOC, "snoc rx ce pipe %d len %d\n",
+ ce_state->id, skb->len);
+
+ callback(ar, skb);
+ }
+
+ ath10k_snoc_rx_post_pipe(pipe_info);
+}
+
+static void ath10k_snoc_htc_rx_cb(struct ath10k_ce_pipe *ce_state)
+{
+ ath10k_snoc_process_rx_cb(ce_state, ath10k_htc_rx_completion_handler);
+}
+
+static void ath10k_snoc_htt_htc_rx_cb(struct ath10k_ce_pipe *ce_state)
+{
+ /* CE4 polling needs to be done whenever CE pipe which transports
+ * HTT Rx (target->host) is processed.
+ */
+ ath10k_ce_per_engine_service(ce_state->ar, CE_POLL_PIPE);
+
+ ath10k_snoc_process_rx_cb(ce_state, ath10k_htc_rx_completion_handler);
+}
+
+static void ath10k_snoc_htt_rx_deliver(struct ath10k *ar, struct sk_buff *skb)
+{
+ skb_pull(skb, sizeof(struct ath10k_htc_hdr));
+ ath10k_htt_t2h_msg_handler(ar, skb);
+}
+
+static void ath10k_snoc_htt_rx_cb(struct ath10k_ce_pipe *ce_state)
+{
+ ath10k_ce_per_engine_service(ce_state->ar, CE_POLL_PIPE);
+ ath10k_snoc_process_rx_cb(ce_state, ath10k_snoc_htt_rx_deliver);
+}
+
+static void ath10k_snoc_rx_replenish_retry(struct timer_list *t)
+{
+ struct ath10k_pci *ar_snoc = from_timer(ar_snoc, t, rx_post_retry);
+ struct ath10k *ar = ar_snoc->ar;
+
+ ath10k_snoc_rx_post(ar);
+}
+
+static void ath10k_snoc_htc_tx_cb(struct ath10k_ce_pipe *ce_state)
+{
+ struct ath10k *ar = ce_state->ar;
+ struct sk_buff_head list;
+ struct sk_buff *skb;
+
+ __skb_queue_head_init(&list);
+ while (ath10k_ce_completed_send_next(ce_state, (void **)&skb) == 0) {
+ if (!skb)
+ continue;
+
+ __skb_queue_tail(&list, skb);
+ }
+
+ while ((skb = __skb_dequeue(&list)))
+ ath10k_htc_tx_completion_handler(ar, skb);
+}
+
+static void ath10k_snoc_htt_tx_cb(struct ath10k_ce_pipe *ce_state)
+{
+ struct ath10k *ar = ce_state->ar;
+ struct sk_buff *skb;
+
+ while (ath10k_ce_completed_send_next(ce_state, (void **)&skb) == 0) {
+ if (!skb)
+ continue;
+
+ dma_unmap_single(ar->dev, ATH10K_SKB_CB(skb)->paddr,
+ skb->len, DMA_TO_DEVICE);
+ ath10k_htt_hif_tx_complete(ar, skb);
+ }
+}
+
+static int ath10k_snoc_hif_tx_sg(struct ath10k *ar, u8 pipe_id,
+ struct ath10k_hif_sg_item *items, int n_items)
+{
+ struct ath10k_snoc *ar_snoc = ath10k_snoc_priv(ar);
+ struct ath10k_ce *ce = ath10k_ce_priv(ar);
+ struct ath10k_snoc_pipe *snoc_pipe;
+ struct ath10k_ce_pipe *ce_pipe;
+ int err, i = 0;
+
+ snoc_pipe = &ar_snoc->pipe_info[pipe_id];
+ ce_pipe = snoc_pipe->ce_hdl;
+ spin_lock_bh(&ce->ce_lock);
+
+ for (i = 0; i < n_items - 1; i++) {
+ ath10k_dbg(ar, ATH10K_DBG_SNOC,
+ "snoc tx item %d paddr %pad len %d n_items %d\n",
+ i, &items[i].paddr, items[i].len, n_items);
+
+ err = ath10k_ce_send_nolock(ce_pipe,
+ items[i].transfer_context,
+ items[i].paddr,
+ items[i].len,
+ items[i].transfer_id,
+ CE_SEND_FLAG_GATHER);
+ if (err)
+ goto err;
+ }
+
+ ath10k_dbg(ar, ATH10K_DBG_SNOC,
+ "snoc tx item %d paddr %pad len %d n_items %d\n",
+ i, &items[i].paddr, items[i].len, n_items);
+
+ err = ath10k_ce_send_nolock(ce_pipe,
+ items[i].transfer_context,
+ items[i].paddr,
+ items[i].len,
+ items[i].transfer_id,
+ 0);
+ if (err)
+ goto err;
+
+ spin_unlock_bh(&ce->ce_lock);
+
+ return 0;
+
+err:
+ for (; i > 0; i--)
+ __ath10k_ce_send_revert(ce_pipe);
+
+ spin_unlock_bh(&ce->ce_lock);
+ return err;
+}
+
+static int ath10k_snoc_hif_get_target_info(struct ath10k *ar,
+ struct bmi_target_info *target_info)
+{
+ target_info->version = ATH10K_HW_WCN3990;
+ target_info->type = ATH10K_HW_WCN3990;
+
+ return 0;
+}
+
+static u16 ath10k_snoc_hif_get_free_queue_number(struct ath10k *ar, u8 pipe)
+{
+ struct ath10k_snoc *ar_snoc = ath10k_snoc_priv(ar);
+
+ ath10k_dbg(ar, ATH10K_DBG_SNOC, "hif get free queue number\n");
+
+ return ath10k_ce_num_free_src_entries(ar_snoc->pipe_info[pipe].ce_hdl);
+}
+
+static void ath10k_snoc_hif_send_complete_check(struct ath10k *ar, u8 pipe,
+ int force)
+{
+ int resources;
+
+ ath10k_dbg(ar, ATH10K_DBG_SNOC, "snoc hif send complete check\n");
+
+ if (!force) {
+ resources = ath10k_snoc_hif_get_free_queue_number(ar, pipe);
+
+ if (resources > (host_ce_config_wlan[pipe].src_nentries >> 1))
+ return;
+ }
+ ath10k_ce_per_engine_service(ar, pipe);
+}
+
+static int ath10k_snoc_hif_map_service_to_pipe(struct ath10k *ar,
+ u16 service_id,
+ u8 *ul_pipe, u8 *dl_pipe)
+{
+ const struct service_to_pipe *entry;
+ bool ul_set = false, dl_set = false;
+ int i;
+
+ ath10k_dbg(ar, ATH10K_DBG_SNOC, "snoc hif map service\n");
+
+ for (i = 0; i < ARRAY_SIZE(target_service_to_ce_map_wlan); i++) {
+ entry = &target_service_to_ce_map_wlan[i];
+
+ if (__le32_to_cpu(entry->service_id) != service_id)
+ continue;
+
+ switch (__le32_to_cpu(entry->pipedir)) {
+ case PIPEDIR_NONE:
+ break;
+ case PIPEDIR_IN:
+ WARN_ON(dl_set);
+ *dl_pipe = __le32_to_cpu(entry->pipenum);
+ dl_set = true;
+ break;
+ case PIPEDIR_OUT:
+ WARN_ON(ul_set);
+ *ul_pipe = __le32_to_cpu(entry->pipenum);
+ ul_set = true;
+ break;
+ case PIPEDIR_INOUT:
+ WARN_ON(dl_set);
+ WARN_ON(ul_set);
+ *dl_pipe = __le32_to_cpu(entry->pipenum);
+ *ul_pipe = __le32_to_cpu(entry->pipenum);
+ dl_set = true;
+ ul_set = true;
+ break;
+ }
+ }
+
+ if (WARN_ON(!ul_set || !dl_set))
+ return -ENOENT;
+
+ return 0;
+}
+
+static void ath10k_snoc_hif_get_default_pipe(struct ath10k *ar,
+ u8 *ul_pipe, u8 *dl_pipe)
+{
+ ath10k_dbg(ar, ATH10K_DBG_SNOC, "snoc hif get default pipe\n");
+
+ (void)ath10k_snoc_hif_map_service_to_pipe(ar,
+ ATH10K_HTC_SVC_ID_RSVD_CTRL,
+ ul_pipe, dl_pipe);
+}
+
+static inline void ath10k_snoc_irq_disable(struct ath10k *ar)
+{
+ ath10k_ce_disable_interrupts(ar);
+}
+
+static inline void ath10k_snoc_irq_enable(struct ath10k *ar)
+{
+ ath10k_ce_enable_interrupts(ar);
+}
+
+static void ath10k_snoc_rx_pipe_cleanup(struct ath10k_snoc_pipe *snoc_pipe)
+{
+ struct ath10k_ce_pipe *ce_pipe;
+ struct ath10k_ce_ring *ce_ring;
+ struct sk_buff *skb;
+ struct ath10k *ar;
+ int i;
+
+ ar = snoc_pipe->hif_ce_state;
+ ce_pipe = snoc_pipe->ce_hdl;
+ ce_ring = ce_pipe->dest_ring;
+
+ if (!ce_ring)
+ return;
+
+ if (!snoc_pipe->buf_sz)
+ return;
+
+ for (i = 0; i < ce_ring->nentries; i++) {
+ skb = ce_ring->per_transfer_context[i];
+ if (!skb)
+ continue;
+
+ ce_ring->per_transfer_context[i] = NULL;
+
+ dma_unmap_single(ar->dev, ATH10K_SKB_RXCB(skb)->paddr,
+ skb->len + skb_tailroom(skb),
+ DMA_FROM_DEVICE);
+ dev_kfree_skb_any(skb);
+ }
+}
+
+static void ath10k_snoc_tx_pipe_cleanup(struct ath10k_snoc_pipe *snoc_pipe)
+{
+ struct ath10k_ce_pipe *ce_pipe;
+ struct ath10k_ce_ring *ce_ring;
+ struct ath10k_snoc *ar_snoc;
+ struct sk_buff *skb;
+ struct ath10k *ar;
+ int i;
+
+ ar = snoc_pipe->hif_ce_state;
+ ar_snoc = ath10k_snoc_priv(ar);
+ ce_pipe = snoc_pipe->ce_hdl;
+ ce_ring = ce_pipe->src_ring;
+
+ if (!ce_ring)
+ return;
+
+ if (!snoc_pipe->buf_sz)
+ return;
+
+ for (i = 0; i < ce_ring->nentries; i++) {
+ skb = ce_ring->per_transfer_context[i];
+ if (!skb)
+ continue;
+
+ ce_ring->per_transfer_context[i] = NULL;
+
+ ath10k_htc_tx_completion_handler(ar, skb);
+ }
+}
+
+static void ath10k_snoc_buffer_cleanup(struct ath10k *ar)
+{
+ struct ath10k_snoc *ar_snoc = ath10k_snoc_priv(ar);
+ struct ath10k_snoc_pipe *pipe_info;
+ int pipe_num;
+
+ del_timer_sync(&ar_snoc->rx_post_retry);
+ for (pipe_num = 0; pipe_num < CE_COUNT; pipe_num++) {
+ pipe_info = &ar_snoc->pipe_info[pipe_num];
+ ath10k_snoc_rx_pipe_cleanup(pipe_info);
+ ath10k_snoc_tx_pipe_cleanup(pipe_info);
+ }
+}
+
+static void ath10k_snoc_hif_stop(struct ath10k *ar)
+{
+ ath10k_snoc_irq_disable(ar);
+ ath10k_snoc_buffer_cleanup(ar);
+ napi_synchronize(&ar->napi);
+ napi_disable(&ar->napi);
+ ath10k_dbg(ar, ATH10K_DBG_BOOT, "boot hif stop\n");
+}
+
+static int ath10k_snoc_hif_start(struct ath10k *ar)
+{
+ ath10k_snoc_irq_enable(ar);
+ ath10k_snoc_rx_post(ar);
+
+ ath10k_dbg(ar, ATH10K_DBG_BOOT, "boot hif start\n");
+
+ return 0;
+}
+
+static int ath10k_snoc_init_pipes(struct ath10k *ar)
+{
+ int i, ret;
+
+ for (i = 0; i < CE_COUNT; i++) {
+ ret = ath10k_ce_init_pipe(ar, i, &host_ce_config_wlan[i]);
+ if (ret) {
+ ath10k_err(ar, "failed to initialize copy engine pipe %d: %d\n",
+ i, ret);
+ return ret;
+ }
+ }
+
+ return 0;
+}
+
+static int ath10k_snoc_wlan_enable(struct ath10k *ar)
+{
+ return 0;
+}
+
+static void ath10k_snoc_wlan_disable(struct ath10k *ar)
+{
+}
+
+static void ath10k_snoc_hif_power_down(struct ath10k *ar)
+{
+ ath10k_dbg(ar, ATH10K_DBG_BOOT, "boot hif power down\n");
+
+ ath10k_snoc_wlan_disable(ar);
+ ath10k_ce_free_rri(ar);
+}
+
+static int ath10k_snoc_hif_power_up(struct ath10k *ar)
+{
+ int ret;
+
+ ath10k_dbg(ar, ATH10K_DBG_SNOC, "%s:WCN3990 driver state = %d\n",
+ __func__, ar->state);
+
+ ret = ath10k_snoc_wlan_enable(ar);
+ if (ret) {
+ ath10k_err(ar, "failed to enable wcn3990: %d\n", ret);
+ return ret;
+ }
+
+ ath10k_ce_alloc_rri(ar);
+
+ ret = ath10k_snoc_init_pipes(ar);
+ if (ret) {
+ ath10k_err(ar, "failed to initialize CE: %d\n", ret);
+ goto err_wlan_enable;
+ }
+
+ napi_enable(&ar->napi);
+ return 0;
+
+err_wlan_enable:
+ ath10k_snoc_wlan_disable(ar);
+
+ return ret;
+}
+
+static const struct ath10k_hif_ops ath10k_snoc_hif_ops = {
+ .read32 = ath10k_snoc_read32,
+ .write32 = ath10k_snoc_write32,
+ .start = ath10k_snoc_hif_start,
+ .stop = ath10k_snoc_hif_stop,
+ .map_service_to_pipe = ath10k_snoc_hif_map_service_to_pipe,
+ .get_default_pipe = ath10k_snoc_hif_get_default_pipe,
+ .power_up = ath10k_snoc_hif_power_up,
+ .power_down = ath10k_snoc_hif_power_down,
+ .tx_sg = ath10k_snoc_hif_tx_sg,
+ .send_complete_check = ath10k_snoc_hif_send_complete_check,
+ .get_free_queue_number = ath10k_snoc_hif_get_free_queue_number,
+ .get_target_info = ath10k_snoc_hif_get_target_info,
+};
+
+static const struct ath10k_bus_ops ath10k_snoc_bus_ops = {
+ .read32 = ath10k_snoc_read32,
+ .write32 = ath10k_snoc_write32,
+};
+
+int ath10k_snoc_get_ce_id_from_irq(struct ath10k *ar, int irq)
+{
+ struct ath10k_snoc *ar_snoc = ath10k_snoc_priv(ar);
+ int i;
+
+ for (i = 0; i < CE_COUNT_MAX; i++) {
+ if (ar_snoc->ce_irqs[i].irq_line == irq)
+ return i;
+ }
+ ath10k_err(ar, "No matching CE id for irq %d\n", irq);
+
+ return -EINVAL;
+}
+
+static irqreturn_t ath10k_snoc_per_engine_handler(int irq, void *arg)
+{
+ struct ath10k *ar = arg;
+ struct ath10k_snoc *ar_snoc = ath10k_snoc_priv(ar);
+ int ce_id = ath10k_snoc_get_ce_id_from_irq(ar, irq);
+
+ if (ce_id < 0 || ce_id >= ARRAY_SIZE(ar_snoc->pipe_info)) {
+ ath10k_warn(ar, "unexpected/invalid irq %d ce_id %d\n", irq,
+ ce_id);
+ return IRQ_HANDLED;
+ }
+
+ ath10k_snoc_irq_disable(ar);
+ napi_schedule(&ar->napi);
+
+ return IRQ_HANDLED;
+}
+
+static int ath10k_snoc_napi_poll(struct napi_struct *ctx, int budget)
+{
+ struct ath10k *ar = container_of(ctx, struct ath10k, napi);
+ int done = 0;
+
+ ath10k_ce_per_engine_service_any(ar);
+ done = ath10k_htt_txrx_compl_task(ar, budget);
+
+ if (done < budget) {
+ napi_complete(ctx);
+ ath10k_snoc_irq_enable(ar);
+ }
+
+ return done;
+}
+
+void ath10k_snoc_init_napi(struct ath10k *ar)
+{
+ netif_napi_add(&ar->napi_dev, &ar->napi, ath10k_snoc_napi_poll,
+ ATH10K_NAPI_BUDGET);
+}
+
+static int ath10k_snoc_request_irq(struct ath10k *ar)
+{
+ struct ath10k_snoc *ar_snoc = ath10k_snoc_priv(ar);
+ int irqflags = IRQF_TRIGGER_RISING;
+ int ret, id;
+
+ for (id = 0; id < CE_COUNT_MAX; id++) {
+ ret = request_irq(ar_snoc->ce_irqs[id].irq_line,
+ ath10k_snoc_per_engine_handler,
+ irqflags, ce_name[id], ar);
+ if (ret) {
+ ath10k_err(ar,
+ "failed to register IRQ handler for CE %d: %d",
+ id, ret);
+ goto err_irq;
+ }
+ }
+
+ return 0;
+
+err_irq:
+ for (id -= 1; id >= 0; id--)
+ free_irq(ar_snoc->ce_irqs[id].irq_line, ar);
+
+ return ret;
+}
+
+static void ath10k_snoc_free_irq(struct ath10k *ar)
+{
+ struct ath10k_snoc *ar_snoc = ath10k_snoc_priv(ar);
+ int id;
+
+ for (id = 0; id < CE_COUNT_MAX; id++)
+ free_irq(ar_snoc->ce_irqs[id].irq_line, ar);
+}
+
+static int ath10k_snoc_resource_init(struct ath10k *ar)
+{
+ struct ath10k_snoc *ar_snoc = ath10k_snoc_priv(ar);
+ struct platform_device *pdev;
+ struct resource *res;
+ int i, ret = 0;
+
+ pdev = ar_snoc->dev;
+ res = platform_get_resource_byname(pdev, IORESOURCE_MEM, "membase");
+ if (!res) {
+ ath10k_err(ar, "Memory base not found in DT\n");
+ return -EINVAL;
+ }
+
+ ar_snoc->mem_pa = res->start;
+ ar_snoc->mem = devm_ioremap(&pdev->dev, ar_snoc->mem_pa,
+ resource_size(res));
+ if (!ar_snoc->mem) {
+ ath10k_err(ar, "Memory base ioremap failed with physical address %pa\n",
+ &ar_snoc->mem_pa);
+ return -EINVAL;
+ }
+
+ for (i = 0; i < CE_COUNT; i++) {
+ res = platform_get_resource(ar_snoc->dev, IORESOURCE_IRQ, i);
+ if (!res) {
+ ath10k_err(ar, "failed to get IRQ%d\n", i);
+ ret = -ENODEV;
+ goto out;
+ }
+ ar_snoc->ce_irqs[i].irq_line = res->start;
+ }
+
+out:
+ return ret;
+}
+
+static int ath10k_snoc_setup_resource(struct ath10k *ar)
+{
+ struct ath10k_snoc *ar_snoc = ath10k_snoc_priv(ar);
+ struct ath10k_ce *ce = ath10k_ce_priv(ar);
+ struct ath10k_snoc_pipe *pipe;
+ int i, ret;
+
+ timer_setup(&ar_snoc->rx_post_retry, ath10k_snoc_rx_replenish_retry, 0);
+ spin_lock_init(&ce->ce_lock);
+ for (i = 0; i < CE_COUNT; i++) {
+ pipe = &ar_snoc->pipe_info[i];
+ pipe->ce_hdl = &ce->ce_states[i];
+ pipe->pipe_num = i;
+ pipe->hif_ce_state = ar;
+
+ ret = ath10k_ce_alloc_pipe(ar, i, &host_ce_config_wlan[i]);
+ if (ret) {
+ ath10k_err(ar, "failed to allocate copy engine pipe %d: %d\n",
+ i, ret);
+ return ret;
+ }
+
+ pipe->buf_sz = host_ce_config_wlan[i].src_sz_max;
+ }
+ ath10k_snoc_init_napi(ar);
+
+ return 0;
+}
+
+static void ath10k_snoc_release_resource(struct ath10k *ar)
+{
+ int i;
+
+ netif_napi_del(&ar->napi);
+ for (i = 0; i < CE_COUNT; i++)
+ ath10k_ce_free_pipe(ar, i);
+}
+
+static int ath10k_get_vreg_info(struct ath10k *ar, struct device *dev,
+ struct ath10k_wcn3990_vreg_info *vreg_info)
+{
+ struct regulator *reg;
+ int ret = 0;
+
+ reg = devm_regulator_get_optional(dev, vreg_info->name);
+
+ if (IS_ERR(reg)) {
+ ret = PTR_ERR(reg);
+
+ if (ret == -EPROBE_DEFER) {
+ ath10k_err(ar, "EPROBE_DEFER for regulator: %s\n",
+ vreg_info->name);
+ return ret;
+ }
+ if (vreg_info->required) {
+ ath10k_err(ar, "Regulator %s doesn't exist: %d\n",
+ vreg_info->name, ret);
+ return ret;
+ }
+ ath10k_dbg(ar, ATH10K_DBG_SNOC,
+ "Optional regulator %s doesn't exist: %d\n",
+ vreg_info->name, ret);
+ goto done;
+ }
+
+ vreg_info->reg = reg;
+
+done:
+ ath10k_dbg(ar, ATH10K_DBG_SNOC,
+ "snog vreg %s min_v %u max_v %u load_ua %u settle_delay %lu\n",
+ vreg_info->name, vreg_info->min_v, vreg_info->max_v,
+ vreg_info->load_ua, vreg_info->settle_delay);
+
+ return 0;
+}
+
+static int ath10k_get_clk_info(struct ath10k *ar, struct device *dev,
+ struct ath10k_wcn3990_clk_info *clk_info)
+{
+ struct clk *handle;
+ int ret = 0;
+
+ handle = devm_clk_get(dev, clk_info->name);
+ if (IS_ERR(handle)) {
+ ret = PTR_ERR(handle);
+ if (clk_info->required) {
+ ath10k_err(ar, "snoc clock %s isn't available: %d\n",
+ clk_info->name, ret);
+ return ret;
+ }
+ ath10k_dbg(ar, ATH10K_DBG_SNOC, "snoc ignoring clock %s: %d\n",
+ clk_info->name,
+ ret);
+ return 0;
+ }
+
+ ath10k_dbg(ar, ATH10K_DBG_SNOC, "snoc clock %s freq %u\n",
+ clk_info->name, clk_info->freq);
+
+ clk_info->handle = handle;
+
+ return ret;
+}
+
+static int ath10k_wcn3990_vreg_on(struct ath10k *ar)
+{
+ struct ath10k_snoc *ar_snoc = ath10k_snoc_priv(ar);
+ struct ath10k_wcn3990_vreg_info *vreg_info;
+ int ret = 0;
+ int i;
+
+ for (i = 0; i < ARRAY_SIZE(vreg_cfg); i++) {
+ vreg_info = &ar_snoc->vreg[i];
+
+ if (!vreg_info->reg)
+ continue;
+
+ ath10k_dbg(ar, ATH10K_DBG_SNOC, "snoc regulator %s being enabled\n",
+ vreg_info->name);
+
+ ret = regulator_set_voltage(vreg_info->reg, vreg_info->min_v,
+ vreg_info->max_v);
+ if (ret) {
+ ath10k_err(ar,
+ "failed to set regulator %s voltage-min: %d voltage-max: %d\n",
+ vreg_info->name, vreg_info->min_v, vreg_info->max_v);
+ goto err_reg_config;
+ }
+
+ if (vreg_info->load_ua) {
+ ret = regulator_set_load(vreg_info->reg,
+ vreg_info->load_ua);
+ if (ret < 0) {
+ ath10k_err(ar,
+ "failed to set regulator %s load: %d\n",
+ vreg_info->name,
+ vreg_info->load_ua);
+ goto err_reg_config;
+ }
+ }
+
+ ret = regulator_enable(vreg_info->reg);
+ if (ret) {
+ ath10k_err(ar, "failed to enable regulator %s\n",
+ vreg_info->name);
+ goto err_reg_config;
+ }
+
+ if (vreg_info->settle_delay)
+ udelay(vreg_info->settle_delay);
+ }
+
+ return 0;
+
+err_reg_config:
+ for (; i >= 0; i--) {
+ vreg_info = &ar_snoc->vreg[i];
+
+ if (!vreg_info->reg)
+ continue;
+
+ regulator_disable(vreg_info->reg);
+ regulator_set_load(vreg_info->reg, 0);
+ regulator_set_voltage(vreg_info->reg, 0, vreg_info->max_v);
+ }
+
+ return ret;
+}
+
+static int ath10k_wcn3990_vreg_off(struct ath10k *ar)
+{
+ struct ath10k_snoc *ar_snoc = ath10k_snoc_priv(ar);
+ struct ath10k_wcn3990_vreg_info *vreg_info;
+ int ret = 0;
+ int i;
+
+ for (i = ARRAY_SIZE(vreg_cfg) - 1; i >= 0; i--) {
+ vreg_info = &ar_snoc->vreg[i];
+
+ if (!vreg_info->reg)
+ continue;
+
+ ath10k_dbg(ar, ATH10K_DBG_SNOC, "snoc regulator %s being disabled\n",
+ vreg_info->name);
+
+ ret = regulator_disable(vreg_info->reg);
+ if (ret)
+ ath10k_err(ar, "failed to disable regulator %s\n",
+ vreg_info->name);
+
+ ret = regulator_set_load(vreg_info->reg, 0);
+ if (ret < 0)
+ ath10k_err(ar, "failed to set load %s\n",
+ vreg_info->name);
+
+ ret = regulator_set_voltage(vreg_info->reg, 0,
+ vreg_info->max_v);
+ if (ret)
+ ath10k_err(ar, "failed to set voltage %s\n",
+ vreg_info->name);
+ }
+
+ return ret;
+}
+
+static int ath10k_wcn3990_clk_init(struct ath10k *ar)
+{
+ struct ath10k_snoc *ar_snoc = ath10k_snoc_priv(ar);
+ struct ath10k_wcn3990_clk_info *clk_info;
+ int ret = 0;
+ int i;
+
+ for (i = 0; i < ARRAY_SIZE(clk_cfg); i++) {
+ clk_info = &ar_snoc->clk[i];
+
+ if (!clk_info->handle)
+ continue;
+
+ ath10k_dbg(ar, ATH10K_DBG_SNOC, "snoc clock %s being enabled\n",
+ clk_info->name);
+
+ if (clk_info->freq) {
+ ret = clk_set_rate(clk_info->handle, clk_info->freq);
+
+ if (ret) {
+ ath10k_err(ar, "failed to set clock %s freq %u\n",
+ clk_info->name, clk_info->freq);
+ goto err_clock_config;
+ }
+ }
+
+ ret = clk_prepare_enable(clk_info->handle);
+ if (ret) {
+ ath10k_err(ar, "failed to enable clock %s\n",
+ clk_info->name);
+ goto err_clock_config;
+ }
+ }
+
+ return 0;
+
+err_clock_config:
+ for (; i >= 0; i--) {
+ clk_info = &ar_snoc->clk[i];
+
+ if (!clk_info->handle)
+ continue;
+
+ clk_disable_unprepare(clk_info->handle);
+ }
+
+ return ret;
+}
+
+static int ath10k_wcn3990_clk_deinit(struct ath10k *ar)
+{
+ struct ath10k_snoc *ar_snoc = ath10k_snoc_priv(ar);
+ struct ath10k_wcn3990_clk_info *clk_info;
+ int i;
+
+ for (i = 0; i < ARRAY_SIZE(clk_cfg); i++) {
+ clk_info = &ar_snoc->clk[i];
+
+ if (!clk_info->handle)
+ continue;
+
+ ath10k_dbg(ar, ATH10K_DBG_SNOC, "snoc clock %s being disabled\n",
+ clk_info->name);
+
+ clk_disable_unprepare(clk_info->handle);
+ }
+
+ return 0;
+}
+
+static int ath10k_hw_power_on(struct ath10k *ar)
+{
+ int ret;
+
+ ath10k_dbg(ar, ATH10K_DBG_SNOC, "soc power on\n");
+
+ ret = ath10k_wcn3990_vreg_on(ar);
+ if (ret)
+ return ret;
+
+ ret = ath10k_wcn3990_clk_init(ar);
+ if (ret)
+ goto vreg_off;
+
+ return ret;
+
+vreg_off:
+ ath10k_wcn3990_vreg_off(ar);
+ return ret;
+}
+
+static int ath10k_hw_power_off(struct ath10k *ar)
+{
+ int ret;
+
+ ath10k_dbg(ar, ATH10K_DBG_SNOC, "soc power off\n");
+
+ ath10k_wcn3990_clk_deinit(ar);
+
+ ret = ath10k_wcn3990_vreg_off(ar);
+
+ return ret;
+}
+
+static const struct of_device_id ath10k_snoc_dt_match[] = {
+ { .compatible = "qcom,wcn3990-wifi",
+ .data = &drv_priv,
+ },
+ { }
+};
+MODULE_DEVICE_TABLE(of, ath10k_snoc_dt_match);
+
+static int ath10k_snoc_probe(struct platform_device *pdev)
+{
+ const struct ath10k_snoc_drv_priv *drv_data;
+ const struct of_device_id *of_id;
+ struct ath10k_snoc *ar_snoc;
+ struct device *dev;
+ struct ath10k *ar;
+ int ret;
+ u32 i;
+
+ of_id = of_match_device(ath10k_snoc_dt_match, &pdev->dev);
+ if (!of_id) {
+ dev_err(&pdev->dev, "failed to find matching device tree id\n");
+ return -EINVAL;
+ }
+
+ drv_data = of_id->data;
+ dev = &pdev->dev;
+
+ ret = dma_set_mask_and_coherent(dev, drv_data->dma_mask);
+ if (ret) {
+ dev_err(dev, "failed to set dma mask: %d", ret);
+ return ret;
+ }
+
+ ar = ath10k_core_create(sizeof(*ar_snoc), dev, ATH10K_BUS_SNOC,
+ drv_data->hw_rev, &ath10k_snoc_hif_ops);
+ if (!ar) {
+ dev_err(dev, "failed to allocate core\n");
+ return -ENOMEM;
+ }
+
+ ar_snoc = ath10k_snoc_priv(ar);
+ ar_snoc->dev = pdev;
+ platform_set_drvdata(pdev, ar);
+ ar_snoc->ar = ar;
+ ar_snoc->ce.bus_ops = &ath10k_snoc_bus_ops;
+ ar->ce_priv = &ar_snoc->ce;
+
+ ath10k_snoc_resource_init(ar);
+ if (ret) {
+ ath10k_warn(ar, "failed to initialize resource: %d\n", ret);
+ goto err_core_destroy;
+ }
+
+ ath10k_snoc_setup_resource(ar);
+ if (ret) {
+ ath10k_warn(ar, "failed to setup resource: %d\n", ret);
+ goto err_core_destroy;
+ }
+ ret = ath10k_snoc_request_irq(ar);
+ if (ret) {
+ ath10k_warn(ar, "failed to request irqs: %d\n", ret);
+ goto err_release_resource;
+ }
+
+ ar_snoc->vreg = vreg_cfg;
+ for (i = 0; i < ARRAY_SIZE(vreg_cfg); i++) {
+ ret = ath10k_get_vreg_info(ar, dev, &ar_snoc->vreg[i]);
+ if (ret)
+ goto err_free_irq;
+ }
+
+ ar_snoc->clk = clk_cfg;
+ for (i = 0; i < ARRAY_SIZE(clk_cfg); i++) {
+ ret = ath10k_get_clk_info(ar, dev, &ar_snoc->clk[i]);
+ if (ret)
+ goto err_free_irq;
+ }
+
+ ret = ath10k_hw_power_on(ar);
+ if (ret) {
+ ath10k_err(ar, "failed to power on device: %d\n", ret);
+ goto err_free_irq;
+ }
+
+ ret = ath10k_core_register(ar, drv_data->hw_rev);
+ if (ret) {
+ ath10k_err(ar, "failed to register driver core: %d\n", ret);
+ goto err_hw_power_off;
+ }
+
+ ath10k_dbg(ar, ATH10K_DBG_SNOC, "snoc probe\n");
+ ath10k_warn(ar, "Warning: SNOC support is still work-in-progress, it will not work properly!");
+
+ return 0;
+
+err_hw_power_off:
+ ath10k_hw_power_off(ar);
+
+err_free_irq:
+ ath10k_snoc_free_irq(ar);
+
+err_release_resource:
+ ath10k_snoc_release_resource(ar);
+
+err_core_destroy:
+ ath10k_core_destroy(ar);
+
+ return ret;
+}
+
+static int ath10k_snoc_remove(struct platform_device *pdev)
+{
+ struct ath10k *ar = platform_get_drvdata(pdev);
+
+ ath10k_dbg(ar, ATH10K_DBG_SNOC, "snoc remove\n");
+ ath10k_core_unregister(ar);
+ ath10k_hw_power_off(ar);
+ ath10k_snoc_free_irq(ar);
+ ath10k_snoc_release_resource(ar);
+ ath10k_core_destroy(ar);
+
+ return 0;
+}
+
+static struct platform_driver ath10k_snoc_driver = {
+ .probe = ath10k_snoc_probe,
+ .remove = ath10k_snoc_remove,
+ .driver = {
+ .name = "ath10k_snoc",
+ .owner = THIS_MODULE,
+ .of_match_table = ath10k_snoc_dt_match,
+ },
+};
+
+static int __init ath10k_snoc_init(void)
+{
+ int ret;
+
+ ret = platform_driver_register(&ath10k_snoc_driver);
+ if (ret)
+ pr_err("failed to register ath10k snoc driver: %d\n",
+ ret);
+
+ return ret;
+}
+module_init(ath10k_snoc_init);
+
+static void __exit ath10k_snoc_exit(void)
+{
+ platform_driver_unregister(&ath10k_snoc_driver);
+}
+module_exit(ath10k_snoc_exit);
+
+MODULE_AUTHOR("Qualcomm");
+MODULE_LICENSE("Dual BSD/GPL");
+MODULE_DESCRIPTION("Driver support for Atheros WCN3990 SNOC devices");
diff --git a/drivers/net/wireless/ath/ath10k/snoc.h b/drivers/net/wireless/ath/ath10k/snoc.h
new file mode 100644
index 0000000..05dc98f
--- /dev/null
+++ b/drivers/net/wireless/ath/ath10k/snoc.h
@@ -0,0 +1,95 @@
+/*
+ * Copyright (c) 2018 The Linux Foundation. All rights reserved.
+ *
+ * Permission to use, copy, modify, and/or distribute this software for any
+ * purpose with or without fee is hereby granted, provided that the above
+ * copyright notice and this permission notice appear in all copies.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
+ * WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
+ * MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
+ * ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
+ * WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
+ * ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
+ * OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
+ */
+
+#ifndef _SNOC_H_
+#define _SNOC_H_
+
+#include "hw.h"
+#include "ce.h"
+#include "pci.h"
+
+struct ath10k_snoc_drv_priv {
+ enum ath10k_hw_rev hw_rev;
+ u64 dma_mask;
+};
+
+struct snoc_state {
+ u32 pipe_cfg_addr;
+ u32 svc_to_pipe_map;
+};
+
+struct ath10k_snoc_pipe {
+ struct ath10k_ce_pipe *ce_hdl;
+ u8 pipe_num;
+ struct ath10k *hif_ce_state;
+ size_t buf_sz;
+ /* protect ce info */
+ spinlock_t pipe_lock;
+ struct ath10k_snoc *ar_snoc;
+};
+
+struct ath10k_snoc_target_info {
+ u32 target_version;
+ u32 target_type;
+ u32 target_revision;
+ u32 soc_version;
+};
+
+struct ath10k_snoc_ce_irq {
+ u32 irq_line;
+};
+
+struct ath10k_wcn3990_vreg_info {
+ struct regulator *reg;
+ const char *name;
+ u32 min_v;
+ u32 max_v;
+ u32 load_ua;
+ unsigned long settle_delay;
+ bool required;
+};
+
+struct ath10k_wcn3990_clk_info {
+ struct clk *handle;
+ const char *name;
+ u32 freq;
+ bool required;
+};
+
+struct ath10k_snoc {
+ struct platform_device *dev;
+ struct ath10k *ar;
+ void __iomem *mem;
+ dma_addr_t mem_pa;
+ struct ath10k_snoc_target_info target_info;
+ size_t mem_len;
+ struct ath10k_snoc_pipe pipe_info[CE_COUNT_MAX];
+ struct ath10k_snoc_ce_irq ce_irqs[CE_COUNT_MAX];
+ struct ath10k_ce ce;
+ struct timer_list rx_post_retry;
+ struct ath10k_wcn3990_vreg_info *vreg;
+ struct ath10k_wcn3990_clk_info *clk;
+};
+
+static inline struct ath10k_snoc *ath10k_snoc_priv(struct ath10k *ar)
+{
+ return (struct ath10k_snoc *)ar->drv_priv;
+}
+
+void ath10k_snoc_write32(struct ath10k *ar, u32 offset, u32 value);
+u32 ath10k_snoc_read32(struct ath10k *ar, u32 offset);
+
+#endif /* _SNOC_H_ */
diff --git a/drivers/net/wireless/ath/ath10k/txrx.c b/drivers/net/wireless/ath/ath10k/txrx.c
index 70e23bb..cda164f 100644
--- a/drivers/net/wireless/ath/ath10k/txrx.c
+++ b/drivers/net/wireless/ath/ath10k/txrx.c
@@ -1,6 +1,7 @@
/*
* Copyright (c) 2005-2011 Atheros Communications Inc.
* Copyright (c) 2011-2016 Qualcomm Atheros, Inc.
+ * Copyright (c) 2018, The Linux Foundation. All rights reserved.
*
* Permission to use, copy, modify, and/or distribute this software for any
* purpose with or without fee is hereby granted, provided that the above
@@ -119,6 +120,13 @@
info->flags &= ~IEEE80211_TX_STAT_ACK;
}
+ if (tx_done->status == HTT_TX_COMPL_STATE_ACK &&
+ tx_done->ack_rssi != ATH10K_INVALID_RSSI) {
+ info->status.ack_signal = ATH10K_DEFAULT_NOISE_FLOOR +
+ tx_done->ack_rssi;
+ info->status.is_valid_ack_signal = true;
+ }
+
ieee80211_tx_status(htt->ar->hw, msdu);
/* we do not own the msdu anymore */
diff --git a/drivers/net/wireless/ath/ath10k/wmi-ops.h b/drivers/net/wireless/ath/ath10k/wmi-ops.h
index c35e453..e37d16b 100644
--- a/drivers/net/wireless/ath/ath10k/wmi-ops.h
+++ b/drivers/net/wireless/ath/ath10k/wmi-ops.h
@@ -25,6 +25,7 @@
struct wmi_ops {
void (*rx)(struct ath10k *ar, struct sk_buff *skb);
void (*map_svc)(const __le32 *in, unsigned long *out, size_t len);
+ void (*map_svc_ext)(const __le32 *in, unsigned long *out, size_t len);
int (*pull_scan)(struct ath10k *ar, struct sk_buff *skb,
struct wmi_scan_ev_arg *arg);
@@ -54,6 +55,9 @@
struct wmi_wow_ev_arg *arg);
int (*pull_echo_ev)(struct ath10k *ar, struct sk_buff *skb,
struct wmi_echo_ev_arg *arg);
+ int (*pull_svc_avail)(struct ath10k *ar, struct sk_buff *skb,
+ struct wmi_svc_avail_ev_arg *arg);
+
enum wmi_txbf_conf (*get_txbf_conf_scheme)(struct ath10k *ar);
struct sk_buff *(*gen_pdev_suspend)(struct ath10k *ar, u32 suspend_opt);
@@ -115,6 +119,8 @@
u32 value);
struct sk_buff *(*gen_scan_chan_list)(struct ath10k *ar,
const struct wmi_scan_chan_list_arg *arg);
+ struct sk_buff *(*gen_scan_prob_req_oui)(struct ath10k *ar,
+ u32 prob_req_oui);
struct sk_buff *(*gen_beacon_dma)(struct ath10k *ar, u32 vdev_id,
const void *bcn, size_t bcn_len,
u32 bcn_paddr, bool dtim_zero,
@@ -230,6 +236,17 @@
}
static inline int
+ath10k_wmi_map_svc_ext(struct ath10k *ar, const __le32 *in, unsigned long *out,
+ size_t len)
+{
+ if (!ar->wmi.ops->map_svc_ext)
+ return -EOPNOTSUPP;
+
+ ar->wmi.ops->map_svc_ext(in, out, len);
+ return 0;
+}
+
+static inline int
ath10k_wmi_pull_scan(struct ath10k *ar, struct sk_buff *skb,
struct wmi_scan_ev_arg *arg)
{
@@ -330,6 +347,15 @@
}
static inline int
+ath10k_wmi_pull_svc_avail(struct ath10k *ar, struct sk_buff *skb,
+ struct wmi_svc_avail_ev_arg *arg)
+{
+ if (!ar->wmi.ops->pull_svc_avail)
+ return -EOPNOTSUPP;
+ return ar->wmi.ops->pull_svc_avail(ar, skb, arg);
+}
+
+static inline int
ath10k_wmi_pull_fw_stats(struct ath10k *ar, struct sk_buff *skb,
struct ath10k_fw_stats *stats)
{
@@ -891,6 +917,26 @@
}
static inline int
+ath10k_wmi_scan_prob_req_oui(struct ath10k *ar, const u8 mac_addr[ETH_ALEN])
+{
+ struct sk_buff *skb;
+ u32 prob_req_oui;
+
+ prob_req_oui = (((u32)mac_addr[0]) << 16) |
+ (((u32)mac_addr[1]) << 8) | mac_addr[2];
+
+ if (!ar->wmi.ops->gen_scan_prob_req_oui)
+ return -EOPNOTSUPP;
+
+ skb = ar->wmi.ops->gen_scan_prob_req_oui(ar, prob_req_oui);
+ if (IS_ERR(skb))
+ return PTR_ERR(skb);
+
+ return ath10k_wmi_cmd_send(ar, skb,
+ ar->wmi.cmd->scan_prob_req_oui_cmdid);
+}
+
+static inline int
ath10k_wmi_peer_assoc(struct ath10k *ar,
const struct wmi_peer_assoc_complete_arg *arg)
{
diff --git a/drivers/net/wireless/ath/ath10k/wmi-tlv.c b/drivers/net/wireless/ath/ath10k/wmi-tlv.c
index 9d1b0a4..01f4eb2 100644
--- a/drivers/net/wireless/ath/ath10k/wmi-tlv.c
+++ b/drivers/net/wireless/ath/ath10k/wmi-tlv.c
@@ -594,6 +594,9 @@
case WMI_TLV_READY_EVENTID:
ath10k_wmi_event_ready(ar, skb);
break;
+ case WMI_TLV_SERVICE_AVAILABLE_EVENTID:
+ ath10k_wmi_event_service_available(ar, skb);
+ break;
case WMI_TLV_OFFLOAD_BCN_TX_STATUS_EVENTID:
ath10k_wmi_tlv_event_bcn_tx_status(ar, skb);
break;
@@ -1117,6 +1120,39 @@
return 0;
}
+static int ath10k_wmi_tlv_svc_avail_parse(struct ath10k *ar, u16 tag, u16 len,
+ const void *ptr, void *data)
+{
+ struct wmi_svc_avail_ev_arg *arg = data;
+
+ switch (tag) {
+ case WMI_TLV_TAG_STRUCT_SERVICE_AVAILABLE_EVENT:
+ arg->service_map_ext_len = *(__le32 *)ptr;
+ arg->service_map_ext = ptr + sizeof(__le32);
+ return 0;
+ default:
+ break;
+ }
+ return -EPROTO;
+}
+
+static int ath10k_wmi_tlv_op_pull_svc_avail(struct ath10k *ar,
+ struct sk_buff *skb,
+ struct wmi_svc_avail_ev_arg *arg)
+{
+ int ret;
+
+ ret = ath10k_wmi_tlv_iter(ar, skb->data, skb->len,
+ ath10k_wmi_tlv_svc_avail_parse, arg);
+
+ if (ret) {
+ ath10k_warn(ar, "failed to parse svc_avail tlv: %d\n", ret);
+ return ret;
+ }
+
+ return 0;
+}
+
static void ath10k_wmi_tlv_pull_vdev_stats(const struct wmi_tlv_vdev_stats *src,
struct ath10k_fw_stats_vdev *dst)
{
@@ -1600,6 +1636,8 @@
cmd->num_bssids = __cpu_to_le32(arg->n_bssids);
cmd->ie_len = __cpu_to_le32(arg->ie_len);
cmd->num_probes = __cpu_to_le32(3);
+ ether_addr_copy(cmd->mac_addr.addr, arg->mac_addr.addr);
+ ether_addr_copy(cmd->mac_mask.addr, arg->mac_mask.addr);
/* FIXME: There are some scan flag inconsistencies across firmwares,
* e.g. WMI-TLV inverts the logic behind the following flag.
@@ -2447,6 +2485,27 @@
}
static struct sk_buff *
+ath10k_wmi_tlv_op_gen_scan_prob_req_oui(struct ath10k *ar, u32 prob_req_oui)
+{
+ struct wmi_scan_prob_req_oui_cmd *cmd;
+ struct wmi_tlv *tlv;
+ struct sk_buff *skb;
+
+ skb = ath10k_wmi_alloc_skb(ar, sizeof(*tlv) + sizeof(*cmd));
+ if (!skb)
+ return ERR_PTR(-ENOMEM);
+
+ tlv = (void *)skb->data;
+ tlv->tag = __cpu_to_le16(WMI_TLV_TAG_STRUCT_SCAN_PROB_REQ_OUI_CMD);
+ tlv->len = __cpu_to_le16(sizeof(*cmd));
+ cmd = (void *)tlv->value;
+ cmd->prob_req_oui = __cpu_to_le32(prob_req_oui);
+
+ ath10k_dbg(ar, ATH10K_DBG_WMI, "wmi tlv scan prob req oui\n");
+ return skb;
+}
+
+static struct sk_buff *
ath10k_wmi_tlv_op_gen_beacon_dma(struct ath10k *ar, u32 vdev_id,
const void *bcn, size_t bcn_len,
u32 bcn_paddr, bool dtim_zero,
@@ -3416,6 +3475,7 @@
.stop_scan_cmdid = WMI_TLV_STOP_SCAN_CMDID,
.scan_chan_list_cmdid = WMI_TLV_SCAN_CHAN_LIST_CMDID,
.scan_sch_prio_tbl_cmdid = WMI_TLV_SCAN_SCH_PRIO_TBL_CMDID,
+ .scan_prob_req_oui_cmdid = WMI_TLV_SCAN_PROB_REQ_OUI_CMDID,
.pdev_set_regdomain_cmdid = WMI_TLV_PDEV_SET_REGDOMAIN_CMDID,
.pdev_set_channel_cmdid = WMI_TLV_PDEV_SET_CHANNEL_CMDID,
.pdev_set_param_cmdid = WMI_TLV_PDEV_SET_PARAM_CMDID,
@@ -3740,6 +3800,7 @@
static const struct wmi_ops wmi_tlv_ops = {
.rx = ath10k_wmi_tlv_op_rx,
.map_svc = wmi_tlv_svc_map,
+ .map_svc_ext = wmi_tlv_svc_map_ext,
.pull_scan = ath10k_wmi_tlv_op_pull_scan_ev,
.pull_mgmt_rx = ath10k_wmi_tlv_op_pull_mgmt_rx_ev,
@@ -3751,6 +3812,7 @@
.pull_phyerr = ath10k_wmi_op_pull_phyerr_ev,
.pull_svc_rdy = ath10k_wmi_tlv_op_pull_svc_rdy_ev,
.pull_rdy = ath10k_wmi_tlv_op_pull_rdy_ev,
+ .pull_svc_avail = ath10k_wmi_tlv_op_pull_svc_avail,
.pull_fw_stats = ath10k_wmi_tlv_op_pull_fw_stats,
.pull_roam_ev = ath10k_wmi_tlv_op_pull_roam_ev,
.pull_wow_event = ath10k_wmi_tlv_op_pull_wow_ev,
@@ -3782,6 +3844,7 @@
.gen_set_sta_ps = ath10k_wmi_tlv_op_gen_set_sta_ps,
.gen_set_ap_ps = ath10k_wmi_tlv_op_gen_set_ap_ps,
.gen_scan_chan_list = ath10k_wmi_tlv_op_gen_scan_chan_list,
+ .gen_scan_prob_req_oui = ath10k_wmi_tlv_op_gen_scan_prob_req_oui,
.gen_beacon_dma = ath10k_wmi_tlv_op_gen_beacon_dma,
.gen_pdev_set_wmm = ath10k_wmi_tlv_op_gen_pdev_set_wmm,
.gen_request_stats = ath10k_wmi_tlv_op_gen_request_stats,
diff --git a/drivers/net/wireless/ath/ath10k/wmi-tlv.h b/drivers/net/wireless/ath/ath10k/wmi-tlv.h
index fa3773e..954c50b 100644
--- a/drivers/net/wireless/ath/ath10k/wmi-tlv.h
+++ b/drivers/net/wireless/ath/ath10k/wmi-tlv.h
@@ -295,6 +295,7 @@
enum wmi_tlv_event_id {
WMI_TLV_SERVICE_READY_EVENTID = 0x1,
WMI_TLV_READY_EVENTID,
+ WMI_TLV_SERVICE_AVAILABLE_EVENTID,
WMI_TLV_SCAN_EVENTID = WMI_TLV_EV(WMI_TLV_GRP_SCAN),
WMI_TLV_PDEV_TPC_CONFIG_EVENTID = WMI_TLV_EV(WMI_TLV_GRP_PDEV),
WMI_TLV_CHAN_INFO_EVENTID,
@@ -949,6 +950,275 @@
WMI_TLV_TAG_STRUCT_PACKET_FILTER_ENABLE,
WMI_TLV_TAG_STRUCT_SAP_SET_BLACKLIST_PARAM_CMD,
WMI_TLV_TAG_STRUCT_MGMT_TX_CMD,
+ WMI_TLV_TAG_STRUCT_MGMT_TX_COMPL_EVENT,
+ WMI_TLV_TAG_STRUCT_SOC_SET_ANTENNA_MODE_CMD,
+ WMI_TLV_TAG_STRUCT_WOW_UDP_SVC_OFLD_CMD,
+ WMI_TLV_TAG_STRUCT_LRO_INFO_CMD,
+ WMI_TLV_TAG_STRUCT_ROAM_EARLYSTOP_RSSI_THRES_PARAM,
+ WMI_TLV_TAG_STRUCT_SERVICE_READY_EXT_EVENT,
+ WMI_TLV_TAG_STRUCT_MAWC_SENSOR_REPORT_IND_CMD,
+ WMI_TLV_TAG_STRUCT_MAWC_ENABLE_SENSOR_EVENT,
+ WMI_TLV_TAG_STRUCT_ROAM_CONFIGURE_MAWC_CMD,
+ WMI_TLV_TAG_STRUCT_NLO_CONFIGURE_MAWC_CMD,
+ WMI_TLV_TAG_STRUCT_EXTSCAN_CONFIGURE_MAWC_CMD,
+ WMI_TLV_TAG_STRUCT_PEER_ASSOC_CONF_EVENT,
+ WMI_TLV_TAG_STRUCT_WOW_HOSTWAKEUP_GPIO_PIN_PATTERN_CONFIG_CMD,
+ WMI_TLV_TAG_STRUCT_AP_PS_EGAP_PARAM_CMD,
+ WMI_TLV_TAG_STRUCT_AP_PS_EGAP_INFO_EVENT,
+ WMI_TLV_TAG_STRUCT_PMF_OFFLOAD_SET_SA_QUERY_CMD,
+ WMI_TLV_TAG_STRUCT_TRANSFER_DATA_TO_FLASH_CMD,
+ WMI_TLV_TAG_STRUCT_TRANSFER_DATA_TO_FLASH_COMPLETE_EVENT,
+ WMI_TLV_TAG_STRUCT_SCPC_EVENT,
+ WMI_TLV_TAG_STRUCT_AP_PS_EGAP_INFO_CHAINMASK_LIST,
+ WMI_TLV_TAG_STRUCT_STA_SMPS_FORCE_MODE_COMPLETE_EVENT,
+ WMI_TLV_TAG_STRUCT_BPF_GET_CAPABILITY_CMD,
+ WMI_TLV_TAG_STRUCT_BPF_CAPABILITY_INFO_EVT,
+ WMI_TLV_TAG_STRUCT_BPF_GET_VDEV_STATS_CMD,
+ WMI_TLV_TAG_STRUCT_BPF_VDEV_STATS_INFO_EVT,
+ WMI_TLV_TAG_STRUCT_BPF_SET_VDEV_INSTRUCTIONS_CMD,
+ WMI_TLV_TAG_STRUCT_BPF_DEL_VDEV_INSTRUCTIONS_CMD,
+ WMI_TLV_TAG_STRUCT_VDEV_DELETE_RESP_EVENT,
+ WMI_TLV_TAG_STRUCT_PEER_DELETE_RESP_EVENT,
+ WMI_TLV_TAG_STRUCT_ROAM_DENSE_THRES_PARAM,
+ WMI_TLV_TAG_STRUCT_ENLO_CANDIDATE_SCORE_PARAM,
+ WMI_TLV_TAG_STRUCT_PEER_UPDATE_WDS_ENTRY_CMD,
+ WMI_TLV_TAG_STRUCT_VDEV_CONFIG_RATEMASK,
+ WMI_TLV_TAG_STRUCT_PDEV_FIPS_CMD,
+ WMI_TLV_TAG_STRUCT_PDEV_SMART_ANT_ENABLE_CMD,
+ WMI_TLV_TAG_STRUCT_PDEV_SMART_ANT_SET_RX_ANTENNA_CMD,
+ WMI_TLV_TAG_STRUCT_PEER_SMART_ANT_SET_TX_ANTENNA_CMD,
+ WMI_TLV_TAG_STRUCT_PEER_SMART_ANT_SET_TRAIN_ANTENNA_CMD,
+ WMI_TLV_TAG_STRUCT_PEER_SMART_ANT_SET_NODE_CONFIG_OPS_CMD,
+ WMI_TLV_TAG_STRUCT_PDEV_SET_ANT_SWITCH_TBL_CMD,
+ WMI_TLV_TAG_STRUCT_PDEV_SET_CTL_TABLE_CMD,
+ WMI_TLV_TAG_STRUCT_PDEV_SET_MIMOGAIN_TABLE_CMD,
+ WMI_TLV_TAG_STRUCT_FWTEST_SET_PARAM_CMD,
+ WMI_TLV_TAG_STRUCT_PEER_ATF_REQUEST,
+ WMI_TLV_TAG_STRUCT_VDEV_ATF_REQUEST,
+ WMI_TLV_TAG_STRUCT_PDEV_GET_ANI_CCK_CONFIG_CMD,
+ WMI_TLV_TAG_STRUCT_PDEV_GET_ANI_OFDM_CONFIG_CMD,
+ WMI_TLV_TAG_STRUCT_INST_RSSI_STATS_RESP,
+ WMI_TLV_TAG_STRUCT_MED_UTIL_REPORT_EVENT,
+ WMI_TLV_TAG_STRUCT_PEER_STA_PS_STATECHANGE_EVENT,
+ WMI_TLV_TAG_STRUCT_WDS_ADDR_EVENT,
+ WMI_TLV_TAG_STRUCT_PEER_RATECODE_LIST_EVENT,
+ WMI_TLV_TAG_STRUCT_PDEV_NFCAL_POWER_ALL_CHANNELS_EVENT,
+ WMI_TLV_TAG_STRUCT_PDEV_TPC_EVENT,
+ WMI_TLV_TAG_STRUCT_ANI_OFDM_EVENT,
+ WMI_TLV_TAG_STRUCT_ANI_CCK_EVENT,
+ WMI_TLV_TAG_STRUCT_PDEV_CHANNEL_HOPPING_EVENT,
+ WMI_TLV_TAG_STRUCT_PDEV_FIPS_EVENT,
+ WMI_TLV_TAG_STRUCT_ATF_PEER_INFO,
+ WMI_TLV_TAG_STRUCT_PDEV_GET_TPC_CMD,
+ WMI_TLV_TAG_STRUCT_VDEV_FILTER_NRP_CONFIG_CMD,
+ WMI_TLV_TAG_STRUCT_QBOOST_CFG_CMD,
+ WMI_TLV_TAG_STRUCT_PDEV_SMART_ANT_GPIO_HANDLE,
+ WMI_TLV_TAG_STRUCT_PEER_SMART_ANT_SET_TX_ANTENNA_SERIES,
+ WMI_TLV_TAG_STRUCT_PEER_SMART_ANT_SET_TRAIN_ANTENNA_PARAM,
+ WMI_TLV_TAG_STRUCT_PDEV_SET_ANT_CTRL_CHAIN,
+ WMI_TLV_TAG_STRUCT_PEER_CCK_OFDM_RATE_INFO,
+ WMI_TLV_TAG_STRUCT_PEER_MCS_RATE_INFO,
+ WMI_TLV_TAG_STRUCT_PDEV_NFCAL_POWER_ALL_CHANNELS_NFDBR,
+ WMI_TLV_TAG_STRUCT_PDEV_NFCAL_POWER_ALL_CHANNELS_NFDBM,
+ WMI_TLV_TAG_STRUCT_PDEV_NFCAL_POWER_ALL_CHANNELS_FREQNUM,
+ WMI_TLV_TAG_STRUCT_MU_REPORT_TOTAL_MU,
+ WMI_TLV_TAG_STRUCT_VDEV_SET_DSCP_TID_MAP_CMD,
+ WMI_TLV_TAG_STRUCT_ROAM_SET_MBO,
+ WMI_TLV_TAG_STRUCT_MIB_STATS_ENABLE_CMD,
+ WMI_TLV_TAG_STRUCT_NAN_DISC_IFACE_CREATED_EVENT,
+ WMI_TLV_TAG_STRUCT_NAN_DISC_IFACE_DELETED_EVENT,
+ WMI_TLV_TAG_STRUCT_NAN_STARTED_CLUSTER_EVENT,
+ WMI_TLV_TAG_STRUCT_NAN_JOINED_CLUSTER_EVENT,
+ WMI_TLV_TAG_STRUCT_NDI_GET_CAP_REQ,
+ WMI_TLV_TAG_STRUCT_NDP_INITIATOR_REQ,
+ WMI_TLV_TAG_STRUCT_NDP_RESPONDER_REQ,
+ WMI_TLV_TAG_STRUCT_NDP_END_REQ,
+ WMI_TLV_TAG_STRUCT_NDI_CAP_RSP_EVENT,
+ WMI_TLV_TAG_STRUCT_NDP_INITIATOR_RSP_EVENT,
+ WMI_TLV_TAG_STRUCT_NDP_RESPONDER_RSP_EVENT,
+ WMI_TLV_TAG_STRUCT_NDP_END_RSP_EVENT,
+ WMI_TLV_TAG_STRUCT_NDP_INDICATION_EVENT,
+ WMI_TLV_TAG_STRUCT_NDP_CONFIRM_EVENT,
+ WMI_TLV_TAG_STRUCT_NDP_END_INDICATION_EVENT,
+ WMI_TLV_TAG_STRUCT_VDEV_SET_QUIET_CMD,
+ WMI_TLV_TAG_STRUCT_PDEV_SET_PCL_CMD,
+ WMI_TLV_TAG_STRUCT_PDEV_SET_HW_MODE_CMD,
+ WMI_TLV_TAG_STRUCT_PDEV_SET_MAC_CONFIG_CMD,
+ WMI_TLV_TAG_STRUCT_PDEV_SET_ANTENNA_MODE_CMD,
+ WMI_TLV_TAG_STRUCT_PDEV_SET_HW_MODE_RESPONSE_EVENT,
+ WMI_TLV_TAG_STRUCT_PDEV_HW_MODE_TRANSITION_EVENT,
+ WMI_TLV_TAG_STRUCT_PDEV_SET_HW_MODE_RESPONSE_VDEV_MAC_ENTRY,
+ WMI_TLV_TAG_STRUCT_PDEV_SET_MAC_CONFIG_RESPONSE_EVENT,
+ WMI_TLV_TAG_STRUCT_COEX_CONFIG_CMD,
+ WMI_TLV_TAG_STRUCT_CONFIG_ENHANCED_MCAST_FILTER,
+ WMI_TLV_TAG_STRUCT_CHAN_AVOID_RPT_ALLOW_CMD,
+ WMI_TLV_TAG_STRUCT_SET_PERIODIC_CHANNEL_STATS_CONFIG,
+ WMI_TLV_TAG_STRUCT_VDEV_SET_CUSTOM_AGGR_SIZE_CMD,
+ WMI_TLV_TAG_STRUCT_PDEV_WAL_POWER_DEBUG_CMD,
+ WMI_TLV_TAG_STRUCT_MAC_PHY_CAPABILITIES,
+ WMI_TLV_TAG_STRUCT_HW_MODE_CAPABILITIES,
+ WMI_TLV_TAG_STRUCT_SOC_MAC_PHY_HW_MODE_CAPS,
+ WMI_TLV_TAG_STRUCT_HAL_REG_CAPABILITIES_EXT,
+ WMI_TLV_TAG_STRUCT_SOC_HAL_REG_CAPABILITIES,
+ WMI_TLV_TAG_STRUCT_VDEV_WISA_CMD,
+ WMI_TLV_TAG_STRUCT_TX_POWER_LEVEL_STATS_EVT,
+ WMI_TLV_TAG_STRUCT_SCAN_ADAPTIVE_DWELL_PARAMETERS_TLV,
+ WMI_TLV_TAG_STRUCT_SCAN_ADAPTIVE_DWELL_CONFIG,
+ WMI_TLV_TAG_STRUCT_WOW_SET_ACTION_WAKE_UP_CMD,
+ WMI_TLV_TAG_STRUCT_NDP_END_RSP_PER_NDI,
+ WMI_TLV_TAG_STRUCT_PEER_BWF_REQUEST,
+ WMI_TLV_TAG_STRUCT_BWF_PEER_INFO,
+ WMI_TLV_TAG_STRUCT_DBGLOG_TIME_STAMP_SYNC_CMD,
+ WMI_TLV_TAG_STRUCT_RMC_SET_LEADER_CMD,
+ WMI_TLV_TAG_STRUCT_RMC_MANUAL_LEADER_EVENT,
+ WMI_TLV_TAG_STRUCT_PER_CHAIN_RSSI_STATS,
+ WMI_TLV_TAG_STRUCT_RSSI_STATS,
+ WMI_TLV_TAG_STRUCT_P2P_LO_START_CMD,
+ WMI_TLV_TAG_STRUCT_P2P_LO_STOP_CMD,
+ WMI_TLV_TAG_STRUCT_P2P_LO_STOPPED_EVENT,
+ WMI_TLV_TAG_STRUCT_PEER_REORDER_QUEUE_SETUP_CMD,
+ WMI_TLV_TAG_STRUCT_PEER_REORDER_QUEUE_REMOVE_CMD,
+ WMI_TLV_TAG_STRUCT_SET_MULTIPLE_MCAST_FILTER_CMD,
+ WMI_TLV_TAG_STRUCT_MGMT_TX_COMPL_BUNDLE_EVENT,
+ WMI_TLV_TAG_STRUCT_READ_DATA_FROM_FLASH_CMD,
+ WMI_TLV_TAG_STRUCT_READ_DATA_FROM_FLASH_EVENT,
+ WMI_TLV_TAG_STRUCT_PDEV_SET_REORDER_TIMEOUT_VAL_CMD,
+ WMI_TLV_TAG_STRUCT_PEER_SET_RX_BLOCKSIZE_CMD,
+ WMI_TLV_TAG_STRUCT_PDEV_SET_WAKEUP_CONFIG_CMDID,
+ WMI_TLV_TAG_STRUCT_TLV_BUF_LEN_PARAM,
+ WMI_TLV_TAG_STRUCT_SERVICE_AVAILABLE_EVENT,
+ WMI_TLV_TAG_STRUCT_PEER_ANTDIV_INFO_REQ_CMD,
+ WMI_TLV_TAG_STRUCT_PEER_ANTDIV_INFO_EVENT,
+ WMI_TLV_TAG_STRUCT_PEER_ANTDIV_INFO,
+ WMI_TLV_TAG_STRUCT_PDEV_GET_ANTDIV_STATUS_CMD,
+ WMI_TLV_TAG_STRUCT_PDEV_ANTDIV_STATUS_EVENT,
+ WMI_TLV_TAG_STRUCT_MNT_FILTER_CMD,
+ WMI_TLV_TAG_STRUCT_GET_CHIP_POWER_STATS_CMD,
+ WMI_TLV_TAG_STRUCT_PDEV_CHIP_POWER_STATS_EVENT,
+ WMI_TLV_TAG_STRUCT_COEX_GET_ANTENNA_ISOLATION_CMD,
+ WMI_TLV_TAG_STRUCT_COEX_REPORT_ISOLATION_EVENT,
+ WMI_TLV_TAG_STRUCT_CHAN_CCA_STATS,
+ WMI_TLV_TAG_STRUCT_PEER_SIGNAL_STATS,
+ WMI_TLV_TAG_STRUCT_TX_STATS,
+ WMI_TLV_TAG_STRUCT_PEER_AC_TX_STATS,
+ WMI_TLV_TAG_STRUCT_RX_STATS,
+ WMI_TLV_TAG_STRUCT_PEER_AC_RX_STATS,
+ WMI_TLV_TAG_STRUCT_REPORT_STATS_EVENT,
+ WMI_TLV_TAG_STRUCT_CHAN_CCA_STATS_THRESH,
+ WMI_TLV_TAG_STRUCT_PEER_SIGNAL_STATS_THRESH,
+ WMI_TLV_TAG_STRUCT_TX_STATS_THRESH,
+ WMI_TLV_TAG_STRUCT_RX_STATS_THRESH,
+ WMI_TLV_TAG_STRUCT_PDEV_SET_STATS_THRESHOLD_CMD,
+ WMI_TLV_TAG_STRUCT_REQUEST_WLAN_STATS_CMD,
+ WMI_TLV_TAG_STRUCT_RX_AGGR_FAILURE_EVENT,
+ WMI_TLV_TAG_STRUCT_RX_AGGR_FAILURE_INFO,
+ WMI_TLV_TAG_STRUCT_VDEV_ENCRYPT_DECRYPT_DATA_REQ_CMD,
+ WMI_TLV_TAG_STRUCT_VDEV_ENCRYPT_DECRYPT_DATA_RESP_EVENT,
+ WMI_TLV_TAG_STRUCT_PDEV_BAND_TO_MAC,
+ WMI_TLV_TAG_STRUCT_TBTT_OFFSET_INFO,
+ WMI_TLV_TAG_STRUCT_TBTT_OFFSET_EXT_EVENT,
+ WMI_TLV_TAG_STRUCT_SAR_LIMITS_CMD,
+ WMI_TLV_TAG_STRUCT_SAR_LIMIT_CMD_ROW,
+ WMI_TLV_TAG_STRUCT_PDEV_DFS_PHYERR_OFFLOAD_ENABLE_CMD,
+ WMI_TLV_TAG_STRUCT_PDEV_DFS_PHYERR_OFFLOAD_DISABLE_CMD,
+ WMI_TLV_TAG_STRUCT_VDEV_ADFS_CH_CFG_CMD,
+ WMI_TLV_TAG_STRUCT_VDEV_ADFS_OCAC_ABORT_CMD,
+ WMI_TLV_TAG_STRUCT_PDEV_DFS_RADAR_DETECTION_EVENT,
+ WMI_TLV_TAG_STRUCT_VDEV_ADFS_OCAC_COMPLETE_EVENT,
+ WMI_TLV_TAG_STRUCT_VDEV_DFS_CAC_COMPLETE_EVENT,
+ WMI_TLV_TAG_STRUCT_VENDOR_OUI,
+ WMI_TLV_TAG_STRUCT_REQUEST_RCPI_CMD,
+ WMI_TLV_TAG_STRUCT_UPDATE_RCPI_EVENT,
+ WMI_TLV_TAG_STRUCT_REQUEST_PEER_STATS_INFO_CMD,
+ WMI_TLV_TAG_STRUCT_PEER_STATS_INFO,
+ WMI_TLV_TAG_STRUCT_PEER_STATS_INFO_EVENT,
+ WMI_TLV_TAG_STRUCT_PKGID_EVENT,
+ WMI_TLV_TAG_STRUCT_CONNECTED_NLO_RSSI_PARAMS,
+ WMI_TLV_TAG_STRUCT_SET_CURRENT_COUNTRY_CMD,
+ WMI_TLV_TAG_STRUCT_REGULATORY_RULE_STRUCT,
+ WMI_TLV_TAG_STRUCT_REG_CHAN_LIST_CC_EVENT,
+ WMI_TLV_TAG_STRUCT_11D_SCAN_START_CMD,
+ WMI_TLV_TAG_STRUCT_11D_SCAN_STOP_CMD,
+ WMI_TLV_TAG_STRUCT_11D_NEW_COUNTRY_EVENT,
+ WMI_TLV_TAG_STRUCT_REQUEST_RADIO_CHAN_STATS_CMD,
+ WMI_TLV_TAG_STRUCT_RADIO_CHAN_STATS,
+ WMI_TLV_TAG_STRUCT_RADIO_CHAN_STATS_EVENT,
+ WMI_TLV_TAG_STRUCT_ROAM_PER_CONFIG,
+ WMI_TLV_TAG_STRUCT_VDEV_ADD_MAC_ADDR_TO_RX_FILTER_CMD,
+ WMI_TLV_TAG_STRUCT_VDEV_ADD_MAC_ADDR_TO_RX_FILTER_STATUS_EVENT,
+ WMI_TLV_TAG_STRUCT_BPF_SET_VDEV_ACTIVE_MODE_CMD,
+ WMI_TLV_TAG_STRUCT_HW_DATA_FILTER_CMD,
+ WMI_TLV_TAG_STRUCT_CONNECTED_NLO_BSS_BAND_RSSI_PREF,
+ WMI_TLV_TAG_STRUCT_PEER_OPER_MODE_CHANGE_EVENT,
+ WMI_TLV_TAG_STRUCT_CHIP_POWER_SAVE_FAILURE_DETECTED,
+ WMI_TLV_TAG_STRUCT_PDEV_MULTIPLE_VDEV_RESTART_REQUEST_CMD,
+ WMI_TLV_TAG_STRUCT_PDEV_CSA_SWITCH_COUNT_STATUS_EVENT,
+ WMI_TLV_TAG_STRUCT_PDEV_UPDATE_PKT_ROUTING_CMD,
+ WMI_TLV_TAG_STRUCT_PDEV_CHECK_CAL_VERSION_CMD,
+ WMI_TLV_TAG_STRUCT_PDEV_CHECK_CAL_VERSION_EVENT,
+ WMI_TLV_TAG_STRUCT_PDEV_SET_DIVERSITY_GAIN_CMD,
+ WMI_TLV_TAG_STRUCT_MAC_PHY_CHAINMASK_COMBO,
+ WMI_TLV_TAG_STRUCT_MAC_PHY_CHAINMASK_CAPABILITY,
+ WMI_TLV_TAG_STRUCT_VDEV_SET_ARP_STATS_CMD,
+ WMI_TLV_TAG_STRUCT_VDEV_GET_ARP_STATS_CMD,
+ WMI_TLV_TAG_STRUCT_VDEV_GET_ARP_STATS_EVENT,
+ WMI_TLV_TAG_STRUCT_IFACE_OFFLOAD_STATS,
+ WMI_TLV_TAG_STRUCT_REQUEST_STATS_CMD_SUB_STRUCT_PARAM,
+ WMI_TLV_TAG_STRUCT_RSSI_CTL_EXT,
+ WMI_TLV_TAG_STRUCT_SINGLE_PHYERR_EXT_RX_HDR,
+ WMI_TLV_TAG_STRUCT_COEX_BT_ACTIVITY_EVENT,
+ WMI_TLV_TAG_STRUCT_VDEV_GET_TX_POWER_CMD,
+ WMI_TLV_TAG_STRUCT_VDEV_TX_POWER_EVENT,
+ WMI_TLV_TAG_STRUCT_OFFCHAN_DATA_TX_COMPL_EVENT,
+ WMI_TLV_TAG_STRUCT_OFFCHAN_DATA_TX_SEND_CMD,
+ WMI_TLV_TAG_STRUCT_TX_SEND_PARAMS,
+ WMI_TLV_TAG_STRUCT_HE_RATE_SET,
+ WMI_TLV_TAG_STRUCT_CONGESTION_STATS,
+ WMI_TLV_TAG_STRUCT_SET_INIT_COUNTRY_CMD,
+ WMI_TLV_TAG_STRUCT_SCAN_DBS_DUTY_CYCLE,
+ WMI_TLV_TAG_STRUCT_SCAN_DBS_DUTY_CYCLE_PARAM_TLV,
+ WMI_TLV_TAG_STRUCT_PDEV_DIV_GET_RSSI_ANTID,
+ WMI_TLV_TAG_STRUCT_THERM_THROT_CONFIG_REQUEST,
+ WMI_TLV_TAG_STRUCT_THERM_THROT_LEVEL_CONFIG_INFO,
+ WMI_TLV_TAG_STRUCT_THERM_THROT_STATS_EVENT,
+ WMI_TLV_TAG_STRUCT_THERM_THROT_LEVEL_STATS_INFO,
+ WMI_TLV_TAG_STRUCT_PDEV_DIV_RSSI_ANTID_EVENT,
+ WMI_TLV_TAG_STRUCT_OEM_DMA_RING_CAPABILITIES,
+ WMI_TLV_TAG_STRUCT_OEM_DMA_RING_CFG_REQ,
+ WMI_TLV_TAG_STRUCT_OEM_DMA_RING_CFG_RSP,
+ WMI_TLV_TAG_STRUCT_OEM_INDIRECT_DATA,
+ WMI_TLV_TAG_STRUCT_OEM_DMA_BUF_RELEASE,
+ WMI_TLV_TAG_STRUCT_OEM_DMA_BUF_RELEASE_ENTRY,
+ WMI_TLV_TAG_STRUCT_PDEV_BSS_CHAN_INFO_REQUEST,
+ WMI_TLV_TAG_STRUCT_PDEV_BSS_CHAN_INFO_EVENT,
+ WMI_TLV_TAG_STRUCT_ROAM_LCA_DISALLOW_CONFIG_TLV_PARAM,
+ WMI_TLV_TAG_STRUCT_VDEV_LIMIT_OFFCHAN_CMD,
+ WMI_TLV_TAG_STRUCT_ROAM_RSSI_REJECTION_OCE_CONFIG_PARAM,
+ WMI_TLV_TAG_STRUCT_UNIT_TEST_EVENT,
+ WMI_TLV_TAG_STRUCT_ROAM_FILS_OFFLOAD_TLV_PARAM,
+ WMI_TLV_TAG_STRUCT_PDEV_UPDATE_PMK_CACHE_CMD,
+ WMI_TLV_TAG_STRUCT_PMK_CACHE,
+ WMI_TLV_TAG_STRUCT_PDEV_UPDATE_FILS_HLP_PKT_CMD,
+ WMI_TLV_TAG_STRUCT_ROAM_FILS_SYNCH_TLV_PARAM,
+ WMI_TLV_TAG_STRUCT_GTK_OFFLOAD_EXTENDED_TLV_PARAM,
+ WMI_TLV_TAG_STRUCT_ROAM_BG_SCAN_ROAMING_PARAM,
+ WMI_TLV_TAG_STRUCT_OIC_PING_OFFLOAD_PARAMS_CMD,
+ WMI_TLV_TAG_STRUCT_OIC_PING_OFFLOAD_SET_ENABLE_CMD,
+ WMI_TLV_TAG_STRUCT_OIC_PING_HANDOFF_EVENT,
+ WMI_TLV_TAG_STRUCT_DHCP_LEASE_RENEW_OFFLOAD_CMD,
+ WMI_TLV_TAG_STRUCT_DHCP_LEASE_RENEW_EVENT,
+ WMI_TLV_TAG_STRUCT_BTM_CONFIG,
+ WMI_TLV_TAG_STRUCT_DEBUG_MESG_FW_DATA_STALL_PARAM,
+ WMI_TLV_TAG_STRUCT_WLM_CONFIG_CMD,
+ WMI_TLV_TAG_STRUCT_PDEV_UPDATE_CTLTABLE_REQUEST,
+ WMI_TLV_TAG_STRUCT_PDEV_UPDATE_CTLTABLE_EVENT,
+ WMI_TLV_TAG_STRUCT_ROAM_CND_SCORING_PARAM,
+ WMI_TLV_TAG_STRUCT_PDEV_CONFIG_VENDOR_OUI_ACTION,
+ WMI_TLV_TAG_STRUCT_VENDOR_OUI_EXT,
+ WMI_TLV_TAG_STRUCT_ROAM_SYNCH_FRAME_EVENT,
+ WMI_TLV_TAG_STRUCT_FD_SEND_FROM_HOST_CMD,
+ WMI_TLV_TAG_STRUCT_ENABLE_FILS_CMD,
+ WMI_TLV_TAG_STRUCT_HOST_SWFDA_EVENT,
WMI_TLV_TAG_MAX
};
@@ -1068,16 +1338,74 @@
WMI_TLV_SERVICE_WLAN_STATS_REPORT,
WMI_TLV_SERVICE_TX_MSDU_ID_NEW_PARTITION_SUPPORT,
WMI_TLV_SERVICE_DFS_PHYERR_OFFLOAD,
+ WMI_TLV_SERVICE_RCPI_SUPPORT,
+ WMI_TLV_SERVICE_FW_MEM_DUMP_SUPPORT,
+ WMI_TLV_SERVICE_PEER_STATS_INFO,
+ WMI_TLV_SERVICE_REGULATORY_DB,
+ WMI_TLV_SERVICE_11D_OFFLOAD,
+ WMI_TLV_SERVICE_HW_DATA_FILTERING,
+ WMI_TLV_SERVICE_MULTIPLE_VDEV_RESTART,
+ WMI_TLV_SERVICE_PKT_ROUTING,
+ WMI_TLV_SERVICE_CHECK_CAL_VERSION,
+ WMI_TLV_SERVICE_OFFCHAN_TX_WMI,
+ WMI_TLV_SERVICE_8SS_TX_BFEE,
+ WMI_TLV_SERVICE_EXTENDED_NSS_SUPPORT,
+ WMI_TLV_SERVICE_ACK_TIMEOUT,
+ WMI_TLV_SERVICE_PDEV_BSS_CHANNEL_INFO_64,
+ WMI_TLV_MAX_SERVICE = 128,
+
+/* NOTE:
+ * The above service flags are delivered in the wmi_service_bitmap field
+ * of the WMI_TLV_SERVICE_READY_EVENT message.
+ * The below service flags are delivered in a WMI_TLV_SERVICE_AVAILABLE_EVENT
+ * message rather than in the WMI_TLV_SERVICE_READY_EVENT message's
+ * wmi_service_bitmap field.
+ * The WMI_TLV_SERVICE_AVAILABLE_EVENT message immediately precedes the
+ * WMI_TLV_SERVICE_READY_EVENT message.
+ */
+
+ WMI_TLV_SERVICE_CHAN_LOAD_INFO = 128,
+ WMI_TLV_SERVICE_TX_PPDU_INFO_STATS_SUPPORT,
+ WMI_TLV_SERVICE_VDEV_LIMIT_OFFCHAN_SUPPORT,
+ WMI_TLV_SERVICE_FILS_SUPPORT,
+ WMI_TLV_SERVICE_WLAN_OIC_PING_OFFLOAD,
+ WMI_TLV_SERVICE_WLAN_DHCP_RENEW,
+ WMI_TLV_SERVICE_MAWC_SUPPORT,
+ WMI_TLV_SERVICE_VDEV_LATENCY_CONFIG,
+ WMI_TLV_SERVICE_PDEV_UPDATE_CTLTABLE_SUPPORT,
+ WMI_TLV_SERVICE_PKTLOG_SUPPORT_OVER_HTT,
+ WMI_TLV_SERVICE_VDEV_MULTI_GROUP_KEY_SUPPORT,
+ WMI_TLV_SERVICE_SCAN_PHYMODE_SUPPORT,
+ WMI_TLV_SERVICE_THERM_THROT,
+ WMI_TLV_SERVICE_BCN_OFFLOAD_START_STOP_SUPPORT,
+ WMI_TLV_SERVICE_WOW_WAKEUP_BY_TIMER_PATTERN,
+ WMI_TLV_SERVICE_PEER_MAP_UNMAP_V2_SUPPORT = 143,
+ WMI_TLV_SERVICE_OFFCHAN_DATA_TID_SUPPORT = 144,
+ WMI_TLV_SERVICE_RX_PROMISC_ENABLE_SUPPORT = 145,
+ WMI_TLV_SERVICE_SUPPORT_DIRECT_DMA = 146,
+ WMI_TLV_SERVICE_AP_OBSS_DETECTION_OFFLOAD = 147,
+ WMI_TLV_SERVICE_11K_NEIGHBOUR_REPORT_SUPPORT = 148,
+ WMI_TLV_SERVICE_LISTEN_INTERVAL_OFFLOAD_SUPPORT = 149,
+ WMI_TLV_SERVICE_BSS_COLOR_OFFLOAD = 150,
+ WMI_TLV_SERVICE_RUNTIME_DPD_RECAL = 151,
+ WMI_TLV_SERVICE_STA_TWT = 152,
+ WMI_TLV_SERVICE_AP_TWT = 153,
+ WMI_TLV_SERVICE_GMAC_OFFLOAD_SUPPORT = 154,
+ WMI_TLV_SERVICE_SPOOF_MAC_SUPPORT = 155,
+
+ WMI_TLV_MAX_EXT_SERVICE = 256,
};
-#define WMI_SERVICE_IS_ENABLED(wmi_svc_bmap, svc_id, len) \
- ((svc_id) < (len) && \
- __le32_to_cpu((wmi_svc_bmap)[(svc_id) / (sizeof(u32))]) & \
- BIT((svc_id) % (sizeof(u32))))
+#define WMI_TLV_EXT_SERVICE_IS_ENABLED(wmi_svc_bmap, svc_id, len) \
+ ((svc_id) < (WMI_TLV_MAX_EXT_SERVICE) && \
+ (svc_id) >= (len) && \
+ __le32_to_cpu((wmi_svc_bmap)[((svc_id) - (len)) / 32]) & \
+ BIT(((((svc_id) - (len)) % 32) & 0x1f)))
#define SVCMAP(x, y, len) \
do { \
- if (WMI_SERVICE_IS_ENABLED((in), (x), (len))) \
+ if ((WMI_SERVICE_IS_ENABLED((in), (x), (len))) || \
+ (WMI_TLV_EXT_SERVICE_IS_ENABLED((in), (x), (len)))) \
__set_bit(y, out); \
} while (0)
@@ -1228,6 +1556,14 @@
WMI_SERVICE_MGMT_TX_WMI, len);
}
+static inline void
+wmi_tlv_svc_map_ext(const __le32 *in, unsigned long *out, size_t len)
+{
+ SVCMAP(WMI_TLV_SERVICE_SPOOF_MAC_SUPPORT,
+ WMI_SERVICE_SPOOF_MAC_SUPPORT,
+ WMI_TLV_MAX_SERVICE);
+}
+
#undef SVCMAP
struct wmi_tlv {
@@ -1370,6 +1706,15 @@
__le32 num_scan_chans;
} __packed;
+struct wmi_scan_prob_req_oui_cmd {
+/* OUI to be used in Probe Request frame when random MAC address is
+ * requested part of scan parameters. This is applied to both FW internal
+ * scans and host initiated scans. Host can request for random MAC address
+ * with WMI_SCAN_ADD_SPOOFED_MAC_IN_PROBE_REQ flag.
+ */
+ __le32 prob_req_oui;
+} __packed;
+
struct wmi_tlv_start_scan_cmd {
struct wmi_start_scan_common common;
__le32 burst_duration_ms;
@@ -1378,6 +1723,8 @@
__le32 num_ssids;
__le32 ie_len;
__le32 num_probes;
+ struct wmi_mac_addr mac_addr;
+ struct wmi_mac_addr mac_mask;
} __packed;
struct wmi_tlv_vdev_start_cmd {
diff --git a/drivers/net/wireless/ath/ath10k/wmi.c b/drivers/net/wireless/ath/ath10k/wmi.c
index c5e1ca5..df2e92a 100644
--- a/drivers/net/wireless/ath/ath10k/wmi.c
+++ b/drivers/net/wireless/ath/ath10k/wmi.c
@@ -42,6 +42,7 @@
.stop_scan_cmdid = WMI_STOP_SCAN_CMDID,
.scan_chan_list_cmdid = WMI_SCAN_CHAN_LIST_CMDID,
.scan_sch_prio_tbl_cmdid = WMI_SCAN_SCH_PRIO_TBL_CMDID,
+ .scan_prob_req_oui_cmdid = WMI_CMD_UNSUPPORTED,
.pdev_set_regdomain_cmdid = WMI_PDEV_SET_REGDOMAIN_CMDID,
.pdev_set_channel_cmdid = WMI_PDEV_SET_CHANNEL_CMDID,
.pdev_set_param_cmdid = WMI_PDEV_SET_PARAM_CMDID,
@@ -207,6 +208,7 @@
.stop_scan_cmdid = WMI_10X_STOP_SCAN_CMDID,
.scan_chan_list_cmdid = WMI_10X_SCAN_CHAN_LIST_CMDID,
.scan_sch_prio_tbl_cmdid = WMI_CMD_UNSUPPORTED,
+ .scan_prob_req_oui_cmdid = WMI_CMD_UNSUPPORTED,
.pdev_set_regdomain_cmdid = WMI_10X_PDEV_SET_REGDOMAIN_CMDID,
.pdev_set_channel_cmdid = WMI_10X_PDEV_SET_CHANNEL_CMDID,
.pdev_set_param_cmdid = WMI_10X_PDEV_SET_PARAM_CMDID,
@@ -374,6 +376,7 @@
.stop_scan_cmdid = WMI_10_2_STOP_SCAN_CMDID,
.scan_chan_list_cmdid = WMI_10_2_SCAN_CHAN_LIST_CMDID,
.scan_sch_prio_tbl_cmdid = WMI_CMD_UNSUPPORTED,
+ .scan_prob_req_oui_cmdid = WMI_CMD_UNSUPPORTED,
.pdev_set_regdomain_cmdid = WMI_10_2_PDEV_SET_REGDOMAIN_CMDID,
.pdev_set_channel_cmdid = WMI_10_2_PDEV_SET_CHANNEL_CMDID,
.pdev_set_param_cmdid = WMI_10_2_PDEV_SET_PARAM_CMDID,
@@ -541,6 +544,7 @@
.stop_scan_cmdid = WMI_10_4_STOP_SCAN_CMDID,
.scan_chan_list_cmdid = WMI_10_4_SCAN_CHAN_LIST_CMDID,
.scan_sch_prio_tbl_cmdid = WMI_10_4_SCAN_SCH_PRIO_TBL_CMDID,
+ .scan_prob_req_oui_cmdid = WMI_CMD_UNSUPPORTED,
.pdev_set_regdomain_cmdid = WMI_10_4_PDEV_SET_REGDOMAIN_CMDID,
.pdev_set_channel_cmdid = WMI_10_4_PDEV_SET_CHANNEL_CMDID,
.pdev_set_param_cmdid = WMI_10_4_PDEV_SET_PARAM_CMDID,
@@ -1338,6 +1342,7 @@
.stop_scan_cmdid = WMI_10_2_STOP_SCAN_CMDID,
.scan_chan_list_cmdid = WMI_10_2_SCAN_CHAN_LIST_CMDID,
.scan_sch_prio_tbl_cmdid = WMI_CMD_UNSUPPORTED,
+ .scan_prob_req_oui_cmdid = WMI_CMD_UNSUPPORTED,
.pdev_set_regdomain_cmdid = WMI_10_2_PDEV_SET_REGDOMAIN_CMDID,
.pdev_set_channel_cmdid = WMI_10_2_PDEV_SET_CHANNEL_CMDID,
.pdev_set_param_cmdid = WMI_10_2_PDEV_SET_PARAM_CMDID,
@@ -4357,7 +4362,7 @@
rate_code[i],
type);
snprintf(buff, sizeof(buff), "%8d ", tpc[j]);
- strncat(tpc_value, buff, strlen(buff));
+ strlcat(tpc_value, buff, sizeof(tpc_value));
}
tpc_stats->tpc_table[type].pream_idx[i] = pream_idx;
tpc_stats->tpc_table[type].rate_code[i] = rate_code[i];
@@ -4694,7 +4699,7 @@
rate_code[i],
type, pream_idx);
snprintf(buff, sizeof(buff), "%8d ", tpc[j]);
- strncat(tpc_value, buff, strlen(buff));
+ strlcat(tpc_value, buff, sizeof(tpc_value));
}
tpc_stats->tpc_table_final[type].pream_idx[i] = pream_idx;
tpc_stats->tpc_table_final[type].rate_code[i] = rate_code[i];
@@ -5059,7 +5064,6 @@
return;
}
- memset(&ar->wmi.svc_map, 0, sizeof(ar->wmi.svc_map));
ath10k_wmi_map_svc(ar, arg.service_map, ar->wmi.svc_map,
arg.service_map_len);
@@ -5269,6 +5273,21 @@
return 0;
}
+void ath10k_wmi_event_service_available(struct ath10k *ar, struct sk_buff *skb)
+{
+ int ret;
+ struct wmi_svc_avail_ev_arg arg = {};
+
+ ret = ath10k_wmi_pull_svc_avail(ar, skb, &arg);
+ if (ret) {
+ ath10k_warn(ar, "failed to parse servive available event: %d\n",
+ ret);
+ }
+
+ ath10k_wmi_map_svc_ext(ar, arg.service_map_ext, ar->wmi.svc_map,
+ __le32_to_cpu(arg.service_map_ext_len));
+}
+
static int ath10k_wmi_event_temperature(struct ath10k *ar, struct sk_buff *skb)
{
const struct wmi_pdev_temperature_event *ev;
@@ -5465,6 +5484,9 @@
ath10k_wmi_event_ready(ar, skb);
ath10k_wmi_queue_set_coverage_class_work(ar);
break;
+ case WMI_SERVICE_AVAILABLE_EVENTID:
+ ath10k_wmi_event_service_available(ar, skb);
+ break;
default:
ath10k_warn(ar, "Unknown eventid: %d\n", id);
break;
@@ -5880,6 +5902,8 @@
struct ath10k_htc_svc_conn_req conn_req;
struct ath10k_htc_svc_conn_resp conn_resp;
+ memset(&ar->wmi.svc_map, 0, sizeof(ar->wmi.svc_map));
+
memset(&conn_req, 0, sizeof(conn_req));
memset(&conn_resp, 0, sizeof(conn_resp));
@@ -7648,7 +7672,7 @@
cmd->param = __cpu_to_le32(param);
ath10k_dbg(ar, ATH10K_DBG_WMI,
- "wmi pdev get tcp config param:%d\n", param);
+ "wmi pdev get tpc config param %d\n", param);
return skb;
}
@@ -7768,7 +7792,7 @@
len += scnprintf(buf + len, buf_len - len, "%30s %10d\n",
"HW rate", pdev->data_rc);
len += scnprintf(buf + len, buf_len - len, "%30s %10d\n",
- "Sched self tiggers", pdev->self_triggers);
+ "Sched self triggers", pdev->self_triggers);
len += scnprintf(buf + len, buf_len - len, "%30s %10d\n",
"Dropped due to SW retries",
pdev->sw_retry_failure);
diff --git a/drivers/net/wireless/ath/ath10k/wmi.h b/drivers/net/wireless/ath/ath10k/wmi.h
index 6fbc84c..16a3924 100644
--- a/drivers/net/wireless/ath/ath10k/wmi.h
+++ b/drivers/net/wireless/ath/ath10k/wmi.h
@@ -201,6 +201,8 @@
WMI_SERVICE_HTT_MGMT_TX_COMP_VALID_FLAGS,
WMI_SERVICE_HOST_DFS_CHECK_SUPPORT,
WMI_SERVICE_TPC_STATS_FINAL,
+ WMI_SERVICE_RESET_CHIP,
+ WMI_SERVICE_SPOOF_MAC_SUPPORT,
/* keep last */
WMI_SERVICE_MAX,
@@ -238,6 +240,8 @@
WMI_10X_SERVICE_MESH,
WMI_10X_SERVICE_EXT_RES_CFG_SUPPORT,
WMI_10X_SERVICE_PEER_STATS,
+ WMI_10X_SERVICE_RESET_CHIP,
+ WMI_10X_SERVICE_HTT_MGMT_TX_COMP_VALID_FLAGS,
};
enum wmi_main_service {
@@ -548,6 +552,10 @@
WMI_SERVICE_EXT_RES_CFG_SUPPORT, len);
SVCMAP(WMI_10X_SERVICE_PEER_STATS,
WMI_SERVICE_PEER_STATS, len);
+ SVCMAP(WMI_10X_SERVICE_RESET_CHIP,
+ WMI_SERVICE_RESET_CHIP, len);
+ SVCMAP(WMI_10X_SERVICE_HTT_MGMT_TX_COMP_VALID_FLAGS,
+ WMI_SERVICE_HTT_MGMT_TX_COMP_VALID_FLAGS, len);
}
static inline void wmi_main_svc_map(const __le32 *in, unsigned long *out,
@@ -783,6 +791,7 @@
u32 stop_scan_cmdid;
u32 scan_chan_list_cmdid;
u32 scan_sch_prio_tbl_cmdid;
+ u32 scan_prob_req_oui_cmdid;
u32 pdev_set_regdomain_cmdid;
u32 pdev_set_channel_cmdid;
u32 pdev_set_param_cmdid;
@@ -1183,6 +1192,7 @@
enum wmi_event_id {
WMI_SERVICE_READY_EVENTID = 0x1,
WMI_READY_EVENTID,
+ WMI_SERVICE_AVAILABLE_EVENTID,
/* Scan specific events */
WMI_SCAN_EVENTID = WMI_EVT_GRP_START_ID(WMI_GRP_SCAN),
@@ -3159,6 +3169,8 @@
u16 channels[64];
struct wmi_ssid_arg ssids[WLAN_SCAN_PARAMS_MAX_SSID];
struct wmi_bssid_arg bssids[WLAN_SCAN_PARAMS_MAX_BSSID];
+ struct wmi_mac_addr mac_addr;
+ struct wmi_mac_addr mac_mask;
};
/* scan control flags */
@@ -3182,6 +3194,12 @@
*/
#define WMI_SCAN_CONTINUE_ON_ERROR 0x80
+/* Use random MAC address for TA for Probe Request frame and add
+ * OUI specified by WMI_SCAN_PROB_REQ_OUI_CMDID to the Probe Request frame.
+ * if OUI is not set by WMI_SCAN_PROB_REQ_OUI_CMDID then the flag is ignored.
+ */
+#define WMI_SCAN_ADD_SPOOFED_MAC_IN_PROBE_REQ 0x1000
+
/* WMI_SCAN_CLASS_MASK must be the same value as IEEE80211_SCAN_CLASS_MASK */
#define WMI_SCAN_CLASS_MASK 0xFF000000
@@ -6632,6 +6650,11 @@
const struct wlan_host_mem_req *mem_reqs[WMI_MAX_MEM_REQS];
};
+struct wmi_svc_avail_ev_arg {
+ __le32 service_map_ext_len;
+ const __le32 *service_map_ext;
+};
+
struct wmi_rdy_ev_arg {
__le32 sw_version;
__le32 abi_version;
@@ -6812,6 +6835,10 @@
#define WOW_MIN_PATTERN_SIZE 1
#define WOW_MAX_PATTERN_SIZE 148
#define WOW_MAX_PKT_OFFSET 128
+#define WOW_HDR_LEN (sizeof(struct ieee80211_hdr_3addr) + \
+ sizeof(struct rfc1042_hdr))
+#define WOW_MAX_REDUCE (WOW_HDR_LEN - sizeof(struct ethhdr) - \
+ offsetof(struct ieee80211_hdr_3addr, addr1))
enum wmi_tdls_state {
WMI_TDLS_DISABLE,
@@ -7052,6 +7079,7 @@
void ath10k_wmi_event_vdev_resume_req(struct ath10k *ar, struct sk_buff *skb);
void ath10k_wmi_event_service_ready(struct ath10k *ar, struct sk_buff *skb);
int ath10k_wmi_event_ready(struct ath10k *ar, struct sk_buff *skb);
+void ath10k_wmi_event_service_available(struct ath10k *ar, struct sk_buff *skb);
int ath10k_wmi_op_pull_phyerr_ev(struct ath10k *ar, const void *phyerr_buf,
int left_len, struct wmi_phyerr_ev_arg *arg);
void ath10k_wmi_main_op_fw_stats_fill(struct ath10k *ar,
diff --git a/drivers/net/wireless/ath/ath10k/wow.c b/drivers/net/wireless/ath/ath10k/wow.c
index c4cbccb..a6b179f 100644
--- a/drivers/net/wireless/ath/ath10k/wow.c
+++ b/drivers/net/wireless/ath/ath10k/wow.c
@@ -1,5 +1,6 @@
/*
* Copyright (c) 2015-2017 Qualcomm Atheros, Inc.
+ * Copyright (c) 2018, The Linux Foundation. All rights reserved.
*
* Permission to use, copy, modify, and/or distribute this software for any
* purpose with or without fee is hereby granted, provided that the above
@@ -76,6 +77,109 @@
return 0;
}
+/**
+ * Convert a 802.3 format to a 802.11 format.
+ * +------------+-----------+--------+----------------+
+ * 802.3: |dest mac(6B)|src mac(6B)|type(2B)| body... |
+ * +------------+-----------+--------+----------------+
+ * |__ |_______ |____________ |________
+ * | | | |
+ * +--+------------+----+-----------+---------------+-----------+
+ * 802.11: |4B|dest mac(6B)| 6B |src mac(6B)| 8B |type(2B)| body... |
+ * +--+------------+----+-----------+---------------+-----------+
+ */
+static void ath10k_wow_convert_8023_to_80211
+ (struct cfg80211_pkt_pattern *new,
+ const struct cfg80211_pkt_pattern *old)
+{
+ u8 hdr_8023_pattern[ETH_HLEN] = {};
+ u8 hdr_8023_bit_mask[ETH_HLEN] = {};
+ u8 hdr_80211_pattern[WOW_HDR_LEN] = {};
+ u8 hdr_80211_bit_mask[WOW_HDR_LEN] = {};
+
+ int total_len = old->pkt_offset + old->pattern_len;
+ int hdr_80211_end_offset;
+
+ struct ieee80211_hdr_3addr *new_hdr_pattern =
+ (struct ieee80211_hdr_3addr *)hdr_80211_pattern;
+ struct ieee80211_hdr_3addr *new_hdr_mask =
+ (struct ieee80211_hdr_3addr *)hdr_80211_bit_mask;
+ struct ethhdr *old_hdr_pattern = (struct ethhdr *)hdr_8023_pattern;
+ struct ethhdr *old_hdr_mask = (struct ethhdr *)hdr_8023_bit_mask;
+ int hdr_len = sizeof(*new_hdr_pattern);
+
+ struct rfc1042_hdr *new_rfc_pattern =
+ (struct rfc1042_hdr *)(hdr_80211_pattern + hdr_len);
+ struct rfc1042_hdr *new_rfc_mask =
+ (struct rfc1042_hdr *)(hdr_80211_bit_mask + hdr_len);
+ int rfc_len = sizeof(*new_rfc_pattern);
+
+ memcpy(hdr_8023_pattern + old->pkt_offset,
+ old->pattern, ETH_HLEN - old->pkt_offset);
+ memcpy(hdr_8023_bit_mask + old->pkt_offset,
+ old->mask, ETH_HLEN - old->pkt_offset);
+
+ /* Copy destination address */
+ memcpy(new_hdr_pattern->addr1, old_hdr_pattern->h_dest, ETH_ALEN);
+ memcpy(new_hdr_mask->addr1, old_hdr_mask->h_dest, ETH_ALEN);
+
+ /* Copy source address */
+ memcpy(new_hdr_pattern->addr3, old_hdr_pattern->h_source, ETH_ALEN);
+ memcpy(new_hdr_mask->addr3, old_hdr_mask->h_source, ETH_ALEN);
+
+ /* Copy logic link type */
+ memcpy(&new_rfc_pattern->snap_type,
+ &old_hdr_pattern->h_proto,
+ sizeof(old_hdr_pattern->h_proto));
+ memcpy(&new_rfc_mask->snap_type,
+ &old_hdr_mask->h_proto,
+ sizeof(old_hdr_mask->h_proto));
+
+ /* Caculate new pkt_offset */
+ if (old->pkt_offset < ETH_ALEN)
+ new->pkt_offset = old->pkt_offset +
+ offsetof(struct ieee80211_hdr_3addr, addr1);
+ else if (old->pkt_offset < offsetof(struct ethhdr, h_proto))
+ new->pkt_offset = old->pkt_offset +
+ offsetof(struct ieee80211_hdr_3addr, addr3) -
+ offsetof(struct ethhdr, h_source);
+ else
+ new->pkt_offset = old->pkt_offset + hdr_len + rfc_len - ETH_HLEN;
+
+ /* Caculate new hdr end offset */
+ if (total_len > ETH_HLEN)
+ hdr_80211_end_offset = hdr_len + rfc_len;
+ else if (total_len > offsetof(struct ethhdr, h_proto))
+ hdr_80211_end_offset = hdr_len + rfc_len + total_len - ETH_HLEN;
+ else if (total_len > ETH_ALEN)
+ hdr_80211_end_offset = total_len - ETH_ALEN +
+ offsetof(struct ieee80211_hdr_3addr, addr3);
+ else
+ hdr_80211_end_offset = total_len +
+ offsetof(struct ieee80211_hdr_3addr, addr1);
+
+ new->pattern_len = hdr_80211_end_offset - new->pkt_offset;
+
+ memcpy((u8 *)new->pattern,
+ hdr_80211_pattern + new->pkt_offset,
+ new->pattern_len);
+ memcpy((u8 *)new->mask,
+ hdr_80211_bit_mask + new->pkt_offset,
+ new->pattern_len);
+
+ if (total_len > ETH_HLEN) {
+ /* Copy frame body */
+ memcpy((u8 *)new->pattern + new->pattern_len,
+ (void *)old->pattern + ETH_HLEN - old->pkt_offset,
+ total_len - ETH_HLEN);
+ memcpy((u8 *)new->mask + new->pattern_len,
+ (void *)old->mask + ETH_HLEN - old->pkt_offset,
+ total_len - ETH_HLEN);
+
+ new->pattern_len += total_len - ETH_HLEN;
+ }
+}
+
static int ath10k_vif_wow_set_wakeups(struct ath10k_vif *arvif,
struct cfg80211_wowlan *wowlan)
{
@@ -116,22 +220,40 @@
for (i = 0; i < wowlan->n_patterns; i++) {
u8 bitmask[WOW_MAX_PATTERN_SIZE] = {};
+ u8 ath_pattern[WOW_MAX_PATTERN_SIZE] = {};
+ u8 ath_bitmask[WOW_MAX_PATTERN_SIZE] = {};
+ struct cfg80211_pkt_pattern new_pattern = {};
+ struct cfg80211_pkt_pattern old_pattern = patterns[i];
int j;
+ new_pattern.pattern = ath_pattern;
+ new_pattern.mask = ath_bitmask;
if (patterns[i].pattern_len > WOW_MAX_PATTERN_SIZE)
continue;
-
/* convert bytemask to bitmask */
for (j = 0; j < patterns[i].pattern_len; j++)
if (patterns[i].mask[j / 8] & BIT(j % 8))
bitmask[j] = 0xff;
+ old_pattern.mask = bitmask;
+ new_pattern = old_pattern;
+
+ if (ar->wmi.rx_decap_mode == ATH10K_HW_TXRX_NATIVE_WIFI) {
+ if (patterns[i].pkt_offset < ETH_HLEN)
+ ath10k_wow_convert_8023_to_80211(&new_pattern,
+ &old_pattern);
+ else
+ new_pattern.pkt_offset += WOW_HDR_LEN - ETH_HLEN;
+ }
+
+ if (WARN_ON(new_pattern.pattern_len > WOW_MAX_PATTERN_SIZE))
+ return -EINVAL;
ret = ath10k_wmi_wow_add_pattern(ar, arvif->vdev_id,
pattern_id,
- patterns[i].pattern,
- bitmask,
- patterns[i].pattern_len,
- patterns[i].pkt_offset);
+ new_pattern.pattern,
+ new_pattern.mask,
+ new_pattern.pattern_len,
+ new_pattern.pkt_offset);
if (ret) {
ath10k_warn(ar, "failed to add pattern %i to vdev %i: %d\n",
pattern_id,
@@ -345,6 +467,12 @@
return -EINVAL;
ar->wow.wowlan_support = ath10k_wowlan_support;
+
+ if (ar->wmi.rx_decap_mode == ATH10K_HW_TXRX_NATIVE_WIFI) {
+ ar->wow.wowlan_support.pattern_max_len -= WOW_MAX_REDUCE;
+ ar->wow.wowlan_support.max_pkt_offset -= WOW_MAX_REDUCE;
+ }
+
ar->wow.wowlan_support.n_patterns = ar->wow.max_num_patterns;
ar->hw->wiphy->wowlan = &ar->wow.wowlan_support;
diff --git a/drivers/net/wireless/ath/ath6kl/debug.c b/drivers/net/wireless/ath/ath6kl/debug.c
index 0f965e9..4e94b22 100644
--- a/drivers/net/wireless/ath/ath6kl/debug.c
+++ b/drivers/net/wireless/ath/ath6kl/debug.c
@@ -645,7 +645,7 @@
len += scnprintf(buf + len, buf_len - len, "%20s %10llu\n",
"CRC Err", tgt_stats->rx_crc_err);
len += scnprintf(buf + len, buf_len - len, "%20s %10llu\n",
- "Key chache miss", tgt_stats->rx_key_cache_miss);
+ "Key cache miss", tgt_stats->rx_key_cache_miss);
len += scnprintf(buf + len, buf_len - len, "%20s %10llu\n",
"Decrypt Err", tgt_stats->rx_decrypt_err);
len += scnprintf(buf + len, buf_len - len, "%20s %10llu\n",
diff --git a/drivers/net/wireless/ath/ath6kl/main.c b/drivers/net/wireless/ath/ath6kl/main.c
index db95f85..808fb30 100644
--- a/drivers/net/wireless/ath/ath6kl/main.c
+++ b/drivers/net/wireless/ath/ath6kl/main.c
@@ -426,7 +426,7 @@
{
u8 *ies = NULL, *wpa_ie = NULL, *pos;
size_t ies_len = 0;
- struct station_info sinfo;
+ struct station_info *sinfo;
ath6kl_dbg(ATH6KL_DBG_TRC, "new station %pM aid=%d\n", mac_addr, aid);
@@ -482,16 +482,20 @@
keymgmt, ucipher, auth, apsd_info);
/* send event to application */
- memset(&sinfo, 0, sizeof(sinfo));
+ sinfo = kzalloc(sizeof(*sinfo), GFP_KERNEL);
+ if (!sinfo)
+ return;
/* TODO: sinfo.generation */
- sinfo.assoc_req_ies = ies;
- sinfo.assoc_req_ies_len = ies_len;
+ sinfo->assoc_req_ies = ies;
+ sinfo->assoc_req_ies_len = ies_len;
- cfg80211_new_sta(vif->ndev, mac_addr, &sinfo, GFP_KERNEL);
+ cfg80211_new_sta(vif->ndev, mac_addr, sinfo, GFP_KERNEL);
netif_wake_queue(vif->ndev);
+
+ kfree(sinfo);
}
void disconnect_timer_handler(struct timer_list *t)
diff --git a/drivers/net/wireless/ath/ath9k/dfs.c b/drivers/net/wireless/ath/ath9k/dfs.c
index 6fee9a4..c8844f5 100644
--- a/drivers/net/wireless/ath/ath9k/dfs.c
+++ b/drivers/net/wireless/ath/ath9k/dfs.c
@@ -41,7 +41,7 @@
/* we need at least 3 deltas / 4 samples for a reliable chirp detection */
#define NUM_DIFFS 3
-static const int FFT_NUM_SAMPLES = (NUM_DIFFS + 1);
+#define FFT_NUM_SAMPLES (NUM_DIFFS + 1)
/* Threshold for difference of delta peaks */
static const int MAX_DIFF = 2;
@@ -114,7 +114,7 @@
ath_dbg(common, DFS, "HT40: datalen=%d, num_fft_packets=%d\n",
datalen, num_fft_packets);
- if (num_fft_packets < (FFT_NUM_SAMPLES)) {
+ if (num_fft_packets < FFT_NUM_SAMPLES) {
ath_dbg(common, DFS, "not enough packets for chirp\n");
return false;
}
@@ -136,7 +136,7 @@
return false;
ath_dbg(common, DFS, "HT20: datalen=%d, num_fft_packets=%d\n",
datalen, num_fft_packets);
- if (num_fft_packets < (FFT_NUM_SAMPLES)) {
+ if (num_fft_packets < FFT_NUM_SAMPLES) {
ath_dbg(common, DFS, "not enough packets for chirp\n");
return false;
}
diff --git a/drivers/net/wireless/ath/ath9k/main.c b/drivers/net/wireless/ath/ath9k/main.c
index a3be8ad..b6663c8 100644
--- a/drivers/net/wireless/ath/ath9k/main.c
+++ b/drivers/net/wireless/ath/ath9k/main.c
@@ -2544,7 +2544,8 @@
}
static void ath9k_mgd_prepare_tx(struct ieee80211_hw *hw,
- struct ieee80211_vif *vif)
+ struct ieee80211_vif *vif,
+ u16 duration)
{
struct ath_softc *sc = hw->priv;
struct ath_common *common = ath9k_hw_common(sc->sc_ah);
diff --git a/drivers/net/wireless/ath/wcn36xx/dxe.c b/drivers/net/wireless/ath/wcn36xx/dxe.c
index 2c3b899..bd2b946 100644
--- a/drivers/net/wireless/ath/wcn36xx/dxe.c
+++ b/drivers/net/wireless/ath/wcn36xx/dxe.c
@@ -78,7 +78,6 @@
if (!cur_ctl)
goto out_fail;
- spin_lock_init(&cur_ctl->skb_lock);
cur_ctl->ctl_blk_order = i;
if (i == 0) {
ch->head_blk_ctl = cur_ctl;
@@ -275,12 +274,14 @@
return 0;
}
-static int wcn36xx_dxe_fill_skb(struct device *dev, struct wcn36xx_dxe_ctl *ctl)
+static int wcn36xx_dxe_fill_skb(struct device *dev,
+ struct wcn36xx_dxe_ctl *ctl,
+ gfp_t gfp)
{
struct wcn36xx_dxe_desc *dxe = ctl->desc;
struct sk_buff *skb;
- skb = alloc_skb(WCN36XX_PKT_SIZE, GFP_ATOMIC);
+ skb = alloc_skb(WCN36XX_PKT_SIZE, gfp);
if (skb == NULL)
return -ENOMEM;
@@ -307,7 +308,7 @@
cur_ctl = wcn_ch->head_blk_ctl;
for (i = 0; i < wcn_ch->desc_num; i++) {
- wcn36xx_dxe_fill_skb(wcn->dev, cur_ctl);
+ wcn36xx_dxe_fill_skb(wcn->dev, cur_ctl, GFP_KERNEL);
cur_ctl = cur_ctl->next;
}
@@ -367,7 +368,7 @@
spin_lock_irqsave(&ch->lock, flags);
ctl = ch->tail_blk_ctl;
do {
- if (ctl->desc->ctrl & WCN36xx_DXE_CTRL_VLD)
+ if (READ_ONCE(ctl->desc->ctrl) & WCN36xx_DXE_CTRL_VLD)
break;
if (ctl->skb) {
dma_unmap_single(wcn->dev, ctl->desc->src_addr_l,
@@ -377,18 +378,16 @@
/* Keep frame until TX status comes */
ieee80211_free_txskb(wcn->hw, ctl->skb);
}
- spin_lock(&ctl->skb_lock);
+
if (wcn->queues_stopped) {
wcn->queues_stopped = false;
ieee80211_wake_queues(wcn->hw);
}
- spin_unlock(&ctl->skb_lock);
ctl->skb = NULL;
}
ctl = ctl->next;
- } while (ctl != ch->head_blk_ctl &&
- !(ctl->desc->ctrl & WCN36xx_DXE_CTRL_VLD));
+ } while (ctl != ch->head_blk_ctl);
ch->tail_blk_ctl = ctl;
spin_unlock_irqrestore(&ch->lock, flags);
@@ -530,10 +529,10 @@
int_mask = WCN36XX_DXE_INT_CH3_MASK;
}
- while (!(dxe->ctrl & WCN36xx_DXE_CTRL_VLD)) {
+ while (!(READ_ONCE(dxe->ctrl) & WCN36xx_DXE_CTRL_VLD)) {
skb = ctl->skb;
dma_addr = dxe->dst_addr_l;
- ret = wcn36xx_dxe_fill_skb(wcn->dev, ctl);
+ ret = wcn36xx_dxe_fill_skb(wcn->dev, ctl, GFP_ATOMIC);
if (0 == ret) {
/* new skb allocation ok. Use the new one and queue
* the old one to network system.
@@ -654,8 +653,6 @@
spin_lock_irqsave(&ch->lock, flags);
ctl = ch->head_blk_ctl;
- spin_lock(&ctl->next->skb_lock);
-
/*
* If skb is not null that means that we reached the tail of the ring
* hence ring is full. Stop queues to let mac80211 back off until ring
@@ -664,11 +661,9 @@
if (NULL != ctl->next->skb) {
ieee80211_stop_queues(wcn->hw);
wcn->queues_stopped = true;
- spin_unlock(&ctl->next->skb_lock);
spin_unlock_irqrestore(&ch->lock, flags);
return -EBUSY;
}
- spin_unlock(&ctl->next->skb_lock);
ctl->skb = NULL;
desc = ctl->desc;
@@ -693,7 +688,6 @@
/* Set source address of the SKB we send */
ctl = ctl->next;
- ctl->skb = skb;
desc = ctl->desc;
if (ctl->bd_cpu_addr) {
wcn36xx_err("bd_cpu_addr cannot be NULL for skb DXE\n");
@@ -702,10 +696,16 @@
}
desc->src_addr_l = dma_map_single(wcn->dev,
- ctl->skb->data,
- ctl->skb->len,
+ skb->data,
+ skb->len,
DMA_TO_DEVICE);
+ if (dma_mapping_error(wcn->dev, desc->src_addr_l)) {
+ dev_err(wcn->dev, "unable to DMA map src_addr_l\n");
+ ret = -ENOMEM;
+ goto unlock;
+ }
+ ctl->skb = skb;
desc->dst_addr_l = ch->dxe_wq;
desc->fr_len = ctl->skb->len;
diff --git a/drivers/net/wireless/ath/wcn36xx/dxe.h b/drivers/net/wireless/ath/wcn36xx/dxe.h
index ce58096..31b81b7 100644
--- a/drivers/net/wireless/ath/wcn36xx/dxe.h
+++ b/drivers/net/wireless/ath/wcn36xx/dxe.h
@@ -422,7 +422,6 @@
unsigned int desc_phy_addr;
int ctl_blk_order;
struct sk_buff *skb;
- spinlock_t skb_lock;
void *bd_cpu_addr;
dma_addr_t bd_phy_addr;
};
diff --git a/drivers/net/wireless/ath/wcn36xx/hal.h b/drivers/net/wireless/ath/wcn36xx/hal.h
index 1829635..2aed6c2 100644
--- a/drivers/net/wireless/ath/wcn36xx/hal.h
+++ b/drivers/net/wireless/ath/wcn36xx/hal.h
@@ -88,6 +88,12 @@
/* version string max length (including NULL) */
#define WCN36XX_HAL_VERSION_LENGTH 64
+/* How many frames until we start a-mpdu TX session */
+#define WCN36XX_AMPDU_START_THRESH 20
+
+#define WCN36XX_MAX_SCAN_SSIDS 9
+#define WCN36XX_MAX_SCAN_IE_LEN 500
+
/* message types for messages exchanged between WDI and HAL */
enum wcn36xx_hal_host_msg_type {
/* Init/De-Init */
@@ -1170,7 +1176,7 @@
/* IE field */
u16 ie_len;
- u8 ie[0];
+ u8 ie[WCN36XX_MAX_SCAN_IE_LEN];
} __packed;
struct wcn36xx_hal_start_scan_offload_rsp_msg {
diff --git a/drivers/net/wireless/ath/wcn36xx/main.c b/drivers/net/wireless/ath/wcn36xx/main.c
index 69d6be5..e3b91b3 100644
--- a/drivers/net/wireless/ath/wcn36xx/main.c
+++ b/drivers/net/wireless/ath/wcn36xx/main.c
@@ -353,6 +353,19 @@
wcn36xx_dbg(WCN36XX_DBG_MAC, "mac stop\n");
+ cancel_work_sync(&wcn->scan_work);
+
+ mutex_lock(&wcn->scan_lock);
+ if (wcn->scan_req) {
+ struct cfg80211_scan_info scan_info = {
+ .aborted = true,
+ };
+
+ ieee80211_scan_completed(wcn->hw, &scan_info);
+ }
+ wcn->scan_req = NULL;
+ mutex_unlock(&wcn->scan_lock);
+
wcn36xx_debugfs_exit(wcn);
wcn36xx_smd_stop(wcn);
wcn36xx_dxe_deinit(wcn);
@@ -549,6 +562,7 @@
} else {
wcn36xx_smd_set_bsskey(wcn,
vif_priv->encrypt_type,
+ vif_priv->bss_index,
key_conf->keyidx,
key_conf->keylen,
key);
@@ -566,10 +580,13 @@
break;
case DISABLE_KEY:
if (!(IEEE80211_KEY_FLAG_PAIRWISE & key_conf->flags)) {
+ if (vif_priv->bss_index != WCN36XX_HAL_BSS_INVALID_IDX)
+ wcn36xx_smd_remove_bsskey(wcn,
+ vif_priv->encrypt_type,
+ vif_priv->bss_index,
+ key_conf->keyidx);
+
vif_priv->encrypt_type = WCN36XX_HAL_ED_NONE;
- wcn36xx_smd_remove_bsskey(wcn,
- vif_priv->encrypt_type,
- key_conf->keyidx);
} else {
sta_priv->is_data_encrypted = false;
/* do not remove key if disassociated */
@@ -670,10 +687,18 @@
wcn->scan_aborted = true;
mutex_unlock(&wcn->scan_lock);
- /* ieee80211_scan_completed will be called on FW scan indication */
- wcn36xx_smd_stop_hw_scan(wcn);
+ if (get_feat_caps(wcn->fw_feat_caps, SCAN_OFFLOAD)) {
+ /* ieee80211_scan_completed will be called on FW scan
+ * indication */
+ wcn36xx_smd_stop_hw_scan(wcn);
+ } else {
+ struct cfg80211_scan_info scan_info = {
+ .aborted = true,
+ };
- cancel_work_sync(&wcn->scan_work);
+ cancel_work_sync(&wcn->scan_work);
+ ieee80211_scan_completed(wcn->hw, &scan_info);
+ }
}
static void wcn36xx_update_allowed_rates(struct ieee80211_sta *sta,
@@ -953,6 +978,7 @@
mutex_lock(&wcn->conf_mutex);
+ vif_priv->bss_index = WCN36XX_HAL_BSS_INVALID_IDX;
list_add(&vif_priv->list, &wcn->vif_list);
wcn36xx_smd_add_sta_self(wcn, vif);
diff --git a/drivers/net/wireless/ath/wcn36xx/smd.c b/drivers/net/wireless/ath/wcn36xx/smd.c
index 8932af5..ea74f2b 100644
--- a/drivers/net/wireless/ath/wcn36xx/smd.c
+++ b/drivers/net/wireless/ath/wcn36xx/smd.c
@@ -620,9 +620,13 @@
int wcn36xx_smd_start_hw_scan(struct wcn36xx *wcn, struct ieee80211_vif *vif,
struct cfg80211_scan_request *req)
{
+ struct wcn36xx_vif *vif_priv = wcn36xx_vif_to_priv(vif);
struct wcn36xx_hal_start_scan_offload_req_msg msg_body;
int ret, i;
+ if (req->ie_len > WCN36XX_MAX_SCAN_IE_LEN)
+ return -EINVAL;
+
mutex_lock(&wcn->hal_mutex);
INIT_HAL_MSG(msg_body, WCN36XX_HAL_START_SCAN_OFFLOAD_REQ);
@@ -631,6 +635,7 @@
msg_body.max_ch_time = 100;
msg_body.scan_hidden = 1;
memcpy(msg_body.mac, vif->addr, ETH_ALEN);
+ msg_body.bss_type = vif_priv->bss_type;
msg_body.p2p_search = vif->p2p;
msg_body.num_ssid = min_t(u8, req->n_ssids, ARRAY_SIZE(msg_body.ssids));
@@ -646,6 +651,14 @@
for (i = 0; i < msg_body.num_channel; i++)
msg_body.channels[i] = req->channels[i]->hw_value;
+ msg_body.header.len -= WCN36XX_MAX_SCAN_IE_LEN;
+
+ if (req->ie_len > 0) {
+ msg_body.ie_len = req->ie_len;
+ msg_body.header.len += req->ie_len;
+ memcpy(msg_body.ie, req->ie, req->ie_len);
+ }
+
PREPARE_HAL_BUF(wcn->hal_buf, msg_body);
wcn36xx_dbg(WCN36XX_DBG_HAL,
@@ -1399,9 +1412,10 @@
bss->spectrum_mgt_enable = 0;
bss->tx_mgmt_power = 0;
bss->max_tx_power = WCN36XX_MAX_POWER(wcn);
-
bss->action = update;
+ vif_priv->bss_type = bss->bss_type;
+
wcn36xx_dbg(WCN36XX_DBG_HAL,
"hal config bss bssid %pM self_mac_addr %pM bss_type %d oper_mode %d nw_type %d\n",
bss->bssid, bss->self_mac_addr, bss->bss_type,
@@ -1446,6 +1460,10 @@
int ret = 0;
mutex_lock(&wcn->hal_mutex);
+
+ if (vif_priv->bss_index == WCN36XX_HAL_BSS_INVALID_IDX)
+ goto out;
+
INIT_HAL_MSG(msg_body, WCN36XX_HAL_DELETE_BSS_REQ);
msg_body.bss_index = vif_priv->bss_index;
@@ -1464,6 +1482,8 @@
wcn36xx_err("hal_delete_bss response failed err=%d\n", ret);
goto out;
}
+
+ vif_priv->bss_index = WCN36XX_HAL_BSS_INVALID_IDX;
out:
mutex_unlock(&wcn->hal_mutex);
return ret;
@@ -1630,6 +1650,7 @@
int wcn36xx_smd_set_bsskey(struct wcn36xx *wcn,
enum ani_ed_type enc_type,
+ u8 bssidx,
u8 keyidx,
u8 keylen,
u8 *key)
@@ -1639,7 +1660,7 @@
mutex_lock(&wcn->hal_mutex);
INIT_HAL_MSG(msg_body, WCN36XX_HAL_SET_BSSKEY_REQ);
- msg_body.bss_idx = 0;
+ msg_body.bss_idx = bssidx;
msg_body.enc_type = enc_type;
msg_body.num_keys = 1;
msg_body.keys[0].id = keyidx;
@@ -1700,6 +1721,7 @@
int wcn36xx_smd_remove_bsskey(struct wcn36xx *wcn,
enum ani_ed_type enc_type,
+ u8 bssidx,
u8 keyidx)
{
struct wcn36xx_hal_remove_bss_key_req_msg msg_body;
@@ -1707,7 +1729,7 @@
mutex_lock(&wcn->hal_mutex);
INIT_HAL_MSG(msg_body, WCN36XX_HAL_RMV_BSSKEY_REQ);
- msg_body.bss_idx = 0;
+ msg_body.bss_idx = bssidx;
msg_body.enc_type = enc_type;
msg_body.key_id = keyidx;
@@ -2132,11 +2154,13 @@
return -EIO;
}
- wcn36xx_dbg(WCN36XX_DBG_HAL, "scan indication (type %x)", rsp->type);
+ wcn36xx_dbg(WCN36XX_DBG_HAL, "scan indication (type %x)\n", rsp->type);
switch (rsp->type) {
case WCN36XX_HAL_SCAN_IND_FAILED:
+ case WCN36XX_HAL_SCAN_IND_DEQUEUED:
scan_info.aborted = true;
+ /* fall through */
case WCN36XX_HAL_SCAN_IND_COMPLETED:
mutex_lock(&wcn->scan_lock);
wcn->scan_req = NULL;
@@ -2147,7 +2171,6 @@
break;
case WCN36XX_HAL_SCAN_IND_STARTED:
case WCN36XX_HAL_SCAN_IND_FOREIGN_CHANNEL:
- case WCN36XX_HAL_SCAN_IND_DEQUEUED:
case WCN36XX_HAL_SCAN_IND_PREEMPTED:
case WCN36XX_HAL_SCAN_IND_RESTARTED:
break;
diff --git a/drivers/net/wireless/ath/wcn36xx/smd.h b/drivers/net/wireless/ath/wcn36xx/smd.h
index 8076edf..61bb8d4 100644
--- a/drivers/net/wireless/ath/wcn36xx/smd.h
+++ b/drivers/net/wireless/ath/wcn36xx/smd.h
@@ -97,6 +97,7 @@
u8 sta_index);
int wcn36xx_smd_set_bsskey(struct wcn36xx *wcn,
enum ani_ed_type enc_type,
+ u8 bssidx,
u8 keyidx,
u8 keylen,
u8 *key);
@@ -106,6 +107,7 @@
u8 sta_index);
int wcn36xx_smd_remove_bsskey(struct wcn36xx *wcn,
enum ani_ed_type enc_type,
+ u8 bssidx,
u8 keyidx);
int wcn36xx_smd_enter_bmps(struct wcn36xx *wcn, struct ieee80211_vif *vif);
int wcn36xx_smd_exit_bmps(struct wcn36xx *wcn, struct ieee80211_vif *vif);
diff --git a/drivers/net/wireless/ath/wcn36xx/txrx.c b/drivers/net/wireless/ath/wcn36xx/txrx.c
index b1768ed..a690237 100644
--- a/drivers/net/wireless/ath/wcn36xx/txrx.c
+++ b/drivers/net/wireless/ath/wcn36xx/txrx.c
@@ -273,6 +273,7 @@
bool bcast = is_broadcast_ether_addr(hdr->addr1) ||
is_multicast_ether_addr(hdr->addr1);
struct wcn36xx_tx_bd bd;
+ int ret;
memset(&bd, 0, sizeof(bd));
@@ -317,5 +318,17 @@
buff_to_be((u32 *)&bd, sizeof(bd)/sizeof(u32));
bd.tx_bd_sign = 0xbdbdbdbd;
- return wcn36xx_dxe_tx_frame(wcn, vif_priv, &bd, skb, is_low);
+ ret = wcn36xx_dxe_tx_frame(wcn, vif_priv, &bd, skb, is_low);
+ if (ret && bd.tx_comp) {
+ /* If the skb has not been transmitted,
+ * don't keep a reference to it.
+ */
+ spin_lock_irqsave(&wcn->dxe_lock, flags);
+ wcn->tx_ack_skb = NULL;
+ spin_unlock_irqrestore(&wcn->dxe_lock, flags);
+
+ ieee80211_wake_queues(wcn->hw);
+ }
+
+ return ret;
}
diff --git a/drivers/net/wireless/ath/wcn36xx/wcn36xx.h b/drivers/net/wireless/ath/wcn36xx/wcn36xx.h
index 5854adf..9343989 100644
--- a/drivers/net/wireless/ath/wcn36xx/wcn36xx.h
+++ b/drivers/net/wireless/ath/wcn36xx/wcn36xx.h
@@ -32,12 +32,6 @@
#define WLAN_NV_FILE "wlan/prima/WCNSS_qcom_wlan_nv.bin"
#define WCN36XX_AGGR_BUFFER_SIZE 64
-/* How many frames until we start a-mpdu TX session */
-#define WCN36XX_AMPDU_START_THRESH 20
-
-#define WCN36XX_MAX_SCAN_SSIDS 9
-#define WCN36XX_MAX_SCAN_IE_LEN 500
-
extern unsigned int wcn36xx_dbg_mask;
enum wcn36xx_debug_mask {
@@ -123,6 +117,7 @@
bool is_joining;
bool sta_assoc;
struct wcn36xx_hal_mac_ssid ssid;
+ enum wcn36xx_hal_bss_type bss_type;
/* Power management */
enum wcn36xx_power_state pw_state;
diff --git a/drivers/net/wireless/ath/wil6210/debugfs.c b/drivers/net/wireless/ath/wil6210/debugfs.c
index 8c90b31..11e46e4 100644
--- a/drivers/net/wireless/ath/wil6210/debugfs.c
+++ b/drivers/net/wireless/ath/wil6210/debugfs.c
@@ -1200,8 +1200,12 @@
static int wil_link_debugfs_show(struct seq_file *s, void *data)
{
struct wil6210_priv *wil = s->private;
- struct station_info sinfo;
- int i, rc;
+ struct station_info *sinfo;
+ int i, rc = 0;
+
+ sinfo = kzalloc(sizeof(*sinfo), GFP_KERNEL);
+ if (!sinfo)
+ return -ENOMEM;
for (i = 0; i < ARRAY_SIZE(wil->sta); i++) {
struct wil_sta_info *p = &wil->sta[i];
@@ -1229,19 +1233,21 @@
vif = (mid < wil->max_vifs) ? wil->vifs[mid] : NULL;
if (vif) {
- rc = wil_cid_fill_sinfo(vif, i, &sinfo);
+ rc = wil_cid_fill_sinfo(vif, i, sinfo);
if (rc)
- return rc;
+ goto out;
- seq_printf(s, " Tx_mcs = %d\n", sinfo.txrate.mcs);
- seq_printf(s, " Rx_mcs = %d\n", sinfo.rxrate.mcs);
- seq_printf(s, " SQ = %d\n", sinfo.signal);
+ seq_printf(s, " Tx_mcs = %d\n", sinfo->txrate.mcs);
+ seq_printf(s, " Rx_mcs = %d\n", sinfo->rxrate.mcs);
+ seq_printf(s, " SQ = %d\n", sinfo->signal);
} else {
seq_puts(s, " INVALID MID\n");
}
}
- return 0;
+out:
+ kfree(sinfo);
+ return rc;
}
static int wil_link_seq_open(struct inode *inode, struct file *file)
diff --git a/drivers/net/wireless/ath/wil6210/main.c b/drivers/net/wireless/ath/wil6210/main.c
index a4b413e..82aec6b 100644
--- a/drivers/net/wireless/ath/wil6210/main.c
+++ b/drivers/net/wireless/ath/wil6210/main.c
@@ -391,7 +391,7 @@
struct wil6210_priv *wil = container_of(work, struct wil6210_priv,
fw_error_worker);
struct net_device *ndev = wil->main_ndev;
- struct wireless_dev *wdev = ndev->ieee80211_ptr;
+ struct wireless_dev *wdev;
wil_dbg_misc(wil, "fw error worker\n");
@@ -399,6 +399,7 @@
wil_info(wil, "No recovery - interface is down\n");
return;
}
+ wdev = ndev->ieee80211_ptr;
/* increment @recovery_count if less then WIL6210_FW_RECOVERY_TO
* passed since last recovery attempt
diff --git a/drivers/net/wireless/ath/wil6210/wmi.c b/drivers/net/wireless/ath/wil6210/wmi.c
index a3dda9a..90de9a9 100644
--- a/drivers/net/wireless/ath/wil6210/wmi.c
+++ b/drivers/net/wireless/ath/wil6210/wmi.c
@@ -824,7 +824,7 @@
struct wireless_dev *wdev = vif_to_wdev(vif);
struct wmi_connect_event *evt = d;
int ch; /* channel number */
- struct station_info sinfo;
+ struct station_info *sinfo;
u8 *assoc_req_ie, *assoc_resp_ie;
size_t assoc_req_ielen, assoc_resp_ielen;
/* capinfo(u16) + listen_interval(u16) + IEs */
@@ -940,6 +940,7 @@
vif->bss = NULL;
} else if ((wdev->iftype == NL80211_IFTYPE_AP) ||
(wdev->iftype == NL80211_IFTYPE_P2P_GO)) {
+
if (rc) {
if (disable_ap_sme)
/* notify new_sta has failed */
@@ -947,16 +948,22 @@
goto out;
}
- memset(&sinfo, 0, sizeof(sinfo));
-
- sinfo.generation = wil->sinfo_gen++;
-
- if (assoc_req_ie) {
- sinfo.assoc_req_ies = assoc_req_ie;
- sinfo.assoc_req_ies_len = assoc_req_ielen;
+ sinfo = kzalloc(sizeof(*sinfo), GFP_KERNEL);
+ if (!sinfo) {
+ rc = -ENOMEM;
+ goto out;
}
- cfg80211_new_sta(ndev, evt->bssid, &sinfo, GFP_KERNEL);
+ sinfo->generation = wil->sinfo_gen++;
+
+ if (assoc_req_ie) {
+ sinfo->assoc_req_ies = assoc_req_ie;
+ sinfo->assoc_req_ies_len = assoc_req_ielen;
+ }
+
+ cfg80211_new_sta(ndev, evt->bssid, sinfo, GFP_KERNEL);
+
+ kfree(sinfo);
} else {
wil_err(wil, "unhandled iftype %d for CID %d\n", wdev->iftype,
evt->cid);
diff --git a/drivers/net/wireless/broadcom/b43/dma.c b/drivers/net/wireless/broadcom/b43/dma.c
index 6837064..6b0e1ec 100644
--- a/drivers/net/wireless/broadcom/b43/dma.c
+++ b/drivers/net/wireless/broadcom/b43/dma.c
@@ -1484,7 +1484,7 @@
int slot, firstused;
bool frame_succeed;
int skip;
- static u8 err_out1, err_out2;
+ static u8 err_out1;
ring = parse_cookie(dev, status->cookie, &slot);
if (unlikely(!ring))
@@ -1518,13 +1518,13 @@
}
} else {
/* More than a single header/data pair were missed.
- * Report this error once.
+ * Report this error, and reset the controller to
+ * revive operation.
*/
- if (!err_out2)
- b43dbg(dev->wl,
- "Out of order TX status report on DMA ring %d. Expected %d, but got %d\n",
- ring->index, firstused, slot);
- err_out2 = 1;
+ b43dbg(dev->wl,
+ "Out of order TX status report on DMA ring %d. Expected %d, but got %d\n",
+ ring->index, firstused, slot);
+ b43_controller_restart(dev, "Out of order TX");
return;
}
}
diff --git a/drivers/net/wireless/broadcom/b43legacy/dma.c b/drivers/net/wireless/broadcom/b43legacy/dma.c
index cfa617d..2f0c64c 100644
--- a/drivers/net/wireless/broadcom/b43legacy/dma.c
+++ b/drivers/net/wireless/broadcom/b43legacy/dma.c
@@ -1064,7 +1064,7 @@
meta->dmaaddr = map_descbuffer(ring, skb->data, skb->len, 1);
/* create a bounce buffer in zone_dma on mapping failure. */
if (b43legacy_dma_mapping_error(ring, meta->dmaaddr, skb->len, 1)) {
- bounce_skb = alloc_skb(skb->len, GFP_ATOMIC | GFP_DMA);
+ bounce_skb = alloc_skb(skb->len, GFP_KERNEL | GFP_DMA);
if (!bounce_skb) {
ring->current_slot = old_top_slot;
ring->used_slots = old_used_slots;
diff --git a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/bcmsdh.c b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/bcmsdh.c
index 0b68240..a191541 100644
--- a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/bcmsdh.c
+++ b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/bcmsdh.c
@@ -963,6 +963,7 @@
BRCMF_SDIO_DEVICE(SDIO_DEVICE_ID_BROADCOM_43340),
BRCMF_SDIO_DEVICE(SDIO_DEVICE_ID_BROADCOM_43341),
BRCMF_SDIO_DEVICE(SDIO_DEVICE_ID_BROADCOM_43362),
+ BRCMF_SDIO_DEVICE(SDIO_DEVICE_ID_BROADCOM_43364),
BRCMF_SDIO_DEVICE(SDIO_DEVICE_ID_BROADCOM_4335_4339),
BRCMF_SDIO_DEVICE(SDIO_DEVICE_ID_BROADCOM_4339),
BRCMF_SDIO_DEVICE(SDIO_DEVICE_ID_BROADCOM_43430),
diff --git a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/cfg80211.c b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/cfg80211.c
index 89b8625..f5b405c 100644
--- a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/cfg80211.c
+++ b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/cfg80211.c
@@ -2728,9 +2728,8 @@
struct brcmf_bss_info_le *bi)
{
struct wiphy *wiphy = cfg_to_wiphy(cfg);
- struct ieee80211_channel *notify_channel;
struct cfg80211_bss *bss;
- struct ieee80211_supported_band *band;
+ enum nl80211_band band;
struct brcmu_chan ch;
u16 channel;
u32 freq;
@@ -2738,7 +2737,7 @@
u16 notify_interval;
u8 *notify_ie;
size_t notify_ielen;
- s32 notify_signal;
+ struct cfg80211_inform_bss bss_data = {};
if (le32_to_cpu(bi->length) > WL_BSS_INFO_MAX) {
brcmf_err("Bss info is larger than buffer. Discarding\n");
@@ -2753,32 +2752,33 @@
channel = bi->ctl_ch;
if (channel <= CH_MAX_2G_CHANNEL)
- band = wiphy->bands[NL80211_BAND_2GHZ];
+ band = NL80211_BAND_2GHZ;
else
- band = wiphy->bands[NL80211_BAND_5GHZ];
+ band = NL80211_BAND_5GHZ;
- freq = ieee80211_channel_to_frequency(channel, band->band);
- notify_channel = ieee80211_get_channel(wiphy, freq);
+ freq = ieee80211_channel_to_frequency(channel, band);
+ bss_data.chan = ieee80211_get_channel(wiphy, freq);
+ bss_data.scan_width = NL80211_BSS_CHAN_WIDTH_20;
+ bss_data.boottime_ns = ktime_to_ns(ktime_get_boottime());
notify_capability = le16_to_cpu(bi->capability);
notify_interval = le16_to_cpu(bi->beacon_period);
notify_ie = (u8 *)bi + le16_to_cpu(bi->ie_offset);
notify_ielen = le32_to_cpu(bi->ie_length);
- notify_signal = (s16)le16_to_cpu(bi->RSSI) * 100;
+ bss_data.signal = (s16)le16_to_cpu(bi->RSSI) * 100;
brcmf_dbg(CONN, "bssid: %pM\n", bi->BSSID);
brcmf_dbg(CONN, "Channel: %d(%d)\n", channel, freq);
brcmf_dbg(CONN, "Capability: %X\n", notify_capability);
brcmf_dbg(CONN, "Beacon interval: %d\n", notify_interval);
- brcmf_dbg(CONN, "Signal: %d\n", notify_signal);
+ brcmf_dbg(CONN, "Signal: %d\n", bss_data.signal);
- bss = cfg80211_inform_bss(wiphy, notify_channel,
- CFG80211_BSS_FTYPE_UNKNOWN,
- (const u8 *)bi->BSSID,
- 0, notify_capability,
- notify_interval, notify_ie,
- notify_ielen, notify_signal,
- GFP_KERNEL);
+ bss = cfg80211_inform_bss_data(wiphy, &bss_data,
+ CFG80211_BSS_FTYPE_UNKNOWN,
+ (const u8 *)bi->BSSID,
+ 0, notify_capability,
+ notify_interval, notify_ie,
+ notify_ielen, GFP_KERNEL);
if (!bss)
return -ENOMEM;
@@ -5498,7 +5498,7 @@
static int generation;
u32 event = e->event_code;
u32 reason = e->reason;
- struct station_info sinfo;
+ struct station_info *sinfo;
brcmf_dbg(CONN, "event %s (%u), reason %d\n",
brcmf_fweh_event_name(event), event, reason);
@@ -5511,16 +5511,22 @@
if (((event == BRCMF_E_ASSOC_IND) || (event == BRCMF_E_REASSOC_IND)) &&
(reason == BRCMF_E_STATUS_SUCCESS)) {
- memset(&sinfo, 0, sizeof(sinfo));
if (!data) {
brcmf_err("No IEs present in ASSOC/REASSOC_IND");
return -EINVAL;
}
- sinfo.assoc_req_ies = data;
- sinfo.assoc_req_ies_len = e->datalen;
+
+ sinfo = kzalloc(sizeof(*sinfo), GFP_KERNEL);
+ if (!sinfo)
+ return -ENOMEM;
+
+ sinfo->assoc_req_ies = data;
+ sinfo->assoc_req_ies_len = e->datalen;
generation++;
- sinfo.generation = generation;
- cfg80211_new_sta(ndev, e->addr, &sinfo, GFP_KERNEL);
+ sinfo->generation = generation;
+ cfg80211_new_sta(ndev, e->addr, sinfo, GFP_KERNEL);
+
+ kfree(sinfo);
} else if ((event == BRCMF_E_DISASSOC_IND) ||
(event == BRCMF_E_DEAUTH_IND) ||
(event == BRCMF_E_DEAUTH)) {
@@ -6512,6 +6518,7 @@
wiphy->flags |= WIPHY_FLAG_NETNS_OK |
WIPHY_FLAG_PS_ON_BY_DEFAULT |
+ WIPHY_FLAG_HAVE_AP_SME |
WIPHY_FLAG_OFFCHAN_TX |
WIPHY_FLAG_HAS_REMAIN_ON_CHANNEL;
if (brcmf_feat_is_enabled(ifp, BRCMF_FEAT_TDLS))
diff --git a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/chip.c b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/chip.c
index 3b829fe..927d62b 100644
--- a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/chip.c
+++ b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/chip.c
@@ -689,6 +689,7 @@
case BRCM_CC_43525_CHIP_ID:
case BRCM_CC_4365_CHIP_ID:
case BRCM_CC_4366_CHIP_ID:
+ case BRCM_CC_43664_CHIP_ID:
return 0x200000;
case CY_CC_4373_CHIP_ID:
return 0x160000;
diff --git a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/firmware.c b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/firmware.c
index 94e177d..9095b83 100644
--- a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/firmware.c
+++ b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/firmware.c
@@ -634,7 +634,7 @@
struct brcmf_fw_request *
brcmf_fw_alloc_request(u32 chip, u32 chiprev,
- struct brcmf_firmware_mapping mapping_table[],
+ const struct brcmf_firmware_mapping mapping_table[],
u32 table_size, struct brcmf_fw_name *fwnames,
u32 n_fwnames)
{
diff --git a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/firmware.h b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/firmware.h
index 79a2109..2893e56 100644
--- a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/firmware.h
+++ b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/firmware.h
@@ -80,7 +80,7 @@
struct brcmf_fw_request *
brcmf_fw_alloc_request(u32 chip, u32 chiprev,
- struct brcmf_firmware_mapping mapping_table[],
+ const struct brcmf_firmware_mapping mapping_table[],
u32 table_size, struct brcmf_fw_name *fwnames,
u32 n_fwnames);
diff --git a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/msgbuf.h b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/msgbuf.h
index f93ba6b..692235d 100644
--- a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/msgbuf.h
+++ b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/msgbuf.h
@@ -27,8 +27,10 @@
#define BRCMF_H2D_MSGRING_CONTROL_SUBMIT_ITEMSIZE 40
#define BRCMF_H2D_MSGRING_RXPOST_SUBMIT_ITEMSIZE 32
#define BRCMF_D2H_MSGRING_CONTROL_COMPLETE_ITEMSIZE 24
-#define BRCMF_D2H_MSGRING_TX_COMPLETE_ITEMSIZE 16
-#define BRCMF_D2H_MSGRING_RX_COMPLETE_ITEMSIZE 32
+#define BRCMF_D2H_MSGRING_TX_COMPLETE_ITEMSIZE_PRE_V7 16
+#define BRCMF_D2H_MSGRING_TX_COMPLETE_ITEMSIZE 24
+#define BRCMF_D2H_MSGRING_RX_COMPLETE_ITEMSIZE_PRE_V7 32
+#define BRCMF_D2H_MSGRING_RX_COMPLETE_ITEMSIZE 40
#define BRCMF_H2D_TXFLOWRING_ITEMSIZE 48
struct msgbuf_buf_addr {
diff --git a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/p2p.c b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/p2p.c
index bcef208..4b2149b 100644
--- a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/p2p.c
+++ b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/p2p.c
@@ -2073,6 +2073,13 @@
}
pri_ifp = p2p->bss_idx[P2PAPI_BSSCFG_PRIMARY].vif->ifp;
+
+ /* firmware requires unique mac address for p2pdev interface */
+ if (addr && ether_addr_equal(addr, pri_ifp->mac_addr)) {
+ brcmf_err("discovery vif must be different from primary interface\n");
+ return ERR_PTR(-EINVAL);
+ }
+
brcmf_p2p_generate_bss_mac(p2p, addr);
brcmf_p2p_set_firmware(pri_ifp, p2p->dev_addr);
diff --git a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/pcie.c b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/pcie.c
index 091c191..f0797ae 100644
--- a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/pcie.c
+++ b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/pcie.c
@@ -59,7 +59,7 @@
BRCMF_FW_DEF(4366C, "brcmfmac4366c-pcie");
BRCMF_FW_DEF(4371, "brcmfmac4371-pcie");
-static struct brcmf_firmware_mapping brcmf_pcie_fwnames[] = {
+static const struct brcmf_firmware_mapping brcmf_pcie_fwnames[] = {
BRCMF_FW_ENTRY(BRCM_CC_43602_CHIP_ID, 0xFFFFFFFF, 43602),
BRCMF_FW_ENTRY(BRCM_CC_43465_CHIP_ID, 0xFFFFFFF0, 4366C),
BRCMF_FW_ENTRY(BRCM_CC_4350_CHIP_ID, 0x000000FF, 4350C),
@@ -75,6 +75,7 @@
BRCMF_FW_ENTRY(BRCM_CC_4365_CHIP_ID, 0xFFFFFFF0, 4365C),
BRCMF_FW_ENTRY(BRCM_CC_4366_CHIP_ID, 0x0000000F, 4366B),
BRCMF_FW_ENTRY(BRCM_CC_4366_CHIP_ID, 0xFFFFFFF0, 4366C),
+ BRCMF_FW_ENTRY(BRCM_CC_43664_CHIP_ID, 0xFFFFFFF0, 4366C),
BRCMF_FW_ENTRY(BRCM_CC_4371_CHIP_ID, 0xFFFFFFFF, 4371),
};
@@ -104,7 +105,8 @@
#define BRCMF_PCIE_PCIE2REG_MAILBOXMASK 0x4C
#define BRCMF_PCIE_PCIE2REG_CONFIGADDR 0x120
#define BRCMF_PCIE_PCIE2REG_CONFIGDATA 0x124
-#define BRCMF_PCIE_PCIE2REG_H2D_MAILBOX 0x140
+#define BRCMF_PCIE_PCIE2REG_H2D_MAILBOX_0 0x140
+#define BRCMF_PCIE_PCIE2REG_H2D_MAILBOX_1 0x144
#define BRCMF_PCIE2_INTA 0x01
#define BRCMF_PCIE2_INTB 0x02
@@ -134,11 +136,13 @@
BRCMF_PCIE_MB_INT_D2H3_DB0 | \
BRCMF_PCIE_MB_INT_D2H3_DB1)
+#define BRCMF_PCIE_SHARED_VERSION_7 7
#define BRCMF_PCIE_MIN_SHARED_VERSION 5
-#define BRCMF_PCIE_MAX_SHARED_VERSION 6
+#define BRCMF_PCIE_MAX_SHARED_VERSION BRCMF_PCIE_SHARED_VERSION_7
#define BRCMF_PCIE_SHARED_VERSION_MASK 0x00FF
#define BRCMF_PCIE_SHARED_DMA_INDEX 0x10000
#define BRCMF_PCIE_SHARED_DMA_2B_IDX 0x100000
+#define BRCMF_PCIE_SHARED_HOSTRDY_DB1 0x10000000
#define BRCMF_PCIE_FLAGS_HTOD_SPLIT 0x4000
#define BRCMF_PCIE_FLAGS_DTOH_SPLIT 0x8000
@@ -315,6 +319,14 @@
BRCMF_D2H_MSGRING_RX_COMPLETE_MAX_ITEM
};
+static const u32 brcmf_ring_itemsize_pre_v7[BRCMF_NROF_COMMON_MSGRINGS] = {
+ BRCMF_H2D_MSGRING_CONTROL_SUBMIT_ITEMSIZE,
+ BRCMF_H2D_MSGRING_RXPOST_SUBMIT_ITEMSIZE,
+ BRCMF_D2H_MSGRING_CONTROL_COMPLETE_ITEMSIZE,
+ BRCMF_D2H_MSGRING_TX_COMPLETE_ITEMSIZE_PRE_V7,
+ BRCMF_D2H_MSGRING_RX_COMPLETE_ITEMSIZE_PRE_V7
+};
+
static const u32 brcmf_ring_itemsize[BRCMF_NROF_COMMON_MSGRINGS] = {
BRCMF_H2D_MSGRING_CONTROL_SUBMIT_ITEMSIZE,
BRCMF_H2D_MSGRING_RXPOST_SUBMIT_ITEMSIZE,
@@ -781,6 +793,12 @@
BRCMF_PCIE_MB_INT_FN0_1);
}
+static void brcmf_pcie_hostready(struct brcmf_pciedev_info *devinfo)
+{
+ if (devinfo->shared.flags & BRCMF_PCIE_SHARED_HOSTRDY_DB1)
+ brcmf_pcie_write_reg32(devinfo,
+ BRCMF_PCIE_PCIE2REG_H2D_MAILBOX_1, 1);
+}
static irqreturn_t brcmf_pcie_quick_check_isr(int irq, void *arg)
{
@@ -923,7 +941,7 @@
brcmf_dbg(PCIE, "RING !\n");
/* Any arbitrary value will do, lets use 1 */
- brcmf_pcie_write_reg32(devinfo, BRCMF_PCIE_PCIE2REG_H2D_MAILBOX, 1);
+ brcmf_pcie_write_reg32(devinfo, BRCMF_PCIE_PCIE2REG_H2D_MAILBOX_0, 1);
return 0;
}
@@ -998,8 +1016,14 @@
struct brcmf_pcie_ringbuf *ring;
u32 size;
u32 addr;
+ const u32 *ring_itemsize_array;
- size = brcmf_ring_max_item[ring_id] * brcmf_ring_itemsize[ring_id];
+ if (devinfo->shared.version < BRCMF_PCIE_SHARED_VERSION_7)
+ ring_itemsize_array = brcmf_ring_itemsize_pre_v7;
+ else
+ ring_itemsize_array = brcmf_ring_itemsize;
+
+ size = brcmf_ring_max_item[ring_id] * ring_itemsize_array[ring_id];
dma_buf = brcmf_pcie_init_dmabuffer_for_device(devinfo, size,
tcm_ring_phys_addr + BRCMF_RING_MEM_BASE_ADDR_OFFSET,
&dma_handle);
@@ -1009,7 +1033,7 @@
addr = tcm_ring_phys_addr + BRCMF_RING_MAX_ITEM_OFFSET;
brcmf_pcie_write_tcm16(devinfo, addr, brcmf_ring_max_item[ring_id]);
addr = tcm_ring_phys_addr + BRCMF_RING_LEN_ITEMS_OFFSET;
- brcmf_pcie_write_tcm16(devinfo, addr, brcmf_ring_itemsize[ring_id]);
+ brcmf_pcie_write_tcm16(devinfo, addr, ring_itemsize_array[ring_id]);
ring = kzalloc(sizeof(*ring), GFP_KERNEL);
if (!ring) {
@@ -1018,7 +1042,7 @@
return NULL;
}
brcmf_commonring_config(&ring->commonring, brcmf_ring_max_item[ring_id],
- brcmf_ring_itemsize[ring_id], dma_buf);
+ ring_itemsize_array[ring_id], dma_buf);
ring->dma_handle = dma_handle;
ring->devinfo = devinfo;
brcmf_commonring_register_cb(&ring->commonring,
@@ -1727,6 +1751,7 @@
init_waitqueue_head(&devinfo->mbdata_resp_wait);
brcmf_pcie_intr_enable(devinfo);
+ brcmf_pcie_hostready(devinfo);
if (brcmf_attach(&devinfo->pdev->dev, devinfo->settings) == 0)
return;
@@ -1949,6 +1974,7 @@
brcmf_pcie_select_core(devinfo, BCMA_CORE_PCIE2);
brcmf_bus_change_state(bus, BRCMF_BUS_UP);
brcmf_pcie_intr_enable(devinfo);
+ brcmf_pcie_hostready(devinfo);
return 0;
}
diff --git a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c
index 1037df7..412a05b 100644
--- a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c
+++ b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c
@@ -619,7 +619,7 @@
BRCMF_FW_DEF(4356, "brcmfmac4356-sdio");
BRCMF_FW_DEF(4373, "brcmfmac4373-sdio");
-static struct brcmf_firmware_mapping brcmf_sdio_fwnames[] = {
+static const struct brcmf_firmware_mapping brcmf_sdio_fwnames[] = {
BRCMF_FW_ENTRY(BRCM_CC_43143_CHIP_ID, 0xFFFFFFFF, 43143),
BRCMF_FW_ENTRY(BRCM_CC_43241_CHIP_ID, 0x0000001F, 43241B0),
BRCMF_FW_ENTRY(BRCM_CC_43241_CHIP_ID, 0x00000020, 43241B4),
diff --git a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/usb.c b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/usb.c
index a0873ad..a4308c6 100644
--- a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/usb.c
+++ b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/usb.c
@@ -52,7 +52,7 @@
BRCMF_FW_DEF(43569, "brcmfmac43569");
BRCMF_FW_DEF(4373, "brcmfmac4373");
-static struct brcmf_firmware_mapping brcmf_usb_fwnames[] = {
+static const struct brcmf_firmware_mapping brcmf_usb_fwnames[] = {
BRCMF_FW_ENTRY(BRCM_CC_43143_CHIP_ID, 0xFFFFFFFF, 43143),
BRCMF_FW_ENTRY(BRCM_CC_43235_CHIP_ID, 0x00000008, 43236B),
BRCMF_FW_ENTRY(BRCM_CC_43236_CHIP_ID, 0x00000008, 43236B),
diff --git a/drivers/net/wireless/broadcom/brcm80211/brcmsmac/phy/phy_lcn.c b/drivers/net/wireless/broadcom/brcm80211/brcmsmac/phy/phy_lcn.c
index 93d4cde..9d830d2 100644
--- a/drivers/net/wireless/broadcom/brcm80211/brcmsmac/phy/phy_lcn.c
+++ b/drivers/net/wireless/broadcom/brcm80211/brcmsmac/phy/phy_lcn.c
@@ -3388,13 +3388,8 @@
u8 phybw40;
phybw40 = CHSPEC_IS40(pi->radio_chanspec);
- if (LCNREV_LT(pi->pubpi.phy_rev, 2)) {
- mod_phy_reg(pi, 0x4b0, (0x1 << 5), (mode) << 5);
- mod_phy_reg(pi, 0x4b1, (0x1 << 9), 0 << 9);
- } else {
- mod_phy_reg(pi, 0x4b0, (0x1 << 5), (mode) << 5);
- mod_phy_reg(pi, 0x4b1, (0x1 << 9), 0 << 9);
- }
+ mod_phy_reg(pi, 0x4b0, (0x1 << 5), (mode) << 5);
+ mod_phy_reg(pi, 0x4b1, (0x1 << 9), 0 << 9);
if (phybw40 == 0) {
mod_phy_reg((pi), 0x410,
diff --git a/drivers/net/wireless/broadcom/brcm80211/include/brcm_hw_ids.h b/drivers/net/wireless/broadcom/brcm80211/include/brcm_hw_ids.h
index 57544a3..686f7a8 100644
--- a/drivers/net/wireless/broadcom/brcm80211/include/brcm_hw_ids.h
+++ b/drivers/net/wireless/broadcom/brcm80211/include/brcm_hw_ids.h
@@ -57,6 +57,7 @@
#define BRCM_CC_43602_CHIP_ID 43602
#define BRCM_CC_4365_CHIP_ID 0x4365
#define BRCM_CC_4366_CHIP_ID 0x4366
+#define BRCM_CC_43664_CHIP_ID 43664
#define BRCM_CC_4371_CHIP_ID 0x4371
#define CY_CC_4373_CHIP_ID 0x4373
diff --git a/drivers/net/wireless/intel/ipw2x00/ipw2100.c b/drivers/net/wireless/intel/ipw2x00/ipw2100.c
index 236b524..7c4f550 100644
--- a/drivers/net/wireless/intel/ipw2x00/ipw2100.c
+++ b/drivers/net/wireless/intel/ipw2x00/ipw2100.c
@@ -3732,7 +3732,7 @@
IPW2100_ORD(ASSOCIATED_AP_PTR,
"0 if not associated, else pointer to AP table entry"),
IPW2100_ORD(AVAILABLE_AP_CNT,
- "AP's decsribed in the AP table"),
+ "AP's described in the AP table"),
IPW2100_ORD(AP_LIST_PTR, "Ptr to list of available APs"),
IPW2100_ORD(STAT_AP_ASSNS, "associations"),
IPW2100_ORD(STAT_ASSN_FAIL, "association failures"),
diff --git a/drivers/net/wireless/intel/ipw2x00/ipw2100.h b/drivers/net/wireless/intel/ipw2x00/ipw2100.h
index 1939478..ce3e35f 100644
--- a/drivers/net/wireless/intel/ipw2x00/ipw2100.h
+++ b/drivers/net/wireless/intel/ipw2x00/ipw2100.h
@@ -1009,7 +1009,7 @@
IPW_ORD_STAT_PERCENT_RETRIES, // current calculation of % missed tx retries
IPW_ORD_ASSOCIATED_AP_PTR, // If associated, this is ptr to the associated
// AP table entry. set to 0 if not associated
- IPW_ORD_AVAILABLE_AP_CNT, // # of AP's decsribed in the AP table
+ IPW_ORD_AVAILABLE_AP_CNT, // # of AP's described in the AP table
IPW_ORD_AP_LIST_PTR, // Ptr to list of available APs
IPW_ORD_STAT_AP_ASSNS, // # of associations
IPW_ORD_STAT_ASSN_FAIL, // # of association failures
diff --git a/drivers/net/wireless/intel/ipw2x00/ipw2200.c b/drivers/net/wireless/intel/ipw2x00/ipw2200.c
index 87a5e41..ba3fb1d 100644
--- a/drivers/net/wireless/intel/ipw2x00/ipw2200.c
+++ b/drivers/net/wireless/intel/ipw2x00/ipw2200.c
@@ -12012,7 +12012,7 @@
#ifdef CONFIG_IPW2200_QOS
module_param(qos_enable, int, 0444);
-MODULE_PARM_DESC(qos_enable, "enable all QoS functionalitis");
+MODULE_PARM_DESC(qos_enable, "enable all QoS functionalities");
module_param(qos_burst_enable, int, 0444);
MODULE_PARM_DESC(qos_burst_enable, "enable QoS burst mode");
diff --git a/drivers/net/wireless/intel/iwlwifi/Makefile b/drivers/net/wireless/intel/iwlwifi/Makefile
index e6205ea..4d08d78 100644
--- a/drivers/net/wireless/intel/iwlwifi/Makefile
+++ b/drivers/net/wireless/intel/iwlwifi/Makefile
@@ -13,7 +13,7 @@
iwlwifi-objs += iwl-trans.o
iwlwifi-objs += fw/notif-wait.o
iwlwifi-$(CONFIG_IWLMVM) += fw/paging.o fw/smem.o fw/init.o fw/dbg.o
-iwlwifi-$(CONFIG_IWLMVM) += fw/common_rx.o fw/nvm.o
+iwlwifi-$(CONFIG_IWLMVM) += fw/common_rx.o
iwlwifi-$(CONFIG_ACPI) += fw/acpi.o
iwlwifi-$(CONFIG_IWLWIFI_DEBUGFS) += fw/debugfs.o
diff --git a/drivers/net/wireless/intel/iwlwifi/cfg/1000.c b/drivers/net/wireless/intel/iwlwifi/cfg/1000.c
index b2573b1..5916879 100644
--- a/drivers/net/wireless/intel/iwlwifi/cfg/1000.c
+++ b/drivers/net/wireless/intel/iwlwifi/cfg/1000.c
@@ -1,6 +1,7 @@
/******************************************************************************
*
* Copyright(c) 2008 - 2014 Intel Corporation. All rights reserved.
+ * Copyright(c) 2018 Intel Corporation
*
* This program is free software; you can redistribute it and/or modify it
* under the terms of version 2 of the GNU General Public License as
@@ -27,7 +28,6 @@
#include <linux/module.h>
#include <linux/stringify.h>
#include "iwl-config.h"
-#include "iwl-csr.h"
#include "iwl-agn-hw.h"
/* Highest firmware API version supported */
@@ -91,7 +91,8 @@
.base_params = &iwl1000_base_params, \
.eeprom_params = &iwl1000_eeprom_params, \
.led_mode = IWL_LED_BLINK, \
- .max_ht_ampdu_exponent = IEEE80211_HT_MAX_AMPDU_64K
+ .max_ht_ampdu_exponent = IEEE80211_HT_MAX_AMPDU_64K, \
+ .csr = &iwl_csr_v1
const struct iwl_cfg iwl1000_bgn_cfg = {
.name = "Intel(R) Centrino(R) Wireless-N 1000 BGN",
@@ -117,7 +118,8 @@
.eeprom_params = &iwl1000_eeprom_params, \
.led_mode = IWL_LED_RF_STATE, \
.rx_with_siso_diversity = true, \
- .max_ht_ampdu_exponent = IEEE80211_HT_MAX_AMPDU_64K
+ .max_ht_ampdu_exponent = IEEE80211_HT_MAX_AMPDU_64K, \
+ .csr = &iwl_csr_v1
const struct iwl_cfg iwl100_bgn_cfg = {
.name = "Intel(R) Centrino(R) Wireless-N 100 BGN",
diff --git a/drivers/net/wireless/intel/iwlwifi/cfg/2000.c b/drivers/net/wireless/intel/iwlwifi/cfg/2000.c
index 1b32ad4..a63ca88 100644
--- a/drivers/net/wireless/intel/iwlwifi/cfg/2000.c
+++ b/drivers/net/wireless/intel/iwlwifi/cfg/2000.c
@@ -1,6 +1,7 @@
/******************************************************************************
*
* Copyright(c) 2008 - 2014 Intel Corporation. All rights reserved.
+ * Copyright(c) 2018 Intel Corporation
*
* This program is free software; you can redistribute it and/or modify it
* under the terms of version 2 of the GNU General Public License as
@@ -115,7 +116,8 @@
.base_params = &iwl2000_base_params, \
.eeprom_params = &iwl20x0_eeprom_params, \
.led_mode = IWL_LED_RF_STATE, \
- .max_ht_ampdu_exponent = IEEE80211_HT_MAX_AMPDU_64K
+ .max_ht_ampdu_exponent = IEEE80211_HT_MAX_AMPDU_64K, \
+ .csr = &iwl_csr_v1
const struct iwl_cfg iwl2000_2bgn_cfg = {
@@ -142,7 +144,8 @@
.base_params = &iwl2030_base_params, \
.eeprom_params = &iwl20x0_eeprom_params, \
.led_mode = IWL_LED_RF_STATE, \
- .max_ht_ampdu_exponent = IEEE80211_HT_MAX_AMPDU_64K
+ .max_ht_ampdu_exponent = IEEE80211_HT_MAX_AMPDU_64K, \
+ .csr = &iwl_csr_v1
const struct iwl_cfg iwl2030_2bgn_cfg = {
.name = "Intel(R) Centrino(R) Wireless-N 2230 BGN",
@@ -163,7 +166,8 @@
.eeprom_params = &iwl20x0_eeprom_params, \
.led_mode = IWL_LED_RF_STATE, \
.rx_with_siso_diversity = true, \
- .max_ht_ampdu_exponent = IEEE80211_HT_MAX_AMPDU_64K
+ .max_ht_ampdu_exponent = IEEE80211_HT_MAX_AMPDU_64K, \
+ .csr = &iwl_csr_v1
const struct iwl_cfg iwl105_bgn_cfg = {
.name = "Intel(R) Centrino(R) Wireless-N 105 BGN",
@@ -190,7 +194,8 @@
.eeprom_params = &iwl20x0_eeprom_params, \
.led_mode = IWL_LED_RF_STATE, \
.rx_with_siso_diversity = true, \
- .max_ht_ampdu_exponent = IEEE80211_HT_MAX_AMPDU_64K
+ .max_ht_ampdu_exponent = IEEE80211_HT_MAX_AMPDU_64K, \
+ .csr = &iwl_csr_v1
const struct iwl_cfg iwl135_bgn_cfg = {
.name = "Intel(R) Centrino(R) Wireless-N 135 BGN",
diff --git a/drivers/net/wireless/intel/iwlwifi/cfg/22000.c b/drivers/net/wireless/intel/iwlwifi/cfg/22000.c
index dffd9df..d4ba66a 100644
--- a/drivers/net/wireless/intel/iwlwifi/cfg/22000.c
+++ b/drivers/net/wireless/intel/iwlwifi/cfg/22000.c
@@ -54,7 +54,6 @@
#include <linux/module.h>
#include <linux/stringify.h>
#include "iwl-config.h"
-#include "iwl-agn-hw.h"
/* Highest firmware API version supported */
#define IWL_22000_UCODE_API_MAX 38
@@ -115,8 +114,6 @@
.ucode_api_max = IWL_22000_UCODE_API_MAX, \
.ucode_api_min = IWL_22000_UCODE_API_MIN, \
.device_family = IWL_DEVICE_FAMILY_22000, \
- .max_inst_size = IWL60_RTC_INST_SIZE, \
- .max_data_size = IWL60_RTC_DATA_SIZE, \
.base_params = &iwl_22000_base_params, \
.led_mode = IWL_LED_RF_STATE, \
.nvm_hw_section_num = NVM_HW_SECTION_NUM_FAMILY_22000, \
@@ -137,13 +134,13 @@
.gen2 = true, \
.nvm_type = IWL_NVM_EXT, \
.dbgc_supported = true, \
- .tx_cmd_queue_size = 32, \
.min_umac_error_event_table = 0x400000
const struct iwl_cfg iwl22000_2ac_cfg_hr = {
.name = "Intel(R) Dual Band Wireless AC 22000",
.fw_name_pre = IWL_22000_HR_FW_PRE,
IWL_DEVICE_22000,
+ .csr = &iwl_csr_v1,
.ht_params = &iwl_22000_ht_params,
.nvm_ver = IWL_22000_NVM_VERSION,
.nvm_calib_ver = IWL_22000_TX_POWER_VERSION,
@@ -154,6 +151,7 @@
.name = "Intel(R) Dual Band Wireless AC 22000",
.fw_name_pre = IWL_22000_HR_CDB_FW_PRE,
IWL_DEVICE_22000,
+ .csr = &iwl_csr_v1,
.ht_params = &iwl_22000_ht_params,
.nvm_ver = IWL_22000_NVM_VERSION,
.nvm_calib_ver = IWL_22000_TX_POWER_VERSION,
@@ -165,6 +163,7 @@
.name = "Intel(R) Dual Band Wireless AC 22000",
.fw_name_pre = IWL_22000_JF_FW_PRE,
IWL_DEVICE_22000,
+ .csr = &iwl_csr_v1,
.ht_params = &iwl_22000_ht_params,
.nvm_ver = IWL_22000_NVM_VERSION,
.nvm_calib_ver = IWL_22000_TX_POWER_VERSION,
@@ -175,6 +174,7 @@
.name = "Intel(R) Dual Band Wireless AX 22000",
.fw_name_pre = IWL_22000_HR_FW_PRE,
IWL_DEVICE_22000,
+ .csr = &iwl_csr_v1,
.ht_params = &iwl_22000_ht_params,
.nvm_ver = IWL_22000_NVM_VERSION,
.nvm_calib_ver = IWL_22000_TX_POWER_VERSION,
@@ -185,6 +185,7 @@
.name = "Intel(R) Dual Band Wireless AX 22000",
.fw_name_pre = IWL_22000_HR_F0_FW_PRE,
IWL_DEVICE_22000,
+ .csr = &iwl_csr_v1,
.ht_params = &iwl_22000_ht_params,
.nvm_ver = IWL_22000_NVM_VERSION,
.nvm_calib_ver = IWL_22000_TX_POWER_VERSION,
@@ -195,6 +196,7 @@
.name = "Intel(R) Dual Band Wireless AX 22000",
.fw_name_pre = IWL_22000_JF_B0_FW_PRE,
IWL_DEVICE_22000,
+ .csr = &iwl_csr_v1,
.ht_params = &iwl_22000_ht_params,
.nvm_ver = IWL_22000_NVM_VERSION,
.nvm_calib_ver = IWL_22000_TX_POWER_VERSION,
@@ -205,6 +207,7 @@
.name = "Intel(R) Dual Band Wireless AX 22000",
.fw_name_pre = IWL_22000_HR_A0_FW_PRE,
IWL_DEVICE_22000,
+ .csr = &iwl_csr_v1,
.ht_params = &iwl_22000_ht_params,
.nvm_ver = IWL_22000_NVM_VERSION,
.nvm_calib_ver = IWL_22000_TX_POWER_VERSION,
diff --git a/drivers/net/wireless/intel/iwlwifi/cfg/5000.c b/drivers/net/wireless/intel/iwlwifi/cfg/5000.c
index 4aa8f0a..a224f1b 100644
--- a/drivers/net/wireless/intel/iwlwifi/cfg/5000.c
+++ b/drivers/net/wireless/intel/iwlwifi/cfg/5000.c
@@ -1,6 +1,7 @@
/******************************************************************************
*
* Copyright(c) 2007 - 2014 Intel Corporation. All rights reserved.
+ * Copyright(c) 2018 Intel Corporation
*
* This program is free software; you can redistribute it and/or modify it
* under the terms of version 2 of the GNU General Public License as
@@ -28,7 +29,6 @@
#include <linux/stringify.h>
#include "iwl-config.h"
#include "iwl-agn-hw.h"
-#include "iwl-csr.h"
/* Highest firmware API version supported */
#define IWL5000_UCODE_API_MAX 5
@@ -89,7 +89,8 @@
.base_params = &iwl5000_base_params, \
.eeprom_params = &iwl5000_eeprom_params, \
.led_mode = IWL_LED_BLINK, \
- .max_ht_ampdu_exponent = IEEE80211_HT_MAX_AMPDU_64K
+ .max_ht_ampdu_exponent = IEEE80211_HT_MAX_AMPDU_64K, \
+ .csr = &iwl_csr_v1
const struct iwl_cfg iwl5300_agn_cfg = {
.name = "Intel(R) Ultimate N WiFi Link 5300 AGN",
@@ -153,7 +154,8 @@
.eeprom_params = &iwl5000_eeprom_params, \
.led_mode = IWL_LED_BLINK, \
.internal_wimax_coex = true, \
- .max_ht_ampdu_exponent = IEEE80211_HT_MAX_AMPDU_64K
+ .max_ht_ampdu_exponent = IEEE80211_HT_MAX_AMPDU_64K, \
+ .csr = &iwl_csr_v1
const struct iwl_cfg iwl5150_agn_cfg = {
.name = "Intel(R) WiMAX/WiFi Link 5150 AGN",
diff --git a/drivers/net/wireless/intel/iwlwifi/cfg/6000.c b/drivers/net/wireless/intel/iwlwifi/cfg/6000.c
index 39335b7..51cec0b 100644
--- a/drivers/net/wireless/intel/iwlwifi/cfg/6000.c
+++ b/drivers/net/wireless/intel/iwlwifi/cfg/6000.c
@@ -1,6 +1,7 @@
/******************************************************************************
*
* Copyright(c) 2008 - 2014 Intel Corporation. All rights reserved.
+ * Copyright(c) 2018 Intel Corporation
*
* This program is free software; you can redistribute it and/or modify it
* under the terms of version 2 of the GNU General Public License as
@@ -135,7 +136,8 @@
.base_params = &iwl6000_g2_base_params, \
.eeprom_params = &iwl6000_eeprom_params, \
.led_mode = IWL_LED_RF_STATE, \
- .max_ht_ampdu_exponent = IEEE80211_HT_MAX_AMPDU_64K
+ .max_ht_ampdu_exponent = IEEE80211_HT_MAX_AMPDU_64K, \
+ .csr = &iwl_csr_v1
const struct iwl_cfg iwl6005_2agn_cfg = {
.name = "Intel(R) Centrino(R) Advanced-N 6205 AGN",
@@ -189,7 +191,8 @@
.base_params = &iwl6000_g2_base_params, \
.eeprom_params = &iwl6000_eeprom_params, \
.led_mode = IWL_LED_RF_STATE, \
- .max_ht_ampdu_exponent = IEEE80211_HT_MAX_AMPDU_64K
+ .max_ht_ampdu_exponent = IEEE80211_HT_MAX_AMPDU_64K, \
+ .csr = &iwl_csr_v1
const struct iwl_cfg iwl6030_2agn_cfg = {
.name = "Intel(R) Centrino(R) Advanced-N 6230 AGN",
@@ -225,7 +228,8 @@
.base_params = &iwl6000_g2_base_params, \
.eeprom_params = &iwl6000_eeprom_params, \
.led_mode = IWL_LED_RF_STATE, \
- .max_ht_ampdu_exponent = IEEE80211_HT_MAX_AMPDU_64K
+ .max_ht_ampdu_exponent = IEEE80211_HT_MAX_AMPDU_64K, \
+ .csr = &iwl_csr_v1
const struct iwl_cfg iwl6035_2agn_cfg = {
.name = "Intel(R) Centrino(R) Advanced-N 6235 AGN",
@@ -280,7 +284,8 @@
.base_params = &iwl6000_base_params, \
.eeprom_params = &iwl6000_eeprom_params, \
.led_mode = IWL_LED_BLINK, \
- .max_ht_ampdu_exponent = IEEE80211_HT_MAX_AMPDU_64K
+ .max_ht_ampdu_exponent = IEEE80211_HT_MAX_AMPDU_64K, \
+ .csr = &iwl_csr_v1
const struct iwl_cfg iwl6000i_2agn_cfg = {
.name = "Intel(R) Centrino(R) Advanced-N 6200 AGN",
@@ -313,7 +318,8 @@
.eeprom_params = &iwl6000_eeprom_params, \
.led_mode = IWL_LED_BLINK, \
.internal_wimax_coex = true, \
- .max_ht_ampdu_exponent = IEEE80211_HT_MAX_AMPDU_64K
+ .max_ht_ampdu_exponent = IEEE80211_HT_MAX_AMPDU_64K, \
+ .csr = &iwl_csr_v1
const struct iwl_cfg iwl6050_2agn_cfg = {
.name = "Intel(R) Centrino(R) Advanced-N + WiMAX 6250 AGN",
@@ -339,7 +345,8 @@
.eeprom_params = &iwl6000_eeprom_params, \
.led_mode = IWL_LED_BLINK, \
.internal_wimax_coex = true, \
- .max_ht_ampdu_exponent = IEEE80211_HT_MAX_AMPDU_64K
+ .max_ht_ampdu_exponent = IEEE80211_HT_MAX_AMPDU_64K, \
+ .csr = &iwl_csr_v1
const struct iwl_cfg iwl6150_bgn_cfg = {
.name = "Intel(R) Centrino(R) Wireless-N + WiMAX 6150 BGN",
diff --git a/drivers/net/wireless/intel/iwlwifi/cfg/7000.c b/drivers/net/wireless/intel/iwlwifi/cfg/7000.c
index ce741be..69bfa82 100644
--- a/drivers/net/wireless/intel/iwlwifi/cfg/7000.c
+++ b/drivers/net/wireless/intel/iwlwifi/cfg/7000.c
@@ -7,7 +7,8 @@
*
* Copyright(c) 2012 - 2014 Intel Corporation. All rights reserved.
* Copyright(c) 2013 - 2014 Intel Mobile Communications GmbH
- * Copyright(c) 2015 Intel Deutschland GmbH
+ * Copyright(c) 2015 Intel Deutschland GmbH
+ * Copyright(c) 2018 Intel Corporation
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of version 2 of the GNU General Public License as
@@ -34,7 +35,8 @@
*
* Copyright(c) 2012 - 2014 Intel Corporation. All rights reserved.
* Copyright(c) 2013 - 2014 Intel Mobile Communications GmbH
- * Copyright(c) 2015 Intel Deutschland GmbH
+ * Copyright(c) 2015 Intel Deutschland GmbH
+ * Copyright(c) 2018 Intel Corporation
* All rights reserved.
*
* Redistribution and use in source and binary forms, with or without
@@ -68,7 +70,6 @@
#include <linux/module.h>
#include <linux/stringify.h>
#include "iwl-config.h"
-#include "iwl-agn-hw.h"
/* Highest firmware API version supported */
#define IWL7260_UCODE_API_MAX 17
@@ -160,14 +161,13 @@
#define IWL_DEVICE_7000_COMMON \
.device_family = IWL_DEVICE_FAMILY_7000, \
- .max_inst_size = IWL60_RTC_INST_SIZE, \
- .max_data_size = IWL60_RTC_DATA_SIZE, \
.base_params = &iwl7000_base_params, \
.led_mode = IWL_LED_RF_STATE, \
.nvm_hw_section_num = NVM_HW_SECTION_NUM_FAMILY_7000, \
.non_shared_ant = ANT_A, \
.max_ht_ampdu_exponent = IEEE80211_HT_MAX_AMPDU_64K, \
- .dccm_offset = IWL7000_DCCM_OFFSET
+ .dccm_offset = IWL7000_DCCM_OFFSET, \
+ .csr = &iwl_csr_v1
#define IWL_DEVICE_7000 \
IWL_DEVICE_7000_COMMON, \
diff --git a/drivers/net/wireless/intel/iwlwifi/cfg/8000.c b/drivers/net/wireless/intel/iwlwifi/cfg/8000.c
index 3f4d9ba..7262e97 100644
--- a/drivers/net/wireless/intel/iwlwifi/cfg/8000.c
+++ b/drivers/net/wireless/intel/iwlwifi/cfg/8000.c
@@ -7,7 +7,8 @@
*
* Copyright(c) 2014 Intel Corporation. All rights reserved.
* Copyright(c) 2014 - 2015 Intel Mobile Communications GmbH
- * Copyright(c) 2016 Intel Deutschland GmbH
+ * Copyright(c) 2016 Intel Deutschland GmbH
+ * Copyright(c) 2018 Intel Corporation
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of version 2 of the GNU General Public License as
@@ -34,6 +35,7 @@
*
* Copyright(c) 2014 Intel Corporation. All rights reserved.
* Copyright(c) 2014 - 2015 Intel Mobile Communications GmbH
+ * Copyright(c) 2018 Intel Corporation
* All rights reserved.
*
* Redistribution and use in source and binary forms, with or without
@@ -67,7 +69,6 @@
#include <linux/module.h>
#include <linux/stringify.h>
#include "iwl-config.h"
-#include "iwl-agn-hw.h"
/* Highest firmware API version supported */
#define IWL8000_UCODE_API_MAX 36
@@ -140,8 +141,6 @@
#define IWL_DEVICE_8000_COMMON \
.device_family = IWL_DEVICE_FAMILY_8000, \
- .max_inst_size = IWL60_RTC_INST_SIZE, \
- .max_data_size = IWL60_RTC_DATA_SIZE, \
.base_params = &iwl8000_base_params, \
.led_mode = IWL_LED_RF_STATE, \
.nvm_hw_section_num = NVM_HW_SECTION_NUM_FAMILY_8000, \
@@ -158,7 +157,8 @@
.apmg_not_supported = true, \
.nvm_type = IWL_NVM_EXT, \
.dbgc_supported = true, \
- .min_umac_error_event_table = 0x800000
+ .min_umac_error_event_table = 0x800000, \
+ .csr = &iwl_csr_v1
#define IWL_DEVICE_8000 \
IWL_DEVICE_8000_COMMON, \
diff --git a/drivers/net/wireless/intel/iwlwifi/cfg/9000.c b/drivers/net/wireless/intel/iwlwifi/cfg/9000.c
index e1c869a..9706624 100644
--- a/drivers/net/wireless/intel/iwlwifi/cfg/9000.c
+++ b/drivers/net/wireless/intel/iwlwifi/cfg/9000.c
@@ -54,7 +54,6 @@
#include <linux/module.h>
#include <linux/stringify.h>
#include "iwl-config.h"
-#include "iwl-agn-hw.h"
#include "fw/file.h"
/* Highest firmware API version supported */
@@ -135,8 +134,6 @@
.ucode_api_max = IWL9000_UCODE_API_MAX, \
.ucode_api_min = IWL9000_UCODE_API_MIN, \
.device_family = IWL_DEVICE_FAMILY_9000, \
- .max_inst_size = IWL60_RTC_INST_SIZE, \
- .max_data_size = IWL60_RTC_DATA_SIZE, \
.base_params = &iwl9000_base_params, \
.led_mode = IWL_LED_RF_STATE, \
.nvm_hw_section_num = NVM_HW_SECTION_NUM_FAMILY_9000, \
@@ -156,7 +153,8 @@
.rf_id = true, \
.nvm_type = IWL_NVM_EXT, \
.dbgc_supported = true, \
- .min_umac_error_event_table = 0x800000
+ .min_umac_error_event_table = 0x800000, \
+ .csr = &iwl_csr_v1
const struct iwl_cfg iwl9160_2ac_cfg = {
.name = "Intel(R) Dual Band Wireless AC 9160",
diff --git a/drivers/net/wireless/intel/iwlwifi/dvm/main.c b/drivers/net/wireless/intel/iwlwifi/dvm/main.c
index e68254e..030482b 100644
--- a/drivers/net/wireless/intel/iwlwifi/dvm/main.c
+++ b/drivers/net/wireless/intel/iwlwifi/dvm/main.c
@@ -1200,16 +1200,16 @@
return -EINVAL;
}
- if (!data->sku_cap_11n_enable && !data->sku_cap_band_24GHz_enable &&
- !data->sku_cap_band_52GHz_enable) {
+ if (!data->sku_cap_11n_enable && !data->sku_cap_band_24ghz_enable &&
+ !data->sku_cap_band_52ghz_enable) {
IWL_ERR(priv, "Invalid device sku\n");
return -EINVAL;
}
IWL_DEBUG_INFO(priv,
"Device SKU: 24GHz %s %s, 52GHz %s %s, 11.n %s %s\n",
- data->sku_cap_band_24GHz_enable ? "" : "NOT", "enabled",
- data->sku_cap_band_52GHz_enable ? "" : "NOT", "enabled",
+ data->sku_cap_band_24ghz_enable ? "" : "NOT", "enabled",
+ data->sku_cap_band_52ghz_enable ? "" : "NOT", "enabled",
data->sku_cap_11n_enable ? "" : "NOT", "enabled");
priv->hw_params.tx_chains_num =
diff --git a/drivers/net/wireless/intel/iwlwifi/fw/api/datapath.h b/drivers/net/wireless/intel/iwlwifi/fw/api/datapath.h
index a57c722..5f6e855 100644
--- a/drivers/net/wireless/intel/iwlwifi/fw/api/datapath.h
+++ b/drivers/net/wireless/intel/iwlwifi/fw/api/datapath.h
@@ -88,11 +88,6 @@
TLC_MNG_CONFIG_CMD = 0xF,
/**
- * @TLC_MNG_NOTIF_REQ_CMD: &struct iwl_tlc_notif_req_config_cmd
- */
- TLC_MNG_NOTIF_REQ_CMD = 0x10,
-
- /**
* @TLC_MNG_UPDATE_NOTIF: &struct iwl_tlc_update_notif
*/
TLC_MNG_UPDATE_NOTIF = 0xF7,
diff --git a/drivers/net/wireless/intel/iwlwifi/fw/api/nvm-reg.h b/drivers/net/wireless/intel/iwlwifi/fw/api/nvm-reg.h
index 37c57bc..8d6dc91 100644
--- a/drivers/net/wireless/intel/iwlwifi/fw/api/nvm-reg.h
+++ b/drivers/net/wireless/intel/iwlwifi/fw/api/nvm-reg.h
@@ -190,22 +190,36 @@
} __packed; /* GRP_REGULATORY_NVM_GET_INFO_GENERAL_S_VER_1 */
/**
+ * enum iwl_nvm_mac_sku_flags - flags in &iwl_nvm_get_info_sku
+ * @NVM_MAC_SKU_FLAGS_BAND_2_4_ENABLED: true if 2.4 band enabled
+ * @NVM_MAC_SKU_FLAGS_BAND_5_2_ENABLED: true if 5.2 band enabled
+ * @NVM_MAC_SKU_FLAGS_802_11N_ENABLED: true if 11n enabled
+ * @NVM_MAC_SKU_FLAGS_802_11AC_ENABLED: true if 11ac enabled
+ * @NVM_MAC_SKU_FLAGS_802_11AX_ENABLED: true if 11ax enabled
+ * @NVM_MAC_SKU_FLAGS_MIMO_DISABLED: true if MIMO disabled
+ * @NVM_MAC_SKU_FLAGS_WAPI_ENABLED: true if WAPI enabled
+ * @NVM_MAC_SKU_FLAGS_REG_CHECK_ENABLED: true if regulatory checker enabled
+ * @NVM_MAC_SKU_FLAGS_API_LOCK_ENABLED: true if API lock enabled
+ */
+enum iwl_nvm_mac_sku_flags {
+ NVM_MAC_SKU_FLAGS_BAND_2_4_ENABLED = BIT(0),
+ NVM_MAC_SKU_FLAGS_BAND_5_2_ENABLED = BIT(1),
+ NVM_MAC_SKU_FLAGS_802_11N_ENABLED = BIT(2),
+ NVM_MAC_SKU_FLAGS_802_11AC_ENABLED = BIT(3),
+ NVM_MAC_SKU_FLAGS_802_11AX_ENABLED = BIT(4),
+ NVM_MAC_SKU_FLAGS_MIMO_DISABLED = BIT(5),
+ NVM_MAC_SKU_FLAGS_WAPI_ENABLED = BIT(8),
+ NVM_MAC_SKU_FLAGS_REG_CHECK_ENABLED = BIT(14),
+ NVM_MAC_SKU_FLAGS_API_LOCK_ENABLED = BIT(15),
+};
+
+/**
* struct iwl_nvm_get_info_sku - mac information
- * @enable_24g: band 2.4G enabled
- * @enable_5g: band 5G enabled
- * @enable_11n: 11n enabled
- * @enable_11ac: 11ac enabled
- * @mimo_disable: MIMO enabled
- * @ext_crypto: Extended crypto enabled
+ * @mac_sku_flags: flags for SKU, see &enum iwl_nvm_mac_sku_flags
*/
struct iwl_nvm_get_info_sku {
- __le32 enable_24g;
- __le32 enable_5g;
- __le32 enable_11n;
- __le32 enable_11ac;
- __le32 mimo_disable;
- __le32 ext_crypto;
-} __packed; /* GRP_REGULATORY_NVM_GET_INFO_MAC_SKU_SECTION_S_VER_1 */
+ __le32 mac_sku_flags;
+} __packed; /* REGULATORY_NVM_GET_INFO_MAC_SKU_SECTION_S_VER_2 */
/**
* struct iwl_nvm_get_info_phy - phy information
@@ -243,7 +257,7 @@
struct iwl_nvm_get_info_sku mac_sku;
struct iwl_nvm_get_info_phy phy_sku;
struct iwl_nvm_get_info_regulatory regulatory;
-} __packed; /* GRP_REGULATORY_NVM_GET_INFO_CMD_RSP_S_VER_1 */
+} __packed; /* GRP_REGULATORY_NVM_GET_INFO_CMD_RSP_S_VER_2 */
/**
* struct iwl_nvm_access_complete_cmd - NVM_ACCESS commands are completed
diff --git a/drivers/net/wireless/intel/iwlwifi/fw/api/rs.h b/drivers/net/wireless/intel/iwlwifi/fw/api/rs.h
index e49a6f7..21e13a3 100644
--- a/drivers/net/wireless/intel/iwlwifi/fw/api/rs.h
+++ b/drivers/net/wireless/intel/iwlwifi/fw/api/rs.h
@@ -7,6 +7,7 @@
*
* Copyright(c) 2012 - 2014 Intel Corporation. All rights reserved.
* Copyright(c) 2017 Intel Deutschland GmbH
+ * Copyright(c) 2018 Intel Corporation
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of version 2 of the GNU General Public License as
@@ -28,6 +29,7 @@
*
* Copyright(c) 2012 - 2014 Intel Corporation. All rights reserved.
* Copyright(c) 2017 Intel Deutschland GmbH
+ * Copyright(c) 2018 Intel Corporation
* All rights reserved.
*
* Redistribution and use in source and binary forms, with or without
@@ -64,62 +66,38 @@
/**
* enum iwl_tlc_mng_cfg_flags_enum - options for TLC config flags
- * @IWL_TLC_MNG_CFG_FLAGS_CCK_MSK: CCK support
- * @IWL_TLC_MNG_CFG_FLAGS_DD_MSK: enable DD
* @IWL_TLC_MNG_CFG_FLAGS_STBC_MSK: enable STBC
* @IWL_TLC_MNG_CFG_FLAGS_LDPC_MSK: enable LDPC
- * @IWL_TLC_MNG_CFG_FLAGS_BF_MSK: enable BFER
- * @IWL_TLC_MNG_CFG_FLAGS_DCM_MSK: enable DCM
*/
enum iwl_tlc_mng_cfg_flags {
- IWL_TLC_MNG_CFG_FLAGS_CCK_MSK = BIT(0),
- IWL_TLC_MNG_CFG_FLAGS_DD_MSK = BIT(1),
- IWL_TLC_MNG_CFG_FLAGS_STBC_MSK = BIT(2),
- IWL_TLC_MNG_CFG_FLAGS_LDPC_MSK = BIT(3),
- IWL_TLC_MNG_CFG_FLAGS_BF_MSK = BIT(4),
- IWL_TLC_MNG_CFG_FLAGS_DCM_MSK = BIT(5),
+ IWL_TLC_MNG_CFG_FLAGS_STBC_MSK = BIT(0),
+ IWL_TLC_MNG_CFG_FLAGS_LDPC_MSK = BIT(1),
};
/**
* enum iwl_tlc_mng_cfg_cw - channel width options
- * @IWL_TLC_MNG_MAX_CH_WIDTH_20MHZ: 20MHZ channel
- * @IWL_TLC_MNG_MAX_CH_WIDTH_40MHZ: 40MHZ channel
- * @IWL_TLC_MNG_MAX_CH_WIDTH_80MHZ: 80MHZ channel
- * @IWL_TLC_MNG_MAX_CH_WIDTH_160MHZ: 160MHZ channel
- * @IWL_TLC_MNG_MAX_CH_WIDTH_LAST: maximum value
+ * @IWL_TLC_MNG_CH_WIDTH_20MHZ: 20MHZ channel
+ * @IWL_TLC_MNG_CH_WIDTH_40MHZ: 40MHZ channel
+ * @IWL_TLC_MNG_CH_WIDTH_80MHZ: 80MHZ channel
+ * @IWL_TLC_MNG_CH_WIDTH_160MHZ: 160MHZ channel
+ * @IWL_TLC_MNG_CH_WIDTH_LAST: maximum value
*/
enum iwl_tlc_mng_cfg_cw {
- IWL_TLC_MNG_MAX_CH_WIDTH_20MHZ,
- IWL_TLC_MNG_MAX_CH_WIDTH_40MHZ,
- IWL_TLC_MNG_MAX_CH_WIDTH_80MHZ,
- IWL_TLC_MNG_MAX_CH_WIDTH_160MHZ,
- IWL_TLC_MNG_MAX_CH_WIDTH_LAST = IWL_TLC_MNG_MAX_CH_WIDTH_160MHZ,
+ IWL_TLC_MNG_CH_WIDTH_20MHZ,
+ IWL_TLC_MNG_CH_WIDTH_40MHZ,
+ IWL_TLC_MNG_CH_WIDTH_80MHZ,
+ IWL_TLC_MNG_CH_WIDTH_160MHZ,
+ IWL_TLC_MNG_CH_WIDTH_LAST = IWL_TLC_MNG_CH_WIDTH_160MHZ,
};
/**
* enum iwl_tlc_mng_cfg_chains - possible chains
* @IWL_TLC_MNG_CHAIN_A_MSK: chain A
* @IWL_TLC_MNG_CHAIN_B_MSK: chain B
- * @IWL_TLC_MNG_CHAIN_C_MSK: chain C
*/
enum iwl_tlc_mng_cfg_chains {
IWL_TLC_MNG_CHAIN_A_MSK = BIT(0),
IWL_TLC_MNG_CHAIN_B_MSK = BIT(1),
- IWL_TLC_MNG_CHAIN_C_MSK = BIT(2),
-};
-
-/**
- * enum iwl_tlc_mng_cfg_gi - guard interval options
- * @IWL_TLC_MNG_SGI_20MHZ_MSK: enable short GI for 20MHZ
- * @IWL_TLC_MNG_SGI_40MHZ_MSK: enable short GI for 40MHZ
- * @IWL_TLC_MNG_SGI_80MHZ_MSK: enable short GI for 80MHZ
- * @IWL_TLC_MNG_SGI_160MHZ_MSK: enable short GI for 160MHZ
- */
-enum iwl_tlc_mng_cfg_gi {
- IWL_TLC_MNG_SGI_20MHZ_MSK = BIT(0),
- IWL_TLC_MNG_SGI_40MHZ_MSK = BIT(1),
- IWL_TLC_MNG_SGI_80MHZ_MSK = BIT(2),
- IWL_TLC_MNG_SGI_160MHZ_MSK = BIT(3),
};
/**
@@ -145,25 +123,7 @@
};
/**
- * enum iwl_tlc_mng_vht_he_types - VHT HE types
- * @IWL_TLC_MNG_VALID_VHT_HE_TYPES_SU: VHT HT single user
- * @IWL_TLC_MNG_VALID_VHT_HE_TYPES_SU_EXT: VHT HT single user extended
- * @IWL_TLC_MNG_VALID_VHT_HE_TYPES_MU: VHT HT multiple users
- * @IWL_TLC_MNG_VALID_VHT_HE_TYPES_TRIG_BASED: trigger based
- * @IWL_TLC_MNG_VALID_VHT_HE_TYPES_NUM: a count of possible types
- */
-enum iwl_tlc_mng_vht_he_types {
- IWL_TLC_MNG_VALID_VHT_HE_TYPES_SU = 0,
- IWL_TLC_MNG_VALID_VHT_HE_TYPES_SU_EXT,
- IWL_TLC_MNG_VALID_VHT_HE_TYPES_MU,
- IWL_TLC_MNG_VALID_VHT_HE_TYPES_TRIG_BASED,
- IWL_TLC_MNG_VALID_VHT_HE_TYPES_NUM =
- IWL_TLC_MNG_VALID_VHT_HE_TYPES_TRIG_BASED,
-
-};
-
-/**
- * enum iwl_tlc_mng_ht_rates - HT/VHT rates
+ * enum iwl_tlc_mng_ht_rates - HT/VHT/HE rates
* @IWL_TLC_MNG_HT_RATE_MCS0: index of MCS0
* @IWL_TLC_MNG_HT_RATE_MCS1: index of MCS1
* @IWL_TLC_MNG_HT_RATE_MCS2: index of MCS2
@@ -174,6 +134,8 @@
* @IWL_TLC_MNG_HT_RATE_MCS7: index of MCS7
* @IWL_TLC_MNG_HT_RATE_MCS8: index of MCS8
* @IWL_TLC_MNG_HT_RATE_MCS9: index of MCS9
+ * @IWL_TLC_MNG_HT_RATE_MCS10: index of MCS10
+ * @IWL_TLC_MNG_HT_RATE_MCS11: index of MCS11
* @IWL_TLC_MNG_HT_RATE_MAX: maximal rate for HT/VHT
*/
enum iwl_tlc_mng_ht_rates {
@@ -187,81 +149,73 @@
IWL_TLC_MNG_HT_RATE_MCS7,
IWL_TLC_MNG_HT_RATE_MCS8,
IWL_TLC_MNG_HT_RATE_MCS9,
- IWL_TLC_MNG_HT_RATE_MAX = IWL_TLC_MNG_HT_RATE_MCS9,
+ IWL_TLC_MNG_HT_RATE_MCS10,
+ IWL_TLC_MNG_HT_RATE_MCS11,
+ IWL_TLC_MNG_HT_RATE_MAX = IWL_TLC_MNG_HT_RATE_MCS11,
};
/* Maximum supported tx antennas number */
-#define MAX_RS_ANT_NUM 3
+#define MAX_NSS 2
/**
* struct tlc_config_cmd - TLC configuration
* @sta_id: station id
* @reserved1: reserved
- * @max_supp_ch_width: channel width
- * @flags: bitmask of &enum iwl_tlc_mng_cfg_flags
- * @chains: bitmask of &enum iwl_tlc_mng_cfg_chains
- * @max_supp_ss: valid values are 0-3, 0 - spatial streams are not supported
- * @valid_vht_he_types: bitmap of &enum iwl_tlc_mng_vht_he_types
- * @non_ht_supp_rates: bitmap of supported legacy rates
- * @ht_supp_rates: bitmap of supported HT/VHT rates, valid bits are 0-9
+ * @max_ch_width: max supported channel width from @enum iwl_tlc_mng_cfg_cw
* @mode: &enum iwl_tlc_mng_cfg_mode
- * @reserved2: reserved
- * @he_supp_rates: bitmap of supported HE rates
+ * @chains: bitmask of &enum iwl_tlc_mng_cfg_chains
+ * @amsdu: TX amsdu is supported
+ * @flags: bitmask of &enum iwl_tlc_mng_cfg_flags
+ * @non_ht_rates: bitmap of supported legacy rates
+ * @ht_rates: bitmap of &enum iwl_tlc_mng_ht_rates, per <nss, channel-width>
+ * pair (0 - 80mhz width and below, 1 - 160mhz).
+ * @max_mpdu_len: max MPDU length, in bytes
* @sgi_ch_width_supp: bitmap of SGI support per channel width
- * @he_gi_support: 11ax HE guard interval
- * @max_ampdu_cnt: max AMPDU size (frames count)
+ * use BIT(@enum iwl_tlc_mng_cfg_cw)
+ * @reserved2: reserved
*/
struct iwl_tlc_config_cmd {
u8 sta_id;
u8 reserved1[3];
- u8 max_supp_ch_width;
- u8 chains;
- u8 max_supp_ss;
- u8 valid_vht_he_types;
- __le16 flags;
- __le16 non_ht_supp_rates;
- __le16 ht_supp_rates[MAX_RS_ANT_NUM];
+ u8 max_ch_width;
u8 mode;
- u8 reserved2;
- __le16 he_supp_rates;
+ u8 chains;
+ u8 amsdu;
+ __le16 flags;
+ __le16 non_ht_rates;
+ __le16 ht_rates[MAX_NSS][2];
+ __le16 max_mpdu_len;
u8 sgi_ch_width_supp;
- u8 he_gi_support;
- __le32 max_ampdu_cnt;
-} __packed; /* TLC_MNG_CONFIG_CMD_API_S_VER_1 */
-
-#define IWL_TLC_NOTIF_INIT_RATE_POS 0
-#define IWL_TLC_NOTIF_INIT_RATE_MSK BIT(IWL_TLC_NOTIF_INIT_RATE_POS)
-#define IWL_TLC_NOTIF_REQ_INTERVAL (500)
+ u8 reserved2[1];
+} __packed; /* TLC_MNG_CONFIG_CMD_API_S_VER_2 */
/**
- * struct iwl_tlc_notif_req_config_cmd - request notif on specific changes
- * @sta_id: relevant station
- * @reserved1: reserved
- * @flags: bitmap of requested notifications %IWL_TLC_NOTIF_INIT_\*
- * @interval: minimum time between notifications from TLC to the driver (msec)
- * @reserved2: reserved
+ * enum iwl_tlc_update_flags - updated fields
+ * @IWL_TLC_NOTIF_FLAG_RATE: last initial rate update
+ * @IWL_TLC_NOTIF_FLAG_AMSDU: umsdu parameters update
*/
-struct iwl_tlc_notif_req_config_cmd {
- u8 sta_id;
- u8 reserved1;
- __le16 flags;
- __le16 interval;
- __le16 reserved2;
-} __packed; /* TLC_MNG_NOTIF_REQ_CMD_API_S_VER_1 */
+enum iwl_tlc_update_flags {
+ IWL_TLC_NOTIF_FLAG_RATE = BIT(0),
+ IWL_TLC_NOTIF_FLAG_AMSDU = BIT(1),
+};
/**
* struct iwl_tlc_update_notif - TLC notification from FW
* @sta_id: station id
* @reserved: reserved
* @flags: bitmap of notifications reported
- * @values: field per flag in struct iwl_tlc_notif_req_config_cmd
+ * @rate: current initial rate
+ * @amsdu_size: Max AMSDU size, in bytes
+ * @amsdu_enabled: bitmap for per-TID AMSDU enablement
*/
struct iwl_tlc_update_notif {
u8 sta_id;
- u8 reserved;
- __le16 flags;
- __le32 values[16];
-} __packed; /* TLC_MNG_UPDATE_NTFY_API_S_VER_1 */
+ u8 reserved[3];
+ __le32 flags;
+ __le32 rate;
+ __le32 amsdu_size;
+ __le32 amsdu_enabled;
+} __packed; /* TLC_MNG_UPDATE_NTFY_API_S_VER_2 */
/**
* enum iwl_tlc_debug_flags - debug options
diff --git a/drivers/net/wireless/intel/iwlwifi/fw/api/txq.h b/drivers/net/wireless/intel/iwlwifi/fw/api/txq.h
index dfa111b..6ac240b 100644
--- a/drivers/net/wireless/intel/iwlwifi/fw/api/txq.h
+++ b/drivers/net/wireless/intel/iwlwifi/fw/api/txq.h
@@ -131,6 +131,8 @@
TX_QUEUE_CFG_TFD_SHORT_FORMAT = BIT(1),
};
+#define IWL_DEFAULT_QUEUE_SIZE 256
+#define IWL_MGMT_QUEUE_SIZE 16
/**
* struct iwl_tx_queue_cfg_cmd - txq hw scheduler config command
* @sta_id: station id
diff --git a/drivers/net/wireless/intel/iwlwifi/fw/dbg.h b/drivers/net/wireless/intel/iwlwifi/fw/dbg.h
index 72259bff..507d9a4 100644
--- a/drivers/net/wireless/intel/iwlwifi/fw/dbg.h
+++ b/drivers/net/wireless/intel/iwlwifi/fw/dbg.h
@@ -227,4 +227,40 @@
cancel_delayed_work_sync(&fwrt->dump.wk);
}
+#ifdef CONFIG_IWLWIFI_DEBUGFS
+static inline void iwl_fw_cancel_timestamp(struct iwl_fw_runtime *fwrt)
+{
+ fwrt->timestamp.delay = 0;
+ cancel_delayed_work_sync(&fwrt->timestamp.wk);
+}
+
+void iwl_fw_trigger_timestamp(struct iwl_fw_runtime *fwrt, u32 delay);
+
+static inline void iwl_fw_suspend_timestamp(struct iwl_fw_runtime *fwrt)
+{
+ cancel_delayed_work_sync(&fwrt->timestamp.wk);
+}
+
+static inline void iwl_fw_resume_timestamp(struct iwl_fw_runtime *fwrt)
+{
+ if (!fwrt->timestamp.delay)
+ return;
+
+ schedule_delayed_work(&fwrt->timestamp.wk,
+ round_jiffies_relative(fwrt->timestamp.delay));
+}
+
+#else
+
+static inline void iwl_fw_cancel_timestamp(struct iwl_fw_runtime *fwrt) {}
+
+static inline void iwl_fw_trigger_timestamp(struct iwl_fw_runtime *fwrt,
+ u32 delay) {}
+
+static inline void iwl_fw_suspend_timestamp(struct iwl_fw_runtime *fwrt) {}
+
+static inline void iwl_fw_resume_timestamp(struct iwl_fw_runtime *fwrt) {}
+
+#endif /* CONFIG_IWLWIFI_DEBUGFS */
+
#endif /* __iwl_fw_dbg_h__ */
diff --git a/drivers/net/wireless/intel/iwlwifi/fw/debugfs.c b/drivers/net/wireless/intel/iwlwifi/fw/debugfs.c
index 8f005cd..8ba5a60 100644
--- a/drivers/net/wireless/intel/iwlwifi/fw/debugfs.c
+++ b/drivers/net/wireless/intel/iwlwifi/fw/debugfs.c
@@ -64,6 +64,7 @@
*****************************************************************************/
#include "api/commands.h"
#include "debugfs.h"
+#include "dbg.h"
#define FWRT_DEBUGFS_READ_FILE_OPS(name) \
static ssize_t iwl_dbgfs_##name##_read(struct iwl_fw_runtime *fwrt, \
diff --git a/drivers/net/wireless/intel/iwlwifi/fw/debugfs.h b/drivers/net/wireless/intel/iwlwifi/fw/debugfs.h
index d93f6a4..cbbfa8e 100644
--- a/drivers/net/wireless/intel/iwlwifi/fw/debugfs.h
+++ b/drivers/net/wireless/intel/iwlwifi/fw/debugfs.h
@@ -69,28 +69,6 @@
int iwl_fwrt_dbgfs_register(struct iwl_fw_runtime *fwrt,
struct dentry *dbgfs_dir);
-static inline void iwl_fw_cancel_timestamp(struct iwl_fw_runtime *fwrt)
-{
- fwrt->timestamp.delay = 0;
- cancel_delayed_work_sync(&fwrt->timestamp.wk);
-}
-
-static inline void iwl_fw_suspend_timestamp(struct iwl_fw_runtime *fwrt)
-{
- cancel_delayed_work_sync(&fwrt->timestamp.wk);
-}
-
-static inline void iwl_fw_resume_timestamp(struct iwl_fw_runtime *fwrt)
-{
- if (!fwrt->timestamp.delay)
- return;
-
- schedule_delayed_work(&fwrt->timestamp.wk,
- round_jiffies_relative(fwrt->timestamp.delay));
-}
-
-void iwl_fw_trigger_timestamp(struct iwl_fw_runtime *fwrt, u32 delay);
-
#else
static inline int iwl_fwrt_dbgfs_register(struct iwl_fw_runtime *fwrt,
struct dentry *dbgfs_dir)
@@ -98,13 +76,4 @@
return 0;
}
-static inline void iwl_fw_cancel_timestamp(struct iwl_fw_runtime *fwrt) {}
-
-static inline void iwl_fw_suspend_timestamp(struct iwl_fw_runtime *fwrt) {}
-
-static inline void iwl_fw_resume_timestamp(struct iwl_fw_runtime *fwrt) {}
-
-static inline void iwl_fw_trigger_timestamp(struct iwl_fw_runtime *fwrt,
- u32 delay) {}
-
#endif /* CONFIG_IWLWIFI_DEBUGFS */
diff --git a/drivers/net/wireless/intel/iwlwifi/fw/file.h b/drivers/net/wireless/intel/iwlwifi/fw/file.h
index 9b2805e..9d939cb 100644
--- a/drivers/net/wireless/intel/iwlwifi/fw/file.h
+++ b/drivers/net/wireless/intel/iwlwifi/fw/file.h
@@ -8,6 +8,7 @@
* Copyright(c) 2008 - 2014 Intel Corporation. All rights reserved.
* Copyright(c) 2013 - 2015 Intel Mobile Communications GmbH
* Copyright(c) 2016 - 2017 Intel Deutschland GmbH
+ * Copyright(c) 2018 Intel Corporation
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of version 2 of the GNU General Public License as
@@ -35,6 +36,7 @@
* Copyright(c) 2005 - 2014 Intel Corporation. All rights reserved.
* Copyright(c) 2013 - 2015 Intel Mobile Communications GmbH
* Copyright(c) 2016 - 2017 Intel Deutschland GmbH
+ * Copyright(c) 2018 Intel Corporation
* All rights reserved.
*
* Redistribution and use in source and binary forms, with or without
@@ -143,6 +145,7 @@
IWL_UCODE_TLV_FW_DBG_TRIGGER = 40,
IWL_UCODE_TLV_FW_GSCAN_CAPA = 50,
IWL_UCODE_TLV_FW_MEM_SEG = 51,
+ IWL_UCODE_TLV_IML = 52,
};
struct iwl_ucode_tlv {
diff --git a/drivers/net/wireless/intel/iwlwifi/fw/img.h b/drivers/net/wireless/intel/iwlwifi/fw/img.h
index b23ffe1..f491238 100644
--- a/drivers/net/wireless/intel/iwlwifi/fw/img.h
+++ b/drivers/net/wireless/intel/iwlwifi/fw/img.h
@@ -8,6 +8,7 @@
* Copyright(c) 2008 - 2014 Intel Corporation. All rights reserved.
* Copyright(c) 2013 - 2015 Intel Mobile Communications GmbH
* Copyright(c) 2016 Intel Deutschland GmbH
+ * Copyright(c) 2018 Intel Corporation
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of version 2 of the GNU General Public License as
@@ -35,6 +36,7 @@
* Copyright(c) 2005 - 2014 Intel Corporation. All rights reserved.
* Copyright(c) 2013 - 2015 Intel Mobile Communications GmbH
* Copyright(c) 2016 Intel Deutschland GmbH
+ * Copyright(c) 2018 Intel Corporation
* All rights reserved.
*
* Redistribution and use in source and binary forms, with or without
@@ -241,6 +243,8 @@
* @ucode_ver: ucode version from the ucode file
* @fw_version: firmware version string
* @img: ucode image like ucode_rt, ucode_init, ucode_wowlan.
+ * @iml_len: length of the image loader image
+ * @iml: image loader fw image
* @ucode_capa: capabilities parsed from the ucode file.
* @enhance_sensitivity_table: device can do enhanced sensitivity.
* @init_evtlog_ptr: event log offset for init ucode.
@@ -267,6 +271,8 @@
/* ucode images */
struct fw_img img[IWL_UCODE_TYPE_MAX];
+ size_t iml_len;
+ u8 *iml;
struct iwl_ucode_capabilities ucode_capa;
bool enhance_sensitivity_table;
diff --git a/drivers/net/wireless/intel/iwlwifi/fw/nvm.c b/drivers/net/wireless/intel/iwlwifi/fw/nvm.c
deleted file mode 100644
index bd2e1fb..0000000
--- a/drivers/net/wireless/intel/iwlwifi/fw/nvm.c
+++ /dev/null
@@ -1,162 +0,0 @@
-/******************************************************************************
- *
- * This file is provided under a dual BSD/GPLv2 license. When using or
- * redistributing this file, you may do so under either license.
- *
- * GPL LICENSE SUMMARY
- *
- * Copyright(c) 2012 - 2014 Intel Corporation. All rights reserved.
- * Copyright(c) 2013 - 2015 Intel Mobile Communications GmbH
- * Copyright(c) 2016 - 2017 Intel Deutschland GmbH
- *
- * This program is free software; you can redistribute it and/or modify
- * it under the terms of version 2 of the GNU General Public License as
- * published by the Free Software Foundation.
- *
- * This program is distributed in the hope that it will be useful, but
- * WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
- * General Public License for more details.
- *
- * You should have received a copy of the GNU General Public License
- * along with this program; if not, write to the Free Software
- * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110,
- * USA
- *
- * The full GNU General Public License is included in this distribution
- * in the file called COPYING.
- *
- * Contact Information:
- * Intel Linux Wireless <linuxwifi@intel.com>
- * Intel Corporation, 5200 N.E. Elam Young Parkway, Hillsboro, OR 97124-6497
- *
- * BSD LICENSE
- *
- * Copyright(c) 2012 - 2014 Intel Corporation. All rights reserved.
- * Copyright(c) 2013 - 2015 Intel Mobile Communications GmbH
- * Copyright(c) 2016 - 2017 Intel Deutschland GmbH
- * All rights reserved.
- *
- * Redistribution and use in source and binary forms, with or without
- * modification, are permitted provided that the following conditions
- * are met:
- *
- * * Redistributions of source code must retain the above copyright
- * notice, this list of conditions and the following disclaimer.
- * * Redistributions in binary form must reproduce the above copyright
- * notice, this list of conditions and the following disclaimer in
- * the documentation and/or other materials provided with the
- * distribution.
- * * Neither the name Intel Corporation nor the names of its
- * contributors may be used to endorse or promote products derived
- * from this software without specific prior written permission.
- *
- * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
- * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
- * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
- * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
- * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
- * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
- * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
- * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
- * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
- * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
- * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
- *
- *****************************************************************************/
-#include "iwl-drv.h"
-#include "runtime.h"
-#include "fw/api/nvm-reg.h"
-#include "fw/api/commands.h"
-#include "iwl-nvm-parse.h"
-
-struct iwl_nvm_data *iwl_fw_get_nvm(struct iwl_fw_runtime *fwrt)
-{
- struct iwl_nvm_get_info cmd = {};
- struct iwl_nvm_get_info_rsp *rsp;
- struct iwl_trans *trans = fwrt->trans;
- struct iwl_nvm_data *nvm;
- struct iwl_host_cmd hcmd = {
- .flags = CMD_WANT_SKB | CMD_SEND_IN_RFKILL,
- .data = { &cmd, },
- .len = { sizeof(cmd) },
- .id = WIDE_ID(REGULATORY_AND_NVM_GROUP, NVM_GET_INFO)
- };
- int ret;
- bool lar_fw_supported = !iwlwifi_mod_params.lar_disable &&
- fw_has_capa(&fwrt->fw->ucode_capa,
- IWL_UCODE_TLV_CAPA_LAR_SUPPORT);
-
- ret = iwl_trans_send_cmd(trans, &hcmd);
- if (ret)
- return ERR_PTR(ret);
-
- if (WARN(iwl_rx_packet_payload_len(hcmd.resp_pkt) != sizeof(*rsp),
- "Invalid payload len in NVM response from FW %d",
- iwl_rx_packet_payload_len(hcmd.resp_pkt))) {
- ret = -EINVAL;
- goto out;
- }
-
- rsp = (void *)hcmd.resp_pkt->data;
- if (le32_to_cpu(rsp->general.flags) & NVM_GENERAL_FLAGS_EMPTY_OTP)
- IWL_INFO(fwrt, "OTP is empty\n");
-
- nvm = kzalloc(sizeof(*nvm) +
- sizeof(struct ieee80211_channel) * IWL_NUM_CHANNELS,
- GFP_KERNEL);
- if (!nvm) {
- ret = -ENOMEM;
- goto out;
- }
-
- iwl_set_hw_address_from_csr(trans, nvm);
- /* TODO: if platform NVM has MAC address - override it here */
-
- if (!is_valid_ether_addr(nvm->hw_addr)) {
- IWL_ERR(fwrt, "no valid mac address was found\n");
- ret = -EINVAL;
- goto err_free;
- }
-
- IWL_INFO(trans, "base HW address: %pM\n", nvm->hw_addr);
-
- /* Initialize general data */
- nvm->nvm_version = le16_to_cpu(rsp->general.nvm_version);
-
- /* Initialize MAC sku data */
- nvm->sku_cap_11ac_enable =
- le32_to_cpu(rsp->mac_sku.enable_11ac);
- nvm->sku_cap_11n_enable =
- le32_to_cpu(rsp->mac_sku.enable_11n);
- nvm->sku_cap_band_24GHz_enable =
- le32_to_cpu(rsp->mac_sku.enable_24g);
- nvm->sku_cap_band_52GHz_enable =
- le32_to_cpu(rsp->mac_sku.enable_5g);
- nvm->sku_cap_mimo_disabled =
- le32_to_cpu(rsp->mac_sku.mimo_disable);
-
- /* Initialize PHY sku data */
- nvm->valid_tx_ant = (u8)le32_to_cpu(rsp->phy_sku.tx_chains);
- nvm->valid_rx_ant = (u8)le32_to_cpu(rsp->phy_sku.rx_chains);
-
- /* Initialize regulatory data */
- nvm->lar_enabled =
- le32_to_cpu(rsp->regulatory.lar_enabled) && lar_fw_supported;
-
- iwl_init_sbands(trans->dev, trans->cfg, nvm,
- rsp->regulatory.channel_profile,
- nvm->valid_tx_ant & fwrt->fw->valid_tx_ant,
- nvm->valid_rx_ant & fwrt->fw->valid_rx_ant,
- nvm->lar_enabled, false);
-
- iwl_free_resp(&hcmd);
- return nvm;
-
-err_free:
- kfree(nvm);
-out:
- iwl_free_resp(&hcmd);
- return ERR_PTR(ret);
-}
-IWL_EXPORT_SYMBOL(iwl_fw_get_nvm);
diff --git a/drivers/net/wireless/intel/iwlwifi/fw/paging.c b/drivers/net/wireless/intel/iwlwifi/fw/paging.c
index 1fec8e3..9b8dd7f 100644
--- a/drivers/net/wireless/intel/iwlwifi/fw/paging.c
+++ b/drivers/net/wireless/intel/iwlwifi/fw/paging.c
@@ -8,6 +8,7 @@
* Copyright(c) 2012 - 2014 Intel Corporation. All rights reserved.
* Copyright(c) 2013 - 2015 Intel Mobile Communications GmbH
* Copyright(c) 2016 - 2017 Intel Deutschland GmbH
+ * Copyright(c) 2018 Intel Corporation
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of version 2 of the GNU General Public License as
@@ -30,6 +31,7 @@
* Copyright(c) 2012 - 2014 Intel Corporation. All rights reserved.
* Copyright(c) 2013 - 2015 Intel Mobile Communications GmbH
* Copyright(c) 2016 - 2017 Intel Deutschland GmbH
+ * Copyright(c) 2018 Intel Corporation
* All rights reserved.
*
* Redistribution and use in source and binary forms, with or without
@@ -163,7 +165,7 @@
static int iwl_fill_paging_mem(struct iwl_fw_runtime *fwrt,
const struct fw_img *image)
{
- int sec_idx, idx;
+ int sec_idx, idx, ret;
u32 offset = 0;
/*
@@ -190,17 +192,23 @@
*/
if (sec_idx >= image->num_sec - 1) {
IWL_ERR(fwrt, "Paging: Missing CSS and/or paging sections\n");
- iwl_free_fw_paging(fwrt);
- return -EINVAL;
+ ret = -EINVAL;
+ goto err;
}
/* copy the CSS block to the dram */
IWL_DEBUG_FW(fwrt, "Paging: load paging CSS to FW, sec = %d\n",
sec_idx);
+ if (image->sec[sec_idx].len > fwrt->fw_paging_db[0].fw_paging_size) {
+ IWL_ERR(fwrt, "CSS block is larger than paging size\n");
+ ret = -EINVAL;
+ goto err;
+ }
+
memcpy(page_address(fwrt->fw_paging_db[0].fw_paging_block),
image->sec[sec_idx].data,
- fwrt->fw_paging_db[0].fw_paging_size);
+ image->sec[sec_idx].len);
dma_sync_single_for_device(fwrt->trans->dev,
fwrt->fw_paging_db[0].fw_paging_phys,
fwrt->fw_paging_db[0].fw_paging_size,
@@ -213,17 +221,39 @@
sec_idx++;
/*
- * copy the paging blocks to the dram
- * loop index start from 1 since that CSS block already copied to dram
- * and CSS index is 0.
- * loop stop at num_of_paging_blk since that last block is not full.
+ * Copy the paging blocks to the dram. The loop index starts
+ * from 1 since the CSS block (index 0) was already copied to
+ * dram. We use num_of_paging_blk + 1 to account for that.
*/
- for (idx = 1; idx < fwrt->num_of_paging_blk; idx++) {
+ for (idx = 1; idx < fwrt->num_of_paging_blk + 1; idx++) {
struct iwl_fw_paging *block = &fwrt->fw_paging_db[idx];
+ int remaining = image->sec[sec_idx].len - offset;
+ int len = block->fw_paging_size;
+
+ /*
+ * For the last block, we copy all that is remaining,
+ * for all other blocks, we copy fw_paging_size at a
+ * time. */
+ if (idx == fwrt->num_of_paging_blk) {
+ len = remaining;
+ if (remaining !=
+ fwrt->num_of_pages_in_last_blk * FW_PAGING_SIZE) {
+ IWL_ERR(fwrt,
+ "Paging: last block contains more data than expected %d\n",
+ remaining);
+ ret = -EINVAL;
+ goto err;
+ }
+ } else if (block->fw_paging_size > remaining) {
+ IWL_ERR(fwrt,
+ "Paging: not enough data in other in block %d (%d)\n",
+ idx, remaining);
+ ret = -EINVAL;
+ goto err;
+ }
memcpy(page_address(block->fw_paging_block),
- image->sec[sec_idx].data + offset,
- block->fw_paging_size);
+ image->sec[sec_idx].data + offset, len);
dma_sync_single_for_device(fwrt->trans->dev,
block->fw_paging_phys,
block->fw_paging_size,
@@ -231,30 +261,16 @@
IWL_DEBUG_FW(fwrt,
"Paging: copied %d paging bytes to block %d\n",
- fwrt->fw_paging_db[idx].fw_paging_size,
- idx);
+ len, idx);
- offset += fwrt->fw_paging_db[idx].fw_paging_size;
- }
-
- /* copy the last paging block */
- if (fwrt->num_of_pages_in_last_blk > 0) {
- struct iwl_fw_paging *block = &fwrt->fw_paging_db[idx];
-
- memcpy(page_address(block->fw_paging_block),
- image->sec[sec_idx].data + offset,
- FW_PAGING_SIZE * fwrt->num_of_pages_in_last_blk);
- dma_sync_single_for_device(fwrt->trans->dev,
- block->fw_paging_phys,
- block->fw_paging_size,
- DMA_BIDIRECTIONAL);
-
- IWL_DEBUG_FW(fwrt,
- "Paging: copied %d pages in the last block %d\n",
- fwrt->num_of_pages_in_last_blk, idx);
+ offset += block->fw_paging_size;
}
return 0;
+
+err:
+ iwl_free_fw_paging(fwrt);
+ return ret;
}
static int iwl_save_fw_paging(struct iwl_fw_runtime *fwrt,
diff --git a/drivers/net/wireless/intel/iwlwifi/fw/runtime.h b/drivers/net/wireless/intel/iwlwifi/fw/runtime.h
index 3fb940e..d8db1dd 100644
--- a/drivers/net/wireless/intel/iwlwifi/fw/runtime.h
+++ b/drivers/net/wireless/intel/iwlwifi/fw/runtime.h
@@ -170,6 +170,5 @@
void iwl_fwrt_handle_notification(struct iwl_fw_runtime *fwrt,
struct iwl_rx_cmd_buffer *rxb);
-struct iwl_nvm_data *iwl_fw_get_nvm(struct iwl_fw_runtime *fwrt);
#endif /* __iwl_fw_runtime_h__ */
diff --git a/drivers/net/wireless/intel/iwlwifi/iwl-config.h b/drivers/net/wireless/intel/iwlwifi/iwl-config.h
index f0f5636..c503b26 100644
--- a/drivers/net/wireless/intel/iwlwifi/iwl-config.h
+++ b/drivers/net/wireless/intel/iwlwifi/iwl-config.h
@@ -7,6 +7,7 @@
*
* Copyright(c) 2007 - 2014 Intel Corporation. All rights reserved.
* Copyright (C) 2016 - 2017 Intel Deutschland GmbH
+ * Copyright(c) 2018 Intel Corporation
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of version 2 of the GNU General Public License as
@@ -33,6 +34,7 @@
*
* Copyright(c) 2005 - 2014 Intel Corporation. All rights reserved.
* Copyright (C) 2016 - 2017 Intel Deutschland GmbH
+ * Copyright(c) 2018 Intel Corporation
* All rights reserved.
*
* Redistribution and use in source and binary forms, with or without
@@ -69,6 +71,7 @@
#include <linux/netdevice.h>
#include <linux/ieee80211.h>
#include <linux/nl80211.h>
+#include "iwl-csr.h"
enum iwl_device_family {
IWL_DEVICE_FAMILY_UNDEFINED,
@@ -151,6 +154,8 @@
#define ANT_AC (ANT_A | ANT_C)
#define ANT_BC (ANT_B | ANT_C)
#define ANT_ABC (ANT_A | ANT_B | ANT_C)
+#define MAX_ANT_NUM 3
+
static inline u8 num_of_ant(u8 mask)
{
@@ -283,6 +288,52 @@
};
/**
+ * struct iwl_csr_params
+ *
+ * @flag_sw_reset: reset the device
+ * @flag_mac_clock_ready:
+ * Indicates MAC (ucode processor, etc.) is powered up and can run.
+ * Internal resources are accessible.
+ * NOTE: This does not indicate that the processor is actually running.
+ * NOTE: This does not indicate that device has completed
+ * init or post-power-down restore of internal SRAM memory.
+ * Use CSR_UCODE_DRV_GP1_BIT_MAC_SLEEP as indication that
+ * SRAM is restored and uCode is in normal operation mode.
+ * This note is relevant only for pre 5xxx devices.
+ * NOTE: After device reset, this bit remains "0" until host sets
+ * INIT_DONE
+ * @flag_init_done: Host sets this to put device into fully operational
+ * D0 power mode. Host resets this after SW_RESET to put device into
+ * low power mode.
+ * @flag_mac_access_req: Host sets this to request and maintain MAC wakeup,
+ * to allow host access to device-internal resources. Host must wait for
+ * mac_clock_ready (and !GOING_TO_SLEEP) before accessing non-CSR device
+ * registers.
+ * @flag_val_mac_access_en: mac access is enabled
+ * @flag_master_dis: disable master
+ * @flag_stop_master: stop master
+ * @addr_sw_reset: address for resetting the device
+ * @mac_addr0_otp: first part of MAC address from OTP
+ * @mac_addr1_otp: second part of MAC address from OTP
+ * @mac_addr0_strap: first part of MAC address from strap
+ * @mac_addr1_strap: second part of MAC address from strap
+ */
+struct iwl_csr_params {
+ u8 flag_sw_reset;
+ u8 flag_mac_clock_ready;
+ u8 flag_init_done;
+ u8 flag_mac_access_req;
+ u8 flag_val_mac_access_en;
+ u8 flag_master_dis;
+ u8 flag_stop_master;
+ u8 addr_sw_reset;
+ u32 mac_addr0_otp;
+ u32 mac_addr1_otp;
+ u32 mac_addr0_strap;
+ u32 mac_addr1_strap;
+};
+
+/**
* struct iwl_cfg
* @name: Official name of the device
* @fw_name_pre: Firmware filename prefix. The api version and extension
@@ -294,8 +345,8 @@
* next step. Supported only in integrated solutions.
* @ucode_api_max: Highest version of uCode API supported by driver.
* @ucode_api_min: Lowest version of uCode API supported by driver.
- * @max_inst_size: The maximal length of the fw inst section
- * @max_data_size: The maximal length of the fw data section
+ * @max_inst_size: The maximal length of the fw inst section (only DVM)
+ * @max_data_size: The maximal length of the fw data section (only DVM)
* @valid_tx_ant: valid transmit antenna
* @valid_rx_ant: valid receive antenna
* @non_shared_ant: the antenna that is for WiFi only
@@ -314,6 +365,7 @@
* @mac_addr_from_csr: read HW address from CSR registers
* @features: hw features, any combination of feature_whitelist
* @pwr_tx_backoffs: translation table between power limits and backoffs
+ * @csr: csr flags and addresses that are different across devices
* @max_rx_agg_size: max RX aggregation size of the ADDBA request/response
* @max_tx_agg_size: max TX aggregation size of the ADDBA request/response
* @max_ht_ampdu_factor: the exponent of the max length of A-MPDU that the
@@ -333,8 +385,6 @@
* @gen2: 22000 and on transport operation
* @cdb: CDB support
* @nvm_type: see &enum iwl_nvm_type
- * @tx_cmd_queue_size: size of the cmd queue. If zero, use the same value as
- * the regular queues
*
* We enable the driver to be backward compatible wrt. hardware features.
* API differences in uCode shouldn't be handled here but through TLVs
@@ -354,6 +404,7 @@
const struct iwl_pwr_tx_backoff *pwr_tx_backoffs;
const char *default_nvm_file_C_step;
const struct iwl_tt_params *thermal_params;
+ const struct iwl_csr_params *csr;
enum iwl_device_family device_family;
enum iwl_led_mode led_mode;
enum iwl_nvm_type nvm_type;
@@ -369,7 +420,7 @@
u32 soc_latency;
u16 nvm_ver;
u16 nvm_calib_ver;
- u16 rx_with_siso_diversity:1,
+ u32 rx_with_siso_diversity:1,
bt_shared_single_ant:1,
internal_wimax_coex:1,
host_interrupt_operation_mode:1,
@@ -386,7 +437,6 @@
gen2:1,
cdb:1,
dbgc_supported:1;
- u16 tx_cmd_queue_size;
u8 valid_tx_ant;
u8 valid_rx_ant;
u8 non_shared_ant;
@@ -401,6 +451,36 @@
u32 extra_phy_cfg_flags;
};
+static const struct iwl_csr_params iwl_csr_v1 = {
+ .flag_mac_clock_ready = 0,
+ .flag_val_mac_access_en = 0,
+ .flag_init_done = 2,
+ .flag_mac_access_req = 3,
+ .flag_sw_reset = 7,
+ .flag_master_dis = 8,
+ .flag_stop_master = 9,
+ .addr_sw_reset = (CSR_BASE + 0x020),
+ .mac_addr0_otp = 0x380,
+ .mac_addr1_otp = 0x384,
+ .mac_addr0_strap = 0x388,
+ .mac_addr1_strap = 0x38C
+};
+
+static const struct iwl_csr_params iwl_csr_v2 = {
+ .flag_init_done = 6,
+ .flag_mac_clock_ready = 20,
+ .flag_val_mac_access_en = 20,
+ .flag_mac_access_req = 21,
+ .flag_master_dis = 28,
+ .flag_stop_master = 29,
+ .flag_sw_reset = 31,
+ .addr_sw_reset = (CSR_BASE + 0x024),
+ .mac_addr0_otp = 0x30,
+ .mac_addr1_otp = 0x34,
+ .mac_addr0_strap = 0x38,
+ .mac_addr1_strap = 0x3C
+};
+
/*
* This list declares the config structures for all devices.
*/
diff --git a/drivers/net/wireless/intel/iwlwifi/iwl-csr.h b/drivers/net/wireless/intel/iwlwifi/iwl-csr.h
index 4f0d070..ba971d3 100644
--- a/drivers/net/wireless/intel/iwlwifi/iwl-csr.h
+++ b/drivers/net/wireless/intel/iwlwifi/iwl-csr.h
@@ -8,6 +8,7 @@
* Copyright(c) 2005 - 2014 Intel Corporation. All rights reserved.
* Copyright(c) 2013 - 2014 Intel Mobile Communications GmbH
* Copyright(c) 2016 Intel Deutschland GmbH
+ * Copyright(c) 2018 Intel Corporation
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of version 2 of the GNU General Public License as
@@ -34,6 +35,7 @@
*
* Copyright(c) 2005 - 2014 Intel Corporation. All rights reserved.
* Copyright(c) 2013 - 2014 Intel Mobile Communications GmbH
+ * Copyright(c) 2018 Intel Corporation
* All rights reserved.
*
* Redistribution and use in source and binary forms, with or without
@@ -257,7 +259,6 @@
/* RESET */
#define CSR_RESET_REG_FLAG_NEVO_RESET (0x00000001)
#define CSR_RESET_REG_FLAG_FORCE_NMI (0x00000002)
-#define CSR_RESET_REG_FLAG_SW_RESET (0x00000080)
#define CSR_RESET_REG_FLAG_MASTER_DISABLED (0x00000100)
#define CSR_RESET_REG_FLAG_STOP_MASTER (0x00000200)
#define CSR_RESET_LINK_PWR_MGMT_DISABLED (0x80000000)
@@ -280,35 +281,10 @@
* 4: GOING_TO_SLEEP
* Indicates MAC is entering a power-saving sleep power-down.
* Not a good time to access device-internal resources.
- * 3: MAC_ACCESS_REQ
- * Host sets this to request and maintain MAC wakeup, to allow host
- * access to device-internal resources. Host must wait for
- * MAC_CLOCK_READY (and !GOING_TO_SLEEP) before accessing non-CSR
- * device registers.
- * 2: INIT_DONE
- * Host sets this to put device into fully operational D0 power mode.
- * Host resets this after SW_RESET to put device into low power mode.
- * 0: MAC_CLOCK_READY
- * Indicates MAC (ucode processor, etc.) is powered up and can run.
- * Internal resources are accessible.
- * NOTE: This does not indicate that the processor is actually running.
- * NOTE: This does not indicate that device has completed
- * init or post-power-down restore of internal SRAM memory.
- * Use CSR_UCODE_DRV_GP1_BIT_MAC_SLEEP as indication that
- * SRAM is restored and uCode is in normal operation mode.
- * Later devices (5xxx/6xxx/1xxx) use non-volatile SRAM, and
- * do not need to save/restore it.
- * NOTE: After device reset, this bit remains "0" until host sets
- * INIT_DONE
*/
-#define CSR_GP_CNTRL_REG_FLAG_MAC_CLOCK_READY (0x00000001)
-#define CSR_GP_CNTRL_REG_FLAG_INIT_DONE (0x00000004)
-#define CSR_GP_CNTRL_REG_FLAG_MAC_ACCESS_REQ (0x00000008)
#define CSR_GP_CNTRL_REG_FLAG_GOING_TO_SLEEP (0x00000010)
#define CSR_GP_CNTRL_REG_FLAG_XTAL_ON (0x00000400)
-#define CSR_GP_CNTRL_REG_VAL_MAC_ACCESS_EN (0x00000001)
-
#define CSR_GP_CNTRL_REG_MSK_POWER_SAVE_TYPE (0x07000000)
#define CSR_GP_CNTRL_REG_FLAG_RFKILL_WAKE_L1A_EN (0x04000000)
#define CSR_GP_CNTRL_REG_FLAG_HW_RF_KILL_SW (0x08000000)
diff --git a/drivers/net/wireless/intel/iwlwifi/iwl-drv.c b/drivers/net/wireless/intel/iwlwifi/iwl-drv.c
index aa2d5c1..c59ce4f 100644
--- a/drivers/net/wireless/intel/iwlwifi/iwl-drv.c
+++ b/drivers/net/wireless/intel/iwlwifi/iwl-drv.c
@@ -179,6 +179,7 @@
for (i = 0; i < ARRAY_SIZE(drv->fw.dbg_trigger_tlv); i++)
kfree(drv->fw.dbg_trigger_tlv[i]);
kfree(drv->fw.dbg_mem_tlv);
+ kfree(drv->fw.iml);
for (i = 0; i < IWL_UCODE_TYPE_MAX; i++)
iwl_free_fw_img(drv, drv->fw.img + i);
@@ -1126,6 +1127,13 @@
pieces->n_dbg_mem_tlv++;
break;
}
+ case IWL_UCODE_TLV_IML: {
+ drv->fw.iml_len = tlv_len;
+ drv->fw.iml = kmemdup(tlv_data, tlv_len, GFP_KERNEL);
+ if (!drv->fw.iml)
+ return -ENOMEM;
+ break;
+ }
default:
IWL_DEBUG_INFO(drv, "unknown TLV: %d\n", tlv_type);
break;
@@ -1842,3 +1850,9 @@
module_param_named(disable_11ac, iwlwifi_mod_params.disable_11ac, bool, 0444);
MODULE_PARM_DESC(disable_11ac, "Disable VHT capabilities (default: false)");
+
+module_param_named(remove_when_gone,
+ iwlwifi_mod_params.remove_when_gone, bool,
+ 0444);
+MODULE_PARM_DESC(remove_when_gone,
+ "Remove dev from PCIe bus if it is deemed inaccessible (default: false)");
diff --git a/drivers/net/wireless/intel/iwlwifi/iwl-eeprom-parse.c b/drivers/net/wireless/intel/iwlwifi/iwl-eeprom-parse.c
index 3199d34..777f5df 100644
--- a/drivers/net/wireless/intel/iwlwifi/iwl-eeprom-parse.c
+++ b/drivers/net/wireless/intel/iwlwifi/iwl-eeprom-parse.c
@@ -899,8 +899,8 @@
EEPROM_SKU_CAP);
data->sku_cap_11n_enable = sku & EEPROM_SKU_CAP_11N_ENABLE;
data->sku_cap_amt_enable = sku & EEPROM_SKU_CAP_AMT_ENABLE;
- data->sku_cap_band_24GHz_enable = sku & EEPROM_SKU_CAP_BAND_24GHZ;
- data->sku_cap_band_52GHz_enable = sku & EEPROM_SKU_CAP_BAND_52GHZ;
+ data->sku_cap_band_24ghz_enable = sku & EEPROM_SKU_CAP_BAND_24GHZ;
+ data->sku_cap_band_52ghz_enable = sku & EEPROM_SKU_CAP_BAND_52GHZ;
data->sku_cap_ipan_enable = sku & EEPROM_SKU_CAP_IPAN_ENABLE;
if (iwlwifi_mod_params.disable_11n & IWL_DISABLE_HT_ALL)
data->sku_cap_11n_enable = false;
diff --git a/drivers/net/wireless/intel/iwlwifi/iwl-eeprom-parse.h b/drivers/net/wireless/intel/iwlwifi/iwl-eeprom-parse.h
index b338889..8be50ed 100644
--- a/drivers/net/wireless/intel/iwlwifi/iwl-eeprom-parse.h
+++ b/drivers/net/wireless/intel/iwlwifi/iwl-eeprom-parse.h
@@ -81,10 +81,11 @@
__le16 kelvin_voltage;
__le16 xtal_calib[2];
- bool sku_cap_band_24GHz_enable;
- bool sku_cap_band_52GHz_enable;
+ bool sku_cap_band_24ghz_enable;
+ bool sku_cap_band_52ghz_enable;
bool sku_cap_11n_enable;
bool sku_cap_11ac_enable;
+ bool sku_cap_11ax_enable;
bool sku_cap_amt_enable;
bool sku_cap_ipan_enable;
bool sku_cap_mimo_disabled;
diff --git a/drivers/net/wireless/intel/iwlwifi/iwl-eeprom-read.c b/drivers/net/wireless/intel/iwlwifi/iwl-eeprom-read.c
index f2cea1c..ac965c3 100644
--- a/drivers/net/wireless/intel/iwlwifi/iwl-eeprom-read.c
+++ b/drivers/net/wireless/intel/iwlwifi/iwl-eeprom-read.c
@@ -6,6 +6,7 @@
* GPL LICENSE SUMMARY
*
* Copyright(c) 2008 - 2014 Intel Corporation. All rights reserved.
+ * Copyright(c) 2018 Intel Corporation
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of version 2 of the GNU General Public License as
@@ -31,6 +32,7 @@
* BSD LICENSE
*
* Copyright(c) 2005 - 2014 Intel Corporation. All rights reserved.
+ * Copyright(c) 2018 Intel Corporation
* All rights reserved.
*
* Redistribution and use in source and binary forms, with or without
@@ -199,12 +201,12 @@
/* Enable 40MHz radio clock */
iwl_write32(trans, CSR_GP_CNTRL,
iwl_read32(trans, CSR_GP_CNTRL) |
- CSR_GP_CNTRL_REG_FLAG_INIT_DONE);
+ BIT(trans->cfg->csr->flag_init_done));
/* wait for clock to be ready */
ret = iwl_poll_bit(trans, CSR_GP_CNTRL,
- CSR_GP_CNTRL_REG_FLAG_MAC_CLOCK_READY,
- CSR_GP_CNTRL_REG_FLAG_MAC_CLOCK_READY,
+ BIT(trans->cfg->csr->flag_mac_clock_ready),
+ BIT(trans->cfg->csr->flag_mac_clock_ready),
25000);
if (ret < 0) {
IWL_ERR(trans, "Time out access OTP\n");
diff --git a/drivers/net/wireless/intel/iwlwifi/iwl-modparams.h b/drivers/net/wireless/intel/iwlwifi/iwl-modparams.h
index a41c46e..a7dd8a8 100644
--- a/drivers/net/wireless/intel/iwlwifi/iwl-modparams.h
+++ b/drivers/net/wireless/intel/iwlwifi/iwl-modparams.h
@@ -122,6 +122,7 @@
* @lar_disable: disable LAR (regulatory), default = 0
* @fw_monitor: allow to use firmware monitor
* @disable_11ac: disable VHT capabilities, default = false.
+ * @remove_when_gone: remove an inaccessible device from the PCIe bus.
*/
struct iwl_mod_params {
int swcrypto;
@@ -143,6 +144,7 @@
bool lar_disable;
bool fw_monitor;
bool disable_11ac;
+ bool remove_when_gone;
};
#endif /* #__iwl_modparams_h__ */
diff --git a/drivers/net/wireless/intel/iwlwifi/iwl-nvm-parse.c b/drivers/net/wireless/intel/iwlwifi/iwl-nvm-parse.c
index ca01746..0f9d564 100644
--- a/drivers/net/wireless/intel/iwlwifi/iwl-nvm-parse.c
+++ b/drivers/net/wireless/intel/iwlwifi/iwl-nvm-parse.c
@@ -8,6 +8,7 @@
* Copyright(c) 2008 - 2014 Intel Corporation. All rights reserved.
* Copyright(c) 2013 - 2015 Intel Mobile Communications GmbH
* Copyright(c) 2016 - 2017 Intel Deutschland GmbH
+ * Copyright(c) 2018 Intel Corporation
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of version 2 of the GNU General Public License as
@@ -35,6 +36,7 @@
* Copyright(c) 2005 - 2014 Intel Corporation. All rights reserved.
* Copyright(c) 2013 - 2015 Intel Mobile Communications GmbH
* Copyright(c) 2016 - 2017 Intel Deutschland GmbH
+ * Copyright(c) 2018 Intel Corporation
* All rights reserved.
*
* Redistribution and use in source and binary forms, with or without
@@ -68,6 +70,7 @@
#include <linux/export.h>
#include <linux/etherdevice.h>
#include <linux/pci.h>
+#include <linux/firmware.h>
#include "iwl-drv.h"
#include "iwl-modparams.h"
@@ -77,6 +80,9 @@
#include "iwl-csr.h"
#include "fw/acpi.h"
#include "fw/api/nvm-reg.h"
+#include "fw/api/commands.h"
+#include "fw/api/cmdhdr.h"
+#include "fw/img.h"
/* NVM offsets (in words) definitions */
enum nvm_offsets {
@@ -292,7 +298,7 @@
static int iwl_init_channel_map(struct device *dev, const struct iwl_cfg *cfg,
struct iwl_nvm_data *data,
const __le16 * const nvm_ch_flags,
- bool lar_supported, bool no_wide_in_5ghz)
+ u32 sbands_flags)
{
int ch_idx;
int n_channels = 0;
@@ -316,11 +322,12 @@
ch_flags = __le16_to_cpup(nvm_ch_flags + ch_idx);
- if (is_5ghz && !data->sku_cap_band_52GHz_enable)
+ if (is_5ghz && !data->sku_cap_band_52ghz_enable)
continue;
/* workaround to disable wide channels in 5GHz */
- if (no_wide_in_5ghz && is_5ghz) {
+ if ((sbands_flags & IWL_NVM_SBANDS_FLAGS_NO_WIDE_IN_5GHZ) &&
+ is_5ghz) {
ch_flags &= ~(NVM_CHANNEL_40MHZ |
NVM_CHANNEL_80MHZ |
NVM_CHANNEL_160MHZ);
@@ -329,7 +336,8 @@
if (ch_flags & NVM_CHANNEL_160MHZ)
data->vht160_supported = true;
- if (!lar_supported && !(ch_flags & NVM_CHANNEL_VALID)) {
+ if (!(sbands_flags & IWL_NVM_SBANDS_FLAGS_LAR) &&
+ !(ch_flags & NVM_CHANNEL_VALID)) {
/*
* Channels might become valid later if lar is
* supported, hence we still want to add them to
@@ -359,7 +367,7 @@
channel->max_power = IWL_DEFAULT_MAX_TX_POWER;
/* don't put limitations in case we're using LAR */
- if (!lar_supported)
+ if (!(sbands_flags & IWL_NVM_SBANDS_FLAGS_LAR))
channel->flags = iwl_get_channel_flags(nvm_chan[ch_idx],
ch_idx, is_5ghz,
ch_flags, cfg);
@@ -455,17 +463,17 @@
vht_cap->vht_mcs.tx_mcs_map = vht_cap->vht_mcs.rx_mcs_map;
}
-void iwl_init_sbands(struct device *dev, const struct iwl_cfg *cfg,
- struct iwl_nvm_data *data, const __le16 *nvm_ch_flags,
- u8 tx_chains, u8 rx_chains, bool lar_supported,
- bool no_wide_in_5ghz)
+static void iwl_init_sbands(struct device *dev, const struct iwl_cfg *cfg,
+ struct iwl_nvm_data *data,
+ const __le16 *nvm_ch_flags, u8 tx_chains,
+ u8 rx_chains, u32 sbands_flags)
{
int n_channels;
int n_used = 0;
struct ieee80211_supported_band *sband;
n_channels = iwl_init_channel_map(dev, cfg, data, nvm_ch_flags,
- lar_supported, no_wide_in_5ghz);
+ sbands_flags);
sband = &data->bands[NL80211_BAND_2GHZ];
sband->band = NL80211_BAND_2GHZ;
sband->bitrates = &iwl_cfg80211_rates[RATES_24_OFFS];
@@ -491,7 +499,6 @@
IWL_ERR_DEV(dev, "NVM: used only %d of %d channels\n",
n_used, n_channels);
}
-IWL_EXPORT_SYMBOL(iwl_init_sbands);
static int iwl_get_sku(const struct iwl_cfg *cfg, const __le16 *nvm_sw,
const __le16 *phy_sku)
@@ -569,11 +576,15 @@
dest[5] = hw_addr[0];
}
-void iwl_set_hw_address_from_csr(struct iwl_trans *trans,
- struct iwl_nvm_data *data)
+static void iwl_set_hw_address_from_csr(struct iwl_trans *trans,
+ struct iwl_nvm_data *data)
{
- __le32 mac_addr0 = cpu_to_le32(iwl_read32(trans, CSR_MAC_ADDR0_STRAP));
- __le32 mac_addr1 = cpu_to_le32(iwl_read32(trans, CSR_MAC_ADDR1_STRAP));
+ __le32 mac_addr0 =
+ cpu_to_le32(iwl_read32(trans,
+ trans->cfg->csr->mac_addr0_strap));
+ __le32 mac_addr1 =
+ cpu_to_le32(iwl_read32(trans,
+ trans->cfg->csr->mac_addr1_strap));
iwl_flip_hw_address(mac_addr0, mac_addr1, data->hw_addr);
/*
@@ -583,12 +594,13 @@
if (is_valid_ether_addr(data->hw_addr))
return;
- mac_addr0 = cpu_to_le32(iwl_read32(trans, CSR_MAC_ADDR0_OTP));
- mac_addr1 = cpu_to_le32(iwl_read32(trans, CSR_MAC_ADDR1_OTP));
+ mac_addr0 = cpu_to_le32(iwl_read32(trans,
+ trans->cfg->csr->mac_addr0_otp));
+ mac_addr1 = cpu_to_le32(iwl_read32(trans,
+ trans->cfg->csr->mac_addr1_otp));
iwl_flip_hw_address(mac_addr0, mac_addr1, data->hw_addr);
}
-IWL_EXPORT_SYMBOL(iwl_set_hw_address_from_csr);
static void iwl_set_hw_address_family_8000(struct iwl_trans *trans,
const struct iwl_cfg *cfg,
@@ -713,8 +725,8 @@
struct device *dev = trans->dev;
struct iwl_nvm_data *data;
bool lar_enabled;
- bool no_wide_in_5ghz = iwl_nvm_no_wide_in_5ghz(dev, cfg, nvm_hw);
u32 sku, radio_cfg;
+ u32 sbands_flags = 0;
u16 lar_config;
const __le16 *ch_section;
@@ -741,8 +753,8 @@
rx_chains &= data->valid_rx_ant;
sku = iwl_get_sku(cfg, nvm_sw, phy_sku);
- data->sku_cap_band_24GHz_enable = sku & NVM_SKU_CAP_BAND_24GHZ;
- data->sku_cap_band_52GHz_enable = sku & NVM_SKU_CAP_BAND_52GHZ;
+ data->sku_cap_band_24ghz_enable = sku & NVM_SKU_CAP_BAND_24GHZ;
+ data->sku_cap_band_52ghz_enable = sku & NVM_SKU_CAP_BAND_52GHZ;
data->sku_cap_11n_enable = sku & NVM_SKU_CAP_11N_ENABLE;
if (iwlwifi_mod_params.disable_11n & IWL_DISABLE_HT_ALL)
data->sku_cap_11n_enable = false;
@@ -787,8 +799,14 @@
return NULL;
}
+ if (lar_fw_supported && lar_enabled)
+ sbands_flags |= IWL_NVM_SBANDS_FLAGS_LAR;
+
+ if (iwl_nvm_no_wide_in_5ghz(dev, cfg, nvm_hw))
+ sbands_flags |= IWL_NVM_SBANDS_FLAGS_NO_WIDE_IN_5GHZ;
+
iwl_init_sbands(dev, cfg, data, ch_section, tx_chains, rx_chains,
- lar_fw_supported && lar_enabled, no_wide_in_5ghz);
+ sbands_flags);
data->calib_version = 255;
return data;
@@ -1017,3 +1035,294 @@
return copy_rd;
}
IWL_EXPORT_SYMBOL(iwl_parse_nvm_mcc_info);
+
+#define IWL_MAX_NVM_SECTION_SIZE 0x1b58
+#define IWL_MAX_EXT_NVM_SECTION_SIZE 0x1ffc
+#define MAX_NVM_FILE_LEN 16384
+
+void iwl_nvm_fixups(u32 hw_id, unsigned int section, u8 *data,
+ unsigned int len)
+{
+#define IWL_4165_DEVICE_ID 0x5501
+#define NVM_SKU_CAP_MIMO_DISABLE BIT(5)
+
+ if (section == NVM_SECTION_TYPE_PHY_SKU &&
+ hw_id == IWL_4165_DEVICE_ID && data && len >= 5 &&
+ (data[4] & NVM_SKU_CAP_MIMO_DISABLE))
+ /* OTP 0x52 bug work around: it's a 1x1 device */
+ data[3] = ANT_B | (ANT_B << 4);
+}
+IWL_EXPORT_SYMBOL(iwl_nvm_fixups);
+
+/*
+ * Reads external NVM from a file into mvm->nvm_sections
+ *
+ * HOW TO CREATE THE NVM FILE FORMAT:
+ * ------------------------------
+ * 1. create hex file, format:
+ * 3800 -> header
+ * 0000 -> header
+ * 5a40 -> data
+ *
+ * rev - 6 bit (word1)
+ * len - 10 bit (word1)
+ * id - 4 bit (word2)
+ * rsv - 12 bit (word2)
+ *
+ * 2. flip 8bits with 8 bits per line to get the right NVM file format
+ *
+ * 3. create binary file from the hex file
+ *
+ * 4. save as "iNVM_xxx.bin" under /lib/firmware
+ */
+int iwl_read_external_nvm(struct iwl_trans *trans,
+ const char *nvm_file_name,
+ struct iwl_nvm_section *nvm_sections)
+{
+ int ret, section_size;
+ u16 section_id;
+ const struct firmware *fw_entry;
+ const struct {
+ __le16 word1;
+ __le16 word2;
+ u8 data[];
+ } *file_sec;
+ const u8 *eof;
+ u8 *temp;
+ int max_section_size;
+ const __le32 *dword_buff;
+
+#define NVM_WORD1_LEN(x) (8 * (x & 0x03FF))
+#define NVM_WORD2_ID(x) (x >> 12)
+#define EXT_NVM_WORD2_LEN(x) (2 * (((x) & 0xFF) << 8 | (x) >> 8))
+#define EXT_NVM_WORD1_ID(x) ((x) >> 4)
+#define NVM_HEADER_0 (0x2A504C54)
+#define NVM_HEADER_1 (0x4E564D2A)
+#define NVM_HEADER_SIZE (4 * sizeof(u32))
+
+ IWL_DEBUG_EEPROM(trans->dev, "Read from external NVM\n");
+
+ /* Maximal size depends on NVM version */
+ if (trans->cfg->nvm_type != IWL_NVM_EXT)
+ max_section_size = IWL_MAX_NVM_SECTION_SIZE;
+ else
+ max_section_size = IWL_MAX_EXT_NVM_SECTION_SIZE;
+
+ /*
+ * Obtain NVM image via request_firmware. Since we already used
+ * request_firmware_nowait() for the firmware binary load and only
+ * get here after that we assume the NVM request can be satisfied
+ * synchronously.
+ */
+ ret = request_firmware(&fw_entry, nvm_file_name, trans->dev);
+ if (ret) {
+ IWL_ERR(trans, "ERROR: %s isn't available %d\n",
+ nvm_file_name, ret);
+ return ret;
+ }
+
+ IWL_INFO(trans, "Loaded NVM file %s (%zu bytes)\n",
+ nvm_file_name, fw_entry->size);
+
+ if (fw_entry->size > MAX_NVM_FILE_LEN) {
+ IWL_ERR(trans, "NVM file too large\n");
+ ret = -EINVAL;
+ goto out;
+ }
+
+ eof = fw_entry->data + fw_entry->size;
+ dword_buff = (__le32 *)fw_entry->data;
+
+ /* some NVM file will contain a header.
+ * The header is identified by 2 dwords header as follow:
+ * dword[0] = 0x2A504C54
+ * dword[1] = 0x4E564D2A
+ *
+ * This header must be skipped when providing the NVM data to the FW.
+ */
+ if (fw_entry->size > NVM_HEADER_SIZE &&
+ dword_buff[0] == cpu_to_le32(NVM_HEADER_0) &&
+ dword_buff[1] == cpu_to_le32(NVM_HEADER_1)) {
+ file_sec = (void *)(fw_entry->data + NVM_HEADER_SIZE);
+ IWL_INFO(trans, "NVM Version %08X\n", le32_to_cpu(dword_buff[2]));
+ IWL_INFO(trans, "NVM Manufacturing date %08X\n",
+ le32_to_cpu(dword_buff[3]));
+
+ /* nvm file validation, dword_buff[2] holds the file version */
+ if (trans->cfg->device_family == IWL_DEVICE_FAMILY_8000 &&
+ CSR_HW_REV_STEP(trans->hw_rev) == SILICON_C_STEP &&
+ le32_to_cpu(dword_buff[2]) < 0xE4A) {
+ ret = -EFAULT;
+ goto out;
+ }
+ } else {
+ file_sec = (void *)fw_entry->data;
+ }
+
+ while (true) {
+ if (file_sec->data > eof) {
+ IWL_ERR(trans,
+ "ERROR - NVM file too short for section header\n");
+ ret = -EINVAL;
+ break;
+ }
+
+ /* check for EOF marker */
+ if (!file_sec->word1 && !file_sec->word2) {
+ ret = 0;
+ break;
+ }
+
+ if (trans->cfg->nvm_type != IWL_NVM_EXT) {
+ section_size =
+ 2 * NVM_WORD1_LEN(le16_to_cpu(file_sec->word1));
+ section_id = NVM_WORD2_ID(le16_to_cpu(file_sec->word2));
+ } else {
+ section_size = 2 * EXT_NVM_WORD2_LEN(
+ le16_to_cpu(file_sec->word2));
+ section_id = EXT_NVM_WORD1_ID(
+ le16_to_cpu(file_sec->word1));
+ }
+
+ if (section_size > max_section_size) {
+ IWL_ERR(trans, "ERROR - section too large (%d)\n",
+ section_size);
+ ret = -EINVAL;
+ break;
+ }
+
+ if (!section_size) {
+ IWL_ERR(trans, "ERROR - section empty\n");
+ ret = -EINVAL;
+ break;
+ }
+
+ if (file_sec->data + section_size > eof) {
+ IWL_ERR(trans,
+ "ERROR - NVM file too short for section (%d bytes)\n",
+ section_size);
+ ret = -EINVAL;
+ break;
+ }
+
+ if (WARN(section_id >= NVM_MAX_NUM_SECTIONS,
+ "Invalid NVM section ID %d\n", section_id)) {
+ ret = -EINVAL;
+ break;
+ }
+
+ temp = kmemdup(file_sec->data, section_size, GFP_KERNEL);
+ if (!temp) {
+ ret = -ENOMEM;
+ break;
+ }
+
+ iwl_nvm_fixups(trans->hw_id, section_id, temp, section_size);
+
+ kfree(nvm_sections[section_id].data);
+ nvm_sections[section_id].data = temp;
+ nvm_sections[section_id].length = section_size;
+
+ /* advance to the next section */
+ file_sec = (void *)(file_sec->data + section_size);
+ }
+out:
+ release_firmware(fw_entry);
+ return ret;
+}
+IWL_EXPORT_SYMBOL(iwl_read_external_nvm);
+
+struct iwl_nvm_data *iwl_get_nvm(struct iwl_trans *trans,
+ const struct iwl_fw *fw)
+{
+ struct iwl_nvm_get_info cmd = {};
+ struct iwl_nvm_get_info_rsp *rsp;
+ struct iwl_nvm_data *nvm;
+ struct iwl_host_cmd hcmd = {
+ .flags = CMD_WANT_SKB | CMD_SEND_IN_RFKILL,
+ .data = { &cmd, },
+ .len = { sizeof(cmd) },
+ .id = WIDE_ID(REGULATORY_AND_NVM_GROUP, NVM_GET_INFO)
+ };
+ int ret;
+ bool lar_fw_supported = !iwlwifi_mod_params.lar_disable &&
+ fw_has_capa(&fw->ucode_capa,
+ IWL_UCODE_TLV_CAPA_LAR_SUPPORT);
+ u32 mac_flags;
+ u32 sbands_flags = 0;
+
+ ret = iwl_trans_send_cmd(trans, &hcmd);
+ if (ret)
+ return ERR_PTR(ret);
+
+ if (WARN(iwl_rx_packet_payload_len(hcmd.resp_pkt) != sizeof(*rsp),
+ "Invalid payload len in NVM response from FW %d",
+ iwl_rx_packet_payload_len(hcmd.resp_pkt))) {
+ ret = -EINVAL;
+ goto out;
+ }
+
+ rsp = (void *)hcmd.resp_pkt->data;
+ if (le32_to_cpu(rsp->general.flags) & NVM_GENERAL_FLAGS_EMPTY_OTP)
+ IWL_INFO(trans, "OTP is empty\n");
+
+ nvm = kzalloc(sizeof(*nvm) +
+ sizeof(struct ieee80211_channel) * IWL_NUM_CHANNELS,
+ GFP_KERNEL);
+ if (!nvm) {
+ ret = -ENOMEM;
+ goto out;
+ }
+
+ iwl_set_hw_address_from_csr(trans, nvm);
+ /* TODO: if platform NVM has MAC address - override it here */
+
+ if (!is_valid_ether_addr(nvm->hw_addr)) {
+ IWL_ERR(trans, "no valid mac address was found\n");
+ ret = -EINVAL;
+ goto err_free;
+ }
+
+ IWL_INFO(trans, "base HW address: %pM\n", nvm->hw_addr);
+
+ /* Initialize general data */
+ nvm->nvm_version = le16_to_cpu(rsp->general.nvm_version);
+
+ /* Initialize MAC sku data */
+ mac_flags = le32_to_cpu(rsp->mac_sku.mac_sku_flags);
+ nvm->sku_cap_11ac_enable =
+ !!(mac_flags & NVM_MAC_SKU_FLAGS_802_11AC_ENABLED);
+ nvm->sku_cap_11n_enable =
+ !!(mac_flags & NVM_MAC_SKU_FLAGS_802_11N_ENABLED);
+ nvm->sku_cap_band_24ghz_enable =
+ !!(mac_flags & NVM_MAC_SKU_FLAGS_BAND_2_4_ENABLED);
+ nvm->sku_cap_band_52ghz_enable =
+ !!(mac_flags & NVM_MAC_SKU_FLAGS_BAND_5_2_ENABLED);
+ nvm->sku_cap_mimo_disabled =
+ !!(mac_flags & NVM_MAC_SKU_FLAGS_MIMO_DISABLED);
+
+ /* Initialize PHY sku data */
+ nvm->valid_tx_ant = (u8)le32_to_cpu(rsp->phy_sku.tx_chains);
+ nvm->valid_rx_ant = (u8)le32_to_cpu(rsp->phy_sku.rx_chains);
+
+ if (le32_to_cpu(rsp->regulatory.lar_enabled) && lar_fw_supported) {
+ nvm->lar_enabled = true;
+ sbands_flags |= IWL_NVM_SBANDS_FLAGS_LAR;
+ }
+
+ iwl_init_sbands(trans->dev, trans->cfg, nvm,
+ rsp->regulatory.channel_profile,
+ nvm->valid_tx_ant & fw->valid_tx_ant,
+ nvm->valid_rx_ant & fw->valid_rx_ant,
+ sbands_flags);
+
+ iwl_free_resp(&hcmd);
+ return nvm;
+
+err_free:
+ kfree(nvm);
+out:
+ iwl_free_resp(&hcmd);
+ return ERR_PTR(ret);
+}
+IWL_EXPORT_SYMBOL(iwl_get_nvm);
diff --git a/drivers/net/wireless/intel/iwlwifi/iwl-nvm-parse.h b/drivers/net/wireless/intel/iwlwifi/iwl-nvm-parse.h
index 3071a23b..234d100 100644
--- a/drivers/net/wireless/intel/iwlwifi/iwl-nvm-parse.h
+++ b/drivers/net/wireless/intel/iwlwifi/iwl-nvm-parse.h
@@ -7,6 +7,7 @@
*
* Copyright(c) 2008 - 2015 Intel Corporation. All rights reserved.
* Copyright(c) 2016 - 2017 Intel Deutschland GmbH
+ * Copyright(c) 2018 Intel Corporation
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of version 2 of the GNU General Public License as
@@ -33,6 +34,7 @@
*
* Copyright(c) 2005 - 2014 Intel Corporation. All rights reserved.
* Copyright(c) 2016 - 2017 Intel Deutschland GmbH
+ * Copyright(c) 2018 Intel Corporation
* All rights reserved.
*
* Redistribution and use in source and binary forms, with or without
@@ -68,6 +70,17 @@
#include "iwl-eeprom-parse.h"
/**
+ * enum iwl_nvm_sbands_flags - modification flags for the channel profiles
+ *
+ * @IWL_NVM_SBANDS_FLAGS_LAR: LAR is enabled
+ * @IWL_NVM_SBANDS_FLAGS_NO_WIDE_IN_5GHZ: disallow 40, 80 and 160MHz on 5GHz
+ */
+enum iwl_nvm_sbands_flags {
+ IWL_NVM_SBANDS_FLAGS_LAR = BIT(0),
+ IWL_NVM_SBANDS_FLAGS_NO_WIDE_IN_5GHZ = BIT(1),
+};
+
+/**
* iwl_parse_nvm_data - parse NVM data and return values
*
* This function parses all NVM values we need and then
@@ -83,20 +96,6 @@
u8 tx_chains, u8 rx_chains, bool lar_fw_supported);
/**
- * iwl_set_hw_address_from_csr - sets HW address for 9000 devices and on
- */
-void iwl_set_hw_address_from_csr(struct iwl_trans *trans,
- struct iwl_nvm_data *data);
-
-/**
- * iwl_init_sbands - parse and set all channel profiles
- */
-void iwl_init_sbands(struct device *dev, const struct iwl_cfg *cfg,
- struct iwl_nvm_data *data, const __le16 *nvm_ch_flags,
- u8 tx_chains, u8 rx_chains, bool lar_supported,
- bool no_wide_in_5ghz);
-
-/**
* iwl_parse_mcc_info - parse MCC (mobile country code) info coming from FW
*
* This function parses the regulatory channel data received as a
@@ -111,4 +110,33 @@
int num_of_ch, __le32 *channels, u16 fw_mcc,
u16 geo_info);
+/**
+ * struct iwl_nvm_section - describes an NVM section in memory.
+ *
+ * This struct holds an NVM section read from the NIC using NVM_ACCESS_CMD,
+ * and saved for later use by the driver. Not all NVM sections are saved
+ * this way, only the needed ones.
+ */
+struct iwl_nvm_section {
+ u16 length;
+ const u8 *data;
+};
+
+/**
+ * iwl_read_external_nvm - Reads external NVM from a file into nvm_sections
+ */
+int iwl_read_external_nvm(struct iwl_trans *trans,
+ const char *nvm_file_name,
+ struct iwl_nvm_section *nvm_sections);
+void iwl_nvm_fixups(u32 hw_id, unsigned int section, u8 *data,
+ unsigned int len);
+
+/**
+ * iwl_get_nvm - retrieve NVM data from firmware
+ *
+ * Allocates a new iwl_nvm_data structure, fills it with
+ * NVM data, and returns it to caller.
+ */
+struct iwl_nvm_data *iwl_get_nvm(struct iwl_trans *trans,
+ const struct iwl_fw *fw);
#endif /* __iwl_nvm_parse_h__ */
diff --git a/drivers/net/wireless/intel/iwlwifi/iwl-trans.h b/drivers/net/wireless/intel/iwlwifi/iwl-trans.h
index c25ed1a..1b9c627 100644
--- a/drivers/net/wireless/intel/iwlwifi/iwl-trans.h
+++ b/drivers/net/wireless/intel/iwlwifi/iwl-trans.h
@@ -554,7 +554,7 @@
/* 22000 functions */
int (*txq_alloc)(struct iwl_trans *trans,
struct iwl_tx_queue_cfg_cmd *cmd,
- int cmd_id,
+ int cmd_id, int size,
unsigned int queue_wdg_timeout);
void (*txq_free)(struct iwl_trans *trans, int queue);
@@ -691,6 +691,8 @@
* @wide_cmd_header: true when ucode supports wide command header format
* @num_rx_queues: number of RX queues allocated by the transport;
* the transport must set this before calling iwl_drv_start()
+ * @iml_len: the length of the image loader
+ * @iml: a pointer to the image loader itself
* @dev_cmd_pool: pool for Tx cmd allocation - for internal use only.
* The user should use iwl_trans_{alloc,free}_tx_cmd.
* @rx_mpdu_cmd: MPDU RX command ID, must be assigned by opmode before
@@ -735,6 +737,9 @@
u8 num_rx_queues;
+ size_t iml_len;
+ u8 *iml;
+
/* The following fields are internal only */
struct kmem_cache *dev_cmd_pool;
char dev_cmd_pool_name[50];
@@ -952,8 +957,8 @@
static inline int
iwl_trans_txq_alloc(struct iwl_trans *trans,
struct iwl_tx_queue_cfg_cmd *cmd,
- int cmd_id,
- unsigned int queue_wdg_timeout)
+ int cmd_id, int size,
+ unsigned int wdg_timeout)
{
might_sleep();
@@ -965,7 +970,7 @@
return -EIO;
}
- return trans->ops->txq_alloc(trans, cmd, cmd_id, queue_wdg_timeout);
+ return trans->ops->txq_alloc(trans, cmd, cmd_id, size, wdg_timeout);
}
static inline void iwl_trans_txq_set_shared_mode(struct iwl_trans *trans,
diff --git a/drivers/net/wireless/intel/iwlwifi/mvm/coex.c b/drivers/net/wireless/intel/iwlwifi/mvm/coex.c
index 890dbff..016e03a 100644
--- a/drivers/net/wireless/intel/iwlwifi/mvm/coex.c
+++ b/drivers/net/wireless/intel/iwlwifi/mvm/coex.c
@@ -279,6 +279,8 @@
struct ieee80211_chanctx_conf *primary;
struct ieee80211_chanctx_conf *secondary;
bool primary_ll;
+ u8 primary_load;
+ u8 secondary_load;
};
static inline
@@ -295,6 +297,30 @@
enable ? -IWL_MVM_BT_COEX_DIS_RED_TXP_THRESH : 0;
}
+#define MVM_COEX_TCM_PERIOD (HZ * 10)
+
+static void iwl_mvm_bt_coex_tcm_based_ci(struct iwl_mvm *mvm,
+ struct iwl_bt_iterator_data *data)
+{
+ unsigned long now = jiffies;
+
+ if (!time_after(now, mvm->bt_coex_last_tcm_ts + MVM_COEX_TCM_PERIOD))
+ return;
+
+ mvm->bt_coex_last_tcm_ts = now;
+
+ /* We assume here that we don't have more than 2 vifs on 2.4GHz */
+
+ /* if the primary is low latency, it will stay primary */
+ if (data->primary_ll)
+ return;
+
+ if (data->primary_load >= data->secondary_load)
+ return;
+
+ swap(data->primary, data->secondary);
+}
+
/* must be called under rcu_read_lock */
static void iwl_mvm_bt_notif_iterator(void *_data, u8 *mac,
struct ieee80211_vif *vif)
@@ -385,6 +411,11 @@
/* there is low latency vif - we will be secondary */
data->secondary = chanctx_conf;
}
+
+ if (data->primary == chanctx_conf)
+ data->primary_load = mvm->tcm.result.load[mvmvif->id];
+ else if (data->secondary == chanctx_conf)
+ data->secondary_load = mvm->tcm.result.load[mvmvif->id];
return;
}
@@ -398,6 +429,10 @@
/* if secondary is not NULL, it might be a GO */
data->secondary = chanctx_conf;
+ if (data->primary == chanctx_conf)
+ data->primary_load = mvm->tcm.result.load[mvmvif->id];
+ else if (data->secondary == chanctx_conf)
+ data->secondary_load = mvm->tcm.result.load[mvmvif->id];
/*
* don't reduce the Tx power if one of these is true:
* we are in LOOSE
@@ -449,6 +484,8 @@
mvm->hw, IEEE80211_IFACE_ITER_NORMAL,
iwl_mvm_bt_notif_iterator, &data);
+ iwl_mvm_bt_coex_tcm_based_ci(mvm, &data);
+
if (data.primary) {
struct ieee80211_chanctx_conf *chan = data.primary;
if (WARN_ON(!chan->def.chan)) {
diff --git a/drivers/net/wireless/intel/iwlwifi/mvm/constants.h b/drivers/net/wireless/intel/iwlwifi/mvm/constants.h
index 96b52a2..d61ff66 100644
--- a/drivers/net/wireless/intel/iwlwifi/mvm/constants.h
+++ b/drivers/net/wireless/intel/iwlwifi/mvm/constants.h
@@ -69,6 +69,8 @@
#include <linux/ieee80211.h>
+#define IWL_MVM_UAPSD_NOAGG_BSSIDS_NUM 20
+
#define IWL_MVM_DEFAULT_PS_TX_DATA_TIMEOUT (100 * USEC_PER_MSEC)
#define IWL_MVM_DEFAULT_PS_RX_DATA_TIMEOUT (100 * USEC_PER_MSEC)
#define IWL_MVM_WOWLAN_PS_TX_DATA_TIMEOUT (10 * USEC_PER_MSEC)
@@ -112,6 +114,11 @@
#define IWL_MVM_PARSE_NVM 0
#define IWL_MVM_ADWELL_ENABLE 1
#define IWL_MVM_ADWELL_MAX_BUDGET 0
+#define IWL_MVM_TCM_LOAD_MEDIUM_THRESH 10 /* percentage */
+#define IWL_MVM_TCM_LOAD_HIGH_THRESH 50 /* percentage */
+#define IWL_MVM_TCM_LOWLAT_ENABLE_THRESH 100 /* packets/10 seconds */
+#define IWL_MVM_UAPSD_NONAGG_PERIOD 5000 /* msecs */
+#define IWL_MVM_UAPSD_NOAGG_LIST_LEN IWL_MVM_UAPSD_NOAGG_BSSIDS_NUM
#define IWL_MVM_RS_NUM_TRY_BEFORE_ANT_TOGGLE 1
#define IWL_MVM_RS_HT_VHT_RETRIES_PER_RATE 2
#define IWL_MVM_RS_HT_VHT_RETRIES_PER_RATE_TW 1
diff --git a/drivers/net/wireless/intel/iwlwifi/mvm/d3.c b/drivers/net/wireless/intel/iwlwifi/mvm/d3.c
index 2efe9b0..3fcf489 100644
--- a/drivers/net/wireless/intel/iwlwifi/mvm/d3.c
+++ b/drivers/net/wireless/intel/iwlwifi/mvm/d3.c
@@ -8,6 +8,7 @@
* Copyright(c) 2012 - 2014 Intel Corporation. All rights reserved.
* Copyright(c) 2013 - 2015 Intel Mobile Communications GmbH
* Copyright(c) 2016 - 2017 Intel Deutschland GmbH
+ * Copyright(c) 2018 Intel Corporation
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of version 2 of the GNU General Public License as
@@ -18,11 +19,6 @@
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
* General Public License for more details.
*
- * You should have received a copy of the GNU General Public License
- * along with this program; if not, write to the Free Software
- * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110,
- * USA
- *
* The full GNU General Public License is included in this distribution
* in the file called COPYING.
*
@@ -35,6 +31,7 @@
* Copyright(c) 2012 - 2014 Intel Corporation. All rights reserved.
* Copyright(c) 2013 - 2015 Intel Mobile Communications GmbH
* Copyright(c) 2016 - 2017 Intel Deutschland GmbH
+ * Copyright(c) 2018 Intel Corporation
* All rights reserved.
*
* Redistribution and use in source and binary forms, with or without
@@ -693,6 +690,14 @@
IWL_WOWLAN_WAKEUP_LINK_CHANGE);
}
+ if (wowlan->any) {
+ wowlan_config_cmd->wakeup_filter |=
+ cpu_to_le32(IWL_WOWLAN_WAKEUP_BEACON_MISS |
+ IWL_WOWLAN_WAKEUP_LINK_CHANGE |
+ IWL_WOWLAN_WAKEUP_RX_FRAME |
+ IWL_WOWLAN_WAKEUP_BCN_FILTERING);
+ }
+
return 0;
}
@@ -1097,6 +1102,7 @@
/* make sure the d0i3 exit work is not pending */
flush_work(&mvm->d0i3_exit_work);
+ iwl_mvm_pause_tcm(mvm, true);
iwl_fw_runtime_suspend(&mvm->fwrt);
@@ -2014,6 +2020,8 @@
mvm->trans->system_pm_mode = IWL_PLAT_PM_MODE_DISABLED;
+ iwl_mvm_resume_tcm(mvm);
+
iwl_fw_runtime_resume(&mvm->fwrt);
return ret;
@@ -2042,6 +2050,8 @@
mvm->trans->system_pm_mode = IWL_PLAT_PM_MODE_D3;
+ iwl_mvm_pause_tcm(mvm, true);
+
iwl_fw_runtime_suspend(&mvm->fwrt);
/* start pseudo D3 */
@@ -2104,6 +2114,8 @@
__iwl_mvm_resume(mvm, true);
rtnl_unlock();
+ iwl_mvm_resume_tcm(mvm);
+
iwl_fw_runtime_resume(&mvm->fwrt);
mvm->trans->system_pm_mode = IWL_PLAT_PM_MODE_DISABLED;
diff --git a/drivers/net/wireless/intel/iwlwifi/mvm/debugfs-vif.c b/drivers/net/wireless/intel/iwlwifi/mvm/debugfs-vif.c
index f7fcf700..798605c 100644
--- a/drivers/net/wireless/intel/iwlwifi/mvm/debugfs-vif.c
+++ b/drivers/net/wireless/intel/iwlwifi/mvm/debugfs-vif.c
@@ -269,6 +269,8 @@
mvmvif->id, mvmvif->color);
pos += scnprintf(buf+pos, bufsz-pos, "bssid: %pM\n",
vif->bss_conf.bssid);
+ pos += scnprintf(buf+pos, bufsz-pos, "Load: %d\n",
+ mvm->tcm.result.load[mvmvif->id]);
pos += scnprintf(buf+pos, bufsz-pos, "QoS:\n");
for (i = 0; i < ARRAY_SIZE(mvmvif->queue_params); i++)
pos += scnprintf(buf+pos, bufsz-pos,
diff --git a/drivers/net/wireless/intel/iwlwifi/mvm/debugfs.c b/drivers/net/wireless/intel/iwlwifi/mvm/debugfs.c
index 0e6401c..1c4178f 100644
--- a/drivers/net/wireless/intel/iwlwifi/mvm/debugfs.c
+++ b/drivers/net/wireless/intel/iwlwifi/mvm/debugfs.c
@@ -1728,6 +1728,27 @@
return ret ?: count;
}
+static ssize_t
+iwl_dbgfs_uapsd_noagg_bssids_read(struct file *file, char __user *user_buf,
+ size_t count, loff_t *ppos)
+{
+ struct iwl_mvm *mvm = file->private_data;
+ u8 buf[IWL_MVM_UAPSD_NOAGG_BSSIDS_NUM * ETH_ALEN * 3 + 1];
+ unsigned int pos = 0;
+ size_t bufsz = sizeof(buf);
+ int i;
+
+ mutex_lock(&mvm->mutex);
+
+ for (i = 0; i < IWL_MVM_UAPSD_NOAGG_LIST_LEN; i++)
+ pos += scnprintf(buf + pos, bufsz - pos, "%pM\n",
+ mvm->uapsd_noagg_bssids[i].addr);
+
+ mutex_unlock(&mvm->mutex);
+
+ return simple_read_from_buffer(user_buf, count, ppos, buf, pos);
+}
+
MVM_DEBUGFS_READ_WRITE_FILE_OPS(prph_reg, 64);
/* Device wide debugfs entries */
@@ -1762,6 +1783,8 @@
(IWL_RSS_INDIRECTION_TABLE_SIZE * 2));
MVM_DEBUGFS_WRITE_FILE_OPS(inject_packet, 512);
+MVM_DEBUGFS_READ_FILE_OPS(uapsd_noagg_bssids);
+
#ifdef CONFIG_IWLWIFI_BCAST_FILTERING
MVM_DEBUGFS_READ_WRITE_FILE_OPS(bcast_filters, 256);
MVM_DEBUGFS_READ_WRITE_FILE_OPS(bcast_filters_macs, 256);
@@ -1972,6 +1995,8 @@
mvm->debugfs_dir, &mvm->drop_bcn_ap_mode))
goto err;
+ MVM_DEBUGFS_ADD_FILE(uapsd_noagg_bssids, mvm->debugfs_dir, S_IRUSR);
+
#ifdef CONFIG_IWLWIFI_BCAST_FILTERING
if (mvm->fw->ucode_capa.flags & IWL_UCODE_TLV_FLAGS_BCAST_FILTERING) {
bcast_dir = debugfs_create_dir("bcast_filtering",
diff --git a/drivers/net/wireless/intel/iwlwifi/mvm/fw.c b/drivers/net/wireless/intel/iwlwifi/mvm/fw.c
index 3c59109..866c91c 100644
--- a/drivers/net/wireless/intel/iwlwifi/mvm/fw.c
+++ b/drivers/net/wireless/intel/iwlwifi/mvm/fw.c
@@ -8,6 +8,7 @@
* Copyright(c) 2012 - 2014 Intel Corporation. All rights reserved.
* Copyright(c) 2013 - 2015 Intel Mobile Communications GmbH
* Copyright(c) 2016 - 2017 Intel Deutschland GmbH
+ * Copyright(c) 2018 Intel Corporation
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of version 2 of the GNU General Public License as
@@ -35,6 +36,7 @@
* Copyright(c) 2012 - 2014 Intel Corporation. All rights reserved.
* Copyright(c) 2013 - 2015 Intel Mobile Communications GmbH
* Copyright(c) 2016 - 2017 Intel Deutschland GmbH
+ * Copyright(c) 2018 Intel Corporation
* All rights reserved.
*
* Redistribution and use in source and binary forms, with or without
@@ -79,6 +81,8 @@
#include "mvm.h"
#include "fw/dbg.h"
#include "iwl-phy-db.h"
+#include "iwl-modparams.h"
+#include "iwl-nvm-parse.h"
#define MVM_UCODE_ALIVE_TIMEOUT HZ
#define MVM_UCODE_CALIB_TIMEOUT (2*HZ)
@@ -381,7 +385,8 @@
/* Load NVM to NIC if needed */
if (mvm->nvm_file_name) {
- iwl_mvm_read_external_nvm(mvm);
+ iwl_read_external_nvm(mvm->trans, mvm->nvm_file_name,
+ mvm->nvm_sections);
iwl_mvm_load_nvm_to_nic(mvm);
}
@@ -410,7 +415,7 @@
/* Read the NVM only at driver load time, no need to do this twice */
if (!IWL_MVM_PARSE_NVM && read_nvm) {
- mvm->nvm_data = iwl_fw_get_nvm(&mvm->fwrt);
+ mvm->nvm_data = iwl_get_nvm(mvm->trans, mvm->fw);
if (IS_ERR(mvm->nvm_data)) {
ret = PTR_ERR(mvm->nvm_data);
mvm->nvm_data = NULL;
@@ -1093,6 +1098,7 @@
if (fw_has_capa(&mvm->fw->ucode_capa, IWL_UCODE_TLV_CAPA_UMAC_SCAN)) {
mvm->scan_type = IWL_SCAN_TYPE_NOT_SET;
+ mvm->hb_scan_type = IWL_SCAN_TYPE_NOT_SET;
ret = iwl_mvm_config_scan(mvm);
if (ret)
goto error;
diff --git a/drivers/net/wireless/intel/iwlwifi/mvm/mac80211.c b/drivers/net/wireless/intel/iwlwifi/mvm/mac80211.c
index 90f8c89..ec91dd9 100644
--- a/drivers/net/wireless/intel/iwlwifi/mvm/mac80211.c
+++ b/drivers/net/wireless/intel/iwlwifi/mvm/mac80211.c
@@ -953,6 +953,16 @@
switch (action) {
case IEEE80211_AMPDU_RX_START:
+ if (iwl_mvm_vif_from_mac80211(vif)->ap_sta_id ==
+ iwl_mvm_sta_from_mac80211(sta)->sta_id) {
+ struct iwl_mvm_vif *mvmvif;
+ u16 macid = iwl_mvm_vif_from_mac80211(vif)->id;
+ struct iwl_mvm_tcm_mac *mdata = &mvm->tcm.data[macid];
+
+ mdata->opened_rx_ba_sessions = true;
+ mvmvif = iwl_mvm_vif_from_mac80211(vif);
+ cancel_delayed_work(&mvmvif->uapsd_nonagg_detected_wk);
+ }
if (!iwl_enable_rx_ampdu(mvm->cfg)) {
ret = -EINVAL;
break;
@@ -1436,6 +1446,8 @@
mvm->p2p_device_vif = vif;
}
+ iwl_mvm_tcm_add_vif(mvm, vif);
+
if (vif->type == NL80211_IFTYPE_MONITOR)
mvm->monitor_on = true;
@@ -1487,6 +1499,10 @@
iwl_mvm_prepare_mac_removal(mvm, vif);
+ if (!(vif->type == NL80211_IFTYPE_AP ||
+ vif->type == NL80211_IFTYPE_ADHOC))
+ iwl_mvm_tcm_rm_vif(mvm, vif);
+
mutex_lock(&mvm->mutex);
if (mvm->bf_allowed_vif == mvmvif) {
@@ -2536,6 +2552,16 @@
static void iwl_mvm_check_uapsd(struct iwl_mvm *mvm, struct ieee80211_vif *vif,
const u8 *bssid)
{
+ int i;
+
+ if (!test_bit(IWL_MVM_STATUS_IN_HW_RESTART, &mvm->status)) {
+ struct iwl_mvm_tcm_mac *mdata;
+
+ mdata = &mvm->tcm.data[iwl_mvm_vif_from_mac80211(vif)->id];
+ ewma_rate_init(&mdata->uapsd_nonagg_detect.rate);
+ mdata->opened_rx_ba_sessions = false;
+ }
+
if (!(mvm->fw->ucode_capa.flags & IWL_UCODE_TLV_FLAGS_UAPSD_SUPPORT))
return;
@@ -2550,6 +2576,13 @@
return;
}
+ for (i = 0; i < IWL_MVM_UAPSD_NOAGG_LIST_LEN; i++) {
+ if (ether_addr_equal(mvm->uapsd_noagg_bssids[i].addr, bssid)) {
+ vif->driver_flags &= ~IEEE80211_VIF_SUPPORTS_UAPSD;
+ return;
+ }
+ }
+
vif->driver_flags |= IEEE80211_VIF_SUPPORTS_UAPSD;
}
@@ -2811,7 +2844,8 @@
}
static void iwl_mvm_mac_mgd_prepare_tx(struct ieee80211_hw *hw,
- struct ieee80211_vif *vif)
+ struct ieee80211_vif *vif,
+ u16 req_duration)
{
struct iwl_mvm *mvm = IWL_MAC80211_GET_MVM(hw);
u32 duration = IWL_MVM_TE_SESSION_PROTECTION_MAX_TIME_MS;
@@ -2824,6 +2858,9 @@
if (iwl_mvm_ref_sync(mvm, IWL_MVM_REF_PREPARE_TX))
return;
+ if (req_duration > duration)
+ duration = req_duration;
+
mutex_lock(&mvm->mutex);
/* Try really hard to protect the session and hear a beacon */
iwl_mvm_protect_session(mvm, vif, duration, min_duration, 500, false);
diff --git a/drivers/net/wireless/intel/iwlwifi/mvm/mvm.h b/drivers/net/wireless/intel/iwlwifi/mvm/mvm.h
index d2cf751..6a4ba16 100644
--- a/drivers/net/wireless/intel/iwlwifi/mvm/mvm.h
+++ b/drivers/net/wireless/intel/iwlwifi/mvm/mvm.h
@@ -8,6 +8,7 @@
* Copyright(c) 2012 - 2014 Intel Corporation. All rights reserved.
* Copyright(c) 2013 - 2015 Intel Mobile Communications GmbH
* Copyright(c) 2016 - 2017 Intel Deutschland GmbH
+ * Copyright(c) 2018 Intel Corporation
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of version 2 of the GNU General Public License as
@@ -35,6 +36,7 @@
* Copyright(c) 2012 - 2014 Intel Corporation. All rights reserved.
* Copyright(c) 2013 - 2015 Intel Mobile Communications GmbH
* Copyright(c) 2016 - 2017 Intel Deutschland GmbH
+ * Copyright(c) 2018 Intel Corporation
* All rights reserved.
*
* Redistribution and use in source and binary forms, with or without
@@ -90,7 +92,9 @@
#include "fw/runtime.h"
#include "fw/dbg.h"
#include "fw/acpi.h"
-#include "fw/debugfs.h"
+#include "iwl-nvm-parse.h"
+
+#include <linux/average.h>
#define IWL_MVM_MAX_ADDRESSES 5
/* RSSI offset for WkP */
@@ -444,6 +448,8 @@
/* FW identified misbehaving AP */
u8 uapsd_misbehaving_bssid[ETH_ALEN];
+ struct delayed_work uapsd_nonagg_detected_wk;
+
/* Indicates that CSA countdown may be started */
bool csa_countdown;
bool csa_failed;
@@ -503,18 +509,6 @@
};
/**
- * struct iwl_nvm_section - describes an NVM section in memory.
- *
- * This struct holds an NVM section read from the NIC using NVM_ACCESS_CMD,
- * and saved for later use by the driver. Not all NVM sections are saved
- * this way, only the needed ones.
- */
-struct iwl_nvm_section {
- u16 length;
- const u8 *data;
-};
-
-/**
* struct iwl_mvm_tt_mgnt - Thermal Throttling Management structure
* @ct_kill_exit: worker to exit thermal kill
* @dynamic_smps: Is thermal throttling enabled dynamic_smps?
@@ -595,6 +589,53 @@
IWL_MVM_TDLS_SW_ACTIVE,
};
+enum iwl_mvm_traffic_load {
+ IWL_MVM_TRAFFIC_LOW,
+ IWL_MVM_TRAFFIC_MEDIUM,
+ IWL_MVM_TRAFFIC_HIGH,
+};
+
+DECLARE_EWMA(rate, 16, 16)
+
+struct iwl_mvm_tcm_mac {
+ struct {
+ u32 pkts[IEEE80211_NUM_ACS];
+ u32 airtime;
+ } tx;
+ struct {
+ u32 pkts[IEEE80211_NUM_ACS];
+ u32 airtime;
+ u32 last_ampdu_ref;
+ } rx;
+ struct {
+ /* track AP's transfer in client mode */
+ u64 rx_bytes;
+ struct ewma_rate rate;
+ bool detected;
+ } uapsd_nonagg_detect;
+ bool opened_rx_ba_sessions;
+};
+
+struct iwl_mvm_tcm {
+ struct delayed_work work;
+ spinlock_t lock; /* used when time elapsed */
+ unsigned long ts; /* timestamp when period ends */
+ unsigned long ll_ts;
+ unsigned long uapsd_nonagg_ts;
+ bool paused;
+ struct iwl_mvm_tcm_mac data[NUM_MAC_INDEX_DRIVER];
+ struct {
+ u32 elapsed; /* milliseconds for this TCM period */
+ u32 airtime[NUM_MAC_INDEX_DRIVER];
+ enum iwl_mvm_traffic_load load[NUM_MAC_INDEX_DRIVER];
+ enum iwl_mvm_traffic_load band_load[NUM_NL80211_BANDS];
+ enum iwl_mvm_traffic_load global_load;
+ bool low_latency[NUM_MAC_INDEX_DRIVER];
+ bool change[NUM_MAC_INDEX_DRIVER];
+ bool global_change;
+ } result;
+};
+
/**
* struct iwl_mvm_reorder_buffer - per ra/tid/queue reorder buffer
* @head_sn: reorder window head sn
@@ -829,7 +870,10 @@
unsigned int scan_status;
void *scan_cmd;
struct iwl_mcast_filter_cmd *mcast_filter_cmd;
+ /* For CDB this is low band scan type, for non-CDB - type. */
enum iwl_mvm_scan_type scan_type;
+ enum iwl_mvm_scan_type hb_scan_type;
+
enum iwl_mvm_sched_scan_pass_all_states sched_scan_pass_all;
struct delayed_work scan_timeout_dwork;
@@ -978,6 +1022,13 @@
*/
bool temperature_test; /* Debug test temperature is enabled */
+ unsigned long bt_coex_last_tcm_ts;
+ struct iwl_mvm_tcm tcm;
+
+ u8 uapsd_noagg_bssid_write_idx;
+ struct mac_address uapsd_noagg_bssids[IWL_MVM_UAPSD_NOAGG_BSSIDS_NUM]
+ __aligned(2);
+
struct iwl_time_quota_cmd last_quota_cmd;
#ifdef CONFIG_NL80211_TESTMODE
@@ -1293,6 +1344,16 @@
IWL_UCODE_TLV_CAPA_CDB_SUPPORT);
}
+static inline bool iwl_mvm_cdb_scan_api(struct iwl_mvm *mvm)
+{
+ /*
+ * TODO: should this be the same as iwl_mvm_is_cdb_supported()?
+ * but then there's a little bit of code in scan that won't make
+ * any sense...
+ */
+ return mvm->trans->cfg->device_family >= IWL_DEVICE_FAMILY_22000;
+}
+
static inline bool iwl_mvm_has_new_rx_stats_api(struct iwl_mvm *mvm)
{
return fw_has_api(&mvm->fw->ucode_capa,
@@ -1438,7 +1499,6 @@
/* NVM */
int iwl_nvm_init(struct iwl_mvm *mvm);
int iwl_mvm_load_nvm_to_nic(struct iwl_mvm *mvm);
-int iwl_mvm_read_external_nvm(struct iwl_mvm *mvm);
static inline u8 iwl_mvm_get_valid_tx_ant(struct iwl_mvm *mvm)
{
@@ -1771,6 +1831,8 @@
enum iwl_mvm_low_latency_cause cause);
/* get SystemLowLatencyMode - only needed for beacon threshold? */
bool iwl_mvm_low_latency(struct iwl_mvm *mvm);
+bool iwl_mvm_low_latency_band(struct iwl_mvm *mvm, enum nl80211_band band);
+
/* get VMACLowLatencyMode */
static inline bool iwl_mvm_vif_low_latency(struct iwl_mvm_vif *mvmvif)
{
@@ -1906,6 +1968,17 @@
void iwl_mvm_inactivity_check(struct iwl_mvm *mvm);
+#define MVM_TCM_PERIOD_MSEC 500
+#define MVM_TCM_PERIOD (HZ * MVM_TCM_PERIOD_MSEC / 1000)
+#define MVM_LL_PERIOD (10 * HZ)
+void iwl_mvm_tcm_work(struct work_struct *work);
+void iwl_mvm_recalc_tcm(struct iwl_mvm *mvm);
+void iwl_mvm_pause_tcm(struct iwl_mvm *mvm, bool with_cancel);
+void iwl_mvm_resume_tcm(struct iwl_mvm *mvm);
+void iwl_mvm_tcm_add_vif(struct iwl_mvm *mvm, struct ieee80211_vif *vif);
+void iwl_mvm_tcm_rm_vif(struct iwl_mvm *mvm, struct ieee80211_vif *vif);
+u8 iwl_mvm_tcm_load_percentage(u32 airtime, u32 elapsed);
+
void iwl_mvm_nic_restart(struct iwl_mvm *mvm, bool fw_error);
unsigned int iwl_mvm_get_wd_timeout(struct iwl_mvm *mvm,
struct ieee80211_vif *vif,
diff --git a/drivers/net/wireless/intel/iwlwifi/mvm/nvm.c b/drivers/net/wireless/intel/iwlwifi/mvm/nvm.c
index 5bfe530..cf48517 100644
--- a/drivers/net/wireless/intel/iwlwifi/mvm/nvm.c
+++ b/drivers/net/wireless/intel/iwlwifi/mvm/nvm.c
@@ -8,6 +8,7 @@
* Copyright(c) 2012 - 2014 Intel Corporation. All rights reserved.
* Copyright(c) 2013 - 2015 Intel Mobile Communications GmbH
* Copyright(c) 2016 - 2017 Intel Deutschland GmbH
+ * Copyright(c) 2018 Intel Corporation
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of version 2 of the GNU General Public License as
@@ -35,6 +36,7 @@
* Copyright(c) 2012 - 2014 Intel Corporation. All rights reserved.
* Copyright(c) 2013 - 2015 Intel Mobile Communications GmbH
* Copyright(c) 2016 - 2017 Intel Deutschland GmbH
+ * Copyright(c) 2018 Intel Corporation
* All rights reserved.
*
* Redistribution and use in source and binary forms, with or without
@@ -76,9 +78,7 @@
#include "fw/acpi.h"
/* Default NVM size to read */
-#define IWL_NVM_DEFAULT_CHUNK_SIZE (2*1024)
-#define IWL_MAX_NVM_SECTION_SIZE 0x1b58
-#define IWL_MAX_EXT_NVM_SECTION_SIZE 0x1ffc
+#define IWL_NVM_DEFAULT_CHUNK_SIZE (2 * 1024)
#define NVM_WRITE_OPCODE 1
#define NVM_READ_OPCODE 0
@@ -229,19 +229,6 @@
return 0;
}
-static void iwl_mvm_nvm_fixups(struct iwl_mvm *mvm, unsigned int section,
- u8 *data, unsigned int len)
-{
-#define IWL_4165_DEVICE_ID 0x5501
-#define NVM_SKU_CAP_MIMO_DISABLE BIT(5)
-
- if (section == NVM_SECTION_TYPE_PHY_SKU &&
- mvm->trans->hw_id == IWL_4165_DEVICE_ID && data && len >= 5 &&
- (data[4] & NVM_SKU_CAP_MIMO_DISABLE))
- /* OTP 0x52 bug work around: it's a 1x1 device */
- data[3] = ANT_B | (ANT_B << 4);
-}
-
/*
* Reads an NVM section completely.
* NICs prior to 7000 family doesn't have a real NVM, but just read
@@ -282,7 +269,7 @@
offset += ret;
}
- iwl_mvm_nvm_fixups(mvm, section, data, offset);
+ iwl_nvm_fixups(mvm->trans->hw_id, section, data, offset);
IWL_DEBUG_EEPROM(mvm->trans->dev,
"NVM section %d read completed\n", section);
@@ -355,184 +342,6 @@
lar_enabled);
}
-#define MAX_NVM_FILE_LEN 16384
-
-/*
- * Reads external NVM from a file into mvm->nvm_sections
- *
- * HOW TO CREATE THE NVM FILE FORMAT:
- * ------------------------------
- * 1. create hex file, format:
- * 3800 -> header
- * 0000 -> header
- * 5a40 -> data
- *
- * rev - 6 bit (word1)
- * len - 10 bit (word1)
- * id - 4 bit (word2)
- * rsv - 12 bit (word2)
- *
- * 2. flip 8bits with 8 bits per line to get the right NVM file format
- *
- * 3. create binary file from the hex file
- *
- * 4. save as "iNVM_xxx.bin" under /lib/firmware
- */
-int iwl_mvm_read_external_nvm(struct iwl_mvm *mvm)
-{
- int ret, section_size;
- u16 section_id;
- const struct firmware *fw_entry;
- const struct {
- __le16 word1;
- __le16 word2;
- u8 data[];
- } *file_sec;
- const u8 *eof;
- u8 *temp;
- int max_section_size;
- const __le32 *dword_buff;
-
-#define NVM_WORD1_LEN(x) (8 * (x & 0x03FF))
-#define NVM_WORD2_ID(x) (x >> 12)
-#define EXT_NVM_WORD2_LEN(x) (2 * (((x) & 0xFF) << 8 | (x) >> 8))
-#define EXT_NVM_WORD1_ID(x) ((x) >> 4)
-#define NVM_HEADER_0 (0x2A504C54)
-#define NVM_HEADER_1 (0x4E564D2A)
-#define NVM_HEADER_SIZE (4 * sizeof(u32))
-
- IWL_DEBUG_EEPROM(mvm->trans->dev, "Read from external NVM\n");
-
- /* Maximal size depends on NVM version */
- if (mvm->trans->cfg->nvm_type != IWL_NVM_EXT)
- max_section_size = IWL_MAX_NVM_SECTION_SIZE;
- else
- max_section_size = IWL_MAX_EXT_NVM_SECTION_SIZE;
-
- /*
- * Obtain NVM image via request_firmware. Since we already used
- * request_firmware_nowait() for the firmware binary load and only
- * get here after that we assume the NVM request can be satisfied
- * synchronously.
- */
- ret = request_firmware(&fw_entry, mvm->nvm_file_name,
- mvm->trans->dev);
- if (ret) {
- IWL_ERR(mvm, "ERROR: %s isn't available %d\n",
- mvm->nvm_file_name, ret);
- return ret;
- }
-
- IWL_INFO(mvm, "Loaded NVM file %s (%zu bytes)\n",
- mvm->nvm_file_name, fw_entry->size);
-
- if (fw_entry->size > MAX_NVM_FILE_LEN) {
- IWL_ERR(mvm, "NVM file too large\n");
- ret = -EINVAL;
- goto out;
- }
-
- eof = fw_entry->data + fw_entry->size;
- dword_buff = (__le32 *)fw_entry->data;
-
- /* some NVM file will contain a header.
- * The header is identified by 2 dwords header as follow:
- * dword[0] = 0x2A504C54
- * dword[1] = 0x4E564D2A
- *
- * This header must be skipped when providing the NVM data to the FW.
- */
- if (fw_entry->size > NVM_HEADER_SIZE &&
- dword_buff[0] == cpu_to_le32(NVM_HEADER_0) &&
- dword_buff[1] == cpu_to_le32(NVM_HEADER_1)) {
- file_sec = (void *)(fw_entry->data + NVM_HEADER_SIZE);
- IWL_INFO(mvm, "NVM Version %08X\n", le32_to_cpu(dword_buff[2]));
- IWL_INFO(mvm, "NVM Manufacturing date %08X\n",
- le32_to_cpu(dword_buff[3]));
-
- /* nvm file validation, dword_buff[2] holds the file version */
- if (mvm->trans->cfg->device_family == IWL_DEVICE_FAMILY_8000 &&
- CSR_HW_REV_STEP(mvm->trans->hw_rev) == SILICON_C_STEP &&
- le32_to_cpu(dword_buff[2]) < 0xE4A) {
- ret = -EFAULT;
- goto out;
- }
- } else {
- file_sec = (void *)fw_entry->data;
- }
-
- while (true) {
- if (file_sec->data > eof) {
- IWL_ERR(mvm,
- "ERROR - NVM file too short for section header\n");
- ret = -EINVAL;
- break;
- }
-
- /* check for EOF marker */
- if (!file_sec->word1 && !file_sec->word2) {
- ret = 0;
- break;
- }
-
- if (mvm->trans->cfg->nvm_type != IWL_NVM_EXT) {
- section_size =
- 2 * NVM_WORD1_LEN(le16_to_cpu(file_sec->word1));
- section_id = NVM_WORD2_ID(le16_to_cpu(file_sec->word2));
- } else {
- section_size = 2 * EXT_NVM_WORD2_LEN(
- le16_to_cpu(file_sec->word2));
- section_id = EXT_NVM_WORD1_ID(
- le16_to_cpu(file_sec->word1));
- }
-
- if (section_size > max_section_size) {
- IWL_ERR(mvm, "ERROR - section too large (%d)\n",
- section_size);
- ret = -EINVAL;
- break;
- }
-
- if (!section_size) {
- IWL_ERR(mvm, "ERROR - section empty\n");
- ret = -EINVAL;
- break;
- }
-
- if (file_sec->data + section_size > eof) {
- IWL_ERR(mvm,
- "ERROR - NVM file too short for section (%d bytes)\n",
- section_size);
- ret = -EINVAL;
- break;
- }
-
- if (WARN(section_id >= NVM_MAX_NUM_SECTIONS,
- "Invalid NVM section ID %d\n", section_id)) {
- ret = -EINVAL;
- break;
- }
-
- temp = kmemdup(file_sec->data, section_size, GFP_KERNEL);
- if (!temp) {
- ret = -ENOMEM;
- break;
- }
-
- iwl_mvm_nvm_fixups(mvm, section_id, temp, section_size);
-
- kfree(mvm->nvm_sections[section_id].data);
- mvm->nvm_sections[section_id].data = temp;
- mvm->nvm_sections[section_id].length = section_size;
-
- /* advance to the next section */
- file_sec = (void *)(file_sec->data + section_size);
- }
-out:
- release_firmware(fw_entry);
- return ret;
-}
-
/* Loads the NVM data stored in mvm->nvm_sections into the NIC */
int iwl_mvm_load_nvm_to_nic(struct iwl_mvm *mvm)
{
@@ -585,7 +394,7 @@
break;
}
- iwl_mvm_nvm_fixups(mvm, section, temp, ret);
+ iwl_nvm_fixups(mvm->trans->hw_id, section, temp, ret);
mvm->nvm_sections[section].data = temp;
mvm->nvm_sections[section].length = ret;
@@ -624,14 +433,17 @@
/* Only if PNVM selected in the mod param - load external NVM */
if (mvm->nvm_file_name) {
/* read External NVM file from the mod param */
- ret = iwl_mvm_read_external_nvm(mvm);
+ ret = iwl_read_external_nvm(mvm->trans, mvm->nvm_file_name,
+ mvm->nvm_sections);
if (ret) {
mvm->nvm_file_name = nvm_file_C;
if ((ret == -EFAULT || ret == -ENOENT) &&
mvm->nvm_file_name) {
/* in case nvm file was failed try again */
- ret = iwl_mvm_read_external_nvm(mvm);
+ ret = iwl_read_external_nvm(mvm->trans,
+ mvm->nvm_file_name,
+ mvm->nvm_sections);
if (ret)
return ret;
} else {
diff --git a/drivers/net/wireless/intel/iwlwifi/mvm/ops.c b/drivers/net/wireless/intel/iwlwifi/mvm/ops.c
index 224bfa1..ff1e518 100644
--- a/drivers/net/wireless/intel/iwlwifi/mvm/ops.c
+++ b/drivers/net/wireless/intel/iwlwifi/mvm/ops.c
@@ -250,6 +250,9 @@
RX_HANDLER(TX_CMD, iwl_mvm_rx_tx_cmd, RX_HANDLER_SYNC),
RX_HANDLER(BA_NOTIF, iwl_mvm_rx_ba_notif, RX_HANDLER_SYNC),
+ RX_HANDLER_GRP(DATA_PATH_GROUP, TLC_MNG_UPDATE_NOTIF,
+ iwl_mvm_tlc_update_notif, RX_HANDLER_SYNC),
+
RX_HANDLER(BT_PROFILE_NOTIFICATION, iwl_mvm_rx_bt_coex_notif,
RX_HANDLER_ASYNC_LOCKED),
RX_HANDLER(BEACON_NOTIFICATION, iwl_mvm_rx_beacon_notif,
@@ -667,6 +670,12 @@
SET_IEEE80211_DEV(mvm->hw, mvm->trans->dev);
+ spin_lock_init(&mvm->tcm.lock);
+ INIT_DELAYED_WORK(&mvm->tcm.work, iwl_mvm_tcm_work);
+ mvm->tcm.ts = jiffies;
+ mvm->tcm.ll_ts = jiffies;
+ mvm->tcm.uapsd_nonagg_ts = jiffies;
+
INIT_DELAYED_WORK(&mvm->cs_tx_unblock_dwork, iwl_mvm_tx_unblock_dwork);
/*
@@ -730,6 +739,9 @@
sizeof(trans->dbg_conf_tlv));
trans->dbg_trigger_tlv = mvm->fw->dbg_trigger_tlv;
+ trans->iml = mvm->fw->iml;
+ trans->iml_len = mvm->fw->iml_len;
+
/* set up notification wait support */
iwl_notification_wait_init(&mvm->notif_wait);
@@ -859,6 +871,8 @@
for (i = 0; i < NVM_MAX_NUM_SECTIONS; i++)
kfree(mvm->nvm_sections[i].data);
+ cancel_delayed_work_sync(&mvm->tcm.work);
+
iwl_mvm_tof_clean(mvm);
mutex_destroy(&mvm->mutex);
@@ -1026,8 +1040,6 @@
iwl_mvm_rx_queue_notif(mvm, rxb, 0);
else if (cmd == WIDE_ID(LEGACY_GROUP, FRAME_RELEASE))
iwl_mvm_rx_frame_release(mvm, napi, rxb, 0);
- else if (cmd == WIDE_ID(DATA_PATH_GROUP, TLC_MNG_UPDATE_NOTIF))
- iwl_mvm_tlc_update_notif(mvm, pkt);
else
iwl_mvm_rx_common(mvm, rxb, pkt);
}
@@ -1432,6 +1444,7 @@
mvm->d0i3_offloading = false;
}
+ iwl_mvm_pause_tcm(mvm, true);
/* make sure we have no running tx while configuring the seqno */
synchronize_net();
@@ -1615,6 +1628,7 @@
/* the FW might have updated the regdomain */
iwl_mvm_update_changed_regdom(mvm);
+ iwl_mvm_resume_tcm(mvm);
iwl_mvm_unref(mvm, IWL_MVM_REF_EXIT_WORK);
mutex_unlock(&mvm->mutex);
}
diff --git a/drivers/net/wireless/intel/iwlwifi/mvm/rs-fw.c b/drivers/net/wireless/intel/iwlwifi/mvm/rs-fw.c
index fb57456..b8b2b81 100644
--- a/drivers/net/wireless/intel/iwlwifi/mvm/rs-fw.c
+++ b/drivers/net/wireless/intel/iwlwifi/mvm/rs-fw.c
@@ -6,6 +6,7 @@
* GPL LICENSE SUMMARY
*
* Copyright(c) 2017 Intel Deutschland GmbH
+ * Copyright(c) 2018 Intel Corporation
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of version 2 of the GNU General Public License as
@@ -26,6 +27,7 @@
* BSD LICENSE
*
* Copyright(c) 2017 Intel Deutschland GmbH
+ * Copyright(c) 2018 Intel Corporation
* All rights reserved.
*
* Redistribution and use in source and binary forms, with or without
@@ -65,14 +67,14 @@
{
switch (sta->bandwidth) {
case IEEE80211_STA_RX_BW_160:
- return IWL_TLC_MNG_MAX_CH_WIDTH_160MHZ;
+ return IWL_TLC_MNG_CH_WIDTH_160MHZ;
case IEEE80211_STA_RX_BW_80:
- return IWL_TLC_MNG_MAX_CH_WIDTH_80MHZ;
+ return IWL_TLC_MNG_CH_WIDTH_80MHZ;
case IEEE80211_STA_RX_BW_40:
- return IWL_TLC_MNG_MAX_CH_WIDTH_40MHZ;
+ return IWL_TLC_MNG_CH_WIDTH_40MHZ;
case IEEE80211_STA_RX_BW_20:
default:
- return IWL_TLC_MNG_MAX_CH_WIDTH_20MHZ;
+ return IWL_TLC_MNG_CH_WIDTH_20MHZ;
}
}
@@ -85,7 +87,9 @@
if (chains & ANT_B)
fw_chains |= IWL_TLC_MNG_CHAIN_B_MSK;
if (chains & ANT_C)
- fw_chains |= IWL_TLC_MNG_CHAIN_C_MSK;
+ WARN(false,
+ "tlc offload doesn't support antenna C. chains: 0x%x\n",
+ chains);
return fw_chains;
}
@@ -97,13 +101,13 @@
u8 supp = 0;
if (ht_cap->cap & IEEE80211_HT_CAP_SGI_20)
- supp |= IWL_TLC_MNG_SGI_20MHZ_MSK;
+ supp |= BIT(IWL_TLC_MNG_CH_WIDTH_20MHZ);
if (ht_cap->cap & IEEE80211_HT_CAP_SGI_40)
- supp |= IWL_TLC_MNG_SGI_40MHZ_MSK;
+ supp |= BIT(IWL_TLC_MNG_CH_WIDTH_40MHZ);
if (vht_cap->cap & IEEE80211_VHT_CAP_SHORT_GI_80)
- supp |= IWL_TLC_MNG_SGI_80MHZ_MSK;
+ supp |= BIT(IWL_TLC_MNG_CH_WIDTH_80MHZ);
if (vht_cap->cap & IEEE80211_VHT_CAP_SHORT_GI_160)
- supp |= IWL_TLC_MNG_SGI_160MHZ_MSK;
+ supp |= BIT(IWL_TLC_MNG_CH_WIDTH_160MHZ);
return supp;
}
@@ -114,9 +118,7 @@
struct ieee80211_sta_ht_cap *ht_cap = &sta->ht_cap;
struct ieee80211_sta_vht_cap *vht_cap = &sta->vht_cap;
bool vht_ena = vht_cap && vht_cap->vht_supported;
- u16 flags = IWL_TLC_MNG_CFG_FLAGS_CCK_MSK |
- IWL_TLC_MNG_CFG_FLAGS_DCM_MSK |
- IWL_TLC_MNG_CFG_FLAGS_DD_MSK;
+ u16 flags = 0;
if (mvm->cfg->ht_params->stbc &&
(num_of_ant(iwl_mvm_get_valid_tx_ant(mvm)) > 1) &&
@@ -129,16 +131,11 @@
(vht_ena && (vht_cap->cap & IEEE80211_VHT_CAP_RXLDPC))))
flags |= IWL_TLC_MNG_CFG_FLAGS_LDPC_MSK;
- if (fw_has_capa(&mvm->fw->ucode_capa, IWL_UCODE_TLV_CAPA_BEAMFORMER) &&
- (num_of_ant(iwl_mvm_get_valid_tx_ant(mvm)) > 1) &&
- (vht_cap->cap & IEEE80211_VHT_CAP_SU_BEAMFORMEE_CAPABLE))
- flags |= IWL_TLC_MNG_CFG_FLAGS_BF_MSK;
-
return flags;
}
static
-int rs_fw_vht_highest_rx_mcs_index(struct ieee80211_sta_vht_cap *vht_cap,
+int rs_fw_vht_highest_rx_mcs_index(const struct ieee80211_sta_vht_cap *vht_cap,
int nss)
{
u16 rx_mcs = le16_to_cpu(vht_cap->vht_mcs.rx_mcs_map) &
@@ -160,15 +157,16 @@
return 0;
}
-static void rs_fw_vht_set_enabled_rates(struct ieee80211_sta *sta,
- struct ieee80211_sta_vht_cap *vht_cap,
- struct iwl_tlc_config_cmd *cmd)
+static void
+rs_fw_vht_set_enabled_rates(const struct ieee80211_sta *sta,
+ const struct ieee80211_sta_vht_cap *vht_cap,
+ struct iwl_tlc_config_cmd *cmd)
{
u16 supp;
int i, highest_mcs;
for (i = 0; i < sta->rx_nss; i++) {
- if (i == MAX_RS_ANT_NUM)
+ if (i == MAX_NSS)
break;
highest_mcs = rs_fw_vht_highest_rx_mcs_index(vht_cap, i + 1);
@@ -179,7 +177,9 @@
if (sta->bandwidth == IEEE80211_STA_RX_BW_20)
supp &= ~BIT(IWL_TLC_MNG_HT_RATE_MCS9);
- cmd->ht_supp_rates[i] = cpu_to_le16(supp);
+ cmd->ht_rates[i][0] = cpu_to_le16(supp);
+ if (sta->bandwidth == IEEE80211_STA_RX_BW_160)
+ cmd->ht_rates[i][1] = cmd->ht_rates[i][0];
}
}
@@ -190,8 +190,8 @@
int i;
unsigned long tmp;
unsigned long supp; /* must be unsigned long for for_each_set_bit */
- struct ieee80211_sta_ht_cap *ht_cap = &sta->ht_cap;
- struct ieee80211_sta_vht_cap *vht_cap = &sta->vht_cap;
+ const struct ieee80211_sta_ht_cap *ht_cap = &sta->ht_cap;
+ const struct ieee80211_sta_vht_cap *vht_cap = &sta->vht_cap;
/* non HT rates */
supp = 0;
@@ -199,45 +199,40 @@
for_each_set_bit(i, &tmp, BITS_PER_LONG)
supp |= BIT(sband->bitrates[i].hw_value);
- cmd->non_ht_supp_rates = cpu_to_le16(supp);
+ cmd->non_ht_rates = cpu_to_le16(supp);
cmd->mode = IWL_TLC_MNG_MODE_NON_HT;
- /* HT/VHT rates */
if (vht_cap && vht_cap->vht_supported) {
cmd->mode = IWL_TLC_MNG_MODE_VHT;
rs_fw_vht_set_enabled_rates(sta, vht_cap, cmd);
} else if (ht_cap && ht_cap->ht_supported) {
cmd->mode = IWL_TLC_MNG_MODE_HT;
- cmd->ht_supp_rates[0] = cpu_to_le16(ht_cap->mcs.rx_mask[0]);
- cmd->ht_supp_rates[1] = cpu_to_le16(ht_cap->mcs.rx_mask[1]);
+ cmd->ht_rates[0][0] = cpu_to_le16(ht_cap->mcs.rx_mask[0]);
+ cmd->ht_rates[1][0] = cpu_to_le16(ht_cap->mcs.rx_mask[1]);
}
}
-static void rs_fw_tlc_mng_notif_req_config(struct iwl_mvm *mvm, u8 sta_id)
+void iwl_mvm_tlc_update_notif(struct iwl_mvm *mvm,
+ struct iwl_rx_cmd_buffer *rxb)
{
- u32 cmd_id = iwl_cmd_id(TLC_MNG_NOTIF_REQ_CMD, DATA_PATH_GROUP, 0);
- struct iwl_tlc_notif_req_config_cmd cfg_cmd = {
- .sta_id = sta_id,
- .flags = cpu_to_le16(IWL_TLC_NOTIF_INIT_RATE_MSK),
- .interval = cpu_to_le16(IWL_TLC_NOTIF_REQ_INTERVAL),
- };
- int ret;
-
- ret = iwl_mvm_send_cmd_pdu(mvm, cmd_id, 0, sizeof(cfg_cmd), &cfg_cmd);
- if (ret)
- IWL_ERR(mvm, "Failed to send TLC notif request (%d)\n", ret);
-}
-
-void iwl_mvm_tlc_update_notif(struct iwl_mvm *mvm, struct iwl_rx_packet *pkt)
-{
+ struct iwl_rx_packet *pkt = rxb_addr(rxb);
struct iwl_tlc_update_notif *notif;
+ struct ieee80211_sta *sta;
struct iwl_mvm_sta *mvmsta;
struct iwl_lq_sta_rs_fw *lq_sta;
+ u32 flags;
rcu_read_lock();
notif = (void *)pkt->data;
- mvmsta = iwl_mvm_sta_from_staid_rcu(mvm, notif->sta_id);
+ sta = rcu_dereference(mvm->fw_id_to_mac_id[notif->sta_id]);
+ if (IS_ERR_OR_NULL(sta)) {
+ IWL_ERR(mvm, "Invalid sta id (%d) in FW TLC notification\n",
+ notif->sta_id);
+ goto out;
+ }
+
+ mvmsta = iwl_mvm_sta_from_mac80211(sta);
if (!mvmsta) {
IWL_ERR(mvm, "Invalid sta id (%d) in FW TLC notification\n",
@@ -245,14 +240,30 @@
goto out;
}
+ flags = le32_to_cpu(notif->flags);
+
lq_sta = &mvmsta->lq_sta.rs_fw;
- if (le16_to_cpu(notif->flags) & IWL_TLC_NOTIF_INIT_RATE_MSK) {
- lq_sta->last_rate_n_flags =
- le32_to_cpu(notif->values[IWL_TLC_NOTIF_INIT_RATE_POS]);
+ if (flags & IWL_TLC_NOTIF_FLAG_RATE) {
+ lq_sta->last_rate_n_flags = le32_to_cpu(notif->rate);
IWL_DEBUG_RATE(mvm, "new rate_n_flags: 0x%X\n",
lq_sta->last_rate_n_flags);
}
+
+ if (flags & IWL_TLC_NOTIF_FLAG_AMSDU) {
+ u16 size = le32_to_cpu(notif->amsdu_size);
+
+ if (WARN_ON(sta->max_amsdu_len < size))
+ goto out;
+
+ mvmsta->amsdu_enabled = le32_to_cpu(notif->amsdu_enabled);
+ mvmsta->max_amsdu_len = size;
+
+ IWL_DEBUG_RATE(mvm,
+ "AMSDU update. AMSDU size: %d, AMSDU selected size: %d, AMSDU TID bitmap 0x%X\n",
+ le32_to_cpu(notif->amsdu_size), size,
+ mvmsta->amsdu_enabled);
+ }
out:
rcu_read_unlock();
}
@@ -267,12 +278,12 @@
struct ieee80211_supported_band *sband;
struct iwl_tlc_config_cmd cfg_cmd = {
.sta_id = mvmsta->sta_id,
- .max_supp_ch_width = rs_fw_bw_from_sta_bw(sta),
+ .max_ch_width = rs_fw_bw_from_sta_bw(sta),
.flags = cpu_to_le16(rs_fw_set_config_flags(mvm, sta)),
.chains = rs_fw_set_active_chains(iwl_mvm_get_valid_tx_ant(mvm)),
- .max_supp_ss = sta->rx_nss,
- .max_ampdu_cnt = cpu_to_le32(mvmsta->max_agg_bufsize),
+ .max_mpdu_len = cpu_to_le16(sta->max_amsdu_len),
.sgi_ch_width_supp = rs_fw_sgi_cw_support(sta),
+ .amsdu = iwl_mvm_is_csum_supported(mvm),
};
int ret;
@@ -287,8 +298,6 @@
ret = iwl_mvm_send_cmd_pdu(mvm, cmd_id, 0, sizeof(cfg_cmd), &cfg_cmd);
if (ret)
IWL_ERR(mvm, "Failed to send rate scale config (%d)\n", ret);
-
- rs_fw_tlc_mng_notif_req_config(mvm, cfg_cmd.sta_id);
}
int rs_fw_tx_protection(struct iwl_mvm *mvm, struct iwl_mvm_sta *mvmsta,
diff --git a/drivers/net/wireless/intel/iwlwifi/mvm/rs.c b/drivers/net/wireless/intel/iwlwifi/mvm/rs.c
index 5d776ec..af7bc3b 100644
--- a/drivers/net/wireless/intel/iwlwifi/mvm/rs.c
+++ b/drivers/net/wireless/intel/iwlwifi/mvm/rs.c
@@ -1715,12 +1715,18 @@
{
struct iwl_mvm_sta *mvmsta = iwl_mvm_sta_from_mac80211(sta);
+ /*
+ * In case TLC offload is not active amsdu_enabled is either 0xFFFF
+ * or 0, since there is no per-TID alg.
+ */
if ((!is_vht(&tbl->rate) && !is_ht(&tbl->rate)) ||
tbl->rate.index < IWL_RATE_MCS_5_INDEX ||
scale_action == RS_ACTION_DOWNSCALE)
- mvmsta->tlc_amsdu = false;
+ mvmsta->amsdu_enabled = 0;
else
- mvmsta->tlc_amsdu = true;
+ mvmsta->amsdu_enabled = 0xFFFF;
+
+ mvmsta->max_amsdu_len = sta->max_amsdu_len;
}
/*
@@ -3134,7 +3140,8 @@
sband = hw->wiphy->bands[band];
lq_sta->lq.sta_id = mvmsta->sta_id;
- mvmsta->tlc_amsdu = false;
+ mvmsta->amsdu_enabled = 0;
+ mvmsta->max_amsdu_len = sta->max_amsdu_len;
for (j = 0; j < LQ_SIZE; j++)
rs_rate_scale_clear_tbl_windows(mvm, &lq_sta->lq_info[j]);
@@ -3744,7 +3751,7 @@
(rate->sgi) ? "SGI" : "NGI",
(rate->ldpc) ? "LDPC" : "BCC",
(lq_sta->is_agg) ? "AGG on" : "",
- (mvmsta->tlc_amsdu) ? "AMSDU on" : "");
+ (mvmsta->amsdu_enabled) ? "AMSDU on" : "");
}
desc += scnprintf(buff + desc, bufsz - desc, "last tx rate=0x%X\n",
lq_sta->last_rate_n_flags);
diff --git a/drivers/net/wireless/intel/iwlwifi/mvm/rs.h b/drivers/net/wireless/intel/iwlwifi/mvm/rs.h
index fb18cb8..5e89141 100644
--- a/drivers/net/wireless/intel/iwlwifi/mvm/rs.h
+++ b/drivers/net/wireless/intel/iwlwifi/mvm/rs.h
@@ -454,5 +454,6 @@
enum nl80211_band band);
int rs_fw_tx_protection(struct iwl_mvm *mvm, struct iwl_mvm_sta *mvmsta,
bool enable);
-void iwl_mvm_tlc_update_notif(struct iwl_mvm *mvm, struct iwl_rx_packet *pkt);
+void iwl_mvm_tlc_update_notif(struct iwl_mvm *mvm,
+ struct iwl_rx_cmd_buffer *rxb);
#endif /* __rs__ */
diff --git a/drivers/net/wireless/intel/iwlwifi/mvm/rx.c b/drivers/net/wireless/intel/iwlwifi/mvm/rx.c
index d26833c..bfb1634 100644
--- a/drivers/net/wireless/intel/iwlwifi/mvm/rx.c
+++ b/drivers/net/wireless/intel/iwlwifi/mvm/rx.c
@@ -254,6 +254,74 @@
return 0;
}
+static void iwl_mvm_rx_handle_tcm(struct iwl_mvm *mvm,
+ struct ieee80211_sta *sta,
+ struct ieee80211_hdr *hdr, u32 len,
+ struct iwl_rx_phy_info *phy_info,
+ u32 rate_n_flags)
+{
+ struct iwl_mvm_sta *mvmsta;
+ struct iwl_mvm_tcm_mac *mdata;
+ int mac;
+ int ac = IEEE80211_AC_BE; /* treat non-QoS as BE */
+ struct iwl_mvm_vif *mvmvif;
+ /* expected throughput in 100Kbps, single stream, 20 MHz */
+ static const u8 thresh_tpt[] = {
+ 9, 18, 30, 42, 60, 78, 90, 96, 120, 135,
+ };
+ u16 thr;
+
+ if (ieee80211_is_data_qos(hdr->frame_control))
+ ac = tid_to_mac80211_ac[ieee80211_get_tid(hdr)];
+
+ mvmsta = iwl_mvm_sta_from_mac80211(sta);
+ mac = mvmsta->mac_id_n_color & FW_CTXT_ID_MSK;
+
+ if (time_after(jiffies, mvm->tcm.ts + MVM_TCM_PERIOD))
+ schedule_delayed_work(&mvm->tcm.work, 0);
+ mdata = &mvm->tcm.data[mac];
+ mdata->rx.pkts[ac]++;
+
+ /* count the airtime only once for each ampdu */
+ if (mdata->rx.last_ampdu_ref != mvm->ampdu_ref) {
+ mdata->rx.last_ampdu_ref = mvm->ampdu_ref;
+ mdata->rx.airtime += le16_to_cpu(phy_info->frame_time);
+ }
+
+ if (!(rate_n_flags & (RATE_MCS_HT_MSK | RATE_MCS_VHT_MSK)))
+ return;
+
+ mvmvif = iwl_mvm_vif_from_mac80211(mvmsta->vif);
+
+ if (mdata->opened_rx_ba_sessions ||
+ mdata->uapsd_nonagg_detect.detected ||
+ (!mvmvif->queue_params[IEEE80211_AC_VO].uapsd &&
+ !mvmvif->queue_params[IEEE80211_AC_VI].uapsd &&
+ !mvmvif->queue_params[IEEE80211_AC_BE].uapsd &&
+ !mvmvif->queue_params[IEEE80211_AC_BK].uapsd) ||
+ mvmsta->sta_id != mvmvif->ap_sta_id)
+ return;
+
+ if (rate_n_flags & RATE_MCS_HT_MSK) {
+ thr = thresh_tpt[rate_n_flags & RATE_HT_MCS_RATE_CODE_MSK];
+ thr *= 1 + ((rate_n_flags & RATE_HT_MCS_NSS_MSK) >>
+ RATE_HT_MCS_NSS_POS);
+ } else {
+ if (WARN_ON((rate_n_flags & RATE_VHT_MCS_RATE_CODE_MSK) >=
+ ARRAY_SIZE(thresh_tpt)))
+ return;
+ thr = thresh_tpt[rate_n_flags & RATE_VHT_MCS_RATE_CODE_MSK];
+ thr *= 1 + ((rate_n_flags & RATE_VHT_MCS_NSS_MSK) >>
+ RATE_VHT_MCS_NSS_POS);
+ }
+
+ thr <<= ((rate_n_flags & RATE_MCS_CHAN_WIDTH_MSK) >>
+ RATE_MCS_CHAN_WIDTH_POS);
+
+ mdata->uapsd_nonagg_detect.rx_bytes += len;
+ ewma_rate_add(&mdata->uapsd_nonagg_detect.rate, thr);
+}
+
static void iwl_mvm_rx_csum(struct ieee80211_sta *sta,
struct sk_buff *skb,
u32 status)
@@ -408,6 +476,12 @@
NULL);
}
+ if (!mvm->tcm.paused && len >= sizeof(*hdr) &&
+ !is_multicast_ether_addr(hdr->addr1) &&
+ ieee80211_is_data(hdr->frame_control))
+ iwl_mvm_rx_handle_tcm(mvm, sta, hdr, len, phy_info,
+ rate_n_flags);
+
if (ieee80211_is_data(hdr->frame_control))
iwl_mvm_rx_csum(sta, skb, rx_pkt_status);
}
@@ -654,7 +728,8 @@
int expected_size;
int i;
u8 *energy;
- __le32 *bytes, *air_time;
+ __le32 *bytes;
+ __le32 *air_time;
__le32 flags;
if (!iwl_mvm_has_new_rx_stats_api(mvm)) {
@@ -752,6 +827,32 @@
sta->avg_energy = energy[i];
}
rcu_read_unlock();
+
+ /*
+ * Don't update in case the statistics are not cleared, since
+ * we will end up counting twice the same airtime, once in TCM
+ * request and once in statistics notification.
+ */
+ if (!(le32_to_cpu(flags) & IWL_STATISTICS_REPLY_FLG_CLEAR))
+ return;
+
+ spin_lock(&mvm->tcm.lock);
+ for (i = 0; i < NUM_MAC_INDEX_DRIVER; i++) {
+ struct iwl_mvm_tcm_mac *mdata = &mvm->tcm.data[i];
+ u32 airtime = le32_to_cpu(air_time[i]);
+ u32 rx_bytes = le32_to_cpu(bytes[i]);
+
+ mdata->uapsd_nonagg_detect.rx_bytes += rx_bytes;
+ if (airtime) {
+ /* re-init every time to store rate from FW */
+ ewma_rate_init(&mdata->uapsd_nonagg_detect.rate);
+ ewma_rate_add(&mdata->uapsd_nonagg_detect.rate,
+ rx_bytes * 8 / airtime);
+ }
+
+ mdata->rx.airtime += airtime;
+ }
+ spin_unlock(&mvm->tcm.lock);
}
void iwl_mvm_rx_statistics(struct iwl_mvm *mvm, struct iwl_rx_cmd_buffer *rxb)
diff --git a/drivers/net/wireless/intel/iwlwifi/mvm/rxmq.c b/drivers/net/wireless/intel/iwlwifi/mvm/rxmq.c
index 4a4ccfd..bb63e75 100644
--- a/drivers/net/wireless/intel/iwlwifi/mvm/rxmq.c
+++ b/drivers/net/wireless/intel/iwlwifi/mvm/rxmq.c
@@ -112,7 +112,7 @@
return -1;
if (ieee80211_is_data_qos(hdr->frame_control))
- tid = *ieee80211_get_qos_ctl(hdr) & IEEE80211_QOS_CTL_TID_MASK;
+ tid = ieee80211_get_tid(hdr);
else
tid = 0;
@@ -151,17 +151,9 @@
unsigned int hdrlen = ieee80211_hdrlen(hdr->frame_control);
if (desc->mac_flags2 & IWL_RX_MPDU_MFLG2_PAD) {
+ len -= 2;
pad_len = 2;
-
- /*
- * If the device inserted padding it means that (it thought)
- * the 802.11 header wasn't a multiple of 4 bytes long. In
- * this case, reserve two bytes at the start of the SKB to
- * align the payload properly in case we end up copying it.
- */
- skb_reserve(skb, pad_len);
}
- len -= pad_len;
/* If frame is small enough to fit in skb->head, pull it completely.
* If not, only pull ieee80211_hdr (including crypto if present, and
@@ -347,8 +339,7 @@
if (ieee80211_is_data_qos(hdr->frame_control))
/* frame has qos control */
- tid = *ieee80211_get_qos_ctl(hdr) &
- IEEE80211_QOS_CTL_TID_MASK;
+ tid = ieee80211_get_tid(hdr);
else
tid = IWL_MAX_TID_COUNT;
@@ -628,7 +619,7 @@
bool amsdu = desc->mac_flags2 & IWL_RX_MPDU_MFLG2_AMSDU;
bool last_subframe =
desc->amsdu_info & IWL_RX_MPDU_AMSDU_LAST_SUBFRAME;
- u8 tid = *ieee80211_get_qos_ctl(hdr) & IEEE80211_QOS_CTL_TID_MASK;
+ u8 tid = ieee80211_get_tid(hdr);
u8 sub_frame_idx = desc->amsdu_info &
IWL_RX_MPDU_AMSDU_SUBFRAME_IDX_MASK;
struct iwl_mvm_reorder_buf_entry *entries;
@@ -867,6 +858,16 @@
return;
}
+ if (desc->mac_flags2 & IWL_RX_MPDU_MFLG2_PAD) {
+ /*
+ * If the device inserted padding it means that (it thought)
+ * the 802.11 header wasn't a multiple of 4 bytes long. In
+ * this case, reserve two bytes at the start of the SKB to
+ * align the payload properly in case we end up copying it.
+ */
+ skb_reserve(skb, 2);
+ }
+
rx_status = IEEE80211_SKB_RXCB(skb);
if (iwl_mvm_rx_crypto(mvm, hdr, rx_status, desc,
@@ -941,6 +942,12 @@
IWL_RX_MPDU_REORDER_BAID_MASK) >>
IWL_RX_MPDU_REORDER_BAID_SHIFT);
+ if (!mvm->tcm.paused && len >= sizeof(*hdr) &&
+ !is_multicast_ether_addr(hdr->addr1) &&
+ ieee80211_is_data(hdr->frame_control) &&
+ time_after(jiffies, mvm->tcm.ts + MVM_TCM_PERIOD))
+ schedule_delayed_work(&mvm->tcm.work, 0);
+
/*
* We have tx blocked stations (with CS bit). If we heard
* frames from a blocked station on a new channel we can
diff --git a/drivers/net/wireless/intel/iwlwifi/mvm/scan.c b/drivers/net/wireless/intel/iwlwifi/mvm/scan.c
index b31f0ff..4b3753d7 100644
--- a/drivers/net/wireless/intel/iwlwifi/mvm/scan.c
+++ b/drivers/net/wireless/intel/iwlwifi/mvm/scan.c
@@ -8,6 +8,7 @@
* Copyright(c) 2012 - 2014 Intel Corporation. All rights reserved.
* Copyright(c) 2013 - 2015 Intel Mobile Communications GmbH
* Copyright(c) 2016 - 2017 Intel Deutschland GmbH
+ * Copyright(c) 2018 Intel Corporation
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of version 2 of the GNU General Public License as
@@ -19,9 +20,7 @@
* General Public License for more details.
*
* You should have received a copy of the GNU General Public License
- * along with this program; if not, write to the Free Software
- * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110,
- * USA
+ * along with this program
*
* The full GNU General Public License is included in this distribution
* in the file called COPYING.
@@ -76,12 +75,6 @@
#define IWL_DENSE_EBS_SCAN_RATIO 5
#define IWL_SPARSE_EBS_SCAN_RATIO 1
-enum iwl_mvm_traffic_load {
- IWL_MVM_TRAFFIC_LOW,
- IWL_MVM_TRAFFIC_MEDIUM,
- IWL_MVM_TRAFFIC_HIGH,
-};
-
#define IWL_SCAN_DWELL_ACTIVE 10
#define IWL_SCAN_DWELL_PASSIVE 110
#define IWL_SCAN_DWELL_FRAGMENTED 44
@@ -123,7 +116,9 @@
};
struct iwl_mvm_scan_params {
+ /* For CDB this is low band scan type, for non-CDB - type. */
enum iwl_mvm_scan_type type;
+ enum iwl_mvm_scan_type hb_type;
u32 n_channels;
u16 delay;
int n_ssids;
@@ -152,7 +147,7 @@
if (iwl_mvm_is_adaptive_dwell_supported(mvm))
return (void *)&cmd->v7.data;
- if (iwl_mvm_has_new_tx_api(mvm))
+ if (iwl_mvm_cdb_scan_api(mvm))
return (void *)&cmd->v6.data;
return (void *)&cmd->v1.data;
@@ -169,7 +164,7 @@
if (iwl_mvm_is_adaptive_dwell_supported(mvm))
return &cmd->v7.channel;
- if (iwl_mvm_has_new_tx_api(mvm))
+ if (iwl_mvm_cdb_scan_api(mvm))
return &cmd->v6.channel;
return &cmd->v1.channel;
@@ -234,15 +229,21 @@
static enum iwl_mvm_traffic_load iwl_mvm_get_traffic_load(struct iwl_mvm *mvm)
{
- return IWL_MVM_TRAFFIC_LOW;
+ return mvm->tcm.result.global_load;
+}
+
+static enum iwl_mvm_traffic_load
+iwl_mvm_get_traffic_load_band(struct iwl_mvm *mvm, enum nl80211_band band)
+{
+ return mvm->tcm.result.band_load[band];
}
static enum
-iwl_mvm_scan_type iwl_mvm_get_scan_type(struct iwl_mvm *mvm, bool p2p_device)
+iwl_mvm_scan_type _iwl_mvm_get_scan_type(struct iwl_mvm *mvm, bool p2p_device,
+ enum iwl_mvm_traffic_load load,
+ bool low_latency)
{
int global_cnt = 0;
- enum iwl_mvm_traffic_load load;
- bool low_latency;
ieee80211_iterate_active_interfaces_atomic(mvm->hw,
IEEE80211_IFACE_ITER_NORMAL,
@@ -251,9 +252,6 @@
if (!global_cnt)
return IWL_SCAN_TYPE_UNASSOC;
- load = iwl_mvm_get_traffic_load(mvm);
- low_latency = iwl_mvm_low_latency(mvm);
-
if ((load == IWL_MVM_TRAFFIC_HIGH || low_latency) && !p2p_device &&
fw_has_api(&mvm->fw->ucode_capa, IWL_UCODE_TLV_API_FRAGMENTED_SCAN))
return IWL_SCAN_TYPE_FRAGMENTED;
@@ -264,25 +262,57 @@
return IWL_SCAN_TYPE_WILD;
}
+static enum
+iwl_mvm_scan_type iwl_mvm_get_scan_type(struct iwl_mvm *mvm, bool p2p_device)
+{
+ enum iwl_mvm_traffic_load load;
+ bool low_latency;
+
+ load = iwl_mvm_get_traffic_load(mvm);
+ low_latency = iwl_mvm_low_latency(mvm);
+
+ return _iwl_mvm_get_scan_type(mvm, p2p_device, load, low_latency);
+}
+
+static enum
+iwl_mvm_scan_type iwl_mvm_get_scan_type_band(struct iwl_mvm *mvm,
+ bool p2p_device,
+ enum nl80211_band band)
+{
+ enum iwl_mvm_traffic_load load;
+ bool low_latency;
+
+ load = iwl_mvm_get_traffic_load_band(mvm, band);
+ low_latency = iwl_mvm_low_latency_band(mvm, band);
+
+ return _iwl_mvm_get_scan_type(mvm, p2p_device, load, low_latency);
+}
+
static int
iwl_mvm_get_measurement_dwell(struct iwl_mvm *mvm,
struct cfg80211_scan_request *req,
struct iwl_mvm_scan_params *params)
{
+ u32 duration = scan_timing[params->type].max_out_time;
+
if (!req->duration)
return 0;
- if (req->duration_mandatory &&
- req->duration > scan_timing[params->type].max_out_time) {
+ if (iwl_mvm_is_cdb_supported(mvm)) {
+ u32 hb_time = scan_timing[params->hb_type].max_out_time;
+
+ duration = min_t(u32, duration, hb_time);
+ }
+
+ if (req->duration_mandatory && req->duration > duration) {
IWL_DEBUG_SCAN(mvm,
"Measurement scan - too long dwell %hu (max out time %u)\n",
req->duration,
- scan_timing[params->type].max_out_time);
+ duration);
return -EOPNOTSUPP;
}
- return min_t(u32, (u32)req->duration,
- scan_timing[params->type].max_out_time);
+ return min_t(u32, (u32)req->duration, duration);
}
static inline bool iwl_mvm_rrm_scan_needed(struct iwl_mvm *mvm)
@@ -437,6 +467,7 @@
ieee80211_scan_completed(mvm->hw, &info);
iwl_mvm_unref(mvm, IWL_MVM_REF_SCAN);
cancel_delayed_work(&mvm->scan_timeout_dwork);
+ iwl_mvm_resume_tcm(mvm);
} else {
IWL_ERR(mvm,
"got scan complete notification but no scan is running\n");
@@ -1030,22 +1061,38 @@
static void iwl_mvm_fill_scan_config(struct iwl_mvm *mvm, void *config,
u32 flags, u8 channel_flags)
{
- enum iwl_mvm_scan_type type = iwl_mvm_get_scan_type(mvm, false);
struct iwl_scan_config *cfg = config;
cfg->flags = cpu_to_le32(flags);
cfg->tx_chains = cpu_to_le32(iwl_mvm_get_valid_tx_ant(mvm));
cfg->rx_chains = cpu_to_le32(iwl_mvm_scan_rx_ant(mvm));
cfg->legacy_rates = iwl_mvm_scan_config_rates(mvm);
- cfg->out_of_channel_time[0] =
- cpu_to_le32(scan_timing[type].max_out_time);
- cfg->suspend_time[0] = cpu_to_le32(scan_timing[type].suspend_time);
if (iwl_mvm_is_cdb_supported(mvm)) {
- cfg->suspend_time[1] =
- cpu_to_le32(scan_timing[type].suspend_time);
- cfg->out_of_channel_time[1] =
+ enum iwl_mvm_scan_type lb_type, hb_type;
+
+ lb_type = iwl_mvm_get_scan_type_band(mvm, false,
+ NL80211_BAND_2GHZ);
+ hb_type = iwl_mvm_get_scan_type_band(mvm, false,
+ NL80211_BAND_5GHZ);
+
+ cfg->out_of_channel_time[SCAN_LB_LMAC_IDX] =
+ cpu_to_le32(scan_timing[lb_type].max_out_time);
+ cfg->suspend_time[SCAN_LB_LMAC_IDX] =
+ cpu_to_le32(scan_timing[lb_type].suspend_time);
+
+ cfg->out_of_channel_time[SCAN_HB_LMAC_IDX] =
+ cpu_to_le32(scan_timing[hb_type].max_out_time);
+ cfg->suspend_time[SCAN_HB_LMAC_IDX] =
+ cpu_to_le32(scan_timing[hb_type].suspend_time);
+ } else {
+ enum iwl_mvm_scan_type type =
+ iwl_mvm_get_scan_type(mvm, false);
+
+ cfg->out_of_channel_time[SCAN_LB_LMAC_IDX] =
cpu_to_le32(scan_timing[type].max_out_time);
+ cfg->suspend_time[SCAN_LB_LMAC_IDX] =
+ cpu_to_le32(scan_timing[type].suspend_time);
}
iwl_mvm_fill_scan_dwell(mvm, &cfg->dwell);
@@ -1065,7 +1112,8 @@
struct iwl_host_cmd cmd = {
.id = iwl_cmd_id(SCAN_CFG_CMD, IWL_ALWAYS_LONG_GROUP, 0),
};
- enum iwl_mvm_scan_type type = iwl_mvm_get_scan_type(mvm, false);
+ enum iwl_mvm_scan_type type;
+ enum iwl_mvm_scan_type hb_type = IWL_SCAN_TYPE_NOT_SET;
int num_channels =
mvm->nvm_data->bands[NL80211_BAND_2GHZ].n_channels +
mvm->nvm_data->bands[NL80211_BAND_5GHZ].n_channels;
@@ -1075,10 +1123,20 @@
if (WARN_ON(num_channels > mvm->fw->ucode_capa.n_scan_channels))
return -ENOBUFS;
- if (type == mvm->scan_type)
- return 0;
+ if (iwl_mvm_is_cdb_supported(mvm)) {
+ type = iwl_mvm_get_scan_type_band(mvm, false,
+ NL80211_BAND_2GHZ);
+ hb_type = iwl_mvm_get_scan_type_band(mvm, false,
+ NL80211_BAND_5GHZ);
+ if (type == mvm->scan_type && hb_type == mvm->hb_scan_type)
+ return 0;
+ } else {
+ type = iwl_mvm_get_scan_type(mvm, false);
+ if (type == mvm->scan_type)
+ return 0;
+ }
- if (iwl_mvm_has_new_tx_api(mvm))
+ if (iwl_mvm_cdb_scan_api(mvm))
cmd_size = sizeof(struct iwl_scan_config);
else
cmd_size = sizeof(struct iwl_scan_config_v1);
@@ -1107,10 +1165,15 @@
IWL_CHANNEL_FLAG_EBS_ADD |
IWL_CHANNEL_FLAG_PRE_SCAN_PASSIVE2ACTIVE;
- if (iwl_mvm_has_new_tx_api(mvm)) {
- flags |= (type == IWL_SCAN_TYPE_FRAGMENTED) ?
- SCAN_CONFIG_FLAG_SET_LMAC2_FRAGMENTED :
- SCAN_CONFIG_FLAG_CLEAR_LMAC2_FRAGMENTED;
+ /*
+ * Check for fragmented scan on LMAC2 - high band.
+ * LMAC1 - low band is checked above.
+ */
+ if (iwl_mvm_cdb_scan_api(mvm)) {
+ if (iwl_mvm_is_cdb_supported(mvm))
+ flags |= (hb_type == IWL_SCAN_TYPE_FRAGMENTED) ?
+ SCAN_CONFIG_FLAG_SET_LMAC2_FRAGMENTED :
+ SCAN_CONFIG_FLAG_CLEAR_LMAC2_FRAGMENTED;
iwl_mvm_fill_scan_config(mvm, cfg, flags, channel_flags);
} else {
iwl_mvm_fill_scan_config_v1(mvm, cfg, flags, channel_flags);
@@ -1123,8 +1186,10 @@
IWL_DEBUG_SCAN(mvm, "Sending UMAC scan config\n");
ret = iwl_mvm_send_cmd(mvm, &cmd);
- if (!ret)
+ if (!ret) {
mvm->scan_type = type;
+ mvm->hb_scan_type = hb_type;
+ }
kfree(cfg);
return ret;
@@ -1178,7 +1243,7 @@
cpu_to_le32(timing->suspend_time);
if (iwl_mvm_is_cdb_supported(mvm)) {
- hb_timing = &scan_timing[params->type];
+ hb_timing = &scan_timing[params->hb_type];
cmd->v7.max_out_time[SCAN_HB_LMAC_IDX] =
cpu_to_le32(hb_timing->max_out_time);
@@ -1208,7 +1273,7 @@
cmd->v1.fragmented_dwell = IWL_SCAN_DWELL_FRAGMENTED;
if (iwl_mvm_is_cdb_supported(mvm)) {
- hb_timing = &scan_timing[params->type];
+ hb_timing = &scan_timing[params->hb_type];
cmd->v6.max_out_time[SCAN_HB_LMAC_IDX] =
cpu_to_le32(hb_timing->max_out_time);
@@ -1216,7 +1281,7 @@
cpu_to_le32(hb_timing->suspend_time);
}
- if (iwl_mvm_has_new_tx_api(mvm)) {
+ if (iwl_mvm_cdb_scan_api(mvm)) {
cmd->v6.scan_priority =
cpu_to_le32(IWL_SCAN_PRIORITY_EXT_6);
cmd->v6.max_out_time[SCAN_LB_LMAC_IDX] =
@@ -1232,6 +1297,11 @@
cpu_to_le32(timing->suspend_time);
}
}
+
+ if (iwl_mvm_is_regular_scan(params))
+ cmd->ooc_priority = cpu_to_le32(IWL_SCAN_PRIORITY_EXT_6);
+ else
+ cmd->ooc_priority = cpu_to_le32(IWL_SCAN_PRIORITY_EXT_2);
}
static void
@@ -1262,11 +1332,12 @@
if (params->n_ssids == 1 && params->ssids[0].ssid_len != 0)
flags |= IWL_UMAC_SCAN_GEN_FLAGS_PRE_CONNECT;
- if (params->type == IWL_SCAN_TYPE_FRAGMENTED) {
+ if (params->type == IWL_SCAN_TYPE_FRAGMENTED)
flags |= IWL_UMAC_SCAN_GEN_FLAGS_FRAGMENTED;
- if (iwl_mvm_is_cdb_supported(mvm))
- flags |= IWL_UMAC_SCAN_GEN_FLAGS_LMAC2_FRAGMENTED;
- }
+
+ if (iwl_mvm_is_cdb_supported(mvm) &&
+ params->hb_type == IWL_SCAN_TYPE_FRAGMENTED)
+ flags |= IWL_UMAC_SCAN_GEN_FLAGS_LMAC2_FRAGMENTED;
if (iwl_mvm_rrm_scan_needed(mvm) &&
fw_has_capa(&mvm->fw->ucode_capa,
@@ -1497,6 +1568,21 @@
iwl_force_nmi(mvm->trans);
}
+static void iwl_mvm_fill_scan_type(struct iwl_mvm *mvm,
+ struct iwl_mvm_scan_params *params,
+ bool p2p)
+{
+ if (iwl_mvm_is_cdb_supported(mvm)) {
+ params->type =
+ iwl_mvm_get_scan_type_band(mvm, p2p,
+ NL80211_BAND_2GHZ);
+ params->hb_type =
+ iwl_mvm_get_scan_type_band(mvm, p2p,
+ NL80211_BAND_5GHZ);
+ } else {
+ params->type = iwl_mvm_get_scan_type(mvm, p2p);
+ }
+}
int iwl_mvm_reg_scan_start(struct iwl_mvm *mvm, struct ieee80211_vif *vif,
struct cfg80211_scan_request *req,
struct ieee80211_scan_ies *ies)
@@ -1544,9 +1630,8 @@
params.scan_plans = &scan_plan;
params.n_scan_plans = 1;
- params.type =
- iwl_mvm_get_scan_type(mvm,
- vif->type == NL80211_IFTYPE_P2P_DEVICE);
+ iwl_mvm_fill_scan_type(mvm, ¶ms,
+ vif->type == NL80211_IFTYPE_P2P_DEVICE);
ret = iwl_mvm_get_measurement_dwell(mvm, req, ¶ms);
if (ret < 0)
@@ -1568,6 +1653,8 @@
if (ret)
return ret;
+ iwl_mvm_pause_tcm(mvm, false);
+
ret = iwl_mvm_send_cmd(mvm, &hcmd);
if (ret) {
/* If the scan failed, it usually means that the FW was unable
@@ -1575,6 +1662,7 @@
* should try to send the command again with different params.
*/
IWL_ERR(mvm, "Scan failed! ret %d\n", ret);
+ iwl_mvm_resume_tcm(mvm);
return ret;
}
@@ -1638,9 +1726,8 @@
params.n_scan_plans = req->n_scan_plans;
params.scan_plans = req->scan_plans;
- params.type =
- iwl_mvm_get_scan_type(mvm,
- vif->type == NL80211_IFTYPE_P2P_DEVICE);
+ iwl_mvm_fill_scan_type(mvm, ¶ms,
+ vif->type == NL80211_IFTYPE_P2P_DEVICE);
/* In theory, LMAC scans can handle a 32-bit delay, but since
* waiting for over 18 hours to start the scan is a bit silly
@@ -1711,6 +1798,7 @@
mvm->scan_vif = NULL;
iwl_mvm_unref(mvm, IWL_MVM_REF_SCAN);
cancel_delayed_work(&mvm->scan_timeout_dwork);
+ iwl_mvm_resume_tcm(mvm);
} else if (mvm->scan_uid_status[uid] == IWL_MVM_SCAN_SCHED) {
ieee80211_sched_scan_stopped(mvm->hw);
mvm->sched_scan_pass_all = SCHED_SCAN_PASS_ALL_DISABLED;
@@ -1827,7 +1915,7 @@
base_size = IWL_SCAN_REQ_UMAC_SIZE_V8;
else if (iwl_mvm_is_adaptive_dwell_supported(mvm))
base_size = IWL_SCAN_REQ_UMAC_SIZE_V7;
- else if (iwl_mvm_has_new_tx_api(mvm))
+ else if (iwl_mvm_cdb_scan_api(mvm))
base_size = IWL_SCAN_REQ_UMAC_SIZE_V6;
if (fw_has_capa(&mvm->fw->ucode_capa, IWL_UCODE_TLV_CAPA_UMAC_SCAN))
diff --git a/drivers/net/wireless/intel/iwlwifi/mvm/sta.c b/drivers/net/wireless/intel/iwlwifi/mvm/sta.c
index 80067eb..4ddd2c4 100644
--- a/drivers/net/wireless/intel/iwlwifi/mvm/sta.c
+++ b/drivers/net/wireless/intel/iwlwifi/mvm/sta.c
@@ -8,6 +8,7 @@
* Copyright(c) 2012 - 2015 Intel Corporation. All rights reserved.
* Copyright(c) 2013 - 2015 Intel Mobile Communications GmbH
* Copyright(c) 2016 - 2017 Intel Deutschland GmbH
+ * Copyright(c) 2018 Intel Corporation
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of version 2 of the GNU General Public License as
@@ -35,6 +36,7 @@
* Copyright(c) 2012 - 2015 Intel Corporation. All rights reserved.
* Copyright(c) 2013 - 2015 Intel Mobile Communications GmbH
* Copyright(c) 2016 - 2017 Intel Deutschland GmbH
+ * Copyright(c) 2018 Intel Corporation
* All rights reserved.
*
* Redistribution and use in source and binary forms, with or without
@@ -2466,6 +2468,15 @@
lockdep_assert_held(&mvm->mutex);
+ if (mvmsta->tid_data[tid].txq_id == IWL_MVM_INVALID_QUEUE &&
+ iwl_mvm_has_new_tx_api(mvm)) {
+ u8 ac = tid_to_mac80211_ac[tid];
+
+ ret = iwl_mvm_sta_alloc_queue_tvqm(mvm, sta, ac, tid);
+ if (ret)
+ return ret;
+ }
+
spin_lock_bh(&mvmsta->lock);
/* possible race condition - we entered D0i3 while starting agg */
@@ -2887,7 +2898,7 @@
u32 sta_id,
struct ieee80211_key_conf *key, bool mcast,
u32 tkip_iv32, u16 *tkip_p1k, u32 cmd_flags,
- u8 key_offset)
+ u8 key_offset, bool mfp)
{
union {
struct iwl_mvm_add_sta_key_cmd_v1 cmd_v1;
@@ -2960,6 +2971,8 @@
if (mcast)
key_flags |= cpu_to_le16(STA_KEY_MULTICAST);
+ if (mfp)
+ key_flags |= cpu_to_le16(STA_KEY_MFP);
u.cmd.common.key_offset = key_offset;
u.cmd.common.key_flags = key_flags;
@@ -3101,11 +3114,13 @@
struct ieee80211_key_seq seq;
u16 p1k[5];
u32 sta_id;
+ bool mfp = false;
if (sta) {
struct iwl_mvm_sta *mvm_sta = iwl_mvm_sta_from_mac80211(sta);
sta_id = mvm_sta->sta_id;
+ mfp = sta->mfp;
} else if (vif->type == NL80211_IFTYPE_AP &&
!(keyconf->flags & IEEE80211_KEY_FLAG_PAIRWISE)) {
struct iwl_mvm_vif *mvmvif = iwl_mvm_vif_from_mac80211(vif);
@@ -3127,7 +3142,8 @@
ieee80211_get_key_rx_seq(keyconf, 0, &seq);
ieee80211_get_tkip_rx_p1k(keyconf, addr, seq.tkip.iv32, p1k);
ret = iwl_mvm_send_sta_key(mvm, sta_id, keyconf, mcast,
- seq.tkip.iv32, p1k, 0, key_offset);
+ seq.tkip.iv32, p1k, 0, key_offset,
+ mfp);
break;
case WLAN_CIPHER_SUITE_CCMP:
case WLAN_CIPHER_SUITE_WEP40:
@@ -3135,11 +3151,11 @@
case WLAN_CIPHER_SUITE_GCMP:
case WLAN_CIPHER_SUITE_GCMP_256:
ret = iwl_mvm_send_sta_key(mvm, sta_id, keyconf, mcast,
- 0, NULL, 0, key_offset);
+ 0, NULL, 0, key_offset, mfp);
break;
default:
ret = iwl_mvm_send_sta_key(mvm, sta_id, keyconf, mcast,
- 0, NULL, 0, key_offset);
+ 0, NULL, 0, key_offset, mfp);
}
return ret;
@@ -3366,6 +3382,7 @@
{
struct iwl_mvm_sta *mvm_sta;
bool mcast = !(keyconf->flags & IEEE80211_KEY_FLAG_PAIRWISE);
+ bool mfp = sta ? sta->mfp : false;
rcu_read_lock();
@@ -3373,7 +3390,8 @@
if (WARN_ON_ONCE(!mvm_sta))
goto unlock;
iwl_mvm_send_sta_key(mvm, mvm_sta->sta_id, keyconf, mcast,
- iv32, phase1key, CMD_ASYNC, keyconf->hw_key_idx);
+ iv32, phase1key, CMD_ASYNC, keyconf->hw_key_idx,
+ mfp);
unlock:
rcu_read_unlock();
diff --git a/drivers/net/wireless/intel/iwlwifi/mvm/sta.h b/drivers/net/wireless/intel/iwlwifi/mvm/sta.h
index 5ffd6ad..60502c8 100644
--- a/drivers/net/wireless/intel/iwlwifi/mvm/sta.h
+++ b/drivers/net/wireless/intel/iwlwifi/mvm/sta.h
@@ -391,7 +391,9 @@
* @tx_protection: reference counter for controlling the Tx protection.
* @tt_tx_protection: is thermal throttling enable Tx protection?
* @disable_tx: is tx to this STA disabled?
- * @tlc_amsdu: true if A-MSDU is allowed
+ * @amsdu_enabled: bitmap of TX AMSDU allowed TIDs.
+ * In case TLC offload is not active it is either 0xFFFF or 0.
+ * @max_amsdu_len: max AMSDU length
* @agg_tids: bitmap of tids whose status is operational aggregated (IWL_AGG_ON)
* @sleep_tx_count: the number of frames that we told the firmware to let out
* even when that station is asleep. This is useful in case the queue
@@ -436,7 +438,8 @@
bool tt_tx_protection;
bool disable_tx;
- bool tlc_amsdu;
+ u16 amsdu_enabled;
+ u16 max_amsdu_len;
bool sleeping;
bool associated;
u8 agg_tids;
diff --git a/drivers/net/wireless/intel/iwlwifi/mvm/tx.c b/drivers/net/wireless/intel/iwlwifi/mvm/tx.c
index 7950659..df4c604 100644
--- a/drivers/net/wireless/intel/iwlwifi/mvm/tx.c
+++ b/drivers/net/wireless/intel/iwlwifi/mvm/tx.c
@@ -767,16 +767,16 @@
u16 snap_ip_tcp, pad;
unsigned int dbg_max_amsdu_len;
netdev_features_t netdev_flags = NETIF_F_CSUM_MASK | NETIF_F_SG;
- u8 *qc, tid, txf;
+ u8 tid, txf;
snap_ip_tcp = 8 + skb_transport_header(skb) - skb_network_header(skb) +
tcp_hdrlen(skb);
dbg_max_amsdu_len = READ_ONCE(mvm->max_amsdu_len);
- if (!sta->max_amsdu_len ||
+ if (!mvmsta->max_amsdu_len ||
!ieee80211_is_data_qos(hdr->frame_control) ||
- (!mvmsta->tlc_amsdu && !dbg_max_amsdu_len))
+ (!mvmsta->amsdu_enabled && !dbg_max_amsdu_len))
return iwl_mvm_tx_tso_segment(skb, 1, netdev_flags, mpdus_skb);
/*
@@ -790,8 +790,7 @@
return iwl_mvm_tx_tso_segment(skb, 1, netdev_flags, mpdus_skb);
}
- qc = ieee80211_get_qos_ctl(hdr);
- tid = *qc & IEEE80211_QOS_CTL_TID_MASK;
+ tid = ieee80211_get_tid(hdr);
if (WARN_ON_ONCE(tid >= IWL_MAX_TID_COUNT))
return -EINVAL;
@@ -803,7 +802,11 @@
!mvmsta->tid_data[tid].amsdu_in_ampdu_allowed)
return iwl_mvm_tx_tso_segment(skb, 1, netdev_flags, mpdus_skb);
- max_amsdu_len = sta->max_amsdu_len;
+ if (iwl_mvm_vif_low_latency(iwl_mvm_vif_from_mac80211(mvmsta->vif)) ||
+ !(mvmsta->amsdu_enabled & BIT(tid)))
+ return iwl_mvm_tx_tso_segment(skb, 1, netdev_flags, mpdus_skb);
+
+ max_amsdu_len = mvmsta->max_amsdu_len;
/* the Tx FIFO to which this A-MSDU will be routed */
txf = iwl_mvm_mac_ac_to_tx_fifo(mvm, tid_to_mac80211_ac[tid]);
@@ -841,7 +844,7 @@
*/
num_subframes = (max_amsdu_len + pad) / (subf_len + pad);
if (num_subframes > 1)
- *qc |= IEEE80211_QOS_CTL_A_MSDU_PRESENT;
+ *ieee80211_get_qos_ctl(hdr) |= IEEE80211_QOS_CTL_A_MSDU_PRESENT;
tcp_payload_len = skb_tail_pointer(skb) - skb_transport_header(skb) -
tcp_hdrlen(skb) + skb->data_len;
@@ -930,6 +933,32 @@
return false;
}
+static void iwl_mvm_tx_airtime(struct iwl_mvm *mvm,
+ struct iwl_mvm_sta *mvmsta,
+ int airtime)
+{
+ int mac = mvmsta->mac_id_n_color & FW_CTXT_ID_MSK;
+ struct iwl_mvm_tcm_mac *mdata = &mvm->tcm.data[mac];
+
+ if (mvm->tcm.paused)
+ return;
+
+ if (time_after(jiffies, mvm->tcm.ts + MVM_TCM_PERIOD))
+ schedule_delayed_work(&mvm->tcm.work, 0);
+
+ mdata->tx.airtime += airtime;
+}
+
+static void iwl_mvm_tx_pkt_queued(struct iwl_mvm *mvm,
+ struct iwl_mvm_sta *mvmsta, int tid)
+{
+ u32 ac = tid_to_mac80211_ac[tid];
+ int mac = mvmsta->mac_id_n_color & FW_CTXT_ID_MSK;
+ struct iwl_mvm_tcm_mac *mdata = &mvm->tcm.data[mac];
+
+ mdata->tx.pkts[ac]++;
+}
+
/*
* Sets the fields in the Tx cmd that are crypto related
*/
@@ -976,9 +1005,7 @@
* assignment of MGMT TID
*/
if (ieee80211_is_data_qos(fc) && !ieee80211_is_qos_nullfunc(fc)) {
- u8 *qc = NULL;
- qc = ieee80211_get_qos_ctl(hdr);
- tid = qc[0] & IEEE80211_QOS_CTL_TID_MASK;
+ tid = ieee80211_get_tid(hdr);
if (WARN_ON_ONCE(tid >= IWL_MAX_TID_COUNT))
goto drop_unlock_sta;
@@ -1067,6 +1094,8 @@
spin_unlock(&mvmsta->lock);
+ iwl_mvm_tx_pkt_queued(mvm, mvmsta, tid == IWL_MAX_TID_COUNT ? 0 : tid);
+
return 0;
drop_unlock_sta:
@@ -1469,6 +1498,9 @@
if (!IS_ERR(sta)) {
mvmsta = iwl_mvm_sta_from_mac80211(sta);
+ iwl_mvm_tx_airtime(mvm, mvmsta,
+ le16_to_cpu(tx_resp->wireless_media_time));
+
if (tid != IWL_TID_NON_QOS && tid != IWL_MGMT_TID) {
struct iwl_mvm_tid_data *tid_data =
&mvmsta->tid_data[tid];
@@ -1610,6 +1642,8 @@
le16_to_cpu(tx_resp->wireless_media_time);
mvmsta->tid_data[tid].lq_color =
TX_RES_RATE_TABLE_COL_GET(tx_resp->tlc_info);
+ iwl_mvm_tx_airtime(mvm, mvmsta,
+ le16_to_cpu(tx_resp->wireless_media_time));
}
rcu_read_unlock();
@@ -1800,6 +1834,8 @@
le32_to_cpu(ba_res->tx_rate));
}
+ iwl_mvm_tx_airtime(mvm, mvmsta,
+ le32_to_cpu(ba_res->wireless_time));
out_unlock:
rcu_read_unlock();
out:
diff --git a/drivers/net/wireless/intel/iwlwifi/mvm/utils.c b/drivers/net/wireless/intel/iwlwifi/mvm/utils.c
index d99d9ea..b002a7a 100644
--- a/drivers/net/wireless/intel/iwlwifi/mvm/utils.c
+++ b/drivers/net/wireless/intel/iwlwifi/mvm/utils.c
@@ -8,6 +8,7 @@
* Copyright(c) 2012 - 2014 Intel Corporation. All rights reserved.
* Copyright(c) 2013 - 2014 Intel Mobile Communications GmbH
* Copyright (C) 2015 - 2017 Intel Deutschland GmbH
+ * Copyright(c) 2018 Intel Corporation
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of version 2 of the GNU General Public License as
@@ -35,6 +36,7 @@
* Copyright(c) 2012 - 2014 Intel Corporation. All rights reserved.
* Copyright(c) 2013 - 2014 Intel Mobile Communications GmbH
* Copyright (C) 2015 - 2017 Intel Deutschland GmbH
+ * Copyright(c) 2018 Intel Corporation
* All rights reserved.
*
* Redistribution and use in source and binary forms, with or without
@@ -278,8 +280,8 @@
u8 ind = last_idx;
int i;
- for (i = 0; i < MAX_RS_ANT_NUM; i++) {
- ind = (ind + 1) % MAX_RS_ANT_NUM;
+ for (i = 0; i < MAX_ANT_NUM; i++) {
+ ind = (ind + 1) % MAX_ANT_NUM;
if (valid & BIT(ind))
return ind;
}
@@ -520,15 +522,15 @@
/* set INIT_DONE flag */
iwl_set_bit(trans, CSR_GP_CNTRL,
- CSR_GP_CNTRL_REG_FLAG_INIT_DONE);
+ BIT(trans->cfg->csr->flag_init_done));
/* and wait for clock stabilization */
if (trans->cfg->device_family == IWL_DEVICE_FAMILY_8000)
udelay(2);
err = iwl_poll_bit(trans, CSR_GP_CNTRL,
- CSR_GP_CNTRL_REG_FLAG_MAC_CLOCK_READY,
- CSR_GP_CNTRL_REG_FLAG_MAC_CLOCK_READY,
+ BIT(trans->cfg->csr->flag_mac_clock_ready),
+ BIT(trans->cfg->csr->flag_mac_clock_ready),
25000);
if (err < 0) {
IWL_DEBUG_INFO(trans,
@@ -728,12 +730,14 @@
.sta_id = sta_id,
.tid = tid,
};
- int queue;
+ int queue, size = IWL_DEFAULT_QUEUE_SIZE;
- if (cmd.tid == IWL_MAX_TID_COUNT)
+ if (cmd.tid == IWL_MAX_TID_COUNT) {
cmd.tid = IWL_MGMT_TID;
+ size = IWL_MGMT_QUEUE_SIZE;
+ }
queue = iwl_trans_txq_alloc(mvm->trans, (void *)&cmd,
- SCD_QUEUE_CFG, timeout);
+ SCD_QUEUE_CFG, size, timeout);
if (queue < 0) {
IWL_DEBUG_TX_QUEUES(mvm,
@@ -1074,23 +1078,48 @@
return iwl_mvm_power_update_mac(mvm);
}
+struct iwl_mvm_low_latency_iter {
+ bool result;
+ bool result_per_band[NUM_NL80211_BANDS];
+};
+
static void iwl_mvm_ll_iter(void *_data, u8 *mac, struct ieee80211_vif *vif)
{
- bool *result = _data;
+ struct iwl_mvm_low_latency_iter *result = _data;
+ struct iwl_mvm_vif *mvmvif = iwl_mvm_vif_from_mac80211(vif);
+ enum nl80211_band band;
- if (iwl_mvm_vif_low_latency(iwl_mvm_vif_from_mac80211(vif)))
- *result = true;
+ if (iwl_mvm_vif_low_latency(mvmvif)) {
+ result->result = true;
+
+ if (!mvmvif->phy_ctxt)
+ return;
+
+ band = mvmvif->phy_ctxt->channel->band;
+ result->result_per_band[band] = true;
+ }
}
bool iwl_mvm_low_latency(struct iwl_mvm *mvm)
{
- bool result = false;
+ struct iwl_mvm_low_latency_iter data = {};
ieee80211_iterate_active_interfaces_atomic(
mvm->hw, IEEE80211_IFACE_ITER_NORMAL,
- iwl_mvm_ll_iter, &result);
+ iwl_mvm_ll_iter, &data);
- return result;
+ return data.result;
+}
+
+bool iwl_mvm_low_latency_band(struct iwl_mvm *mvm, enum nl80211_band band)
+{
+ struct iwl_mvm_low_latency_iter data = {};
+
+ ieee80211_iterate_active_interfaces_atomic(
+ mvm->hw, IEEE80211_IFACE_ITER_NORMAL,
+ iwl_mvm_ll_iter, &data);
+
+ return data.result_per_band[band];
}
struct iwl_bss_iter_data {
@@ -1429,6 +1458,387 @@
sta->addr, tid);
}
+u8 iwl_mvm_tcm_load_percentage(u32 airtime, u32 elapsed)
+{
+ if (!elapsed)
+ return 0;
+
+ return (100 * airtime / elapsed) / USEC_PER_MSEC;
+}
+
+static enum iwl_mvm_traffic_load
+iwl_mvm_tcm_load(struct iwl_mvm *mvm, u32 airtime, unsigned long elapsed)
+{
+ u8 load = iwl_mvm_tcm_load_percentage(airtime, elapsed);
+
+ if (load > IWL_MVM_TCM_LOAD_HIGH_THRESH)
+ return IWL_MVM_TRAFFIC_HIGH;
+ if (load > IWL_MVM_TCM_LOAD_MEDIUM_THRESH)
+ return IWL_MVM_TRAFFIC_MEDIUM;
+
+ return IWL_MVM_TRAFFIC_LOW;
+}
+
+struct iwl_mvm_tcm_iter_data {
+ struct iwl_mvm *mvm;
+ bool any_sent;
+};
+
+static void iwl_mvm_tcm_iter(void *_data, u8 *mac, struct ieee80211_vif *vif)
+{
+ struct iwl_mvm_tcm_iter_data *data = _data;
+ struct iwl_mvm *mvm = data->mvm;
+ struct iwl_mvm_vif *mvmvif = iwl_mvm_vif_from_mac80211(vif);
+ bool low_latency, prev = mvmvif->low_latency & LOW_LATENCY_TRAFFIC;
+
+ if (mvmvif->id >= NUM_MAC_INDEX_DRIVER)
+ return;
+
+ low_latency = mvm->tcm.result.low_latency[mvmvif->id];
+
+ if (!mvm->tcm.result.change[mvmvif->id] &&
+ prev == low_latency) {
+ iwl_mvm_update_quotas(mvm, false, NULL);
+ return;
+ }
+
+ if (prev != low_latency) {
+ /* this sends traffic load and updates quota as well */
+ iwl_mvm_update_low_latency(mvm, vif, low_latency,
+ LOW_LATENCY_TRAFFIC);
+ } else {
+ iwl_mvm_update_quotas(mvm, false, NULL);
+ }
+
+ data->any_sent = true;
+}
+
+static void iwl_mvm_tcm_results(struct iwl_mvm *mvm)
+{
+ struct iwl_mvm_tcm_iter_data data = {
+ .mvm = mvm,
+ .any_sent = false,
+ };
+
+ mutex_lock(&mvm->mutex);
+
+ ieee80211_iterate_active_interfaces(
+ mvm->hw, IEEE80211_IFACE_ITER_NORMAL,
+ iwl_mvm_tcm_iter, &data);
+
+ if (fw_has_capa(&mvm->fw->ucode_capa, IWL_UCODE_TLV_CAPA_UMAC_SCAN))
+ iwl_mvm_config_scan(mvm);
+
+ mutex_unlock(&mvm->mutex);
+}
+
+static void iwl_mvm_tcm_uapsd_nonagg_detected_wk(struct work_struct *wk)
+{
+ struct iwl_mvm *mvm;
+ struct iwl_mvm_vif *mvmvif;
+ struct ieee80211_vif *vif;
+
+ mvmvif = container_of(wk, struct iwl_mvm_vif,
+ uapsd_nonagg_detected_wk.work);
+ vif = container_of((void *)mvmvif, struct ieee80211_vif, drv_priv);
+ mvm = mvmvif->mvm;
+
+ if (mvm->tcm.data[mvmvif->id].opened_rx_ba_sessions)
+ return;
+
+ /* remember that this AP is broken */
+ memcpy(mvm->uapsd_noagg_bssids[mvm->uapsd_noagg_bssid_write_idx].addr,
+ vif->bss_conf.bssid, ETH_ALEN);
+ mvm->uapsd_noagg_bssid_write_idx++;
+ if (mvm->uapsd_noagg_bssid_write_idx >= IWL_MVM_UAPSD_NOAGG_LIST_LEN)
+ mvm->uapsd_noagg_bssid_write_idx = 0;
+
+ iwl_mvm_connection_loss(mvm, vif,
+ "AP isn't using AMPDU with uAPSD enabled");
+}
+
+static void iwl_mvm_uapsd_agg_disconnect_iter(void *data, u8 *mac,
+ struct ieee80211_vif *vif)
+{
+ struct iwl_mvm_vif *mvmvif = iwl_mvm_vif_from_mac80211(vif);
+ struct iwl_mvm *mvm = mvmvif->mvm;
+ int *mac_id = data;
+
+ if (vif->type != NL80211_IFTYPE_STATION)
+ return;
+
+ if (mvmvif->id != *mac_id)
+ return;
+
+ if (!vif->bss_conf.assoc)
+ return;
+
+ if (!mvmvif->queue_params[IEEE80211_AC_VO].uapsd &&
+ !mvmvif->queue_params[IEEE80211_AC_VI].uapsd &&
+ !mvmvif->queue_params[IEEE80211_AC_BE].uapsd &&
+ !mvmvif->queue_params[IEEE80211_AC_BK].uapsd)
+ return;
+
+ if (mvm->tcm.data[*mac_id].uapsd_nonagg_detect.detected)
+ return;
+
+ mvm->tcm.data[*mac_id].uapsd_nonagg_detect.detected = true;
+ IWL_INFO(mvm,
+ "detected AP should do aggregation but isn't, likely due to U-APSD\n");
+ schedule_delayed_work(&mvmvif->uapsd_nonagg_detected_wk, 15 * HZ);
+}
+
+static void iwl_mvm_check_uapsd_agg_expected_tpt(struct iwl_mvm *mvm,
+ unsigned int elapsed,
+ int mac)
+{
+ u64 bytes = mvm->tcm.data[mac].uapsd_nonagg_detect.rx_bytes;
+ u64 tpt;
+ unsigned long rate;
+
+ rate = ewma_rate_read(&mvm->tcm.data[mac].uapsd_nonagg_detect.rate);
+
+ if (!rate || mvm->tcm.data[mac].opened_rx_ba_sessions ||
+ mvm->tcm.data[mac].uapsd_nonagg_detect.detected)
+ return;
+
+ if (iwl_mvm_has_new_rx_api(mvm)) {
+ tpt = 8 * bytes; /* kbps */
+ do_div(tpt, elapsed);
+ rate *= 1000; /* kbps */
+ if (tpt < 22 * rate / 100)
+ return;
+ } else {
+ /*
+ * the rate here is actually the threshold, in 100Kbps units,
+ * so do the needed conversion from bytes to 100Kbps:
+ * 100kb = bits / (100 * 1000),
+ * 100kbps = 100kb / (msecs / 1000) ==
+ * (bits / (100 * 1000)) / (msecs / 1000) ==
+ * bits / (100 * msecs)
+ */
+ tpt = (8 * bytes);
+ do_div(tpt, elapsed * 100);
+ if (tpt < rate)
+ return;
+ }
+
+ ieee80211_iterate_active_interfaces_atomic(
+ mvm->hw, IEEE80211_IFACE_ITER_NORMAL,
+ iwl_mvm_uapsd_agg_disconnect_iter, &mac);
+}
+
+static void iwl_mvm_tcm_iterator(void *_data, u8 *mac,
+ struct ieee80211_vif *vif)
+{
+ struct iwl_mvm_vif *mvmvif = iwl_mvm_vif_from_mac80211(vif);
+ u32 *band = _data;
+
+ if (!mvmvif->phy_ctxt)
+ return;
+
+ band[mvmvif->id] = mvmvif->phy_ctxt->channel->band;
+}
+
+static unsigned long iwl_mvm_calc_tcm_stats(struct iwl_mvm *mvm,
+ unsigned long ts,
+ bool handle_uapsd)
+{
+ unsigned int elapsed = jiffies_to_msecs(ts - mvm->tcm.ts);
+ unsigned int uapsd_elapsed =
+ jiffies_to_msecs(ts - mvm->tcm.uapsd_nonagg_ts);
+ u32 total_airtime = 0;
+ u32 band_airtime[NUM_NL80211_BANDS] = {0};
+ u32 band[NUM_MAC_INDEX_DRIVER] = {0};
+ int ac, mac, i;
+ bool low_latency = false;
+ enum iwl_mvm_traffic_load load, band_load;
+ bool handle_ll = time_after(ts, mvm->tcm.ll_ts + MVM_LL_PERIOD);
+
+ if (handle_ll)
+ mvm->tcm.ll_ts = ts;
+ if (handle_uapsd)
+ mvm->tcm.uapsd_nonagg_ts = ts;
+
+ mvm->tcm.result.elapsed = elapsed;
+
+ ieee80211_iterate_active_interfaces_atomic(mvm->hw,
+ IEEE80211_IFACE_ITER_NORMAL,
+ iwl_mvm_tcm_iterator,
+ &band);
+
+ for (mac = 0; mac < NUM_MAC_INDEX_DRIVER; mac++) {
+ struct iwl_mvm_tcm_mac *mdata = &mvm->tcm.data[mac];
+ u32 vo_vi_pkts = 0;
+ u32 airtime = mdata->rx.airtime + mdata->tx.airtime;
+
+ total_airtime += airtime;
+ band_airtime[band[mac]] += airtime;
+
+ load = iwl_mvm_tcm_load(mvm, airtime, elapsed);
+ mvm->tcm.result.change[mac] = load != mvm->tcm.result.load[mac];
+ mvm->tcm.result.load[mac] = load;
+ mvm->tcm.result.airtime[mac] = airtime;
+
+ for (ac = IEEE80211_AC_VO; ac <= IEEE80211_AC_VI; ac++)
+ vo_vi_pkts += mdata->rx.pkts[ac] +
+ mdata->tx.pkts[ac];
+
+ /* enable immediately with enough packets but defer disabling */
+ if (vo_vi_pkts > IWL_MVM_TCM_LOWLAT_ENABLE_THRESH)
+ mvm->tcm.result.low_latency[mac] = true;
+ else if (handle_ll)
+ mvm->tcm.result.low_latency[mac] = false;
+
+ if (handle_ll) {
+ /* clear old data */
+ memset(&mdata->rx.pkts, 0, sizeof(mdata->rx.pkts));
+ memset(&mdata->tx.pkts, 0, sizeof(mdata->tx.pkts));
+ }
+ low_latency |= mvm->tcm.result.low_latency[mac];
+
+ if (!mvm->tcm.result.low_latency[mac] && handle_uapsd)
+ iwl_mvm_check_uapsd_agg_expected_tpt(mvm, uapsd_elapsed,
+ mac);
+ /* clear old data */
+ if (handle_uapsd)
+ mdata->uapsd_nonagg_detect.rx_bytes = 0;
+ memset(&mdata->rx.airtime, 0, sizeof(mdata->rx.airtime));
+ memset(&mdata->tx.airtime, 0, sizeof(mdata->tx.airtime));
+ }
+
+ load = iwl_mvm_tcm_load(mvm, total_airtime, elapsed);
+ mvm->tcm.result.global_change = load != mvm->tcm.result.global_load;
+ mvm->tcm.result.global_load = load;
+
+ for (i = 0; i < NUM_NL80211_BANDS; i++) {
+ band_load = iwl_mvm_tcm_load(mvm, band_airtime[i], elapsed);
+ mvm->tcm.result.band_load[i] = band_load;
+ }
+
+ /*
+ * If the current load isn't low we need to force re-evaluation
+ * in the TCM period, so that we can return to low load if there
+ * was no traffic at all (and thus iwl_mvm_recalc_tcm didn't get
+ * triggered by traffic).
+ */
+ if (load != IWL_MVM_TRAFFIC_LOW)
+ return MVM_TCM_PERIOD;
+ /*
+ * If low-latency is active we need to force re-evaluation after
+ * (the longer) MVM_LL_PERIOD, so that we can disable low-latency
+ * when there's no traffic at all.
+ */
+ if (low_latency)
+ return MVM_LL_PERIOD;
+ /*
+ * Otherwise, we don't need to run the work struct because we're
+ * in the default "idle" state - traffic indication is low (which
+ * also covers the "no traffic" case) and low-latency is disabled
+ * so there's no state that may need to be disabled when there's
+ * no traffic at all.
+ *
+ * Note that this has no impact on the regular scheduling of the
+ * updates triggered by traffic - those happen whenever one of the
+ * two timeouts expire (if there's traffic at all.)
+ */
+ return 0;
+}
+
+void iwl_mvm_recalc_tcm(struct iwl_mvm *mvm)
+{
+ unsigned long ts = jiffies;
+ bool handle_uapsd =
+ time_after(ts, mvm->tcm.uapsd_nonagg_ts +
+ msecs_to_jiffies(IWL_MVM_UAPSD_NONAGG_PERIOD));
+
+ spin_lock(&mvm->tcm.lock);
+ if (mvm->tcm.paused || !time_after(ts, mvm->tcm.ts + MVM_TCM_PERIOD)) {
+ spin_unlock(&mvm->tcm.lock);
+ return;
+ }
+ spin_unlock(&mvm->tcm.lock);
+
+ if (handle_uapsd && iwl_mvm_has_new_rx_api(mvm)) {
+ mutex_lock(&mvm->mutex);
+ if (iwl_mvm_request_statistics(mvm, true))
+ handle_uapsd = false;
+ mutex_unlock(&mvm->mutex);
+ }
+
+ spin_lock(&mvm->tcm.lock);
+ /* re-check if somebody else won the recheck race */
+ if (!mvm->tcm.paused && time_after(ts, mvm->tcm.ts + MVM_TCM_PERIOD)) {
+ /* calculate statistics */
+ unsigned long work_delay = iwl_mvm_calc_tcm_stats(mvm, ts,
+ handle_uapsd);
+
+ /* the memset needs to be visible before the timestamp */
+ smp_mb();
+ mvm->tcm.ts = ts;
+ if (work_delay)
+ schedule_delayed_work(&mvm->tcm.work, work_delay);
+ }
+ spin_unlock(&mvm->tcm.lock);
+
+ iwl_mvm_tcm_results(mvm);
+}
+
+void iwl_mvm_tcm_work(struct work_struct *work)
+{
+ struct delayed_work *delayed_work = to_delayed_work(work);
+ struct iwl_mvm *mvm = container_of(delayed_work, struct iwl_mvm,
+ tcm.work);
+
+ iwl_mvm_recalc_tcm(mvm);
+}
+
+void iwl_mvm_pause_tcm(struct iwl_mvm *mvm, bool with_cancel)
+{
+ spin_lock_bh(&mvm->tcm.lock);
+ mvm->tcm.paused = true;
+ spin_unlock_bh(&mvm->tcm.lock);
+ if (with_cancel)
+ cancel_delayed_work_sync(&mvm->tcm.work);
+}
+
+void iwl_mvm_resume_tcm(struct iwl_mvm *mvm)
+{
+ int mac;
+
+ spin_lock_bh(&mvm->tcm.lock);
+ mvm->tcm.ts = jiffies;
+ mvm->tcm.ll_ts = jiffies;
+ for (mac = 0; mac < NUM_MAC_INDEX_DRIVER; mac++) {
+ struct iwl_mvm_tcm_mac *mdata = &mvm->tcm.data[mac];
+
+ memset(&mdata->rx.pkts, 0, sizeof(mdata->rx.pkts));
+ memset(&mdata->tx.pkts, 0, sizeof(mdata->tx.pkts));
+ memset(&mdata->rx.airtime, 0, sizeof(mdata->rx.airtime));
+ memset(&mdata->tx.airtime, 0, sizeof(mdata->tx.airtime));
+ }
+ /* The TCM data needs to be reset before "paused" flag changes */
+ smp_mb();
+ mvm->tcm.paused = false;
+ spin_unlock_bh(&mvm->tcm.lock);
+}
+
+void iwl_mvm_tcm_add_vif(struct iwl_mvm *mvm, struct ieee80211_vif *vif)
+{
+ struct iwl_mvm_vif *mvmvif = iwl_mvm_vif_from_mac80211(vif);
+
+ INIT_DELAYED_WORK(&mvmvif->uapsd_nonagg_detected_wk,
+ iwl_mvm_tcm_uapsd_nonagg_detected_wk);
+}
+
+void iwl_mvm_tcm_rm_vif(struct iwl_mvm *mvm, struct ieee80211_vif *vif)
+{
+ struct iwl_mvm_vif *mvmvif = iwl_mvm_vif_from_mac80211(vif);
+
+ cancel_delayed_work_sync(&mvmvif->uapsd_nonagg_detected_wk);
+}
+
+
void iwl_mvm_get_sync_time(struct iwl_mvm *mvm, u32 *gp2, u64 *boottime)
{
bool ps_disabled;
diff --git a/drivers/net/wireless/intel/iwlwifi/pcie/ctxt-info.c b/drivers/net/wireless/intel/iwlwifi/pcie/ctxt-info.c
index 5ef216f..3fc4343 100644
--- a/drivers/net/wireless/intel/iwlwifi/pcie/ctxt-info.c
+++ b/drivers/net/wireless/intel/iwlwifi/pcie/ctxt-info.c
@@ -244,7 +244,7 @@
ctxt_info->hcmd_cfg.cmd_queue_addr =
cpu_to_le64(trans_pcie->txq[trans_pcie->cmd_queue]->dma_addr);
ctxt_info->hcmd_cfg.cmd_queue_size =
- TFD_QUEUE_CB_SIZE(trans_pcie->tx_cmd_queue_size);
+ TFD_QUEUE_CB_SIZE(TFD_CMD_SLOTS);
/* allocate ucode sections in dram and set addresses */
ret = iwl_pcie_ctxt_info_init_fw_sec(trans, fw, ctxt_info);
diff --git a/drivers/net/wireless/intel/iwlwifi/pcie/internal.h b/drivers/net/wireless/intel/iwlwifi/pcie/internal.h
index ca3b64f..45ea327 100644
--- a/drivers/net/wireless/intel/iwlwifi/pcie/internal.h
+++ b/drivers/net/wireless/intel/iwlwifi/pcie/internal.h
@@ -383,7 +383,8 @@
* @hw_init_mask: initial unmasked hw causes
* @fh_mask: current unmasked fh causes
* @hw_mask: current unmasked hw causes
- * @tx_cmd_queue_size: the size of the tx command queue
+ * @in_rescan: true if we have triggered a device rescan
+ * @scheduled_for_removal: true if we have scheduled a device removal
*/
struct iwl_trans_pcie {
struct iwl_rxq *rxq;
@@ -466,6 +467,8 @@
u32 hw_mask;
cpumask_t affinity_mask[IWL_MAX_RX_HW_QUEUES];
u16 tx_cmd_queue_size;
+ bool in_rescan;
+ bool scheduled_for_removal;
};
static inline struct iwl_trans_pcie *
@@ -537,7 +540,6 @@
void iwl_trans_pcie_reclaim(struct iwl_trans *trans, int txq_id, int ssn,
struct sk_buff_head *skbs);
void iwl_trans_pcie_tx_reset(struct iwl_trans *trans);
-void iwl_pcie_set_tx_cmd_queue_size(struct iwl_trans *trans);
static inline u16 iwl_pcie_tfd_tb_get_len(struct iwl_trans *trans, void *_tfd,
u8 idx)
@@ -822,7 +824,7 @@
void iwl_trans_pcie_gen2_fw_alive(struct iwl_trans *trans, u32 scd_addr);
int iwl_trans_pcie_dyn_txq_alloc(struct iwl_trans *trans,
struct iwl_tx_queue_cfg_cmd *cmd,
- int cmd_id,
+ int cmd_id, int size,
unsigned int timeout);
void iwl_trans_pcie_dyn_txq_free(struct iwl_trans *trans, int queue);
int iwl_trans_pcie_gen2_tx(struct iwl_trans *trans, struct sk_buff *skb,
diff --git a/drivers/net/wireless/intel/iwlwifi/pcie/rx.c b/drivers/net/wireless/intel/iwlwifi/pcie/rx.c
index f25ce3a..f772d70 100644
--- a/drivers/net/wireless/intel/iwlwifi/pcie/rx.c
+++ b/drivers/net/wireless/intel/iwlwifi/pcie/rx.c
@@ -3,6 +3,7 @@
* Copyright(c) 2003 - 2014 Intel Corporation. All rights reserved.
* Copyright(c) 2013 - 2015 Intel Mobile Communications GmbH
* Copyright(c) 2016 - 2017 Intel Deutschland GmbH
+ * Copyright(c) 2018 Intel Corporation
*
* Portions of this file are derived from the ipw3945 project, as well
* as portions of the ieee80211 subsystem header files.
@@ -201,7 +202,7 @@
IWL_DEBUG_INFO(trans, "Rx queue requesting wakeup, GP1 = 0x%x\n",
reg);
iwl_set_bit(trans, CSR_GP_CNTRL,
- CSR_GP_CNTRL_REG_FLAG_MAC_ACCESS_REQ);
+ BIT(trans->cfg->csr->flag_mac_access_req));
rxq->need_update = true;
return;
}
diff --git a/drivers/net/wireless/intel/iwlwifi/pcie/trans-gen2.c b/drivers/net/wireless/intel/iwlwifi/pcie/trans-gen2.c
index cb40125..b8e8dac 100644
--- a/drivers/net/wireless/intel/iwlwifi/pcie/trans-gen2.c
+++ b/drivers/net/wireless/intel/iwlwifi/pcie/trans-gen2.c
@@ -6,6 +6,7 @@
* GPL LICENSE SUMMARY
*
* Copyright(c) 2017 Intel Deutschland GmbH
+ * Copyright(c) 2018 Intel Corporation
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of version 2 of the GNU General Public License as
@@ -19,6 +20,7 @@
* BSD LICENSE
*
* Copyright(c) 2017 Intel Deutschland GmbH
+ * Copyright(c) 2018 Intel Corporation
* All rights reserved.
*
* Redistribution and use in source and binary forms, with or without
@@ -92,7 +94,8 @@
* Set "initialization complete" bit to move adapter from
* D0U* --> D0A* (powered-up active) state.
*/
- iwl_set_bit(trans, CSR_GP_CNTRL, CSR_GP_CNTRL_REG_FLAG_INIT_DONE);
+ iwl_set_bit(trans, CSR_GP_CNTRL,
+ BIT(trans->cfg->csr->flag_init_done));
/*
* Wait for clock stabilization; once stabilized, access to
@@ -100,8 +103,9 @@
* and accesses to uCode SRAM.
*/
ret = iwl_poll_bit(trans, CSR_GP_CNTRL,
- CSR_GP_CNTRL_REG_FLAG_MAC_CLOCK_READY,
- CSR_GP_CNTRL_REG_FLAG_MAC_CLOCK_READY, 25000);
+ BIT(trans->cfg->csr->flag_mac_clock_ready),
+ BIT(trans->cfg->csr->flag_mac_clock_ready),
+ 25000);
if (ret < 0) {
IWL_DEBUG_INFO(trans, "Failed to init the card\n");
return ret;
@@ -143,7 +147,8 @@
* Clear "initialization complete" bit to move adapter from
* D0A* (powered-up Active) --> D0U* (Uninitialized) state.
*/
- iwl_clear_bit(trans, CSR_GP_CNTRL, CSR_GP_CNTRL_REG_FLAG_INIT_DONE);
+ iwl_clear_bit(trans, CSR_GP_CNTRL,
+ BIT(trans->cfg->csr->flag_init_done));
}
void _iwl_trans_pcie_gen2_stop_device(struct iwl_trans *trans, bool low_power)
@@ -187,7 +192,7 @@
/* Make sure (redundant) we've released our request to stay awake */
iwl_clear_bit(trans, CSR_GP_CNTRL,
- CSR_GP_CNTRL_REG_FLAG_MAC_ACCESS_REQ);
+ BIT(trans->cfg->csr->flag_mac_access_req));
/* Stop the device, and put it in low power state */
iwl_pcie_gen2_apm_stop(trans, false);
diff --git a/drivers/net/wireless/intel/iwlwifi/pcie/trans.c b/drivers/net/wireless/intel/iwlwifi/pcie/trans.c
index f8a0234..6e9a9ec 100644
--- a/drivers/net/wireless/intel/iwlwifi/pcie/trans.c
+++ b/drivers/net/wireless/intel/iwlwifi/pcie/trans.c
@@ -8,6 +8,7 @@
* Copyright(c) 2007 - 2015 Intel Corporation. All rights reserved.
* Copyright(c) 2013 - 2015 Intel Mobile Communications GmbH
* Copyright(c) 2016 - 2017 Intel Deutschland GmbH
+ * Copyright(c) 2018 Intel Corporation
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of version 2 of the GNU General Public License as
@@ -35,6 +36,7 @@
* Copyright(c) 2005 - 2015 Intel Corporation. All rights reserved.
* Copyright(c) 2013 - 2015 Intel Mobile Communications GmbH
* Copyright(c) 2016 - 2017 Intel Deutschland GmbH
+ * Copyright(c) 2018 Intel Corporation
* All rights reserved.
*
* Redistribution and use in source and binary forms, with or without
@@ -73,6 +75,7 @@
#include <linux/gfp.h>
#include <linux/vmalloc.h>
#include <linux/pm_runtime.h>
+#include <linux/module.h>
#include "iwl-drv.h"
#include "iwl-trans.h"
@@ -179,7 +182,8 @@
static void iwl_trans_pcie_sw_reset(struct iwl_trans *trans)
{
/* Reset entire device - do controller reset (results in SHRD_HW_RST) */
- iwl_set_bit(trans, CSR_RESET, CSR_RESET_REG_FLAG_SW_RESET);
+ iwl_set_bit(trans, trans->cfg->csr->addr_sw_reset,
+ BIT(trans->cfg->csr->flag_sw_reset));
usleep_range(5000, 6000);
}
@@ -372,7 +376,8 @@
* Set "initialization complete" bit to move adapter from
* D0U* --> D0A* (powered-up active) state.
*/
- iwl_set_bit(trans, CSR_GP_CNTRL, CSR_GP_CNTRL_REG_FLAG_INIT_DONE);
+ iwl_set_bit(trans, CSR_GP_CNTRL,
+ BIT(trans->cfg->csr->flag_init_done));
/*
* Wait for clock stabilization; once stabilized, access to
@@ -380,8 +385,9 @@
* and accesses to uCode SRAM.
*/
ret = iwl_poll_bit(trans, CSR_GP_CNTRL,
- CSR_GP_CNTRL_REG_FLAG_MAC_CLOCK_READY,
- CSR_GP_CNTRL_REG_FLAG_MAC_CLOCK_READY, 25000);
+ BIT(trans->cfg->csr->flag_mac_clock_ready),
+ BIT(trans->cfg->csr->flag_mac_clock_ready),
+ 25000);
if (ret < 0) {
IWL_ERR(trans, "Failed to init the card\n");
return ret;
@@ -459,15 +465,16 @@
* Set "initialization complete" bit to move adapter from
* D0U* --> D0A* (powered-up active) state.
*/
- iwl_set_bit(trans, CSR_GP_CNTRL, CSR_GP_CNTRL_REG_FLAG_INIT_DONE);
+ iwl_set_bit(trans, CSR_GP_CNTRL,
+ BIT(trans->cfg->csr->flag_init_done));
/*
* Wait for clock stabilization; once stabilized, access to
* device-internal resources is possible.
*/
ret = iwl_poll_bit(trans, CSR_GP_CNTRL,
- CSR_GP_CNTRL_REG_FLAG_MAC_CLOCK_READY,
- CSR_GP_CNTRL_REG_FLAG_MAC_CLOCK_READY,
+ BIT(trans->cfg->csr->flag_mac_clock_ready),
+ BIT(trans->cfg->csr->flag_mac_clock_ready),
25000);
if (WARN_ON(ret < 0)) {
IWL_ERR(trans, "Access time out - failed to enable LP XTAL\n");
@@ -519,7 +526,7 @@
* D0A* (powered-up Active) --> D0U* (Uninitialized) state.
*/
iwl_clear_bit(trans, CSR_GP_CNTRL,
- CSR_GP_CNTRL_REG_FLAG_INIT_DONE);
+ BIT(trans->cfg->csr->flag_init_done));
/* Activates XTAL resources monitor */
__iwl_trans_pcie_set_bit(trans, CSR_MONITOR_CFG_REG,
@@ -541,11 +548,12 @@
int ret;
/* stop device's busmaster DMA activity */
- iwl_set_bit(trans, CSR_RESET, CSR_RESET_REG_FLAG_STOP_MASTER);
+ iwl_set_bit(trans, trans->cfg->csr->addr_sw_reset,
+ BIT(trans->cfg->csr->flag_stop_master));
- ret = iwl_poll_bit(trans, CSR_RESET,
- CSR_RESET_REG_FLAG_MASTER_DISABLED,
- CSR_RESET_REG_FLAG_MASTER_DISABLED, 100);
+ ret = iwl_poll_bit(trans, trans->cfg->csr->addr_sw_reset,
+ BIT(trans->cfg->csr->flag_master_dis),
+ BIT(trans->cfg->csr->flag_master_dis), 100);
if (ret < 0)
IWL_WARN(trans, "Master Disable Timed Out, 100 usec\n");
@@ -594,7 +602,7 @@
* D0A* (powered-up Active) --> D0U* (Uninitialized) state.
*/
iwl_clear_bit(trans, CSR_GP_CNTRL,
- CSR_GP_CNTRL_REG_FLAG_INIT_DONE);
+ BIT(trans->cfg->csr->flag_init_done));
}
static int iwl_pcie_nic_init(struct iwl_trans *trans)
@@ -1267,7 +1275,7 @@
/* Make sure (redundant) we've released our request to stay awake */
iwl_clear_bit(trans, CSR_GP_CNTRL,
- CSR_GP_CNTRL_REG_FLAG_MAC_ACCESS_REQ);
+ BIT(trans->cfg->csr->flag_mac_access_req));
/* Stop the device, and put it in low power state */
iwl_pcie_apm_stop(trans, false);
@@ -1497,9 +1505,9 @@
iwl_pcie_synchronize_irqs(trans);
iwl_clear_bit(trans, CSR_GP_CNTRL,
- CSR_GP_CNTRL_REG_FLAG_MAC_ACCESS_REQ);
+ BIT(trans->cfg->csr->flag_mac_access_req));
iwl_clear_bit(trans, CSR_GP_CNTRL,
- CSR_GP_CNTRL_REG_FLAG_INIT_DONE);
+ BIT(trans->cfg->csr->flag_init_done));
iwl_pcie_enable_rx_wake(trans, false);
@@ -1543,15 +1551,17 @@
iwl_pcie_reset_ict(trans);
iwl_enable_interrupts(trans);
- iwl_set_bit(trans, CSR_GP_CNTRL, CSR_GP_CNTRL_REG_FLAG_MAC_ACCESS_REQ);
- iwl_set_bit(trans, CSR_GP_CNTRL, CSR_GP_CNTRL_REG_FLAG_INIT_DONE);
+ iwl_set_bit(trans, CSR_GP_CNTRL,
+ BIT(trans->cfg->csr->flag_mac_access_req));
+ iwl_set_bit(trans, CSR_GP_CNTRL,
+ BIT(trans->cfg->csr->flag_init_done));
if (trans->cfg->device_family >= IWL_DEVICE_FAMILY_8000)
udelay(2);
ret = iwl_poll_bit(trans, CSR_GP_CNTRL,
- CSR_GP_CNTRL_REG_FLAG_MAC_CLOCK_READY,
- CSR_GP_CNTRL_REG_FLAG_MAC_CLOCK_READY,
+ BIT(trans->cfg->csr->flag_mac_clock_ready),
+ BIT(trans->cfg->csr->flag_mac_clock_ready),
25000);
if (ret < 0) {
IWL_ERR(trans, "Failed to resume the device (mac ready)\n");
@@ -1562,7 +1572,7 @@
if (!reset) {
iwl_clear_bit(trans, CSR_GP_CNTRL,
- CSR_GP_CNTRL_REG_FLAG_MAC_ACCESS_REQ);
+ BIT(trans->cfg->csr->flag_mac_access_req));
} else {
iwl_trans_pcie_tx_reset(trans);
@@ -1926,6 +1936,29 @@
clear_bit(STATUS_TPOWER_PMI, &trans->status);
}
+struct iwl_trans_pcie_removal {
+ struct pci_dev *pdev;
+ struct work_struct work;
+};
+
+static void iwl_trans_pcie_removal_wk(struct work_struct *wk)
+{
+ struct iwl_trans_pcie_removal *removal =
+ container_of(wk, struct iwl_trans_pcie_removal, work);
+ struct pci_dev *pdev = removal->pdev;
+ char *prop[] = {"EVENT=INACCESSIBLE", NULL};
+
+ dev_err(&pdev->dev, "Device gone - attempting removal\n");
+ kobject_uevent_env(&pdev->dev.kobj, KOBJ_CHANGE, prop);
+ pci_lock_rescan_remove();
+ pci_dev_put(pdev);
+ pci_stop_and_remove_bus_device(pdev);
+ pci_unlock_rescan_remove();
+
+ kfree(removal);
+ module_put(THIS_MODULE);
+}
+
static bool iwl_trans_pcie_grab_nic_access(struct iwl_trans *trans,
unsigned long *flags)
{
@@ -1939,7 +1972,7 @@
/* this bit wakes up the NIC */
__iwl_trans_pcie_set_bit(trans, CSR_GP_CNTRL,
- CSR_GP_CNTRL_REG_FLAG_MAC_ACCESS_REQ);
+ BIT(trans->cfg->csr->flag_mac_access_req));
if (trans->cfg->device_family >= IWL_DEVICE_FAMILY_8000)
udelay(2);
@@ -1964,15 +1997,59 @@
* and do not save/restore SRAM when power cycling.
*/
ret = iwl_poll_bit(trans, CSR_GP_CNTRL,
- CSR_GP_CNTRL_REG_VAL_MAC_ACCESS_EN,
- (CSR_GP_CNTRL_REG_FLAG_MAC_CLOCK_READY |
+ BIT(trans->cfg->csr->flag_val_mac_access_en),
+ (BIT(trans->cfg->csr->flag_mac_clock_ready) |
CSR_GP_CNTRL_REG_FLAG_GOING_TO_SLEEP), 15000);
if (unlikely(ret < 0)) {
- iwl_trans_pcie_dump_regs(trans);
- iwl_write32(trans, CSR_RESET, CSR_RESET_REG_FLAG_FORCE_NMI);
+ u32 cntrl = iwl_read32(trans, CSR_GP_CNTRL);
+
WARN_ONCE(1,
"Timeout waiting for hardware access (CSR_GP_CNTRL 0x%08x)\n",
- iwl_read32(trans, CSR_GP_CNTRL));
+ cntrl);
+
+ iwl_trans_pcie_dump_regs(trans);
+
+ if (iwlwifi_mod_params.remove_when_gone && cntrl == ~0U) {
+ struct iwl_trans_pcie_removal *removal;
+
+ if (trans_pcie->scheduled_for_removal)
+ goto err;
+
+ IWL_ERR(trans, "Device gone - scheduling removal!\n");
+
+ /*
+ * get a module reference to avoid doing this
+ * while unloading anyway and to avoid
+ * scheduling a work with code that's being
+ * removed.
+ */
+ if (!try_module_get(THIS_MODULE)) {
+ IWL_ERR(trans,
+ "Module is being unloaded - abort\n");
+ goto err;
+ }
+
+ removal = kzalloc(sizeof(*removal), GFP_ATOMIC);
+ if (!removal) {
+ module_put(THIS_MODULE);
+ goto err;
+ }
+ /*
+ * we don't need to clear this flag, because
+ * the trans will be freed and reallocated.
+ */
+ trans_pcie->scheduled_for_removal = true;
+
+ removal->pdev = to_pci_dev(trans->dev);
+ INIT_WORK(&removal->work, iwl_trans_pcie_removal_wk);
+ pci_dev_get(removal->pdev);
+ schedule_work(&removal->work);
+ } else {
+ iwl_write32(trans, CSR_RESET,
+ CSR_RESET_REG_FLAG_FORCE_NMI);
+ }
+
+err:
spin_unlock_irqrestore(&trans_pcie->reg_lock, *flags);
return false;
}
@@ -2003,7 +2080,7 @@
goto out;
__iwl_trans_pcie_clear_bit(trans, CSR_GP_CNTRL,
- CSR_GP_CNTRL_REG_FLAG_MAC_ACCESS_REQ);
+ BIT(trans->cfg->csr->flag_mac_access_req));
/*
* Above we read the CSR_GP_CNTRL register, which will flush
* any previous writes, but we need the write that clears the
@@ -3232,12 +3309,12 @@
* id located at the AUX bus MISC address space.
*/
iwl_set_bit(trans, CSR_GP_CNTRL,
- CSR_GP_CNTRL_REG_FLAG_INIT_DONE);
+ BIT(trans->cfg->csr->flag_init_done));
udelay(2);
ret = iwl_poll_bit(trans, CSR_GP_CNTRL,
- CSR_GP_CNTRL_REG_FLAG_MAC_CLOCK_READY,
- CSR_GP_CNTRL_REG_FLAG_MAC_CLOCK_READY,
+ BIT(trans->cfg->csr->flag_mac_clock_ready),
+ BIT(trans->cfg->csr->flag_mac_clock_ready),
25000);
if (ret < 0) {
IWL_DEBUG_INFO(trans, "Failed to wake up the nic\n");
diff --git a/drivers/net/wireless/intel/iwlwifi/pcie/tx-gen2.c b/drivers/net/wireless/intel/iwlwifi/pcie/tx-gen2.c
index fabae0f..48890a1 100644
--- a/drivers/net/wireless/intel/iwlwifi/pcie/tx-gen2.c
+++ b/drivers/net/wireless/intel/iwlwifi/pcie/tx-gen2.c
@@ -488,6 +488,23 @@
spin_lock(&txq->lock);
+ if (iwl_queue_space(txq) < txq->high_mark) {
+ iwl_stop_queue(trans, txq);
+
+ /* don't put the packet on the ring, if there is no room */
+ if (unlikely(iwl_queue_space(txq) < 3)) {
+ struct iwl_device_cmd **dev_cmd_ptr;
+
+ dev_cmd_ptr = (void *)((u8 *)skb->cb +
+ trans_pcie->dev_cmd_offs);
+
+ *dev_cmd_ptr = dev_cmd;
+ __skb_queue_tail(&txq->overflow_q, skb);
+ spin_unlock(&txq->lock);
+ return 0;
+ }
+ }
+
idx = iwl_pcie_get_cmd_index(txq, txq->write_ptr);
/* Set up driver data for this TFD */
@@ -523,9 +540,6 @@
/* Tell device the write index *just past* this latest filled TFD */
txq->write_ptr = iwl_queue_inc_wrap(txq->write_ptr);
iwl_pcie_gen2_txq_inc_wr_ptr(trans, txq);
- if (iwl_queue_space(txq) < txq->high_mark)
- iwl_stop_queue(trans, txq);
-
/*
* At this point the frame is "transmitted" successfully
* and we will get a TX status notification eventually.
@@ -555,15 +569,13 @@
unsigned long flags;
void *dup_buf = NULL;
dma_addr_t phys_addr;
- int i, cmd_pos, idx = iwl_pcie_get_cmd_index(txq, txq->write_ptr);
+ int i, cmd_pos, idx;
u16 copy_size, cmd_size, tb0_size;
bool had_nocopy = false;
u8 group_id = iwl_cmd_groupid(cmd->id);
const u8 *cmddata[IWL_MAX_CMD_TBS_PER_TFD];
u16 cmdlen[IWL_MAX_CMD_TBS_PER_TFD];
- struct iwl_tfh_tfd *tfd = iwl_pcie_get_tfd(trans, txq, txq->write_ptr);
-
- memset(tfd, 0, sizeof(*tfd));
+ struct iwl_tfh_tfd *tfd;
copy_size = sizeof(struct iwl_cmd_header_wide);
cmd_size = sizeof(struct iwl_cmd_header_wide);
@@ -634,6 +646,10 @@
spin_lock_bh(&txq->lock);
+ idx = iwl_pcie_get_cmd_index(txq, txq->write_ptr);
+ tfd = iwl_pcie_get_tfd(trans, txq, txq->write_ptr);
+ memset(tfd, 0, sizeof(*tfd));
+
if (iwl_queue_space(txq) < ((cmd->flags & CMD_ASYNC) ? 2 : 1)) {
spin_unlock_bh(&txq->lock);
@@ -957,6 +973,13 @@
spin_unlock_irqrestore(&trans_pcie->reg_lock, flags);
}
}
+
+ while (!skb_queue_empty(&txq->overflow_q)) {
+ struct sk_buff *skb = __skb_dequeue(&txq->overflow_q);
+
+ iwl_op_mode_free_skb(trans->op_mode, skb);
+ }
+
spin_unlock_bh(&txq->lock);
/* just in case - this queue may have been stopped */
@@ -972,7 +995,7 @@
/* De-alloc circular buffer of TFDs */
if (txq->tfds) {
dma_free_coherent(dev,
- trans_pcie->tfd_size * TFD_QUEUE_SIZE_MAX,
+ trans_pcie->tfd_size * txq->n_window,
txq->tfds, txq->dma_addr);
dma_free_coherent(dev,
sizeof(*txq->first_tb_bufs) * txq->n_window,
@@ -1020,7 +1043,7 @@
int iwl_trans_pcie_dyn_txq_alloc(struct iwl_trans *trans,
struct iwl_tx_queue_cfg_cmd *cmd,
- int cmd_id,
+ int cmd_id, int size,
unsigned int timeout)
{
struct iwl_trans_pcie *trans_pcie = IWL_TRANS_GET_PCIE_TRANS(trans);
@@ -1046,12 +1069,12 @@
return -ENOMEM;
}
- ret = iwl_pcie_txq_alloc(trans, txq, TFD_TX_CMD_SLOTS, false);
+ ret = iwl_pcie_txq_alloc(trans, txq, size, false);
if (ret) {
IWL_ERR(trans, "Tx queue alloc failed\n");
goto error;
}
- ret = iwl_pcie_txq_init(trans, txq, TFD_TX_CMD_SLOTS, false);
+ ret = iwl_pcie_txq_init(trans, txq, size, false);
if (ret) {
IWL_ERR(trans, "Tx queue init failed\n");
goto error;
@@ -1061,7 +1084,7 @@
cmd->tfdq_addr = cpu_to_le64(txq->dma_addr);
cmd->byte_cnt_addr = cpu_to_le64(txq->bc_tbl.dma);
- cmd->cb_size = cpu_to_le32(TFD_QUEUE_CB_SIZE(TFD_TX_CMD_SLOTS));
+ cmd->cb_size = cpu_to_le32(TFD_QUEUE_CB_SIZE(size));
ret = iwl_trans_send_cmd(trans, &hcmd);
if (ret)
@@ -1152,8 +1175,6 @@
struct iwl_txq *cmd_queue;
int txq_id = trans_pcie->cmd_queue, ret;
- iwl_pcie_set_tx_cmd_queue_size(trans);
-
/* alloc and init the command queue */
if (!trans_pcie->txq[txq_id]) {
cmd_queue = kzalloc(sizeof(*cmd_queue), GFP_KERNEL);
@@ -1162,8 +1183,7 @@
return -ENOMEM;
}
trans_pcie->txq[txq_id] = cmd_queue;
- ret = iwl_pcie_txq_alloc(trans, cmd_queue,
- trans_pcie->tx_cmd_queue_size, true);
+ ret = iwl_pcie_txq_alloc(trans, cmd_queue, TFD_CMD_SLOTS, true);
if (ret) {
IWL_ERR(trans, "Tx %d queue init failed\n", txq_id);
goto error;
@@ -1172,8 +1192,7 @@
cmd_queue = trans_pcie->txq[txq_id];
}
- ret = iwl_pcie_txq_init(trans, cmd_queue,
- trans_pcie->tx_cmd_queue_size, true);
+ ret = iwl_pcie_txq_init(trans, cmd_queue, TFD_CMD_SLOTS, true);
if (ret) {
IWL_ERR(trans, "Tx %d queue alloc failed\n", txq_id);
goto error;
diff --git a/drivers/net/wireless/intel/iwlwifi/pcie/tx.c b/drivers/net/wireless/intel/iwlwifi/pcie/tx.c
index 1a56628..473fe7c 100644
--- a/drivers/net/wireless/intel/iwlwifi/pcie/tx.c
+++ b/drivers/net/wireless/intel/iwlwifi/pcie/tx.c
@@ -3,6 +3,7 @@
* Copyright(c) 2003 - 2014 Intel Corporation. All rights reserved.
* Copyright(c) 2013 - 2015 Intel Mobile Communications GmbH
* Copyright(c) 2016 - 2017 Intel Deutschland GmbH
+ * Copyright(c) 2018 Intel Corporation
*
* Portions of this file are derived from the ipw3945 project, as well
* as portions of the ieee80211 subsystem header files.
@@ -273,7 +274,7 @@
IWL_DEBUG_INFO(trans, "Tx queue %d requesting wakeup, GP1 = 0x%x\n",
txq_id, reg);
iwl_set_bit(trans, CSR_GP_CNTRL,
- CSR_GP_CNTRL_REG_FLAG_MAC_ACCESS_REQ);
+ BIT(trans->cfg->csr->flag_mac_access_req));
txq->need_update = true;
return;
}
@@ -495,6 +496,9 @@
if (WARN_ON(txq->entries || txq->tfds))
return -EINVAL;
+ if (trans->cfg->use_tfh)
+ tfd_sz = trans_pcie->tfd_size * slots_num;
+
timer_setup(&txq->stuck_timer, iwl_pcie_txq_stuck_timer, 0);
txq->trans_pcie = trans_pcie;
@@ -608,7 +612,7 @@
trans_pcie->cmd_hold_nic_awake = false;
__iwl_trans_pcie_clear_bit(trans, CSR_GP_CNTRL,
- CSR_GP_CNTRL_REG_FLAG_MAC_ACCESS_REQ);
+ BIT(trans->cfg->csr->flag_mac_access_req));
}
/*
@@ -950,8 +954,7 @@
txq_id++) {
bool cmd_queue = (txq_id == trans_pcie->cmd_queue);
- slots_num = cmd_queue ? trans_pcie->tx_cmd_queue_size :
- TFD_TX_CMD_SLOTS;
+ slots_num = cmd_queue ? TFD_CMD_SLOTS : TFD_TX_CMD_SLOTS;
trans_pcie->txq[txq_id] = &trans_pcie->txq_memory[txq_id];
ret = iwl_pcie_txq_alloc(trans, trans_pcie->txq[txq_id],
slots_num, cmd_queue);
@@ -970,21 +973,6 @@
return ret;
}
-void iwl_pcie_set_tx_cmd_queue_size(struct iwl_trans *trans)
-{
- struct iwl_trans_pcie *trans_pcie = IWL_TRANS_GET_PCIE_TRANS(trans);
- int queue_size = TFD_CMD_SLOTS;
-
- if (trans->cfg->tx_cmd_queue_size)
- queue_size = trans->cfg->tx_cmd_queue_size;
-
- if (WARN_ON(!(is_power_of_2(queue_size) &&
- TFD_QUEUE_CB_SIZE(queue_size) > 0)))
- trans_pcie->tx_cmd_queue_size = TFD_CMD_SLOTS;
- else
- trans_pcie->tx_cmd_queue_size = queue_size;
-}
-
int iwl_pcie_tx_init(struct iwl_trans *trans)
{
struct iwl_trans_pcie *trans_pcie = IWL_TRANS_GET_PCIE_TRANS(trans);
@@ -992,8 +980,6 @@
int txq_id, slots_num;
bool alloc = false;
- iwl_pcie_set_tx_cmd_queue_size(trans);
-
if (!trans_pcie->txq_memory) {
ret = iwl_pcie_tx_alloc(trans);
if (ret)
@@ -1017,8 +1003,7 @@
txq_id++) {
bool cmd_queue = (txq_id == trans_pcie->cmd_queue);
- slots_num = cmd_queue ? trans_pcie->tx_cmd_queue_size :
- TFD_TX_CMD_SLOTS;
+ slots_num = cmd_queue ? TFD_CMD_SLOTS : TFD_TX_CMD_SLOTS;
ret = iwl_pcie_txq_init(trans, trans_pcie->txq[txq_id],
slots_num, cmd_queue);
if (ret) {
@@ -1166,7 +1151,7 @@
* In that case, iwl_queue_space will be small again
* and we won't wake mac80211's queue.
*/
- iwl_trans_pcie_tx(trans, skb, dev_cmd_ptr, txq_id);
+ iwl_trans_tx(trans, skb, dev_cmd_ptr, txq_id);
}
spin_lock_bh(&txq->lock);
@@ -1187,6 +1172,7 @@
const struct iwl_host_cmd *cmd)
{
struct iwl_trans_pcie *trans_pcie = IWL_TRANS_GET_PCIE_TRANS(trans);
+ const struct iwl_cfg *cfg = trans->cfg;
int ret;
lockdep_assert_held(&trans_pcie->reg_lock);
@@ -1204,19 +1190,19 @@
* returned. This needs to be done only on NICs that have
* apmg_wake_up_wa set.
*/
- if (trans->cfg->base_params->apmg_wake_up_wa &&
+ if (cfg->base_params->apmg_wake_up_wa &&
!trans_pcie->cmd_hold_nic_awake) {
__iwl_trans_pcie_set_bit(trans, CSR_GP_CNTRL,
- CSR_GP_CNTRL_REG_FLAG_MAC_ACCESS_REQ);
+ BIT(cfg->csr->flag_mac_access_req));
ret = iwl_poll_bit(trans, CSR_GP_CNTRL,
- CSR_GP_CNTRL_REG_VAL_MAC_ACCESS_EN,
- (CSR_GP_CNTRL_REG_FLAG_MAC_CLOCK_READY |
+ BIT(cfg->csr->flag_val_mac_access_en),
+ (BIT(cfg->csr->flag_mac_clock_ready) |
CSR_GP_CNTRL_REG_FLAG_GOING_TO_SLEEP),
15000);
if (ret < 0) {
__iwl_trans_pcie_clear_bit(trans, CSR_GP_CNTRL,
- CSR_GP_CNTRL_REG_FLAG_MAC_ACCESS_REQ);
+ BIT(cfg->csr->flag_mac_access_req));
IWL_ERR(trans, "Failed to wake NIC for hcmd\n");
return -EIO;
}
@@ -2411,7 +2397,13 @@
goto out_err;
iwl_pcie_txq_build_tfd(trans, txq, tb1_phys, tb1_len, false);
- if (amsdu) {
+ /*
+ * If gso_size wasn't set, don't give the frame "amsdu treatment"
+ * (adding subframes, etc.).
+ * This can happen in some testing flows when the amsdu was already
+ * pre-built, and we just need to send the resulting skb.
+ */
+ if (amsdu && skb_shinfo(skb)->gso_size) {
if (unlikely(iwl_fill_data_tbs_amsdu(trans, skb, txq, hdr_len,
out_meta, dev_cmd,
tb1_len)))
diff --git a/drivers/net/wireless/mac80211_hwsim.c b/drivers/net/wireless/mac80211_hwsim.c
index 920c23e..89fc225 100644
--- a/drivers/net/wireless/mac80211_hwsim.c
+++ b/drivers/net/wireless/mac80211_hwsim.c
@@ -2650,6 +2650,7 @@
ieee80211_hw_set(hw, AMPDU_AGGREGATION);
ieee80211_hw_set(hw, MFP_CAPABLE);
ieee80211_hw_set(hw, SIGNAL_DBM);
+ ieee80211_hw_set(hw, SUPPORTS_PS);
ieee80211_hw_set(hw, TDLS_WIDER_BW);
if (rctbl)
ieee80211_hw_set(hw, SUPPORTS_RC_TABLE);
diff --git a/drivers/net/wireless/marvell/mwifiex/cfg80211.c b/drivers/net/wireless/marvell/mwifiex/cfg80211.c
index 7f7e9de..54a2297 100644
--- a/drivers/net/wireless/marvell/mwifiex/cfg80211.c
+++ b/drivers/net/wireless/marvell/mwifiex/cfg80211.c
@@ -929,7 +929,7 @@
adapter->rx_locked = false;
spin_unlock_irqrestore(&adapter->rx_proc_lock, flags);
- mwifiex_set_mac_address(priv, dev);
+ mwifiex_set_mac_address(priv, dev, false, NULL);
return 0;
}
@@ -1979,7 +1979,8 @@
bss_cfg->bcast_ssid_ctl = 0;
break;
case NL80211_HIDDEN_SSID_ZERO_CONTENTS:
- /* firmware doesn't support this type of hidden SSID */
+ bss_cfg->bcast_ssid_ctl = 2;
+ break;
default:
kfree(bss_cfg);
return -EINVAL;
@@ -2978,7 +2979,7 @@
priv->netdev = dev;
if (!adapter->mfg_mode) {
- mwifiex_set_mac_address(priv, dev);
+ mwifiex_set_mac_address(priv, dev, false, NULL);
ret = mwifiex_send_cmd(priv, HostCmd_CMD_SET_BSS_MODE,
HostCmd_ACT_GEN_SET, 0, NULL, true);
diff --git a/drivers/net/wireless/marvell/mwifiex/cmdevt.c b/drivers/net/wireless/marvell/mwifiex/cmdevt.c
index 7014f44..9cfcdf6 100644
--- a/drivers/net/wireless/marvell/mwifiex/cmdevt.c
+++ b/drivers/net/wireless/marvell/mwifiex/cmdevt.c
@@ -25,7 +25,6 @@
#include "main.h"
#include "wmm.h"
#include "11n.h"
-#include "11ac.h"
static void mwifiex_cancel_pending_ioctl(struct mwifiex_adapter *adapter);
diff --git a/drivers/net/wireless/marvell/mwifiex/ie.c b/drivers/net/wireless/marvell/mwifiex/ie.c
index 922e3d6..b10baac 100644
--- a/drivers/net/wireless/marvell/mwifiex/ie.c
+++ b/drivers/net/wireless/marvell/mwifiex/ie.c
@@ -349,6 +349,7 @@
case WLAN_EID_SUPP_RATES:
case WLAN_EID_COUNTRY:
case WLAN_EID_PWR_CONSTRAINT:
+ case WLAN_EID_ERP_INFO:
case WLAN_EID_EXT_SUPP_RATES:
case WLAN_EID_HT_CAPABILITY:
case WLAN_EID_HT_OPERATION:
diff --git a/drivers/net/wireless/marvell/mwifiex/main.c b/drivers/net/wireless/marvell/mwifiex/main.c
index b648458..510f6b8 100644
--- a/drivers/net/wireless/marvell/mwifiex/main.c
+++ b/drivers/net/wireless/marvell/mwifiex/main.c
@@ -858,7 +858,7 @@
/*
* CFG802.11 network device handler for data transmission.
*/
-static int
+static netdev_tx_t
mwifiex_hard_start_xmit(struct sk_buff *skb, struct net_device *dev)
{
struct mwifiex_private *priv = mwifiex_netdev_get_priv(dev);
@@ -940,29 +940,33 @@
}
int mwifiex_set_mac_address(struct mwifiex_private *priv,
- struct net_device *dev)
+ struct net_device *dev, bool external,
+ u8 *new_mac)
{
int ret;
u64 mac_addr, old_mac_addr;
- if (priv->bss_type == MWIFIEX_BSS_TYPE_ANY)
- return -ENOTSUPP;
+ old_mac_addr = ether_addr_to_u64(priv->curr_addr);
- mac_addr = ether_addr_to_u64(priv->curr_addr);
- old_mac_addr = mac_addr;
+ if (external) {
+ mac_addr = ether_addr_to_u64(new_mac);
+ } else {
+ /* Internal mac address change */
+ if (priv->bss_type == MWIFIEX_BSS_TYPE_ANY)
+ return -ENOTSUPP;
- if (priv->bss_type == MWIFIEX_BSS_TYPE_P2P)
- mac_addr |= BIT_ULL(MWIFIEX_MAC_LOCAL_ADMIN_BIT);
+ mac_addr = old_mac_addr;
- if (mwifiex_get_intf_num(priv->adapter, priv->bss_type) > 1) {
- /* Set mac address based on bss_type/bss_num */
- mac_addr ^= BIT_ULL(priv->bss_type + 8);
- mac_addr += priv->bss_num;
+ if (priv->bss_type == MWIFIEX_BSS_TYPE_P2P)
+ mac_addr |= BIT_ULL(MWIFIEX_MAC_LOCAL_ADMIN_BIT);
+
+ if (mwifiex_get_intf_num(priv->adapter, priv->bss_type) > 1) {
+ /* Set mac address based on bss_type/bss_num */
+ mac_addr ^= BIT_ULL(priv->bss_type + 8);
+ mac_addr += priv->bss_num;
+ }
}
- if (mac_addr == old_mac_addr)
- goto done;
-
u64_to_ether_addr(mac_addr, priv->curr_addr);
/* Send request to firmware */
@@ -976,7 +980,6 @@
return ret;
}
-done:
ether_addr_copy(dev->dev_addr, priv->curr_addr);
return 0;
}
@@ -989,8 +992,7 @@
struct mwifiex_private *priv = mwifiex_netdev_get_priv(dev);
struct sockaddr *hw_addr = addr;
- memcpy(priv->curr_addr, hw_addr->sa_data, ETH_ALEN);
- return mwifiex_set_mac_address(priv, dev);
+ return mwifiex_set_mac_address(priv, dev, true, hw_addr->sa_data);
}
/*
@@ -1331,7 +1333,10 @@
priv->assocresp_idx = MWIFIEX_AUTO_IDX_MASK;
priv->gen_idx = MWIFIEX_AUTO_IDX_MASK;
priv->num_tx_timeout = 0;
- ether_addr_copy(priv->curr_addr, priv->adapter->perm_addr);
+ if (is_valid_ether_addr(dev->dev_addr))
+ ether_addr_copy(priv->curr_addr, dev->dev_addr);
+ else
+ ether_addr_copy(priv->curr_addr, priv->adapter->perm_addr);
if (GET_BSS_ROLE(priv) == MWIFIEX_BSS_ROLE_STA ||
GET_BSS_ROLE(priv) == MWIFIEX_BSS_ROLE_UAP) {
diff --git a/drivers/net/wireless/marvell/mwifiex/main.h b/drivers/net/wireless/marvell/mwifiex/main.h
index 9bde181..8ae74ed 100644
--- a/drivers/net/wireless/marvell/mwifiex/main.h
+++ b/drivers/net/wireless/marvell/mwifiex/main.h
@@ -84,8 +84,8 @@
#define MWIFIEX_TIMER_10S 10000
#define MWIFIEX_TIMER_1S 1000
-#define MAX_TX_PENDING 100
-#define LOW_TX_PENDING 80
+#define MAX_TX_PENDING 400
+#define LOW_TX_PENDING 380
#define HIGH_RX_PENDING 50
#define LOW_RX_PENDING 20
@@ -1709,7 +1709,8 @@
struct sk_buff *event_skb);
void mwifiex_multi_chan_resync(struct mwifiex_adapter *adapter);
int mwifiex_set_mac_address(struct mwifiex_private *priv,
- struct net_device *dev);
+ struct net_device *dev,
+ bool external, u8 *new_mac);
void mwifiex_devdump_tmo_func(unsigned long function_context);
#ifdef CONFIG_DEBUG_FS
diff --git a/drivers/net/wireless/marvell/mwifiex/pcie.c b/drivers/net/wireless/marvell/mwifiex/pcie.c
index 97a6199..7538543 100644
--- a/drivers/net/wireless/marvell/mwifiex/pcie.c
+++ b/drivers/net/wireless/marvell/mwifiex/pcie.c
@@ -1881,7 +1881,8 @@
mwifiex_dbg(adapter, EVENT,
"info: Event length: %d\n", evt_len);
- if ((evt_len > 0) && (evt_len < MAX_EVENT_SIZE))
+ if (evt_len > MWIFIEX_EVENT_HEADER_LEN &&
+ evt_len < MAX_EVENT_SIZE)
memcpy(adapter->event_body, skb_cmd->data +
MWIFIEX_EVENT_HEADER_LEN, evt_len -
MWIFIEX_EVENT_HEADER_LEN);
diff --git a/drivers/net/wireless/marvell/mwifiex/uap_event.c b/drivers/net/wireless/marvell/mwifiex/uap_event.c
index e8c8728..e86217a 100644
--- a/drivers/net/wireless/marvell/mwifiex/uap_event.c
+++ b/drivers/net/wireless/marvell/mwifiex/uap_event.c
@@ -108,7 +108,7 @@
struct mwifiex_adapter *adapter = priv->adapter;
int len, i;
u32 eventcause = adapter->event_cause;
- struct station_info sinfo;
+ struct station_info *sinfo;
struct mwifiex_assoc_event *event;
struct mwifiex_sta_node *node;
u8 *deauth_mac;
@@ -117,7 +117,10 @@
switch (eventcause) {
case EVENT_UAP_STA_ASSOC:
- memset(&sinfo, 0, sizeof(sinfo));
+ sinfo = kzalloc(sizeof(*sinfo), GFP_KERNEL);
+ if (!sinfo)
+ return -ENOMEM;
+
event = (struct mwifiex_assoc_event *)
(adapter->event_body + MWIFIEX_UAP_EVENT_EXTRA_HEADER);
if (le16_to_cpu(event->type) == TLV_TYPE_UAP_MGMT_FRAME) {
@@ -132,28 +135,31 @@
len = ETH_ALEN;
if (len != -1) {
- sinfo.assoc_req_ies = &event->data[len];
- len = (u8 *)sinfo.assoc_req_ies -
+ sinfo->assoc_req_ies = &event->data[len];
+ len = (u8 *)sinfo->assoc_req_ies -
(u8 *)&event->frame_control;
- sinfo.assoc_req_ies_len =
+ sinfo->assoc_req_ies_len =
le16_to_cpu(event->len) - (u16)len;
}
}
- cfg80211_new_sta(priv->netdev, event->sta_addr, &sinfo,
+ cfg80211_new_sta(priv->netdev, event->sta_addr, sinfo,
GFP_KERNEL);
node = mwifiex_add_sta_entry(priv, event->sta_addr);
if (!node) {
mwifiex_dbg(adapter, ERROR,
"could not create station entry!\n");
+ kfree(sinfo);
return -1;
}
- if (!priv->ap_11n_enabled)
+ if (!priv->ap_11n_enabled) {
+ kfree(sinfo);
break;
+ }
- mwifiex_set_sta_ht_cap(priv, sinfo.assoc_req_ies,
- sinfo.assoc_req_ies_len, node);
+ mwifiex_set_sta_ht_cap(priv, sinfo->assoc_req_ies,
+ sinfo->assoc_req_ies_len, node);
for (i = 0; i < MAX_NUM_TID; i++) {
if (node->is_11n_enabled)
@@ -163,6 +169,7 @@
node->ampdu_sta[i] = BA_STREAM_NOT_ALLOWED;
}
memset(node->rx_seq, 0xff, sizeof(node->rx_seq));
+ kfree(sinfo);
break;
case EVENT_UAP_STA_DEAUTH:
deauth_mac = adapter->event_body +
diff --git a/drivers/net/wireless/mediatek/mt76/agg-rx.c b/drivers/net/wireless/mediatek/mt76/agg-rx.c
index fcb208d..b67acc6 100644
--- a/drivers/net/wireless/mediatek/mt76/agg-rx.c
+++ b/drivers/net/wireless/mediatek/mt76/agg-rx.c
@@ -103,6 +103,7 @@
__skb_queue_head_init(&frames);
local_bh_disable();
+ rcu_read_lock();
spin_lock(&tid->lock);
mt76_rx_aggr_check_release(tid, &frames);
@@ -114,6 +115,7 @@
REORDER_TIMEOUT);
mt76_rx_complete(dev, &frames, -1);
+ rcu_read_unlock();
local_bh_enable();
}
@@ -147,12 +149,13 @@
void mt76_rx_aggr_reorder(struct sk_buff *skb, struct sk_buff_head *frames)
{
struct mt76_rx_status *status = (struct mt76_rx_status *) skb->cb;
+ struct ieee80211_hdr *hdr = (struct ieee80211_hdr *) skb->data;
struct mt76_wcid *wcid = status->wcid;
struct ieee80211_sta *sta;
struct mt76_rx_tid *tid;
bool sn_less;
u16 seqno, head, size;
- u8 idx;
+ u8 ackp, idx;
__skb_queue_tail(frames, skb);
@@ -165,10 +168,17 @@
return;
}
+ /* not part of a BA session */
+ ackp = *ieee80211_get_qos_ctl(hdr) & IEEE80211_QOS_CTL_ACK_POLICY_MASK;
+ if (ackp != IEEE80211_QOS_CTL_ACK_POLICY_BLOCKACK &&
+ ackp != IEEE80211_QOS_CTL_ACK_POLICY_NORMAL)
+ return;
+
tid = rcu_dereference(wcid->aggr[status->tid]);
if (!tid)
return;
+ status->flag |= RX_FLAG_DUP_VALIDATED;
spin_lock_bh(&tid->lock);
if (tid->stopped)
@@ -258,6 +268,8 @@
u8 size = tid->size;
int i;
+ cancel_delayed_work(&tid->reorder_work);
+
spin_lock_bh(&tid->lock);
tid->stopped = true;
@@ -272,8 +284,6 @@
}
spin_unlock_bh(&tid->lock);
-
- cancel_delayed_work_sync(&tid->reorder_work);
}
void mt76_rx_aggr_stop(struct mt76_dev *dev, struct mt76_wcid *wcid, u8 tidno)
diff --git a/drivers/net/wireless/mediatek/mt76/mac80211.c b/drivers/net/wireless/mediatek/mt76/mac80211.c
index 4f30cdc..915e617 100644
--- a/drivers/net/wireless/mediatek/mt76/mac80211.c
+++ b/drivers/net/wireless/mediatek/mt76/mac80211.c
@@ -213,7 +213,8 @@
vht_cap->vht_supported = true;
vht_cap->cap |= IEEE80211_VHT_CAP_RXLDPC |
IEEE80211_VHT_CAP_RXSTBC_1 |
- IEEE80211_VHT_CAP_SHORT_GI_80;
+ IEEE80211_VHT_CAP_SHORT_GI_80 |
+ (3 << IEEE80211_VHT_CAP_MAX_A_MPDU_LENGTH_EXPONENT_SHIFT);
return 0;
}
@@ -541,15 +542,13 @@
if (!!test_bit(MT_WCID_FLAG_PS, &wcid->flags) == ps)
return;
- if (ps) {
+ if (ps)
set_bit(MT_WCID_FLAG_PS, &wcid->flags);
- mt76_stop_tx_queues(dev, sta, true);
- } else {
+ else
clear_bit(MT_WCID_FLAG_PS, &wcid->flags);
- }
- ieee80211_sta_ps_transition(sta, ps);
dev->drv->sta_ps(dev, sta, ps);
+ ieee80211_sta_ps_transition(sta, ps);
}
void mt76_rx_complete(struct mt76_dev *dev, struct sk_buff_head *frames,
@@ -562,6 +561,7 @@
if (queue >= 0)
napi = &dev->napi[queue];
+ spin_lock(&dev->rx_lock);
while ((skb = __skb_dequeue(frames)) != NULL) {
if (mt76_check_ccmp_pn(skb)) {
dev_kfree_skb(skb);
@@ -571,6 +571,7 @@
sta = mt76_rx_convert(skb);
ieee80211_rx_napi(dev->hw, sta, skb, napi);
}
+ spin_unlock(&dev->rx_lock);
}
void mt76_rx_poll_complete(struct mt76_dev *dev, enum mt76_rxq_id q)
diff --git a/drivers/net/wireless/mediatek/mt76/mt76.h b/drivers/net/wireless/mediatek/mt76/mt76.h
index 065ff78..a74e6ee 100644
--- a/drivers/net/wireless/mediatek/mt76/mt76.h
+++ b/drivers/net/wireless/mediatek/mt76/mt76.h
@@ -241,6 +241,7 @@
struct device *dev;
struct net_device napi_dev;
+ spinlock_t rx_lock;
struct napi_struct napi[__MT_RXQ_MAX];
struct sk_buff_head rx_skb[__MT_RXQ_MAX];
diff --git a/drivers/net/wireless/mediatek/mt76/mt76x2.h b/drivers/net/wireless/mediatek/mt76/mt76x2.h
index 783b812..a5d1255 100644
--- a/drivers/net/wireless/mediatek/mt76/mt76x2.h
+++ b/drivers/net/wireless/mediatek/mt76/mt76x2.h
@@ -117,7 +117,6 @@
u8 beacon_mask;
u8 beacon_data_mask;
- u32 rev;
u32 rxfilter;
u16 chainmask;
@@ -151,7 +150,7 @@
static inline bool is_mt7612(struct mt76x2_dev *dev)
{
- return (dev->rev >> 16) == 0x7612;
+ return mt76_chip(&dev->mt76) == 0x7612;
}
void mt76x2_set_irq_mask(struct mt76x2_dev *dev, u32 clear, u32 set);
diff --git a/drivers/net/wireless/mediatek/mt76/mt76x2_dma.h b/drivers/net/wireless/mediatek/mt76/mt76x2_dma.h
index 47f79d8..e9d426b 100644
--- a/drivers/net/wireless/mediatek/mt76/mt76x2_dma.h
+++ b/drivers/net/wireless/mediatek/mt76/mt76x2_dma.h
@@ -19,7 +19,7 @@
#include "dma.h"
-#define MT_TXD_INFO_LEN GENMASK(13, 0)
+#define MT_TXD_INFO_LEN GENMASK(15, 0)
#define MT_TXD_INFO_NEXT_VLD BIT(16)
#define MT_TXD_INFO_TX_BURST BIT(17)
#define MT_TXD_INFO_80211 BIT(19)
@@ -27,9 +27,8 @@
#define MT_TXD_INFO_CSO BIT(21)
#define MT_TXD_INFO_WIV BIT(24)
#define MT_TXD_INFO_QSEL GENMASK(26, 25)
-#define MT_TXD_INFO_TCO BIT(29)
-#define MT_TXD_INFO_UCO BIT(30)
-#define MT_TXD_INFO_ICO BIT(31)
+#define MT_TXD_INFO_DPORT GENMASK(29, 27)
+#define MT_TXD_INFO_TYPE GENMASK(31, 30)
#define MT_RX_FCE_INFO_LEN GENMASK(13, 0)
#define MT_RX_FCE_INFO_SELF_GEN BIT(15)
diff --git a/drivers/net/wireless/mediatek/mt76/mt76x2_eeprom.c b/drivers/net/wireless/mediatek/mt76/mt76x2_eeprom.c
index 5bb5002..95d5f7d 100644
--- a/drivers/net/wireless/mediatek/mt76/mt76x2_eeprom.c
+++ b/drivers/net/wireless/mediatek/mt76/mt76x2_eeprom.c
@@ -609,17 +609,13 @@
memset(t, 0, sizeof(*t));
- val = mt76x2_eeprom_get(dev, MT_EE_NIC_CONF_1);
- if (!(val & MT_EE_NIC_CONF_1_TEMP_TX_ALC))
+ if (!mt76x2_temp_tx_alc_enabled(dev))
return -EINVAL;
if (!mt76x2_ext_pa_enabled(dev, band))
return -EINVAL;
val = mt76x2_eeprom_get(dev, MT_EE_TX_POWER_EXT_PA_5G) >> 8;
- if (!(val & BIT(7)))
- return -EINVAL;
-
t->temp_25_ref = val & 0x7f;
if (band == NL80211_BAND_5GHZ) {
slope = mt76x2_eeprom_get(dev, MT_EE_RF_TEMP_COMP_SLOPE_5G);
diff --git a/drivers/net/wireless/mediatek/mt76/mt76x2_eeprom.h b/drivers/net/wireless/mediatek/mt76/mt76x2_eeprom.h
index d791227..aa0b0c0 100644
--- a/drivers/net/wireless/mediatek/mt76/mt76x2_eeprom.h
+++ b/drivers/net/wireless/mediatek/mt76/mt76x2_eeprom.h
@@ -159,6 +159,12 @@
static inline bool
mt76x2_temp_tx_alc_enabled(struct mt76x2_dev *dev)
{
+ u16 val;
+
+ val = mt76x2_eeprom_get(dev, MT_EE_TX_POWER_EXT_PA_5G);
+ if (!(val & BIT(15)))
+ return false;
+
return mt76x2_eeprom_get(dev, MT_EE_NIC_CONF_1) &
MT_EE_NIC_CONF_1_TEMP_TX_ALC;
}
diff --git a/drivers/net/wireless/mediatek/mt76/mt76x2_init.c b/drivers/net/wireless/mediatek/mt76/mt76x2_init.c
index 934c331..dd4c112 100644
--- a/drivers/net/wireless/mediatek/mt76/mt76x2_init.c
+++ b/drivers/net/wireless/mediatek/mt76/mt76x2_init.c
@@ -228,7 +228,7 @@
mt76_wr(dev, MT_BCN_OFFSET(i), regs[i]);
}
-int mt76x2_mac_reset(struct mt76x2_dev *dev, bool hard)
+static int mt76x2_mac_reset(struct mt76x2_dev *dev, bool hard)
{
static const u8 null_addr[ETH_ALEN] = {};
const u8 *macaddr = dev->mt76.macaddr;
@@ -370,12 +370,12 @@
/* Wait for MAC to become idle */
for (i = 0; i < 300; i++) {
- if (mt76_rr(dev, MT_MAC_STATUS) &
- (MT_MAC_STATUS_RX | MT_MAC_STATUS_TX))
+ if ((mt76_rr(dev, MT_MAC_STATUS) &
+ (MT_MAC_STATUS_RX | MT_MAC_STATUS_TX)) ||
+ mt76_rr(dev, MT_BBP(IBI, 12))) {
+ usleep_range(10, 20);
continue;
-
- if (mt76_rr(dev, MT_BBP(IBI, 12)))
- continue;
+ }
stopped = true;
break;
@@ -645,6 +645,7 @@
dev->mt76.drv = &drv_ops;
mutex_init(&dev->mutex);
spin_lock_init(&dev->irq_lock);
+ spin_lock_init(&dev->mt76.rx_lock);
return dev;
}
diff --git a/drivers/net/wireless/mediatek/mt76/mt76x2_mac.c b/drivers/net/wireless/mediatek/mt76/mt76x2_mac.c
index d183156..dab7137 100644
--- a/drivers/net/wireless/mediatek/mt76/mt76x2_mac.c
+++ b/drivers/net/wireless/mediatek/mt76/mt76x2_mac.c
@@ -410,7 +410,6 @@
break;
default:
return -EINVAL;
- break;
}
if (rate & MT_RXWI_RATE_SGI)
diff --git a/drivers/net/wireless/mediatek/mt76/mt76x2_mac.h b/drivers/net/wireless/mediatek/mt76/mt76x2_mac.h
index 8a8a25e..c048cd0 100644
--- a/drivers/net/wireless/mediatek/mt76/mt76x2_mac.h
+++ b/drivers/net/wireless/mediatek/mt76/mt76x2_mac.h
@@ -157,7 +157,6 @@
return (void *) info->status.status_driver_data;
}
-int mt76x2_mac_reset(struct mt76x2_dev *dev, bool hard);
int mt76x2_mac_start(struct mt76x2_dev *dev);
void mt76x2_mac_stop(struct mt76x2_dev *dev, bool force);
void mt76x2_mac_resume(struct mt76x2_dev *dev);
diff --git a/drivers/net/wireless/mediatek/mt76/mt76x2_main.c b/drivers/net/wireless/mediatek/mt76/mt76x2_main.c
index 73c127f..81c58f8 100644
--- a/drivers/net/wireless/mediatek/mt76/mt76x2_main.c
+++ b/drivers/net/wireless/mediatek/mt76/mt76x2_main.c
@@ -321,6 +321,7 @@
struct mt76x2_dev *dev = container_of(mdev, struct mt76x2_dev, mt76);
int idx = msta->wcid.idx;
+ mt76_stop_tx_queues(&dev->mt76, sta, true);
mt76x2_mac_wcid_set_drop(dev, idx, ps);
}
diff --git a/drivers/net/wireless/mediatek/mt76/mt76x2_phy.c b/drivers/net/wireless/mediatek/mt76/mt76x2_phy.c
index fcc37eb..c1c38ca 100644
--- a/drivers/net/wireless/mediatek/mt76/mt76x2_phy.c
+++ b/drivers/net/wireless/mediatek/mt76/mt76x2_phy.c
@@ -73,16 +73,6 @@
return rssi;
}
-static u8
-mt76x2_txpower_check(int value)
-{
- if (value < 0)
- return 0;
- if (value > 0x2f)
- return 0x2f;
- return value;
-}
-
static void
mt76x2_add_rate_power_offset(struct mt76_rate_power *r, int offset)
{
@@ -102,6 +92,26 @@
r->all[i] = limit;
}
+static int
+mt76x2_get_min_rate_power(struct mt76_rate_power *r)
+{
+ int i;
+ s8 ret = 0;
+
+ for (i = 0; i < sizeof(r->all); i++) {
+ if (!r->all[i])
+ continue;
+
+ if (ret)
+ ret = min(ret, r->all[i]);
+ else
+ ret = r->all[i];
+ }
+
+ return ret;
+}
+
+
void mt76x2_phy_set_txpower(struct mt76x2_dev *dev)
{
enum nl80211_chan_width width = dev->mt76.chandef.width;
@@ -109,6 +119,7 @@
struct mt76x2_tx_power_info txp;
int txp_0, txp_1, delta = 0;
struct mt76_rate_power t = {};
+ int base_power, gain;
mt76x2_get_power_info(dev, &txp, chan);
@@ -117,27 +128,33 @@
else if (width == NL80211_CHAN_WIDTH_80)
delta = txp.delta_bw80;
- if (txp.target_power > dev->txpower_conf)
- delta -= txp.target_power - dev->txpower_conf;
-
mt76x2_get_rate_power(dev, &t, chan);
- mt76x2_add_rate_power_offset(&t, txp.chain[0].target_power +
- txp.chain[0].delta);
+ mt76x2_add_rate_power_offset(&t, txp.chain[0].target_power);
mt76x2_limit_rate_power(&t, dev->txpower_conf);
dev->txpower_cur = mt76x2_get_max_rate_power(&t);
- mt76x2_add_rate_power_offset(&t, -(txp.chain[0].target_power +
- txp.chain[0].delta + delta));
+
+ base_power = mt76x2_get_min_rate_power(&t);
+ delta += base_power - txp.chain[0].target_power;
+ txp_0 = txp.chain[0].target_power + txp.chain[0].delta + delta;
+ txp_1 = txp.chain[1].target_power + txp.chain[1].delta + delta;
+
+ gain = min(txp_0, txp_1);
+ if (gain < 0) {
+ base_power -= gain;
+ txp_0 -= gain;
+ txp_1 -= gain;
+ } else if (gain > 0x2f) {
+ base_power -= gain - 0x2f;
+ txp_0 = 0x2f;
+ txp_1 = 0x2f;
+ }
+
+ mt76x2_add_rate_power_offset(&t, -base_power);
dev->target_power = txp.chain[0].target_power;
- dev->target_power_delta[0] = txp.chain[0].delta + delta;
- dev->target_power_delta[1] = txp.chain[1].delta + delta;
+ dev->target_power_delta[0] = txp_0 - txp.chain[0].target_power;
+ dev->target_power_delta[1] = txp_1 - txp.chain[0].target_power;
dev->rate_power = t;
- txp_0 = mt76x2_txpower_check(txp.chain[0].target_power +
- txp.chain[0].delta + delta);
-
- txp_1 = mt76x2_txpower_check(txp.chain[1].target_power +
- txp.chain[1].delta + delta);
-
mt76_rmw_field(dev, MT_TX_ALC_CFG_0, MT_TX_ALC_CFG_0_CH_INIT_0, txp_0);
mt76_rmw_field(dev, MT_TX_ALC_CFG_0, MT_TX_ALC_CFG_0_CH_INIT_1, txp_1);
@@ -180,7 +197,7 @@
if (mt76x2_channel_silent(dev))
return false;
- if (chan->band == NL80211_BAND_2GHZ)
+ if (chan->band == NL80211_BAND_5GHZ)
flag |= BIT(0);
if (mt76x2_ext_pa_enabled(dev, chan->band))
@@ -257,7 +274,6 @@
mt76_wr(dev, MT_TX_ALC_CFG_2, 0x1b0f0400);
mt76_wr(dev, MT_TX_ALC_CFG_3, 0x1b0f0476);
}
- mt76_wr(dev, MT_TX_ALC_CFG_4, 0);
if (mt76x2_ext_pa_enabled(dev, band))
pa_mode_adj = 0x04000000;
@@ -492,8 +508,10 @@
u8 gain_delta;
int low_gain;
- dev->cal.avg_rssi[0] = (dev->cal.avg_rssi[0] * 15) / 16 + (rssi0 << 8);
- dev->cal.avg_rssi[1] = (dev->cal.avg_rssi[1] * 15) / 16 + (rssi1 << 8);
+ dev->cal.avg_rssi[0] = (dev->cal.avg_rssi[0] * 15) / 16 +
+ (rssi0 << 8) / 16;
+ dev->cal.avg_rssi[1] = (dev->cal.avg_rssi[1] * 15) / 16 +
+ (rssi1 << 8) / 16;
dev->cal.avg_rssi_all = (dev->cal.avg_rssi[0] +
dev->cal.avg_rssi[1]) / 512;
@@ -661,6 +679,14 @@
memcpy(dev->cal.agc_gain_cur, dev->cal.agc_gain_init,
sizeof(dev->cal.agc_gain_cur));
+ /* init default values for temp compensation */
+ if (mt76x2_tssi_enabled(dev)) {
+ mt76_rmw_field(dev, MT_TX_ALC_CFG_1, MT_TX_ALC_CFG_1_TEMP_COMP,
+ 0x38);
+ mt76_rmw_field(dev, MT_TX_ALC_CFG_2, MT_TX_ALC_CFG_2_TEMP_COMP,
+ 0x38);
+ }
+
ieee80211_queue_delayed_work(mt76_hw(dev), &dev->cal_work,
MT_CALIBRATE_INTERVAL);
diff --git a/drivers/net/wireless/mediatek/mt76/tx.c b/drivers/net/wireless/mediatek/mt76/tx.c
index 4eef69b..7ecd2d7 100644
--- a/drivers/net/wireless/mediatek/mt76/tx.c
+++ b/drivers/net/wireless/mediatek/mt76/tx.c
@@ -385,6 +385,10 @@
bool empty = false;
int cur;
+ if (test_bit(MT76_SCANNING, &dev->state) ||
+ test_bit(MT76_RESET, &dev->state))
+ return -EBUSY;
+
mtxq = list_first_entry(&hwq->swq, struct mt76_txq, list);
if (mtxq->send_bar && mtxq->aggr) {
struct ieee80211_txq *txq = mtxq_to_txq(mtxq);
@@ -422,12 +426,14 @@
{
int len;
+ rcu_read_lock();
do {
if (hwq->swq_queued >= 4 || list_empty(&hwq->swq))
break;
len = mt76_txq_schedule_list(dev, hwq);
} while (len > 0);
+ rcu_read_unlock();
}
EXPORT_SYMBOL_GPL(mt76_txq_schedule);
diff --git a/drivers/net/wireless/mediatek/mt7601u/mac.c b/drivers/net/wireless/mediatek/mt7601u/mac.c
index d55d704..148c36d 100644
--- a/drivers/net/wireless/mediatek/mt7601u/mac.c
+++ b/drivers/net/wireless/mediatek/mt7601u/mac.c
@@ -453,7 +453,7 @@
{
dev->bcn_freq_off = rxwi->freq_off;
dev->bcn_phy_mode = FIELD_GET(MT_RXWI_RATE_PHY, rate);
- dev->avg_rssi = (dev->avg_rssi * 15) / 16 + (rssi << 8);
+ ewma_rssi_add(&dev->avg_rssi, -rssi);
}
static int
@@ -503,7 +503,7 @@
if (mt7601u_rx_is_our_beacon(dev, data))
mt7601u_rx_monitor_beacon(dev, rxwi, rate, rssi);
else if (rxwi->rxinfo & cpu_to_le32(MT_RXINFO_U2M))
- dev->avg_rssi = (dev->avg_rssi * 15) / 16 + (rssi << 8);
+ ewma_rssi_add(&dev->avg_rssi, -rssi);
spin_unlock_bh(&dev->con_mon_lock);
return len;
diff --git a/drivers/net/wireless/mediatek/mt7601u/main.c b/drivers/net/wireless/mediatek/mt7601u/main.c
index 3c9ea40..7b21016 100644
--- a/drivers/net/wireless/mediatek/mt7601u/main.c
+++ b/drivers/net/wireless/mediatek/mt7601u/main.c
@@ -288,6 +288,12 @@
mt7601u_agc_restore(dev);
clear_bit(MT7601U_STATE_SCANNING, &dev->state);
+
+ ieee80211_queue_delayed_work(dev->hw, &dev->cal_work,
+ MT_CALIBRATE_INTERVAL);
+ if (dev->freq_cal.enabled)
+ ieee80211_queue_delayed_work(dev->hw, &dev->freq_cal.work,
+ MT_FREQ_CAL_INIT_DELAY);
}
static int
diff --git a/drivers/net/wireless/mediatek/mt7601u/mt7601u.h b/drivers/net/wireless/mediatek/mt7601u/mt7601u.h
index 9233744..db317d8 100644
--- a/drivers/net/wireless/mediatek/mt7601u/mt7601u.h
+++ b/drivers/net/wireless/mediatek/mt7601u/mt7601u.h
@@ -23,6 +23,7 @@
#include <linux/completion.h>
#include <net/mac80211.h>
#include <linux/debugfs.h>
+#include <linux/average.h>
#include "regs.h"
@@ -138,6 +139,8 @@
MT7601U_STATE_MORE_STATS,
};
+DECLARE_EWMA(rssi, 10, 4);
+
/**
* struct mt7601u_dev - adapter structure
* @lock: protects @wcid->tx_rate.
@@ -220,7 +223,7 @@
s8 bcn_freq_off;
u8 bcn_phy_mode;
- int avg_rssi; /* starts at 0 and converges */
+ struct ewma_rssi avg_rssi;
u8 agc_save;
diff --git a/drivers/net/wireless/mediatek/mt7601u/phy.c b/drivers/net/wireless/mediatek/mt7601u/phy.c
index ca09a5d..9d2f9a7 100644
--- a/drivers/net/wireless/mediatek/mt7601u/phy.c
+++ b/drivers/net/wireless/mediatek/mt7601u/phy.c
@@ -795,6 +795,7 @@
switch (phy_mode) {
case MT_PHY_TYPE_OFDM:
tx_rate += 4;
+ /* fall through */
case MT_PHY_TYPE_CCK:
reg = dev->rf_pa_mode[0];
break;
@@ -974,6 +975,7 @@
static void mt7601u_agc_tune(struct mt7601u_dev *dev)
{
u8 val = mt7601u_agc_default(dev);
+ long avg_rssi;
if (test_bit(MT7601U_STATE_SCANNING, &dev->state))
return;
@@ -983,9 +985,12 @@
* Rssi updates are only on beacons and U2M so should work...
*/
spin_lock_bh(&dev->con_mon_lock);
- if (dev->avg_rssi <= -70)
+ avg_rssi = ewma_rssi_read(&dev->avg_rssi);
+ WARN_ON_ONCE(avg_rssi == 0);
+ avg_rssi = -avg_rssi;
+ if (avg_rssi <= -70)
val -= 0x20;
- else if (dev->avg_rssi <= -60)
+ else if (avg_rssi <= -60)
val -= 0x10;
spin_unlock_bh(&dev->con_mon_lock);
@@ -1101,7 +1106,7 @@
/* Start/stop collecting beacon data */
spin_lock_bh(&dev->con_mon_lock);
ether_addr_copy(dev->ap_bssid, info->bssid);
- dev->avg_rssi = 0;
+ ewma_rssi_init(&dev->avg_rssi);
dev->bcn_freq_off = MT_FREQ_OFFSET_INVALID;
spin_unlock_bh(&dev->con_mon_lock);
diff --git a/drivers/net/wireless/quantenna/qtnfmac/cfg80211.c b/drivers/net/wireless/quantenna/qtnfmac/cfg80211.c
index 0398bec..5122dc79 100644
--- a/drivers/net/wireless/quantenna/qtnfmac/cfg80211.c
+++ b/drivers/net/wireless/quantenna/qtnfmac/cfg80211.c
@@ -813,6 +813,9 @@
struct qtnf_vif *vif = qtnf_netdev_get_priv(ndev);
int ret;
+ if (wiphy_ext_feature_isset(wiphy, NL80211_EXT_FEATURE_DFS_OFFLOAD))
+ return -ENOTSUPP;
+
ret = qtnf_cmd_start_cac(vif, chandef, cac_time_ms);
if (ret)
pr_err("%s: failed to start CAC ret=%d\n", ndev->name, ret);
@@ -909,6 +912,9 @@
{
struct wiphy *wiphy;
+ if (bus->hw_info.hw_capab & QLINK_HW_CAPAB_DFS_OFFLOAD)
+ qtn_cfg80211_ops.start_radar_detection = NULL;
+
wiphy = wiphy_new(&qtn_cfg80211_ops, sizeof(struct qtnf_wmac));
if (!wiphy)
return NULL;
@@ -982,6 +988,9 @@
WIPHY_FLAG_AP_UAPSD |
WIPHY_FLAG_HAS_CHANNEL_SWITCH;
+ if (hw_info->hw_capab & QLINK_HW_CAPAB_DFS_OFFLOAD)
+ wiphy_ext_feature_set(wiphy, NL80211_EXT_FEATURE_DFS_OFFLOAD);
+
wiphy->probe_resp_offload = NL80211_PROBE_RESP_OFFLOAD_SUPPORT_WPS |
NL80211_PROBE_RESP_OFFLOAD_SUPPORT_WPS2;
diff --git a/drivers/net/wireless/quantenna/qtnfmac/core.c b/drivers/net/wireless/quantenna/qtnfmac/core.c
index cf26c15..b3bfb4f 100644
--- a/drivers/net/wireless/quantenna/qtnfmac/core.c
+++ b/drivers/net/wireless/quantenna/qtnfmac/core.c
@@ -76,7 +76,7 @@
/* Netdev handler for data transmission.
*/
-static int
+static netdev_tx_t
qtnf_netdev_hard_start_xmit(struct sk_buff *skb, struct net_device *ndev)
{
struct qtnf_vif *vif;
diff --git a/drivers/net/wireless/quantenna/qtnfmac/event.c b/drivers/net/wireless/quantenna/qtnfmac/event.c
index bcd415f..16617c4 100644
--- a/drivers/net/wireless/quantenna/qtnfmac/event.c
+++ b/drivers/net/wireless/quantenna/qtnfmac/event.c
@@ -34,12 +34,13 @@
{
const u8 *sta_addr;
u16 frame_control;
- struct station_info sinfo = { 0 };
+ struct station_info *sinfo;
size_t payload_len;
u16 tlv_type;
u16 tlv_value_len;
size_t tlv_full_len;
const struct qlink_tlv_hdr *tlv;
+ int ret = 0;
if (unlikely(len < sizeof(*sta_assoc))) {
pr_err("VIF%u.%u: payload is too short (%u < %zu)\n",
@@ -53,6 +54,10 @@
return -EPROTO;
}
+ sinfo = kzalloc(sizeof(*sinfo), GFP_KERNEL);
+ if (!sinfo)
+ return -ENOMEM;
+
sta_addr = sta_assoc->sta_addr;
frame_control = le16_to_cpu(sta_assoc->frame_control);
@@ -61,9 +66,9 @@
qtnf_sta_list_add(vif, sta_addr);
- sinfo.assoc_req_ies = NULL;
- sinfo.assoc_req_ies_len = 0;
- sinfo.generation = vif->generation;
+ sinfo->assoc_req_ies = NULL;
+ sinfo->assoc_req_ies_len = 0;
+ sinfo->generation = vif->generation;
payload_len = len - sizeof(*sta_assoc);
tlv = (const struct qlink_tlv_hdr *)sta_assoc->ies;
@@ -73,23 +78,27 @@
tlv_value_len = le16_to_cpu(tlv->len);
tlv_full_len = tlv_value_len + sizeof(struct qlink_tlv_hdr);
- if (tlv_full_len > payload_len)
- return -EINVAL;
+ if (tlv_full_len > payload_len) {
+ ret = -EINVAL;
+ goto out;
+ }
if (tlv_type == QTN_TLV_ID_IE_SET) {
const struct qlink_tlv_ie_set *ie_set;
unsigned int ie_len;
- if (payload_len < sizeof(*ie_set))
- return -EINVAL;
+ if (payload_len < sizeof(*ie_set)) {
+ ret = -EINVAL;
+ goto out;
+ }
ie_set = (const struct qlink_tlv_ie_set *)tlv;
ie_len = tlv_value_len -
(sizeof(*ie_set) - sizeof(ie_set->hdr));
if (ie_set->type == QLINK_IE_SET_ASSOC_REQ && ie_len) {
- sinfo.assoc_req_ies = ie_set->ie_data;
- sinfo.assoc_req_ies_len = ie_len;
+ sinfo->assoc_req_ies = ie_set->ie_data;
+ sinfo->assoc_req_ies_len = ie_len;
}
}
@@ -97,13 +106,17 @@
tlv = (struct qlink_tlv_hdr *)(tlv->val + tlv_value_len);
}
- if (payload_len)
- return -EINVAL;
+ if (payload_len) {
+ ret = -EINVAL;
+ goto out;
+ }
- cfg80211_new_sta(vif->netdev, sta_assoc->sta_addr, &sinfo,
+ cfg80211_new_sta(vif->netdev, sta_assoc->sta_addr, sinfo,
GFP_KERNEL);
- return 0;
+out:
+ kfree(sinfo);
+ return ret;
}
static int
@@ -443,6 +456,17 @@
cfg80211_cac_event(vif->netdev, &chandef,
NL80211_RADAR_CAC_ABORTED, GFP_KERNEL);
break;
+ case QLINK_RADAR_CAC_STARTED:
+ if (vif->wdev.cac_started)
+ break;
+
+ if (!wiphy_ext_feature_isset(wiphy,
+ NL80211_EXT_FEATURE_DFS_OFFLOAD))
+ break;
+
+ cfg80211_cac_event(vif->netdev, &chandef,
+ NL80211_RADAR_CAC_STARTED, GFP_KERNEL);
+ break;
default:
pr_warn("%s: unhandled radar event %u\n",
vif->netdev->name, ev->event);
diff --git a/drivers/net/wireless/quantenna/qtnfmac/pearl/pcie.c b/drivers/net/wireless/quantenna/qtnfmac/pearl/pcie.c
index f117904..6c1e139 100644
--- a/drivers/net/wireless/quantenna/qtnfmac/pearl/pcie.c
+++ b/drivers/net/wireless/quantenna/qtnfmac/pearl/pcie.c
@@ -1185,6 +1185,10 @@
if (qtnf_poll_state(&priv->bda->bda_ep_state, QTN_EP_FW_LOADRDY,
QTN_FW_DL_TIMEOUT_MS)) {
pr_err("card is not ready\n");
+
+ if (!flashboot)
+ release_firmware(fw);
+
goto fw_load_fail;
}
diff --git a/drivers/net/wireless/quantenna/qtnfmac/qlink.h b/drivers/net/wireless/quantenna/qtnfmac/qlink.h
index 9bf3ae4..9ab27e1 100644
--- a/drivers/net/wireless/quantenna/qtnfmac/qlink.h
+++ b/drivers/net/wireless/quantenna/qtnfmac/qlink.h
@@ -68,10 +68,12 @@
* @QLINK_HW_CAPAB_STA_INACT_TIMEOUT: device implements a logic to kick-out
* associated STAs due to inactivity. Inactivity timeout period is taken
* from QLINK_CMD_START_AP parameters.
+ * @QLINK_HW_CAPAB_DFS_OFFLOAD: device implements DFS offload functionality
*/
enum qlink_hw_capab {
- QLINK_HW_CAPAB_REG_UPDATE = BIT(0),
- QLINK_HW_CAPAB_STA_INACT_TIMEOUT = BIT(1),
+ QLINK_HW_CAPAB_REG_UPDATE = BIT(0),
+ QLINK_HW_CAPAB_STA_INACT_TIMEOUT = BIT(1),
+ QLINK_HW_CAPAB_DFS_OFFLOAD = BIT(2),
};
enum qlink_iface_type {
@@ -1031,6 +1033,7 @@
QLINK_RADAR_CAC_ABORTED,
QLINK_RADAR_NOP_FINISHED,
QLINK_RADAR_PRE_CAC_EXPIRED,
+ QLINK_RADAR_CAC_STARTED,
};
/**
diff --git a/drivers/net/wireless/ralink/rt2x00/rt2800.h b/drivers/net/wireless/ralink/rt2x00/rt2800.h
index 6a8c93f..b05ed2f 100644
--- a/drivers/net/wireless/ralink/rt2x00/rt2800.h
+++ b/drivers/net/wireless/ralink/rt2x00/rt2800.h
@@ -94,6 +94,7 @@
#define REV_RT3390E 0x0211
#define REV_RT3593E 0x0211
#define REV_RT5390F 0x0502
+#define REV_RT5370G 0x0503
#define REV_RT5390R 0x1502
#define REV_RT5592C 0x0221
@@ -1193,10 +1194,10 @@
#define TX_PWR_CFG_3_MCS13 FIELD32(0x000000f0)
#define TX_PWR_CFG_3_MCS14 FIELD32(0x00000f00)
#define TX_PWR_CFG_3_MCS15 FIELD32(0x0000f000)
-#define TX_PWR_CFG_3_UKNOWN1 FIELD32(0x000f0000)
-#define TX_PWR_CFG_3_UKNOWN2 FIELD32(0x00f00000)
-#define TX_PWR_CFG_3_UKNOWN3 FIELD32(0x0f000000)
-#define TX_PWR_CFG_3_UKNOWN4 FIELD32(0xf0000000)
+#define TX_PWR_CFG_3_UNKNOWN1 FIELD32(0x000f0000)
+#define TX_PWR_CFG_3_UNKNOWN2 FIELD32(0x00f00000)
+#define TX_PWR_CFG_3_UNKNOWN3 FIELD32(0x0f000000)
+#define TX_PWR_CFG_3_UNKNOWN4 FIELD32(0xf0000000)
/* bits for 3T devices */
#define TX_PWR_CFG_3_MCS12_CH0 FIELD32(0x0000000f)
#define TX_PWR_CFG_3_MCS12_CH1 FIELD32(0x000000f0)
@@ -1216,10 +1217,10 @@
* TX_PWR_CFG_4:
*/
#define TX_PWR_CFG_4 0x1324
-#define TX_PWR_CFG_4_UKNOWN5 FIELD32(0x0000000f)
-#define TX_PWR_CFG_4_UKNOWN6 FIELD32(0x000000f0)
-#define TX_PWR_CFG_4_UKNOWN7 FIELD32(0x00000f00)
-#define TX_PWR_CFG_4_UKNOWN8 FIELD32(0x0000f000)
+#define TX_PWR_CFG_4_UNKNOWN5 FIELD32(0x0000000f)
+#define TX_PWR_CFG_4_UNKNOWN6 FIELD32(0x000000f0)
+#define TX_PWR_CFG_4_UNKNOWN7 FIELD32(0x00000f00)
+#define TX_PWR_CFG_4_UNKNOWN8 FIELD32(0x0000f000)
/* bits for 3T devices */
#define TX_PWR_CFG_4_STBC4_CH0 FIELD32(0x0000000f)
#define TX_PWR_CFG_4_STBC4_CH1 FIELD32(0x000000f0)
diff --git a/drivers/net/wireless/ralink/rt2x00/rt2800lib.c b/drivers/net/wireless/ralink/rt2x00/rt2800lib.c
index 429d07b..a567bc2 100644
--- a/drivers/net/wireless/ralink/rt2x00/rt2800lib.c
+++ b/drivers/net/wireless/ralink/rt2x00/rt2800lib.c
@@ -1557,12 +1557,13 @@
rt2800_register_write(rt2x00dev, MAX_LEN_CFG, reg);
}
-int rt2800_sta_add(struct rt2x00_dev *rt2x00dev, struct ieee80211_vif *vif,
+int rt2800_sta_add(struct ieee80211_hw *hw, struct ieee80211_vif *vif,
struct ieee80211_sta *sta)
{
- int wcid;
- struct rt2x00_sta *sta_priv = sta_to_rt2x00_sta(sta);
+ struct rt2x00_dev *rt2x00dev = hw->priv;
struct rt2800_drv_data *drv_data = rt2x00dev->drv_data;
+ struct rt2x00_sta *sta_priv = sta_to_rt2x00_sta(sta);
+ int wcid;
/*
* Limit global maximum TX AMPDU length to smallest value of all
@@ -1608,8 +1609,10 @@
}
EXPORT_SYMBOL_GPL(rt2800_sta_add);
-int rt2800_sta_remove(struct rt2x00_dev *rt2x00dev, struct ieee80211_sta *sta)
+int rt2800_sta_remove(struct ieee80211_hw *hw, struct ieee80211_vif *vif,
+ struct ieee80211_sta *sta)
{
+ struct rt2x00_dev *rt2x00dev = hw->priv;
struct rt2800_drv_data *drv_data = rt2x00dev->drv_data;
struct rt2x00_sta *sta_priv = sta_to_rt2x00_sta(sta);
int wcid = sta_priv->wcid;
@@ -6220,8 +6223,9 @@
rt2800_register_write(rt2x00dev, GPIO_CTRL, reg);
}
- /* This chip has hardware antenna diversity*/
- if (rt2x00_rt_rev_gte(rt2x00dev, RT5390, REV_RT5390R)) {
+ /* These chips have hardware RX antenna diversity */
+ if (rt2x00_rt_rev_gte(rt2x00dev, RT5390, REV_RT5390R) ||
+ rt2x00_rt_rev_gte(rt2x00dev, RT5390, REV_RT5370G)) {
rt2800_bbp_write(rt2x00dev, 150, 0); /* Disable Antenna Software OFDM */
rt2800_bbp_write(rt2x00dev, 151, 0); /* Disable Antenna Software CCK */
rt2800_bbp_write(rt2x00dev, 154, 0); /* Clear previously selected antenna */
@@ -8748,7 +8752,9 @@
rt2x00dev->default_ant.rx = ANTENNA_A;
}
- if (rt2x00_rt_rev_gte(rt2x00dev, RT5390, REV_RT5390R)) {
+ /* These chips have hardware RX antenna diversity */
+ if (rt2x00_rt_rev_gte(rt2x00dev, RT5390, REV_RT5390R) ||
+ rt2x00_rt_rev_gte(rt2x00dev, RT5390, REV_RT5370G)) {
rt2x00dev->default_ant.tx = ANTENNA_HW_DIVERSITY; /* Unused */
rt2x00dev->default_ant.rx = ANTENNA_HW_DIVERSITY; /* Unused */
}
diff --git a/drivers/net/wireless/ralink/rt2x00/rt2800lib.h b/drivers/net/wireless/ralink/rt2x00/rt2800lib.h
index 275e396..51d9c2a 100644
--- a/drivers/net/wireless/ralink/rt2x00/rt2800lib.h
+++ b/drivers/net/wireless/ralink/rt2x00/rt2800lib.h
@@ -208,9 +208,10 @@
int rt2800_config_pairwise_key(struct rt2x00_dev *rt2x00dev,
struct rt2x00lib_crypto *crypto,
struct ieee80211_key_conf *key);
-int rt2800_sta_add(struct rt2x00_dev *rt2x00dev, struct ieee80211_vif *vif,
+int rt2800_sta_add(struct ieee80211_hw *hw, struct ieee80211_vif *vif,
struct ieee80211_sta *sta);
-int rt2800_sta_remove(struct rt2x00_dev *rt2x00dev, struct ieee80211_sta *sta);
+int rt2800_sta_remove(struct ieee80211_hw *hw, struct ieee80211_vif *vif,
+ struct ieee80211_sta *sta);
void rt2800_config_filter(struct rt2x00_dev *rt2x00dev,
const unsigned int filter_flags);
void rt2800_config_intf(struct rt2x00_dev *rt2x00dev, struct rt2x00_intf *intf,
diff --git a/drivers/net/wireless/ralink/rt2x00/rt2800mmio.c b/drivers/net/wireless/ralink/rt2x00/rt2800mmio.c
index 1123e2b..e1a7ed7 100644
--- a/drivers/net/wireless/ralink/rt2x00/rt2800mmio.c
+++ b/drivers/net/wireless/ralink/rt2x00/rt2800mmio.c
@@ -600,6 +600,7 @@
case QID_AC_VI:
case QID_AC_BE:
case QID_AC_BK:
+ WARN_ON_ONCE(rt2x00queue_empty(queue));
entry = rt2x00queue_get_entry(queue, Q_INDEX);
rt2x00mmio_register_write(rt2x00dev, TX_CTX_IDX(queue->qid),
entry->entry_idx);
diff --git a/drivers/net/wireless/ralink/rt2x00/rt2800pci.c b/drivers/net/wireless/ralink/rt2x00/rt2800pci.c
index 1172eef..71b1aff 100644
--- a/drivers/net/wireless/ralink/rt2x00/rt2800pci.c
+++ b/drivers/net/wireless/ralink/rt2x00/rt2800pci.c
@@ -311,8 +311,8 @@
.get_stats = rt2x00mac_get_stats,
.get_key_seq = rt2800_get_key_seq,
.set_rts_threshold = rt2800_set_rts_threshold,
- .sta_add = rt2x00mac_sta_add,
- .sta_remove = rt2x00mac_sta_remove,
+ .sta_add = rt2800_sta_add,
+ .sta_remove = rt2800_sta_remove,
.bss_info_changed = rt2x00mac_bss_info_changed,
.conf_tx = rt2800_conf_tx,
.get_tsf = rt2800_get_tsf,
@@ -377,8 +377,6 @@
.config_erp = rt2800_config_erp,
.config_ant = rt2800_config_ant,
.config = rt2800_config,
- .sta_add = rt2800_sta_add,
- .sta_remove = rt2800_sta_remove,
};
static const struct rt2x00_ops rt2800pci_ops = {
diff --git a/drivers/net/wireless/ralink/rt2x00/rt2800soc.c b/drivers/net/wireless/ralink/rt2x00/rt2800soc.c
index 6848ebc..a502816 100644
--- a/drivers/net/wireless/ralink/rt2x00/rt2800soc.c
+++ b/drivers/net/wireless/ralink/rt2x00/rt2800soc.c
@@ -150,8 +150,8 @@
.get_stats = rt2x00mac_get_stats,
.get_key_seq = rt2800_get_key_seq,
.set_rts_threshold = rt2800_set_rts_threshold,
- .sta_add = rt2x00mac_sta_add,
- .sta_remove = rt2x00mac_sta_remove,
+ .sta_add = rt2800_sta_add,
+ .sta_remove = rt2800_sta_remove,
.bss_info_changed = rt2x00mac_bss_info_changed,
.conf_tx = rt2800_conf_tx,
.get_tsf = rt2800_get_tsf,
@@ -216,8 +216,6 @@
.config_erp = rt2800_config_erp,
.config_ant = rt2800_config_ant,
.config = rt2800_config,
- .sta_add = rt2800_sta_add,
- .sta_remove = rt2800_sta_remove,
};
static const struct rt2x00_ops rt2800soc_ops = {
diff --git a/drivers/net/wireless/ralink/rt2x00/rt2800usb.c b/drivers/net/wireless/ralink/rt2x00/rt2800usb.c
index d901a41..98a7313 100644
--- a/drivers/net/wireless/ralink/rt2x00/rt2800usb.c
+++ b/drivers/net/wireless/ralink/rt2x00/rt2800usb.c
@@ -797,8 +797,8 @@
.get_stats = rt2x00mac_get_stats,
.get_key_seq = rt2800_get_key_seq,
.set_rts_threshold = rt2800_set_rts_threshold,
- .sta_add = rt2x00mac_sta_add,
- .sta_remove = rt2x00mac_sta_remove,
+ .sta_add = rt2800_sta_add,
+ .sta_remove = rt2800_sta_remove,
.bss_info_changed = rt2x00mac_bss_info_changed,
.conf_tx = rt2800_conf_tx,
.get_tsf = rt2800_get_tsf,
@@ -858,8 +858,6 @@
.config_erp = rt2800_config_erp,
.config_ant = rt2800_config_ant,
.config = rt2800_config,
- .sta_add = rt2800_sta_add,
- .sta_remove = rt2800_sta_remove,
};
static void rt2800usb_queue_init(struct data_queue *queue)
diff --git a/drivers/net/wireless/ralink/rt2x00/rt2x00.h b/drivers/net/wireless/ralink/rt2x00/rt2x00.h
index 1f38c33..a279a43 100644
--- a/drivers/net/wireless/ralink/rt2x00/rt2x00.h
+++ b/drivers/net/wireless/ralink/rt2x00/rt2x00.h
@@ -1457,10 +1457,6 @@
#else
#define rt2x00mac_set_key NULL
#endif /* CONFIG_RT2X00_LIB_CRYPTO */
-int rt2x00mac_sta_add(struct ieee80211_hw *hw, struct ieee80211_vif *vif,
- struct ieee80211_sta *sta);
-int rt2x00mac_sta_remove(struct ieee80211_hw *hw, struct ieee80211_vif *vif,
- struct ieee80211_sta *sta);
void rt2x00mac_sw_scan_start(struct ieee80211_hw *hw,
struct ieee80211_vif *vif,
const u8 *mac_addr);
diff --git a/drivers/net/wireless/ralink/rt2x00/rt2x00mac.c b/drivers/net/wireless/ralink/rt2x00/rt2x00mac.c
index a971bc7..c380c1f 100644
--- a/drivers/net/wireless/ralink/rt2x00/rt2x00mac.c
+++ b/drivers/net/wireless/ralink/rt2x00/rt2x00mac.c
@@ -739,8 +739,7 @@
return;
tx_queue_for_each(rt2x00dev, queue)
- if (!rt2x00queue_empty(queue))
- rt2x00queue_flush_queue(queue, drop);
+ rt2x00queue_flush_queue(queue, drop);
}
EXPORT_SYMBOL_GPL(rt2x00mac_flush);
diff --git a/drivers/net/wireless/ralink/rt2x00/rt2x00queue.c b/drivers/net/wireless/ralink/rt2x00/rt2x00queue.c
index a6884e7..7c1f8f5 100644
--- a/drivers/net/wireless/ralink/rt2x00/rt2x00queue.c
+++ b/drivers/net/wireless/ralink/rt2x00/rt2x00queue.c
@@ -1000,6 +1000,8 @@
(queue->qid == QID_AC_BE) ||
(queue->qid == QID_AC_BK);
+ if (rt2x00queue_empty(queue))
+ return;
/*
* If we are not supposed to drop any pending
diff --git a/drivers/net/wireless/realtek/rtlwifi/btcoexist/halbtc8192e2ant.c b/drivers/net/wireless/realtek/rtlwifi/btcoexist/halbtc8192e2ant.c
index 8fce371..f22fec0 100644
--- a/drivers/net/wireless/realtek/rtlwifi/btcoexist/halbtc8192e2ant.c
+++ b/drivers/net/wireless/realtek/rtlwifi/btcoexist/halbtc8192e2ant.c
@@ -1783,7 +1783,7 @@
bool scan = false, link = false, roam = false;
RT_TRACE(rtlpriv, COMP_BT_COEXIST, DBG_LOUD,
- "[BTCoex], PsTdma type dismatch!!!, ");
+ "[BTCoex], PsTdma type mismatch!!!, ");
RT_TRACE(rtlpriv, COMP_BT_COEXIST, DBG_LOUD,
"curPsTdma=%d, recordPsTdma=%d\n",
coex_dm->cur_ps_tdma, coex_dm->tdma_adj_type);
diff --git a/drivers/net/wireless/realtek/rtlwifi/btcoexist/halbtc8723b2ant.c b/drivers/net/wireless/realtek/rtlwifi/btcoexist/halbtc8723b2ant.c
index 73ec319..279fe01 100644
--- a/drivers/net/wireless/realtek/rtlwifi/btcoexist/halbtc8723b2ant.c
+++ b/drivers/net/wireless/realtek/rtlwifi/btcoexist/halbtc8723b2ant.c
@@ -2766,7 +2766,7 @@
if (coex_dm->cur_ps_tdma != coex_dm->ps_tdma_du_adj_type) {
bool scan = false, link = false, roam = false;
RT_TRACE(rtlpriv, COMP_BT_COEXIST, DBG_LOUD,
- "[BTCoex], PsTdma type dismatch!!!, curPsTdma=%d, recordPsTdma=%d\n",
+ "[BTCoex], PsTdma type mismatch!!!, curPsTdma=%d, recordPsTdma=%d\n",
coex_dm->cur_ps_tdma, coex_dm->ps_tdma_du_adj_type);
btcoexist->btc_get(btcoexist, BTC_GET_BL_WIFI_SCAN, &scan);
diff --git a/drivers/net/wireless/realtek/rtlwifi/btcoexist/halbtc8821a1ant.c b/drivers/net/wireless/realtek/rtlwifi/btcoexist/halbtc8821a1ant.c
index 202597c..b5d6587 100644
--- a/drivers/net/wireless/realtek/rtlwifi/btcoexist/halbtc8821a1ant.c
+++ b/drivers/net/wireless/realtek/rtlwifi/btcoexist/halbtc8821a1ant.c
@@ -1584,11 +1584,7 @@
/* tdma and coex table */
btc8821a1ant_ps_tdma(btcoexist, NORMAL_EXEC, true, 5);
- if (BT_8821A_1ANT_WIFI_STATUS_NON_CONNECTED_ASSO_AUTH_SCAN ==
- wifi_status)
- btc8821a1ant_coex_table_with_type(btcoexist, NORMAL_EXEC, 1);
- else
- btc8821a1ant_coex_table_with_type(btcoexist, NORMAL_EXEC, 1);
+ btc8821a1ant_coex_table_with_type(btcoexist, NORMAL_EXEC, 1);
}
static void btc8821a1ant_act_wifi_con_bt_acl_busy(struct btc_coexist *btcoexist,
@@ -1991,16 +1987,9 @@
wifi_rssi_state =
btc8821a1ant_wifi_rssi_state(btcoexist, 1, 2,
30, 0);
- if ((wifi_rssi_state == BTC_RSSI_STATE_HIGH) ||
- (wifi_rssi_state == BTC_RSSI_STATE_STAY_HIGH)) {
- btc8821a1ant_limited_tx(btcoexist,
- NORMAL_EXEC, 1, 1,
- 0, 1);
- } else {
- btc8821a1ant_limited_tx(btcoexist,
- NORMAL_EXEC, 1, 1,
- 0, 1);
- }
+ btc8821a1ant_limited_tx(btcoexist,
+ NORMAL_EXEC, 1, 1,
+ 0, 1);
} else {
btc8821a1ant_limited_tx(btcoexist, NORMAL_EXEC,
0, 0, 0, 0);
diff --git a/drivers/net/wireless/realtek/rtlwifi/btcoexist/halbtc8821a2ant.c b/drivers/net/wireless/realtek/rtlwifi/btcoexist/halbtc8821a2ant.c
index 2202d5e..01a9d30 100644
--- a/drivers/net/wireless/realtek/rtlwifi/btcoexist/halbtc8821a2ant.c
+++ b/drivers/net/wireless/realtek/rtlwifi/btcoexist/halbtc8821a2ant.c
@@ -2614,7 +2614,7 @@
bool scan = false, link = false, roam = false;
RT_TRACE(rtlpriv, COMP_BT_COEXIST, DBG_LOUD,
- "[BTCoex], PsTdma type dismatch!!!, cur_ps_tdma = %d, recordPsTdma = %d\n",
+ "[BTCoex], PsTdma type mismatch!!!, cur_ps_tdma = %d, recordPsTdma = %d\n",
coex_dm->cur_ps_tdma, coex_dm->ps_tdma_du_adj_type);
btcoexist->btc_get(btcoexist, BTC_GET_BL_WIFI_SCAN, &scan);
diff --git a/drivers/net/wireless/realtek/rtlwifi/rtl8192ee/hw.c b/drivers/net/wireless/realtek/rtlwifi/rtl8192ee/hw.c
index fd7928f..3f2295f 100644
--- a/drivers/net/wireless/realtek/rtlwifi/rtl8192ee/hw.c
+++ b/drivers/net/wireless/realtek/rtlwifi/rtl8192ee/hw.c
@@ -2574,11 +2574,11 @@
rtlpriv->btcoexist.btc_info.btcoexist = 0;
rtlpriv->btcoexist.btc_info.bt_type = BT_RTL8192E;
- rtlpriv->btcoexist.btc_info.ant_num = ANT_TOTAL_X2;
+ rtlpriv->btcoexist.btc_info.ant_num = ANT_X2;
} else {
rtlpriv->btcoexist.btc_info.btcoexist = 1;
rtlpriv->btcoexist.btc_info.bt_type = BT_RTL8192E;
- rtlpriv->btcoexist.btc_info.ant_num = ANT_TOTAL_X1;
+ rtlpriv->btcoexist.btc_info.ant_num = ANT_X1;
}
}
diff --git a/drivers/net/wireless/realtek/rtlwifi/wifi.h b/drivers/net/wireless/realtek/rtlwifi/wifi.h
index ce17540..208010f 100644
--- a/drivers/net/wireless/realtek/rtlwifi/wifi.h
+++ b/drivers/net/wireless/realtek/rtlwifi/wifi.h
@@ -2842,11 +2842,6 @@
BT_RTL8812A = 11,
};
-enum bt_total_ant_num {
- ANT_TOTAL_X2 = 0,
- ANT_TOTAL_X1 = 1
-};
-
enum bt_cur_state {
BT_OFF = 0,
BT_ON = 1,
diff --git a/drivers/net/wireless/rsi/rsi_91x_coex.c b/drivers/net/wireless/rsi/rsi_91x_coex.c
index d055099..c8ba148 100644
--- a/drivers/net/wireless/rsi/rsi_91x_coex.c
+++ b/drivers/net/wireless/rsi/rsi_91x_coex.c
@@ -73,6 +73,7 @@
switch (msg_type) {
case COMMON_CARD_READY_IND:
rsi_dbg(INFO_ZONE, "common card ready received\n");
+ common->hibernate_resume = false;
rsi_handle_card_ready(common, msg);
break;
case SLEEP_NOTIFY_IND:
diff --git a/drivers/net/wireless/rsi/rsi_91x_core.c b/drivers/net/wireless/rsi/rsi_91x_core.c
index 5dafd2e..3644d7d 100644
--- a/drivers/net/wireless/rsi/rsi_91x_core.c
+++ b/drivers/net/wireless/rsi/rsi_91x_core.c
@@ -411,11 +411,30 @@
if ((ieee80211_is_mgmt(wh->frame_control)) ||
(ieee80211_is_ctl(wh->frame_control)) ||
(ieee80211_is_qos_nullfunc(wh->frame_control))) {
+ if (ieee80211_is_assoc_req(wh->frame_control) ||
+ ieee80211_is_reassoc_req(wh->frame_control)) {
+ struct ieee80211_bss_conf *bss = &vif->bss_conf;
+
+ common->eapol4_confirm = false;
+ rsi_hal_send_sta_notify_frame(common,
+ RSI_IFTYPE_STATION,
+ STA_CONNECTED, bss->bssid,
+ bss->qos, bss->aid, 0,
+ vif);
+ }
+
q_num = MGMT_SOFT_Q;
skb->priority = q_num;
+
+ if (rsi_prepare_mgmt_desc(common, skb)) {
+ rsi_dbg(ERR_ZONE, "Failed to prepare desc\n");
+ goto xmit_fail;
+ }
} else {
if (ieee80211_is_data_qos(wh->frame_control)) {
- tid = (skb->data[24] & IEEE80211_QOS_TID);
+ u8 *qos = ieee80211_get_qos_ctl(wh);
+
+ tid = *qos & IEEE80211_QOS_CTL_TID_MASK;
skb->priority = TID_TO_WME_AC(tid);
} else {
tid = IEEE80211_NONQOS_TID;
@@ -433,6 +452,8 @@
if (!rsta)
goto xmit_fail;
tx_params->sta_id = rsta->sta_id;
+ } else {
+ tx_params->sta_id = 0;
}
if (rsta) {
@@ -443,6 +464,14 @@
tid, 0);
}
}
+ if (skb->protocol == cpu_to_be16(ETH_P_PAE)) {
+ q_num = MGMT_SOFT_Q;
+ skb->priority = q_num;
+ }
+ if (rsi_prepare_data_desc(common, skb)) {
+ rsi_dbg(ERR_ZONE, "Failed to prepare data desc\n");
+ goto xmit_fail;
+ }
}
if ((q_num < MGMT_SOFT_Q) &&
@@ -456,7 +485,7 @@
}
rsi_core_queue_pkt(common, skb);
- rsi_dbg(DATA_TX_ZONE, "%s: ===> Scheduling TX thead <===\n", __func__);
+ rsi_dbg(DATA_TX_ZONE, "%s: ===> Scheduling TX thread <===\n", __func__);
rsi_set_event(&common->tx_thread.event);
return;
diff --git a/drivers/net/wireless/rsi/rsi_91x_hal.c b/drivers/net/wireless/rsi/rsi_91x_hal.c
index de608ae..0761e61 100644
--- a/drivers/net/wireless/rsi/rsi_91x_hal.c
+++ b/drivers/net/wireless/rsi/rsi_91x_hal.c
@@ -45,7 +45,7 @@
return status;
}
-static int rsi_prepare_mgmt_desc(struct rsi_common *common, struct sk_buff *skb)
+int rsi_prepare_mgmt_desc(struct rsi_common *common, struct sk_buff *skb)
{
struct rsi_hw *adapter = common->priv;
struct ieee80211_hdr *wh = NULL;
@@ -55,7 +55,7 @@
struct rsi_mgmt_desc *mgmt_desc;
struct skb_info *tx_params;
struct ieee80211_bss_conf *bss = NULL;
- struct xtended_desc *xtend_desc = NULL;
+ struct rsi_xtended_desc *xtend_desc = NULL;
u8 header_size;
u32 dword_align_bytes = 0;
@@ -69,7 +69,7 @@
vif = tx_params->vif;
/* Update header size */
- header_size = FRAME_DESC_SZ + sizeof(struct xtended_desc);
+ header_size = FRAME_DESC_SZ + sizeof(struct rsi_xtended_desc);
if (header_size > skb_headroom(skb)) {
rsi_dbg(ERR_ZONE,
"%s: Failed to add extended descriptor\n",
@@ -92,7 +92,7 @@
wh = (struct ieee80211_hdr *)&skb->data[header_size];
mgmt_desc = (struct rsi_mgmt_desc *)skb->data;
- xtend_desc = (struct xtended_desc *)&skb->data[FRAME_DESC_SZ];
+ xtend_desc = (struct rsi_xtended_desc *)&skb->data[FRAME_DESC_SZ];
rsi_set_len_qno(&mgmt_desc->len_qno, (skb->len - FRAME_DESC_SZ),
RSI_WIFI_MGMT_Q);
@@ -113,17 +113,6 @@
if (conf_is_ht40(conf))
mgmt_desc->bbp_info = cpu_to_le16(FULL40M_ENABLE);
- if (ieee80211_is_probe_req(wh->frame_control)) {
- if (!bss->assoc) {
- rsi_dbg(INFO_ZONE,
- "%s: blocking mgmt queue\n", __func__);
- mgmt_desc->misc_flags = RSI_DESC_REQUIRE_CFM_TO_HOST;
- xtend_desc->confirm_frame_type = PROBEREQ_CONFIRM;
- common->mgmt_q_block = true;
- rsi_dbg(INFO_ZONE, "Mgmt queue blocked\n");
- }
- }
-
if (ieee80211_is_probe_resp(wh->frame_control)) {
mgmt_desc->misc_flags |= (RSI_ADD_DELTA_TSF_VAP_ID |
RSI_FETCH_RETRY_CNT_FRM_HST);
@@ -149,7 +138,7 @@
}
/* This function prepares descriptor for given data packet */
-static int rsi_prepare_data_desc(struct rsi_common *common, struct sk_buff *skb)
+int rsi_prepare_data_desc(struct rsi_common *common, struct sk_buff *skb)
{
struct rsi_hw *adapter = common->priv;
struct ieee80211_vif *vif;
@@ -158,7 +147,7 @@
struct skb_info *tx_params;
struct ieee80211_bss_conf *bss;
struct rsi_data_desc *data_desc;
- struct xtended_desc *xtend_desc;
+ struct rsi_xtended_desc *xtend_desc;
u8 ieee80211_size = MIN_802_11_HDR_LEN;
u8 header_size;
u8 vap_id = 0;
@@ -170,7 +159,7 @@
bss = &vif->bss_conf;
tx_params = (struct skb_info *)info->driver_data;
- header_size = FRAME_DESC_SZ + sizeof(struct xtended_desc);
+ header_size = FRAME_DESC_SZ + sizeof(struct rsi_xtended_desc);
if (header_size > skb_headroom(skb)) {
rsi_dbg(ERR_ZONE, "%s: Unable to send pkt\n", __func__);
return -ENOSPC;
@@ -188,7 +177,7 @@
data_desc = (struct rsi_data_desc *)skb->data;
memset(data_desc, 0, header_size);
- xtend_desc = (struct xtended_desc *)&skb->data[FRAME_DESC_SZ];
+ xtend_desc = (struct rsi_xtended_desc *)&skb->data[FRAME_DESC_SZ];
wh = (struct ieee80211_hdr *)&skb->data[header_size];
seq_num = IEEE80211_SEQ_TO_SN(le16_to_cpu(wh->seq_ctrl));
@@ -243,6 +232,18 @@
data_desc->misc_flags |= RSI_FETCH_RETRY_CNT_FRM_HST;
#define EAPOL_RETRY_CNT 15
xtend_desc->retry_cnt = EAPOL_RETRY_CNT;
+
+ if (common->eapol4_confirm)
+ skb->priority = VO_Q;
+ else
+ rsi_set_len_qno(&data_desc->len_qno,
+ (skb->len - FRAME_DESC_SZ),
+ RSI_WIFI_MGMT_Q);
+ if ((skb->len - header_size) == EAPOL4_PACKET_LEN) {
+ data_desc->misc_flags |=
+ RSI_DESC_REQUIRE_CFM_TO_HOST;
+ xtend_desc->confirm_frame_type = EAPOL4_CONFIRM;
+ }
}
data_desc->mac_flags = cpu_to_le16(seq_num & 0xfff);
@@ -282,8 +283,11 @@
struct rsi_hw *adapter = common->priv;
struct ieee80211_vif *vif;
struct ieee80211_tx_info *info;
+ struct skb_info *tx_params;
struct ieee80211_bss_conf *bss;
+ struct ieee80211_hdr *wh;
int status = -EINVAL;
+ u8 header_size;
if (!skb)
return 0;
@@ -295,16 +299,15 @@
goto err;
vif = info->control.vif;
bss = &vif->bss_conf;
+ tx_params = (struct skb_info *)info->driver_data;
+ header_size = tx_params->internal_hdr_size;
+ wh = (struct ieee80211_hdr *)&skb->data[header_size];
if (((vif->type == NL80211_IFTYPE_STATION) ||
(vif->type == NL80211_IFTYPE_P2P_CLIENT)) &&
(!bss->assoc))
goto err;
- status = rsi_prepare_data_desc(common, skb);
- if (status)
- goto err;
-
status = rsi_send_pkt_to_bus(common, skb);
if (status)
rsi_dbg(ERR_ZONE, "%s: Failed to write pkt\n", __func__);
@@ -327,12 +330,18 @@
struct sk_buff *skb)
{
struct rsi_hw *adapter = common->priv;
+ struct ieee80211_bss_conf *bss;
+ struct ieee80211_hdr *wh;
struct ieee80211_tx_info *info;
struct skb_info *tx_params;
+ struct rsi_mgmt_desc *mgmt_desc;
+ struct rsi_xtended_desc *xtend_desc;
int status = -E2BIG;
+ u8 header_size;
info = IEEE80211_SKB_CB(skb);
tx_params = (struct skb_info *)info->driver_data;
+ header_size = tx_params->internal_hdr_size;
if (tx_params->flags & INTERNAL_MGMT_PKT) {
status = adapter->host_intf_ops->write_pkt(common->priv,
@@ -346,15 +355,25 @@
return status;
}
- if (FRAME_DESC_SZ > skb_headroom(skb))
- goto err;
+ bss = &info->control.vif->bss_conf;
+ wh = (struct ieee80211_hdr *)&skb->data[header_size];
+ mgmt_desc = (struct rsi_mgmt_desc *)skb->data;
+ xtend_desc = (struct rsi_xtended_desc *)&skb->data[FRAME_DESC_SZ];
- rsi_prepare_mgmt_desc(common, skb);
+ /* Indicate to firmware to give cfm for probe */
+ if (ieee80211_is_probe_req(wh->frame_control) && !bss->assoc) {
+ rsi_dbg(INFO_ZONE,
+ "%s: blocking mgmt queue\n", __func__);
+ mgmt_desc->misc_flags = RSI_DESC_REQUIRE_CFM_TO_HOST;
+ xtend_desc->confirm_frame_type = PROBEREQ_CONFIRM;
+ common->mgmt_q_block = true;
+ rsi_dbg(INFO_ZONE, "Mgmt queue blocked\n");
+ }
+
status = rsi_send_pkt_to_bus(common, skb);
if (status)
rsi_dbg(ERR_ZONE, "%s: Failed to write the packet\n", __func__);
-err:
rsi_indicate_tx_status(common->priv, skb, status);
return status;
}
@@ -616,28 +635,32 @@
u32 content_size)
{
struct rsi_host_intf_ops *hif_ops = adapter->host_intf_ops;
- struct bl_header bl_hdr;
+ struct bl_header *bl_hdr;
u32 write_addr, write_len;
int status;
- bl_hdr.flags = 0;
- bl_hdr.image_no = cpu_to_le32(adapter->priv->coex_mode);
- bl_hdr.check_sum = cpu_to_le32(
- *(u32 *)&flash_content[CHECK_SUM_OFFSET]);
- bl_hdr.flash_start_address = cpu_to_le32(
- *(u32 *)&flash_content[ADDR_OFFSET]);
- bl_hdr.flash_len = cpu_to_le32(*(u32 *)&flash_content[LEN_OFFSET]);
+ bl_hdr = kzalloc(sizeof(*bl_hdr), GFP_KERNEL);
+ if (!bl_hdr)
+ return -ENOMEM;
+
+ bl_hdr->flags = 0;
+ bl_hdr->image_no = cpu_to_le32(adapter->priv->coex_mode);
+ bl_hdr->check_sum =
+ cpu_to_le32(*(u32 *)&flash_content[CHECK_SUM_OFFSET]);
+ bl_hdr->flash_start_address =
+ cpu_to_le32(*(u32 *)&flash_content[ADDR_OFFSET]);
+ bl_hdr->flash_len = cpu_to_le32(*(u32 *)&flash_content[LEN_OFFSET]);
write_len = sizeof(struct bl_header);
if (adapter->rsi_host_intf == RSI_HOST_INTF_USB) {
write_addr = PING_BUFFER_ADDRESS;
status = hif_ops->write_reg_multiple(adapter, write_addr,
- (u8 *)&bl_hdr, write_len);
+ (u8 *)bl_hdr, write_len);
if (status < 0) {
rsi_dbg(ERR_ZONE,
"%s: Failed to load Version/CRC structure\n",
__func__);
- return status;
+ goto fail;
}
} else {
write_addr = PING_BUFFER_ADDRESS >> 16;
@@ -646,20 +669,23 @@
rsi_dbg(ERR_ZONE,
"%s: Unable to set ms word to common reg\n",
__func__);
- return status;
+ goto fail;
}
write_addr = RSI_SD_REQUEST_MASTER |
(PING_BUFFER_ADDRESS & 0xFFFF);
status = hif_ops->write_reg_multiple(adapter, write_addr,
- (u8 *)&bl_hdr, write_len);
+ (u8 *)bl_hdr, write_len);
if (status < 0) {
rsi_dbg(ERR_ZONE,
"%s: Failed to load Version/CRC structure\n",
__func__);
- return status;
+ goto fail;
}
}
- return 0;
+ status = 0;
+fail:
+ kfree(bl_hdr);
+ return status;
}
static u32 read_flash_capacity(struct rsi_hw *adapter)
diff --git a/drivers/net/wireless/rsi/rsi_91x_mac80211.c b/drivers/net/wireless/rsi/rsi_91x_mac80211.c
index 32f5cb4..3faa044 100644
--- a/drivers/net/wireless/rsi/rsi_91x_mac80211.c
+++ b/drivers/net/wireless/rsi/rsi_91x_mac80211.c
@@ -614,7 +614,7 @@
/* Power save parameters */
if (changed & IEEE80211_CONF_CHANGE_PS) {
- struct ieee80211_vif *vif;
+ struct ieee80211_vif *vif, *sta_vif = NULL;
unsigned long flags;
int i, set_ps = 1;
@@ -628,13 +628,17 @@
set_ps = 0;
break;
}
+ if ((vif->type == NL80211_IFTYPE_STATION ||
+ vif->type == NL80211_IFTYPE_P2P_CLIENT) &&
+ (!sta_vif || vif->bss_conf.assoc))
+ sta_vif = vif;
}
- if (set_ps) {
+ if (set_ps && sta_vif) {
spin_lock_irqsave(&adapter->ps_lock, flags);
if (conf->flags & IEEE80211_CONF_PS)
- rsi_enable_ps(adapter, vif);
+ rsi_enable_ps(adapter, sta_vif);
else
- rsi_disable_ps(adapter, vif);
+ rsi_disable_ps(adapter, sta_vif);
spin_unlock_irqrestore(&adapter->ps_lock, flags);
}
}
@@ -737,7 +741,8 @@
bss_conf->bssid,
bss_conf->qos,
bss_conf->aid,
- NULL, 0, vif);
+ NULL, 0,
+ bss_conf->assoc_capability, vif);
adapter->ps_info.dtim_interval_duration = bss->dtim_period;
adapter->ps_info.listen_interval = conf->listen_interval;
@@ -906,14 +911,25 @@
}
}
- return rsi_hal_load_key(adapter->priv,
- key->key,
- key->keylen,
- key_type,
- key->keyidx,
- key->cipher,
- sta_id,
- vif);
+ status = rsi_hal_load_key(adapter->priv,
+ key->key,
+ key->keylen,
+ key_type,
+ key->keyidx,
+ key->cipher,
+ sta_id,
+ vif);
+ if (status)
+ return status;
+
+ if (vif->type == NL80211_IFTYPE_STATION && key->key &&
+ (key->cipher == WLAN_CIPHER_SUITE_WEP104 ||
+ key->cipher == WLAN_CIPHER_SUITE_WEP40)) {
+ if (!rsi_send_block_unblock_frame(adapter->priv, false))
+ adapter->priv->hw_data_qs_blocked = false;
+ }
+
+ return 0;
}
/**
@@ -1391,7 +1407,7 @@
rsi_dbg(INFO_ZONE, "Indicate bss status to device\n");
rsi_inform_bss_status(common, RSI_OPMODE_AP, 1,
sta->addr, sta->wme, sta->aid,
- sta, sta_idx, vif);
+ sta, sta_idx, 0, vif);
if (common->key) {
struct ieee80211_key_conf *key = common->key;
@@ -1469,7 +1485,7 @@
rsi_inform_bss_status(common, RSI_OPMODE_AP, 0,
sta->addr, sta->wme,
sta->aid, sta, sta_idx,
- vif);
+ 0, vif);
rsta->sta = NULL;
rsta->sta_id = -1;
for (cnt = 0; cnt < IEEE80211_NUM_TIDS; cnt++)
@@ -1788,15 +1804,21 @@
struct rsi_common *common = adapter->priv;
u16 triggers = 0;
u16 rx_filter_word = 0;
- struct ieee80211_bss_conf *bss = &adapter->vifs[0]->bss_conf;
+ struct ieee80211_bss_conf *bss = NULL;
rsi_dbg(INFO_ZONE, "Config WoWLAN to device\n");
+ if (!adapter->vifs[0])
+ return -EINVAL;
+
+ bss = &adapter->vifs[0]->bss_conf;
+
if (WARN_ON(!wowlan)) {
rsi_dbg(ERR_ZONE, "WoW triggers not enabled\n");
return -EINVAL;
}
+ common->wow_flags |= RSI_WOW_ENABLED;
triggers = rsi_wow_map_triggers(common, wowlan);
if (!triggers) {
rsi_dbg(ERR_ZONE, "%s:No valid WoW triggers\n", __func__);
@@ -1819,7 +1841,6 @@
rx_filter_word = (ALLOW_DATA_ASSOC_PEER | DISALLOW_BEACONS);
rsi_send_rx_filter_frame(common, rx_filter_word);
- common->wow_flags |= RSI_WOW_ENABLED;
return 0;
}
@@ -1939,9 +1960,8 @@
hw->uapsd_queues = RSI_IEEE80211_UAPSD_QUEUES;
hw->uapsd_max_sp_len = IEEE80211_WMM_IE_STA_QOSINFO_SP_ALL;
- hw->max_tx_aggregation_subframes = 6;
- rsi_register_rates_channels(adapter, NL80211_BAND_2GHZ);
- rsi_register_rates_channels(adapter, NL80211_BAND_5GHZ);
+ hw->max_tx_aggregation_subframes = RSI_MAX_TX_AGGR_FRMS;
+ hw->max_rx_aggregation_subframes = RSI_MAX_RX_AGGR_FRMS;
hw->rate_control_algorithm = "AARF";
SET_IEEE80211_PERM_ADDR(hw, common->mac_addr);
@@ -1962,10 +1982,15 @@
wiphy->available_antennas_rx = 1;
wiphy->available_antennas_tx = 1;
+
+ rsi_register_rates_channels(adapter, NL80211_BAND_2GHZ);
wiphy->bands[NL80211_BAND_2GHZ] =
&adapter->sbands[NL80211_BAND_2GHZ];
- wiphy->bands[NL80211_BAND_5GHZ] =
- &adapter->sbands[NL80211_BAND_5GHZ];
+ if (common->num_supp_bands > 1) {
+ rsi_register_rates_channels(adapter, NL80211_BAND_5GHZ);
+ wiphy->bands[NL80211_BAND_5GHZ] =
+ &adapter->sbands[NL80211_BAND_5GHZ];
+ }
/* AP Parameters */
wiphy->max_ap_assoc_sta = rsi_max_ap_stas[common->oper_mode - 1];
@@ -1991,6 +2016,9 @@
wiphy->iface_combinations = rsi_iface_combinations;
wiphy->n_iface_combinations = ARRAY_SIZE(rsi_iface_combinations);
+ if (common->coex_mode > 1)
+ wiphy->flags |= WIPHY_FLAG_PS_ON_BY_DEFAULT;
+
status = ieee80211_register_hw(hw);
if (status)
return status;
diff --git a/drivers/net/wireless/rsi/rsi_91x_mgmt.c b/drivers/net/wireless/rsi/rsi_91x_mgmt.c
index c21fca7..0757adc 100644
--- a/drivers/net/wireless/rsi/rsi_91x_mgmt.c
+++ b/drivers/net/wireless/rsi/rsi_91x_mgmt.c
@@ -325,8 +325,8 @@
radio_caps->channel_num = common->channel;
radio_caps->rf_model = RSI_RF_TYPE;
+ radio_caps->radio_cfg_info = RSI_LMAC_CLOCK_80MHZ;
if (common->channel_width == BW_40MHZ) {
- radio_caps->radio_cfg_info = RSI_LMAC_CLOCK_80MHZ;
radio_caps->radio_cfg_info |= RSI_ENABLE_40MHZ;
if (common->fsm_state == FSM_MAC_INIT_DONE) {
@@ -454,14 +454,10 @@
*
* Return: status: 0 on success, corresponding negative error code on failure.
*/
-static int rsi_hal_send_sta_notify_frame(struct rsi_common *common,
- enum opmode opmode,
- u8 notify_event,
- const unsigned char *bssid,
- u8 qos_enable,
- u16 aid,
- u16 sta_id,
- struct ieee80211_vif *vif)
+int rsi_hal_send_sta_notify_frame(struct rsi_common *common, enum opmode opmode,
+ u8 notify_event, const unsigned char *bssid,
+ u8 qos_enable, u16 aid, u16 sta_id,
+ struct ieee80211_vif *vif)
{
struct sk_buff *skb = NULL;
struct rsi_peer_notify *peer_notify;
@@ -1328,6 +1324,7 @@
u16 aid,
struct ieee80211_sta *sta,
u16 sta_id,
+ u16 assoc_cap,
struct ieee80211_vif *vif)
{
if (status) {
@@ -1342,10 +1339,10 @@
vif);
if (common->min_rate == 0xffff)
rsi_send_auto_rate_request(common, sta, sta_id, vif);
- if (opmode == RSI_OPMODE_STA) {
- if (!rsi_send_block_unblock_frame(common, false))
- common->hw_data_qs_blocked = false;
- }
+ if (opmode == RSI_OPMODE_STA &&
+ !(assoc_cap & WLAN_CAPABILITY_PRIVACY) &&
+ !rsi_send_block_unblock_frame(common, false))
+ common->hw_data_qs_blocked = false;
} else {
if (opmode == RSI_OPMODE_STA)
common->hw_data_qs_blocked = true;
@@ -1850,10 +1847,19 @@
__func__);
return rsi_handle_card_ready(common, msg);
case TX_STATUS_IND:
- if (msg[15] == PROBEREQ_CONFIRM) {
+ switch (msg[RSI_TX_STATUS_TYPE]) {
+ case PROBEREQ_CONFIRM:
common->mgmt_q_block = false;
rsi_dbg(FSM_ZONE, "%s: Probe confirm received\n",
__func__);
+ break;
+ case EAPOL4_CONFIRM:
+ if (msg[RSI_TX_STATUS]) {
+ common->eapol4_confirm = true;
+ if (!rsi_send_block_unblock_frame(common,
+ false))
+ common->hw_data_qs_blocked = false;
+ }
}
break;
case BEACON_EVENT_IND:
diff --git a/drivers/net/wireless/rsi/rsi_91x_sdio.c b/drivers/net/wireless/rsi/rsi_91x_sdio.c
index d76e69c..416981d 100644
--- a/drivers/net/wireless/rsi/rsi_91x_sdio.c
+++ b/drivers/net/wireless/rsi/rsi_91x_sdio.c
@@ -170,7 +170,6 @@
int err;
struct mmc_card *card = pfunction->card;
struct mmc_host *host = card->host;
- s32 bit = (fls(host->ocr_avail) - 1);
u8 cmd52_resp;
u32 clock, resp, i;
u16 rca;
@@ -190,7 +189,6 @@
msleep(20);
/* Initialize the SDIO card */
- host->ios.vdd = bit;
host->ios.chip_select = MMC_CS_DONTCARE;
host->ios.bus_mode = MMC_BUSMODE_OPENDRAIN;
host->ios.power_mode = MMC_POWER_UP;
@@ -660,8 +658,6 @@
if (!data)
return -ENOMEM;
- data = PTR_ALIGN(data, 8);
-
ms_addr = (addr >> 16);
status = rsi_sdio_master_access_msword(adapter, ms_addr);
if (status < 0) {
@@ -724,8 +720,6 @@
if (!data_aligned)
return -ENOMEM;
- data_aligned = PTR_ALIGN(data_aligned, 8);
-
if (size == 2) {
*data_aligned = ((data << 16) | (data & 0xFFFF));
} else if (size == 1) {
@@ -1042,17 +1036,21 @@
/*This function resets and re-initializes the chip.*/
static void rsi_reset_chip(struct rsi_hw *adapter)
{
- __le32 data;
+ u8 *data;
u8 sdio_interrupt_status = 0;
u8 request = 1;
int ret;
+ data = kzalloc(sizeof(u32), GFP_KERNEL);
+ if (!data)
+ return;
+
rsi_dbg(INFO_ZONE, "Writing disable to wakeup register\n");
ret = rsi_sdio_write_register(adapter, 0, SDIO_WAKEUP_REG, &request);
if (ret < 0) {
rsi_dbg(ERR_ZONE,
"%s: Failed to write SDIO wakeup register\n", __func__);
- return;
+ goto err;
}
msleep(20);
ret = rsi_sdio_read_register(adapter, RSI_FN1_INT_REGISTER,
@@ -1060,7 +1058,7 @@
if (ret < 0) {
rsi_dbg(ERR_ZONE, "%s: Failed to Read Intr Status Register\n",
__func__);
- return;
+ goto err;
}
rsi_dbg(INFO_ZONE, "%s: Intr Status Register value = %d\n",
__func__, sdio_interrupt_status);
@@ -1070,17 +1068,17 @@
rsi_dbg(ERR_ZONE,
"%s: Unable to set ms word to common reg\n",
__func__);
- return;
+ goto err;
}
- data = TA_HOLD_THREAD_VALUE;
+ put_unaligned_le32(TA_HOLD_THREAD_VALUE, data);
if (rsi_sdio_write_register_multiple(adapter, TA_HOLD_THREAD_REG |
RSI_SD_REQUEST_MASTER,
- (u8 *)&data, 4)) {
+ data, 4)) {
rsi_dbg(ERR_ZONE,
"%s: Unable to hold Thread-Arch processor threads\n",
__func__);
- return;
+ goto err;
}
/* This msleep will ensure Thread-Arch processor to go to hold
@@ -1101,6 +1099,9 @@
* read write operations to complete for chip reset.
*/
msleep(500);
+err:
+ kfree(data);
+ return;
}
/**
diff --git a/drivers/net/wireless/rsi/rsi_91x_usb.c b/drivers/net/wireless/rsi/rsi_91x_usb.c
index 7b8bae3..6ce6b75 100644
--- a/drivers/net/wireless/rsi/rsi_91x_usb.c
+++ b/drivers/net/wireless/rsi/rsi_91x_usb.c
@@ -687,6 +687,14 @@
*/
msleep(100);
+ ret = rsi_usb_master_reg_write(adapter, SWBL_REGOUT,
+ RSI_FW_WDT_DISABLE_REQ,
+ RSI_COMMON_REG_SIZE);
+ if (ret < 0) {
+ rsi_dbg(ERR_ZONE, "Disabling firmware watchdog timer failed\n");
+ goto fail;
+ }
+
ret = usb_ulp_read_write(adapter, RSI_WATCH_DOG_TIMER_1,
RSI_ULP_WRITE_2, 32);
if (ret < 0)
diff --git a/drivers/net/wireless/rsi/rsi_boot_params.h b/drivers/net/wireless/rsi/rsi_boot_params.h
index 238ee96..ad903b22 100644
--- a/drivers/net/wireless/rsi/rsi_boot_params.h
+++ b/drivers/net/wireless/rsi/rsi_boot_params.h
@@ -46,7 +46,8 @@
(((TA_PLL_M_VAL_20 + 1) * 40) / \
((TA_PLL_N_VAL_20 + 1) * (TA_PLL_P_VAL_20 + 1)))
#define VALID_20 \
- (WIFI_PLL960_CONFIGS | WIFI_AFEPLL_CONFIGS | WIFI_SWITCH_CLK_CONFIGS)
+ (WIFI_TAPLL_CONFIGS | WIFI_PLL960_CONFIGS | WIFI_AFEPLL_CONFIGS | \
+ WIFI_SWITCH_CLK_CONFIGS | BOOTUP_MODE_INFO | CRYSTAL_GOOD_TIME)
#define UMAC_CLK_40BW \
(((TA_PLL_M_VAL_40 + 1) * 40) / \
((TA_PLL_N_VAL_40 + 1) * (TA_PLL_P_VAL_40 + 1)))
diff --git a/drivers/net/wireless/rsi/rsi_hal.h b/drivers/net/wireless/rsi/rsi_hal.h
index 786dccd..327638c 100644
--- a/drivers/net/wireless/rsi/rsi_hal.h
+++ b/drivers/net/wireless/rsi/rsi_hal.h
@@ -115,6 +115,7 @@
#define FW_FLASH_OFFSET 0x820
#define LMAC_VER_OFFSET (FW_FLASH_OFFSET + 0x200)
#define MAX_DWORD_ALIGN_BYTES 64
+#define RSI_COMMON_REG_SIZE 2
struct bl_header {
__le32 flags;
@@ -167,6 +168,8 @@
} __packed;
int rsi_hal_device_init(struct rsi_hw *adapter);
+int rsi_prepare_mgmt_desc(struct rsi_common *common, struct sk_buff *skb);
+int rsi_prepare_data_desc(struct rsi_common *common, struct sk_buff *skb);
int rsi_prepare_beacon(struct rsi_common *common, struct sk_buff *skb);
int rsi_send_pkt_to_bus(struct rsi_common *common, struct sk_buff *skb);
int rsi_send_bt_pkt(struct rsi_common *common, struct sk_buff *skb);
diff --git a/drivers/net/wireless/rsi/rsi_main.h b/drivers/net/wireless/rsi/rsi_main.h
index ef4fa32..a084f22 100644
--- a/drivers/net/wireless/rsi/rsi_main.h
+++ b/drivers/net/wireless/rsi/rsi_main.h
@@ -190,12 +190,6 @@
u32 rssi_hyst;
};
-struct xtended_desc {
- u8 confirm_frame_type;
- u8 retry_cnt;
- u16 reserved;
-};
-
enum rsi_dfs_regions {
RSI_REGION_FCC = 0,
RSI_REGION_ETSI,
@@ -293,6 +287,7 @@
struct timer_list roc_timer;
struct ieee80211_vif *roc_vif;
+ bool eapol4_confirm;
void *bt_adapter;
};
diff --git a/drivers/net/wireless/rsi/rsi_mgmt.h b/drivers/net/wireless/rsi/rsi_mgmt.h
index cf6567a..1462093 100644
--- a/drivers/net/wireless/rsi/rsi_mgmt.h
+++ b/drivers/net/wireless/rsi/rsi_mgmt.h
@@ -33,6 +33,7 @@
#define WMM_SHORT_SLOT_TIME 9
#define SIFS_DURATION 16
+#define EAPOL4_PACKET_LEN 0x85
#define KEY_TYPE_CLEAR 0
#define RSI_PAIRWISE_KEY 1
#define RSI_GROUP_KEY 2
@@ -62,9 +63,12 @@
#define RX_DOT11_MGMT 0x02
#define TX_STATUS_IND 0x04
#define BEACON_EVENT_IND 0x08
+#define EAPOL4_CONFIRM 1
#define PROBEREQ_CONFIRM 2
#define CARD_READY_IND 0x00
#define SLEEP_NOTIFY_IND 0x06
+#define RSI_TX_STATUS_TYPE 15
+#define RSI_TX_STATUS 12
#define RSI_DELETE_PEER 0x0
#define RSI_ADD_PEER 0x1
@@ -221,6 +225,9 @@
#define RSI_WOW_DISCONNECT BIT(5)
#endif
+#define RSI_MAX_TX_AGGR_FRMS 8
+#define RSI_MAX_RX_AGGR_FRMS 8
+
enum opmode {
RSI_OPMODE_UNSUPPORTED = -1,
RSI_OPMODE_AP = 0,
@@ -301,6 +308,12 @@
#define ENCAP_MGMT_PKT BIT(7)
#define DESC_IMMEDIATE_WAKEUP BIT(15)
+struct rsi_xtended_desc {
+ u8 confirm_frame_type;
+ u8 retry_cnt;
+ u16 reserved;
+};
+
struct rsi_cmd_desc_dword0 {
__le16 len_qno;
u8 frame_type;
@@ -654,10 +667,14 @@
struct ieee80211_channel *channel);
int rsi_send_vap_dynamic_update(struct rsi_common *common);
int rsi_send_block_unblock_frame(struct rsi_common *common, bool event);
+int rsi_hal_send_sta_notify_frame(struct rsi_common *common, enum opmode opmode,
+ u8 notify_event, const unsigned char *bssid,
+ u8 qos_enable, u16 aid, u16 sta_id,
+ struct ieee80211_vif *vif);
void rsi_inform_bss_status(struct rsi_common *common, enum opmode opmode,
u8 status, const u8 *addr, u8 qos_enable, u16 aid,
struct ieee80211_sta *sta, u16 sta_id,
- struct ieee80211_vif *vif);
+ u16 assoc_cap, struct ieee80211_vif *vif);
void rsi_indicate_pkt_to_os(struct rsi_common *common, struct sk_buff *skb);
int rsi_mac80211_attach(struct rsi_common *common);
void rsi_indicate_tx_status(struct rsi_hw *common, struct sk_buff *skb,
diff --git a/drivers/net/wireless/rsi/rsi_sdio.h b/drivers/net/wireless/rsi/rsi_sdio.h
index ead8e7c..353dbdf 100644
--- a/drivers/net/wireless/rsi/rsi_sdio.h
+++ b/drivers/net/wireless/rsi/rsi_sdio.h
@@ -87,7 +87,7 @@
#define TA_SOFT_RST_CLR 0
#define TA_SOFT_RST_SET BIT(0)
#define TA_PC_ZERO 0
-#define TA_HOLD_THREAD_VALUE cpu_to_le32(0xF)
+#define TA_HOLD_THREAD_VALUE 0xF
#define TA_RELEASE_THREAD_VALUE cpu_to_le32(0xF)
#define TA_BASE_ADDR 0x2200
#define MISC_CFG_BASE_ADDR 0x4105
diff --git a/drivers/net/wireless/rsi/rsi_usb.h b/drivers/net/wireless/rsi/rsi_usb.h
index a88d592..b6fe79f 100644
--- a/drivers/net/wireless/rsi/rsi_usb.h
+++ b/drivers/net/wireless/rsi/rsi_usb.h
@@ -26,6 +26,7 @@
#define RSI_USB_READY_MAGIC_NUM 0xab
#define FW_STATUS_REG 0x41050012
#define RSI_TA_HOLD_REG 0x22000844
+#define RSI_FW_WDT_DISABLE_REQ 0x69
#define USB_VENDOR_REGISTER_READ 0x15
#define USB_VENDOR_REGISTER_WRITE 0x16
diff --git a/drivers/net/wireless/st/cw1200/txrx.c b/drivers/net/wireless/st/cw1200/txrx.c
index e9050b4..f7b1b00 100644
--- a/drivers/net/wireless/st/cw1200/txrx.c
+++ b/drivers/net/wireless/st/cw1200/txrx.c
@@ -1069,7 +1069,7 @@
}
if (skb->len < sizeof(struct ieee80211_pspoll)) {
- wiphy_warn(priv->hw->wiphy, "Mailformed SDU rx'ed. Size is lesser than IEEE header.\n");
+ wiphy_warn(priv->hw->wiphy, "Malformed SDU rx'ed. Size is lesser than IEEE header.\n");
goto drop;
}
diff --git a/drivers/net/wireless/ti/wlcore/sdio.c b/drivers/net/wireless/ti/wlcore/sdio.c
index 1f727ba..6dbe61d 100644
--- a/drivers/net/wireless/ti/wlcore/sdio.c
+++ b/drivers/net/wireless/ti/wlcore/sdio.c
@@ -155,17 +155,11 @@
struct mmc_card *card = func->card;
ret = pm_runtime_get_sync(&card->dev);
- if (ret) {
- /*
- * Runtime PM might be temporarily disabled, or the device
- * might have a positive reference counter. Make sure it is
- * really powered on.
- */
- ret = mmc_power_restore_host(card->host);
- if (ret < 0) {
- pm_runtime_put_sync(&card->dev);
- goto out;
- }
+ if (ret < 0) {
+ pm_runtime_put_noidle(&card->dev);
+ dev_err(glue->dev, "%s: failed to get_sync(%d)\n",
+ __func__, ret);
+ goto out;
}
sdio_claim_host(func);
@@ -178,7 +172,6 @@
static int wl12xx_sdio_power_off(struct wl12xx_sdio_glue *glue)
{
- int ret;
struct sdio_func *func = dev_to_sdio_func(glue->dev);
struct mmc_card *card = func->card;
@@ -186,16 +179,8 @@
sdio_disable_func(func);
sdio_release_host(func);
- /* Power off the card manually in case it wasn't powered off above */
- ret = mmc_power_save_host(card->host);
- if (ret < 0)
- goto out;
-
/* Let runtime PM know the card is powered off */
- pm_runtime_put_sync(&card->dev);
-
-out:
- return ret;
+ return pm_runtime_put_sync(&card->dev);
}
static int wl12xx_sdio_set_power(struct device *child, bool enable)
diff --git a/drivers/of/of_mdio.c b/drivers/of/of_mdio.c
index 8c0c927..d963baf 100644
--- a/drivers/of/of_mdio.c
+++ b/drivers/of/of_mdio.c
@@ -204,6 +204,9 @@
bool scanphys = false;
int addr, rc;
+ if (!np)
+ return mdiobus_register(mdio);
+
/* Do not continue if the node is disabled */
if (!of_device_is_available(np))
return -ENODEV;
diff --git a/drivers/phy/marvell/phy-mvebu-cp110-comphy.c b/drivers/phy/marvell/phy-mvebu-cp110-comphy.c
index a0d5221..4ef4292 100644
--- a/drivers/phy/marvell/phy-mvebu-cp110-comphy.c
+++ b/drivers/phy/marvell/phy-mvebu-cp110-comphy.c
@@ -135,19 +135,25 @@
static const struct mvebu_comhy_conf mvebu_comphy_cp110_modes[] = {
/* lane 0 */
MVEBU_COMPHY_CONF(0, 1, PHY_MODE_SGMII, 0x1),
+ MVEBU_COMPHY_CONF(0, 1, PHY_MODE_2500SGMII, 0x1),
/* lane 1 */
MVEBU_COMPHY_CONF(1, 2, PHY_MODE_SGMII, 0x1),
+ MVEBU_COMPHY_CONF(1, 2, PHY_MODE_2500SGMII, 0x1),
/* lane 2 */
MVEBU_COMPHY_CONF(2, 0, PHY_MODE_SGMII, 0x1),
+ MVEBU_COMPHY_CONF(2, 0, PHY_MODE_2500SGMII, 0x1),
MVEBU_COMPHY_CONF(2, 0, PHY_MODE_10GKR, 0x1),
/* lane 3 */
MVEBU_COMPHY_CONF(3, 1, PHY_MODE_SGMII, 0x2),
+ MVEBU_COMPHY_CONF(3, 1, PHY_MODE_2500SGMII, 0x2),
/* lane 4 */
MVEBU_COMPHY_CONF(4, 0, PHY_MODE_SGMII, 0x2),
+ MVEBU_COMPHY_CONF(4, 0, PHY_MODE_2500SGMII, 0x2),
MVEBU_COMPHY_CONF(4, 0, PHY_MODE_10GKR, 0x2),
MVEBU_COMPHY_CONF(4, 1, PHY_MODE_SGMII, 0x1),
/* lane 5 */
MVEBU_COMPHY_CONF(5, 2, PHY_MODE_SGMII, 0x1),
+ MVEBU_COMPHY_CONF(5, 2, PHY_MODE_2500SGMII, 0x1),
};
struct mvebu_comphy_priv {
@@ -206,6 +212,10 @@
if (mode == PHY_MODE_10GKR)
val |= MVEBU_COMPHY_SERDES_CFG0_GEN_RX(0xe) |
MVEBU_COMPHY_SERDES_CFG0_GEN_TX(0xe);
+ else if (mode == PHY_MODE_2500SGMII)
+ val |= MVEBU_COMPHY_SERDES_CFG0_GEN_RX(0x8) |
+ MVEBU_COMPHY_SERDES_CFG0_GEN_TX(0x8) |
+ MVEBU_COMPHY_SERDES_CFG0_HALF_BUS;
else if (mode == PHY_MODE_SGMII)
val |= MVEBU_COMPHY_SERDES_CFG0_GEN_RX(0x6) |
MVEBU_COMPHY_SERDES_CFG0_GEN_TX(0x6) |
@@ -296,13 +306,13 @@
return 0;
}
-static int mvebu_comphy_set_mode_sgmii(struct phy *phy)
+static int mvebu_comphy_set_mode_sgmii(struct phy *phy, enum phy_mode mode)
{
struct mvebu_comphy_lane *lane = phy_get_drvdata(phy);
struct mvebu_comphy_priv *priv = lane->priv;
u32 val;
- mvebu_comphy_ethernet_init_reset(lane, PHY_MODE_SGMII);
+ mvebu_comphy_ethernet_init_reset(lane, mode);
val = readl(priv->base + MVEBU_COMPHY_RX_CTRL1(lane->id));
val &= ~MVEBU_COMPHY_RX_CTRL1_CLK8T_EN;
@@ -487,7 +497,8 @@
switch (lane->mode) {
case PHY_MODE_SGMII:
- ret = mvebu_comphy_set_mode_sgmii(phy);
+ case PHY_MODE_2500SGMII:
+ ret = mvebu_comphy_set_mode_sgmii(phy, lane->mode);
break;
case PHY_MODE_10GKR:
ret = mvebu_comphy_set_mode_10gkr(phy);
diff --git a/drivers/ptp/ptp_pch.c b/drivers/ptp/ptp_pch.c
index b328517..78ccf93 100644
--- a/drivers/ptp/ptp_pch.c
+++ b/drivers/ptp/ptp_pch.c
@@ -452,7 +452,6 @@
static int ptp_pch_gettime(struct ptp_clock_info *ptp, struct timespec64 *ts)
{
u64 ns;
- u32 remainder;
unsigned long flags;
struct pch_dev *pch_dev = container_of(ptp, struct pch_dev, caps);
struct pch_ts_regs __iomem *regs = pch_dev->regs;
@@ -461,8 +460,7 @@
ns = pch_systime_read(regs);
spin_unlock_irqrestore(&pch_dev->register_lock, flags);
- ts->tv_sec = div_u64_rem(ns, 1000000000, &remainder);
- ts->tv_nsec = remainder;
+ *ts = ns_to_timespec64(ns);
return 0;
}
@@ -474,8 +472,7 @@
struct pch_dev *pch_dev = container_of(ptp, struct pch_dev, caps);
struct pch_ts_regs __iomem *regs = pch_dev->regs;
- ns = ts->tv_sec * 1000000000ULL;
- ns += ts->tv_nsec;
+ ns = timespec64_to_ns(ts);
spin_lock_irqsave(&pch_dev->register_lock, flags);
pch_systime_write(regs, ns);
diff --git a/drivers/s390/net/lcs.c b/drivers/s390/net/lcs.c
index 0ee8f33..2d9fe7e 100644
--- a/drivers/s390/net/lcs.c
+++ b/drivers/s390/net/lcs.c
@@ -1928,6 +1928,8 @@
return -EINVAL;
/* TODO: sanity checks */
card->portno = value;
+ if (card->dev)
+ card->dev->dev_port = card->portno;
return count;
@@ -2158,6 +2160,7 @@
card->dev = dev;
card->dev->ml_priv = card;
card->dev->netdev_ops = &lcs_netdev_ops;
+ card->dev->dev_port = card->portno;
memcpy(card->dev->dev_addr, card->mac, LCS_MAC_LENGTH);
#ifdef CONFIG_IP_MULTICAST
if (!lcs_check_multicast_support(card))
diff --git a/drivers/s390/net/qeth_core.h b/drivers/s390/net/qeth_core.h
index 78b98b3..2a5fec5 100644
--- a/drivers/s390/net/qeth_core.h
+++ b/drivers/s390/net/qeth_core.h
@@ -148,6 +148,7 @@
unsigned int tx_csum;
unsigned int tx_lin;
unsigned int tx_linfail;
+ unsigned int rx_csum;
};
/* Routing stuff */
@@ -712,9 +713,6 @@
struct qeth_discipline {
const struct device_type *devtype;
- void (*start_poll)(struct ccw_device *, int, unsigned long);
- qdio_handler_t *input_handler;
- qdio_handler_t *output_handler;
int (*process_rx_buffer)(struct qeth_card *card, int budget, int *done);
int (*recover)(void *ptr);
int (*setup) (struct ccwgroup_device *);
@@ -780,9 +778,9 @@
struct qeth_card_options options;
wait_queue_head_t wait_q;
- spinlock_t vlanlock;
spinlock_t mclock;
unsigned long active_vlans[BITS_TO_LONGS(VLAN_N_VID)];
+ struct mutex vid_list_mutex; /* vid_list */
struct list_head vid_list;
DECLARE_HASHTABLE(mac_htable, 4);
DECLARE_HASHTABLE(ip_htable, 4);
@@ -867,6 +865,32 @@
}
}
+static inline void qeth_rx_csum(struct qeth_card *card, struct sk_buff *skb,
+ u8 flags)
+{
+ if ((card->dev->features & NETIF_F_RXCSUM) &&
+ (flags & QETH_HDR_EXT_CSUM_TRANSP_REQ)) {
+ skb->ip_summed = CHECKSUM_UNNECESSARY;
+ if (card->options.performance_stats)
+ card->perf_stats.rx_csum++;
+ } else {
+ skb->ip_summed = CHECKSUM_NONE;
+ }
+}
+
+static inline void qeth_tx_csum(struct sk_buff *skb, u8 *flags, int ipv)
+{
+ *flags |= QETH_HDR_EXT_CSUM_TRANSP_REQ;
+ if ((ipv == 4 && ip_hdr(skb)->protocol == IPPROTO_UDP) ||
+ (ipv == 6 && ipv6_hdr(skb)->nexthdr == IPPROTO_UDP))
+ *flags |= QETH_HDR_EXT_UDP;
+ if (ipv == 4) {
+ /* some HW requires combined L3+L4 csum offload: */
+ *flags |= QETH_HDR_EXT_CSUM_HDR_REQ;
+ ip_hdr(skb)->check = 0;
+ }
+}
+
static inline void qeth_put_buffer_pool_entry(struct qeth_card *card,
struct qeth_buffer_pool_entry *entry)
{
@@ -879,6 +903,27 @@
return card->info.diagass_support & (__u32)cmd;
}
+int qeth_send_simple_setassparms_prot(struct qeth_card *card,
+ enum qeth_ipa_funcs ipa_func,
+ u16 cmd_code, long data,
+ enum qeth_prot_versions prot);
+/* IPv4 variant */
+static inline int qeth_send_simple_setassparms(struct qeth_card *card,
+ enum qeth_ipa_funcs ipa_func,
+ u16 cmd_code, long data)
+{
+ return qeth_send_simple_setassparms_prot(card, ipa_func, cmd_code,
+ data, QETH_PROT_IPV4);
+}
+
+static inline int qeth_send_simple_setassparms_v6(struct qeth_card *card,
+ enum qeth_ipa_funcs ipa_func,
+ u16 cmd_code, long data)
+{
+ return qeth_send_simple_setassparms_prot(card, ipa_func, cmd_code,
+ data, QETH_PROT_IPV6);
+}
+
extern struct qeth_discipline qeth_l2_discipline;
extern struct qeth_discipline qeth_l3_discipline;
extern const struct attribute_group *qeth_generic_attr_groups[];
@@ -921,13 +966,7 @@
struct qeth_qdio_buffer *, struct qdio_buffer_element **, int *,
struct qeth_hdr **);
void qeth_schedule_recovery(struct qeth_card *);
-void qeth_qdio_start_poll(struct ccw_device *, int, unsigned long);
int qeth_poll(struct napi_struct *napi, int budget);
-void qeth_qdio_input_handler(struct ccw_device *,
- unsigned int, unsigned int, int,
- int, unsigned long);
-void qeth_qdio_output_handler(struct ccw_device *, unsigned int,
- int, int, int, unsigned long);
void qeth_clear_ipacmd_list(struct qeth_card *);
int qeth_qdio_clear_card(struct qeth_card *, int);
void qeth_clear_working_pool_list(struct qeth_card *);
@@ -979,8 +1018,6 @@
int qeth_query_ipassists(struct qeth_card *, enum qeth_prot_versions prot);
void qeth_trace_features(struct qeth_card *);
void qeth_close_dev(struct qeth_card *);
-int qeth_send_simple_setassparms(struct qeth_card *, enum qeth_ipa_funcs,
- __u16, long);
int qeth_send_setassparms(struct qeth_card *, struct qeth_cmd_buffer *, __u16,
long,
int (*reply_cb)(struct qeth_card *,
diff --git a/drivers/s390/net/qeth_core_main.c b/drivers/s390/net/qeth_core_main.c
index dffd820..06415b6 100644
--- a/drivers/s390/net/qeth_core_main.c
+++ b/drivers/s390/net/qeth_core_main.c
@@ -1467,13 +1467,13 @@
card->lan_online = 0;
card->read_or_write_problem = 0;
card->dev = NULL;
- spin_lock_init(&card->vlanlock);
spin_lock_init(&card->mclock);
spin_lock_init(&card->lock);
spin_lock_init(&card->ip_lock);
spin_lock_init(&card->thread_mask_lock);
mutex_init(&card->conf_mutex);
mutex_init(&card->discipline_mutex);
+ mutex_init(&card->vid_list_mutex);
card->thread_start_mask = 0;
card->thread_allowed_mask = 0;
card->thread_running_mask = 0;
@@ -3588,15 +3588,14 @@
}
}
-void qeth_qdio_start_poll(struct ccw_device *ccwdev, int queue,
- unsigned long card_ptr)
+static void qeth_qdio_start_poll(struct ccw_device *ccwdev, int queue,
+ unsigned long card_ptr)
{
struct qeth_card *card = (struct qeth_card *)card_ptr;
if (card->dev && (card->dev->flags & IFF_UP))
napi_schedule(&card->napi);
}
-EXPORT_SYMBOL_GPL(qeth_qdio_start_poll);
int qeth_configure_cq(struct qeth_card *card, enum qeth_cq cq)
{
@@ -3698,9 +3697,10 @@
return;
}
-void qeth_qdio_input_handler(struct ccw_device *ccwdev, unsigned int qdio_err,
- unsigned int queue, int first_elem, int count,
- unsigned long card_ptr)
+static void qeth_qdio_input_handler(struct ccw_device *ccwdev,
+ unsigned int qdio_err, int queue,
+ int first_elem, int count,
+ unsigned long card_ptr)
{
struct qeth_card *card = (struct qeth_card *)card_ptr;
@@ -3711,14 +3711,12 @@
qeth_qdio_cq_handler(card, qdio_err, queue, first_elem, count);
else if (qdio_err)
qeth_schedule_recovery(card);
-
-
}
-EXPORT_SYMBOL_GPL(qeth_qdio_input_handler);
-void qeth_qdio_output_handler(struct ccw_device *ccwdev,
- unsigned int qdio_error, int __queue, int first_element,
- int count, unsigned long card_ptr)
+static void qeth_qdio_output_handler(struct ccw_device *ccwdev,
+ unsigned int qdio_error, int __queue,
+ int first_element, int count,
+ unsigned long card_ptr)
{
struct qeth_card *card = (struct qeth_card *) card_ptr;
struct qeth_qdio_out_q *queue = card->qdio.out_qs[__queue];
@@ -3787,7 +3785,6 @@
card->perf_stats.outbound_handler_time += qeth_get_micros() -
card->perf_stats.outbound_handler_start_time;
}
-EXPORT_SYMBOL_GPL(qeth_qdio_output_handler);
/* We cannot use outbound queue 3 for unicast packets on HiperSockets */
static inline int qeth_cut_iqd_prio(struct qeth_card *card, int queue_num)
@@ -4995,7 +4992,7 @@
goto out_free_in_sbals;
}
for (i = 0; i < card->qdio.no_in_queues; ++i)
- queue_start_poll[i] = card->discipline->start_poll;
+ queue_start_poll[i] = qeth_qdio_start_poll;
qeth_qdio_establish_cq(card, in_sbal_ptrs, queue_start_poll);
@@ -5019,8 +5016,8 @@
init_data.qib_param_field = qib_param_field;
init_data.no_input_qs = card->qdio.no_in_queues;
init_data.no_output_qs = card->qdio.no_out_queues;
- init_data.input_handler = card->discipline->input_handler;
- init_data.output_handler = card->discipline->output_handler;
+ init_data.input_handler = qeth_qdio_input_handler;
+ init_data.output_handler = qeth_qdio_output_handler;
init_data.queue_start_poll_array = queue_start_poll;
init_data.int_parm = (unsigned long) card;
init_data.input_sbal_addr_array = (void **) in_sbal_ptrs;
@@ -5204,6 +5201,11 @@
rc = qeth_query_ipassists(card, QETH_PROT_IPV4);
if (rc == -ENOMEM)
goto out;
+ if (qeth_is_supported(card, IPA_IPV6)) {
+ rc = qeth_query_ipassists(card, QETH_PROT_IPV6);
+ if (rc == -ENOMEM)
+ goto out;
+ }
if (qeth_is_supported(card, IPA_SETADAPTERPARMS)) {
rc = qeth_query_setadapterparms(card);
if (rc < 0) {
@@ -5511,26 +5513,26 @@
}
EXPORT_SYMBOL_GPL(qeth_send_setassparms);
-int qeth_send_simple_setassparms(struct qeth_card *card,
- enum qeth_ipa_funcs ipa_func,
- __u16 cmd_code, long data)
+int qeth_send_simple_setassparms_prot(struct qeth_card *card,
+ enum qeth_ipa_funcs ipa_func,
+ u16 cmd_code, long data,
+ enum qeth_prot_versions prot)
{
int rc;
int length = 0;
struct qeth_cmd_buffer *iob;
- QETH_CARD_TEXT(card, 4, "simassp4");
+ QETH_CARD_TEXT_(card, 4, "simassp%i", prot);
if (data)
length = sizeof(__u32);
- iob = qeth_get_setassparms_cmd(card, ipa_func, cmd_code,
- length, QETH_PROT_IPV4);
+ iob = qeth_get_setassparms_cmd(card, ipa_func, cmd_code, length, prot);
if (!iob)
return -ENOMEM;
rc = qeth_send_setassparms(card, iob, length, data,
qeth_setassparms_cb, NULL);
return rc;
}
-EXPORT_SYMBOL_GPL(qeth_send_simple_setassparms);
+EXPORT_SYMBOL_GPL(qeth_send_simple_setassparms_prot);
static void qeth_unregister_dbf_views(void)
{
@@ -6008,7 +6010,8 @@
{"tx lin"},
{"tx linfail"},
{"cq handler count"},
- {"cq handler time"}
+ {"cq handler time"},
+ {"rx csum"}
};
int qeth_core_get_sset_count(struct net_device *dev, int stringset)
@@ -6070,6 +6073,7 @@
data[35] = card->perf_stats.tx_linfail;
data[36] = card->perf_stats.cq_cnt;
data[37] = card->perf_stats.cq_time;
+ data[38] = card->perf_stats.rx_csum;
}
EXPORT_SYMBOL_GPL(qeth_core_get_ethtool_stats);
@@ -6326,14 +6330,15 @@
static int qeth_ipa_checksum_run_cmd(struct qeth_card *card,
enum qeth_ipa_funcs ipa_func,
__u16 cmd_code, long data,
- struct qeth_checksum_cmd *chksum_cb)
+ struct qeth_checksum_cmd *chksum_cb,
+ enum qeth_prot_versions prot)
{
struct qeth_cmd_buffer *iob;
int rc = -ENOMEM;
QETH_CARD_TEXT(card, 4, "chkdocmd");
iob = qeth_get_setassparms_cmd(card, ipa_func, cmd_code,
- sizeof(__u32), QETH_PROT_IPV4);
+ sizeof(__u32), prot);
if (iob)
rc = qeth_send_setassparms(card, iob, sizeof(__u32), data,
qeth_ipa_checksum_run_cmd_cb,
@@ -6341,16 +6346,17 @@
return rc;
}
-static int qeth_send_checksum_on(struct qeth_card *card, int cstype)
+static int qeth_send_checksum_on(struct qeth_card *card, int cstype,
+ enum qeth_prot_versions prot)
{
- const __u32 required_features = QETH_IPA_CHECKSUM_IP_HDR |
- QETH_IPA_CHECKSUM_UDP |
- QETH_IPA_CHECKSUM_TCP;
+ u32 required_features = QETH_IPA_CHECKSUM_UDP | QETH_IPA_CHECKSUM_TCP;
struct qeth_checksum_cmd chksum_cb;
int rc;
+ if (prot == QETH_PROT_IPV4)
+ required_features |= QETH_IPA_CHECKSUM_IP_HDR;
rc = qeth_ipa_checksum_run_cmd(card, cstype, IPA_CMD_ASS_START, 0,
- &chksum_cb);
+ &chksum_cb, prot);
if (!rc) {
if ((required_features & chksum_cb.supported) !=
required_features)
@@ -6362,37 +6368,42 @@
QETH_CARD_IFNAME(card));
}
if (rc) {
- qeth_send_simple_setassparms(card, cstype, IPA_CMD_ASS_STOP, 0);
+ qeth_send_simple_setassparms_prot(card, cstype,
+ IPA_CMD_ASS_STOP, 0, prot);
dev_warn(&card->gdev->dev,
- "Starting HW checksumming for %s failed, using SW checksumming\n",
- QETH_CARD_IFNAME(card));
+ "Starting HW IPv%d checksumming for %s failed, using SW checksumming\n",
+ prot, QETH_CARD_IFNAME(card));
return rc;
}
rc = qeth_ipa_checksum_run_cmd(card, cstype, IPA_CMD_ASS_ENABLE,
- chksum_cb.supported, &chksum_cb);
+ chksum_cb.supported, &chksum_cb,
+ prot);
if (!rc) {
if ((required_features & chksum_cb.enabled) !=
required_features)
rc = -EIO;
}
if (rc) {
- qeth_send_simple_setassparms(card, cstype, IPA_CMD_ASS_STOP, 0);
+ qeth_send_simple_setassparms_prot(card, cstype,
+ IPA_CMD_ASS_STOP, 0, prot);
dev_warn(&card->gdev->dev,
- "Enabling HW checksumming for %s failed, using SW checksumming\n",
- QETH_CARD_IFNAME(card));
+ "Enabling HW IPv%d checksumming for %s failed, using SW checksumming\n",
+ prot, QETH_CARD_IFNAME(card));
return rc;
}
- dev_info(&card->gdev->dev, "HW Checksumming (%sbound) enabled\n",
- cstype == IPA_INBOUND_CHECKSUM ? "in" : "out");
+ dev_info(&card->gdev->dev, "HW Checksumming (%sbound IPv%d) enabled\n",
+ cstype == IPA_INBOUND_CHECKSUM ? "in" : "out", prot);
return 0;
}
-static int qeth_set_ipa_csum(struct qeth_card *card, int on, int cstype)
+static int qeth_set_ipa_csum(struct qeth_card *card, bool on, int cstype,
+ enum qeth_prot_versions prot)
{
- int rc = (on) ? qeth_send_checksum_on(card, cstype)
- : qeth_send_simple_setassparms(card, cstype,
- IPA_CMD_ASS_STOP, 0);
+ int rc = (on) ? qeth_send_checksum_on(card, cstype, prot)
+ : qeth_send_simple_setassparms_prot(card, cstype,
+ IPA_CMD_ASS_STOP, 0,
+ prot);
return rc ? -EIO : 0;
}
@@ -6419,8 +6430,31 @@
return rc;
}
-#define QETH_HW_FEATURES (NETIF_F_RXCSUM | NETIF_F_IP_CSUM | NETIF_F_TSO)
+static int qeth_set_ipa_rx_csum(struct qeth_card *card, bool on)
+{
+ int rc_ipv4 = (on) ? -EOPNOTSUPP : 0;
+ int rc_ipv6;
+ if (qeth_is_supported(card, IPA_INBOUND_CHECKSUM))
+ rc_ipv4 = qeth_set_ipa_csum(card, on, IPA_INBOUND_CHECKSUM,
+ QETH_PROT_IPV4);
+ if (!qeth_is_supported6(card, IPA_INBOUND_CHECKSUM_V6))
+ /* no/one Offload Assist available, so the rc is trivial */
+ return rc_ipv4;
+
+ rc_ipv6 = qeth_set_ipa_csum(card, on, IPA_INBOUND_CHECKSUM,
+ QETH_PROT_IPV6);
+
+ if (on)
+ /* enable: success if any Assist is active */
+ return (rc_ipv6) ? rc_ipv4 : 0;
+
+ /* disable: failure if any Assist is still active */
+ return (rc_ipv6) ? rc_ipv6 : rc_ipv4;
+}
+
+#define QETH_HW_FEATURES (NETIF_F_RXCSUM | NETIF_F_IP_CSUM | NETIF_F_TSO | \
+ NETIF_F_IPV6_CSUM)
/**
* qeth_recover_features() - Restore device features after recovery
* @dev: the recovering net_device
@@ -6455,16 +6489,19 @@
QETH_DBF_HEX(SETUP, 2, &features, sizeof(features));
if ((changed & NETIF_F_IP_CSUM)) {
- rc = qeth_set_ipa_csum(card,
- features & NETIF_F_IP_CSUM ? 1 : 0,
- IPA_OUTBOUND_CHECKSUM);
+ rc = qeth_set_ipa_csum(card, features & NETIF_F_IP_CSUM,
+ IPA_OUTBOUND_CHECKSUM, QETH_PROT_IPV4);
if (rc)
changed ^= NETIF_F_IP_CSUM;
}
- if ((changed & NETIF_F_RXCSUM)) {
- rc = qeth_set_ipa_csum(card,
- features & NETIF_F_RXCSUM ? 1 : 0,
- IPA_INBOUND_CHECKSUM);
+ if (changed & NETIF_F_IPV6_CSUM) {
+ rc = qeth_set_ipa_csum(card, features & NETIF_F_IPV6_CSUM,
+ IPA_OUTBOUND_CHECKSUM, QETH_PROT_IPV6);
+ if (rc)
+ changed ^= NETIF_F_IPV6_CSUM;
+ }
+ if (changed & NETIF_F_RXCSUM) {
+ rc = qeth_set_ipa_rx_csum(card, features & NETIF_F_RXCSUM);
if (rc)
changed ^= NETIF_F_RXCSUM;
}
@@ -6491,7 +6528,10 @@
QETH_DBF_TEXT(SETUP, 2, "fixfeat");
if (!qeth_is_supported(card, IPA_OUTBOUND_CHECKSUM))
features &= ~NETIF_F_IP_CSUM;
- if (!qeth_is_supported(card, IPA_INBOUND_CHECKSUM))
+ if (!qeth_is_supported6(card, IPA_OUTBOUND_CHECKSUM_V6))
+ features &= ~NETIF_F_IPV6_CSUM;
+ if (!qeth_is_supported(card, IPA_INBOUND_CHECKSUM) &&
+ !qeth_is_supported6(card, IPA_INBOUND_CHECKSUM_V6))
features &= ~NETIF_F_RXCSUM;
if (!qeth_is_supported(card, IPA_OUTBOUND_TSO))
features &= ~NETIF_F_TSO;
diff --git a/drivers/s390/net/qeth_core_mpc.h b/drivers/s390/net/qeth_core_mpc.h
index f4d1ec0b..878e62f 100644
--- a/drivers/s390/net/qeth_core_mpc.h
+++ b/drivers/s390/net/qeth_core_mpc.h
@@ -246,6 +246,8 @@
IPA_QUERY_ARP_ASSIST = 0x00040000L,
IPA_INBOUND_TSO = 0x00080000L,
IPA_OUTBOUND_TSO = 0x00100000L,
+ IPA_INBOUND_CHECKSUM_V6 = 0x00400000L,
+ IPA_OUTBOUND_CHECKSUM_V6 = 0x00800000L,
};
/* SETIP/DELIP IPA Command: ***************************************************/
diff --git a/drivers/s390/net/qeth_core_sys.c b/drivers/s390/net/qeth_core_sys.c
index ae81534d..c3f18af 100644
--- a/drivers/s390/net/qeth_core_sys.c
+++ b/drivers/s390/net/qeth_core_sys.c
@@ -144,6 +144,8 @@
goto out;
}
card->info.portno = portno;
+ if (card->dev)
+ card->dev->dev_port = portno;
out:
mutex_unlock(&card->conf_mutex);
return rc ? rc : count;
diff --git a/drivers/s390/net/qeth_l2_main.c b/drivers/s390/net/qeth_l2_main.c
index b8079f2..a7cb37d 100644
--- a/drivers/s390/net/qeth_l2_main.c
+++ b/drivers/s390/net/qeth_l2_main.c
@@ -17,7 +17,6 @@
#include <linux/kernel.h>
#include <linux/slab.h>
#include <linux/etherdevice.h>
-#include <linux/ip.h>
#include <linux/list.h>
#include <linux/hash.h>
#include <linux/hashtable.h>
@@ -195,23 +194,6 @@
return RTN_UNSPEC;
}
-static void qeth_l2_hdr_csum(struct qeth_card *card, struct qeth_hdr *hdr,
- struct sk_buff *skb)
-{
- struct iphdr *iph = ip_hdr(skb);
-
- /* tcph->check contains already the pseudo hdr checksum
- * so just set the header flags
- */
- if (iph->protocol == IPPROTO_UDP)
- hdr->hdr.l2.flags[1] |= QETH_HDR_EXT_UDP;
- hdr->hdr.l2.flags[1] |= QETH_HDR_EXT_CSUM_TRANSP_REQ |
- QETH_HDR_EXT_CSUM_HDR_REQ;
- iph->check = 0;
- if (card->options.performance_stats)
- card->perf_stats.tx_csum++;
-}
-
static void qeth_l2_fill_header(struct qeth_hdr *hdr, struct sk_buff *skb,
int cast_type, unsigned int data_len)
{
@@ -297,12 +279,13 @@
static void qeth_l2_process_vlans(struct qeth_card *card)
{
struct qeth_vlan_vid *id;
+
QETH_CARD_TEXT(card, 3, "L2prcvln");
- spin_lock_bh(&card->vlanlock);
+ mutex_lock(&card->vid_list_mutex);
list_for_each_entry(id, &card->vid_list, list) {
qeth_l2_send_setdelvlan(card, id->vid, IPA_CMD_SETVLAN);
}
- spin_unlock_bh(&card->vlanlock);
+ mutex_unlock(&card->vid_list_mutex);
}
static int qeth_l2_vlan_rx_add_vid(struct net_device *dev,
@@ -319,7 +302,7 @@
QETH_CARD_TEXT(card, 3, "aidREC");
return 0;
}
- id = kmalloc(sizeof(struct qeth_vlan_vid), GFP_ATOMIC);
+ id = kmalloc(sizeof(*id), GFP_KERNEL);
if (id) {
id->vid = vid;
rc = qeth_l2_send_setdelvlan(card, vid, IPA_CMD_SETVLAN);
@@ -327,9 +310,9 @@
kfree(id);
return rc;
}
- spin_lock_bh(&card->vlanlock);
+ mutex_lock(&card->vid_list_mutex);
list_add_tail(&id->list, &card->vid_list);
- spin_unlock_bh(&card->vlanlock);
+ mutex_unlock(&card->vid_list_mutex);
} else {
return -ENOMEM;
}
@@ -348,7 +331,7 @@
QETH_CARD_TEXT(card, 3, "kidREC");
return 0;
}
- spin_lock_bh(&card->vlanlock);
+ mutex_lock(&card->vid_list_mutex);
list_for_each_entry(id, &card->vid_list, list) {
if (id->vid == vid) {
list_del(&id->list);
@@ -356,7 +339,7 @@
break;
}
}
- spin_unlock_bh(&card->vlanlock);
+ mutex_unlock(&card->vid_list_mutex);
if (tmpid) {
rc = qeth_l2_send_setdelvlan(card, vid, IPA_CMD_DELVLAN);
kfree(tmpid);
@@ -423,15 +406,7 @@
switch (hdr->hdr.l2.id) {
case QETH_HEADER_TYPE_LAYER2:
skb->protocol = eth_type_trans(skb, skb->dev);
- if ((card->dev->features & NETIF_F_RXCSUM)
- && ((hdr->hdr.l2.flags[1] &
- (QETH_HDR_EXT_CSUM_HDR_REQ |
- QETH_HDR_EXT_CSUM_TRANSP_REQ)) ==
- (QETH_HDR_EXT_CSUM_HDR_REQ |
- QETH_HDR_EXT_CSUM_TRANSP_REQ)))
- skb->ip_summed = CHECKSUM_UNNECESSARY;
- else
- skb->ip_summed = CHECKSUM_NONE;
+ qeth_rx_csum(card, skb, hdr->hdr.l2.flags[1]);
if (skb->protocol == htons(ETH_P_802_2))
*((__u32 *)skb->cb) = ++card->seqno.pkt_seqno;
len = skb->len;
@@ -464,7 +439,6 @@
static int qeth_l2_request_initial_mac(struct qeth_card *card)
{
int rc = 0;
- char vendor_pre[] = {0x02, 0x00, 0x00};
QETH_DBF_TEXT(SETUP, 2, "l2reqmac");
QETH_DBF_TEXT_(SETUP, 2, "doL2%s", CARD_BUS_ID(card));
@@ -484,16 +458,20 @@
card->info.type == QETH_CARD_TYPE_OSX ||
card->info.guestlan) {
rc = qeth_setadpparms_change_macaddr(card);
- if (rc) {
- QETH_DBF_MESSAGE(2, "couldn't get MAC address on "
- "device %s: x%x\n", CARD_BUS_ID(card), rc);
- QETH_DBF_TEXT_(SETUP, 2, "1err%04x", rc);
- return rc;
- }
- } else {
- eth_random_addr(card->dev->dev_addr);
- memcpy(card->dev->dev_addr, vendor_pre, 3);
+ if (!rc)
+ goto out;
+ QETH_DBF_MESSAGE(2, "READ_MAC Assist failed on device %s: x%x\n",
+ CARD_BUS_ID(card), rc);
+ QETH_DBF_TEXT_(SETUP, 2, "1err%04x", rc);
+ /* fall back once more: */
}
+
+ /* some devices don't support a custom MAC address: */
+ if (card->info.type == QETH_CARD_TYPE_OSM ||
+ card->info.type == QETH_CARD_TYPE_OSX)
+ return (rc) ? rc : -EADDRNOTAVAIL;
+ eth_hw_addr_random(card->dev);
+
out:
QETH_DBF_HEX(SETUP, 2, card->dev->dev_addr, card->dev->addr_len);
return 0;
@@ -685,7 +663,8 @@
}
static int qeth_l2_xmit_osa(struct qeth_card *card, struct sk_buff *skb,
- struct qeth_qdio_out_q *queue, int cast_type)
+ struct qeth_qdio_out_q *queue, int cast_type,
+ int ipv)
{
int push_len = sizeof(struct qeth_hdr);
unsigned int elements, nr_frags;
@@ -723,8 +702,11 @@
hdr_elements = 1;
}
qeth_l2_fill_header(hdr, skb, cast_type, skb->len - push_len);
- if (skb->ip_summed == CHECKSUM_PARTIAL)
- qeth_l2_hdr_csum(card, hdr, skb);
+ if (skb->ip_summed == CHECKSUM_PARTIAL) {
+ qeth_tx_csum(skb, &hdr->hdr.l2.flags[1], ipv);
+ if (card->options.performance_stats)
+ card->perf_stats.tx_csum++;
+ }
elements = qeth_get_elements_no(card, skb, hdr_elements, 0);
if (!elements) {
@@ -776,6 +758,7 @@
{
struct qeth_card *card = dev->ml_priv;
int cast_type = qeth_l2_get_cast_type(card, skb);
+ int ipv = qeth_get_ip_version(skb);
struct qeth_qdio_out_q *queue;
int tx_bytes = skb->len;
int rc;
@@ -783,7 +766,7 @@
if (card->qdio.do_prio_queueing || (cast_type &&
card->info.is_multicast_different))
queue = card->qdio.out_qs[qeth_get_priority_queue(card, skb,
- qeth_get_ip_version(skb), cast_type)];
+ ipv, cast_type)];
else
queue = card->qdio.out_qs[card->qdio.default_out_queue];
@@ -806,7 +789,7 @@
rc = qeth_l2_xmit_iqd(card, skb, queue, cast_type);
break;
default:
- rc = qeth_l2_xmit_osa(card, skb, queue, cast_type);
+ rc = qeth_l2_xmit_osa(card, skb, queue, cast_type, ipv);
}
if (!rc) {
@@ -983,6 +966,7 @@
card->dev->mtu = card->info.initial_mtu;
card->dev->min_mtu = 64;
card->dev->max_mtu = ETH_MAX_MTU;
+ card->dev->dev_port = card->info.portno;
card->dev->netdev_ops = &qeth_l2_netdev_ops;
if (card->info.type == QETH_CARD_TYPE_OSN) {
card->dev->ethtool_ops = &qeth_l2_osn_ops;
@@ -1011,10 +995,15 @@
card->dev->hw_features |= NETIF_F_IP_CSUM;
card->dev->vlan_features |= NETIF_F_IP_CSUM;
}
- if (qeth_is_supported(card, IPA_INBOUND_CHECKSUM)) {
- card->dev->hw_features |= NETIF_F_RXCSUM;
- card->dev->vlan_features |= NETIF_F_RXCSUM;
- }
+ }
+ if (qeth_is_supported6(card, IPA_OUTBOUND_CHECKSUM_V6)) {
+ card->dev->hw_features |= NETIF_F_IPV6_CSUM;
+ card->dev->vlan_features |= NETIF_F_IPV6_CSUM;
+ }
+ if (qeth_is_supported(card, IPA_INBOUND_CHECKSUM) ||
+ qeth_is_supported6(card, IPA_INBOUND_CHECKSUM_V6)) {
+ card->dev->hw_features |= NETIF_F_RXCSUM;
+ card->dev->vlan_features |= NETIF_F_RXCSUM;
}
card->info.broadcast_capable = 1;
@@ -1315,9 +1304,6 @@
struct qeth_discipline qeth_l2_discipline = {
.devtype = &qeth_l2_devtype,
- .start_poll = qeth_qdio_start_poll,
- .input_handler = (qdio_handler_t *) qeth_qdio_input_handler,
- .output_handler = (qdio_handler_t *) qeth_qdio_output_handler,
.process_rx_buffer = qeth_l2_process_inbound_buffer,
.recover = qeth_l2_recover,
.setup = qeth_l2_probe_device,
diff --git a/drivers/s390/net/qeth_l3_main.c b/drivers/s390/net/qeth_l3_main.c
index c1a16a7..e7fa479 100644
--- a/drivers/s390/net/qeth_l3_main.c
+++ b/drivers/s390/net/qeth_l3_main.c
@@ -735,22 +735,6 @@
return rc;
}
-static int qeth_l3_send_simple_setassparms_ipv6(struct qeth_card *card,
- enum qeth_ipa_funcs ipa_func, __u16 cmd_code)
-{
- int rc;
- struct qeth_cmd_buffer *iob;
-
- QETH_CARD_TEXT(card, 4, "simassp6");
- iob = qeth_get_setassparms_cmd(card, ipa_func, cmd_code,
- 0, QETH_PROT_IPV6);
- if (!iob)
- return -ENOMEM;
- rc = qeth_send_setassparms(card, iob, 0, 0,
- qeth_setassparms_cb, NULL);
- return rc;
-}
-
static int qeth_l3_start_ipa_arp_processing(struct qeth_card *card)
{
int rc;
@@ -851,14 +835,6 @@
QETH_CARD_TEXT(card, 3, "softipv6");
- rc = qeth_query_ipassists(card, QETH_PROT_IPV6);
- if (rc) {
- dev_err(&card->gdev->dev,
- "Activating IPv6 support for %s failed\n",
- QETH_CARD_IFNAME(card));
- return rc;
- }
-
if (card->info.type == QETH_CARD_TYPE_IQD)
goto out;
@@ -870,16 +846,16 @@
QETH_CARD_IFNAME(card));
return rc;
}
- rc = qeth_l3_send_simple_setassparms_ipv6(card, IPA_IPV6,
- IPA_CMD_ASS_START);
+ rc = qeth_send_simple_setassparms_v6(card, IPA_IPV6,
+ IPA_CMD_ASS_START, 0);
if (rc) {
dev_err(&card->gdev->dev,
"Activating IPv6 support for %s failed\n",
QETH_CARD_IFNAME(card));
return rc;
}
- rc = qeth_l3_send_simple_setassparms_ipv6(card, IPA_PASSTHRU,
- IPA_CMD_ASS_START);
+ rc = qeth_send_simple_setassparms_v6(card, IPA_PASSTHRU,
+ IPA_CMD_ASS_START, 0);
if (rc) {
dev_warn(&card->gdev->dev,
"Enabling the passthrough mode for %s failed\n",
@@ -1293,91 +1269,6 @@
in6_dev_put(in6_dev);
}
-static void qeth_l3_free_vlan_addresses4(struct qeth_card *card,
- unsigned short vid)
-{
- struct in_device *in_dev;
- struct in_ifaddr *ifa;
- struct qeth_ipaddr *addr;
- struct net_device *netdev;
-
- QETH_CARD_TEXT(card, 4, "frvaddr4");
-
- netdev = __vlan_find_dev_deep_rcu(card->dev, htons(ETH_P_8021Q), vid);
- if (!netdev)
- return;
- in_dev = in_dev_get(netdev);
- if (!in_dev)
- return;
-
- addr = qeth_l3_get_addr_buffer(QETH_PROT_IPV4);
- if (!addr)
- goto out;
-
- spin_lock_bh(&card->ip_lock);
-
- for (ifa = in_dev->ifa_list; ifa; ifa = ifa->ifa_next) {
- addr->u.a4.addr = be32_to_cpu(ifa->ifa_address);
- addr->u.a4.mask = be32_to_cpu(ifa->ifa_mask);
- addr->type = QETH_IP_TYPE_NORMAL;
- qeth_l3_delete_ip(card, addr);
- }
-
- spin_unlock_bh(&card->ip_lock);
-
- kfree(addr);
-out:
- in_dev_put(in_dev);
-}
-
-static void qeth_l3_free_vlan_addresses6(struct qeth_card *card,
- unsigned short vid)
-{
- struct inet6_dev *in6_dev;
- struct inet6_ifaddr *ifa;
- struct qeth_ipaddr *addr;
- struct net_device *netdev;
-
- QETH_CARD_TEXT(card, 4, "frvaddr6");
-
- netdev = __vlan_find_dev_deep_rcu(card->dev, htons(ETH_P_8021Q), vid);
- if (!netdev)
- return;
-
- in6_dev = in6_dev_get(netdev);
- if (!in6_dev)
- return;
-
- addr = qeth_l3_get_addr_buffer(QETH_PROT_IPV6);
- if (!addr)
- goto out;
-
- spin_lock_bh(&card->ip_lock);
-
- list_for_each_entry(ifa, &in6_dev->addr_list, if_list) {
- memcpy(&addr->u.a6.addr, &ifa->addr,
- sizeof(struct in6_addr));
- addr->u.a6.pfxlen = ifa->prefix_len;
- addr->type = QETH_IP_TYPE_NORMAL;
- qeth_l3_delete_ip(card, addr);
- }
-
- spin_unlock_bh(&card->ip_lock);
-
- kfree(addr);
-out:
- in6_dev_put(in6_dev);
-}
-
-static void qeth_l3_free_vlan_addresses(struct qeth_card *card,
- unsigned short vid)
-{
- rcu_read_lock();
- qeth_l3_free_vlan_addresses4(card, vid);
- qeth_l3_free_vlan_addresses6(card, vid);
- rcu_read_unlock();
-}
-
static int qeth_l3_vlan_rx_add_vid(struct net_device *dev,
__be16 proto, u16 vid)
{
@@ -1398,8 +1289,6 @@
QETH_CARD_TEXT(card, 3, "kidREC");
return 0;
}
- /* unregister IP addresses of vlan device */
- qeth_l3_free_vlan_addresses(card, vid);
clear_bit(vid, card->active_vlans);
qeth_l3_set_rx_mode(dev);
return 0;
@@ -1454,17 +1343,7 @@
__vlan_hwaccel_put_tag(skb, htons(ETH_P_8021Q), tag);
}
- if (card->dev->features & NETIF_F_RXCSUM) {
- if ((hdr->hdr.l3.ext_flags &
- (QETH_HDR_EXT_CSUM_HDR_REQ |
- QETH_HDR_EXT_CSUM_TRANSP_REQ)) ==
- (QETH_HDR_EXT_CSUM_HDR_REQ |
- QETH_HDR_EXT_CSUM_TRANSP_REQ))
- skb->ip_summed = CHECKSUM_UNNECESSARY;
- else
- skb->ip_summed = CHECKSUM_NONE;
- } else
- skb->ip_summed = CHECKSUM_NONE;
+ qeth_rx_csum(card, skb, hdr->hdr.l3.ext_flags);
}
static int qeth_l3_process_inbound_buffer(struct qeth_card *card,
@@ -2210,23 +2089,6 @@
rcu_read_unlock();
}
-static void qeth_l3_hdr_csum(struct qeth_card *card, struct qeth_hdr *hdr,
- struct sk_buff *skb)
-{
- struct iphdr *iph = ip_hdr(skb);
-
- /* tcph->check contains already the pseudo hdr checksum
- * so just set the header flags
- */
- if (iph->protocol == IPPROTO_UDP)
- hdr->hdr.l3.ext_flags |= QETH_HDR_EXT_UDP;
- hdr->hdr.l3.ext_flags |= QETH_HDR_EXT_CSUM_TRANSP_REQ |
- QETH_HDR_EXT_CSUM_HDR_REQ;
- iph->check = 0;
- if (card->options.performance_stats)
- card->perf_stats.tx_csum++;
-}
-
static void qeth_tso_fill_header(struct qeth_card *card,
struct qeth_hdr *qhdr, struct sk_buff *skb)
{
@@ -2418,8 +2280,11 @@
}
}
- if (skb->ip_summed == CHECKSUM_PARTIAL)
- qeth_l3_hdr_csum(card, hdr, new_skb);
+ if (new_skb->ip_summed == CHECKSUM_PARTIAL) {
+ qeth_tx_csum(new_skb, &hdr->hdr.l3.ext_flags, ipv);
+ if (card->options.performance_stats)
+ card->perf_stats.tx_csum++;
+ }
}
elements = use_tso ?
@@ -2620,28 +2485,32 @@
(card->info.link_type == QETH_LINK_TYPE_HSTR)) {
pr_info("qeth_l3: ignoring TR device\n");
return -ENODEV;
- } else {
- card->dev = alloc_etherdev(0);
- if (!card->dev)
- return -ENODEV;
- card->dev->netdev_ops = &qeth_l3_osa_netdev_ops;
+ }
- /*IPv6 address autoconfiguration stuff*/
- qeth_l3_get_unique_id(card);
- if (!(card->info.unique_id & UNIQUE_ID_NOT_BY_CARD))
- card->dev->dev_id = card->info.unique_id &
- 0xffff;
+ card->dev = alloc_etherdev(0);
+ if (!card->dev)
+ return -ENODEV;
+ card->dev->netdev_ops = &qeth_l3_osa_netdev_ops;
- card->dev->hw_features |= NETIF_F_SG;
- card->dev->vlan_features |= NETIF_F_SG;
+ /*IPv6 address autoconfiguration stuff*/
+ qeth_l3_get_unique_id(card);
+ if (!(card->info.unique_id & UNIQUE_ID_NOT_BY_CARD))
+ card->dev->dev_id = card->info.unique_id & 0xffff;
- if (!card->info.guestlan) {
- card->dev->features |= NETIF_F_SG;
- card->dev->hw_features |= NETIF_F_TSO |
- NETIF_F_RXCSUM | NETIF_F_IP_CSUM;
- card->dev->vlan_features |= NETIF_F_TSO |
- NETIF_F_RXCSUM | NETIF_F_IP_CSUM;
- }
+ card->dev->hw_features |= NETIF_F_SG;
+ card->dev->vlan_features |= NETIF_F_SG;
+
+ if (!card->info.guestlan) {
+ card->dev->features |= NETIF_F_SG;
+ card->dev->hw_features |= NETIF_F_TSO |
+ NETIF_F_RXCSUM | NETIF_F_IP_CSUM;
+ card->dev->vlan_features |= NETIF_F_TSO |
+ NETIF_F_RXCSUM | NETIF_F_IP_CSUM;
+ }
+
+ if (qeth_is_supported6(card, IPA_OUTBOUND_CHECKSUM_V6)) {
+ card->dev->hw_features |= NETIF_F_IPV6_CSUM;
+ card->dev->vlan_features |= NETIF_F_IPV6_CSUM;
}
} else if (card->info.type == QETH_CARD_TYPE_IQD) {
card->dev = alloc_netdev(0, "hsi%d", NET_NAME_UNKNOWN,
@@ -2663,6 +2532,7 @@
card->dev->mtu = card->info.initial_mtu;
card->dev->min_mtu = 64;
card->dev->max_mtu = ETH_MAX_MTU;
+ card->dev->dev_port = card->info.portno;
card->dev->ethtool_ops = &qeth_l3_ethtool_ops;
card->dev->features |= NETIF_F_HW_VLAN_CTAG_TX |
NETIF_F_HW_VLAN_CTAG_RX |
@@ -2960,9 +2830,6 @@
struct qeth_discipline qeth_l3_discipline = {
.devtype = &qeth_l3_devtype,
- .start_poll = qeth_qdio_start_poll,
- .input_handler = (qdio_handler_t *) qeth_qdio_input_handler,
- .output_handler = (qdio_handler_t *) qeth_qdio_output_handler,
.process_rx_buffer = qeth_l3_process_inbound_buffer,
.recover = qeth_l3_recover,
.setup = qeth_l3_probe_device,
diff --git a/drivers/scsi/csiostor/csio_hw.c b/drivers/scsi/csiostor/csio_hw.c
index 96bbb82..a10cf25 100644
--- a/drivers/scsi/csiostor/csio_hw.c
+++ b/drivers/scsi/csiostor/csio_hw.c
@@ -1500,8 +1500,8 @@
CAP16_TO_CAP32(FC_RX);
CAP16_TO_CAP32(FC_TX);
CAP16_TO_CAP32(ANEG);
- CAP16_TO_CAP32(MDIX);
CAP16_TO_CAP32(MDIAUTO);
+ CAP16_TO_CAP32(MDISTRAIGHT);
CAP16_TO_CAP32(FEC_RS);
CAP16_TO_CAP32(FEC_BASER_RS);
CAP16_TO_CAP32(802_3_PAUSE);
diff --git a/drivers/scsi/qedf/qedf.h b/drivers/scsi/qedf/qedf.h
index c105a2e..cabb6af 100644
--- a/drivers/scsi/qedf/qedf.h
+++ b/drivers/scsi/qedf/qedf.h
@@ -383,11 +383,16 @@
u32 flogi_failed;
/* Used for fc statistics */
+ struct mutex stats_mutex;
u64 input_requests;
u64 output_requests;
u64 control_requests;
u64 packet_aborts;
u64 alloc_failures;
+ u8 lun_resets;
+ u8 target_resets;
+ u8 task_set_fulls;
+ u8 busy;
};
struct io_bdt {
@@ -496,7 +501,9 @@
extern void qedf_process_seq_cleanup_compl(struct qedf_ctx *qedf,
struct fcoe_cqe *cqe, struct qedf_ioreq *io_req);
extern int qedf_send_flogi(struct qedf_ctx *qedf);
+extern void qedf_get_protocol_tlv_data(void *dev, void *data);
extern void qedf_fp_io_handler(struct work_struct *work);
+extern void qedf_get_generic_tlv_data(void *dev, struct qed_generic_tlvs *data);
#define FCOE_WORD_TO_BYTE 4
#define QEDF_MAX_TASK_NUM 0xFFFF
diff --git a/drivers/scsi/qedf/qedf_debugfs.c b/drivers/scsi/qedf/qedf_debugfs.c
index c539a7a..5789ce1 100644
--- a/drivers/scsi/qedf/qedf_debugfs.c
+++ b/drivers/scsi/qedf/qedf_debugfs.c
@@ -439,7 +439,6 @@
return single_open(file, qedf_offload_stats_show, qedf);
}
-
const struct file_operations qedf_dbg_fops[] = {
qedf_dbg_fileops(qedf, fp_int),
qedf_dbg_fileops_seq(qedf, io_trace),
diff --git a/drivers/scsi/qedf/qedf_fip.c b/drivers/scsi/qedf/qedf_fip.c
index 773558f..16d1a21 100644
--- a/drivers/scsi/qedf/qedf_fip.c
+++ b/drivers/scsi/qedf/qedf_fip.c
@@ -23,6 +23,7 @@
struct fip_vlan *vlan;
#define MY_FIP_ALL_FCF_MACS ((__u8[6]) { 1, 0x10, 0x18, 1, 0, 2 })
static u8 my_fcoe_all_fcfs[ETH_ALEN] = MY_FIP_ALL_FCF_MACS;
+ unsigned long flags = 0;
skb = dev_alloc_skb(sizeof(struct fip_vlan));
if (!skb)
@@ -65,7 +66,9 @@
kfree_skb(skb);
return;
}
- qed_ops->ll2->start_xmit(qedf->cdev, skb);
+
+ set_bit(QED_LL2_XMIT_FLAGS_FIP_DISCOVERY, &flags);
+ qed_ops->ll2->start_xmit(qedf->cdev, skb, flags);
}
static void qedf_fcoe_process_vlan_resp(struct qedf_ctx *qedf,
@@ -139,7 +142,7 @@
print_hex_dump(KERN_WARNING, "fip ", DUMP_PREFIX_OFFSET, 16, 1,
skb->data, skb->len, false);
- qed_ops->ll2->start_xmit(qedf->cdev, skb);
+ qed_ops->ll2->start_xmit(qedf->cdev, skb, 0);
}
/* Process incoming FIP frames. */
diff --git a/drivers/scsi/qedf/qedf_io.c b/drivers/scsi/qedf/qedf_io.c
index 50a50c4..3fe579d 100644
--- a/drivers/scsi/qedf/qedf_io.c
+++ b/drivers/scsi/qedf/qedf_io.c
@@ -1200,6 +1200,12 @@
fcport->retry_delay_timestamp =
jiffies + (qualifier * HZ / 10);
}
+ /* Record stats */
+ if (io_req->cdb_status ==
+ SAM_STAT_TASK_SET_FULL)
+ qedf->task_set_fulls++;
+ else
+ qedf->busy++;
}
}
if (io_req->fcp_resid)
@@ -1866,6 +1872,11 @@
goto reset_tmf_err;
}
+ if (tm_flags == FCP_TMF_LUN_RESET)
+ qedf->lun_resets++;
+ else if (tm_flags == FCP_TMF_TGT_RESET)
+ qedf->target_resets++;
+
/* Initialize rest of io_req fields */
io_req->sc_cmd = sc_cmd;
io_req->fcport = fcport;
diff --git a/drivers/scsi/qedf/qedf_main.c b/drivers/scsi/qedf/qedf_main.c
index 284ccb5..d3f73d8 100644
--- a/drivers/scsi/qedf/qedf_main.c
+++ b/drivers/scsi/qedf/qedf_main.c
@@ -566,6 +566,8 @@
{
.link_update = qedf_link_update,
.dcbx_aen = qedf_dcbx_handler,
+ .get_generic_tlv_data = qedf_get_generic_tlv_data,
+ .get_protocol_tlv_data = qedf_get_protocol_tlv_data,
}
};
@@ -994,7 +996,7 @@
if (qedf_dump_frames)
print_hex_dump(KERN_WARNING, "fcoe: ", DUMP_PREFIX_OFFSET, 16,
1, skb->data, skb->len, false);
- qed_ops->ll2->start_xmit(qedf->cdev, skb);
+ qed_ops->ll2->start_xmit(qedf->cdev, skb, 0);
return 0;
}
@@ -1746,6 +1748,8 @@
goto out;
}
+ mutex_lock(&qedf->stats_mutex);
+
/* Query firmware for offload stats */
qed_ops->get_stats(qedf->cdev, fw_fcoe_stats);
@@ -1779,6 +1783,7 @@
qedf_stats->fcp_packet_aborts += qedf->packet_aborts;
qedf_stats->fcp_frame_alloc_failures += qedf->alloc_failures;
+ mutex_unlock(&qedf->stats_mutex);
kfree(fw_fcoe_stats);
out:
return qedf_stats;
@@ -2948,6 +2953,7 @@
qedf->stop_io_on_error = false;
pci_set_drvdata(pdev, qedf);
init_completion(&qedf->fipvlan_compl);
+ mutex_init(&qedf->stats_mutex);
QEDF_INFO(&(qedf->dbg_ctx), QEDF_LOG_INFO,
"QLogic FastLinQ FCoE Module qedf %s, "
@@ -3393,6 +3399,104 @@
}
/*
+ * Protocol TLV handler
+ */
+void qedf_get_protocol_tlv_data(void *dev, void *data)
+{
+ struct qedf_ctx *qedf = dev;
+ struct qed_mfw_tlv_fcoe *fcoe = data;
+ struct fc_lport *lport = qedf->lport;
+ struct Scsi_Host *host = lport->host;
+ struct fc_host_attrs *fc_host = shost_to_fc_host(host);
+ struct fc_host_statistics *hst;
+
+ /* Force a refresh of the fc_host stats including offload stats */
+ hst = qedf_fc_get_host_stats(host);
+
+ fcoe->qos_pri_set = true;
+ fcoe->qos_pri = 3; /* Hard coded to 3 in driver */
+
+ fcoe->ra_tov_set = true;
+ fcoe->ra_tov = lport->r_a_tov;
+
+ fcoe->ed_tov_set = true;
+ fcoe->ed_tov = lport->e_d_tov;
+
+ fcoe->npiv_state_set = true;
+ fcoe->npiv_state = 1; /* NPIV always enabled */
+
+ fcoe->num_npiv_ids_set = true;
+ fcoe->num_npiv_ids = fc_host->npiv_vports_inuse;
+
+ /* Certain attributes we only want to set if we've selected an FCF */
+ if (qedf->ctlr.sel_fcf) {
+ fcoe->switch_name_set = true;
+ u64_to_wwn(qedf->ctlr.sel_fcf->switch_name, fcoe->switch_name);
+ }
+
+ fcoe->port_state_set = true;
+ /* For qedf we're either link down or fabric attach */
+ if (lport->link_up)
+ fcoe->port_state = QED_MFW_TLV_PORT_STATE_FABRIC;
+ else
+ fcoe->port_state = QED_MFW_TLV_PORT_STATE_OFFLINE;
+
+ fcoe->link_failures_set = true;
+ fcoe->link_failures = (u16)hst->link_failure_count;
+
+ fcoe->fcoe_txq_depth_set = true;
+ fcoe->fcoe_rxq_depth_set = true;
+ fcoe->fcoe_rxq_depth = FCOE_PARAMS_NUM_TASKS;
+ fcoe->fcoe_txq_depth = FCOE_PARAMS_NUM_TASKS;
+
+ fcoe->fcoe_rx_frames_set = true;
+ fcoe->fcoe_rx_frames = hst->rx_frames;
+
+ fcoe->fcoe_tx_frames_set = true;
+ fcoe->fcoe_tx_frames = hst->tx_frames;
+
+ fcoe->fcoe_rx_bytes_set = true;
+ fcoe->fcoe_rx_bytes = hst->fcp_input_megabytes * 1000000;
+
+ fcoe->fcoe_tx_bytes_set = true;
+ fcoe->fcoe_tx_bytes = hst->fcp_output_megabytes * 1000000;
+
+ fcoe->crc_count_set = true;
+ fcoe->crc_count = hst->invalid_crc_count;
+
+ fcoe->tx_abts_set = true;
+ fcoe->tx_abts = hst->fcp_packet_aborts;
+
+ fcoe->tx_lun_rst_set = true;
+ fcoe->tx_lun_rst = qedf->lun_resets;
+
+ fcoe->abort_task_sets_set = true;
+ fcoe->abort_task_sets = qedf->packet_aborts;
+
+ fcoe->scsi_busy_set = true;
+ fcoe->scsi_busy = qedf->busy;
+
+ fcoe->scsi_tsk_full_set = true;
+ fcoe->scsi_tsk_full = qedf->task_set_fulls;
+}
+
+/* Generic TLV data callback */
+void qedf_get_generic_tlv_data(void *dev, struct qed_generic_tlvs *data)
+{
+ struct qedf_ctx *qedf;
+
+ if (!dev) {
+ QEDF_INFO(NULL, QEDF_LOG_EVT,
+ "dev is NULL so ignoring get_generic_tlv_data request.\n");
+ return;
+ }
+ qedf = (struct qedf_ctx *)dev;
+
+ memset(data, 0, sizeof(struct qed_generic_tlvs));
+ ether_addr_copy(data->mac[0], qedf->mac);
+}
+
+/*
* Module Init/Remove
*/
diff --git a/drivers/scsi/qedi/qedi.h b/drivers/scsi/qedi/qedi.h
index b8b22ce..fc3babc 100644
--- a/drivers/scsi/qedi/qedi.h
+++ b/drivers/scsi/qedi/qedi.h
@@ -353,6 +353,9 @@
#define IPV6_LEN 41
#define IPV4_LEN 17
struct iscsi_boot_kset *boot_kset;
+
+ /* Used for iscsi statistics */
+ struct mutex stats_lock;
};
struct qedi_work {
diff --git a/drivers/scsi/qedi/qedi_iscsi.c b/drivers/scsi/qedi/qedi_iscsi.c
index 7ec7f6e..ff21aa1 100644
--- a/drivers/scsi/qedi/qedi_iscsi.c
+++ b/drivers/scsi/qedi/qedi_iscsi.c
@@ -1150,7 +1150,7 @@
if (vlanid)
__vlan_hwaccel_put_tag(skb, htons(ETH_P_8021Q), vlanid);
- rc = qedi_ops->ll2->start_xmit(cdev, skb);
+ rc = qedi_ops->ll2->start_xmit(cdev, skb, 0);
if (rc) {
QEDI_ERR(&qedi->dbg_ctx, "ll2 start_xmit returned %d\n",
rc);
diff --git a/drivers/scsi/qedi/qedi_iscsi.h b/drivers/scsi/qedi/qedi_iscsi.h
index ea13151..1126077 100644
--- a/drivers/scsi/qedi/qedi_iscsi.h
+++ b/drivers/scsi/qedi/qedi_iscsi.h
@@ -223,6 +223,12 @@
struct work_struct *ptr_tmf_work;
};
+struct qedi_boot_target {
+ char ip_addr[64];
+ char iscsi_name[255];
+ u32 ipv6_en;
+};
+
#define qedi_set_itt(task_id, itt) ((u32)(((task_id) & 0xffff) | ((itt) << 16)))
#define qedi_get_itt(cqe) (cqe.iscsi_hdr.cmd.itt >> 16)
diff --git a/drivers/scsi/qedi/qedi_main.c b/drivers/scsi/qedi/qedi_main.c
index 4da3592..32ee7f6 100644
--- a/drivers/scsi/qedi/qedi_main.c
+++ b/drivers/scsi/qedi/qedi_main.c
@@ -55,6 +55,7 @@
static struct qedi_cmd *qedi_get_cmd_from_tid(struct qedi_ctx *qedi, u32 tid);
static void qedi_reset_uio_rings(struct qedi_uio_dev *udev);
static void qedi_ll2_free_skbs(struct qedi_ctx *qedi);
+static struct nvm_iscsi_block *qedi_get_nvram_block(struct qedi_ctx *qedi);
static int qedi_iscsi_event_cb(void *context, u8 fw_event_code, void *fw_handle)
{
@@ -879,6 +880,201 @@
kfree(qedi->global_queues);
}
+static void qedi_get_boot_tgt_info(struct nvm_iscsi_block *block,
+ struct qedi_boot_target *tgt, u8 index)
+{
+ u32 ipv6_en;
+
+ ipv6_en = !!(block->generic.ctrl_flags &
+ NVM_ISCSI_CFG_GEN_IPV6_ENABLED);
+
+ snprintf(tgt->iscsi_name, NVM_ISCSI_CFG_ISCSI_NAME_MAX_LEN, "%s\n",
+ block->target[index].target_name.byte);
+
+ tgt->ipv6_en = ipv6_en;
+
+ if (ipv6_en)
+ snprintf(tgt->ip_addr, IPV6_LEN, "%pI6\n",
+ block->target[index].ipv6_addr.byte);
+ else
+ snprintf(tgt->ip_addr, IPV4_LEN, "%pI4\n",
+ block->target[index].ipv4_addr.byte);
+}
+
+static int qedi_find_boot_info(struct qedi_ctx *qedi,
+ struct qed_mfw_tlv_iscsi *iscsi,
+ struct nvm_iscsi_block *block)
+{
+ struct qedi_boot_target *pri_tgt = NULL, *sec_tgt = NULL;
+ u32 pri_ctrl_flags = 0, sec_ctrl_flags = 0, found = 0;
+ struct iscsi_cls_session *cls_sess;
+ struct iscsi_cls_conn *cls_conn;
+ struct qedi_conn *qedi_conn;
+ struct iscsi_session *sess;
+ struct iscsi_conn *conn;
+ char ep_ip_addr[64];
+ int i, ret = 0;
+
+ pri_ctrl_flags = !!(block->target[0].ctrl_flags &
+ NVM_ISCSI_CFG_TARGET_ENABLED);
+ if (pri_ctrl_flags) {
+ pri_tgt = kzalloc(sizeof(*pri_tgt), GFP_KERNEL);
+ if (!pri_tgt)
+ return -1;
+ qedi_get_boot_tgt_info(block, pri_tgt, 0);
+ }
+
+ sec_ctrl_flags = !!(block->target[1].ctrl_flags &
+ NVM_ISCSI_CFG_TARGET_ENABLED);
+ if (sec_ctrl_flags) {
+ sec_tgt = kzalloc(sizeof(*sec_tgt), GFP_KERNEL);
+ if (!sec_tgt) {
+ ret = -1;
+ goto free_tgt;
+ }
+ qedi_get_boot_tgt_info(block, sec_tgt, 1);
+ }
+
+ for (i = 0; i < qedi->max_active_conns; i++) {
+ qedi_conn = qedi_get_conn_from_id(qedi, i);
+ if (!qedi_conn)
+ continue;
+
+ if (qedi_conn->ep->ip_type == TCP_IPV4)
+ snprintf(ep_ip_addr, IPV4_LEN, "%pI4\n",
+ qedi_conn->ep->dst_addr);
+ else
+ snprintf(ep_ip_addr, IPV6_LEN, "%pI6\n",
+ qedi_conn->ep->dst_addr);
+
+ cls_conn = qedi_conn->cls_conn;
+ conn = cls_conn->dd_data;
+ cls_sess = iscsi_conn_to_session(cls_conn);
+ sess = cls_sess->dd_data;
+
+ if (pri_ctrl_flags) {
+ if (!strcmp(pri_tgt->iscsi_name, sess->targetname) &&
+ !strcmp(pri_tgt->ip_addr, ep_ip_addr)) {
+ found = 1;
+ break;
+ }
+ }
+
+ if (sec_ctrl_flags) {
+ if (!strcmp(sec_tgt->iscsi_name, sess->targetname) &&
+ !strcmp(sec_tgt->ip_addr, ep_ip_addr)) {
+ found = 1;
+ break;
+ }
+ }
+ }
+
+ if (found) {
+ if (conn->hdrdgst_en) {
+ iscsi->header_digest_set = true;
+ iscsi->header_digest = 1;
+ }
+
+ if (conn->datadgst_en) {
+ iscsi->data_digest_set = true;
+ iscsi->data_digest = 1;
+ }
+ iscsi->boot_taget_portal_set = true;
+ iscsi->boot_taget_portal = sess->tpgt;
+
+ } else {
+ ret = -1;
+ }
+
+ if (sec_ctrl_flags)
+ kfree(sec_tgt);
+free_tgt:
+ if (pri_ctrl_flags)
+ kfree(pri_tgt);
+
+ return ret;
+}
+
+static void qedi_get_generic_tlv_data(void *dev, struct qed_generic_tlvs *data)
+{
+ struct qedi_ctx *qedi;
+
+ if (!dev) {
+ QEDI_INFO(NULL, QEDI_LOG_EVT,
+ "dev is NULL so ignoring get_generic_tlv_data request.\n");
+ return;
+ }
+ qedi = (struct qedi_ctx *)dev;
+
+ memset(data, 0, sizeof(struct qed_generic_tlvs));
+ ether_addr_copy(data->mac[0], qedi->mac);
+}
+
+/*
+ * Protocol TLV handler
+ */
+static void qedi_get_protocol_tlv_data(void *dev, void *data)
+{
+ struct qed_mfw_tlv_iscsi *iscsi = data;
+ struct qed_iscsi_stats *fw_iscsi_stats;
+ struct nvm_iscsi_block *block = NULL;
+ u32 chap_en = 0, mchap_en = 0;
+ struct qedi_ctx *qedi = dev;
+ int rval = 0;
+
+ fw_iscsi_stats = kmalloc(sizeof(*fw_iscsi_stats), GFP_KERNEL);
+ if (!fw_iscsi_stats) {
+ QEDI_ERR(&qedi->dbg_ctx,
+ "Could not allocate memory for fw_iscsi_stats.\n");
+ goto exit_get_data;
+ }
+
+ mutex_lock(&qedi->stats_lock);
+ /* Query firmware for offload stats */
+ qedi_ops->get_stats(qedi->cdev, fw_iscsi_stats);
+ mutex_unlock(&qedi->stats_lock);
+
+ iscsi->rx_frames_set = true;
+ iscsi->rx_frames = fw_iscsi_stats->iscsi_rx_packet_cnt;
+ iscsi->rx_bytes_set = true;
+ iscsi->rx_bytes = fw_iscsi_stats->iscsi_rx_bytes_cnt;
+ iscsi->tx_frames_set = true;
+ iscsi->tx_frames = fw_iscsi_stats->iscsi_tx_packet_cnt;
+ iscsi->tx_bytes_set = true;
+ iscsi->tx_bytes = fw_iscsi_stats->iscsi_tx_bytes_cnt;
+ iscsi->frame_size_set = true;
+ iscsi->frame_size = qedi->ll2_mtu;
+ block = qedi_get_nvram_block(qedi);
+ if (block) {
+ chap_en = !!(block->generic.ctrl_flags &
+ NVM_ISCSI_CFG_GEN_CHAP_ENABLED);
+ mchap_en = !!(block->generic.ctrl_flags &
+ NVM_ISCSI_CFG_GEN_CHAP_MUTUAL_ENABLED);
+
+ iscsi->auth_method_set = (chap_en || mchap_en) ? true : false;
+ iscsi->auth_method = 1;
+ if (chap_en)
+ iscsi->auth_method = 2;
+ if (mchap_en)
+ iscsi->auth_method = 3;
+
+ iscsi->tx_desc_size_set = true;
+ iscsi->tx_desc_size = QEDI_SQ_SIZE;
+ iscsi->rx_desc_size_set = true;
+ iscsi->rx_desc_size = QEDI_CQ_SIZE;
+
+ /* tpgt, hdr digest, data digest */
+ rval = qedi_find_boot_info(qedi, iscsi, block);
+ if (rval)
+ QEDI_INFO(&qedi->dbg_ctx, QEDI_LOG_INFO,
+ "Boot target not set");
+ }
+
+ kfree(fw_iscsi_stats);
+exit_get_data:
+ return;
+}
+
static void qedi_link_update(void *dev, struct qed_link_output *link)
{
struct qedi_ctx *qedi = (struct qedi_ctx *)dev;
@@ -896,6 +1092,8 @@
static struct qed_iscsi_cb_ops qedi_cb_ops = {
{
.link_update = qedi_link_update,
+ .get_protocol_tlv_data = qedi_get_protocol_tlv_data,
+ .get_generic_tlv_data = qedi_get_generic_tlv_data,
}
};
diff --git a/drivers/soc/ti/knav_dma.c b/drivers/soc/ti/knav_dma.c
index 026182d..224d7dd 100644
--- a/drivers/soc/ti/knav_dma.c
+++ b/drivers/soc/ti/knav_dma.c
@@ -134,6 +134,13 @@
static struct knav_dma_pool_device *kdev;
+static bool device_ready;
+bool knav_dma_device_ready(void)
+{
+ return device_ready;
+}
+EXPORT_SYMBOL_GPL(knav_dma_device_ready);
+
static bool check_config(struct knav_dma_chan *chan, struct knav_dma_cfg *cfg)
{
if (!memcmp(&chan->cfg, cfg, sizeof(*cfg)))
@@ -773,6 +780,7 @@
debugfs_create_file("knav_dma", S_IFREG | S_IRUGO, NULL, NULL,
&knav_dma_debug_ops);
+ device_ready = true;
return ret;
}
diff --git a/drivers/soc/ti/knav_qmss.h b/drivers/soc/ti/knav_qmss.h
index 905b974..56866ba4 100644
--- a/drivers/soc/ti/knav_qmss.h
+++ b/drivers/soc/ti/knav_qmss.h
@@ -292,6 +292,11 @@
struct list_head list;
};
+enum qmss_version {
+ QMSS,
+ QMSS_66AK2G,
+};
+
struct knav_device {
struct device *dev;
unsigned base_id;
@@ -305,6 +310,7 @@
struct list_head pools;
struct list_head pdsps;
struct list_head qmgrs;
+ enum qmss_version version;
};
struct knav_range_ops {
diff --git a/drivers/soc/ti/knav_qmss_queue.c b/drivers/soc/ti/knav_qmss_queue.c
index 77d6b5c..419365a 100644
--- a/drivers/soc/ti/knav_qmss_queue.c
+++ b/drivers/soc/ti/knav_qmss_queue.c
@@ -42,6 +42,15 @@
#define KNAV_QUEUE_PUSH_REG_INDEX 4
#define KNAV_QUEUE_POP_REG_INDEX 5
+/* Queue manager register indices in DTS for QMSS in K2G NAVSS.
+ * There are no status and vbusm push registers on this version
+ * of QMSS. Push registers are same as pop, So all indices above 1
+ * are to be re-defined
+ */
+#define KNAV_L_QUEUE_CONFIG_REG_INDEX 1
+#define KNAV_L_QUEUE_REGION_REG_INDEX 2
+#define KNAV_L_QUEUE_PUSH_REG_INDEX 3
+
/* PDSP register indices in DTS */
#define KNAV_QUEUE_PDSP_IRAM_REG_INDEX 0
#define KNAV_QUEUE_PDSP_REGS_REG_INDEX 1
@@ -65,6 +74,13 @@
*/
const char *knav_acc_firmwares[] = {"ks2_qmss_pdsp_acc48.bin"};
+static bool device_ready;
+bool knav_qmss_device_ready(void)
+{
+ return device_ready;
+}
+EXPORT_SYMBOL_GPL(knav_qmss_device_ready);
+
/**
* knav_queue_notify: qmss queue notfier call
*
@@ -1169,8 +1185,12 @@
dev_dbg(kdev->dev, "linkram0: dma:%pad, virt:%p, size:%x\n",
&block->dma, block->virt, block->size);
writel_relaxed((u32)block->dma, &qmgr->reg_config->link_ram_base0);
- writel_relaxed(block->size, &qmgr->reg_config->link_ram_size0);
-
+ if (kdev->version == QMSS_66AK2G)
+ writel_relaxed(block->size,
+ &qmgr->reg_config->link_ram_size0);
+ else
+ writel_relaxed(block->size - 1,
+ &qmgr->reg_config->link_ram_size0);
block++;
if (!block->size)
continue;
@@ -1387,42 +1407,64 @@
qmgr->reg_peek =
knav_queue_map_reg(kdev, child,
KNAV_QUEUE_PEEK_REG_INDEX);
- qmgr->reg_status =
- knav_queue_map_reg(kdev, child,
- KNAV_QUEUE_STATUS_REG_INDEX);
+
+ if (kdev->version == QMSS) {
+ qmgr->reg_status =
+ knav_queue_map_reg(kdev, child,
+ KNAV_QUEUE_STATUS_REG_INDEX);
+ }
+
qmgr->reg_config =
knav_queue_map_reg(kdev, child,
+ (kdev->version == QMSS_66AK2G) ?
+ KNAV_L_QUEUE_CONFIG_REG_INDEX :
KNAV_QUEUE_CONFIG_REG_INDEX);
qmgr->reg_region =
knav_queue_map_reg(kdev, child,
+ (kdev->version == QMSS_66AK2G) ?
+ KNAV_L_QUEUE_REGION_REG_INDEX :
KNAV_QUEUE_REGION_REG_INDEX);
+
qmgr->reg_push =
knav_queue_map_reg(kdev, child,
- KNAV_QUEUE_PUSH_REG_INDEX);
- qmgr->reg_pop =
- knav_queue_map_reg(kdev, child,
- KNAV_QUEUE_POP_REG_INDEX);
+ (kdev->version == QMSS_66AK2G) ?
+ KNAV_L_QUEUE_PUSH_REG_INDEX :
+ KNAV_QUEUE_PUSH_REG_INDEX);
- if (IS_ERR(qmgr->reg_peek) || IS_ERR(qmgr->reg_status) ||
+ if (kdev->version == QMSS) {
+ qmgr->reg_pop =
+ knav_queue_map_reg(kdev, child,
+ KNAV_QUEUE_POP_REG_INDEX);
+ }
+
+ if (IS_ERR(qmgr->reg_peek) ||
+ ((kdev->version == QMSS) &&
+ (IS_ERR(qmgr->reg_status) || IS_ERR(qmgr->reg_pop))) ||
IS_ERR(qmgr->reg_config) || IS_ERR(qmgr->reg_region) ||
- IS_ERR(qmgr->reg_push) || IS_ERR(qmgr->reg_pop)) {
+ IS_ERR(qmgr->reg_push)) {
dev_err(dev, "failed to map qmgr regs\n");
+ if (kdev->version == QMSS) {
+ if (!IS_ERR(qmgr->reg_status))
+ devm_iounmap(dev, qmgr->reg_status);
+ if (!IS_ERR(qmgr->reg_pop))
+ devm_iounmap(dev, qmgr->reg_pop);
+ }
if (!IS_ERR(qmgr->reg_peek))
devm_iounmap(dev, qmgr->reg_peek);
- if (!IS_ERR(qmgr->reg_status))
- devm_iounmap(dev, qmgr->reg_status);
if (!IS_ERR(qmgr->reg_config))
devm_iounmap(dev, qmgr->reg_config);
if (!IS_ERR(qmgr->reg_region))
devm_iounmap(dev, qmgr->reg_region);
if (!IS_ERR(qmgr->reg_push))
devm_iounmap(dev, qmgr->reg_push);
- if (!IS_ERR(qmgr->reg_pop))
- devm_iounmap(dev, qmgr->reg_pop);
devm_kfree(dev, qmgr);
continue;
}
+ /* Use same push register for pop as well */
+ if (kdev->version == QMSS_66AK2G)
+ qmgr->reg_pop = qmgr->reg_push;
+
list_add_tail(&qmgr->list, &kdev->qmgrs);
dev_info(dev, "added qmgr start queue %d, num of queues %d, reg_peek %p, reg_status %p, reg_config %p, reg_region %p, reg_push %p, reg_pop %p\n",
qmgr->start_queue, qmgr->num_queues,
@@ -1681,10 +1723,24 @@
return 0;
}
+/* Match table for of_platform binding */
+static const struct of_device_id keystone_qmss_of_match[] = {
+ {
+ .compatible = "ti,keystone-navigator-qmss",
+ },
+ {
+ .compatible = "ti,66ak2g-navss-qm",
+ .data = (void *)QMSS_66AK2G,
+ },
+ {},
+};
+MODULE_DEVICE_TABLE(of, keystone_qmss_of_match);
+
static int knav_queue_probe(struct platform_device *pdev)
{
struct device_node *node = pdev->dev.of_node;
struct device_node *qmgrs, *queue_pools, *regions, *pdsps;
+ const struct of_device_id *match;
struct device *dev = &pdev->dev;
u32 temp[2];
int ret;
@@ -1700,6 +1756,10 @@
return -ENOMEM;
}
+ match = of_match_device(of_match_ptr(keystone_qmss_of_match), dev);
+ if (match && match->data)
+ kdev->version = QMSS_66AK2G;
+
platform_set_drvdata(pdev, kdev);
kdev->dev = dev;
INIT_LIST_HEAD(&kdev->queue_ranges);
@@ -1796,6 +1856,7 @@
debugfs_create_file("qmss", S_IFREG | S_IRUGO, NULL, NULL,
&knav_queue_debug_ops);
+ device_ready = true;
return 0;
err:
@@ -1815,13 +1876,6 @@
return 0;
}
-/* Match table for of_platform binding */
-static struct of_device_id keystone_qmss_of_match[] = {
- { .compatible = "ti,keystone-navigator-qmss", },
- {},
-};
-MODULE_DEVICE_TABLE(of, keystone_qmss_of_match);
-
static struct platform_driver keystone_qmss_driver = {
.probe = knav_queue_probe,
.remove = knav_queue_remove,
diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 986058a..c4b49fc 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -32,6 +32,7 @@
#include <linux/skbuff.h>
#include <net/sock.h>
+#include <net/xdp.h>
#include "vhost.h"
@@ -45,8 +46,10 @@
#define VHOST_NET_WEIGHT 0x80000
/* Max number of packets transferred before requeueing the job.
- * Using this limit prevents one virtqueue from starving rx. */
-#define VHOST_NET_PKT_WEIGHT(vq) ((vq)->num * 2)
+ * Using this limit prevents one virtqueue from starving others with small
+ * pkts.
+ */
+#define VHOST_NET_PKT_WEIGHT 256
/* MAX number of TX used buffers for outstanding zerocopy */
#define VHOST_MAX_PEND 128
@@ -181,10 +184,10 @@
static int vhost_net_buf_peek_len(void *ptr)
{
- if (tun_is_xdp_buff(ptr)) {
- struct xdp_buff *xdp = tun_ptr_to_xdp(ptr);
+ if (tun_is_xdp_frame(ptr)) {
+ struct xdp_frame *xdpf = tun_ptr_to_xdp(ptr);
- return xdp->data_end - xdp->data;
+ return xdpf->len;
}
return __skb_array_len_with_tag(ptr);
@@ -586,7 +589,7 @@
vhost_zerocopy_signal_used(net, vq);
vhost_net_tx_packet(net);
if (unlikely(total_len >= VHOST_NET_WEIGHT) ||
- unlikely(++sent_pkts >= VHOST_NET_PKT_WEIGHT(vq))) {
+ unlikely(++sent_pkts >= VHOST_NET_PKT_WEIGHT)) {
vhost_poll_queue(&vq->poll);
break;
}
@@ -768,6 +771,7 @@
struct socket *sock;
struct iov_iter fixup;
__virtio16 num_buffers;
+ int recv_pkts = 0;
mutex_lock_nested(&vq->mutex, 0);
sock = vq->private_data;
@@ -871,7 +875,8 @@
if (unlikely(vq_log))
vhost_log_write(vq, vq_log, log, vhost_len);
total_len += vhost_len;
- if (unlikely(total_len >= VHOST_NET_WEIGHT)) {
+ if (unlikely(total_len >= VHOST_NET_WEIGHT) ||
+ unlikely(++recv_pkts >= VHOST_NET_PKT_WEIGHT)) {
vhost_poll_queue(&vq->poll);
goto out;
}
diff --git a/fs/exec.c b/fs/exec.c
index 183059c..30a36c2 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1706,14 +1706,13 @@
/*
* sys_execve() executes a new program.
*/
-static int do_execveat_common(int fd, struct filename *filename,
- struct user_arg_ptr argv,
- struct user_arg_ptr envp,
- int flags)
+static int __do_execve_file(int fd, struct filename *filename,
+ struct user_arg_ptr argv,
+ struct user_arg_ptr envp,
+ int flags, struct file *file)
{
char *pathbuf = NULL;
struct linux_binprm *bprm;
- struct file *file;
struct files_struct *displaced;
int retval;
@@ -1752,7 +1751,8 @@
check_unsafe_exec(bprm);
current->in_execve = 1;
- file = do_open_execat(fd, filename, flags);
+ if (!file)
+ file = do_open_execat(fd, filename, flags);
retval = PTR_ERR(file);
if (IS_ERR(file))
goto out_unmark;
@@ -1760,7 +1760,9 @@
sched_exec();
bprm->file = file;
- if (fd == AT_FDCWD || filename->name[0] == '/') {
+ if (!filename) {
+ bprm->filename = "none";
+ } else if (fd == AT_FDCWD || filename->name[0] == '/') {
bprm->filename = filename->name;
} else {
if (filename->name[0] == '\0')
@@ -1826,7 +1828,8 @@
task_numa_free(current);
free_bprm(bprm);
kfree(pathbuf);
- putname(filename);
+ if (filename)
+ putname(filename);
if (displaced)
put_files_struct(displaced);
return retval;
@@ -1849,10 +1852,27 @@
if (displaced)
reset_files_struct(displaced);
out_ret:
- putname(filename);
+ if (filename)
+ putname(filename);
return retval;
}
+static int do_execveat_common(int fd, struct filename *filename,
+ struct user_arg_ptr argv,
+ struct user_arg_ptr envp,
+ int flags)
+{
+ return __do_execve_file(fd, filename, argv, envp, flags, NULL);
+}
+
+int do_execve_file(struct file *file, void *__argv, void *__envp)
+{
+ struct user_arg_ptr argv = { .ptr.native = __argv };
+ struct user_arg_ptr envp = { .ptr.native = __envp };
+
+ return __do_execve_file(AT_FDCWD, NULL, argv, envp, 0, file);
+}
+
int do_execve(struct filename *filename,
const char __user *const __user *__argv,
const char __user *const __user *__envp)
diff --git a/fs/proc/Kconfig b/fs/proc/Kconfig
index 1ade120..0eaeb41 100644
--- a/fs/proc/Kconfig
+++ b/fs/proc/Kconfig
@@ -43,6 +43,21 @@
help
Exports the dump image of crashed kernel in ELF format.
+config PROC_VMCORE_DEVICE_DUMP
+ bool "Device Hardware/Firmware Log Collection"
+ depends on PROC_VMCORE
+ default n
+ help
+ After kernel panic, device drivers can collect the device
+ specific snapshot of their hardware or firmware before the
+ underlying devices are initialized in crash recovery kernel.
+ Note that the device driver must be present in the crash
+ recovery kernel's initramfs to collect its underlying device
+ snapshot.
+
+ If you say Y here, the collected device dumps will be added
+ as ELF notes to /proc/vmcore.
+
config PROC_SYSCTL
bool "Sysctl support (/proc/sys)" if EXPERT
depends on PROC_FS
diff --git a/fs/proc/vmcore.c b/fs/proc/vmcore.c
index a45f0af..cfb6674 100644
--- a/fs/proc/vmcore.c
+++ b/fs/proc/vmcore.c
@@ -20,6 +20,7 @@
#include <linux/init.h>
#include <linux/crash_dump.h>
#include <linux/list.h>
+#include <linux/mutex.h>
#include <linux/vmalloc.h>
#include <linux/pagemap.h>
#include <linux/uaccess.h>
@@ -38,12 +39,23 @@
static char *elfnotes_buf;
static size_t elfnotes_sz;
+/* Size of all notes minus the device dump notes */
+static size_t elfnotes_orig_sz;
/* Total size of vmcore file. */
static u64 vmcore_size;
static struct proc_dir_entry *proc_vmcore;
+#ifdef CONFIG_PROC_VMCORE_DEVICE_DUMP
+/* Device Dump list and mutex to synchronize access to list */
+static LIST_HEAD(vmcoredd_list);
+static DEFINE_MUTEX(vmcoredd_mutex);
+#endif /* CONFIG_PROC_VMCORE_DEVICE_DUMP */
+
+/* Device Dump Size */
+static size_t vmcoredd_orig_sz;
+
/*
* Returns > 0 for RAM pages, 0 for non-RAM pages, < 0 on error
* The called function has to take care of module refcounting.
@@ -178,6 +190,77 @@
return 0;
}
+#ifdef CONFIG_PROC_VMCORE_DEVICE_DUMP
+static int vmcoredd_copy_dumps(void *dst, u64 start, size_t size, int userbuf)
+{
+ struct vmcoredd_node *dump;
+ u64 offset = 0;
+ int ret = 0;
+ size_t tsz;
+ char *buf;
+
+ mutex_lock(&vmcoredd_mutex);
+ list_for_each_entry(dump, &vmcoredd_list, list) {
+ if (start < offset + dump->size) {
+ tsz = min(offset + (u64)dump->size - start, (u64)size);
+ buf = dump->buf + start - offset;
+ if (copy_to(dst, buf, tsz, userbuf)) {
+ ret = -EFAULT;
+ goto out_unlock;
+ }
+
+ size -= tsz;
+ start += tsz;
+ dst += tsz;
+
+ /* Leave now if buffer filled already */
+ if (!size)
+ goto out_unlock;
+ }
+ offset += dump->size;
+ }
+
+out_unlock:
+ mutex_unlock(&vmcoredd_mutex);
+ return ret;
+}
+
+static int vmcoredd_mmap_dumps(struct vm_area_struct *vma, unsigned long dst,
+ u64 start, size_t size)
+{
+ struct vmcoredd_node *dump;
+ u64 offset = 0;
+ int ret = 0;
+ size_t tsz;
+ char *buf;
+
+ mutex_lock(&vmcoredd_mutex);
+ list_for_each_entry(dump, &vmcoredd_list, list) {
+ if (start < offset + dump->size) {
+ tsz = min(offset + (u64)dump->size - start, (u64)size);
+ buf = dump->buf + start - offset;
+ if (remap_vmalloc_range_partial(vma, dst, buf, tsz)) {
+ ret = -EFAULT;
+ goto out_unlock;
+ }
+
+ size -= tsz;
+ start += tsz;
+ dst += tsz;
+
+ /* Leave now if buffer filled already */
+ if (!size)
+ goto out_unlock;
+ }
+ offset += dump->size;
+ }
+
+out_unlock:
+ mutex_unlock(&vmcoredd_mutex);
+ return ret;
+}
+#endif /* CONFIG_PROC_VMCORE_DEVICE_DUMP */
+
/* Read from the ELF header and then the crash dump. On error, negative value is
* returned otherwise number of bytes read are returned.
*/
@@ -215,10 +298,41 @@
if (*fpos < elfcorebuf_sz + elfnotes_sz) {
void *kaddr;
+ /* We add device dumps before other elf notes because the
+ * other elf notes may not fill the elf notes buffer
+ * completely and we will end up with zero-filled data
+ * between the elf notes and the device dumps. Tools will
+ * then try to decode this zero-filled data as valid notes
+ * and we don't want that. Hence, adding device dumps before
+ * the other elf notes ensure that zero-filled data can be
+ * avoided.
+ */
+#ifdef CONFIG_PROC_VMCORE_DEVICE_DUMP
+ /* Read device dumps */
+ if (*fpos < elfcorebuf_sz + vmcoredd_orig_sz) {
+ tsz = min(elfcorebuf_sz + vmcoredd_orig_sz -
+ (size_t)*fpos, buflen);
+ start = *fpos - elfcorebuf_sz;
+ if (vmcoredd_copy_dumps(buffer, start, tsz, userbuf))
+ return -EFAULT;
+
+ buflen -= tsz;
+ *fpos += tsz;
+ buffer += tsz;
+ acc += tsz;
+
+ /* leave now if filled buffer already */
+ if (!buflen)
+ return acc;
+ }
+#endif /* CONFIG_PROC_VMCORE_DEVICE_DUMP */
+
+ /* Read remaining elf notes */
tsz = min(elfcorebuf_sz + elfnotes_sz - (size_t)*fpos, buflen);
- kaddr = elfnotes_buf + *fpos - elfcorebuf_sz;
+ kaddr = elfnotes_buf + *fpos - elfcorebuf_sz - vmcoredd_orig_sz;
if (copy_to(buffer, kaddr, tsz, userbuf))
return -EFAULT;
+
buflen -= tsz;
*fpos += tsz;
buffer += tsz;
@@ -302,10 +416,8 @@
};
/**
- * alloc_elfnotes_buf - allocate buffer for ELF note segment in
- * vmalloc memory
- *
- * @notes_sz: size of buffer
+ * vmcore_alloc_buf - allocate buffer in vmalloc memory
+ * @sizez: size of buffer
*
* If CONFIG_MMU is defined, use vmalloc_user() to allow users to mmap
* the buffer to user-space by means of remap_vmalloc_range().
@@ -313,12 +425,12 @@
* If CONFIG_MMU is not defined, use vzalloc() since mmap_vmcore() is
* disabled and there's no need to allow users to mmap the buffer.
*/
-static inline char *alloc_elfnotes_buf(size_t notes_sz)
+static inline char *vmcore_alloc_buf(size_t size)
{
#ifdef CONFIG_MMU
- return vmalloc_user(notes_sz);
+ return vmalloc_user(size);
#else
- return vzalloc(notes_sz);
+ return vzalloc(size);
#endif
}
@@ -446,11 +558,46 @@
if (start < elfcorebuf_sz + elfnotes_sz) {
void *kaddr;
+ /* We add device dumps before other elf notes because the
+ * other elf notes may not fill the elf notes buffer
+ * completely and we will end up with zero-filled data
+ * between the elf notes and the device dumps. Tools will
+ * then try to decode this zero-filled data as valid notes
+ * and we don't want that. Hence, adding device dumps before
+ * the other elf notes ensure that zero-filled data can be
+ * avoided. This also ensures that the device dumps and
+ * other elf notes can be properly mmaped at page aligned
+ * address.
+ */
+#ifdef CONFIG_PROC_VMCORE_DEVICE_DUMP
+ /* Read device dumps */
+ if (start < elfcorebuf_sz + vmcoredd_orig_sz) {
+ u64 start_off;
+
+ tsz = min(elfcorebuf_sz + vmcoredd_orig_sz -
+ (size_t)start, size);
+ start_off = start - elfcorebuf_sz;
+ if (vmcoredd_mmap_dumps(vma, vma->vm_start + len,
+ start_off, tsz))
+ goto fail;
+
+ size -= tsz;
+ start += tsz;
+ len += tsz;
+
+ /* leave now if filled buffer already */
+ if (!size)
+ return 0;
+ }
+#endif /* CONFIG_PROC_VMCORE_DEVICE_DUMP */
+
+ /* Read remaining elf notes */
tsz = min(elfcorebuf_sz + elfnotes_sz - (size_t)start, size);
- kaddr = elfnotes_buf + start - elfcorebuf_sz;
+ kaddr = elfnotes_buf + start - elfcorebuf_sz - vmcoredd_orig_sz;
if (remap_vmalloc_range_partial(vma, vma->vm_start + len,
kaddr, tsz))
goto fail;
+
size -= tsz;
start += tsz;
len += tsz;
@@ -502,8 +649,8 @@
return kzalloc(sizeof(struct vmcore), GFP_KERNEL);
}
-static u64 __init get_vmcore_size(size_t elfsz, size_t elfnotesegsz,
- struct list_head *vc_list)
+static u64 get_vmcore_size(size_t elfsz, size_t elfnotesegsz,
+ struct list_head *vc_list)
{
u64 size;
struct vmcore *m;
@@ -665,7 +812,7 @@
return rc;
*notes_sz = roundup(phdr_sz, PAGE_SIZE);
- *notes_buf = alloc_elfnotes_buf(*notes_sz);
+ *notes_buf = vmcore_alloc_buf(*notes_sz);
if (!*notes_buf)
return -ENOMEM;
@@ -698,6 +845,11 @@
/* Modify e_phnum to reflect merged headers. */
ehdr_ptr->e_phnum = ehdr_ptr->e_phnum - nr_ptnote + 1;
+ /* Store the size of all notes. We need this to update the note
+ * header when the device dumps will be added.
+ */
+ elfnotes_orig_sz = phdr.p_memsz;
+
return 0;
}
@@ -851,7 +1003,7 @@
return rc;
*notes_sz = roundup(phdr_sz, PAGE_SIZE);
- *notes_buf = alloc_elfnotes_buf(*notes_sz);
+ *notes_buf = vmcore_alloc_buf(*notes_sz);
if (!*notes_buf)
return -ENOMEM;
@@ -884,6 +1036,11 @@
/* Modify e_phnum to reflect merged headers. */
ehdr_ptr->e_phnum = ehdr_ptr->e_phnum - nr_ptnote + 1;
+ /* Store the size of all notes. We need this to update the note
+ * header when the device dumps will be added.
+ */
+ elfnotes_orig_sz = phdr.p_memsz;
+
return 0;
}
@@ -976,8 +1133,8 @@
}
/* Sets offset fields of vmcore elements. */
-static void __init set_vmcore_list_offsets(size_t elfsz, size_t elfnotes_sz,
- struct list_head *vc_list)
+static void set_vmcore_list_offsets(size_t elfsz, size_t elfnotes_sz,
+ struct list_head *vc_list)
{
loff_t vmcore_off;
struct vmcore *m;
@@ -1145,6 +1302,202 @@
return 0;
}
+#ifdef CONFIG_PROC_VMCORE_DEVICE_DUMP
+/**
+ * vmcoredd_write_header - Write vmcore device dump header at the
+ * beginning of the dump's buffer.
+ * @buf: Output buffer where the note is written
+ * @data: Dump info
+ * @size: Size of the dump
+ *
+ * Fills beginning of the dump's buffer with vmcore device dump header.
+ */
+static void vmcoredd_write_header(void *buf, struct vmcoredd_data *data,
+ u32 size)
+{
+ struct vmcoredd_header *vdd_hdr = (struct vmcoredd_header *)buf;
+
+ vdd_hdr->n_namesz = sizeof(vdd_hdr->name);
+ vdd_hdr->n_descsz = size + sizeof(vdd_hdr->dump_name);
+ vdd_hdr->n_type = NT_VMCOREDD;
+
+ strncpy((char *)vdd_hdr->name, VMCOREDD_NOTE_NAME,
+ sizeof(vdd_hdr->name));
+ memcpy(vdd_hdr->dump_name, data->dump_name, sizeof(vdd_hdr->dump_name));
+}
+
+/**
+ * vmcoredd_update_program_headers - Update all Elf program headers
+ * @elfptr: Pointer to elf header
+ * @elfnotesz: Size of elf notes aligned to page size
+ * @vmcoreddsz: Size of device dumps to be added to elf note header
+ *
+ * Determine type of Elf header (Elf64 or Elf32) and update the elf note size.
+ * Also update the offsets of all the program headers after the elf note header.
+ */
+static void vmcoredd_update_program_headers(char *elfptr, size_t elfnotesz,
+ size_t vmcoreddsz)
+{
+ unsigned char *e_ident = (unsigned char *)elfptr;
+ u64 start, end, size;
+ loff_t vmcore_off;
+ u32 i;
+
+ vmcore_off = elfcorebuf_sz + elfnotesz;
+
+ if (e_ident[EI_CLASS] == ELFCLASS64) {
+ Elf64_Ehdr *ehdr = (Elf64_Ehdr *)elfptr;
+ Elf64_Phdr *phdr = (Elf64_Phdr *)(elfptr + sizeof(Elf64_Ehdr));
+
+ /* Update all program headers */
+ for (i = 0; i < ehdr->e_phnum; i++, phdr++) {
+ if (phdr->p_type == PT_NOTE) {
+ /* Update note size */
+ phdr->p_memsz = elfnotes_orig_sz + vmcoreddsz;
+ phdr->p_filesz = phdr->p_memsz;
+ continue;
+ }
+
+ start = rounddown(phdr->p_offset, PAGE_SIZE);
+ end = roundup(phdr->p_offset + phdr->p_memsz,
+ PAGE_SIZE);
+ size = end - start;
+ phdr->p_offset = vmcore_off + (phdr->p_offset - start);
+ vmcore_off += size;
+ }
+ } else {
+ Elf32_Ehdr *ehdr = (Elf32_Ehdr *)elfptr;
+ Elf32_Phdr *phdr = (Elf32_Phdr *)(elfptr + sizeof(Elf32_Ehdr));
+
+ /* Update all program headers */
+ for (i = 0; i < ehdr->e_phnum; i++, phdr++) {
+ if (phdr->p_type == PT_NOTE) {
+ /* Update note size */
+ phdr->p_memsz = elfnotes_orig_sz + vmcoreddsz;
+ phdr->p_filesz = phdr->p_memsz;
+ continue;
+ }
+
+ start = rounddown(phdr->p_offset, PAGE_SIZE);
+ end = roundup(phdr->p_offset + phdr->p_memsz,
+ PAGE_SIZE);
+ size = end - start;
+ phdr->p_offset = vmcore_off + (phdr->p_offset - start);
+ vmcore_off += size;
+ }
+ }
+}
+
+/**
+ * vmcoredd_update_size - Update the total size of the device dumps and update
+ * Elf header
+ * @dump_size: Size of the current device dump to be added to total size
+ *
+ * Update the total size of all the device dumps and update the Elf program
+ * headers. Calculate the new offsets for the vmcore list and update the
+ * total vmcore size.
+ */
+static void vmcoredd_update_size(size_t dump_size)
+{
+ vmcoredd_orig_sz += dump_size;
+ elfnotes_sz = roundup(elfnotes_orig_sz, PAGE_SIZE) + vmcoredd_orig_sz;
+ vmcoredd_update_program_headers(elfcorebuf, elfnotes_sz,
+ vmcoredd_orig_sz);
+
+ /* Update vmcore list offsets */
+ set_vmcore_list_offsets(elfcorebuf_sz, elfnotes_sz, &vmcore_list);
+
+ vmcore_size = get_vmcore_size(elfcorebuf_sz, elfnotes_sz,
+ &vmcore_list);
+ proc_vmcore->size = vmcore_size;
+}
+
+/**
+ * vmcore_add_device_dump - Add a buffer containing device dump to vmcore
+ * @data: dump info.
+ *
+ * Allocate a buffer and invoke the calling driver's dump collect routine.
+ * Write Elf note at the beginning of the buffer to indicate vmcore device
+ * dump and add the dump to global list.
+ */
+int vmcore_add_device_dump(struct vmcoredd_data *data)
+{
+ struct vmcoredd_node *dump;
+ void *buf = NULL;
+ size_t data_size;
+ int ret;
+
+ if (!data || !strlen(data->dump_name) ||
+ !data->vmcoredd_callback || !data->size)
+ return -EINVAL;
+
+ dump = vzalloc(sizeof(*dump));
+ if (!dump) {
+ ret = -ENOMEM;
+ goto out_err;
+ }
+
+ /* Keep size of the buffer page aligned so that it can be mmaped */
+ data_size = roundup(sizeof(struct vmcoredd_header) + data->size,
+ PAGE_SIZE);
+
+ /* Allocate buffer for driver's to write their dumps */
+ buf = vmcore_alloc_buf(data_size);
+ if (!buf) {
+ ret = -ENOMEM;
+ goto out_err;
+ }
+
+ vmcoredd_write_header(buf, data, data_size -
+ sizeof(struct vmcoredd_header));
+
+ /* Invoke the driver's dump collection routing */
+ ret = data->vmcoredd_callback(data, buf +
+ sizeof(struct vmcoredd_header));
+ if (ret)
+ goto out_err;
+
+ dump->buf = buf;
+ dump->size = data_size;
+
+ /* Add the dump to driver sysfs list */
+ mutex_lock(&vmcoredd_mutex);
+ list_add_tail(&dump->list, &vmcoredd_list);
+ mutex_unlock(&vmcoredd_mutex);
+
+ vmcoredd_update_size(data_size);
+ return 0;
+
+out_err:
+ if (buf)
+ vfree(buf);
+
+ if (dump)
+ vfree(dump);
+
+ return ret;
+}
+EXPORT_SYMBOL(vmcore_add_device_dump);
+#endif /* CONFIG_PROC_VMCORE_DEVICE_DUMP */
+
+/* Free all dumps in vmcore device dump list */
+static void vmcore_free_device_dumps(void)
+{
+#ifdef CONFIG_PROC_VMCORE_DEVICE_DUMP
+ mutex_lock(&vmcoredd_mutex);
+ while (!list_empty(&vmcoredd_list)) {
+ struct vmcoredd_node *dump;
+
+ dump = list_first_entry(&vmcoredd_list, struct vmcoredd_node,
+ list);
+ list_del(&dump->list);
+ vfree(dump->buf);
+ vfree(dump);
+ }
+ mutex_unlock(&vmcoredd_mutex);
+#endif /* CONFIG_PROC_VMCORE_DEVICE_DUMP */
+}
+
/* Init function for vmcore module. */
static int __init vmcore_init(void)
{
@@ -1192,4 +1545,7 @@
kfree(m);
}
free_elfcorebuf();
+
+ /* clear vmcore device dump list */
+ vmcore_free_device_dumps();
}
diff --git a/include/dt-bindings/net/microchip-lan78xx.h b/include/dt-bindings/net/microchip-lan78xx.h
new file mode 100644
index 0000000..0742ff0
--- /dev/null
+++ b/include/dt-bindings/net/microchip-lan78xx.h
@@ -0,0 +1,21 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _DT_BINDINGS_MICROCHIP_LAN78XX_H
+#define _DT_BINDINGS_MICROCHIP_LAN78XX_H
+
+/* LED modes for LAN7800/LAN7850 embedded PHY */
+
+#define LAN78XX_LINK_ACTIVITY 0
+#define LAN78XX_LINK_1000_ACTIVITY 1
+#define LAN78XX_LINK_100_ACTIVITY 2
+#define LAN78XX_LINK_10_ACTIVITY 3
+#define LAN78XX_LINK_100_1000_ACTIVITY 4
+#define LAN78XX_LINK_10_1000_ACTIVITY 5
+#define LAN78XX_LINK_10_100_ACTIVITY 6
+#define LAN78XX_DUPLEX_COLLISION 8
+#define LAN78XX_COLLISION 9
+#define LAN78XX_ACTIVITY 10
+#define LAN78XX_AUTONEG_FAULT 12
+#define LAN78XX_FORCE_LED_OFF 14
+#define LAN78XX_FORCE_LED_ON 15
+
+#endif
diff --git a/include/linux/avf/virtchnl.h b/include/linux/avf/virtchnl.h
index b0a7f31..212b382 100644
--- a/include/linux/avf/virtchnl.h
+++ b/include/linux/avf/virtchnl.h
@@ -485,7 +485,7 @@
struct virtchnl_rss_lut {
u16 vsi_id;
u16 lut_entries;
- u8 lut[1]; /* RSS lookup table*/
+ u8 lut[1]; /* RSS lookup table */
};
VIRTCHNL_CHECK_STRUCT_LEN(6, virtchnl_rss_lut);
@@ -819,7 +819,7 @@
return VIRTCHNL_ERR_PARAM;
}
/* few more checks */
- if ((valid_len != msglen) || (err_msg_format))
+ if (err_msg_format || valid_len != msglen)
return VIRTCHNL_STATUS_ERR_OPCODE_MISMATCH;
return 0;
diff --git a/include/linux/binfmts.h b/include/linux/binfmts.h
index 4955e08..c05f24f 100644
--- a/include/linux/binfmts.h
+++ b/include/linux/binfmts.h
@@ -150,5 +150,6 @@
const char __user * const __user *,
const char __user * const __user *,
int);
+int do_execve_file(struct file *file, void *__argv, void *__envp);
#endif /* _LINUX_BINFMTS_H */
diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 469b20e..bbe2974 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -22,6 +22,8 @@
struct bpf_prog;
struct bpf_map;
struct sock;
+struct seq_file;
+struct btf;
/* map is generic key/value storage optionally accesible by eBPF programs */
struct bpf_map_ops {
@@ -44,10 +46,14 @@
void (*map_fd_put_ptr)(void *ptr);
u32 (*map_gen_lookup)(struct bpf_map *map, struct bpf_insn *insn_buf);
u32 (*map_fd_sys_lookup_elem)(void *ptr);
+ void (*map_seq_show_elem)(struct bpf_map *map, void *key,
+ struct seq_file *m);
+ int (*map_check_btf)(const struct bpf_map *map, const struct btf *btf,
+ u32 key_type_id, u32 value_type_id);
};
struct bpf_map {
- /* 1st cacheline with read-mostly members of which some
+ /* The first two cachelines with read-mostly members of which some
* are also accessed in fast-path (e.g. ops, max_entries).
*/
const struct bpf_map_ops *ops ____cacheline_aligned;
@@ -63,10 +69,13 @@
u32 pages;
u32 id;
int numa_node;
+ u32 btf_key_type_id;
+ u32 btf_value_type_id;
+ struct btf *btf;
bool unpriv_array;
- /* 7 bytes hole */
+ /* 55 bytes hole */
- /* 2nd cacheline with misc members to avoid false sharing
+ /* The 3rd and 4th cacheline with misc members to avoid false sharing
* particularly with refcounting.
*/
struct user_struct *user ____cacheline_aligned;
@@ -101,6 +110,16 @@
return container_of(map, struct bpf_offloaded_map, map);
}
+static inline bool bpf_map_offload_neutral(const struct bpf_map *map)
+{
+ return map->map_type == BPF_MAP_TYPE_PERF_EVENT_ARRAY;
+}
+
+static inline bool bpf_map_support_seq_show(const struct bpf_map *map)
+{
+ return map->ops->map_seq_show_elem && map->ops->map_check_btf;
+}
+
extern const struct bpf_map_ops bpf_map_offload_ops;
/* function argument constraints */
@@ -221,6 +240,8 @@
struct bpf_insn_access_aux *info);
int (*gen_prologue)(struct bpf_insn *insn, bool direct_write,
const struct bpf_prog *prog);
+ int (*gen_ld_abs)(const struct bpf_insn *orig,
+ struct bpf_insn *insn_buf);
u32 (*convert_ctx_access)(enum bpf_access_type type,
const struct bpf_insn *src,
struct bpf_insn *dst,
@@ -442,6 +463,8 @@
int bpf_fd_htab_map_lookup_elem(struct bpf_map *map, void *key, u32 *value);
int bpf_get_file_flag(int flags);
+int bpf_check_uarg_tail_zero(void __user *uaddr, size_t expected_size,
+ size_t actual_size);
/* memcpy that is used with 8-byte aligned pointers, power-of-8 size and
* forced to use 'long' read/writes to try to atomically copy long counters.
@@ -464,14 +487,17 @@
void bpf_patch_call_args(struct bpf_insn *insn, u32 stack_depth);
/* Map specifics */
-struct net_device *__dev_map_lookup_elem(struct bpf_map *map, u32 key);
+struct xdp_buff;
+
+struct bpf_dtab_netdev *__dev_map_lookup_elem(struct bpf_map *map, u32 key);
void __dev_map_insert_ctx(struct bpf_map *map, u32 index);
void __dev_map_flush(struct bpf_map *map);
+int dev_map_enqueue(struct bpf_dtab_netdev *dst, struct xdp_buff *xdp,
+ struct net_device *dev_rx);
struct bpf_cpu_map_entry *__cpu_map_lookup_elem(struct bpf_map *map, u32 key);
void __cpu_map_insert_ctx(struct bpf_map *map, u32 index);
void __cpu_map_flush(struct bpf_map *map);
-struct xdp_buff;
int cpu_map_enqueue(struct bpf_cpu_map_entry *rcpu, struct xdp_buff *xdp,
struct net_device *dev_rx);
@@ -550,6 +576,16 @@
{
}
+struct xdp_buff;
+struct bpf_dtab_netdev;
+
+static inline
+int dev_map_enqueue(struct bpf_dtab_netdev *dst, struct xdp_buff *xdp,
+ struct net_device *dev_rx)
+{
+ return 0;
+}
+
static inline
struct bpf_cpu_map_entry *__cpu_map_lookup_elem(struct bpf_map *map, u32 key)
{
@@ -564,7 +600,6 @@
{
}
-struct xdp_buff;
static inline int cpu_map_enqueue(struct bpf_cpu_map_entry *rcpu,
struct xdp_buff *xdp,
struct net_device *dev_rx)
@@ -606,7 +641,7 @@
#if defined(CONFIG_NET) && defined(CONFIG_BPF_SYSCALL)
int bpf_prog_offload_init(struct bpf_prog *prog, union bpf_attr *attr);
-static inline bool bpf_prog_is_dev_bound(struct bpf_prog_aux *aux)
+static inline bool bpf_prog_is_dev_bound(const struct bpf_prog_aux *aux)
{
return aux->offload_requested;
}
@@ -647,6 +682,7 @@
#if defined(CONFIG_STREAM_PARSER) && defined(CONFIG_BPF_SYSCALL) && defined(CONFIG_INET)
struct sock *__sock_map_lookup_elem(struct bpf_map *map, u32 key);
+struct sock *__sock_hash_lookup_elem(struct bpf_map *map, void *key);
int sock_map_prog(struct bpf_map *map, struct bpf_prog *prog, u32 type);
#else
static inline struct sock *__sock_map_lookup_elem(struct bpf_map *map, u32 key)
@@ -654,6 +690,12 @@
return NULL;
}
+static inline struct sock *__sock_hash_lookup_elem(struct bpf_map *map,
+ void *key)
+{
+ return NULL;
+}
+
static inline int sock_map_prog(struct bpf_map *map,
struct bpf_prog *prog,
u32 type)
@@ -662,6 +704,31 @@
}
#endif
+#if defined(CONFIG_XDP_SOCKETS)
+struct xdp_sock;
+struct xdp_sock *__xsk_map_lookup_elem(struct bpf_map *map, u32 key);
+int __xsk_map_redirect(struct bpf_map *map, struct xdp_buff *xdp,
+ struct xdp_sock *xs);
+void __xsk_map_flush(struct bpf_map *map);
+#else
+struct xdp_sock;
+static inline struct xdp_sock *__xsk_map_lookup_elem(struct bpf_map *map,
+ u32 key)
+{
+ return NULL;
+}
+
+static inline int __xsk_map_redirect(struct bpf_map *map, struct xdp_buff *xdp,
+ struct xdp_sock *xs)
+{
+ return -EOPNOTSUPP;
+}
+
+static inline void __xsk_map_flush(struct bpf_map *map)
+{
+}
+#endif
+
/* verifier prototypes for helper functions called from eBPF programs */
extern const struct bpf_func_proto bpf_map_lookup_elem_proto;
extern const struct bpf_func_proto bpf_map_update_elem_proto;
@@ -675,10 +742,10 @@
extern const struct bpf_func_proto bpf_get_current_pid_tgid_proto;
extern const struct bpf_func_proto bpf_get_current_uid_gid_proto;
extern const struct bpf_func_proto bpf_get_current_comm_proto;
-extern const struct bpf_func_proto bpf_skb_vlan_push_proto;
-extern const struct bpf_func_proto bpf_skb_vlan_pop_proto;
extern const struct bpf_func_proto bpf_get_stackid_proto;
+extern const struct bpf_func_proto bpf_get_stack_proto;
extern const struct bpf_func_proto bpf_sock_map_update_proto;
+extern const struct bpf_func_proto bpf_sock_hash_update_proto;
/* Shared helpers among cBPF and eBPF. */
void bpf_user_rnd_init_once(void);
diff --git a/include/linux/bpf_trace.h b/include/linux/bpf_trace.h
index e6fe98a..ddf896a 100644
--- a/include/linux/bpf_trace.h
+++ b/include/linux/bpf_trace.h
@@ -2,7 +2,6 @@
#ifndef __LINUX_BPF_TRACE_H__
#define __LINUX_BPF_TRACE_H__
-#include <trace/events/bpf.h>
#include <trace/events/xdp.h>
#endif /* __LINUX_BPF_TRACE_H__ */
diff --git a/include/linux/bpf_types.h b/include/linux/bpf_types.h
index 2b28fcf..b161e50 100644
--- a/include/linux/bpf_types.h
+++ b/include/linux/bpf_types.h
@@ -9,9 +9,10 @@
BPF_PROG_TYPE(BPF_PROG_TYPE_CGROUP_SKB, cg_skb)
BPF_PROG_TYPE(BPF_PROG_TYPE_CGROUP_SOCK, cg_sock)
BPF_PROG_TYPE(BPF_PROG_TYPE_CGROUP_SOCK_ADDR, cg_sock_addr)
-BPF_PROG_TYPE(BPF_PROG_TYPE_LWT_IN, lwt_inout)
-BPF_PROG_TYPE(BPF_PROG_TYPE_LWT_OUT, lwt_inout)
+BPF_PROG_TYPE(BPF_PROG_TYPE_LWT_IN, lwt_in)
+BPF_PROG_TYPE(BPF_PROG_TYPE_LWT_OUT, lwt_out)
BPF_PROG_TYPE(BPF_PROG_TYPE_LWT_XMIT, lwt_xmit)
+BPF_PROG_TYPE(BPF_PROG_TYPE_LWT_SEG6LOCAL, lwt_seg6local)
BPF_PROG_TYPE(BPF_PROG_TYPE_SOCK_OPS, sock_ops)
BPF_PROG_TYPE(BPF_PROG_TYPE_SK_SKB, sk_skb)
BPF_PROG_TYPE(BPF_PROG_TYPE_SK_MSG, sk_msg)
@@ -47,6 +48,10 @@
BPF_MAP_TYPE(BPF_MAP_TYPE_DEVMAP, dev_map_ops)
#if defined(CONFIG_STREAM_PARSER) && defined(CONFIG_INET)
BPF_MAP_TYPE(BPF_MAP_TYPE_SOCKMAP, sock_map_ops)
+BPF_MAP_TYPE(BPF_MAP_TYPE_SOCKHASH, sock_hash_ops)
#endif
BPF_MAP_TYPE(BPF_MAP_TYPE_CPUMAP, cpu_map_ops)
+#if defined(CONFIG_XDP_SOCKETS)
+BPF_MAP_TYPE(BPF_MAP_TYPE_XSKMAP, xsk_map_ops)
+#endif
#endif
diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
index df36b1b..38b04f5 100644
--- a/include/linux/bpf_verifier.h
+++ b/include/linux/bpf_verifier.h
@@ -174,6 +174,11 @@
#define BPF_MAX_SUBPROGS 256
+struct bpf_subprog_info {
+ u32 start; /* insn idx of function entry point */
+ u16 stack_depth; /* max. stack depth used by this function */
+};
+
/* single container for all structs
* one verifier_env per bpf_check() call
*/
@@ -192,14 +197,12 @@
bool seen_direct_write;
struct bpf_insn_aux_data *insn_aux_data; /* array of per-insn state */
struct bpf_verifier_log log;
- u32 subprog_starts[BPF_MAX_SUBPROGS];
- /* computes the stack depth of each bpf function */
- u16 subprog_stack_depth[BPF_MAX_SUBPROGS + 1];
+ struct bpf_subprog_info subprog_info[BPF_MAX_SUBPROGS + 1];
u32 subprog_cnt;
};
-void bpf_verifier_vlog(struct bpf_verifier_log *log, const char *fmt,
- va_list args);
+__printf(2, 0) void bpf_verifier_vlog(struct bpf_verifier_log *log,
+ const char *fmt, va_list args);
__printf(2, 3) void bpf_verifier_log_write(struct bpf_verifier_env *env,
const char *fmt, ...);
diff --git a/include/linux/bpfilter.h b/include/linux/bpfilter.h
new file mode 100644
index 0000000..687b176
--- /dev/null
+++ b/include/linux/bpfilter.h
@@ -0,0 +1,15 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _LINUX_BPFILTER_H
+#define _LINUX_BPFILTER_H
+
+#include <uapi/linux/bpfilter.h>
+
+struct sock;
+int bpfilter_ip_set_sockopt(struct sock *sk, int optname, char *optval,
+ unsigned int optlen);
+int bpfilter_ip_get_sockopt(struct sock *sk, int optname, char *optval,
+ int *optlen);
+extern int (*bpfilter_process_sockopt)(struct sock *sk, int optname,
+ char __user *optval,
+ unsigned int optlen, bool is_set);
+#endif
diff --git a/include/linux/btf.h b/include/linux/btf.h
new file mode 100644
index 0000000..e076c46
--- /dev/null
+++ b/include/linux/btf.h
@@ -0,0 +1,50 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/* Copyright (c) 2018 Facebook */
+
+#ifndef _LINUX_BTF_H
+#define _LINUX_BTF_H 1
+
+#include <linux/types.h>
+
+struct btf;
+struct btf_type;
+union bpf_attr;
+
+extern const struct file_operations btf_fops;
+
+void btf_put(struct btf *btf);
+int btf_new_fd(const union bpf_attr *attr);
+struct btf *btf_get_by_fd(int fd);
+int btf_get_info_by_fd(const struct btf *btf,
+ const union bpf_attr *attr,
+ union bpf_attr __user *uattr);
+/* Figure out the size of a type_id. If type_id is a modifier
+ * (e.g. const), it will be resolved to find out the type with size.
+ *
+ * For example:
+ * In describing "const void *", type_id is "const" and "const"
+ * refers to "void *". The return type will be "void *".
+ *
+ * If type_id is a simple "int", then return type will be "int".
+ *
+ * @btf: struct btf object
+ * @type_id: Find out the size of type_id. The type_id of the return
+ * type is set to *type_id.
+ * @ret_size: It can be NULL. If not NULL, the size of the return
+ * type is set to *ret_size.
+ * Return: The btf_type (resolved to another type with size info if needed).
+ * NULL is returned if type_id itself does not have size info
+ * (e.g. void) or it cannot be resolved to another type that
+ * has size info.
+ * *type_id and *ret_size will not be changed in the
+ * NULL return case.
+ */
+const struct btf_type *btf_type_id_size(const struct btf *btf,
+ u32 *type_id,
+ u32 *ret_size);
+void btf_type_seq_show(const struct btf *btf, u32 type_id, void *obj,
+ struct seq_file *m);
+int btf_get_fd_by_id(u32 id);
+u32 btf_id(const struct btf *btf);
+
+#endif
diff --git a/include/linux/crash_dump.h b/include/linux/crash_dump.h
index f7ac2aa..3e4ba9d 100644
--- a/include/linux/crash_dump.h
+++ b/include/linux/crash_dump.h
@@ -5,6 +5,7 @@
#include <linux/kexec.h>
#include <linux/proc_fs.h>
#include <linux/elf.h>
+#include <uapi/linux/vmcore.h>
#include <asm/pgtable.h> /* for pgprot_t */
@@ -93,4 +94,21 @@
#endif /* CONFIG_CRASH_DUMP */
extern unsigned long saved_max_pfn;
+
+/* Device Dump information to be filled by drivers */
+struct vmcoredd_data {
+ char dump_name[VMCOREDD_MAX_NAME_BYTES]; /* Unique name of the dump */
+ unsigned int size; /* Size of the dump */
+ /* Driver's registered callback to be invoked to collect dump */
+ int (*vmcoredd_callback)(struct vmcoredd_data *data, void *buf);
+};
+
+#ifdef CONFIG_PROC_VMCORE_DEVICE_DUMP
+int vmcore_add_device_dump(struct vmcoredd_data *data);
+#else
+static inline int vmcore_add_device_dump(struct vmcoredd_data *data)
+{
+ return -EOPNOTSUPP;
+}
+#endif /* CONFIG_PROC_VMCORE_DEVICE_DUMP */
#endif /* LINUX_CRASHDUMP_H */
diff --git a/include/linux/ethtool.h b/include/linux/ethtool.h
index b32cd20..f8a2245 100644
--- a/include/linux/ethtool.h
+++ b/include/linux/ethtool.h
@@ -312,6 +312,9 @@
* by kernel. Returns a negative error code or zero.
* @get_fecparam: Get the network device Forward Error Correction parameters.
* @set_fecparam: Set the network device Forward Error Correction parameters.
+ * @get_ethtool_phy_stats: Return extended statistics about the PHY device.
+ * This is only useful if the device maintains PHY statistics and
+ * cannot use the standard PHY library helpers.
*
* All operations are optional (i.e. the function pointer may be set
* to %NULL) and callers must take this into account. Callers must
@@ -407,5 +410,7 @@
struct ethtool_fecparam *);
int (*set_fecparam)(struct net_device *,
struct ethtool_fecparam *);
+ void (*get_ethtool_phy_stats)(struct net_device *,
+ struct ethtool_stats *, u64 *);
};
#endif /* _LINUX_ETHTOOL_H */
diff --git a/include/linux/filter.h b/include/linux/filter.h
index fc4e8f9..d358d18 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -30,6 +30,7 @@
struct seccomp_data;
struct bpf_prog_aux;
struct xdp_rxq_info;
+struct xdp_buff;
/* ArgX, context and stack frame pointer register positions. Note,
* Arg1, Arg2, Arg3, etc are used as argument mappings of function
@@ -46,7 +47,9 @@
/* Additional register mappings for converted user programs. */
#define BPF_REG_A BPF_REG_0
#define BPF_REG_X BPF_REG_7
-#define BPF_REG_TMP BPF_REG_8
+#define BPF_REG_TMP BPF_REG_2 /* scratch reg */
+#define BPF_REG_D BPF_REG_8 /* data, callee-saved */
+#define BPF_REG_H BPF_REG_9 /* hlen, callee-saved */
/* Kernel hidden auxiliary/helper register for hardening step.
* Only used by eBPF JITs. It's nothing more than a temporary
@@ -467,7 +470,8 @@
dst_needed:1, /* Do we need dst entry? */
blinded:1, /* Was blinded */
is_func:1, /* program is a bpf function */
- kprobe_override:1; /* Do we override a kprobe? */
+ kprobe_override:1, /* Do we override a kprobe? */
+ has_callchain_buf:1; /* callchain buffer allocated? */
enum bpf_prog_type type; /* Type of BPF program */
enum bpf_attach_type expected_attach_type; /* For some prog types */
u32 len; /* Number of filter blocks */
@@ -500,14 +504,6 @@
void *data_end;
};
-struct xdp_buff {
- void *data;
- void *data_end;
- void *data_meta;
- void *data_hard_start;
- struct xdp_rxq_info *rxq;
-};
-
struct sk_msg_buff {
void *data;
void *data_end;
@@ -519,9 +515,9 @@
int sg_end;
struct scatterlist sg_data[MAX_SKB_FRAGS];
bool sg_copy[MAX_SKB_FRAGS];
- __u32 key;
__u32 flags;
- struct bpf_map *map;
+ struct sock *sk_redir;
+ struct sock *sk;
struct sk_buff *skb;
struct list_head list;
};
@@ -766,27 +762,12 @@
* This does not appear to be a real limitation for existing software.
*/
int xdp_do_generic_redirect(struct net_device *dev, struct sk_buff *skb,
- struct bpf_prog *prog);
+ struct xdp_buff *xdp, struct bpf_prog *prog);
int xdp_do_redirect(struct net_device *dev,
struct xdp_buff *xdp,
struct bpf_prog *prog);
void xdp_do_flush_map(void);
-/* Drivers not supporting XDP metadata can use this helper, which
- * rejects any room expansion for metadata as a result.
- */
-static __always_inline void
-xdp_set_data_meta_invalid(struct xdp_buff *xdp)
-{
- xdp->data_meta = xdp->data + 1;
-}
-
-static __always_inline bool
-xdp_data_meta_unsupported(const struct xdp_buff *xdp)
-{
- return unlikely(xdp->data_meta > xdp->data);
-}
-
void bpf_warn_invalid_xdp_action(u32 act);
struct sock *do_sk_redirect_map(struct sk_buff *skb);
diff --git a/include/linux/if_bridge.h b/include/linux/if_bridge.h
index 02639eb..7843b98 100644
--- a/include/linux/if_bridge.h
+++ b/include/linux/if_bridge.h
@@ -50,6 +50,7 @@
#define BR_VLAN_TUNNEL BIT(13)
#define BR_BCAST_FLOOD BIT(14)
#define BR_NEIGH_SUPPRESS BIT(15)
+#define BR_ISOLATED BIT(16)
#define BR_DEFAULT_AGEING_TIME (300 * HZ)
@@ -93,11 +94,39 @@
#if IS_ENABLED(CONFIG_BRIDGE) && IS_ENABLED(CONFIG_BRIDGE_VLAN_FILTERING)
bool br_vlan_enabled(const struct net_device *dev);
+int br_vlan_get_pvid(const struct net_device *dev, u16 *p_pvid);
+int br_vlan_get_info(const struct net_device *dev, u16 vid,
+ struct bridge_vlan_info *p_vinfo);
#else
static inline bool br_vlan_enabled(const struct net_device *dev)
{
return false;
}
+
+static inline int br_vlan_get_pvid(const struct net_device *dev, u16 *p_pvid)
+{
+ return -1;
+}
+
+static inline int br_vlan_get_info(const struct net_device *dev, u16 vid,
+ struct bridge_vlan_info *p_vinfo)
+{
+ return -1;
+}
+#endif
+
+#if IS_ENABLED(CONFIG_BRIDGE)
+struct net_device *br_fdb_find_port(const struct net_device *br_dev,
+ const unsigned char *addr,
+ __u16 vid);
+#else
+static inline struct net_device *
+br_fdb_find_port(const struct net_device *br_dev,
+ const unsigned char *addr,
+ __u16 vid)
+{
+ return NULL;
+}
#endif
#endif
diff --git a/include/linux/if_macvlan.h b/include/linux/if_macvlan.h
index 4cb7aee..2e55e4c 100644
--- a/include/linux/if_macvlan.h
+++ b/include/linux/if_macvlan.h
@@ -21,7 +21,7 @@
struct hlist_node hlist;
struct macvlan_port *port;
struct net_device *lowerdev;
- void *fwd_priv;
+ void *accel_priv;
struct vlan_pcpu_stats __percpu *pcpu_stats;
DECLARE_BITMAP(mc_filter, MACVLAN_MC_FILTER_SZ);
@@ -61,10 +61,6 @@
struct nlattr *tb[], struct nlattr *data[],
struct netlink_ext_ack *extack);
-extern void macvlan_count_rx(const struct macvlan_dev *vlan,
- unsigned int len, bool success,
- bool multicast);
-
extern void macvlan_dellink(struct net_device *dev, struct list_head *head);
extern int macvlan_link_register(struct rtnl_link_ops *ops);
@@ -86,4 +82,27 @@
}
#endif
+static inline void *macvlan_accel_priv(struct net_device *dev)
+{
+ struct macvlan_dev *macvlan = netdev_priv(dev);
+
+ return macvlan->accel_priv;
+}
+
+static inline bool macvlan_supports_dest_filter(struct net_device *dev)
+{
+ struct macvlan_dev *macvlan = netdev_priv(dev);
+
+ return macvlan->mode == MACVLAN_MODE_PRIVATE ||
+ macvlan->mode == MACVLAN_MODE_VEPA ||
+ macvlan->mode == MACVLAN_MODE_BRIDGE;
+}
+
+static inline int macvlan_release_l2fw_offload(struct net_device *dev)
+{
+ struct macvlan_dev *macvlan = netdev_priv(dev);
+
+ macvlan->accel_priv = NULL;
+ return dev_uc_add(macvlan->lowerdev, dev->dev_addr);
+}
#endif /* _LINUX_IF_MACVLAN_H */
diff --git a/include/linux/if_tun.h b/include/linux/if_tun.h
index fd00170..3d2996d 100644
--- a/include/linux/if_tun.h
+++ b/include/linux/if_tun.h
@@ -22,7 +22,7 @@
#if defined(CONFIG_TUN) || defined(CONFIG_TUN_MODULE)
struct socket *tun_get_socket(struct file *);
struct ptr_ring *tun_get_tx_ring(struct file *file);
-bool tun_is_xdp_buff(void *ptr);
+bool tun_is_xdp_frame(void *ptr);
void *tun_xdp_to_ptr(void *ptr);
void *tun_ptr_to_xdp(void *ptr);
void tun_ptr_free(void *ptr);
@@ -39,7 +39,7 @@
{
return ERR_PTR(-EINVAL);
}
-static inline bool tun_is_xdp_buff(void *ptr)
+static inline bool tun_is_xdp_frame(void *ptr)
{
return false;
}
diff --git a/include/linux/kcore.h b/include/linux/kcore.h
index 80db19d..8de55e4 100644
--- a/include/linux/kcore.h
+++ b/include/linux/kcore.h
@@ -28,6 +28,12 @@
loff_t offset;
};
+struct vmcoredd_node {
+ struct list_head list; /* List of dumps */
+ void *buf; /* Buffer containing device's dump */
+ unsigned int size; /* Size of the buffer */
+};
+
#ifdef CONFIG_PROC_KCORE
extern void kclist_add(struct kcore_list *, void *, size_t, int type);
#else
diff --git a/include/linux/mdio-bitbang.h b/include/linux/mdio-bitbang.h
index a8ac9cf..5d71e8a 100644
--- a/include/linux/mdio-bitbang.h
+++ b/include/linux/mdio-bitbang.h
@@ -33,8 +33,6 @@
struct mdiobb_ctrl {
const struct mdiobb_ops *ops;
- /* reset callback */
- int (*reset)(struct mii_bus *bus);
};
/* The returned bus is not yet registered with the phy layer. */
diff --git a/include/linux/mdio-gpio.h b/include/linux/mdio-gpio.h
new file mode 100644
index 0000000..cea443a6
--- /dev/null
+++ b/include/linux/mdio-gpio.h
@@ -0,0 +1,9 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef __LINUX_MDIO_GPIO_H
+#define __LINUX_MDIO_GPIO_H
+
+#define MDIO_GPIO_MDC 0
+#define MDIO_GPIO_MDIO 1
+#define MDIO_GPIO_MDO 2
+
+#endif
diff --git a/include/linux/microchipphy.h b/include/linux/microchipphy.h
index 8f9c903..8c40128 100644
--- a/include/linux/microchipphy.h
+++ b/include/linux/microchipphy.h
@@ -70,6 +70,9 @@
#define LAN88XX_MMD3_CHIP_ID (32877)
#define LAN88XX_MMD3_CHIP_REV (32878)
+/* Registers specific to the LAN7800/LAN7850 embedded phy */
+#define LAN78XX_PHY_LED_MODE_SELECT (0x1D)
+
/* DSP registers */
#define PHY_ARDENNES_MMD_DEV_3_PHY_CFG (0x806A)
#define PHY_ARDENNES_MMD_DEV_3_PHY_CFG_ZD_DLY_EN_ (0x2000)
diff --git a/include/linux/mlx4/device.h b/include/linux/mlx4/device.h
index 81d0799..122e7e9 100644
--- a/include/linux/mlx4/device.h
+++ b/include/linux/mlx4/device.h
@@ -225,6 +225,7 @@
MLX4_DEV_CAP_FLAG2_SVLAN_BY_QP = 1ULL << 36,
MLX4_DEV_CAP_FLAG2_SL_TO_VL_CHANGE_EVENT = 1ULL << 37,
MLX4_DEV_CAP_FLAG2_USER_MAC_EN = 1ULL << 38,
+ MLX4_DEV_CAP_FLAG2_DRIVER_VERSION_TO_FW = 1ULL << 39,
};
enum {
diff --git a/include/linux/mlx5/device.h b/include/linux/mlx5/device.h
index 2bc27f8..db0332a 100644
--- a/include/linux/mlx5/device.h
+++ b/include/linux/mlx5/device.h
@@ -1152,6 +1152,9 @@
#define MLX5_CAP_PCAM_FEATURE(mdev, fld) \
MLX5_GET(pcam_reg, (mdev)->caps.pcam, feature_cap_mask.enhanced_features.fld)
+#define MLX5_CAP_PCAM_REG(mdev, reg) \
+ MLX5_GET(pcam_reg, (mdev)->caps.pcam, port_access_reg_cap_mask.regs_5000_to_507f.reg)
+
#define MLX5_CAP_MCAM_REG(mdev, reg) \
MLX5_GET(mcam_reg, (mdev)->caps.mcam, mng_access_reg_cap_mask.access_regs.reg)
diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h
index d703774..92d2924 100644
--- a/include/linux/mlx5/driver.h
+++ b/include/linux/mlx5/driver.h
@@ -124,6 +124,8 @@
MLX5_REG_PAOS = 0x5006,
MLX5_REG_PFCC = 0x5007,
MLX5_REG_PPCNT = 0x5008,
+ MLX5_REG_PPTB = 0x500b,
+ MLX5_REG_PBMC = 0x500c,
MLX5_REG_PMAOS = 0x5012,
MLX5_REG_PUDE = 0x5009,
MLX5_REG_PMPE = 0x5010,
diff --git a/include/linux/mlx5/fs.h b/include/linux/mlx5/fs.h
index 47aecc4..9f4d32e 100644
--- a/include/linux/mlx5/fs.h
+++ b/include/linux/mlx5/fs.h
@@ -90,8 +90,12 @@
union {
u32 tir_num;
struct mlx5_flow_table *ft;
- u32 vport_num;
struct mlx5_fc *counter;
+ struct {
+ u16 num;
+ u16 vhca_id;
+ bool vhca_id_valid;
+ } vport;
};
};
diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h
index 1aad455..edbddea 100644
--- a/include/linux/mlx5/mlx5_ifc.h
+++ b/include/linux/mlx5/mlx5_ifc.h
@@ -356,22 +356,6 @@
u8 reserved_at_6[0x1a];
};
-struct mlx5_ifc_ipv4_layout_bits {
- u8 reserved_at_0[0x60];
-
- u8 ipv4[0x20];
-};
-
-struct mlx5_ifc_ipv6_layout_bits {
- u8 ipv6[16][0x8];
-};
-
-union mlx5_ifc_ipv6_layout_ipv4_layout_auto_bits {
- struct mlx5_ifc_ipv6_layout_bits ipv6_layout;
- struct mlx5_ifc_ipv4_layout_bits ipv4_layout;
- u8 reserved_at_0[0x80];
-};
-
struct mlx5_ifc_fte_match_set_lyr_2_4_bits {
u8 smac_47_16[0x20];
@@ -412,7 +396,7 @@
u8 reserved_at_0[0x8];
u8 source_sqn[0x18];
- u8 reserved_at_20[0x10];
+ u8 source_eswitch_owner_vhca_id[0x10];
u8 source_port[0x10];
u8 outer_second_prio[0x3];
@@ -557,7 +541,8 @@
u8 vport_svlan_insert[0x1];
u8 vport_cvlan_insert_if_not_exist[0x1];
u8 vport_cvlan_insert_overwrite[0x1];
- u8 reserved_at_5[0x19];
+ u8 reserved_at_5[0x18];
+ u8 merged_eswitch[0x1];
u8 nic_vport_node_guid_modify[0x1];
u8 nic_vport_port_guid_modify[0x1];
@@ -1147,8 +1132,9 @@
struct mlx5_ifc_dest_format_struct_bits {
u8 destination_type[0x8];
u8 destination_id[0x18];
-
- u8 reserved_at_20[0x20];
+ u8 destination_eswitch_owner_vhca_id_valid[0x1];
+ u8 reserved_at_21[0xf];
+ u8 destination_eswitch_owner_vhca_id[0x10];
};
struct mlx5_ifc_flow_counter_list_bits {
@@ -6993,7 +6979,9 @@
u8 reserved_at_a0[0x8];
u8 table_id[0x18];
- u8 reserved_at_c0[0x20];
+ u8 source_eswitch_owner_vhca_id_valid[0x1];
+
+ u8 reserved_at_c1[0x1f];
u8 start_flow_index[0x20];
@@ -8015,6 +8003,17 @@
u8 ppcnt_statistical_group[0x1];
};
+struct mlx5_ifc_pcam_regs_5000_to_507f_bits {
+ u8 port_access_reg_cap_mask_127_to_96[0x20];
+ u8 port_access_reg_cap_mask_95_to_64[0x20];
+ u8 port_access_reg_cap_mask_63_to_32[0x20];
+
+ u8 port_access_reg_cap_mask_31_to_13[0x13];
+ u8 pbmc[0x1];
+ u8 pptb[0x1];
+ u8 port_access_reg_cap_mask_10_to_0[0xb];
+};
+
struct mlx5_ifc_pcam_reg_bits {
u8 reserved_at_0[0x8];
u8 feature_group[0x8];
@@ -8024,6 +8023,7 @@
u8 reserved_at_20[0x20];
union {
+ struct mlx5_ifc_pcam_regs_5000_to_507f_bits regs_5000_to_507f;
u8 reserved_at_0[0x80];
} port_access_reg_cap_mask;
@@ -8788,6 +8788,41 @@
u8 trust_state[0x3];
};
+struct mlx5_ifc_pptb_reg_bits {
+ u8 reserved_at_0[0x2];
+ u8 mm[0x2];
+ u8 reserved_at_4[0x4];
+ u8 local_port[0x8];
+ u8 reserved_at_10[0x6];
+ u8 cm[0x1];
+ u8 um[0x1];
+ u8 pm[0x8];
+
+ u8 prio_x_buff[0x20];
+
+ u8 pm_msb[0x8];
+ u8 reserved_at_48[0x10];
+ u8 ctrl_buff[0x4];
+ u8 untagged_buff[0x4];
+};
+
+struct mlx5_ifc_pbmc_reg_bits {
+ u8 reserved_at_0[0x8];
+ u8 local_port[0x8];
+ u8 reserved_at_10[0x10];
+
+ u8 xoff_timer_value[0x10];
+ u8 xoff_refresh[0x10];
+
+ u8 reserved_at_40[0x9];
+ u8 fullness_threshold[0x7];
+ u8 port_buffer_size[0x10];
+
+ struct mlx5_ifc_bufferx_reg_bits buffer[10];
+
+ u8 reserved_at_2e0[0x40];
+};
+
struct mlx5_ifc_qtct_reg_bits {
u8 reserved_at_0[0x8];
u8 port_number[0x8];
diff --git a/include/linux/mlx5/mlx5_ifc_fpga.h b/include/linux/mlx5/mlx5_ifc_fpga.h
index ec05249..1930915 100644
--- a/include/linux/mlx5/mlx5_ifc_fpga.h
+++ b/include/linux/mlx5/mlx5_ifc_fpga.h
@@ -32,12 +32,29 @@
#ifndef MLX5_IFC_FPGA_H
#define MLX5_IFC_FPGA_H
+struct mlx5_ifc_ipv4_layout_bits {
+ u8 reserved_at_0[0x60];
+
+ u8 ipv4[0x20];
+};
+
+struct mlx5_ifc_ipv6_layout_bits {
+ u8 ipv6[16][0x8];
+};
+
+union mlx5_ifc_ipv6_layout_ipv4_layout_auto_bits {
+ struct mlx5_ifc_ipv6_layout_bits ipv6_layout;
+ struct mlx5_ifc_ipv4_layout_bits ipv4_layout;
+ u8 reserved_at_0[0x80];
+};
+
enum {
MLX5_FPGA_CAP_SANDBOX_VENDOR_ID_MLNX = 0x2c9,
};
enum {
MLX5_FPGA_CAP_SANDBOX_PRODUCT_ID_IPSEC = 0x2,
+ MLX5_FPGA_CAP_SANDBOX_PRODUCT_ID_TLS = 0x3,
};
struct mlx5_ifc_fpga_shell_caps_bits {
@@ -370,6 +387,27 @@
u8 reserved_at_40[0x40];
};
+struct mlx5_ifc_tls_extended_cap_bits {
+ u8 aes_gcm_128[0x1];
+ u8 aes_gcm_256[0x1];
+ u8 reserved_at_2[0x1e];
+ u8 reserved_at_20[0x20];
+ u8 context_capacity_total[0x20];
+ u8 context_capacity_rx[0x20];
+ u8 context_capacity_tx[0x20];
+ u8 reserved_at_a0[0x10];
+ u8 tls_counter_size[0x10];
+ u8 tls_counters_addr_low[0x20];
+ u8 tls_counters_addr_high[0x20];
+ u8 rx[0x1];
+ u8 tx[0x1];
+ u8 tls_v12[0x1];
+ u8 tls_v13[0x1];
+ u8 lro[0x1];
+ u8 ipv6[0x1];
+ u8 reserved_at_106[0x1a];
+};
+
struct mlx5_ifc_ipsec_extended_cap_bits {
u8 encapsulation[0x20];
@@ -519,4 +557,43 @@
__be16 reserved2;
} __packed;
+enum fpga_tls_cmds {
+ CMD_SETUP_STREAM = 0x1001,
+ CMD_TEARDOWN_STREAM = 0x1002,
+};
+
+#define MLX5_TLS_1_2 (0)
+
+#define MLX5_TLS_ALG_AES_GCM_128 (0)
+#define MLX5_TLS_ALG_AES_GCM_256 (1)
+
+struct mlx5_ifc_tls_cmd_bits {
+ u8 command_type[0x20];
+ u8 ipv6[0x1];
+ u8 direction_sx[0x1];
+ u8 tls_version[0x2];
+ u8 reserved[0x1c];
+ u8 swid[0x20];
+ u8 src_port[0x10];
+ u8 dst_port[0x10];
+ union mlx5_ifc_ipv6_layout_ipv4_layout_auto_bits src_ipv4_src_ipv6;
+ union mlx5_ifc_ipv6_layout_ipv4_layout_auto_bits dst_ipv4_dst_ipv6;
+ u8 tls_rcd_sn[0x40];
+ u8 tcp_sn[0x20];
+ u8 tls_implicit_iv[0x20];
+ u8 tls_xor_iv[0x40];
+ u8 encryption_key[0x100];
+ u8 alg[4];
+ u8 reserved2[0x1c];
+ u8 reserved3[0x4a0];
+};
+
+struct mlx5_ifc_tls_resp_bits {
+ u8 syndrome[0x20];
+ u8 stream_id[0x20];
+ u8 reserverd[0x40];
+};
+
+#define MLX5_TLS_COMMAND_SIZE (0x100)
+
#endif /* MLX5_IFC_FPGA_H */
diff --git a/include/linux/mmc/sdio_ids.h b/include/linux/mmc/sdio_ids.h
index cdd66a5..0a7abe8 100644
--- a/include/linux/mmc/sdio_ids.h
+++ b/include/linux/mmc/sdio_ids.h
@@ -35,6 +35,7 @@
#define SDIO_DEVICE_ID_BROADCOM_4335_4339 0x4335
#define SDIO_DEVICE_ID_BROADCOM_4339 0x4339
#define SDIO_DEVICE_ID_BROADCOM_43362 0xa962
+#define SDIO_DEVICE_ID_BROADCOM_43364 0xa9a4
#define SDIO_DEVICE_ID_BROADCOM_43430 0xa9a6
#define SDIO_DEVICE_ID_BROADCOM_4345 0x4345
#define SDIO_DEVICE_ID_BROADCOM_43455 0xa9bf
diff --git a/include/linux/net.h b/include/linux/net.h
index 2248a05..6554d3b 100644
--- a/include/linux/net.h
+++ b/include/linux/net.h
@@ -197,6 +197,7 @@
int offset, size_t size, int flags);
int (*sendmsg_locked)(struct sock *sk, struct msghdr *msg,
size_t size);
+ int (*set_rcvlowat)(struct sock *sk, int val);
};
#define DECLARE_SOCKADDR(type, dst, src) \
diff --git a/include/linux/net_dim.h b/include/linux/net_dim.h
index 29ed8fd..db99240 100644
--- a/include/linux/net_dim.h
+++ b/include/linux/net_dim.h
@@ -103,11 +103,12 @@
#define NET_DIM_PARAMS_NUM_PROFILES 5
/* Adaptive moderation profiles */
#define NET_DIM_DEFAULT_RX_CQ_MODERATION_PKTS_FROM_EQE 256
+#define NET_DIM_DEFAULT_TX_CQ_MODERATION_PKTS_FROM_EQE 128
#define NET_DIM_DEF_PROFILE_CQE 1
#define NET_DIM_DEF_PROFILE_EQE 1
/* All profiles sizes must be NET_PARAMS_DIM_NUM_PROFILES */
-#define NET_DIM_EQE_PROFILES { \
+#define NET_DIM_RX_EQE_PROFILES { \
{1, NET_DIM_DEFAULT_RX_CQ_MODERATION_PKTS_FROM_EQE}, \
{8, NET_DIM_DEFAULT_RX_CQ_MODERATION_PKTS_FROM_EQE}, \
{64, NET_DIM_DEFAULT_RX_CQ_MODERATION_PKTS_FROM_EQE}, \
@@ -115,7 +116,7 @@
{256, NET_DIM_DEFAULT_RX_CQ_MODERATION_PKTS_FROM_EQE}, \
}
-#define NET_DIM_CQE_PROFILES { \
+#define NET_DIM_RX_CQE_PROFILES { \
{2, 256}, \
{8, 128}, \
{16, 64}, \
@@ -123,32 +124,68 @@
{64, 64} \
}
+#define NET_DIM_TX_EQE_PROFILES { \
+ {1, NET_DIM_DEFAULT_TX_CQ_MODERATION_PKTS_FROM_EQE}, \
+ {8, NET_DIM_DEFAULT_TX_CQ_MODERATION_PKTS_FROM_EQE}, \
+ {32, NET_DIM_DEFAULT_TX_CQ_MODERATION_PKTS_FROM_EQE}, \
+ {64, NET_DIM_DEFAULT_TX_CQ_MODERATION_PKTS_FROM_EQE}, \
+ {128, NET_DIM_DEFAULT_TX_CQ_MODERATION_PKTS_FROM_EQE} \
+}
+
+#define NET_DIM_TX_CQE_PROFILES { \
+ {5, 128}, \
+ {8, 64}, \
+ {16, 32}, \
+ {32, 32}, \
+ {64, 32} \
+}
+
static const struct net_dim_cq_moder
-profile[NET_DIM_CQ_PERIOD_NUM_MODES][NET_DIM_PARAMS_NUM_PROFILES] = {
- NET_DIM_EQE_PROFILES,
- NET_DIM_CQE_PROFILES,
+rx_profile[NET_DIM_CQ_PERIOD_NUM_MODES][NET_DIM_PARAMS_NUM_PROFILES] = {
+ NET_DIM_RX_EQE_PROFILES,
+ NET_DIM_RX_CQE_PROFILES,
};
-static inline struct net_dim_cq_moder net_dim_get_profile(u8 cq_period_mode,
- int ix)
-{
- struct net_dim_cq_moder cq_moder;
+static const struct net_dim_cq_moder
+tx_profile[NET_DIM_CQ_PERIOD_NUM_MODES][NET_DIM_PARAMS_NUM_PROFILES] = {
+ NET_DIM_TX_EQE_PROFILES,
+ NET_DIM_TX_CQE_PROFILES,
+};
- cq_moder = profile[cq_period_mode][ix];
+static inline struct net_dim_cq_moder
+net_dim_get_rx_moderation(u8 cq_period_mode, int ix)
+{
+ struct net_dim_cq_moder cq_moder = rx_profile[cq_period_mode][ix];
+
cq_moder.cq_period_mode = cq_period_mode;
return cq_moder;
}
-static inline struct net_dim_cq_moder net_dim_get_def_profile(u8 rx_cq_period_mode)
+static inline struct net_dim_cq_moder
+net_dim_get_def_rx_moderation(u8 cq_period_mode)
{
- int default_profile_ix;
+ u8 profile_ix = cq_period_mode == NET_DIM_CQ_PERIOD_MODE_START_FROM_CQE ?
+ NET_DIM_DEF_PROFILE_CQE : NET_DIM_DEF_PROFILE_EQE;
- if (rx_cq_period_mode == NET_DIM_CQ_PERIOD_MODE_START_FROM_CQE)
- default_profile_ix = NET_DIM_DEF_PROFILE_CQE;
- else /* NET_DIM_CQ_PERIOD_MODE_START_FROM_EQE */
- default_profile_ix = NET_DIM_DEF_PROFILE_EQE;
+ return net_dim_get_rx_moderation(cq_period_mode, profile_ix);
+}
- return net_dim_get_profile(rx_cq_period_mode, default_profile_ix);
+static inline struct net_dim_cq_moder
+net_dim_get_tx_moderation(u8 cq_period_mode, int ix)
+{
+ struct net_dim_cq_moder cq_moder = tx_profile[cq_period_mode][ix];
+
+ cq_moder.cq_period_mode = cq_period_mode;
+ return cq_moder;
+}
+
+static inline struct net_dim_cq_moder
+net_dim_get_def_tx_moderation(u8 cq_period_mode)
+{
+ u8 profile_ix = cq_period_mode == NET_DIM_CQ_PERIOD_MODE_START_FROM_CQE ?
+ NET_DIM_DEF_PROFILE_CQE : NET_DIM_DEF_PROFILE_EQE;
+
+ return net_dim_get_tx_moderation(cq_period_mode, profile_ix);
}
static inline bool net_dim_on_top(struct net_dim *dim)
diff --git a/include/linux/netdev_features.h b/include/linux/netdev_features.h
index 35b79f4..623bb8c 100644
--- a/include/linux/netdev_features.h
+++ b/include/linux/netdev_features.h
@@ -55,8 +55,9 @@
NETIF_F_GSO_SCTP_BIT, /* ... SCTP fragmentation */
NETIF_F_GSO_ESP_BIT, /* ... ESP with TSO */
NETIF_F_GSO_UDP_BIT, /* ... UFO, deprecated except tuntap */
+ NETIF_F_GSO_UDP_L4_BIT, /* ... UDP payload GSO (not UFO) */
/**/NETIF_F_GSO_LAST = /* last bit, see GSO_MASK */
- NETIF_F_GSO_UDP_BIT,
+ NETIF_F_GSO_UDP_L4_BIT,
NETIF_F_FCOE_CRC_BIT, /* FCoE CRC32 */
NETIF_F_SCTP_CRC_BIT, /* SCTP checksum offload */
@@ -77,6 +78,7 @@
NETIF_F_HW_ESP_BIT, /* Hardware ESP transformation offload */
NETIF_F_HW_ESP_TX_CSUM_BIT, /* ESP with TX checksum offload */
NETIF_F_RX_UDP_TUNNEL_PORT_BIT, /* Offload of RX port for UDP tunnels */
+ NETIF_F_HW_TLS_TX_BIT, /* Hardware TLS TX offload */
NETIF_F_GRO_HW_BIT, /* Hardware Generic receive offload */
NETIF_F_HW_TLS_RECORD_BIT, /* Offload TLS record */
@@ -147,6 +149,8 @@
#define NETIF_F_HW_ESP_TX_CSUM __NETIF_F(HW_ESP_TX_CSUM)
#define NETIF_F_RX_UDP_TUNNEL_PORT __NETIF_F(RX_UDP_TUNNEL_PORT)
#define NETIF_F_HW_TLS_RECORD __NETIF_F(HW_TLS_RECORD)
+#define NETIF_F_GSO_UDP_L4 __NETIF_F(GSO_UDP_L4)
+#define NETIF_F_HW_TLS_TX __NETIF_F(HW_TLS_TX)
#define for_each_netdev_feature(mask_addr, bit) \
for_each_set_bit(bit, (unsigned long *)mask_addr, NETDEV_FEATURE_COUNT)
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index cf44503..8452f72 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -865,6 +865,26 @@
};
#endif
+#if IS_ENABLED(CONFIG_TLS_DEVICE)
+enum tls_offload_ctx_dir {
+ TLS_OFFLOAD_CTX_DIR_RX,
+ TLS_OFFLOAD_CTX_DIR_TX,
+};
+
+struct tls_crypto_info;
+struct tls_context;
+
+struct tlsdev_ops {
+ int (*tls_dev_add)(struct net_device *netdev, struct sock *sk,
+ enum tls_offload_ctx_dir direction,
+ struct tls_crypto_info *crypto_info,
+ u32 start_offload_tcp_sn);
+ void (*tls_dev_del)(struct net_device *netdev,
+ struct tls_context *ctx,
+ enum tls_offload_ctx_dir direction);
+};
+#endif
+
struct dev_ifalias {
struct rcu_head rcuhead;
char ifalias[];
@@ -1165,9 +1185,13 @@
* This function is used to set or query state related to XDP on the
* netdevice and manage BPF offload. See definition of
* enum bpf_netdev_command for details.
- * int (*ndo_xdp_xmit)(struct net_device *dev, struct xdp_buff *xdp);
- * This function is used to submit a XDP packet for transmit on a
- * netdevice.
+ * int (*ndo_xdp_xmit)(struct net_device *dev, int n, struct xdp_frame **xdp);
+ * This function is used to submit @n XDP packets for transmit on a
+ * netdevice. Returns number of frames successfully transmitted, frames
+ * that got dropped are freed/returned via xdp_return_frame().
+ * Returns negative number, means general error invoking ndo, meaning
+ * no frames were xmit'ed and core-caller will free all frames.
+ * TODO: Consider add flag to allow sending flush operation.
* void (*ndo_xdp_flush)(struct net_device *dev);
* This function is used to inform the driver to flush a particular
* xdp tx queue. Must be called on same CPU as xdp_xmit.
@@ -1355,8 +1379,8 @@
int needed_headroom);
int (*ndo_bpf)(struct net_device *dev,
struct netdev_bpf *bpf);
- int (*ndo_xdp_xmit)(struct net_device *dev,
- struct xdp_buff *xdp);
+ int (*ndo_xdp_xmit)(struct net_device *dev, int n,
+ struct xdp_frame **xdp);
void (*ndo_xdp_flush)(struct net_device *dev);
};
@@ -1750,6 +1774,10 @@
const struct xfrmdev_ops *xfrmdev_ops;
#endif
+#if IS_ENABLED(CONFIG_TLS_DEVICE)
+ const struct tlsdev_ops *tlsdev_ops;
+#endif
+
const struct header_ops *header_ops;
unsigned int flags;
@@ -2304,8 +2332,19 @@
NETDEV_LAG_TX_TYPE_HASH,
};
+enum netdev_lag_hash {
+ NETDEV_LAG_HASH_NONE,
+ NETDEV_LAG_HASH_L2,
+ NETDEV_LAG_HASH_L34,
+ NETDEV_LAG_HASH_L23,
+ NETDEV_LAG_HASH_E23,
+ NETDEV_LAG_HASH_E34,
+ NETDEV_LAG_HASH_UNKNOWN,
+};
+
struct netdev_lag_upper_info {
enum netdev_lag_tx_type tx_type;
+ enum netdev_lag_hash hash_type;
};
struct netdev_lag_lower_state_info {
@@ -2486,6 +2525,7 @@
int dev_loopback_xmit(struct net *net, struct sock *sk, struct sk_buff *newskb);
int dev_queue_xmit(struct sk_buff *skb);
int dev_queue_xmit_accel(struct sk_buff *skb, void *accel_priv);
+int dev_direct_xmit(struct sk_buff *skb, u16 queue_id);
int register_netdevice(struct net_device *dev);
void unregister_netdevice_queue(struct net_device *dev, struct list_head *head);
void unregister_netdevice_many(struct list_head *head);
@@ -3213,19 +3253,6 @@
}
#endif
-u16 __skb_tx_hash(const struct net_device *dev, struct sk_buff *skb,
- unsigned int num_tx_queues);
-
-/*
- * Returns a Tx hash for the given packet when dev->real_num_tx_queues is used
- * as a distribution range limit for the returned value.
- */
-static inline u16 skb_tx_hash(const struct net_device *dev,
- struct sk_buff *skb)
-{
- return __skb_tx_hash(dev, skb, dev->real_num_tx_queues);
-}
-
/**
* netif_is_multiqueue - test if device has multiple transmit queues
* @dev: network device
@@ -4186,6 +4213,7 @@
BUILD_BUG_ON(SKB_GSO_SCTP != (NETIF_F_GSO_SCTP >> NETIF_F_GSO_SHIFT));
BUILD_BUG_ON(SKB_GSO_ESP != (NETIF_F_GSO_ESP >> NETIF_F_GSO_SHIFT));
BUILD_BUG_ON(SKB_GSO_UDP != (NETIF_F_GSO_UDP >> NETIF_F_GSO_SHIFT));
+ BUILD_BUG_ON(SKB_GSO_UDP_L4 != (NETIF_F_GSO_UDP_L4 >> NETIF_F_GSO_SHIFT));
return (features & feature) == feature;
}
diff --git a/include/linux/netfilter.h b/include/linux/netfilter.h
index 85a1a0b..04551af 100644
--- a/include/linux/netfilter.h
+++ b/include/linux/netfilter.h
@@ -67,7 +67,6 @@
struct net_device *dev;
void *priv;
u_int8_t pf;
- bool nat_hook;
unsigned int hooknum;
/* Hooks are ordered in ascending priority. */
int priority;
@@ -321,18 +320,33 @@
int nf_reroute(struct sk_buff *skb, struct nf_queue_entry *entry);
#include <net/flow.h>
-extern void (*nf_nat_decode_session_hook)(struct sk_buff *, struct flowi *);
+
+struct nf_conn;
+enum nf_nat_manip_type;
+struct nlattr;
+enum ip_conntrack_dir;
+
+struct nf_nat_hook {
+ int (*parse_nat_setup)(struct nf_conn *ct, enum nf_nat_manip_type manip,
+ const struct nlattr *attr);
+ void (*decode_session)(struct sk_buff *skb, struct flowi *fl);
+ unsigned int (*manip_pkt)(struct sk_buff *skb, struct nf_conn *ct,
+ enum nf_nat_manip_type mtype,
+ enum ip_conntrack_dir dir);
+};
+
+extern struct nf_nat_hook __rcu *nf_nat_hook;
static inline void
nf_nat_decode_session(struct sk_buff *skb, struct flowi *fl, u_int8_t family)
{
#ifdef CONFIG_NF_NAT_NEEDED
- void (*decodefn)(struct sk_buff *, struct flowi *);
+ struct nf_nat_hook *nat_hook;
rcu_read_lock();
- decodefn = rcu_dereference(nf_nat_decode_session_hook);
- if (decodefn)
- decodefn(skb, fl);
+ nat_hook = rcu_dereference(nf_nat_hook);
+ if (nat_hook->decode_session)
+ nat_hook->decode_session(skb, fl);
rcu_read_unlock();
#endif
}
@@ -374,13 +388,19 @@
extern void (*ip_ct_attach)(struct sk_buff *, const struct sk_buff *) __rcu;
void nf_ct_attach(struct sk_buff *, const struct sk_buff *);
-extern void (*nf_ct_destroy)(struct nf_conntrack *) __rcu;
#else
static inline void nf_ct_attach(struct sk_buff *new, struct sk_buff *skb) {}
#endif
struct nf_conn;
enum ip_conntrack_info;
+
+struct nf_ct_hook {
+ int (*update)(struct net *net, struct sk_buff *skb);
+ void (*destroy)(struct nf_conntrack *);
+};
+extern struct nf_ct_hook __rcu *nf_ct_hook;
+
struct nlattr;
struct nfnl_ct_hook {
diff --git a/include/linux/netfilter/nf_osf.h b/include/linux/netfilter/nf_osf.h
new file mode 100644
index 0000000..0e114c4
--- /dev/null
+++ b/include/linux/netfilter/nf_osf.h
@@ -0,0 +1,33 @@
+#include <uapi/linux/netfilter/nf_osf.h>
+
+/* Initial window size option state machine: multiple of mss, mtu or
+ * plain numeric value. Can also be made as plain numeric value which
+ * is not a multiple of specified value.
+ */
+enum nf_osf_window_size_options {
+ OSF_WSS_PLAIN = 0,
+ OSF_WSS_MSS,
+ OSF_WSS_MTU,
+ OSF_WSS_MODULO,
+ OSF_WSS_MAX,
+};
+
+enum osf_fmatch_states {
+ /* Packet does not match the fingerprint */
+ FMATCH_WRONG = 0,
+ /* Packet matches the fingerprint */
+ FMATCH_OK,
+ /* Options do not match the fingerprint, but header does */
+ FMATCH_OPT_WRONG,
+};
+
+struct nf_osf_finger {
+ struct rcu_head rcu_head;
+ struct list_head finger_entry;
+ struct nf_osf_user_finger finger;
+};
+
+bool nf_osf_match(const struct sk_buff *skb, u_int8_t family,
+ int hooknum, struct net_device *in, struct net_device *out,
+ const struct nf_osf_info *info, struct net *net,
+ const struct list_head *nf_osf_fingers);
diff --git a/include/linux/netfilter_bridge/ebtables.h b/include/linux/netfilter_bridge/ebtables.h
index 0773b5a..c6935be 100644
--- a/include/linux/netfilter_bridge/ebtables.h
+++ b/include/linux/netfilter_bridge/ebtables.h
@@ -17,10 +17,6 @@
#include <linux/if_ether.h>
#include <uapi/linux/netfilter_bridge/ebtables.h>
-/* return values for match() functions */
-#define EBT_MATCH 0
-#define EBT_NOMATCH 1
-
struct ebt_match {
struct list_head list;
const char name[EBT_FUNCTION_MAXNAMELEN];
diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index e71e99e..eec302b 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -868,6 +868,7 @@
extern void perf_event_free_task(struct task_struct *task);
extern void perf_event_delayed_put(struct task_struct *task);
extern struct file *perf_event_get(unsigned int fd);
+extern const struct perf_event *perf_get_event(struct file *file);
extern const struct perf_event_attr *perf_event_attrs(struct perf_event *event);
extern void perf_event_print_debug(void);
extern void perf_pmu_disable(struct pmu *pmu);
@@ -1289,6 +1290,10 @@
static inline void perf_event_free_task(struct task_struct *task) { }
static inline void perf_event_delayed_put(struct task_struct *task) { }
static inline struct file *perf_event_get(unsigned int fd) { return ERR_PTR(-EINVAL); }
+static inline const struct perf_event *perf_get_event(struct file *file)
+{
+ return ERR_PTR(-EINVAL);
+}
static inline const struct perf_event_attr *perf_event_attrs(struct perf_event *event)
{
return ERR_PTR(-EINVAL);
diff --git a/include/linux/phy.h b/include/linux/phy.h
index f0b5870..6cd0909 100644
--- a/include/linux/phy.h
+++ b/include/linux/phy.h
@@ -406,13 +406,17 @@
u32 phy_id;
struct phy_c45_device_ids c45_ids;
- bool is_c45;
- bool is_internal;
- bool is_pseudo_fixed_link;
- bool has_fixups;
- bool suspended;
- bool sysfs_links;
- bool loopback_enabled;
+ unsigned is_c45:1;
+ unsigned is_internal:1;
+ unsigned is_pseudo_fixed_link:1;
+ unsigned has_fixups:1;
+ unsigned suspended:1;
+ unsigned sysfs_links:1;
+ unsigned loopback_enabled:1;
+
+ unsigned autoneg:1;
+ /* The most recently read link state */
+ unsigned link:1;
enum phy_state state;
@@ -429,9 +433,6 @@
int pause;
int asym_pause;
- /* The most recently read link state */
- int link;
-
/* Enabled Interrupts */
u32 interrupts;
@@ -444,8 +445,6 @@
/* Energy efficient ethernet modes which should be prohibited */
u32 eee_broken_modes;
- int autoneg;
-
int link_timeout;
#ifdef CONFIG_LED_TRIGGER_PHY
@@ -1068,6 +1067,52 @@
void mdio_bus_exit(void);
#endif
+/* Inline function for use within net/core/ethtool.c (built-in) */
+static inline int phy_ethtool_get_strings(struct phy_device *phydev, u8 *data)
+{
+ if (!phydev->drv)
+ return -EIO;
+
+ mutex_lock(&phydev->lock);
+ phydev->drv->get_strings(phydev, data);
+ mutex_unlock(&phydev->lock);
+
+ return 0;
+}
+
+static inline int phy_ethtool_get_sset_count(struct phy_device *phydev)
+{
+ int ret;
+
+ if (!phydev->drv)
+ return -EIO;
+
+ if (phydev->drv->get_sset_count &&
+ phydev->drv->get_strings &&
+ phydev->drv->get_stats) {
+ mutex_lock(&phydev->lock);
+ ret = phydev->drv->get_sset_count(phydev);
+ mutex_unlock(&phydev->lock);
+
+ return ret;
+ }
+
+ return -EOPNOTSUPP;
+}
+
+static inline int phy_ethtool_get_stats(struct phy_device *phydev,
+ struct ethtool_stats *stats, u64 *data)
+{
+ if (!phydev->drv)
+ return -EIO;
+
+ mutex_lock(&phydev->lock);
+ phydev->drv->get_stats(phydev, stats, data);
+ mutex_unlock(&phydev->lock);
+
+ return 0;
+}
+
extern struct bus_type mdio_bus_type;
struct mdio_board_info {
diff --git a/include/linux/phy/phy.h b/include/linux/phy/phy.h
index c9d14ee..9713aeb 100644
--- a/include/linux/phy/phy.h
+++ b/include/linux/phy/phy.h
@@ -36,6 +36,7 @@
PHY_MODE_USB_DEVICE_SS,
PHY_MODE_USB_OTG,
PHY_MODE_SGMII,
+ PHY_MODE_2500SGMII,
PHY_MODE_10GKR,
PHY_MODE_UFS_HS_A,
PHY_MODE_UFS_HS_B,
diff --git a/include/linux/platform_data/b53.h b/include/linux/platform_data/b53.h
index 69d279c..8eaef2f 100644
--- a/include/linux/platform_data/b53.h
+++ b/include/linux/platform_data/b53.h
@@ -20,8 +20,12 @@
#define __B53_H
#include <linux/kernel.h>
+#include <net/dsa.h>
struct b53_platform_data {
+ /* Must be first such that dsa_register_switch() can access it */
+ struct dsa_chip_data cd;
+
u32 chip_id;
u16 enabled_ports;
diff --git a/include/linux/platform_data/mdio-gpio.h b/include/linux/platform_data/mdio-gpio.h
deleted file mode 100644
index 11f00cd..0000000
--- a/include/linux/platform_data/mdio-gpio.h
+++ /dev/null
@@ -1,33 +0,0 @@
-/*
- * MDIO-GPIO bus platform data structures
- *
- * Copyright (C) 2008, Paulius Zaleckas <paulius.zaleckas@teltonika.lt>
- *
- * This file is licensed under the terms of the GNU General Public License
- * version 2. This program is licensed "as is" without any warranty of any
- * kind, whether express or implied.
- */
-
-#ifndef __LINUX_MDIO_GPIO_H
-#define __LINUX_MDIO_GPIO_H
-
-#include <linux/mdio-bitbang.h>
-
-struct mdio_gpio_platform_data {
- /* GPIO numbers for bus pins */
- unsigned int mdc;
- unsigned int mdio;
- unsigned int mdo;
-
- bool mdc_active_low;
- bool mdio_active_low;
- bool mdo_active_low;
-
- u32 phy_mask;
- u32 phy_ignore_ta_mask;
- int irqs[PHY_MAX_ADDR];
- /* reset callback */
- int (*reset)(struct mii_bus *bus);
-};
-
-#endif /* __LINUX_MDIO_GPIO_H */
diff --git a/include/linux/platform_data/mv88e6xxx.h b/include/linux/platform_data/mv88e6xxx.h
new file mode 100644
index 0000000..f63af29
--- /dev/null
+++ b/include/linux/platform_data/mv88e6xxx.h
@@ -0,0 +1,18 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef __DSA_MV88E6XXX_H
+#define __DSA_MV88E6XXX_H
+
+#include <net/dsa.h>
+
+struct dsa_mv88e6xxx_pdata {
+ /* Must be first, such that dsa_register_switch() can access this
+ * without gory pointer manipulations
+ */
+ struct dsa_chip_data cd;
+ const char *compatible;
+ unsigned int enabled_ports;
+ struct net_device *netdev;
+ u32 eeprom_len;
+};
+
+#endif
diff --git a/include/linux/qed/qed_eth_if.h b/include/linux/qed/qed_eth_if.h
index 147d08c..2978fa4 100644
--- a/include/linux/qed/qed_eth_if.h
+++ b/include/linux/qed/qed_eth_if.h
@@ -66,6 +66,7 @@
QED_FILTER_CONFIG_MODE_5_TUPLE,
QED_FILTER_CONFIG_MODE_L4_PORT,
QED_FILTER_CONFIG_MODE_IP_DEST,
+ QED_FILTER_CONFIG_MODE_IP_SRC,
};
struct qed_ntuple_filter_params {
@@ -88,6 +89,9 @@
/* true iff this filter is to be added. Else to be removed */
bool b_is_add;
+
+ /* If flow needs to be dropped */
+ bool b_is_drop;
};
struct qed_dev_eth_info {
@@ -352,6 +356,7 @@
int (*configure_arfs_searcher)(struct qed_dev *cdev,
enum qed_filter_config_mode mode);
int (*get_coalesce)(struct qed_dev *cdev, u16 *coal, void *handle);
+ int (*req_bulletin_update_mac)(struct qed_dev *cdev, u8 *mac);
};
const struct qed_eth_ops *qed_get_eth_ops(void);
diff --git a/include/linux/qed/qed_if.h b/include/linux/qed/qed_if.h
index b5b2bc9..ac991a3 100644
--- a/include/linux/qed/qed_if.h
+++ b/include/linux/qed/qed_if.h
@@ -159,6 +159,9 @@
enum qed_nvm_images {
QED_NVM_IMAGE_ISCSI_CFG,
QED_NVM_IMAGE_FCOE_CFG,
+ QED_NVM_IMAGE_NVM_CFG1,
+ QED_NVM_IMAGE_DEFAULT_CFG,
+ QED_NVM_IMAGE_NVM_META,
};
struct qed_link_eee_params {
@@ -179,6 +182,272 @@
QED_LED_MODE_RESTORE
};
+struct qed_mfw_tlv_eth {
+ u16 lso_maxoff_size;
+ bool lso_maxoff_size_set;
+ u16 lso_minseg_size;
+ bool lso_minseg_size_set;
+ u8 prom_mode;
+ bool prom_mode_set;
+ u16 tx_descr_size;
+ bool tx_descr_size_set;
+ u16 rx_descr_size;
+ bool rx_descr_size_set;
+ u16 netq_count;
+ bool netq_count_set;
+ u32 tcp4_offloads;
+ bool tcp4_offloads_set;
+ u32 tcp6_offloads;
+ bool tcp6_offloads_set;
+ u16 tx_descr_qdepth;
+ bool tx_descr_qdepth_set;
+ u16 rx_descr_qdepth;
+ bool rx_descr_qdepth_set;
+ u8 iov_offload;
+#define QED_MFW_TLV_IOV_OFFLOAD_NONE (0)
+#define QED_MFW_TLV_IOV_OFFLOAD_MULTIQUEUE (1)
+#define QED_MFW_TLV_IOV_OFFLOAD_VEB (2)
+#define QED_MFW_TLV_IOV_OFFLOAD_VEPA (3)
+ bool iov_offload_set;
+ u8 txqs_empty;
+ bool txqs_empty_set;
+ u8 rxqs_empty;
+ bool rxqs_empty_set;
+ u8 num_txqs_full;
+ bool num_txqs_full_set;
+ u8 num_rxqs_full;
+ bool num_rxqs_full_set;
+};
+
+#define QED_MFW_TLV_TIME_SIZE 14
+struct qed_mfw_tlv_time {
+ bool b_set;
+ u8 month;
+ u8 day;
+ u8 hour;
+ u8 min;
+ u16 msec;
+ u16 usec;
+};
+
+struct qed_mfw_tlv_fcoe {
+ u8 scsi_timeout;
+ bool scsi_timeout_set;
+ u32 rt_tov;
+ bool rt_tov_set;
+ u32 ra_tov;
+ bool ra_tov_set;
+ u32 ed_tov;
+ bool ed_tov_set;
+ u32 cr_tov;
+ bool cr_tov_set;
+ u8 boot_type;
+ bool boot_type_set;
+ u8 npiv_state;
+ bool npiv_state_set;
+ u32 num_npiv_ids;
+ bool num_npiv_ids_set;
+ u8 switch_name[8];
+ bool switch_name_set;
+ u16 switch_portnum;
+ bool switch_portnum_set;
+ u8 switch_portid[3];
+ bool switch_portid_set;
+ u8 vendor_name[8];
+ bool vendor_name_set;
+ u8 switch_model[8];
+ bool switch_model_set;
+ u8 switch_fw_version[8];
+ bool switch_fw_version_set;
+ u8 qos_pri;
+ bool qos_pri_set;
+ u8 port_alias[3];
+ bool port_alias_set;
+ u8 port_state;
+#define QED_MFW_TLV_PORT_STATE_OFFLINE (0)
+#define QED_MFW_TLV_PORT_STATE_LOOP (1)
+#define QED_MFW_TLV_PORT_STATE_P2P (2)
+#define QED_MFW_TLV_PORT_STATE_FABRIC (3)
+ bool port_state_set;
+ u16 fip_tx_descr_size;
+ bool fip_tx_descr_size_set;
+ u16 fip_rx_descr_size;
+ bool fip_rx_descr_size_set;
+ u16 link_failures;
+ bool link_failures_set;
+ u8 fcoe_boot_progress;
+ bool fcoe_boot_progress_set;
+ u64 rx_bcast;
+ bool rx_bcast_set;
+ u64 tx_bcast;
+ bool tx_bcast_set;
+ u16 fcoe_txq_depth;
+ bool fcoe_txq_depth_set;
+ u16 fcoe_rxq_depth;
+ bool fcoe_rxq_depth_set;
+ u64 fcoe_rx_frames;
+ bool fcoe_rx_frames_set;
+ u64 fcoe_rx_bytes;
+ bool fcoe_rx_bytes_set;
+ u64 fcoe_tx_frames;
+ bool fcoe_tx_frames_set;
+ u64 fcoe_tx_bytes;
+ bool fcoe_tx_bytes_set;
+ u16 crc_count;
+ bool crc_count_set;
+ u32 crc_err_src_fcid[5];
+ bool crc_err_src_fcid_set[5];
+ struct qed_mfw_tlv_time crc_err[5];
+ u16 losync_err;
+ bool losync_err_set;
+ u16 losig_err;
+ bool losig_err_set;
+ u16 primtive_err;
+ bool primtive_err_set;
+ u16 disparity_err;
+ bool disparity_err_set;
+ u16 code_violation_err;
+ bool code_violation_err_set;
+ u32 flogi_param[4];
+ bool flogi_param_set[4];
+ struct qed_mfw_tlv_time flogi_tstamp;
+ u32 flogi_acc_param[4];
+ bool flogi_acc_param_set[4];
+ struct qed_mfw_tlv_time flogi_acc_tstamp;
+ u32 flogi_rjt;
+ bool flogi_rjt_set;
+ struct qed_mfw_tlv_time flogi_rjt_tstamp;
+ u32 fdiscs;
+ bool fdiscs_set;
+ u8 fdisc_acc;
+ bool fdisc_acc_set;
+ u8 fdisc_rjt;
+ bool fdisc_rjt_set;
+ u8 plogi;
+ bool plogi_set;
+ u8 plogi_acc;
+ bool plogi_acc_set;
+ u8 plogi_rjt;
+ bool plogi_rjt_set;
+ u32 plogi_dst_fcid[5];
+ bool plogi_dst_fcid_set[5];
+ struct qed_mfw_tlv_time plogi_tstamp[5];
+ u32 plogi_acc_src_fcid[5];
+ bool plogi_acc_src_fcid_set[5];
+ struct qed_mfw_tlv_time plogi_acc_tstamp[5];
+ u8 tx_plogos;
+ bool tx_plogos_set;
+ u8 plogo_acc;
+ bool plogo_acc_set;
+ u8 plogo_rjt;
+ bool plogo_rjt_set;
+ u32 plogo_src_fcid[5];
+ bool plogo_src_fcid_set[5];
+ struct qed_mfw_tlv_time plogo_tstamp[5];
+ u8 rx_logos;
+ bool rx_logos_set;
+ u8 tx_accs;
+ bool tx_accs_set;
+ u8 tx_prlis;
+ bool tx_prlis_set;
+ u8 rx_accs;
+ bool rx_accs_set;
+ u8 tx_abts;
+ bool tx_abts_set;
+ u8 rx_abts_acc;
+ bool rx_abts_acc_set;
+ u8 rx_abts_rjt;
+ bool rx_abts_rjt_set;
+ u32 abts_dst_fcid[5];
+ bool abts_dst_fcid_set[5];
+ struct qed_mfw_tlv_time abts_tstamp[5];
+ u8 rx_rscn;
+ bool rx_rscn_set;
+ u32 rx_rscn_nport[4];
+ bool rx_rscn_nport_set[4];
+ u8 tx_lun_rst;
+ bool tx_lun_rst_set;
+ u8 abort_task_sets;
+ bool abort_task_sets_set;
+ u8 tx_tprlos;
+ bool tx_tprlos_set;
+ u8 tx_nos;
+ bool tx_nos_set;
+ u8 rx_nos;
+ bool rx_nos_set;
+ u8 ols;
+ bool ols_set;
+ u8 lr;
+ bool lr_set;
+ u8 lrr;
+ bool lrr_set;
+ u8 tx_lip;
+ bool tx_lip_set;
+ u8 rx_lip;
+ bool rx_lip_set;
+ u8 eofa;
+ bool eofa_set;
+ u8 eofni;
+ bool eofni_set;
+ u8 scsi_chks;
+ bool scsi_chks_set;
+ u8 scsi_cond_met;
+ bool scsi_cond_met_set;
+ u8 scsi_busy;
+ bool scsi_busy_set;
+ u8 scsi_inter;
+ bool scsi_inter_set;
+ u8 scsi_inter_cond_met;
+ bool scsi_inter_cond_met_set;
+ u8 scsi_rsv_conflicts;
+ bool scsi_rsv_conflicts_set;
+ u8 scsi_tsk_full;
+ bool scsi_tsk_full_set;
+ u8 scsi_aca_active;
+ bool scsi_aca_active_set;
+ u8 scsi_tsk_abort;
+ bool scsi_tsk_abort_set;
+ u32 scsi_rx_chk[5];
+ bool scsi_rx_chk_set[5];
+ struct qed_mfw_tlv_time scsi_chk_tstamp[5];
+};
+
+struct qed_mfw_tlv_iscsi {
+ u8 target_llmnr;
+ bool target_llmnr_set;
+ u8 header_digest;
+ bool header_digest_set;
+ u8 data_digest;
+ bool data_digest_set;
+ u8 auth_method;
+#define QED_MFW_TLV_AUTH_METHOD_NONE (1)
+#define QED_MFW_TLV_AUTH_METHOD_CHAP (2)
+#define QED_MFW_TLV_AUTH_METHOD_MUTUAL_CHAP (3)
+ bool auth_method_set;
+ u16 boot_taget_portal;
+ bool boot_taget_portal_set;
+ u16 frame_size;
+ bool frame_size_set;
+ u16 tx_desc_size;
+ bool tx_desc_size_set;
+ u16 rx_desc_size;
+ bool rx_desc_size_set;
+ u8 boot_progress;
+ bool boot_progress_set;
+ u16 tx_desc_qdepth;
+ bool tx_desc_qdepth_set;
+ u16 rx_desc_qdepth;
+ bool rx_desc_qdepth_set;
+ u64 rx_frames;
+ bool rx_frames_set;
+ u64 rx_bytes;
+ bool rx_bytes_set;
+ u64 tx_frames;
+ bool tx_frames_set;
+ u64 tx_bytes;
+ bool tx_bytes_set;
+};
+
#define DIRECT_REG_WR(reg_addr, val) writel((u32)val, \
(void __iomem *)(reg_addr))
@@ -336,7 +605,6 @@
u8 num_hwfns;
u8 hw_mac[ETH_ALEN];
- bool is_mf_default;
/* FW version */
u16 fw_major;
@@ -356,7 +624,7 @@
#define QED_MFW_VERSION_3_OFFSET 24
u32 flash_size;
- u8 mf_mode;
+ bool b_inter_pf_switch;
bool tx_switching;
bool rdma_supported;
u16 mtu;
@@ -483,6 +751,14 @@
u8 used_cnt;
};
+struct qed_generic_tlvs {
+#define QED_TLV_IP_CSUM BIT(0)
+#define QED_TLV_LSO BIT(1)
+ u16 feat_flags;
+#define QED_TLV_MAC_COUNT 3
+ u8 mac[QED_TLV_MAC_COUNT][ETH_ALEN];
+};
+
#define QED_NVM_SIGNATURE 0x12435687
enum qed_nvm_flash_cmd {
@@ -497,6 +773,8 @@
void (*link_update)(void *dev,
struct qed_link_output *link);
void (*dcbx_aen)(void *dev, struct qed_dcbx_get *get, u32 mib_type);
+ void (*get_generic_tlv_data)(void *dev, struct qed_generic_tlvs *data);
+ void (*get_protocol_tlv_data)(void *dev, void *data);
};
struct qed_selftest_ops {
@@ -851,6 +1129,7 @@
u64 rx_bcast_pkts;
u64 mftag_filter_discards;
u64 mac_filter_discards;
+ u64 gft_filter_drop;
u64 tx_ucast_bytes;
u64 tx_mcast_bytes;
u64 tx_bcast_bytes;
diff --git a/include/linux/qed/qed_ll2_if.h b/include/linux/qed/qed_ll2_if.h
index 266c1fb..5eb0229 100644
--- a/include/linux/qed/qed_ll2_if.h
+++ b/include/linux/qed/qed_ll2_if.h
@@ -202,6 +202,7 @@
bool enable_ip_cksum;
bool enable_l4_cksum;
bool calc_ip_len;
+ bool remove_stag;
};
#define QED_LL2_UNUSED_HANDLE (0xff)
@@ -220,6 +221,11 @@
u8 ll2_mac_address[ETH_ALEN];
};
+enum qed_ll2_xmit_flags {
+ /* FIP discovery packet */
+ QED_LL2_XMIT_FLAGS_FIP_DISCOVERY
+};
+
struct qed_ll2_ops {
/**
* @brief start - initializes ll2
@@ -245,10 +251,12 @@
*
* @param cdev
* @param skb
+ * @param xmit_flags - Transmit options defined by the enum qed_ll2_xmit_flags.
*
* @return 0 on success, otherwise error value.
*/
- int (*start_xmit)(struct qed_dev *cdev, struct sk_buff *skb);
+ int (*start_xmit)(struct qed_dev *cdev, struct sk_buff *skb,
+ unsigned long xmit_flags);
/**
* @brief register_cb_ops - protocol driver register the callback for Rx/Tx
diff --git a/include/linux/rhashtable.h b/include/linux/rhashtable.h
index 1f8ad12..4e1f535 100644
--- a/include/linux/rhashtable.h
+++ b/include/linux/rhashtable.h
@@ -836,9 +836,8 @@
*
* It is safe to call this function from atomic context.
*
- * Will trigger an automatic deferred table resizing if the size grows
- * beyond the watermark indicated by grow_decision() which can be passed
- * to rhashtable_init().
+ * Will trigger an automatic deferred table resizing if residency in the
+ * table grows beyond 70%.
*/
static inline int rhashtable_insert_fast(
struct rhashtable *ht, struct rhash_head *obj,
@@ -866,9 +865,8 @@
*
* It is safe to call this function from atomic context.
*
- * Will trigger an automatic deferred table resizing if the size grows
- * beyond the watermark indicated by grow_decision() which can be passed
- * to rhashtable_init().
+ * Will trigger an automatic deferred table resizing if residency in the
+ * table grows beyond 70%.
*/
static inline int rhltable_insert_key(
struct rhltable *hlt, const void *key, struct rhlist_head *list,
@@ -890,9 +888,8 @@
*
* It is safe to call this function from atomic context.
*
- * Will trigger an automatic deferred table resizing if the size grows
- * beyond the watermark indicated by grow_decision() which can be passed
- * to rhashtable_init().
+ * Will trigger an automatic deferred table resizing if residency in the
+ * table grows beyond 70%.
*/
static inline int rhltable_insert(
struct rhltable *hlt, struct rhlist_head *list,
@@ -922,9 +919,8 @@
*
* It is safe to call this function from atomic context.
*
- * Will trigger an automatic deferred table resizing if the size grows
- * beyond the watermark indicated by grow_decision() which can be passed
- * to rhashtable_init().
+ * Will trigger an automatic deferred table resizing if residency in the
+ * table grows beyond 70%.
*/
static inline int rhashtable_lookup_insert_fast(
struct rhashtable *ht, struct rhash_head *obj,
@@ -981,9 +977,8 @@
*
* Lookups may occur in parallel with hashtable mutations and resizing.
*
- * Will trigger an automatic deferred table resizing if the size grows
- * beyond the watermark indicated by grow_decision() which can be passed
- * to rhashtable_init().
+ * Will trigger an automatic deferred table resizing if residency in the
+ * table grows beyond 70%.
*
* Returns zero on success.
*/
@@ -1134,8 +1129,8 @@
* walk the bucket chain upon removal. The removal operation is thus
* considerable slow if the hash table is not correctly sized.
*
- * Will automatically shrink the table via rhashtable_expand() if the
- * shrink_decision function specified at rhashtable_init() returns true.
+ * Will automatically shrink the table if permitted when residency drops
+ * below 30%.
*
* Returns zero on success, -ENOENT if the entry could not be found.
*/
@@ -1156,8 +1151,8 @@
* walk the bucket chain upon removal. The removal operation is thus
* considerable slow if the hash table is not correctly sized.
*
- * Will automatically shrink the table via rhashtable_expand() if the
- * shrink_decision function specified at rhashtable_init() returns true.
+ * Will automatically shrink the table if permitted when residency drops
+ * below 30%
*
* Returns zero on success, -ENOENT if the entry could not be found.
*/
@@ -1273,8 +1268,9 @@
* For a completely stable walk you should construct your own data
* structure outside the hash table.
*
- * This function may sleep so you must not call it from interrupt
- * context or with spin locks held.
+ * This function may be called from any process context, including
+ * non-preemptable context, but cannot be called from softirq or
+ * hardirq context.
*
* You must call rhashtable_walk_exit after this function returns.
*/
diff --git a/include/linux/skb_array.h b/include/linux/skb_array.h
index a6b6e8b..62d9b0a 100644
--- a/include/linux/skb_array.h
+++ b/include/linux/skb_array.h
@@ -97,6 +97,11 @@
return ptr_ring_empty_any(&a->ring);
}
+static inline struct sk_buff *__skb_array_consume(struct skb_array *a)
+{
+ return __ptr_ring_consume(&a->ring);
+}
+
static inline struct sk_buff *skb_array_consume(struct skb_array *a)
{
return ptr_ring_consume(&a->ring);
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 9065477..693564a 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -573,6 +573,8 @@
SKB_GSO_ESP = 1 << 15,
SKB_GSO_UDP = 1 << 16,
+
+ SKB_GSO_UDP_L4 = 1 << 17,
};
#if BITS_PER_LONG > 32
@@ -1032,6 +1034,7 @@
struct sk_buff *skb_morph(struct sk_buff *dst, struct sk_buff *src);
int skb_copy_ubufs(struct sk_buff *skb, gfp_t gfp_mask);
struct sk_buff *skb_clone(struct sk_buff *skb, gfp_t priority);
+void skb_copy_header(struct sk_buff *new, const struct sk_buff *old);
struct sk_buff *skb_copy(const struct sk_buff *skb, gfp_t priority);
struct sk_buff *__pskb_copy_fclone(struct sk_buff *skb, int headroom,
gfp_t gfp_mask, bool fclone);
@@ -1168,7 +1171,7 @@
u32 __skb_get_hash_symmetric(const struct sk_buff *skb);
u32 skb_get_poff(const struct sk_buff *skb);
u32 __skb_get_poff(const struct sk_buff *skb, void *data,
- const struct flow_keys *keys, int hlen);
+ const struct flow_keys_basic *keys, int hlen);
__be32 __skb_flow_get_ports(const struct sk_buff *skb, int thoff, u8 ip_proto,
void *data, int hlen_proto);
@@ -1205,13 +1208,14 @@
NULL, 0, 0, 0, flags);
}
-static inline bool skb_flow_dissect_flow_keys_buf(struct flow_keys *flow,
- void *data, __be16 proto,
- int nhoff, int hlen,
- unsigned int flags)
+static inline bool
+skb_flow_dissect_flow_keys_basic(const struct sk_buff *skb,
+ struct flow_keys_basic *flow, void *data,
+ __be16 proto, int nhoff, int hlen,
+ unsigned int flags)
{
memset(flow, 0, sizeof(*flow));
- return __skb_flow_dissect(NULL, &flow_keys_buf_dissector, flow,
+ return __skb_flow_dissect(skb, &flow_keys_basic_dissector, flow,
data, proto, nhoff, hlen, flags);
}
@@ -2347,11 +2351,12 @@
static inline void skb_probe_transport_header(struct sk_buff *skb,
const int offset_hint)
{
- struct flow_keys keys;
+ struct flow_keys_basic keys;
if (skb_transport_header_was_set(skb))
return;
- else if (skb_flow_dissect_flow_keys(skb, &keys, 0))
+
+ if (skb_flow_dissect_flow_keys_basic(skb, &keys, 0, 0, 0, 0, 0))
skb_set_transport_header(skb, keys.control.thoff);
else
skb_set_transport_header(skb, offset_hint);
@@ -3131,6 +3136,7 @@
return skb->data;
}
+int pskb_trim_rcsum_slow(struct sk_buff *skb, unsigned int len);
/**
* pskb_trim_rcsum - trim received skb and update checksum
* @skb: buffer to trim
@@ -3144,9 +3150,7 @@
{
if (likely(len >= skb->len))
return 0;
- if (skb->ip_summed == CHECKSUM_COMPLETE)
- skb->ip_summed = CHECKSUM_NONE;
- return __pskb_trim(skb, len);
+ return pskb_trim_rcsum_slow(skb, len);
}
static inline int __skb_trim_rcsum(struct sk_buff *skb, unsigned int len)
diff --git a/include/linux/soc/ti/knav_dma.h b/include/linux/soc/ti/knav_dma.h
index 66693bc..7127ec3 100644
--- a/include/linux/soc/ti/knav_dma.h
+++ b/include/linux/soc/ti/knav_dma.h
@@ -167,6 +167,8 @@
void *knav_dma_open_channel(struct device *dev, const char *name,
struct knav_dma_cfg *config);
void knav_dma_close_channel(void *channel);
+int knav_dma_get_flow(void *channel);
+bool knav_dma_device_ready(void);
#else
static inline void *knav_dma_open_channel(struct device *dev, const char *name,
struct knav_dma_cfg *config)
@@ -176,6 +178,16 @@
static inline void knav_dma_close_channel(void *channel)
{}
+static inline int knav_dma_get_flow(void *channel)
+{
+ return -EINVAL;
+}
+
+static inline bool knav_dma_device_ready(void)
+{
+ return false;
+}
+
#endif
#endif /* __SOC_TI_KEYSTONE_NAVIGATOR_DMA_H__ */
diff --git a/include/linux/soc/ti/knav_qmss.h b/include/linux/soc/ti/knav_qmss.h
index 9f0ebb3b..9745df6 100644
--- a/include/linux/soc/ti/knav_qmss.h
+++ b/include/linux/soc/ti/knav_qmss.h
@@ -86,5 +86,6 @@
void *knav_pool_desc_unmap(void *ph, dma_addr_t dma, unsigned dma_sz);
dma_addr_t knav_pool_desc_virt_to_dma(void *ph, void *virt);
void *knav_pool_desc_dma_to_virt(void *ph, dma_addr_t dma);
+bool knav_qmss_device_ready(void);
#endif /* __SOC_TI_KNAV_QMSS_H__ */
diff --git a/include/linux/socket.h b/include/linux/socket.h
index ea50f4a..7ed4713 100644
--- a/include/linux/socket.h
+++ b/include/linux/socket.h
@@ -207,8 +207,9 @@
* PF_SMC protocol family that
* reuses AF_INET address family
*/
+#define AF_XDP 44 /* XDP sockets */
-#define AF_MAX 44 /* For now.. */
+#define AF_MAX 45 /* For now.. */
/* Protocol families, same as address families. */
#define PF_UNSPEC AF_UNSPEC
@@ -257,6 +258,7 @@
#define PF_KCM AF_KCM
#define PF_QIPCRTR AF_QIPCRTR
#define PF_SMC AF_SMC
+#define PF_XDP AF_XDP
#define PF_MAX AF_MAX
/* Maximum queue length specifiable by listen. */
@@ -338,6 +340,7 @@
#define SOL_NFC 280
#define SOL_KCM 281
#define SOL_TLS 282
+#define SOL_XDP 283
/* IPX options */
#define IPX_TYPE 1
diff --git a/include/linux/tcp.h b/include/linux/tcp.h
index 8f4c549..72705ea 100644
--- a/include/linux/tcp.h
+++ b/include/linux/tcp.h
@@ -218,6 +218,7 @@
reord:1; /* reordering detected */
} rack;
u16 advmss; /* Advertised MSS */
+ u8 compressed_ack;
u32 chrono_start; /* Start time in jiffies of a TCP chrono */
u32 chrono_stat[3]; /* Time in jiffies for chrono_stat stats */
u8 chrono_type:2, /* current chronograph type */
@@ -228,7 +229,7 @@
unused:2;
u8 nonagle : 4,/* Disable Nagle algorithm? */
thin_lto : 1,/* Use linear timeouts for thin streams */
- unused1 : 1,
+ recvmsg_inq : 1,/* Indicate # of bytes in queue upon recvmsg */
repair : 1,
frto : 1;/* F-RTO (RFC5682) activated in CA_Loss */
u8 repair_queue;
@@ -281,6 +282,7 @@
* receiver in Recovery. */
u32 prr_out; /* Total number of pkts sent during Recovery. */
u32 delivered; /* Total data packets delivered incl. rexmits */
+ u32 delivered_ce; /* Like the above but only ECE marked packets */
u32 lost; /* Total data packets lost incl. rexmits */
u32 app_limited; /* limited until "delivered" reaches this val */
u64 first_tx_mstamp; /* start of window send phase */
@@ -296,6 +298,7 @@
u32 sacked_out; /* SACK'd packets */
struct hrtimer pacing_timer;
+ struct hrtimer compressed_ack_timer;
/* from STCP, retrans queue hinting */
struct sk_buff* lost_skb_hint;
diff --git a/include/linux/tnum.h b/include/linux/tnum.h
index 0d2d3da..c7dc2b5 100644
--- a/include/linux/tnum.h
+++ b/include/linux/tnum.h
@@ -23,8 +23,10 @@
/* Arithmetic and logical ops */
/* Shift a tnum left (by a fixed shift) */
struct tnum tnum_lshift(struct tnum a, u8 shift);
-/* Shift a tnum right (by a fixed shift) */
+/* Shift (rsh) a tnum right (by a fixed shift) */
struct tnum tnum_rshift(struct tnum a, u8 shift);
+/* Shift (arsh) a tnum right (by a fixed min_shift) */
+struct tnum tnum_arshift(struct tnum a, u8 min_shift);
/* Add two tnums, return @a + @b */
struct tnum tnum_add(struct tnum a, struct tnum b);
/* Subtract two tnums, return @a - @b */
diff --git a/include/linux/trace_events.h b/include/linux/trace_events.h
index 2bde3ef..d34144a 100644
--- a/include/linux/trace_events.h
+++ b/include/linux/trace_events.h
@@ -473,6 +473,9 @@
int bpf_probe_register(struct bpf_raw_event_map *btp, struct bpf_prog *prog);
int bpf_probe_unregister(struct bpf_raw_event_map *btp, struct bpf_prog *prog);
struct bpf_raw_event_map *bpf_find_raw_tracepoint(const char *name);
+int bpf_get_perf_event_info(const struct perf_event *event, u32 *prog_id,
+ u32 *fd_type, const char **buf,
+ u64 *probe_offset, u64 *probe_addr);
#else
static inline unsigned int trace_call_bpf(struct trace_event_call *call, void *ctx)
{
@@ -504,6 +507,13 @@
{
return NULL;
}
+static inline int bpf_get_perf_event_info(const struct perf_event *event,
+ u32 *prog_id, u32 *fd_type,
+ const char **buf, u64 *probe_offset,
+ u64 *probe_addr)
+{
+ return -EOPNOTSUPP;
+}
#endif
enum {
@@ -560,10 +570,17 @@
#ifdef CONFIG_KPROBE_EVENTS
extern int perf_kprobe_init(struct perf_event *event, bool is_retprobe);
extern void perf_kprobe_destroy(struct perf_event *event);
+extern int bpf_get_kprobe_info(const struct perf_event *event,
+ u32 *fd_type, const char **symbol,
+ u64 *probe_offset, u64 *probe_addr,
+ bool perf_type_tracepoint);
#endif
#ifdef CONFIG_UPROBE_EVENTS
extern int perf_uprobe_init(struct perf_event *event, bool is_retprobe);
extern void perf_uprobe_destroy(struct perf_event *event);
+extern int bpf_get_uprobe_info(const struct perf_event *event,
+ u32 *fd_type, const char **filename,
+ u64 *probe_offset, bool perf_type_tracepoint);
#endif
extern int ftrace_profile_set_filter(struct perf_event *event, int event_id,
char *filter_str);
diff --git a/include/linux/u64_stats_sync.h b/include/linux/u64_stats_sync.h
index 07ee0f8..a27604f 100644
--- a/include/linux/u64_stats_sync.h
+++ b/include/linux/u64_stats_sync.h
@@ -112,20 +112,6 @@
#endif
}
-static inline void u64_stats_update_begin_raw(struct u64_stats_sync *syncp)
-{
-#if BITS_PER_LONG==32 && defined(CONFIG_SMP)
- raw_write_seqcount_begin(&syncp->seq);
-#endif
-}
-
-static inline void u64_stats_update_end_raw(struct u64_stats_sync *syncp)
-{
-#if BITS_PER_LONG==32 && defined(CONFIG_SMP)
- raw_write_seqcount_end(&syncp->seq);
-#endif
-}
-
static inline unsigned int __u64_stats_fetch_begin(const struct u64_stats_sync *syncp)
{
#if BITS_PER_LONG==32 && defined(CONFIG_SMP)
diff --git a/include/linux/udp.h b/include/linux/udp.h
index eaea63b..ca84034 100644
--- a/include/linux/udp.h
+++ b/include/linux/udp.h
@@ -55,6 +55,7 @@
* when the socket is uncorked.
*/
__u16 len; /* total length of pending frames */
+ __u16 gso_size;
/*
* Fields specific to UDP-Lite.
*/
@@ -87,6 +88,8 @@
int forward_deficit;
};
+#define UDP_MAX_SEGMENTS (1 << 6UL)
+
static inline struct udp_sock *udp_sk(const struct sock *sk)
{
return (struct udp_sock *)sk;
diff --git a/include/linux/umh.h b/include/linux/umh.h
index 244aff6..5c812ac 100644
--- a/include/linux/umh.h
+++ b/include/linux/umh.h
@@ -22,8 +22,10 @@
const char *path;
char **argv;
char **envp;
+ struct file *file;
int wait;
int retval;
+ pid_t pid;
int (*init)(struct subprocess_info *info, struct cred *new);
void (*cleanup)(struct subprocess_info *info);
void *data;
@@ -38,6 +40,16 @@
int (*init)(struct subprocess_info *info, struct cred *new),
void (*cleanup)(struct subprocess_info *), void *data);
+struct subprocess_info *call_usermodehelper_setup_file(struct file *file,
+ int (*init)(struct subprocess_info *info, struct cred *new),
+ void (*cleanup)(struct subprocess_info *), void *data);
+struct umh_info {
+ struct file *pipe_to_umh;
+ struct file *pipe_from_umh;
+ pid_t pid;
+};
+int fork_usermode_blob(void *data, size_t len, struct umh_info *info);
+
extern int
call_usermodehelper_exec(struct subprocess_info *info, int wait);
diff --git a/include/net/addrconf.h b/include/net/addrconf.h
index 378d601..c07d4dd 100644
--- a/include/net/addrconf.h
+++ b/include/net/addrconf.h
@@ -223,6 +223,22 @@
const struct in6_addr *addr);
int (*ipv6_dst_lookup)(struct net *net, struct sock *sk,
struct dst_entry **dst, struct flowi6 *fl6);
+
+ struct fib6_table *(*fib6_get_table)(struct net *net, u32 id);
+ struct fib6_info *(*fib6_lookup)(struct net *net, int oif,
+ struct flowi6 *fl6, int flags);
+ struct fib6_info *(*fib6_table_lookup)(struct net *net,
+ struct fib6_table *table,
+ int oif, struct flowi6 *fl6,
+ int flags);
+ struct fib6_info *(*fib6_multipath_select)(const struct net *net,
+ struct fib6_info *f6i,
+ struct flowi6 *fl6, int oif,
+ const struct sk_buff *skb,
+ int strict);
+ u32 (*ip6_mtu_from_fib6)(struct fib6_info *f6i, struct in6_addr *daddr,
+ struct in6_addr *saddr);
+
void (*udpv6_encap_enable)(void);
void (*ndisc_send_na)(struct net_device *dev, const struct in6_addr *daddr,
const struct in6_addr *solicited_addr,
@@ -308,6 +324,20 @@
}
/**
+ * __in6_dev_get_safely - get inet6_dev pointer from netdevice
+ * @dev: network device
+ *
+ * This is a safer version of __in6_dev_get
+ */
+static inline struct inet6_dev *__in6_dev_get_safely(const struct net_device *dev)
+{
+ if (likely(dev))
+ return rcu_dereference_rtnl(dev->ip6_ptr);
+ else
+ return NULL;
+}
+
+/**
* in6_dev_get - get inet6_dev pointer from netdevice
* @dev: network device
*
diff --git a/include/net/ax88796.h b/include/net/ax88796.h
index b9a3bec..84b3785 100644
--- a/include/net/ax88796.h
+++ b/include/net/ax88796.h
@@ -12,6 +12,10 @@
#ifndef __NET_AX88796_PLAT_H
#define __NET_AX88796_PLAT_H
+struct sk_buff;
+struct net_device;
+struct platform_device;
+
#define AXFLG_HAS_EEPROM (1<<0)
#define AXFLG_MAC_FROMDEV (1<<1) /* device already has MAC */
#define AXFLG_HAS_93CX6 (1<<2) /* use eeprom_93cx6 driver */
@@ -26,6 +30,16 @@
u32 *reg_offsets; /* register offsets */
u8 *mac_addr; /* MAC addr (only used when
AXFLG_MAC_FROMPLATFORM is used */
+
+ /* uses default ax88796 buffer if set to NULL */
+ void (*block_output)(struct net_device *dev, int count,
+ const unsigned char *buf, int star_page);
+ void (*block_input)(struct net_device *dev, int count,
+ struct sk_buff *skb, int ring_offset);
+ /* returns nonzero if a pending interrupt request might by caused by
+ * the ax88786. Handles all interrupts if set to NULL
+ */
+ int (*check_irq)(struct platform_device *pdev);
};
#endif /* __NET_AX88796_PLAT_H */
diff --git a/include/net/bluetooth/hci_core.h b/include/net/bluetooth/hci_core.h
index b619a19..893bbbb 100644
--- a/include/net/bluetooth/hci_core.h
+++ b/include/net/bluetooth/hci_core.h
@@ -1393,6 +1393,8 @@
const void *param, u32 timeout);
struct sk_buff *__hci_cmd_sync_ev(struct hci_dev *hdev, u16 opcode, u32 plen,
const void *param, u8 event, u32 timeout);
+int __hci_cmd_send(struct hci_dev *hdev, u16 opcode, u32 plen,
+ const void *param);
int hci_send_cmd(struct hci_dev *hdev, __u16 opcode, __u32 plen,
const void *param);
diff --git a/include/net/bonding.h b/include/net/bonding.h
index b522351..808f1d1 100644
--- a/include/net/bonding.h
+++ b/include/net/bonding.h
@@ -285,8 +285,15 @@
static inline bool bond_is_nondyn_tlb(const struct bonding *bond)
{
- return (BOND_MODE(bond) == BOND_MODE_TLB) &&
- (bond->params.tlb_dynamic_lb == 0);
+ return (bond_is_lb(bond) && bond->params.tlb_dynamic_lb == 0);
+}
+
+static inline bool bond_mode_can_use_xmit_hash(const struct bonding *bond)
+{
+ return (BOND_MODE(bond) == BOND_MODE_8023AD ||
+ BOND_MODE(bond) == BOND_MODE_XOR ||
+ BOND_MODE(bond) == BOND_MODE_TLB ||
+ BOND_MODE(bond) == BOND_MODE_ALB);
}
static inline bool bond_mode_uses_xmit_hash(const struct bonding *bond)
diff --git a/include/net/cfg80211.h b/include/net/cfg80211.h
index 250dac3..5fbfe61 100644
--- a/include/net/cfg80211.h
+++ b/include/net/cfg80211.h
@@ -1080,6 +1080,37 @@
};
/**
+ * struct cfg80211_txq_stats - TXQ statistics for this TID
+ * @filled: bitmap of flags using the bits of &enum nl80211_txq_stats to
+ * indicate the relevant values in this struct are filled
+ * @backlog_bytes: total number of bytes currently backlogged
+ * @backlog_packets: total number of packets currently backlogged
+ * @flows: number of new flows seen
+ * @drops: total number of packets dropped
+ * @ecn_marks: total number of packets marked with ECN CE
+ * @overlimit: number of drops due to queue space overflow
+ * @overmemory: number of drops due to memory limit overflow
+ * @collisions: number of hash collisions
+ * @tx_bytes: total number of bytes dequeued
+ * @tx_packets: total number of packets dequeued
+ * @max_flows: maximum number of flows supported
+ */
+struct cfg80211_txq_stats {
+ u32 filled;
+ u32 backlog_bytes;
+ u32 backlog_packets;
+ u32 flows;
+ u32 drops;
+ u32 ecn_marks;
+ u32 overlimit;
+ u32 overmemory;
+ u32 collisions;
+ u32 tx_bytes;
+ u32 tx_packets;
+ u32 max_flows;
+};
+
+/**
* struct cfg80211_tid_stats - per-TID statistics
* @filled: bitmap of flags using the bits of &enum nl80211_tid_stats to
* indicate the relevant values in this struct are filled
@@ -1088,6 +1119,7 @@
* @tx_msdu_retries: number of retries (not counting the first) for
* transmitted MSDUs
* @tx_msdu_failed: number of failed transmitted MSDUs
+ * @txq_stats: TXQ statistics
*/
struct cfg80211_tid_stats {
u32 filled;
@@ -1095,6 +1127,7 @@
u64 tx_msdu;
u64 tx_msdu_retries;
u64 tx_msdu_failed;
+ struct cfg80211_txq_stats txq_stats;
};
#define IEEE80211_MAX_CHAINS 4
@@ -1151,7 +1184,10 @@
* @rx_duration: aggregate PPDU duration(usecs) for all the frames from a peer
* @pertid: per-TID statistics, see &struct cfg80211_tid_stats, using the last
* (IEEE80211_NUM_TIDS) index for MSDUs not encapsulated in QoS-MPDUs.
+ * Note that this doesn't use the @filled bit, but is used if non-NULL.
* @ack_signal: signal strength (in dBm) of the last ACK frame.
+ * @avg_ack_signal: average rssi value of ack packet for the no of msdu's has
+ * been sent.
*/
struct station_info {
u64 filled;
@@ -1195,8 +1231,9 @@
u64 rx_beacon;
u64 rx_duration;
u8 rx_beacon_signal_avg;
- struct cfg80211_tid_stats pertid[IEEE80211_NUM_TIDS + 1];
+ struct cfg80211_tid_stats *pertid;
s8 ack_signal;
+ s8 avg_ack_signal;
};
#if IS_ENABLED(CONFIG_CFG80211)
@@ -2188,9 +2225,14 @@
* have to be updated as part of update_connect_params() call.
*
* @UPDATE_ASSOC_IES: Indicates whether association request IEs are updated
+ * @UPDATE_FILS_ERP_INFO: Indicates that FILS connection parameters (realm,
+ * username, erp sequence number and rrk) are updated
+ * @UPDATE_AUTH_TYPE: Indicates that authentication type is updated
*/
enum cfg80211_connect_params_changed {
UPDATE_ASSOC_IES = BIT(0),
+ UPDATE_FILS_ERP_INFO = BIT(1),
+ UPDATE_AUTH_TYPE = BIT(2),
};
/**
@@ -2201,6 +2243,9 @@
* @WIPHY_PARAM_RTS_THRESHOLD: wiphy->rts_threshold has changed
* @WIPHY_PARAM_COVERAGE_CLASS: coverage class changed
* @WIPHY_PARAM_DYN_ACK: dynack has been enabled
+ * @WIPHY_PARAM_TXQ_LIMIT: TXQ packet limit has been changed
+ * @WIPHY_PARAM_TXQ_MEMORY_LIMIT: TXQ memory limit has been changed
+ * @WIPHY_PARAM_TXQ_QUANTUM: TXQ scheduler quantum
*/
enum wiphy_params_flags {
WIPHY_PARAM_RETRY_SHORT = 1 << 0,
@@ -2209,6 +2254,9 @@
WIPHY_PARAM_RTS_THRESHOLD = 1 << 3,
WIPHY_PARAM_COVERAGE_CLASS = 1 << 4,
WIPHY_PARAM_DYN_ACK = 1 << 5,
+ WIPHY_PARAM_TXQ_LIMIT = 1 << 6,
+ WIPHY_PARAM_TXQ_MEMORY_LIMIT = 1 << 7,
+ WIPHY_PARAM_TXQ_QUANTUM = 1 << 8,
};
/**
@@ -2961,6 +3009,9 @@
*
* @set_multicast_to_unicast: configure multicast to unicast conversion for BSS
*
+ * @get_txq_stats: Get TXQ stats for interface or phy. If wdev is %NULL, this
+ * function should return phy stats, and interface stats otherwise.
+ *
* @set_pmk: configure the PMK to be used for offloaded 802.1X 4-Way handshake.
* If not deleted through @del_pmk the PMK remains valid until disconnect
* upon which the driver should clear it.
@@ -3262,6 +3313,10 @@
struct net_device *dev,
const bool enabled);
+ int (*get_txq_stats)(struct wiphy *wiphy,
+ struct wireless_dev *wdev,
+ struct cfg80211_txq_stats *txqstats);
+
int (*set_pmk)(struct wiphy *wiphy, struct net_device *dev,
const struct cfg80211_pmk_conf *conf);
int (*del_pmk)(struct wiphy *wiphy, struct net_device *dev,
@@ -3806,6 +3861,10 @@
* bitmap of &enum nl80211_band values. For instance, for
* NL80211_BAND_2GHZ, bit 0 would be set
* (i.e. BIT(NL80211_BAND_2GHZ)).
+ *
+ * @txq_limit: configuration of internal TX queue frame limit
+ * @txq_memory_limit: configuration internal TX queue memory limit
+ * @txq_quantum: configuration of internal TX queue scheduler quantum
*/
struct wiphy {
/* assign these fields before you register the wiphy */
@@ -3940,6 +3999,10 @@
u8 nan_supported_bands;
+ u32 txq_limit;
+ u32 txq_memory_limit;
+ u32 txq_quantum;
+
char priv[0] __aligned(NETDEV_ALIGN);
};
@@ -5363,6 +5426,30 @@
#endif
/**
+ * struct cfg80211_fils_resp_params - FILS connection response params
+ * @kek: KEK derived from a successful FILS connection (may be %NULL)
+ * @kek_len: Length of @fils_kek in octets
+ * @update_erp_next_seq_num: Boolean value to specify whether the value in
+ * @erp_next_seq_num is valid.
+ * @erp_next_seq_num: The next sequence number to use in ERP message in
+ * FILS Authentication. This value should be specified irrespective of the
+ * status for a FILS connection.
+ * @pmk: A new PMK if derived from a successful FILS connection (may be %NULL).
+ * @pmk_len: Length of @pmk in octets
+ * @pmkid: A new PMKID if derived from a successful FILS connection or the PMKID
+ * used for this FILS connection (may be %NULL).
+ */
+struct cfg80211_fils_resp_params {
+ const u8 *kek;
+ size_t kek_len;
+ bool update_erp_next_seq_num;
+ u16 erp_next_seq_num;
+ const u8 *pmk;
+ size_t pmk_len;
+ const u8 *pmkid;
+};
+
+/**
* struct cfg80211_connect_resp_params - Connection response params
* @status: Status code, %WLAN_STATUS_SUCCESS for successful connection, use
* %WLAN_STATUS_UNSPECIFIED_FAILURE if your device cannot give you
@@ -5380,17 +5467,7 @@
* @req_ie_len: Association request IEs length
* @resp_ie: Association response IEs (may be %NULL)
* @resp_ie_len: Association response IEs length
- * @fils_kek: KEK derived from a successful FILS connection (may be %NULL)
- * @fils_kek_len: Length of @fils_kek in octets
- * @update_erp_next_seq_num: Boolean value to specify whether the value in
- * @fils_erp_next_seq_num is valid.
- * @fils_erp_next_seq_num: The next sequence number to use in ERP message in
- * FILS Authentication. This value should be specified irrespective of the
- * status for a FILS connection.
- * @pmk: A new PMK if derived from a successful FILS connection (may be %NULL).
- * @pmk_len: Length of @pmk in octets
- * @pmkid: A new PMKID if derived from a successful FILS connection or the PMKID
- * used for this FILS connection (may be %NULL).
+ * @fils: FILS connection response parameters.
* @timeout_reason: Reason for connection timeout. This is used when the
* connection fails due to a timeout instead of an explicit rejection from
* the AP. %NL80211_TIMEOUT_UNSPECIFIED is used when the timeout reason is
@@ -5406,13 +5483,7 @@
size_t req_ie_len;
const u8 *resp_ie;
size_t resp_ie_len;
- const u8 *fils_kek;
- size_t fils_kek_len;
- bool update_erp_next_seq_num;
- u16 fils_erp_next_seq_num;
- const u8 *pmk;
- size_t pmk_len;
- const u8 *pmkid;
+ struct cfg80211_fils_resp_params fils;
enum nl80211_timeout_reason timeout_reason;
};
@@ -5558,6 +5629,7 @@
* @req_ie_len: association request IEs length
* @resp_ie: association response IEs (may be %NULL)
* @resp_ie_len: assoc response IEs length
+ * @fils: FILS related roaming information.
*/
struct cfg80211_roam_info {
struct ieee80211_channel *channel;
@@ -5567,6 +5639,7 @@
size_t req_ie_len;
const u8 *resp_ie;
size_t resp_ie_len;
+ struct cfg80211_fils_resp_params fils;
};
/**
@@ -5648,6 +5721,26 @@
struct ieee80211_channel *chan,
gfp_t gfp);
+/**
+ * cfg80211_sinfo_alloc_tid_stats - allocate per-tid statistics.
+ *
+ * @sinfo: the station information
+ * @gfp: allocation flags
+ */
+int cfg80211_sinfo_alloc_tid_stats(struct station_info *sinfo, gfp_t gfp);
+
+/**
+ * cfg80211_sinfo_release_content - release contents of station info
+ * @sinfo: the station information
+ *
+ * Releases any potentially allocated sub-information of the station
+ * information, but not the struct itself (since it's typically on
+ * the stack.)
+ */
+static inline void cfg80211_sinfo_release_content(struct station_info *sinfo)
+{
+ kfree(sinfo->pertid);
+}
/**
* cfg80211_new_sta - notify userspace about station
diff --git a/include/net/dcbnl.h b/include/net/dcbnl.h
index 207d9ba..0e5e91b 100644
--- a/include/net/dcbnl.h
+++ b/include/net/dcbnl.h
@@ -101,6 +101,10 @@
/* CEE peer */
int (*cee_peer_getpg) (struct net_device *, struct cee_pg *);
int (*cee_peer_getpfc) (struct net_device *, struct cee_pfc *);
+
+ /* buffer settings */
+ int (*dcbnl_getbuffer)(struct net_device *, struct dcbnl_buffer *);
+ int (*dcbnl_setbuffer)(struct net_device *, struct dcbnl_buffer *);
};
#endif /* __NET_DCBNL_H__ */
diff --git a/include/net/devlink.h b/include/net/devlink.h
index 2e4f71e..9686a1a 100644
--- a/include/net/devlink.h
+++ b/include/net/devlink.h
@@ -35,6 +35,14 @@
char priv[0] __aligned(NETDEV_ALIGN);
};
+struct devlink_port_attrs {
+ bool set;
+ enum devlink_port_flavour flavour;
+ u32 port_number; /* same value as "split group" */
+ bool split;
+ u32 split_subport_number;
+};
+
struct devlink_port {
struct list_head list;
struct devlink *devlink;
@@ -43,8 +51,7 @@
enum devlink_port_type type;
enum devlink_port_type desired_type;
void *type_dev;
- bool split;
- u32 split_group;
+ struct devlink_port_attrs attrs;
};
struct devlink_sb_pool_info {
@@ -367,8 +374,12 @@
void devlink_port_type_ib_set(struct devlink_port *devlink_port,
struct ib_device *ibdev);
void devlink_port_type_clear(struct devlink_port *devlink_port);
-void devlink_port_split_set(struct devlink_port *devlink_port,
- u32 split_group);
+void devlink_port_attrs_set(struct devlink_port *devlink_port,
+ enum devlink_port_flavour flavour,
+ u32 port_number, bool split,
+ u32 split_subport_number);
+int devlink_port_get_phys_port_name(struct devlink_port *devlink_port,
+ char *name, size_t len);
int devlink_sb_register(struct devlink *devlink, unsigned int sb_index,
u32 size, u16 ingress_pools_count,
u16 egress_pools_count, u16 ingress_tc_count,
@@ -466,11 +477,20 @@
{
}
-static inline void devlink_port_split_set(struct devlink_port *devlink_port,
- u32 split_group)
+static inline void devlink_port_attrs_set(struct devlink_port *devlink_port,
+ enum devlink_port_flavour flavour,
+ u32 port_number, bool split,
+ u32 split_subport_number)
{
}
+static inline int
+devlink_port_get_phys_port_name(struct devlink_port *devlink_port,
+ char *name, size_t len)
+{
+ return -EOPNOTSUPP;
+}
+
static inline int devlink_sb_register(struct devlink *devlink,
unsigned int sb_index, u32 size,
u16 ingress_pools_count,
diff --git a/include/net/dsa.h b/include/net/dsa.h
index 60fb4ec..fdbd608 100644
--- a/include/net/dsa.h
+++ b/include/net/dsa.h
@@ -20,12 +20,14 @@
#include <linux/of.h>
#include <linux/ethtool.h>
#include <linux/net_tstamp.h>
+#include <linux/phy.h>
#include <net/devlink.h>
#include <net/switchdev.h>
struct tc_action;
struct phy_device;
struct fixed_phy_status;
+struct phylink_link_state;
enum dsa_tag_protocol {
DSA_TAG_PROTO_NONE = 0,
@@ -199,6 +201,7 @@
u8 stp_state;
struct net_device *bridge_dev;
struct devlink_port devlink_port;
+ struct phylink *pl;
/*
* Original copy of the master netdev ethtool_ops
*/
@@ -354,12 +357,36 @@
struct fixed_phy_status *st);
/*
+ * PHYLINK integration
+ */
+ void (*phylink_validate)(struct dsa_switch *ds, int port,
+ unsigned long *supported,
+ struct phylink_link_state *state);
+ int (*phylink_mac_link_state)(struct dsa_switch *ds, int port,
+ struct phylink_link_state *state);
+ void (*phylink_mac_config)(struct dsa_switch *ds, int port,
+ unsigned int mode,
+ const struct phylink_link_state *state);
+ void (*phylink_mac_an_restart)(struct dsa_switch *ds, int port);
+ void (*phylink_mac_link_down)(struct dsa_switch *ds, int port,
+ unsigned int mode,
+ phy_interface_t interface);
+ void (*phylink_mac_link_up)(struct dsa_switch *ds, int port,
+ unsigned int mode,
+ phy_interface_t interface,
+ struct phy_device *phydev);
+ void (*phylink_fixed_state)(struct dsa_switch *ds, int port,
+ struct phylink_link_state *state);
+ /*
* ethtool hardware statistics.
*/
- void (*get_strings)(struct dsa_switch *ds, int port, uint8_t *data);
+ void (*get_strings)(struct dsa_switch *ds, int port,
+ u32 stringset, uint8_t *data);
void (*get_ethtool_stats)(struct dsa_switch *ds,
int port, uint64_t *data);
- int (*get_sset_count)(struct dsa_switch *ds, int port);
+ int (*get_sset_count)(struct dsa_switch *ds, int port, int sset);
+ void (*get_ethtool_phy_stats)(struct dsa_switch *ds,
+ int port, uint64_t *data);
/*
* ethtool Wake-on-LAN
@@ -588,4 +615,10 @@
#define BRCM_TAG_GET_PORT(v) ((v) >> 8)
#define BRCM_TAG_GET_QUEUE(v) ((v) & 0xff)
+
+int dsa_port_get_phy_strings(struct dsa_port *dp, uint8_t *data);
+int dsa_port_get_ethtool_phy_stats(struct dsa_port *dp, uint64_t *data);
+int dsa_port_get_phy_sset_count(struct dsa_port *dp);
+void dsa_port_phylink_mac_change(struct dsa_switch *ds, int port, bool up);
+
#endif
diff --git a/include/net/erspan.h b/include/net/erspan.h
index d044aa6..b39643e 100644
--- a/include/net/erspan.h
+++ b/include/net/erspan.h
@@ -219,6 +219,33 @@
return htonl((u32)h_usecs);
}
+/* ERSPAN BSO (Bad/Short/Oversized), see RFC1757
+ * 00b --> Good frame with no error, or unknown integrity
+ * 01b --> Payload is a Short Frame
+ * 10b --> Payload is an Oversized Frame
+ * 11b --> Payload is a Bad Frame with CRC or Alignment Error
+ */
+enum erspan_bso {
+ BSO_NOERROR = 0x0,
+ BSO_SHORT = 0x1,
+ BSO_OVERSIZED = 0x2,
+ BSO_BAD = 0x3,
+};
+
+static inline u8 erspan_detect_bso(struct sk_buff *skb)
+{
+ /* BSO_BAD is not handled because the frame CRC
+ * or alignment error information is in FCS.
+ */
+ if (skb->len < ETH_ZLEN)
+ return BSO_SHORT;
+
+ if (skb->len > ETH_FRAME_LEN)
+ return BSO_OVERSIZED;
+
+ return BSO_NOERROR;
+}
+
static inline void erspan_build_header_v2(struct sk_buff *skb,
u32 id, u8 direction, u16 hwid,
bool truncate, bool is_ipv4)
@@ -248,6 +275,7 @@
vlan_tci = ntohs(qp->tci);
}
+ bso = erspan_detect_bso(skb);
skb_push(skb, sizeof(*ershdr) + ERSPAN_V2_MDSIZE);
ershdr = (struct erspan_base_hdr *)skb->data;
memset(ershdr, 0, sizeof(*ershdr) + ERSPAN_V2_MDSIZE);
diff --git a/include/net/fib_rules.h b/include/net/fib_rules.h
index e5cfcfc..b473df5 100644
--- a/include/net/fib_rules.h
+++ b/include/net/fib_rules.h
@@ -75,7 +75,8 @@
int (*configure)(struct fib_rule *,
struct sk_buff *,
struct fib_rule_hdr *,
- struct nlattr **);
+ struct nlattr **,
+ struct netlink_ext_ack *);
int (*delete)(struct fib_rule *);
int (*compare)(struct fib_rule *,
struct fib_rule_hdr *,
diff --git a/include/net/flow_dissector.h b/include/net/flow_dissector.h
index d1fcf24..adc24df5 100644
--- a/include/net/flow_dissector.h
+++ b/include/net/flow_dissector.h
@@ -226,6 +226,11 @@
unsigned short int offset[FLOW_DISSECTOR_KEY_MAX];
};
+struct flow_keys_basic {
+ struct flow_dissector_key_control control;
+ struct flow_dissector_key_basic basic;
+};
+
struct flow_keys {
struct flow_dissector_key_control control;
#define FLOW_KEYS_HASH_START_FIELD basic
@@ -244,7 +249,7 @@
__be32 flow_get_u32_dst(const struct flow_keys *flow);
extern struct flow_dissector flow_keys_dissector;
-extern struct flow_dissector flow_keys_buf_dissector;
+extern struct flow_dissector flow_keys_basic_dissector;
/* struct flow_keys_digest:
*
diff --git a/include/net/if_inet6.h b/include/net/if_inet6.h
index d4088d1..db38925 100644
--- a/include/net/if_inet6.h
+++ b/include/net/if_inet6.h
@@ -64,7 +64,7 @@
struct delayed_work dad_work;
struct inet6_dev *idev;
- struct rt6_info *rt;
+ struct fib6_info *rt;
struct hlist_node addr_lst;
struct list_head if_list;
@@ -143,8 +143,7 @@
struct ifacaddr6 {
struct in6_addr aca_addr;
- struct inet6_dev *aca_idev;
- struct rt6_info *aca_rt;
+ struct fib6_info *aca_rt;
struct ifacaddr6 *aca_next;
int aca_users;
refcount_t aca_refcnt;
diff --git a/include/net/inet_connection_sock.h b/include/net/inet_connection_sock.h
index b68fea0..0a6c9e0 100644
--- a/include/net/inet_connection_sock.h
+++ b/include/net/inet_connection_sock.h
@@ -23,8 +23,6 @@
#include <net/inet_sock.h>
#include <net/request_sock.h>
-#define INET_CSK_DEBUG 1
-
/* Cancel timers, when they are not required. */
#undef INET_CSK_CLEAR_TIMERS
@@ -77,6 +75,7 @@
* @icsk_af_ops Operations which are AF_INET{4,6} specific
* @icsk_ulp_ops Pluggable ULP control hook
* @icsk_ulp_data ULP private data
+ * @icsk_clean_acked Clean acked data hook
* @icsk_listen_portaddr_node hash to the portaddr listener hashtable
* @icsk_ca_state: Congestion control state
* @icsk_retransmits: Number of unrecovered [RTO] timeouts
@@ -102,6 +101,7 @@
const struct inet_connection_sock_af_ops *icsk_af_ops;
const struct tcp_ulp_ops *icsk_ulp_ops;
void *icsk_ulp_data;
+ void (*icsk_clean_acked)(struct sock *sk, u32 acked_seq);
struct hlist_node icsk_listen_portaddr_node;
unsigned int (*icsk_sync_mss)(struct sock *sk, u32 pmtu);
__u8 icsk_ca_state:6,
@@ -194,10 +194,6 @@
void inet_csk_delete_keepalive_timer(struct sock *sk);
void inet_csk_reset_keepalive_timer(struct sock *sk, unsigned long timeout);
-#ifdef INET_CSK_DEBUG
-extern const char inet_csk_timer_bug_msg[];
-#endif
-
static inline void inet_csk_clear_xmit_timer(struct sock *sk, const int what)
{
struct inet_connection_sock *icsk = inet_csk(sk);
@@ -212,12 +208,9 @@
#ifdef INET_CSK_CLEAR_TIMERS
sk_stop_timer(sk, &icsk->icsk_delack_timer);
#endif
+ } else {
+ pr_debug("inet_csk BUG: unknown timer value\n");
}
-#ifdef INET_CSK_DEBUG
- else {
- pr_debug("%s", inet_csk_timer_bug_msg);
- }
-#endif
}
/*
@@ -230,10 +223,8 @@
struct inet_connection_sock *icsk = inet_csk(sk);
if (when > max_when) {
-#ifdef INET_CSK_DEBUG
pr_debug("reset_xmit_timer: sk=%p %d when=0x%lx, caller=%p\n",
sk, what, when, current_text_addr());
-#endif
when = max_when;
}
@@ -247,12 +238,9 @@
icsk->icsk_ack.pending |= ICSK_ACK_TIMER;
icsk->icsk_ack.timeout = jiffies + when;
sk_reset_timer(sk, &icsk->icsk_delack_timer, icsk->icsk_ack.timeout);
+ } else {
+ pr_debug("inet_csk BUG: unknown timer value\n");
}
-#ifdef INET_CSK_DEBUG
- else {
- pr_debug("%s", inet_csk_timer_bug_msg);
- }
-#endif
}
static inline unsigned long
diff --git a/include/net/inet_sock.h b/include/net/inet_sock.h
index 0a671c3..83d5b3c 100644
--- a/include/net/inet_sock.h
+++ b/include/net/inet_sock.h
@@ -147,6 +147,7 @@
__u8 ttl;
__s16 tos;
char priority;
+ __u16 gso_size;
};
struct inet_cork_full {
diff --git a/include/net/inet_timewait_sock.h b/include/net/inet_timewait_sock.h
index c7be1ca..659d8ed 100644
--- a/include/net/inet_timewait_sock.h
+++ b/include/net/inet_timewait_sock.h
@@ -62,6 +62,7 @@
#define tw_dr __tw_common.skc_tw_dr
int tw_timeout;
+ __u32 tw_mark;
volatile unsigned char tw_substate;
unsigned char tw_rcv_wscale;
diff --git a/include/net/ip.h b/include/net/ip.h
index ecffd84..0d2281b 100644
--- a/include/net/ip.h
+++ b/include/net/ip.h
@@ -76,6 +76,7 @@
__u8 ttl;
__s16 tos;
char priority;
+ __u16 gso_size;
};
#define IPCB(skb) ((struct inet_skb_parm*)((skb)->cb))
@@ -171,7 +172,7 @@
int len, int odd, struct sk_buff *skb),
void *from, int length, int transhdrlen,
struct ipcm_cookie *ipc, struct rtable **rtp,
- unsigned int flags);
+ struct inet_cork *cork, unsigned int flags);
static inline struct sk_buff *ip_finish_skb(struct sock *sk, struct flowi4 *fl4)
{
@@ -396,6 +397,9 @@
return min(READ_ONCE(skb_dst(skb)->dev->mtu), IP_MAX_MTU);
}
+int ip_metrics_convert(struct net *net, struct nlattr *fc_mx, int fc_mx_len,
+ u32 *metrics);
+
u32 ip_idents_reserve(u32 hash, int segs);
void __ip_select_ident(struct net *net, struct iphdr *iph, int segs);
@@ -660,4 +664,7 @@
int ip_misc_proc_init(void);
#endif
+int rtm_getroute_parse_ip_proto(struct nlattr *attr, u8 *ip_proto,
+ struct netlink_ext_ack *extack);
+
#endif /* _IP_H */
diff --git a/include/net/ip6_fib.h b/include/net/ip6_fib.h
index 5e86fd9..7897efe 100644
--- a/include/net/ip6_fib.h
+++ b/include/net/ip6_fib.h
@@ -38,6 +38,7 @@
#endif
struct rt6_info;
+struct fib6_info;
struct fib6_config {
u32 fc_table;
@@ -74,12 +75,12 @@
#ifdef CONFIG_IPV6_SUBTREES
struct fib6_node __rcu *subtree;
#endif
- struct rt6_info __rcu *leaf;
+ struct fib6_info __rcu *leaf;
__u16 fn_bit; /* bit key */
__u16 fn_flags;
int fn_sernum;
- struct rt6_info __rcu *rr_ptr;
+ struct fib6_info __rcu *rr_ptr;
struct rcu_head rcu;
};
@@ -94,11 +95,6 @@
#define FIB6_SUBTREE(fn) (rcu_dereference_protected((fn)->subtree, 1))
#endif
-struct mx6_config {
- const u32 *mx;
- DECLARE_BITMAP(mx_valid, RTAX_MAX);
-};
-
/*
* routing information
*
@@ -127,92 +123,104 @@
#define FIB6_EXCEPTION_BUCKET_SIZE (1 << FIB6_EXCEPTION_BUCKET_SIZE_SHIFT)
#define FIB6_MAX_DEPTH 5
-struct rt6_info {
- struct dst_entry dst;
- struct rt6_info __rcu *rt6_next;
- struct rt6_info *from;
+struct fib6_nh {
+ struct in6_addr nh_gw;
+ struct net_device *nh_dev;
+ struct lwtunnel_state *nh_lwtstate;
- /*
- * Tail elements of dst_entry (__refcnt etc.)
- * and these elements (rarely used in hot path) are in
- * the same cache line.
- */
- struct fib6_table *rt6i_table;
- struct fib6_node __rcu *rt6i_node;
+ unsigned int nh_flags;
+ atomic_t nh_upper_bound;
+ int nh_weight;
+};
- struct in6_addr rt6i_gateway;
+struct fib6_info {
+ struct fib6_table *fib6_table;
+ struct fib6_info __rcu *fib6_next;
+ struct fib6_node __rcu *fib6_node;
/* Multipath routes:
- * siblings is a list of rt6_info that have the the same metric/weight,
+ * siblings is a list of fib6_info that have the the same metric/weight,
* destination, but not the same gateway. nsiblings is just a cache
* to speed up lookup.
*/
- struct list_head rt6i_siblings;
- unsigned int rt6i_nsiblings;
- atomic_t rt6i_nh_upper_bound;
+ struct list_head fib6_siblings;
+ unsigned int fib6_nsiblings;
- atomic_t rt6i_ref;
+ atomic_t fib6_ref;
+ unsigned long expires;
+ struct dst_metrics *fib6_metrics;
+#define fib6_pmtu fib6_metrics->metrics[RTAX_MTU-1]
- unsigned int rt6i_nh_flags;
+ struct rt6key fib6_dst;
+ u32 fib6_flags;
+ struct rt6key fib6_src;
+ struct rt6key fib6_prefsrc;
- /* These are in a separate cache line. */
- struct rt6key rt6i_dst ____cacheline_aligned_in_smp;
- u32 rt6i_flags;
+ struct rt6_info * __percpu *rt6i_pcpu;
+ struct rt6_exception_bucket __rcu *rt6i_exception_bucket;
+
+ u32 fib6_metric;
+ u8 fib6_protocol;
+ u8 fib6_type;
+ u8 exception_bucket_flushed:1,
+ should_flush:1,
+ dst_nocount:1,
+ dst_nopolicy:1,
+ dst_host:1,
+ unused:3;
+
+ struct fib6_nh fib6_nh;
+};
+
+struct rt6_info {
+ struct dst_entry dst;
+ struct fib6_info __rcu *from;
+
+ struct rt6key rt6i_dst;
struct rt6key rt6i_src;
+ struct in6_addr rt6i_gateway;
+ struct inet6_dev *rt6i_idev;
+ u32 rt6i_flags;
struct rt6key rt6i_prefsrc;
struct list_head rt6i_uncached;
struct uncached_list *rt6i_uncached_list;
- struct inet6_dev *rt6i_idev;
- struct rt6_info * __percpu *rt6i_pcpu;
- struct rt6_exception_bucket __rcu *rt6i_exception_bucket;
-
- u32 rt6i_metric;
- u32 rt6i_pmtu;
/* more non-fragment space at head required */
- int rt6i_nh_weight;
unsigned short rt6i_nfheader_len;
- u8 rt6i_protocol;
- u8 exception_bucket_flushed:1,
- should_flush:1,
- unused:6;
};
#define for_each_fib6_node_rt_rcu(fn) \
for (rt = rcu_dereference((fn)->leaf); rt; \
- rt = rcu_dereference(rt->rt6_next))
+ rt = rcu_dereference(rt->fib6_next))
#define for_each_fib6_walker_rt(w) \
for (rt = (w)->leaf; rt; \
- rt = rcu_dereference_protected(rt->rt6_next, 1))
+ rt = rcu_dereference_protected(rt->fib6_next, 1))
static inline struct inet6_dev *ip6_dst_idev(struct dst_entry *dst)
{
return ((struct rt6_info *)dst)->rt6i_idev;
}
-static inline void rt6_clean_expires(struct rt6_info *rt)
+static inline void fib6_clean_expires(struct fib6_info *f6i)
{
- rt->rt6i_flags &= ~RTF_EXPIRES;
- rt->dst.expires = 0;
+ f6i->fib6_flags &= ~RTF_EXPIRES;
+ f6i->expires = 0;
}
-static inline void rt6_set_expires(struct rt6_info *rt, unsigned long expires)
+static inline void fib6_set_expires(struct fib6_info *f6i,
+ unsigned long expires)
{
- rt->dst.expires = expires;
- rt->rt6i_flags |= RTF_EXPIRES;
+ f6i->expires = expires;
+ f6i->fib6_flags |= RTF_EXPIRES;
}
-static inline void rt6_update_expires(struct rt6_info *rt0, int timeout)
+static inline bool fib6_check_expired(const struct fib6_info *f6i)
{
- struct rt6_info *rt;
-
- for (rt = rt0; rt && !(rt->rt6i_flags & RTF_EXPIRES); rt = rt->from);
- if (rt && rt != rt0)
- rt0->dst.expires = rt->dst.expires;
- dst_set_expires(&rt0->dst, timeout);
- rt0->rt6i_flags |= RTF_EXPIRES;
+ if (f6i->fib6_flags & RTF_EXPIRES)
+ return time_after(jiffies, f6i->expires);
+ return false;
}
/* Function to safely get fn->sernum for passed in rt
@@ -220,14 +228,13 @@
* Return true if we can get cookie safely
* Return false if not
*/
-static inline bool rt6_get_cookie_safe(const struct rt6_info *rt,
- u32 *cookie)
+static inline bool fib6_get_cookie_safe(const struct fib6_info *f6i,
+ u32 *cookie)
{
struct fib6_node *fn;
bool status = false;
- rcu_read_lock();
- fn = rcu_dereference(rt->rt6i_node);
+ fn = rcu_dereference(f6i->fib6_node);
if (fn) {
*cookie = fn->fn_sernum;
@@ -236,19 +243,22 @@
status = true;
}
- rcu_read_unlock();
return status;
}
static inline u32 rt6_get_cookie(const struct rt6_info *rt)
{
+ struct fib6_info *from;
u32 cookie = 0;
- if (rt->rt6i_flags & RTF_PCPU ||
- (unlikely(!list_empty(&rt->rt6i_uncached)) && rt->from))
- rt = rt->from;
+ rcu_read_lock();
- rt6_get_cookie_safe(rt, &cookie);
+ from = rcu_dereference(rt->from);
+ if (from && (rt->rt6i_flags & RTF_PCPU ||
+ unlikely(!list_empty(&rt->rt6i_uncached))))
+ fib6_get_cookie_safe(from, &cookie);
+
+ rcu_read_unlock();
return cookie;
}
@@ -262,20 +272,18 @@
dst_release(&rt->dst);
}
-void rt6_free_pcpu(struct rt6_info *non_pcpu_rt);
+struct fib6_info *fib6_info_alloc(gfp_t gfp_flags);
+void fib6_info_destroy(struct fib6_info *f6i);
-static inline void rt6_hold(struct rt6_info *rt)
+static inline void fib6_info_hold(struct fib6_info *f6i)
{
- atomic_inc(&rt->rt6i_ref);
+ atomic_inc(&f6i->fib6_ref);
}
-static inline void rt6_release(struct rt6_info *rt)
+static inline void fib6_info_release(struct fib6_info *f6i)
{
- if (atomic_dec_and_test(&rt->rt6i_ref)) {
- rt6_free_pcpu(rt);
- dst_dev_put(&rt->dst);
- dst_release(&rt->dst);
- }
+ if (f6i && atomic_dec_and_test(&f6i->fib6_ref))
+ fib6_info_destroy(f6i);
}
enum fib6_walk_state {
@@ -291,7 +299,7 @@
struct fib6_walker {
struct list_head lh;
struct fib6_node *root, *node;
- struct rt6_info *leaf;
+ struct fib6_info *leaf;
enum fib6_walk_state state;
unsigned int skip;
unsigned int count;
@@ -355,7 +363,7 @@
struct fib6_entry_notifier_info {
struct fib_notifier_info info; /* must be first */
- struct rt6_info *rt;
+ struct fib6_info *rt;
};
/*
@@ -368,24 +376,49 @@
const struct sk_buff *skb,
int flags, pol_lookup_t lookup);
-struct fib6_node *fib6_lookup(struct fib6_node *root,
- const struct in6_addr *daddr,
- const struct in6_addr *saddr);
+/* called with rcu lock held; can return error pointer
+ * caller needs to select path
+ */
+struct fib6_info *fib6_lookup(struct net *net, int oif, struct flowi6 *fl6,
+ int flags);
+
+/* called with rcu lock held; caller needs to select path */
+struct fib6_info *fib6_table_lookup(struct net *net, struct fib6_table *table,
+ int oif, struct flowi6 *fl6, int strict);
+
+struct fib6_info *fib6_multipath_select(const struct net *net,
+ struct fib6_info *match,
+ struct flowi6 *fl6, int oif,
+ const struct sk_buff *skb, int strict);
+
+struct fib6_node *fib6_node_lookup(struct fib6_node *root,
+ const struct in6_addr *daddr,
+ const struct in6_addr *saddr);
struct fib6_node *fib6_locate(struct fib6_node *root,
const struct in6_addr *daddr, int dst_len,
const struct in6_addr *saddr, int src_len,
bool exact_match);
-void fib6_clean_all(struct net *net, int (*func)(struct rt6_info *, void *arg),
+void fib6_clean_all(struct net *net, int (*func)(struct fib6_info *, void *arg),
void *arg);
-int fib6_add(struct fib6_node *root, struct rt6_info *rt,
- struct nl_info *info, struct mx6_config *mxc,
- struct netlink_ext_ack *extack);
-int fib6_del(struct rt6_info *rt, struct nl_info *info);
+int fib6_add(struct fib6_node *root, struct fib6_info *rt,
+ struct nl_info *info, struct netlink_ext_ack *extack);
+int fib6_del(struct fib6_info *rt, struct nl_info *info);
-void inet6_rt_notify(int event, struct rt6_info *rt, struct nl_info *info,
+static inline struct net_device *fib6_info_nh_dev(const struct fib6_info *f6i)
+{
+ return f6i->fib6_nh.nh_dev;
+}
+
+static inline
+struct lwtunnel_state *fib6_info_nh_lwt(const struct fib6_info *f6i)
+{
+ return f6i->fib6_nh.nh_lwtstate;
+}
+
+void inet6_rt_notify(int event, struct fib6_info *rt, struct nl_info *info,
unsigned int flags);
void fib6_run_gc(unsigned long expires, struct net *net, bool force);
@@ -408,8 +441,14 @@
unsigned int fib6_tables_seq_read(struct net *net);
int fib6_tables_dump(struct net *net, struct notifier_block *nb);
-void fib6_update_sernum(struct rt6_info *rt);
-void fib6_update_sernum_upto_root(struct net *net, struct rt6_info *rt);
+void fib6_update_sernum(struct net *net, struct fib6_info *rt);
+void fib6_update_sernum_upto_root(struct net *net, struct fib6_info *rt);
+
+void fib6_metric_set(struct fib6_info *f6i, int metric, u32 val);
+static inline bool fib6_metric_locked(struct fib6_info *f6i, int metric)
+{
+ return !!(f6i->fib6_metrics->metrics[RTAX_LOCK - 1] & (1 << metric));
+}
#ifdef CONFIG_IPV6_MULTIPLE_TABLES
int fib6_rules_init(void);
diff --git a/include/net/ip6_route.h b/include/net/ip6_route.h
index 08b1323..59656fc5 100644
--- a/include/net/ip6_route.h
+++ b/include/net/ip6_route.h
@@ -66,12 +66,6 @@
(IPV6_ADDR_MULTICAST | IPV6_ADDR_LINKLOCAL | IPV6_ADDR_LOOPBACK);
}
-static inline bool rt6_qualify_for_ecmp(const struct rt6_info *rt)
-{
- return (rt->rt6i_flags & (RTF_GATEWAY|RTF_ADDRCONF|RTF_DYNAMIC)) ==
- RTF_GATEWAY;
-}
-
void ip6_route_input(struct sk_buff *skb);
struct dst_entry *ip6_route_input_lookup(struct net *net,
struct net_device *dev,
@@ -100,29 +94,29 @@
int ipv6_route_ioctl(struct net *net, unsigned int cmd, void __user *arg);
-int ip6_route_add(struct fib6_config *cfg, struct netlink_ext_ack *extack);
-int ip6_ins_rt(struct rt6_info *);
-int ip6_del_rt(struct rt6_info *);
+int ip6_route_add(struct fib6_config *cfg, gfp_t gfp_flags,
+ struct netlink_ext_ack *extack);
+int ip6_ins_rt(struct net *net, struct fib6_info *f6i);
+int ip6_del_rt(struct net *net, struct fib6_info *f6i);
-void rt6_flush_exceptions(struct rt6_info *rt);
-int rt6_remove_exception_rt(struct rt6_info *rt);
-void rt6_age_exceptions(struct rt6_info *rt, struct fib6_gc_args *gc_args,
+void rt6_flush_exceptions(struct fib6_info *f6i);
+void rt6_age_exceptions(struct fib6_info *f6i, struct fib6_gc_args *gc_args,
unsigned long now);
-static inline int ip6_route_get_saddr(struct net *net, struct rt6_info *rt,
+static inline int ip6_route_get_saddr(struct net *net, struct fib6_info *f6i,
const struct in6_addr *daddr,
unsigned int prefs,
struct in6_addr *saddr)
{
- struct inet6_dev *idev =
- rt ? ip6_dst_idev((struct dst_entry *)rt) : NULL;
int err = 0;
- if (rt && rt->rt6i_prefsrc.plen)
- *saddr = rt->rt6i_prefsrc.addr;
- else
- err = ipv6_dev_get_saddr(net, idev ? idev->dev : NULL,
- daddr, prefs, saddr);
+ if (f6i && f6i->fib6_prefsrc.plen) {
+ *saddr = f6i->fib6_prefsrc.addr;
+ } else {
+ struct net_device *dev = f6i ? fib6_info_nh_dev(f6i) : NULL;
+
+ err = ipv6_dev_get_saddr(net, dev, daddr, prefs, saddr);
+ }
return err;
}
@@ -137,8 +131,9 @@
void fib6_force_start_gc(struct net *net);
-struct rt6_info *addrconf_dst_alloc(struct inet6_dev *idev,
- const struct in6_addr *addr, bool anycast);
+struct fib6_info *addrconf_f6i_alloc(struct net *net, struct inet6_dev *idev,
+ const struct in6_addr *addr, bool anycast,
+ gfp_t gfp_flags);
struct rt6_info *ip6_dst_alloc(struct net *net, struct net_device *dev,
int flags);
@@ -147,9 +142,11 @@
* support functions for ND
*
*/
-struct rt6_info *rt6_get_dflt_router(const struct in6_addr *addr,
+struct fib6_info *rt6_get_dflt_router(struct net *net,
+ const struct in6_addr *addr,
struct net_device *dev);
-struct rt6_info *rt6_add_dflt_router(const struct in6_addr *gwaddr,
+struct fib6_info *rt6_add_dflt_router(struct net *net,
+ const struct in6_addr *gwaddr,
struct net_device *dev, unsigned int pref);
void rt6_purge_dflt_routers(struct net *net);
@@ -174,14 +171,14 @@
struct net *net;
};
-int rt6_dump_route(struct rt6_info *rt, void *p_arg);
+int rt6_dump_route(struct fib6_info *f6i, void *p_arg);
void rt6_mtu_change(struct net_device *dev, unsigned int mtu);
void rt6_remove_prefsrc(struct inet6_ifaddr *ifp);
void rt6_clean_tohost(struct net *net, struct in6_addr *gateway);
void rt6_sync_up(struct net_device *dev, unsigned int nh_flags);
void rt6_disable_ip(struct net_device *dev, unsigned long event);
void rt6_sync_down_dev(struct net_device *dev, unsigned long event);
-void rt6_multipath_rebalance(struct rt6_info *rt);
+void rt6_multipath_rebalance(struct fib6_info *f6i);
void rt6_uncached_list_add(struct rt6_info *rt);
void rt6_uncached_list_del(struct rt6_info *rt);
@@ -269,12 +266,38 @@
return daddr;
}
-static inline bool rt6_duplicate_nexthop(struct rt6_info *a, struct rt6_info *b)
+static inline bool rt6_duplicate_nexthop(struct fib6_info *a, struct fib6_info *b)
{
- return a->dst.dev == b->dst.dev &&
- a->rt6i_idev == b->rt6i_idev &&
- ipv6_addr_equal(&a->rt6i_gateway, &b->rt6i_gateway) &&
- !lwtunnel_cmp_encap(a->dst.lwtstate, b->dst.lwtstate);
+ return a->fib6_nh.nh_dev == b->fib6_nh.nh_dev &&
+ ipv6_addr_equal(&a->fib6_nh.nh_gw, &b->fib6_nh.nh_gw) &&
+ !lwtunnel_cmp_encap(a->fib6_nh.nh_lwtstate, b->fib6_nh.nh_lwtstate);
}
+static inline unsigned int ip6_dst_mtu_forward(const struct dst_entry *dst)
+{
+ struct inet6_dev *idev;
+ unsigned int mtu;
+
+ if (dst_metric_locked(dst, RTAX_MTU)) {
+ mtu = dst_metric_raw(dst, RTAX_MTU);
+ if (mtu)
+ return mtu;
+ }
+
+ mtu = IPV6_MIN_MTU;
+ rcu_read_lock();
+ idev = __in6_dev_get(dst->dev);
+ if (idev)
+ mtu = idev->cnf.mtu6;
+ rcu_read_unlock();
+
+ return mtu;
+}
+
+u32 ip6_mtu_from_fib6(struct fib6_info *f6i, struct in6_addr *daddr,
+ struct in6_addr *saddr);
+
+struct neighbour *ip6_neigh_lookup(const struct in6_addr *gw,
+ struct net_device *dev, struct sk_buff *skb,
+ const void *daddr);
#endif
diff --git a/include/net/ip_fib.h b/include/net/ip_fib.h
index 81d0f21..69c91d1 100644
--- a/include/net/ip_fib.h
+++ b/include/net/ip_fib.h
@@ -449,4 +449,6 @@
}
#endif
+u32 ip_mtu_from_fib_result(struct fib_result *res, __be32 daddr);
+
#endif /* _NET_FIB_H */
diff --git a/include/net/ip_tunnels.h b/include/net/ip_tunnels.h
index 540a4b4..90ff430 100644
--- a/include/net/ip_tunnels.h
+++ b/include/net/ip_tunnels.h
@@ -379,6 +379,17 @@
return 0;
}
+static inline u8 ip_tunnel_get_ttl(const struct iphdr *iph,
+ const struct sk_buff *skb)
+{
+ if (skb->protocol == htons(ETH_P_IP))
+ return iph->ttl;
+ else if (skb->protocol == htons(ETH_P_IPV6))
+ return ((const struct ipv6hdr *)iph)->hop_limit;
+ else
+ return 0;
+}
+
/* Propogate ECN bits out */
static inline u8 ip_tunnel_ecn_encap(u8 tos, const struct iphdr *iph,
const struct sk_buff *skb)
@@ -466,12 +477,12 @@
return (struct ip_tunnel_info *)lwtstate->data;
}
-extern struct static_key ip_tunnel_metadata_cnt;
+DECLARE_STATIC_KEY_FALSE(ip_tunnel_metadata_cnt);
/* Returns > 0 if metadata should be collected */
static inline int ip_tunnel_collect_metadata(void)
{
- return static_key_false(&ip_tunnel_metadata_cnt);
+ return static_branch_unlikely(&ip_tunnel_metadata_cnt);
}
void __init ip_tunnel_core_init(void);
diff --git a/include/net/ip_vs.h b/include/net/ip_vs.h
index eb0bec0..0ac795b 100644
--- a/include/net/ip_vs.h
+++ b/include/net/ip_vs.h
@@ -668,6 +668,7 @@
volatile unsigned int flags; /* dest status flags */
atomic_t conn_flags; /* flags to copy to conn */
atomic_t weight; /* server weight */
+ atomic_t last_weight; /* server latest weight */
refcount_t refcnt; /* reference counter */
struct ip_vs_stats stats; /* statistics */
diff --git a/include/net/ipv6.h b/include/net/ipv6.h
index 836f31a..798558f 100644
--- a/include/net/ipv6.h
+++ b/include/net/ipv6.h
@@ -298,6 +298,7 @@
__s16 tclass;
__s8 dontfrag;
struct ipv6_txoptions *opt;
+ __u16 gso_size;
};
static inline struct ipv6_txoptions *txopt_get(const struct ipv6_pinfo *np)
@@ -950,6 +951,7 @@
void *from, int length, int transhdrlen,
struct ipcm6_cookie *ipc6, struct flowi6 *fl6,
struct rt6_info *rt, unsigned int flags,
+ struct inet_cork_full *cork,
const struct sockcm_cookie *sockc);
static inline struct sk_buff *ip6_finish_skb(struct sock *sk)
@@ -958,8 +960,6 @@
&inet6_sk(sk)->cork);
}
-unsigned int ip6_dst_mtu_forward(const struct dst_entry *dst);
-
int ip6_dst_lookup(struct net *net, struct sock *sk, struct dst_entry **dst,
struct flowi6 *fl6);
struct dst_entry *ip6_dst_lookup_flow(const struct sock *sk, struct flowi6 *fl6,
@@ -1044,8 +1044,6 @@
void ipv6_local_rxpmtu(struct sock *sk, struct flowi6 *fl6, u32 mtu);
int inet6_release(struct socket *sock);
-int __inet6_bind(struct sock *sock, struct sockaddr *uaddr, int addr_len,
- bool force_bind_address_no_port, bool with_lock);
int inet6_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len);
int inet6_getname(struct socket *sock, struct sockaddr *uaddr,
int peer);
diff --git a/include/net/mac80211.h b/include/net/mac80211.h
index b2f3a0c..851a5e1 100644
--- a/include/net/mac80211.h
+++ b/include/net/mac80211.h
@@ -3378,6 +3378,8 @@
* frame in case that no beacon was heard from the AP/P2P GO.
* The callback will be called before each transmission and upon return
* mac80211 will transmit the frame right away.
+ * If duration is greater than zero, mac80211 hints to the driver the
+ * duration for which the operation is requested.
* The callback is optional and can (should!) sleep.
*
* @mgd_protect_tdls_discover: Protect a TDLS discovery session. After sending
@@ -3697,7 +3699,8 @@
u32 sset, u8 *data);
void (*mgd_prepare_tx)(struct ieee80211_hw *hw,
- struct ieee80211_vif *vif);
+ struct ieee80211_vif *vif,
+ u16 duration);
void (*mgd_protect_tdls_discover)(struct ieee80211_hw *hw,
struct ieee80211_vif *vif);
@@ -4450,6 +4453,19 @@
u8 ieee80211_csa_update_counter(struct ieee80211_vif *vif);
/**
+ * ieee80211_csa_set_counter - request mac80211 to set csa counter
+ * @vif: &struct ieee80211_vif pointer from the add_interface callback.
+ * @counter: the new value for the counter
+ *
+ * The csa counter can be changed by the device, this API should be
+ * used by the device driver to update csa counter in mac80211.
+ *
+ * It should never be used together with ieee80211_csa_update_counter(),
+ * as it will cause a race condition around the counter value.
+ */
+void ieee80211_csa_set_counter(struct ieee80211_vif *vif, u8 counter);
+
+/**
* ieee80211_csa_finish - notify mac80211 about channel switch
* @vif: &struct ieee80211_vif pointer from the add_interface callback.
*
diff --git a/include/net/neighbour.h b/include/net/neighbour.h
index e421f86..6c1eecd 100644
--- a/include/net/neighbour.h
+++ b/include/net/neighbour.h
@@ -246,6 +246,7 @@
#define NEIGH_UPDATE_F_OVERRIDE 0x00000001
#define NEIGH_UPDATE_F_WEAK_OVERRIDE 0x00000002
#define NEIGH_UPDATE_F_OVERRIDE_ISROUTER 0x00000004
+#define NEIGH_UPDATE_F_EXT_LEARNED 0x20000000
#define NEIGH_UPDATE_F_ISROUTER 0x40000000
#define NEIGH_UPDATE_F_ADMIN 0x80000000
@@ -526,5 +527,21 @@
} while (read_seqretry(&n->ha_lock, seq));
}
+static inline void neigh_update_ext_learned(struct neighbour *neigh, u32 flags,
+ int *notify)
+{
+ u8 ndm_flags = 0;
+ if (!(flags & NEIGH_UPDATE_F_ADMIN))
+ return;
+
+ ndm_flags |= (flags & NEIGH_UPDATE_F_EXT_LEARNED) ? NTF_EXT_LEARNED : 0;
+ if ((neigh->flags ^ ndm_flags) & NTF_EXT_LEARNED) {
+ if (ndm_flags & NTF_EXT_LEARNED)
+ neigh->flags |= NTF_EXT_LEARNED;
+ else
+ neigh->flags &= ~NTF_EXT_LEARNED;
+ *notify = 1;
+ }
+}
#endif
diff --git a/include/net/netfilter/ipv4/nf_nat_masquerade.h b/include/net/netfilter/ipv4/nf_nat_masquerade.h
index ebd8694..cd24be4 100644
--- a/include/net/netfilter/ipv4/nf_nat_masquerade.h
+++ b/include/net/netfilter/ipv4/nf_nat_masquerade.h
@@ -6,7 +6,7 @@
unsigned int
nf_nat_masquerade_ipv4(struct sk_buff *skb, unsigned int hooknum,
- const struct nf_nat_range *range,
+ const struct nf_nat_range2 *range,
const struct net_device *out);
void nf_nat_masquerade_ipv4_register_notifier(void);
diff --git a/include/net/netfilter/ipv6/nf_nat_masquerade.h b/include/net/netfilter/ipv6/nf_nat_masquerade.h
index 1ed4f26..0c3b5eb 100644
--- a/include/net/netfilter/ipv6/nf_nat_masquerade.h
+++ b/include/net/netfilter/ipv6/nf_nat_masquerade.h
@@ -3,7 +3,7 @@
#define _NF_NAT_MASQUERADE_IPV6_H_
unsigned int
-nf_nat_masquerade_ipv6(struct sk_buff *skb, const struct nf_nat_range *range,
+nf_nat_masquerade_ipv6(struct sk_buff *skb, const struct nf_nat_range2 *range,
const struct net_device *out);
void nf_nat_masquerade_ipv6_register_notifier(void);
void nf_nat_masquerade_ipv6_unregister_notifier(void);
diff --git a/include/net/netfilter/nf_flow_table.h b/include/net/netfilter/nf_flow_table.h
index 833752d..ba9fa45 100644
--- a/include/net/netfilter/nf_flow_table.h
+++ b/include/net/netfilter/nf_flow_table.h
@@ -6,6 +6,7 @@
#include <linux/netdevice.h>
#include <linux/rhashtable.h>
#include <linux/rcupdate.h>
+#include <linux/netfilter/nf_conntrack_tuple_common.h>
#include <net/dst.h>
struct nf_flowtable;
@@ -13,25 +14,24 @@
struct nf_flowtable_type {
struct list_head list;
int family;
- void (*gc)(struct work_struct *work);
+ int (*init)(struct nf_flowtable *ft);
void (*free)(struct nf_flowtable *ft);
- const struct rhashtable_params *params;
nf_hookfn *hook;
struct module *owner;
};
struct nf_flowtable {
+ struct list_head list;
struct rhashtable rhashtable;
const struct nf_flowtable_type *type;
struct delayed_work gc_work;
};
enum flow_offload_tuple_dir {
- FLOW_OFFLOAD_DIR_ORIGINAL,
- FLOW_OFFLOAD_DIR_REPLY,
- __FLOW_OFFLOAD_DIR_MAX = FLOW_OFFLOAD_DIR_REPLY,
+ FLOW_OFFLOAD_DIR_ORIGINAL = IP_CT_DIR_ORIGINAL,
+ FLOW_OFFLOAD_DIR_REPLY = IP_CT_DIR_REPLY,
+ FLOW_OFFLOAD_DIR_MAX = IP_CT_DIR_MAX
};
-#define FLOW_OFFLOAD_DIR_MAX (__FLOW_OFFLOAD_DIR_MAX + 1)
struct flow_offload_tuple {
union {
@@ -55,6 +55,8 @@
int oifidx;
+ u16 mtu;
+
struct dst_entry *dst_cache;
};
@@ -66,6 +68,7 @@
#define FLOW_OFFLOAD_SNAT 0x1
#define FLOW_OFFLOAD_DNAT 0x2
#define FLOW_OFFLOAD_DYING 0x4
+#define FLOW_OFFLOAD_TEARDOWN 0x8
struct flow_offload {
struct flow_offload_tuple_rhash tuplehash[FLOW_OFFLOAD_DIR_MAX];
@@ -98,11 +101,14 @@
void nf_flow_table_cleanup(struct net *net, struct net_device *dev);
+int nf_flow_table_init(struct nf_flowtable *flow_table);
void nf_flow_table_free(struct nf_flowtable *flow_table);
-void nf_flow_offload_work_gc(struct work_struct *work);
-extern const struct rhashtable_params nf_flow_offload_rhash_params;
-void flow_offload_dead(struct flow_offload *flow);
+void flow_offload_teardown(struct flow_offload *flow);
+static inline void flow_offload_dead(struct flow_offload *flow)
+{
+ flow->flags |= FLOW_OFFLOAD_DYING;
+}
int nf_flow_snat_port(const struct flow_offload *flow,
struct sk_buff *skb, unsigned int thoff,
diff --git a/include/net/netfilter/nf_nat.h b/include/net/netfilter/nf_nat.h
index 207a467..a17eb2f 100644
--- a/include/net/netfilter/nf_nat.h
+++ b/include/net/netfilter/nf_nat.h
@@ -39,7 +39,7 @@
/* Set up the info structure to map into this range. */
unsigned int nf_nat_setup_info(struct nf_conn *ct,
- const struct nf_nat_range *range,
+ const struct nf_nat_range2 *range,
enum nf_nat_manip_type maniptype);
extern unsigned int nf_nat_alloc_null_binding(struct nf_conn *ct,
@@ -75,4 +75,8 @@
#endif
}
+int nf_nat_register_fn(struct net *net, const struct nf_hook_ops *ops,
+ const struct nf_hook_ops *nat_ops, unsigned int ops_count);
+void nf_nat_unregister_fn(struct net *net, const struct nf_hook_ops *ops,
+ unsigned int ops_count);
#endif
diff --git a/include/net/netfilter/nf_nat_core.h b/include/net/netfilter/nf_nat_core.h
index 235bd0e..dc7cd04 100644
--- a/include/net/netfilter/nf_nat_core.h
+++ b/include/net/netfilter/nf_nat_core.h
@@ -11,6 +11,10 @@
unsigned int nf_nat_packet(struct nf_conn *ct, enum ip_conntrack_info ctinfo,
unsigned int hooknum, struct sk_buff *skb);
+unsigned int
+nf_nat_inet_fn(void *priv, struct sk_buff *skb,
+ const struct nf_hook_state *state);
+
int nf_xfrm_me_harder(struct net *net, struct sk_buff *skb, unsigned int family);
static inline int nf_nat_initialized(struct nf_conn *ct,
@@ -22,11 +26,4 @@
return ct->status & IPS_DST_NAT_DONE;
}
-struct nlattr;
-
-extern int
-(*nfnetlink_parse_nat_setup_hook)(struct nf_conn *ct,
- enum nf_nat_manip_type manip,
- const struct nlattr *attr);
-
#endif /* _NF_NAT_CORE_H */
diff --git a/include/net/netfilter/nf_nat_l3proto.h b/include/net/netfilter/nf_nat_l3proto.h
index ce7c2b4..d300b8f 100644
--- a/include/net/netfilter/nf_nat_l3proto.h
+++ b/include/net/netfilter/nf_nat_l3proto.h
@@ -7,7 +7,7 @@
u8 l3proto;
bool (*in_range)(const struct nf_conntrack_tuple *t,
- const struct nf_nat_range *range);
+ const struct nf_nat_range2 *range);
u32 (*secure_port)(const struct nf_conntrack_tuple *t, __be16);
@@ -33,7 +33,7 @@
struct flowi *fl);
int (*nlattr_to_range)(struct nlattr *tb[],
- struct nf_nat_range *range);
+ struct nf_nat_range2 *range);
};
int nf_nat_l3proto_register(const struct nf_nat_l3proto *);
@@ -44,66 +44,14 @@
enum ip_conntrack_info ctinfo,
unsigned int hooknum);
-unsigned int nf_nat_ipv4_in(void *priv, struct sk_buff *skb,
- const struct nf_hook_state *state,
- unsigned int (*do_chain)(void *priv,
- struct sk_buff *skb,
- const struct nf_hook_state *state,
- struct nf_conn *ct));
-
-unsigned int nf_nat_ipv4_out(void *priv, struct sk_buff *skb,
- const struct nf_hook_state *state,
- unsigned int (*do_chain)(void *priv,
- struct sk_buff *skb,
- const struct nf_hook_state *state,
- struct nf_conn *ct));
-
-unsigned int nf_nat_ipv4_local_fn(void *priv,
- struct sk_buff *skb,
- const struct nf_hook_state *state,
- unsigned int (*do_chain)(void *priv,
- struct sk_buff *skb,
- const struct nf_hook_state *state,
- struct nf_conn *ct));
-
-unsigned int nf_nat_ipv4_fn(void *priv, struct sk_buff *skb,
- const struct nf_hook_state *state,
- unsigned int (*do_chain)(void *priv,
- struct sk_buff *skb,
- const struct nf_hook_state *state,
- struct nf_conn *ct));
-
int nf_nat_icmpv6_reply_translation(struct sk_buff *skb, struct nf_conn *ct,
enum ip_conntrack_info ctinfo,
unsigned int hooknum, unsigned int hdrlen);
-unsigned int nf_nat_ipv6_in(void *priv, struct sk_buff *skb,
- const struct nf_hook_state *state,
- unsigned int (*do_chain)(void *priv,
- struct sk_buff *skb,
- const struct nf_hook_state *state,
- struct nf_conn *ct));
+int nf_nat_l3proto_ipv4_register_fn(struct net *net, const struct nf_hook_ops *ops);
+void nf_nat_l3proto_ipv4_unregister_fn(struct net *net, const struct nf_hook_ops *ops);
-unsigned int nf_nat_ipv6_out(void *priv, struct sk_buff *skb,
- const struct nf_hook_state *state,
- unsigned int (*do_chain)(void *priv,
- struct sk_buff *skb,
- const struct nf_hook_state *state,
- struct nf_conn *ct));
-
-unsigned int nf_nat_ipv6_local_fn(void *priv,
- struct sk_buff *skb,
- const struct nf_hook_state *state,
- unsigned int (*do_chain)(void *priv,
- struct sk_buff *skb,
- const struct nf_hook_state *state,
- struct nf_conn *ct));
-
-unsigned int nf_nat_ipv6_fn(void *priv, struct sk_buff *skb,
- const struct nf_hook_state *state,
- unsigned int (*do_chain)(void *priv,
- struct sk_buff *skb,
- const struct nf_hook_state *state,
- struct nf_conn *ct));
+int nf_nat_l3proto_ipv6_register_fn(struct net *net, const struct nf_hook_ops *ops);
+void nf_nat_l3proto_ipv6_unregister_fn(struct net *net, const struct nf_hook_ops *ops);
#endif /* _NF_NAT_L3PROTO_H */
diff --git a/include/net/netfilter/nf_nat_l4proto.h b/include/net/netfilter/nf_nat_l4proto.h
index 67835ff..b4d6b29 100644
--- a/include/net/netfilter/nf_nat_l4proto.h
+++ b/include/net/netfilter/nf_nat_l4proto.h
@@ -34,12 +34,12 @@
*/
void (*unique_tuple)(const struct nf_nat_l3proto *l3proto,
struct nf_conntrack_tuple *tuple,
- const struct nf_nat_range *range,
+ const struct nf_nat_range2 *range,
enum nf_nat_manip_type maniptype,
const struct nf_conn *ct);
int (*nlattr_to_range)(struct nlattr *tb[],
- struct nf_nat_range *range);
+ struct nf_nat_range2 *range);
};
/* Protocol registration. */
@@ -72,11 +72,11 @@
void nf_nat_l4proto_unique_tuple(const struct nf_nat_l3proto *l3proto,
struct nf_conntrack_tuple *tuple,
- const struct nf_nat_range *range,
+ const struct nf_nat_range2 *range,
enum nf_nat_manip_type maniptype,
const struct nf_conn *ct, u16 *rover);
int nf_nat_l4proto_nlattr_to_range(struct nlattr *tb[],
- struct nf_nat_range *range);
+ struct nf_nat_range2 *range);
#endif /*_NF_NAT_L4PROTO_H*/
diff --git a/include/net/netfilter/nf_nat_redirect.h b/include/net/netfilter/nf_nat_redirect.h
index 5ddabb0..c129aac 100644
--- a/include/net/netfilter/nf_nat_redirect.h
+++ b/include/net/netfilter/nf_nat_redirect.h
@@ -7,7 +7,7 @@
const struct nf_nat_ipv4_multi_range_compat *mr,
unsigned int hooknum);
unsigned int
-nf_nat_redirect_ipv6(struct sk_buff *skb, const struct nf_nat_range *range,
+nf_nat_redirect_ipv6(struct sk_buff *skb, const struct nf_nat_range2 *range,
unsigned int hooknum);
#endif /* _NF_NAT_REDIRECT_H_ */
diff --git a/include/net/netfilter/nf_tables.h b/include/net/netfilter/nf_tables.h
index a1e28dd..603b514 100644
--- a/include/net/netfilter/nf_tables.h
+++ b/include/net/netfilter/nf_tables.h
@@ -276,23 +276,6 @@
enum nft_set_class space;
};
-/**
- * struct nft_set_type - nf_tables set type
- *
- * @select_ops: function to select nft_set_ops
- * @ops: default ops, used when no select_ops functions is present
- * @list: used internally
- * @owner: module reference
- */
-struct nft_set_type {
- const struct nft_set_ops *(*select_ops)(const struct nft_ctx *,
- const struct nft_set_desc *desc,
- u32 flags);
- const struct nft_set_ops *ops;
- struct list_head list;
- struct module *owner;
-};
-
struct nft_set_ext;
struct nft_expr;
@@ -311,7 +294,6 @@
* @init: initialize private data of new set instance
* @destroy: destroy private data of set instance
* @elemsize: element private size
- * @features: features supported by the implementation
*/
struct nft_set_ops {
bool (*lookup)(const struct net *net,
@@ -362,10 +344,24 @@
void (*destroy)(const struct nft_set *set);
unsigned int elemsize;
- u32 features;
- const struct nft_set_type *type;
};
+/**
+ * struct nft_set_type - nf_tables set type
+ *
+ * @ops: set ops for this type
+ * @list: used internally
+ * @owner: module reference
+ * @features: features supported by the implementation
+ */
+struct nft_set_type {
+ const struct nft_set_ops ops;
+ struct list_head list;
+ struct module *owner;
+ u32 features;
+};
+#define to_set_type(o) container_of(o, struct nft_set_type, ops)
+
int nft_register_set(struct nft_set_type *type);
void nft_unregister_set(struct nft_set_type *type);
@@ -590,7 +586,7 @@
return nft_set_ext(ext, NFT_SET_EXT_TIMEOUT);
}
-static inline unsigned long *nft_set_ext_expiration(const struct nft_set_ext *ext)
+static inline u64 *nft_set_ext_expiration(const struct nft_set_ext *ext)
{
return nft_set_ext(ext, NFT_SET_EXT_EXPIRATION);
}
@@ -608,7 +604,7 @@
static inline bool nft_set_elem_expired(const struct nft_set_ext *ext)
{
return nft_set_ext_exists(ext, NFT_SET_EXT_EXPIRATION) &&
- time_is_before_eq_jiffies(*nft_set_ext_expiration(ext));
+ time_is_before_eq_jiffies64(*nft_set_ext_expiration(ext));
}
static inline struct nft_set_ext *nft_set_elem_ext(const struct nft_set *set,
@@ -889,8 +885,8 @@
* @owner: module owner
* @hook_mask: mask of valid hooks
* @hooks: array of hook functions
- * @init: chain initialization function
- * @free: chain release function
+ * @ops_register: base chain register function
+ * @ops_unregister: base chain unregister function
*/
struct nft_chain_type {
const char *name;
@@ -899,8 +895,8 @@
struct module *owner;
unsigned int hook_mask;
nf_hookfn *hooks[NF_MAX_HOOKS];
- int (*init)(struct nft_ctx *ctx);
- void (*free)(struct nft_ctx *ctx);
+ int (*ops_register)(struct net *net, const struct nf_hook_ops *ops);
+ void (*ops_unregister)(struct net *net, const struct nf_hook_ops *ops);
};
int nft_chain_validate_dependency(const struct nft_chain *chain,
@@ -1020,9 +1016,9 @@
#define nft_expr_obj(expr) *((struct nft_object **)nft_expr_priv(expr))
-struct nft_object *nf_tables_obj_lookup(const struct nft_table *table,
- const struct nlattr *nla, u32 objtype,
- u8 genmask);
+struct nft_object *nft_obj_lookup(const struct nft_table *table,
+ const struct nlattr *nla, u32 objtype,
+ u8 genmask);
void nft_obj_notify(struct net *net, struct nft_table *table,
struct nft_object *obj, u32 portid, u32 seq,
@@ -1111,12 +1107,9 @@
struct nf_flowtable data;
};
-struct nft_flowtable *nf_tables_flowtable_lookup(const struct nft_table *table,
- const struct nlattr *nla,
- u8 genmask);
-void nft_flow_table_iterate(struct net *net,
- void (*iter)(struct nf_flowtable *flowtable, void *data),
- void *data);
+struct nft_flowtable *nft_flowtable_lookup(const struct nft_table *table,
+ const struct nlattr *nla,
+ u8 genmask);
void nft_register_flowtable_type(struct nf_flowtable_type *type);
void nft_unregister_flowtable_type(struct nf_flowtable_type *type);
diff --git a/include/net/netfilter/nf_tables_core.h b/include/net/netfilter/nf_tables_core.h
index ea5aab5..cd6915b 100644
--- a/include/net/netfilter/nf_tables_core.h
+++ b/include/net/netfilter/nf_tables_core.h
@@ -10,6 +10,9 @@
extern struct nft_expr_type nft_payload_type;
extern struct nft_expr_type nft_dynset_type;
extern struct nft_expr_type nft_range_type;
+extern struct nft_expr_type nft_meta_type;
+extern struct nft_expr_type nft_rt_type;
+extern struct nft_expr_type nft_exthdr_type;
int nf_tables_core_module_init(void);
void nf_tables_core_module_exit(void);
diff --git a/include/net/netfilter/nfnetlink_log.h b/include/net/netfilter/nfnetlink_log.h
index 612cfb6..ea32a7d 100644
--- a/include/net/netfilter/nfnetlink_log.h
+++ b/include/net/netfilter/nfnetlink_log.h
@@ -1,18 +1 @@
/* SPDX-License-Identifier: GPL-2.0 */
-#ifndef _KER_NFNETLINK_LOG_H
-#define _KER_NFNETLINK_LOG_H
-
-void
-nfulnl_log_packet(struct net *net,
- u_int8_t pf,
- unsigned int hooknum,
- const struct sk_buff *skb,
- const struct net_device *in,
- const struct net_device *out,
- const struct nf_loginfo *li_user,
- const char *prefix);
-
-#define NFULNL_COPY_DISABLED 0xff
-
-#endif /* _KER_NFNETLINK_LOG_H */
-
diff --git a/include/net/netfilter/nft_meta.h b/include/net/netfilter/nft_meta.h
deleted file mode 100644
index 5c69e9b..0000000
--- a/include/net/netfilter/nft_meta.h
+++ /dev/null
@@ -1,44 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0 */
-#ifndef _NFT_META_H_
-#define _NFT_META_H_
-
-struct nft_meta {
- enum nft_meta_keys key:8;
- union {
- enum nft_registers dreg:8;
- enum nft_registers sreg:8;
- };
-};
-
-extern const struct nla_policy nft_meta_policy[];
-
-int nft_meta_get_init(const struct nft_ctx *ctx,
- const struct nft_expr *expr,
- const struct nlattr * const tb[]);
-
-int nft_meta_set_init(const struct nft_ctx *ctx,
- const struct nft_expr *expr,
- const struct nlattr * const tb[]);
-
-int nft_meta_get_dump(struct sk_buff *skb,
- const struct nft_expr *expr);
-
-int nft_meta_set_dump(struct sk_buff *skb,
- const struct nft_expr *expr);
-
-void nft_meta_get_eval(const struct nft_expr *expr,
- struct nft_regs *regs,
- const struct nft_pktinfo *pkt);
-
-void nft_meta_set_eval(const struct nft_expr *expr,
- struct nft_regs *regs,
- const struct nft_pktinfo *pkt);
-
-void nft_meta_set_destroy(const struct nft_ctx *ctx,
- const struct nft_expr *expr);
-
-int nft_meta_set_validate(const struct nft_ctx *ctx,
- const struct nft_expr *expr,
- const struct nft_data **data);
-
-#endif
diff --git a/include/net/netns/ipv4.h b/include/net/netns/ipv4.h
index 8491bc9..661348f 100644
--- a/include/net/netns/ipv4.h
+++ b/include/net/netns/ipv4.h
@@ -160,6 +160,8 @@
int sysctl_tcp_pacing_ca_ratio;
int sysctl_tcp_wmem[3];
int sysctl_tcp_rmem[3];
+ int sysctl_tcp_comp_sack_nr;
+ unsigned long sysctl_tcp_comp_sack_delay_ns;
struct inet_timewait_death_row tcp_death_row;
int sysctl_max_syn_backlog;
int sysctl_tcp_fastopen;
diff --git a/include/net/netns/ipv6.h b/include/net/netns/ipv6.h
index c29f09c..c978a31 100644
--- a/include/net/netns/ipv6.h
+++ b/include/net/netns/ipv6.h
@@ -43,6 +43,7 @@
int max_hbh_opts_cnt;
int max_dst_opts_len;
int max_hbh_opts_len;
+ int seg6_flowlabel;
};
struct netns_ipv6 {
@@ -60,7 +61,8 @@
#endif
struct xt_table *ip6table_nat;
#endif
- struct rt6_info *ip6_null_entry;
+ struct fib6_info *fib6_null_entry;
+ struct rt6_info *ip6_null_entry;
struct rt6_statistics *rt6_stats;
struct timer_list ip6_fib_timer;
struct hlist_head *fib_table_hash;
diff --git a/include/net/netns/nftables.h b/include/net/netns/nftables.h
index 4813435..29c3851 100644
--- a/include/net/netns/nftables.h
+++ b/include/net/netns/nftables.h
@@ -4,8 +4,6 @@
#include <linux/list.h>
-struct nft_af_info;
-
struct netns_nftables {
struct list_head tables;
struct list_head commit_list;
diff --git a/include/net/page_pool.h b/include/net/page_pool.h
new file mode 100644
index 0000000..694d055
--- /dev/null
+++ b/include/net/page_pool.h
@@ -0,0 +1,144 @@
+/* SPDX-License-Identifier: GPL-2.0
+ *
+ * page_pool.h
+ * Author: Jesper Dangaard Brouer <netoptimizer@brouer.com>
+ * Copyright (C) 2016 Red Hat, Inc.
+ */
+
+/**
+ * DOC: page_pool allocator
+ *
+ * This page_pool allocator is optimized for the XDP mode that
+ * uses one-frame-per-page, but have fallbacks that act like the
+ * regular page allocator APIs.
+ *
+ * Basic use involve replacing alloc_pages() calls with the
+ * page_pool_alloc_pages() call. Drivers should likely use
+ * page_pool_dev_alloc_pages() replacing dev_alloc_pages().
+ *
+ * If page_pool handles DMA mapping (use page->private), then API user
+ * is responsible for invoking page_pool_put_page() once. In-case of
+ * elevated refcnt, the DMA state is released, assuming other users of
+ * the page will eventually call put_page().
+ *
+ * If no DMA mapping is done, then it can act as shim-layer that
+ * fall-through to alloc_page. As no state is kept on the page, the
+ * regular put_page() call is sufficient.
+ */
+#ifndef _NET_PAGE_POOL_H
+#define _NET_PAGE_POOL_H
+
+#include <linux/mm.h> /* Needed by ptr_ring */
+#include <linux/ptr_ring.h>
+#include <linux/dma-direction.h>
+
+#define PP_FLAG_DMA_MAP 1 /* Should page_pool do the DMA map/unmap */
+#define PP_FLAG_ALL PP_FLAG_DMA_MAP
+
+/*
+ * Fast allocation side cache array/stack
+ *
+ * The cache size and refill watermark is related to the network
+ * use-case. The NAPI budget is 64 packets. After a NAPI poll the RX
+ * ring is usually refilled and the max consumed elements will be 64,
+ * thus a natural max size of objects needed in the cache.
+ *
+ * Keeping room for more objects, is due to XDP_DROP use-case. As
+ * XDP_DROP allows the opportunity to recycle objects directly into
+ * this array, as it shares the same softirq/NAPI protection. If
+ * cache is already full (or partly full) then the XDP_DROP recycles
+ * would have to take a slower code path.
+ */
+#define PP_ALLOC_CACHE_SIZE 128
+#define PP_ALLOC_CACHE_REFILL 64
+struct pp_alloc_cache {
+ u32 count;
+ void *cache[PP_ALLOC_CACHE_SIZE];
+};
+
+struct page_pool_params {
+ unsigned int flags;
+ unsigned int order;
+ unsigned int pool_size;
+ int nid; /* Numa node id to allocate from pages from */
+ struct device *dev; /* device, for DMA pre-mapping purposes */
+ enum dma_data_direction dma_dir; /* DMA mapping direction */
+};
+
+struct page_pool {
+ struct rcu_head rcu;
+ struct page_pool_params p;
+
+ /*
+ * Data structure for allocation side
+ *
+ * Drivers allocation side usually already perform some kind
+ * of resource protection. Piggyback on this protection, and
+ * require driver to protect allocation side.
+ *
+ * For NIC drivers this means, allocate a page_pool per
+ * RX-queue. As the RX-queue is already protected by
+ * Softirq/BH scheduling and napi_schedule. NAPI schedule
+ * guarantee that a single napi_struct will only be scheduled
+ * on a single CPU (see napi_schedule).
+ */
+ struct pp_alloc_cache alloc ____cacheline_aligned_in_smp;
+
+ /* Data structure for storing recycled pages.
+ *
+ * Returning/freeing pages is more complicated synchronization
+ * wise, because free's can happen on remote CPUs, with no
+ * association with allocation resource.
+ *
+ * Use ptr_ring, as it separates consumer and producer
+ * effeciently, it a way that doesn't bounce cache-lines.
+ *
+ * TODO: Implement bulk return pages into this structure.
+ */
+ struct ptr_ring ring;
+};
+
+struct page *page_pool_alloc_pages(struct page_pool *pool, gfp_t gfp);
+
+static inline struct page *page_pool_dev_alloc_pages(struct page_pool *pool)
+{
+ gfp_t gfp = (GFP_ATOMIC | __GFP_NOWARN);
+
+ return page_pool_alloc_pages(pool, gfp);
+}
+
+struct page_pool *page_pool_create(const struct page_pool_params *params);
+
+void page_pool_destroy(struct page_pool *pool);
+
+/* Never call this directly, use helpers below */
+void __page_pool_put_page(struct page_pool *pool,
+ struct page *page, bool allow_direct);
+
+static inline void page_pool_put_page(struct page_pool *pool,
+ struct page *page, bool allow_direct)
+{
+ /* When page_pool isn't compiled-in, net/core/xdp.c doesn't
+ * allow registering MEM_TYPE_PAGE_POOL, but shield linker.
+ */
+#ifdef CONFIG_PAGE_POOL
+ __page_pool_put_page(pool, page, allow_direct);
+#endif
+}
+/* Very limited use-cases allow recycle direct */
+static inline void page_pool_recycle_direct(struct page_pool *pool,
+ struct page *page)
+{
+ __page_pool_put_page(pool, page, true);
+}
+
+static inline bool is_page_pool_compiled_in(void)
+{
+#ifdef CONFIG_PAGE_POOL
+ return true;
+#else
+ return false;
+#endif
+}
+
+#endif /* _NET_PAGE_POOL_H */
diff --git a/include/net/pkt_cls.h b/include/net/pkt_cls.h
index e828d31..f3ec437 100644
--- a/include/net/pkt_cls.h
+++ b/include/net/pkt_cls.h
@@ -33,7 +33,7 @@
};
struct tcf_block_cb;
-bool tcf_queue_work(struct work_struct *work);
+bool tcf_queue_work(struct rcu_work *rwork, work_func_t func);
#ifdef CONFIG_NET_CLS
struct tcf_chain *tcf_chain_get(struct tcf_block *block, u32 chain_index,
@@ -683,9 +683,11 @@
/* SKIP_HW and SKIP_SW are mutually exclusive flags. */
static inline bool tc_flags_valid(u32 flags)
{
- if (flags & ~(TCA_CLS_FLAGS_SKIP_HW | TCA_CLS_FLAGS_SKIP_SW))
+ if (flags & ~(TCA_CLS_FLAGS_SKIP_HW | TCA_CLS_FLAGS_SKIP_SW |
+ TCA_CLS_FLAGS_VERBOSE))
return false;
+ flags &= TCA_CLS_FLAGS_SKIP_HW | TCA_CLS_FLAGS_SKIP_SW;
if (!(flags ^ (TCA_CLS_FLAGS_SKIP_HW | TCA_CLS_FLAGS_SKIP_SW)))
return false;
@@ -705,7 +707,7 @@
cls_common->chain_index = tp->chain->index;
cls_common->protocol = tp->protocol;
cls_common->prio = tp->prio;
- if (tc_skip_sw(flags))
+ if (tc_skip_sw(flags) || flags & TCA_CLS_FLAGS_VERBOSE)
cls_common->extack = extack;
}
diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index 5154c83..98c10a2 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -30,7 +30,6 @@
enum qdisc_state_t {
__QDISC_STATE_SCHED,
__QDISC_STATE_DEACTIVATED,
- __QDISC_STATE_RUNNING,
};
struct qdisc_size_table {
@@ -102,6 +101,7 @@
refcount_t refcnt;
spinlock_t busylock ____cacheline_aligned_in_smp;
+ spinlock_t seqlock;
};
static inline void qdisc_refcount_inc(struct Qdisc *qdisc)
@@ -111,15 +111,21 @@
refcount_inc(&qdisc->refcnt);
}
-static inline bool qdisc_is_running(const struct Qdisc *qdisc)
+static inline bool qdisc_is_running(struct Qdisc *qdisc)
{
+ if (qdisc->flags & TCQ_F_NOLOCK)
+ return spin_is_locked(&qdisc->seqlock);
return (raw_read_seqcount(&qdisc->running) & 1) ? true : false;
}
static inline bool qdisc_run_begin(struct Qdisc *qdisc)
{
- if (qdisc_is_running(qdisc))
+ if (qdisc->flags & TCQ_F_NOLOCK) {
+ if (!spin_trylock(&qdisc->seqlock))
+ return false;
+ } else if (qdisc_is_running(qdisc)) {
return false;
+ }
/* Variant of write_seqcount_begin() telling lockdep a trylock
* was attempted.
*/
@@ -131,6 +137,8 @@
static inline void qdisc_run_end(struct Qdisc *qdisc)
{
write_seqcount_end(&qdisc->running);
+ if (qdisc->flags & TCQ_F_NOLOCK)
+ spin_unlock(&qdisc->seqlock);
}
static inline bool qdisc_may_bulk(const struct Qdisc *qdisc)
diff --git a/include/net/sctp/constants.h b/include/net/sctp/constants.h
index 20ff237..86f034b 100644
--- a/include/net/sctp/constants.h
+++ b/include/net/sctp/constants.h
@@ -254,11 +254,10 @@
#define SCTP_TSN_MAP_SIZE 4096
/* We will not record more than this many duplicate TSNs between two
- * SACKs. The minimum PMTU is 576. Remove all the headers and there
- * is enough room for 131 duplicate reports. Round down to the
+ * SACKs. The minimum PMTU is 512. Remove all the headers and there
+ * is enough room for 117 duplicate reports. Round down to the
* nearest power of 2.
*/
-enum { SCTP_MIN_PMTU = 576 };
enum { SCTP_MAX_DUP_TSNS = 16 };
enum { SCTP_MAX_GABS = 16 };
diff --git a/include/net/sctp/sctp.h b/include/net/sctp/sctp.h
index 35498e6..8c2caa3 100644
--- a/include/net/sctp/sctp.h
+++ b/include/net/sctp/sctp.h
@@ -430,32 +430,6 @@
return (head->next != head) && (head->next == head->prev);
}
-/* Break down data chunks at this point. */
-static inline int sctp_frag_point(const struct sctp_association *asoc, int pmtu)
-{
- struct sctp_sock *sp = sctp_sk(asoc->base.sk);
- struct sctp_af *af = sp->pf->af;
- int frag = pmtu;
-
- frag -= af->ip_options_len(asoc->base.sk);
- frag -= af->net_header_len;
- frag -= sizeof(struct sctphdr) + sctp_datachk_len(&asoc->stream);
-
- if (asoc->user_frag)
- frag = min_t(int, frag, asoc->user_frag);
-
- frag = SCTP_TRUNC4(min_t(int, frag, SCTP_MAX_CHUNK_LEN -
- sctp_datachk_len(&asoc->stream)));
-
- return frag;
-}
-
-static inline void sctp_assoc_pending_pmtu(struct sctp_association *asoc)
-{
- sctp_assoc_sync_pmtu(asoc);
- asoc->pmtu_pending = 0;
-}
-
static inline bool sctp_chunk_pending(const struct sctp_chunk *chunk)
{
return !list_empty(&chunk->list);
@@ -609,17 +583,29 @@
return t->dst;
}
-static inline bool sctp_transport_pmtu_check(struct sctp_transport *t)
+/* Calculate max payload size given a MTU, or the total overhead if
+ * given MTU is zero
+ */
+static inline __u32 sctp_mtu_payload(const struct sctp_sock *sp,
+ __u32 mtu, __u32 extra)
{
- __u32 pmtu = max_t(size_t, SCTP_TRUNC4(dst_mtu(t->dst)),
- SCTP_DEFAULT_MINSEGMENT);
+ __u32 overhead = sizeof(struct sctphdr) + extra;
- if (t->pathmtu == pmtu)
- return true;
+ if (sp)
+ overhead += sp->pf->af->net_header_len;
+ else
+ overhead += sizeof(struct ipv6hdr);
- t->pathmtu = pmtu;
+ if (WARN_ON_ONCE(mtu && mtu <= overhead))
+ mtu = overhead;
- return false;
+ return mtu ? mtu - overhead : overhead;
+}
+
+static inline __u32 sctp_dst_mtu(const struct dst_entry *dst)
+{
+ return SCTP_TRUNC4(max_t(__u32, dst_mtu(dst),
+ SCTP_DEFAULT_MINSEGMENT));
}
#endif /* __net_sctp_h__ */
diff --git a/include/net/sctp/sm.h b/include/net/sctp/sm.h
index 2d0e782..5ef1bad 100644
--- a/include/net/sctp/sm.h
+++ b/include/net/sctp/sm.h
@@ -207,7 +207,7 @@
int len, __u8 flags, gfp_t gfp);
struct sctp_chunk *sctp_make_ecne(const struct sctp_association *asoc,
const __u32 lowest_tsn);
-struct sctp_chunk *sctp_make_sack(const struct sctp_association *asoc);
+struct sctp_chunk *sctp_make_sack(struct sctp_association *asoc);
struct sctp_chunk *sctp_make_shutdown(const struct sctp_association *asoc,
const struct sctp_chunk *chunk);
struct sctp_chunk *sctp_make_shutdown_ack(const struct sctp_association *asoc,
@@ -215,7 +215,7 @@
struct sctp_chunk *sctp_make_shutdown_complete(
const struct sctp_association *asoc,
const struct sctp_chunk *chunk);
-void sctp_init_cause(struct sctp_chunk *chunk, __be16 cause, size_t paylen);
+int sctp_init_cause(struct sctp_chunk *chunk, __be16 cause, size_t paylen);
struct sctp_chunk *sctp_make_abort(const struct sctp_association *asoc,
const struct sctp_chunk *chunk,
const size_t hint);
diff --git a/include/net/sctp/structs.h b/include/net/sctp/structs.h
index a0ec462..ebf809e 100644
--- a/include/net/sctp/structs.h
+++ b/include/net/sctp/structs.h
@@ -2091,16 +2091,14 @@
enum sctp_transport_cmd command,
sctp_sn_error_t error);
struct sctp_transport *sctp_assoc_lookup_tsn(struct sctp_association *, __u32);
-struct sctp_transport *sctp_assoc_is_match(struct sctp_association *,
- struct net *,
- const union sctp_addr *,
- const union sctp_addr *);
void sctp_assoc_migrate(struct sctp_association *, struct sock *);
int sctp_assoc_update(struct sctp_association *old,
struct sctp_association *new);
__u32 sctp_association_get_next_tsn(struct sctp_association *);
+void sctp_assoc_update_frag_point(struct sctp_association *asoc);
+void sctp_assoc_set_pmtu(struct sctp_association *asoc, __u32 pmtu);
void sctp_assoc_sync_pmtu(struct sctp_association *asoc);
void sctp_assoc_rwnd_increase(struct sctp_association *, unsigned int);
void sctp_assoc_rwnd_decrease(struct sctp_association *, unsigned int);
diff --git a/include/net/seg6.h b/include/net/seg6.h
index 099bad5..e029e30 100644
--- a/include/net/seg6.h
+++ b/include/net/seg6.h
@@ -49,7 +49,11 @@
static inline struct seg6_pernet_data *seg6_pernet(struct net *net)
{
+#if IS_ENABLED(CONFIG_IPV6)
return net->ipv6.seg6_data;
+#else
+ return NULL;
+#endif
}
extern int seg6_init(void);
@@ -63,5 +67,6 @@
extern int seg6_do_srh_encap(struct sk_buff *skb, struct ipv6_sr_hdr *osrh,
int proto);
extern int seg6_do_srh_inline(struct sk_buff *skb, struct ipv6_sr_hdr *osrh);
-
+extern int seg6_lookup_nexthop(struct sk_buff *skb, struct in6_addr *nhaddr,
+ u32 tbl_id);
#endif
diff --git a/include/net/seg6_local.h b/include/net/seg6_local.h
new file mode 100644
index 0000000..661fd5b
--- /dev/null
+++ b/include/net/seg6_local.h
@@ -0,0 +1,32 @@
+/*
+ * SR-IPv6 implementation
+ *
+ * Authors:
+ * David Lebrun <david.lebrun@uclouvain.be>
+ * eBPF support: Mathieu Xhonneux <m.xhonneux@gmail.com>
+ *
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#ifndef _NET_SEG6_LOCAL_H
+#define _NET_SEG6_LOCAL_H
+
+#include <linux/percpu.h>
+#include <linux/net.h>
+#include <linux/ipv6.h>
+
+extern int seg6_lookup_nexthop(struct sk_buff *skb, struct in6_addr *nhaddr,
+ u32 tbl_id);
+
+struct seg6_bpf_srh_state {
+ bool valid;
+ u16 hdrlen;
+};
+
+DECLARE_PER_CPU(struct seg6_bpf_srh_state, seg6_bpf_srh_states);
+
+#endif
diff --git a/include/net/sock.h b/include/net/sock.h
index 74d725f..4f7c584 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -481,6 +481,11 @@
void (*sk_error_report)(struct sock *sk);
int (*sk_backlog_rcv)(struct sock *sk,
struct sk_buff *skb);
+#ifdef CONFIG_SOCK_VALIDATE_XMIT
+ struct sk_buff* (*sk_validate_xmit_skb)(struct sock *sk,
+ struct net_device *dev,
+ struct sk_buff *skb);
+#endif
void (*sk_destruct)(struct sock *sk);
struct sock_reuseport __rcu *sk_reuseport_cb;
struct rcu_head sk_rcu;
@@ -803,10 +808,10 @@
}
#ifdef CONFIG_NET
-extern struct static_key memalloc_socks;
+DECLARE_STATIC_KEY_FALSE(memalloc_socks_key);
static inline int sk_memalloc_socks(void)
{
- return static_key_false(&memalloc_socks);
+ return static_branch_unlikely(&memalloc_socks_key);
}
#else
@@ -2332,6 +2337,22 @@
return (1 << sk->sk_state) & ~(TCPF_TIME_WAIT | TCPF_NEW_SYN_RECV);
}
+/* Checks if this SKB belongs to an HW offloaded socket
+ * and whether any SW fallbacks are required based on dev.
+ */
+static inline struct sk_buff *sk_validate_xmit_skb(struct sk_buff *skb,
+ struct net_device *dev)
+{
+#ifdef CONFIG_SOCK_VALIDATE_XMIT
+ struct sock *sk = skb->sk;
+
+ if (sk && sk_fullsock(sk) && sk->sk_validate_xmit_skb)
+ skb = sk->sk_validate_xmit_skb(sk, dev, skb);
+#endif
+
+ return skb;
+}
+
/* This helper checks if a socket is a LISTEN or NEW_SYN_RECV
* SYNACK messages can be attached to either ones (depending on SYNCOOKIE)
*/
diff --git a/include/net/switchdev.h b/include/net/switchdev.h
index 39bc855..d574ce6 100644
--- a/include/net/switchdev.h
+++ b/include/net/switchdev.h
@@ -155,6 +155,7 @@
struct switchdev_notifier_info info; /* must be first */
const unsigned char *addr;
u16 vid;
+ bool added_by_user;
};
static inline struct net_device *
diff --git a/include/net/tcp.h b/include/net/tcp.h
index 9c9b376..952d842 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -245,6 +245,7 @@
#define TCP_RACK_LOSS_DETECTION 0x1 /* Use RACK to detect losses */
#define TCP_RACK_STATIC_REO_WND 0x2 /* Use static RACK reo wnd */
+#define TCP_RACK_NO_DUPTHRESH 0x4 /* Do not use DUPACK threshold in RACK */
extern atomic_long_t tcp_memory_allocated;
extern struct percpu_counter tcp_sockets_allocated;
@@ -402,6 +403,10 @@
void tcp_syn_ack_timeout(const struct request_sock *req);
int tcp_recvmsg(struct sock *sk, struct msghdr *msg, size_t len, int nonblock,
int flags, int *addr_len);
+int tcp_set_rcvlowat(struct sock *sk, int val);
+void tcp_data_ready(struct sock *sk);
+int tcp_mmap(struct file *file, struct socket *sock,
+ struct vm_area_struct *vma);
void tcp_parse_options(const struct net *net, const struct sk_buff *skb,
struct tcp_options_received *opt_rx,
int estab, struct tcp_fastopen_cookie *foc);
@@ -553,7 +558,12 @@
void tcp_init_xmit_timers(struct sock *);
static inline void tcp_clear_xmit_timers(struct sock *sk)
{
- hrtimer_cancel(&tcp_sk(sk)->pacing_timer);
+ if (hrtimer_try_to_cancel(&tcp_sk(sk)->pacing_timer) == 1)
+ __sock_put(sk);
+
+ if (hrtimer_try_to_cancel(&tcp_sk(sk)->compressed_ack_timer) == 1)
+ __sock_put(sk);
+
inet_csk_clear_xmit_timers(sk);
}
@@ -810,9 +820,8 @@
#endif
} header; /* For incoming skbs */
struct {
- __u32 key;
__u32 flags;
- struct bpf_map *map;
+ struct sock *sk_redir;
void *data_end;
} bpf;
};
@@ -1871,6 +1880,10 @@
void tcp_init(void);
/* tcp_recovery.c */
+void tcp_mark_skb_lost(struct sock *sk, struct sk_buff *skb);
+void tcp_newreno_mark_lost(struct sock *sk, bool snd_una_advanced);
+extern s32 tcp_rack_skb_timeout(struct tcp_sock *tp, struct sk_buff *skb,
+ u32 reo_wnd);
extern void tcp_rack_mark_lost(struct sock *sk);
extern void tcp_rack_advance(struct tcp_sock *tp, u8 sacked, u32 end_seq,
u64 xmit_time);
@@ -2101,4 +2114,12 @@
#if IS_ENABLED(CONFIG_SMC)
extern struct static_key_false tcp_have_smc;
#endif
+
+#if IS_ENABLED(CONFIG_TLS_DEVICE)
+void clean_acked_data_enable(struct inet_connection_sock *icsk,
+ void (*cad)(struct sock *sk, u32 ack_seq));
+void clean_acked_data_disable(struct inet_connection_sock *icsk);
+
+#endif
+
#endif /* _TCP_H */
diff --git a/include/net/tipc.h b/include/net/tipc.h
index 07670ec..f0e7e6b 100644
--- a/include/net/tipc.h
+++ b/include/net/tipc.h
@@ -44,11 +44,11 @@
__be32 w[4];
};
-static inline u32 tipc_hdr_rps_key(struct tipc_basic_hdr *hdr)
+static inline __be32 tipc_hdr_rps_key(struct tipc_basic_hdr *hdr)
{
u32 w0 = ntohl(hdr->w[0]);
bool keepalive_msg = (w0 & KEEPALIVE_MSG_MASK) == KEEPALIVE_MSG_MASK;
- int key;
+ __be32 key;
/* Return source node identity as key */
if (likely(!keepalive_msg))
diff --git a/include/net/tls.h b/include/net/tls.h
index f5fb16d..70c2737 100644
--- a/include/net/tls.h
+++ b/include/net/tls.h
@@ -83,24 +83,10 @@
void (*unhash)(struct tls_device *device, struct sock *sk);
};
-struct tls_sw_context {
+struct tls_sw_context_tx {
struct crypto_aead *aead_send;
- struct crypto_aead *aead_recv;
struct crypto_wait async_wait;
- /* Receive context */
- struct strparser strp;
- void (*saved_data_ready)(struct sock *sk);
- unsigned int (*sk_poll)(struct file *file, struct socket *sock,
- struct poll_table_struct *wait);
- struct sk_buff *recv_pkt;
- u8 control;
- bool decrypted;
-
- char rx_aad_ciphertext[TLS_AAD_SPACE_SIZE];
- char rx_aad_plaintext[TLS_AAD_SPACE_SIZE];
-
- /* Sending context */
char aad_space[TLS_AAD_SPACE_SIZE];
unsigned int sg_plaintext_size;
@@ -117,6 +103,54 @@
struct scatterlist sg_aead_out[2];
};
+struct tls_sw_context_rx {
+ struct crypto_aead *aead_recv;
+ struct crypto_wait async_wait;
+
+ struct strparser strp;
+ void (*saved_data_ready)(struct sock *sk);
+ unsigned int (*sk_poll)(struct file *file, struct socket *sock,
+ struct poll_table_struct *wait);
+ struct sk_buff *recv_pkt;
+ u8 control;
+ bool decrypted;
+
+ char rx_aad_ciphertext[TLS_AAD_SPACE_SIZE];
+ char rx_aad_plaintext[TLS_AAD_SPACE_SIZE];
+
+};
+
+struct tls_record_info {
+ struct list_head list;
+ u32 end_seq;
+ int len;
+ int num_frags;
+ skb_frag_t frags[MAX_SKB_FRAGS];
+};
+
+struct tls_offload_context {
+ struct crypto_aead *aead_send;
+ spinlock_t lock; /* protects records list */
+ struct list_head records_list;
+ struct tls_record_info *open_record;
+ struct tls_record_info *retransmit_hint;
+ u64 hint_record_sn;
+ u64 unacked_record_sn;
+
+ struct scatterlist sg_tx_data[MAX_SKB_FRAGS];
+ void (*sk_destruct)(struct sock *sk);
+ u8 driver_state[];
+ /* The TLS layer reserves room for driver specific state
+ * Currently the belief is that there is not enough
+ * driver specific state to justify another layer of indirection
+ */
+#define TLS_DRIVER_STATE_SIZE (max_t(size_t, 8, sizeof(void *)))
+};
+
+#define TLS_OFFLOAD_CONTEXT_SIZE \
+ (ALIGN(sizeof(struct tls_offload_context), sizeof(void *)) + \
+ TLS_DRIVER_STATE_SIZE)
+
enum {
TLS_PENDING_CLOSED_RECORD
};
@@ -141,9 +175,15 @@
struct tls12_crypto_info_aes_gcm_128 crypto_recv_aes_gcm_128;
};
- void *priv_ctx;
+ struct list_head list;
+ struct net_device *netdev;
+ refcount_t refcount;
- u8 conf:3;
+ void *priv_ctx_tx;
+ void *priv_ctx_rx;
+
+ u8 tx_conf:3;
+ u8 rx_conf:3;
struct cipher_context tx;
struct cipher_context rx;
@@ -181,7 +221,8 @@
int tls_sw_sendpage(struct sock *sk, struct page *page,
int offset, size_t size, int flags);
void tls_sw_close(struct sock *sk, long timeout);
-void tls_sw_free_resources(struct sock *sk);
+void tls_sw_free_resources_tx(struct sock *sk);
+void tls_sw_free_resources_rx(struct sock *sk);
int tls_sw_recvmsg(struct sock *sk, struct msghdr *msg, size_t len,
int nonblock, int flags, int *addr_len);
unsigned int tls_sw_poll(struct file *file, struct socket *sock,
@@ -190,9 +231,28 @@
struct pipe_inode_info *pipe,
size_t len, unsigned int flags);
-void tls_sk_destruct(struct sock *sk, struct tls_context *ctx);
-void tls_icsk_clean_acked(struct sock *sk);
+int tls_set_device_offload(struct sock *sk, struct tls_context *ctx);
+int tls_device_sendmsg(struct sock *sk, struct msghdr *msg, size_t size);
+int tls_device_sendpage(struct sock *sk, struct page *page,
+ int offset, size_t size, int flags);
+void tls_device_sk_destruct(struct sock *sk);
+void tls_device_init(void);
+void tls_device_cleanup(void);
+struct tls_record_info *tls_get_record(struct tls_offload_context *context,
+ u32 seq, u64 *p_record_sn);
+
+static inline bool tls_record_is_start_marker(struct tls_record_info *rec)
+{
+ return rec->len == 0;
+}
+
+static inline u32 tls_record_start_seq(struct tls_record_info *rec)
+{
+ return rec->end_seq - rec->len;
+}
+
+void tls_sk_destruct(struct sock *sk, struct tls_context *ctx);
int tls_push_sg(struct sock *sk, struct tls_context *ctx,
struct scatterlist *sg, u16 first_offset,
int flags);
@@ -229,6 +289,13 @@
return tls_ctx->pending_open_record_frags;
}
+static inline bool tls_is_sk_tx_device_offloaded(struct sock *sk)
+{
+ return sk_fullsock(sk) &&
+ /* matches smp_store_release in tls_set_device_offload */
+ smp_load_acquire(&sk->sk_destruct) == &tls_device_sk_destruct;
+}
+
static inline void tls_err_abort(struct sock *sk, int err)
{
sk->sk_err = err;
@@ -301,16 +368,22 @@
return icsk->icsk_ulp_data;
}
-static inline struct tls_sw_context *tls_sw_ctx(
+static inline struct tls_sw_context_rx *tls_sw_ctx_rx(
const struct tls_context *tls_ctx)
{
- return (struct tls_sw_context *)tls_ctx->priv_ctx;
+ return (struct tls_sw_context_rx *)tls_ctx->priv_ctx_rx;
+}
+
+static inline struct tls_sw_context_tx *tls_sw_ctx_tx(
+ const struct tls_context *tls_ctx)
+{
+ return (struct tls_sw_context_tx *)tls_ctx->priv_ctx_tx;
}
static inline struct tls_offload_context *tls_offload_ctx(
const struct tls_context *tls_ctx)
{
- return (struct tls_offload_context *)tls_ctx->priv_ctx;
+ return (struct tls_offload_context *)tls_ctx->priv_ctx_tx;
}
int tls_proccess_cmsg(struct sock *sk, struct msghdr *msg,
@@ -318,4 +391,12 @@
void tls_register_device(struct tls_device *device);
void tls_unregister_device(struct tls_device *device);
+struct sk_buff *tls_validate_xmit_skb(struct sock *sk,
+ struct net_device *dev,
+ struct sk_buff *skb);
+
+int tls_sw_fallback_init(struct sock *sk,
+ struct tls_offload_context *offload_ctx,
+ struct tls_crypto_info *crypto_info);
+
#endif /* _TLS_OFFLOAD_H */
diff --git a/include/net/udp.h b/include/net/udp.h
index 0676b27..9289b64 100644
--- a/include/net/udp.h
+++ b/include/net/udp.h
@@ -174,6 +174,9 @@
struct udphdr *uh, udp_lookup_t lookup);
int udp_gro_complete(struct sk_buff *skb, int nhoff, udp_lookup_t lookup);
+struct sk_buff *__udp_gso_segment(struct sk_buff *gso_skb,
+ netdev_features_t features);
+
static inline struct udphdr *udp_gro_udphdr(struct sk_buff *skb)
{
struct udphdr *uh;
@@ -269,6 +272,7 @@
int udp_sendmsg(struct sock *sk, struct msghdr *msg, size_t len);
int udp_push_pending_frames(struct sock *sk);
void udp_flush_pending_frames(struct sock *sk);
+int udp_cmsg_send(struct sock *sk, struct msghdr *msg, u16 *gso_size);
void udp4_hwcsum(struct sk_buff *skb, __be32 src, __be32 dst);
int udp_rcv(struct sk_buff *skb);
int udp_ioctl(struct sock *sk, int cmd, unsigned long arg);
diff --git a/include/net/vxlan.h b/include/net/vxlan.h
index ad73d8b..b99a02ae 100644
--- a/include/net/vxlan.h
+++ b/include/net/vxlan.h
@@ -262,6 +262,7 @@
#define VXLAN_F_COLLECT_METADATA 0x2000
#define VXLAN_F_GPE 0x4000
#define VXLAN_F_IPV6_LINKLOCAL 0x8000
+#define VXLAN_F_TTL_INHERIT 0x10000
/* Flags that are used in the receive path. These flags must match in
* order for a socket to be shareable
diff --git a/include/net/xdp.h b/include/net/xdp.h
index b2362dd..7ad7792 100644
--- a/include/net/xdp.h
+++ b/include/net/xdp.h
@@ -33,16 +33,101 @@
* also mandatory during RX-ring setup.
*/
+enum xdp_mem_type {
+ MEM_TYPE_PAGE_SHARED = 0, /* Split-page refcnt based model */
+ MEM_TYPE_PAGE_ORDER0, /* Orig XDP full page model */
+ MEM_TYPE_PAGE_POOL,
+ MEM_TYPE_MAX,
+};
+
+struct xdp_mem_info {
+ u32 type; /* enum xdp_mem_type, but known size type */
+ u32 id;
+};
+
+struct page_pool;
+
struct xdp_rxq_info {
struct net_device *dev;
u32 queue_index;
u32 reg_state;
+ struct xdp_mem_info mem;
} ____cacheline_aligned; /* perf critical, avoid false-sharing */
+struct xdp_buff {
+ void *data;
+ void *data_end;
+ void *data_meta;
+ void *data_hard_start;
+ struct xdp_rxq_info *rxq;
+};
+
+struct xdp_frame {
+ void *data;
+ u16 len;
+ u16 headroom;
+ u16 metasize;
+ /* Lifetime of xdp_rxq_info is limited to NAPI/enqueue time,
+ * while mem info is valid on remote CPU.
+ */
+ struct xdp_mem_info mem;
+ struct net_device *dev_rx; /* used by cpumap */
+};
+
+/* Convert xdp_buff to xdp_frame */
+static inline
+struct xdp_frame *convert_to_xdp_frame(struct xdp_buff *xdp)
+{
+ struct xdp_frame *xdp_frame;
+ int metasize;
+ int headroom;
+
+ /* Assure headroom is available for storing info */
+ headroom = xdp->data - xdp->data_hard_start;
+ metasize = xdp->data - xdp->data_meta;
+ metasize = metasize > 0 ? metasize : 0;
+ if (unlikely((headroom - metasize) < sizeof(*xdp_frame)))
+ return NULL;
+
+ /* Store info in top of packet */
+ xdp_frame = xdp->data_hard_start;
+
+ xdp_frame->data = xdp->data;
+ xdp_frame->len = xdp->data_end - xdp->data;
+ xdp_frame->headroom = headroom - sizeof(*xdp_frame);
+ xdp_frame->metasize = metasize;
+
+ /* rxq only valid until napi_schedule ends, convert to xdp_mem_info */
+ xdp_frame->mem = xdp->rxq->mem;
+
+ return xdp_frame;
+}
+
+void xdp_return_frame(struct xdp_frame *xdpf);
+void xdp_return_frame_rx_napi(struct xdp_frame *xdpf);
+void xdp_return_buff(struct xdp_buff *xdp);
+
int xdp_rxq_info_reg(struct xdp_rxq_info *xdp_rxq,
struct net_device *dev, u32 queue_index);
void xdp_rxq_info_unreg(struct xdp_rxq_info *xdp_rxq);
void xdp_rxq_info_unused(struct xdp_rxq_info *xdp_rxq);
bool xdp_rxq_info_is_reg(struct xdp_rxq_info *xdp_rxq);
+int xdp_rxq_info_reg_mem_model(struct xdp_rxq_info *xdp_rxq,
+ enum xdp_mem_type type, void *allocator);
+
+/* Drivers not supporting XDP metadata can use this helper, which
+ * rejects any room expansion for metadata as a result.
+ */
+static __always_inline void
+xdp_set_data_meta_invalid(struct xdp_buff *xdp)
+{
+ xdp->data_meta = xdp->data + 1;
+}
+
+static __always_inline bool
+xdp_data_meta_unsupported(const struct xdp_buff *xdp)
+{
+ return unlikely(xdp->data_meta > xdp->data);
+}
#endif /* __LINUX_NET_XDP_H__ */
diff --git a/include/net/xdp_sock.h b/include/net/xdp_sock.h
new file mode 100644
index 0000000..7a647c5
--- /dev/null
+++ b/include/net/xdp_sock.h
@@ -0,0 +1,57 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/* AF_XDP internal functions
+ * Copyright(c) 2018 Intel Corporation.
+ */
+
+#ifndef _LINUX_XDP_SOCK_H
+#define _LINUX_XDP_SOCK_H
+
+#include <linux/mutex.h>
+#include <net/sock.h>
+
+struct net_device;
+struct xsk_queue;
+struct xdp_umem;
+
+struct xdp_sock {
+ /* struct sock must be the first member of struct xdp_sock */
+ struct sock sk;
+ struct xsk_queue *rx;
+ struct net_device *dev;
+ struct xdp_umem *umem;
+ struct list_head flush_node;
+ u16 queue_id;
+ struct xsk_queue *tx ____cacheline_aligned_in_smp;
+ /* Protects multiple processes in the control path */
+ struct mutex mutex;
+ u64 rx_dropped;
+};
+
+struct xdp_buff;
+#ifdef CONFIG_XDP_SOCKETS
+int xsk_generic_rcv(struct xdp_sock *xs, struct xdp_buff *xdp);
+int xsk_rcv(struct xdp_sock *xs, struct xdp_buff *xdp);
+void xsk_flush(struct xdp_sock *xs);
+bool xsk_is_setup_for_bpf_map(struct xdp_sock *xs);
+#else
+static inline int xsk_generic_rcv(struct xdp_sock *xs, struct xdp_buff *xdp)
+{
+ return -ENOTSUPP;
+}
+
+static inline int xsk_rcv(struct xdp_sock *xs, struct xdp_buff *xdp)
+{
+ return -ENOTSUPP;
+}
+
+static inline void xsk_flush(struct xdp_sock *xs)
+{
+}
+
+static inline bool xsk_is_setup_for_bpf_map(struct xdp_sock *xs)
+{
+ return false;
+}
+#endif /* CONFIG_XDP_SOCKETS */
+
+#endif /* _LINUX_XDP_SOCK_H */
diff --git a/include/trace/events/bpf.h b/include/trace/events/bpf.h
deleted file mode 100644
index 1501856..0000000
--- a/include/trace/events/bpf.h
+++ /dev/null
@@ -1,355 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0 */
-#undef TRACE_SYSTEM
-#define TRACE_SYSTEM bpf
-
-#if !defined(_TRACE_BPF_H) || defined(TRACE_HEADER_MULTI_READ)
-#define _TRACE_BPF_H
-
-/* These are only used within the BPF_SYSCALL code */
-#ifdef CONFIG_BPF_SYSCALL
-
-#include <linux/filter.h>
-#include <linux/bpf.h>
-#include <linux/fs.h>
-#include <linux/tracepoint.h>
-
-#define __PROG_TYPE_MAP(FN) \
- FN(SOCKET_FILTER) \
- FN(KPROBE) \
- FN(SCHED_CLS) \
- FN(SCHED_ACT) \
- FN(TRACEPOINT) \
- FN(XDP) \
- FN(PERF_EVENT) \
- FN(CGROUP_SKB) \
- FN(CGROUP_SOCK) \
- FN(LWT_IN) \
- FN(LWT_OUT) \
- FN(LWT_XMIT)
-
-#define __MAP_TYPE_MAP(FN) \
- FN(HASH) \
- FN(ARRAY) \
- FN(PROG_ARRAY) \
- FN(PERF_EVENT_ARRAY) \
- FN(PERCPU_HASH) \
- FN(PERCPU_ARRAY) \
- FN(STACK_TRACE) \
- FN(CGROUP_ARRAY) \
- FN(LRU_HASH) \
- FN(LRU_PERCPU_HASH) \
- FN(LPM_TRIE)
-
-#define __PROG_TYPE_TP_FN(x) \
- TRACE_DEFINE_ENUM(BPF_PROG_TYPE_##x);
-#define __PROG_TYPE_SYM_FN(x) \
- { BPF_PROG_TYPE_##x, #x },
-#define __PROG_TYPE_SYM_TAB \
- __PROG_TYPE_MAP(__PROG_TYPE_SYM_FN) { -1, 0 }
-__PROG_TYPE_MAP(__PROG_TYPE_TP_FN)
-
-#define __MAP_TYPE_TP_FN(x) \
- TRACE_DEFINE_ENUM(BPF_MAP_TYPE_##x);
-#define __MAP_TYPE_SYM_FN(x) \
- { BPF_MAP_TYPE_##x, #x },
-#define __MAP_TYPE_SYM_TAB \
- __MAP_TYPE_MAP(__MAP_TYPE_SYM_FN) { -1, 0 }
-__MAP_TYPE_MAP(__MAP_TYPE_TP_FN)
-
-DECLARE_EVENT_CLASS(bpf_prog_event,
-
- TP_PROTO(const struct bpf_prog *prg),
-
- TP_ARGS(prg),
-
- TP_STRUCT__entry(
- __array(u8, prog_tag, 8)
- __field(u32, type)
- ),
-
- TP_fast_assign(
- BUILD_BUG_ON(sizeof(__entry->prog_tag) != sizeof(prg->tag));
- memcpy(__entry->prog_tag, prg->tag, sizeof(prg->tag));
- __entry->type = prg->type;
- ),
-
- TP_printk("prog=%s type=%s",
- __print_hex_str(__entry->prog_tag, 8),
- __print_symbolic(__entry->type, __PROG_TYPE_SYM_TAB))
-);
-
-DEFINE_EVENT(bpf_prog_event, bpf_prog_get_type,
-
- TP_PROTO(const struct bpf_prog *prg),
-
- TP_ARGS(prg)
-);
-
-DEFINE_EVENT(bpf_prog_event, bpf_prog_put_rcu,
-
- TP_PROTO(const struct bpf_prog *prg),
-
- TP_ARGS(prg)
-);
-
-TRACE_EVENT(bpf_prog_load,
-
- TP_PROTO(const struct bpf_prog *prg, int ufd),
-
- TP_ARGS(prg, ufd),
-
- TP_STRUCT__entry(
- __array(u8, prog_tag, 8)
- __field(u32, type)
- __field(int, ufd)
- ),
-
- TP_fast_assign(
- BUILD_BUG_ON(sizeof(__entry->prog_tag) != sizeof(prg->tag));
- memcpy(__entry->prog_tag, prg->tag, sizeof(prg->tag));
- __entry->type = prg->type;
- __entry->ufd = ufd;
- ),
-
- TP_printk("prog=%s type=%s ufd=%d",
- __print_hex_str(__entry->prog_tag, 8),
- __print_symbolic(__entry->type, __PROG_TYPE_SYM_TAB),
- __entry->ufd)
-);
-
-TRACE_EVENT(bpf_map_create,
-
- TP_PROTO(const struct bpf_map *map, int ufd),
-
- TP_ARGS(map, ufd),
-
- TP_STRUCT__entry(
- __field(u32, type)
- __field(u32, size_key)
- __field(u32, size_value)
- __field(u32, max_entries)
- __field(u32, flags)
- __field(int, ufd)
- ),
-
- TP_fast_assign(
- __entry->type = map->map_type;
- __entry->size_key = map->key_size;
- __entry->size_value = map->value_size;
- __entry->max_entries = map->max_entries;
- __entry->flags = map->map_flags;
- __entry->ufd = ufd;
- ),
-
- TP_printk("map type=%s ufd=%d key=%u val=%u max=%u flags=%x",
- __print_symbolic(__entry->type, __MAP_TYPE_SYM_TAB),
- __entry->ufd, __entry->size_key, __entry->size_value,
- __entry->max_entries, __entry->flags)
-);
-
-DECLARE_EVENT_CLASS(bpf_obj_prog,
-
- TP_PROTO(const struct bpf_prog *prg, int ufd,
- const struct filename *pname),
-
- TP_ARGS(prg, ufd, pname),
-
- TP_STRUCT__entry(
- __array(u8, prog_tag, 8)
- __field(int, ufd)
- __string(path, pname->name)
- ),
-
- TP_fast_assign(
- BUILD_BUG_ON(sizeof(__entry->prog_tag) != sizeof(prg->tag));
- memcpy(__entry->prog_tag, prg->tag, sizeof(prg->tag));
- __assign_str(path, pname->name);
- __entry->ufd = ufd;
- ),
-
- TP_printk("prog=%s path=%s ufd=%d",
- __print_hex_str(__entry->prog_tag, 8),
- __get_str(path), __entry->ufd)
-);
-
-DEFINE_EVENT(bpf_obj_prog, bpf_obj_pin_prog,
-
- TP_PROTO(const struct bpf_prog *prg, int ufd,
- const struct filename *pname),
-
- TP_ARGS(prg, ufd, pname)
-);
-
-DEFINE_EVENT(bpf_obj_prog, bpf_obj_get_prog,
-
- TP_PROTO(const struct bpf_prog *prg, int ufd,
- const struct filename *pname),
-
- TP_ARGS(prg, ufd, pname)
-);
-
-DECLARE_EVENT_CLASS(bpf_obj_map,
-
- TP_PROTO(const struct bpf_map *map, int ufd,
- const struct filename *pname),
-
- TP_ARGS(map, ufd, pname),
-
- TP_STRUCT__entry(
- __field(u32, type)
- __field(int, ufd)
- __string(path, pname->name)
- ),
-
- TP_fast_assign(
- __assign_str(path, pname->name);
- __entry->type = map->map_type;
- __entry->ufd = ufd;
- ),
-
- TP_printk("map type=%s ufd=%d path=%s",
- __print_symbolic(__entry->type, __MAP_TYPE_SYM_TAB),
- __entry->ufd, __get_str(path))
-);
-
-DEFINE_EVENT(bpf_obj_map, bpf_obj_pin_map,
-
- TP_PROTO(const struct bpf_map *map, int ufd,
- const struct filename *pname),
-
- TP_ARGS(map, ufd, pname)
-);
-
-DEFINE_EVENT(bpf_obj_map, bpf_obj_get_map,
-
- TP_PROTO(const struct bpf_map *map, int ufd,
- const struct filename *pname),
-
- TP_ARGS(map, ufd, pname)
-);
-
-DECLARE_EVENT_CLASS(bpf_map_keyval,
-
- TP_PROTO(const struct bpf_map *map, int ufd,
- const void *key, const void *val),
-
- TP_ARGS(map, ufd, key, val),
-
- TP_STRUCT__entry(
- __field(u32, type)
- __field(u32, key_len)
- __dynamic_array(u8, key, map->key_size)
- __field(bool, key_trunc)
- __field(u32, val_len)
- __dynamic_array(u8, val, map->value_size)
- __field(bool, val_trunc)
- __field(int, ufd)
- ),
-
- TP_fast_assign(
- memcpy(__get_dynamic_array(key), key, map->key_size);
- memcpy(__get_dynamic_array(val), val, map->value_size);
- __entry->type = map->map_type;
- __entry->key_len = min(map->key_size, 16U);
- __entry->key_trunc = map->key_size != __entry->key_len;
- __entry->val_len = min(map->value_size, 16U);
- __entry->val_trunc = map->value_size != __entry->val_len;
- __entry->ufd = ufd;
- ),
-
- TP_printk("map type=%s ufd=%d key=[%s%s] val=[%s%s]",
- __print_symbolic(__entry->type, __MAP_TYPE_SYM_TAB),
- __entry->ufd,
- __print_hex(__get_dynamic_array(key), __entry->key_len),
- __entry->key_trunc ? " ..." : "",
- __print_hex(__get_dynamic_array(val), __entry->val_len),
- __entry->val_trunc ? " ..." : "")
-);
-
-DEFINE_EVENT(bpf_map_keyval, bpf_map_lookup_elem,
-
- TP_PROTO(const struct bpf_map *map, int ufd,
- const void *key, const void *val),
-
- TP_ARGS(map, ufd, key, val)
-);
-
-DEFINE_EVENT(bpf_map_keyval, bpf_map_update_elem,
-
- TP_PROTO(const struct bpf_map *map, int ufd,
- const void *key, const void *val),
-
- TP_ARGS(map, ufd, key, val)
-);
-
-TRACE_EVENT(bpf_map_delete_elem,
-
- TP_PROTO(const struct bpf_map *map, int ufd,
- const void *key),
-
- TP_ARGS(map, ufd, key),
-
- TP_STRUCT__entry(
- __field(u32, type)
- __field(u32, key_len)
- __dynamic_array(u8, key, map->key_size)
- __field(bool, key_trunc)
- __field(int, ufd)
- ),
-
- TP_fast_assign(
- memcpy(__get_dynamic_array(key), key, map->key_size);
- __entry->type = map->map_type;
- __entry->key_len = min(map->key_size, 16U);
- __entry->key_trunc = map->key_size != __entry->key_len;
- __entry->ufd = ufd;
- ),
-
- TP_printk("map type=%s ufd=%d key=[%s%s]",
- __print_symbolic(__entry->type, __MAP_TYPE_SYM_TAB),
- __entry->ufd,
- __print_hex(__get_dynamic_array(key), __entry->key_len),
- __entry->key_trunc ? " ..." : "")
-);
-
-TRACE_EVENT(bpf_map_next_key,
-
- TP_PROTO(const struct bpf_map *map, int ufd,
- const void *key, const void *key_next),
-
- TP_ARGS(map, ufd, key, key_next),
-
- TP_STRUCT__entry(
- __field(u32, type)
- __field(u32, key_len)
- __dynamic_array(u8, key, map->key_size)
- __dynamic_array(u8, nxt, map->key_size)
- __field(bool, key_trunc)
- __field(bool, key_null)
- __field(int, ufd)
- ),
-
- TP_fast_assign(
- if (key)
- memcpy(__get_dynamic_array(key), key, map->key_size);
- __entry->key_null = !key;
- memcpy(__get_dynamic_array(nxt), key_next, map->key_size);
- __entry->type = map->map_type;
- __entry->key_len = min(map->key_size, 16U);
- __entry->key_trunc = map->key_size != __entry->key_len;
- __entry->ufd = ufd;
- ),
-
- TP_printk("map type=%s ufd=%d key=[%s%s] next=[%s%s]",
- __print_symbolic(__entry->type, __MAP_TYPE_SYM_TAB),
- __entry->ufd,
- __entry->key_null ? "NULL" : __print_hex(__get_dynamic_array(key),
- __entry->key_len),
- __entry->key_trunc && !__entry->key_null ? " ..." : "",
- __print_hex(__get_dynamic_array(nxt), __entry->key_len),
- __entry->key_trunc ? " ..." : "")
-);
-#endif /* CONFIG_BPF_SYSCALL */
-#endif /* _TRACE_BPF_H */
-
-#include <trace/define_trace.h>
diff --git a/include/trace/events/fib.h b/include/trace/events/fib.h
index 81b7e98..9763cdd 100644
--- a/include/trace/events/fib.h
+++ b/include/trace/events/fib.h
@@ -12,12 +12,14 @@
TRACE_EVENT(fib_table_lookup,
- TP_PROTO(u32 tb_id, const struct flowi4 *flp),
+ TP_PROTO(u32 tb_id, const struct flowi4 *flp,
+ const struct fib_nh *nh, int err),
- TP_ARGS(tb_id, flp),
+ TP_ARGS(tb_id, flp, nh, err),
TP_STRUCT__entry(
__field( u32, tb_id )
+ __field( int, err )
__field( int, oif )
__field( int, iif )
__field( __u8, tos )
@@ -25,12 +27,19 @@
__field( __u8, flags )
__array( __u8, src, 4 )
__array( __u8, dst, 4 )
+ __array( __u8, gw, 4 )
+ __array( __u8, saddr, 4 )
+ __field( u16, sport )
+ __field( u16, dport )
+ __field( u8, proto )
+ __dynamic_array(char, name, IFNAMSIZ )
),
TP_fast_assign(
__be32 *p32;
__entry->tb_id = tb_id;
+ __entry->err = err;
__entry->oif = flp->flowi4_oif;
__entry->iif = flp->flowi4_iif;
__entry->tos = flp->flowi4_tos;
@@ -42,71 +51,41 @@
p32 = (__be32 *) __entry->dst;
*p32 = flp->daddr;
+
+ __entry->proto = flp->flowi4_proto;
+ if (__entry->proto == IPPROTO_TCP ||
+ __entry->proto == IPPROTO_UDP) {
+ __entry->sport = ntohs(flp->fl4_sport);
+ __entry->dport = ntohs(flp->fl4_dport);
+ } else {
+ __entry->sport = 0;
+ __entry->dport = 0;
+ }
+
+ if (nh) {
+ p32 = (__be32 *) __entry->saddr;
+ *p32 = nh->nh_saddr;
+
+ p32 = (__be32 *) __entry->gw;
+ *p32 = nh->nh_gw;
+
+ __assign_str(name, nh->nh_dev ? nh->nh_dev->name : "-");
+ } else {
+ p32 = (__be32 *) __entry->saddr;
+ *p32 = 0;
+
+ p32 = (__be32 *) __entry->gw;
+ *p32 = 0;
+
+ __assign_str(name, "-");
+ }
),
- TP_printk("table %u oif %d iif %d src %pI4 dst %pI4 tos %d scope %d flags %x",
- __entry->tb_id, __entry->oif, __entry->iif,
- __entry->src, __entry->dst, __entry->tos, __entry->scope,
- __entry->flags)
-);
-
-TRACE_EVENT(fib_table_lookup_nh,
-
- TP_PROTO(const struct fib_nh *nh),
-
- TP_ARGS(nh),
-
- TP_STRUCT__entry(
- __string( name, nh->nh_dev->name)
- __field( int, oif )
- __array( __u8, src, 4 )
- ),
-
- TP_fast_assign(
- __be32 *p32 = (__be32 *) __entry->src;
-
- __assign_str(name, nh->nh_dev ? nh->nh_dev->name : "not set");
- __entry->oif = nh->nh_oif;
- *p32 = nh->nh_saddr;
- ),
-
- TP_printk("nexthop dev %s oif %d src %pI4",
- __get_str(name), __entry->oif, __entry->src)
-);
-
-TRACE_EVENT(fib_validate_source,
-
- TP_PROTO(const struct net_device *dev, const struct flowi4 *flp),
-
- TP_ARGS(dev, flp),
-
- TP_STRUCT__entry(
- __string( name, dev->name )
- __field( int, oif )
- __field( int, iif )
- __field( __u8, tos )
- __array( __u8, src, 4 )
- __array( __u8, dst, 4 )
- ),
-
- TP_fast_assign(
- __be32 *p32;
-
- __assign_str(name, dev ? dev->name : "not set");
- __entry->oif = flp->flowi4_oif;
- __entry->iif = flp->flowi4_iif;
- __entry->tos = flp->flowi4_tos;
-
- p32 = (__be32 *) __entry->src;
- *p32 = flp->saddr;
-
- p32 = (__be32 *) __entry->dst;
- *p32 = flp->daddr;
- ),
-
- TP_printk("dev %s oif %d iif %d tos %d src %pI4 dst %pI4",
- __get_str(name), __entry->oif, __entry->iif, __entry->tos,
- __entry->src, __entry->dst)
+ TP_printk("table %u oif %d iif %d proto %u %pI4/%u -> %pI4/%u tos %d scope %d flags %x ==> dev %s gw %pI4 src %pI4 err %d",
+ __entry->tb_id, __entry->oif, __entry->iif, __entry->proto,
+ __entry->src, __entry->sport, __entry->dst, __entry->dport,
+ __entry->tos, __entry->scope, __entry->flags,
+ __get_str(name), __entry->gw, __entry->saddr, __entry->err)
);
#endif /* _TRACE_FIB_H */
diff --git a/include/trace/events/fib6.h b/include/trace/events/fib6.h
index 7e8d48a..b088b54 100644
--- a/include/trace/events/fib6.h
+++ b/include/trace/events/fib6.h
@@ -12,14 +12,14 @@
TRACE_EVENT(fib6_table_lookup,
- TP_PROTO(const struct net *net, const struct rt6_info *rt,
+ TP_PROTO(const struct net *net, const struct fib6_info *f6i,
struct fib6_table *table, const struct flowi6 *flp),
- TP_ARGS(net, rt, table, flp),
+ TP_ARGS(net, f6i, table, flp),
TP_STRUCT__entry(
__field( u32, tb_id )
-
+ __field( int, err )
__field( int, oif )
__field( int, iif )
__field( __u8, tos )
@@ -27,7 +27,10 @@
__field( __u8, flags )
__array( __u8, src, 16 )
__array( __u8, dst, 16 )
-
+ __field( u16, sport )
+ __field( u16, dport )
+ __field( u8, proto )
+ __field( u8, rt_type )
__dynamic_array( char, name, IFNAMSIZ )
__array( __u8, gw, 16 )
),
@@ -36,6 +39,7 @@
struct in6_addr *in6;
__entry->tb_id = table->tb6_id;
+ __entry->err = ip6_rt_type_to_error(f6i->fib6_type);
__entry->oif = flp->flowi6_oif;
__entry->iif = flp->flowi6_iif;
__entry->tos = ip6_tclass(flp->flowlabel);
@@ -48,27 +52,38 @@
in6 = (struct in6_addr *)__entry->dst;
*in6 = flp->daddr;
- if (rt->rt6i_idev) {
- __assign_str(name, rt->rt6i_idev->dev->name);
+ __entry->proto = flp->flowi6_proto;
+ if (__entry->proto == IPPROTO_TCP ||
+ __entry->proto == IPPROTO_UDP) {
+ __entry->sport = ntohs(flp->fl6_sport);
+ __entry->dport = ntohs(flp->fl6_dport);
} else {
- __assign_str(name, "");
+ __entry->sport = 0;
+ __entry->dport = 0;
}
- if (rt == net->ipv6.ip6_null_entry) {
+
+ if (f6i->fib6_nh.nh_dev) {
+ __assign_str(name, f6i->fib6_nh.nh_dev);
+ } else {
+ __assign_str(name, "-");
+ }
+ if (f6i == net->ipv6.fib6_null_entry) {
struct in6_addr in6_zero = {};
in6 = (struct in6_addr *)__entry->gw;
*in6 = in6_zero;
- } else if (rt) {
+ } else if (f6i) {
in6 = (struct in6_addr *)__entry->gw;
- *in6 = rt->rt6i_gateway;
+ *in6 = f6i->fib6_nh.nh_gw;
}
),
- TP_printk("table %3u oif %d iif %d src %pI6c dst %pI6c tos %d scope %d flags %x ==> dev %s gw %pI6c",
- __entry->tb_id, __entry->oif, __entry->iif,
- __entry->src, __entry->dst, __entry->tos, __entry->scope,
- __entry->flags, __get_str(name), __entry->gw)
+ TP_printk("table %3u oif %d iif %d proto %u %pI6c/%u -> %pI6c/%u tos %d scope %d flags %x ==> dev %s gw %pI6c err %d",
+ __entry->tb_id, __entry->oif, __entry->iif, __entry->proto,
+ __entry->src, __entry->sport, __entry->dst, __entry->dport,
+ __entry->tos, __entry->scope, __entry->flags,
+ __get_str(name), __entry->gw, __entry->err)
);
#endif /* _TRACE_FIB6_H */
diff --git a/include/trace/events/tcp.h b/include/trace/events/tcp.h
index 878b2be..c1a5284 100644
--- a/include/trace/events/tcp.h
+++ b/include/trace/events/tcp.h
@@ -10,6 +10,7 @@
#include <linux/tracepoint.h>
#include <net/ipv6.h>
#include <net/tcp.h>
+#include <linux/sock_diag.h>
#define TP_STORE_V4MAPPED(__entry, saddr, daddr) \
do { \
@@ -113,7 +114,7 @@
*/
DECLARE_EVENT_CLASS(tcp_event_sk,
- TP_PROTO(const struct sock *sk),
+ TP_PROTO(struct sock *sk),
TP_ARGS(sk),
@@ -125,6 +126,7 @@
__array(__u8, daddr, 4)
__array(__u8, saddr_v6, 16)
__array(__u8, daddr_v6, 16)
+ __field(__u64, sock_cookie)
),
TP_fast_assign(
@@ -144,73 +146,36 @@
TP_STORE_ADDRS(__entry, inet->inet_saddr, inet->inet_daddr,
sk->sk_v6_rcv_saddr, sk->sk_v6_daddr);
+
+ __entry->sock_cookie = sock_gen_cookie(sk);
),
- TP_printk("sport=%hu dport=%hu saddr=%pI4 daddr=%pI4 saddrv6=%pI6c daddrv6=%pI6c",
+ TP_printk("sport=%hu dport=%hu saddr=%pI4 daddr=%pI4 saddrv6=%pI6c daddrv6=%pI6c sock_cookie=%llx",
__entry->sport, __entry->dport,
__entry->saddr, __entry->daddr,
- __entry->saddr_v6, __entry->daddr_v6)
+ __entry->saddr_v6, __entry->daddr_v6,
+ __entry->sock_cookie)
);
DEFINE_EVENT(tcp_event_sk, tcp_receive_reset,
- TP_PROTO(const struct sock *sk),
+ TP_PROTO(struct sock *sk),
TP_ARGS(sk)
);
DEFINE_EVENT(tcp_event_sk, tcp_destroy_sock,
- TP_PROTO(const struct sock *sk),
+ TP_PROTO(struct sock *sk),
TP_ARGS(sk)
);
-TRACE_EVENT(tcp_set_state,
+DEFINE_EVENT(tcp_event_sk, tcp_rcv_space_adjust,
- TP_PROTO(const struct sock *sk, const int oldstate, const int newstate),
+ TP_PROTO(struct sock *sk),
- TP_ARGS(sk, oldstate, newstate),
-
- TP_STRUCT__entry(
- __field(const void *, skaddr)
- __field(int, oldstate)
- __field(int, newstate)
- __field(__u16, sport)
- __field(__u16, dport)
- __array(__u8, saddr, 4)
- __array(__u8, daddr, 4)
- __array(__u8, saddr_v6, 16)
- __array(__u8, daddr_v6, 16)
- ),
-
- TP_fast_assign(
- struct inet_sock *inet = inet_sk(sk);
- __be32 *p32;
-
- __entry->skaddr = sk;
- __entry->oldstate = oldstate;
- __entry->newstate = newstate;
-
- __entry->sport = ntohs(inet->inet_sport);
- __entry->dport = ntohs(inet->inet_dport);
-
- p32 = (__be32 *) __entry->saddr;
- *p32 = inet->inet_saddr;
-
- p32 = (__be32 *) __entry->daddr;
- *p32 = inet->inet_daddr;
-
- TP_STORE_ADDRS(__entry, inet->inet_saddr, inet->inet_daddr,
- sk->sk_v6_rcv_saddr, sk->sk_v6_daddr);
- ),
-
- TP_printk("sport=%hu dport=%hu saddr=%pI4 daddr=%pI4 saddrv6=%pI6c daddrv6=%pI6c oldstate=%s newstate=%s",
- __entry->sport, __entry->dport,
- __entry->saddr, __entry->daddr,
- __entry->saddr_v6, __entry->daddr_v6,
- show_tcp_state_name(__entry->oldstate),
- show_tcp_state_name(__entry->newstate))
+ TP_ARGS(sk)
);
TRACE_EVENT(tcp_retransmit_synack,
@@ -279,6 +244,7 @@
__field(__u32, snd_wnd)
__field(__u32, srtt)
__field(__u32, rcv_wnd)
+ __field(__u64, sock_cookie)
),
TP_fast_assign(
@@ -303,15 +269,14 @@
__entry->rcv_wnd = tp->rcv_wnd;
__entry->ssthresh = tcp_current_ssthresh(sk);
__entry->srtt = tp->srtt_us >> 3;
+ __entry->sock_cookie = sock_gen_cookie(sk);
),
- TP_printk("src=%pISpc dest=%pISpc mark=%#x length=%d snd_nxt=%#x "
- "snd_una=%#x snd_cwnd=%u ssthresh=%u snd_wnd=%u srtt=%u "
- "rcv_wnd=%u",
+ TP_printk("src=%pISpc dest=%pISpc mark=%#x length=%d snd_nxt=%#x snd_una=%#x snd_cwnd=%u ssthresh=%u snd_wnd=%u srtt=%u rcv_wnd=%u sock_cookie=%llx",
__entry->saddr, __entry->daddr, __entry->mark,
__entry->length, __entry->snd_nxt, __entry->snd_una,
__entry->snd_cwnd, __entry->ssthresh, __entry->snd_wnd,
- __entry->srtt, __entry->rcv_wnd)
+ __entry->srtt, __entry->rcv_wnd, __entry->sock_cookie)
);
#endif /* _TRACE_TCP_H */
diff --git a/include/trace/events/xdp.h b/include/trace/events/xdp.h
index 8989a92..1ecf4c6 100644
--- a/include/trace/events/xdp.h
+++ b/include/trace/events/xdp.h
@@ -138,11 +138,18 @@
__entry->map_id, __entry->map_index)
);
+#ifndef __DEVMAP_OBJ_TYPE
+#define __DEVMAP_OBJ_TYPE
+struct _bpf_dtab_netdev {
+ struct net_device *dev;
+};
+#endif /* __DEVMAP_OBJ_TYPE */
+
#define devmap_ifindex(fwd, map) \
(!fwd ? 0 : \
(!map ? 0 : \
((map->map_type == BPF_MAP_TYPE_DEVMAP) ? \
- ((struct net_device *)fwd)->ifindex : 0)))
+ ((struct _bpf_dtab_netdev *)fwd)->dev->ifindex : 0)))
#define _trace_xdp_redirect_map(dev, xdp, fwd, map, idx) \
trace_xdp_redirect_map(dev, xdp, devmap_ifindex(fwd, map), \
@@ -222,6 +229,47 @@
__entry->to_cpu)
);
+TRACE_EVENT(xdp_devmap_xmit,
+
+ TP_PROTO(const struct bpf_map *map, u32 map_index,
+ int sent, int drops,
+ const struct net_device *from_dev,
+ const struct net_device *to_dev, int err),
+
+ TP_ARGS(map, map_index, sent, drops, from_dev, to_dev, err),
+
+ TP_STRUCT__entry(
+ __field(int, map_id)
+ __field(u32, act)
+ __field(u32, map_index)
+ __field(int, drops)
+ __field(int, sent)
+ __field(int, from_ifindex)
+ __field(int, to_ifindex)
+ __field(int, err)
+ ),
+
+ TP_fast_assign(
+ __entry->map_id = map->id;
+ __entry->act = XDP_REDIRECT;
+ __entry->map_index = map_index;
+ __entry->drops = drops;
+ __entry->sent = sent;
+ __entry->from_ifindex = from_dev->ifindex;
+ __entry->to_ifindex = to_dev->ifindex;
+ __entry->err = err;
+ ),
+
+ TP_printk("ndo_xdp_xmit"
+ " map_id=%d map_index=%d action=%s"
+ " sent=%d drops=%d"
+ " from_ifindex=%d to_ifindex=%d err=%d",
+ __entry->map_id, __entry->map_index,
+ __print_symbolic(__entry->act, __XDP_ACT_SYM_TAB),
+ __entry->sent, __entry->drops,
+ __entry->from_ifindex, __entry->to_ifindex, __entry->err)
+);
+
#endif /* _TRACE_XDP_H */
#include <trace/define_trace.h>
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index c5ec897..9b8c6e3 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -95,6 +95,9 @@
BPF_OBJ_GET_INFO_BY_FD,
BPF_PROG_QUERY,
BPF_RAW_TRACEPOINT_OPEN,
+ BPF_BTF_LOAD,
+ BPF_BTF_GET_FD_BY_ID,
+ BPF_TASK_FD_QUERY,
};
enum bpf_map_type {
@@ -115,6 +118,8 @@
BPF_MAP_TYPE_DEVMAP,
BPF_MAP_TYPE_SOCKMAP,
BPF_MAP_TYPE_CPUMAP,
+ BPF_MAP_TYPE_XSKMAP,
+ BPF_MAP_TYPE_SOCKHASH,
};
enum bpf_prog_type {
@@ -137,6 +142,7 @@
BPF_PROG_TYPE_SK_MSG,
BPF_PROG_TYPE_RAW_TRACEPOINT,
BPF_PROG_TYPE_CGROUP_SOCK_ADDR,
+ BPF_PROG_TYPE_LWT_SEG6LOCAL,
};
enum bpf_attach_type {
@@ -279,6 +285,9 @@
*/
char map_name[BPF_OBJ_NAME_LEN];
__u32 map_ifindex; /* ifindex of netdev to create on */
+ __u32 btf_fd; /* fd pointing to a BTF type data */
+ __u32 btf_key_type_id; /* BTF type_id of the key */
+ __u32 btf_value_type_id; /* BTF type_id of the value */
};
struct { /* anonymous struct used by BPF_MAP_*_ELEM commands */
@@ -339,6 +348,7 @@
__u32 start_id;
__u32 prog_id;
__u32 map_id;
+ __u32 btf_id;
};
__u32 next_id;
__u32 open_flags;
@@ -363,398 +373,1637 @@
__u64 name;
__u32 prog_fd;
} raw_tracepoint;
+
+ struct { /* anonymous struct for BPF_BTF_LOAD */
+ __aligned_u64 btf;
+ __aligned_u64 btf_log_buf;
+ __u32 btf_size;
+ __u32 btf_log_size;
+ __u32 btf_log_level;
+ };
+
+ struct {
+ __u32 pid; /* input: pid */
+ __u32 fd; /* input: fd */
+ __u32 flags; /* input: flags */
+ __u32 buf_len; /* input/output: buf len */
+ __aligned_u64 buf; /* input/output:
+ * tp_name for tracepoint
+ * symbol for kprobe
+ * filename for uprobe
+ */
+ __u32 prog_id; /* output: prod_id */
+ __u32 fd_type; /* output: BPF_FD_TYPE_* */
+ __u64 probe_offset; /* output: probe_offset */
+ __u64 probe_addr; /* output: probe_addr */
+ } task_fd_query;
} __attribute__((aligned(8)));
-/* BPF helper function descriptions:
+/* The description below is an attempt at providing documentation to eBPF
+ * developers about the multiple available eBPF helper functions. It can be
+ * parsed and used to produce a manual page. The workflow is the following,
+ * and requires the rst2man utility:
*
- * void *bpf_map_lookup_elem(&map, &key)
- * Return: Map value or NULL
+ * $ ./scripts/bpf_helpers_doc.py \
+ * --filename include/uapi/linux/bpf.h > /tmp/bpf-helpers.rst
+ * $ rst2man /tmp/bpf-helpers.rst > /tmp/bpf-helpers.7
+ * $ man /tmp/bpf-helpers.7
*
- * int bpf_map_update_elem(&map, &key, &value, flags)
- * Return: 0 on success or negative error
+ * Note that in order to produce this external documentation, some RST
+ * formatting is used in the descriptions to get "bold" and "italics" in
+ * manual pages. Also note that the few trailing white spaces are
+ * intentional, removing them would break paragraphs for rst2man.
*
- * int bpf_map_delete_elem(&map, &key)
- * Return: 0 on success or negative error
+ * Start of BPF helper function descriptions:
*
- * int bpf_probe_read(void *dst, int size, void *src)
- * Return: 0 on success or negative error
+ * void *bpf_map_lookup_elem(struct bpf_map *map, const void *key)
+ * Description
+ * Perform a lookup in *map* for an entry associated to *key*.
+ * Return
+ * Map value associated to *key*, or **NULL** if no entry was
+ * found.
+ *
+ * int bpf_map_update_elem(struct bpf_map *map, const void *key, const void *value, u64 flags)
+ * Description
+ * Add or update the value of the entry associated to *key* in
+ * *map* with *value*. *flags* is one of:
+ *
+ * **BPF_NOEXIST**
+ * The entry for *key* must not exist in the map.
+ * **BPF_EXIST**
+ * The entry for *key* must already exist in the map.
+ * **BPF_ANY**
+ * No condition on the existence of the entry for *key*.
+ *
+ * Flag value **BPF_NOEXIST** cannot be used for maps of types
+ * **BPF_MAP_TYPE_ARRAY** or **BPF_MAP_TYPE_PERCPU_ARRAY** (all
+ * elements always exist), the helper would return an error.
+ * Return
+ * 0 on success, or a negative error in case of failure.
+ *
+ * int bpf_map_delete_elem(struct bpf_map *map, const void *key)
+ * Description
+ * Delete entry with *key* from *map*.
+ * Return
+ * 0 on success, or a negative error in case of failure.
+ *
+ * int bpf_probe_read(void *dst, u32 size, const void *src)
+ * Description
+ * For tracing programs, safely attempt to read *size* bytes from
+ * address *src* and store the data in *dst*.
+ * Return
+ * 0 on success, or a negative error in case of failure.
*
* u64 bpf_ktime_get_ns(void)
- * Return: current ktime
+ * Description
+ * Return the time elapsed since system boot, in nanoseconds.
+ * Return
+ * Current *ktime*.
*
- * int bpf_trace_printk(const char *fmt, int fmt_size, ...)
- * Return: length of buffer written or negative error
+ * int bpf_trace_printk(const char *fmt, u32 fmt_size, ...)
+ * Description
+ * This helper is a "printk()-like" facility for debugging. It
+ * prints a message defined by format *fmt* (of size *fmt_size*)
+ * to file *\/sys/kernel/debug/tracing/trace* from DebugFS, if
+ * available. It can take up to three additional **u64**
+ * arguments (as an eBPF helpers, the total number of arguments is
+ * limited to five).
*
- * u32 bpf_prandom_u32(void)
- * Return: random value
+ * Each time the helper is called, it appends a line to the trace.
+ * The format of the trace is customizable, and the exact output
+ * one will get depends on the options set in
+ * *\/sys/kernel/debug/tracing/trace_options* (see also the
+ * *README* file under the same directory). However, it usually
+ * defaults to something like:
*
- * u32 bpf_raw_smp_processor_id(void)
- * Return: SMP processor ID
+ * ::
*
- * int bpf_skb_store_bytes(skb, offset, from, len, flags)
- * store bytes into packet
- * @skb: pointer to skb
- * @offset: offset within packet from skb->mac_header
- * @from: pointer where to copy bytes from
- * @len: number of bytes to store into packet
- * @flags: bit 0 - if true, recompute skb->csum
- * other bits - reserved
- * Return: 0 on success or negative error
+ * telnet-470 [001] .N.. 419421.045894: 0x00000001: <formatted msg>
*
- * int bpf_l3_csum_replace(skb, offset, from, to, flags)
- * recompute IP checksum
- * @skb: pointer to skb
- * @offset: offset within packet where IP checksum is located
- * @from: old value of header field
- * @to: new value of header field
- * @flags: bits 0-3 - size of header field
- * other bits - reserved
- * Return: 0 on success or negative error
+ * In the above:
*
- * int bpf_l4_csum_replace(skb, offset, from, to, flags)
- * recompute TCP/UDP checksum
- * @skb: pointer to skb
- * @offset: offset within packet where TCP/UDP checksum is located
- * @from: old value of header field
- * @to: new value of header field
- * @flags: bits 0-3 - size of header field
- * bit 4 - is pseudo header
- * other bits - reserved
- * Return: 0 on success or negative error
+ * * ``telnet`` is the name of the current task.
+ * * ``470`` is the PID of the current task.
+ * * ``001`` is the CPU number on which the task is
+ * running.
+ * * In ``.N..``, each character refers to a set of
+ * options (whether irqs are enabled, scheduling
+ * options, whether hard/softirqs are running, level of
+ * preempt_disabled respectively). **N** means that
+ * **TIF_NEED_RESCHED** and **PREEMPT_NEED_RESCHED**
+ * are set.
+ * * ``419421.045894`` is a timestamp.
+ * * ``0x00000001`` is a fake value used by BPF for the
+ * instruction pointer register.
+ * * ``<formatted msg>`` is the message formatted with
+ * *fmt*.
*
- * int bpf_tail_call(ctx, prog_array_map, index)
- * jump into another BPF program
- * @ctx: context pointer passed to next program
- * @prog_array_map: pointer to map which type is BPF_MAP_TYPE_PROG_ARRAY
- * @index: 32-bit index inside array that selects specific program to run
- * Return: 0 on success or negative error
+ * The conversion specifiers supported by *fmt* are similar, but
+ * more limited than for printk(). They are **%d**, **%i**,
+ * **%u**, **%x**, **%ld**, **%li**, **%lu**, **%lx**, **%lld**,
+ * **%lli**, **%llu**, **%llx**, **%p**, **%s**. No modifier (size
+ * of field, padding with zeroes, etc.) is available, and the
+ * helper will return **-EINVAL** (but print nothing) if it
+ * encounters an unknown specifier.
*
- * int bpf_clone_redirect(skb, ifindex, flags)
- * redirect to another netdev
- * @skb: pointer to skb
- * @ifindex: ifindex of the net device
- * @flags: bit 0 - if set, redirect to ingress instead of egress
- * other bits - reserved
- * Return: 0 on success or negative error
+ * Also, note that **bpf_trace_printk**\ () is slow, and should
+ * only be used for debugging purposes. For this reason, a notice
+ * bloc (spanning several lines) is printed to kernel logs and
+ * states that the helper should not be used "for production use"
+ * the first time this helper is used (or more precisely, when
+ * **trace_printk**\ () buffers are allocated). For passing values
+ * to user space, perf events should be preferred.
+ * Return
+ * The number of bytes written to the buffer, or a negative error
+ * in case of failure.
+ *
+ * u32 bpf_get_prandom_u32(void)
+ * Description
+ * Get a pseudo-random number.
+ *
+ * From a security point of view, this helper uses its own
+ * pseudo-random internal state, and cannot be used to infer the
+ * seed of other random functions in the kernel. However, it is
+ * essential to note that the generator used by the helper is not
+ * cryptographically secure.
+ * Return
+ * A random 32-bit unsigned value.
+ *
+ * u32 bpf_get_smp_processor_id(void)
+ * Description
+ * Get the SMP (symmetric multiprocessing) processor id. Note that
+ * all programs run with preemption disabled, which means that the
+ * SMP processor id is stable during all the execution of the
+ * program.
+ * Return
+ * The SMP id of the processor running the program.
+ *
+ * int bpf_skb_store_bytes(struct sk_buff *skb, u32 offset, const void *from, u32 len, u64 flags)
+ * Description
+ * Store *len* bytes from address *from* into the packet
+ * associated to *skb*, at *offset*. *flags* are a combination of
+ * **BPF_F_RECOMPUTE_CSUM** (automatically recompute the
+ * checksum for the packet after storing the bytes) and
+ * **BPF_F_INVALIDATE_HASH** (set *skb*\ **->hash**, *skb*\
+ * **->swhash** and *skb*\ **->l4hash** to 0).
+ *
+ * A call to this helper is susceptible to change the underlaying
+ * packet buffer. Therefore, at load time, all checks on pointers
+ * previously done by the verifier are invalidated and must be
+ * performed again, if the helper is used in combination with
+ * direct packet access.
+ * Return
+ * 0 on success, or a negative error in case of failure.
+ *
+ * int bpf_l3_csum_replace(struct sk_buff *skb, u32 offset, u64 from, u64 to, u64 size)
+ * Description
+ * Recompute the layer 3 (e.g. IP) checksum for the packet
+ * associated to *skb*. Computation is incremental, so the helper
+ * must know the former value of the header field that was
+ * modified (*from*), the new value of this field (*to*), and the
+ * number of bytes (2 or 4) for this field, stored in *size*.
+ * Alternatively, it is possible to store the difference between
+ * the previous and the new values of the header field in *to*, by
+ * setting *from* and *size* to 0. For both methods, *offset*
+ * indicates the location of the IP checksum within the packet.
+ *
+ * This helper works in combination with **bpf_csum_diff**\ (),
+ * which does not update the checksum in-place, but offers more
+ * flexibility and can handle sizes larger than 2 or 4 for the
+ * checksum to update.
+ *
+ * A call to this helper is susceptible to change the underlaying
+ * packet buffer. Therefore, at load time, all checks on pointers
+ * previously done by the verifier are invalidated and must be
+ * performed again, if the helper is used in combination with
+ * direct packet access.
+ * Return
+ * 0 on success, or a negative error in case of failure.
+ *
+ * int bpf_l4_csum_replace(struct sk_buff *skb, u32 offset, u64 from, u64 to, u64 flags)
+ * Description
+ * Recompute the layer 4 (e.g. TCP, UDP or ICMP) checksum for the
+ * packet associated to *skb*. Computation is incremental, so the
+ * helper must know the former value of the header field that was
+ * modified (*from*), the new value of this field (*to*), and the
+ * number of bytes (2 or 4) for this field, stored on the lowest
+ * four bits of *flags*. Alternatively, it is possible to store
+ * the difference between the previous and the new values of the
+ * header field in *to*, by setting *from* and the four lowest
+ * bits of *flags* to 0. For both methods, *offset* indicates the
+ * location of the IP checksum within the packet. In addition to
+ * the size of the field, *flags* can be added (bitwise OR) actual
+ * flags. With **BPF_F_MARK_MANGLED_0**, a null checksum is left
+ * untouched (unless **BPF_F_MARK_ENFORCE** is added as well), and
+ * for updates resulting in a null checksum the value is set to
+ * **CSUM_MANGLED_0** instead. Flag **BPF_F_PSEUDO_HDR** indicates
+ * the checksum is to be computed against a pseudo-header.
+ *
+ * This helper works in combination with **bpf_csum_diff**\ (),
+ * which does not update the checksum in-place, but offers more
+ * flexibility and can handle sizes larger than 2 or 4 for the
+ * checksum to update.
+ *
+ * A call to this helper is susceptible to change the underlaying
+ * packet buffer. Therefore, at load time, all checks on pointers
+ * previously done by the verifier are invalidated and must be
+ * performed again, if the helper is used in combination with
+ * direct packet access.
+ * Return
+ * 0 on success, or a negative error in case of failure.
+ *
+ * int bpf_tail_call(void *ctx, struct bpf_map *prog_array_map, u32 index)
+ * Description
+ * This special helper is used to trigger a "tail call", or in
+ * other words, to jump into another eBPF program. The same stack
+ * frame is used (but values on stack and in registers for the
+ * caller are not accessible to the callee). This mechanism allows
+ * for program chaining, either for raising the maximum number of
+ * available eBPF instructions, or to execute given programs in
+ * conditional blocks. For security reasons, there is an upper
+ * limit to the number of successive tail calls that can be
+ * performed.
+ *
+ * Upon call of this helper, the program attempts to jump into a
+ * program referenced at index *index* in *prog_array_map*, a
+ * special map of type **BPF_MAP_TYPE_PROG_ARRAY**, and passes
+ * *ctx*, a pointer to the context.
+ *
+ * If the call succeeds, the kernel immediately runs the first
+ * instruction of the new program. This is not a function call,
+ * and it never returns to the previous program. If the call
+ * fails, then the helper has no effect, and the caller continues
+ * to run its subsequent instructions. A call can fail if the
+ * destination program for the jump does not exist (i.e. *index*
+ * is superior to the number of entries in *prog_array_map*), or
+ * if the maximum number of tail calls has been reached for this
+ * chain of programs. This limit is defined in the kernel by the
+ * macro **MAX_TAIL_CALL_CNT** (not accessible to user space),
+ * which is currently set to 32.
+ * Return
+ * 0 on success, or a negative error in case of failure.
+ *
+ * int bpf_clone_redirect(struct sk_buff *skb, u32 ifindex, u64 flags)
+ * Description
+ * Clone and redirect the packet associated to *skb* to another
+ * net device of index *ifindex*. Both ingress and egress
+ * interfaces can be used for redirection. The **BPF_F_INGRESS**
+ * value in *flags* is used to make the distinction (ingress path
+ * is selected if the flag is present, egress path otherwise).
+ * This is the only flag supported for now.
+ *
+ * In comparison with **bpf_redirect**\ () helper,
+ * **bpf_clone_redirect**\ () has the associated cost of
+ * duplicating the packet buffer, but this can be executed out of
+ * the eBPF program. Conversely, **bpf_redirect**\ () is more
+ * efficient, but it is handled through an action code where the
+ * redirection happens only after the eBPF program has returned.
+ *
+ * A call to this helper is susceptible to change the underlaying
+ * packet buffer. Therefore, at load time, all checks on pointers
+ * previously done by the verifier are invalidated and must be
+ * performed again, if the helper is used in combination with
+ * direct packet access.
+ * Return
+ * 0 on success, or a negative error in case of failure.
*
* u64 bpf_get_current_pid_tgid(void)
- * Return: current->tgid << 32 | current->pid
+ * Return
+ * A 64-bit integer containing the current tgid and pid, and
+ * created as such:
+ * *current_task*\ **->tgid << 32 \|**
+ * *current_task*\ **->pid**.
*
* u64 bpf_get_current_uid_gid(void)
- * Return: current_gid << 32 | current_uid
+ * Return
+ * A 64-bit integer containing the current GID and UID, and
+ * created as such: *current_gid* **<< 32 \|** *current_uid*.
*
- * int bpf_get_current_comm(char *buf, int size_of_buf)
- * stores current->comm into buf
- * Return: 0 on success or negative error
+ * int bpf_get_current_comm(char *buf, u32 size_of_buf)
+ * Description
+ * Copy the **comm** attribute of the current task into *buf* of
+ * *size_of_buf*. The **comm** attribute contains the name of
+ * the executable (excluding the path) for the current task. The
+ * *size_of_buf* must be strictly positive. On success, the
+ * helper makes sure that the *buf* is NUL-terminated. On failure,
+ * it is filled with zeroes.
+ * Return
+ * 0 on success, or a negative error in case of failure.
*
- * u32 bpf_get_cgroup_classid(skb)
- * retrieve a proc's classid
- * @skb: pointer to skb
- * Return: classid if != 0
+ * u32 bpf_get_cgroup_classid(struct sk_buff *skb)
+ * Description
+ * Retrieve the classid for the current task, i.e. for the net_cls
+ * cgroup to which *skb* belongs.
*
- * int bpf_skb_vlan_push(skb, vlan_proto, vlan_tci)
- * Return: 0 on success or negative error
+ * This helper can be used on TC egress path, but not on ingress.
*
- * int bpf_skb_vlan_pop(skb)
- * Return: 0 on success or negative error
+ * The net_cls cgroup provides an interface to tag network packets
+ * based on a user-provided identifier for all traffic coming from
+ * the tasks belonging to the related cgroup. See also the related
+ * kernel documentation, available from the Linux sources in file
+ * *Documentation/cgroup-v1/net_cls.txt*.
*
- * int bpf_skb_get_tunnel_key(skb, key, size, flags)
- * int bpf_skb_set_tunnel_key(skb, key, size, flags)
- * retrieve or populate tunnel metadata
- * @skb: pointer to skb
- * @key: pointer to 'struct bpf_tunnel_key'
- * @size: size of 'struct bpf_tunnel_key'
- * @flags: room for future extensions
- * Return: 0 on success or negative error
+ * The Linux kernel has two versions for cgroups: there are
+ * cgroups v1 and cgroups v2. Both are available to users, who can
+ * use a mixture of them, but note that the net_cls cgroup is for
+ * cgroup v1 only. This makes it incompatible with BPF programs
+ * run on cgroups, which is a cgroup-v2-only feature (a socket can
+ * only hold data for one version of cgroups at a time).
*
- * u64 bpf_perf_event_read(map, flags)
- * read perf event counter value
- * @map: pointer to perf_event_array map
- * @flags: index of event in the map or bitmask flags
- * Return: value of perf event counter read or error code
+ * This helper is only available is the kernel was compiled with
+ * the **CONFIG_CGROUP_NET_CLASSID** configuration option set to
+ * "**y**" or to "**m**".
+ * Return
+ * The classid, or 0 for the default unconfigured classid.
*
- * int bpf_redirect(ifindex, flags)
- * redirect to another netdev
- * @ifindex: ifindex of the net device
- * @flags:
- * cls_bpf:
- * bit 0 - if set, redirect to ingress instead of egress
- * other bits - reserved
- * xdp_bpf:
- * all bits - reserved
- * Return: cls_bpf: TC_ACT_REDIRECT on success or TC_ACT_SHOT on error
- * xdp_bfp: XDP_REDIRECT on success or XDP_ABORT on error
- * int bpf_redirect_map(map, key, flags)
- * redirect to endpoint in map
- * @map: pointer to dev map
- * @key: index in map to lookup
- * @flags: --
- * Return: XDP_REDIRECT on success or XDP_ABORT on error
+ * int bpf_skb_vlan_push(struct sk_buff *skb, __be16 vlan_proto, u16 vlan_tci)
+ * Description
+ * Push a *vlan_tci* (VLAN tag control information) of protocol
+ * *vlan_proto* to the packet associated to *skb*, then update
+ * the checksum. Note that if *vlan_proto* is different from
+ * **ETH_P_8021Q** and **ETH_P_8021AD**, it is considered to
+ * be **ETH_P_8021Q**.
*
- * u32 bpf_get_route_realm(skb)
- * retrieve a dst's tclassid
- * @skb: pointer to skb
- * Return: realm if != 0
+ * A call to this helper is susceptible to change the underlaying
+ * packet buffer. Therefore, at load time, all checks on pointers
+ * previously done by the verifier are invalidated and must be
+ * performed again, if the helper is used in combination with
+ * direct packet access.
+ * Return
+ * 0 on success, or a negative error in case of failure.
*
- * int bpf_perf_event_output(ctx, map, flags, data, size)
- * output perf raw sample
- * @ctx: struct pt_regs*
- * @map: pointer to perf_event_array map
- * @flags: index of event in the map or bitmask flags
- * @data: data on stack to be output as raw data
- * @size: size of data
- * Return: 0 on success or negative error
+ * int bpf_skb_vlan_pop(struct sk_buff *skb)
+ * Description
+ * Pop a VLAN header from the packet associated to *skb*.
*
- * int bpf_get_stackid(ctx, map, flags)
- * walk user or kernel stack and return id
- * @ctx: struct pt_regs*
- * @map: pointer to stack_trace map
- * @flags: bits 0-7 - numer of stack frames to skip
- * bit 8 - collect user stack instead of kernel
- * bit 9 - compare stacks by hash only
- * bit 10 - if two different stacks hash into the same stackid
- * discard old
- * other bits - reserved
- * Return: >= 0 stackid on success or negative error
+ * A call to this helper is susceptible to change the underlaying
+ * packet buffer. Therefore, at load time, all checks on pointers
+ * previously done by the verifier are invalidated and must be
+ * performed again, if the helper is used in combination with
+ * direct packet access.
+ * Return
+ * 0 on success, or a negative error in case of failure.
*
- * s64 bpf_csum_diff(from, from_size, to, to_size, seed)
- * calculate csum diff
- * @from: raw from buffer
- * @from_size: length of from buffer
- * @to: raw to buffer
- * @to_size: length of to buffer
- * @seed: optional seed
- * Return: csum result or negative error code
+ * int bpf_skb_get_tunnel_key(struct sk_buff *skb, struct bpf_tunnel_key *key, u32 size, u64 flags)
+ * Description
+ * Get tunnel metadata. This helper takes a pointer *key* to an
+ * empty **struct bpf_tunnel_key** of **size**, that will be
+ * filled with tunnel metadata for the packet associated to *skb*.
+ * The *flags* can be set to **BPF_F_TUNINFO_IPV6**, which
+ * indicates that the tunnel is based on IPv6 protocol instead of
+ * IPv4.
*
- * int bpf_skb_get_tunnel_opt(skb, opt, size)
- * retrieve tunnel options metadata
- * @skb: pointer to skb
- * @opt: pointer to raw tunnel option data
- * @size: size of @opt
- * Return: option size
+ * The **struct bpf_tunnel_key** is an object that generalizes the
+ * principal parameters used by various tunneling protocols into a
+ * single struct. This way, it can be used to easily make a
+ * decision based on the contents of the encapsulation header,
+ * "summarized" in this struct. In particular, it holds the IP
+ * address of the remote end (IPv4 or IPv6, depending on the case)
+ * in *key*\ **->remote_ipv4** or *key*\ **->remote_ipv6**. Also,
+ * this struct exposes the *key*\ **->tunnel_id**, which is
+ * generally mapped to a VNI (Virtual Network Identifier), making
+ * it programmable together with the **bpf_skb_set_tunnel_key**\
+ * () helper.
*
- * int bpf_skb_set_tunnel_opt(skb, opt, size)
- * populate tunnel options metadata
- * @skb: pointer to skb
- * @opt: pointer to raw tunnel option data
- * @size: size of @opt
- * Return: 0 on success or negative error
+ * Let's imagine that the following code is part of a program
+ * attached to the TC ingress interface, on one end of a GRE
+ * tunnel, and is supposed to filter out all messages coming from
+ * remote ends with IPv4 address other than 10.0.0.1:
*
- * int bpf_skb_change_proto(skb, proto, flags)
- * Change protocol of the skb. Currently supported is v4 -> v6,
- * v6 -> v4 transitions. The helper will also resize the skb. eBPF
- * program is expected to fill the new headers via skb_store_bytes
- * and lX_csum_replace.
- * @skb: pointer to skb
- * @proto: new skb->protocol type
- * @flags: reserved
- * Return: 0 on success or negative error
+ * ::
*
- * int bpf_skb_change_type(skb, type)
- * Change packet type of skb.
- * @skb: pointer to skb
- * @type: new skb->pkt_type type
- * Return: 0 on success or negative error
+ * int ret;
+ * struct bpf_tunnel_key key = {};
+ *
+ * ret = bpf_skb_get_tunnel_key(skb, &key, sizeof(key), 0);
+ * if (ret < 0)
+ * return TC_ACT_SHOT; // drop packet
+ *
+ * if (key.remote_ipv4 != 0x0a000001)
+ * return TC_ACT_SHOT; // drop packet
+ *
+ * return TC_ACT_OK; // accept packet
*
- * int bpf_skb_under_cgroup(skb, map, index)
- * Check cgroup2 membership of skb
- * @skb: pointer to skb
- * @map: pointer to bpf_map in BPF_MAP_TYPE_CGROUP_ARRAY type
- * @index: index of the cgroup in the bpf_map
- * Return:
- * == 0 skb failed the cgroup2 descendant test
- * == 1 skb succeeded the cgroup2 descendant test
- * < 0 error
+ * This interface can also be used with all encapsulation devices
+ * that can operate in "collect metadata" mode: instead of having
+ * one network device per specific configuration, the "collect
+ * metadata" mode only requires a single device where the
+ * configuration can be extracted from this helper.
*
- * u32 bpf_get_hash_recalc(skb)
- * Retrieve and possibly recalculate skb->hash.
- * @skb: pointer to skb
- * Return: hash
+ * This can be used together with various tunnels such as VXLan,
+ * Geneve, GRE or IP in IP (IPIP).
+ * Return
+ * 0 on success, or a negative error in case of failure.
+ *
+ * int bpf_skb_set_tunnel_key(struct sk_buff *skb, struct bpf_tunnel_key *key, u32 size, u64 flags)
+ * Description
+ * Populate tunnel metadata for packet associated to *skb.* The
+ * tunnel metadata is set to the contents of *key*, of *size*. The
+ * *flags* can be set to a combination of the following values:
+ *
+ * **BPF_F_TUNINFO_IPV6**
+ * Indicate that the tunnel is based on IPv6 protocol
+ * instead of IPv4.
+ * **BPF_F_ZERO_CSUM_TX**
+ * For IPv4 packets, add a flag to tunnel metadata
+ * indicating that checksum computation should be skipped
+ * and checksum set to zeroes.
+ * **BPF_F_DONT_FRAGMENT**
+ * Add a flag to tunnel metadata indicating that the
+ * packet should not be fragmented.
+ * **BPF_F_SEQ_NUMBER**
+ * Add a flag to tunnel metadata indicating that a
+ * sequence number should be added to tunnel header before
+ * sending the packet. This flag was added for GRE
+ * encapsulation, but might be used with other protocols
+ * as well in the future.
+ *
+ * Here is a typical usage on the transmit path:
+ *
+ * ::
+ *
+ * struct bpf_tunnel_key key;
+ * populate key ...
+ * bpf_skb_set_tunnel_key(skb, &key, sizeof(key), 0);
+ * bpf_clone_redirect(skb, vxlan_dev_ifindex, 0);
+ *
+ * See also the description of the **bpf_skb_get_tunnel_key**\ ()
+ * helper for additional information.
+ * Return
+ * 0 on success, or a negative error in case of failure.
+ *
+ * u64 bpf_perf_event_read(struct bpf_map *map, u64 flags)
+ * Description
+ * Read the value of a perf event counter. This helper relies on a
+ * *map* of type **BPF_MAP_TYPE_PERF_EVENT_ARRAY**. The nature of
+ * the perf event counter is selected when *map* is updated with
+ * perf event file descriptors. The *map* is an array whose size
+ * is the number of available CPUs, and each cell contains a value
+ * relative to one CPU. The value to retrieve is indicated by
+ * *flags*, that contains the index of the CPU to look up, masked
+ * with **BPF_F_INDEX_MASK**. Alternatively, *flags* can be set to
+ * **BPF_F_CURRENT_CPU** to indicate that the value for the
+ * current CPU should be retrieved.
+ *
+ * Note that before Linux 4.13, only hardware perf event can be
+ * retrieved.
+ *
+ * Also, be aware that the newer helper
+ * **bpf_perf_event_read_value**\ () is recommended over
+ * **bpf_perf_event_read**\ () in general. The latter has some ABI
+ * quirks where error and counter value are used as a return code
+ * (which is wrong to do since ranges may overlap). This issue is
+ * fixed with **bpf_perf_event_read_value**\ (), which at the same
+ * time provides more features over the **bpf_perf_event_read**\
+ * () interface. Please refer to the description of
+ * **bpf_perf_event_read_value**\ () for details.
+ * Return
+ * The value of the perf event counter read from the map, or a
+ * negative error code in case of failure.
+ *
+ * int bpf_redirect(u32 ifindex, u64 flags)
+ * Description
+ * Redirect the packet to another net device of index *ifindex*.
+ * This helper is somewhat similar to **bpf_clone_redirect**\
+ * (), except that the packet is not cloned, which provides
+ * increased performance.
+ *
+ * Except for XDP, both ingress and egress interfaces can be used
+ * for redirection. The **BPF_F_INGRESS** value in *flags* is used
+ * to make the distinction (ingress path is selected if the flag
+ * is present, egress path otherwise). Currently, XDP only
+ * supports redirection to the egress interface, and accepts no
+ * flag at all.
+ *
+ * The same effect can be attained with the more generic
+ * **bpf_redirect_map**\ (), which requires specific maps to be
+ * used but offers better performance.
+ * Return
+ * For XDP, the helper returns **XDP_REDIRECT** on success or
+ * **XDP_ABORTED** on error. For other program types, the values
+ * are **TC_ACT_REDIRECT** on success or **TC_ACT_SHOT** on
+ * error.
+ *
+ * u32 bpf_get_route_realm(struct sk_buff *skb)
+ * Description
+ * Retrieve the realm or the route, that is to say the
+ * **tclassid** field of the destination for the *skb*. The
+ * indentifier retrieved is a user-provided tag, similar to the
+ * one used with the net_cls cgroup (see description for
+ * **bpf_get_cgroup_classid**\ () helper), but here this tag is
+ * held by a route (a destination entry), not by a task.
+ *
+ * Retrieving this identifier works with the clsact TC egress hook
+ * (see also **tc-bpf(8)**), or alternatively on conventional
+ * classful egress qdiscs, but not on TC ingress path. In case of
+ * clsact TC egress hook, this has the advantage that, internally,
+ * the destination entry has not been dropped yet in the transmit
+ * path. Therefore, the destination entry does not need to be
+ * artificially held via **netif_keep_dst**\ () for a classful
+ * qdisc until the *skb* is freed.
+ *
+ * This helper is available only if the kernel was compiled with
+ * **CONFIG_IP_ROUTE_CLASSID** configuration option.
+ * Return
+ * The realm of the route for the packet associated to *skb*, or 0
+ * if none was found.
+ *
+ * int bpf_perf_event_output(struct pt_reg *ctx, struct bpf_map *map, u64 flags, void *data, u64 size)
+ * Description
+ * Write raw *data* blob into a special BPF perf event held by
+ * *map* of type **BPF_MAP_TYPE_PERF_EVENT_ARRAY**. This perf
+ * event must have the following attributes: **PERF_SAMPLE_RAW**
+ * as **sample_type**, **PERF_TYPE_SOFTWARE** as **type**, and
+ * **PERF_COUNT_SW_BPF_OUTPUT** as **config**.
+ *
+ * The *flags* are used to indicate the index in *map* for which
+ * the value must be put, masked with **BPF_F_INDEX_MASK**.
+ * Alternatively, *flags* can be set to **BPF_F_CURRENT_CPU**
+ * to indicate that the index of the current CPU core should be
+ * used.
+ *
+ * The value to write, of *size*, is passed through eBPF stack and
+ * pointed by *data*.
+ *
+ * The context of the program *ctx* needs also be passed to the
+ * helper.
+ *
+ * On user space, a program willing to read the values needs to
+ * call **perf_event_open**\ () on the perf event (either for
+ * one or for all CPUs) and to store the file descriptor into the
+ * *map*. This must be done before the eBPF program can send data
+ * into it. An example is available in file
+ * *samples/bpf/trace_output_user.c* in the Linux kernel source
+ * tree (the eBPF program counterpart is in
+ * *samples/bpf/trace_output_kern.c*).
+ *
+ * **bpf_perf_event_output**\ () achieves better performance
+ * than **bpf_trace_printk**\ () for sharing data with user
+ * space, and is much better suitable for streaming data from eBPF
+ * programs.
+ *
+ * Note that this helper is not restricted to tracing use cases
+ * and can be used with programs attached to TC or XDP as well,
+ * where it allows for passing data to user space listeners. Data
+ * can be:
+ *
+ * * Only custom structs,
+ * * Only the packet payload, or
+ * * A combination of both.
+ * Return
+ * 0 on success, or a negative error in case of failure.
+ *
+ * int bpf_skb_load_bytes(const struct sk_buff *skb, u32 offset, void *to, u32 len)
+ * Description
+ * This helper was provided as an easy way to load data from a
+ * packet. It can be used to load *len* bytes from *offset* from
+ * the packet associated to *skb*, into the buffer pointed by
+ * *to*.
+ *
+ * Since Linux 4.7, usage of this helper has mostly been replaced
+ * by "direct packet access", enabling packet data to be
+ * manipulated with *skb*\ **->data** and *skb*\ **->data_end**
+ * pointing respectively to the first byte of packet data and to
+ * the byte after the last byte of packet data. However, it
+ * remains useful if one wishes to read large quantities of data
+ * at once from a packet into the eBPF stack.
+ * Return
+ * 0 on success, or a negative error in case of failure.
+ *
+ * int bpf_get_stackid(struct pt_reg *ctx, struct bpf_map *map, u64 flags)
+ * Description
+ * Walk a user or a kernel stack and return its id. To achieve
+ * this, the helper needs *ctx*, which is a pointer to the context
+ * on which the tracing program is executed, and a pointer to a
+ * *map* of type **BPF_MAP_TYPE_STACK_TRACE**.
+ *
+ * The last argument, *flags*, holds the number of stack frames to
+ * skip (from 0 to 255), masked with
+ * **BPF_F_SKIP_FIELD_MASK**. The next bits can be used to set
+ * a combination of the following flags:
+ *
+ * **BPF_F_USER_STACK**
+ * Collect a user space stack instead of a kernel stack.
+ * **BPF_F_FAST_STACK_CMP**
+ * Compare stacks by hash only.
+ * **BPF_F_REUSE_STACKID**
+ * If two different stacks hash into the same *stackid*,
+ * discard the old one.
+ *
+ * The stack id retrieved is a 32 bit long integer handle which
+ * can be further combined with other data (including other stack
+ * ids) and used as a key into maps. This can be useful for
+ * generating a variety of graphs (such as flame graphs or off-cpu
+ * graphs).
+ *
+ * For walking a stack, this helper is an improvement over
+ * **bpf_probe_read**\ (), which can be used with unrolled loops
+ * but is not efficient and consumes a lot of eBPF instructions.
+ * Instead, **bpf_get_stackid**\ () can collect up to
+ * **PERF_MAX_STACK_DEPTH** both kernel and user frames. Note that
+ * this limit can be controlled with the **sysctl** program, and
+ * that it should be manually increased in order to profile long
+ * user stacks (such as stacks for Java programs). To do so, use:
+ *
+ * ::
+ *
+ * # sysctl kernel.perf_event_max_stack=<new value>
+ *
+ * Return
+ * The positive or null stack id on success, or a negative error
+ * in case of failure.
+ *
+ * s64 bpf_csum_diff(__be32 *from, u32 from_size, __be32 *to, u32 to_size, __wsum seed)
+ * Description
+ * Compute a checksum difference, from the raw buffer pointed by
+ * *from*, of length *from_size* (that must be a multiple of 4),
+ * towards the raw buffer pointed by *to*, of size *to_size*
+ * (same remark). An optional *seed* can be added to the value
+ * (this can be cascaded, the seed may come from a previous call
+ * to the helper).
+ *
+ * This is flexible enough to be used in several ways:
+ *
+ * * With *from_size* == 0, *to_size* > 0 and *seed* set to
+ * checksum, it can be used when pushing new data.
+ * * With *from_size* > 0, *to_size* == 0 and *seed* set to
+ * checksum, it can be used when removing data from a packet.
+ * * With *from_size* > 0, *to_size* > 0 and *seed* set to 0, it
+ * can be used to compute a diff. Note that *from_size* and
+ * *to_size* do not need to be equal.
+ *
+ * This helper can be used in combination with
+ * **bpf_l3_csum_replace**\ () and **bpf_l4_csum_replace**\ (), to
+ * which one can feed in the difference computed with
+ * **bpf_csum_diff**\ ().
+ * Return
+ * The checksum result, or a negative error code in case of
+ * failure.
+ *
+ * int bpf_skb_get_tunnel_opt(struct sk_buff *skb, u8 *opt, u32 size)
+ * Description
+ * Retrieve tunnel options metadata for the packet associated to
+ * *skb*, and store the raw tunnel option data to the buffer *opt*
+ * of *size*.
+ *
+ * This helper can be used with encapsulation devices that can
+ * operate in "collect metadata" mode (please refer to the related
+ * note in the description of **bpf_skb_get_tunnel_key**\ () for
+ * more details). A particular example where this can be used is
+ * in combination with the Geneve encapsulation protocol, where it
+ * allows for pushing (with **bpf_skb_get_tunnel_opt**\ () helper)
+ * and retrieving arbitrary TLVs (Type-Length-Value headers) from
+ * the eBPF program. This allows for full customization of these
+ * headers.
+ * Return
+ * The size of the option data retrieved.
+ *
+ * int bpf_skb_set_tunnel_opt(struct sk_buff *skb, u8 *opt, u32 size)
+ * Description
+ * Set tunnel options metadata for the packet associated to *skb*
+ * to the option data contained in the raw buffer *opt* of *size*.
+ *
+ * See also the description of the **bpf_skb_get_tunnel_opt**\ ()
+ * helper for additional information.
+ * Return
+ * 0 on success, or a negative error in case of failure.
+ *
+ * int bpf_skb_change_proto(struct sk_buff *skb, __be16 proto, u64 flags)
+ * Description
+ * Change the protocol of the *skb* to *proto*. Currently
+ * supported are transition from IPv4 to IPv6, and from IPv6 to
+ * IPv4. The helper takes care of the groundwork for the
+ * transition, including resizing the socket buffer. The eBPF
+ * program is expected to fill the new headers, if any, via
+ * **skb_store_bytes**\ () and to recompute the checksums with
+ * **bpf_l3_csum_replace**\ () and **bpf_l4_csum_replace**\
+ * (). The main case for this helper is to perform NAT64
+ * operations out of an eBPF program.
+ *
+ * Internally, the GSO type is marked as dodgy so that headers are
+ * checked and segments are recalculated by the GSO/GRO engine.
+ * The size for GSO target is adapted as well.
+ *
+ * All values for *flags* are reserved for future usage, and must
+ * be left at zero.
+ *
+ * A call to this helper is susceptible to change the underlaying
+ * packet buffer. Therefore, at load time, all checks on pointers
+ * previously done by the verifier are invalidated and must be
+ * performed again, if the helper is used in combination with
+ * direct packet access.
+ * Return
+ * 0 on success, or a negative error in case of failure.
+ *
+ * int bpf_skb_change_type(struct sk_buff *skb, u32 type)
+ * Description
+ * Change the packet type for the packet associated to *skb*. This
+ * comes down to setting *skb*\ **->pkt_type** to *type*, except
+ * the eBPF program does not have a write access to *skb*\
+ * **->pkt_type** beside this helper. Using a helper here allows
+ * for graceful handling of errors.
+ *
+ * The major use case is to change incoming *skb*s to
+ * **PACKET_HOST** in a programmatic way instead of having to
+ * recirculate via **redirect**\ (..., **BPF_F_INGRESS**), for
+ * example.
+ *
+ * Note that *type* only allows certain values. At this time, they
+ * are:
+ *
+ * **PACKET_HOST**
+ * Packet is for us.
+ * **PACKET_BROADCAST**
+ * Send packet to all.
+ * **PACKET_MULTICAST**
+ * Send packet to group.
+ * **PACKET_OTHERHOST**
+ * Send packet to someone else.
+ * Return
+ * 0 on success, or a negative error in case of failure.
+ *
+ * int bpf_skb_under_cgroup(struct sk_buff *skb, struct bpf_map *map, u32 index)
+ * Description
+ * Check whether *skb* is a descendant of the cgroup2 held by
+ * *map* of type **BPF_MAP_TYPE_CGROUP_ARRAY**, at *index*.
+ * Return
+ * The return value depends on the result of the test, and can be:
+ *
+ * * 0, if the *skb* failed the cgroup2 descendant test.
+ * * 1, if the *skb* succeeded the cgroup2 descendant test.
+ * * A negative error code, if an error occurred.
+ *
+ * u32 bpf_get_hash_recalc(struct sk_buff *skb)
+ * Description
+ * Retrieve the hash of the packet, *skb*\ **->hash**. If it is
+ * not set, in particular if the hash was cleared due to mangling,
+ * recompute this hash. Later accesses to the hash can be done
+ * directly with *skb*\ **->hash**.
+ *
+ * Calling **bpf_set_hash_invalid**\ (), changing a packet
+ * prototype with **bpf_skb_change_proto**\ (), or calling
+ * **bpf_skb_store_bytes**\ () with the
+ * **BPF_F_INVALIDATE_HASH** are actions susceptible to clear
+ * the hash and to trigger a new computation for the next call to
+ * **bpf_get_hash_recalc**\ ().
+ * Return
+ * The 32-bit hash.
*
* u64 bpf_get_current_task(void)
- * Returns current task_struct
- * Return: current
+ * Return
+ * A pointer to the current task struct.
*
- * int bpf_probe_write_user(void *dst, void *src, int len)
- * safely attempt to write to a location
- * @dst: destination address in userspace
- * @src: source address on stack
- * @len: number of bytes to copy
- * Return: 0 on success or negative error
+ * int bpf_probe_write_user(void *dst, const void *src, u32 len)
+ * Description
+ * Attempt in a safe way to write *len* bytes from the buffer
+ * *src* to *dst* in memory. It only works for threads that are in
+ * user context, and *dst* must be a valid user space address.
*
- * int bpf_current_task_under_cgroup(map, index)
- * Check cgroup2 membership of current task
- * @map: pointer to bpf_map in BPF_MAP_TYPE_CGROUP_ARRAY type
- * @index: index of the cgroup in the bpf_map
- * Return:
- * == 0 current failed the cgroup2 descendant test
- * == 1 current succeeded the cgroup2 descendant test
- * < 0 error
+ * This helper should not be used to implement any kind of
+ * security mechanism because of TOC-TOU attacks, but rather to
+ * debug, divert, and manipulate execution of semi-cooperative
+ * processes.
*
- * int bpf_skb_change_tail(skb, len, flags)
- * The helper will resize the skb to the given new size, to be used f.e.
- * with control messages.
- * @skb: pointer to skb
- * @len: new skb length
- * @flags: reserved
- * Return: 0 on success or negative error
+ * Keep in mind that this feature is meant for experiments, and it
+ * has a risk of crashing the system and running programs.
+ * Therefore, when an eBPF program using this helper is attached,
+ * a warning including PID and process name is printed to kernel
+ * logs.
+ * Return
+ * 0 on success, or a negative error in case of failure.
*
- * int bpf_skb_pull_data(skb, len)
- * The helper will pull in non-linear data in case the skb is non-linear
- * and not all of len are part of the linear section. Only needed for
- * read/write with direct packet access.
- * @skb: pointer to skb
- * @len: len to make read/writeable
- * Return: 0 on success or negative error
+ * int bpf_current_task_under_cgroup(struct bpf_map *map, u32 index)
+ * Description
+ * Check whether the probe is being run is the context of a given
+ * subset of the cgroup2 hierarchy. The cgroup2 to test is held by
+ * *map* of type **BPF_MAP_TYPE_CGROUP_ARRAY**, at *index*.
+ * Return
+ * The return value depends on the result of the test, and can be:
*
- * s64 bpf_csum_update(skb, csum)
- * Adds csum into skb->csum in case of CHECKSUM_COMPLETE.
- * @skb: pointer to skb
- * @csum: csum to add
- * Return: csum on success or negative error
+ * * 0, if the *skb* task belongs to the cgroup2.
+ * * 1, if the *skb* task does not belong to the cgroup2.
+ * * A negative error code, if an error occurred.
*
- * void bpf_set_hash_invalid(skb)
- * Invalidate current skb->hash.
- * @skb: pointer to skb
+ * int bpf_skb_change_tail(struct sk_buff *skb, u32 len, u64 flags)
+ * Description
+ * Resize (trim or grow) the packet associated to *skb* to the
+ * new *len*. The *flags* are reserved for future usage, and must
+ * be left at zero.
*
- * int bpf_get_numa_node_id()
- * Return: Id of current NUMA node.
+ * The basic idea is that the helper performs the needed work to
+ * change the size of the packet, then the eBPF program rewrites
+ * the rest via helpers like **bpf_skb_store_bytes**\ (),
+ * **bpf_l3_csum_replace**\ (), **bpf_l3_csum_replace**\ ()
+ * and others. This helper is a slow path utility intended for
+ * replies with control messages. And because it is targeted for
+ * slow path, the helper itself can afford to be slow: it
+ * implicitly linearizes, unclones and drops offloads from the
+ * *skb*.
*
- * int bpf_skb_change_head()
- * Grows headroom of skb and adjusts MAC header offset accordingly.
- * Will extends/reallocae as required automatically.
- * May change skb data pointer and will thus invalidate any check
- * performed for direct packet access.
- * @skb: pointer to skb
- * @len: length of header to be pushed in front
- * @flags: Flags (unused for now)
- * Return: 0 on success or negative error
+ * A call to this helper is susceptible to change the underlaying
+ * packet buffer. Therefore, at load time, all checks on pointers
+ * previously done by the verifier are invalidated and must be
+ * performed again, if the helper is used in combination with
+ * direct packet access.
+ * Return
+ * 0 on success, or a negative error in case of failure.
*
- * int bpf_xdp_adjust_head(xdp_md, delta)
- * Adjust the xdp_md.data by delta
- * @xdp_md: pointer to xdp_md
- * @delta: An positive/negative integer to be added to xdp_md.data
- * Return: 0 on success or negative on error
+ * int bpf_skb_pull_data(struct sk_buff *skb, u32 len)
+ * Description
+ * Pull in non-linear data in case the *skb* is non-linear and not
+ * all of *len* are part of the linear section. Make *len* bytes
+ * from *skb* readable and writable. If a zero value is passed for
+ * *len*, then the whole length of the *skb* is pulled.
+ *
+ * This helper is only needed for reading and writing with direct
+ * packet access.
+ *
+ * For direct packet access, testing that offsets to access
+ * are within packet boundaries (test on *skb*\ **->data_end**) is
+ * susceptible to fail if offsets are invalid, or if the requested
+ * data is in non-linear parts of the *skb*. On failure the
+ * program can just bail out, or in the case of a non-linear
+ * buffer, use a helper to make the data available. The
+ * **bpf_skb_load_bytes**\ () helper is a first solution to access
+ * the data. Another one consists in using **bpf_skb_pull_data**
+ * to pull in once the non-linear parts, then retesting and
+ * eventually access the data.
+ *
+ * At the same time, this also makes sure the *skb* is uncloned,
+ * which is a necessary condition for direct write. As this needs
+ * to be an invariant for the write part only, the verifier
+ * detects writes and adds a prologue that is calling
+ * **bpf_skb_pull_data()** to effectively unclone the *skb* from
+ * the very beginning in case it is indeed cloned.
+ *
+ * A call to this helper is susceptible to change the underlaying
+ * packet buffer. Therefore, at load time, all checks on pointers
+ * previously done by the verifier are invalidated and must be
+ * performed again, if the helper is used in combination with
+ * direct packet access.
+ * Return
+ * 0 on success, or a negative error in case of failure.
+ *
+ * s64 bpf_csum_update(struct sk_buff *skb, __wsum csum)
+ * Description
+ * Add the checksum *csum* into *skb*\ **->csum** in case the
+ * driver has supplied a checksum for the entire packet into that
+ * field. Return an error otherwise. This helper is intended to be
+ * used in combination with **bpf_csum_diff**\ (), in particular
+ * when the checksum needs to be updated after data has been
+ * written into the packet through direct packet access.
+ * Return
+ * The checksum on success, or a negative error code in case of
+ * failure.
+ *
+ * void bpf_set_hash_invalid(struct sk_buff *skb)
+ * Description
+ * Invalidate the current *skb*\ **->hash**. It can be used after
+ * mangling on headers through direct packet access, in order to
+ * indicate that the hash is outdated and to trigger a
+ * recalculation the next time the kernel tries to access this
+ * hash or when the **bpf_get_hash_recalc**\ () helper is called.
+ *
+ * int bpf_get_numa_node_id(void)
+ * Description
+ * Return the id of the current NUMA node. The primary use case
+ * for this helper is the selection of sockets for the local NUMA
+ * node, when the program is attached to sockets using the
+ * **SO_ATTACH_REUSEPORT_EBPF** option (see also **socket(7)**),
+ * but the helper is also available to other eBPF program types,
+ * similarly to **bpf_get_smp_processor_id**\ ().
+ * Return
+ * The id of current NUMA node.
+ *
+ * int bpf_skb_change_head(struct sk_buff *skb, u32 len, u64 flags)
+ * Description
+ * Grows headroom of packet associated to *skb* and adjusts the
+ * offset of the MAC header accordingly, adding *len* bytes of
+ * space. It automatically extends and reallocates memory as
+ * required.
+ *
+ * This helper can be used on a layer 3 *skb* to push a MAC header
+ * for redirection into a layer 2 device.
+ *
+ * All values for *flags* are reserved for future usage, and must
+ * be left at zero.
+ *
+ * A call to this helper is susceptible to change the underlaying
+ * packet buffer. Therefore, at load time, all checks on pointers
+ * previously done by the verifier are invalidated and must be
+ * performed again, if the helper is used in combination with
+ * direct packet access.
+ * Return
+ * 0 on success, or a negative error in case of failure.
+ *
+ * int bpf_xdp_adjust_head(struct xdp_buff *xdp_md, int delta)
+ * Description
+ * Adjust (move) *xdp_md*\ **->data** by *delta* bytes. Note that
+ * it is possible to use a negative value for *delta*. This helper
+ * can be used to prepare the packet for pushing or popping
+ * headers.
+ *
+ * A call to this helper is susceptible to change the underlaying
+ * packet buffer. Therefore, at load time, all checks on pointers
+ * previously done by the verifier are invalidated and must be
+ * performed again, if the helper is used in combination with
+ * direct packet access.
+ * Return
+ * 0 on success, or a negative error in case of failure.
*
* int bpf_probe_read_str(void *dst, int size, const void *unsafe_ptr)
- * Copy a NUL terminated string from unsafe address. In case the string
- * length is smaller than size, the target is not padded with further NUL
- * bytes. In case the string length is larger than size, just count-1
- * bytes are copied and the last byte is set to NUL.
- * @dst: destination address
- * @size: maximum number of bytes to copy, including the trailing NUL
- * @unsafe_ptr: unsafe address
- * Return:
- * > 0 length of the string including the trailing NUL on success
- * < 0 error
+ * Description
+ * Copy a NUL terminated string from an unsafe address
+ * *unsafe_ptr* to *dst*. The *size* should include the
+ * terminating NUL byte. In case the string length is smaller than
+ * *size*, the target is not padded with further NUL bytes. If the
+ * string length is larger than *size*, just *size*-1 bytes are
+ * copied and the last byte is set to NUL.
*
- * u64 bpf_get_socket_cookie(skb)
- * Get the cookie for the socket stored inside sk_buff.
- * @skb: pointer to skb
- * Return: 8 Bytes non-decreasing number on success or 0 if the socket
- * field is missing inside sk_buff
+ * On success, the length of the copied string is returned. This
+ * makes this helper useful in tracing programs for reading
+ * strings, and more importantly to get its length at runtime. See
+ * the following snippet:
*
- * u32 bpf_get_socket_uid(skb)
- * Get the owner uid of the socket stored inside sk_buff.
- * @skb: pointer to skb
- * Return: uid of the socket owner on success or overflowuid if failed.
+ * ::
*
- * u32 bpf_set_hash(skb, hash)
- * Set full skb->hash.
- * @skb: pointer to skb
- * @hash: hash to set
+ * SEC("kprobe/sys_open")
+ * void bpf_sys_open(struct pt_regs *ctx)
+ * {
+ * char buf[PATHLEN]; // PATHLEN is defined to 256
+ * int res = bpf_probe_read_str(buf, sizeof(buf),
+ * ctx->di);
*
- * int bpf_setsockopt(bpf_socket, level, optname, optval, optlen)
- * Calls setsockopt. Not all opts are available, only those with
- * integer optvals plus TCP_CONGESTION.
- * Supported levels: SOL_SOCKET and IPPROTO_TCP
- * @bpf_socket: pointer to bpf_socket
- * @level: SOL_SOCKET or IPPROTO_TCP
- * @optname: option name
- * @optval: pointer to option value
- * @optlen: length of optval in bytes
- * Return: 0 or negative error
+ * // Consume buf, for example push it to
+ * // userspace via bpf_perf_event_output(); we
+ * // can use res (the string length) as event
+ * // size, after checking its boundaries.
+ * }
*
- * int bpf_getsockopt(bpf_socket, level, optname, optval, optlen)
- * Calls getsockopt. Not all opts are available.
- * Supported levels: IPPROTO_TCP
- * @bpf_socket: pointer to bpf_socket
- * @level: IPPROTO_TCP
- * @optname: option name
- * @optval: pointer to option value
- * @optlen: length of optval in bytes
- * Return: 0 or negative error
+ * In comparison, using **bpf_probe_read()** helper here instead
+ * to read the string would require to estimate the length at
+ * compile time, and would often result in copying more memory
+ * than necessary.
*
- * int bpf_sock_ops_cb_flags_set(bpf_sock_ops, flags)
- * Set callback flags for sock_ops
- * @bpf_sock_ops: pointer to bpf_sock_ops_kern struct
- * @flags: flags value
- * Return: 0 for no error
- * -EINVAL if there is no full tcp socket
- * bits in flags that are not supported by current kernel
+ * Another useful use case is when parsing individual process
+ * arguments or individual environment variables navigating
+ * *current*\ **->mm->arg_start** and *current*\
+ * **->mm->env_start**: using this helper and the return value,
+ * one can quickly iterate at the right offset of the memory area.
+ * Return
+ * On success, the strictly positive length of the string,
+ * including the trailing NUL character. On error, a negative
+ * value.
*
- * int bpf_skb_adjust_room(skb, len_diff, mode, flags)
- * Grow or shrink room in sk_buff.
- * @skb: pointer to skb
- * @len_diff: (signed) amount of room to grow/shrink
- * @mode: operation mode (enum bpf_adj_room_mode)
- * @flags: reserved for future use
- * Return: 0 on success or negative error code
+ * u64 bpf_get_socket_cookie(struct sk_buff *skb)
+ * Description
+ * If the **struct sk_buff** pointed by *skb* has a known socket,
+ * retrieve the cookie (generated by the kernel) of this socket.
+ * If no cookie has been set yet, generate a new cookie. Once
+ * generated, the socket cookie remains stable for the life of the
+ * socket. This helper can be useful for monitoring per socket
+ * networking traffic statistics as it provides a unique socket
+ * identifier per namespace.
+ * Return
+ * A 8-byte long non-decreasing number on success, or 0 if the
+ * socket field is missing inside *skb*.
*
- * int bpf_sk_redirect_map(map, key, flags)
- * Redirect skb to a sock in map using key as a lookup key for the
- * sock in map.
- * @map: pointer to sockmap
- * @key: key to lookup sock in map
- * @flags: reserved for future use
- * Return: SK_PASS
+ * u32 bpf_get_socket_uid(struct sk_buff *skb)
+ * Return
+ * The owner UID of the socket associated to *skb*. If the socket
+ * is **NULL**, or if it is not a full socket (i.e. if it is a
+ * time-wait or a request socket instead), **overflowuid** value
+ * is returned (note that **overflowuid** might also be the actual
+ * UID value for the socket).
*
- * int bpf_sock_map_update(skops, map, key, flags)
- * @skops: pointer to bpf_sock_ops
- * @map: pointer to sockmap to update
- * @key: key to insert/update sock in map
- * @flags: same flags as map update elem
+ * u32 bpf_set_hash(struct sk_buff *skb, u32 hash)
+ * Description
+ * Set the full hash for *skb* (set the field *skb*\ **->hash**)
+ * to value *hash*.
+ * Return
+ * 0
*
- * int bpf_xdp_adjust_meta(xdp_md, delta)
- * Adjust the xdp_md.data_meta by delta
- * @xdp_md: pointer to xdp_md
- * @delta: An positive/negative integer to be added to xdp_md.data_meta
- * Return: 0 on success or negative on error
+ * int bpf_setsockopt(struct bpf_sock_ops *bpf_socket, int level, int optname, char *optval, int optlen)
+ * Description
+ * Emulate a call to **setsockopt()** on the socket associated to
+ * *bpf_socket*, which must be a full socket. The *level* at
+ * which the option resides and the name *optname* of the option
+ * must be specified, see **setsockopt(2)** for more information.
+ * The option value of length *optlen* is pointed by *optval*.
*
- * int bpf_perf_event_read_value(map, flags, buf, buf_size)
- * read perf event counter value and perf event enabled/running time
- * @map: pointer to perf_event_array map
- * @flags: index of event in the map or bitmask flags
- * @buf: buf to fill
- * @buf_size: size of the buf
- * Return: 0 on success or negative error code
+ * This helper actually implements a subset of **setsockopt()**.
+ * It supports the following *level*\ s:
*
- * int bpf_perf_prog_read_value(ctx, buf, buf_size)
- * read perf prog attached perf event counter and enabled/running time
- * @ctx: pointer to ctx
- * @buf: buf to fill
- * @buf_size: size of the buf
- * Return : 0 on success or negative error code
+ * * **SOL_SOCKET**, which supports the following *optname*\ s:
+ * **SO_RCVBUF**, **SO_SNDBUF**, **SO_MAX_PACING_RATE**,
+ * **SO_PRIORITY**, **SO_RCVLOWAT**, **SO_MARK**.
+ * * **IPPROTO_TCP**, which supports the following *optname*\ s:
+ * **TCP_CONGESTION**, **TCP_BPF_IW**,
+ * **TCP_BPF_SNDCWND_CLAMP**.
+ * * **IPPROTO_IP**, which supports *optname* **IP_TOS**.
+ * * **IPPROTO_IPV6**, which supports *optname* **IPV6_TCLASS**.
+ * Return
+ * 0 on success, or a negative error in case of failure.
*
- * int bpf_override_return(pt_regs, rc)
- * @pt_regs: pointer to struct pt_regs
- * @rc: the return value to set
+ * int bpf_skb_adjust_room(struct sk_buff *skb, u32 len_diff, u32 mode, u64 flags)
+ * Description
+ * Grow or shrink the room for data in the packet associated to
+ * *skb* by *len_diff*, and according to the selected *mode*.
*
- * int bpf_msg_redirect_map(map, key, flags)
- * Redirect msg to a sock in map using key as a lookup key for the
- * sock in map.
- * @map: pointer to sockmap
- * @key: key to lookup sock in map
- * @flags: reserved for future use
- * Return: SK_PASS
+ * There is a single supported mode at this time:
*
- * int bpf_bind(ctx, addr, addr_len)
- * Bind socket to address. Only binding to IP is supported, no port can be
- * set in addr.
- * @ctx: pointer to context of type bpf_sock_addr
- * @addr: pointer to struct sockaddr to bind socket to
- * @addr_len: length of sockaddr structure
- * Return: 0 on success or negative error code
+ * * **BPF_ADJ_ROOM_NET**: Adjust room at the network layer
+ * (room space is added or removed below the layer 3 header).
+ *
+ * All values for *flags* are reserved for future usage, and must
+ * be left at zero.
+ *
+ * A call to this helper is susceptible to change the underlaying
+ * packet buffer. Therefore, at load time, all checks on pointers
+ * previously done by the verifier are invalidated and must be
+ * performed again, if the helper is used in combination with
+ * direct packet access.
+ * Return
+ * 0 on success, or a negative error in case of failure.
+ *
+ * int bpf_redirect_map(struct bpf_map *map, u32 key, u64 flags)
+ * Description
+ * Redirect the packet to the endpoint referenced by *map* at
+ * index *key*. Depending on its type, this *map* can contain
+ * references to net devices (for forwarding packets through other
+ * ports), or to CPUs (for redirecting XDP frames to another CPU;
+ * but this is only implemented for native XDP (with driver
+ * support) as of this writing).
+ *
+ * All values for *flags* are reserved for future usage, and must
+ * be left at zero.
+ *
+ * When used to redirect packets to net devices, this helper
+ * provides a high performance increase over **bpf_redirect**\ ().
+ * This is due to various implementation details of the underlying
+ * mechanisms, one of which is the fact that **bpf_redirect_map**\
+ * () tries to send packet as a "bulk" to the device.
+ * Return
+ * **XDP_REDIRECT** on success, or **XDP_ABORTED** on error.
+ *
+ * int bpf_sk_redirect_map(struct bpf_map *map, u32 key, u64 flags)
+ * Description
+ * Redirect the packet to the socket referenced by *map* (of type
+ * **BPF_MAP_TYPE_SOCKMAP**) at index *key*. Both ingress and
+ * egress interfaces can be used for redirection. The
+ * **BPF_F_INGRESS** value in *flags* is used to make the
+ * distinction (ingress path is selected if the flag is present,
+ * egress path otherwise). This is the only flag supported for now.
+ * Return
+ * **SK_PASS** on success, or **SK_DROP** on error.
+ *
+ * int bpf_sock_map_update(struct bpf_sock_ops *skops, struct bpf_map *map, void *key, u64 flags)
+ * Description
+ * Add an entry to, or update a *map* referencing sockets. The
+ * *skops* is used as a new value for the entry associated to
+ * *key*. *flags* is one of:
+ *
+ * **BPF_NOEXIST**
+ * The entry for *key* must not exist in the map.
+ * **BPF_EXIST**
+ * The entry for *key* must already exist in the map.
+ * **BPF_ANY**
+ * No condition on the existence of the entry for *key*.
+ *
+ * If the *map* has eBPF programs (parser and verdict), those will
+ * be inherited by the socket being added. If the socket is
+ * already attached to eBPF programs, this results in an error.
+ * Return
+ * 0 on success, or a negative error in case of failure.
+ *
+ * int bpf_xdp_adjust_meta(struct xdp_buff *xdp_md, int delta)
+ * Description
+ * Adjust the address pointed by *xdp_md*\ **->data_meta** by
+ * *delta* (which can be positive or negative). Note that this
+ * operation modifies the address stored in *xdp_md*\ **->data**,
+ * so the latter must be loaded only after the helper has been
+ * called.
+ *
+ * The use of *xdp_md*\ **->data_meta** is optional and programs
+ * are not required to use it. The rationale is that when the
+ * packet is processed with XDP (e.g. as DoS filter), it is
+ * possible to push further meta data along with it before passing
+ * to the stack, and to give the guarantee that an ingress eBPF
+ * program attached as a TC classifier on the same device can pick
+ * this up for further post-processing. Since TC works with socket
+ * buffers, it remains possible to set from XDP the **mark** or
+ * **priority** pointers, or other pointers for the socket buffer.
+ * Having this scratch space generic and programmable allows for
+ * more flexibility as the user is free to store whatever meta
+ * data they need.
+ *
+ * A call to this helper is susceptible to change the underlaying
+ * packet buffer. Therefore, at load time, all checks on pointers
+ * previously done by the verifier are invalidated and must be
+ * performed again, if the helper is used in combination with
+ * direct packet access.
+ * Return
+ * 0 on success, or a negative error in case of failure.
+ *
+ * int bpf_perf_event_read_value(struct bpf_map *map, u64 flags, struct bpf_perf_event_value *buf, u32 buf_size)
+ * Description
+ * Read the value of a perf event counter, and store it into *buf*
+ * of size *buf_size*. This helper relies on a *map* of type
+ * **BPF_MAP_TYPE_PERF_EVENT_ARRAY**. The nature of the perf event
+ * counter is selected when *map* is updated with perf event file
+ * descriptors. The *map* is an array whose size is the number of
+ * available CPUs, and each cell contains a value relative to one
+ * CPU. The value to retrieve is indicated by *flags*, that
+ * contains the index of the CPU to look up, masked with
+ * **BPF_F_INDEX_MASK**. Alternatively, *flags* can be set to
+ * **BPF_F_CURRENT_CPU** to indicate that the value for the
+ * current CPU should be retrieved.
+ *
+ * This helper behaves in a way close to
+ * **bpf_perf_event_read**\ () helper, save that instead of
+ * just returning the value observed, it fills the *buf*
+ * structure. This allows for additional data to be retrieved: in
+ * particular, the enabled and running times (in *buf*\
+ * **->enabled** and *buf*\ **->running**, respectively) are
+ * copied. In general, **bpf_perf_event_read_value**\ () is
+ * recommended over **bpf_perf_event_read**\ (), which has some
+ * ABI issues and provides fewer functionalities.
+ *
+ * These values are interesting, because hardware PMU (Performance
+ * Monitoring Unit) counters are limited resources. When there are
+ * more PMU based perf events opened than available counters,
+ * kernel will multiplex these events so each event gets certain
+ * percentage (but not all) of the PMU time. In case that
+ * multiplexing happens, the number of samples or counter value
+ * will not reflect the case compared to when no multiplexing
+ * occurs. This makes comparison between different runs difficult.
+ * Typically, the counter value should be normalized before
+ * comparing to other experiments. The usual normalization is done
+ * as follows.
+ *
+ * ::
+ *
+ * normalized_counter = counter * t_enabled / t_running
+ *
+ * Where t_enabled is the time enabled for event and t_running is
+ * the time running for event since last normalization. The
+ * enabled and running times are accumulated since the perf event
+ * open. To achieve scaling factor between two invocations of an
+ * eBPF program, users can can use CPU id as the key (which is
+ * typical for perf array usage model) to remember the previous
+ * value and do the calculation inside the eBPF program.
+ * Return
+ * 0 on success, or a negative error in case of failure.
+ *
+ * int bpf_perf_prog_read_value(struct bpf_perf_event_data *ctx, struct bpf_perf_event_value *buf, u32 buf_size)
+ * Description
+ * For en eBPF program attached to a perf event, retrieve the
+ * value of the event counter associated to *ctx* and store it in
+ * the structure pointed by *buf* and of size *buf_size*. Enabled
+ * and running times are also stored in the structure (see
+ * description of helper **bpf_perf_event_read_value**\ () for
+ * more details).
+ * Return
+ * 0 on success, or a negative error in case of failure.
+ *
+ * int bpf_getsockopt(struct bpf_sock_ops *bpf_socket, int level, int optname, char *optval, int optlen)
+ * Description
+ * Emulate a call to **getsockopt()** on the socket associated to
+ * *bpf_socket*, which must be a full socket. The *level* at
+ * which the option resides and the name *optname* of the option
+ * must be specified, see **getsockopt(2)** for more information.
+ * The retrieved value is stored in the structure pointed by
+ * *opval* and of length *optlen*.
+ *
+ * This helper actually implements a subset of **getsockopt()**.
+ * It supports the following *level*\ s:
+ *
+ * * **IPPROTO_TCP**, which supports *optname*
+ * **TCP_CONGESTION**.
+ * * **IPPROTO_IP**, which supports *optname* **IP_TOS**.
+ * * **IPPROTO_IPV6**, which supports *optname* **IPV6_TCLASS**.
+ * Return
+ * 0 on success, or a negative error in case of failure.
+ *
+ * int bpf_override_return(struct pt_reg *regs, u64 rc)
+ * Description
+ * Used for error injection, this helper uses kprobes to override
+ * the return value of the probed function, and to set it to *rc*.
+ * The first argument is the context *regs* on which the kprobe
+ * works.
+ *
+ * This helper works by setting setting the PC (program counter)
+ * to an override function which is run in place of the original
+ * probed function. This means the probed function is not run at
+ * all. The replacement function just returns with the required
+ * value.
+ *
+ * This helper has security implications, and thus is subject to
+ * restrictions. It is only available if the kernel was compiled
+ * with the **CONFIG_BPF_KPROBE_OVERRIDE** configuration
+ * option, and in this case it only works on functions tagged with
+ * **ALLOW_ERROR_INJECTION** in the kernel code.
+ *
+ * Also, the helper is only available for the architectures having
+ * the CONFIG_FUNCTION_ERROR_INJECTION option. As of this writing,
+ * x86 architecture is the only one to support this feature.
+ * Return
+ * 0
+ *
+ * int bpf_sock_ops_cb_flags_set(struct bpf_sock_ops *bpf_sock, int argval)
+ * Description
+ * Attempt to set the value of the **bpf_sock_ops_cb_flags** field
+ * for the full TCP socket associated to *bpf_sock_ops* to
+ * *argval*.
+ *
+ * The primary use of this field is to determine if there should
+ * be calls to eBPF programs of type
+ * **BPF_PROG_TYPE_SOCK_OPS** at various points in the TCP
+ * code. A program of the same type can change its value, per
+ * connection and as necessary, when the connection is
+ * established. This field is directly accessible for reading, but
+ * this helper must be used for updates in order to return an
+ * error if an eBPF program tries to set a callback that is not
+ * supported in the current kernel.
+ *
+ * The supported callback values that *argval* can combine are:
+ *
+ * * **BPF_SOCK_OPS_RTO_CB_FLAG** (retransmission time out)
+ * * **BPF_SOCK_OPS_RETRANS_CB_FLAG** (retransmission)
+ * * **BPF_SOCK_OPS_STATE_CB_FLAG** (TCP state change)
+ *
+ * Here are some examples of where one could call such eBPF
+ * program:
+ *
+ * * When RTO fires.
+ * * When a packet is retransmitted.
+ * * When the connection terminates.
+ * * When a packet is sent.
+ * * When a packet is received.
+ * Return
+ * Code **-EINVAL** if the socket is not a full TCP socket;
+ * otherwise, a positive number containing the bits that could not
+ * be set is returned (which comes down to 0 if all bits were set
+ * as required).
+ *
+ * int bpf_msg_redirect_map(struct sk_msg_buff *msg, struct bpf_map *map, u32 key, u64 flags)
+ * Description
+ * This helper is used in programs implementing policies at the
+ * socket level. If the message *msg* is allowed to pass (i.e. if
+ * the verdict eBPF program returns **SK_PASS**), redirect it to
+ * the socket referenced by *map* (of type
+ * **BPF_MAP_TYPE_SOCKMAP**) at index *key*. Both ingress and
+ * egress interfaces can be used for redirection. The
+ * **BPF_F_INGRESS** value in *flags* is used to make the
+ * distinction (ingress path is selected if the flag is present,
+ * egress path otherwise). This is the only flag supported for now.
+ * Return
+ * **SK_PASS** on success, or **SK_DROP** on error.
+ *
+ * int bpf_msg_apply_bytes(struct sk_msg_buff *msg, u32 bytes)
+ * Description
+ * For socket policies, apply the verdict of the eBPF program to
+ * the next *bytes* (number of bytes) of message *msg*.
+ *
+ * For example, this helper can be used in the following cases:
+ *
+ * * A single **sendmsg**\ () or **sendfile**\ () system call
+ * contains multiple logical messages that the eBPF program is
+ * supposed to read and for which it should apply a verdict.
+ * * An eBPF program only cares to read the first *bytes* of a
+ * *msg*. If the message has a large payload, then setting up
+ * and calling the eBPF program repeatedly for all bytes, even
+ * though the verdict is already known, would create unnecessary
+ * overhead.
+ *
+ * When called from within an eBPF program, the helper sets a
+ * counter internal to the BPF infrastructure, that is used to
+ * apply the last verdict to the next *bytes*. If *bytes* is
+ * smaller than the current data being processed from a
+ * **sendmsg**\ () or **sendfile**\ () system call, the first
+ * *bytes* will be sent and the eBPF program will be re-run with
+ * the pointer for start of data pointing to byte number *bytes*
+ * **+ 1**. If *bytes* is larger than the current data being
+ * processed, then the eBPF verdict will be applied to multiple
+ * **sendmsg**\ () or **sendfile**\ () calls until *bytes* are
+ * consumed.
+ *
+ * Note that if a socket closes with the internal counter holding
+ * a non-zero value, this is not a problem because data is not
+ * being buffered for *bytes* and is sent as it is received.
+ * Return
+ * 0
+ *
+ * int bpf_msg_cork_bytes(struct sk_msg_buff *msg, u32 bytes)
+ * Description
+ * For socket policies, prevent the execution of the verdict eBPF
+ * program for message *msg* until *bytes* (byte number) have been
+ * accumulated.
+ *
+ * This can be used when one needs a specific number of bytes
+ * before a verdict can be assigned, even if the data spans
+ * multiple **sendmsg**\ () or **sendfile**\ () calls. The extreme
+ * case would be a user calling **sendmsg**\ () repeatedly with
+ * 1-byte long message segments. Obviously, this is bad for
+ * performance, but it is still valid. If the eBPF program needs
+ * *bytes* bytes to validate a header, this helper can be used to
+ * prevent the eBPF program to be called again until *bytes* have
+ * been accumulated.
+ * Return
+ * 0
+ *
+ * int bpf_msg_pull_data(struct sk_msg_buff *msg, u32 start, u32 end, u64 flags)
+ * Description
+ * For socket policies, pull in non-linear data from user space
+ * for *msg* and set pointers *msg*\ **->data** and *msg*\
+ * **->data_end** to *start* and *end* bytes offsets into *msg*,
+ * respectively.
+ *
+ * If a program of type **BPF_PROG_TYPE_SK_MSG** is run on a
+ * *msg* it can only parse data that the (**data**, **data_end**)
+ * pointers have already consumed. For **sendmsg**\ () hooks this
+ * is likely the first scatterlist element. But for calls relying
+ * on the **sendpage** handler (e.g. **sendfile**\ ()) this will
+ * be the range (**0**, **0**) because the data is shared with
+ * user space and by default the objective is to avoid allowing
+ * user space to modify data while (or after) eBPF verdict is
+ * being decided. This helper can be used to pull in data and to
+ * set the start and end pointer to given values. Data will be
+ * copied if necessary (i.e. if data was not linear and if start
+ * and end pointers do not point to the same chunk).
+ *
+ * A call to this helper is susceptible to change the underlaying
+ * packet buffer. Therefore, at load time, all checks on pointers
+ * previously done by the verifier are invalidated and must be
+ * performed again, if the helper is used in combination with
+ * direct packet access.
+ *
+ * All values for *flags* are reserved for future usage, and must
+ * be left at zero.
+ * Return
+ * 0 on success, or a negative error in case of failure.
+ *
+ * int bpf_bind(struct bpf_sock_addr *ctx, struct sockaddr *addr, int addr_len)
+ * Description
+ * Bind the socket associated to *ctx* to the address pointed by
+ * *addr*, of length *addr_len*. This allows for making outgoing
+ * connection from the desired IP address, which can be useful for
+ * example when all processes inside a cgroup should use one
+ * single IP address on a host that has multiple IP configured.
+ *
+ * This helper works for IPv4 and IPv6, TCP and UDP sockets. The
+ * domain (*addr*\ **->sa_family**) must be **AF_INET** (or
+ * **AF_INET6**). Looking for a free port to bind to can be
+ * expensive, therefore binding to port is not permitted by the
+ * helper: *addr*\ **->sin_port** (or **sin6_port**, respectively)
+ * must be set to zero.
+ * Return
+ * 0 on success, or a negative error in case of failure.
+ *
+ * int bpf_xdp_adjust_tail(struct xdp_buff *xdp_md, int delta)
+ * Description
+ * Adjust (move) *xdp_md*\ **->data_end** by *delta* bytes. It is
+ * only possible to shrink the packet as of this writing,
+ * therefore *delta* must be a negative integer.
+ *
+ * A call to this helper is susceptible to change the underlaying
+ * packet buffer. Therefore, at load time, all checks on pointers
+ * previously done by the verifier are invalidated and must be
+ * performed again, if the helper is used in combination with
+ * direct packet access.
+ * Return
+ * 0 on success, or a negative error in case of failure.
+ *
+ * int bpf_skb_get_xfrm_state(struct sk_buff *skb, u32 index, struct bpf_xfrm_state *xfrm_state, u32 size, u64 flags)
+ * Description
+ * Retrieve the XFRM state (IP transform framework, see also
+ * **ip-xfrm(8)**) at *index* in XFRM "security path" for *skb*.
+ *
+ * The retrieved value is stored in the **struct bpf_xfrm_state**
+ * pointed by *xfrm_state* and of length *size*.
+ *
+ * All values for *flags* are reserved for future usage, and must
+ * be left at zero.
+ *
+ * This helper is available only if the kernel was compiled with
+ * **CONFIG_XFRM** configuration option.
+ * Return
+ * 0 on success, or a negative error in case of failure.
+ *
+ * int bpf_get_stack(struct pt_regs *regs, void *buf, u32 size, u64 flags)
+ * Description
+ * Return a user or a kernel stack in bpf program provided buffer.
+ * To achieve this, the helper needs *ctx*, which is a pointer
+ * to the context on which the tracing program is executed.
+ * To store the stacktrace, the bpf program provides *buf* with
+ * a nonnegative *size*.
+ *
+ * The last argument, *flags*, holds the number of stack frames to
+ * skip (from 0 to 255), masked with
+ * **BPF_F_SKIP_FIELD_MASK**. The next bits can be used to set
+ * the following flags:
+ *
+ * **BPF_F_USER_STACK**
+ * Collect a user space stack instead of a kernel stack.
+ * **BPF_F_USER_BUILD_ID**
+ * Collect buildid+offset instead of ips for user stack,
+ * only valid if **BPF_F_USER_STACK** is also specified.
+ *
+ * **bpf_get_stack**\ () can collect up to
+ * **PERF_MAX_STACK_DEPTH** both kernel and user frames, subject
+ * to sufficient large buffer size. Note that
+ * this limit can be controlled with the **sysctl** program, and
+ * that it should be manually increased in order to profile long
+ * user stacks (such as stacks for Java programs). To do so, use:
+ *
+ * ::
+ *
+ * # sysctl kernel.perf_event_max_stack=<new value>
+ *
+ * Return
+ * a non-negative value equal to or less than size on success, or
+ * a negative error in case of failure.
+ *
+ * int skb_load_bytes_relative(const struct sk_buff *skb, u32 offset, void *to, u32 len, u32 start_header)
+ * Description
+ * This helper is similar to **bpf_skb_load_bytes**\ () in that
+ * it provides an easy way to load *len* bytes from *offset*
+ * from the packet associated to *skb*, into the buffer pointed
+ * by *to*. The difference to **bpf_skb_load_bytes**\ () is that
+ * a fifth argument *start_header* exists in order to select a
+ * base offset to start from. *start_header* can be one of:
+ *
+ * **BPF_HDR_START_MAC**
+ * Base offset to load data from is *skb*'s mac header.
+ * **BPF_HDR_START_NET**
+ * Base offset to load data from is *skb*'s network header.
+ *
+ * In general, "direct packet access" is the preferred method to
+ * access packet data, however, this helper is in particular useful
+ * in socket filters where *skb*\ **->data** does not always point
+ * to the start of the mac header and where "direct packet access"
+ * is not available.
+ *
+ * Return
+ * 0 on success, or a negative error in case of failure.
+ *
+ * int bpf_fib_lookup(void *ctx, struct bpf_fib_lookup *params, int plen, u32 flags)
+ * Description
+ * Do FIB lookup in kernel tables using parameters in *params*.
+ * If lookup is successful and result shows packet is to be
+ * forwarded, the neighbor tables are searched for the nexthop.
+ * If successful (ie., FIB lookup shows forwarding and nexthop
+ * is resolved), the nexthop address is returned in ipv4_dst,
+ * ipv6_dst or mpls_out based on family, smac is set to mac
+ * address of egress device, dmac is set to nexthop mac address,
+ * rt_metric is set to metric from route.
+ *
+ * *plen* argument is the size of the passed in struct.
+ * *flags* argument can be one or more BPF_FIB_LOOKUP_ flags:
+ *
+ * **BPF_FIB_LOOKUP_DIRECT** means do a direct table lookup vs
+ * full lookup using FIB rules
+ * **BPF_FIB_LOOKUP_OUTPUT** means do lookup from an egress
+ * perspective (default is ingress)
+ *
+ * *ctx* is either **struct xdp_md** for XDP programs or
+ * **struct sk_buff** tc cls_act programs.
+ *
+ * Return
+ * Egress device index on success, 0 if packet needs to continue
+ * up the stack for further processing or a negative error in case
+ * of failure.
+ *
+ * int bpf_sock_hash_update(struct bpf_sock_ops_kern *skops, struct bpf_map *map, void *key, u64 flags)
+ * Description
+ * Add an entry to, or update a sockhash *map* referencing sockets.
+ * The *skops* is used as a new value for the entry associated to
+ * *key*. *flags* is one of:
+ *
+ * **BPF_NOEXIST**
+ * The entry for *key* must not exist in the map.
+ * **BPF_EXIST**
+ * The entry for *key* must already exist in the map.
+ * **BPF_ANY**
+ * No condition on the existence of the entry for *key*.
+ *
+ * If the *map* has eBPF programs (parser and verdict), those will
+ * be inherited by the socket being added. If the socket is
+ * already attached to eBPF programs, this results in an error.
+ * Return
+ * 0 on success, or a negative error in case of failure.
+ *
+ * int bpf_msg_redirect_hash(struct sk_msg_buff *msg, struct bpf_map *map, void *key, u64 flags)
+ * Description
+ * This helper is used in programs implementing policies at the
+ * socket level. If the message *msg* is allowed to pass (i.e. if
+ * the verdict eBPF program returns **SK_PASS**), redirect it to
+ * the socket referenced by *map* (of type
+ * **BPF_MAP_TYPE_SOCKHASH**) using hash *key*. Both ingress and
+ * egress interfaces can be used for redirection. The
+ * **BPF_F_INGRESS** value in *flags* is used to make the
+ * distinction (ingress path is selected if the flag is present,
+ * egress path otherwise). This is the only flag supported for now.
+ * Return
+ * **SK_PASS** on success, or **SK_DROP** on error.
+ *
+ * int bpf_sk_redirect_hash(struct sk_buff *skb, struct bpf_map *map, void *key, u64 flags)
+ * Description
+ * This helper is used in programs implementing policies at the
+ * skb socket level. If the sk_buff *skb* is allowed to pass (i.e.
+ * if the verdeict eBPF program returns **SK_PASS**), redirect it
+ * to the socket referenced by *map* (of type
+ * **BPF_MAP_TYPE_SOCKHASH**) using hash *key*. Both ingress and
+ * egress interfaces can be used for redirection. The
+ * **BPF_F_INGRESS** value in *flags* is used to make the
+ * distinction (ingress path is selected if the flag is present,
+ * egress otherwise). This is the only flag supported for now.
+ * Return
+ * **SK_PASS** on success, or **SK_DROP** on error.
+ *
+ * int bpf_lwt_push_encap(struct sk_buff *skb, u32 type, void *hdr, u32 len)
+ * Description
+ * Encapsulate the packet associated to *skb* within a Layer 3
+ * protocol header. This header is provided in the buffer at
+ * address *hdr*, with *len* its size in bytes. *type* indicates
+ * the protocol of the header and can be one of:
+ *
+ * **BPF_LWT_ENCAP_SEG6**
+ * IPv6 encapsulation with Segment Routing Header
+ * (**struct ipv6_sr_hdr**). *hdr* only contains the SRH,
+ * the IPv6 header is computed by the kernel.
+ * **BPF_LWT_ENCAP_SEG6_INLINE**
+ * Only works if *skb* contains an IPv6 packet. Insert a
+ * Segment Routing Header (**struct ipv6_sr_hdr**) inside
+ * the IPv6 header.
+ *
+ * A call to this helper is susceptible to change the underlaying
+ * packet buffer. Therefore, at load time, all checks on pointers
+ * previously done by the verifier are invalidated and must be
+ * performed again, if the helper is used in combination with
+ * direct packet access.
+ * Return
+ * 0 on success, or a negative error in case of failure.
+ *
+ * int bpf_lwt_seg6_store_bytes(struct sk_buff *skb, u32 offset, const void *from, u32 len)
+ * Description
+ * Store *len* bytes from address *from* into the packet
+ * associated to *skb*, at *offset*. Only the flags, tag and TLVs
+ * inside the outermost IPv6 Segment Routing Header can be
+ * modified through this helper.
+ *
+ * A call to this helper is susceptible to change the underlaying
+ * packet buffer. Therefore, at load time, all checks on pointers
+ * previously done by the verifier are invalidated and must be
+ * performed again, if the helper is used in combination with
+ * direct packet access.
+ * Return
+ * 0 on success, or a negative error in case of failure.
+ *
+ * int bpf_lwt_seg6_adjust_srh(struct sk_buff *skb, u32 offset, s32 delta)
+ * Description
+ * Adjust the size allocated to TLVs in the outermost IPv6
+ * Segment Routing Header contained in the packet associated to
+ * *skb*, at position *offset* by *delta* bytes. Only offsets
+ * after the segments are accepted. *delta* can be as well
+ * positive (growing) as negative (shrinking).
+ *
+ * A call to this helper is susceptible to change the underlaying
+ * packet buffer. Therefore, at load time, all checks on pointers
+ * previously done by the verifier are invalidated and must be
+ * performed again, if the helper is used in combination with
+ * direct packet access.
+ * Return
+ * 0 on success, or a negative error in case of failure.
+ *
+ * int bpf_lwt_seg6_action(struct sk_buff *skb, u32 action, void *param, u32 param_len)
+ * Description
+ * Apply an IPv6 Segment Routing action of type *action* to the
+ * packet associated to *skb*. Each action takes a parameter
+ * contained at address *param*, and of length *param_len* bytes.
+ * *action* can be one of:
+ *
+ * **SEG6_LOCAL_ACTION_END_X**
+ * End.X action: Endpoint with Layer-3 cross-connect.
+ * Type of *param*: **struct in6_addr**.
+ * **SEG6_LOCAL_ACTION_END_T**
+ * End.T action: Endpoint with specific IPv6 table lookup.
+ * Type of *param*: **int**.
+ * **SEG6_LOCAL_ACTION_END_B6**
+ * End.B6 action: Endpoint bound to an SRv6 policy.
+ * Type of param: **struct ipv6_sr_hdr**.
+ * **SEG6_LOCAL_ACTION_END_B6_ENCAP**
+ * End.B6.Encap action: Endpoint bound to an SRv6
+ * encapsulation policy.
+ * Type of param: **struct ipv6_sr_hdr**.
+ *
+ * A call to this helper is susceptible to change the underlaying
+ * packet buffer. Therefore, at load time, all checks on pointers
+ * previously done by the verifier are invalidated and must be
+ * performed again, if the helper is used in combination with
+ * direct packet access.
+ * Return
+ * 0 on success, or a negative error in case of failure.
*/
#define __BPF_FUNC_MAPPER(FN) \
FN(unspec), \
@@ -821,7 +2070,19 @@
FN(msg_apply_bytes), \
FN(msg_cork_bytes), \
FN(msg_pull_data), \
- FN(bind),
+ FN(bind), \
+ FN(xdp_adjust_tail), \
+ FN(skb_get_xfrm_state), \
+ FN(get_stack), \
+ FN(skb_load_bytes_relative), \
+ FN(fib_lookup), \
+ FN(sock_hash_update), \
+ FN(msg_redirect_hash), \
+ FN(sk_redirect_hash), \
+ FN(lwt_push_encap), \
+ FN(lwt_seg6_store_bytes), \
+ FN(lwt_seg6_adjust_srh), \
+ FN(lwt_seg6_action),
/* integer value in 'imm' field of BPF_CALL instruction selects which helper
* function eBPF program intends to call
@@ -855,11 +2116,14 @@
/* BPF_FUNC_skb_set_tunnel_key and BPF_FUNC_skb_get_tunnel_key flags. */
#define BPF_F_TUNINFO_IPV6 (1ULL << 0)
-/* BPF_FUNC_get_stackid flags. */
+/* flags for both BPF_FUNC_get_stackid and BPF_FUNC_get_stack. */
#define BPF_F_SKIP_FIELD_MASK 0xffULL
#define BPF_F_USER_STACK (1ULL << 8)
+/* flags used by BPF_FUNC_get_stackid only. */
#define BPF_F_FAST_STACK_CMP (1ULL << 9)
#define BPF_F_REUSE_STACKID (1ULL << 10)
+/* flags used by BPF_FUNC_get_stack only. */
+#define BPF_F_USER_BUILD_ID (1ULL << 11)
/* BPF_FUNC_skb_set_tunnel_key flags. */
#define BPF_F_ZERO_CSUM_TX (1ULL << 1)
@@ -879,6 +2143,18 @@
BPF_ADJ_ROOM_NET,
};
+/* Mode for BPF_FUNC_skb_load_bytes_relative helper. */
+enum bpf_hdr_start_off {
+ BPF_HDR_START_MAC,
+ BPF_HDR_START_NET,
+};
+
+/* Encapsulation type for BPF_FUNC_lwt_push_encap helper. */
+enum bpf_lwt_encap_mode {
+ BPF_LWT_ENCAP_SEG6,
+ BPF_LWT_ENCAP_SEG6_INLINE
+};
+
/* user accessible mirror of in-kernel sk_buff.
* new fields can only be added to the end of this structure
*/
@@ -927,6 +2203,19 @@
__u32 tunnel_label;
};
+/* user accessible mirror of in-kernel xfrm_state.
+ * new fields can only be added to the end of this structure
+ */
+struct bpf_xfrm_state {
+ __u32 reqid;
+ __u32 spi; /* Stored in network byte order */
+ __u16 family;
+ union {
+ __u32 remote_ipv4; /* Stored in network byte order */
+ __u32 remote_ipv6[4]; /* Stored in network byte order */
+ };
+};
+
/* Generic BPF return codes which all BPF program types may support.
* The values are binary compatible with their TC_ACT_* counter-part to
* provide backwards compatibility with existing SCHED_CLS and SCHED_ACT
@@ -999,6 +2288,14 @@
struct sk_msg_md {
void *data;
void *data_end;
+
+ __u32 family;
+ __u32 remote_ip4; /* Stored in network byte order */
+ __u32 local_ip4; /* Stored in network byte order */
+ __u32 remote_ip6[4]; /* Stored in network byte order */
+ __u32 local_ip6[4]; /* Stored in network byte order */
+ __u32 remote_port; /* Stored in network byte order */
+ __u32 local_port; /* stored in host byte order */
};
#define BPF_TAG_SIZE 8
@@ -1017,8 +2314,13 @@
__aligned_u64 map_ids;
char name[BPF_OBJ_NAME_LEN];
__u32 ifindex;
+ __u32 gpl_compatible:1;
__u64 netns_dev;
__u64 netns_ino;
+ __u32 nr_jited_ksyms;
+ __u32 nr_jited_func_lens;
+ __aligned_u64 jited_ksyms;
+ __aligned_u64 jited_func_lens;
} __attribute__((aligned(8)));
struct bpf_map_info {
@@ -1032,6 +2334,15 @@
__u32 ifindex;
__u64 netns_dev;
__u64 netns_ino;
+ __u32 btf_id;
+ __u32 btf_key_type_id;
+ __u32 btf_value_type_id;
+} __attribute__((aligned(8)));
+
+struct bpf_btf_info {
+ __aligned_u64 btf;
+ __u32 btf_size;
+ __u32 id;
} __attribute__((aligned(8)));
/* User bpf_sock_addr struct to access socket fields and sockaddr struct passed
@@ -1212,4 +2523,64 @@
__u64 args[0];
};
+/* DIRECT: Skip the FIB rules and go to FIB table associated with device
+ * OUTPUT: Do lookup from egress perspective; default is ingress
+ */
+#define BPF_FIB_LOOKUP_DIRECT BIT(0)
+#define BPF_FIB_LOOKUP_OUTPUT BIT(1)
+
+struct bpf_fib_lookup {
+ /* input */
+ __u8 family; /* network family, AF_INET, AF_INET6, AF_MPLS */
+
+ /* set if lookup is to consider L4 data - e.g., FIB rules */
+ __u8 l4_protocol;
+ __be16 sport;
+ __be16 dport;
+
+ /* total length of packet from network header - used for MTU check */
+ __u16 tot_len;
+ __u32 ifindex; /* L3 device index for lookup */
+
+ union {
+ /* inputs to lookup */
+ __u8 tos; /* AF_INET */
+ __be32 flowlabel; /* AF_INET6 */
+
+ /* output: metric of fib result */
+ __u32 rt_metric;
+ };
+
+ union {
+ __be32 mpls_in;
+ __be32 ipv4_src;
+ __u32 ipv6_src[4]; /* in6_addr; network order */
+ };
+
+ /* input to bpf_fib_lookup, *dst is destination address.
+ * output: bpf_fib_lookup sets to gateway address
+ */
+ union {
+ /* return for MPLS lookups */
+ __be32 mpls_out[4]; /* support up to 4 labels */
+ __be32 ipv4_dst;
+ __u32 ipv6_dst[4]; /* in6_addr; network order */
+ };
+
+ /* output */
+ __be16 h_vlan_proto;
+ __be16 h_vlan_TCI;
+ __u8 smac[6]; /* ETH_ALEN */
+ __u8 dmac[6]; /* ETH_ALEN */
+};
+
+enum bpf_task_fd_type {
+ BPF_FD_TYPE_RAW_TRACEPOINT, /* tp name */
+ BPF_FD_TYPE_TRACEPOINT, /* tp name */
+ BPF_FD_TYPE_KPROBE, /* (symbol + offset) or addr */
+ BPF_FD_TYPE_KRETPROBE, /* (symbol + offset) or addr */
+ BPF_FD_TYPE_UPROBE, /* filename + offset */
+ BPF_FD_TYPE_URETPROBE, /* filename + offset */
+};
+
#endif /* _UAPI__LINUX_BPF_H__ */
diff --git a/include/uapi/linux/bpfilter.h b/include/uapi/linux/bpfilter.h
new file mode 100644
index 0000000..2ec3cc9
--- /dev/null
+++ b/include/uapi/linux/bpfilter.h
@@ -0,0 +1,21 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _UAPI_LINUX_BPFILTER_H
+#define _UAPI_LINUX_BPFILTER_H
+
+#include <linux/if.h>
+
+enum {
+ BPFILTER_IPT_SO_SET_REPLACE = 64,
+ BPFILTER_IPT_SO_SET_ADD_COUNTERS = 65,
+ BPFILTER_IPT_SET_MAX,
+};
+
+enum {
+ BPFILTER_IPT_SO_GET_INFO = 64,
+ BPFILTER_IPT_SO_GET_ENTRIES = 65,
+ BPFILTER_IPT_SO_GET_REVISION_MATCH = 66,
+ BPFILTER_IPT_SO_GET_REVISION_TARGET = 67,
+ BPFILTER_IPT_GET_MAX,
+};
+
+#endif /* _UAPI_LINUX_BPFILTER_H */
diff --git a/include/uapi/linux/btf.h b/include/uapi/linux/btf.h
new file mode 100644
index 0000000..0b5ddbe
--- /dev/null
+++ b/include/uapi/linux/btf.h
@@ -0,0 +1,113 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+/* Copyright (c) 2018 Facebook */
+#ifndef _UAPI__LINUX_BTF_H__
+#define _UAPI__LINUX_BTF_H__
+
+#include <linux/types.h>
+
+#define BTF_MAGIC 0xeB9F
+#define BTF_VERSION 1
+
+struct btf_header {
+ __u16 magic;
+ __u8 version;
+ __u8 flags;
+ __u32 hdr_len;
+
+ /* All offsets are in bytes relative to the end of this header */
+ __u32 type_off; /* offset of type section */
+ __u32 type_len; /* length of type section */
+ __u32 str_off; /* offset of string section */
+ __u32 str_len; /* length of string section */
+};
+
+/* Max # of type identifier */
+#define BTF_MAX_TYPE 0x0000ffff
+/* Max offset into the string section */
+#define BTF_MAX_NAME_OFFSET 0x0000ffff
+/* Max # of struct/union/enum members or func args */
+#define BTF_MAX_VLEN 0xffff
+
+struct btf_type {
+ __u32 name_off;
+ /* "info" bits arrangement
+ * bits 0-15: vlen (e.g. # of struct's members)
+ * bits 16-23: unused
+ * bits 24-27: kind (e.g. int, ptr, array...etc)
+ * bits 28-31: unused
+ */
+ __u32 info;
+ /* "size" is used by INT, ENUM, STRUCT and UNION.
+ * "size" tells the size of the type it is describing.
+ *
+ * "type" is used by PTR, TYPEDEF, VOLATILE, CONST and RESTRICT.
+ * "type" is a type_id referring to another type.
+ */
+ union {
+ __u32 size;
+ __u32 type;
+ };
+};
+
+#define BTF_INFO_KIND(info) (((info) >> 24) & 0x0f)
+#define BTF_INFO_VLEN(info) ((info) & 0xffff)
+
+#define BTF_KIND_UNKN 0 /* Unknown */
+#define BTF_KIND_INT 1 /* Integer */
+#define BTF_KIND_PTR 2 /* Pointer */
+#define BTF_KIND_ARRAY 3 /* Array */
+#define BTF_KIND_STRUCT 4 /* Struct */
+#define BTF_KIND_UNION 5 /* Union */
+#define BTF_KIND_ENUM 6 /* Enumeration */
+#define BTF_KIND_FWD 7 /* Forward */
+#define BTF_KIND_TYPEDEF 8 /* Typedef */
+#define BTF_KIND_VOLATILE 9 /* Volatile */
+#define BTF_KIND_CONST 10 /* Const */
+#define BTF_KIND_RESTRICT 11 /* Restrict */
+#define BTF_KIND_MAX 11
+#define NR_BTF_KINDS 12
+
+/* For some specific BTF_KIND, "struct btf_type" is immediately
+ * followed by extra data.
+ */
+
+/* BTF_KIND_INT is followed by a u32 and the following
+ * is the 32 bits arrangement:
+ */
+#define BTF_INT_ENCODING(VAL) (((VAL) & 0x0f000000) >> 24)
+#define BTF_INT_OFFSET(VAL) (((VAL & 0x00ff0000)) >> 16)
+#define BTF_INT_BITS(VAL) ((VAL) & 0x0000ffff)
+
+/* Attributes stored in the BTF_INT_ENCODING */
+#define BTF_INT_SIGNED (1 << 0)
+#define BTF_INT_CHAR (1 << 1)
+#define BTF_INT_BOOL (1 << 2)
+
+/* BTF_KIND_ENUM is followed by multiple "struct btf_enum".
+ * The exact number of btf_enum is stored in the vlen (of the
+ * info in "struct btf_type").
+ */
+struct btf_enum {
+ __u32 name_off;
+ __s32 val;
+};
+
+/* BTF_KIND_ARRAY is followed by one "struct btf_array" */
+struct btf_array {
+ __u32 type;
+ __u32 index_type;
+ __u32 nelems;
+};
+
+/* BTF_KIND_STRUCT and BTF_KIND_UNION are followed
+ * by multiple "struct btf_member". The exact number
+ * of btf_member is stored in the vlen (of the info in
+ * "struct btf_type").
+ */
+struct btf_member {
+ __u32 name_off;
+ __u32 type;
+ __u32 offset; /* offset in bits */
+};
+
+#endif /* _UAPI__LINUX_BTF_H__ */
diff --git a/include/uapi/linux/cn_proc.h b/include/uapi/linux/cn_proc.h
index 68ff254..db21062 100644
--- a/include/uapi/linux/cn_proc.h
+++ b/include/uapi/linux/cn_proc.h
@@ -116,12 +116,16 @@
struct coredump_proc_event {
__kernel_pid_t process_pid;
__kernel_pid_t process_tgid;
+ __kernel_pid_t parent_pid;
+ __kernel_pid_t parent_tgid;
} coredump;
struct exit_proc_event {
__kernel_pid_t process_pid;
__kernel_pid_t process_tgid;
__u32 exit_code, exit_signal;
+ __kernel_pid_t parent_pid;
+ __kernel_pid_t parent_tgid;
} exit;
} event_data;
diff --git a/include/uapi/linux/dcbnl.h b/include/uapi/linux/dcbnl.h
index 2c0c645..60aa2e4 100644
--- a/include/uapi/linux/dcbnl.h
+++ b/include/uapi/linux/dcbnl.h
@@ -163,6 +163,16 @@
__u64 indications[IEEE_8021QAZ_MAX_TCS];
};
+#define IEEE_8021Q_MAX_PRIORITIES 8
+#define DCBX_MAX_BUFFERS 8
+struct dcbnl_buffer {
+ /* priority to buffer mapping */
+ __u8 prio2buffer[IEEE_8021Q_MAX_PRIORITIES];
+ /* buffer size in Bytes */
+ __u32 buffer_size[DCBX_MAX_BUFFERS];
+ __u32 total_size;
+};
+
/* CEE DCBX std supported values */
#define CEE_DCBX_MAX_PGS 8
#define CEE_DCBX_MAX_PRIO 8
@@ -406,6 +416,7 @@
DCB_ATTR_IEEE_MAXRATE,
DCB_ATTR_IEEE_QCN,
DCB_ATTR_IEEE_QCN_STATS,
+ DCB_ATTR_DCB_BUFFER,
__DCB_ATTR_IEEE_MAX
};
#define DCB_ATTR_IEEE_MAX (__DCB_ATTR_IEEE_MAX - 1)
diff --git a/include/uapi/linux/devlink.h b/include/uapi/linux/devlink.h
index 1df65a4..75cb545 100644
--- a/include/uapi/linux/devlink.h
+++ b/include/uapi/linux/devlink.h
@@ -132,6 +132,16 @@
DEVLINK_ESWITCH_ENCAP_MODE_BASIC,
};
+enum devlink_port_flavour {
+ DEVLINK_PORT_FLAVOUR_PHYSICAL, /* Any kind of a port physically
+ * facing the user.
+ */
+ DEVLINK_PORT_FLAVOUR_CPU, /* CPU port */
+ DEVLINK_PORT_FLAVOUR_DSA, /* Distributed switch architecture
+ * interconnect port.
+ */
+};
+
enum devlink_attr {
/* don't change the order or add anything between, this is ABI! */
DEVLINK_ATTR_UNSPEC,
@@ -224,6 +234,10 @@
DEVLINK_ATTR_DPIPE_TABLE_RESOURCE_ID, /* u64 */
DEVLINK_ATTR_DPIPE_TABLE_RESOURCE_UNITS,/* u64 */
+ DEVLINK_ATTR_PORT_FLAVOUR, /* u16 */
+ DEVLINK_ATTR_PORT_NUMBER, /* u32 */
+ DEVLINK_ATTR_PORT_SPLIT_SUBPORT_NUMBER, /* u32 */
+
/* add new attributes above here, update the policy in devlink.c */
__DEVLINK_ATTR_MAX,
diff --git a/include/uapi/linux/elf.h b/include/uapi/linux/elf.h
index e2535d6..4e12c42 100644
--- a/include/uapi/linux/elf.h
+++ b/include/uapi/linux/elf.h
@@ -421,6 +421,7 @@
#define NT_ARM_SYSTEM_CALL 0x404 /* ARM system call number */
#define NT_ARM_SVE 0x405 /* ARM Scalable Vector Extension registers */
#define NT_ARC_V2 0x600 /* ARCv2 accumulator/extra registers */
+#define NT_VMCOREDD 0x700 /* Vmcore Device Dump Note */
/* Note header in a PT_NOTE section */
typedef struct elf32_note {
diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
index 68699f6..cf01b68 100644
--- a/include/uapi/linux/if_link.h
+++ b/include/uapi/linux/if_link.h
@@ -333,6 +333,7 @@
IFLA_BRPORT_BCAST_FLOOD,
IFLA_BRPORT_GROUP_FWD_MASK,
IFLA_BRPORT_NEIGH_SUPPRESS,
+ IFLA_BRPORT_ISOLATED,
__IFLA_BRPORT_MAX
};
#define IFLA_BRPORT_MAX (__IFLA_BRPORT_MAX - 1)
@@ -516,6 +517,7 @@
IFLA_VXLAN_COLLECT_METADATA,
IFLA_VXLAN_LABEL,
IFLA_VXLAN_GPE,
+ IFLA_VXLAN_TTL_INHERIT,
__IFLA_VXLAN_MAX
};
#define IFLA_VXLAN_MAX (__IFLA_VXLAN_MAX - 1)
diff --git a/include/uapi/linux/if_xdp.h b/include/uapi/linux/if_xdp.h
new file mode 100644
index 0000000..4737cfe
--- /dev/null
+++ b/include/uapi/linux/if_xdp.h
@@ -0,0 +1,78 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+/*
+ * if_xdp: XDP socket user-space interface
+ * Copyright(c) 2018 Intel Corporation.
+ *
+ * Author(s): Björn Töpel <bjorn.topel@intel.com>
+ * Magnus Karlsson <magnus.karlsson@intel.com>
+ */
+
+#ifndef _LINUX_IF_XDP_H
+#define _LINUX_IF_XDP_H
+
+#include <linux/types.h>
+
+/* Options for the sxdp_flags field */
+#define XDP_SHARED_UMEM 1
+
+struct sockaddr_xdp {
+ __u16 sxdp_family;
+ __u16 sxdp_flags;
+ __u32 sxdp_ifindex;
+ __u32 sxdp_queue_id;
+ __u32 sxdp_shared_umem_fd;
+};
+
+struct xdp_ring_offset {
+ __u64 producer;
+ __u64 consumer;
+ __u64 desc;
+};
+
+struct xdp_mmap_offsets {
+ struct xdp_ring_offset rx;
+ struct xdp_ring_offset tx;
+ struct xdp_ring_offset fr; /* Fill */
+ struct xdp_ring_offset cr; /* Completion */
+};
+
+/* XDP socket options */
+#define XDP_MMAP_OFFSETS 1
+#define XDP_RX_RING 2
+#define XDP_TX_RING 3
+#define XDP_UMEM_REG 4
+#define XDP_UMEM_FILL_RING 5
+#define XDP_UMEM_COMPLETION_RING 6
+#define XDP_STATISTICS 7
+
+struct xdp_umem_reg {
+ __u64 addr; /* Start of packet data area */
+ __u64 len; /* Length of packet data area */
+ __u32 frame_size; /* Frame size */
+ __u32 frame_headroom; /* Frame head room */
+};
+
+struct xdp_statistics {
+ __u64 rx_dropped; /* Dropped for reasons other than invalid desc */
+ __u64 rx_invalid_descs; /* Dropped due to invalid descriptor */
+ __u64 tx_invalid_descs; /* Dropped due to invalid descriptor */
+};
+
+/* Pgoff for mmaping the rings */
+#define XDP_PGOFF_RX_RING 0
+#define XDP_PGOFF_TX_RING 0x80000000
+#define XDP_UMEM_PGOFF_FILL_RING 0x100000000
+#define XDP_UMEM_PGOFF_COMPLETION_RING 0x180000000
+
+/* Rx/Tx descriptor */
+struct xdp_desc {
+ __u32 idx;
+ __u32 len;
+ __u16 offset;
+ __u8 flags;
+ __u8 padding[5];
+};
+
+/* UMEM descriptor is __u32 */
+
+#endif /* _LINUX_IF_XDP_H */
diff --git a/include/uapi/linux/netfilter/nf_nat.h b/include/uapi/linux/netfilter/nf_nat.h
index a33000d..4a95c0d 100644
--- a/include/uapi/linux/netfilter/nf_nat.h
+++ b/include/uapi/linux/netfilter/nf_nat.h
@@ -10,6 +10,7 @@
#define NF_NAT_RANGE_PROTO_RANDOM (1 << 2)
#define NF_NAT_RANGE_PERSISTENT (1 << 3)
#define NF_NAT_RANGE_PROTO_RANDOM_FULLY (1 << 4)
+#define NF_NAT_RANGE_PROTO_OFFSET (1 << 5)
#define NF_NAT_RANGE_PROTO_RANDOM_ALL \
(NF_NAT_RANGE_PROTO_RANDOM | NF_NAT_RANGE_PROTO_RANDOM_FULLY)
@@ -17,7 +18,7 @@
#define NF_NAT_RANGE_MASK \
(NF_NAT_RANGE_MAP_IPS | NF_NAT_RANGE_PROTO_SPECIFIED | \
NF_NAT_RANGE_PROTO_RANDOM | NF_NAT_RANGE_PERSISTENT | \
- NF_NAT_RANGE_PROTO_RANDOM_FULLY)
+ NF_NAT_RANGE_PROTO_RANDOM_FULLY | NF_NAT_RANGE_PROTO_OFFSET)
struct nf_nat_ipv4_range {
unsigned int flags;
@@ -40,4 +41,13 @@
union nf_conntrack_man_proto max_proto;
};
+struct nf_nat_range2 {
+ unsigned int flags;
+ union nf_inet_addr min_addr;
+ union nf_inet_addr max_addr;
+ union nf_conntrack_man_proto min_proto;
+ union nf_conntrack_man_proto max_proto;
+ union nf_conntrack_man_proto base_proto;
+};
+
#endif /* _NETFILTER_NF_NAT_H */
diff --git a/include/uapi/linux/netfilter/nf_osf.h b/include/uapi/linux/netfilter/nf_osf.h
new file mode 100644
index 0000000..8f2f2f4
--- /dev/null
+++ b/include/uapi/linux/netfilter/nf_osf.h
@@ -0,0 +1,86 @@
+#ifndef _NF_OSF_H
+#define _NF_OSF_H
+
+#include <linux/types.h>
+
+#define MAXGENRELEN 32
+
+#define NF_OSF_GENRE (1 << 0)
+#define NF_OSF_TTL (1 << 1)
+#define NF_OSF_LOG (1 << 2)
+#define NF_OSF_INVERT (1 << 3)
+
+#define NF_OSF_LOGLEVEL_ALL 0 /* log all matched fingerprints */
+#define NF_OSF_LOGLEVEL_FIRST 1 /* log only the first matced fingerprint */
+#define NF_OSF_LOGLEVEL_ALL_KNOWN 2 /* do not log unknown packets */
+
+#define NF_OSF_TTL_TRUE 0 /* True ip and fingerprint TTL comparison */
+
+/* Do not compare ip and fingerprint TTL at all */
+#define NF_OSF_TTL_NOCHECK 2
+
+/* Wildcard MSS (kind of).
+ * It is used to implement a state machine for the different wildcard values
+ * of the MSS and window sizes.
+ */
+struct nf_osf_wc {
+ __u32 wc;
+ __u32 val;
+};
+
+/* This struct represents IANA options
+ * http://www.iana.org/assignments/tcp-parameters
+ */
+struct nf_osf_opt {
+ __u16 kind, length;
+ struct nf_osf_wc wc;
+};
+
+struct nf_osf_info {
+ char genre[MAXGENRELEN];
+ __u32 len;
+ __u32 flags;
+ __u32 loglevel;
+ __u32 ttl;
+};
+
+struct nf_osf_user_finger {
+ struct nf_osf_wc wss;
+
+ __u8 ttl, df;
+ __u16 ss, mss;
+ __u16 opt_num;
+
+ char genre[MAXGENRELEN];
+ char version[MAXGENRELEN];
+ char subtype[MAXGENRELEN];
+
+ /* MAX_IPOPTLEN is maximum if all options are NOPs or EOLs */
+ struct nf_osf_opt opt[MAX_IPOPTLEN];
+};
+
+struct nf_osf_nlmsg {
+ struct nf_osf_user_finger f;
+ struct iphdr ip;
+ struct tcphdr tcp;
+};
+
+/* Defines for IANA option kinds */
+enum iana_options {
+ OSFOPT_EOL = 0, /* End of options */
+ OSFOPT_NOP, /* NOP */
+ OSFOPT_MSS, /* Maximum segment size */
+ OSFOPT_WSO, /* Window scale option */
+ OSFOPT_SACKP, /* SACK permitted */
+ OSFOPT_SACK, /* SACK */
+ OSFOPT_ECHO,
+ OSFOPT_ECHOREPLY,
+ OSFOPT_TS, /* Timestamp option */
+ OSFOPT_POCP, /* Partial Order Connection Permitted */
+ OSFOPT_POSP, /* Partial Order Service Profile */
+
+ /* Others are not used in the current OSF */
+ OSFOPT_EMPTY = 255,
+};
+
+#endif /* _NF_OSF_H */
diff --git a/include/uapi/linux/netfilter/nf_tables.h b/include/uapi/linux/netfilter/nf_tables.h
index 6a3d653..9c71f02 100644
--- a/include/uapi/linux/netfilter/nf_tables.h
+++ b/include/uapi/linux/netfilter/nf_tables.h
@@ -831,7 +831,9 @@
NFT_RT_NEXTHOP4,
NFT_RT_NEXTHOP6,
NFT_RT_TCPMSS,
+ __NFT_RT_MAX
};
+#define NFT_RT_MAX (__NFT_RT_MAX - 1)
/**
* enum nft_hash_types - nf_tables hash expression types
@@ -854,6 +856,8 @@
* @NFTA_HASH_SEED: seed value (NLA_U32)
* @NFTA_HASH_OFFSET: add this offset value to hash result (NLA_U32)
* @NFTA_HASH_TYPE: hash operation (NLA_U32: nft_hash_types)
+ * @NFTA_HASH_SET_NAME: name of the map to lookup (NLA_STRING)
+ * @NFTA_HASH_SET_ID: id of the map (NLA_U32)
*/
enum nft_hash_attributes {
NFTA_HASH_UNSPEC,
@@ -864,6 +868,8 @@
NFTA_HASH_SEED,
NFTA_HASH_OFFSET,
NFTA_HASH_TYPE,
+ NFTA_HASH_SET_NAME,
+ NFTA_HASH_SET_ID,
__NFTA_HASH_MAX,
};
#define NFTA_HASH_MAX (__NFTA_HASH_MAX - 1)
@@ -949,7 +955,9 @@
NFT_CT_DST_IP,
NFT_CT_SRC_IP6,
NFT_CT_DST_IP6,
+ __NFT_CT_MAX
};
+#define NFT_CT_MAX (__NFT_CT_MAX - 1)
/**
* enum nft_ct_attributes - nf_tables ct expression netlink attributes
@@ -1450,6 +1458,8 @@
* @NFTA_NG_MODULUS: maximum counter value (NLA_U32)
* @NFTA_NG_TYPE: operation type (NLA_U32)
* @NFTA_NG_OFFSET: offset to be added to the counter (NLA_U32)
+ * @NFTA_NG_SET_NAME: name of the map to lookup (NLA_STRING)
+ * @NFTA_NG_SET_ID: id of the map (NLA_U32)
*/
enum nft_ng_attributes {
NFTA_NG_UNSPEC,
@@ -1457,6 +1467,8 @@
NFTA_NG_MODULUS,
NFTA_NG_TYPE,
NFTA_NG_OFFSET,
+ NFTA_NG_SET_NAME,
+ NFTA_NG_SET_ID,
__NFTA_NG_MAX
};
#define NFTA_NG_MAX (__NFTA_NG_MAX - 1)
diff --git a/include/uapi/linux/netfilter/nfnetlink_conntrack.h b/include/uapi/linux/netfilter/nfnetlink_conntrack.h
index 7798711..1d41810 100644
--- a/include/uapi/linux/netfilter/nfnetlink_conntrack.h
+++ b/include/uapi/linux/netfilter/nfnetlink_conntrack.h
@@ -262,6 +262,7 @@
enum ctattr_stats_global {
CTA_STATS_GLOBAL_UNSPEC,
CTA_STATS_GLOBAL_ENTRIES,
+ CTA_STATS_GLOBAL_MAX_ENTRIES,
__CTA_STATS_GLOBAL_MAX,
};
#define CTA_STATS_GLOBAL_MAX (__CTA_STATS_GLOBAL_MAX - 1)
diff --git a/include/uapi/linux/netfilter/xt_osf.h b/include/uapi/linux/netfilter/xt_osf.h
index dad197e..72956ec 100644
--- a/include/uapi/linux/netfilter/xt_osf.h
+++ b/include/uapi/linux/netfilter/xt_osf.h
@@ -23,101 +23,29 @@
#include <linux/types.h>
#include <linux/ip.h>
#include <linux/tcp.h>
+#include <linux/netfilter/nf_osf.h>
-#define MAXGENRELEN 32
+#define XT_OSF_GENRE NF_OSF_GENRE
+#define XT_OSF_INVERT NF_OSF_INVERT
-#define XT_OSF_GENRE (1<<0)
-#define XT_OSF_TTL (1<<1)
-#define XT_OSF_LOG (1<<2)
-#define XT_OSF_INVERT (1<<3)
+#define XT_OSF_TTL NF_OSF_TTL
+#define XT_OSF_LOG NF_OSF_LOG
-#define XT_OSF_LOGLEVEL_ALL 0 /* log all matched fingerprints */
-#define XT_OSF_LOGLEVEL_FIRST 1 /* log only the first matced fingerprint */
-#define XT_OSF_LOGLEVEL_ALL_KNOWN 2 /* do not log unknown packets */
+#define XT_OSF_LOGLEVEL_ALL NF_OSF_LOGLEVEL_ALL
+#define XT_OSF_LOGLEVEL_FIRST NF_OSF_LOGLEVEL_FIRST
+#define XT_OSF_LOGLEVEL_ALL_KNOWN NF_OSF_LOGLEVEL_ALL_KNOWN
-#define XT_OSF_TTL_TRUE 0 /* True ip and fingerprint TTL comparison */
-#define XT_OSF_TTL_LESS 1 /* Check if ip TTL is less than fingerprint one */
-#define XT_OSF_TTL_NOCHECK 2 /* Do not compare ip and fingerprint TTL at all */
+#define XT_OSF_TTL_TRUE NF_OSF_TTL_TRUE
+#define XT_OSF_TTL_NOCHECK NF_OSF_TTL_NOCHECK
-struct xt_osf_info {
- char genre[MAXGENRELEN];
- __u32 len;
- __u32 flags;
- __u32 loglevel;
- __u32 ttl;
-};
+#define XT_OSF_TTL_LESS 1 /* Check if ip TTL is less than fingerprint one */
-/*
- * Wildcard MSS (kind of).
- * It is used to implement a state machine for the different wildcard values
- * of the MSS and window sizes.
- */
-struct xt_osf_wc {
- __u32 wc;
- __u32 val;
-};
-
-/*
- * This struct represents IANA options
- * http://www.iana.org/assignments/tcp-parameters
- */
-struct xt_osf_opt {
- __u16 kind, length;
- struct xt_osf_wc wc;
-};
-
-struct xt_osf_user_finger {
- struct xt_osf_wc wss;
-
- __u8 ttl, df;
- __u16 ss, mss;
- __u16 opt_num;
-
- char genre[MAXGENRELEN];
- char version[MAXGENRELEN];
- char subtype[MAXGENRELEN];
-
- /* MAX_IPOPTLEN is maximum if all options are NOPs or EOLs */
- struct xt_osf_opt opt[MAX_IPOPTLEN];
-};
-
-struct xt_osf_nlmsg {
- struct xt_osf_user_finger f;
- struct iphdr ip;
- struct tcphdr tcp;
-};
-
-/* Defines for IANA option kinds */
-
-enum iana_options {
- OSFOPT_EOL = 0, /* End of options */
- OSFOPT_NOP, /* NOP */
- OSFOPT_MSS, /* Maximum segment size */
- OSFOPT_WSO, /* Window scale option */
- OSFOPT_SACKP, /* SACK permitted */
- OSFOPT_SACK, /* SACK */
- OSFOPT_ECHO,
- OSFOPT_ECHOREPLY,
- OSFOPT_TS, /* Timestamp option */
- OSFOPT_POCP, /* Partial Order Connection Permitted */
- OSFOPT_POSP, /* Partial Order Service Profile */
-
- /* Others are not used in the current OSF */
- OSFOPT_EMPTY = 255,
-};
-
-/*
- * Initial window size option state machine: multiple of mss, mtu or
- * plain numeric value. Can also be made as plain numeric value which
- * is not a multiple of specified value.
- */
-enum xt_osf_window_size_options {
- OSF_WSS_PLAIN = 0,
- OSF_WSS_MSS,
- OSF_WSS_MTU,
- OSF_WSS_MODULO,
- OSF_WSS_MAX,
-};
+#define xt_osf_wc nf_osf_wc
+#define xt_osf_opt nf_osf_opt
+#define xt_osf_info nf_osf_info
+#define xt_osf_user_finger nf_osf_user_finger
+#define xt_osf_finger nf_osf_finger
+#define xt_osf_nlmsg nf_osf_nlmsg
/*
* Add/remove fingerprint from the kernel.
diff --git a/include/uapi/linux/netfilter_bridge/ebtables.h b/include/uapi/linux/netfilter_bridge/ebtables.h
index 0c7dc83..3b86c14 100644
--- a/include/uapi/linux/netfilter_bridge/ebtables.h
+++ b/include/uapi/linux/netfilter_bridge/ebtables.h
@@ -191,6 +191,12 @@
unsigned char elems[0] __attribute__ ((aligned (__alignof__(struct ebt_replace))));
};
+static __inline__ struct ebt_entry_target *
+ebt_get_target(struct ebt_entry *e)
+{
+ return (void *)e + e->target_offset;
+}
+
/* {g,s}etsockopt numbers */
#define EBT_BASE_CTL 128
diff --git a/include/uapi/linux/netfilter_ipv6/ip6t_srh.h b/include/uapi/linux/netfilter_ipv6/ip6t_srh.h
index f3cc0ef..54ed833 100644
--- a/include/uapi/linux/netfilter_ipv6/ip6t_srh.h
+++ b/include/uapi/linux/netfilter_ipv6/ip6t_srh.h
@@ -17,7 +17,10 @@
#define IP6T_SRH_LAST_GT 0x0100
#define IP6T_SRH_LAST_LT 0x0200
#define IP6T_SRH_TAG 0x0400
-#define IP6T_SRH_MASK 0x07FF
+#define IP6T_SRH_PSID 0x0800
+#define IP6T_SRH_NSID 0x1000
+#define IP6T_SRH_LSID 0x2000
+#define IP6T_SRH_MASK 0x3FFF
/* Values for "mt_invflags" field in struct ip6t_srh */
#define IP6T_SRH_INV_NEXTHDR 0x0001
@@ -31,7 +34,10 @@
#define IP6T_SRH_INV_LAST_GT 0x0100
#define IP6T_SRH_INV_LAST_LT 0x0200
#define IP6T_SRH_INV_TAG 0x0400
-#define IP6T_SRH_INV_MASK 0x07FF
+#define IP6T_SRH_INV_PSID 0x0800
+#define IP6T_SRH_INV_NSID 0x1000
+#define IP6T_SRH_INV_LSID 0x2000
+#define IP6T_SRH_INV_MASK 0x3FFF
/**
* struct ip6t_srh - SRH match options
@@ -54,4 +60,37 @@
__u16 mt_invflags;
};
+/**
+ * struct ip6t_srh1 - SRH match options (revision 1)
+ * @ next_hdr: Next header field of SRH
+ * @ hdr_len: Extension header length field of SRH
+ * @ segs_left: Segments left field of SRH
+ * @ last_entry: Last entry field of SRH
+ * @ tag: Tag field of SRH
+ * @ psid_addr: Address of previous SID in SRH SID list
+ * @ nsid_addr: Address of NEXT SID in SRH SID list
+ * @ lsid_addr: Address of LAST SID in SRH SID list
+ * @ psid_msk: Mask of previous SID in SRH SID list
+ * @ nsid_msk: Mask of next SID in SRH SID list
+ * @ lsid_msk: MAsk of last SID in SRH SID list
+ * @ mt_flags: match options
+ * @ mt_invflags: Invert the sense of match options
+ */
+
+struct ip6t_srh1 {
+ __u8 next_hdr;
+ __u8 hdr_len;
+ __u8 segs_left;
+ __u8 last_entry;
+ __u16 tag;
+ struct in6_addr psid_addr;
+ struct in6_addr nsid_addr;
+ struct in6_addr lsid_addr;
+ struct in6_addr psid_msk;
+ struct in6_addr nsid_msk;
+ struct in6_addr lsid_msk;
+ __u16 mt_flags;
+ __u16 mt_invflags;
+};
+
#endif /*_IP6T_SRH_H*/
diff --git a/include/uapi/linux/nl80211.h b/include/uapi/linux/nl80211.h
index 271b937..28b3654 100644
--- a/include/uapi/linux/nl80211.h
+++ b/include/uapi/linux/nl80211.h
@@ -11,6 +11,7 @@
* Copyright 2008 Jouni Malinen <jouni.malinen@atheros.com>
* Copyright 2008 Colin McCabe <colin@cozybit.com>
* Copyright 2015-2017 Intel Deutschland GmbH
+ * Copyright (C) 2018 Intel Corporation
*
* Permission to use, copy, modify, and/or distribute this software for any
* purpose with or without fee is hereby granted, provided that the above
@@ -203,7 +204,8 @@
* FILS shared key authentication offload should be able to construct the
* authentication and association frames for FILS shared key authentication and
* eventually do a key derivation as per IEEE 802.11ai. The below additional
- * parameters should be given to driver in %NL80211_CMD_CONNECT.
+ * parameters should be given to driver in %NL80211_CMD_CONNECT and/or in
+ * %NL80211_CMD_UPDATE_CONNECT_PARAMS.
* %NL80211_ATTR_FILS_ERP_USERNAME - used to construct keyname_nai
* %NL80211_ATTR_FILS_ERP_REALM - used to construct keyname_nai
* %NL80211_ATTR_FILS_ERP_NEXT_SEQ_NUM - used to construct erp message
@@ -214,7 +216,8 @@
* as specified in IETF RFC 6696.
*
* When FILS shared key authentication is completed, driver needs to provide the
- * below additional parameters to userspace.
+ * below additional parameters to userspace, which can be either after setting
+ * up a connection or after roaming.
* %NL80211_ATTR_FILS_KEK - used for key renewal
* %NL80211_ATTR_FILS_ERP_NEXT_SEQ_NUM - used in further EAP-RP exchanges
* %NL80211_ATTR_PMKID - used to identify the PMKSA used/generated
@@ -2225,6 +2228,16 @@
* @NL80211_ATTR_NSS: Station's New/updated RX_NSS value notified using this
* u8 attribute. This is used with %NL80211_CMD_STA_OPMODE_CHANGED.
*
+ * @NL80211_ATTR_TXQ_STATS: TXQ statistics (nested attribute, see &enum
+ * nl80211_txq_stats)
+ * @NL80211_ATTR_TXQ_LIMIT: Total packet limit for the TXQ queues for this phy.
+ * The smaller of this and the memory limit is enforced.
+ * @NL80211_ATTR_TXQ_MEMORY_LIMIT: Total memory memory limit (in bytes) for the
+ * TXQ queues for this phy. The smaller of this and the packet limit is
+ * enforced.
+ * @NL80211_ATTR_TXQ_QUANTUM: TXQ scheduler quantum (bytes). Number of bytes
+ * a flow is assigned on each round of the DRR scheduler.
+ *
* @NUM_NL80211_ATTR: total number of nl80211_attrs available
* @NL80211_ATTR_MAX: highest attribute number currently defined
* @__NL80211_ATTR_AFTER_LAST: internal use
@@ -2659,6 +2672,11 @@
NL80211_ATTR_CONTROL_PORT_OVER_NL80211,
+ NL80211_ATTR_TXQ_STATS,
+ NL80211_ATTR_TXQ_LIMIT,
+ NL80211_ATTR_TXQ_MEMORY_LIMIT,
+ NL80211_ATTR_TXQ_QUANTUM,
+
/* add attributes here, update the policy in nl80211.c */
__NL80211_ATTR_AFTER_LAST,
@@ -2982,6 +3000,8 @@
* received from the station (u64, usec)
* @NL80211_STA_INFO_PAD: attribute used for padding for 64-bit alignment
* @NL80211_STA_INFO_ACK_SIGNAL: signal strength of the last ACK frame(u8, dBm)
+ * @NL80211_STA_INFO_DATA_ACK_SIGNAL_AVG: avg signal strength of (data)
+ * ACK frame (s8, dBm)
* @__NL80211_STA_INFO_AFTER_LAST: internal
* @NL80211_STA_INFO_MAX: highest possible station info attribute
*/
@@ -3021,6 +3041,7 @@
NL80211_STA_INFO_RX_DURATION,
NL80211_STA_INFO_PAD,
NL80211_STA_INFO_ACK_SIGNAL,
+ NL80211_STA_INFO_DATA_ACK_SIGNAL_AVG,
/* keep last */
__NL80211_STA_INFO_AFTER_LAST,
@@ -3038,6 +3059,7 @@
* @NL80211_TID_STATS_TX_MSDU_FAILED: number of failed transmitted
* MSDUs (u64)
* @NL80211_TID_STATS_PAD: attribute used for padding for 64-bit alignment
+ * @NL80211_TID_STATS_TXQ_STATS: TXQ stats (nested attribute)
* @NUM_NL80211_TID_STATS: number of attributes here
* @NL80211_TID_STATS_MAX: highest numbered attribute here
*/
@@ -3048,6 +3070,7 @@
NL80211_TID_STATS_TX_MSDU_RETRIES,
NL80211_TID_STATS_TX_MSDU_FAILED,
NL80211_TID_STATS_PAD,
+ NL80211_TID_STATS_TXQ_STATS,
/* keep last */
NUM_NL80211_TID_STATS,
@@ -3055,6 +3078,44 @@
};
/**
+ * enum nl80211_txq_stats - per TXQ statistics attributes
+ * @__NL80211_TXQ_STATS_INVALID: attribute number 0 is reserved
+ * @NUM_NL80211_TXQ_STATS: number of attributes here
+ * @NL80211_TXQ_STATS_BACKLOG_BYTES: number of bytes currently backlogged
+ * @NL80211_TXQ_STATS_BACKLOG_PACKETS: number of packets currently
+ * backlogged
+ * @NL80211_TXQ_STATS_FLOWS: total number of new flows seen
+ * @NL80211_TXQ_STATS_DROPS: total number of packet drops
+ * @NL80211_TXQ_STATS_ECN_MARKS: total number of packet ECN marks
+ * @NL80211_TXQ_STATS_OVERLIMIT: number of drops due to queue space overflow
+ * @NL80211_TXQ_STATS_OVERMEMORY: number of drops due to memory limit overflow
+ * (only for per-phy stats)
+ * @NL80211_TXQ_STATS_COLLISIONS: number of hash collisions
+ * @NL80211_TXQ_STATS_TX_BYTES: total number of bytes dequeued from TXQ
+ * @NL80211_TXQ_STATS_TX_PACKETS: total number of packets dequeued from TXQ
+ * @NL80211_TXQ_STATS_MAX_FLOWS: number of flow buckets for PHY
+ * @NL80211_TXQ_STATS_MAX: highest numbered attribute here
+ */
+enum nl80211_txq_stats {
+ __NL80211_TXQ_STATS_INVALID,
+ NL80211_TXQ_STATS_BACKLOG_BYTES,
+ NL80211_TXQ_STATS_BACKLOG_PACKETS,
+ NL80211_TXQ_STATS_FLOWS,
+ NL80211_TXQ_STATS_DROPS,
+ NL80211_TXQ_STATS_ECN_MARKS,
+ NL80211_TXQ_STATS_OVERLIMIT,
+ NL80211_TXQ_STATS_OVERMEMORY,
+ NL80211_TXQ_STATS_COLLISIONS,
+ NL80211_TXQ_STATS_TX_BYTES,
+ NL80211_TXQ_STATS_TX_PACKETS,
+ NL80211_TXQ_STATS_MAX_FLOWS,
+
+ /* keep last */
+ NUM_NL80211_TXQ_STATS,
+ NL80211_TXQ_STATS_MAX = NUM_NL80211_TXQ_STATS - 1
+};
+
+/**
* enum nl80211_mpath_flags - nl80211 mesh path flags
*
* @NL80211_MPATH_FLAG_ACTIVE: the mesh path is active
@@ -3144,6 +3205,29 @@
#define NL80211_BAND_ATTR_HT_CAPA NL80211_BAND_ATTR_HT_CAPA
/**
+ * enum nl80211_wmm_rule - regulatory wmm rule
+ *
+ * @__NL80211_WMMR_INVALID: attribute number 0 is reserved
+ * @NL80211_WMMR_CW_MIN: Minimum contention window slot.
+ * @NL80211_WMMR_CW_MAX: Maximum contention window slot.
+ * @NL80211_WMMR_AIFSN: Arbitration Inter Frame Space.
+ * @NL80211_WMMR_TXOP: Maximum allowed tx operation time.
+ * @nl80211_WMMR_MAX: highest possible wmm rule.
+ * @__NL80211_WMMR_LAST: Internal use.
+ */
+enum nl80211_wmm_rule {
+ __NL80211_WMMR_INVALID,
+ NL80211_WMMR_CW_MIN,
+ NL80211_WMMR_CW_MAX,
+ NL80211_WMMR_AIFSN,
+ NL80211_WMMR_TXOP,
+
+ /* keep last */
+ __NL80211_WMMR_LAST,
+ NL80211_WMMR_MAX = __NL80211_WMMR_LAST - 1
+};
+
+/**
* enum nl80211_frequency_attr - frequency attributes
* @__NL80211_FREQUENCY_ATTR_INVALID: attribute number 0 is reserved
* @NL80211_FREQUENCY_ATTR_FREQ: Frequency in MHz
@@ -3192,6 +3276,9 @@
* on this channel in current regulatory domain.
* @NL80211_FREQUENCY_ATTR_NO_10MHZ: 10 MHz operation is not allowed
* on this channel in current regulatory domain.
+ * @NL80211_FREQUENCY_ATTR_WMM: this channel has wmm limitations.
+ * This is a nested attribute that contains the wmm limitation per AC.
+ * (see &enum nl80211_wmm_rule)
* @NL80211_FREQUENCY_ATTR_MAX: highest frequency attribute number
* currently defined
* @__NL80211_FREQUENCY_ATTR_AFTER_LAST: internal use
@@ -3220,6 +3307,7 @@
NL80211_FREQUENCY_ATTR_IR_CONCURRENT,
NL80211_FREQUENCY_ATTR_NO_20MHZ,
NL80211_FREQUENCY_ATTR_NO_10MHZ,
+ NL80211_FREQUENCY_ATTR_WMM,
/* keep last */
__NL80211_FREQUENCY_ATTR_AFTER_LAST,
@@ -5040,6 +5128,11 @@
* "radar detected" event.
* @NL80211_EXT_FEATURE_CONTROL_PORT_OVER_NL80211: Driver supports sending and
* receiving control port frames over nl80211 instead of the netdevice.
+ * @NL80211_EXT_FEATURE_DATA_ACK_SIGNAL_SUPPORT: This Driver support data ack
+ * rssi if firmware support, this flag is to intimate about ack rssi
+ * support to nl80211.
+ * @NL80211_EXT_FEATURE_TXQS: Driver supports FQ-CoDel-enabled intermediate
+ * TXQs.
*
* @NUM_NL80211_EXT_FEATURES: number of extended features.
* @MAX_NL80211_EXT_FEATURES: highest extended feature index.
@@ -5072,6 +5165,8 @@
NL80211_EXT_FEATURE_HIGH_ACCURACY_SCAN,
NL80211_EXT_FEATURE_DFS_OFFLOAD,
NL80211_EXT_FEATURE_CONTROL_PORT_OVER_NL80211,
+ NL80211_EXT_FEATURE_DATA_ACK_SIGNAL_SUPPORT,
+ NL80211_EXT_FEATURE_TXQS,
/* add new features before the definition below */
NUM_NL80211_EXT_FEATURES,
diff --git a/include/uapi/linux/openvswitch.h b/include/uapi/linux/openvswitch.h
index 713e56c..863aaba 100644
--- a/include/uapi/linux/openvswitch.h
+++ b/include/uapi/linux/openvswitch.h
@@ -937,4 +937,32 @@
#define OVS_METER_BAND_TYPE_MAX (__OVS_METER_BAND_TYPE_MAX - 1)
+/* Conntrack limit */
+#define OVS_CT_LIMIT_FAMILY "ovs_ct_limit"
+#define OVS_CT_LIMIT_MCGROUP "ovs_ct_limit"
+#define OVS_CT_LIMIT_VERSION 0x1
+
+enum ovs_ct_limit_cmd {
+ OVS_CT_LIMIT_CMD_UNSPEC,
+ OVS_CT_LIMIT_CMD_SET, /* Add or modify ct limit. */
+ OVS_CT_LIMIT_CMD_DEL, /* Delete ct limit. */
+ OVS_CT_LIMIT_CMD_GET /* Get ct limit. */
+};
+
+enum ovs_ct_limit_attr {
+ OVS_CT_LIMIT_ATTR_UNSPEC,
+ OVS_CT_LIMIT_ATTR_ZONE_LIMIT, /* Nested struct ovs_zone_limit. */
+ __OVS_CT_LIMIT_ATTR_MAX
+};
+
+#define OVS_CT_LIMIT_ATTR_MAX (__OVS_CT_LIMIT_ATTR_MAX - 1)
+
+#define OVS_ZONE_LIMIT_DEFAULT_ZONE -1
+
+struct ovs_zone_limit {
+ int zone_id;
+ __u32 limit;
+ __u32 count;
+};
+
#endif /* _LINUX_OPENVSWITCH_H */
diff --git a/include/uapi/linux/pci_regs.h b/include/uapi/linux/pci_regs.h
index 103ba79..83ade9b 100644
--- a/include/uapi/linux/pci_regs.h
+++ b/include/uapi/linux/pci_regs.h
@@ -506,6 +506,8 @@
#define PCI_EXP_DEVCTL_READRQ_256B 0x1000 /* 256 Bytes */
#define PCI_EXP_DEVCTL_READRQ_512B 0x2000 /* 512 Bytes */
#define PCI_EXP_DEVCTL_READRQ_1024B 0x3000 /* 1024 Bytes */
+#define PCI_EXP_DEVCTL_READRQ_2048B 0x4000 /* 2048 Bytes */
+#define PCI_EXP_DEVCTL_READRQ_4096B 0x5000 /* 4096 Bytes */
#define PCI_EXP_DEVCTL_BCR_FLR 0x8000 /* Bridge Configuration Retry / FLR */
#define PCI_EXP_DEVSTA 10 /* Device Status */
#define PCI_EXP_DEVSTA_CED 0x0001 /* Correctable Error Detected */
diff --git a/include/uapi/linux/pkt_cls.h b/include/uapi/linux/pkt_cls.h
index be05e66..84e4c1d 100644
--- a/include/uapi/linux/pkt_cls.h
+++ b/include/uapi/linux/pkt_cls.h
@@ -129,6 +129,7 @@
#define TCA_CLS_FLAGS_SKIP_SW (1 << 1) /* don't use filter in SW */
#define TCA_CLS_FLAGS_IN_HW (1 << 2) /* filter is offloaded to HW */
#define TCA_CLS_FLAGS_NOT_IN_HW (1 << 3) /* filter isn't offloaded to HW */
+#define TCA_CLS_FLAGS_VERBOSE (1 << 4) /* verbose logging */
/* U32 filters */
diff --git a/include/uapi/linux/rtnetlink.h b/include/uapi/linux/rtnetlink.h
index 9b15005..cabb210 100644
--- a/include/uapi/linux/rtnetlink.h
+++ b/include/uapi/linux/rtnetlink.h
@@ -327,6 +327,9 @@
RTA_PAD,
RTA_UID,
RTA_TTL_PROPAGATE,
+ RTA_IP_PROTO,
+ RTA_SPORT,
+ RTA_DPORT,
__RTA_MAX
};
diff --git a/include/uapi/linux/seg6_local.h b/include/uapi/linux/seg6_local.h
index ef2d8c3..edc138b 100644
--- a/include/uapi/linux/seg6_local.h
+++ b/include/uapi/linux/seg6_local.h
@@ -25,6 +25,7 @@
SEG6_LOCAL_NH6,
SEG6_LOCAL_IIF,
SEG6_LOCAL_OIF,
+ SEG6_LOCAL_BPF,
__SEG6_LOCAL_MAX,
};
#define SEG6_LOCAL_MAX (__SEG6_LOCAL_MAX - 1)
@@ -59,10 +60,21 @@
SEG6_LOCAL_ACTION_END_AS = 13,
/* forward to SR-unaware VNF with masquerading */
SEG6_LOCAL_ACTION_END_AM = 14,
+ /* custom BPF action */
+ SEG6_LOCAL_ACTION_END_BPF = 15,
__SEG6_LOCAL_ACTION_MAX,
};
#define SEG6_LOCAL_ACTION_MAX (__SEG6_LOCAL_ACTION_MAX - 1)
+enum {
+ SEG6_LOCAL_BPF_PROG_UNSPEC,
+ SEG6_LOCAL_BPF_PROG,
+ SEG6_LOCAL_BPF_PROG_NAME,
+ __SEG6_LOCAL_BPF_PROG_MAX,
+};
+
+#define SEG6_LOCAL_BPF_PROG_MAX (__SEG6_LOCAL_BPF_PROG_MAX - 1)
+
#endif
diff --git a/include/uapi/linux/snmp.h b/include/uapi/linux/snmp.h
index 33a70ec..750d891 100644
--- a/include/uapi/linux/snmp.h
+++ b/include/uapi/linux/snmp.h
@@ -276,6 +276,9 @@
LINUX_MIB_TCPKEEPALIVE, /* TCPKeepAlive */
LINUX_MIB_TCPMTUPFAIL, /* TCPMTUPFail */
LINUX_MIB_TCPMTUPSUCCESS, /* TCPMTUPSuccess */
+ LINUX_MIB_TCPDELIVERED, /* TCPDelivered */
+ LINUX_MIB_TCPDELIVEREDCE, /* TCPDeliveredCE */
+ LINUX_MIB_TCPACKCOMPRESSED, /* TCPAckCompressed */
__LINUX_MIB_MAX
};
diff --git a/include/uapi/linux/tcp.h b/include/uapi/linux/tcp.h
index 560374c..29eb659 100644
--- a/include/uapi/linux/tcp.h
+++ b/include/uapi/linux/tcp.h
@@ -122,6 +122,10 @@
#define TCP_MD5SIG_EXT 32 /* TCP MD5 Signature with extensions */
#define TCP_FASTOPEN_KEY 33 /* Set the key for Fast Open (cookie) */
#define TCP_FASTOPEN_NO_COOKIE 34 /* Enable TFO without a TFO cookie */
+#define TCP_ZEROCOPY_RECEIVE 35
+#define TCP_INQ 36 /* Notify bytes available to read as a cmsg on read */
+
+#define TCP_CM_INQ TCP_INQ
struct tcp_repair_opt {
__u32 opt_code;
@@ -224,6 +228,9 @@
__u64 tcpi_busy_time; /* Time (usec) busy sending data */
__u64 tcpi_rwnd_limited; /* Time (usec) limited by receive window */
__u64 tcpi_sndbuf_limited; /* Time (usec) limited by send buffer */
+
+ __u32 tcpi_delivered;
+ __u32 tcpi_delivered_ce;
};
/* netlink attributes types for SCM_TIMESTAMPING_OPT_STATS */
@@ -244,6 +251,8 @@
TCP_NLA_SNDQ_SIZE, /* Data (bytes) pending in send queue */
TCP_NLA_CA_STATE, /* ca_state of socket */
TCP_NLA_SND_SSTHRESH, /* Slow start size threshold */
+ TCP_NLA_DELIVERED, /* Data pkts delivered incl. out-of-order */
+ TCP_NLA_DELIVERED_CE, /* Like above but only ones w/ CE marks */
};
@@ -271,4 +280,11 @@
__u8 tcpm_key[TCP_MD5SIG_MAXKEYLEN];
};
+/* setsockopt(fd, IPPROTO_TCP, TCP_ZEROCOPY_RECEIVE, ...) */
+
+struct tcp_zerocopy_receive {
+ __u64 address; /* in: address of mapping */
+ __u32 length; /* in/out: number of bytes to map/mapped */
+ __u32 recv_skip_hint; /* out: amount of bytes to skip */
+};
#endif /* _UAPI_LINUX_TCP_H */
diff --git a/include/uapi/linux/tipc.h b/include/uapi/linux/tipc.h
index bf6d286..6b2fd4d 100644
--- a/include/uapi/linux/tipc.h
+++ b/include/uapi/linux/tipc.h
@@ -209,16 +209,16 @@
* The string formatting for each name element is:
* media: media
* interface: media:interface name
- * link: Z.C.N:interface-Z.C.N:interface
- *
+ * link: node:interface-node:interface
*/
-
+#define TIPC_NODEID_LEN 16
#define TIPC_MAX_MEDIA_NAME 16
#define TIPC_MAX_IF_NAME 16
#define TIPC_MAX_BEARER_NAME 32
#define TIPC_MAX_LINK_NAME 68
-#define SIOCGETLINKNAME SIOCPROTOPRIVATE
+#define SIOCGETLINKNAME SIOCPROTOPRIVATE
+#define SIOCGETNODEID (SIOCPROTOPRIVATE + 1)
struct tipc_sioc_ln_req {
__u32 peer;
@@ -226,6 +226,10 @@
char linkname[TIPC_MAX_LINK_NAME];
};
+struct tipc_sioc_nodeid_req {
+ __u32 peer;
+ char node_id[TIPC_NODEID_LEN];
+};
/* The macros and functions below are deprecated:
*/
diff --git a/include/uapi/linux/tipc_config.h b/include/uapi/linux/tipc_config.h
index 3f29e3c..4b2c93b 100644
--- a/include/uapi/linux/tipc_config.h
+++ b/include/uapi/linux/tipc_config.h
@@ -185,6 +185,11 @@
#define TIPC_DEF_LINK_WIN 50
#define TIPC_MAX_LINK_WIN 8191
+/*
+ * Default MTU for UDP media
+ */
+
+#define TIPC_DEF_LINK_UDP_MTU 14000
struct tipc_node_info {
__be32 addr; /* network address of node */
diff --git a/include/uapi/linux/tipc_netlink.h b/include/uapi/linux/tipc_netlink.h
index 0affb68..85c1198 100644
--- a/include/uapi/linux/tipc_netlink.h
+++ b/include/uapi/linux/tipc_netlink.h
@@ -266,6 +266,7 @@
TIPC_NLA_PROP_PRIO, /* u32 */
TIPC_NLA_PROP_TOL, /* u32 */
TIPC_NLA_PROP_WIN, /* u32 */
+ TIPC_NLA_PROP_MTU, /* u32 */
__TIPC_NLA_PROP_MAX,
TIPC_NLA_PROP_MAX = __TIPC_NLA_PROP_MAX - 1
diff --git a/include/uapi/linux/udp.h b/include/uapi/linux/udp.h
index efb7b59..09d00f8 100644
--- a/include/uapi/linux/udp.h
+++ b/include/uapi/linux/udp.h
@@ -32,6 +32,7 @@
#define UDP_ENCAP 100 /* Set the socket to accept encapsulated packets */
#define UDP_NO_CHECK6_TX 101 /* Disable sending checksum for UDP6X */
#define UDP_NO_CHECK6_RX 102 /* Disable accpeting checksum for UDP6 */
+#define UDP_SEGMENT 103 /* Set GSO segmentation size */
/* UDP encapsulation types */
#define UDP_ENCAP_ESPINUDP_NON_IKE 1 /* draft-ietf-ipsec-nat-t-ike-00/01 */
diff --git a/include/uapi/linux/vmcore.h b/include/uapi/linux/vmcore.h
new file mode 100644
index 0000000..0226196
--- /dev/null
+++ b/include/uapi/linux/vmcore.h
@@ -0,0 +1,18 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _UAPI_VMCORE_H
+#define _UAPI_VMCORE_H
+
+#include <linux/types.h>
+
+#define VMCOREDD_NOTE_NAME "LINUX"
+#define VMCOREDD_MAX_NAME_BYTES 44
+
+struct vmcoredd_header {
+ __u32 n_namesz; /* Name size */
+ __u32 n_descsz; /* Content size */
+ __u32 n_type; /* NT_VMCOREDD */
+ __u8 name[8]; /* LINUX\0\0\0 */
+ __u8 dump_name[VMCOREDD_MAX_NAME_BYTES]; /* Device dump's name */
+};
+
+#endif /* _UAPI_VMCORE_H */
diff --git a/init/Kconfig b/init/Kconfig
index 18b151f..1fecd5b 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -1391,6 +1391,7 @@
bool "Enable bpf() system call"
select ANON_INODES
select BPF
+ select IRQ_WORK
default n
help
Enable the bpf() system call that allows to manipulate eBPF
diff --git a/kernel/bpf/Makefile b/kernel/bpf/Makefile
index a713fd2..f27f549 100644
--- a/kernel/bpf/Makefile
+++ b/kernel/bpf/Makefile
@@ -4,9 +4,13 @@
obj-$(CONFIG_BPF_SYSCALL) += syscall.o verifier.o inode.o helpers.o tnum.o
obj-$(CONFIG_BPF_SYSCALL) += hashtab.o arraymap.o percpu_freelist.o bpf_lru_list.o lpm_trie.o map_in_map.o
obj-$(CONFIG_BPF_SYSCALL) += disasm.o
+obj-$(CONFIG_BPF_SYSCALL) += btf.o
ifeq ($(CONFIG_NET),y)
obj-$(CONFIG_BPF_SYSCALL) += devmap.o
obj-$(CONFIG_BPF_SYSCALL) += cpumap.o
+ifeq ($(CONFIG_XDP_SOCKETS),y)
+obj-$(CONFIG_BPF_SYSCALL) += xskmap.o
+endif
obj-$(CONFIG_BPF_SYSCALL) += offload.o
ifeq ($(CONFIG_STREAM_PARSER),y)
ifeq ($(CONFIG_INET),y)
diff --git a/kernel/bpf/arraymap.c b/kernel/bpf/arraymap.c
index 027107f..544e58f 100644
--- a/kernel/bpf/arraymap.c
+++ b/kernel/bpf/arraymap.c
@@ -11,11 +11,13 @@
* General Public License for more details.
*/
#include <linux/bpf.h>
+#include <linux/btf.h>
#include <linux/err.h>
#include <linux/slab.h>
#include <linux/mm.h>
#include <linux/filter.h>
#include <linux/perf_event.h>
+#include <uapi/linux/btf.h>
#include "map_in_map.h"
@@ -336,6 +338,52 @@
bpf_map_area_free(array);
}
+static void array_map_seq_show_elem(struct bpf_map *map, void *key,
+ struct seq_file *m)
+{
+ void *value;
+
+ rcu_read_lock();
+
+ value = array_map_lookup_elem(map, key);
+ if (!value) {
+ rcu_read_unlock();
+ return;
+ }
+
+ seq_printf(m, "%u: ", *(u32 *)key);
+ btf_type_seq_show(map->btf, map->btf_value_type_id, value, m);
+ seq_puts(m, "\n");
+
+ rcu_read_unlock();
+}
+
+static int array_map_check_btf(const struct bpf_map *map, const struct btf *btf,
+ u32 btf_key_id, u32 btf_value_id)
+{
+ const struct btf_type *key_type, *value_type;
+ u32 key_size, value_size;
+ u32 int_data;
+
+ key_type = btf_type_id_size(btf, &btf_key_id, &key_size);
+ if (!key_type || BTF_INFO_KIND(key_type->info) != BTF_KIND_INT)
+ return -EINVAL;
+
+ int_data = *(u32 *)(key_type + 1);
+ /* bpf array can only take a u32 key. This check makes
+ * sure that the btf matches the attr used during map_create.
+ */
+ if (BTF_INT_BITS(int_data) != 32 || key_size != 4 ||
+ BTF_INT_OFFSET(int_data))
+ return -EINVAL;
+
+ value_type = btf_type_id_size(btf, &btf_value_id, &value_size);
+ if (!value_type || value_size > map->value_size)
+ return -EINVAL;
+
+ return 0;
+}
+
const struct bpf_map_ops array_map_ops = {
.map_alloc_check = array_map_alloc_check,
.map_alloc = array_map_alloc,
@@ -345,6 +393,8 @@
.map_update_elem = array_map_update_elem,
.map_delete_elem = array_map_delete_elem,
.map_gen_lookup = array_map_gen_lookup,
+ .map_seq_show_elem = array_map_seq_show_elem,
+ .map_check_btf = array_map_check_btf,
};
const struct bpf_map_ops percpu_array_map_ops = {
diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
new file mode 100644
index 0000000..7e90fd1
--- /dev/null
+++ b/kernel/bpf/btf.c
@@ -0,0 +1,2324 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/* Copyright (c) 2018 Facebook */
+
+#include <uapi/linux/btf.h>
+#include <uapi/linux/types.h>
+#include <linux/seq_file.h>
+#include <linux/compiler.h>
+#include <linux/errno.h>
+#include <linux/slab.h>
+#include <linux/anon_inodes.h>
+#include <linux/file.h>
+#include <linux/uaccess.h>
+#include <linux/kernel.h>
+#include <linux/idr.h>
+#include <linux/sort.h>
+#include <linux/bpf_verifier.h>
+#include <linux/btf.h>
+
+/* BTF (BPF Type Format) is the meta data format which describes
+ * the data types of BPF program/map. Hence, it basically focus
+ * on the C programming language which the modern BPF is primary
+ * using.
+ *
+ * ELF Section:
+ * ~~~~~~~~~~~
+ * The BTF data is stored under the ".BTF" ELF section
+ *
+ * struct btf_type:
+ * ~~~~~~~~~~~~~~~
+ * Each 'struct btf_type' object describes a C data type.
+ * Depending on the type it is describing, a 'struct btf_type'
+ * object may be followed by more data. F.e.
+ * To describe an array, 'struct btf_type' is followed by
+ * 'struct btf_array'.
+ *
+ * 'struct btf_type' and any extra data following it are
+ * 4 bytes aligned.
+ *
+ * Type section:
+ * ~~~~~~~~~~~~~
+ * The BTF type section contains a list of 'struct btf_type' objects.
+ * Each one describes a C type. Recall from the above section
+ * that a 'struct btf_type' object could be immediately followed by extra
+ * data in order to desribe some particular C types.
+ *
+ * type_id:
+ * ~~~~~~~
+ * Each btf_type object is identified by a type_id. The type_id
+ * is implicitly implied by the location of the btf_type object in
+ * the BTF type section. The first one has type_id 1. The second
+ * one has type_id 2...etc. Hence, an earlier btf_type has
+ * a smaller type_id.
+ *
+ * A btf_type object may refer to another btf_type object by using
+ * type_id (i.e. the "type" in the "struct btf_type").
+ *
+ * NOTE that we cannot assume any reference-order.
+ * A btf_type object can refer to an earlier btf_type object
+ * but it can also refer to a later btf_type object.
+ *
+ * For example, to describe "const void *". A btf_type
+ * object describing "const" may refer to another btf_type
+ * object describing "void *". This type-reference is done
+ * by specifying type_id:
+ *
+ * [1] CONST (anon) type_id=2
+ * [2] PTR (anon) type_id=0
+ *
+ * The above is the btf_verifier debug log:
+ * - Each line started with "[?]" is a btf_type object
+ * - [?] is the type_id of the btf_type object.
+ * - CONST/PTR is the BTF_KIND_XXX
+ * - "(anon)" is the name of the type. It just
+ * happens that CONST and PTR has no name.
+ * - type_id=XXX is the 'u32 type' in btf_type
+ *
+ * NOTE: "void" has type_id 0
+ *
+ * String section:
+ * ~~~~~~~~~~~~~~
+ * The BTF string section contains the names used by the type section.
+ * Each string is referred by an "offset" from the beginning of the
+ * string section.
+ *
+ * Each string is '\0' terminated.
+ *
+ * The first character in the string section must be '\0'
+ * which is used to mean 'anonymous'. Some btf_type may not
+ * have a name.
+ */
+
+/* BTF verification:
+ *
+ * To verify BTF data, two passes are needed.
+ *
+ * Pass #1
+ * ~~~~~~~
+ * The first pass is to collect all btf_type objects to
+ * an array: "btf->types".
+ *
+ * Depending on the C type that a btf_type is describing,
+ * a btf_type may be followed by extra data. We don't know
+ * how many btf_type is there, and more importantly we don't
+ * know where each btf_type is located in the type section.
+ *
+ * Without knowing the location of each type_id, most verifications
+ * cannot be done. e.g. an earlier btf_type may refer to a later
+ * btf_type (recall the "const void *" above), so we cannot
+ * check this type-reference in the first pass.
+ *
+ * In the first pass, it still does some verifications (e.g.
+ * checking the name is a valid offset to the string section).
+ *
+ * Pass #2
+ * ~~~~~~~
+ * The main focus is to resolve a btf_type that is referring
+ * to another type.
+ *
+ * We have to ensure the referring type:
+ * 1) does exist in the BTF (i.e. in btf->types[])
+ * 2) does not cause a loop:
+ * struct A {
+ * struct B b;
+ * };
+ *
+ * struct B {
+ * struct A a;
+ * };
+ *
+ * btf_type_needs_resolve() decides if a btf_type needs
+ * to be resolved.
+ *
+ * The needs_resolve type implements the "resolve()" ops which
+ * essentially does a DFS and detects backedge.
+ *
+ * During resolve (or DFS), different C types have different
+ * "RESOLVED" conditions.
+ *
+ * When resolving a BTF_KIND_STRUCT, we need to resolve all its
+ * members because a member is always referring to another
+ * type. A struct's member can be treated as "RESOLVED" if
+ * it is referring to a BTF_KIND_PTR. Otherwise, the
+ * following valid C struct would be rejected:
+ *
+ * struct A {
+ * int m;
+ * struct A *a;
+ * };
+ *
+ * When resolving a BTF_KIND_PTR, it needs to keep resolving if
+ * it is referring to another BTF_KIND_PTR. Otherwise, we cannot
+ * detect a pointer loop, e.g.:
+ * BTF_KIND_CONST -> BTF_KIND_PTR -> BTF_KIND_CONST -> BTF_KIND_PTR +
+ * ^ |
+ * +-----------------------------------------+
+ *
+ */
+
+#define BITS_PER_U64 (sizeof(u64) * BITS_PER_BYTE)
+#define BITS_PER_BYTE_MASK (BITS_PER_BYTE - 1)
+#define BITS_PER_BYTE_MASKED(bits) ((bits) & BITS_PER_BYTE_MASK)
+#define BITS_ROUNDDOWN_BYTES(bits) ((bits) >> 3)
+#define BITS_ROUNDUP_BYTES(bits) \
+ (BITS_ROUNDDOWN_BYTES(bits) + !!BITS_PER_BYTE_MASKED(bits))
+
+#define BTF_INFO_MASK 0x0f00ffff
+#define BTF_INT_MASK 0x0fffffff
+#define BTF_TYPE_ID_VALID(type_id) ((type_id) <= BTF_MAX_TYPE)
+#define BTF_STR_OFFSET_VALID(name_off) ((name_off) <= BTF_MAX_NAME_OFFSET)
+
+/* 16MB for 64k structs and each has 16 members and
+ * a few MB spaces for the string section.
+ * The hard limit is S32_MAX.
+ */
+#define BTF_MAX_SIZE (16 * 1024 * 1024)
+
+#define for_each_member(i, struct_type, member) \
+ for (i = 0, member = btf_type_member(struct_type); \
+ i < btf_type_vlen(struct_type); \
+ i++, member++)
+
+#define for_each_member_from(i, from, struct_type, member) \
+ for (i = from, member = btf_type_member(struct_type) + from; \
+ i < btf_type_vlen(struct_type); \
+ i++, member++)
+
+static DEFINE_IDR(btf_idr);
+static DEFINE_SPINLOCK(btf_idr_lock);
+
+struct btf {
+ void *data;
+ struct btf_type **types;
+ u32 *resolved_ids;
+ u32 *resolved_sizes;
+ const char *strings;
+ void *nohdr_data;
+ struct btf_header hdr;
+ u32 nr_types;
+ u32 types_size;
+ u32 data_size;
+ refcount_t refcnt;
+ u32 id;
+ struct rcu_head rcu;
+};
+
+enum verifier_phase {
+ CHECK_META,
+ CHECK_TYPE,
+};
+
+struct resolve_vertex {
+ const struct btf_type *t;
+ u32 type_id;
+ u16 next_member;
+};
+
+enum visit_state {
+ NOT_VISITED,
+ VISITED,
+ RESOLVED,
+};
+
+enum resolve_mode {
+ RESOLVE_TBD, /* To Be Determined */
+ RESOLVE_PTR, /* Resolving for Pointer */
+ RESOLVE_STRUCT_OR_ARRAY, /* Resolving for struct/union
+ * or array
+ */
+};
+
+#define MAX_RESOLVE_DEPTH 32
+
+struct btf_sec_info {
+ u32 off;
+ u32 len;
+};
+
+struct btf_verifier_env {
+ struct btf *btf;
+ u8 *visit_states;
+ struct resolve_vertex stack[MAX_RESOLVE_DEPTH];
+ struct bpf_verifier_log log;
+ u32 log_type_id;
+ u32 top_stack;
+ enum verifier_phase phase;
+ enum resolve_mode resolve_mode;
+};
+
+static const char * const btf_kind_str[NR_BTF_KINDS] = {
+ [BTF_KIND_UNKN] = "UNKNOWN",
+ [BTF_KIND_INT] = "INT",
+ [BTF_KIND_PTR] = "PTR",
+ [BTF_KIND_ARRAY] = "ARRAY",
+ [BTF_KIND_STRUCT] = "STRUCT",
+ [BTF_KIND_UNION] = "UNION",
+ [BTF_KIND_ENUM] = "ENUM",
+ [BTF_KIND_FWD] = "FWD",
+ [BTF_KIND_TYPEDEF] = "TYPEDEF",
+ [BTF_KIND_VOLATILE] = "VOLATILE",
+ [BTF_KIND_CONST] = "CONST",
+ [BTF_KIND_RESTRICT] = "RESTRICT",
+};
+
+struct btf_kind_operations {
+ s32 (*check_meta)(struct btf_verifier_env *env,
+ const struct btf_type *t,
+ u32 meta_left);
+ int (*resolve)(struct btf_verifier_env *env,
+ const struct resolve_vertex *v);
+ int (*check_member)(struct btf_verifier_env *env,
+ const struct btf_type *struct_type,
+ const struct btf_member *member,
+ const struct btf_type *member_type);
+ void (*log_details)(struct btf_verifier_env *env,
+ const struct btf_type *t);
+ void (*seq_show)(const struct btf *btf, const struct btf_type *t,
+ u32 type_id, void *data, u8 bits_offsets,
+ struct seq_file *m);
+};
+
+static const struct btf_kind_operations * const kind_ops[NR_BTF_KINDS];
+static struct btf_type btf_void;
+
+static bool btf_type_is_modifier(const struct btf_type *t)
+{
+ /* Some of them is not strictly a C modifier
+ * but they are grouped into the same bucket
+ * for BTF concern:
+ * A type (t) that refers to another
+ * type through t->type AND its size cannot
+ * be determined without following the t->type.
+ *
+ * ptr does not fall into this bucket
+ * because its size is always sizeof(void *).
+ */
+ switch (BTF_INFO_KIND(t->info)) {
+ case BTF_KIND_TYPEDEF:
+ case BTF_KIND_VOLATILE:
+ case BTF_KIND_CONST:
+ case BTF_KIND_RESTRICT:
+ return true;
+ }
+
+ return false;
+}
+
+static bool btf_type_is_void(const struct btf_type *t)
+{
+ /* void => no type and size info.
+ * Hence, FWD is also treated as void.
+ */
+ return t == &btf_void || BTF_INFO_KIND(t->info) == BTF_KIND_FWD;
+}
+
+static bool btf_type_is_void_or_null(const struct btf_type *t)
+{
+ return !t || btf_type_is_void(t);
+}
+
+/* union is only a special case of struct:
+ * all its offsetof(member) == 0
+ */
+static bool btf_type_is_struct(const struct btf_type *t)
+{
+ u8 kind = BTF_INFO_KIND(t->info);
+
+ return kind == BTF_KIND_STRUCT || kind == BTF_KIND_UNION;
+}
+
+static bool btf_type_is_array(const struct btf_type *t)
+{
+ return BTF_INFO_KIND(t->info) == BTF_KIND_ARRAY;
+}
+
+static bool btf_type_is_ptr(const struct btf_type *t)
+{
+ return BTF_INFO_KIND(t->info) == BTF_KIND_PTR;
+}
+
+static bool btf_type_is_int(const struct btf_type *t)
+{
+ return BTF_INFO_KIND(t->info) == BTF_KIND_INT;
+}
+
+/* What types need to be resolved?
+ *
+ * btf_type_is_modifier() is an obvious one.
+ *
+ * btf_type_is_struct() because its member refers to
+ * another type (through member->type).
+
+ * btf_type_is_array() because its element (array->type)
+ * refers to another type. Array can be thought of a
+ * special case of struct while array just has the same
+ * member-type repeated by array->nelems of times.
+ */
+static bool btf_type_needs_resolve(const struct btf_type *t)
+{
+ return btf_type_is_modifier(t) ||
+ btf_type_is_ptr(t) ||
+ btf_type_is_struct(t) ||
+ btf_type_is_array(t);
+}
+
+/* t->size can be used */
+static bool btf_type_has_size(const struct btf_type *t)
+{
+ switch (BTF_INFO_KIND(t->info)) {
+ case BTF_KIND_INT:
+ case BTF_KIND_STRUCT:
+ case BTF_KIND_UNION:
+ case BTF_KIND_ENUM:
+ return true;
+ }
+
+ return false;
+}
+
+static const char *btf_int_encoding_str(u8 encoding)
+{
+ if (encoding == 0)
+ return "(none)";
+ else if (encoding == BTF_INT_SIGNED)
+ return "SIGNED";
+ else if (encoding == BTF_INT_CHAR)
+ return "CHAR";
+ else if (encoding == BTF_INT_BOOL)
+ return "BOOL";
+ else
+ return "UNKN";
+}
+
+static u16 btf_type_vlen(const struct btf_type *t)
+{
+ return BTF_INFO_VLEN(t->info);
+}
+
+static u32 btf_type_int(const struct btf_type *t)
+{
+ return *(u32 *)(t + 1);
+}
+
+static const struct btf_array *btf_type_array(const struct btf_type *t)
+{
+ return (const struct btf_array *)(t + 1);
+}
+
+static const struct btf_member *btf_type_member(const struct btf_type *t)
+{
+ return (const struct btf_member *)(t + 1);
+}
+
+static const struct btf_enum *btf_type_enum(const struct btf_type *t)
+{
+ return (const struct btf_enum *)(t + 1);
+}
+
+static const struct btf_kind_operations *btf_type_ops(const struct btf_type *t)
+{
+ return kind_ops[BTF_INFO_KIND(t->info)];
+}
+
+static bool btf_name_offset_valid(const struct btf *btf, u32 offset)
+{
+ return BTF_STR_OFFSET_VALID(offset) &&
+ offset < btf->hdr.str_len;
+}
+
+static const char *btf_name_by_offset(const struct btf *btf, u32 offset)
+{
+ if (!offset)
+ return "(anon)";
+ else if (offset < btf->hdr.str_len)
+ return &btf->strings[offset];
+ else
+ return "(invalid-name-offset)";
+}
+
+static const struct btf_type *btf_type_by_id(const struct btf *btf, u32 type_id)
+{
+ if (type_id > btf->nr_types)
+ return NULL;
+
+ return btf->types[type_id];
+}
+
+/*
+ * Regular int is not a bit field and it must be either
+ * u8/u16/u32/u64.
+ */
+static bool btf_type_int_is_regular(const struct btf_type *t)
+{
+ u16 nr_bits, nr_bytes;
+ u32 int_data;
+
+ int_data = btf_type_int(t);
+ nr_bits = BTF_INT_BITS(int_data);
+ nr_bytes = BITS_ROUNDUP_BYTES(nr_bits);
+ if (BITS_PER_BYTE_MASKED(nr_bits) ||
+ BTF_INT_OFFSET(int_data) ||
+ (nr_bytes != sizeof(u8) && nr_bytes != sizeof(u16) &&
+ nr_bytes != sizeof(u32) && nr_bytes != sizeof(u64))) {
+ return false;
+ }
+
+ return true;
+}
+
+__printf(2, 3) static void __btf_verifier_log(struct bpf_verifier_log *log,
+ const char *fmt, ...)
+{
+ va_list args;
+
+ va_start(args, fmt);
+ bpf_verifier_vlog(log, fmt, args);
+ va_end(args);
+}
+
+__printf(2, 3) static void btf_verifier_log(struct btf_verifier_env *env,
+ const char *fmt, ...)
+{
+ struct bpf_verifier_log *log = &env->log;
+ va_list args;
+
+ if (!bpf_verifier_log_needed(log))
+ return;
+
+ va_start(args, fmt);
+ bpf_verifier_vlog(log, fmt, args);
+ va_end(args);
+}
+
+__printf(4, 5) static void __btf_verifier_log_type(struct btf_verifier_env *env,
+ const struct btf_type *t,
+ bool log_details,
+ const char *fmt, ...)
+{
+ struct bpf_verifier_log *log = &env->log;
+ u8 kind = BTF_INFO_KIND(t->info);
+ struct btf *btf = env->btf;
+ va_list args;
+
+ if (!bpf_verifier_log_needed(log))
+ return;
+
+ __btf_verifier_log(log, "[%u] %s %s%s",
+ env->log_type_id,
+ btf_kind_str[kind],
+ btf_name_by_offset(btf, t->name_off),
+ log_details ? " " : "");
+
+ if (log_details)
+ btf_type_ops(t)->log_details(env, t);
+
+ if (fmt && *fmt) {
+ __btf_verifier_log(log, " ");
+ va_start(args, fmt);
+ bpf_verifier_vlog(log, fmt, args);
+ va_end(args);
+ }
+
+ __btf_verifier_log(log, "\n");
+}
+
+#define btf_verifier_log_type(env, t, ...) \
+ __btf_verifier_log_type((env), (t), true, __VA_ARGS__)
+#define btf_verifier_log_basic(env, t, ...) \
+ __btf_verifier_log_type((env), (t), false, __VA_ARGS__)
+
+__printf(4, 5)
+static void btf_verifier_log_member(struct btf_verifier_env *env,
+ const struct btf_type *struct_type,
+ const struct btf_member *member,
+ const char *fmt, ...)
+{
+ struct bpf_verifier_log *log = &env->log;
+ struct btf *btf = env->btf;
+ va_list args;
+
+ if (!bpf_verifier_log_needed(log))
+ return;
+
+ /* The CHECK_META phase already did a btf dump.
+ *
+ * If member is logged again, it must hit an error in
+ * parsing this member. It is useful to print out which
+ * struct this member belongs to.
+ */
+ if (env->phase != CHECK_META)
+ btf_verifier_log_type(env, struct_type, NULL);
+
+ __btf_verifier_log(log, "\t%s type_id=%u bits_offset=%u",
+ btf_name_by_offset(btf, member->name_off),
+ member->type, member->offset);
+
+ if (fmt && *fmt) {
+ __btf_verifier_log(log, " ");
+ va_start(args, fmt);
+ bpf_verifier_vlog(log, fmt, args);
+ va_end(args);
+ }
+
+ __btf_verifier_log(log, "\n");
+}
+
+static void btf_verifier_log_hdr(struct btf_verifier_env *env,
+ u32 btf_data_size)
+{
+ struct bpf_verifier_log *log = &env->log;
+ const struct btf *btf = env->btf;
+ const struct btf_header *hdr;
+
+ if (!bpf_verifier_log_needed(log))
+ return;
+
+ hdr = &btf->hdr;
+ __btf_verifier_log(log, "magic: 0x%x\n", hdr->magic);
+ __btf_verifier_log(log, "version: %u\n", hdr->version);
+ __btf_verifier_log(log, "flags: 0x%x\n", hdr->flags);
+ __btf_verifier_log(log, "hdr_len: %u\n", hdr->hdr_len);
+ __btf_verifier_log(log, "type_off: %u\n", hdr->type_off);
+ __btf_verifier_log(log, "type_len: %u\n", hdr->type_len);
+ __btf_verifier_log(log, "str_off: %u\n", hdr->str_off);
+ __btf_verifier_log(log, "str_len: %u\n", hdr->str_len);
+ __btf_verifier_log(log, "btf_total_size: %u\n", btf_data_size);
+}
+
+static int btf_add_type(struct btf_verifier_env *env, struct btf_type *t)
+{
+ struct btf *btf = env->btf;
+
+ /* < 2 because +1 for btf_void which is always in btf->types[0].
+ * btf_void is not accounted in btf->nr_types because btf_void
+ * does not come from the BTF file.
+ */
+ if (btf->types_size - btf->nr_types < 2) {
+ /* Expand 'types' array */
+
+ struct btf_type **new_types;
+ u32 expand_by, new_size;
+
+ if (btf->types_size == BTF_MAX_TYPE) {
+ btf_verifier_log(env, "Exceeded max num of types");
+ return -E2BIG;
+ }
+
+ expand_by = max_t(u32, btf->types_size >> 2, 16);
+ new_size = min_t(u32, BTF_MAX_TYPE,
+ btf->types_size + expand_by);
+
+ new_types = kvzalloc(new_size * sizeof(*new_types),
+ GFP_KERNEL | __GFP_NOWARN);
+ if (!new_types)
+ return -ENOMEM;
+
+ if (btf->nr_types == 0)
+ new_types[0] = &btf_void;
+ else
+ memcpy(new_types, btf->types,
+ sizeof(*btf->types) * (btf->nr_types + 1));
+
+ kvfree(btf->types);
+ btf->types = new_types;
+ btf->types_size = new_size;
+ }
+
+ btf->types[++(btf->nr_types)] = t;
+
+ return 0;
+}
+
+static int btf_alloc_id(struct btf *btf)
+{
+ int id;
+
+ idr_preload(GFP_KERNEL);
+ spin_lock_bh(&btf_idr_lock);
+ id = idr_alloc_cyclic(&btf_idr, btf, 1, INT_MAX, GFP_ATOMIC);
+ if (id > 0)
+ btf->id = id;
+ spin_unlock_bh(&btf_idr_lock);
+ idr_preload_end();
+
+ if (WARN_ON_ONCE(!id))
+ return -ENOSPC;
+
+ return id > 0 ? 0 : id;
+}
+
+static void btf_free_id(struct btf *btf)
+{
+ unsigned long flags;
+
+ /*
+ * In map-in-map, calling map_delete_elem() on outer
+ * map will call bpf_map_put on the inner map.
+ * It will then eventually call btf_free_id()
+ * on the inner map. Some of the map_delete_elem()
+ * implementation may have irq disabled, so
+ * we need to use the _irqsave() version instead
+ * of the _bh() version.
+ */
+ spin_lock_irqsave(&btf_idr_lock, flags);
+ idr_remove(&btf_idr, btf->id);
+ spin_unlock_irqrestore(&btf_idr_lock, flags);
+}
+
+static void btf_free(struct btf *btf)
+{
+ kvfree(btf->types);
+ kvfree(btf->resolved_sizes);
+ kvfree(btf->resolved_ids);
+ kvfree(btf->data);
+ kfree(btf);
+}
+
+static void btf_free_rcu(struct rcu_head *rcu)
+{
+ struct btf *btf = container_of(rcu, struct btf, rcu);
+
+ btf_free(btf);
+}
+
+void btf_put(struct btf *btf)
+{
+ if (btf && refcount_dec_and_test(&btf->refcnt)) {
+ btf_free_id(btf);
+ call_rcu(&btf->rcu, btf_free_rcu);
+ }
+}
+
+static int env_resolve_init(struct btf_verifier_env *env)
+{
+ struct btf *btf = env->btf;
+ u32 nr_types = btf->nr_types;
+ u32 *resolved_sizes = NULL;
+ u32 *resolved_ids = NULL;
+ u8 *visit_states = NULL;
+
+ /* +1 for btf_void */
+ resolved_sizes = kvzalloc((nr_types + 1) * sizeof(*resolved_sizes),
+ GFP_KERNEL | __GFP_NOWARN);
+ if (!resolved_sizes)
+ goto nomem;
+
+ resolved_ids = kvzalloc((nr_types + 1) * sizeof(*resolved_ids),
+ GFP_KERNEL | __GFP_NOWARN);
+ if (!resolved_ids)
+ goto nomem;
+
+ visit_states = kvzalloc((nr_types + 1) * sizeof(*visit_states),
+ GFP_KERNEL | __GFP_NOWARN);
+ if (!visit_states)
+ goto nomem;
+
+ btf->resolved_sizes = resolved_sizes;
+ btf->resolved_ids = resolved_ids;
+ env->visit_states = visit_states;
+
+ return 0;
+
+nomem:
+ kvfree(resolved_sizes);
+ kvfree(resolved_ids);
+ kvfree(visit_states);
+ return -ENOMEM;
+}
+
+static void btf_verifier_env_free(struct btf_verifier_env *env)
+{
+ kvfree(env->visit_states);
+ kfree(env);
+}
+
+static bool env_type_is_resolve_sink(const struct btf_verifier_env *env,
+ const struct btf_type *next_type)
+{
+ switch (env->resolve_mode) {
+ case RESOLVE_TBD:
+ /* int, enum or void is a sink */
+ return !btf_type_needs_resolve(next_type);
+ case RESOLVE_PTR:
+ /* int, enum, void, struct or array is a sink for ptr */
+ return !btf_type_is_modifier(next_type) &&
+ !btf_type_is_ptr(next_type);
+ case RESOLVE_STRUCT_OR_ARRAY:
+ /* int, enum, void or ptr is a sink for struct and array */
+ return !btf_type_is_modifier(next_type) &&
+ !btf_type_is_array(next_type) &&
+ !btf_type_is_struct(next_type);
+ default:
+ BUG_ON(1);
+ }
+}
+
+static bool env_type_is_resolved(const struct btf_verifier_env *env,
+ u32 type_id)
+{
+ return env->visit_states[type_id] == RESOLVED;
+}
+
+static int env_stack_push(struct btf_verifier_env *env,
+ const struct btf_type *t, u32 type_id)
+{
+ struct resolve_vertex *v;
+
+ if (env->top_stack == MAX_RESOLVE_DEPTH)
+ return -E2BIG;
+
+ if (env->visit_states[type_id] != NOT_VISITED)
+ return -EEXIST;
+
+ env->visit_states[type_id] = VISITED;
+
+ v = &env->stack[env->top_stack++];
+ v->t = t;
+ v->type_id = type_id;
+ v->next_member = 0;
+
+ if (env->resolve_mode == RESOLVE_TBD) {
+ if (btf_type_is_ptr(t))
+ env->resolve_mode = RESOLVE_PTR;
+ else if (btf_type_is_struct(t) || btf_type_is_array(t))
+ env->resolve_mode = RESOLVE_STRUCT_OR_ARRAY;
+ }
+
+ return 0;
+}
+
+static void env_stack_set_next_member(struct btf_verifier_env *env,
+ u16 next_member)
+{
+ env->stack[env->top_stack - 1].next_member = next_member;
+}
+
+static void env_stack_pop_resolved(struct btf_verifier_env *env,
+ u32 resolved_type_id,
+ u32 resolved_size)
+{
+ u32 type_id = env->stack[--(env->top_stack)].type_id;
+ struct btf *btf = env->btf;
+
+ btf->resolved_sizes[type_id] = resolved_size;
+ btf->resolved_ids[type_id] = resolved_type_id;
+ env->visit_states[type_id] = RESOLVED;
+}
+
+static const struct resolve_vertex *env_stack_peak(struct btf_verifier_env *env)
+{
+ return env->top_stack ? &env->stack[env->top_stack - 1] : NULL;
+}
+
+/* The input param "type_id" must point to a needs_resolve type */
+static const struct btf_type *btf_type_id_resolve(const struct btf *btf,
+ u32 *type_id)
+{
+ *type_id = btf->resolved_ids[*type_id];
+ return btf_type_by_id(btf, *type_id);
+}
+
+const struct btf_type *btf_type_id_size(const struct btf *btf,
+ u32 *type_id, u32 *ret_size)
+{
+ const struct btf_type *size_type;
+ u32 size_type_id = *type_id;
+ u32 size = 0;
+
+ size_type = btf_type_by_id(btf, size_type_id);
+ if (btf_type_is_void_or_null(size_type))
+ return NULL;
+
+ if (btf_type_has_size(size_type)) {
+ size = size_type->size;
+ } else if (btf_type_is_array(size_type)) {
+ size = btf->resolved_sizes[size_type_id];
+ } else if (btf_type_is_ptr(size_type)) {
+ size = sizeof(void *);
+ } else {
+ if (WARN_ON_ONCE(!btf_type_is_modifier(size_type)))
+ return NULL;
+
+ size = btf->resolved_sizes[size_type_id];
+ size_type_id = btf->resolved_ids[size_type_id];
+ size_type = btf_type_by_id(btf, size_type_id);
+ if (btf_type_is_void(size_type))
+ return NULL;
+ }
+
+ *type_id = size_type_id;
+ if (ret_size)
+ *ret_size = size;
+
+ return size_type;
+}
+
+static int btf_df_check_member(struct btf_verifier_env *env,
+ const struct btf_type *struct_type,
+ const struct btf_member *member,
+ const struct btf_type *member_type)
+{
+ btf_verifier_log_basic(env, struct_type,
+ "Unsupported check_member");
+ return -EINVAL;
+}
+
+static int btf_df_resolve(struct btf_verifier_env *env,
+ const struct resolve_vertex *v)
+{
+ btf_verifier_log_basic(env, v->t, "Unsupported resolve");
+ return -EINVAL;
+}
+
+static void btf_df_seq_show(const struct btf *btf, const struct btf_type *t,
+ u32 type_id, void *data, u8 bits_offsets,
+ struct seq_file *m)
+{
+ seq_printf(m, "<unsupported kind:%u>", BTF_INFO_KIND(t->info));
+}
+
+static int btf_int_check_member(struct btf_verifier_env *env,
+ const struct btf_type *struct_type,
+ const struct btf_member *member,
+ const struct btf_type *member_type)
+{
+ u32 int_data = btf_type_int(member_type);
+ u32 struct_bits_off = member->offset;
+ u32 struct_size = struct_type->size;
+ u32 nr_copy_bits;
+ u32 bytes_offset;
+
+ if (U32_MAX - struct_bits_off < BTF_INT_OFFSET(int_data)) {
+ btf_verifier_log_member(env, struct_type, member,
+ "bits_offset exceeds U32_MAX");
+ return -EINVAL;
+ }
+
+ struct_bits_off += BTF_INT_OFFSET(int_data);
+ bytes_offset = BITS_ROUNDDOWN_BYTES(struct_bits_off);
+ nr_copy_bits = BTF_INT_BITS(int_data) +
+ BITS_PER_BYTE_MASKED(struct_bits_off);
+
+ if (nr_copy_bits > BITS_PER_U64) {
+ btf_verifier_log_member(env, struct_type, member,
+ "nr_copy_bits exceeds 64");
+ return -EINVAL;
+ }
+
+ if (struct_size < bytes_offset ||
+ struct_size - bytes_offset < BITS_ROUNDUP_BYTES(nr_copy_bits)) {
+ btf_verifier_log_member(env, struct_type, member,
+ "Member exceeds struct_size");
+ return -EINVAL;
+ }
+
+ return 0;
+}
+
+static s32 btf_int_check_meta(struct btf_verifier_env *env,
+ const struct btf_type *t,
+ u32 meta_left)
+{
+ u32 int_data, nr_bits, meta_needed = sizeof(int_data);
+ u16 encoding;
+
+ if (meta_left < meta_needed) {
+ btf_verifier_log_basic(env, t,
+ "meta_left:%u meta_needed:%u",
+ meta_left, meta_needed);
+ return -EINVAL;
+ }
+
+ if (btf_type_vlen(t)) {
+ btf_verifier_log_type(env, t, "vlen != 0");
+ return -EINVAL;
+ }
+
+ int_data = btf_type_int(t);
+ if (int_data & ~BTF_INT_MASK) {
+ btf_verifier_log_basic(env, t, "Invalid int_data:%x",
+ int_data);
+ return -EINVAL;
+ }
+
+ nr_bits = BTF_INT_BITS(int_data) + BTF_INT_OFFSET(int_data);
+
+ if (nr_bits > BITS_PER_U64) {
+ btf_verifier_log_type(env, t, "nr_bits exceeds %zu",
+ BITS_PER_U64);
+ return -EINVAL;
+ }
+
+ if (BITS_ROUNDUP_BYTES(nr_bits) > t->size) {
+ btf_verifier_log_type(env, t, "nr_bits exceeds type_size");
+ return -EINVAL;
+ }
+
+ /*
+ * Only one of the encoding bits is allowed and it
+ * should be sufficient for the pretty print purpose (i.e. decoding).
+ * Multiple bits can be allowed later if it is found
+ * to be insufficient.
+ */
+ encoding = BTF_INT_ENCODING(int_data);
+ if (encoding &&
+ encoding != BTF_INT_SIGNED &&
+ encoding != BTF_INT_CHAR &&
+ encoding != BTF_INT_BOOL) {
+ btf_verifier_log_type(env, t, "Unsupported encoding");
+ return -ENOTSUPP;
+ }
+
+ btf_verifier_log_type(env, t, NULL);
+
+ return meta_needed;
+}
+
+static void btf_int_log(struct btf_verifier_env *env,
+ const struct btf_type *t)
+{
+ int int_data = btf_type_int(t);
+
+ btf_verifier_log(env,
+ "size=%u bits_offset=%u nr_bits=%u encoding=%s",
+ t->size, BTF_INT_OFFSET(int_data),
+ BTF_INT_BITS(int_data),
+ btf_int_encoding_str(BTF_INT_ENCODING(int_data)));
+}
+
+static void btf_int_bits_seq_show(const struct btf *btf,
+ const struct btf_type *t,
+ void *data, u8 bits_offset,
+ struct seq_file *m)
+{
+ u32 int_data = btf_type_int(t);
+ u16 nr_bits = BTF_INT_BITS(int_data);
+ u16 total_bits_offset;
+ u16 nr_copy_bytes;
+ u16 nr_copy_bits;
+ u8 nr_upper_bits;
+ union {
+ u64 u64_num;
+ u8 u8_nums[8];
+ } print_num;
+
+ total_bits_offset = bits_offset + BTF_INT_OFFSET(int_data);
+ data += BITS_ROUNDDOWN_BYTES(total_bits_offset);
+ bits_offset = BITS_PER_BYTE_MASKED(total_bits_offset);
+ nr_copy_bits = nr_bits + bits_offset;
+ nr_copy_bytes = BITS_ROUNDUP_BYTES(nr_copy_bits);
+
+ print_num.u64_num = 0;
+ memcpy(&print_num.u64_num, data, nr_copy_bytes);
+
+ /* Ditch the higher order bits */
+ nr_upper_bits = BITS_PER_BYTE_MASKED(nr_copy_bits);
+ if (nr_upper_bits) {
+ /* We need to mask out some bits of the upper byte. */
+ u8 mask = (1 << nr_upper_bits) - 1;
+
+ print_num.u8_nums[nr_copy_bytes - 1] &= mask;
+ }
+
+ print_num.u64_num >>= bits_offset;
+
+ seq_printf(m, "0x%llx", print_num.u64_num);
+}
+
+static void btf_int_seq_show(const struct btf *btf, const struct btf_type *t,
+ u32 type_id, void *data, u8 bits_offset,
+ struct seq_file *m)
+{
+ u32 int_data = btf_type_int(t);
+ u8 encoding = BTF_INT_ENCODING(int_data);
+ bool sign = encoding & BTF_INT_SIGNED;
+ u32 nr_bits = BTF_INT_BITS(int_data);
+
+ if (bits_offset || BTF_INT_OFFSET(int_data) ||
+ BITS_PER_BYTE_MASKED(nr_bits)) {
+ btf_int_bits_seq_show(btf, t, data, bits_offset, m);
+ return;
+ }
+
+ switch (nr_bits) {
+ case 64:
+ if (sign)
+ seq_printf(m, "%lld", *(s64 *)data);
+ else
+ seq_printf(m, "%llu", *(u64 *)data);
+ break;
+ case 32:
+ if (sign)
+ seq_printf(m, "%d", *(s32 *)data);
+ else
+ seq_printf(m, "%u", *(u32 *)data);
+ break;
+ case 16:
+ if (sign)
+ seq_printf(m, "%d", *(s16 *)data);
+ else
+ seq_printf(m, "%u", *(u16 *)data);
+ break;
+ case 8:
+ if (sign)
+ seq_printf(m, "%d", *(s8 *)data);
+ else
+ seq_printf(m, "%u", *(u8 *)data);
+ break;
+ default:
+ btf_int_bits_seq_show(btf, t, data, bits_offset, m);
+ }
+}
+
+static const struct btf_kind_operations int_ops = {
+ .check_meta = btf_int_check_meta,
+ .resolve = btf_df_resolve,
+ .check_member = btf_int_check_member,
+ .log_details = btf_int_log,
+ .seq_show = btf_int_seq_show,
+};
+
+static int btf_modifier_check_member(struct btf_verifier_env *env,
+ const struct btf_type *struct_type,
+ const struct btf_member *member,
+ const struct btf_type *member_type)
+{
+ const struct btf_type *resolved_type;
+ u32 resolved_type_id = member->type;
+ struct btf_member resolved_member;
+ struct btf *btf = env->btf;
+
+ resolved_type = btf_type_id_size(btf, &resolved_type_id, NULL);
+ if (!resolved_type) {
+ btf_verifier_log_member(env, struct_type, member,
+ "Invalid member");
+ return -EINVAL;
+ }
+
+ resolved_member = *member;
+ resolved_member.type = resolved_type_id;
+
+ return btf_type_ops(resolved_type)->check_member(env, struct_type,
+ &resolved_member,
+ resolved_type);
+}
+
+static int btf_ptr_check_member(struct btf_verifier_env *env,
+ const struct btf_type *struct_type,
+ const struct btf_member *member,
+ const struct btf_type *member_type)
+{
+ u32 struct_size, struct_bits_off, bytes_offset;
+
+ struct_size = struct_type->size;
+ struct_bits_off = member->offset;
+ bytes_offset = BITS_ROUNDDOWN_BYTES(struct_bits_off);
+
+ if (BITS_PER_BYTE_MASKED(struct_bits_off)) {
+ btf_verifier_log_member(env, struct_type, member,
+ "Member is not byte aligned");
+ return -EINVAL;
+ }
+
+ if (struct_size - bytes_offset < sizeof(void *)) {
+ btf_verifier_log_member(env, struct_type, member,
+ "Member exceeds struct_size");
+ return -EINVAL;
+ }
+
+ return 0;
+}
+
+static int btf_ref_type_check_meta(struct btf_verifier_env *env,
+ const struct btf_type *t,
+ u32 meta_left)
+{
+ if (btf_type_vlen(t)) {
+ btf_verifier_log_type(env, t, "vlen != 0");
+ return -EINVAL;
+ }
+
+ if (!BTF_TYPE_ID_VALID(t->type)) {
+ btf_verifier_log_type(env, t, "Invalid type_id");
+ return -EINVAL;
+ }
+
+ btf_verifier_log_type(env, t, NULL);
+
+ return 0;
+}
+
+static int btf_modifier_resolve(struct btf_verifier_env *env,
+ const struct resolve_vertex *v)
+{
+ const struct btf_type *t = v->t;
+ const struct btf_type *next_type;
+ u32 next_type_id = t->type;
+ struct btf *btf = env->btf;
+ u32 next_type_size = 0;
+
+ next_type = btf_type_by_id(btf, next_type_id);
+ if (!next_type) {
+ btf_verifier_log_type(env, v->t, "Invalid type_id");
+ return -EINVAL;
+ }
+
+ /* "typedef void new_void", "const void"...etc */
+ if (btf_type_is_void(next_type))
+ goto resolved;
+
+ if (!env_type_is_resolve_sink(env, next_type) &&
+ !env_type_is_resolved(env, next_type_id))
+ return env_stack_push(env, next_type, next_type_id);
+
+ /* Figure out the resolved next_type_id with size.
+ * They will be stored in the current modifier's
+ * resolved_ids and resolved_sizes such that it can
+ * save us a few type-following when we use it later (e.g. in
+ * pretty print).
+ */
+ if (!btf_type_id_size(btf, &next_type_id, &next_type_size) &&
+ !btf_type_is_void(btf_type_id_resolve(btf, &next_type_id))) {
+ btf_verifier_log_type(env, v->t, "Invalid type_id");
+ return -EINVAL;
+ }
+
+resolved:
+ env_stack_pop_resolved(env, next_type_id, next_type_size);
+
+ return 0;
+}
+
+static int btf_ptr_resolve(struct btf_verifier_env *env,
+ const struct resolve_vertex *v)
+{
+ const struct btf_type *next_type;
+ const struct btf_type *t = v->t;
+ u32 next_type_id = t->type;
+ struct btf *btf = env->btf;
+ u32 next_type_size = 0;
+
+ next_type = btf_type_by_id(btf, next_type_id);
+ if (!next_type) {
+ btf_verifier_log_type(env, v->t, "Invalid type_id");
+ return -EINVAL;
+ }
+
+ /* "void *" */
+ if (btf_type_is_void(next_type))
+ goto resolved;
+
+ if (!env_type_is_resolve_sink(env, next_type) &&
+ !env_type_is_resolved(env, next_type_id))
+ return env_stack_push(env, next_type, next_type_id);
+
+ /* If the modifier was RESOLVED during RESOLVE_STRUCT_OR_ARRAY,
+ * the modifier may have stopped resolving when it was resolved
+ * to a ptr (last-resolved-ptr).
+ *
+ * We now need to continue from the last-resolved-ptr to
+ * ensure the last-resolved-ptr will not referring back to
+ * the currenct ptr (t).
+ */
+ if (btf_type_is_modifier(next_type)) {
+ const struct btf_type *resolved_type;
+ u32 resolved_type_id;
+
+ resolved_type_id = next_type_id;
+ resolved_type = btf_type_id_resolve(btf, &resolved_type_id);
+
+ if (btf_type_is_ptr(resolved_type) &&
+ !env_type_is_resolve_sink(env, resolved_type) &&
+ !env_type_is_resolved(env, resolved_type_id))
+ return env_stack_push(env, resolved_type,
+ resolved_type_id);
+ }
+
+ if (!btf_type_id_size(btf, &next_type_id, &next_type_size) &&
+ !btf_type_is_void(btf_type_id_resolve(btf, &next_type_id))) {
+ btf_verifier_log_type(env, v->t, "Invalid type_id");
+ return -EINVAL;
+ }
+
+resolved:
+ env_stack_pop_resolved(env, next_type_id, 0);
+
+ return 0;
+}
+
+static void btf_modifier_seq_show(const struct btf *btf,
+ const struct btf_type *t,
+ u32 type_id, void *data,
+ u8 bits_offset, struct seq_file *m)
+{
+ t = btf_type_id_resolve(btf, &type_id);
+
+ btf_type_ops(t)->seq_show(btf, t, type_id, data, bits_offset, m);
+}
+
+static void btf_ptr_seq_show(const struct btf *btf, const struct btf_type *t,
+ u32 type_id, void *data, u8 bits_offset,
+ struct seq_file *m)
+{
+ /* It is a hashed value */
+ seq_printf(m, "%p", *(void **)data);
+}
+
+static void btf_ref_type_log(struct btf_verifier_env *env,
+ const struct btf_type *t)
+{
+ btf_verifier_log(env, "type_id=%u", t->type);
+}
+
+static struct btf_kind_operations modifier_ops = {
+ .check_meta = btf_ref_type_check_meta,
+ .resolve = btf_modifier_resolve,
+ .check_member = btf_modifier_check_member,
+ .log_details = btf_ref_type_log,
+ .seq_show = btf_modifier_seq_show,
+};
+
+static struct btf_kind_operations ptr_ops = {
+ .check_meta = btf_ref_type_check_meta,
+ .resolve = btf_ptr_resolve,
+ .check_member = btf_ptr_check_member,
+ .log_details = btf_ref_type_log,
+ .seq_show = btf_ptr_seq_show,
+};
+
+static struct btf_kind_operations fwd_ops = {
+ .check_meta = btf_ref_type_check_meta,
+ .resolve = btf_df_resolve,
+ .check_member = btf_df_check_member,
+ .log_details = btf_ref_type_log,
+ .seq_show = btf_df_seq_show,
+};
+
+static int btf_array_check_member(struct btf_verifier_env *env,
+ const struct btf_type *struct_type,
+ const struct btf_member *member,
+ const struct btf_type *member_type)
+{
+ u32 struct_bits_off = member->offset;
+ u32 struct_size, bytes_offset;
+ u32 array_type_id, array_size;
+ struct btf *btf = env->btf;
+
+ if (BITS_PER_BYTE_MASKED(struct_bits_off)) {
+ btf_verifier_log_member(env, struct_type, member,
+ "Member is not byte aligned");
+ return -EINVAL;
+ }
+
+ array_type_id = member->type;
+ btf_type_id_size(btf, &array_type_id, &array_size);
+ struct_size = struct_type->size;
+ bytes_offset = BITS_ROUNDDOWN_BYTES(struct_bits_off);
+ if (struct_size - bytes_offset < array_size) {
+ btf_verifier_log_member(env, struct_type, member,
+ "Member exceeds struct_size");
+ return -EINVAL;
+ }
+
+ return 0;
+}
+
+static s32 btf_array_check_meta(struct btf_verifier_env *env,
+ const struct btf_type *t,
+ u32 meta_left)
+{
+ const struct btf_array *array = btf_type_array(t);
+ u32 meta_needed = sizeof(*array);
+
+ if (meta_left < meta_needed) {
+ btf_verifier_log_basic(env, t,
+ "meta_left:%u meta_needed:%u",
+ meta_left, meta_needed);
+ return -EINVAL;
+ }
+
+ if (btf_type_vlen(t)) {
+ btf_verifier_log_type(env, t, "vlen != 0");
+ return -EINVAL;
+ }
+
+ /* Array elem type and index type cannot be in type void,
+ * so !array->type and !array->index_type are not allowed.
+ */
+ if (!array->type || !BTF_TYPE_ID_VALID(array->type)) {
+ btf_verifier_log_type(env, t, "Invalid elem");
+ return -EINVAL;
+ }
+
+ if (!array->index_type || !BTF_TYPE_ID_VALID(array->index_type)) {
+ btf_verifier_log_type(env, t, "Invalid index");
+ return -EINVAL;
+ }
+
+ btf_verifier_log_type(env, t, NULL);
+
+ return meta_needed;
+}
+
+static int btf_array_resolve(struct btf_verifier_env *env,
+ const struct resolve_vertex *v)
+{
+ const struct btf_array *array = btf_type_array(v->t);
+ const struct btf_type *elem_type, *index_type;
+ u32 elem_type_id, index_type_id;
+ struct btf *btf = env->btf;
+ u32 elem_size;
+
+ /* Check array->index_type */
+ index_type_id = array->index_type;
+ index_type = btf_type_by_id(btf, index_type_id);
+ if (btf_type_is_void_or_null(index_type)) {
+ btf_verifier_log_type(env, v->t, "Invalid index");
+ return -EINVAL;
+ }
+
+ if (!env_type_is_resolve_sink(env, index_type) &&
+ !env_type_is_resolved(env, index_type_id))
+ return env_stack_push(env, index_type, index_type_id);
+
+ index_type = btf_type_id_size(btf, &index_type_id, NULL);
+ if (!index_type || !btf_type_is_int(index_type) ||
+ !btf_type_int_is_regular(index_type)) {
+ btf_verifier_log_type(env, v->t, "Invalid index");
+ return -EINVAL;
+ }
+
+ /* Check array->type */
+ elem_type_id = array->type;
+ elem_type = btf_type_by_id(btf, elem_type_id);
+ if (btf_type_is_void_or_null(elem_type)) {
+ btf_verifier_log_type(env, v->t,
+ "Invalid elem");
+ return -EINVAL;
+ }
+
+ if (!env_type_is_resolve_sink(env, elem_type) &&
+ !env_type_is_resolved(env, elem_type_id))
+ return env_stack_push(env, elem_type, elem_type_id);
+
+ elem_type = btf_type_id_size(btf, &elem_type_id, &elem_size);
+ if (!elem_type) {
+ btf_verifier_log_type(env, v->t, "Invalid elem");
+ return -EINVAL;
+ }
+
+ if (btf_type_is_int(elem_type) && !btf_type_int_is_regular(elem_type)) {
+ btf_verifier_log_type(env, v->t, "Invalid array of int");
+ return -EINVAL;
+ }
+
+ if (array->nelems && elem_size > U32_MAX / array->nelems) {
+ btf_verifier_log_type(env, v->t,
+ "Array size overflows U32_MAX");
+ return -EINVAL;
+ }
+
+ env_stack_pop_resolved(env, elem_type_id, elem_size * array->nelems);
+
+ return 0;
+}
+
+static void btf_array_log(struct btf_verifier_env *env,
+ const struct btf_type *t)
+{
+ const struct btf_array *array = btf_type_array(t);
+
+ btf_verifier_log(env, "type_id=%u index_type_id=%u nr_elems=%u",
+ array->type, array->index_type, array->nelems);
+}
+
+static void btf_array_seq_show(const struct btf *btf, const struct btf_type *t,
+ u32 type_id, void *data, u8 bits_offset,
+ struct seq_file *m)
+{
+ const struct btf_array *array = btf_type_array(t);
+ const struct btf_kind_operations *elem_ops;
+ const struct btf_type *elem_type;
+ u32 i, elem_size, elem_type_id;
+
+ elem_type_id = array->type;
+ elem_type = btf_type_id_size(btf, &elem_type_id, &elem_size);
+ elem_ops = btf_type_ops(elem_type);
+ seq_puts(m, "[");
+ for (i = 0; i < array->nelems; i++) {
+ if (i)
+ seq_puts(m, ",");
+
+ elem_ops->seq_show(btf, elem_type, elem_type_id, data,
+ bits_offset, m);
+ data += elem_size;
+ }
+ seq_puts(m, "]");
+}
+
+static struct btf_kind_operations array_ops = {
+ .check_meta = btf_array_check_meta,
+ .resolve = btf_array_resolve,
+ .check_member = btf_array_check_member,
+ .log_details = btf_array_log,
+ .seq_show = btf_array_seq_show,
+};
+
+static int btf_struct_check_member(struct btf_verifier_env *env,
+ const struct btf_type *struct_type,
+ const struct btf_member *member,
+ const struct btf_type *member_type)
+{
+ u32 struct_bits_off = member->offset;
+ u32 struct_size, bytes_offset;
+
+ if (BITS_PER_BYTE_MASKED(struct_bits_off)) {
+ btf_verifier_log_member(env, struct_type, member,
+ "Member is not byte aligned");
+ return -EINVAL;
+ }
+
+ struct_size = struct_type->size;
+ bytes_offset = BITS_ROUNDDOWN_BYTES(struct_bits_off);
+ if (struct_size - bytes_offset < member_type->size) {
+ btf_verifier_log_member(env, struct_type, member,
+ "Member exceeds struct_size");
+ return -EINVAL;
+ }
+
+ return 0;
+}
+
+static s32 btf_struct_check_meta(struct btf_verifier_env *env,
+ const struct btf_type *t,
+ u32 meta_left)
+{
+ bool is_union = BTF_INFO_KIND(t->info) == BTF_KIND_UNION;
+ const struct btf_member *member;
+ struct btf *btf = env->btf;
+ u32 struct_size = t->size;
+ u32 meta_needed;
+ u16 i;
+
+ meta_needed = btf_type_vlen(t) * sizeof(*member);
+ if (meta_left < meta_needed) {
+ btf_verifier_log_basic(env, t,
+ "meta_left:%u meta_needed:%u",
+ meta_left, meta_needed);
+ return -EINVAL;
+ }
+
+ btf_verifier_log_type(env, t, NULL);
+
+ for_each_member(i, t, member) {
+ if (!btf_name_offset_valid(btf, member->name_off)) {
+ btf_verifier_log_member(env, t, member,
+ "Invalid member name_offset:%u",
+ member->name_off);
+ return -EINVAL;
+ }
+
+ /* A member cannot be in type void */
+ if (!member->type || !BTF_TYPE_ID_VALID(member->type)) {
+ btf_verifier_log_member(env, t, member,
+ "Invalid type_id");
+ return -EINVAL;
+ }
+
+ if (is_union && member->offset) {
+ btf_verifier_log_member(env, t, member,
+ "Invalid member bits_offset");
+ return -EINVAL;
+ }
+
+ if (BITS_ROUNDUP_BYTES(member->offset) > struct_size) {
+ btf_verifier_log_member(env, t, member,
+ "Memmber bits_offset exceeds its struct size");
+ return -EINVAL;
+ }
+
+ btf_verifier_log_member(env, t, member, NULL);
+ }
+
+ return meta_needed;
+}
+
+static int btf_struct_resolve(struct btf_verifier_env *env,
+ const struct resolve_vertex *v)
+{
+ const struct btf_member *member;
+ int err;
+ u16 i;
+
+ /* Before continue resolving the next_member,
+ * ensure the last member is indeed resolved to a
+ * type with size info.
+ */
+ if (v->next_member) {
+ const struct btf_type *last_member_type;
+ const struct btf_member *last_member;
+ u16 last_member_type_id;
+
+ last_member = btf_type_member(v->t) + v->next_member - 1;
+ last_member_type_id = last_member->type;
+ if (WARN_ON_ONCE(!env_type_is_resolved(env,
+ last_member_type_id)))
+ return -EINVAL;
+
+ last_member_type = btf_type_by_id(env->btf,
+ last_member_type_id);
+ err = btf_type_ops(last_member_type)->check_member(env, v->t,
+ last_member,
+ last_member_type);
+ if (err)
+ return err;
+ }
+
+ for_each_member_from(i, v->next_member, v->t, member) {
+ u32 member_type_id = member->type;
+ const struct btf_type *member_type = btf_type_by_id(env->btf,
+ member_type_id);
+
+ if (btf_type_is_void_or_null(member_type)) {
+ btf_verifier_log_member(env, v->t, member,
+ "Invalid member");
+ return -EINVAL;
+ }
+
+ if (!env_type_is_resolve_sink(env, member_type) &&
+ !env_type_is_resolved(env, member_type_id)) {
+ env_stack_set_next_member(env, i + 1);
+ return env_stack_push(env, member_type, member_type_id);
+ }
+
+ err = btf_type_ops(member_type)->check_member(env, v->t,
+ member,
+ member_type);
+ if (err)
+ return err;
+ }
+
+ env_stack_pop_resolved(env, 0, 0);
+
+ return 0;
+}
+
+static void btf_struct_log(struct btf_verifier_env *env,
+ const struct btf_type *t)
+{
+ btf_verifier_log(env, "size=%u vlen=%u", t->size, btf_type_vlen(t));
+}
+
+static void btf_struct_seq_show(const struct btf *btf, const struct btf_type *t,
+ u32 type_id, void *data, u8 bits_offset,
+ struct seq_file *m)
+{
+ const char *seq = BTF_INFO_KIND(t->info) == BTF_KIND_UNION ? "|" : ",";
+ const struct btf_member *member;
+ u32 i;
+
+ seq_puts(m, "{");
+ for_each_member(i, t, member) {
+ const struct btf_type *member_type = btf_type_by_id(btf,
+ member->type);
+ u32 member_offset = member->offset;
+ u32 bytes_offset = BITS_ROUNDDOWN_BYTES(member_offset);
+ u8 bits8_offset = BITS_PER_BYTE_MASKED(member_offset);
+ const struct btf_kind_operations *ops;
+
+ if (i)
+ seq_puts(m, seq);
+
+ ops = btf_type_ops(member_type);
+ ops->seq_show(btf, member_type, member->type,
+ data + bytes_offset, bits8_offset, m);
+ }
+ seq_puts(m, "}");
+}
+
+static struct btf_kind_operations struct_ops = {
+ .check_meta = btf_struct_check_meta,
+ .resolve = btf_struct_resolve,
+ .check_member = btf_struct_check_member,
+ .log_details = btf_struct_log,
+ .seq_show = btf_struct_seq_show,
+};
+
+static int btf_enum_check_member(struct btf_verifier_env *env,
+ const struct btf_type *struct_type,
+ const struct btf_member *member,
+ const struct btf_type *member_type)
+{
+ u32 struct_bits_off = member->offset;
+ u32 struct_size, bytes_offset;
+
+ if (BITS_PER_BYTE_MASKED(struct_bits_off)) {
+ btf_verifier_log_member(env, struct_type, member,
+ "Member is not byte aligned");
+ return -EINVAL;
+ }
+
+ struct_size = struct_type->size;
+ bytes_offset = BITS_ROUNDDOWN_BYTES(struct_bits_off);
+ if (struct_size - bytes_offset < sizeof(int)) {
+ btf_verifier_log_member(env, struct_type, member,
+ "Member exceeds struct_size");
+ return -EINVAL;
+ }
+
+ return 0;
+}
+
+static s32 btf_enum_check_meta(struct btf_verifier_env *env,
+ const struct btf_type *t,
+ u32 meta_left)
+{
+ const struct btf_enum *enums = btf_type_enum(t);
+ struct btf *btf = env->btf;
+ u16 i, nr_enums;
+ u32 meta_needed;
+
+ nr_enums = btf_type_vlen(t);
+ meta_needed = nr_enums * sizeof(*enums);
+
+ if (meta_left < meta_needed) {
+ btf_verifier_log_basic(env, t,
+ "meta_left:%u meta_needed:%u",
+ meta_left, meta_needed);
+ return -EINVAL;
+ }
+
+ if (t->size != sizeof(int)) {
+ btf_verifier_log_type(env, t, "Expected size:%zu",
+ sizeof(int));
+ return -EINVAL;
+ }
+
+ btf_verifier_log_type(env, t, NULL);
+
+ for (i = 0; i < nr_enums; i++) {
+ if (!btf_name_offset_valid(btf, enums[i].name_off)) {
+ btf_verifier_log(env, "\tInvalid name_offset:%u",
+ enums[i].name_off);
+ return -EINVAL;
+ }
+
+ btf_verifier_log(env, "\t%s val=%d\n",
+ btf_name_by_offset(btf, enums[i].name_off),
+ enums[i].val);
+ }
+
+ return meta_needed;
+}
+
+static void btf_enum_log(struct btf_verifier_env *env,
+ const struct btf_type *t)
+{
+ btf_verifier_log(env, "size=%u vlen=%u", t->size, btf_type_vlen(t));
+}
+
+static void btf_enum_seq_show(const struct btf *btf, const struct btf_type *t,
+ u32 type_id, void *data, u8 bits_offset,
+ struct seq_file *m)
+{
+ const struct btf_enum *enums = btf_type_enum(t);
+ u32 i, nr_enums = btf_type_vlen(t);
+ int v = *(int *)data;
+
+ for (i = 0; i < nr_enums; i++) {
+ if (v == enums[i].val) {
+ seq_printf(m, "%s",
+ btf_name_by_offset(btf, enums[i].name_off));
+ return;
+ }
+ }
+
+ seq_printf(m, "%d", v);
+}
+
+static struct btf_kind_operations enum_ops = {
+ .check_meta = btf_enum_check_meta,
+ .resolve = btf_df_resolve,
+ .check_member = btf_enum_check_member,
+ .log_details = btf_enum_log,
+ .seq_show = btf_enum_seq_show,
+};
+
+static const struct btf_kind_operations * const kind_ops[NR_BTF_KINDS] = {
+ [BTF_KIND_INT] = &int_ops,
+ [BTF_KIND_PTR] = &ptr_ops,
+ [BTF_KIND_ARRAY] = &array_ops,
+ [BTF_KIND_STRUCT] = &struct_ops,
+ [BTF_KIND_UNION] = &struct_ops,
+ [BTF_KIND_ENUM] = &enum_ops,
+ [BTF_KIND_FWD] = &fwd_ops,
+ [BTF_KIND_TYPEDEF] = &modifier_ops,
+ [BTF_KIND_VOLATILE] = &modifier_ops,
+ [BTF_KIND_CONST] = &modifier_ops,
+ [BTF_KIND_RESTRICT] = &modifier_ops,
+};
+
+static s32 btf_check_meta(struct btf_verifier_env *env,
+ const struct btf_type *t,
+ u32 meta_left)
+{
+ u32 saved_meta_left = meta_left;
+ s32 var_meta_size;
+
+ if (meta_left < sizeof(*t)) {
+ btf_verifier_log(env, "[%u] meta_left:%u meta_needed:%zu",
+ env->log_type_id, meta_left, sizeof(*t));
+ return -EINVAL;
+ }
+ meta_left -= sizeof(*t);
+
+ if (t->info & ~BTF_INFO_MASK) {
+ btf_verifier_log(env, "[%u] Invalid btf_info:%x",
+ env->log_type_id, t->info);
+ return -EINVAL;
+ }
+
+ if (BTF_INFO_KIND(t->info) > BTF_KIND_MAX ||
+ BTF_INFO_KIND(t->info) == BTF_KIND_UNKN) {
+ btf_verifier_log(env, "[%u] Invalid kind:%u",
+ env->log_type_id, BTF_INFO_KIND(t->info));
+ return -EINVAL;
+ }
+
+ if (!btf_name_offset_valid(env->btf, t->name_off)) {
+ btf_verifier_log(env, "[%u] Invalid name_offset:%u",
+ env->log_type_id, t->name_off);
+ return -EINVAL;
+ }
+
+ var_meta_size = btf_type_ops(t)->check_meta(env, t, meta_left);
+ if (var_meta_size < 0)
+ return var_meta_size;
+
+ meta_left -= var_meta_size;
+
+ return saved_meta_left - meta_left;
+}
+
+static int btf_check_all_metas(struct btf_verifier_env *env)
+{
+ struct btf *btf = env->btf;
+ struct btf_header *hdr;
+ void *cur, *end;
+
+ hdr = &btf->hdr;
+ cur = btf->nohdr_data + hdr->type_off;
+ end = btf->nohdr_data + hdr->type_len;
+
+ env->log_type_id = 1;
+ while (cur < end) {
+ struct btf_type *t = cur;
+ s32 meta_size;
+
+ meta_size = btf_check_meta(env, t, end - cur);
+ if (meta_size < 0)
+ return meta_size;
+
+ btf_add_type(env, t);
+ cur += meta_size;
+ env->log_type_id++;
+ }
+
+ return 0;
+}
+
+static int btf_resolve(struct btf_verifier_env *env,
+ const struct btf_type *t, u32 type_id)
+{
+ const struct resolve_vertex *v;
+ int err = 0;
+
+ env->resolve_mode = RESOLVE_TBD;
+ env_stack_push(env, t, type_id);
+ while (!err && (v = env_stack_peak(env))) {
+ env->log_type_id = v->type_id;
+ err = btf_type_ops(v->t)->resolve(env, v);
+ }
+
+ env->log_type_id = type_id;
+ if (err == -E2BIG)
+ btf_verifier_log_type(env, t,
+ "Exceeded max resolving depth:%u",
+ MAX_RESOLVE_DEPTH);
+ else if (err == -EEXIST)
+ btf_verifier_log_type(env, t, "Loop detected");
+
+ return err;
+}
+
+static bool btf_resolve_valid(struct btf_verifier_env *env,
+ const struct btf_type *t,
+ u32 type_id)
+{
+ struct btf *btf = env->btf;
+
+ if (!env_type_is_resolved(env, type_id))
+ return false;
+
+ if (btf_type_is_struct(t))
+ return !btf->resolved_ids[type_id] &&
+ !btf->resolved_sizes[type_id];
+
+ if (btf_type_is_modifier(t) || btf_type_is_ptr(t)) {
+ t = btf_type_id_resolve(btf, &type_id);
+ return t && !btf_type_is_modifier(t);
+ }
+
+ if (btf_type_is_array(t)) {
+ const struct btf_array *array = btf_type_array(t);
+ const struct btf_type *elem_type;
+ u32 elem_type_id = array->type;
+ u32 elem_size;
+
+ elem_type = btf_type_id_size(btf, &elem_type_id, &elem_size);
+ return elem_type && !btf_type_is_modifier(elem_type) &&
+ (array->nelems * elem_size ==
+ btf->resolved_sizes[type_id]);
+ }
+
+ return false;
+}
+
+static int btf_check_all_types(struct btf_verifier_env *env)
+{
+ struct btf *btf = env->btf;
+ u32 type_id;
+ int err;
+
+ err = env_resolve_init(env);
+ if (err)
+ return err;
+
+ env->phase++;
+ for (type_id = 1; type_id <= btf->nr_types; type_id++) {
+ const struct btf_type *t = btf_type_by_id(btf, type_id);
+
+ env->log_type_id = type_id;
+ if (btf_type_needs_resolve(t) &&
+ !env_type_is_resolved(env, type_id)) {
+ err = btf_resolve(env, t, type_id);
+ if (err)
+ return err;
+ }
+
+ if (btf_type_needs_resolve(t) &&
+ !btf_resolve_valid(env, t, type_id)) {
+ btf_verifier_log_type(env, t, "Invalid resolve state");
+ return -EINVAL;
+ }
+ }
+
+ return 0;
+}
+
+static int btf_parse_type_sec(struct btf_verifier_env *env)
+{
+ const struct btf_header *hdr = &env->btf->hdr;
+ int err;
+
+ /* Type section must align to 4 bytes */
+ if (hdr->type_off & (sizeof(u32) - 1)) {
+ btf_verifier_log(env, "Unaligned type_off");
+ return -EINVAL;
+ }
+
+ if (!hdr->type_len) {
+ btf_verifier_log(env, "No type found");
+ return -EINVAL;
+ }
+
+ err = btf_check_all_metas(env);
+ if (err)
+ return err;
+
+ return btf_check_all_types(env);
+}
+
+static int btf_parse_str_sec(struct btf_verifier_env *env)
+{
+ const struct btf_header *hdr;
+ struct btf *btf = env->btf;
+ const char *start, *end;
+
+ hdr = &btf->hdr;
+ start = btf->nohdr_data + hdr->str_off;
+ end = start + hdr->str_len;
+
+ if (end != btf->data + btf->data_size) {
+ btf_verifier_log(env, "String section is not at the end");
+ return -EINVAL;
+ }
+
+ if (!hdr->str_len || hdr->str_len - 1 > BTF_MAX_NAME_OFFSET ||
+ start[0] || end[-1]) {
+ btf_verifier_log(env, "Invalid string section");
+ return -EINVAL;
+ }
+
+ btf->strings = start;
+
+ return 0;
+}
+
+static const size_t btf_sec_info_offset[] = {
+ offsetof(struct btf_header, type_off),
+ offsetof(struct btf_header, str_off),
+};
+
+static int btf_sec_info_cmp(const void *a, const void *b)
+{
+ const struct btf_sec_info *x = a;
+ const struct btf_sec_info *y = b;
+
+ return (int)(x->off - y->off) ? : (int)(x->len - y->len);
+}
+
+static int btf_check_sec_info(struct btf_verifier_env *env,
+ u32 btf_data_size)
+{
+ struct btf_sec_info secs[ARRAY_SIZE(btf_sec_info_offset)];
+ u32 total, expected_total, i;
+ const struct btf_header *hdr;
+ const struct btf *btf;
+
+ btf = env->btf;
+ hdr = &btf->hdr;
+
+ /* Populate the secs from hdr */
+ for (i = 0; i < ARRAY_SIZE(btf_sec_info_offset); i++)
+ secs[i] = *(struct btf_sec_info *)((void *)hdr +
+ btf_sec_info_offset[i]);
+
+ sort(secs, ARRAY_SIZE(btf_sec_info_offset),
+ sizeof(struct btf_sec_info), btf_sec_info_cmp, NULL);
+
+ /* Check for gaps and overlap among sections */
+ total = 0;
+ expected_total = btf_data_size - hdr->hdr_len;
+ for (i = 0; i < ARRAY_SIZE(btf_sec_info_offset); i++) {
+ if (expected_total < secs[i].off) {
+ btf_verifier_log(env, "Invalid section offset");
+ return -EINVAL;
+ }
+ if (total < secs[i].off) {
+ /* gap */
+ btf_verifier_log(env, "Unsupported section found");
+ return -EINVAL;
+ }
+ if (total > secs[i].off) {
+ btf_verifier_log(env, "Section overlap found");
+ return -EINVAL;
+ }
+ if (expected_total - total < secs[i].len) {
+ btf_verifier_log(env,
+ "Total section length too long");
+ return -EINVAL;
+ }
+ total += secs[i].len;
+ }
+
+ /* There is data other than hdr and known sections */
+ if (expected_total != total) {
+ btf_verifier_log(env, "Unsupported section found");
+ return -EINVAL;
+ }
+
+ return 0;
+}
+
+static int btf_parse_hdr(struct btf_verifier_env *env, void __user *btf_data,
+ u32 btf_data_size)
+{
+ const struct btf_header *hdr;
+ u32 hdr_len, hdr_copy;
+ /*
+ * Minimal part of the "struct btf_header" that
+ * contains the hdr_len.
+ */
+ struct btf_min_header {
+ u16 magic;
+ u8 version;
+ u8 flags;
+ u32 hdr_len;
+ } __user *min_hdr;
+ struct btf *btf;
+ int err;
+
+ btf = env->btf;
+ min_hdr = btf_data;
+
+ if (btf_data_size < sizeof(*min_hdr)) {
+ btf_verifier_log(env, "hdr_len not found");
+ return -EINVAL;
+ }
+
+ if (get_user(hdr_len, &min_hdr->hdr_len))
+ return -EFAULT;
+
+ if (btf_data_size < hdr_len) {
+ btf_verifier_log(env, "btf_header not found");
+ return -EINVAL;
+ }
+
+ err = bpf_check_uarg_tail_zero(btf_data, sizeof(btf->hdr), hdr_len);
+ if (err) {
+ if (err == -E2BIG)
+ btf_verifier_log(env, "Unsupported btf_header");
+ return err;
+ }
+
+ hdr_copy = min_t(u32, hdr_len, sizeof(btf->hdr));
+ if (copy_from_user(&btf->hdr, btf_data, hdr_copy))
+ return -EFAULT;
+
+ hdr = &btf->hdr;
+
+ btf_verifier_log_hdr(env, btf_data_size);
+
+ if (hdr->magic != BTF_MAGIC) {
+ btf_verifier_log(env, "Invalid magic");
+ return -EINVAL;
+ }
+
+ if (hdr->version != BTF_VERSION) {
+ btf_verifier_log(env, "Unsupported version");
+ return -ENOTSUPP;
+ }
+
+ if (hdr->flags) {
+ btf_verifier_log(env, "Unsupported flags");
+ return -ENOTSUPP;
+ }
+
+ if (btf_data_size == hdr->hdr_len) {
+ btf_verifier_log(env, "No data");
+ return -EINVAL;
+ }
+
+ err = btf_check_sec_info(env, btf_data_size);
+ if (err)
+ return err;
+
+ return 0;
+}
+
+static struct btf *btf_parse(void __user *btf_data, u32 btf_data_size,
+ u32 log_level, char __user *log_ubuf, u32 log_size)
+{
+ struct btf_verifier_env *env = NULL;
+ struct bpf_verifier_log *log;
+ struct btf *btf = NULL;
+ u8 *data;
+ int err;
+
+ if (btf_data_size > BTF_MAX_SIZE)
+ return ERR_PTR(-E2BIG);
+
+ env = kzalloc(sizeof(*env), GFP_KERNEL | __GFP_NOWARN);
+ if (!env)
+ return ERR_PTR(-ENOMEM);
+
+ log = &env->log;
+ if (log_level || log_ubuf || log_size) {
+ /* user requested verbose verifier output
+ * and supplied buffer to store the verification trace
+ */
+ log->level = log_level;
+ log->ubuf = log_ubuf;
+ log->len_total = log_size;
+
+ /* log attributes have to be sane */
+ if (log->len_total < 128 || log->len_total > UINT_MAX >> 8 ||
+ !log->level || !log->ubuf) {
+ err = -EINVAL;
+ goto errout;
+ }
+ }
+
+ btf = kzalloc(sizeof(*btf), GFP_KERNEL | __GFP_NOWARN);
+ if (!btf) {
+ err = -ENOMEM;
+ goto errout;
+ }
+ env->btf = btf;
+
+ err = btf_parse_hdr(env, btf_data, btf_data_size);
+ if (err)
+ goto errout;
+
+ data = kvmalloc(btf_data_size, GFP_KERNEL | __GFP_NOWARN);
+ if (!data) {
+ err = -ENOMEM;
+ goto errout;
+ }
+
+ btf->data = data;
+ btf->data_size = btf_data_size;
+ btf->nohdr_data = btf->data + btf->hdr.hdr_len;
+
+ if (copy_from_user(data, btf_data, btf_data_size)) {
+ err = -EFAULT;
+ goto errout;
+ }
+
+ err = btf_parse_str_sec(env);
+ if (err)
+ goto errout;
+
+ err = btf_parse_type_sec(env);
+ if (err)
+ goto errout;
+
+ if (log->level && bpf_verifier_log_full(log)) {
+ err = -ENOSPC;
+ goto errout;
+ }
+
+ btf_verifier_env_free(env);
+ refcount_set(&btf->refcnt, 1);
+ return btf;
+
+errout:
+ btf_verifier_env_free(env);
+ if (btf)
+ btf_free(btf);
+ return ERR_PTR(err);
+}
+
+void btf_type_seq_show(const struct btf *btf, u32 type_id, void *obj,
+ struct seq_file *m)
+{
+ const struct btf_type *t = btf_type_by_id(btf, type_id);
+
+ btf_type_ops(t)->seq_show(btf, t, type_id, obj, 0, m);
+}
+
+static int btf_release(struct inode *inode, struct file *filp)
+{
+ btf_put(filp->private_data);
+ return 0;
+}
+
+const struct file_operations btf_fops = {
+ .release = btf_release,
+};
+
+static int __btf_new_fd(struct btf *btf)
+{
+ return anon_inode_getfd("btf", &btf_fops, btf, O_RDONLY | O_CLOEXEC);
+}
+
+int btf_new_fd(const union bpf_attr *attr)
+{
+ struct btf *btf;
+ int ret;
+
+ btf = btf_parse(u64_to_user_ptr(attr->btf),
+ attr->btf_size, attr->btf_log_level,
+ u64_to_user_ptr(attr->btf_log_buf),
+ attr->btf_log_size);
+ if (IS_ERR(btf))
+ return PTR_ERR(btf);
+
+ ret = btf_alloc_id(btf);
+ if (ret) {
+ btf_free(btf);
+ return ret;
+ }
+
+ /*
+ * The BTF ID is published to the userspace.
+ * All BTF free must go through call_rcu() from
+ * now on (i.e. free by calling btf_put()).
+ */
+
+ ret = __btf_new_fd(btf);
+ if (ret < 0)
+ btf_put(btf);
+
+ return ret;
+}
+
+struct btf *btf_get_by_fd(int fd)
+{
+ struct btf *btf;
+ struct fd f;
+
+ f = fdget(fd);
+
+ if (!f.file)
+ return ERR_PTR(-EBADF);
+
+ if (f.file->f_op != &btf_fops) {
+ fdput(f);
+ return ERR_PTR(-EINVAL);
+ }
+
+ btf = f.file->private_data;
+ refcount_inc(&btf->refcnt);
+ fdput(f);
+
+ return btf;
+}
+
+int btf_get_info_by_fd(const struct btf *btf,
+ const union bpf_attr *attr,
+ union bpf_attr __user *uattr)
+{
+ struct bpf_btf_info __user *uinfo;
+ struct bpf_btf_info info = {};
+ u32 info_copy, btf_copy;
+ void __user *ubtf;
+ u32 uinfo_len;
+
+ uinfo = u64_to_user_ptr(attr->info.info);
+ uinfo_len = attr->info.info_len;
+
+ info_copy = min_t(u32, uinfo_len, sizeof(info));
+ if (copy_from_user(&info, uinfo, info_copy))
+ return -EFAULT;
+
+ info.id = btf->id;
+ ubtf = u64_to_user_ptr(info.btf);
+ btf_copy = min_t(u32, btf->data_size, info.btf_size);
+ if (copy_to_user(ubtf, btf->data, btf_copy))
+ return -EFAULT;
+ info.btf_size = btf->data_size;
+
+ if (copy_to_user(uinfo, &info, info_copy) ||
+ put_user(info_copy, &uattr->info.info_len))
+ return -EFAULT;
+
+ return 0;
+}
+
+int btf_get_fd_by_id(u32 id)
+{
+ struct btf *btf;
+ int fd;
+
+ rcu_read_lock();
+ btf = idr_find(&btf_idr, id);
+ if (!btf || !refcount_inc_not_zero(&btf->refcnt))
+ btf = ERR_PTR(-ENOENT);
+ rcu_read_unlock();
+
+ if (IS_ERR(btf))
+ return PTR_ERR(btf);
+
+ fd = __btf_new_fd(btf);
+ if (fd < 0)
+ btf_put(btf);
+
+ return fd;
+}
+
+u32 btf_id(const struct btf *btf)
+{
+ return btf->id;
+}
diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index 6ef6746..b574ddd 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -31,6 +31,7 @@
#include <linux/rbtree_latch.h>
#include <linux/kallsyms.h>
#include <linux/rcupdate.h>
+#include <linux/perf_event.h>
#include <asm/unaligned.h>
@@ -683,23 +684,6 @@
*to++ = BPF_JMP_REG(from->code, from->dst_reg, BPF_REG_AX, off);
break;
- case BPF_LD | BPF_ABS | BPF_W:
- case BPF_LD | BPF_ABS | BPF_H:
- case BPF_LD | BPF_ABS | BPF_B:
- *to++ = BPF_ALU64_IMM(BPF_MOV, BPF_REG_AX, imm_rnd ^ from->imm);
- *to++ = BPF_ALU64_IMM(BPF_XOR, BPF_REG_AX, imm_rnd);
- *to++ = BPF_LD_IND(from->code, BPF_REG_AX, 0);
- break;
-
- case BPF_LD | BPF_IND | BPF_W:
- case BPF_LD | BPF_IND | BPF_H:
- case BPF_LD | BPF_IND | BPF_B:
- *to++ = BPF_ALU64_IMM(BPF_MOV, BPF_REG_AX, imm_rnd ^ from->imm);
- *to++ = BPF_ALU64_IMM(BPF_XOR, BPF_REG_AX, imm_rnd);
- *to++ = BPF_ALU32_REG(BPF_ADD, BPF_REG_AX, from->src_reg);
- *to++ = BPF_LD_IND(from->code, BPF_REG_AX, 0);
- break;
-
case BPF_LD | BPF_IMM | BPF_DW:
*to++ = BPF_ALU64_IMM(BPF_MOV, BPF_REG_AX, imm_rnd ^ aux[1].imm);
*to++ = BPF_ALU64_IMM(BPF_XOR, BPF_REG_AX, imm_rnd);
@@ -940,14 +924,7 @@
INSN_3(LDX, MEM, W), \
INSN_3(LDX, MEM, DW), \
/* Immediate based. */ \
- INSN_3(LD, IMM, DW), \
- /* Misc (old cBPF carry-over). */ \
- INSN_3(LD, ABS, B), \
- INSN_3(LD, ABS, H), \
- INSN_3(LD, ABS, W), \
- INSN_3(LD, IND, B), \
- INSN_3(LD, IND, H), \
- INSN_3(LD, IND, W)
+ INSN_3(LD, IMM, DW)
bool bpf_opcode_in_insntable(u8 code)
{
@@ -957,6 +934,13 @@
[0 ... 255] = false,
/* Now overwrite non-defaults ... */
BPF_INSN_MAP(BPF_INSN_2_TBL, BPF_INSN_3_TBL),
+ /* UAPI exposed, but rewritten opcodes. cBPF carry-over. */
+ [BPF_LD | BPF_ABS | BPF_B] = true,
+ [BPF_LD | BPF_ABS | BPF_H] = true,
+ [BPF_LD | BPF_ABS | BPF_W] = true,
+ [BPF_LD | BPF_IND | BPF_B] = true,
+ [BPF_LD | BPF_IND | BPF_H] = true,
+ [BPF_LD | BPF_IND | BPF_W] = true,
};
#undef BPF_INSN_3_TBL
#undef BPF_INSN_2_TBL
@@ -987,8 +971,6 @@
#undef BPF_INSN_3_LBL
#undef BPF_INSN_2_LBL
u32 tail_call_cnt = 0;
- void *ptr;
- int off;
#define CONT ({ insn++; goto select_insn; })
#define CONT_JMP ({ insn++; goto select_insn; })
@@ -1315,67 +1297,6 @@
atomic64_add((u64) SRC, (atomic64_t *)(unsigned long)
(DST + insn->off));
CONT;
- LD_ABS_W: /* BPF_R0 = ntohl(*(u32 *) (skb->data + imm32)) */
- off = IMM;
-load_word:
- /* BPF_LD + BPD_ABS and BPF_LD + BPF_IND insns are only
- * appearing in the programs where ctx == skb
- * (see may_access_skb() in the verifier). All programs
- * keep 'ctx' in regs[BPF_REG_CTX] == BPF_R6,
- * bpf_convert_filter() saves it in BPF_R6, internal BPF
- * verifier will check that BPF_R6 == ctx.
- *
- * BPF_ABS and BPF_IND are wrappers of function calls,
- * so they scratch BPF_R1-BPF_R5 registers, preserve
- * BPF_R6-BPF_R9, and store return value into BPF_R0.
- *
- * Implicit input:
- * ctx == skb == BPF_R6 == CTX
- *
- * Explicit input:
- * SRC == any register
- * IMM == 32-bit immediate
- *
- * Output:
- * BPF_R0 - 8/16/32-bit skb data converted to cpu endianness
- */
-
- ptr = bpf_load_pointer((struct sk_buff *) (unsigned long) CTX, off, 4, &tmp);
- if (likely(ptr != NULL)) {
- BPF_R0 = get_unaligned_be32(ptr);
- CONT;
- }
-
- return 0;
- LD_ABS_H: /* BPF_R0 = ntohs(*(u16 *) (skb->data + imm32)) */
- off = IMM;
-load_half:
- ptr = bpf_load_pointer((struct sk_buff *) (unsigned long) CTX, off, 2, &tmp);
- if (likely(ptr != NULL)) {
- BPF_R0 = get_unaligned_be16(ptr);
- CONT;
- }
-
- return 0;
- LD_ABS_B: /* BPF_R0 = *(u8 *) (skb->data + imm32) */
- off = IMM;
-load_byte:
- ptr = bpf_load_pointer((struct sk_buff *) (unsigned long) CTX, off, 1, &tmp);
- if (likely(ptr != NULL)) {
- BPF_R0 = *(u8 *)ptr;
- CONT;
- }
-
- return 0;
- LD_IND_W: /* BPF_R0 = ntohl(*(u32 *) (skb->data + src_reg + imm32)) */
- off = IMM + SRC;
- goto load_word;
- LD_IND_H: /* BPF_R0 = ntohs(*(u16 *) (skb->data + src_reg + imm32)) */
- off = IMM + SRC;
- goto load_half;
- LD_IND_B: /* BPF_R0 = *(u8 *) (skb->data + src_reg + imm32) */
- off = IMM + SRC;
- goto load_byte;
default_label:
/* If we ever reach this, we have a bug somewhere. Die hard here
@@ -1772,6 +1693,10 @@
aux = container_of(work, struct bpf_prog_aux, work);
if (bpf_prog_is_dev_bound(aux))
bpf_prog_offload_destroy(aux->prog);
+#ifdef CONFIG_PERF_EVENTS
+ if (aux->prog->has_callchain_buf)
+ put_callchain_buffers();
+#endif
for (i = 0; i < aux->func_cnt; i++)
bpf_jit_free(aux->func[i]);
if (aux->func_cnt) {
@@ -1832,6 +1757,7 @@
const struct bpf_func_proto bpf_get_current_uid_gid_proto __weak;
const struct bpf_func_proto bpf_get_current_comm_proto __weak;
const struct bpf_func_proto bpf_sock_map_update_proto __weak;
+const struct bpf_func_proto bpf_sock_hash_update_proto __weak;
const struct bpf_func_proto * __weak bpf_get_trace_printk_proto(void)
{
@@ -1844,6 +1770,7 @@
{
return -ENOTSUPP;
}
+EXPORT_SYMBOL_GPL(bpf_event_output);
/* Always built-in helper functions. */
const struct bpf_func_proto bpf_tail_call_proto = {
@@ -1890,9 +1817,3 @@
#include <linux/bpf_trace.h>
EXPORT_TRACEPOINT_SYMBOL_GPL(xdp_exception);
-
-/* These are only used within the BPF_SYSCALL code */
-#ifdef CONFIG_BPF_SYSCALL
-EXPORT_TRACEPOINT_SYMBOL_GPL(bpf_prog_get_type);
-EXPORT_TRACEPOINT_SYMBOL_GPL(bpf_prog_put_rcu);
-#endif
diff --git a/kernel/bpf/cpumap.c b/kernel/bpf/cpumap.c
index a4bb0b3..e0918d1 100644
--- a/kernel/bpf/cpumap.c
+++ b/kernel/bpf/cpumap.c
@@ -19,6 +19,7 @@
#include <linux/bpf.h>
#include <linux/filter.h>
#include <linux/ptr_ring.h>
+#include <net/xdp.h>
#include <linux/sched.h>
#include <linux/workqueue.h>
@@ -137,27 +138,6 @@
return ERR_PTR(err);
}
-static void __cpu_map_queue_destructor(void *ptr)
-{
- /* The tear-down procedure should have made sure that queue is
- * empty. See __cpu_map_entry_replace() and work-queue
- * invoked cpu_map_kthread_stop(). Catch any broken behaviour
- * gracefully and warn once.
- */
- if (WARN_ON_ONCE(ptr))
- page_frag_free(ptr);
-}
-
-static void put_cpu_map_entry(struct bpf_cpu_map_entry *rcpu)
-{
- if (atomic_dec_and_test(&rcpu->refcnt)) {
- /* The queue should be empty at this point */
- ptr_ring_cleanup(rcpu->queue, __cpu_map_queue_destructor);
- kfree(rcpu->queue);
- kfree(rcpu);
- }
-}
-
static void get_cpu_map_entry(struct bpf_cpu_map_entry *rcpu)
{
atomic_inc(&rcpu->refcnt);
@@ -179,45 +159,8 @@
kthread_stop(rcpu->kthread);
}
-/* For now, xdp_pkt is a cpumap internal data structure, with info
- * carried between enqueue to dequeue. It is mapped into the top
- * headroom of the packet, to avoid allocating separate mem.
- */
-struct xdp_pkt {
- void *data;
- u16 len;
- u16 headroom;
- u16 metasize;
- struct net_device *dev_rx;
-};
-
-/* Convert xdp_buff to xdp_pkt */
-static struct xdp_pkt *convert_to_xdp_pkt(struct xdp_buff *xdp)
-{
- struct xdp_pkt *xdp_pkt;
- int metasize;
- int headroom;
-
- /* Assure headroom is available for storing info */
- headroom = xdp->data - xdp->data_hard_start;
- metasize = xdp->data - xdp->data_meta;
- metasize = metasize > 0 ? metasize : 0;
- if (unlikely((headroom - metasize) < sizeof(*xdp_pkt)))
- return NULL;
-
- /* Store info in top of packet */
- xdp_pkt = xdp->data_hard_start;
-
- xdp_pkt->data = xdp->data;
- xdp_pkt->len = xdp->data_end - xdp->data;
- xdp_pkt->headroom = headroom - sizeof(*xdp_pkt);
- xdp_pkt->metasize = metasize;
-
- return xdp_pkt;
-}
-
static struct sk_buff *cpu_map_build_skb(struct bpf_cpu_map_entry *rcpu,
- struct xdp_pkt *xdp_pkt)
+ struct xdp_frame *xdpf)
{
unsigned int frame_size;
void *pkt_data_start;
@@ -232,7 +175,7 @@
* would be preferred to set frame_size to 2048 or 4096
* depending on the driver.
* frame_size = 2048;
- * frame_len = frame_size - sizeof(*xdp_pkt);
+ * frame_len = frame_size - sizeof(*xdp_frame);
*
* Instead, with info avail, skb_shared_info in placed after
* packet len. This, unfortunately fakes the truesize.
@@ -240,21 +183,21 @@
* is not at a fixed memory location, with mixed length
* packets, which is bad for cache-line hotness.
*/
- frame_size = SKB_DATA_ALIGN(xdp_pkt->len) + xdp_pkt->headroom +
+ frame_size = SKB_DATA_ALIGN(xdpf->len) + xdpf->headroom +
SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
- pkt_data_start = xdp_pkt->data - xdp_pkt->headroom;
+ pkt_data_start = xdpf->data - xdpf->headroom;
skb = build_skb(pkt_data_start, frame_size);
if (!skb)
return NULL;
- skb_reserve(skb, xdp_pkt->headroom);
- __skb_put(skb, xdp_pkt->len);
- if (xdp_pkt->metasize)
- skb_metadata_set(skb, xdp_pkt->metasize);
+ skb_reserve(skb, xdpf->headroom);
+ __skb_put(skb, xdpf->len);
+ if (xdpf->metasize)
+ skb_metadata_set(skb, xdpf->metasize);
/* Essential SKB info: protocol and skb->dev */
- skb->protocol = eth_type_trans(skb, xdp_pkt->dev_rx);
+ skb->protocol = eth_type_trans(skb, xdpf->dev_rx);
/* Optional SKB info, currently missing:
* - HW checksum info (skb->ip_summed)
@@ -265,6 +208,31 @@
return skb;
}
+static void __cpu_map_ring_cleanup(struct ptr_ring *ring)
+{
+ /* The tear-down procedure should have made sure that queue is
+ * empty. See __cpu_map_entry_replace() and work-queue
+ * invoked cpu_map_kthread_stop(). Catch any broken behaviour
+ * gracefully and warn once.
+ */
+ struct xdp_frame *xdpf;
+
+ while ((xdpf = ptr_ring_consume(ring)))
+ if (WARN_ON_ONCE(xdpf))
+ xdp_return_frame(xdpf);
+}
+
+static void put_cpu_map_entry(struct bpf_cpu_map_entry *rcpu)
+{
+ if (atomic_dec_and_test(&rcpu->refcnt)) {
+ /* The queue should be empty at this point */
+ __cpu_map_ring_cleanup(rcpu->queue);
+ ptr_ring_cleanup(rcpu->queue, NULL);
+ kfree(rcpu->queue);
+ kfree(rcpu);
+ }
+}
+
static int cpu_map_kthread_run(void *data)
{
struct bpf_cpu_map_entry *rcpu = data;
@@ -278,7 +246,7 @@
*/
while (!kthread_should_stop() || !__ptr_ring_empty(rcpu->queue)) {
unsigned int processed = 0, drops = 0, sched = 0;
- struct xdp_pkt *xdp_pkt;
+ struct xdp_frame *xdpf;
/* Release CPU reschedule checks */
if (__ptr_ring_empty(rcpu->queue)) {
@@ -301,13 +269,13 @@
* kthread CPU pinned. Lockless access to ptr_ring
* consume side valid as no-resize allowed of queue.
*/
- while ((xdp_pkt = __ptr_ring_consume(rcpu->queue))) {
+ while ((xdpf = __ptr_ring_consume(rcpu->queue))) {
struct sk_buff *skb;
int ret;
- skb = cpu_map_build_skb(rcpu, xdp_pkt);
+ skb = cpu_map_build_skb(rcpu, xdpf);
if (!skb) {
- page_frag_free(xdp_pkt);
+ xdp_return_frame(xdpf);
continue;
}
@@ -604,13 +572,13 @@
spin_lock(&q->producer_lock);
for (i = 0; i < bq->count; i++) {
- void *xdp_pkt = bq->q[i];
+ struct xdp_frame *xdpf = bq->q[i];
int err;
- err = __ptr_ring_produce(q, xdp_pkt);
+ err = __ptr_ring_produce(q, xdpf);
if (err) {
drops++;
- page_frag_free(xdp_pkt); /* Free xdp_pkt */
+ xdp_return_frame_rx_napi(xdpf);
}
processed++;
}
@@ -625,7 +593,7 @@
/* Runs under RCU-read-side, plus in softirq under NAPI protection.
* Thus, safe percpu variable access.
*/
-static int bq_enqueue(struct bpf_cpu_map_entry *rcpu, struct xdp_pkt *xdp_pkt)
+static int bq_enqueue(struct bpf_cpu_map_entry *rcpu, struct xdp_frame *xdpf)
{
struct xdp_bulk_queue *bq = this_cpu_ptr(rcpu->bulkq);
@@ -636,28 +604,28 @@
* driver to code invoking us to finished, due to driver
* (e.g. ixgbe) recycle tricks based on page-refcnt.
*
- * Thus, incoming xdp_pkt is always queued here (else we race
+ * Thus, incoming xdp_frame is always queued here (else we race
* with another CPU on page-refcnt and remaining driver code).
* Queue time is very short, as driver will invoke flush
* operation, when completing napi->poll call.
*/
- bq->q[bq->count++] = xdp_pkt;
+ bq->q[bq->count++] = xdpf;
return 0;
}
int cpu_map_enqueue(struct bpf_cpu_map_entry *rcpu, struct xdp_buff *xdp,
struct net_device *dev_rx)
{
- struct xdp_pkt *xdp_pkt;
+ struct xdp_frame *xdpf;
- xdp_pkt = convert_to_xdp_pkt(xdp);
- if (unlikely(!xdp_pkt))
+ xdpf = convert_to_xdp_frame(xdp);
+ if (unlikely(!xdpf))
return -EOVERFLOW;
/* Info needed when constructing SKB on remote CPU */
- xdp_pkt->dev_rx = dev_rx;
+ xdpf->dev_rx = dev_rx;
- bq_enqueue(rcpu, xdp_pkt);
+ bq_enqueue(rcpu, xdpf);
return 0;
}
diff --git a/kernel/bpf/devmap.c b/kernel/bpf/devmap.c
index 565f9ec..ae16d0c 100644
--- a/kernel/bpf/devmap.c
+++ b/kernel/bpf/devmap.c
@@ -48,15 +48,25 @@
* calls will fail at this point.
*/
#include <linux/bpf.h>
+#include <net/xdp.h>
#include <linux/filter.h>
+#include <trace/events/xdp.h>
#define DEV_CREATE_FLAG_MASK \
(BPF_F_NUMA_NODE | BPF_F_RDONLY | BPF_F_WRONLY)
+#define DEV_MAP_BULK_SIZE 16
+struct xdp_bulk_queue {
+ struct xdp_frame *q[DEV_MAP_BULK_SIZE];
+ struct net_device *dev_rx;
+ unsigned int count;
+};
+
struct bpf_dtab_netdev {
- struct net_device *dev;
+ struct net_device *dev; /* must be first member, due to tracepoint */
struct bpf_dtab *dtab;
unsigned int bit;
+ struct xdp_bulk_queue __percpu *bulkq;
struct rcu_head rcu;
};
@@ -206,6 +216,50 @@
__set_bit(bit, bitmap);
}
+static int bq_xmit_all(struct bpf_dtab_netdev *obj,
+ struct xdp_bulk_queue *bq)
+{
+ struct net_device *dev = obj->dev;
+ int sent = 0, drops = 0, err = 0;
+ int i;
+
+ if (unlikely(!bq->count))
+ return 0;
+
+ for (i = 0; i < bq->count; i++) {
+ struct xdp_frame *xdpf = bq->q[i];
+
+ prefetch(xdpf);
+ }
+
+ sent = dev->netdev_ops->ndo_xdp_xmit(dev, bq->count, bq->q);
+ if (sent < 0) {
+ err = sent;
+ sent = 0;
+ goto error;
+ }
+ drops = bq->count - sent;
+out:
+ bq->count = 0;
+
+ trace_xdp_devmap_xmit(&obj->dtab->map, obj->bit,
+ sent, drops, bq->dev_rx, dev, err);
+ bq->dev_rx = NULL;
+ return 0;
+error:
+ /* If ndo_xdp_xmit fails with an errno, no frames have been
+ * xmit'ed and it's our responsibility to them free all.
+ */
+ for (i = 0; i < bq->count; i++) {
+ struct xdp_frame *xdpf = bq->q[i];
+
+ /* RX path under NAPI protection, can return frames faster */
+ xdp_return_frame_rx_napi(xdpf);
+ drops++;
+ }
+ goto out;
+}
+
/* __dev_map_flush is called from xdp_do_flush_map() which _must_ be signaled
* from the driver before returning from its napi->poll() routine. The poll()
* routine is called either from busy_poll context or net_rx_action signaled
@@ -221,6 +275,7 @@
for_each_set_bit(bit, bitmap, map->max_entries) {
struct bpf_dtab_netdev *dev = READ_ONCE(dtab->netdev_map[bit]);
+ struct xdp_bulk_queue *bq;
struct net_device *netdev;
/* This is possible if the dev entry is removed by user space
@@ -230,6 +285,9 @@
continue;
__clear_bit(bit, bitmap);
+
+ bq = this_cpu_ptr(dev->bulkq);
+ bq_xmit_all(dev, bq);
netdev = dev->dev;
if (likely(netdev->netdev_ops->ndo_xdp_flush))
netdev->netdev_ops->ndo_xdp_flush(netdev);
@@ -240,21 +298,61 @@
* update happens in parallel here a dev_put wont happen until after reading the
* ifindex.
*/
-struct net_device *__dev_map_lookup_elem(struct bpf_map *map, u32 key)
+struct bpf_dtab_netdev *__dev_map_lookup_elem(struct bpf_map *map, u32 key)
{
struct bpf_dtab *dtab = container_of(map, struct bpf_dtab, map);
- struct bpf_dtab_netdev *dev;
+ struct bpf_dtab_netdev *obj;
if (key >= map->max_entries)
return NULL;
- dev = READ_ONCE(dtab->netdev_map[key]);
- return dev ? dev->dev : NULL;
+ obj = READ_ONCE(dtab->netdev_map[key]);
+ return obj;
+}
+
+/* Runs under RCU-read-side, plus in softirq under NAPI protection.
+ * Thus, safe percpu variable access.
+ */
+static int bq_enqueue(struct bpf_dtab_netdev *obj, struct xdp_frame *xdpf,
+ struct net_device *dev_rx)
+
+{
+ struct xdp_bulk_queue *bq = this_cpu_ptr(obj->bulkq);
+
+ if (unlikely(bq->count == DEV_MAP_BULK_SIZE))
+ bq_xmit_all(obj, bq);
+
+ /* Ingress dev_rx will be the same for all xdp_frame's in
+ * bulk_queue, because bq stored per-CPU and must be flushed
+ * from net_device drivers NAPI func end.
+ */
+ if (!bq->dev_rx)
+ bq->dev_rx = dev_rx;
+
+ bq->q[bq->count++] = xdpf;
+ return 0;
+}
+
+int dev_map_enqueue(struct bpf_dtab_netdev *dst, struct xdp_buff *xdp,
+ struct net_device *dev_rx)
+{
+ struct net_device *dev = dst->dev;
+ struct xdp_frame *xdpf;
+
+ if (!dev->netdev_ops->ndo_xdp_xmit)
+ return -EOPNOTSUPP;
+
+ xdpf = convert_to_xdp_frame(xdp);
+ if (unlikely(!xdpf))
+ return -EOVERFLOW;
+
+ return bq_enqueue(dst, xdpf, dev_rx);
}
static void *dev_map_lookup_elem(struct bpf_map *map, void *key)
{
- struct net_device *dev = __dev_map_lookup_elem(map, *(u32 *)key);
+ struct bpf_dtab_netdev *obj = __dev_map_lookup_elem(map, *(u32 *)key);
+ struct net_device *dev = dev = obj ? obj->dev : NULL;
return dev ? &dev->ifindex : NULL;
}
@@ -263,13 +361,18 @@
{
if (dev->dev->netdev_ops->ndo_xdp_flush) {
struct net_device *fl = dev->dev;
+ struct xdp_bulk_queue *bq;
unsigned long *bitmap;
+
int cpu;
for_each_online_cpu(cpu) {
bitmap = per_cpu_ptr(dev->dtab->flush_needed, cpu);
__clear_bit(dev->bit, bitmap);
+ bq = per_cpu_ptr(dev->bulkq, cpu);
+ bq_xmit_all(dev, bq);
+
fl->netdev_ops->ndo_xdp_flush(dev->dev);
}
}
@@ -281,6 +384,7 @@
dev = container_of(rcu, struct bpf_dtab_netdev, rcu);
dev_map_flush_old(dev);
+ free_percpu(dev->bulkq);
dev_put(dev->dev);
kfree(dev);
}
@@ -313,6 +417,7 @@
{
struct bpf_dtab *dtab = container_of(map, struct bpf_dtab, map);
struct net *net = current->nsproxy->net_ns;
+ gfp_t gfp = GFP_ATOMIC | __GFP_NOWARN;
struct bpf_dtab_netdev *dev, *old_dev;
u32 i = *(u32 *)key;
u32 ifindex = *(u32 *)value;
@@ -327,13 +432,20 @@
if (!ifindex) {
dev = NULL;
} else {
- dev = kmalloc_node(sizeof(*dev), GFP_ATOMIC | __GFP_NOWARN,
- map->numa_node);
+ dev = kmalloc_node(sizeof(*dev), gfp, map->numa_node);
if (!dev)
return -ENOMEM;
+ dev->bulkq = __alloc_percpu_gfp(sizeof(*dev->bulkq),
+ sizeof(void *), gfp);
+ if (!dev->bulkq) {
+ kfree(dev);
+ return -ENOMEM;
+ }
+
dev->dev = dev_get_by_index(net, ifindex);
if (!dev->dev) {
+ free_percpu(dev->bulkq);
kfree(dev);
return -EINVAL;
}
@@ -405,6 +517,9 @@
static int __init dev_map_init(void)
{
+ /* Assure tracepoint shadow struct _bpf_dtab_netdev is in sync */
+ BUILD_BUG_ON(offsetof(struct bpf_dtab_netdev, dev) !=
+ offsetof(struct _bpf_dtab_netdev, dev));
register_netdevice_notifier(&dev_map_notifier);
return 0;
}
diff --git a/kernel/bpf/inode.c b/kernel/bpf/inode.c
index bf6da59..ed13645 100644
--- a/kernel/bpf/inode.c
+++ b/kernel/bpf/inode.c
@@ -150,8 +150,154 @@
return 0;
}
+struct map_iter {
+ void *key;
+ bool done;
+};
+
+static struct map_iter *map_iter(struct seq_file *m)
+{
+ return m->private;
+}
+
+static struct bpf_map *seq_file_to_map(struct seq_file *m)
+{
+ return file_inode(m->file)->i_private;
+}
+
+static void map_iter_free(struct map_iter *iter)
+{
+ if (iter) {
+ kfree(iter->key);
+ kfree(iter);
+ }
+}
+
+static struct map_iter *map_iter_alloc(struct bpf_map *map)
+{
+ struct map_iter *iter;
+
+ iter = kzalloc(sizeof(*iter), GFP_KERNEL | __GFP_NOWARN);
+ if (!iter)
+ goto error;
+
+ iter->key = kzalloc(map->key_size, GFP_KERNEL | __GFP_NOWARN);
+ if (!iter->key)
+ goto error;
+
+ return iter;
+
+error:
+ map_iter_free(iter);
+ return NULL;
+}
+
+static void *map_seq_next(struct seq_file *m, void *v, loff_t *pos)
+{
+ struct bpf_map *map = seq_file_to_map(m);
+ void *key = map_iter(m)->key;
+
+ if (map_iter(m)->done)
+ return NULL;
+
+ if (unlikely(v == SEQ_START_TOKEN))
+ goto done;
+
+ if (map->ops->map_get_next_key(map, key, key)) {
+ map_iter(m)->done = true;
+ return NULL;
+ }
+
+done:
+ ++(*pos);
+ return key;
+}
+
+static void *map_seq_start(struct seq_file *m, loff_t *pos)
+{
+ if (map_iter(m)->done)
+ return NULL;
+
+ return *pos ? map_iter(m)->key : SEQ_START_TOKEN;
+}
+
+static void map_seq_stop(struct seq_file *m, void *v)
+{
+}
+
+static int map_seq_show(struct seq_file *m, void *v)
+{
+ struct bpf_map *map = seq_file_to_map(m);
+ void *key = map_iter(m)->key;
+
+ if (unlikely(v == SEQ_START_TOKEN)) {
+ seq_puts(m, "# WARNING!! The output is for debug purpose only\n");
+ seq_puts(m, "# WARNING!! The output format will change\n");
+ } else {
+ map->ops->map_seq_show_elem(map, key, m);
+ }
+
+ return 0;
+}
+
+static const struct seq_operations bpffs_map_seq_ops = {
+ .start = map_seq_start,
+ .next = map_seq_next,
+ .show = map_seq_show,
+ .stop = map_seq_stop,
+};
+
+static int bpffs_map_open(struct inode *inode, struct file *file)
+{
+ struct bpf_map *map = inode->i_private;
+ struct map_iter *iter;
+ struct seq_file *m;
+ int err;
+
+ iter = map_iter_alloc(map);
+ if (!iter)
+ return -ENOMEM;
+
+ err = seq_open(file, &bpffs_map_seq_ops);
+ if (err) {
+ map_iter_free(iter);
+ return err;
+ }
+
+ m = file->private_data;
+ m->private = iter;
+
+ return 0;
+}
+
+static int bpffs_map_release(struct inode *inode, struct file *file)
+{
+ struct seq_file *m = file->private_data;
+
+ map_iter_free(map_iter(m));
+
+ return seq_release(inode, file);
+}
+
+/* bpffs_map_fops should only implement the basic
+ * read operation for a BPF map. The purpose is to
+ * provide a simple user intuitive way to do
+ * "cat bpffs/pathto/a-pinned-map".
+ *
+ * Other operations (e.g. write, lookup...) should be realized by
+ * the userspace tools (e.g. bpftool) through the
+ * BPF_OBJ_GET_INFO_BY_FD and the map's lookup/update
+ * interface.
+ */
+static const struct file_operations bpffs_map_fops = {
+ .open = bpffs_map_open,
+ .read = seq_read,
+ .release = bpffs_map_release,
+};
+
static int bpf_mkobj_ops(struct dentry *dentry, umode_t mode, void *raw,
- const struct inode_operations *iops)
+ const struct inode_operations *iops,
+ const struct file_operations *fops)
{
struct inode *dir = dentry->d_parent->d_inode;
struct inode *inode = bpf_get_inode(dir->i_sb, dir, mode);
@@ -159,6 +305,7 @@
return PTR_ERR(inode);
inode->i_op = iops;
+ inode->i_fop = fops;
inode->i_private = raw;
bpf_dentry_finalize(dentry, inode, dir);
@@ -167,12 +314,15 @@
static int bpf_mkprog(struct dentry *dentry, umode_t mode, void *arg)
{
- return bpf_mkobj_ops(dentry, mode, arg, &bpf_prog_iops);
+ return bpf_mkobj_ops(dentry, mode, arg, &bpf_prog_iops, NULL);
}
static int bpf_mkmap(struct dentry *dentry, umode_t mode, void *arg)
{
- return bpf_mkobj_ops(dentry, mode, arg, &bpf_map_iops);
+ struct bpf_map *map = arg;
+
+ return bpf_mkobj_ops(dentry, mode, arg, &bpf_map_iops,
+ map->btf ? &bpffs_map_fops : NULL);
}
static struct dentry *
@@ -279,13 +429,6 @@
ret = bpf_obj_do_pin(pname, raw, type);
if (ret != 0)
bpf_any_put(raw, type);
- if ((trace_bpf_obj_pin_prog_enabled() ||
- trace_bpf_obj_pin_map_enabled()) && !ret) {
- if (type == BPF_TYPE_PROG)
- trace_bpf_obj_pin_prog(raw, ufd, pname);
- if (type == BPF_TYPE_MAP)
- trace_bpf_obj_pin_map(raw, ufd, pname);
- }
out:
putname(pname);
return ret;
@@ -352,15 +495,8 @@
else
goto out;
- if (ret < 0) {
+ if (ret < 0)
bpf_any_put(raw, type);
- } else if (trace_bpf_obj_get_prog_enabled() ||
- trace_bpf_obj_get_map_enabled()) {
- if (type == BPF_TYPE_PROG)
- trace_bpf_obj_get_prog(raw, ret, pname);
- if (type == BPF_TYPE_MAP)
- trace_bpf_obj_get_map(raw, ret, pname);
- }
out:
putname(pname);
return ret;
diff --git a/kernel/bpf/offload.c b/kernel/bpf/offload.c
index c940107..ac747d5 100644
--- a/kernel/bpf/offload.c
+++ b/kernel/bpf/offload.c
@@ -1,5 +1,5 @@
/*
- * Copyright (C) 2017 Netronome Systems, Inc.
+ * Copyright (C) 2017-2018 Netronome Systems, Inc.
*
* This software is licensed under the GNU General License Version 2,
* June 1991 as shown in the file COPYING in the top-level directory of this
@@ -474,8 +474,10 @@
struct bpf_prog_offload *offload;
bool ret;
- if (!bpf_prog_is_dev_bound(prog->aux) || !bpf_map_is_dev_bound(map))
+ if (!bpf_prog_is_dev_bound(prog->aux))
return false;
+ if (!bpf_map_is_dev_bound(map))
+ return bpf_map_offload_neutral(map);
down_read(&bpf_devs_lock);
offload = prog->aux->offload;
diff --git a/kernel/bpf/sockmap.c b/kernel/bpf/sockmap.c
index 95a84b2..52a91d8 100644
--- a/kernel/bpf/sockmap.c
+++ b/kernel/bpf/sockmap.c
@@ -48,14 +48,40 @@
#define SOCK_CREATE_FLAG_MASK \
(BPF_F_NUMA_NODE | BPF_F_RDONLY | BPF_F_WRONLY)
-struct bpf_stab {
- struct bpf_map map;
- struct sock **sock_map;
+struct bpf_sock_progs {
struct bpf_prog *bpf_tx_msg;
struct bpf_prog *bpf_parse;
struct bpf_prog *bpf_verdict;
};
+struct bpf_stab {
+ struct bpf_map map;
+ struct sock **sock_map;
+ struct bpf_sock_progs progs;
+};
+
+struct bucket {
+ struct hlist_head head;
+ raw_spinlock_t lock;
+};
+
+struct bpf_htab {
+ struct bpf_map map;
+ struct bucket *buckets;
+ atomic_t count;
+ u32 n_buckets;
+ u32 elem_size;
+ struct bpf_sock_progs progs;
+};
+
+struct htab_elem {
+ struct rcu_head rcu;
+ struct hlist_node hash_node;
+ u32 hash;
+ struct sock *sk;
+ char key[0];
+};
+
enum smap_psock_state {
SMAP_TX_RUNNING,
};
@@ -63,6 +89,8 @@
struct smap_psock_map_entry {
struct list_head list;
struct sock **entry;
+ struct htab_elem *hash_link;
+ struct bpf_htab *htab;
};
struct smap_psock {
@@ -191,6 +219,12 @@
rcu_read_unlock();
}
+static void free_htab_elem(struct bpf_htab *htab, struct htab_elem *l)
+{
+ atomic_dec(&htab->count);
+ kfree_rcu(l, rcu);
+}
+
static void bpf_tcp_close(struct sock *sk, long timeout)
{
void (*close_fun)(struct sock *sk, long timeout);
@@ -227,10 +261,16 @@
}
list_for_each_entry_safe(e, tmp, &psock->maps, list) {
- osk = cmpxchg(e->entry, sk, NULL);
- if (osk == sk) {
- list_del(&e->list);
- smap_release_sock(psock, sk);
+ if (e->entry) {
+ osk = cmpxchg(e->entry, sk, NULL);
+ if (osk == sk) {
+ list_del(&e->list);
+ smap_release_sock(psock, sk);
+ }
+ } else {
+ hlist_del_rcu(&e->hash_link->hash_node);
+ smap_release_sock(psock, e->hash_link->sk);
+ free_htab_elem(e->htab, e->hash_link);
}
}
write_unlock_bh(&sk->sk_callback_lock);
@@ -461,7 +501,7 @@
static int bpf_map_msg_verdict(int _rc, struct sk_msg_buff *md)
{
return ((_rc == SK_PASS) ?
- (md->map ? __SK_REDIRECT : __SK_PASS) :
+ (md->sk_redir ? __SK_REDIRECT : __SK_PASS) :
__SK_DROP);
}
@@ -483,6 +523,7 @@
}
bpf_compute_data_pointers_sg(md);
+ md->sk = sk;
rc = (*prog->bpf_func)(md, prog->insnsi);
psock->apply_bytes = md->apply_bytes;
@@ -1092,7 +1133,7 @@
* when we orphan the skb so that we don't have the possibility
* to reference a stale map.
*/
- TCP_SKB_CB(skb)->bpf.map = NULL;
+ TCP_SKB_CB(skb)->bpf.sk_redir = NULL;
skb->sk = psock->sock;
bpf_compute_data_pointers(skb);
preempt_disable();
@@ -1102,7 +1143,7 @@
/* Moving return codes from UAPI namespace into internal namespace */
return rc == SK_PASS ?
- (TCP_SKB_CB(skb)->bpf.map ? __SK_REDIRECT : __SK_PASS) :
+ (TCP_SKB_CB(skb)->bpf.sk_redir ? __SK_REDIRECT : __SK_PASS) :
__SK_DROP;
}
@@ -1372,7 +1413,6 @@
}
static void smap_init_progs(struct smap_psock *psock,
- struct bpf_stab *stab,
struct bpf_prog *verdict,
struct bpf_prog *parse)
{
@@ -1450,14 +1490,13 @@
kfree(psock);
}
-static struct smap_psock *smap_init_psock(struct sock *sock,
- struct bpf_stab *stab)
+static struct smap_psock *smap_init_psock(struct sock *sock, int node)
{
struct smap_psock *psock;
psock = kzalloc_node(sizeof(struct smap_psock),
GFP_ATOMIC | __GFP_NOWARN,
- stab->map.numa_node);
+ node);
if (!psock)
return ERR_PTR(-ENOMEM);
@@ -1525,12 +1564,14 @@
return ERR_PTR(err);
}
-static void smap_list_remove(struct smap_psock *psock, struct sock **entry)
+static void smap_list_remove(struct smap_psock *psock,
+ struct sock **entry,
+ struct htab_elem *hash_link)
{
struct smap_psock_map_entry *e, *tmp;
list_for_each_entry_safe(e, tmp, &psock->maps, list) {
- if (e->entry == entry) {
+ if (e->entry == entry || e->hash_link == hash_link) {
list_del(&e->list);
break;
}
@@ -1568,7 +1609,7 @@
* to be null and queued for garbage collection.
*/
if (likely(psock)) {
- smap_list_remove(psock, &stab->sock_map[i]);
+ smap_list_remove(psock, &stab->sock_map[i], NULL);
smap_release_sock(psock, sock);
}
write_unlock_bh(&sock->sk_callback_lock);
@@ -1627,7 +1668,7 @@
if (psock->bpf_parse)
smap_stop_sock(psock, sock);
- smap_list_remove(psock, &stab->sock_map[k]);
+ smap_list_remove(psock, &stab->sock_map[k], NULL);
smap_release_sock(psock, sock);
out:
write_unlock_bh(&sock->sk_callback_lock);
@@ -1662,40 +1703,26 @@
* - sock_map must use READ_ONCE and (cmp)xchg operations
* - BPF verdict/parse programs must use READ_ONCE and xchg operations
*/
-static int sock_map_ctx_update_elem(struct bpf_sock_ops_kern *skops,
- struct bpf_map *map,
- void *key, u64 flags)
+
+static int __sock_map_ctx_update_elem(struct bpf_map *map,
+ struct bpf_sock_progs *progs,
+ struct sock *sock,
+ struct sock **map_link,
+ void *key)
{
- struct bpf_stab *stab = container_of(map, struct bpf_stab, map);
- struct smap_psock_map_entry *e = NULL;
struct bpf_prog *verdict, *parse, *tx_msg;
- struct sock *osock, *sock;
+ struct smap_psock_map_entry *e = NULL;
struct smap_psock *psock;
- u32 i = *(u32 *)key;
bool new = false;
- int err;
-
- if (unlikely(flags > BPF_EXIST))
- return -EINVAL;
-
- if (unlikely(i >= stab->map.max_entries))
- return -E2BIG;
-
- sock = READ_ONCE(stab->sock_map[i]);
- if (flags == BPF_EXIST && !sock)
- return -ENOENT;
- else if (flags == BPF_NOEXIST && sock)
- return -EEXIST;
-
- sock = skops->sk;
+ int err = 0;
/* 1. If sock map has BPF programs those will be inherited by the
* sock being added. If the sock is already attached to BPF programs
* this results in an error.
*/
- verdict = READ_ONCE(stab->bpf_verdict);
- parse = READ_ONCE(stab->bpf_parse);
- tx_msg = READ_ONCE(stab->bpf_tx_msg);
+ verdict = READ_ONCE(progs->bpf_verdict);
+ parse = READ_ONCE(progs->bpf_parse);
+ tx_msg = READ_ONCE(progs->bpf_tx_msg);
if (parse && verdict) {
/* bpf prog refcnt may be zero if a concurrent attach operation
@@ -1748,7 +1775,7 @@
goto out_progs;
}
} else {
- psock = smap_init_psock(sock, stab);
+ psock = smap_init_psock(sock, map->numa_node);
if (IS_ERR(psock)) {
err = PTR_ERR(psock);
goto out_progs;
@@ -1758,12 +1785,13 @@
new = true;
}
- e = kzalloc(sizeof(*e), GFP_ATOMIC | __GFP_NOWARN);
- if (!e) {
- err = -ENOMEM;
- goto out_progs;
+ if (map_link) {
+ e = kzalloc(sizeof(*e), GFP_ATOMIC | __GFP_NOWARN);
+ if (!e) {
+ err = -ENOMEM;
+ goto out_progs;
+ }
}
- e->entry = &stab->sock_map[i];
/* 3. At this point we have a reference to a valid psock that is
* running. Attach any BPF programs needed.
@@ -1780,7 +1808,7 @@
err = smap_init_sock(psock, sock);
if (err)
goto out_free;
- smap_init_progs(psock, stab, verdict, parse);
+ smap_init_progs(psock, verdict, parse);
smap_start_sock(psock, sock);
}
@@ -1789,19 +1817,12 @@
* it with. Because we can only have a single set of programs if
* old_sock has a strp we can stop it.
*/
- list_add_tail(&e->list, &psock->maps);
- write_unlock_bh(&sock->sk_callback_lock);
-
- osock = xchg(&stab->sock_map[i], sock);
- if (osock) {
- struct smap_psock *opsock = smap_psock_sk(osock);
-
- write_lock_bh(&osock->sk_callback_lock);
- smap_list_remove(opsock, &stab->sock_map[i]);
- smap_release_sock(opsock, osock);
- write_unlock_bh(&osock->sk_callback_lock);
+ if (map_link) {
+ e->entry = map_link;
+ list_add_tail(&e->list, &psock->maps);
}
- return 0;
+ write_unlock_bh(&sock->sk_callback_lock);
+ return err;
out_free:
smap_release_sock(psock, sock);
out_progs:
@@ -1816,23 +1837,73 @@
return err;
}
-int sock_map_prog(struct bpf_map *map, struct bpf_prog *prog, u32 type)
+static int sock_map_ctx_update_elem(struct bpf_sock_ops_kern *skops,
+ struct bpf_map *map,
+ void *key, u64 flags)
{
struct bpf_stab *stab = container_of(map, struct bpf_stab, map);
+ struct bpf_sock_progs *progs = &stab->progs;
+ struct sock *osock, *sock;
+ u32 i = *(u32 *)key;
+ int err;
+
+ if (unlikely(flags > BPF_EXIST))
+ return -EINVAL;
+
+ if (unlikely(i >= stab->map.max_entries))
+ return -E2BIG;
+
+ sock = READ_ONCE(stab->sock_map[i]);
+ if (flags == BPF_EXIST && !sock)
+ return -ENOENT;
+ else if (flags == BPF_NOEXIST && sock)
+ return -EEXIST;
+
+ sock = skops->sk;
+ err = __sock_map_ctx_update_elem(map, progs, sock, &stab->sock_map[i],
+ key);
+ if (err)
+ goto out;
+
+ osock = xchg(&stab->sock_map[i], sock);
+ if (osock) {
+ struct smap_psock *opsock = smap_psock_sk(osock);
+
+ write_lock_bh(&osock->sk_callback_lock);
+ smap_list_remove(opsock, &stab->sock_map[i], NULL);
+ smap_release_sock(opsock, osock);
+ write_unlock_bh(&osock->sk_callback_lock);
+ }
+out:
+ return err;
+}
+
+int sock_map_prog(struct bpf_map *map, struct bpf_prog *prog, u32 type)
+{
+ struct bpf_sock_progs *progs;
struct bpf_prog *orig;
- if (unlikely(map->map_type != BPF_MAP_TYPE_SOCKMAP))
+ if (map->map_type == BPF_MAP_TYPE_SOCKMAP) {
+ struct bpf_stab *stab = container_of(map, struct bpf_stab, map);
+
+ progs = &stab->progs;
+ } else if (map->map_type == BPF_MAP_TYPE_SOCKHASH) {
+ struct bpf_htab *htab = container_of(map, struct bpf_htab, map);
+
+ progs = &htab->progs;
+ } else {
return -EINVAL;
+ }
switch (type) {
case BPF_SK_MSG_VERDICT:
- orig = xchg(&stab->bpf_tx_msg, prog);
+ orig = xchg(&progs->bpf_tx_msg, prog);
break;
case BPF_SK_SKB_STREAM_PARSER:
- orig = xchg(&stab->bpf_parse, prog);
+ orig = xchg(&progs->bpf_parse, prog);
break;
case BPF_SK_SKB_STREAM_VERDICT:
- orig = xchg(&stab->bpf_verdict, prog);
+ orig = xchg(&progs->bpf_verdict, prog);
break;
default:
return -EOPNOTSUPP;
@@ -1880,21 +1951,421 @@
static void sock_map_release(struct bpf_map *map)
{
- struct bpf_stab *stab = container_of(map, struct bpf_stab, map);
+ struct bpf_sock_progs *progs;
struct bpf_prog *orig;
- orig = xchg(&stab->bpf_parse, NULL);
+ if (map->map_type == BPF_MAP_TYPE_SOCKMAP) {
+ struct bpf_stab *stab = container_of(map, struct bpf_stab, map);
+
+ progs = &stab->progs;
+ } else {
+ struct bpf_htab *htab = container_of(map, struct bpf_htab, map);
+
+ progs = &htab->progs;
+ }
+
+ orig = xchg(&progs->bpf_parse, NULL);
if (orig)
bpf_prog_put(orig);
- orig = xchg(&stab->bpf_verdict, NULL);
+ orig = xchg(&progs->bpf_verdict, NULL);
if (orig)
bpf_prog_put(orig);
- orig = xchg(&stab->bpf_tx_msg, NULL);
+ orig = xchg(&progs->bpf_tx_msg, NULL);
if (orig)
bpf_prog_put(orig);
}
+static struct bpf_map *sock_hash_alloc(union bpf_attr *attr)
+{
+ struct bpf_htab *htab;
+ int i, err;
+ u64 cost;
+
+ if (!capable(CAP_NET_ADMIN))
+ return ERR_PTR(-EPERM);
+
+ /* check sanity of attributes */
+ if (attr->max_entries == 0 || attr->value_size != 4 ||
+ attr->map_flags & ~SOCK_CREATE_FLAG_MASK)
+ return ERR_PTR(-EINVAL);
+
+ if (attr->key_size > MAX_BPF_STACK)
+ /* eBPF programs initialize keys on stack, so they cannot be
+ * larger than max stack size
+ */
+ return ERR_PTR(-E2BIG);
+
+ err = bpf_tcp_ulp_register();
+ if (err && err != -EEXIST)
+ return ERR_PTR(err);
+
+ htab = kzalloc(sizeof(*htab), GFP_USER);
+ if (!htab)
+ return ERR_PTR(-ENOMEM);
+
+ bpf_map_init_from_attr(&htab->map, attr);
+
+ htab->n_buckets = roundup_pow_of_two(htab->map.max_entries);
+ htab->elem_size = sizeof(struct htab_elem) +
+ round_up(htab->map.key_size, 8);
+ err = -EINVAL;
+ if (htab->n_buckets == 0 ||
+ htab->n_buckets > U32_MAX / sizeof(struct bucket))
+ goto free_htab;
+
+ cost = (u64) htab->n_buckets * sizeof(struct bucket) +
+ (u64) htab->elem_size * htab->map.max_entries;
+
+ if (cost >= U32_MAX - PAGE_SIZE)
+ goto free_htab;
+
+ htab->map.pages = round_up(cost, PAGE_SIZE) >> PAGE_SHIFT;
+ err = bpf_map_precharge_memlock(htab->map.pages);
+ if (err)
+ goto free_htab;
+
+ err = -ENOMEM;
+ htab->buckets = bpf_map_area_alloc(
+ htab->n_buckets * sizeof(struct bucket),
+ htab->map.numa_node);
+ if (!htab->buckets)
+ goto free_htab;
+
+ for (i = 0; i < htab->n_buckets; i++) {
+ INIT_HLIST_HEAD(&htab->buckets[i].head);
+ raw_spin_lock_init(&htab->buckets[i].lock);
+ }
+
+ return &htab->map;
+free_htab:
+ kfree(htab);
+ return ERR_PTR(err);
+}
+
+static inline struct bucket *__select_bucket(struct bpf_htab *htab, u32 hash)
+{
+ return &htab->buckets[hash & (htab->n_buckets - 1)];
+}
+
+static inline struct hlist_head *select_bucket(struct bpf_htab *htab, u32 hash)
+{
+ return &__select_bucket(htab, hash)->head;
+}
+
+static void sock_hash_free(struct bpf_map *map)
+{
+ struct bpf_htab *htab = container_of(map, struct bpf_htab, map);
+ int i;
+
+ synchronize_rcu();
+
+ /* At this point no update, lookup or delete operations can happen.
+ * However, be aware we can still get a socket state event updates,
+ * and data ready callabacks that reference the psock from sk_user_data
+ * Also psock worker threads are still in-flight. So smap_release_sock
+ * will only free the psock after cancel_sync on the worker threads
+ * and a grace period expire to ensure psock is really safe to remove.
+ */
+ rcu_read_lock();
+ for (i = 0; i < htab->n_buckets; i++) {
+ struct hlist_head *head = select_bucket(htab, i);
+ struct hlist_node *n;
+ struct htab_elem *l;
+
+ hlist_for_each_entry_safe(l, n, head, hash_node) {
+ struct sock *sock = l->sk;
+ struct smap_psock *psock;
+
+ hlist_del_rcu(&l->hash_node);
+ write_lock_bh(&sock->sk_callback_lock);
+ psock = smap_psock_sk(sock);
+ /* This check handles a racing sock event that can get
+ * the sk_callback_lock before this case but after xchg
+ * causing the refcnt to hit zero and sock user data
+ * (psock) to be null and queued for garbage collection.
+ */
+ if (likely(psock)) {
+ smap_list_remove(psock, NULL, l);
+ smap_release_sock(psock, sock);
+ }
+ write_unlock_bh(&sock->sk_callback_lock);
+ kfree(l);
+ }
+ }
+ rcu_read_unlock();
+ bpf_map_area_free(htab->buckets);
+ kfree(htab);
+}
+
+static struct htab_elem *alloc_sock_hash_elem(struct bpf_htab *htab,
+ void *key, u32 key_size, u32 hash,
+ struct sock *sk,
+ struct htab_elem *old_elem)
+{
+ struct htab_elem *l_new;
+
+ if (atomic_inc_return(&htab->count) > htab->map.max_entries) {
+ if (!old_elem) {
+ atomic_dec(&htab->count);
+ return ERR_PTR(-E2BIG);
+ }
+ }
+ l_new = kmalloc_node(htab->elem_size, GFP_ATOMIC | __GFP_NOWARN,
+ htab->map.numa_node);
+ if (!l_new)
+ return ERR_PTR(-ENOMEM);
+
+ memcpy(l_new->key, key, key_size);
+ l_new->sk = sk;
+ l_new->hash = hash;
+ return l_new;
+}
+
+static struct htab_elem *lookup_elem_raw(struct hlist_head *head,
+ u32 hash, void *key, u32 key_size)
+{
+ struct htab_elem *l;
+
+ hlist_for_each_entry_rcu(l, head, hash_node) {
+ if (l->hash == hash && !memcmp(&l->key, key, key_size))
+ return l;
+ }
+
+ return NULL;
+}
+
+static inline u32 htab_map_hash(const void *key, u32 key_len)
+{
+ return jhash(key, key_len, 0);
+}
+
+static int sock_hash_get_next_key(struct bpf_map *map,
+ void *key, void *next_key)
+{
+ struct bpf_htab *htab = container_of(map, struct bpf_htab, map);
+ struct htab_elem *l, *next_l;
+ struct hlist_head *h;
+ u32 hash, key_size;
+ int i = 0;
+
+ WARN_ON_ONCE(!rcu_read_lock_held());
+
+ key_size = map->key_size;
+ if (!key)
+ goto find_first_elem;
+ hash = htab_map_hash(key, key_size);
+ h = select_bucket(htab, hash);
+
+ l = lookup_elem_raw(h, hash, key, key_size);
+ if (!l)
+ goto find_first_elem;
+ next_l = hlist_entry_safe(
+ rcu_dereference_raw(hlist_next_rcu(&l->hash_node)),
+ struct htab_elem, hash_node);
+ if (next_l) {
+ memcpy(next_key, next_l->key, key_size);
+ return 0;
+ }
+
+ /* no more elements in this hash list, go to the next bucket */
+ i = hash & (htab->n_buckets - 1);
+ i++;
+
+find_first_elem:
+ /* iterate over buckets */
+ for (; i < htab->n_buckets; i++) {
+ h = select_bucket(htab, i);
+
+ /* pick first element in the bucket */
+ next_l = hlist_entry_safe(
+ rcu_dereference_raw(hlist_first_rcu(h)),
+ struct htab_elem, hash_node);
+ if (next_l) {
+ /* if it's not empty, just return it */
+ memcpy(next_key, next_l->key, key_size);
+ return 0;
+ }
+ }
+
+ /* iterated over all buckets and all elements */
+ return -ENOENT;
+}
+
+static int sock_hash_ctx_update_elem(struct bpf_sock_ops_kern *skops,
+ struct bpf_map *map,
+ void *key, u64 map_flags)
+{
+ struct bpf_htab *htab = container_of(map, struct bpf_htab, map);
+ struct bpf_sock_progs *progs = &htab->progs;
+ struct htab_elem *l_new = NULL, *l_old;
+ struct smap_psock_map_entry *e = NULL;
+ struct hlist_head *head;
+ struct smap_psock *psock;
+ u32 key_size, hash;
+ struct sock *sock;
+ struct bucket *b;
+ int err;
+
+ sock = skops->sk;
+
+ if (sock->sk_type != SOCK_STREAM ||
+ sock->sk_protocol != IPPROTO_TCP)
+ return -EOPNOTSUPP;
+
+ if (unlikely(map_flags > BPF_EXIST))
+ return -EINVAL;
+
+ e = kzalloc(sizeof(*e), GFP_ATOMIC | __GFP_NOWARN);
+ if (!e)
+ return -ENOMEM;
+
+ WARN_ON_ONCE(!rcu_read_lock_held());
+ key_size = map->key_size;
+ hash = htab_map_hash(key, key_size);
+ b = __select_bucket(htab, hash);
+ head = &b->head;
+
+ err = __sock_map_ctx_update_elem(map, progs, sock, NULL, key);
+ if (err)
+ goto err;
+
+ /* bpf_map_update_elem() can be called in_irq() */
+ raw_spin_lock_bh(&b->lock);
+ l_old = lookup_elem_raw(head, hash, key, key_size);
+ if (l_old && map_flags == BPF_NOEXIST) {
+ err = -EEXIST;
+ goto bucket_err;
+ }
+ if (!l_old && map_flags == BPF_EXIST) {
+ err = -ENOENT;
+ goto bucket_err;
+ }
+
+ l_new = alloc_sock_hash_elem(htab, key, key_size, hash, sock, l_old);
+ if (IS_ERR(l_new)) {
+ err = PTR_ERR(l_new);
+ goto bucket_err;
+ }
+
+ psock = smap_psock_sk(sock);
+ if (unlikely(!psock)) {
+ err = -EINVAL;
+ goto bucket_err;
+ }
+
+ e->hash_link = l_new;
+ e->htab = container_of(map, struct bpf_htab, map);
+ list_add_tail(&e->list, &psock->maps);
+
+ /* add new element to the head of the list, so that
+ * concurrent search will find it before old elem
+ */
+ hlist_add_head_rcu(&l_new->hash_node, head);
+ if (l_old) {
+ psock = smap_psock_sk(l_old->sk);
+
+ hlist_del_rcu(&l_old->hash_node);
+ smap_list_remove(psock, NULL, l_old);
+ smap_release_sock(psock, l_old->sk);
+ free_htab_elem(htab, l_old);
+ }
+ raw_spin_unlock_bh(&b->lock);
+ return 0;
+bucket_err:
+ raw_spin_unlock_bh(&b->lock);
+err:
+ kfree(e);
+ psock = smap_psock_sk(sock);
+ if (psock)
+ smap_release_sock(psock, sock);
+ return err;
+}
+
+static int sock_hash_update_elem(struct bpf_map *map,
+ void *key, void *value, u64 flags)
+{
+ struct bpf_sock_ops_kern skops;
+ u32 fd = *(u32 *)value;
+ struct socket *socket;
+ int err;
+
+ socket = sockfd_lookup(fd, &err);
+ if (!socket)
+ return err;
+
+ skops.sk = socket->sk;
+ if (!skops.sk) {
+ fput(socket->file);
+ return -EINVAL;
+ }
+
+ err = sock_hash_ctx_update_elem(&skops, map, key, flags);
+ fput(socket->file);
+ return err;
+}
+
+static int sock_hash_delete_elem(struct bpf_map *map, void *key)
+{
+ struct bpf_htab *htab = container_of(map, struct bpf_htab, map);
+ struct hlist_head *head;
+ struct bucket *b;
+ struct htab_elem *l;
+ u32 hash, key_size;
+ int ret = -ENOENT;
+
+ key_size = map->key_size;
+ hash = htab_map_hash(key, key_size);
+ b = __select_bucket(htab, hash);
+ head = &b->head;
+
+ raw_spin_lock_bh(&b->lock);
+ l = lookup_elem_raw(head, hash, key, key_size);
+ if (l) {
+ struct sock *sock = l->sk;
+ struct smap_psock *psock;
+
+ hlist_del_rcu(&l->hash_node);
+ write_lock_bh(&sock->sk_callback_lock);
+ psock = smap_psock_sk(sock);
+ /* This check handles a racing sock event that can get the
+ * sk_callback_lock before this case but after xchg happens
+ * causing the refcnt to hit zero and sock user data (psock)
+ * to be null and queued for garbage collection.
+ */
+ if (likely(psock)) {
+ smap_list_remove(psock, NULL, l);
+ smap_release_sock(psock, sock);
+ }
+ write_unlock_bh(&sock->sk_callback_lock);
+ free_htab_elem(htab, l);
+ ret = 0;
+ }
+ raw_spin_unlock_bh(&b->lock);
+ return ret;
+}
+
+struct sock *__sock_hash_lookup_elem(struct bpf_map *map, void *key)
+{
+ struct bpf_htab *htab = container_of(map, struct bpf_htab, map);
+ struct hlist_head *head;
+ struct htab_elem *l;
+ u32 key_size, hash;
+ struct bucket *b;
+ struct sock *sk;
+
+ key_size = map->key_size;
+ hash = htab_map_hash(key, key_size);
+ b = __select_bucket(htab, hash);
+ head = &b->head;
+
+ raw_spin_lock_bh(&b->lock);
+ l = lookup_elem_raw(head, hash, key, key_size);
+ sk = l ? l->sk : NULL;
+ raw_spin_unlock_bh(&b->lock);
+ return sk;
+}
+
const struct bpf_map_ops sock_map_ops = {
.map_alloc = sock_map_alloc,
.map_free = sock_map_free,
@@ -1905,6 +2376,15 @@
.map_release_uref = sock_map_release,
};
+const struct bpf_map_ops sock_hash_ops = {
+ .map_alloc = sock_hash_alloc,
+ .map_free = sock_hash_free,
+ .map_lookup_elem = sock_map_lookup,
+ .map_get_next_key = sock_hash_get_next_key,
+ .map_update_elem = sock_hash_update_elem,
+ .map_delete_elem = sock_hash_delete_elem,
+};
+
BPF_CALL_4(bpf_sock_map_update, struct bpf_sock_ops_kern *, bpf_sock,
struct bpf_map *, map, void *, key, u64, flags)
{
@@ -1922,3 +2402,21 @@
.arg3_type = ARG_PTR_TO_MAP_KEY,
.arg4_type = ARG_ANYTHING,
};
+
+BPF_CALL_4(bpf_sock_hash_update, struct bpf_sock_ops_kern *, bpf_sock,
+ struct bpf_map *, map, void *, key, u64, flags)
+{
+ WARN_ON_ONCE(!rcu_read_lock_held());
+ return sock_hash_ctx_update_elem(bpf_sock, map, key, flags);
+}
+
+const struct bpf_func_proto bpf_sock_hash_update_proto = {
+ .func = bpf_sock_hash_update,
+ .gpl_only = false,
+ .pkt_access = true,
+ .ret_type = RET_INTEGER,
+ .arg1_type = ARG_PTR_TO_CTX,
+ .arg2_type = ARG_CONST_MAP_PTR,
+ .arg3_type = ARG_PTR_TO_MAP_KEY,
+ .arg4_type = ARG_ANYTHING,
+};
diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c
index 57eeb12..b59ace0 100644
--- a/kernel/bpf/stackmap.c
+++ b/kernel/bpf/stackmap.c
@@ -11,6 +11,7 @@
#include <linux/perf_event.h>
#include <linux/elf.h>
#include <linux/pagemap.h>
+#include <linux/irq_work.h>
#include "percpu_freelist.h"
#define STACK_CREATE_FLAG_MASK \
@@ -32,6 +33,23 @@
struct stack_map_bucket *buckets[];
};
+/* irq_work to run up_read() for build_id lookup in nmi context */
+struct stack_map_irq_work {
+ struct irq_work irq_work;
+ struct rw_semaphore *sem;
+};
+
+static void do_up_read(struct irq_work *entry)
+{
+ struct stack_map_irq_work *work;
+
+ work = container_of(entry, struct stack_map_irq_work, irq_work);
+ up_read(work->sem);
+ work->sem = NULL;
+}
+
+static DEFINE_PER_CPU(struct stack_map_irq_work, up_read_work);
+
static inline bool stack_map_use_build_id(struct bpf_map *map)
{
return (map->map_flags & BPF_F_STACK_BUILD_ID);
@@ -262,27 +280,32 @@
return ret;
}
-static void stack_map_get_build_id_offset(struct bpf_map *map,
- struct stack_map_bucket *bucket,
+static void stack_map_get_build_id_offset(struct bpf_stack_build_id *id_offs,
u64 *ips, u32 trace_nr, bool user)
{
int i;
struct vm_area_struct *vma;
- struct bpf_stack_build_id *id_offs;
+ bool in_nmi_ctx = in_nmi();
+ bool irq_work_busy = false;
+ struct stack_map_irq_work *work;
- bucket->nr = trace_nr;
- id_offs = (struct bpf_stack_build_id *)bucket->data;
+ if (in_nmi_ctx) {
+ work = this_cpu_ptr(&up_read_work);
+ if (work->irq_work.flags & IRQ_WORK_BUSY)
+ /* cannot queue more up_read, fallback */
+ irq_work_busy = true;
+ }
/*
- * We cannot do up_read() in nmi context, so build_id lookup is
- * only supported for non-nmi events. If at some point, it is
- * possible to run find_vma() without taking the semaphore, we
- * would like to allow build_id lookup in nmi context.
+ * We cannot do up_read() in nmi context. To do build_id lookup
+ * in nmi context, we need to run up_read() in irq_work. We use
+ * a percpu variable to do the irq_work. If the irq_work is
+ * already used by another lookup, we fall back to report ips.
*
* Same fallback is used for kernel stack (!user) on a stackmap
* with build_id.
*/
- if (!user || !current || !current->mm || in_nmi() ||
+ if (!user || !current || !current->mm || irq_work_busy ||
down_read_trylock(¤t->mm->mmap_sem) == 0) {
/* cannot access current->mm, fall back to ips */
for (i = 0; i < trace_nr; i++) {
@@ -304,7 +327,13 @@
- vma->vm_start;
id_offs[i].status = BPF_STACK_BUILD_ID_VALID;
}
- up_read(¤t->mm->mmap_sem);
+
+ if (!in_nmi_ctx) {
+ up_read(¤t->mm->mmap_sem);
+ } else {
+ work->sem = ¤t->mm->mmap_sem;
+ irq_work_queue(&work->irq_work);
+ }
}
BPF_CALL_3(bpf_get_stackid, struct pt_regs *, regs, struct bpf_map *, map,
@@ -361,8 +390,10 @@
pcpu_freelist_pop(&smap->freelist);
if (unlikely(!new_bucket))
return -ENOMEM;
- stack_map_get_build_id_offset(map, new_bucket, ips,
- trace_nr, user);
+ new_bucket->nr = trace_nr;
+ stack_map_get_build_id_offset(
+ (struct bpf_stack_build_id *)new_bucket->data,
+ ips, trace_nr, user);
trace_len = trace_nr * sizeof(struct bpf_stack_build_id);
if (hash_matches && bucket->nr == trace_nr &&
memcmp(bucket->data, new_bucket->data, trace_len) == 0) {
@@ -405,6 +436,73 @@
.arg3_type = ARG_ANYTHING,
};
+BPF_CALL_4(bpf_get_stack, struct pt_regs *, regs, void *, buf, u32, size,
+ u64, flags)
+{
+ u32 init_nr, trace_nr, copy_len, elem_size, num_elem;
+ bool user_build_id = flags & BPF_F_USER_BUILD_ID;
+ u32 skip = flags & BPF_F_SKIP_FIELD_MASK;
+ bool user = flags & BPF_F_USER_STACK;
+ struct perf_callchain_entry *trace;
+ bool kernel = !user;
+ int err = -EINVAL;
+ u64 *ips;
+
+ if (unlikely(flags & ~(BPF_F_SKIP_FIELD_MASK | BPF_F_USER_STACK |
+ BPF_F_USER_BUILD_ID)))
+ goto clear;
+ if (kernel && user_build_id)
+ goto clear;
+
+ elem_size = (user && user_build_id) ? sizeof(struct bpf_stack_build_id)
+ : sizeof(u64);
+ if (unlikely(size % elem_size))
+ goto clear;
+
+ num_elem = size / elem_size;
+ if (sysctl_perf_event_max_stack < num_elem)
+ init_nr = 0;
+ else
+ init_nr = sysctl_perf_event_max_stack - num_elem;
+ trace = get_perf_callchain(regs, init_nr, kernel, user,
+ sysctl_perf_event_max_stack, false, false);
+ if (unlikely(!trace))
+ goto err_fault;
+
+ trace_nr = trace->nr - init_nr;
+ if (trace_nr < skip)
+ goto err_fault;
+
+ trace_nr -= skip;
+ trace_nr = (trace_nr <= num_elem) ? trace_nr : num_elem;
+ copy_len = trace_nr * elem_size;
+ ips = trace->ip + skip + init_nr;
+ if (user && user_build_id)
+ stack_map_get_build_id_offset(buf, ips, trace_nr, user);
+ else
+ memcpy(buf, ips, copy_len);
+
+ if (size > copy_len)
+ memset(buf + copy_len, 0, size - copy_len);
+ return copy_len;
+
+err_fault:
+ err = -EFAULT;
+clear:
+ memset(buf, 0, size);
+ return err;
+}
+
+const struct bpf_func_proto bpf_get_stack_proto = {
+ .func = bpf_get_stack,
+ .gpl_only = true,
+ .ret_type = RET_INTEGER,
+ .arg1_type = ARG_PTR_TO_CTX,
+ .arg2_type = ARG_PTR_TO_UNINIT_MEM,
+ .arg3_type = ARG_CONST_SIZE_OR_ZERO,
+ .arg4_type = ARG_ANYTHING,
+};
+
/* Called from eBPF program */
static void *stack_map_lookup_elem(struct bpf_map *map, void *key)
{
@@ -511,3 +609,16 @@
.map_update_elem = stack_map_update_elem,
.map_delete_elem = stack_map_delete_elem,
};
+
+static int __init stack_map_init(void)
+{
+ int cpu;
+ struct stack_map_irq_work *work;
+
+ for_each_possible_cpu(cpu) {
+ work = per_cpu_ptr(&up_read_work, cpu);
+ init_irq_work(&work->irq_work, do_up_read);
+ }
+ return 0;
+}
+subsys_initcall(stack_map_init);
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 016ef90..388d4fe 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -11,13 +11,16 @@
*/
#include <linux/bpf.h>
#include <linux/bpf_trace.h>
+#include <linux/btf.h>
#include <linux/syscalls.h>
#include <linux/slab.h>
#include <linux/sched/signal.h>
#include <linux/vmalloc.h>
#include <linux/mmzone.h>
#include <linux/anon_inodes.h>
+#include <linux/fdtable.h>
#include <linux/file.h>
+#include <linux/fs.h>
#include <linux/license.h>
#include <linux/filter.h>
#include <linux/version.h>
@@ -26,6 +29,7 @@
#include <linux/cred.h>
#include <linux/timekeeping.h>
#include <linux/ctype.h>
+#include <linux/btf.h>
#include <linux/nospec.h>
#define IS_FD_ARRAY(map) ((map)->map_type == BPF_MAP_TYPE_PROG_ARRAY || \
@@ -63,9 +67,9 @@
* copy_from_user() call. However, this is not a concern since this function is
* meant to be a future-proofing of bits.
*/
-static int check_uarg_tail_zero(void __user *uaddr,
- size_t expected_size,
- size_t actual_size)
+int bpf_check_uarg_tail_zero(void __user *uaddr,
+ size_t expected_size,
+ size_t actual_size)
{
unsigned char __user *addr;
unsigned char __user *end;
@@ -273,6 +277,7 @@
if (atomic_dec_and_test(&map->refcnt)) {
/* bpf_map_free_id() must be called first */
bpf_map_free_id(map, do_idr_lock);
+ btf_put(map->btf);
INIT_WORK(&map->work, bpf_map_free_deferred);
schedule_work(&map->work);
}
@@ -282,6 +287,7 @@
{
__bpf_map_put(map, true);
}
+EXPORT_SYMBOL_GPL(bpf_map_put);
void bpf_map_put_with_uref(struct bpf_map *map)
{
@@ -418,7 +424,7 @@
return 0;
}
-#define BPF_MAP_CREATE_LAST_FIELD map_ifindex
+#define BPF_MAP_CREATE_LAST_FIELD btf_value_type_id
/* called via syscall */
static int map_create(union bpf_attr *attr)
{
@@ -452,6 +458,33 @@
atomic_set(&map->refcnt, 1);
atomic_set(&map->usercnt, 1);
+ if (bpf_map_support_seq_show(map) &&
+ (attr->btf_key_type_id || attr->btf_value_type_id)) {
+ struct btf *btf;
+
+ if (!attr->btf_key_type_id || !attr->btf_value_type_id) {
+ err = -EINVAL;
+ goto free_map_nouncharge;
+ }
+
+ btf = btf_get_by_fd(attr->btf_fd);
+ if (IS_ERR(btf)) {
+ err = PTR_ERR(btf);
+ goto free_map_nouncharge;
+ }
+
+ err = map->ops->map_check_btf(map, btf, attr->btf_key_type_id,
+ attr->btf_value_type_id);
+ if (err) {
+ btf_put(btf);
+ goto free_map_nouncharge;
+ }
+
+ map->btf = btf;
+ map->btf_key_type_id = attr->btf_key_type_id;
+ map->btf_value_type_id = attr->btf_value_type_id;
+ }
+
err = security_bpf_map_alloc(map);
if (err)
goto free_map_nouncharge;
@@ -476,7 +509,6 @@
return err;
}
- trace_bpf_map_create(map, err);
return err;
free_map:
@@ -484,6 +516,7 @@
free_map_sec:
security_bpf_map_free(map);
free_map_nouncharge:
+ btf_put(map->btf);
map->ops->map_free(map);
return err;
}
@@ -516,6 +549,7 @@
atomic_inc(&map->usercnt);
return map;
}
+EXPORT_SYMBOL_GPL(bpf_map_inc);
struct bpf_map *bpf_map_get_with_uref(u32 ufd)
{
@@ -635,7 +669,6 @@
if (copy_to_user(uvalue, value, value_size) != 0)
goto free_value;
- trace_bpf_map_lookup_elem(map, ufd, key, value);
err = 0;
free_value:
@@ -732,8 +765,6 @@
__this_cpu_dec(bpf_prog_active);
preempt_enable();
out:
- if (!err)
- trace_bpf_map_update_elem(map, ufd, key, value);
free_value:
kfree(value);
free_key:
@@ -786,8 +817,6 @@
__this_cpu_dec(bpf_prog_active);
preempt_enable();
out:
- if (!err)
- trace_bpf_map_delete_elem(map, ufd, key);
kfree(key);
err_put:
fdput(f);
@@ -851,7 +880,6 @@
if (copy_to_user(unext_key, next_key, map->key_size) != 0)
goto free_next_key;
- trace_bpf_map_next_key(map, ufd, key, next_key);
err = 0;
free_next_key:
@@ -1005,7 +1033,6 @@
if (atomic_dec_and_test(&prog->aux->refcnt)) {
int i;
- trace_bpf_prog_put_rcu(prog);
/* bpf_prog_free_id() must be called first */
bpf_prog_free_id(prog, do_idr_lock);
@@ -1172,11 +1199,7 @@
struct bpf_prog *bpf_prog_get_type_dev(u32 ufd, enum bpf_prog_type type,
bool attach_drv)
{
- struct bpf_prog *prog = __bpf_prog_get(ufd, &type, attach_drv);
-
- if (!IS_ERR(prog))
- trace_bpf_prog_get_type(prog);
- return prog;
+ return __bpf_prog_get(ufd, &type, attach_drv);
}
EXPORT_SYMBOL_GPL(bpf_prog_get_type_dev);
@@ -1351,7 +1374,6 @@
}
bpf_prog_kallsyms_add(prog);
- trace_bpf_prog_load(prog, err);
return err;
free_used_maps:
@@ -1879,7 +1901,7 @@
u32 ulen;
int err;
- err = check_uarg_tail_zero(uinfo, sizeof(info), info_len);
+ err = bpf_check_uarg_tail_zero(uinfo, sizeof(info), info_len);
if (err)
return err;
info_len = min_t(u32, sizeof(info), info_len);
@@ -1892,6 +1914,7 @@
info.load_time = prog->aux->load_time;
info.created_by_uid = from_kuid_munged(current_user_ns(),
prog->aux->user->uid);
+ info.gpl_compatible = prog->gpl_compatible;
memcpy(info.tag, prog->tag, sizeof(prog->tag));
memcpy(info.name, prog->aux->name, sizeof(prog->aux->name));
@@ -1912,6 +1935,7 @@
if (!capable(CAP_SYS_ADMIN)) {
info.jited_prog_len = 0;
info.xlated_prog_len = 0;
+ info.nr_jited_ksyms = 0;
goto done;
}
@@ -1948,18 +1972,93 @@
* for offload.
*/
ulen = info.jited_prog_len;
- info.jited_prog_len = prog->jited_len;
+ if (prog->aux->func_cnt) {
+ u32 i;
+
+ info.jited_prog_len = 0;
+ for (i = 0; i < prog->aux->func_cnt; i++)
+ info.jited_prog_len += prog->aux->func[i]->jited_len;
+ } else {
+ info.jited_prog_len = prog->jited_len;
+ }
+
if (info.jited_prog_len && ulen) {
if (bpf_dump_raw_ok()) {
uinsns = u64_to_user_ptr(info.jited_prog_insns);
ulen = min_t(u32, info.jited_prog_len, ulen);
- if (copy_to_user(uinsns, prog->bpf_func, ulen))
- return -EFAULT;
+
+ /* for multi-function programs, copy the JITed
+ * instructions for all the functions
+ */
+ if (prog->aux->func_cnt) {
+ u32 len, free, i;
+ u8 *img;
+
+ free = ulen;
+ for (i = 0; i < prog->aux->func_cnt; i++) {
+ len = prog->aux->func[i]->jited_len;
+ len = min_t(u32, len, free);
+ img = (u8 *) prog->aux->func[i]->bpf_func;
+ if (copy_to_user(uinsns, img, len))
+ return -EFAULT;
+ uinsns += len;
+ free -= len;
+ if (!free)
+ break;
+ }
+ } else {
+ if (copy_to_user(uinsns, prog->bpf_func, ulen))
+ return -EFAULT;
+ }
} else {
info.jited_prog_insns = 0;
}
}
+ ulen = info.nr_jited_ksyms;
+ info.nr_jited_ksyms = prog->aux->func_cnt;
+ if (info.nr_jited_ksyms && ulen) {
+ if (bpf_dump_raw_ok()) {
+ u64 __user *user_ksyms;
+ ulong ksym_addr;
+ u32 i;
+
+ /* copy the address of the kernel symbol
+ * corresponding to each function
+ */
+ ulen = min_t(u32, info.nr_jited_ksyms, ulen);
+ user_ksyms = u64_to_user_ptr(info.jited_ksyms);
+ for (i = 0; i < ulen; i++) {
+ ksym_addr = (ulong) prog->aux->func[i]->bpf_func;
+ ksym_addr &= PAGE_MASK;
+ if (put_user((u64) ksym_addr, &user_ksyms[i]))
+ return -EFAULT;
+ }
+ } else {
+ info.jited_ksyms = 0;
+ }
+ }
+
+ ulen = info.nr_jited_func_lens;
+ info.nr_jited_func_lens = prog->aux->func_cnt;
+ if (info.nr_jited_func_lens && ulen) {
+ if (bpf_dump_raw_ok()) {
+ u32 __user *user_lens;
+ u32 func_len, i;
+
+ /* copy the JITed image lengths for each function */
+ ulen = min_t(u32, info.nr_jited_func_lens, ulen);
+ user_lens = u64_to_user_ptr(info.jited_func_lens);
+ for (i = 0; i < ulen; i++) {
+ func_len = prog->aux->func[i]->jited_len;
+ if (put_user(func_len, &user_lens[i]))
+ return -EFAULT;
+ }
+ } else {
+ info.jited_func_lens = 0;
+ }
+ }
+
done:
if (copy_to_user(uinfo, &info, info_len) ||
put_user(info_len, &uattr->info.info_len))
@@ -1977,7 +2076,7 @@
u32 info_len = attr->info.info_len;
int err;
- err = check_uarg_tail_zero(uinfo, sizeof(info), info_len);
+ err = bpf_check_uarg_tail_zero(uinfo, sizeof(info), info_len);
if (err)
return err;
info_len = min_t(u32, sizeof(info), info_len);
@@ -1990,6 +2089,12 @@
info.map_flags = map->map_flags;
memcpy(info.name, map->name, sizeof(map->name));
+ if (map->btf) {
+ info.btf_id = btf_id(map->btf);
+ info.btf_key_type_id = map->btf_key_type_id;
+ info.btf_value_type_id = map->btf_value_type_id;
+ }
+
if (bpf_map_is_dev_bound(map)) {
err = bpf_map_offload_info_fill(&info, map);
if (err)
@@ -2003,6 +2108,21 @@
return 0;
}
+static int bpf_btf_get_info_by_fd(struct btf *btf,
+ const union bpf_attr *attr,
+ union bpf_attr __user *uattr)
+{
+ struct bpf_btf_info __user *uinfo = u64_to_user_ptr(attr->info.info);
+ u32 info_len = attr->info.info_len;
+ int err;
+
+ err = bpf_check_uarg_tail_zero(uinfo, sizeof(*uinfo), info_len);
+ if (err)
+ return err;
+
+ return btf_get_info_by_fd(btf, attr, uattr);
+}
+
#define BPF_OBJ_GET_INFO_BY_FD_LAST_FIELD info.info
static int bpf_obj_get_info_by_fd(const union bpf_attr *attr,
@@ -2025,6 +2145,8 @@
else if (f.file->f_op == &bpf_map_fops)
err = bpf_map_get_info_by_fd(f.file->private_data, attr,
uattr);
+ else if (f.file->f_op == &btf_fops)
+ err = bpf_btf_get_info_by_fd(f.file->private_data, attr, uattr);
else
err = -EINVAL;
@@ -2032,6 +2154,158 @@
return err;
}
+#define BPF_BTF_LOAD_LAST_FIELD btf_log_level
+
+static int bpf_btf_load(const union bpf_attr *attr)
+{
+ if (CHECK_ATTR(BPF_BTF_LOAD))
+ return -EINVAL;
+
+ if (!capable(CAP_SYS_ADMIN))
+ return -EPERM;
+
+ return btf_new_fd(attr);
+}
+
+#define BPF_BTF_GET_FD_BY_ID_LAST_FIELD btf_id
+
+static int bpf_btf_get_fd_by_id(const union bpf_attr *attr)
+{
+ if (CHECK_ATTR(BPF_BTF_GET_FD_BY_ID))
+ return -EINVAL;
+
+ if (!capable(CAP_SYS_ADMIN))
+ return -EPERM;
+
+ return btf_get_fd_by_id(attr->btf_id);
+}
+
+static int bpf_task_fd_query_copy(const union bpf_attr *attr,
+ union bpf_attr __user *uattr,
+ u32 prog_id, u32 fd_type,
+ const char *buf, u64 probe_offset,
+ u64 probe_addr)
+{
+ char __user *ubuf = u64_to_user_ptr(attr->task_fd_query.buf);
+ u32 len = buf ? strlen(buf) : 0, input_len;
+ int err = 0;
+
+ if (put_user(len, &uattr->task_fd_query.buf_len))
+ return -EFAULT;
+ input_len = attr->task_fd_query.buf_len;
+ if (input_len && ubuf) {
+ if (!len) {
+ /* nothing to copy, just make ubuf NULL terminated */
+ char zero = '\0';
+
+ if (put_user(zero, ubuf))
+ return -EFAULT;
+ } else if (input_len >= len + 1) {
+ /* ubuf can hold the string with NULL terminator */
+ if (copy_to_user(ubuf, buf, len + 1))
+ return -EFAULT;
+ } else {
+ /* ubuf cannot hold the string with NULL terminator,
+ * do a partial copy with NULL terminator.
+ */
+ char zero = '\0';
+
+ err = -ENOSPC;
+ if (copy_to_user(ubuf, buf, input_len - 1))
+ return -EFAULT;
+ if (put_user(zero, ubuf + input_len - 1))
+ return -EFAULT;
+ }
+ }
+
+ if (put_user(prog_id, &uattr->task_fd_query.prog_id) ||
+ put_user(fd_type, &uattr->task_fd_query.fd_type) ||
+ put_user(probe_offset, &uattr->task_fd_query.probe_offset) ||
+ put_user(probe_addr, &uattr->task_fd_query.probe_addr))
+ return -EFAULT;
+
+ return err;
+}
+
+#define BPF_TASK_FD_QUERY_LAST_FIELD task_fd_query.probe_addr
+
+static int bpf_task_fd_query(const union bpf_attr *attr,
+ union bpf_attr __user *uattr)
+{
+ pid_t pid = attr->task_fd_query.pid;
+ u32 fd = attr->task_fd_query.fd;
+ const struct perf_event *event;
+ struct files_struct *files;
+ struct task_struct *task;
+ struct file *file;
+ int err;
+
+ if (CHECK_ATTR(BPF_TASK_FD_QUERY))
+ return -EINVAL;
+
+ if (!capable(CAP_SYS_ADMIN))
+ return -EPERM;
+
+ if (attr->task_fd_query.flags != 0)
+ return -EINVAL;
+
+ task = get_pid_task(find_vpid(pid), PIDTYPE_PID);
+ if (!task)
+ return -ENOENT;
+
+ files = get_files_struct(task);
+ put_task_struct(task);
+ if (!files)
+ return -ENOENT;
+
+ err = 0;
+ spin_lock(&files->file_lock);
+ file = fcheck_files(files, fd);
+ if (!file)
+ err = -EBADF;
+ else
+ get_file(file);
+ spin_unlock(&files->file_lock);
+ put_files_struct(files);
+
+ if (err)
+ goto out;
+
+ if (file->f_op == &bpf_raw_tp_fops) {
+ struct bpf_raw_tracepoint *raw_tp = file->private_data;
+ struct bpf_raw_event_map *btp = raw_tp->btp;
+
+ err = bpf_task_fd_query_copy(attr, uattr,
+ raw_tp->prog->aux->id,
+ BPF_FD_TYPE_RAW_TRACEPOINT,
+ btp->tp->name, 0, 0);
+ goto put_file;
+ }
+
+ event = perf_get_event(file);
+ if (!IS_ERR(event)) {
+ u64 probe_offset, probe_addr;
+ u32 prog_id, fd_type;
+ const char *buf;
+
+ err = bpf_get_perf_event_info(event, &prog_id, &fd_type,
+ &buf, &probe_offset,
+ &probe_addr);
+ if (!err)
+ err = bpf_task_fd_query_copy(attr, uattr, prog_id,
+ fd_type, buf,
+ probe_offset,
+ probe_addr);
+ goto put_file;
+ }
+
+ err = -ENOTSUPP;
+put_file:
+ fput(file);
+out:
+ return err;
+}
+
SYSCALL_DEFINE3(bpf, int, cmd, union bpf_attr __user *, uattr, unsigned int, size)
{
union bpf_attr attr = {};
@@ -2040,7 +2314,7 @@
if (sysctl_unprivileged_bpf_disabled && !capable(CAP_SYS_ADMIN))
return -EPERM;
- err = check_uarg_tail_zero(uattr, sizeof(attr), size);
+ err = bpf_check_uarg_tail_zero(uattr, sizeof(attr), size);
if (err)
return err;
size = min_t(u32, size, sizeof(attr));
@@ -2112,6 +2386,15 @@
case BPF_RAW_TRACEPOINT_OPEN:
err = bpf_raw_tracepoint_open(&attr);
break;
+ case BPF_BTF_LOAD:
+ err = bpf_btf_load(&attr);
+ break;
+ case BPF_BTF_GET_FD_BY_ID:
+ err = bpf_btf_get_fd_by_id(&attr);
+ break;
+ case BPF_TASK_FD_QUERY:
+ err = bpf_task_fd_query(&attr, uattr);
+ break;
default:
err = -EINVAL;
break;
diff --git a/kernel/bpf/tnum.c b/kernel/bpf/tnum.c
index 1f4bf68..938d412 100644
--- a/kernel/bpf/tnum.c
+++ b/kernel/bpf/tnum.c
@@ -43,6 +43,16 @@
return TNUM(a.value >> shift, a.mask >> shift);
}
+struct tnum tnum_arshift(struct tnum a, u8 min_shift)
+{
+ /* if a.value is negative, arithmetic shifting by minimum shift
+ * will have larger negative offset compared to more shifting.
+ * If a.value is nonnegative, arithmetic shifting by minimum shift
+ * will have larger positive offset compare to more shifting.
+ */
+ return TNUM((s64)a.value >> min_shift, (s64)a.mask >> min_shift);
+}
+
struct tnum tnum_add(struct tnum a, struct tnum b)
{
u64 sm, sv, sigma, chi, mu;
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 1904e81..1fd9667b 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -22,6 +22,7 @@
#include <linux/stringify.h>
#include <linux/bsearch.h>
#include <linux/sort.h>
+#include <linux/perf_event.h>
#include "disasm.h"
@@ -186,6 +187,8 @@
bool pkt_access;
int regno;
int access_size;
+ s64 msize_smax_value;
+ u64 msize_umax_value;
};
static DEFINE_MUTEX(bpf_verifier_lock);
@@ -760,18 +763,19 @@
static int cmp_subprogs(const void *a, const void *b)
{
- return *(int *)a - *(int *)b;
+ return ((struct bpf_subprog_info *)a)->start -
+ ((struct bpf_subprog_info *)b)->start;
}
static int find_subprog(struct bpf_verifier_env *env, int off)
{
- u32 *p;
+ struct bpf_subprog_info *p;
- p = bsearch(&off, env->subprog_starts, env->subprog_cnt,
- sizeof(env->subprog_starts[0]), cmp_subprogs);
+ p = bsearch(&off, env->subprog_info, env->subprog_cnt,
+ sizeof(env->subprog_info[0]), cmp_subprogs);
if (!p)
return -ENOENT;
- return p - env->subprog_starts;
+ return p - env->subprog_info;
}
@@ -791,18 +795,24 @@
verbose(env, "too many subprograms\n");
return -E2BIG;
}
- env->subprog_starts[env->subprog_cnt++] = off;
- sort(env->subprog_starts, env->subprog_cnt,
- sizeof(env->subprog_starts[0]), cmp_subprogs, NULL);
+ env->subprog_info[env->subprog_cnt++].start = off;
+ sort(env->subprog_info, env->subprog_cnt,
+ sizeof(env->subprog_info[0]), cmp_subprogs, NULL);
return 0;
}
static int check_subprogs(struct bpf_verifier_env *env)
{
int i, ret, subprog_start, subprog_end, off, cur_subprog = 0;
+ struct bpf_subprog_info *subprog = env->subprog_info;
struct bpf_insn *insn = env->prog->insnsi;
int insn_cnt = env->prog->len;
+ /* Add entry function. */
+ ret = add_subprog(env, 0);
+ if (ret < 0)
+ return ret;
+
/* determine subprog starts. The end is one before the next starts */
for (i = 0; i < insn_cnt; i++) {
if (insn[i].code != (BPF_JMP | BPF_CALL))
@@ -822,16 +832,18 @@
return ret;
}
+ /* Add a fake 'exit' subprog which could simplify subprog iteration
+ * logic. 'subprog_cnt' should not be increased.
+ */
+ subprog[env->subprog_cnt].start = insn_cnt;
+
if (env->log.level > 1)
for (i = 0; i < env->subprog_cnt; i++)
- verbose(env, "func#%d @%d\n", i, env->subprog_starts[i]);
+ verbose(env, "func#%d @%d\n", i, subprog[i].start);
/* now check that all jumps are within the same subprog */
- subprog_start = 0;
- if (env->subprog_cnt == cur_subprog)
- subprog_end = insn_cnt;
- else
- subprog_end = env->subprog_starts[cur_subprog++];
+ subprog_start = subprog[cur_subprog].start;
+ subprog_end = subprog[cur_subprog + 1].start;
for (i = 0; i < insn_cnt; i++) {
u8 code = insn[i].code;
@@ -856,10 +868,9 @@
return -EINVAL;
}
subprog_start = subprog_end;
- if (env->subprog_cnt == cur_subprog)
- subprog_end = insn_cnt;
- else
- subprog_end = env->subprog_starts[cur_subprog++];
+ cur_subprog++;
+ if (cur_subprog < env->subprog_cnt)
+ subprog_end = subprog[cur_subprog + 1].start;
}
}
return 0;
@@ -1298,6 +1309,7 @@
switch (env->prog->type) {
case BPF_PROG_TYPE_LWT_IN:
case BPF_PROG_TYPE_LWT_OUT:
+ case BPF_PROG_TYPE_LWT_SEG6LOCAL:
/* dst_input() and dst_output() can't write for now */
if (t == BPF_WRITE)
return false;
@@ -1517,13 +1529,13 @@
const struct bpf_func_state *func,
int off)
{
- u16 stack = env->subprog_stack_depth[func->subprogno];
+ u16 stack = env->subprog_info[func->subprogno].stack_depth;
if (stack >= -off)
return 0;
/* update known max for given subprogram */
- env->subprog_stack_depth[func->subprogno] = -off;
+ env->subprog_info[func->subprogno].stack_depth = -off;
return 0;
}
@@ -1535,9 +1547,9 @@
*/
static int check_max_stack_depth(struct bpf_verifier_env *env)
{
- int depth = 0, frame = 0, subprog = 0, i = 0, subprog_end;
+ int depth = 0, frame = 0, idx = 0, i = 0, subprog_end;
+ struct bpf_subprog_info *subprog = env->subprog_info;
struct bpf_insn *insn = env->prog->insnsi;
- int insn_cnt = env->prog->len;
int ret_insn[MAX_CALL_FRAMES];
int ret_prog[MAX_CALL_FRAMES];
@@ -1545,17 +1557,14 @@
/* round up to 32-bytes, since this is granularity
* of interpreter stack size
*/
- depth += round_up(max_t(u32, env->subprog_stack_depth[subprog], 1), 32);
+ depth += round_up(max_t(u32, subprog[idx].stack_depth, 1), 32);
if (depth > MAX_BPF_STACK) {
verbose(env, "combined stack size of %d calls is %d. Too large\n",
frame + 1, depth);
return -EACCES;
}
continue_func:
- if (env->subprog_cnt == subprog)
- subprog_end = insn_cnt;
- else
- subprog_end = env->subprog_starts[subprog];
+ subprog_end = subprog[idx + 1].start;
for (; i < subprog_end; i++) {
if (insn[i].code != (BPF_JMP | BPF_CALL))
continue;
@@ -1563,17 +1572,16 @@
continue;
/* remember insn and function to return to */
ret_insn[frame] = i + 1;
- ret_prog[frame] = subprog;
+ ret_prog[frame] = idx;
/* find the callee */
i = i + insn[i].imm + 1;
- subprog = find_subprog(env, i);
- if (subprog < 0) {
+ idx = find_subprog(env, i);
+ if (idx < 0) {
WARN_ONCE(1, "verifier bug. No program starts at insn %d\n",
i);
return -EFAULT;
}
- subprog++;
frame++;
if (frame >= MAX_CALL_FRAMES) {
WARN_ONCE(1, "verifier bug. Call stack is too deep\n");
@@ -1586,10 +1594,10 @@
*/
if (frame == 0)
return 0;
- depth -= round_up(max_t(u32, env->subprog_stack_depth[subprog], 1), 32);
+ depth -= round_up(max_t(u32, subprog[idx].stack_depth, 1), 32);
frame--;
i = ret_insn[frame];
- subprog = ret_prog[frame];
+ idx = ret_prog[frame];
goto continue_func;
}
@@ -1605,8 +1613,7 @@
start);
return -EFAULT;
}
- subprog++;
- return env->subprog_stack_depth[subprog];
+ return env->subprog_info[subprog].stack_depth;
}
#endif
@@ -1961,7 +1968,7 @@
if (arg_type == ARG_PTR_TO_MAP_KEY ||
arg_type == ARG_PTR_TO_MAP_VALUE) {
expected_type = PTR_TO_STACK;
- if (!type_is_pkt_pointer(type) &&
+ if (!type_is_pkt_pointer(type) && type != PTR_TO_MAP_VALUE &&
type != expected_type)
goto err_type;
} else if (arg_type == ARG_CONST_SIZE ||
@@ -2013,14 +2020,9 @@
verbose(env, "invalid map_ptr to access map->key\n");
return -EACCES;
}
- if (type_is_pkt_pointer(type))
- err = check_packet_access(env, regno, reg->off,
- meta->map_ptr->key_size,
- false);
- else
- err = check_stack_boundary(env, regno,
- meta->map_ptr->key_size,
- false, NULL);
+ err = check_helper_mem_access(env, regno,
+ meta->map_ptr->key_size, false,
+ NULL);
} else if (arg_type == ARG_PTR_TO_MAP_VALUE) {
/* bpf_map_xxx(..., map_ptr, ..., value) call:
* check [value, value + map->value_size) validity
@@ -2030,17 +2032,18 @@
verbose(env, "invalid map_ptr to access map->value\n");
return -EACCES;
}
- if (type_is_pkt_pointer(type))
- err = check_packet_access(env, regno, reg->off,
- meta->map_ptr->value_size,
- false);
- else
- err = check_stack_boundary(env, regno,
- meta->map_ptr->value_size,
- false, NULL);
+ err = check_helper_mem_access(env, regno,
+ meta->map_ptr->value_size, false,
+ NULL);
} else if (arg_type_is_mem_size(arg_type)) {
bool zero_size_allowed = (arg_type == ARG_CONST_SIZE_OR_ZERO);
+ /* remember the mem_size which may be used later
+ * to refine return values.
+ */
+ meta->msize_smax_value = reg->smax_value;
+ meta->msize_umax_value = reg->umax_value;
+
/* The register is SCALAR_VALUE; the access check
* happens using its boundaries.
*/
@@ -2118,8 +2121,11 @@
if (func_id != BPF_FUNC_redirect_map)
goto error;
break;
- /* Restrict bpf side of cpumap, open when use-cases appear */
+ /* Restrict bpf side of cpumap and xskmap, open when use-cases
+ * appear.
+ */
case BPF_MAP_TYPE_CPUMAP:
+ case BPF_MAP_TYPE_XSKMAP:
if (func_id != BPF_FUNC_redirect_map)
goto error;
break;
@@ -2135,6 +2141,13 @@
func_id != BPF_FUNC_msg_redirect_map)
goto error;
break;
+ case BPF_MAP_TYPE_SOCKHASH:
+ if (func_id != BPF_FUNC_sk_redirect_hash &&
+ func_id != BPF_FUNC_sock_hash_update &&
+ func_id != BPF_FUNC_map_delete_elem &&
+ func_id != BPF_FUNC_msg_redirect_hash)
+ goto error;
+ break;
default:
break;
}
@@ -2144,7 +2157,7 @@
case BPF_FUNC_tail_call:
if (map->map_type != BPF_MAP_TYPE_PROG_ARRAY)
goto error;
- if (env->subprog_cnt) {
+ if (env->subprog_cnt > 1) {
verbose(env, "tail_calls are not allowed in programs with bpf-to-bpf calls\n");
return -EINVAL;
}
@@ -2166,16 +2179,20 @@
break;
case BPF_FUNC_redirect_map:
if (map->map_type != BPF_MAP_TYPE_DEVMAP &&
- map->map_type != BPF_MAP_TYPE_CPUMAP)
+ map->map_type != BPF_MAP_TYPE_CPUMAP &&
+ map->map_type != BPF_MAP_TYPE_XSKMAP)
goto error;
break;
case BPF_FUNC_sk_redirect_map:
case BPF_FUNC_msg_redirect_map:
+ case BPF_FUNC_sock_map_update:
if (map->map_type != BPF_MAP_TYPE_SOCKMAP)
goto error;
break;
- case BPF_FUNC_sock_map_update:
- if (map->map_type != BPF_MAP_TYPE_SOCKMAP)
+ case BPF_FUNC_sk_redirect_hash:
+ case BPF_FUNC_msg_redirect_hash:
+ case BPF_FUNC_sock_hash_update:
+ if (map->map_type != BPF_MAP_TYPE_SOCKHASH)
goto error;
break;
default:
@@ -2316,7 +2333,7 @@
/* remember the callsite, it will be used by bpf_exit */
*insn_idx /* callsite */,
state->curframe + 1 /* frameno within this callchain */,
- subprog + 1 /* subprog number within this prog */);
+ subprog /* subprog number within this prog */);
/* copy r1 - r5 args that callee can access */
for (i = BPF_REG_1; i <= BPF_REG_5; i++)
@@ -2380,6 +2397,23 @@
return 0;
}
+static void do_refine_retval_range(struct bpf_reg_state *regs, int ret_type,
+ int func_id,
+ struct bpf_call_arg_meta *meta)
+{
+ struct bpf_reg_state *ret_reg = ®s[BPF_REG_0];
+
+ if (ret_type != RET_INTEGER ||
+ (func_id != BPF_FUNC_get_stack &&
+ func_id != BPF_FUNC_probe_read_str))
+ return;
+
+ ret_reg->smax_value = meta->msize_smax_value;
+ ret_reg->umax_value = meta->msize_umax_value;
+ __reg_deduce_bounds(ret_reg);
+ __reg_bound_offset(ret_reg);
+}
+
static int
record_func_map(struct bpf_verifier_env *env, struct bpf_call_arg_meta *meta,
int func_id, int insn_idx)
@@ -2516,10 +2550,30 @@
return -EINVAL;
}
+ do_refine_retval_range(regs, fn->ret_type, func_id, &meta);
+
err = check_map_func_compatibility(env, meta.map_ptr, func_id);
if (err)
return err;
+ if (func_id == BPF_FUNC_get_stack && !env->prog->has_callchain_buf) {
+ const char *err_str;
+
+#ifdef CONFIG_PERF_EVENTS
+ err = get_callchain_buffers(sysctl_perf_event_max_stack);
+ err_str = "cannot get callchain buffer for func %s#%d\n";
+#else
+ err = -ENOTSUPP;
+ err_str = "func %s#%d not supported without CONFIG_PERF_EVENTS\n";
+#endif
+ if (err) {
+ verbose(env, err_str, func_id_name(func_id), func_id);
+ return err;
+ }
+
+ env->prog->has_callchain_buf = true;
+ }
+
if (changes_data)
clear_all_pkt_pointers(env);
return 0;
@@ -2964,10 +3018,7 @@
dst_reg->umin_value <<= umin_val;
dst_reg->umax_value <<= umax_val;
}
- if (src_known)
- dst_reg->var_off = tnum_lshift(dst_reg->var_off, umin_val);
- else
- dst_reg->var_off = tnum_lshift(tnum_unknown, umin_val);
+ dst_reg->var_off = tnum_lshift(dst_reg->var_off, umin_val);
/* We may learn something more from the var_off */
__update_reg_bounds(dst_reg);
break;
@@ -2995,16 +3046,35 @@
*/
dst_reg->smin_value = S64_MIN;
dst_reg->smax_value = S64_MAX;
- if (src_known)
- dst_reg->var_off = tnum_rshift(dst_reg->var_off,
- umin_val);
- else
- dst_reg->var_off = tnum_rshift(tnum_unknown, umin_val);
+ dst_reg->var_off = tnum_rshift(dst_reg->var_off, umin_val);
dst_reg->umin_value >>= umax_val;
dst_reg->umax_value >>= umin_val;
/* We may learn something more from the var_off */
__update_reg_bounds(dst_reg);
break;
+ case BPF_ARSH:
+ if (umax_val >= insn_bitness) {
+ /* Shifts greater than 31 or 63 are undefined.
+ * This includes shifts by a negative number.
+ */
+ mark_reg_unknown(env, regs, insn->dst_reg);
+ break;
+ }
+
+ /* Upon reaching here, src_known is true and
+ * umax_val is equal to umin_val.
+ */
+ dst_reg->smin_value >>= umin_val;
+ dst_reg->smax_value >>= umin_val;
+ dst_reg->var_off = tnum_arshift(dst_reg->var_off, umin_val);
+
+ /* blow away the dst_reg umin_value/umax_value and rely on
+ * dst_reg var_off to refine the result.
+ */
+ dst_reg->umin_value = 0;
+ dst_reg->umax_value = U64_MAX;
+ __update_reg_bounds(dst_reg);
+ break;
default:
mark_reg_unknown(env, regs, insn->dst_reg);
break;
@@ -3888,7 +3958,12 @@
return -EINVAL;
}
- if (env->subprog_cnt) {
+ if (!env->ops->gen_ld_abs) {
+ verbose(env, "bpf verifier is misconfigured\n");
+ return -EINVAL;
+ }
+
+ if (env->subprog_cnt > 1) {
/* when program has LD_ABS insn JITs and interpreter assume
* that r1 == ctx == skb which is not the case for callees
* that can have arbitrary arguments. It's problematic
@@ -4919,15 +4994,15 @@
verbose(env, "processed %d insns (limit %d), stack depth ",
insn_processed, BPF_COMPLEXITY_LIMIT_INSNS);
- for (i = 0; i < env->subprog_cnt + 1; i++) {
- u32 depth = env->subprog_stack_depth[i];
+ for (i = 0; i < env->subprog_cnt; i++) {
+ u32 depth = env->subprog_info[i].stack_depth;
verbose(env, "%d", depth);
- if (i + 1 < env->subprog_cnt + 1)
+ if (i + 1 < env->subprog_cnt)
verbose(env, "+");
}
verbose(env, "\n");
- env->prog->aux->stack_depth = env->subprog_stack_depth[0];
+ env->prog->aux->stack_depth = env->subprog_info[0].stack_depth;
return 0;
}
@@ -5051,7 +5126,7 @@
/* hold the map. If the program is rejected by verifier,
* the map will be released by release_maps() or it
* will be used by the valid program until it's unloaded
- * and all maps are released in free_bpf_prog_info()
+ * and all maps are released in free_used_maps()
*/
map = bpf_map_inc(map, false);
if (IS_ERR(map)) {
@@ -5133,10 +5208,11 @@
if (len == 1)
return;
- for (i = 0; i < env->subprog_cnt; i++) {
- if (env->subprog_starts[i] < off)
+ /* NOTE: fake 'exit' subprog should be updated as well. */
+ for (i = 0; i <= env->subprog_cnt; i++) {
+ if (env->subprog_info[i].start < off)
continue;
- env->subprog_starts[i] += len - 1;
+ env->subprog_info[i].start += len - 1;
}
}
@@ -5210,7 +5286,7 @@
}
}
- if (!ops->convert_ctx_access)
+ if (!ops->convert_ctx_access || bpf_prog_is_dev_bound(env->prog->aux))
return 0;
insn = env->prog->insnsi + delta;
@@ -5328,7 +5404,7 @@
void *old_bpf_func;
int err = -ENOMEM;
- if (env->subprog_cnt == 0)
+ if (env->subprog_cnt <= 1)
return 0;
for (i = 0, insn = prog->insnsi; i < prog->len; i++, insn++) {
@@ -5344,7 +5420,7 @@
/* temporarily remember subprog id inside insn instead of
* aux_data, since next loop will split up all insns into funcs
*/
- insn->off = subprog + 1;
+ insn->off = subprog;
/* remember original imm in case JIT fails and fallback
* to interpreter will be needed
*/
@@ -5353,16 +5429,13 @@
insn->imm = 1;
}
- func = kzalloc(sizeof(prog) * (env->subprog_cnt + 1), GFP_KERNEL);
+ func = kzalloc(sizeof(prog) * env->subprog_cnt, GFP_KERNEL);
if (!func)
return -ENOMEM;
- for (i = 0; i <= env->subprog_cnt; i++) {
+ for (i = 0; i < env->subprog_cnt; i++) {
subprog_start = subprog_end;
- if (env->subprog_cnt == i)
- subprog_end = prog->len;
- else
- subprog_end = env->subprog_starts[i];
+ subprog_end = env->subprog_info[i + 1].start;
len = subprog_end - subprog_start;
func[i] = bpf_prog_alloc(bpf_prog_size(len), GFP_USER);
@@ -5379,7 +5452,7 @@
* Long term would need debug info to populate names
*/
func[i]->aux->name[0] = 'F';
- func[i]->aux->stack_depth = env->subprog_stack_depth[i];
+ func[i]->aux->stack_depth = env->subprog_info[i].stack_depth;
func[i]->jit_requested = 1;
func[i] = bpf_int_jit_compile(func[i]);
if (!func[i]->jited) {
@@ -5392,20 +5465,33 @@
* now populate all bpf_calls with correct addresses and
* run last pass of JIT
*/
- for (i = 0; i <= env->subprog_cnt; i++) {
+ for (i = 0; i < env->subprog_cnt; i++) {
insn = func[i]->insnsi;
for (j = 0; j < func[i]->len; j++, insn++) {
if (insn->code != (BPF_JMP | BPF_CALL) ||
insn->src_reg != BPF_PSEUDO_CALL)
continue;
subprog = insn->off;
- insn->off = 0;
insn->imm = (u64 (*)(u64, u64, u64, u64, u64))
func[subprog]->bpf_func -
__bpf_call_base;
}
+
+ /* we use the aux data to keep a list of the start addresses
+ * of the JITed images for each function in the program
+ *
+ * for some architectures, such as powerpc64, the imm field
+ * might not be large enough to hold the offset of the start
+ * address of the callee's JITed image from __bpf_call_base
+ *
+ * in such cases, we can lookup the start address of a callee
+ * by using its subprog id, available from the off field of
+ * the call instruction, as an index for this list
+ */
+ func[i]->aux->func = func;
+ func[i]->aux->func_cnt = env->subprog_cnt;
}
- for (i = 0; i <= env->subprog_cnt; i++) {
+ for (i = 0; i < env->subprog_cnt; i++) {
old_bpf_func = func[i]->bpf_func;
tmp = bpf_int_jit_compile(func[i]);
if (tmp != func[i] || func[i]->bpf_func != old_bpf_func) {
@@ -5419,7 +5505,7 @@
/* finally lock prog and jit images for all functions and
* populate kallsysm
*/
- for (i = 0; i <= env->subprog_cnt; i++) {
+ for (i = 0; i < env->subprog_cnt; i++) {
bpf_prog_lock_ro(func[i]);
bpf_prog_kallsyms_add(func[i]);
}
@@ -5429,26 +5515,21 @@
* later look the same as if they were interpreted only.
*/
for (i = 0, insn = prog->insnsi; i < prog->len; i++, insn++) {
- unsigned long addr;
-
if (insn->code != (BPF_JMP | BPF_CALL) ||
insn->src_reg != BPF_PSEUDO_CALL)
continue;
insn->off = env->insn_aux_data[i].call_imm;
subprog = find_subprog(env, i + insn->off + 1);
- addr = (unsigned long)func[subprog + 1]->bpf_func;
- addr &= PAGE_MASK;
- insn->imm = (u64 (*)(u64, u64, u64, u64, u64))
- addr - __bpf_call_base;
+ insn->imm = subprog;
}
prog->jited = 1;
prog->bpf_func = func[0]->bpf_func;
prog->aux->func = func;
- prog->aux->func_cnt = env->subprog_cnt + 1;
+ prog->aux->func_cnt = env->subprog_cnt;
return 0;
out_free:
- for (i = 0; i <= env->subprog_cnt; i++)
+ for (i = 0; i < env->subprog_cnt; i++)
if (func[i])
bpf_jit_free(func[i]);
kfree(func);
@@ -5552,6 +5633,25 @@
continue;
}
+ if (BPF_CLASS(insn->code) == BPF_LD &&
+ (BPF_MODE(insn->code) == BPF_ABS ||
+ BPF_MODE(insn->code) == BPF_IND)) {
+ cnt = env->ops->gen_ld_abs(insn, insn_buf);
+ if (cnt == 0 || cnt >= ARRAY_SIZE(insn_buf)) {
+ verbose(env, "bpf verifier is misconfigured\n");
+ return -EINVAL;
+ }
+
+ new_prog = bpf_patch_insn_data(env, i + delta, insn_buf, cnt);
+ if (!new_prog)
+ return -ENOMEM;
+
+ delta += cnt - 1;
+ env->prog = prog = new_prog;
+ insn = new_prog->insnsi + i + delta;
+ continue;
+ }
+
if (insn->code != (BPF_JMP | BPF_CALL))
continue;
if (insn->src_reg == BPF_PSEUDO_CALL)
@@ -5755,16 +5855,16 @@
if (!IS_ENABLED(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS))
env->strict_alignment = true;
- if (bpf_prog_is_dev_bound(env->prog->aux)) {
- ret = bpf_prog_offload_verifier_prep(env);
- if (ret)
- goto err_unlock;
- }
-
ret = replace_map_fd_with_map_ptr(env);
if (ret < 0)
goto skip_full_check;
+ if (bpf_prog_is_dev_bound(env->prog->aux)) {
+ ret = bpf_prog_offload_verifier_prep(env);
+ if (ret)
+ goto skip_full_check;
+ }
+
env->explored_states = kcalloc(env->prog->len,
sizeof(struct bpf_verifier_state_list *),
GFP_USER);
@@ -5835,7 +5935,7 @@
err_release_maps:
if (!env->prog->aux->used_maps)
/* if we didn't copy map pointers into bpf_prog_info, release
- * them now. Otherwise free_bpf_prog_info() will release them.
+ * them now. Otherwise free_used_maps() will release them.
*/
release_maps(env);
*prog = env->prog;
diff --git a/kernel/bpf/xskmap.c b/kernel/bpf/xskmap.c
new file mode 100644
index 0000000..b3c5574
--- /dev/null
+++ b/kernel/bpf/xskmap.c
@@ -0,0 +1,232 @@
+// SPDX-License-Identifier: GPL-2.0
+/* XSKMAP used for AF_XDP sockets
+ * Copyright(c) 2018 Intel Corporation.
+ */
+
+#include <linux/bpf.h>
+#include <linux/capability.h>
+#include <net/xdp_sock.h>
+#include <linux/slab.h>
+#include <linux/sched.h>
+
+struct xsk_map {
+ struct bpf_map map;
+ struct xdp_sock **xsk_map;
+ struct list_head __percpu *flush_list;
+};
+
+static struct bpf_map *xsk_map_alloc(union bpf_attr *attr)
+{
+ int cpu, err = -EINVAL;
+ struct xsk_map *m;
+ u64 cost;
+
+ if (!capable(CAP_NET_ADMIN))
+ return ERR_PTR(-EPERM);
+
+ if (attr->max_entries == 0 || attr->key_size != 4 ||
+ attr->value_size != 4 ||
+ attr->map_flags & ~(BPF_F_NUMA_NODE | BPF_F_RDONLY | BPF_F_WRONLY))
+ return ERR_PTR(-EINVAL);
+
+ m = kzalloc(sizeof(*m), GFP_USER);
+ if (!m)
+ return ERR_PTR(-ENOMEM);
+
+ bpf_map_init_from_attr(&m->map, attr);
+
+ cost = (u64)m->map.max_entries * sizeof(struct xdp_sock *);
+ cost += sizeof(struct list_head) * num_possible_cpus();
+ if (cost >= U32_MAX - PAGE_SIZE)
+ goto free_m;
+
+ m->map.pages = round_up(cost, PAGE_SIZE) >> PAGE_SHIFT;
+
+ /* Notice returns -EPERM on if map size is larger than memlock limit */
+ err = bpf_map_precharge_memlock(m->map.pages);
+ if (err)
+ goto free_m;
+
+ err = -ENOMEM;
+
+ m->flush_list = alloc_percpu(struct list_head);
+ if (!m->flush_list)
+ goto free_m;
+
+ for_each_possible_cpu(cpu)
+ INIT_LIST_HEAD(per_cpu_ptr(m->flush_list, cpu));
+
+ m->xsk_map = bpf_map_area_alloc(m->map.max_entries *
+ sizeof(struct xdp_sock *),
+ m->map.numa_node);
+ if (!m->xsk_map)
+ goto free_percpu;
+ return &m->map;
+
+free_percpu:
+ free_percpu(m->flush_list);
+free_m:
+ kfree(m);
+ return ERR_PTR(err);
+}
+
+static void xsk_map_free(struct bpf_map *map)
+{
+ struct xsk_map *m = container_of(map, struct xsk_map, map);
+ int i;
+
+ synchronize_net();
+
+ for (i = 0; i < map->max_entries; i++) {
+ struct xdp_sock *xs;
+
+ xs = m->xsk_map[i];
+ if (!xs)
+ continue;
+
+ sock_put((struct sock *)xs);
+ }
+
+ free_percpu(m->flush_list);
+ bpf_map_area_free(m->xsk_map);
+ kfree(m);
+}
+
+static int xsk_map_get_next_key(struct bpf_map *map, void *key, void *next_key)
+{
+ struct xsk_map *m = container_of(map, struct xsk_map, map);
+ u32 index = key ? *(u32 *)key : U32_MAX;
+ u32 *next = next_key;
+
+ if (index >= m->map.max_entries) {
+ *next = 0;
+ return 0;
+ }
+
+ if (index == m->map.max_entries - 1)
+ return -ENOENT;
+ *next = index + 1;
+ return 0;
+}
+
+struct xdp_sock *__xsk_map_lookup_elem(struct bpf_map *map, u32 key)
+{
+ struct xsk_map *m = container_of(map, struct xsk_map, map);
+ struct xdp_sock *xs;
+
+ if (key >= map->max_entries)
+ return NULL;
+
+ xs = READ_ONCE(m->xsk_map[key]);
+ return xs;
+}
+
+int __xsk_map_redirect(struct bpf_map *map, struct xdp_buff *xdp,
+ struct xdp_sock *xs)
+{
+ struct xsk_map *m = container_of(map, struct xsk_map, map);
+ struct list_head *flush_list = this_cpu_ptr(m->flush_list);
+ int err;
+
+ err = xsk_rcv(xs, xdp);
+ if (err)
+ return err;
+
+ if (!xs->flush_node.prev)
+ list_add(&xs->flush_node, flush_list);
+
+ return 0;
+}
+
+void __xsk_map_flush(struct bpf_map *map)
+{
+ struct xsk_map *m = container_of(map, struct xsk_map, map);
+ struct list_head *flush_list = this_cpu_ptr(m->flush_list);
+ struct xdp_sock *xs, *tmp;
+
+ list_for_each_entry_safe(xs, tmp, flush_list, flush_node) {
+ xsk_flush(xs);
+ __list_del(xs->flush_node.prev, xs->flush_node.next);
+ xs->flush_node.prev = NULL;
+ }
+}
+
+static void *xsk_map_lookup_elem(struct bpf_map *map, void *key)
+{
+ return NULL;
+}
+
+static int xsk_map_update_elem(struct bpf_map *map, void *key, void *value,
+ u64 map_flags)
+{
+ struct xsk_map *m = container_of(map, struct xsk_map, map);
+ u32 i = *(u32 *)key, fd = *(u32 *)value;
+ struct xdp_sock *xs, *old_xs;
+ struct socket *sock;
+ int err;
+
+ if (unlikely(map_flags > BPF_EXIST))
+ return -EINVAL;
+ if (unlikely(i >= m->map.max_entries))
+ return -E2BIG;
+ if (unlikely(map_flags == BPF_NOEXIST))
+ return -EEXIST;
+
+ sock = sockfd_lookup(fd, &err);
+ if (!sock)
+ return err;
+
+ if (sock->sk->sk_family != PF_XDP) {
+ sockfd_put(sock);
+ return -EOPNOTSUPP;
+ }
+
+ xs = (struct xdp_sock *)sock->sk;
+
+ if (!xsk_is_setup_for_bpf_map(xs)) {
+ sockfd_put(sock);
+ return -EOPNOTSUPP;
+ }
+
+ sock_hold(sock->sk);
+
+ old_xs = xchg(&m->xsk_map[i], xs);
+ if (old_xs) {
+ /* Make sure we've flushed everything. */
+ synchronize_net();
+ sock_put((struct sock *)old_xs);
+ }
+
+ sockfd_put(sock);
+ return 0;
+}
+
+static int xsk_map_delete_elem(struct bpf_map *map, void *key)
+{
+ struct xsk_map *m = container_of(map, struct xsk_map, map);
+ struct xdp_sock *old_xs;
+ int k = *(u32 *)key;
+
+ if (k >= map->max_entries)
+ return -EINVAL;
+
+ old_xs = xchg(&m->xsk_map[k], NULL);
+ if (old_xs) {
+ /* Make sure we've flushed everything. */
+ synchronize_net();
+ sock_put((struct sock *)old_xs);
+ }
+
+ return 0;
+}
+
+const struct bpf_map_ops xsk_map_ops = {
+ .map_alloc = xsk_map_alloc,
+ .map_free = xsk_map_free,
+ .map_get_next_key = xsk_map_get_next_key,
+ .map_lookup_elem = xsk_map_lookup_elem,
+ .map_update_elem = xsk_map_update_elem,
+ .map_delete_elem = xsk_map_delete_elem,
+};
+
+
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 67612ce..6eeab86 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -11212,6 +11212,14 @@
return file;
}
+const struct perf_event *perf_get_event(struct file *file)
+{
+ if (file->f_op != &perf_fops)
+ return ERR_PTR(-EINVAL);
+
+ return file->private_data;
+}
+
const struct perf_event_attr *perf_event_attrs(struct perf_event *event)
{
if (!event)
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index 56ba0f2..81fdf2f 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -14,12 +14,14 @@
#include <linux/uaccess.h>
#include <linux/ctype.h>
#include <linux/kprobes.h>
+#include <linux/syscalls.h>
#include <linux/error-injection.h>
#include "trace_probe.h"
#include "trace.h"
u64 bpf_get_stackid(u64 r1, u64 r2, u64 r3, u64 r4, u64 r5);
+u64 bpf_get_stack(u64 r1, u64 r2, u64 r3, u64 r4, u64 r5);
/**
* trace_call_bpf - invoke BPF program
@@ -474,8 +476,6 @@
struct bpf_array *array = container_of(map, struct bpf_array, map);
struct cgroup *cgrp;
- if (unlikely(in_interrupt()))
- return -EINVAL;
if (unlikely(idx >= array->map.max_entries))
return -E2BIG;
@@ -577,6 +577,8 @@
return &bpf_perf_event_output_proto;
case BPF_FUNC_get_stackid:
return &bpf_get_stackid_proto;
+ case BPF_FUNC_get_stack:
+ return &bpf_get_stack_proto;
case BPF_FUNC_perf_event_read_value:
return &bpf_perf_event_read_value_proto;
#ifdef CONFIG_BPF_KPROBE_OVERRIDE
@@ -664,6 +666,25 @@
.arg3_type = ARG_ANYTHING,
};
+BPF_CALL_4(bpf_get_stack_tp, void *, tp_buff, void *, buf, u32, size,
+ u64, flags)
+{
+ struct pt_regs *regs = *(struct pt_regs **)tp_buff;
+
+ return bpf_get_stack((unsigned long) regs, (unsigned long) buf,
+ (unsigned long) size, flags, 0);
+}
+
+static const struct bpf_func_proto bpf_get_stack_proto_tp = {
+ .func = bpf_get_stack_tp,
+ .gpl_only = true,
+ .ret_type = RET_INTEGER,
+ .arg1_type = ARG_PTR_TO_CTX,
+ .arg2_type = ARG_PTR_TO_UNINIT_MEM,
+ .arg3_type = ARG_CONST_SIZE_OR_ZERO,
+ .arg4_type = ARG_ANYTHING,
+};
+
static const struct bpf_func_proto *
tp_prog_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
{
@@ -672,6 +693,8 @@
return &bpf_perf_event_output_proto_tp;
case BPF_FUNC_get_stackid:
return &bpf_get_stackid_proto_tp;
+ case BPF_FUNC_get_stack:
+ return &bpf_get_stack_proto_tp;
default:
return tracing_func_proto(func_id, prog);
}
@@ -734,6 +757,8 @@
return &bpf_perf_event_output_proto_tp;
case BPF_FUNC_get_stackid:
return &bpf_get_stackid_proto_tp;
+ case BPF_FUNC_get_stack:
+ return &bpf_get_stack_proto_tp;
case BPF_FUNC_perf_prog_read_value:
return &bpf_perf_prog_read_value_proto;
default:
@@ -744,7 +769,7 @@
/*
* bpf_raw_tp_regs are separate from bpf_pt_regs used from skb/xdp
* to avoid potential recursive reuse issue when/if tracepoints are added
- * inside bpf_*_event_output and/or bpf_get_stack_id
+ * inside bpf_*_event_output, bpf_get_stackid and/or bpf_get_stack
*/
static DEFINE_PER_CPU(struct pt_regs, bpf_raw_tp_regs);
BPF_CALL_5(bpf_perf_event_output_raw_tp, struct bpf_raw_tracepoint_args *, args,
@@ -787,6 +812,26 @@
.arg3_type = ARG_ANYTHING,
};
+BPF_CALL_4(bpf_get_stack_raw_tp, struct bpf_raw_tracepoint_args *, args,
+ void *, buf, u32, size, u64, flags)
+{
+ struct pt_regs *regs = this_cpu_ptr(&bpf_raw_tp_regs);
+
+ perf_fetch_caller_regs(regs);
+ return bpf_get_stack((unsigned long) regs, (unsigned long) buf,
+ (unsigned long) size, flags, 0);
+}
+
+static const struct bpf_func_proto bpf_get_stack_proto_raw_tp = {
+ .func = bpf_get_stack_raw_tp,
+ .gpl_only = true,
+ .ret_type = RET_INTEGER,
+ .arg1_type = ARG_PTR_TO_CTX,
+ .arg2_type = ARG_PTR_TO_MEM,
+ .arg3_type = ARG_CONST_SIZE_OR_ZERO,
+ .arg4_type = ARG_ANYTHING,
+};
+
static const struct bpf_func_proto *
raw_tp_prog_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
{
@@ -795,6 +840,8 @@
return &bpf_perf_event_output_proto_raw_tp;
case BPF_FUNC_get_stackid:
return &bpf_get_stackid_proto_raw_tp;
+ case BPF_FUNC_get_stack:
+ return &bpf_get_stack_proto_raw_tp;
default:
return tracing_func_proto(func_id, prog);
}
@@ -1117,3 +1164,50 @@
mutex_unlock(&bpf_event_mutex);
return err;
}
+
+int bpf_get_perf_event_info(const struct perf_event *event, u32 *prog_id,
+ u32 *fd_type, const char **buf,
+ u64 *probe_offset, u64 *probe_addr)
+{
+ bool is_tracepoint, is_syscall_tp;
+ struct bpf_prog *prog;
+ int flags, err = 0;
+
+ prog = event->prog;
+ if (!prog)
+ return -ENOENT;
+
+ /* not supporting BPF_PROG_TYPE_PERF_EVENT yet */
+ if (prog->type == BPF_PROG_TYPE_PERF_EVENT)
+ return -EOPNOTSUPP;
+
+ *prog_id = prog->aux->id;
+ flags = event->tp_event->flags;
+ is_tracepoint = flags & TRACE_EVENT_FL_TRACEPOINT;
+ is_syscall_tp = is_syscall_trace_event(event->tp_event);
+
+ if (is_tracepoint || is_syscall_tp) {
+ *buf = is_tracepoint ? event->tp_event->tp->name
+ : event->tp_event->name;
+ *fd_type = BPF_FD_TYPE_TRACEPOINT;
+ *probe_offset = 0x0;
+ *probe_addr = 0x0;
+ } else {
+ /* kprobe/uprobe */
+ err = -EOPNOTSUPP;
+#ifdef CONFIG_KPROBE_EVENTS
+ if (flags & TRACE_EVENT_FL_KPROBE)
+ err = bpf_get_kprobe_info(event, fd_type, buf,
+ probe_offset, probe_addr,
+ event->attr.type == PERF_TYPE_TRACEPOINT);
+#endif
+#ifdef CONFIG_UPROBE_EVENTS
+ if (flags & TRACE_EVENT_FL_UPROBE)
+ err = bpf_get_uprobe_info(event, fd_type, buf,
+ probe_offset,
+ event->attr.type == PERF_TYPE_TRACEPOINT);
+#endif
+ }
+
+ return err;
+}
diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
index 02aed76..daa8157 100644
--- a/kernel/trace/trace_kprobe.c
+++ b/kernel/trace/trace_kprobe.c
@@ -1287,6 +1287,35 @@
head, NULL);
}
NOKPROBE_SYMBOL(kretprobe_perf_func);
+
+int bpf_get_kprobe_info(const struct perf_event *event, u32 *fd_type,
+ const char **symbol, u64 *probe_offset,
+ u64 *probe_addr, bool perf_type_tracepoint)
+{
+ const char *pevent = trace_event_name(event->tp_event);
+ const char *group = event->tp_event->class->system;
+ struct trace_kprobe *tk;
+
+ if (perf_type_tracepoint)
+ tk = find_trace_kprobe(pevent, group);
+ else
+ tk = event->tp_event->data;
+ if (!tk)
+ return -EINVAL;
+
+ *fd_type = trace_kprobe_is_return(tk) ? BPF_FD_TYPE_KRETPROBE
+ : BPF_FD_TYPE_KPROBE;
+ if (tk->symbol) {
+ *symbol = tk->symbol;
+ *probe_offset = tk->rp.kp.offset;
+ *probe_addr = 0;
+ } else {
+ *symbol = NULL;
+ *probe_offset = 0;
+ *probe_addr = (unsigned long)tk->rp.kp.addr;
+ }
+ return 0;
+}
#endif /* CONFIG_PERF_EVENTS */
/*
diff --git a/kernel/trace/trace_uprobe.c b/kernel/trace/trace_uprobe.c
index ac89287..bf89a51 100644
--- a/kernel/trace/trace_uprobe.c
+++ b/kernel/trace/trace_uprobe.c
@@ -1161,6 +1161,28 @@
{
__uprobe_perf_func(tu, func, regs, ucb, dsize);
}
+
+int bpf_get_uprobe_info(const struct perf_event *event, u32 *fd_type,
+ const char **filename, u64 *probe_offset,
+ bool perf_type_tracepoint)
+{
+ const char *pevent = trace_event_name(event->tp_event);
+ const char *group = event->tp_event->class->system;
+ struct trace_uprobe *tu;
+
+ if (perf_type_tracepoint)
+ tu = find_probe_event(pevent, group);
+ else
+ tu = event->tp_event->data;
+ if (!tu)
+ return -EINVAL;
+
+ *fd_type = is_ret_probe(tu) ? BPF_FD_TYPE_URETPROBE
+ : BPF_FD_TYPE_UPROBE;
+ *filename = tu->filename;
+ *probe_offset = tu->offset;
+ return 0;
+}
#endif /* CONFIG_PERF_EVENTS */
static int
diff --git a/kernel/umh.c b/kernel/umh.c
index f76b3ff..30db93f 100644
--- a/kernel/umh.c
+++ b/kernel/umh.c
@@ -25,6 +25,8 @@
#include <linux/ptrace.h>
#include <linux/async.h>
#include <linux/uaccess.h>
+#include <linux/shmem_fs.h>
+#include <linux/pipe_fs_i.h>
#include <trace/events/module.h>
@@ -97,9 +99,13 @@
commit_creds(new);
- retval = do_execve(getname_kernel(sub_info->path),
- (const char __user *const __user *)sub_info->argv,
- (const char __user *const __user *)sub_info->envp);
+ if (sub_info->file)
+ retval = do_execve_file(sub_info->file,
+ sub_info->argv, sub_info->envp);
+ else
+ retval = do_execve(getname_kernel(sub_info->path),
+ (const char __user *const __user *)sub_info->argv,
+ (const char __user *const __user *)sub_info->envp);
out:
sub_info->retval = retval;
/*
@@ -185,6 +191,8 @@
if (pid < 0) {
sub_info->retval = pid;
umh_complete(sub_info);
+ } else {
+ sub_info->pid = pid;
}
}
}
@@ -393,6 +401,117 @@
}
EXPORT_SYMBOL(call_usermodehelper_setup);
+struct subprocess_info *call_usermodehelper_setup_file(struct file *file,
+ int (*init)(struct subprocess_info *info, struct cred *new),
+ void (*cleanup)(struct subprocess_info *info), void *data)
+{
+ struct subprocess_info *sub_info;
+
+ sub_info = kzalloc(sizeof(struct subprocess_info), GFP_KERNEL);
+ if (!sub_info)
+ return NULL;
+
+ INIT_WORK(&sub_info->work, call_usermodehelper_exec_work);
+ sub_info->path = "none";
+ sub_info->file = file;
+ sub_info->init = init;
+ sub_info->cleanup = cleanup;
+ sub_info->data = data;
+ return sub_info;
+}
+
+static int umh_pipe_setup(struct subprocess_info *info, struct cred *new)
+{
+ struct umh_info *umh_info = info->data;
+ struct file *from_umh[2];
+ struct file *to_umh[2];
+ int err;
+
+ /* create pipe to send data to umh */
+ err = create_pipe_files(to_umh, 0);
+ if (err)
+ return err;
+ err = replace_fd(0, to_umh[0], 0);
+ fput(to_umh[0]);
+ if (err < 0) {
+ fput(to_umh[1]);
+ return err;
+ }
+
+ /* create pipe to receive data from umh */
+ err = create_pipe_files(from_umh, 0);
+ if (err) {
+ fput(to_umh[1]);
+ replace_fd(0, NULL, 0);
+ return err;
+ }
+ err = replace_fd(1, from_umh[1], 0);
+ fput(from_umh[1]);
+ if (err < 0) {
+ fput(to_umh[1]);
+ replace_fd(0, NULL, 0);
+ fput(from_umh[0]);
+ return err;
+ }
+
+ umh_info->pipe_to_umh = to_umh[1];
+ umh_info->pipe_from_umh = from_umh[0];
+ return 0;
+}
+
+static void umh_save_pid(struct subprocess_info *info)
+{
+ struct umh_info *umh_info = info->data;
+
+ umh_info->pid = info->pid;
+}
+
+/**
+ * fork_usermode_blob - fork a blob of bytes as a usermode process
+ * @data: a blob of bytes that can be do_execv-ed as a file
+ * @len: length of the blob
+ * @info: information about usermode process (shouldn't be NULL)
+ *
+ * Returns either negative error or zero which indicates success
+ * in executing a blob of bytes as a usermode process. In such
+ * case 'struct umh_info *info' is populated with two pipes
+ * and a pid of the process. The caller is responsible for health
+ * check of the user process, killing it via pid, and closing the
+ * pipes when user process is no longer needed.
+ */
+int fork_usermode_blob(void *data, size_t len, struct umh_info *info)
+{
+ struct subprocess_info *sub_info;
+ struct file *file;
+ ssize_t written;
+ loff_t pos = 0;
+ int err;
+
+ file = shmem_kernel_file_setup("", len, 0);
+ if (IS_ERR(file))
+ return PTR_ERR(file);
+
+ written = kernel_write(file, data, len, &pos);
+ if (written != len) {
+ err = written;
+ if (err >= 0)
+ err = -ENOMEM;
+ goto out;
+ }
+
+ err = -ENOMEM;
+ sub_info = call_usermodehelper_setup_file(file, umh_pipe_setup,
+ umh_save_pid, info);
+ if (!sub_info)
+ goto out;
+
+ err = call_usermodehelper_exec(sub_info, UMH_WAIT_EXEC);
+out:
+ fput(file);
+ return err;
+}
+EXPORT_SYMBOL_GPL(fork_usermode_blob);
+
/**
* call_usermodehelper_exec - start a usermode application
* @sub_info: information about the subprocessa
diff --git a/lib/kobject_uevent.c b/lib/kobject_uevent.c
index 15ea216..63d0816 100644
--- a/lib/kobject_uevent.c
+++ b/lib/kobject_uevent.c
@@ -22,6 +22,7 @@
#include <linux/socket.h>
#include <linux/skbuff.h>
#include <linux/netlink.h>
+#include <linux/uidgid.h>
#include <linux/uuid.h>
#include <linux/ctype.h>
#include <net/sock.h>
@@ -231,30 +232,6 @@
return r;
}
-#ifdef CONFIG_NET
-static int kobj_bcast_filter(struct sock *dsk, struct sk_buff *skb, void *data)
-{
- struct kobject *kobj = data, *ksobj;
- const struct kobj_ns_type_operations *ops;
-
- ops = kobj_ns_ops(kobj);
- if (!ops && kobj->kset) {
- ksobj = &kobj->kset->kobj;
- if (ksobj->parent != NULL)
- ops = kobj_ns_ops(ksobj->parent);
- }
-
- if (ops && ops->netlink_ns && kobj->ktype->namespace) {
- const void *sock_ns, *ns;
- ns = kobj->ktype->namespace(kobj);
- sock_ns = ops->netlink_ns(dsk);
- return sock_ns != ns;
- }
-
- return 0;
-}
-#endif
-
#ifdef CONFIG_UEVENT_HELPER
static int kobj_usermode_filter(struct kobject *kobj)
{
@@ -296,15 +273,44 @@
}
#endif
-static int kobject_uevent_net_broadcast(struct kobject *kobj,
- struct kobj_uevent_env *env,
+#ifdef CONFIG_NET
+static struct sk_buff *alloc_uevent_skb(struct kobj_uevent_env *env,
const char *action_string,
const char *devpath)
{
- int retval = 0;
-#if defined(CONFIG_NET)
+ struct netlink_skb_parms *parms;
+ struct sk_buff *skb = NULL;
+ char *scratch;
+ size_t len;
+
+ /* allocate message with maximum possible size */
+ len = strlen(action_string) + strlen(devpath) + 2;
+ skb = alloc_skb(len + env->buflen, GFP_KERNEL);
+ if (!skb)
+ return NULL;
+
+ /* add header */
+ scratch = skb_put(skb, len);
+ sprintf(scratch, "%s@%s", action_string, devpath);
+
+ skb_put_data(skb, env->buf, env->buflen);
+
+ parms = &NETLINK_CB(skb);
+ parms->creds.uid = GLOBAL_ROOT_UID;
+ parms->creds.gid = GLOBAL_ROOT_GID;
+ parms->dst_group = 1;
+ parms->portid = 0;
+
+ return skb;
+}
+
+static int uevent_net_broadcast_untagged(struct kobj_uevent_env *env,
+ const char *action_string,
+ const char *devpath)
+{
struct sk_buff *skb = NULL;
struct uevent_sock *ue_sk;
+ int retval = 0;
/* send netlink message */
list_for_each_entry(ue_sk, &uevent_sock_list, list) {
@@ -314,37 +320,99 @@
continue;
if (!skb) {
- /* allocate message with the maximum possible size */
- size_t len = strlen(action_string) + strlen(devpath) + 2;
- char *scratch;
-
retval = -ENOMEM;
- skb = alloc_skb(len + env->buflen, GFP_KERNEL);
+ skb = alloc_uevent_skb(env, action_string, devpath);
if (!skb)
continue;
-
- /* add header */
- scratch = skb_put(skb, len);
- sprintf(scratch, "%s@%s", action_string, devpath);
-
- skb_put_data(skb, env->buf, env->buflen);
-
- NETLINK_CB(skb).dst_group = 1;
}
- retval = netlink_broadcast_filtered(uevent_sock, skb_get(skb),
- 0, 1, GFP_KERNEL,
- kobj_bcast_filter,
- kobj);
+ retval = netlink_broadcast(uevent_sock, skb_get(skb), 0, 1,
+ GFP_KERNEL);
/* ENOBUFS should be handled in userspace */
if (retval == -ENOBUFS || retval == -ESRCH)
retval = 0;
}
consume_skb(skb);
-#endif
+
return retval;
}
+static int uevent_net_broadcast_tagged(struct sock *usk,
+ struct kobj_uevent_env *env,
+ const char *action_string,
+ const char *devpath)
+{
+ struct user_namespace *owning_user_ns = sock_net(usk)->user_ns;
+ struct sk_buff *skb = NULL;
+ int ret = 0;
+
+ skb = alloc_uevent_skb(env, action_string, devpath);
+ if (!skb)
+ return -ENOMEM;
+
+ /* fix credentials */
+ if (owning_user_ns != &init_user_ns) {
+ struct netlink_skb_parms *parms = &NETLINK_CB(skb);
+ kuid_t root_uid;
+ kgid_t root_gid;
+
+ /* fix uid */
+ root_uid = make_kuid(owning_user_ns, 0);
+ if (uid_valid(root_uid))
+ parms->creds.uid = root_uid;
+
+ /* fix gid */
+ root_gid = make_kgid(owning_user_ns, 0);
+ if (gid_valid(root_gid))
+ parms->creds.gid = root_gid;
+ }
+
+ ret = netlink_broadcast(usk, skb, 0, 1, GFP_KERNEL);
+ /* ENOBUFS should be handled in userspace */
+ if (ret == -ENOBUFS || ret == -ESRCH)
+ ret = 0;
+
+ return ret;
+}
+#endif
+
+static int kobject_uevent_net_broadcast(struct kobject *kobj,
+ struct kobj_uevent_env *env,
+ const char *action_string,
+ const char *devpath)
+{
+ int ret = 0;
+
+#ifdef CONFIG_NET
+ const struct kobj_ns_type_operations *ops;
+ const struct net *net = NULL;
+
+ ops = kobj_ns_ops(kobj);
+ if (!ops && kobj->kset) {
+ struct kobject *ksobj = &kobj->kset->kobj;
+ if (ksobj->parent != NULL)
+ ops = kobj_ns_ops(ksobj->parent);
+ }
+
+ /* kobjects currently only carry network namespace tags and they
+ * are the only tag relevant here since we want to decide which
+ * network namespaces to broadcast the uevent into.
+ */
+ if (ops && ops->netlink_ns && kobj->ktype->namespace)
+ if (ops->type == KOBJ_NS_TYPE_NET)
+ net = kobj->ktype->namespace(kobj);
+
+ if (!net)
+ ret = uevent_net_broadcast_untagged(env, action_string,
+ devpath);
+ else
+ ret = uevent_net_broadcast_tagged(net->uevent_sock->sk, env,
+ action_string, devpath);
+#endif
+
+ return ret;
+}
+
static void zap_modalias_env(struct kobj_uevent_env *env)
{
static const char modalias_prefix[] = "MODALIAS=";
@@ -703,9 +771,13 @@
net->uevent_sock = ue_sk;
- mutex_lock(&uevent_sock_mutex);
- list_add_tail(&ue_sk->list, &uevent_sock_list);
- mutex_unlock(&uevent_sock_mutex);
+ /* Restrict uevents to initial user namespace. */
+ if (sock_net(ue_sk->sk)->user_ns == &init_user_ns) {
+ mutex_lock(&uevent_sock_mutex);
+ list_add_tail(&ue_sk->list, &uevent_sock_list);
+ mutex_unlock(&uevent_sock_mutex);
+ }
+
return 0;
}
@@ -713,9 +785,11 @@
{
struct uevent_sock *ue_sk = net->uevent_sock;
- mutex_lock(&uevent_sock_mutex);
- list_del(&ue_sk->list);
- mutex_unlock(&uevent_sock_mutex);
+ if (sock_net(ue_sk->sk)->user_ns == &init_user_ns) {
+ mutex_lock(&uevent_sock_mutex);
+ list_del(&ue_sk->list);
+ mutex_unlock(&uevent_sock_mutex);
+ }
netlink_kernel_release(ue_sk->sk);
kfree(ue_sk);
diff --git a/lib/rhashtable.c b/lib/rhashtable.c
index 2b2b799..9427b57 100644
--- a/lib/rhashtable.c
+++ b/lib/rhashtable.c
@@ -668,8 +668,9 @@
* For a completely stable walk you should construct your own data
* structure outside the hash table.
*
- * This function may sleep so you must not call it from interrupt
- * context or with spin locks held.
+ * This function may be called from any process context, including
+ * non-preemptable context, but cannot be called from softirq or
+ * hardirq context.
*
* You must call rhashtable_walk_exit after this function returns.
*/
@@ -726,6 +727,7 @@
__acquires(RCU)
{
struct rhashtable *ht = iter->ht;
+ bool rhlist = ht->rhlist;
rcu_read_lock();
@@ -734,11 +736,52 @@
list_del(&iter->walker.list);
spin_unlock(&ht->lock);
- if (!iter->walker.tbl && !iter->end_of_table) {
+ if (iter->end_of_table)
+ return 0;
+ if (!iter->walker.tbl) {
iter->walker.tbl = rht_dereference_rcu(ht->tbl, ht);
+ iter->slot = 0;
+ iter->skip = 0;
return -EAGAIN;
}
+ if (iter->p && !rhlist) {
+ /*
+ * We need to validate that 'p' is still in the table, and
+ * if so, update 'skip'
+ */
+ struct rhash_head *p;
+ int skip = 0;
+ rht_for_each_rcu(p, iter->walker.tbl, iter->slot) {
+ skip++;
+ if (p == iter->p) {
+ iter->skip = skip;
+ goto found;
+ }
+ }
+ iter->p = NULL;
+ } else if (iter->p && rhlist) {
+ /* Need to validate that 'list' is still in the table, and
+ * if so, update 'skip' and 'p'.
+ */
+ struct rhash_head *p;
+ struct rhlist_head *list;
+ int skip = 0;
+ rht_for_each_rcu(p, iter->walker.tbl, iter->slot) {
+ for (list = container_of(p, struct rhlist_head, rhead);
+ list;
+ list = rcu_dereference(list->next)) {
+ skip++;
+ if (list == iter->list) {
+ iter->p = p;
+ skip = skip;
+ goto found;
+ }
+ }
+ }
+ iter->p = NULL;
+ }
+found:
return 0;
}
EXPORT_SYMBOL_GPL(rhashtable_walk_start_check);
@@ -914,8 +957,6 @@
iter->walker.tbl = NULL;
spin_unlock(&ht->lock);
- iter->p = NULL;
-
out:
rcu_read_unlock();
}
diff --git a/lib/test_bpf.c b/lib/test_bpf.c
index 8e15780..317f231 100644
--- a/lib/test_bpf.c
+++ b/lib/test_bpf.c
@@ -386,116 +386,6 @@
return 0;
}
-#define PUSH_CNT 68
-/* test: {skb->data[0], vlan_push} x 68 + {skb->data[0], vlan_pop} x 68 */
-static int bpf_fill_ld_abs_vlan_push_pop(struct bpf_test *self)
-{
- unsigned int len = BPF_MAXINSNS;
- struct bpf_insn *insn;
- int i = 0, j, k = 0;
-
- insn = kmalloc_array(len, sizeof(*insn), GFP_KERNEL);
- if (!insn)
- return -ENOMEM;
-
- insn[i++] = BPF_MOV64_REG(R6, R1);
-loop:
- for (j = 0; j < PUSH_CNT; j++) {
- insn[i++] = BPF_LD_ABS(BPF_B, 0);
- insn[i] = BPF_JMP_IMM(BPF_JNE, R0, 0x34, len - i - 2);
- i++;
- insn[i++] = BPF_MOV64_REG(R1, R6);
- insn[i++] = BPF_MOV64_IMM(R2, 1);
- insn[i++] = BPF_MOV64_IMM(R3, 2);
- insn[i++] = BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0,
- bpf_skb_vlan_push_proto.func - __bpf_call_base);
- insn[i] = BPF_JMP_IMM(BPF_JNE, R0, 0, len - i - 2);
- i++;
- }
-
- for (j = 0; j < PUSH_CNT; j++) {
- insn[i++] = BPF_LD_ABS(BPF_B, 0);
- insn[i] = BPF_JMP_IMM(BPF_JNE, R0, 0x34, len - i - 2);
- i++;
- insn[i++] = BPF_MOV64_REG(R1, R6);
- insn[i++] = BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0,
- bpf_skb_vlan_pop_proto.func - __bpf_call_base);
- insn[i] = BPF_JMP_IMM(BPF_JNE, R0, 0, len - i - 2);
- i++;
- }
- if (++k < 5)
- goto loop;
-
- for (; i < len - 1; i++)
- insn[i] = BPF_ALU32_IMM(BPF_MOV, R0, 0xbef);
-
- insn[len - 1] = BPF_EXIT_INSN();
-
- self->u.ptr.insns = insn;
- self->u.ptr.len = len;
-
- return 0;
-}
-
-static int bpf_fill_ld_abs_vlan_push_pop2(struct bpf_test *self)
-{
- struct bpf_insn *insn;
-
- insn = kmalloc_array(16, sizeof(*insn), GFP_KERNEL);
- if (!insn)
- return -ENOMEM;
-
- /* Due to func address being non-const, we need to
- * assemble this here.
- */
- insn[0] = BPF_MOV64_REG(R6, R1);
- insn[1] = BPF_LD_ABS(BPF_B, 0);
- insn[2] = BPF_LD_ABS(BPF_H, 0);
- insn[3] = BPF_LD_ABS(BPF_W, 0);
- insn[4] = BPF_MOV64_REG(R7, R6);
- insn[5] = BPF_MOV64_IMM(R6, 0);
- insn[6] = BPF_MOV64_REG(R1, R7);
- insn[7] = BPF_MOV64_IMM(R2, 1);
- insn[8] = BPF_MOV64_IMM(R3, 2);
- insn[9] = BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0,
- bpf_skb_vlan_push_proto.func - __bpf_call_base);
- insn[10] = BPF_MOV64_REG(R6, R7);
- insn[11] = BPF_LD_ABS(BPF_B, 0);
- insn[12] = BPF_LD_ABS(BPF_H, 0);
- insn[13] = BPF_LD_ABS(BPF_W, 0);
- insn[14] = BPF_MOV64_IMM(R0, 42);
- insn[15] = BPF_EXIT_INSN();
-
- self->u.ptr.insns = insn;
- self->u.ptr.len = 16;
-
- return 0;
-}
-
-static int bpf_fill_jump_around_ld_abs(struct bpf_test *self)
-{
- unsigned int len = BPF_MAXINSNS;
- struct bpf_insn *insn;
- int i = 0;
-
- insn = kmalloc_array(len, sizeof(*insn), GFP_KERNEL);
- if (!insn)
- return -ENOMEM;
-
- insn[i++] = BPF_MOV64_REG(R6, R1);
- insn[i++] = BPF_LD_ABS(BPF_B, 0);
- insn[i] = BPF_JMP_IMM(BPF_JEQ, R0, 10, len - i - 2);
- i++;
- while (i < len - 1)
- insn[i++] = BPF_LD_ABS(BPF_B, 1);
- insn[i] = BPF_EXIT_INSN();
-
- self->u.ptr.insns = insn;
- self->u.ptr.len = len;
-
- return 0;
-}
-
static int __bpf_fill_stxdw(struct bpf_test *self, int size)
{
unsigned int len = BPF_MAXINSNS;
@@ -1988,40 +1878,6 @@
{ { 0, -1 } }
},
{
- "INT: DIV + ABS",
- .u.insns_int = {
- BPF_ALU64_REG(BPF_MOV, R6, R1),
- BPF_LD_ABS(BPF_B, 3),
- BPF_ALU64_IMM(BPF_MOV, R2, 2),
- BPF_ALU32_REG(BPF_DIV, R0, R2),
- BPF_ALU64_REG(BPF_MOV, R8, R0),
- BPF_LD_ABS(BPF_B, 4),
- BPF_ALU64_REG(BPF_ADD, R8, R0),
- BPF_LD_IND(BPF_B, R8, -70),
- BPF_EXIT_INSN(),
- },
- INTERNAL,
- { 10, 20, 30, 40, 50 },
- { { 4, 0 }, { 5, 10 } }
- },
- {
- /* This one doesn't go through verifier, but is just raw insn
- * as opposed to cBPF tests from here. Thus div by 0 tests are
- * done in test_verifier in BPF kselftests.
- */
- "INT: DIV by -1",
- .u.insns_int = {
- BPF_ALU64_REG(BPF_MOV, R6, R1),
- BPF_ALU64_IMM(BPF_MOV, R7, -1),
- BPF_LD_ABS(BPF_B, 3),
- BPF_ALU32_REG(BPF_DIV, R0, R7),
- BPF_EXIT_INSN(),
- },
- INTERNAL,
- { 10, 20, 30, 40, 50 },
- { { 3, 0 }, { 4, 0 } }
- },
- {
"check: missing ret",
.u.insns = {
BPF_STMT(BPF_LD | BPF_IMM, 1),
@@ -2383,50 +2239,6 @@
{ },
{ { 0, 1 } }
},
- {
- "nmap reduced",
- .u.insns_int = {
- BPF_MOV64_REG(R6, R1),
- BPF_LD_ABS(BPF_H, 12),
- BPF_JMP_IMM(BPF_JNE, R0, 0x806, 28),
- BPF_LD_ABS(BPF_H, 12),
- BPF_JMP_IMM(BPF_JNE, R0, 0x806, 26),
- BPF_MOV32_IMM(R0, 18),
- BPF_STX_MEM(BPF_W, R10, R0, -64),
- BPF_LDX_MEM(BPF_W, R7, R10, -64),
- BPF_LD_IND(BPF_W, R7, 14),
- BPF_STX_MEM(BPF_W, R10, R0, -60),
- BPF_MOV32_IMM(R0, 280971478),
- BPF_STX_MEM(BPF_W, R10, R0, -56),
- BPF_LDX_MEM(BPF_W, R7, R10, -56),
- BPF_LDX_MEM(BPF_W, R0, R10, -60),
- BPF_ALU32_REG(BPF_SUB, R0, R7),
- BPF_JMP_IMM(BPF_JNE, R0, 0, 15),
- BPF_LD_ABS(BPF_H, 12),
- BPF_JMP_IMM(BPF_JNE, R0, 0x806, 13),
- BPF_MOV32_IMM(R0, 22),
- BPF_STX_MEM(BPF_W, R10, R0, -56),
- BPF_LDX_MEM(BPF_W, R7, R10, -56),
- BPF_LD_IND(BPF_H, R7, 14),
- BPF_STX_MEM(BPF_W, R10, R0, -52),
- BPF_MOV32_IMM(R0, 17366),
- BPF_STX_MEM(BPF_W, R10, R0, -48),
- BPF_LDX_MEM(BPF_W, R7, R10, -48),
- BPF_LDX_MEM(BPF_W, R0, R10, -52),
- BPF_ALU32_REG(BPF_SUB, R0, R7),
- BPF_JMP_IMM(BPF_JNE, R0, 0, 2),
- BPF_MOV32_IMM(R0, 256),
- BPF_EXIT_INSN(),
- BPF_MOV32_IMM(R0, 0),
- BPF_EXIT_INSN(),
- },
- INTERNAL,
- { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0x08, 0x06, 0, 0,
- 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
- 0x10, 0xbf, 0x48, 0xd6, 0x43, 0xd6},
- { { 38, 256 } },
- .stack_depth = 64,
- },
/* BPF_ALU | BPF_MOV | BPF_X */
{
"ALU_MOV_X: dst = 2",
@@ -5485,22 +5297,6 @@
{ { 1, 0xbee } },
.fill_helper = bpf_fill_ld_abs_get_processor_id,
},
- {
- "BPF_MAXINSNS: ld_abs+vlan_push/pop",
- { },
- INTERNAL,
- { 0x34 },
- { { ETH_HLEN, 0xbef } },
- .fill_helper = bpf_fill_ld_abs_vlan_push_pop,
- },
- {
- "BPF_MAXINSNS: jump around ld_abs",
- { },
- INTERNAL,
- { 10, 11 },
- { { 2, 10 } },
- .fill_helper = bpf_fill_jump_around_ld_abs,
- },
/*
* LD_IND / LD_ABS on fragmented SKBs
*/
@@ -5683,6 +5479,53 @@
{ {0x40, 0x05 } },
},
{
+ "LD_IND byte positive offset, all ff",
+ .u.insns = {
+ BPF_STMT(BPF_LDX | BPF_IMM, 0x3e),
+ BPF_STMT(BPF_LD | BPF_IND | BPF_B, 0x1),
+ BPF_STMT(BPF_RET | BPF_A, 0x0),
+ },
+ CLASSIC,
+ { [0x3c] = 0xff, [0x3d] = 0xff, [0x3e] = 0xff, [0x3f] = 0xff },
+ { {0x40, 0xff } },
+ },
+ {
+ "LD_IND byte positive offset, out of bounds",
+ .u.insns = {
+ BPF_STMT(BPF_LDX | BPF_IMM, 0x3e),
+ BPF_STMT(BPF_LD | BPF_IND | BPF_B, 0x1),
+ BPF_STMT(BPF_RET | BPF_A, 0x0),
+ },
+ CLASSIC,
+ { [0x3c] = 0x25, [0x3d] = 0x05, [0x3e] = 0x19, [0x3f] = 0x82 },
+ { {0x3f, 0 }, },
+ },
+ {
+ "LD_IND byte negative offset, out of bounds",
+ .u.insns = {
+ BPF_STMT(BPF_LDX | BPF_IMM, 0x3e),
+ BPF_STMT(BPF_LD | BPF_IND | BPF_B, -0x3f),
+ BPF_STMT(BPF_RET | BPF_A, 0x0),
+ },
+ CLASSIC,
+ { [0x3c] = 0x25, [0x3d] = 0x05, [0x3e] = 0x19, [0x3f] = 0x82 },
+ { {0x3f, 0 } },
+ },
+ {
+ "LD_IND byte negative offset, multiple calls",
+ .u.insns = {
+ BPF_STMT(BPF_LDX | BPF_IMM, 0x3b),
+ BPF_STMT(BPF_LD | BPF_IND | BPF_B, SKF_LL_OFF + 1),
+ BPF_STMT(BPF_LD | BPF_IND | BPF_B, SKF_LL_OFF + 2),
+ BPF_STMT(BPF_LD | BPF_IND | BPF_B, SKF_LL_OFF + 3),
+ BPF_STMT(BPF_LD | BPF_IND | BPF_B, SKF_LL_OFF + 4),
+ BPF_STMT(BPF_RET | BPF_A, 0x0),
+ },
+ CLASSIC,
+ { [0x3c] = 0x25, [0x3d] = 0x05, [0x3e] = 0x19, [0x3f] = 0x82 },
+ { {0x40, 0x82 }, },
+ },
+ {
"LD_IND halfword positive offset",
.u.insns = {
BPF_STMT(BPF_LDX | BPF_IMM, 0x20),
@@ -5731,6 +5574,39 @@
{ {0x40, 0x66cc } },
},
{
+ "LD_IND halfword positive offset, all ff",
+ .u.insns = {
+ BPF_STMT(BPF_LDX | BPF_IMM, 0x3d),
+ BPF_STMT(BPF_LD | BPF_IND | BPF_H, 0x1),
+ BPF_STMT(BPF_RET | BPF_A, 0x0),
+ },
+ CLASSIC,
+ { [0x3c] = 0xff, [0x3d] = 0xff, [0x3e] = 0xff, [0x3f] = 0xff },
+ { {0x40, 0xffff } },
+ },
+ {
+ "LD_IND halfword positive offset, out of bounds",
+ .u.insns = {
+ BPF_STMT(BPF_LDX | BPF_IMM, 0x3e),
+ BPF_STMT(BPF_LD | BPF_IND | BPF_H, 0x1),
+ BPF_STMT(BPF_RET | BPF_A, 0x0),
+ },
+ CLASSIC,
+ { [0x3c] = 0x25, [0x3d] = 0x05, [0x3e] = 0x19, [0x3f] = 0x82 },
+ { {0x3f, 0 }, },
+ },
+ {
+ "LD_IND halfword negative offset, out of bounds",
+ .u.insns = {
+ BPF_STMT(BPF_LDX | BPF_IMM, 0x3e),
+ BPF_STMT(BPF_LD | BPF_IND | BPF_H, -0x3f),
+ BPF_STMT(BPF_RET | BPF_A, 0x0),
+ },
+ CLASSIC,
+ { [0x3c] = 0x25, [0x3d] = 0x05, [0x3e] = 0x19, [0x3f] = 0x82 },
+ { {0x3f, 0 } },
+ },
+ {
"LD_IND word positive offset",
.u.insns = {
BPF_STMT(BPF_LDX | BPF_IMM, 0x20),
@@ -5821,6 +5697,39 @@
{ {0x40, 0x66cc77dd } },
},
{
+ "LD_IND word positive offset, all ff",
+ .u.insns = {
+ BPF_STMT(BPF_LDX | BPF_IMM, 0x3b),
+ BPF_STMT(BPF_LD | BPF_IND | BPF_W, 0x1),
+ BPF_STMT(BPF_RET | BPF_A, 0x0),
+ },
+ CLASSIC,
+ { [0x3c] = 0xff, [0x3d] = 0xff, [0x3e] = 0xff, [0x3f] = 0xff },
+ { {0x40, 0xffffffff } },
+ },
+ {
+ "LD_IND word positive offset, out of bounds",
+ .u.insns = {
+ BPF_STMT(BPF_LDX | BPF_IMM, 0x3e),
+ BPF_STMT(BPF_LD | BPF_IND | BPF_W, 0x1),
+ BPF_STMT(BPF_RET | BPF_A, 0x0),
+ },
+ CLASSIC,
+ { [0x3c] = 0x25, [0x3d] = 0x05, [0x3e] = 0x19, [0x3f] = 0x82 },
+ { {0x3f, 0 }, },
+ },
+ {
+ "LD_IND word negative offset, out of bounds",
+ .u.insns = {
+ BPF_STMT(BPF_LDX | BPF_IMM, 0x3e),
+ BPF_STMT(BPF_LD | BPF_IND | BPF_W, -0x3f),
+ BPF_STMT(BPF_RET | BPF_A, 0x0),
+ },
+ CLASSIC,
+ { [0x3c] = 0x25, [0x3d] = 0x05, [0x3e] = 0x19, [0x3f] = 0x82 },
+ { {0x3f, 0 } },
+ },
+ {
"LD_ABS byte",
.u.insns = {
BPF_STMT(BPF_LD | BPF_ABS | BPF_B, 0x20),
@@ -5838,6 +5747,68 @@
{ {0x40, 0xcc } },
},
{
+ "LD_ABS byte positive offset, all ff",
+ .u.insns = {
+ BPF_STMT(BPF_LD | BPF_ABS | BPF_B, 0x3f),
+ BPF_STMT(BPF_RET | BPF_A, 0x0),
+ },
+ CLASSIC,
+ { [0x3c] = 0xff, [0x3d] = 0xff, [0x3e] = 0xff, [0x3f] = 0xff },
+ { {0x40, 0xff } },
+ },
+ {
+ "LD_ABS byte positive offset, out of bounds",
+ .u.insns = {
+ BPF_STMT(BPF_LD | BPF_ABS | BPF_B, 0x3f),
+ BPF_STMT(BPF_RET | BPF_A, 0x0),
+ },
+ CLASSIC,
+ { [0x3c] = 0x25, [0x3d] = 0x05, [0x3e] = 0x19, [0x3f] = 0x82 },
+ { {0x3f, 0 }, },
+ },
+ {
+ "LD_ABS byte negative offset, out of bounds load",
+ .u.insns = {
+ BPF_STMT(BPF_LD | BPF_ABS | BPF_B, -1),
+ BPF_STMT(BPF_RET | BPF_A, 0x0),
+ },
+ CLASSIC | FLAG_EXPECTED_FAIL,
+ .expected_errcode = -EINVAL,
+ },
+ {
+ "LD_ABS byte negative offset, in bounds",
+ .u.insns = {
+ BPF_STMT(BPF_LD | BPF_ABS | BPF_B, SKF_LL_OFF + 0x3f),
+ BPF_STMT(BPF_RET | BPF_A, 0x0),
+ },
+ CLASSIC,
+ { [0x3c] = 0x25, [0x3d] = 0x05, [0x3e] = 0x19, [0x3f] = 0x82 },
+ { {0x40, 0x82 }, },
+ },
+ {
+ "LD_ABS byte negative offset, out of bounds",
+ .u.insns = {
+ BPF_STMT(BPF_LD | BPF_ABS | BPF_B, SKF_LL_OFF + 0x3f),
+ BPF_STMT(BPF_RET | BPF_A, 0x0),
+ },
+ CLASSIC,
+ { [0x3c] = 0x25, [0x3d] = 0x05, [0x3e] = 0x19, [0x3f] = 0x82 },
+ { {0x3f, 0 }, },
+ },
+ {
+ "LD_ABS byte negative offset, multiple calls",
+ .u.insns = {
+ BPF_STMT(BPF_LD | BPF_ABS | BPF_B, SKF_LL_OFF + 0x3c),
+ BPF_STMT(BPF_LD | BPF_ABS | BPF_B, SKF_LL_OFF + 0x3d),
+ BPF_STMT(BPF_LD | BPF_ABS | BPF_B, SKF_LL_OFF + 0x3e),
+ BPF_STMT(BPF_LD | BPF_ABS | BPF_B, SKF_LL_OFF + 0x3f),
+ BPF_STMT(BPF_RET | BPF_A, 0x0),
+ },
+ CLASSIC,
+ { [0x3c] = 0x25, [0x3d] = 0x05, [0x3e] = 0x19, [0x3f] = 0x82 },
+ { {0x40, 0x82 }, },
+ },
+ {
"LD_ABS halfword",
.u.insns = {
BPF_STMT(BPF_LD | BPF_ABS | BPF_H, 0x22),
@@ -5872,6 +5843,55 @@
{ {0x40, 0x99ff } },
},
{
+ "LD_ABS halfword positive offset, all ff",
+ .u.insns = {
+ BPF_STMT(BPF_LD | BPF_ABS | BPF_H, 0x3e),
+ BPF_STMT(BPF_RET | BPF_A, 0x0),
+ },
+ CLASSIC,
+ { [0x3c] = 0xff, [0x3d] = 0xff, [0x3e] = 0xff, [0x3f] = 0xff },
+ { {0x40, 0xffff } },
+ },
+ {
+ "LD_ABS halfword positive offset, out of bounds",
+ .u.insns = {
+ BPF_STMT(BPF_LD | BPF_ABS | BPF_H, 0x3f),
+ BPF_STMT(BPF_RET | BPF_A, 0x0),
+ },
+ CLASSIC,
+ { [0x3c] = 0x25, [0x3d] = 0x05, [0x3e] = 0x19, [0x3f] = 0x82 },
+ { {0x3f, 0 }, },
+ },
+ {
+ "LD_ABS halfword negative offset, out of bounds load",
+ .u.insns = {
+ BPF_STMT(BPF_LD | BPF_ABS | BPF_H, -1),
+ BPF_STMT(BPF_RET | BPF_A, 0x0),
+ },
+ CLASSIC | FLAG_EXPECTED_FAIL,
+ .expected_errcode = -EINVAL,
+ },
+ {
+ "LD_ABS halfword negative offset, in bounds",
+ .u.insns = {
+ BPF_STMT(BPF_LD | BPF_ABS | BPF_H, SKF_LL_OFF + 0x3e),
+ BPF_STMT(BPF_RET | BPF_A, 0x0),
+ },
+ CLASSIC,
+ { [0x3c] = 0x25, [0x3d] = 0x05, [0x3e] = 0x19, [0x3f] = 0x82 },
+ { {0x40, 0x1982 }, },
+ },
+ {
+ "LD_ABS halfword negative offset, out of bounds",
+ .u.insns = {
+ BPF_STMT(BPF_LD | BPF_ABS | BPF_H, SKF_LL_OFF + 0x3e),
+ BPF_STMT(BPF_RET | BPF_A, 0x0),
+ },
+ CLASSIC,
+ { [0x3c] = 0x25, [0x3d] = 0x05, [0x3e] = 0x19, [0x3f] = 0x82 },
+ { {0x3f, 0 }, },
+ },
+ {
"LD_ABS word",
.u.insns = {
BPF_STMT(BPF_LD | BPF_ABS | BPF_W, 0x1c),
@@ -5939,6 +5959,140 @@
},
{ {0x40, 0x88ee99ff } },
},
+ {
+ "LD_ABS word positive offset, all ff",
+ .u.insns = {
+ BPF_STMT(BPF_LD | BPF_ABS | BPF_W, 0x3c),
+ BPF_STMT(BPF_RET | BPF_A, 0x0),
+ },
+ CLASSIC,
+ { [0x3c] = 0xff, [0x3d] = 0xff, [0x3e] = 0xff, [0x3f] = 0xff },
+ { {0x40, 0xffffffff } },
+ },
+ {
+ "LD_ABS word positive offset, out of bounds",
+ .u.insns = {
+ BPF_STMT(BPF_LD | BPF_ABS | BPF_W, 0x3f),
+ BPF_STMT(BPF_RET | BPF_A, 0x0),
+ },
+ CLASSIC,
+ { [0x3c] = 0x25, [0x3d] = 0x05, [0x3e] = 0x19, [0x3f] = 0x82 },
+ { {0x3f, 0 }, },
+ },
+ {
+ "LD_ABS word negative offset, out of bounds load",
+ .u.insns = {
+ BPF_STMT(BPF_LD | BPF_ABS | BPF_W, -1),
+ BPF_STMT(BPF_RET | BPF_A, 0x0),
+ },
+ CLASSIC | FLAG_EXPECTED_FAIL,
+ .expected_errcode = -EINVAL,
+ },
+ {
+ "LD_ABS word negative offset, in bounds",
+ .u.insns = {
+ BPF_STMT(BPF_LD | BPF_ABS | BPF_W, SKF_LL_OFF + 0x3c),
+ BPF_STMT(BPF_RET | BPF_A, 0x0),
+ },
+ CLASSIC,
+ { [0x3c] = 0x25, [0x3d] = 0x05, [0x3e] = 0x19, [0x3f] = 0x82 },
+ { {0x40, 0x25051982 }, },
+ },
+ {
+ "LD_ABS word negative offset, out of bounds",
+ .u.insns = {
+ BPF_STMT(BPF_LD | BPF_ABS | BPF_W, SKF_LL_OFF + 0x3c),
+ BPF_STMT(BPF_RET | BPF_A, 0x0),
+ },
+ CLASSIC,
+ { [0x3c] = 0x25, [0x3d] = 0x05, [0x3e] = 0x19, [0x3f] = 0x82 },
+ { {0x3f, 0 }, },
+ },
+ {
+ "LDX_MSH standalone, preserved A",
+ .u.insns = {
+ BPF_STMT(BPF_LD | BPF_IMM, 0xffeebbaa),
+ BPF_STMT(BPF_LDX | BPF_B | BPF_MSH, 0x3c),
+ BPF_STMT(BPF_RET | BPF_A, 0x0),
+ },
+ CLASSIC,
+ { [0x3c] = 0x25, [0x3d] = 0x05, [0x3e] = 0x19, [0x3f] = 0x82 },
+ { {0x40, 0xffeebbaa }, },
+ },
+ {
+ "LDX_MSH standalone, preserved A 2",
+ .u.insns = {
+ BPF_STMT(BPF_LD | BPF_IMM, 0x175e9d63),
+ BPF_STMT(BPF_LDX | BPF_B | BPF_MSH, 0x3c),
+ BPF_STMT(BPF_LDX | BPF_B | BPF_MSH, 0x3d),
+ BPF_STMT(BPF_LDX | BPF_B | BPF_MSH, 0x3e),
+ BPF_STMT(BPF_LDX | BPF_B | BPF_MSH, 0x3f),
+ BPF_STMT(BPF_RET | BPF_A, 0x0),
+ },
+ CLASSIC,
+ { [0x3c] = 0x25, [0x3d] = 0x05, [0x3e] = 0x19, [0x3f] = 0x82 },
+ { {0x40, 0x175e9d63 }, },
+ },
+ {
+ "LDX_MSH standalone, test result 1",
+ .u.insns = {
+ BPF_STMT(BPF_LD | BPF_IMM, 0xffeebbaa),
+ BPF_STMT(BPF_LDX | BPF_B | BPF_MSH, 0x3c),
+ BPF_STMT(BPF_MISC | BPF_TXA, 0),
+ BPF_STMT(BPF_RET | BPF_A, 0x0),
+ },
+ CLASSIC,
+ { [0x3c] = 0x25, [0x3d] = 0x05, [0x3e] = 0x19, [0x3f] = 0x82 },
+ { {0x40, 0x14 }, },
+ },
+ {
+ "LDX_MSH standalone, test result 2",
+ .u.insns = {
+ BPF_STMT(BPF_LD | BPF_IMM, 0xffeebbaa),
+ BPF_STMT(BPF_LDX | BPF_B | BPF_MSH, 0x3e),
+ BPF_STMT(BPF_MISC | BPF_TXA, 0),
+ BPF_STMT(BPF_RET | BPF_A, 0x0),
+ },
+ CLASSIC,
+ { [0x3c] = 0x25, [0x3d] = 0x05, [0x3e] = 0x19, [0x3f] = 0x82 },
+ { {0x40, 0x24 }, },
+ },
+ {
+ "LDX_MSH standalone, negative offset",
+ .u.insns = {
+ BPF_STMT(BPF_LD | BPF_IMM, 0xffeebbaa),
+ BPF_STMT(BPF_LDX | BPF_B | BPF_MSH, -1),
+ BPF_STMT(BPF_MISC | BPF_TXA, 0),
+ BPF_STMT(BPF_RET | BPF_A, 0x0),
+ },
+ CLASSIC,
+ { [0x3c] = 0x25, [0x3d] = 0x05, [0x3e] = 0x19, [0x3f] = 0x82 },
+ { {0x40, 0 }, },
+ },
+ {
+ "LDX_MSH standalone, negative offset 2",
+ .u.insns = {
+ BPF_STMT(BPF_LD | BPF_IMM, 0xffeebbaa),
+ BPF_STMT(BPF_LDX | BPF_B | BPF_MSH, SKF_LL_OFF + 0x3e),
+ BPF_STMT(BPF_MISC | BPF_TXA, 0),
+ BPF_STMT(BPF_RET | BPF_A, 0x0),
+ },
+ CLASSIC,
+ { [0x3c] = 0x25, [0x3d] = 0x05, [0x3e] = 0x19, [0x3f] = 0x82 },
+ { {0x40, 0x24 }, },
+ },
+ {
+ "LDX_MSH standalone, out of bounds",
+ .u.insns = {
+ BPF_STMT(BPF_LD | BPF_IMM, 0xffeebbaa),
+ BPF_STMT(BPF_LDX | BPF_B | BPF_MSH, 0x40),
+ BPF_STMT(BPF_MISC | BPF_TXA, 0),
+ BPF_STMT(BPF_RET | BPF_A, 0x0),
+ },
+ CLASSIC,
+ { [0x3c] = 0x25, [0x3d] = 0x05, [0x3e] = 0x19, [0x3f] = 0x82 },
+ { {0x40, 0 }, },
+ },
/*
* verify that the interpreter or JIT correctly sets A and X
* to 0.
@@ -6127,14 +6281,6 @@
{},
{ {0x1, 0x42 } },
},
- {
- "LD_ABS with helper changing skb data",
- { },
- INTERNAL,
- { 0x34 },
- { { ETH_HLEN, 42 } },
- .fill_helper = bpf_fill_ld_abs_vlan_push_pop2,
- },
/* Checking interpreter vs JIT wrt signed extended imms. */
{
"JNE signed compare, test 1",
diff --git a/net/8021q/vlan.c b/net/8021q/vlan.c
index 5505ee6..73a6578 100644
--- a/net/8021q/vlan.c
+++ b/net/8021q/vlan.c
@@ -118,17 +118,21 @@
}
int vlan_check_real_dev(struct net_device *real_dev,
- __be16 protocol, u16 vlan_id)
+ __be16 protocol, u16 vlan_id,
+ struct netlink_ext_ack *extack)
{
const char *name = real_dev->name;
if (real_dev->features & NETIF_F_VLAN_CHALLENGED) {
pr_info("VLANs not supported on %s\n", name);
+ NL_SET_ERR_MSG_MOD(extack, "VLANs not supported on device");
return -EOPNOTSUPP;
}
- if (vlan_find_dev(real_dev, protocol, vlan_id) != NULL)
+ if (vlan_find_dev(real_dev, protocol, vlan_id) != NULL) {
+ NL_SET_ERR_MSG_MOD(extack, "VLAN device already exists");
return -EEXIST;
+ }
return 0;
}
@@ -215,7 +219,8 @@
if (vlan_id >= VLAN_VID_MASK)
return -ERANGE;
- err = vlan_check_real_dev(real_dev, htons(ETH_P_8021Q), vlan_id);
+ err = vlan_check_real_dev(real_dev, htons(ETH_P_8021Q), vlan_id,
+ NULL);
if (err < 0)
return err;
diff --git a/net/8021q/vlan.h b/net/8021q/vlan.h
index e23aac3..44df1c3 100644
--- a/net/8021q/vlan.h
+++ b/net/8021q/vlan.h
@@ -109,7 +109,8 @@
void vlan_dev_get_realdev_name(const struct net_device *dev, char *result);
int vlan_check_real_dev(struct net_device *real_dev,
- __be16 protocol, u16 vlan_id);
+ __be16 protocol, u16 vlan_id,
+ struct netlink_ext_ack *extack);
void vlan_setup(struct net_device *dev);
int register_vlan_dev(struct net_device *dev, struct netlink_ext_ack *extack);
void unregister_vlan_dev(struct net_device *dev, struct list_head *head);
diff --git a/net/8021q/vlan_dev.c b/net/8021q/vlan_dev.c
index 236452e..546af0e 100644
--- a/net/8021q/vlan_dev.c
+++ b/net/8021q/vlan_dev.c
@@ -215,7 +215,9 @@
return 0;
}
-/* Flags are defined in the vlan_flags enum in include/linux/if_vlan.h file. */
+/* Flags are defined in the vlan_flags enum in
+ * include/uapi/linux/if_vlan.h file.
+ */
int vlan_dev_change_flags(const struct net_device *dev, u32 flags, u32 mask)
{
struct vlan_dev_priv *vlan = vlan_dev_priv(dev);
diff --git a/net/8021q/vlan_netlink.c b/net/8021q/vlan_netlink.c
index 6689c0b..9b60c1e 100644
--- a/net/8021q/vlan_netlink.c
+++ b/net/8021q/vlan_netlink.c
@@ -47,14 +47,20 @@
int err;
if (tb[IFLA_ADDRESS]) {
- if (nla_len(tb[IFLA_ADDRESS]) != ETH_ALEN)
+ if (nla_len(tb[IFLA_ADDRESS]) != ETH_ALEN) {
+ NL_SET_ERR_MSG_MOD(extack, "Invalid link address");
return -EINVAL;
- if (!is_valid_ether_addr(nla_data(tb[IFLA_ADDRESS])))
+ }
+ if (!is_valid_ether_addr(nla_data(tb[IFLA_ADDRESS]))) {
+ NL_SET_ERR_MSG_MOD(extack, "Invalid link address");
return -EADDRNOTAVAIL;
+ }
}
- if (!data)
+ if (!data) {
+ NL_SET_ERR_MSG_MOD(extack, "VLAN properties not specified");
return -EINVAL;
+ }
if (data[IFLA_VLAN_PROTOCOL]) {
switch (nla_get_be16(data[IFLA_VLAN_PROTOCOL])) {
@@ -62,29 +68,38 @@
case htons(ETH_P_8021AD):
break;
default:
+ NL_SET_ERR_MSG_MOD(extack, "Invalid VLAN protocol");
return -EPROTONOSUPPORT;
}
}
if (data[IFLA_VLAN_ID]) {
id = nla_get_u16(data[IFLA_VLAN_ID]);
- if (id >= VLAN_VID_MASK)
+ if (id >= VLAN_VID_MASK) {
+ NL_SET_ERR_MSG_MOD(extack, "Invalid VLAN id");
return -ERANGE;
+ }
}
if (data[IFLA_VLAN_FLAGS]) {
flags = nla_data(data[IFLA_VLAN_FLAGS]);
if ((flags->flags & flags->mask) &
~(VLAN_FLAG_REORDER_HDR | VLAN_FLAG_GVRP |
- VLAN_FLAG_LOOSE_BINDING | VLAN_FLAG_MVRP))
+ VLAN_FLAG_LOOSE_BINDING | VLAN_FLAG_MVRP)) {
+ NL_SET_ERR_MSG_MOD(extack, "Invalid VLAN flags");
return -EINVAL;
+ }
}
err = vlan_validate_qos_map(data[IFLA_VLAN_INGRESS_QOS]);
- if (err < 0)
+ if (err < 0) {
+ NL_SET_ERR_MSG_MOD(extack, "Invalid ingress QOS map");
return err;
+ }
err = vlan_validate_qos_map(data[IFLA_VLAN_EGRESS_QOS]);
- if (err < 0)
+ if (err < 0) {
+ NL_SET_ERR_MSG_MOD(extack, "Invalid egress QOS map");
return err;
+ }
return 0;
}
@@ -126,14 +141,21 @@
__be16 proto;
int err;
- if (!data[IFLA_VLAN_ID])
+ if (!data[IFLA_VLAN_ID]) {
+ NL_SET_ERR_MSG_MOD(extack, "VLAN id not specified");
return -EINVAL;
+ }
- if (!tb[IFLA_LINK])
+ if (!tb[IFLA_LINK]) {
+ NL_SET_ERR_MSG_MOD(extack, "link not specified");
return -EINVAL;
+ }
+
real_dev = __dev_get_by_index(src_net, nla_get_u32(tb[IFLA_LINK]));
- if (!real_dev)
+ if (!real_dev) {
+ NL_SET_ERR_MSG_MOD(extack, "link does not exist");
return -ENODEV;
+ }
if (data[IFLA_VLAN_PROTOCOL])
proto = nla_get_be16(data[IFLA_VLAN_PROTOCOL]);
@@ -146,7 +168,8 @@
dev->priv_flags |= (real_dev->priv_flags & IFF_XMIT_DST_RELEASE);
vlan->flags = VLAN_FLAG_REORDER_HDR;
- err = vlan_check_real_dev(real_dev, vlan->vlan_proto, vlan->vlan_id);
+ err = vlan_check_real_dev(real_dev, vlan->vlan_proto, vlan->vlan_id,
+ extack);
if (err < 0)
return err;
diff --git a/net/9p/mod.c b/net/9p/mod.c
index 6ab36ae..eb9777f 100644
--- a/net/9p/mod.c
+++ b/net/9p/mod.c
@@ -104,7 +104,7 @@
/**
* v9fs_get_trans_by_name - get transport with the matching name
- * @name: string identifying transport
+ * @s: string identifying transport
*
*/
struct p9_trans_module *v9fs_get_trans_by_name(char *s)
diff --git a/net/Kconfig b/net/Kconfig
index 0428f12..ba554ce 100644
--- a/net/Kconfig
+++ b/net/Kconfig
@@ -59,6 +59,7 @@
source "net/xfrm/Kconfig"
source "net/iucv/Kconfig"
source "net/smc/Kconfig"
+source "net/xdp/Kconfig"
config INET
bool "TCP/IP networking"
@@ -201,6 +202,8 @@
endif
+source "net/bpfilter/Kconfig"
+
source "net/dccp/Kconfig"
source "net/sctp/Kconfig"
source "net/rds/Kconfig"
@@ -407,6 +410,9 @@
bool
default n
+config SOCK_VALIDATE_XMIT
+ bool
+
config NET_DEVLINK
tristate "Network physical/parent device Netlink interface"
help
@@ -423,6 +429,9 @@
on MAY_USE_DEVLINK to ensure they do not cause link errors when
devlink is a loadable module and the driver using it is built-in.
+config PAGE_POOL
+ bool
+
endif # if NET
# Used by archs to tell that they support BPF JIT compiler plus which flavour.
diff --git a/net/Makefile b/net/Makefile
index a6147c6..bdaf539 100644
--- a/net/Makefile
+++ b/net/Makefile
@@ -20,6 +20,7 @@
obj-$(CONFIG_XFRM) += xfrm/
obj-$(CONFIG_UNIX) += unix/
obj-$(CONFIG_NET) += ipv6/
+obj-$(CONFIG_BPFILTER) += bpfilter/
obj-$(CONFIG_PACKET) += packet/
obj-$(CONFIG_NET_KEY) += key/
obj-$(CONFIG_BRIDGE) += bridge/
@@ -85,3 +86,4 @@
endif
obj-$(CONFIG_QRTR) += qrtr/
obj-$(CONFIG_NET_NCSI) += ncsi/
+obj-$(CONFIG_XDP_SOCKETS) += xdp/
diff --git a/net/batman-adv/Kconfig b/net/batman-adv/Kconfig
index e4e2e02..de8034d 100644
--- a/net/batman-adv/Kconfig
+++ b/net/batman-adv/Kconfig
@@ -35,7 +35,7 @@
config BATMAN_ADV_BATMAN_V
bool "B.A.T.M.A.N. V protocol (experimental)"
depends on BATMAN_ADV && !(CFG80211=m && BATMAN_ADV=y)
- default n
+ default y
help
This option enables the B.A.T.M.A.N. V protocol, the successor
of the currently used B.A.T.M.A.N. IV protocol. The main
@@ -94,13 +94,13 @@
bool "batman-adv debugfs entries"
depends on BATMAN_ADV
depends on DEBUG_FS
- default y
+ default n
help
Enable this to export routing related debug tables via debugfs.
The information for each soft-interface and used hard-interface can be
found under batman_adv/
- If unsure, say Y.
+ If unsure, say N.
config BATMAN_ADV_DEBUG
bool "B.A.T.M.A.N. debugging"
diff --git a/net/batman-adv/bat_v_elp.c b/net/batman-adv/bat_v_elp.c
index 2868749..71c20c1 100644
--- a/net/batman-adv/bat_v_elp.c
+++ b/net/batman-adv/bat_v_elp.c
@@ -127,7 +127,20 @@
rtnl_lock();
ret = __ethtool_get_link_ksettings(hard_iface->net_dev, &link_settings);
rtnl_unlock();
- if (ret == 0) {
+
+ /* Virtual interface drivers such as tun / tap interfaces, VLAN, etc
+ * tend to initialize the interface throughput with some value for the
+ * sake of having a throughput number to export via ethtool. This
+ * exported throughput leaves batman-adv to conclude the interface
+ * throughput is genuine (reflecting reality), thus no measurements
+ * are necessary.
+ *
+ * Based on the observation that those interface types also tend to set
+ * the link auto-negotiation to 'off', batman-adv shall check this
+ * setting to differentiate between genuine link throughput information
+ * and placeholders installed by virtual interfaces.
+ */
+ if (ret == 0 && link_settings.base.autoneg == AUTONEG_ENABLE) {
/* link characteristics might change over time */
if (link_settings.base.duplex == DUPLEX_FULL)
hard_iface->bat_v.flags |= BATADV_FULL_DUPLEX;
diff --git a/net/batman-adv/main.h b/net/batman-adv/main.h
index 057a28a..8da3c93 100644
--- a/net/batman-adv/main.h
+++ b/net/batman-adv/main.h
@@ -25,7 +25,7 @@
#define BATADV_DRIVER_DEVICE "batman-adv"
#ifndef BATADV_SOURCE_VERSION
-#define BATADV_SOURCE_VERSION "2018.1"
+#define BATADV_SOURCE_VERSION "2018.2"
#endif
/* B.A.T.M.A.N. parameters */
diff --git a/net/batman-adv/multicast.c b/net/batman-adv/multicast.c
index a35f597..86725d7 100644
--- a/net/batman-adv/multicast.c
+++ b/net/batman-adv/multicast.c
@@ -815,9 +815,6 @@
if (!atomic_read(&bat_priv->multicast_mode))
return -EINVAL;
- if (atomic_read(&bat_priv->mcast.num_disabled))
- return -EINVAL;
-
switch (ntohs(ethhdr->h_proto)) {
case ETH_P_IP:
return batadv_mcast_forw_mode_check_ipv4(bat_priv, skb,
@@ -1183,33 +1180,23 @@
{
bool orig_mcast_enabled = !(flags & BATADV_TVLV_HANDLER_OGM_CIFNOTFND);
u8 mcast_flags = BATADV_NO_FLAGS;
- bool orig_initialized;
if (orig_mcast_enabled && tvlv_value &&
tvlv_value_len >= sizeof(mcast_flags))
mcast_flags = *(u8 *)tvlv_value;
- spin_lock_bh(&orig->mcast_handler_lock);
- orig_initialized = test_bit(BATADV_ORIG_CAPA_HAS_MCAST,
- &orig->capa_initialized);
+ if (!orig_mcast_enabled) {
+ mcast_flags |= BATADV_MCAST_WANT_ALL_IPV4;
+ mcast_flags |= BATADV_MCAST_WANT_ALL_IPV6;
+ }
- /* If mcast support is turned on decrease the disabled mcast node
- * counter only if we had increased it for this node before. If this
- * is a completely new orig_node no need to decrease the counter.
- */
+ spin_lock_bh(&orig->mcast_handler_lock);
+
if (orig_mcast_enabled &&
!test_bit(BATADV_ORIG_CAPA_HAS_MCAST, &orig->capabilities)) {
- if (orig_initialized)
- atomic_dec(&bat_priv->mcast.num_disabled);
set_bit(BATADV_ORIG_CAPA_HAS_MCAST, &orig->capabilities);
- /* If mcast support is being switched off or if this is an initial
- * OGM without mcast support then increase the disabled mcast
- * node counter.
- */
} else if (!orig_mcast_enabled &&
- (test_bit(BATADV_ORIG_CAPA_HAS_MCAST, &orig->capabilities) ||
- !orig_initialized)) {
- atomic_inc(&bat_priv->mcast.num_disabled);
+ test_bit(BATADV_ORIG_CAPA_HAS_MCAST, &orig->capabilities)) {
clear_bit(BATADV_ORIG_CAPA_HAS_MCAST, &orig->capabilities);
}
@@ -1595,10 +1582,6 @@
spin_lock_bh(&orig->mcast_handler_lock);
- if (!test_bit(BATADV_ORIG_CAPA_HAS_MCAST, &orig->capabilities) &&
- test_bit(BATADV_ORIG_CAPA_HAS_MCAST, &orig->capa_initialized))
- atomic_dec(&bat_priv->mcast.num_disabled);
-
batadv_mcast_want_unsnoop_update(bat_priv, orig, BATADV_NO_FLAGS);
batadv_mcast_want_ipv4_update(bat_priv, orig, BATADV_NO_FLAGS);
batadv_mcast_want_ipv6_update(bat_priv, orig, BATADV_NO_FLAGS);
diff --git a/net/batman-adv/soft-interface.c b/net/batman-adv/soft-interface.c
index edeffcb..1485263 100644
--- a/net/batman-adv/soft-interface.c
+++ b/net/batman-adv/soft-interface.c
@@ -188,8 +188,8 @@
{
}
-static int batadv_interface_tx(struct sk_buff *skb,
- struct net_device *soft_iface)
+static netdev_tx_t batadv_interface_tx(struct sk_buff *skb,
+ struct net_device *soft_iface)
{
struct ethhdr *ethhdr;
struct batadv_priv *bat_priv = netdev_priv(soft_iface);
@@ -796,7 +796,6 @@
bat_priv->mcast.querier_ipv6.shadowing = false;
bat_priv->mcast.flags = BATADV_NO_FLAGS;
atomic_set(&bat_priv->multicast_mode, 1);
- atomic_set(&bat_priv->mcast.num_disabled, 0);
atomic_set(&bat_priv->mcast.num_want_all_unsnoopables, 0);
atomic_set(&bat_priv->mcast.num_want_all_ipv4, 0);
atomic_set(&bat_priv->mcast.num_want_all_ipv6, 0);
diff --git a/net/batman-adv/types.h b/net/batman-adv/types.h
index 476b052..360357f 100644
--- a/net/batman-adv/types.h
+++ b/net/batman-adv/types.h
@@ -215,10 +215,12 @@
struct batadv_hard_iface_bat_v bat_v;
#endif
+#ifdef CONFIG_BATMAN_ADV_DEBUGFS
/**
* @debug_dir: dentry for nc subdir in batman-adv directory in debugfs
*/
struct dentry *debug_dir;
+#endif
/**
* @neigh_list: list of unique single hop neighbors via this interface
@@ -1160,13 +1162,13 @@
*/
struct batadv_mcast_querier_state {
/** @exists: whether a querier exists in the mesh */
- bool exists;
+ unsigned char exists:1;
/**
* @shadowing: if a querier exists, whether it is potentially shadowing
* multicast listeners (i.e. querier is behind our own bridge segment)
*/
- bool shadowing;
+ unsigned char shadowing:1;
};
/**
@@ -1207,13 +1209,10 @@
u8 flags;
/** @enabled: whether the multicast tvlv is currently enabled */
- bool enabled;
+ unsigned char enabled:1;
/** @bridged: whether the soft interface has a bridge on top */
- bool bridged;
-
- /** @num_disabled: number of nodes that have no mcast tvlv */
- atomic_t num_disabled;
+ unsigned char bridged:1;
/**
* @num_want_all_unsnoopables: number of nodes wanting unsnoopable IP
@@ -1245,10 +1244,12 @@
/** @work: work queue callback item for cleanup */
struct delayed_work work;
+#ifdef CONFIG_BATMAN_ADV_DEBUGFS
/**
* @debug_dir: dentry for nc subdir in batman-adv directory in debugfs
*/
struct dentry *debug_dir;
+#endif
/**
* @min_tq: only consider neighbors for encoding if neigh_tq > min_tq
@@ -1392,7 +1393,7 @@
atomic_t dup_acks;
/** @fast_recovery: true if in Fast Recovery mode */
- bool fast_recovery;
+ unsigned char fast_recovery:1;
/** @recover: last sent seqno when entering Fast Recovery */
u32 recover;
@@ -1601,8 +1602,10 @@
/** @mesh_obj: kobject for sysfs mesh subdirectory */
struct kobject *mesh_obj;
+#ifdef CONFIG_BATMAN_ADV_DEBUGFS
/** @debug_dir: dentry for debugfs batman-adv subdirectory */
struct dentry *debug_dir;
+#endif
/** @forw_bat_list: list of aggregated OGMs that will be forwarded */
struct hlist_head forw_bat_list;
@@ -2049,10 +2052,10 @@
* @decoded: Marks a skb as decoded, which is checked when searching for
* coding opportunities in network-coding.c
*/
- bool decoded;
+ unsigned char decoded:1;
/** @num_bcasts: Counter for broadcast packet retransmissions */
- unsigned int num_bcasts;
+ unsigned char num_bcasts;
};
/**
diff --git a/net/bluetooth/hci_core.c b/net/bluetooth/hci_core.c
index 40d260f..b0ee9ed 100644
--- a/net/bluetooth/hci_core.c
+++ b/net/bluetooth/hci_core.c
@@ -3422,6 +3422,37 @@
return 0;
}
+int __hci_cmd_send(struct hci_dev *hdev, u16 opcode, u32 plen,
+ const void *param)
+{
+ struct sk_buff *skb;
+
+ if (hci_opcode_ogf(opcode) != 0x3f) {
+ /* A controller receiving a command shall respond with either
+ * a Command Status Event or a Command Complete Event.
+ * Therefore, all standard HCI commands must be sent via the
+ * standard API, using hci_send_cmd or hci_cmd_sync helpers.
+ * Some vendors do not comply with this rule for vendor-specific
+ * commands and do not return any event. We want to support
+ * unresponded commands for such cases only.
+ */
+ bt_dev_err(hdev, "unresponded command not supported");
+ return -EINVAL;
+ }
+
+ skb = hci_prepare_cmd(hdev, opcode, plen, param);
+ if (!skb) {
+ bt_dev_err(hdev, "no memory for command (opcode 0x%4.4x)",
+ opcode);
+ return -ENOMEM;
+ }
+
+ hci_send_frame(hdev, skb);
+
+ return 0;
+}
+EXPORT_SYMBOL(__hci_cmd_send);
+
/* Get data from the previously sent command */
void *hci_sent_cmd_data(struct hci_dev *hdev, __u16 opcode)
{
diff --git a/net/bluetooth/hci_event.c b/net/bluetooth/hci_event.c
index 139707c..235b5aa 100644
--- a/net/bluetooth/hci_event.c
+++ b/net/bluetooth/hci_event.c
@@ -4942,10 +4942,14 @@
struct hci_ev_le_advertising_info *ev = ptr;
s8 rssi;
- rssi = ev->data[ev->length];
- process_adv_report(hdev, ev->evt_type, &ev->bdaddr,
- ev->bdaddr_type, NULL, 0, rssi,
- ev->data, ev->length);
+ if (ev->length <= HCI_MAX_AD_LENGTH) {
+ rssi = ev->data[ev->length];
+ process_adv_report(hdev, ev->evt_type, &ev->bdaddr,
+ ev->bdaddr_type, NULL, 0, rssi,
+ ev->data, ev->length);
+ } else {
+ bt_dev_err(hdev, "Dropping invalid advertising data");
+ }
ptr += sizeof(*ev) + ev->length + 1;
}
diff --git a/net/bluetooth/hci_request.c b/net/bluetooth/hci_request.c
index 66c0781..e44d347 100644
--- a/net/bluetooth/hci_request.c
+++ b/net/bluetooth/hci_request.c
@@ -122,7 +122,6 @@
struct sk_buff *__hci_cmd_sync_ev(struct hci_dev *hdev, u16 opcode, u32 plen,
const void *param, u8 event, u32 timeout)
{
- DECLARE_WAITQUEUE(wait, current);
struct hci_request req;
struct sk_buff *skb;
int err = 0;
@@ -135,21 +134,14 @@
hdev->req_status = HCI_REQ_PEND;
- add_wait_queue(&hdev->req_wait_q, &wait);
- set_current_state(TASK_INTERRUPTIBLE);
-
err = hci_req_run_skb(&req, hci_req_sync_complete);
- if (err < 0) {
- remove_wait_queue(&hdev->req_wait_q, &wait);
- set_current_state(TASK_RUNNING);
+ if (err < 0)
return ERR_PTR(err);
- }
- schedule_timeout(timeout);
+ err = wait_event_interruptible_timeout(hdev->req_wait_q,
+ hdev->req_status != HCI_REQ_PEND, timeout);
- remove_wait_queue(&hdev->req_wait_q, &wait);
-
- if (signal_pending(current))
+ if (err == -ERESTARTSYS)
return ERR_PTR(-EINTR);
switch (hdev->req_status) {
@@ -197,7 +189,6 @@
unsigned long opt, u32 timeout, u8 *hci_status)
{
struct hci_request req;
- DECLARE_WAITQUEUE(wait, current);
int err = 0;
BT_DBG("%s start", hdev->name);
@@ -213,16 +204,10 @@
return err;
}
- add_wait_queue(&hdev->req_wait_q, &wait);
- set_current_state(TASK_INTERRUPTIBLE);
-
err = hci_req_run_skb(&req, hci_req_sync_complete);
if (err < 0) {
hdev->req_status = 0;
- remove_wait_queue(&hdev->req_wait_q, &wait);
- set_current_state(TASK_RUNNING);
-
/* ENODATA means the HCI request command queue is empty.
* This can happen when a request with conditionals doesn't
* trigger any commands to be sent. This is normal behavior
@@ -240,11 +225,10 @@
return err;
}
- schedule_timeout(timeout);
+ err = wait_event_interruptible_timeout(hdev->req_wait_q,
+ hdev->req_status != HCI_REQ_PEND, timeout);
- remove_wait_queue(&hdev->req_wait_q, &wait);
-
- if (signal_pending(current))
+ if (err == -ERESTARTSYS)
return -EINTR;
switch (hdev->req_status) {
diff --git a/net/bpf/test_run.c b/net/bpf/test_run.c
index 2ced486..68c3578 100644
--- a/net/bpf/test_run.c
+++ b/net/bpf/test_run.c
@@ -170,7 +170,8 @@
xdp.rxq = &rxqueue->xdp_rxq;
retval = bpf_test_run(prog, &xdp, repeat, &duration);
- if (xdp.data != data + XDP_PACKET_HEADROOM + NET_IP_ALIGN)
+ if (xdp.data != data + XDP_PACKET_HEADROOM + NET_IP_ALIGN ||
+ xdp.data_end != xdp.data + size)
size = xdp.data_end - xdp.data;
ret = bpf_test_finish(kattr, uattr, xdp.data, size, retval, duration);
kfree(data);
diff --git a/net/bpfilter/Kconfig b/net/bpfilter/Kconfig
new file mode 100644
index 0000000..a948b07
--- /dev/null
+++ b/net/bpfilter/Kconfig
@@ -0,0 +1,16 @@
+menuconfig BPFILTER
+ bool "BPF based packet filtering framework (BPFILTER)"
+ default n
+ depends on NET && BPF && INET
+ help
+ This builds experimental bpfilter framework that is aiming to
+ provide netfilter compatible functionality via BPF
+
+if BPFILTER
+config BPFILTER_UMH
+ tristate "bpfilter kernel module with user mode helper"
+ default m
+ help
+ This builds bpfilter kernel module with embedded user mode helper
+endif
+
diff --git a/net/bpfilter/Makefile b/net/bpfilter/Makefile
new file mode 100644
index 0000000..2af752c
--- /dev/null
+++ b/net/bpfilter/Makefile
@@ -0,0 +1,30 @@
+# SPDX-License-Identifier: GPL-2.0
+#
+# Makefile for the Linux BPFILTER layer.
+#
+
+hostprogs-y := bpfilter_umh
+bpfilter_umh-objs := main.o
+HOSTCFLAGS += -I. -Itools/include/
+ifeq ($(CONFIG_BPFILTER_UMH), y)
+# builtin bpfilter_umh should be compiled with -static
+# since rootfs isn't mounted at the time of __init
+# function is called and do_execv won't find elf interpreter
+HOSTLDFLAGS += -static
+endif
+
+# a bit of elf magic to convert bpfilter_umh binary into a binary blob
+# inside bpfilter_umh.o elf file referenced by
+# _binary_net_bpfilter_bpfilter_umh_start symbol
+# which bpfilter_kern.c passes further into umh blob loader at run-time
+quiet_cmd_copy_umh = GEN $@
+ cmd_copy_umh = echo ':' > $(obj)/.bpfilter_umh.o.cmd; \
+ $(OBJCOPY) -I binary -O $(CONFIG_OUTPUT_FORMAT) \
+ -B `$(OBJDUMP) -f $<|grep architecture|cut -d, -f1|cut -d' ' -f2` \
+ --rename-section .data=.init.rodata $< $@
+
+$(obj)/bpfilter_umh.o: $(obj)/bpfilter_umh
+ $(call cmd,copy_umh)
+
+obj-$(CONFIG_BPFILTER_UMH) += bpfilter.o
+bpfilter-objs += bpfilter_kern.o bpfilter_umh.o
diff --git a/net/bpfilter/bpfilter_kern.c b/net/bpfilter/bpfilter_kern.c
new file mode 100644
index 0000000..7596314b
--- /dev/null
+++ b/net/bpfilter/bpfilter_kern.c
@@ -0,0 +1,111 @@
+// SPDX-License-Identifier: GPL-2.0
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+#include <linux/init.h>
+#include <linux/module.h>
+#include <linux/umh.h>
+#include <linux/bpfilter.h>
+#include <linux/sched.h>
+#include <linux/sched/signal.h>
+#include <linux/fs.h>
+#include <linux/file.h>
+#include "msgfmt.h"
+
+#define UMH_start _binary_net_bpfilter_bpfilter_umh_start
+#define UMH_end _binary_net_bpfilter_bpfilter_umh_end
+
+extern char UMH_start;
+extern char UMH_end;
+
+static struct umh_info info;
+/* since ip_getsockopt() can run in parallel, serialize access to umh */
+static DEFINE_MUTEX(bpfilter_lock);
+
+static void shutdown_umh(struct umh_info *info)
+{
+ struct task_struct *tsk;
+
+ tsk = pid_task(find_vpid(info->pid), PIDTYPE_PID);
+ if (tsk)
+ force_sig(SIGKILL, tsk);
+ fput(info->pipe_to_umh);
+ fput(info->pipe_from_umh);
+}
+
+static void __stop_umh(void)
+{
+ if (bpfilter_process_sockopt) {
+ bpfilter_process_sockopt = NULL;
+ shutdown_umh(&info);
+ }
+}
+
+static void stop_umh(void)
+{
+ mutex_lock(&bpfilter_lock);
+ __stop_umh();
+ mutex_unlock(&bpfilter_lock);
+}
+
+static int __bpfilter_process_sockopt(struct sock *sk, int optname,
+ char __user *optval,
+ unsigned int optlen, bool is_set)
+{
+ struct mbox_request req;
+ struct mbox_reply reply;
+ loff_t pos;
+ ssize_t n;
+ int ret;
+
+ req.is_set = is_set;
+ req.pid = current->pid;
+ req.cmd = optname;
+ req.addr = (long)optval;
+ req.len = optlen;
+ mutex_lock(&bpfilter_lock);
+ n = __kernel_write(info.pipe_to_umh, &req, sizeof(req), &pos);
+ if (n != sizeof(req)) {
+ pr_err("write fail %zd\n", n);
+ __stop_umh();
+ ret = -EFAULT;
+ goto out;
+ }
+ pos = 0;
+ n = kernel_read(info.pipe_from_umh, &reply, sizeof(reply), &pos);
+ if (n != sizeof(reply)) {
+ pr_err("read fail %zd\n", n);
+ __stop_umh();
+ ret = -EFAULT;
+ goto out;
+ }
+ ret = reply.status;
+out:
+ mutex_unlock(&bpfilter_lock);
+ return ret;
+}
+
+static int __init load_umh(void)
+{
+ int err;
+
+ /* fork usermode process */
+ err = fork_usermode_blob(&UMH_start, &UMH_end - &UMH_start, &info);
+ if (err)
+ return err;
+ pr_info("Loaded bpfilter_umh pid %d\n", info.pid);
+
+ /* health check that usermode process started correctly */
+ if (__bpfilter_process_sockopt(NULL, 0, 0, 0, 0) != 0) {
+ stop_umh();
+ return -EFAULT;
+ }
+ bpfilter_process_sockopt = &__bpfilter_process_sockopt;
+ return 0;
+}
+
+static void __exit fini_umh(void)
+{
+ stop_umh();
+}
+module_init(load_umh);
+module_exit(fini_umh);
+MODULE_LICENSE("GPL");
diff --git a/net/bpfilter/main.c b/net/bpfilter/main.c
new file mode 100644
index 0000000..1317f10
--- /dev/null
+++ b/net/bpfilter/main.c
@@ -0,0 +1,63 @@
+// SPDX-License-Identifier: GPL-2.0
+#define _GNU_SOURCE
+#include <sys/uio.h>
+#include <errno.h>
+#include <stdio.h>
+#include <sys/socket.h>
+#include <fcntl.h>
+#include <unistd.h>
+#include "include/uapi/linux/bpf.h"
+#include <asm/unistd.h>
+#include "msgfmt.h"
+
+int debug_fd;
+
+static int handle_get_cmd(struct mbox_request *cmd)
+{
+ switch (cmd->cmd) {
+ case 0:
+ return 0;
+ default:
+ break;
+ }
+ return -ENOPROTOOPT;
+}
+
+static int handle_set_cmd(struct mbox_request *cmd)
+{
+ return -ENOPROTOOPT;
+}
+
+static void loop(void)
+{
+ while (1) {
+ struct mbox_request req;
+ struct mbox_reply reply;
+ int n;
+
+ n = read(0, &req, sizeof(req));
+ if (n != sizeof(req)) {
+ dprintf(debug_fd, "invalid request %d\n", n);
+ return;
+ }
+
+ reply.status = req.is_set ?
+ handle_set_cmd(&req) :
+ handle_get_cmd(&req);
+
+ n = write(1, &reply, sizeof(reply));
+ if (n != sizeof(reply)) {
+ dprintf(debug_fd, "reply failed %d\n", n);
+ return;
+ }
+ }
+}
+
+int main(void)
+{
+ debug_fd = open("/dev/console", 00000002);
+ dprintf(debug_fd, "Started bpfilter\n");
+ loop();
+ close(debug_fd);
+ return 0;
+}
diff --git a/net/bpfilter/msgfmt.h b/net/bpfilter/msgfmt.h
new file mode 100644
index 0000000..98d121c
--- /dev/null
+++ b/net/bpfilter/msgfmt.h
@@ -0,0 +1,17 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _NET_BPFILTER_MSGFMT_H
+#define _NET_BPFILTER_MSGFMT_H
+
+struct mbox_request {
+ __u64 addr;
+ __u32 len;
+ __u32 is_set;
+ __u32 cmd;
+ __u32 pid;
+};
+
+struct mbox_reply {
+ __u32 status;
+};
+
+#endif
diff --git a/net/bridge/br.c b/net/bridge/br.c
index 671d13c..b0a0b82 100644
--- a/net/bridge/br.c
+++ b/net/bridge/br.c
@@ -34,6 +34,7 @@
struct net_device *dev = netdev_notifier_info_to_dev(ptr);
struct net_bridge_port *p;
struct net_bridge *br;
+ bool notified = false;
bool changed_addr;
int err;
@@ -67,7 +68,7 @@
break;
case NETDEV_CHANGE:
- br_port_carrier_check(p);
+ br_port_carrier_check(p, ¬ified);
break;
case NETDEV_FEAT_CHANGE:
@@ -76,8 +77,10 @@
case NETDEV_DOWN:
spin_lock_bh(&br->lock);
- if (br->dev->flags & IFF_UP)
+ if (br->dev->flags & IFF_UP) {
br_stp_disable_port(p);
+ notified = true;
+ }
spin_unlock_bh(&br->lock);
break;
@@ -85,6 +88,7 @@
if (netif_running(br->dev) && netif_oper_up(dev)) {
spin_lock_bh(&br->lock);
br_stp_enable_port(p);
+ notified = true;
spin_unlock_bh(&br->lock);
}
break;
@@ -110,8 +114,8 @@
}
/* Events that may cause spanning tree to refresh */
- if (event == NETDEV_CHANGEADDR || event == NETDEV_UP ||
- event == NETDEV_CHANGE || event == NETDEV_DOWN)
+ if (!notified && (event == NETDEV_CHANGEADDR || event == NETDEV_UP ||
+ event == NETDEV_CHANGE || event == NETDEV_DOWN))
br_ifinfo_notify(RTM_NEWLINK, NULL, p);
return NOTIFY_DONE;
@@ -141,7 +145,7 @@
case SWITCHDEV_FDB_ADD_TO_BRIDGE:
fdb_info = ptr;
err = br_fdb_external_learn_add(br, p, fdb_info->addr,
- fdb_info->vid);
+ fdb_info->vid, false);
if (err) {
err = notifier_from_errno(err);
break;
@@ -152,7 +156,7 @@
case SWITCHDEV_FDB_DEL_TO_BRIDGE:
fdb_info = ptr;
err = br_fdb_external_learn_del(br, p, fdb_info->addr,
- fdb_info->vid);
+ fdb_info->vid, false);
if (err)
err = notifier_from_errno(err);
break;
diff --git a/net/bridge/br_fdb.c b/net/bridge/br_fdb.c
index d9e69e4..b19e310 100644
--- a/net/bridge/br_fdb.c
+++ b/net/bridge/br_fdb.c
@@ -40,7 +40,7 @@
static int fdb_insert(struct net_bridge *br, struct net_bridge_port *source,
const unsigned char *addr, u16 vid);
static void fdb_notify(struct net_bridge *br,
- const struct net_bridge_fdb_entry *, int);
+ const struct net_bridge_fdb_entry *, int, bool);
int __init br_fdb_init(void)
{
@@ -121,6 +121,28 @@
return fdb;
}
+struct net_device *br_fdb_find_port(const struct net_device *br_dev,
+ const unsigned char *addr,
+ __u16 vid)
+{
+ struct net_bridge_fdb_entry *f;
+ struct net_device *dev = NULL;
+ struct net_bridge *br;
+
+ ASSERT_RTNL();
+
+ if (!netif_is_bridge_master(br_dev))
+ return NULL;
+
+ br = netdev_priv(br_dev);
+ f = br_fdb_find(br, addr, vid);
+ if (f && f->dst)
+ dev = f->dst->dev;
+
+ return dev;
+}
+EXPORT_SYMBOL_GPL(br_fdb_find_port);
+
struct net_bridge_fdb_entry *br_fdb_find_rcu(struct net_bridge *br,
const unsigned char *addr,
__u16 vid)
@@ -173,7 +195,8 @@
}
}
-static void fdb_delete(struct net_bridge *br, struct net_bridge_fdb_entry *f)
+static void fdb_delete(struct net_bridge *br, struct net_bridge_fdb_entry *f,
+ bool swdev_notify)
{
trace_fdb_delete(br, f);
@@ -183,7 +206,7 @@
hlist_del_init_rcu(&f->fdb_node);
rhashtable_remove_fast(&br->fdb_hash_tbl, &f->rhnode,
br_fdb_rht_params);
- fdb_notify(br, f, RTM_DELNEIGH);
+ fdb_notify(br, f, RTM_DELNEIGH, swdev_notify);
call_rcu(&f->rcu, fdb_rcu_free);
}
@@ -219,7 +242,7 @@
return;
}
- fdb_delete(br, f);
+ fdb_delete(br, f, true);
}
void br_fdb_find_delete_local(struct net_bridge *br,
@@ -334,7 +357,7 @@
} else {
spin_lock_bh(&br->hash_lock);
if (!hlist_unhashed(&f->fdb_node))
- fdb_delete(br, f);
+ fdb_delete(br, f, true);
spin_unlock_bh(&br->hash_lock);
}
}
@@ -354,7 +377,7 @@
spin_lock_bh(&br->hash_lock);
hlist_for_each_entry_safe(f, tmp, &br->fdb_list, fdb_node) {
if (!f->is_static)
- fdb_delete(br, f);
+ fdb_delete(br, f, true);
}
spin_unlock_bh(&br->hash_lock);
}
@@ -383,7 +406,7 @@
if (f->is_local)
fdb_delete_local(br, p, f);
else
- fdb_delete(br, f);
+ fdb_delete(br, f, true);
}
spin_unlock_bh(&br->hash_lock);
}
@@ -509,7 +532,7 @@
return 0;
br_warn(br, "adding interface %s with same address as a received packet (addr:%pM, vlan:%u)\n",
source ? source->dev->name : br->dev->name, addr, vid);
- fdb_delete(br, fdb);
+ fdb_delete(br, fdb, true);
}
fdb = fdb_create(br, source, addr, vid, 1, 1);
@@ -517,7 +540,7 @@
return -ENOMEM;
fdb_add_hw_addr(br, addr);
- fdb_notify(br, fdb, RTM_NEWNEIGH);
+ fdb_notify(br, fdb, RTM_NEWNEIGH, true);
return 0;
}
@@ -572,7 +595,7 @@
fdb->added_by_user = 1;
if (unlikely(fdb_modified)) {
trace_br_fdb_update(br, source, addr, vid, added_by_user);
- fdb_notify(br, fdb, RTM_NEWNEIGH);
+ fdb_notify(br, fdb, RTM_NEWNEIGH, true);
}
}
} else {
@@ -583,7 +606,7 @@
fdb->added_by_user = 1;
trace_br_fdb_update(br, source, addr, vid,
added_by_user);
- fdb_notify(br, fdb, RTM_NEWNEIGH);
+ fdb_notify(br, fdb, RTM_NEWNEIGH, true);
}
/* else we lose race and someone else inserts
* it first, don't bother updating
@@ -665,13 +688,15 @@
}
static void fdb_notify(struct net_bridge *br,
- const struct net_bridge_fdb_entry *fdb, int type)
+ const struct net_bridge_fdb_entry *fdb, int type,
+ bool swdev_notify)
{
struct net *net = dev_net(br->dev);
struct sk_buff *skb;
int err = -ENOBUFS;
- br_switchdev_fdb_notify(fdb, type);
+ if (swdev_notify)
+ br_switchdev_fdb_notify(fdb, type);
skb = nlmsg_new(fdb_nlmsg_size(), GFP_ATOMIC);
if (skb == NULL)
@@ -810,7 +835,7 @@
fdb->used = jiffies;
if (modified) {
fdb->updated = jiffies;
- fdb_notify(br, fdb, RTM_NEWNEIGH);
+ fdb_notify(br, fdb, RTM_NEWNEIGH, true);
}
return 0;
@@ -834,7 +859,7 @@
rcu_read_unlock();
local_bh_enable();
} else if (ndm->ndm_flags & NTF_EXT_LEARNED) {
- err = br_fdb_external_learn_add(br, p, addr, vid);
+ err = br_fdb_external_learn_add(br, p, addr, vid, true);
} else {
spin_lock_bh(&br->hash_lock);
err = fdb_add_entry(br, p, addr, ndm->ndm_state,
@@ -923,7 +948,7 @@
if (!fdb || fdb->dst != p)
return -ENOENT;
- fdb_delete(br, fdb);
+ fdb_delete(br, fdb, true);
return 0;
}
@@ -1043,7 +1068,8 @@
}
int br_fdb_external_learn_add(struct net_bridge *br, struct net_bridge_port *p,
- const unsigned char *addr, u16 vid)
+ const unsigned char *addr, u16 vid,
+ bool swdev_notify)
{
struct net_bridge_fdb_entry *fdb;
bool modified = false;
@@ -1061,7 +1087,7 @@
goto err_unlock;
}
fdb->added_by_external_learn = 1;
- fdb_notify(br, fdb, RTM_NEWNEIGH);
+ fdb_notify(br, fdb, RTM_NEWNEIGH, swdev_notify);
} else {
fdb->updated = jiffies;
@@ -1080,7 +1106,7 @@
}
if (modified)
- fdb_notify(br, fdb, RTM_NEWNEIGH);
+ fdb_notify(br, fdb, RTM_NEWNEIGH, swdev_notify);
}
err_unlock:
@@ -1090,7 +1116,8 @@
}
int br_fdb_external_learn_del(struct net_bridge *br, struct net_bridge_port *p,
- const unsigned char *addr, u16 vid)
+ const unsigned char *addr, u16 vid,
+ bool swdev_notify)
{
struct net_bridge_fdb_entry *fdb;
int err = 0;
@@ -1099,7 +1126,7 @@
fdb = br_fdb_find(br, addr, vid);
if (fdb && fdb->added_by_external_learn)
- fdb_delete(br, fdb);
+ fdb_delete(br, fdb, swdev_notify);
else
err = -ENOENT;
diff --git a/net/bridge/br_forward.c b/net/bridge/br_forward.c
index b4eed11..9019f32 100644
--- a/net/bridge/br_forward.c
+++ b/net/bridge/br_forward.c
@@ -30,7 +30,8 @@
vg = nbp_vlan_group_rcu(p);
return ((p->flags & BR_HAIRPIN_MODE) || skb->dev != p->dev) &&
br_allowed_egress(vg, skb) && p->state == BR_STATE_FORWARDING &&
- nbp_switchdev_allowed_egress(p, skb);
+ nbp_switchdev_allowed_egress(p, skb) &&
+ !br_skb_isolated(p, skb);
}
int br_dev_queue_push_xmit(struct net *net, struct sock *sk, struct sk_buff *skb)
@@ -274,8 +275,7 @@
struct net_bridge_port *port, *lport, *rport;
lport = p ? p->port : NULL;
- rport = rp ? hlist_entry(rp, struct net_bridge_port, rlist) :
- NULL;
+ rport = hlist_entry_safe(rp, struct net_bridge_port, rlist);
if ((unsigned long)lport > (unsigned long)rport) {
port = lport;
diff --git a/net/bridge/br_if.c b/net/bridge/br_if.c
index 5bb6681..05e42d8 100644
--- a/net/bridge/br_if.c
+++ b/net/bridge/br_if.c
@@ -64,7 +64,7 @@
/* Check for port carrier transitions. */
-void br_port_carrier_check(struct net_bridge_port *p)
+void br_port_carrier_check(struct net_bridge_port *p, bool *notified)
{
struct net_device *dev = p->dev;
struct net_bridge *br = p->br;
@@ -73,16 +73,21 @@
netif_running(dev) && netif_oper_up(dev))
p->path_cost = port_cost(dev);
+ *notified = false;
if (!netif_running(br->dev))
return;
spin_lock_bh(&br->lock);
if (netif_running(dev) && netif_oper_up(dev)) {
- if (p->state == BR_STATE_DISABLED)
+ if (p->state == BR_STATE_DISABLED) {
br_stp_enable_port(p);
+ *notified = true;
+ }
} else {
- if (p->state != BR_STATE_DISABLED)
+ if (p->state != BR_STATE_DISABLED) {
br_stp_disable_port(p);
+ *notified = true;
+ }
}
spin_unlock_bh(&br->lock);
}
diff --git a/net/bridge/br_input.c b/net/bridge/br_input.c
index 7f98a7d..7207427 100644
--- a/net/bridge/br_input.c
+++ b/net/bridge/br_input.c
@@ -114,6 +114,7 @@
goto drop;
BR_INPUT_SKB_CB(skb)->brdev = br->dev;
+ BR_INPUT_SKB_CB(skb)->src_port_isolated = !!(p->flags & BR_ISOLATED);
if (IS_ENABLED(CONFIG_INET) &&
(skb->protocol == htons(ETH_P_ARP) ||
diff --git a/net/bridge/br_netlink.c b/net/bridge/br_netlink.c
index 015f465c..9f5eb05 100644
--- a/net/bridge/br_netlink.c
+++ b/net/bridge/br_netlink.c
@@ -139,6 +139,7 @@
+ nla_total_size(1) /* IFLA_BRPORT_PROXYARP_WIFI */
+ nla_total_size(1) /* IFLA_BRPORT_VLAN_TUNNEL */
+ nla_total_size(1) /* IFLA_BRPORT_NEIGH_SUPPRESS */
+ + nla_total_size(1) /* IFLA_BRPORT_ISOLATED */
+ nla_total_size(sizeof(struct ifla_bridge_id)) /* IFLA_BRPORT_ROOT_ID */
+ nla_total_size(sizeof(struct ifla_bridge_id)) /* IFLA_BRPORT_BRIDGE_ID */
+ nla_total_size(sizeof(u16)) /* IFLA_BRPORT_DESIGNATED_PORT */
@@ -213,7 +214,8 @@
BR_VLAN_TUNNEL)) ||
nla_put_u16(skb, IFLA_BRPORT_GROUP_FWD_MASK, p->group_fwd_mask) ||
nla_put_u8(skb, IFLA_BRPORT_NEIGH_SUPPRESS,
- !!(p->flags & BR_NEIGH_SUPPRESS)))
+ !!(p->flags & BR_NEIGH_SUPPRESS)) ||
+ nla_put_u8(skb, IFLA_BRPORT_ISOLATED, !!(p->flags & BR_ISOLATED)))
return -EMSGSIZE;
timerval = br_timer_value(&p->message_age_timer);
@@ -660,6 +662,7 @@
[IFLA_BRPORT_VLAN_TUNNEL] = { .type = NLA_U8 },
[IFLA_BRPORT_GROUP_FWD_MASK] = { .type = NLA_U16 },
[IFLA_BRPORT_NEIGH_SUPPRESS] = { .type = NLA_U8 },
+ [IFLA_BRPORT_ISOLATED] = { .type = NLA_U8 },
};
/* Change the state of the port and notify spanning tree */
@@ -810,6 +813,10 @@
if (err)
return err;
+ err = br_set_port_flag(p, tb, IFLA_BRPORT_ISOLATED, BR_ISOLATED);
+ if (err)
+ return err;
+
br_port_flags_change(p, old_flags ^ p->flags);
return 0;
}
diff --git a/net/bridge/br_private.h b/net/bridge/br_private.h
index a7cb3ec..11520ed 100644
--- a/net/bridge/br_private.h
+++ b/net/bridge/br_private.h
@@ -423,6 +423,7 @@
#endif
bool proxyarp_replied;
+ bool src_port_isolated;
#ifdef CONFIG_BRIDGE_VLAN_FILTERING
bool vlan_filtered;
@@ -553,9 +554,11 @@
int br_fdb_sync_static(struct net_bridge *br, struct net_bridge_port *p);
void br_fdb_unsync_static(struct net_bridge *br, struct net_bridge_port *p);
int br_fdb_external_learn_add(struct net_bridge *br, struct net_bridge_port *p,
- const unsigned char *addr, u16 vid);
+ const unsigned char *addr, u16 vid,
+ bool swdev_notify);
int br_fdb_external_learn_del(struct net_bridge *br, struct net_bridge_port *p,
- const unsigned char *addr, u16 vid);
+ const unsigned char *addr, u16 vid,
+ bool swdev_notify);
void br_fdb_offloaded_set(struct net_bridge *br, struct net_bridge_port *p,
const unsigned char *addr, u16 vid);
@@ -572,8 +575,16 @@
void br_flood(struct net_bridge *br, struct sk_buff *skb,
enum br_pkt_type pkt_type, bool local_rcv, bool local_orig);
+/* return true if both source port and dest port are isolated */
+static inline bool br_skb_isolated(const struct net_bridge_port *to,
+ const struct sk_buff *skb)
+{
+ return BR_INPUT_SKB_CB(skb)->src_port_isolated &&
+ (to->flags & BR_ISOLATED);
+}
+
/* br_if.c */
-void br_port_carrier_check(struct net_bridge_port *p);
+void br_port_carrier_check(struct net_bridge_port *p, bool *notified);
int br_add_bridge(struct net *net, const char *name);
int br_del_bridge(struct net *net, const char *name);
int br_add_if(struct net_bridge *br, struct net_device *dev,
@@ -594,11 +605,22 @@
return rcu_dereference(dev->rx_handler) == br_handle_frame;
}
+static inline bool br_rx_handler_check_rtnl(const struct net_device *dev)
+{
+ return rcu_dereference_rtnl(dev->rx_handler) == br_handle_frame;
+}
+
static inline struct net_bridge_port *br_port_get_check_rcu(const struct net_device *dev)
{
return br_rx_handler_check_rcu(dev) ? br_port_get_rcu(dev) : NULL;
}
+static inline struct net_bridge_port *
+br_port_get_check_rtnl(const struct net_device *dev)
+{
+ return br_rx_handler_check_rtnl(dev) ? br_port_get_rtnl_rcu(dev) : NULL;
+}
+
/* br_ioctl.c */
int br_dev_ioctl(struct net_device *dev, struct ifreq *rq, int cmd);
int br_ioctl_deviceless_stub(struct net *net, unsigned int cmd,
diff --git a/net/bridge/br_switchdev.c b/net/bridge/br_switchdev.c
index ee775f4..35474d4 100644
--- a/net/bridge/br_switchdev.c
+++ b/net/bridge/br_switchdev.c
@@ -102,13 +102,15 @@
static void
br_switchdev_fdb_call_notifiers(bool adding, const unsigned char *mac,
- u16 vid, struct net_device *dev)
+ u16 vid, struct net_device *dev,
+ bool added_by_user)
{
struct switchdev_notifier_fdb_info info;
unsigned long notifier_type;
info.addr = mac;
info.vid = vid;
+ info.added_by_user = added_by_user;
notifier_type = adding ? SWITCHDEV_FDB_ADD_TO_DEVICE : SWITCHDEV_FDB_DEL_TO_DEVICE;
call_switchdev_notifiers(notifier_type, dev, &info.info);
}
@@ -116,19 +118,21 @@
void
br_switchdev_fdb_notify(const struct net_bridge_fdb_entry *fdb, int type)
{
- if (!fdb->added_by_user || !fdb->dst)
+ if (!fdb->dst)
return;
switch (type) {
case RTM_DELNEIGH:
br_switchdev_fdb_call_notifiers(false, fdb->key.addr.addr,
fdb->key.vlan_id,
- fdb->dst->dev);
+ fdb->dst->dev,
+ fdb->added_by_user);
break;
case RTM_NEWNEIGH:
br_switchdev_fdb_call_notifiers(true, fdb->key.addr.addr,
fdb->key.vlan_id,
- fdb->dst->dev);
+ fdb->dst->dev,
+ fdb->added_by_user);
break;
}
}
diff --git a/net/bridge/br_sysfs_if.c b/net/bridge/br_sysfs_if.c
index fd31ad8..f99c5bf 100644
--- a/net/bridge/br_sysfs_if.c
+++ b/net/bridge/br_sysfs_if.c
@@ -192,6 +192,7 @@
BRPORT_ATTR_FLAG(multicast_flood, BR_MCAST_FLOOD);
BRPORT_ATTR_FLAG(broadcast_flood, BR_BCAST_FLOOD);
BRPORT_ATTR_FLAG(neigh_suppress, BR_NEIGH_SUPPRESS);
+BRPORT_ATTR_FLAG(isolated, BR_ISOLATED);
#ifdef CONFIG_BRIDGE_IGMP_SNOOPING
static ssize_t show_multicast_router(struct net_bridge_port *p, char *buf)
@@ -243,6 +244,7 @@
&brport_attr_broadcast_flood,
&brport_attr_group_fwd_mask,
&brport_attr_neigh_suppress,
+ &brport_attr_isolated,
NULL
};
diff --git a/net/bridge/br_vlan.c b/net/bridge/br_vlan.c
index 9896f49..dc832c09 100644
--- a/net/bridge/br_vlan.c
+++ b/net/bridge/br_vlan.c
@@ -1149,3 +1149,44 @@
stats->tx_packets += txpackets;
}
}
+
+int br_vlan_get_pvid(const struct net_device *dev, u16 *p_pvid)
+{
+ struct net_bridge_vlan_group *vg;
+
+ ASSERT_RTNL();
+ if (netif_is_bridge_master(dev))
+ vg = br_vlan_group(netdev_priv(dev));
+ else
+ return -EINVAL;
+
+ *p_pvid = br_get_pvid(vg);
+ return 0;
+}
+EXPORT_SYMBOL_GPL(br_vlan_get_pvid);
+
+int br_vlan_get_info(const struct net_device *dev, u16 vid,
+ struct bridge_vlan_info *p_vinfo)
+{
+ struct net_bridge_vlan_group *vg;
+ struct net_bridge_vlan *v;
+ struct net_bridge_port *p;
+
+ ASSERT_RTNL();
+ p = br_port_get_check_rtnl(dev);
+ if (p)
+ vg = nbp_vlan_group(p);
+ else if (netif_is_bridge_master(dev))
+ vg = br_vlan_group(netdev_priv(dev));
+ else
+ return -EINVAL;
+
+ v = br_vlan_find(vg, vid);
+ if (!v)
+ return -ENOENT;
+
+ p_vinfo->vid = vid;
+ p_vinfo->flags = v->flags;
+ return 0;
+}
+EXPORT_SYMBOL_GPL(br_vlan_get_info);
diff --git a/net/bridge/netfilter/Kconfig b/net/bridge/netfilter/Kconfig
index f212447..9a0159a 100644
--- a/net/bridge/netfilter/Kconfig
+++ b/net/bridge/netfilter/Kconfig
@@ -8,13 +8,6 @@
bool "Ethernet Bridge nf_tables support"
if NF_TABLES_BRIDGE
-
-config NFT_BRIDGE_META
- tristate "Netfilter nf_table bridge meta support"
- depends on NFT_META
- help
- Add support for bridge dedicated meta key.
-
config NFT_BRIDGE_REJECT
tristate "Netfilter nf_tables bridge reject support"
depends on NFT_REJECT && NFT_REJECT_IPV4 && NFT_REJECT_IPV6
diff --git a/net/bridge/netfilter/Makefile b/net/bridge/netfilter/Makefile
index 4bc758d..9b86886 100644
--- a/net/bridge/netfilter/Makefile
+++ b/net/bridge/netfilter/Makefile
@@ -3,7 +3,6 @@
# Makefile for the netfilter modules for Link Layer filtering on a bridge.
#
-obj-$(CONFIG_NFT_BRIDGE_META) += nft_meta_bridge.o
obj-$(CONFIG_NFT_BRIDGE_REJECT) += nft_reject_bridge.o
# packet logging
diff --git a/net/bridge/netfilter/ebtables.c b/net/bridge/netfilter/ebtables.c
index 28a4c34..b286ed5 100644
--- a/net/bridge/netfilter/ebtables.c
+++ b/net/bridge/netfilter/ebtables.c
@@ -101,7 +101,7 @@
{
par->match = m->u.match;
par->matchinfo = m->data;
- return m->u.match->match(skb, par) ? EBT_MATCH : EBT_NOMATCH;
+ return !m->u.match->match(skb, par);
}
static inline int
@@ -177,6 +177,12 @@
return (void *)entry + entry->next_offset;
}
+static inline const struct ebt_entry_target *
+ebt_get_target_c(const struct ebt_entry *e)
+{
+ return ebt_get_target((struct ebt_entry *)e);
+}
+
/* Do some firewalling */
unsigned int ebt_do_table(struct sk_buff *skb,
const struct nf_hook_state *state,
@@ -230,8 +236,7 @@
*/
EBT_WATCHER_ITERATE(point, ebt_do_watcher, skb, &acpar);
- t = (struct ebt_entry_target *)
- (((char *)point) + point->target_offset);
+ t = ebt_get_target_c(point);
/* standard target */
if (!t->u.target->target)
verdict = ((struct ebt_standard_target *)t)->verdict;
@@ -343,6 +348,16 @@
"ebtable_", error, mutex);
}
+static inline void ebt_free_table_info(struct ebt_table_info *info)
+{
+ int i;
+
+ if (info->chainstack) {
+ for_each_possible_cpu(i)
+ vfree(info->chainstack[i]);
+ vfree(info->chainstack);
+ }
+}
static inline int
ebt_check_match(struct ebt_entry_match *m, struct xt_mtchk_param *par,
unsigned int *cnt)
@@ -627,7 +642,7 @@
return 1;
EBT_WATCHER_ITERATE(e, ebt_cleanup_watcher, net, NULL);
EBT_MATCH_ITERATE(e, ebt_cleanup_match, net, NULL);
- t = (struct ebt_entry_target *)(((char *)e) + e->target_offset);
+ t = ebt_get_target(e);
par.net = net;
par.target = t->u.target;
@@ -706,7 +721,7 @@
ret = EBT_WATCHER_ITERATE(e, ebt_check_watcher, &tgpar, &j);
if (ret != 0)
goto cleanup_watchers;
- t = (struct ebt_entry_target *)(((char *)e) + e->target_offset);
+ t = ebt_get_target(e);
gap = e->next_offset - e->target_offset;
target = xt_request_find_target(NFPROTO_BRIDGE, t->u.name, 0);
@@ -779,8 +794,7 @@
if (pos == nentries)
continue;
}
- t = (struct ebt_entry_target *)
- (((char *)e) + e->target_offset);
+ t = ebt_get_target_c(e);
if (strcmp(t->u.name, EBT_STANDARD_TARGET))
goto letscontinue;
if (e->target_offset + sizeof(struct ebt_standard_target) >
@@ -975,7 +989,7 @@
static int do_replace_finish(struct net *net, struct ebt_replace *repl,
struct ebt_table_info *newinfo)
{
- int ret, i;
+ int ret;
struct ebt_counter *counterstmp = NULL;
/* used to be able to unlock earlier */
struct ebt_table_info *table;
@@ -1051,13 +1065,8 @@
ebt_cleanup_entry, net, NULL);
vfree(table->entries);
- if (table->chainstack) {
- for_each_possible_cpu(i)
- vfree(table->chainstack[i]);
- vfree(table->chainstack);
- }
+ ebt_free_table_info(table);
vfree(table);
-
vfree(counterstmp);
#ifdef CONFIG_AUDIT
@@ -1078,11 +1087,7 @@
free_counterstmp:
vfree(counterstmp);
/* can be initialized in translate_table() */
- if (newinfo->chainstack) {
- for_each_possible_cpu(i)
- vfree(newinfo->chainstack[i]);
- vfree(newinfo->chainstack);
- }
+ ebt_free_table_info(newinfo);
return ret;
}
@@ -1147,8 +1152,6 @@
static void __ebt_unregister_table(struct net *net, struct ebt_table *table)
{
- int i;
-
mutex_lock(&ebt_mutex);
list_del(&table->list);
mutex_unlock(&ebt_mutex);
@@ -1157,11 +1160,7 @@
if (table->private->nentries)
module_put(table->me);
vfree(table->private->entries);
- if (table->private->chainstack) {
- for_each_possible_cpu(i)
- vfree(table->private->chainstack[i]);
- vfree(table->private->chainstack);
- }
+ ebt_free_table_info(table->private);
vfree(table->private);
kfree(table);
}
@@ -1263,11 +1262,7 @@
free_unlock:
mutex_unlock(&ebt_mutex);
free_chainstack:
- if (newinfo->chainstack) {
- for_each_possible_cpu(i)
- vfree(newinfo->chainstack[i]);
- vfree(newinfo->chainstack);
- }
+ ebt_free_table_info(newinfo);
vfree(newinfo->entries);
free_newinfo:
vfree(newinfo);
@@ -1405,7 +1400,7 @@
return -EFAULT;
hlp = ubase + (((char *)e + e->target_offset) - base);
- t = (struct ebt_entry_target *)(((char *)e) + e->target_offset);
+ t = ebt_get_target_c(e);
ret = EBT_MATCH_ITERATE(e, ebt_match_to_user, base, ubase);
if (ret != 0)
@@ -1746,7 +1741,7 @@
return ret;
target_offset = e->target_offset - (origsize - *size);
- t = (struct ebt_entry_target *) ((char *) e + e->target_offset);
+ t = ebt_get_target(e);
ret = compat_target_to_user(t, dstptr, size);
if (ret)
@@ -1794,7 +1789,7 @@
EBT_MATCH_ITERATE(e, compat_calc_match, &off);
EBT_WATCHER_ITERATE(e, compat_calc_watcher, &off);
- t = (const struct ebt_entry_target *) ((char *) e + e->target_offset);
+ t = ebt_get_target_c(e);
off += xt_compat_target_offset(t->u.target);
off += ebt_compat_entry_padsize();
diff --git a/net/bridge/netfilter/nft_meta_bridge.c b/net/bridge/netfilter/nft_meta_bridge.c
deleted file mode 100644
index bb63c9a..0000000
--- a/net/bridge/netfilter/nft_meta_bridge.c
+++ /dev/null
@@ -1,135 +0,0 @@
-/*
- * Copyright (c) 2014 Intel Corporation
- *
- * This program is free software; you can redistribute it and/or modify
- * it under the terms of the GNU General Public License version 2 as
- * published by the Free Software Foundation.
- *
- */
-
-#include <linux/kernel.h>
-#include <linux/init.h>
-#include <linux/module.h>
-#include <linux/netlink.h>
-#include <linux/netfilter.h>
-#include <linux/netfilter/nf_tables.h>
-#include <net/netfilter/nf_tables.h>
-#include <net/netfilter/nft_meta.h>
-
-#include "../br_private.h"
-
-static void nft_meta_bridge_get_eval(const struct nft_expr *expr,
- struct nft_regs *regs,
- const struct nft_pktinfo *pkt)
-{
- const struct nft_meta *priv = nft_expr_priv(expr);
- const struct net_device *in = nft_in(pkt), *out = nft_out(pkt);
- u32 *dest = ®s->data[priv->dreg];
- const struct net_bridge_port *p;
-
- switch (priv->key) {
- case NFT_META_BRI_IIFNAME:
- if (in == NULL || (p = br_port_get_rcu(in)) == NULL)
- goto err;
- break;
- case NFT_META_BRI_OIFNAME:
- if (out == NULL || (p = br_port_get_rcu(out)) == NULL)
- goto err;
- break;
- default:
- goto out;
- }
-
- strncpy((char *)dest, p->br->dev->name, IFNAMSIZ);
- return;
-out:
- return nft_meta_get_eval(expr, regs, pkt);
-err:
- regs->verdict.code = NFT_BREAK;
-}
-
-static int nft_meta_bridge_get_init(const struct nft_ctx *ctx,
- const struct nft_expr *expr,
- const struct nlattr * const tb[])
-{
- struct nft_meta *priv = nft_expr_priv(expr);
- unsigned int len;
-
- priv->key = ntohl(nla_get_be32(tb[NFTA_META_KEY]));
- switch (priv->key) {
- case NFT_META_BRI_IIFNAME:
- case NFT_META_BRI_OIFNAME:
- len = IFNAMSIZ;
- break;
- default:
- return nft_meta_get_init(ctx, expr, tb);
- }
-
- priv->dreg = nft_parse_register(tb[NFTA_META_DREG]);
- return nft_validate_register_store(ctx, priv->dreg, NULL,
- NFT_DATA_VALUE, len);
-}
-
-static struct nft_expr_type nft_meta_bridge_type;
-static const struct nft_expr_ops nft_meta_bridge_get_ops = {
- .type = &nft_meta_bridge_type,
- .size = NFT_EXPR_SIZE(sizeof(struct nft_meta)),
- .eval = nft_meta_bridge_get_eval,
- .init = nft_meta_bridge_get_init,
- .dump = nft_meta_get_dump,
-};
-
-static const struct nft_expr_ops nft_meta_bridge_set_ops = {
- .type = &nft_meta_bridge_type,
- .size = NFT_EXPR_SIZE(sizeof(struct nft_meta)),
- .eval = nft_meta_set_eval,
- .init = nft_meta_set_init,
- .destroy = nft_meta_set_destroy,
- .dump = nft_meta_set_dump,
- .validate = nft_meta_set_validate,
-};
-
-static const struct nft_expr_ops *
-nft_meta_bridge_select_ops(const struct nft_ctx *ctx,
- const struct nlattr * const tb[])
-{
- if (tb[NFTA_META_KEY] == NULL)
- return ERR_PTR(-EINVAL);
-
- if (tb[NFTA_META_DREG] && tb[NFTA_META_SREG])
- return ERR_PTR(-EINVAL);
-
- if (tb[NFTA_META_DREG])
- return &nft_meta_bridge_get_ops;
-
- if (tb[NFTA_META_SREG])
- return &nft_meta_bridge_set_ops;
-
- return ERR_PTR(-EINVAL);
-}
-
-static struct nft_expr_type nft_meta_bridge_type __read_mostly = {
- .family = NFPROTO_BRIDGE,
- .name = "meta",
- .select_ops = nft_meta_bridge_select_ops,
- .policy = nft_meta_policy,
- .maxattr = NFTA_META_MAX,
- .owner = THIS_MODULE,
-};
-
-static int __init nft_meta_bridge_module_init(void)
-{
- return nft_register_expr(&nft_meta_bridge_type);
-}
-
-static void __exit nft_meta_bridge_module_exit(void)
-{
- nft_unregister_expr(&nft_meta_bridge_type);
-}
-
-module_init(nft_meta_bridge_module_init);
-module_exit(nft_meta_bridge_module_exit);
-
-MODULE_LICENSE("GPL");
-MODULE_AUTHOR("Tomasz Bursztyka <tomasz.bursztyka@linux.intel.com>");
-MODULE_ALIAS_NFT_AF_EXPR(AF_BRIDGE, "meta");
diff --git a/net/core/Makefile b/net/core/Makefile
index 6dbbba8..7080417 100644
--- a/net/core/Makefile
+++ b/net/core/Makefile
@@ -14,6 +14,7 @@
fib_notifier.o xdp.o
obj-y += net-sysfs.o
+obj-$(CONFIG_PAGE_POOL) += page_pool.o
obj-$(CONFIG_PROC_FS) += net-procfs.o
obj-$(CONFIG_NET_PKTGEN) += pktgen.o
obj-$(CONFIG_NETPOLL) += netpoll.o
diff --git a/net/core/dev.c b/net/core/dev.c
index 2af787e..1844d9b 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -1285,6 +1285,7 @@
return len;
}
+EXPORT_SYMBOL(dev_set_alias);
/**
* dev_get_alias - get ifalias of a device
@@ -1586,7 +1587,7 @@
N(UDP_TUNNEL_DROP_INFO) N(CHANGE_TX_QUEUE_LEN)
N(CVLAN_FILTER_PUSH_INFO) N(CVLAN_FILTER_DROP_INFO)
N(SVLAN_FILTER_PUSH_INFO) N(SVLAN_FILTER_DROP_INFO)
- };
+ }
#undef N
return "UNKNOWN_NETDEV_EVENT";
}
@@ -1754,38 +1755,38 @@
EXPORT_SYMBOL(call_netdevice_notifiers);
#ifdef CONFIG_NET_INGRESS
-static struct static_key ingress_needed __read_mostly;
+static DEFINE_STATIC_KEY_FALSE(ingress_needed_key);
void net_inc_ingress_queue(void)
{
- static_key_slow_inc(&ingress_needed);
+ static_branch_inc(&ingress_needed_key);
}
EXPORT_SYMBOL_GPL(net_inc_ingress_queue);
void net_dec_ingress_queue(void)
{
- static_key_slow_dec(&ingress_needed);
+ static_branch_dec(&ingress_needed_key);
}
EXPORT_SYMBOL_GPL(net_dec_ingress_queue);
#endif
#ifdef CONFIG_NET_EGRESS
-static struct static_key egress_needed __read_mostly;
+static DEFINE_STATIC_KEY_FALSE(egress_needed_key);
void net_inc_egress_queue(void)
{
- static_key_slow_inc(&egress_needed);
+ static_branch_inc(&egress_needed_key);
}
EXPORT_SYMBOL_GPL(net_inc_egress_queue);
void net_dec_egress_queue(void)
{
- static_key_slow_dec(&egress_needed);
+ static_branch_dec(&egress_needed_key);
}
EXPORT_SYMBOL_GPL(net_dec_egress_queue);
#endif
-static struct static_key netstamp_needed __read_mostly;
+static DEFINE_STATIC_KEY_FALSE(netstamp_needed_key);
#ifdef HAVE_JUMP_LABEL
static atomic_t netstamp_needed_deferred;
static atomic_t netstamp_wanted;
@@ -1796,9 +1797,9 @@
wanted = atomic_add_return(deferred, &netstamp_wanted);
if (wanted > 0)
- static_key_enable(&netstamp_needed);
+ static_branch_enable(&netstamp_needed_key);
else
- static_key_disable(&netstamp_needed);
+ static_branch_disable(&netstamp_needed_key);
}
static DECLARE_WORK(netstamp_work, netstamp_clear);
#endif
@@ -1818,7 +1819,7 @@
atomic_inc(&netstamp_needed_deferred);
schedule_work(&netstamp_work);
#else
- static_key_slow_inc(&netstamp_needed);
+ static_branch_inc(&netstamp_needed_key);
#endif
}
EXPORT_SYMBOL(net_enable_timestamp);
@@ -1838,7 +1839,7 @@
atomic_dec(&netstamp_needed_deferred);
schedule_work(&netstamp_work);
#else
- static_key_slow_dec(&netstamp_needed);
+ static_branch_dec(&netstamp_needed_key);
#endif
}
EXPORT_SYMBOL(net_disable_timestamp);
@@ -1846,15 +1847,15 @@
static inline void net_timestamp_set(struct sk_buff *skb)
{
skb->tstamp = 0;
- if (static_key_false(&netstamp_needed))
+ if (static_branch_unlikely(&netstamp_needed_key))
__net_timestamp(skb);
}
-#define net_timestamp_check(COND, SKB) \
- if (static_key_false(&netstamp_needed)) { \
- if ((COND) && !(SKB)->tstamp) \
- __net_timestamp(SKB); \
- } \
+#define net_timestamp_check(COND, SKB) \
+ if (static_branch_unlikely(&netstamp_needed_key)) { \
+ if ((COND) && !(SKB)->tstamp) \
+ __net_timestamp(SKB); \
+ } \
bool is_skb_forwardable(const struct net_device *dev, const struct sk_buff *skb)
{
@@ -2614,17 +2615,16 @@
* Returns a Tx hash based on the given packet descriptor a Tx queues' number
* to be used as a distribution range.
*/
-u16 __skb_tx_hash(const struct net_device *dev, struct sk_buff *skb,
- unsigned int num_tx_queues)
+static u16 skb_tx_hash(const struct net_device *dev, struct sk_buff *skb)
{
u32 hash;
u16 qoffset = 0;
- u16 qcount = num_tx_queues;
+ u16 qcount = dev->real_num_tx_queues;
if (skb_rx_queue_recorded(skb)) {
hash = skb_get_rx_queue(skb);
- while (unlikely(hash >= num_tx_queues))
- hash -= num_tx_queues;
+ while (unlikely(hash >= qcount))
+ hash -= qcount;
return hash;
}
@@ -2637,7 +2637,6 @@
return (u16) reciprocal_scale(skb_get_hash(skb), qcount) + qoffset;
}
-EXPORT_SYMBOL(__skb_tx_hash);
static void skb_warn_bad_offload(const struct sk_buff *skb)
{
@@ -3113,6 +3112,10 @@
if (unlikely(!skb))
goto out_null;
+ skb = sk_validate_xmit_skb(skb, dev);
+ if (unlikely(!skb))
+ goto out_null;
+
if (netif_needs_gso(skb, features)) {
struct sk_buff *segs;
@@ -3241,7 +3244,7 @@
rc = NET_XMIT_DROP;
} else {
rc = q->enqueue(skb, q, &to_free) & NET_XMIT_MASK;
- __qdisc_run(q);
+ qdisc_run(q);
}
if (unlikely(to_free))
@@ -3529,7 +3532,7 @@
#ifdef CONFIG_NET_CLS_ACT
skb->tc_at_ingress = 0;
# ifdef CONFIG_NET_EGRESS
- if (static_key_false(&egress_needed)) {
+ if (static_branch_unlikely(&egress_needed_key)) {
skb = sch_handle_egress(skb, &rc, dev);
if (!skb)
goto out;
@@ -3624,6 +3627,44 @@
}
EXPORT_SYMBOL(dev_queue_xmit_accel);
+int dev_direct_xmit(struct sk_buff *skb, u16 queue_id)
+{
+ struct net_device *dev = skb->dev;
+ struct sk_buff *orig_skb = skb;
+ struct netdev_queue *txq;
+ int ret = NETDEV_TX_BUSY;
+ bool again = false;
+
+ if (unlikely(!netif_running(dev) ||
+ !netif_carrier_ok(dev)))
+ goto drop;
+
+ skb = validate_xmit_skb_list(skb, dev, &again);
+ if (skb != orig_skb)
+ goto drop;
+
+ skb_set_queue_mapping(skb, queue_id);
+ txq = skb_get_tx_queue(dev, skb);
+
+ local_bh_disable();
+
+ HARD_TX_LOCK(dev, txq, smp_processor_id());
+ if (!netif_xmit_frozen_or_drv_stopped(txq))
+ ret = netdev_start_xmit(skb, dev, txq, false);
+ HARD_TX_UNLOCK(dev, txq);
+
+ local_bh_enable();
+
+ if (!dev_xmit_complete(ret))
+ kfree_skb(skb);
+
+ return ret;
+drop:
+ atomic_long_inc(&dev->tx_dropped);
+ kfree_skb_list(skb);
+ return NET_XMIT_DROP;
+}
+EXPORT_SYMBOL(dev_direct_xmit);
/*************************************************************************
* Receiver routines
@@ -3993,12 +4034,12 @@
}
static u32 netif_receive_generic_xdp(struct sk_buff *skb,
+ struct xdp_buff *xdp,
struct bpf_prog *xdp_prog)
{
struct netdev_rx_queue *rxqueue;
+ void *orig_data, *orig_data_end;
u32 metalen, act = XDP_DROP;
- struct xdp_buff xdp;
- void *orig_data;
int hlen, off;
u32 mac_len;
@@ -4033,31 +4074,42 @@
*/
mac_len = skb->data - skb_mac_header(skb);
hlen = skb_headlen(skb) + mac_len;
- xdp.data = skb->data - mac_len;
- xdp.data_meta = xdp.data;
- xdp.data_end = xdp.data + hlen;
- xdp.data_hard_start = skb->data - skb_headroom(skb);
- orig_data = xdp.data;
+ xdp->data = skb->data - mac_len;
+ xdp->data_meta = xdp->data;
+ xdp->data_end = xdp->data + hlen;
+ xdp->data_hard_start = skb->data - skb_headroom(skb);
+ orig_data_end = xdp->data_end;
+ orig_data = xdp->data;
rxqueue = netif_get_rxqueue(skb);
- xdp.rxq = &rxqueue->xdp_rxq;
+ xdp->rxq = &rxqueue->xdp_rxq;
- act = bpf_prog_run_xdp(xdp_prog, &xdp);
+ act = bpf_prog_run_xdp(xdp_prog, xdp);
- off = xdp.data - orig_data;
+ off = xdp->data - orig_data;
if (off > 0)
__skb_pull(skb, off);
else if (off < 0)
__skb_push(skb, -off);
skb->mac_header += off;
+ /* check if bpf_xdp_adjust_tail was used. it can only "shrink"
+ * pckt.
+ */
+ off = orig_data_end - xdp->data_end;
+ if (off != 0) {
+ skb_set_tail_pointer(skb, xdp->data_end - xdp->data);
+ skb->len -= off;
+
+ }
+
switch (act) {
case XDP_REDIRECT:
case XDP_TX:
__skb_push(skb, mac_len);
break;
case XDP_PASS:
- metalen = xdp.data - xdp.data_meta;
+ metalen = xdp->data - xdp->data_meta;
if (metalen)
skb_metadata_set(skb, metalen);
break;
@@ -4102,22 +4154,24 @@
}
EXPORT_SYMBOL_GPL(generic_xdp_tx);
-static struct static_key generic_xdp_needed __read_mostly;
+static DEFINE_STATIC_KEY_FALSE(generic_xdp_needed_key);
int do_xdp_generic(struct bpf_prog *xdp_prog, struct sk_buff *skb)
{
if (xdp_prog) {
- u32 act = netif_receive_generic_xdp(skb, xdp_prog);
+ struct xdp_buff xdp;
+ u32 act;
int err;
+ act = netif_receive_generic_xdp(skb, &xdp, xdp_prog);
if (act != XDP_PASS) {
switch (act) {
case XDP_REDIRECT:
err = xdp_do_generic_redirect(skb->dev, skb,
- xdp_prog);
+ &xdp, xdp_prog);
if (err)
goto out_redir;
- /* fallthru to submit skb */
+ break;
case XDP_TX:
generic_xdp_tx(skb, xdp_prog);
break;
@@ -4140,7 +4194,7 @@
trace_netif_rx(skb);
- if (static_key_false(&generic_xdp_needed)) {
+ if (static_branch_unlikely(&generic_xdp_needed_key)) {
int ret;
preempt_disable();
@@ -4512,7 +4566,7 @@
skip_taps:
#ifdef CONFIG_NET_INGRESS
- if (static_key_false(&ingress_needed)) {
+ if (static_branch_unlikely(&ingress_needed_key)) {
skb = sch_handle_ingress(skb, &pt_prev, &ret, orig_dev);
if (!skb)
goto out;
@@ -4672,9 +4726,9 @@
bpf_prog_put(old);
if (old && !new) {
- static_key_slow_dec(&generic_xdp_needed);
+ static_branch_dec(&generic_xdp_needed_key);
} else if (new && !old) {
- static_key_slow_inc(&generic_xdp_needed);
+ static_branch_inc(&generic_xdp_needed_key);
dev_disable_lro(dev);
dev_disable_gro_hw(dev);
}
@@ -4702,7 +4756,7 @@
if (skb_defer_rx_timestamp(skb))
return NET_RX_SUCCESS;
- if (static_key_false(&generic_xdp_needed)) {
+ if (static_branch_unlikely(&generic_xdp_needed_key)) {
int ret;
preempt_disable();
@@ -7870,6 +7924,8 @@
int ret;
struct net *net = dev_net(dev);
+ BUILD_BUG_ON(sizeof(netdev_features_t) * BITS_PER_BYTE <
+ NETDEV_FEATURE_COUNT);
BUG_ON(dev_boot_phase);
ASSERT_RTNL();
diff --git a/net/core/devlink.c b/net/core/devlink.c
index ad13173..475246b 100644
--- a/net/core/devlink.c
+++ b/net/core/devlink.c
@@ -453,6 +453,27 @@
msg, 0, DEVLINK_MCGRP_CONFIG, GFP_KERNEL);
}
+static int devlink_nl_port_attrs_put(struct sk_buff *msg,
+ struct devlink_port *devlink_port)
+{
+ struct devlink_port_attrs *attrs = &devlink_port->attrs;
+
+ if (!attrs->set)
+ return 0;
+ if (nla_put_u16(msg, DEVLINK_ATTR_PORT_FLAVOUR, attrs->flavour))
+ return -EMSGSIZE;
+ if (nla_put_u32(msg, DEVLINK_ATTR_PORT_NUMBER, attrs->port_number))
+ return -EMSGSIZE;
+ if (!attrs->split)
+ return 0;
+ if (nla_put_u32(msg, DEVLINK_ATTR_PORT_SPLIT_GROUP, attrs->port_number))
+ return -EMSGSIZE;
+ if (nla_put_u32(msg, DEVLINK_ATTR_PORT_SPLIT_SUBPORT_NUMBER,
+ attrs->split_subport_number))
+ return -EMSGSIZE;
+ return 0;
+}
+
static int devlink_nl_port_fill(struct sk_buff *msg, struct devlink *devlink,
struct devlink_port *devlink_port,
enum devlink_command cmd, u32 portid,
@@ -492,9 +513,7 @@
ibdev->name))
goto nla_put_failure;
}
- if (devlink_port->split &&
- nla_put_u32(msg, DEVLINK_ATTR_PORT_SPLIT_GROUP,
- devlink_port->split_group))
+ if (devlink_nl_port_attrs_put(msg, devlink_port))
goto nla_put_failure;
genlmsg_end(msg, hdr);
@@ -2737,7 +2756,8 @@
.doit = devlink_nl_cmd_eswitch_set_doit,
.policy = devlink_nl_policy,
.flags = GENL_ADMIN_PERM,
- .internal_flags = DEVLINK_NL_FLAG_NEED_DEVLINK,
+ .internal_flags = DEVLINK_NL_FLAG_NEED_DEVLINK |
+ DEVLINK_NL_FLAG_NO_LOCK,
},
{
.cmd = DEVLINK_CMD_DPIPE_TABLE_GET,
@@ -2971,19 +2991,64 @@
EXPORT_SYMBOL_GPL(devlink_port_type_clear);
/**
- * devlink_port_split_set - Set port is split
+ * devlink_port_attrs_set - Set port attributes
*
* @devlink_port: devlink port
- * @split_group: split group - identifies group split port is part of
+ * @flavour: flavour of the port
+ * @port_number: number of the port that is facing user, for example
+ * the front panel port number
+ * @split: indicates if this is split port
+ * @split_subport_number: if the port is split, this is the number
+ * of subport.
*/
-void devlink_port_split_set(struct devlink_port *devlink_port,
- u32 split_group)
+void devlink_port_attrs_set(struct devlink_port *devlink_port,
+ enum devlink_port_flavour flavour,
+ u32 port_number, bool split,
+ u32 split_subport_number)
{
- devlink_port->split = true;
- devlink_port->split_group = split_group;
+ struct devlink_port_attrs *attrs = &devlink_port->attrs;
+
+ attrs->set = true;
+ attrs->flavour = flavour;
+ attrs->port_number = port_number;
+ attrs->split = split;
+ attrs->split_subport_number = split_subport_number;
devlink_port_notify(devlink_port, DEVLINK_CMD_PORT_NEW);
}
-EXPORT_SYMBOL_GPL(devlink_port_split_set);
+EXPORT_SYMBOL_GPL(devlink_port_attrs_set);
+
+int devlink_port_get_phys_port_name(struct devlink_port *devlink_port,
+ char *name, size_t len)
+{
+ struct devlink_port_attrs *attrs = &devlink_port->attrs;
+ int n = 0;
+
+ if (!attrs->set)
+ return -EOPNOTSUPP;
+
+ switch (attrs->flavour) {
+ case DEVLINK_PORT_FLAVOUR_PHYSICAL:
+ if (!attrs->split)
+ n = snprintf(name, len, "p%u", attrs->port_number);
+ else
+ n = snprintf(name, len, "p%us%u", attrs->port_number,
+ attrs->split_subport_number);
+ break;
+ case DEVLINK_PORT_FLAVOUR_CPU:
+ case DEVLINK_PORT_FLAVOUR_DSA:
+ /* As CPU and DSA ports do not have a netdevice associated
+ * case should not ever happen.
+ */
+ WARN_ON(1);
+ return -EINVAL;
+ }
+
+ if (n >= len)
+ return -EINVAL;
+
+ return 0;
+}
+EXPORT_SYMBOL_GPL(devlink_port_get_phys_port_name);
int devlink_sb_register(struct devlink *devlink, unsigned int sb_index,
u32 size, u16 ingress_pools_count,
diff --git a/net/core/dst.c b/net/core/dst.c
index 007aa0b..2d9b37f 100644
--- a/net/core/dst.c
+++ b/net/core/dst.c
@@ -58,6 +58,7 @@
*/
.refcnt = REFCOUNT_INIT(1),
};
+EXPORT_SYMBOL(dst_default_metrics);
void dst_init(struct dst_entry *dst, struct dst_ops *ops,
struct net_device *dev, int initial_ref, int initial_obsolete,
diff --git a/net/core/ethtool.c b/net/core/ethtool.c
index ba02f0d..c15075d 100644
--- a/net/core/ethtool.c
+++ b/net/core/ethtool.c
@@ -92,6 +92,7 @@
[NETIF_F_GSO_PARTIAL_BIT] = "tx-gso-partial",
[NETIF_F_GSO_SCTP_BIT] = "tx-sctp-segmentation",
[NETIF_F_GSO_ESP_BIT] = "tx-esp-segmentation",
+ [NETIF_F_GSO_UDP_L4_BIT] = "tx-udp-segmentation",
[NETIF_F_FCOE_CRC_BIT] = "tx-checksum-fcoe-crc",
[NETIF_F_SCTP_CRC_BIT] = "tx-checksum-sctp",
@@ -109,6 +110,7 @@
[NETIF_F_HW_ESP_TX_CSUM_BIT] = "esp-tx-csum-hw-offload",
[NETIF_F_RX_UDP_TUNNEL_PORT_BIT] = "rx-udp_tunnel-port-offload",
[NETIF_F_HW_TLS_RECORD_BIT] = "tls-hw-record",
+ [NETIF_F_HW_TLS_TX_BIT] = "tls-hw-tx-offload",
};
static const char
@@ -210,23 +212,6 @@
return ret;
}
-static int phy_get_sset_count(struct phy_device *phydev)
-{
- int ret;
-
- if (phydev->drv->get_sset_count &&
- phydev->drv->get_strings &&
- phydev->drv->get_stats) {
- mutex_lock(&phydev->lock);
- ret = phydev->drv->get_sset_count(phydev);
- mutex_unlock(&phydev->lock);
-
- return ret;
- }
-
- return -EOPNOTSUPP;
-}
-
static int __ethtool_get_sset_count(struct net_device *dev, int sset)
{
const struct ethtool_ops *ops = dev->ethtool_ops;
@@ -243,12 +228,9 @@
if (sset == ETH_SS_PHY_TUNABLES)
return ARRAY_SIZE(phy_tunable_strings);
- if (sset == ETH_SS_PHY_STATS) {
- if (dev->phydev)
- return phy_get_sset_count(dev->phydev);
- else
- return -EOPNOTSUPP;
- }
+ if (sset == ETH_SS_PHY_STATS && dev->phydev &&
+ !ops->get_ethtool_phy_stats)
+ return phy_ethtool_get_sset_count(dev->phydev);
if (ops->get_sset_count && ops->get_strings)
return ops->get_sset_count(dev, sset);
@@ -271,17 +253,10 @@
memcpy(data, tunable_strings, sizeof(tunable_strings));
else if (stringset == ETH_SS_PHY_TUNABLES)
memcpy(data, phy_tunable_strings, sizeof(phy_tunable_strings));
- else if (stringset == ETH_SS_PHY_STATS) {
- struct phy_device *phydev = dev->phydev;
-
- if (phydev) {
- mutex_lock(&phydev->lock);
- phydev->drv->get_strings(phydev, data);
- mutex_unlock(&phydev->lock);
- } else {
- return;
- }
- } else
+ else if (stringset == ETH_SS_PHY_STATS && dev->phydev &&
+ !ops->get_ethtool_phy_stats)
+ phy_ethtool_get_strings(dev->phydev, data);
+ else
/* ops->get_strings is valid because checked earlier */
ops->get_strings(dev, stringset, data);
}
@@ -1998,15 +1973,19 @@
static int ethtool_get_phy_stats(struct net_device *dev, void __user *useraddr)
{
- struct ethtool_stats stats;
+ const struct ethtool_ops *ops = dev->ethtool_ops;
struct phy_device *phydev = dev->phydev;
+ struct ethtool_stats stats;
u64 *data;
int ret, n_stats;
- if (!phydev)
+ if (!phydev && (!ops->get_ethtool_phy_stats || !ops->get_sset_count))
return -EOPNOTSUPP;
- n_stats = phy_get_sset_count(phydev);
+ if (dev->phydev && !ops->get_ethtool_phy_stats)
+ n_stats = phy_ethtool_get_sset_count(dev->phydev);
+ else
+ n_stats = ops->get_sset_count(dev, ETH_SS_PHY_STATS);
if (n_stats < 0)
return n_stats;
if (n_stats > S32_MAX / sizeof(u64))
@@ -2021,9 +2000,13 @@
if (n_stats && !data)
return -ENOMEM;
- mutex_lock(&phydev->lock);
- phydev->drv->get_stats(phydev, &stats, data);
- mutex_unlock(&phydev->lock);
+ if (dev->phydev && !ops->get_ethtool_phy_stats) {
+ ret = phy_ethtool_get_stats(dev->phydev, &stats, data);
+ if (ret < 0)
+ return ret;
+ } else {
+ ops->get_ethtool_phy_stats(dev, &stats, data);
+ }
ret = -EFAULT;
if (copy_to_user(useraddr, &stats, sizeof(stats)))
diff --git a/net/core/fib_rules.c b/net/core/fib_rules.c
index 33958f8..126ffc5 100644
--- a/net/core/fib_rules.c
+++ b/net/core/fib_rules.c
@@ -387,85 +387,263 @@
}
EXPORT_SYMBOL_GPL(fib_rules_seq_read);
-static int validate_rulemsg(struct fib_rule_hdr *frh, struct nlattr **tb,
- struct fib_rules_ops *ops)
-{
- int err = -EINVAL;
-
- if (frh->src_len)
- if (tb[FRA_SRC] == NULL ||
- frh->src_len > (ops->addr_size * 8) ||
- nla_len(tb[FRA_SRC]) != ops->addr_size)
- goto errout;
-
- if (frh->dst_len)
- if (tb[FRA_DST] == NULL ||
- frh->dst_len > (ops->addr_size * 8) ||
- nla_len(tb[FRA_DST]) != ops->addr_size)
- goto errout;
-
- err = 0;
-errout:
- return err;
-}
-
-static int rule_exists(struct fib_rules_ops *ops, struct fib_rule_hdr *frh,
- struct nlattr **tb, struct fib_rule *rule)
+static struct fib_rule *rule_find(struct fib_rules_ops *ops,
+ struct fib_rule_hdr *frh,
+ struct nlattr **tb,
+ struct fib_rule *rule,
+ bool user_priority)
{
struct fib_rule *r;
list_for_each_entry(r, &ops->rules_list, list) {
- if (r->action != rule->action)
+ if (rule->action && r->action != rule->action)
continue;
- if (r->table != rule->table)
+ if (rule->table && r->table != rule->table)
continue;
- if (r->pref != rule->pref)
+ if (user_priority && r->pref != rule->pref)
continue;
- if (memcmp(r->iifname, rule->iifname, IFNAMSIZ))
+ if (rule->iifname[0] &&
+ memcmp(r->iifname, rule->iifname, IFNAMSIZ))
continue;
- if (memcmp(r->oifname, rule->oifname, IFNAMSIZ))
+ if (rule->oifname[0] &&
+ memcmp(r->oifname, rule->oifname, IFNAMSIZ))
continue;
- if (r->mark != rule->mark)
+ if (rule->mark && r->mark != rule->mark)
continue;
- if (r->mark_mask != rule->mark_mask)
+ if (rule->mark_mask && r->mark_mask != rule->mark_mask)
continue;
- if (r->tun_id != rule->tun_id)
+ if (rule->tun_id && r->tun_id != rule->tun_id)
continue;
if (r->fr_net != rule->fr_net)
continue;
- if (r->l3mdev != rule->l3mdev)
+ if (rule->l3mdev && r->l3mdev != rule->l3mdev)
continue;
- if (!uid_eq(r->uid_range.start, rule->uid_range.start) ||
- !uid_eq(r->uid_range.end, rule->uid_range.end))
+ if (uid_range_set(&rule->uid_range) &&
+ (!uid_eq(r->uid_range.start, rule->uid_range.start) ||
+ !uid_eq(r->uid_range.end, rule->uid_range.end)))
continue;
- if (r->ip_proto != rule->ip_proto)
+ if (rule->ip_proto && r->ip_proto != rule->ip_proto)
continue;
- if (!fib_rule_port_range_compare(&r->sport_range,
+ if (fib_rule_port_range_set(&rule->sport_range) &&
+ !fib_rule_port_range_compare(&r->sport_range,
&rule->sport_range))
continue;
- if (!fib_rule_port_range_compare(&r->dport_range,
+ if (fib_rule_port_range_set(&rule->dport_range) &&
+ !fib_rule_port_range_compare(&r->dport_range,
&rule->dport_range))
continue;
if (!ops->compare(r, frh, tb))
continue;
- return 1;
+ return r;
}
+
+ return NULL;
+}
+
+#ifdef CONFIG_NET_L3_MASTER_DEV
+static int fib_nl2rule_l3mdev(struct nlattr *nla, struct fib_rule *nlrule,
+ struct netlink_ext_ack *extack)
+{
+ nlrule->l3mdev = nla_get_u8(nla);
+ if (nlrule->l3mdev != 1) {
+ NL_SET_ERR_MSG(extack, "Invalid l3mdev attribute");
+ return -1;
+ }
+
return 0;
}
+#else
+static int fib_nl2rule_l3mdev(struct nlattr *nla, struct fib_rule *nlrule,
+ struct netlink_ext_ack *extack)
+{
+ NL_SET_ERR_MSG(extack, "l3mdev support is not enabled in kernel");
+ return -1;
+}
+#endif
+
+static int fib_nl2rule(struct sk_buff *skb, struct nlmsghdr *nlh,
+ struct netlink_ext_ack *extack,
+ struct fib_rules_ops *ops,
+ struct nlattr *tb[],
+ struct fib_rule **rule,
+ bool *user_priority)
+{
+ struct net *net = sock_net(skb->sk);
+ struct fib_rule_hdr *frh = nlmsg_data(nlh);
+ struct fib_rule *nlrule = NULL;
+ int err = -EINVAL;
+
+ if (frh->src_len)
+ if (!tb[FRA_SRC] ||
+ frh->src_len > (ops->addr_size * 8) ||
+ nla_len(tb[FRA_SRC]) != ops->addr_size) {
+ NL_SET_ERR_MSG(extack, "Invalid source address");
+ goto errout;
+ }
+
+ if (frh->dst_len)
+ if (!tb[FRA_DST] ||
+ frh->dst_len > (ops->addr_size * 8) ||
+ nla_len(tb[FRA_DST]) != ops->addr_size) {
+ NL_SET_ERR_MSG(extack, "Invalid dst address");
+ goto errout;
+ }
+
+ nlrule = kzalloc(ops->rule_size, GFP_KERNEL);
+ if (!nlrule) {
+ err = -ENOMEM;
+ goto errout;
+ }
+ refcount_set(&nlrule->refcnt, 1);
+ nlrule->fr_net = net;
+
+ if (tb[FRA_PRIORITY]) {
+ nlrule->pref = nla_get_u32(tb[FRA_PRIORITY]);
+ *user_priority = true;
+ } else {
+ nlrule->pref = fib_default_rule_pref(ops);
+ }
+
+ nlrule->proto = tb[FRA_PROTOCOL] ?
+ nla_get_u8(tb[FRA_PROTOCOL]) : RTPROT_UNSPEC;
+
+ if (tb[FRA_IIFNAME]) {
+ struct net_device *dev;
+
+ nlrule->iifindex = -1;
+ nla_strlcpy(nlrule->iifname, tb[FRA_IIFNAME], IFNAMSIZ);
+ dev = __dev_get_by_name(net, nlrule->iifname);
+ if (dev)
+ nlrule->iifindex = dev->ifindex;
+ }
+
+ if (tb[FRA_OIFNAME]) {
+ struct net_device *dev;
+
+ nlrule->oifindex = -1;
+ nla_strlcpy(nlrule->oifname, tb[FRA_OIFNAME], IFNAMSIZ);
+ dev = __dev_get_by_name(net, nlrule->oifname);
+ if (dev)
+ nlrule->oifindex = dev->ifindex;
+ }
+
+ if (tb[FRA_FWMARK]) {
+ nlrule->mark = nla_get_u32(tb[FRA_FWMARK]);
+ if (nlrule->mark)
+ /* compatibility: if the mark value is non-zero all bits
+ * are compared unless a mask is explicitly specified.
+ */
+ nlrule->mark_mask = 0xFFFFFFFF;
+ }
+
+ if (tb[FRA_FWMASK])
+ nlrule->mark_mask = nla_get_u32(tb[FRA_FWMASK]);
+
+ if (tb[FRA_TUN_ID])
+ nlrule->tun_id = nla_get_be64(tb[FRA_TUN_ID]);
+
+ err = -EINVAL;
+ if (tb[FRA_L3MDEV] &&
+ fib_nl2rule_l3mdev(tb[FRA_L3MDEV], nlrule, extack) < 0)
+ goto errout_free;
+
+ nlrule->action = frh->action;
+ nlrule->flags = frh->flags;
+ nlrule->table = frh_get_table(frh, tb);
+ if (tb[FRA_SUPPRESS_PREFIXLEN])
+ nlrule->suppress_prefixlen = nla_get_u32(tb[FRA_SUPPRESS_PREFIXLEN]);
+ else
+ nlrule->suppress_prefixlen = -1;
+
+ if (tb[FRA_SUPPRESS_IFGROUP])
+ nlrule->suppress_ifgroup = nla_get_u32(tb[FRA_SUPPRESS_IFGROUP]);
+ else
+ nlrule->suppress_ifgroup = -1;
+
+ if (tb[FRA_GOTO]) {
+ if (nlrule->action != FR_ACT_GOTO) {
+ NL_SET_ERR_MSG(extack, "Unexpected goto");
+ goto errout_free;
+ }
+
+ nlrule->target = nla_get_u32(tb[FRA_GOTO]);
+ /* Backward jumps are prohibited to avoid endless loops */
+ if (nlrule->target <= nlrule->pref) {
+ NL_SET_ERR_MSG(extack, "Backward goto not supported");
+ goto errout_free;
+ }
+ } else if (nlrule->action == FR_ACT_GOTO) {
+ NL_SET_ERR_MSG(extack, "Missing goto target for action goto");
+ goto errout_free;
+ }
+
+ if (nlrule->l3mdev && nlrule->table) {
+ NL_SET_ERR_MSG(extack, "l3mdev and table are mutually exclusive");
+ goto errout_free;
+ }
+
+ if (tb[FRA_UID_RANGE]) {
+ if (current_user_ns() != net->user_ns) {
+ err = -EPERM;
+ NL_SET_ERR_MSG(extack, "No permission to set uid");
+ goto errout_free;
+ }
+
+ nlrule->uid_range = nla_get_kuid_range(tb);
+
+ if (!uid_range_set(&nlrule->uid_range) ||
+ !uid_lte(nlrule->uid_range.start, nlrule->uid_range.end)) {
+ NL_SET_ERR_MSG(extack, "Invalid uid range");
+ goto errout_free;
+ }
+ } else {
+ nlrule->uid_range = fib_kuid_range_unset;
+ }
+
+ if (tb[FRA_IP_PROTO])
+ nlrule->ip_proto = nla_get_u8(tb[FRA_IP_PROTO]);
+
+ if (tb[FRA_SPORT_RANGE]) {
+ err = nla_get_port_range(tb[FRA_SPORT_RANGE],
+ &nlrule->sport_range);
+ if (err) {
+ NL_SET_ERR_MSG(extack, "Invalid sport range");
+ goto errout_free;
+ }
+ }
+
+ if (tb[FRA_DPORT_RANGE]) {
+ err = nla_get_port_range(tb[FRA_DPORT_RANGE],
+ &nlrule->dport_range);
+ if (err) {
+ NL_SET_ERR_MSG(extack, "Invalid dport range");
+ goto errout_free;
+ }
+ }
+
+ *rule = nlrule;
+
+ return 0;
+
+errout_free:
+ kfree(nlrule);
+errout:
+ return err;
+}
int fib_nl_newrule(struct sk_buff *skb, struct nlmsghdr *nlh,
struct netlink_ext_ack *extack)
@@ -473,161 +651,40 @@
struct net *net = sock_net(skb->sk);
struct fib_rule_hdr *frh = nlmsg_data(nlh);
struct fib_rules_ops *ops = NULL;
- struct fib_rule *rule, *r, *last = NULL;
- struct nlattr *tb[FRA_MAX+1];
+ struct fib_rule *rule = NULL, *r, *last = NULL;
+ struct nlattr *tb[FRA_MAX + 1];
int err = -EINVAL, unresolved = 0;
+ bool user_priority = false;
- if (nlh->nlmsg_len < nlmsg_msg_size(sizeof(*frh)))
+ if (nlh->nlmsg_len < nlmsg_msg_size(sizeof(*frh))) {
+ NL_SET_ERR_MSG(extack, "Invalid msg length");
goto errout;
+ }
ops = lookup_rules_ops(net, frh->family);
- if (ops == NULL) {
+ if (!ops) {
err = -EAFNOSUPPORT;
+ NL_SET_ERR_MSG(extack, "Rule family not supported");
goto errout;
}
err = nlmsg_parse(nlh, sizeof(*frh), tb, FRA_MAX, ops->policy, extack);
- if (err < 0)
- goto errout;
-
- err = validate_rulemsg(frh, tb, ops);
- if (err < 0)
- goto errout;
-
- rule = kzalloc(ops->rule_size, GFP_KERNEL);
- if (rule == NULL) {
- err = -ENOMEM;
+ if (err < 0) {
+ NL_SET_ERR_MSG(extack, "Error parsing msg");
goto errout;
}
- refcount_set(&rule->refcnt, 1);
- rule->fr_net = net;
- rule->pref = tb[FRA_PRIORITY] ? nla_get_u32(tb[FRA_PRIORITY])
- : fib_default_rule_pref(ops);
-
- rule->proto = tb[FRA_PROTOCOL] ?
- nla_get_u8(tb[FRA_PROTOCOL]) : RTPROT_UNSPEC;
-
- if (tb[FRA_IIFNAME]) {
- struct net_device *dev;
-
- rule->iifindex = -1;
- nla_strlcpy(rule->iifname, tb[FRA_IIFNAME], IFNAMSIZ);
- dev = __dev_get_by_name(net, rule->iifname);
- if (dev)
- rule->iifindex = dev->ifindex;
- }
-
- if (tb[FRA_OIFNAME]) {
- struct net_device *dev;
-
- rule->oifindex = -1;
- nla_strlcpy(rule->oifname, tb[FRA_OIFNAME], IFNAMSIZ);
- dev = __dev_get_by_name(net, rule->oifname);
- if (dev)
- rule->oifindex = dev->ifindex;
- }
-
- if (tb[FRA_FWMARK]) {
- rule->mark = nla_get_u32(tb[FRA_FWMARK]);
- if (rule->mark)
- /* compatibility: if the mark value is non-zero all bits
- * are compared unless a mask is explicitly specified.
- */
- rule->mark_mask = 0xFFFFFFFF;
- }
-
- if (tb[FRA_FWMASK])
- rule->mark_mask = nla_get_u32(tb[FRA_FWMASK]);
-
- if (tb[FRA_TUN_ID])
- rule->tun_id = nla_get_be64(tb[FRA_TUN_ID]);
-
- err = -EINVAL;
- if (tb[FRA_L3MDEV]) {
-#ifdef CONFIG_NET_L3_MASTER_DEV
- rule->l3mdev = nla_get_u8(tb[FRA_L3MDEV]);
- if (rule->l3mdev != 1)
-#endif
- goto errout_free;
- }
-
- rule->action = frh->action;
- rule->flags = frh->flags;
- rule->table = frh_get_table(frh, tb);
- if (tb[FRA_SUPPRESS_PREFIXLEN])
- rule->suppress_prefixlen = nla_get_u32(tb[FRA_SUPPRESS_PREFIXLEN]);
- else
- rule->suppress_prefixlen = -1;
-
- if (tb[FRA_SUPPRESS_IFGROUP])
- rule->suppress_ifgroup = nla_get_u32(tb[FRA_SUPPRESS_IFGROUP]);
- else
- rule->suppress_ifgroup = -1;
-
- if (tb[FRA_GOTO]) {
- if (rule->action != FR_ACT_GOTO)
- goto errout_free;
-
- rule->target = nla_get_u32(tb[FRA_GOTO]);
- /* Backward jumps are prohibited to avoid endless loops */
- if (rule->target <= rule->pref)
- goto errout_free;
-
- list_for_each_entry(r, &ops->rules_list, list) {
- if (r->pref == rule->target) {
- RCU_INIT_POINTER(rule->ctarget, r);
- break;
- }
- }
-
- if (rcu_dereference_protected(rule->ctarget, 1) == NULL)
- unresolved = 1;
- } else if (rule->action == FR_ACT_GOTO)
- goto errout_free;
-
- if (rule->l3mdev && rule->table)
- goto errout_free;
-
- if (tb[FRA_UID_RANGE]) {
- if (current_user_ns() != net->user_ns) {
- err = -EPERM;
- goto errout_free;
- }
-
- rule->uid_range = nla_get_kuid_range(tb);
-
- if (!uid_range_set(&rule->uid_range) ||
- !uid_lte(rule->uid_range.start, rule->uid_range.end))
- goto errout_free;
- } else {
- rule->uid_range = fib_kuid_range_unset;
- }
-
- if (tb[FRA_IP_PROTO])
- rule->ip_proto = nla_get_u8(tb[FRA_IP_PROTO]);
-
- if (tb[FRA_SPORT_RANGE]) {
- err = nla_get_port_range(tb[FRA_SPORT_RANGE],
- &rule->sport_range);
- if (err)
- goto errout_free;
- }
-
- if (tb[FRA_DPORT_RANGE]) {
- err = nla_get_port_range(tb[FRA_DPORT_RANGE],
- &rule->dport_range);
- if (err)
- goto errout_free;
- }
+ err = fib_nl2rule(skb, nlh, extack, ops, tb, &rule, &user_priority);
+ if (err)
+ goto errout;
if ((nlh->nlmsg_flags & NLM_F_EXCL) &&
- rule_exists(ops, frh, tb, rule)) {
+ rule_find(ops, frh, tb, rule, user_priority)) {
err = -EEXIST;
goto errout_free;
}
- err = ops->configure(rule, skb, frh, tb);
+ err = ops->configure(rule, skb, frh, tb, extack);
if (err < 0)
goto errout_free;
@@ -637,6 +694,16 @@
goto errout_free;
list_for_each_entry(r, &ops->rules_list, list) {
+ if (r->pref == rule->target) {
+ RCU_INIT_POINTER(rule->ctarget, r);
+ break;
+ }
+ }
+
+ if (rcu_dereference_protected(rule->ctarget, 1) == NULL)
+ unresolved = 1;
+
+ list_for_each_entry(r, &ops->rules_list, list) {
if (r->pref > rule->pref)
break;
last = r;
@@ -690,171 +757,97 @@
{
struct net *net = sock_net(skb->sk);
struct fib_rule_hdr *frh = nlmsg_data(nlh);
- struct fib_rule_port_range sprange = {0, 0};
- struct fib_rule_port_range dprange = {0, 0};
struct fib_rules_ops *ops = NULL;
- struct fib_rule *rule, *r;
+ struct fib_rule *rule = NULL, *r, *nlrule = NULL;
struct nlattr *tb[FRA_MAX+1];
- struct fib_kuid_range range;
int err = -EINVAL;
+ bool user_priority = false;
- if (nlh->nlmsg_len < nlmsg_msg_size(sizeof(*frh)))
+ if (nlh->nlmsg_len < nlmsg_msg_size(sizeof(*frh))) {
+ NL_SET_ERR_MSG(extack, "Invalid msg length");
goto errout;
+ }
ops = lookup_rules_ops(net, frh->family);
if (ops == NULL) {
err = -EAFNOSUPPORT;
+ NL_SET_ERR_MSG(extack, "Rule family not supported");
goto errout;
}
err = nlmsg_parse(nlh, sizeof(*frh), tb, FRA_MAX, ops->policy, extack);
- if (err < 0)
+ if (err < 0) {
+ NL_SET_ERR_MSG(extack, "Error parsing msg");
goto errout;
-
- err = validate_rulemsg(frh, tb, ops);
- if (err < 0)
- goto errout;
-
- if (tb[FRA_UID_RANGE]) {
- range = nla_get_kuid_range(tb);
- if (!uid_range_set(&range)) {
- err = -EINVAL;
- goto errout;
- }
- } else {
- range = fib_kuid_range_unset;
}
- if (tb[FRA_SPORT_RANGE]) {
- err = nla_get_port_range(tb[FRA_SPORT_RANGE],
- &sprange);
+ err = fib_nl2rule(skb, nlh, extack, ops, tb, &nlrule, &user_priority);
+ if (err)
+ goto errout;
+
+ rule = rule_find(ops, frh, tb, nlrule, user_priority);
+ if (!rule) {
+ err = -ENOENT;
+ goto errout;
+ }
+
+ if (rule->flags & FIB_RULE_PERMANENT) {
+ err = -EPERM;
+ goto errout;
+ }
+
+ if (ops->delete) {
+ err = ops->delete(rule);
if (err)
goto errout;
}
- if (tb[FRA_DPORT_RANGE]) {
- err = nla_get_port_range(tb[FRA_DPORT_RANGE],
- &dprange);
- if (err)
- goto errout;
+ if (rule->tun_id)
+ ip_tunnel_unneed_metadata();
+
+ list_del_rcu(&rule->list);
+
+ if (rule->action == FR_ACT_GOTO) {
+ ops->nr_goto_rules--;
+ if (rtnl_dereference(rule->ctarget) == NULL)
+ ops->unresolved_rules--;
}
- list_for_each_entry(rule, &ops->rules_list, list) {
- if (tb[FRA_PROTOCOL] &&
- (rule->proto != nla_get_u8(tb[FRA_PROTOCOL])))
- continue;
+ /*
+ * Check if this rule is a target to any of them. If so,
+ * adjust to the next one with the same preference or
+ * disable them. As this operation is eventually very
+ * expensive, it is only performed if goto rules, except
+ * current if it is goto rule, have actually been added.
+ */
+ if (ops->nr_goto_rules > 0) {
+ struct fib_rule *n;
- if (frh->action && (frh->action != rule->action))
- continue;
-
- if (frh_get_table(frh, tb) &&
- (frh_get_table(frh, tb) != rule->table))
- continue;
-
- if (tb[FRA_PRIORITY] &&
- (rule->pref != nla_get_u32(tb[FRA_PRIORITY])))
- continue;
-
- if (tb[FRA_IIFNAME] &&
- nla_strcmp(tb[FRA_IIFNAME], rule->iifname))
- continue;
-
- if (tb[FRA_OIFNAME] &&
- nla_strcmp(tb[FRA_OIFNAME], rule->oifname))
- continue;
-
- if (tb[FRA_FWMARK] &&
- (rule->mark != nla_get_u32(tb[FRA_FWMARK])))
- continue;
-
- if (tb[FRA_FWMASK] &&
- (rule->mark_mask != nla_get_u32(tb[FRA_FWMASK])))
- continue;
-
- if (tb[FRA_TUN_ID] &&
- (rule->tun_id != nla_get_be64(tb[FRA_TUN_ID])))
- continue;
-
- if (tb[FRA_L3MDEV] &&
- (rule->l3mdev != nla_get_u8(tb[FRA_L3MDEV])))
- continue;
-
- if (uid_range_set(&range) &&
- (!uid_eq(rule->uid_range.start, range.start) ||
- !uid_eq(rule->uid_range.end, range.end)))
- continue;
-
- if (tb[FRA_IP_PROTO] &&
- (rule->ip_proto != nla_get_u8(tb[FRA_IP_PROTO])))
- continue;
-
- if (fib_rule_port_range_set(&sprange) &&
- !fib_rule_port_range_compare(&rule->sport_range, &sprange))
- continue;
-
- if (fib_rule_port_range_set(&dprange) &&
- !fib_rule_port_range_compare(&rule->dport_range, &dprange))
- continue;
-
- if (!ops->compare(rule, frh, tb))
- continue;
-
- if (rule->flags & FIB_RULE_PERMANENT) {
- err = -EPERM;
- goto errout;
+ n = list_next_entry(rule, list);
+ if (&n->list == &ops->rules_list || n->pref != rule->pref)
+ n = NULL;
+ list_for_each_entry(r, &ops->rules_list, list) {
+ if (rtnl_dereference(r->ctarget) != rule)
+ continue;
+ rcu_assign_pointer(r->ctarget, n);
+ if (!n)
+ ops->unresolved_rules++;
}
-
- if (ops->delete) {
- err = ops->delete(rule);
- if (err)
- goto errout;
- }
-
- if (rule->tun_id)
- ip_tunnel_unneed_metadata();
-
- list_del_rcu(&rule->list);
-
- if (rule->action == FR_ACT_GOTO) {
- ops->nr_goto_rules--;
- if (rtnl_dereference(rule->ctarget) == NULL)
- ops->unresolved_rules--;
- }
-
- /*
- * Check if this rule is a target to any of them. If so,
- * adjust to the next one with the same preference or
- * disable them. As this operation is eventually very
- * expensive, it is only performed if goto rules, except
- * current if it is goto rule, have actually been added.
- */
- if (ops->nr_goto_rules > 0) {
- struct fib_rule *n;
-
- n = list_next_entry(rule, list);
- if (&n->list == &ops->rules_list || n->pref != rule->pref)
- n = NULL;
- list_for_each_entry(r, &ops->rules_list, list) {
- if (rtnl_dereference(r->ctarget) != rule)
- continue;
- rcu_assign_pointer(r->ctarget, n);
- if (!n)
- ops->unresolved_rules++;
- }
- }
-
- call_fib_rule_notifiers(net, FIB_EVENT_RULE_DEL, rule, ops,
- NULL);
- notify_rule_change(RTM_DELRULE, rule, ops, nlh,
- NETLINK_CB(skb).portid);
- fib_rule_put(rule);
- flush_route_cache(ops);
- rules_ops_put(ops);
- return 0;
}
- err = -ENOENT;
+ call_fib_rule_notifiers(net, FIB_EVENT_RULE_DEL, rule, ops,
+ NULL);
+ notify_rule_change(RTM_DELRULE, rule, ops, nlh,
+ NETLINK_CB(skb).portid);
+ fib_rule_put(rule);
+ flush_route_cache(ops);
+ rules_ops_put(ops);
+ kfree(nlrule);
+ return 0;
+
errout:
+ if (nlrule)
+ kfree(nlrule);
rules_ops_put(ops);
return err;
}
diff --git a/net/core/filter.c b/net/core/filter.c
index 201ff36b..acf1f4f 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -57,7 +57,17 @@
#include <net/sock_reuseport.h>
#include <net/busy_poll.h>
#include <net/tcp.h>
+#include <net/xfrm.h>
#include <linux/bpf_trace.h>
+#include <net/xdp_sock.h>
+#include <linux/inetdevice.h>
+#include <net/ip_fib.h>
+#include <net/flow.h>
+#include <net/arp.h>
+#include <net/ipv6.h>
+#include <linux/seg6_local.h>
+#include <net/seg6.h>
+#include <net/seg6_local.h>
/**
* sk_filter_trim_cap - run a packet through a socket filter
@@ -111,12 +121,12 @@
}
EXPORT_SYMBOL(sk_filter_trim_cap);
-BPF_CALL_1(__skb_get_pay_offset, struct sk_buff *, skb)
+BPF_CALL_1(bpf_skb_get_pay_offset, struct sk_buff *, skb)
{
return skb_get_poff(skb);
}
-BPF_CALL_3(__skb_get_nlattr, struct sk_buff *, skb, u32, a, u32, x)
+BPF_CALL_3(bpf_skb_get_nlattr, struct sk_buff *, skb, u32, a, u32, x)
{
struct nlattr *nla;
@@ -136,7 +146,7 @@
return 0;
}
-BPF_CALL_3(__skb_get_nlattr_nest, struct sk_buff *, skb, u32, a, u32, x)
+BPF_CALL_3(bpf_skb_get_nlattr_nest, struct sk_buff *, skb, u32, a, u32, x)
{
struct nlattr *nla;
@@ -160,13 +170,94 @@
return 0;
}
-BPF_CALL_0(__get_raw_cpu_id)
+BPF_CALL_4(bpf_skb_load_helper_8, const struct sk_buff *, skb, const void *,
+ data, int, headlen, int, offset)
+{
+ u8 tmp, *ptr;
+ const int len = sizeof(tmp);
+
+ if (offset >= 0) {
+ if (headlen - offset >= len)
+ return *(u8 *)(data + offset);
+ if (!skb_copy_bits(skb, offset, &tmp, sizeof(tmp)))
+ return tmp;
+ } else {
+ ptr = bpf_internal_load_pointer_neg_helper(skb, offset, len);
+ if (likely(ptr))
+ return *(u8 *)ptr;
+ }
+
+ return -EFAULT;
+}
+
+BPF_CALL_2(bpf_skb_load_helper_8_no_cache, const struct sk_buff *, skb,
+ int, offset)
+{
+ return ____bpf_skb_load_helper_8(skb, skb->data, skb->len - skb->data_len,
+ offset);
+}
+
+BPF_CALL_4(bpf_skb_load_helper_16, const struct sk_buff *, skb, const void *,
+ data, int, headlen, int, offset)
+{
+ u16 tmp, *ptr;
+ const int len = sizeof(tmp);
+
+ if (offset >= 0) {
+ if (headlen - offset >= len)
+ return get_unaligned_be16(data + offset);
+ if (!skb_copy_bits(skb, offset, &tmp, sizeof(tmp)))
+ return be16_to_cpu(tmp);
+ } else {
+ ptr = bpf_internal_load_pointer_neg_helper(skb, offset, len);
+ if (likely(ptr))
+ return get_unaligned_be16(ptr);
+ }
+
+ return -EFAULT;
+}
+
+BPF_CALL_2(bpf_skb_load_helper_16_no_cache, const struct sk_buff *, skb,
+ int, offset)
+{
+ return ____bpf_skb_load_helper_16(skb, skb->data, skb->len - skb->data_len,
+ offset);
+}
+
+BPF_CALL_4(bpf_skb_load_helper_32, const struct sk_buff *, skb, const void *,
+ data, int, headlen, int, offset)
+{
+ u32 tmp, *ptr;
+ const int len = sizeof(tmp);
+
+ if (likely(offset >= 0)) {
+ if (headlen - offset >= len)
+ return get_unaligned_be32(data + offset);
+ if (!skb_copy_bits(skb, offset, &tmp, sizeof(tmp)))
+ return be32_to_cpu(tmp);
+ } else {
+ ptr = bpf_internal_load_pointer_neg_helper(skb, offset, len);
+ if (likely(ptr))
+ return get_unaligned_be32(ptr);
+ }
+
+ return -EFAULT;
+}
+
+BPF_CALL_2(bpf_skb_load_helper_32_no_cache, const struct sk_buff *, skb,
+ int, offset)
+{
+ return ____bpf_skb_load_helper_32(skb, skb->data, skb->len - skb->data_len,
+ offset);
+}
+
+BPF_CALL_0(bpf_get_raw_cpu_id)
{
return raw_smp_processor_id();
}
static const struct bpf_func_proto bpf_get_raw_smp_processor_id_proto = {
- .func = __get_raw_cpu_id,
+ .func = bpf_get_raw_cpu_id,
.gpl_only = false,
.ret_type = RET_INTEGER,
};
@@ -316,16 +407,16 @@
/* Emit call(arg1=CTX, arg2=A, arg3=X) */
switch (fp->k) {
case SKF_AD_OFF + SKF_AD_PAY_OFFSET:
- *insn = BPF_EMIT_CALL(__skb_get_pay_offset);
+ *insn = BPF_EMIT_CALL(bpf_skb_get_pay_offset);
break;
case SKF_AD_OFF + SKF_AD_NLATTR:
- *insn = BPF_EMIT_CALL(__skb_get_nlattr);
+ *insn = BPF_EMIT_CALL(bpf_skb_get_nlattr);
break;
case SKF_AD_OFF + SKF_AD_NLATTR_NEST:
- *insn = BPF_EMIT_CALL(__skb_get_nlattr_nest);
+ *insn = BPF_EMIT_CALL(bpf_skb_get_nlattr_nest);
break;
case SKF_AD_OFF + SKF_AD_CPU:
- *insn = BPF_EMIT_CALL(__get_raw_cpu_id);
+ *insn = BPF_EMIT_CALL(bpf_get_raw_cpu_id);
break;
case SKF_AD_OFF + SKF_AD_RANDOM:
*insn = BPF_EMIT_CALL(bpf_user_rnd_u32);
@@ -352,26 +443,87 @@
return true;
}
+static bool convert_bpf_ld_abs(struct sock_filter *fp, struct bpf_insn **insnp)
+{
+ const bool unaligned_ok = IS_BUILTIN(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS);
+ int size = bpf_size_to_bytes(BPF_SIZE(fp->code));
+ bool endian = BPF_SIZE(fp->code) == BPF_H ||
+ BPF_SIZE(fp->code) == BPF_W;
+ bool indirect = BPF_MODE(fp->code) == BPF_IND;
+ const int ip_align = NET_IP_ALIGN;
+ struct bpf_insn *insn = *insnp;
+ int offset = fp->k;
+
+ if (!indirect &&
+ ((unaligned_ok && offset >= 0) ||
+ (!unaligned_ok && offset >= 0 &&
+ offset + ip_align >= 0 &&
+ offset + ip_align % size == 0))) {
+ *insn++ = BPF_MOV64_REG(BPF_REG_TMP, BPF_REG_H);
+ *insn++ = BPF_ALU64_IMM(BPF_SUB, BPF_REG_TMP, offset);
+ *insn++ = BPF_JMP_IMM(BPF_JSLT, BPF_REG_TMP, size, 2 + endian);
+ *insn++ = BPF_LDX_MEM(BPF_SIZE(fp->code), BPF_REG_A, BPF_REG_D,
+ offset);
+ if (endian)
+ *insn++ = BPF_ENDIAN(BPF_FROM_BE, BPF_REG_A, size * 8);
+ *insn++ = BPF_JMP_A(8);
+ }
+
+ *insn++ = BPF_MOV64_REG(BPF_REG_ARG1, BPF_REG_CTX);
+ *insn++ = BPF_MOV64_REG(BPF_REG_ARG2, BPF_REG_D);
+ *insn++ = BPF_MOV64_REG(BPF_REG_ARG3, BPF_REG_H);
+ if (!indirect) {
+ *insn++ = BPF_MOV64_IMM(BPF_REG_ARG4, offset);
+ } else {
+ *insn++ = BPF_MOV64_REG(BPF_REG_ARG4, BPF_REG_X);
+ if (fp->k)
+ *insn++ = BPF_ALU64_IMM(BPF_ADD, BPF_REG_ARG4, offset);
+ }
+
+ switch (BPF_SIZE(fp->code)) {
+ case BPF_B:
+ *insn++ = BPF_EMIT_CALL(bpf_skb_load_helper_8);
+ break;
+ case BPF_H:
+ *insn++ = BPF_EMIT_CALL(bpf_skb_load_helper_16);
+ break;
+ case BPF_W:
+ *insn++ = BPF_EMIT_CALL(bpf_skb_load_helper_32);
+ break;
+ default:
+ return false;
+ }
+
+ *insn++ = BPF_JMP_IMM(BPF_JSGE, BPF_REG_A, 0, 2);
+ *insn++ = BPF_ALU32_REG(BPF_XOR, BPF_REG_A, BPF_REG_A);
+ *insn = BPF_EXIT_INSN();
+
+ *insnp = insn;
+ return true;
+}
+
/**
* bpf_convert_filter - convert filter program
* @prog: the user passed filter program
* @len: the length of the user passed filter program
* @new_prog: allocated 'struct bpf_prog' or NULL
* @new_len: pointer to store length of converted program
+ * @seen_ld_abs: bool whether we've seen ld_abs/ind
*
* Remap 'sock_filter' style classic BPF (cBPF) instruction set to 'bpf_insn'
* style extended BPF (eBPF).
* Conversion workflow:
*
* 1) First pass for calculating the new program length:
- * bpf_convert_filter(old_prog, old_len, NULL, &new_len)
+ * bpf_convert_filter(old_prog, old_len, NULL, &new_len, &seen_ld_abs)
*
* 2) 2nd pass to remap in two passes: 1st pass finds new
* jump offsets, 2nd pass remapping:
- * bpf_convert_filter(old_prog, old_len, new_prog, &new_len);
+ * bpf_convert_filter(old_prog, old_len, new_prog, &new_len, &seen_ld_abs)
*/
static int bpf_convert_filter(struct sock_filter *prog, int len,
- struct bpf_prog *new_prog, int *new_len)
+ struct bpf_prog *new_prog, int *new_len,
+ bool *seen_ld_abs)
{
int new_flen = 0, pass = 0, target, i, stack_off;
struct bpf_insn *new_insn, *first_insn = NULL;
@@ -410,12 +562,27 @@
* do this ourself. Initial CTX is present in BPF_REG_ARG1.
*/
*new_insn++ = BPF_MOV64_REG(BPF_REG_CTX, BPF_REG_ARG1);
+ if (*seen_ld_abs) {
+ /* For packet access in classic BPF, cache skb->data
+ * in callee-saved BPF R8 and skb->len - skb->data_len
+ * (headlen) in BPF R9. Since classic BPF is read-only
+ * on CTX, we only need to cache it once.
+ */
+ *new_insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(struct sk_buff, data),
+ BPF_REG_D, BPF_REG_CTX,
+ offsetof(struct sk_buff, data));
+ *new_insn++ = BPF_LDX_MEM(BPF_W, BPF_REG_H, BPF_REG_CTX,
+ offsetof(struct sk_buff, len));
+ *new_insn++ = BPF_LDX_MEM(BPF_W, BPF_REG_TMP, BPF_REG_CTX,
+ offsetof(struct sk_buff, data_len));
+ *new_insn++ = BPF_ALU32_REG(BPF_SUB, BPF_REG_H, BPF_REG_TMP);
+ }
} else {
new_insn += 3;
}
for (i = 0; i < len; fp++, i++) {
- struct bpf_insn tmp_insns[6] = { };
+ struct bpf_insn tmp_insns[32] = { };
struct bpf_insn *insn = tmp_insns;
if (addrs)
@@ -458,6 +625,11 @@
BPF_MODE(fp->code) == BPF_ABS &&
convert_bpf_extensions(fp, &insn))
break;
+ if (BPF_CLASS(fp->code) == BPF_LD &&
+ convert_bpf_ld_abs(fp, &insn)) {
+ *seen_ld_abs = true;
+ break;
+ }
if (fp->code == (BPF_ALU | BPF_DIV | BPF_X) ||
fp->code == (BPF_ALU | BPF_MOD | BPF_X)) {
@@ -567,21 +739,31 @@
break;
/* ldxb 4 * ([14] & 0xf) is remaped into 6 insns. */
- case BPF_LDX | BPF_MSH | BPF_B:
- /* tmp = A */
- *insn++ = BPF_MOV64_REG(BPF_REG_TMP, BPF_REG_A);
+ case BPF_LDX | BPF_MSH | BPF_B: {
+ struct sock_filter tmp = {
+ .code = BPF_LD | BPF_ABS | BPF_B,
+ .k = fp->k,
+ };
+
+ *seen_ld_abs = true;
+
+ /* X = A */
+ *insn++ = BPF_MOV64_REG(BPF_REG_X, BPF_REG_A);
/* A = BPF_R0 = *(u8 *) (skb->data + K) */
- *insn++ = BPF_LD_ABS(BPF_B, fp->k);
+ convert_bpf_ld_abs(&tmp, &insn);
+ insn++;
/* A &= 0xf */
*insn++ = BPF_ALU32_IMM(BPF_AND, BPF_REG_A, 0xf);
/* A <<= 2 */
*insn++ = BPF_ALU32_IMM(BPF_LSH, BPF_REG_A, 2);
+ /* tmp = X */
+ *insn++ = BPF_MOV64_REG(BPF_REG_TMP, BPF_REG_X);
/* X = A */
*insn++ = BPF_MOV64_REG(BPF_REG_X, BPF_REG_A);
/* A = tmp */
*insn = BPF_MOV64_REG(BPF_REG_A, BPF_REG_TMP);
break;
-
+ }
/* RET_K is remaped into 2 insns. RET_A case doesn't need an
* extra mov as BPF_REG_0 is already mapped into BPF_REG_A.
*/
@@ -663,6 +845,8 @@
if (!new_prog) {
/* Only calculating new length. */
*new_len = new_insn - first_insn;
+ if (*seen_ld_abs)
+ *new_len += 4; /* Prologue bits. */
return 0;
}
@@ -1024,6 +1208,7 @@
struct sock_filter *old_prog;
struct bpf_prog *old_fp;
int err, new_len, old_len = fp->len;
+ bool seen_ld_abs = false;
/* We are free to overwrite insns et al right here as it
* won't be used at this point in time anymore internally
@@ -1045,7 +1230,8 @@
}
/* 1st pass: calculate the new program length. */
- err = bpf_convert_filter(old_prog, old_len, NULL, &new_len);
+ err = bpf_convert_filter(old_prog, old_len, NULL, &new_len,
+ &seen_ld_abs);
if (err)
goto out_err_free;
@@ -1064,7 +1250,8 @@
fp->len = new_len;
/* 2nd pass: remap sock_filter insns into bpf_insn insns. */
- err = bpf_convert_filter(old_prog, old_len, fp, &new_len);
+ err = bpf_convert_filter(old_prog, old_len, fp, &new_len,
+ &seen_ld_abs);
if (err)
/* 2nd bpf_convert_filter() can fail only if it fails
* to allocate memory, remapping must succeed. Note,
@@ -1512,6 +1699,47 @@
.arg4_type = ARG_CONST_SIZE,
};
+BPF_CALL_5(bpf_skb_load_bytes_relative, const struct sk_buff *, skb,
+ u32, offset, void *, to, u32, len, u32, start_header)
+{
+ u8 *ptr;
+
+ if (unlikely(offset > 0xffff || len > skb_headlen(skb)))
+ goto err_clear;
+
+ switch (start_header) {
+ case BPF_HDR_START_MAC:
+ ptr = skb_mac_header(skb) + offset;
+ break;
+ case BPF_HDR_START_NET:
+ ptr = skb_network_header(skb) + offset;
+ break;
+ default:
+ goto err_clear;
+ }
+
+ if (likely(ptr >= skb_mac_header(skb) &&
+ ptr + len <= skb_tail_pointer(skb))) {
+ memcpy(to, ptr, len);
+ return 0;
+ }
+
+err_clear:
+ memset(to, 0, len);
+ return -EFAULT;
+}
+
+static const struct bpf_func_proto bpf_skb_load_bytes_relative_proto = {
+ .func = bpf_skb_load_bytes_relative,
+ .gpl_only = false,
+ .ret_type = RET_INTEGER,
+ .arg1_type = ARG_PTR_TO_CTX,
+ .arg2_type = ARG_ANYTHING,
+ .arg3_type = ARG_PTR_TO_UNINIT_MEM,
+ .arg4_type = ARG_CONST_SIZE,
+ .arg5_type = ARG_ANYTHING,
+};
+
BPF_CALL_2(bpf_skb_pull_data, struct sk_buff *, skb, u32, len)
{
/* Idea is the following: should the needed direct read/write
@@ -1857,6 +2085,33 @@
.arg2_type = ARG_ANYTHING,
};
+BPF_CALL_4(bpf_sk_redirect_hash, struct sk_buff *, skb,
+ struct bpf_map *, map, void *, key, u64, flags)
+{
+ struct tcp_skb_cb *tcb = TCP_SKB_CB(skb);
+
+ /* If user passes invalid input drop the packet. */
+ if (unlikely(flags & ~(BPF_F_INGRESS)))
+ return SK_DROP;
+
+ tcb->bpf.flags = flags;
+ tcb->bpf.sk_redir = __sock_hash_lookup_elem(map, key);
+ if (!tcb->bpf.sk_redir)
+ return SK_DROP;
+
+ return SK_PASS;
+}
+
+static const struct bpf_func_proto bpf_sk_redirect_hash_proto = {
+ .func = bpf_sk_redirect_hash,
+ .gpl_only = false,
+ .ret_type = RET_INTEGER,
+ .arg1_type = ARG_PTR_TO_CTX,
+ .arg2_type = ARG_CONST_MAP_PTR,
+ .arg3_type = ARG_PTR_TO_MAP_KEY,
+ .arg4_type = ARG_ANYTHING,
+};
+
BPF_CALL_4(bpf_sk_redirect_map, struct sk_buff *, skb,
struct bpf_map *, map, u32, key, u64, flags)
{
@@ -1866,9 +2121,10 @@
if (unlikely(flags & ~(BPF_F_INGRESS)))
return SK_DROP;
- tcb->bpf.key = key;
tcb->bpf.flags = flags;
- tcb->bpf.map = map;
+ tcb->bpf.sk_redir = __sock_map_lookup_elem(map, key);
+ if (!tcb->bpf.sk_redir)
+ return SK_DROP;
return SK_PASS;
}
@@ -1876,16 +2132,8 @@
struct sock *do_sk_redirect_map(struct sk_buff *skb)
{
struct tcp_skb_cb *tcb = TCP_SKB_CB(skb);
- struct sock *sk = NULL;
- if (tcb->bpf.map) {
- sk = __sock_map_lookup_elem(tcb->bpf.map, tcb->bpf.key);
-
- tcb->bpf.key = 0;
- tcb->bpf.map = NULL;
- }
-
- return sk;
+ return tcb->bpf.sk_redir;
}
static const struct bpf_func_proto bpf_sk_redirect_map_proto = {
@@ -1898,6 +2146,31 @@
.arg4_type = ARG_ANYTHING,
};
+BPF_CALL_4(bpf_msg_redirect_hash, struct sk_msg_buff *, msg,
+ struct bpf_map *, map, void *, key, u64, flags)
+{
+ /* If user passes invalid input drop the packet. */
+ if (unlikely(flags & ~(BPF_F_INGRESS)))
+ return SK_DROP;
+
+ msg->flags = flags;
+ msg->sk_redir = __sock_hash_lookup_elem(map, key);
+ if (!msg->sk_redir)
+ return SK_DROP;
+
+ return SK_PASS;
+}
+
+static const struct bpf_func_proto bpf_msg_redirect_hash_proto = {
+ .func = bpf_msg_redirect_hash,
+ .gpl_only = false,
+ .ret_type = RET_INTEGER,
+ .arg1_type = ARG_PTR_TO_CTX,
+ .arg2_type = ARG_CONST_MAP_PTR,
+ .arg3_type = ARG_PTR_TO_MAP_KEY,
+ .arg4_type = ARG_ANYTHING,
+};
+
BPF_CALL_4(bpf_msg_redirect_map, struct sk_msg_buff *, msg,
struct bpf_map *, map, u32, key, u64, flags)
{
@@ -1905,25 +2178,17 @@
if (unlikely(flags & ~(BPF_F_INGRESS)))
return SK_DROP;
- msg->key = key;
msg->flags = flags;
- msg->map = map;
+ msg->sk_redir = __sock_map_lookup_elem(map, key);
+ if (!msg->sk_redir)
+ return SK_DROP;
return SK_PASS;
}
struct sock *do_msg_redirect_map(struct sk_msg_buff *msg)
{
- struct sock *sk = NULL;
-
- if (msg->map) {
- sk = __sock_map_lookup_elem(msg->map, msg->key);
-
- msg->key = 0;
- msg->map = NULL;
- }
-
- return sk;
+ return msg->sk_redir;
}
static const struct bpf_func_proto bpf_msg_redirect_map_proto = {
@@ -2186,7 +2451,7 @@
return ret;
}
-const struct bpf_func_proto bpf_skb_vlan_push_proto = {
+static const struct bpf_func_proto bpf_skb_vlan_push_proto = {
.func = bpf_skb_vlan_push,
.gpl_only = false,
.ret_type = RET_INTEGER,
@@ -2194,7 +2459,6 @@
.arg2_type = ARG_ANYTHING,
.arg3_type = ARG_ANYTHING,
};
-EXPORT_SYMBOL_GPL(bpf_skb_vlan_push_proto);
BPF_CALL_1(bpf_skb_vlan_pop, struct sk_buff *, skb)
{
@@ -2208,13 +2472,12 @@
return ret;
}
-const struct bpf_func_proto bpf_skb_vlan_pop_proto = {
+static const struct bpf_func_proto bpf_skb_vlan_pop_proto = {
.func = bpf_skb_vlan_pop,
.gpl_only = false,
.ret_type = RET_INTEGER,
.arg1_type = ARG_PTR_TO_CTX,
};
-EXPORT_SYMBOL_GPL(bpf_skb_vlan_pop_proto);
static int bpf_skb_generic_push(struct sk_buff *skb, u32 off, u32 len)
{
@@ -2699,8 +2962,9 @@
BPF_CALL_2(bpf_xdp_adjust_head, struct xdp_buff *, xdp, int, offset)
{
+ void *xdp_frame_end = xdp->data_hard_start + sizeof(struct xdp_frame);
unsigned long metalen = xdp_get_metalen(xdp);
- void *data_start = xdp->data_hard_start + metalen;
+ void *data_start = xdp_frame_end + metalen;
void *data = xdp->data + offset;
if (unlikely(data < data_start ||
@@ -2724,14 +2988,39 @@
.arg2_type = ARG_ANYTHING,
};
+BPF_CALL_2(bpf_xdp_adjust_tail, struct xdp_buff *, xdp, int, offset)
+{
+ void *data_end = xdp->data_end + offset;
+
+ /* only shrinking is allowed for now. */
+ if (unlikely(offset >= 0))
+ return -EINVAL;
+
+ if (unlikely(data_end < xdp->data + ETH_HLEN))
+ return -EINVAL;
+
+ xdp->data_end = data_end;
+
+ return 0;
+}
+
+static const struct bpf_func_proto bpf_xdp_adjust_tail_proto = {
+ .func = bpf_xdp_adjust_tail,
+ .gpl_only = false,
+ .ret_type = RET_INTEGER,
+ .arg1_type = ARG_PTR_TO_CTX,
+ .arg2_type = ARG_ANYTHING,
+};
+
BPF_CALL_2(bpf_xdp_adjust_meta, struct xdp_buff *, xdp, int, offset)
{
+ void *xdp_frame_end = xdp->data_hard_start + sizeof(struct xdp_frame);
void *meta = xdp->data_meta + offset;
unsigned long metalen = xdp->data - meta;
if (xdp_data_meta_unsupported(xdp))
return -ENOTSUPP;
- if (unlikely(meta < xdp->data_hard_start ||
+ if (unlikely(meta < xdp_frame_end ||
meta > xdp->data))
return -EINVAL;
if (unlikely((metalen & (sizeof(__u32) - 1)) ||
@@ -2756,15 +3045,20 @@
struct xdp_buff *xdp,
u32 index)
{
- int err;
+ struct xdp_frame *xdpf;
+ int sent;
if (!dev->netdev_ops->ndo_xdp_xmit) {
return -EOPNOTSUPP;
}
- err = dev->netdev_ops->ndo_xdp_xmit(dev, xdp);
- if (err)
- return err;
+ xdpf = convert_to_xdp_frame(xdp);
+ if (unlikely(!xdpf))
+ return -EOVERFLOW;
+
+ sent = dev->netdev_ops->ndo_xdp_xmit(dev, 1, &xdpf);
+ if (sent <= 0)
+ return sent;
dev->netdev_ops->ndo_xdp_flush(dev);
return 0;
}
@@ -2776,24 +3070,33 @@
{
int err;
- if (map->map_type == BPF_MAP_TYPE_DEVMAP) {
- struct net_device *dev = fwd;
+ switch (map->map_type) {
+ case BPF_MAP_TYPE_DEVMAP: {
+ struct bpf_dtab_netdev *dst = fwd;
- if (!dev->netdev_ops->ndo_xdp_xmit)
- return -EOPNOTSUPP;
-
- err = dev->netdev_ops->ndo_xdp_xmit(dev, xdp);
+ err = dev_map_enqueue(dst, xdp, dev_rx);
if (err)
return err;
__dev_map_insert_ctx(map, index);
-
- } else if (map->map_type == BPF_MAP_TYPE_CPUMAP) {
+ break;
+ }
+ case BPF_MAP_TYPE_CPUMAP: {
struct bpf_cpu_map_entry *rcpu = fwd;
err = cpu_map_enqueue(rcpu, xdp, dev_rx);
if (err)
return err;
__cpu_map_insert_ctx(map, index);
+ break;
+ }
+ case BPF_MAP_TYPE_XSKMAP: {
+ struct xdp_sock *xs = fwd;
+
+ err = __xsk_map_redirect(map, xdp, xs);
+ return err;
+ }
+ default:
+ break;
}
return 0;
}
@@ -2812,6 +3115,9 @@
case BPF_MAP_TYPE_CPUMAP:
__cpu_map_flush(map);
break;
+ case BPF_MAP_TYPE_XSKMAP:
+ __xsk_map_flush(map);
+ break;
default:
break;
}
@@ -2826,6 +3132,8 @@
return __dev_map_lookup_elem(map, index);
case BPF_MAP_TYPE_CPUMAP:
return __cpu_map_lookup_elem(map, index);
+ case BPF_MAP_TYPE_XSKMAP:
+ return __xsk_map_lookup_elem(map, index);
default:
return NULL;
}
@@ -2923,13 +3231,14 @@
static int xdp_do_generic_redirect_map(struct net_device *dev,
struct sk_buff *skb,
+ struct xdp_buff *xdp,
struct bpf_prog *xdp_prog)
{
struct redirect_info *ri = this_cpu_ptr(&redirect_info);
unsigned long map_owner = ri->map_owner;
struct bpf_map *map = ri->map;
- struct net_device *fwd = NULL;
u32 index = ri->ifindex;
+ void *fwd = NULL;
int err = 0;
ri->ifindex = 0;
@@ -2951,6 +3260,14 @@
if (unlikely((err = __xdp_generic_ok_fwd_dev(skb, fwd))))
goto err;
skb->dev = fwd;
+ generic_xdp_tx(skb, xdp_prog);
+ } else if (map->map_type == BPF_MAP_TYPE_XSKMAP) {
+ struct xdp_sock *xs = fwd;
+
+ err = xsk_generic_rcv(xs, xdp);
+ if (err)
+ goto err;
+ consume_skb(skb);
} else {
/* TODO: Handle BPF_MAP_TYPE_CPUMAP */
err = -EBADRQC;
@@ -2965,7 +3282,7 @@
}
int xdp_do_generic_redirect(struct net_device *dev, struct sk_buff *skb,
- struct bpf_prog *xdp_prog)
+ struct xdp_buff *xdp, struct bpf_prog *xdp_prog)
{
struct redirect_info *ri = this_cpu_ptr(&redirect_info);
u32 index = ri->ifindex;
@@ -2973,7 +3290,7 @@
int err = 0;
if (ri->map)
- return xdp_do_generic_redirect_map(dev, skb, xdp_prog);
+ return xdp_do_generic_redirect_map(dev, skb, xdp, xdp_prog);
ri->ifindex = 0;
fwd = dev_get_by_index_rcu(dev_net(dev), index);
@@ -2987,6 +3304,7 @@
skb->dev = fwd;
_trace_xdp_redirect(dev, xdp_prog, index);
+ generic_xdp_tx(skb, xdp_prog);
return 0;
err:
_trace_xdp_redirect_err(dev, xdp_prog, index, err);
@@ -3045,27 +3363,6 @@
.arg3_type = ARG_ANYTHING,
};
-bool bpf_helper_changes_pkt_data(void *func)
-{
- if (func == bpf_skb_vlan_push ||
- func == bpf_skb_vlan_pop ||
- func == bpf_skb_store_bytes ||
- func == bpf_skb_change_proto ||
- func == bpf_skb_change_head ||
- func == bpf_skb_change_tail ||
- func == bpf_skb_adjust_room ||
- func == bpf_skb_pull_data ||
- func == bpf_clone_redirect ||
- func == bpf_l3_csum_replace ||
- func == bpf_l4_csum_replace ||
- func == bpf_xdp_adjust_head ||
- func == bpf_xdp_adjust_meta ||
- func == bpf_msg_pull_data)
- return true;
-
- return false;
-}
-
static unsigned long bpf_skb_copy(void *dst_buff, const void *skb,
unsigned long off, unsigned long len)
{
@@ -3711,6 +4008,594 @@
.arg3_type = ARG_CONST_SIZE,
};
+#ifdef CONFIG_XFRM
+BPF_CALL_5(bpf_skb_get_xfrm_state, struct sk_buff *, skb, u32, index,
+ struct bpf_xfrm_state *, to, u32, size, u64, flags)
+{
+ const struct sec_path *sp = skb_sec_path(skb);
+ const struct xfrm_state *x;
+
+ if (!sp || unlikely(index >= sp->len || flags))
+ goto err_clear;
+
+ x = sp->xvec[index];
+
+ if (unlikely(size != sizeof(struct bpf_xfrm_state)))
+ goto err_clear;
+
+ to->reqid = x->props.reqid;
+ to->spi = x->id.spi;
+ to->family = x->props.family;
+ if (to->family == AF_INET6) {
+ memcpy(to->remote_ipv6, x->props.saddr.a6,
+ sizeof(to->remote_ipv6));
+ } else {
+ to->remote_ipv4 = x->props.saddr.a4;
+ }
+
+ return 0;
+err_clear:
+ memset(to, 0, size);
+ return -EINVAL;
+}
+
+static const struct bpf_func_proto bpf_skb_get_xfrm_state_proto = {
+ .func = bpf_skb_get_xfrm_state,
+ .gpl_only = false,
+ .ret_type = RET_INTEGER,
+ .arg1_type = ARG_PTR_TO_CTX,
+ .arg2_type = ARG_ANYTHING,
+ .arg3_type = ARG_PTR_TO_UNINIT_MEM,
+ .arg4_type = ARG_CONST_SIZE,
+ .arg5_type = ARG_ANYTHING,
+};
+#endif
+
+#if IS_ENABLED(CONFIG_INET) || IS_ENABLED(CONFIG_IPV6)
+static int bpf_fib_set_fwd_params(struct bpf_fib_lookup *params,
+ const struct neighbour *neigh,
+ const struct net_device *dev)
+{
+ memcpy(params->dmac, neigh->ha, ETH_ALEN);
+ memcpy(params->smac, dev->dev_addr, ETH_ALEN);
+ params->h_vlan_TCI = 0;
+ params->h_vlan_proto = 0;
+
+ return dev->ifindex;
+}
+#endif
+
+#if IS_ENABLED(CONFIG_INET)
+static int bpf_ipv4_fib_lookup(struct net *net, struct bpf_fib_lookup *params,
+ u32 flags, bool check_mtu)
+{
+ struct in_device *in_dev;
+ struct neighbour *neigh;
+ struct net_device *dev;
+ struct fib_result res;
+ struct fib_nh *nh;
+ struct flowi4 fl4;
+ int err;
+ u32 mtu;
+
+ dev = dev_get_by_index_rcu(net, params->ifindex);
+ if (unlikely(!dev))
+ return -ENODEV;
+
+ /* verify forwarding is enabled on this interface */
+ in_dev = __in_dev_get_rcu(dev);
+ if (unlikely(!in_dev || !IN_DEV_FORWARD(in_dev)))
+ return 0;
+
+ if (flags & BPF_FIB_LOOKUP_OUTPUT) {
+ fl4.flowi4_iif = 1;
+ fl4.flowi4_oif = params->ifindex;
+ } else {
+ fl4.flowi4_iif = params->ifindex;
+ fl4.flowi4_oif = 0;
+ }
+ fl4.flowi4_tos = params->tos & IPTOS_RT_MASK;
+ fl4.flowi4_scope = RT_SCOPE_UNIVERSE;
+ fl4.flowi4_flags = 0;
+
+ fl4.flowi4_proto = params->l4_protocol;
+ fl4.daddr = params->ipv4_dst;
+ fl4.saddr = params->ipv4_src;
+ fl4.fl4_sport = params->sport;
+ fl4.fl4_dport = params->dport;
+
+ if (flags & BPF_FIB_LOOKUP_DIRECT) {
+ u32 tbid = l3mdev_fib_table_rcu(dev) ? : RT_TABLE_MAIN;
+ struct fib_table *tb;
+
+ tb = fib_get_table(net, tbid);
+ if (unlikely(!tb))
+ return 0;
+
+ err = fib_table_lookup(tb, &fl4, &res, FIB_LOOKUP_NOREF);
+ } else {
+ fl4.flowi4_mark = 0;
+ fl4.flowi4_secid = 0;
+ fl4.flowi4_tun_key.tun_id = 0;
+ fl4.flowi4_uid = sock_net_uid(net, NULL);
+
+ err = fib_lookup(net, &fl4, &res, FIB_LOOKUP_NOREF);
+ }
+
+ if (err || res.type != RTN_UNICAST)
+ return 0;
+
+ if (res.fi->fib_nhs > 1)
+ fib_select_path(net, &res, &fl4, NULL);
+
+ if (check_mtu) {
+ mtu = ip_mtu_from_fib_result(&res, params->ipv4_dst);
+ if (params->tot_len > mtu)
+ return 0;
+ }
+
+ nh = &res.fi->fib_nh[res.nh_sel];
+
+ /* do not handle lwt encaps right now */
+ if (nh->nh_lwtstate)
+ return 0;
+
+ dev = nh->nh_dev;
+ if (unlikely(!dev))
+ return 0;
+
+ if (nh->nh_gw)
+ params->ipv4_dst = nh->nh_gw;
+
+ params->rt_metric = res.fi->fib_priority;
+
+ /* xdp and cls_bpf programs are run in RCU-bh so
+ * rcu_read_lock_bh is not needed here
+ */
+ neigh = __ipv4_neigh_lookup_noref(dev, (__force u32)params->ipv4_dst);
+ if (neigh)
+ return bpf_fib_set_fwd_params(params, neigh, dev);
+
+ return 0;
+}
+#endif
+
+#if IS_ENABLED(CONFIG_IPV6)
+static int bpf_ipv6_fib_lookup(struct net *net, struct bpf_fib_lookup *params,
+ u32 flags, bool check_mtu)
+{
+ struct in6_addr *src = (struct in6_addr *) params->ipv6_src;
+ struct in6_addr *dst = (struct in6_addr *) params->ipv6_dst;
+ struct neighbour *neigh;
+ struct net_device *dev;
+ struct inet6_dev *idev;
+ struct fib6_info *f6i;
+ struct flowi6 fl6;
+ int strict = 0;
+ int oif;
+ u32 mtu;
+
+ /* link local addresses are never forwarded */
+ if (rt6_need_strict(dst) || rt6_need_strict(src))
+ return 0;
+
+ dev = dev_get_by_index_rcu(net, params->ifindex);
+ if (unlikely(!dev))
+ return -ENODEV;
+
+ idev = __in6_dev_get_safely(dev);
+ if (unlikely(!idev || !net->ipv6.devconf_all->forwarding))
+ return 0;
+
+ if (flags & BPF_FIB_LOOKUP_OUTPUT) {
+ fl6.flowi6_iif = 1;
+ oif = fl6.flowi6_oif = params->ifindex;
+ } else {
+ oif = fl6.flowi6_iif = params->ifindex;
+ fl6.flowi6_oif = 0;
+ strict = RT6_LOOKUP_F_HAS_SADDR;
+ }
+ fl6.flowlabel = params->flowlabel;
+ fl6.flowi6_scope = 0;
+ fl6.flowi6_flags = 0;
+ fl6.mp_hash = 0;
+
+ fl6.flowi6_proto = params->l4_protocol;
+ fl6.daddr = *dst;
+ fl6.saddr = *src;
+ fl6.fl6_sport = params->sport;
+ fl6.fl6_dport = params->dport;
+
+ if (flags & BPF_FIB_LOOKUP_DIRECT) {
+ u32 tbid = l3mdev_fib_table_rcu(dev) ? : RT_TABLE_MAIN;
+ struct fib6_table *tb;
+
+ tb = ipv6_stub->fib6_get_table(net, tbid);
+ if (unlikely(!tb))
+ return 0;
+
+ f6i = ipv6_stub->fib6_table_lookup(net, tb, oif, &fl6, strict);
+ } else {
+ fl6.flowi6_mark = 0;
+ fl6.flowi6_secid = 0;
+ fl6.flowi6_tun_key.tun_id = 0;
+ fl6.flowi6_uid = sock_net_uid(net, NULL);
+
+ f6i = ipv6_stub->fib6_lookup(net, oif, &fl6, strict);
+ }
+
+ if (unlikely(IS_ERR_OR_NULL(f6i) || f6i == net->ipv6.fib6_null_entry))
+ return 0;
+
+ if (unlikely(f6i->fib6_flags & RTF_REJECT ||
+ f6i->fib6_type != RTN_UNICAST))
+ return 0;
+
+ if (f6i->fib6_nsiblings && fl6.flowi6_oif == 0)
+ f6i = ipv6_stub->fib6_multipath_select(net, f6i, &fl6,
+ fl6.flowi6_oif, NULL,
+ strict);
+
+ if (check_mtu) {
+ mtu = ipv6_stub->ip6_mtu_from_fib6(f6i, dst, src);
+ if (params->tot_len > mtu)
+ return 0;
+ }
+
+ if (f6i->fib6_nh.nh_lwtstate)
+ return 0;
+
+ if (f6i->fib6_flags & RTF_GATEWAY)
+ *dst = f6i->fib6_nh.nh_gw;
+
+ dev = f6i->fib6_nh.nh_dev;
+ params->rt_metric = f6i->fib6_metric;
+
+ /* xdp and cls_bpf programs are run in RCU-bh so rcu_read_lock_bh is
+ * not needed here. Can not use __ipv6_neigh_lookup_noref here
+ * because we need to get nd_tbl via the stub
+ */
+ neigh = ___neigh_lookup_noref(ipv6_stub->nd_tbl, neigh_key_eq128,
+ ndisc_hashfn, dst, dev);
+ if (neigh)
+ return bpf_fib_set_fwd_params(params, neigh, dev);
+
+ return 0;
+}
+#endif
+
+BPF_CALL_4(bpf_xdp_fib_lookup, struct xdp_buff *, ctx,
+ struct bpf_fib_lookup *, params, int, plen, u32, flags)
+{
+ if (plen < sizeof(*params))
+ return -EINVAL;
+
+ switch (params->family) {
+#if IS_ENABLED(CONFIG_INET)
+ case AF_INET:
+ return bpf_ipv4_fib_lookup(dev_net(ctx->rxq->dev), params,
+ flags, true);
+#endif
+#if IS_ENABLED(CONFIG_IPV6)
+ case AF_INET6:
+ return bpf_ipv6_fib_lookup(dev_net(ctx->rxq->dev), params,
+ flags, true);
+#endif
+ }
+ return 0;
+}
+
+static const struct bpf_func_proto bpf_xdp_fib_lookup_proto = {
+ .func = bpf_xdp_fib_lookup,
+ .gpl_only = true,
+ .ret_type = RET_INTEGER,
+ .arg1_type = ARG_PTR_TO_CTX,
+ .arg2_type = ARG_PTR_TO_MEM,
+ .arg3_type = ARG_CONST_SIZE,
+ .arg4_type = ARG_ANYTHING,
+};
+
+BPF_CALL_4(bpf_skb_fib_lookup, struct sk_buff *, skb,
+ struct bpf_fib_lookup *, params, int, plen, u32, flags)
+{
+ struct net *net = dev_net(skb->dev);
+ int index = 0;
+
+ if (plen < sizeof(*params))
+ return -EINVAL;
+
+ switch (params->family) {
+#if IS_ENABLED(CONFIG_INET)
+ case AF_INET:
+ index = bpf_ipv4_fib_lookup(net, params, flags, false);
+ break;
+#endif
+#if IS_ENABLED(CONFIG_IPV6)
+ case AF_INET6:
+ index = bpf_ipv6_fib_lookup(net, params, flags, false);
+ break;
+#endif
+ }
+
+ if (index > 0) {
+ struct net_device *dev;
+
+ dev = dev_get_by_index_rcu(net, index);
+ if (!is_skb_forwardable(dev, skb))
+ index = 0;
+ }
+
+ return index;
+}
+
+static const struct bpf_func_proto bpf_skb_fib_lookup_proto = {
+ .func = bpf_skb_fib_lookup,
+ .gpl_only = true,
+ .ret_type = RET_INTEGER,
+ .arg1_type = ARG_PTR_TO_CTX,
+ .arg2_type = ARG_PTR_TO_MEM,
+ .arg3_type = ARG_CONST_SIZE,
+ .arg4_type = ARG_ANYTHING,
+};
+
+#if IS_ENABLED(CONFIG_IPV6_SEG6_BPF)
+static int bpf_push_seg6_encap(struct sk_buff *skb, u32 type, void *hdr, u32 len)
+{
+ int err;
+ struct ipv6_sr_hdr *srh = (struct ipv6_sr_hdr *)hdr;
+
+ if (!seg6_validate_srh(srh, len))
+ return -EINVAL;
+
+ switch (type) {
+ case BPF_LWT_ENCAP_SEG6_INLINE:
+ if (skb->protocol != htons(ETH_P_IPV6))
+ return -EBADMSG;
+
+ err = seg6_do_srh_inline(skb, srh);
+ break;
+ case BPF_LWT_ENCAP_SEG6:
+ skb_reset_inner_headers(skb);
+ skb->encapsulation = 1;
+ err = seg6_do_srh_encap(skb, srh, IPPROTO_IPV6);
+ break;
+ default:
+ return -EINVAL;
+ }
+
+ bpf_compute_data_pointers(skb);
+ if (err)
+ return err;
+
+ ipv6_hdr(skb)->payload_len = htons(skb->len - sizeof(struct ipv6hdr));
+ skb_set_transport_header(skb, sizeof(struct ipv6hdr));
+
+ return seg6_lookup_nexthop(skb, NULL, 0);
+}
+#endif /* CONFIG_IPV6_SEG6_BPF */
+
+BPF_CALL_4(bpf_lwt_push_encap, struct sk_buff *, skb, u32, type, void *, hdr,
+ u32, len)
+{
+ switch (type) {
+#if IS_ENABLED(CONFIG_IPV6_SEG6_BPF)
+ case BPF_LWT_ENCAP_SEG6:
+ case BPF_LWT_ENCAP_SEG6_INLINE:
+ return bpf_push_seg6_encap(skb, type, hdr, len);
+#endif
+ default:
+ return -EINVAL;
+ }
+}
+
+static const struct bpf_func_proto bpf_lwt_push_encap_proto = {
+ .func = bpf_lwt_push_encap,
+ .gpl_only = false,
+ .ret_type = RET_INTEGER,
+ .arg1_type = ARG_PTR_TO_CTX,
+ .arg2_type = ARG_ANYTHING,
+ .arg3_type = ARG_PTR_TO_MEM,
+ .arg4_type = ARG_CONST_SIZE
+};
+
+BPF_CALL_4(bpf_lwt_seg6_store_bytes, struct sk_buff *, skb, u32, offset,
+ const void *, from, u32, len)
+{
+#if IS_ENABLED(CONFIG_IPV6_SEG6_BPF)
+ struct seg6_bpf_srh_state *srh_state =
+ this_cpu_ptr(&seg6_bpf_srh_states);
+ void *srh_tlvs, *srh_end, *ptr;
+ struct ipv6_sr_hdr *srh;
+ int srhoff = 0;
+
+ if (ipv6_find_hdr(skb, &srhoff, IPPROTO_ROUTING, NULL, NULL) < 0)
+ return -EINVAL;
+
+ srh = (struct ipv6_sr_hdr *)(skb->data + srhoff);
+ srh_tlvs = (void *)((char *)srh + ((srh->first_segment + 1) << 4));
+ srh_end = (void *)((char *)srh + sizeof(*srh) + srh_state->hdrlen);
+
+ ptr = skb->data + offset;
+ if (ptr >= srh_tlvs && ptr + len <= srh_end)
+ srh_state->valid = 0;
+ else if (ptr < (void *)&srh->flags ||
+ ptr + len > (void *)&srh->segments)
+ return -EFAULT;
+
+ if (unlikely(bpf_try_make_writable(skb, offset + len)))
+ return -EFAULT;
+
+ memcpy(skb->data + offset, from, len);
+ return 0;
+#else /* CONFIG_IPV6_SEG6_BPF */
+ return -EOPNOTSUPP;
+#endif
+}
+
+static const struct bpf_func_proto bpf_lwt_seg6_store_bytes_proto = {
+ .func = bpf_lwt_seg6_store_bytes,
+ .gpl_only = false,
+ .ret_type = RET_INTEGER,
+ .arg1_type = ARG_PTR_TO_CTX,
+ .arg2_type = ARG_ANYTHING,
+ .arg3_type = ARG_PTR_TO_MEM,
+ .arg4_type = ARG_CONST_SIZE
+};
+
+BPF_CALL_4(bpf_lwt_seg6_action, struct sk_buff *, skb,
+ u32, action, void *, param, u32, param_len)
+{
+#if IS_ENABLED(CONFIG_IPV6_SEG6_BPF)
+ struct seg6_bpf_srh_state *srh_state =
+ this_cpu_ptr(&seg6_bpf_srh_states);
+ struct ipv6_sr_hdr *srh;
+ int srhoff = 0;
+ int err;
+
+ if (ipv6_find_hdr(skb, &srhoff, IPPROTO_ROUTING, NULL, NULL) < 0)
+ return -EINVAL;
+ srh = (struct ipv6_sr_hdr *)(skb->data + srhoff);
+
+ if (!srh_state->valid) {
+ if (unlikely((srh_state->hdrlen & 7) != 0))
+ return -EBADMSG;
+
+ srh->hdrlen = (u8)(srh_state->hdrlen >> 3);
+ if (unlikely(!seg6_validate_srh(srh, (srh->hdrlen + 1) << 3)))
+ return -EBADMSG;
+
+ srh_state->valid = 1;
+ }
+
+ switch (action) {
+ case SEG6_LOCAL_ACTION_END_X:
+ if (param_len != sizeof(struct in6_addr))
+ return -EINVAL;
+ return seg6_lookup_nexthop(skb, (struct in6_addr *)param, 0);
+ case SEG6_LOCAL_ACTION_END_T:
+ if (param_len != sizeof(int))
+ return -EINVAL;
+ return seg6_lookup_nexthop(skb, NULL, *(int *)param);
+ case SEG6_LOCAL_ACTION_END_B6:
+ err = bpf_push_seg6_encap(skb, BPF_LWT_ENCAP_SEG6_INLINE,
+ param, param_len);
+ if (!err)
+ srh_state->hdrlen =
+ ((struct ipv6_sr_hdr *)param)->hdrlen << 3;
+ return err;
+ case SEG6_LOCAL_ACTION_END_B6_ENCAP:
+ err = bpf_push_seg6_encap(skb, BPF_LWT_ENCAP_SEG6,
+ param, param_len);
+ if (!err)
+ srh_state->hdrlen =
+ ((struct ipv6_sr_hdr *)param)->hdrlen << 3;
+ return err;
+ default:
+ return -EINVAL;
+ }
+#else /* CONFIG_IPV6_SEG6_BPF */
+ return -EOPNOTSUPP;
+#endif
+}
+
+static const struct bpf_func_proto bpf_lwt_seg6_action_proto = {
+ .func = bpf_lwt_seg6_action,
+ .gpl_only = false,
+ .ret_type = RET_INTEGER,
+ .arg1_type = ARG_PTR_TO_CTX,
+ .arg2_type = ARG_ANYTHING,
+ .arg3_type = ARG_PTR_TO_MEM,
+ .arg4_type = ARG_CONST_SIZE
+};
+
+BPF_CALL_3(bpf_lwt_seg6_adjust_srh, struct sk_buff *, skb, u32, offset,
+ s32, len)
+{
+#if IS_ENABLED(CONFIG_IPV6_SEG6_BPF)
+ struct seg6_bpf_srh_state *srh_state =
+ this_cpu_ptr(&seg6_bpf_srh_states);
+ void *srh_end, *srh_tlvs, *ptr;
+ struct ipv6_sr_hdr *srh;
+ struct ipv6hdr *hdr;
+ int srhoff = 0;
+ int ret;
+
+ if (ipv6_find_hdr(skb, &srhoff, IPPROTO_ROUTING, NULL, NULL) < 0)
+ return -EINVAL;
+ srh = (struct ipv6_sr_hdr *)(skb->data + srhoff);
+
+ srh_tlvs = (void *)((unsigned char *)srh + sizeof(*srh) +
+ ((srh->first_segment + 1) << 4));
+ srh_end = (void *)((unsigned char *)srh + sizeof(*srh) +
+ srh_state->hdrlen);
+ ptr = skb->data + offset;
+
+ if (unlikely(ptr < srh_tlvs || ptr > srh_end))
+ return -EFAULT;
+ if (unlikely(len < 0 && (void *)((char *)ptr - len) > srh_end))
+ return -EFAULT;
+
+ if (len > 0) {
+ ret = skb_cow_head(skb, len);
+ if (unlikely(ret < 0))
+ return ret;
+
+ ret = bpf_skb_net_hdr_push(skb, offset, len);
+ } else {
+ ret = bpf_skb_net_hdr_pop(skb, offset, -1 * len);
+ }
+
+ bpf_compute_data_pointers(skb);
+ if (unlikely(ret < 0))
+ return ret;
+
+ hdr = (struct ipv6hdr *)skb->data;
+ hdr->payload_len = htons(skb->len - sizeof(struct ipv6hdr));
+
+ srh_state->hdrlen += len;
+ srh_state->valid = 0;
+ return 0;
+#else /* CONFIG_IPV6_SEG6_BPF */
+ return -EOPNOTSUPP;
+#endif
+}
+
+static const struct bpf_func_proto bpf_lwt_seg6_adjust_srh_proto = {
+ .func = bpf_lwt_seg6_adjust_srh,
+ .gpl_only = false,
+ .ret_type = RET_INTEGER,
+ .arg1_type = ARG_PTR_TO_CTX,
+ .arg2_type = ARG_ANYTHING,
+ .arg3_type = ARG_ANYTHING,
+};
+
+bool bpf_helper_changes_pkt_data(void *func)
+{
+ if (func == bpf_skb_vlan_push ||
+ func == bpf_skb_vlan_pop ||
+ func == bpf_skb_store_bytes ||
+ func == bpf_skb_change_proto ||
+ func == bpf_skb_change_head ||
+ func == bpf_skb_change_tail ||
+ func == bpf_skb_adjust_room ||
+ func == bpf_skb_pull_data ||
+ func == bpf_clone_redirect ||
+ func == bpf_l3_csum_replace ||
+ func == bpf_l4_csum_replace ||
+ func == bpf_xdp_adjust_head ||
+ func == bpf_xdp_adjust_meta ||
+ func == bpf_msg_pull_data ||
+ func == bpf_xdp_adjust_tail ||
+ func == bpf_lwt_push_encap ||
+ func == bpf_lwt_seg6_store_bytes ||
+ func == bpf_lwt_seg6_adjust_srh ||
+ func == bpf_lwt_seg6_action
+ )
+ return true;
+
+ return false;
+}
+
static const struct bpf_func_proto *
bpf_base_func_proto(enum bpf_func_id func_id)
{
@@ -3781,6 +4666,8 @@
switch (func_id) {
case BPF_FUNC_skb_load_bytes:
return &bpf_skb_load_bytes_proto;
+ case BPF_FUNC_skb_load_bytes_relative:
+ return &bpf_skb_load_bytes_relative_proto;
case BPF_FUNC_get_socket_cookie:
return &bpf_get_socket_cookie_proto;
case BPF_FUNC_get_socket_uid:
@@ -3798,6 +4685,8 @@
return &bpf_skb_store_bytes_proto;
case BPF_FUNC_skb_load_bytes:
return &bpf_skb_load_bytes_proto;
+ case BPF_FUNC_skb_load_bytes_relative:
+ return &bpf_skb_load_bytes_relative_proto;
case BPF_FUNC_skb_pull_data:
return &bpf_skb_pull_data_proto;
case BPF_FUNC_csum_diff:
@@ -3852,6 +4741,12 @@
return &bpf_get_socket_cookie_proto;
case BPF_FUNC_get_socket_uid:
return &bpf_get_socket_uid_proto;
+#ifdef CONFIG_XFRM
+ case BPF_FUNC_skb_get_xfrm_state:
+ return &bpf_skb_get_xfrm_state_proto;
+#endif
+ case BPF_FUNC_fib_lookup:
+ return &bpf_skb_fib_lookup_proto;
default:
return bpf_base_func_proto(func_id);
}
@@ -3875,33 +4770,10 @@
return &bpf_xdp_redirect_proto;
case BPF_FUNC_redirect_map:
return &bpf_xdp_redirect_map_proto;
- default:
- return bpf_base_func_proto(func_id);
- }
-}
-
-static const struct bpf_func_proto *
-lwt_inout_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
-{
- switch (func_id) {
- case BPF_FUNC_skb_load_bytes:
- return &bpf_skb_load_bytes_proto;
- case BPF_FUNC_skb_pull_data:
- return &bpf_skb_pull_data_proto;
- case BPF_FUNC_csum_diff:
- return &bpf_csum_diff_proto;
- case BPF_FUNC_get_cgroup_classid:
- return &bpf_get_cgroup_classid_proto;
- case BPF_FUNC_get_route_realm:
- return &bpf_get_route_realm_proto;
- case BPF_FUNC_get_hash_recalc:
- return &bpf_get_hash_recalc_proto;
- case BPF_FUNC_perf_event_output:
- return &bpf_skb_event_output_proto;
- case BPF_FUNC_get_smp_processor_id:
- return &bpf_get_smp_processor_id_proto;
- case BPF_FUNC_skb_under_cgroup:
- return &bpf_skb_under_cgroup_proto;
+ case BPF_FUNC_xdp_adjust_tail:
+ return &bpf_xdp_adjust_tail_proto;
+ case BPF_FUNC_fib_lookup:
+ return &bpf_xdp_fib_lookup_proto;
default:
return bpf_base_func_proto(func_id);
}
@@ -3919,6 +4791,8 @@
return &bpf_sock_ops_cb_flags_set_proto;
case BPF_FUNC_sock_map_update:
return &bpf_sock_map_update_proto;
+ case BPF_FUNC_sock_hash_update:
+ return &bpf_sock_hash_update_proto;
default:
return bpf_base_func_proto(func_id);
}
@@ -3930,6 +4804,8 @@
switch (func_id) {
case BPF_FUNC_msg_redirect_map:
return &bpf_msg_redirect_map_proto;
+ case BPF_FUNC_msg_redirect_hash:
+ return &bpf_msg_redirect_hash_proto;
case BPF_FUNC_msg_apply_bytes:
return &bpf_msg_apply_bytes_proto;
case BPF_FUNC_msg_cork_bytes:
@@ -3961,12 +4837,52 @@
return &bpf_get_socket_uid_proto;
case BPF_FUNC_sk_redirect_map:
return &bpf_sk_redirect_map_proto;
+ case BPF_FUNC_sk_redirect_hash:
+ return &bpf_sk_redirect_hash_proto;
default:
return bpf_base_func_proto(func_id);
}
}
static const struct bpf_func_proto *
+lwt_out_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
+{
+ switch (func_id) {
+ case BPF_FUNC_skb_load_bytes:
+ return &bpf_skb_load_bytes_proto;
+ case BPF_FUNC_skb_pull_data:
+ return &bpf_skb_pull_data_proto;
+ case BPF_FUNC_csum_diff:
+ return &bpf_csum_diff_proto;
+ case BPF_FUNC_get_cgroup_classid:
+ return &bpf_get_cgroup_classid_proto;
+ case BPF_FUNC_get_route_realm:
+ return &bpf_get_route_realm_proto;
+ case BPF_FUNC_get_hash_recalc:
+ return &bpf_get_hash_recalc_proto;
+ case BPF_FUNC_perf_event_output:
+ return &bpf_skb_event_output_proto;
+ case BPF_FUNC_get_smp_processor_id:
+ return &bpf_get_smp_processor_id_proto;
+ case BPF_FUNC_skb_under_cgroup:
+ return &bpf_skb_under_cgroup_proto;
+ default:
+ return bpf_base_func_proto(func_id);
+ }
+}
+
+static const struct bpf_func_proto *
+lwt_in_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
+{
+ switch (func_id) {
+ case BPF_FUNC_lwt_push_encap:
+ return &bpf_lwt_push_encap_proto;
+ default:
+ return lwt_out_func_proto(func_id, prog);
+ }
+}
+
+static const struct bpf_func_proto *
lwt_xmit_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
{
switch (func_id) {
@@ -3997,7 +4913,22 @@
case BPF_FUNC_set_hash_invalid:
return &bpf_set_hash_invalid_proto;
default:
- return lwt_inout_func_proto(func_id, prog);
+ return lwt_out_func_proto(func_id, prog);
+ }
+}
+
+static const struct bpf_func_proto *
+lwt_seg6local_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
+{
+ switch (func_id) {
+ case BPF_FUNC_lwt_seg6_store_bytes:
+ return &bpf_lwt_seg6_store_bytes_proto;
+ case BPF_FUNC_lwt_seg6_action:
+ return &bpf_lwt_seg6_action_proto;
+ case BPF_FUNC_lwt_seg6_adjust_srh:
+ return &bpf_lwt_seg6_adjust_srh_proto;
+ default:
+ return lwt_out_func_proto(func_id, prog);
}
}
@@ -4105,7 +5036,6 @@
return bpf_skb_is_valid_access(off, size, type, prog, info);
}
-
/* Attach type specific accesses */
static bool __sock_filter_check_attach_type(int off,
enum bpf_access_type access_type,
@@ -4221,6 +5151,41 @@
return insn - insn_buf;
}
+static int bpf_gen_ld_abs(const struct bpf_insn *orig,
+ struct bpf_insn *insn_buf)
+{
+ bool indirect = BPF_MODE(orig->code) == BPF_IND;
+ struct bpf_insn *insn = insn_buf;
+
+ /* We're guaranteed here that CTX is in R6. */
+ *insn++ = BPF_MOV64_REG(BPF_REG_1, BPF_REG_CTX);
+ if (!indirect) {
+ *insn++ = BPF_MOV64_IMM(BPF_REG_2, orig->imm);
+ } else {
+ *insn++ = BPF_MOV64_REG(BPF_REG_2, orig->src_reg);
+ if (orig->imm)
+ *insn++ = BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, orig->imm);
+ }
+
+ switch (BPF_SIZE(orig->code)) {
+ case BPF_B:
+ *insn++ = BPF_EMIT_CALL(bpf_skb_load_helper_8_no_cache);
+ break;
+ case BPF_H:
+ *insn++ = BPF_EMIT_CALL(bpf_skb_load_helper_16_no_cache);
+ break;
+ case BPF_W:
+ *insn++ = BPF_EMIT_CALL(bpf_skb_load_helper_32_no_cache);
+ break;
+ }
+
+ *insn++ = BPF_JMP_IMM(BPF_JSGE, BPF_REG_0, 0, 2);
+ *insn++ = BPF_ALU32_REG(BPF_XOR, BPF_REG_0, BPF_REG_0);
+ *insn++ = BPF_EXIT_INSN();
+
+ return insn - insn_buf;
+}
+
static int tc_cls_act_prologue(struct bpf_insn *insn_buf, bool direct_write,
const struct bpf_prog *prog)
{
@@ -4279,8 +5244,15 @@
const struct bpf_prog *prog,
struct bpf_insn_access_aux *info)
{
- if (type == BPF_WRITE)
+ if (type == BPF_WRITE) {
+ if (bpf_prog_is_dev_bound(prog->aux)) {
+ switch (off) {
+ case offsetof(struct xdp_md, rx_queue_index):
+ return __is_valid_xdp_access(off, size);
+ }
+ }
return false;
+ }
switch (off) {
case offsetof(struct xdp_md, data):
@@ -4465,18 +5437,23 @@
switch (off) {
case offsetof(struct sk_msg_md, data):
info->reg_type = PTR_TO_PACKET;
+ if (size != sizeof(__u64))
+ return false;
break;
case offsetof(struct sk_msg_md, data_end):
info->reg_type = PTR_TO_PACKET_END;
+ if (size != sizeof(__u64))
+ return false;
break;
+ default:
+ if (size != sizeof(__u32))
+ return false;
}
if (off < 0 || off >= sizeof(struct sk_msg_md))
return false;
if (off % size != 0)
return false;
- if (size != sizeof(__u64))
- return false;
return true;
}
@@ -5152,7 +6129,8 @@
break;
case offsetof(struct bpf_sock_ops, local_ip4):
- BUILD_BUG_ON(FIELD_SIZEOF(struct sock_common, skc_rcv_saddr) != 4);
+ BUILD_BUG_ON(FIELD_SIZEOF(struct sock_common,
+ skc_rcv_saddr) != 4);
*insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(
struct bpf_sock_ops_kern, sk),
@@ -5469,6 +6447,7 @@
struct bpf_prog *prog, u32 *target_size)
{
struct bpf_insn *insn = insn_buf;
+ int off;
switch (si->off) {
case offsetof(struct sk_msg_md, data):
@@ -5481,6 +6460,107 @@
si->dst_reg, si->src_reg,
offsetof(struct sk_msg_buff, data_end));
break;
+ case offsetof(struct sk_msg_md, family):
+ BUILD_BUG_ON(FIELD_SIZEOF(struct sock_common, skc_family) != 2);
+
+ *insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(
+ struct sk_msg_buff, sk),
+ si->dst_reg, si->src_reg,
+ offsetof(struct sk_msg_buff, sk));
+ *insn++ = BPF_LDX_MEM(BPF_H, si->dst_reg, si->dst_reg,
+ offsetof(struct sock_common, skc_family));
+ break;
+
+ case offsetof(struct sk_msg_md, remote_ip4):
+ BUILD_BUG_ON(FIELD_SIZEOF(struct sock_common, skc_daddr) != 4);
+
+ *insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(
+ struct sk_msg_buff, sk),
+ si->dst_reg, si->src_reg,
+ offsetof(struct sk_msg_buff, sk));
+ *insn++ = BPF_LDX_MEM(BPF_W, si->dst_reg, si->dst_reg,
+ offsetof(struct sock_common, skc_daddr));
+ break;
+
+ case offsetof(struct sk_msg_md, local_ip4):
+ BUILD_BUG_ON(FIELD_SIZEOF(struct sock_common,
+ skc_rcv_saddr) != 4);
+
+ *insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(
+ struct sk_msg_buff, sk),
+ si->dst_reg, si->src_reg,
+ offsetof(struct sk_msg_buff, sk));
+ *insn++ = BPF_LDX_MEM(BPF_W, si->dst_reg, si->dst_reg,
+ offsetof(struct sock_common,
+ skc_rcv_saddr));
+ break;
+
+ case offsetof(struct sk_msg_md, remote_ip6[0]) ...
+ offsetof(struct sk_msg_md, remote_ip6[3]):
+#if IS_ENABLED(CONFIG_IPV6)
+ BUILD_BUG_ON(FIELD_SIZEOF(struct sock_common,
+ skc_v6_daddr.s6_addr32[0]) != 4);
+
+ off = si->off;
+ off -= offsetof(struct sk_msg_md, remote_ip6[0]);
+ *insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(
+ struct sk_msg_buff, sk),
+ si->dst_reg, si->src_reg,
+ offsetof(struct sk_msg_buff, sk));
+ *insn++ = BPF_LDX_MEM(BPF_W, si->dst_reg, si->dst_reg,
+ offsetof(struct sock_common,
+ skc_v6_daddr.s6_addr32[0]) +
+ off);
+#else
+ *insn++ = BPF_MOV32_IMM(si->dst_reg, 0);
+#endif
+ break;
+
+ case offsetof(struct sk_msg_md, local_ip6[0]) ...
+ offsetof(struct sk_msg_md, local_ip6[3]):
+#if IS_ENABLED(CONFIG_IPV6)
+ BUILD_BUG_ON(FIELD_SIZEOF(struct sock_common,
+ skc_v6_rcv_saddr.s6_addr32[0]) != 4);
+
+ off = si->off;
+ off -= offsetof(struct sk_msg_md, local_ip6[0]);
+ *insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(
+ struct sk_msg_buff, sk),
+ si->dst_reg, si->src_reg,
+ offsetof(struct sk_msg_buff, sk));
+ *insn++ = BPF_LDX_MEM(BPF_W, si->dst_reg, si->dst_reg,
+ offsetof(struct sock_common,
+ skc_v6_rcv_saddr.s6_addr32[0]) +
+ off);
+#else
+ *insn++ = BPF_MOV32_IMM(si->dst_reg, 0);
+#endif
+ break;
+
+ case offsetof(struct sk_msg_md, remote_port):
+ BUILD_BUG_ON(FIELD_SIZEOF(struct sock_common, skc_dport) != 2);
+
+ *insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(
+ struct sk_msg_buff, sk),
+ si->dst_reg, si->src_reg,
+ offsetof(struct sk_msg_buff, sk));
+ *insn++ = BPF_LDX_MEM(BPF_H, si->dst_reg, si->dst_reg,
+ offsetof(struct sock_common, skc_dport));
+#ifndef __BIG_ENDIAN_BITFIELD
+ *insn++ = BPF_ALU32_IMM(BPF_LSH, si->dst_reg, 16);
+#endif
+ break;
+
+ case offsetof(struct sk_msg_md, local_port):
+ BUILD_BUG_ON(FIELD_SIZEOF(struct sock_common, skc_num) != 2);
+
+ *insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(
+ struct sk_msg_buff, sk),
+ si->dst_reg, si->src_reg,
+ offsetof(struct sk_msg_buff, sk));
+ *insn++ = BPF_LDX_MEM(BPF_H, si->dst_reg, si->dst_reg,
+ offsetof(struct sock_common, skc_num));
+ break;
}
return insn - insn_buf;
@@ -5490,6 +6570,7 @@
.get_func_proto = sk_filter_func_proto,
.is_valid_access = sk_filter_is_valid_access,
.convert_ctx_access = bpf_convert_ctx_access,
+ .gen_ld_abs = bpf_gen_ld_abs,
};
const struct bpf_prog_ops sk_filter_prog_ops = {
@@ -5501,6 +6582,7 @@
.is_valid_access = tc_cls_act_is_valid_access,
.convert_ctx_access = tc_cls_act_convert_ctx_access,
.gen_prologue = tc_cls_act_prologue,
+ .gen_ld_abs = bpf_gen_ld_abs,
};
const struct bpf_prog_ops tc_cls_act_prog_ops = {
@@ -5527,13 +6609,23 @@
.test_run = bpf_prog_test_run_skb,
};
-const struct bpf_verifier_ops lwt_inout_verifier_ops = {
- .get_func_proto = lwt_inout_func_proto,
+const struct bpf_verifier_ops lwt_in_verifier_ops = {
+ .get_func_proto = lwt_in_func_proto,
.is_valid_access = lwt_is_valid_access,
.convert_ctx_access = bpf_convert_ctx_access,
};
-const struct bpf_prog_ops lwt_inout_prog_ops = {
+const struct bpf_prog_ops lwt_in_prog_ops = {
+ .test_run = bpf_prog_test_run_skb,
+};
+
+const struct bpf_verifier_ops lwt_out_verifier_ops = {
+ .get_func_proto = lwt_out_func_proto,
+ .is_valid_access = lwt_is_valid_access,
+ .convert_ctx_access = bpf_convert_ctx_access,
+};
+
+const struct bpf_prog_ops lwt_out_prog_ops = {
.test_run = bpf_prog_test_run_skb,
};
@@ -5548,6 +6640,16 @@
.test_run = bpf_prog_test_run_skb,
};
+const struct bpf_verifier_ops lwt_seg6local_verifier_ops = {
+ .get_func_proto = lwt_seg6local_func_proto,
+ .is_valid_access = lwt_is_valid_access,
+ .convert_ctx_access = bpf_convert_ctx_access,
+};
+
+const struct bpf_prog_ops lwt_seg6local_prog_ops = {
+ .test_run = bpf_prog_test_run_skb,
+};
+
const struct bpf_verifier_ops cg_sock_verifier_ops = {
.get_func_proto = sock_filter_func_proto,
.is_valid_access = sock_filter_is_valid_access,
diff --git a/net/core/flow_dissector.c b/net/core/flow_dissector.c
index d29f09b..4fc1e84 100644
--- a/net/core/flow_dissector.c
+++ b/net/core/flow_dissector.c
@@ -1253,7 +1253,7 @@
EXPORT_SYMBOL(skb_get_hash_perturb);
u32 __skb_get_poff(const struct sk_buff *skb, void *data,
- const struct flow_keys *keys, int hlen)
+ const struct flow_keys_basic *keys, int hlen)
{
u32 poff = keys->control.thoff;
@@ -1314,9 +1314,9 @@
*/
u32 skb_get_poff(const struct sk_buff *skb)
{
- struct flow_keys keys;
+ struct flow_keys_basic keys;
- if (!skb_flow_dissect_flow_keys(skb, &keys, 0))
+ if (!skb_flow_dissect_flow_keys_basic(skb, &keys, NULL, 0, 0, 0, 0))
return 0;
return __skb_get_poff(skb, skb->data, &keys, skb_headlen(skb));
@@ -1403,7 +1403,7 @@
},
};
-static const struct flow_dissector_key flow_keys_buf_dissector_keys[] = {
+static const struct flow_dissector_key flow_keys_basic_dissector_keys[] = {
{
.key_id = FLOW_DISSECTOR_KEY_CONTROL,
.offset = offsetof(struct flow_keys, control),
@@ -1417,7 +1417,8 @@
struct flow_dissector flow_keys_dissector __read_mostly;
EXPORT_SYMBOL(flow_keys_dissector);
-struct flow_dissector flow_keys_buf_dissector __read_mostly;
+struct flow_dissector flow_keys_basic_dissector __read_mostly;
+EXPORT_SYMBOL(flow_keys_basic_dissector);
static int __init init_default_flow_dissectors(void)
{
@@ -1427,9 +1428,9 @@
skb_flow_dissector_init(&flow_keys_dissector_symmetric,
flow_keys_dissector_symmetric_keys,
ARRAY_SIZE(flow_keys_dissector_symmetric_keys));
- skb_flow_dissector_init(&flow_keys_buf_dissector,
- flow_keys_buf_dissector_keys,
- ARRAY_SIZE(flow_keys_buf_dissector_keys));
+ skb_flow_dissector_init(&flow_keys_basic_dissector,
+ flow_keys_basic_dissector_keys,
+ ARRAY_SIZE(flow_keys_basic_dissector_keys));
return 0;
}
diff --git a/net/core/neighbour.c b/net/core/neighbour.c
index ce51986..5afae29 100644
--- a/net/core/neighbour.c
+++ b/net/core/neighbour.c
@@ -820,7 +820,8 @@
write_lock(&n->lock);
state = n->nud_state;
- if (state & (NUD_PERMANENT | NUD_IN_TIMER)) {
+ if ((state & (NUD_PERMANENT | NUD_IN_TIMER)) ||
+ (n->flags & NTF_EXT_LEARNED)) {
write_unlock(&n->lock);
goto next_elt;
}
@@ -1136,6 +1137,8 @@
if (neigh->dead)
goto out;
+ neigh_update_ext_learned(neigh, flags, ¬ify);
+
if (!(new & NUD_VALID)) {
neigh_del_timer(neigh);
if (old & NUD_CONNECTED)
@@ -1781,6 +1784,9 @@
flags &= ~NEIGH_UPDATE_F_OVERRIDE;
}
+ if (ndm->ndm_flags & NTF_EXT_LEARNED)
+ flags |= NEIGH_UPDATE_F_EXT_LEARNED;
+
if (ndm->ndm_flags & NTF_USE) {
neigh_event_send(neigh, NULL);
err = 0;
diff --git a/net/core/net-traces.c b/net/core/net-traces.c
index 38093458..419af6d 100644
--- a/net/core/net-traces.c
+++ b/net/core/net-traces.c
@@ -35,10 +35,6 @@
#include <trace/events/tcp.h>
#include <trace/events/fib.h>
#include <trace/events/qdisc.h>
-#if IS_ENABLED(CONFIG_IPV6)
-#include <trace/events/fib6.h>
-EXPORT_TRACEPOINT_SYMBOL_GPL(fib6_table_lookup);
-#endif
#if IS_ENABLED(CONFIG_BRIDGE)
#include <trace/events/bridge.h>
EXPORT_TRACEPOINT_SYMBOL_GPL(br_fdb_add);
diff --git a/net/core/page_pool.c b/net/core/page_pool.c
new file mode 100644
index 0000000..68bf0720
--- /dev/null
+++ b/net/core/page_pool.c
@@ -0,0 +1,317 @@
+/* SPDX-License-Identifier: GPL-2.0
+ *
+ * page_pool.c
+ * Author: Jesper Dangaard Brouer <netoptimizer@brouer.com>
+ * Copyright (C) 2016 Red Hat, Inc.
+ */
+#include <linux/types.h>
+#include <linux/kernel.h>
+#include <linux/slab.h>
+
+#include <net/page_pool.h>
+#include <linux/dma-direction.h>
+#include <linux/dma-mapping.h>
+#include <linux/page-flags.h>
+#include <linux/mm.h> /* for __put_page() */
+
+static int page_pool_init(struct page_pool *pool,
+ const struct page_pool_params *params)
+{
+ unsigned int ring_qsize = 1024; /* Default */
+
+ memcpy(&pool->p, params, sizeof(pool->p));
+
+ /* Validate only known flags were used */
+ if (pool->p.flags & ~(PP_FLAG_ALL))
+ return -EINVAL;
+
+ if (pool->p.pool_size)
+ ring_qsize = pool->p.pool_size;
+
+ /* Sanity limit mem that can be pinned down */
+ if (ring_qsize > 32768)
+ return -E2BIG;
+
+ /* DMA direction is either DMA_FROM_DEVICE or DMA_BIDIRECTIONAL.
+ * DMA_BIDIRECTIONAL is for allowing page used for DMA sending,
+ * which is the XDP_TX use-case.
+ */
+ if ((pool->p.dma_dir != DMA_FROM_DEVICE) &&
+ (pool->p.dma_dir != DMA_BIDIRECTIONAL))
+ return -EINVAL;
+
+ if (ptr_ring_init(&pool->ring, ring_qsize, GFP_KERNEL) < 0)
+ return -ENOMEM;
+
+ return 0;
+}
+
+struct page_pool *page_pool_create(const struct page_pool_params *params)
+{
+ struct page_pool *pool;
+ int err = 0;
+
+ pool = kzalloc_node(sizeof(*pool), GFP_KERNEL, params->nid);
+ if (!pool)
+ return ERR_PTR(-ENOMEM);
+
+ err = page_pool_init(pool, params);
+ if (err < 0) {
+ pr_warn("%s() gave up with errno %d\n", __func__, err);
+ kfree(pool);
+ return ERR_PTR(err);
+ }
+ return pool;
+}
+EXPORT_SYMBOL(page_pool_create);
+
+/* fast path */
+static struct page *__page_pool_get_cached(struct page_pool *pool)
+{
+ struct ptr_ring *r = &pool->ring;
+ struct page *page;
+
+ /* Quicker fallback, avoid locks when ring is empty */
+ if (__ptr_ring_empty(r))
+ return NULL;
+
+ /* Test for safe-context, caller should provide this guarantee */
+ if (likely(in_serving_softirq())) {
+ if (likely(pool->alloc.count)) {
+ /* Fast-path */
+ page = pool->alloc.cache[--pool->alloc.count];
+ return page;
+ }
+ /* Slower-path: Alloc array empty, time to refill
+ *
+ * Open-coded bulk ptr_ring consumer.
+ *
+ * Discussion: the ring consumer lock is not really
+ * needed due to the softirq/NAPI protection, but
+ * later need the ability to reclaim pages on the
+ * ring. Thus, keeping the locks.
+ */
+ spin_lock(&r->consumer_lock);
+ while ((page = __ptr_ring_consume(r))) {
+ if (pool->alloc.count == PP_ALLOC_CACHE_REFILL)
+ break;
+ pool->alloc.cache[pool->alloc.count++] = page;
+ }
+ spin_unlock(&r->consumer_lock);
+ return page;
+ }
+
+ /* Slow-path: Get page from locked ring queue */
+ page = ptr_ring_consume(&pool->ring);
+ return page;
+}
+
+/* slow path */
+noinline
+static struct page *__page_pool_alloc_pages_slow(struct page_pool *pool,
+ gfp_t _gfp)
+{
+ struct page *page;
+ gfp_t gfp = _gfp;
+ dma_addr_t dma;
+
+ /* We could always set __GFP_COMP, and avoid this branch, as
+ * prep_new_page() can handle order-0 with __GFP_COMP.
+ */
+ if (pool->p.order)
+ gfp |= __GFP_COMP;
+
+ /* FUTURE development:
+ *
+ * Current slow-path essentially falls back to single page
+ * allocations, which doesn't improve performance. This code
+ * need bulk allocation support from the page allocator code.
+ */
+
+ /* Cache was empty, do real allocation */
+ page = alloc_pages_node(pool->p.nid, gfp, pool->p.order);
+ if (!page)
+ return NULL;
+
+ if (!(pool->p.flags & PP_FLAG_DMA_MAP))
+ goto skip_dma_map;
+
+ /* Setup DMA mapping: use page->private for DMA-addr
+ * This mapping is kept for lifetime of page, until leaving pool.
+ */
+ dma = dma_map_page(pool->p.dev, page, 0,
+ (PAGE_SIZE << pool->p.order),
+ pool->p.dma_dir);
+ if (dma_mapping_error(pool->p.dev, dma)) {
+ put_page(page);
+ return NULL;
+ }
+ set_page_private(page, dma); /* page->private = dma; */
+
+skip_dma_map:
+ /* When page just alloc'ed is should/must have refcnt 1. */
+ return page;
+}
+
+/* For using page_pool replace: alloc_pages() API calls, but provide
+ * synchronization guarantee for allocation side.
+ */
+struct page *page_pool_alloc_pages(struct page_pool *pool, gfp_t gfp)
+{
+ struct page *page;
+
+ /* Fast-path: Get a page from cache */
+ page = __page_pool_get_cached(pool);
+ if (page)
+ return page;
+
+ /* Slow-path: cache empty, do real allocation */
+ page = __page_pool_alloc_pages_slow(pool, gfp);
+ return page;
+}
+EXPORT_SYMBOL(page_pool_alloc_pages);
+
+/* Cleanup page_pool state from page */
+static void __page_pool_clean_page(struct page_pool *pool,
+ struct page *page)
+{
+ if (!(pool->p.flags & PP_FLAG_DMA_MAP))
+ return;
+
+ /* DMA unmap */
+ dma_unmap_page(pool->p.dev, page_private(page),
+ PAGE_SIZE << pool->p.order, pool->p.dma_dir);
+ set_page_private(page, 0);
+}
+
+/* Return a page to the page allocator, cleaning up our state */
+static void __page_pool_return_page(struct page_pool *pool, struct page *page)
+{
+ __page_pool_clean_page(pool, page);
+ put_page(page);
+ /* An optimization would be to call __free_pages(page, pool->p.order)
+ * knowing page is not part of page-cache (thus avoiding a
+ * __page_cache_release() call).
+ */
+}
+
+static bool __page_pool_recycle_into_ring(struct page_pool *pool,
+ struct page *page)
+{
+ int ret;
+ /* BH protection not needed if current is serving softirq */
+ if (in_serving_softirq())
+ ret = ptr_ring_produce(&pool->ring, page);
+ else
+ ret = ptr_ring_produce_bh(&pool->ring, page);
+
+ return (ret == 0) ? true : false;
+}
+
+/* Only allow direct recycling in special circumstances, into the
+ * alloc side cache. E.g. during RX-NAPI processing for XDP_DROP use-case.
+ *
+ * Caller must provide appropriate safe context.
+ */
+static bool __page_pool_recycle_direct(struct page *page,
+ struct page_pool *pool)
+{
+ if (unlikely(pool->alloc.count == PP_ALLOC_CACHE_SIZE))
+ return false;
+
+ /* Caller MUST have verified/know (page_ref_count(page) == 1) */
+ pool->alloc.cache[pool->alloc.count++] = page;
+ return true;
+}
+
+void __page_pool_put_page(struct page_pool *pool,
+ struct page *page, bool allow_direct)
+{
+ /* This allocator is optimized for the XDP mode that uses
+ * one-frame-per-page, but have fallbacks that act like the
+ * regular page allocator APIs.
+ *
+ * refcnt == 1 means page_pool owns page, and can recycle it.
+ */
+ if (likely(page_ref_count(page) == 1)) {
+ /* Read barrier done in page_ref_count / READ_ONCE */
+
+ if (allow_direct && in_serving_softirq())
+ if (__page_pool_recycle_direct(page, pool))
+ return;
+
+ if (!__page_pool_recycle_into_ring(pool, page)) {
+ /* Cache full, fallback to free pages */
+ __page_pool_return_page(pool, page);
+ }
+ return;
+ }
+ /* Fallback/non-XDP mode: API user have elevated refcnt.
+ *
+ * Many drivers split up the page into fragments, and some
+ * want to keep doing this to save memory and do refcnt based
+ * recycling. Support this use case too, to ease drivers
+ * switching between XDP/non-XDP.
+ *
+ * In-case page_pool maintains the DMA mapping, API user must
+ * call page_pool_put_page once. In this elevated refcnt
+ * case, the DMA is unmapped/released, as driver is likely
+ * doing refcnt based recycle tricks, meaning another process
+ * will be invoking put_page.
+ */
+ __page_pool_clean_page(pool, page);
+ put_page(page);
+}
+EXPORT_SYMBOL(__page_pool_put_page);
+
+static void __page_pool_empty_ring(struct page_pool *pool)
+{
+ struct page *page;
+
+ /* Empty recycle ring */
+ while ((page = ptr_ring_consume(&pool->ring))) {
+ /* Verify the refcnt invariant of cached pages */
+ if (!(page_ref_count(page) == 1))
+ pr_crit("%s() page_pool refcnt %d violation\n",
+ __func__, page_ref_count(page));
+
+ __page_pool_return_page(pool, page);
+ }
+}
+
+static void __page_pool_destroy_rcu(struct rcu_head *rcu)
+{
+ struct page_pool *pool;
+
+ pool = container_of(rcu, struct page_pool, rcu);
+
+ WARN(pool->alloc.count, "API usage violation");
+
+ __page_pool_empty_ring(pool);
+ ptr_ring_cleanup(&pool->ring, NULL);
+ kfree(pool);
+}
+
+/* Cleanup and release resources */
+void page_pool_destroy(struct page_pool *pool)
+{
+ struct page *page;
+
+ /* Empty alloc cache, assume caller made sure this is
+ * no-longer in use, and page_pool_alloc_pages() cannot be
+ * call concurrently.
+ */
+ while (pool->alloc.count) {
+ page = pool->alloc.cache[--pool->alloc.count];
+ __page_pool_return_page(pool, page);
+ }
+
+ /* No more consumers should exist, but producers could still
+ * be in-flight.
+ */
+ __page_pool_empty_ring(pool);
+
+ /* An xdp_mem_allocator can still ref page_pool pointer */
+ call_rcu(&pool->rcu, __page_pool_destroy_rcu);
+}
+EXPORT_SYMBOL(page_pool_destroy);
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 4593692..8080254 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -785,13 +785,15 @@
long expires, u32 error)
{
struct rta_cacheinfo ci = {
- .rta_lastuse = jiffies_delta_to_clock_t(jiffies - dst->lastuse),
- .rta_used = dst->__use,
- .rta_clntref = atomic_read(&(dst->__refcnt)),
.rta_error = error,
.rta_id = id,
};
+ if (dst) {
+ ci.rta_lastuse = jiffies_delta_to_clock_t(jiffies - dst->lastuse);
+ ci.rta_used = dst->__use;
+ ci.rta_clntref = atomic_read(&dst->__refcnt);
+ }
if (expires) {
unsigned long clock;
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 345b518..c642304 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -1305,7 +1305,7 @@
skb->inner_mac_header += off;
}
-static void copy_skb_header(struct sk_buff *new, const struct sk_buff *old)
+void skb_copy_header(struct sk_buff *new, const struct sk_buff *old)
{
__copy_skb_header(new, old);
@@ -1313,6 +1313,7 @@
skb_shinfo(new)->gso_segs = skb_shinfo(old)->gso_segs;
skb_shinfo(new)->gso_type = skb_shinfo(old)->gso_type;
}
+EXPORT_SYMBOL(skb_copy_header);
static inline int skb_alloc_rx_flag(const struct sk_buff *skb)
{
@@ -1355,7 +1356,7 @@
BUG_ON(skb_copy_bits(skb, -headerlen, n->head, headerlen + skb->len));
- copy_skb_header(n, skb);
+ skb_copy_header(n, skb);
return n;
}
EXPORT_SYMBOL(skb_copy);
@@ -1419,7 +1420,7 @@
skb_clone_fraglist(n);
}
- copy_skb_header(n, skb);
+ skb_copy_header(n, skb);
out:
return n;
}
@@ -1599,7 +1600,7 @@
BUG_ON(skb_copy_bits(skb, -head_copy_len, n->head + head_copy_off,
skb->len + head_copy_len));
- copy_skb_header(n, skb);
+ skb_copy_header(n, skb);
skb_headers_offset_update(n, newheadroom - oldheadroom);
@@ -1839,6 +1840,20 @@
}
EXPORT_SYMBOL(___pskb_trim);
+/* Note : use pskb_trim_rcsum() instead of calling this directly
+ */
+int pskb_trim_rcsum_slow(struct sk_buff *skb, unsigned int len)
+{
+ if (skb->ip_summed == CHECKSUM_COMPLETE) {
+ int delta = skb->len - len;
+
+ skb->csum = csum_sub(skb->csum,
+ skb_checksum(skb, len, delta, 0));
+ }
+ return __pskb_trim(skb, len);
+}
+EXPORT_SYMBOL(pskb_trim_rcsum_slow);
+
/**
* __pskb_pull_tail - advance tail of skb header
* @skb: buffer to reallocate
@@ -4926,6 +4941,8 @@
thlen = tcp_hdrlen(skb);
} else if (unlikely(skb_is_gso_sctp(skb))) {
thlen = sizeof(struct sctphdr);
+ } else if (shinfo->gso_type & SKB_GSO_UDP_L4) {
+ thlen = sizeof(struct udphdr);
}
/* UFO sets gso_size to the size of the fragmentation
* payload, i.e. the size of the L4 (UDP) header is already
diff --git a/net/core/sock.c b/net/core/sock.c
index 3b6d028..435a0ba 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -226,7 +226,8 @@
x "AF_RXRPC" , x "AF_ISDN" , x "AF_PHONET" , \
x "AF_IEEE802154", x "AF_CAIF" , x "AF_ALG" , \
x "AF_NFC" , x "AF_VSOCK" , x "AF_KCM" , \
- x "AF_QIPCRTR", x "AF_SMC" , x "AF_MAX"
+ x "AF_QIPCRTR", x "AF_SMC" , x "AF_XDP" , \
+ x "AF_MAX"
static const char *const af_family_key_strings[AF_MAX+1] = {
_sock_locks("sk_lock-")
@@ -262,7 +263,8 @@
"rlock-AF_RXRPC" , "rlock-AF_ISDN" , "rlock-AF_PHONET" ,
"rlock-AF_IEEE802154", "rlock-AF_CAIF" , "rlock-AF_ALG" ,
"rlock-AF_NFC" , "rlock-AF_VSOCK" , "rlock-AF_KCM" ,
- "rlock-AF_QIPCRTR", "rlock-AF_SMC" , "rlock-AF_MAX"
+ "rlock-AF_QIPCRTR", "rlock-AF_SMC" , "rlock-AF_XDP" ,
+ "rlock-AF_MAX"
};
static const char *const af_family_wlock_key_strings[AF_MAX+1] = {
"wlock-AF_UNSPEC", "wlock-AF_UNIX" , "wlock-AF_INET" ,
@@ -279,7 +281,8 @@
"wlock-AF_RXRPC" , "wlock-AF_ISDN" , "wlock-AF_PHONET" ,
"wlock-AF_IEEE802154", "wlock-AF_CAIF" , "wlock-AF_ALG" ,
"wlock-AF_NFC" , "wlock-AF_VSOCK" , "wlock-AF_KCM" ,
- "wlock-AF_QIPCRTR", "wlock-AF_SMC" , "wlock-AF_MAX"
+ "wlock-AF_QIPCRTR", "wlock-AF_SMC" , "wlock-AF_XDP" ,
+ "wlock-AF_MAX"
};
static const char *const af_family_elock_key_strings[AF_MAX+1] = {
"elock-AF_UNSPEC", "elock-AF_UNIX" , "elock-AF_INET" ,
@@ -296,7 +299,8 @@
"elock-AF_RXRPC" , "elock-AF_ISDN" , "elock-AF_PHONET" ,
"elock-AF_IEEE802154", "elock-AF_CAIF" , "elock-AF_ALG" ,
"elock-AF_NFC" , "elock-AF_VSOCK" , "elock-AF_KCM" ,
- "elock-AF_QIPCRTR", "elock-AF_SMC" , "elock-AF_MAX"
+ "elock-AF_QIPCRTR", "elock-AF_SMC" , "elock-AF_XDP" ,
+ "elock-AF_MAX"
};
/*
@@ -323,8 +327,8 @@
int sysctl_tstamp_allow_data __read_mostly = 1;
-struct static_key memalloc_socks = STATIC_KEY_INIT_FALSE;
-EXPORT_SYMBOL_GPL(memalloc_socks);
+DEFINE_STATIC_KEY_FALSE(memalloc_socks_key);
+EXPORT_SYMBOL_GPL(memalloc_socks_key);
/**
* sk_set_memalloc - sets %SOCK_MEMALLOC
@@ -338,7 +342,7 @@
{
sock_set_flag(sk, SOCK_MEMALLOC);
sk->sk_allocation |= __GFP_MEMALLOC;
- static_key_slow_inc(&memalloc_socks);
+ static_branch_inc(&memalloc_socks_key);
}
EXPORT_SYMBOL_GPL(sk_set_memalloc);
@@ -346,7 +350,7 @@
{
sock_reset_flag(sk, SOCK_MEMALLOC);
sk->sk_allocation &= ~__GFP_MEMALLOC;
- static_key_slow_dec(&memalloc_socks);
+ static_branch_dec(&memalloc_socks_key);
/*
* SOCK_MEMALLOC is allowed to ignore rmem limits to ensure forward
@@ -905,7 +909,10 @@
case SO_RCVLOWAT:
if (val < 0)
val = INT_MAX;
- sk->sk_rcvlowat = val ? : 1;
+ if (sock->ops->set_rcvlowat)
+ ret = sock->ops->set_rcvlowat(sk, val);
+ else
+ sk->sk_rcvlowat = val ? : 1;
break;
case SO_RCVTIMEO:
diff --git a/net/core/xdp.c b/net/core/xdp.c
index 097a0f7..cb8c4e0 100644
--- a/net/core/xdp.c
+++ b/net/core/xdp.c
@@ -5,6 +5,10 @@
*/
#include <linux/types.h>
#include <linux/mm.h>
+#include <linux/slab.h>
+#include <linux/idr.h>
+#include <linux/rhashtable.h>
+#include <net/page_pool.h>
#include <net/xdp.h>
@@ -13,6 +17,104 @@
#define REG_STATE_UNREGISTERED 0x2
#define REG_STATE_UNUSED 0x3
+static DEFINE_IDA(mem_id_pool);
+static DEFINE_MUTEX(mem_id_lock);
+#define MEM_ID_MAX 0xFFFE
+#define MEM_ID_MIN 1
+static int mem_id_next = MEM_ID_MIN;
+
+static bool mem_id_init; /* false */
+static struct rhashtable *mem_id_ht;
+
+struct xdp_mem_allocator {
+ struct xdp_mem_info mem;
+ union {
+ void *allocator;
+ struct page_pool *page_pool;
+ };
+ struct rhash_head node;
+ struct rcu_head rcu;
+};
+
+static u32 xdp_mem_id_hashfn(const void *data, u32 len, u32 seed)
+{
+ const u32 *k = data;
+ const u32 key = *k;
+
+ BUILD_BUG_ON(FIELD_SIZEOF(struct xdp_mem_allocator, mem.id)
+ != sizeof(u32));
+
+ /* Use cyclic increasing ID as direct hash key, see rht_bucket_index */
+ return key << RHT_HASH_RESERVED_SPACE;
+}
+
+static int xdp_mem_id_cmp(struct rhashtable_compare_arg *arg,
+ const void *ptr)
+{
+ const struct xdp_mem_allocator *xa = ptr;
+ u32 mem_id = *(u32 *)arg->key;
+
+ return xa->mem.id != mem_id;
+}
+
+static const struct rhashtable_params mem_id_rht_params = {
+ .nelem_hint = 64,
+ .head_offset = offsetof(struct xdp_mem_allocator, node),
+ .key_offset = offsetof(struct xdp_mem_allocator, mem.id),
+ .key_len = FIELD_SIZEOF(struct xdp_mem_allocator, mem.id),
+ .max_size = MEM_ID_MAX,
+ .min_size = 8,
+ .automatic_shrinking = true,
+ .hashfn = xdp_mem_id_hashfn,
+ .obj_cmpfn = xdp_mem_id_cmp,
+};
+
+static void __xdp_mem_allocator_rcu_free(struct rcu_head *rcu)
+{
+ struct xdp_mem_allocator *xa;
+
+ xa = container_of(rcu, struct xdp_mem_allocator, rcu);
+
+ /* Allow this ID to be reused */
+ ida_simple_remove(&mem_id_pool, xa->mem.id);
+
+ /* Notice, driver is expected to free the *allocator,
+ * e.g. page_pool, and MUST also use RCU free.
+ */
+
+ /* Poison memory */
+ xa->mem.id = 0xFFFF;
+ xa->mem.type = 0xF0F0;
+ xa->allocator = (void *)0xDEAD9001;
+
+ kfree(xa);
+}
+
+static void __xdp_rxq_info_unreg_mem_model(struct xdp_rxq_info *xdp_rxq)
+{
+ struct xdp_mem_allocator *xa;
+ int id = xdp_rxq->mem.id;
+ int err;
+
+ if (id == 0)
+ return;
+
+ mutex_lock(&mem_id_lock);
+
+ xa = rhashtable_lookup(mem_id_ht, &id, mem_id_rht_params);
+ if (!xa) {
+ mutex_unlock(&mem_id_lock);
+ return;
+ }
+
+ err = rhashtable_remove_fast(mem_id_ht, &xa->node, mem_id_rht_params);
+ WARN_ON(err);
+
+ call_rcu(&xa->rcu, __xdp_mem_allocator_rcu_free);
+
+ mutex_unlock(&mem_id_lock);
+}
+
void xdp_rxq_info_unreg(struct xdp_rxq_info *xdp_rxq)
{
/* Simplify driver cleanup code paths, allow unreg "unused" */
@@ -21,8 +123,14 @@
WARN(!(xdp_rxq->reg_state == REG_STATE_REGISTERED), "Driver BUG");
+ __xdp_rxq_info_unreg_mem_model(xdp_rxq);
+
xdp_rxq->reg_state = REG_STATE_UNREGISTERED;
xdp_rxq->dev = NULL;
+
+ /* Reset mem info to defaults */
+ xdp_rxq->mem.id = 0;
+ xdp_rxq->mem.type = 0;
}
EXPORT_SYMBOL_GPL(xdp_rxq_info_unreg);
@@ -71,3 +179,185 @@
return (xdp_rxq->reg_state == REG_STATE_REGISTERED);
}
EXPORT_SYMBOL_GPL(xdp_rxq_info_is_reg);
+
+static int __mem_id_init_hash_table(void)
+{
+ struct rhashtable *rht;
+ int ret;
+
+ if (unlikely(mem_id_init))
+ return 0;
+
+ rht = kzalloc(sizeof(*rht), GFP_KERNEL);
+ if (!rht)
+ return -ENOMEM;
+
+ ret = rhashtable_init(rht, &mem_id_rht_params);
+ if (ret < 0) {
+ kfree(rht);
+ return ret;
+ }
+ mem_id_ht = rht;
+ smp_mb(); /* mutex lock should provide enough pairing */
+ mem_id_init = true;
+
+ return 0;
+}
+
+/* Allocate a cyclic ID that maps to allocator pointer.
+ * See: https://www.kernel.org/doc/html/latest/core-api/idr.html
+ *
+ * Caller must lock mem_id_lock.
+ */
+static int __mem_id_cyclic_get(gfp_t gfp)
+{
+ int retries = 1;
+ int id;
+
+again:
+ id = ida_simple_get(&mem_id_pool, mem_id_next, MEM_ID_MAX, gfp);
+ if (id < 0) {
+ if (id == -ENOSPC) {
+ /* Cyclic allocator, reset next id */
+ if (retries--) {
+ mem_id_next = MEM_ID_MIN;
+ goto again;
+ }
+ }
+ return id; /* errno */
+ }
+ mem_id_next = id + 1;
+
+ return id;
+}
+
+static bool __is_supported_mem_type(enum xdp_mem_type type)
+{
+ if (type == MEM_TYPE_PAGE_POOL)
+ return is_page_pool_compiled_in();
+
+ if (type >= MEM_TYPE_MAX)
+ return false;
+
+ return true;
+}
+
+int xdp_rxq_info_reg_mem_model(struct xdp_rxq_info *xdp_rxq,
+ enum xdp_mem_type type, void *allocator)
+{
+ struct xdp_mem_allocator *xdp_alloc;
+ gfp_t gfp = GFP_KERNEL;
+ int id, errno, ret;
+ void *ptr;
+
+ if (xdp_rxq->reg_state != REG_STATE_REGISTERED) {
+ WARN(1, "Missing register, driver bug");
+ return -EFAULT;
+ }
+
+ if (!__is_supported_mem_type(type))
+ return -EOPNOTSUPP;
+
+ xdp_rxq->mem.type = type;
+
+ if (!allocator) {
+ if (type == MEM_TYPE_PAGE_POOL)
+ return -EINVAL; /* Setup time check page_pool req */
+ return 0;
+ }
+
+ /* Delay init of rhashtable to save memory if feature isn't used */
+ if (!mem_id_init) {
+ mutex_lock(&mem_id_lock);
+ ret = __mem_id_init_hash_table();
+ mutex_unlock(&mem_id_lock);
+ if (ret < 0) {
+ WARN_ON(1);
+ return ret;
+ }
+ }
+
+ xdp_alloc = kzalloc(sizeof(*xdp_alloc), gfp);
+ if (!xdp_alloc)
+ return -ENOMEM;
+
+ mutex_lock(&mem_id_lock);
+ id = __mem_id_cyclic_get(gfp);
+ if (id < 0) {
+ errno = id;
+ goto err;
+ }
+ xdp_rxq->mem.id = id;
+ xdp_alloc->mem = xdp_rxq->mem;
+ xdp_alloc->allocator = allocator;
+
+ /* Insert allocator into ID lookup table */
+ ptr = rhashtable_insert_slow(mem_id_ht, &id, &xdp_alloc->node);
+ if (IS_ERR(ptr)) {
+ errno = PTR_ERR(ptr);
+ goto err;
+ }
+
+ mutex_unlock(&mem_id_lock);
+
+ return 0;
+err:
+ mutex_unlock(&mem_id_lock);
+ kfree(xdp_alloc);
+ return errno;
+}
+EXPORT_SYMBOL_GPL(xdp_rxq_info_reg_mem_model);
+
+/* XDP RX runs under NAPI protection, and in different delivery error
+ * scenarios (e.g. queue full), it is possible to return the xdp_frame
+ * while still leveraging this protection. The @napi_direct boolian
+ * is used for those calls sites. Thus, allowing for faster recycling
+ * of xdp_frames/pages in those cases.
+ */
+static void __xdp_return(void *data, struct xdp_mem_info *mem, bool napi_direct)
+{
+ struct xdp_mem_allocator *xa;
+ struct page *page;
+
+ switch (mem->type) {
+ case MEM_TYPE_PAGE_POOL:
+ rcu_read_lock();
+ /* mem->id is valid, checked in xdp_rxq_info_reg_mem_model() */
+ xa = rhashtable_lookup(mem_id_ht, &mem->id, mem_id_rht_params);
+ page = virt_to_head_page(data);
+ if (xa)
+ page_pool_put_page(xa->page_pool, page, napi_direct);
+ else
+ put_page(page);
+ rcu_read_unlock();
+ break;
+ case MEM_TYPE_PAGE_SHARED:
+ page_frag_free(data);
+ break;
+ case MEM_TYPE_PAGE_ORDER0:
+ page = virt_to_page(data); /* Assumes order0 page*/
+ put_page(page);
+ break;
+ default:
+ /* Not possible, checked in xdp_rxq_info_reg_mem_model() */
+ break;
+ }
+}
+
+void xdp_return_frame(struct xdp_frame *xdpf)
+{
+ __xdp_return(xdpf->data, &xdpf->mem, false);
+}
+EXPORT_SYMBOL_GPL(xdp_return_frame);
+
+void xdp_return_frame_rx_napi(struct xdp_frame *xdpf)
+{
+ __xdp_return(xdpf->data, &xdpf->mem, true);
+}
+EXPORT_SYMBOL_GPL(xdp_return_frame_rx_napi);
+
+void xdp_return_buff(struct xdp_buff *xdp)
+{
+ __xdp_return(xdp->data, &xdp->rxq->mem, true);
+}
+EXPORT_SYMBOL_GPL(xdp_return_buff);
diff --git a/net/dcb/dcbnl.c b/net/dcb/dcbnl.c
index bae7d78..d2f4e0c 100644
--- a/net/dcb/dcbnl.c
+++ b/net/dcb/dcbnl.c
@@ -176,6 +176,7 @@
[DCB_ATTR_IEEE_MAXRATE] = {.len = sizeof(struct ieee_maxrate)},
[DCB_ATTR_IEEE_QCN] = {.len = sizeof(struct ieee_qcn)},
[DCB_ATTR_IEEE_QCN_STATS] = {.len = sizeof(struct ieee_qcn_stats)},
+ [DCB_ATTR_DCB_BUFFER] = {.len = sizeof(struct dcbnl_buffer)},
};
/* DCB number of traffic classes nested attributes. */
@@ -1094,6 +1095,16 @@
return -EMSGSIZE;
}
+ if (ops->dcbnl_getbuffer) {
+ struct dcbnl_buffer buffer;
+
+ memset(&buffer, 0, sizeof(buffer));
+ err = ops->dcbnl_getbuffer(netdev, &buffer);
+ if (!err &&
+ nla_put(skb, DCB_ATTR_DCB_BUFFER, sizeof(buffer), &buffer))
+ return -EMSGSIZE;
+ }
+
app = nla_nest_start(skb, DCB_ATTR_IEEE_APP_TABLE);
if (!app)
return -EMSGSIZE;
@@ -1453,6 +1464,15 @@
goto err;
}
+ if (ieee[DCB_ATTR_DCB_BUFFER] && ops->dcbnl_setbuffer) {
+ struct dcbnl_buffer *buffer =
+ nla_data(ieee[DCB_ATTR_DCB_BUFFER]);
+
+ err = ops->dcbnl_setbuffer(netdev, buffer);
+ if (err)
+ goto err;
+ }
+
if (ieee[DCB_ATTR_IEEE_APP_TABLE]) {
struct nlattr *attr;
int rem;
diff --git a/net/decnet/dn_rules.c b/net/decnet/dn_rules.c
index c795c3f..7223669 100644
--- a/net/decnet/dn_rules.c
+++ b/net/decnet/dn_rules.c
@@ -121,13 +121,16 @@
static int dn_fib_rule_configure(struct fib_rule *rule, struct sk_buff *skb,
struct fib_rule_hdr *frh,
- struct nlattr **tb)
+ struct nlattr **tb,
+ struct netlink_ext_ack *extack)
{
int err = -EINVAL;
struct dn_fib_rule *r = (struct dn_fib_rule *)rule;
- if (frh->tos)
+ if (frh->tos) {
+ NL_SET_ERR_MSG(extack, "Invalid tos value");
goto errout;
+ }
if (rule->table == RT_TABLE_UNSPEC) {
if (rule->action == FR_ACT_TO_TBL) {
diff --git a/net/dsa/Kconfig b/net/dsa/Kconfig
index bbf2c82..4183e4b 100644
--- a/net/dsa/Kconfig
+++ b/net/dsa/Kconfig
@@ -9,7 +9,7 @@
depends on HAVE_NET_DSA && MAY_USE_DEVLINK
depends on BRIDGE || BRIDGE=n
select NET_SWITCHDEV
- select PHYLIB
+ select PHYLINK
---help---
Say Y if you want to enable support for the hardware switches supported
by the Distributed Switch Architecture.
diff --git a/net/dsa/dsa2.c b/net/dsa/dsa2.c
index 4772525..dc5d9af 100644
--- a/net/dsa/dsa2.c
+++ b/net/dsa/dsa2.c
@@ -272,7 +272,28 @@
case DSA_PORT_TYPE_UNUSED:
break;
case DSA_PORT_TYPE_CPU:
+ /* dp->index is used now as port_number. However
+ * CPU ports should have separate numbering
+ * independent from front panel port numbers.
+ */
+ devlink_port_attrs_set(&dp->devlink_port,
+ DEVLINK_PORT_FLAVOUR_CPU,
+ dp->index, false, 0);
+ err = dsa_port_link_register_of(dp);
+ if (err) {
+ dev_err(ds->dev, "failed to setup link for port %d.%d\n",
+ ds->index, dp->index);
+ return err;
+ }
+ break;
case DSA_PORT_TYPE_DSA:
+ /* dp->index is used now as port_number. However
+ * DSA ports should have separate numbering
+ * independent from front panel port numbers.
+ */
+ devlink_port_attrs_set(&dp->devlink_port,
+ DEVLINK_PORT_FLAVOUR_DSA,
+ dp->index, false, 0);
err = dsa_port_link_register_of(dp);
if (err) {
dev_err(ds->dev, "failed to setup link for port %d.%d\n",
@@ -281,6 +302,9 @@
}
break;
case DSA_PORT_TYPE_USER:
+ devlink_port_attrs_set(&dp->devlink_port,
+ DEVLINK_PORT_FLAVOUR_PHYSICAL,
+ dp->index, false, 0);
err = dsa_slave_create(dp);
if (err)
dev_err(ds->dev, "failed to create slave for port %d.%d\n",
diff --git a/net/dsa/dsa_priv.h b/net/dsa/dsa_priv.h
index 0537314..3964c6f 100644
--- a/net/dsa/dsa_priv.h
+++ b/net/dsa/dsa_priv.h
@@ -75,15 +75,6 @@
/* DSA port data, such as switch, port index, etc. */
struct dsa_port *dp;
- /*
- * The phylib phy_device pointer for the PHY connected
- * to this port.
- */
- phy_interface_t phy_interface;
- int old_link;
- int old_pause;
- int old_duplex;
-
#ifdef CONFIG_NET_POLL_CONTROLLER
struct netpoll *netpoll;
#endif
diff --git a/net/dsa/master.c b/net/dsa/master.c
index 90e6df0..c90ee32 100644
--- a/net/dsa/master.c
+++ b/net/dsa/master.c
@@ -22,7 +22,7 @@
int port = cpu_dp->index;
int count = 0;
- if (ops && ops->get_sset_count && ops->get_ethtool_stats) {
+ if (ops->get_sset_count && ops->get_ethtool_stats) {
count = ops->get_sset_count(dev, ETH_SS_STATS);
ops->get_ethtool_stats(dev, stats, data);
}
@@ -31,6 +31,32 @@
ds->ops->get_ethtool_stats(ds, port, data + count);
}
+static void dsa_master_get_ethtool_phy_stats(struct net_device *dev,
+ struct ethtool_stats *stats,
+ uint64_t *data)
+{
+ struct dsa_port *cpu_dp = dev->dsa_ptr;
+ const struct ethtool_ops *ops = cpu_dp->orig_ethtool_ops;
+ struct dsa_switch *ds = cpu_dp->ds;
+ int port = cpu_dp->index;
+ int count = 0;
+
+ if (dev->phydev && !ops->get_ethtool_phy_stats) {
+ count = phy_ethtool_get_sset_count(dev->phydev);
+ if (count >= 0)
+ phy_ethtool_get_stats(dev->phydev, stats, data);
+ } else if (ops->get_sset_count && ops->get_ethtool_phy_stats) {
+ count = ops->get_sset_count(dev, ETH_SS_PHY_STATS);
+ ops->get_ethtool_phy_stats(dev, stats, data);
+ }
+
+ if (count < 0)
+ count = 0;
+
+ if (ds->ops->get_ethtool_phy_stats)
+ ds->ops->get_ethtool_phy_stats(ds, port, data + count);
+}
+
static int dsa_master_get_sset_count(struct net_device *dev, int sset)
{
struct dsa_port *cpu_dp = dev->dsa_ptr;
@@ -38,11 +64,17 @@
struct dsa_switch *ds = cpu_dp->ds;
int count = 0;
- if (ops && ops->get_sset_count)
- count += ops->get_sset_count(dev, sset);
+ if (sset == ETH_SS_PHY_STATS && dev->phydev &&
+ !ops->get_ethtool_phy_stats)
+ count = phy_ethtool_get_sset_count(dev->phydev);
+ else if (ops->get_sset_count)
+ count = ops->get_sset_count(dev, sset);
- if (sset == ETH_SS_STATS && ds->ops->get_sset_count)
- count += ds->ops->get_sset_count(ds, cpu_dp->index);
+ if (count < 0)
+ count = 0;
+
+ if (ds->ops->get_sset_count)
+ count += ds->ops->get_sset_count(ds, cpu_dp->index, sset);
return count;
}
@@ -64,19 +96,28 @@
/* We do not want to be NULL-terminated, since this is a prefix */
pfx[sizeof(pfx) - 1] = '_';
- if (ops && ops->get_sset_count && ops->get_strings) {
- mcount = ops->get_sset_count(dev, ETH_SS_STATS);
+ if (stringset == ETH_SS_PHY_STATS && dev->phydev &&
+ !ops->get_ethtool_phy_stats) {
+ mcount = phy_ethtool_get_sset_count(dev->phydev);
+ if (mcount < 0)
+ mcount = 0;
+ else
+ phy_ethtool_get_strings(dev->phydev, data);
+ } else if (ops->get_sset_count && ops->get_strings) {
+ mcount = ops->get_sset_count(dev, stringset);
+ if (mcount < 0)
+ mcount = 0;
ops->get_strings(dev, stringset, data);
}
- if (stringset == ETH_SS_STATS && ds->ops->get_strings) {
+ if (ds->ops->get_strings) {
ndata = data + mcount * len;
/* This function copies ETH_GSTRINGS_LEN bytes, we will mangle
* the output after to prepend our CPU port prefix we
* constructed earlier
*/
- ds->ops->get_strings(ds, port, ndata);
- count = ds->ops->get_sset_count(ds, port);
+ ds->ops->get_strings(ds, port, stringset, ndata);
+ count = ds->ops->get_sset_count(ds, port, stringset);
for (i = 0; i < count; i++) {
memmove(ndata + (i * len + sizeof(pfx)),
ndata + i * len, len - sizeof(pfx));
@@ -102,6 +143,7 @@
ops->get_sset_count = dsa_master_get_sset_count;
ops->get_ethtool_stats = dsa_master_get_ethtool_stats;
ops->get_strings = dsa_master_get_strings;
+ ops->get_ethtool_phy_stats = dsa_master_get_ethtool_phy_stats;
dev->ethtool_ops = ops;
diff --git a/net/dsa/port.c b/net/dsa/port.c
index 7acc116..2413beb 100644
--- a/net/dsa/port.c
+++ b/net/dsa/port.c
@@ -273,24 +273,37 @@
return 0;
}
+static struct phy_device *dsa_port_get_phy_device(struct dsa_port *dp)
+{
+ struct device_node *phy_dn;
+ struct phy_device *phydev;
+
+ phy_dn = of_parse_phandle(dp->dn, "phy-handle", 0);
+ if (!phy_dn)
+ return NULL;
+
+ phydev = of_phy_find_device(phy_dn);
+ if (!phydev) {
+ of_node_put(phy_dn);
+ return ERR_PTR(-EPROBE_DEFER);
+ }
+
+ return phydev;
+}
+
static int dsa_port_setup_phy_of(struct dsa_port *dp, bool enable)
{
- struct device_node *port_dn = dp->dn;
- struct device_node *phy_dn;
struct dsa_switch *ds = dp->ds;
struct phy_device *phydev;
int port = dp->index;
int err = 0;
- phy_dn = of_parse_phandle(port_dn, "phy-handle", 0);
- if (!phy_dn)
+ phydev = dsa_port_get_phy_device(dp);
+ if (!phydev)
return 0;
- phydev = of_phy_find_device(phy_dn);
- if (!phydev) {
- err = -EPROBE_DEFER;
- goto err_put_of;
- }
+ if (IS_ERR(phydev))
+ return PTR_ERR(phydev);
if (enable) {
err = genphy_config_init(phydev);
@@ -317,8 +330,6 @@
err_put_dev:
put_device(&phydev->mdio.dev);
-err_put_of:
- of_node_put(phy_dn);
return err;
}
@@ -372,3 +383,60 @@
else
dsa_port_setup_phy_of(dp, false);
}
+
+int dsa_port_get_phy_strings(struct dsa_port *dp, uint8_t *data)
+{
+ struct phy_device *phydev;
+ int ret = -EOPNOTSUPP;
+
+ if (of_phy_is_fixed_link(dp->dn))
+ return ret;
+
+ phydev = dsa_port_get_phy_device(dp);
+ if (IS_ERR_OR_NULL(phydev))
+ return ret;
+
+ ret = phy_ethtool_get_strings(phydev, data);
+ put_device(&phydev->mdio.dev);
+
+ return ret;
+}
+EXPORT_SYMBOL_GPL(dsa_port_get_phy_strings);
+
+int dsa_port_get_ethtool_phy_stats(struct dsa_port *dp, uint64_t *data)
+{
+ struct phy_device *phydev;
+ int ret = -EOPNOTSUPP;
+
+ if (of_phy_is_fixed_link(dp->dn))
+ return ret;
+
+ phydev = dsa_port_get_phy_device(dp);
+ if (IS_ERR_OR_NULL(phydev))
+ return ret;
+
+ ret = phy_ethtool_get_stats(phydev, NULL, data);
+ put_device(&phydev->mdio.dev);
+
+ return ret;
+}
+EXPORT_SYMBOL_GPL(dsa_port_get_ethtool_phy_stats);
+
+int dsa_port_get_phy_sset_count(struct dsa_port *dp)
+{
+ struct phy_device *phydev;
+ int ret = -EOPNOTSUPP;
+
+ if (of_phy_is_fixed_link(dp->dn))
+ return ret;
+
+ phydev = dsa_port_get_phy_device(dp);
+ if (IS_ERR_OR_NULL(phydev))
+ return ret;
+
+ ret = phy_ethtool_get_sset_count(phydev);
+ put_device(&phydev->mdio.dev);
+
+ return ret;
+}
+EXPORT_SYMBOL_GPL(dsa_port_get_phy_sset_count);
diff --git a/net/dsa/slave.c b/net/dsa/slave.c
index 18561af..1e3b6a6 100644
--- a/net/dsa/slave.c
+++ b/net/dsa/slave.c
@@ -13,6 +13,7 @@
#include <linux/netdevice.h>
#include <linux/phy.h>
#include <linux/phy_fixed.h>
+#include <linux/phylink.h>
#include <linux/of_net.h>
#include <linux/of_mdio.h>
#include <linux/mdio.h>
@@ -97,8 +98,7 @@
if (err)
goto clear_promisc;
- if (dev->phydev)
- phy_start(dev->phydev);
+ phylink_start(dp->pl);
return 0;
@@ -120,8 +120,7 @@
struct net_device *master = dsa_slave_to_master(dev);
struct dsa_port *dp = dsa_slave_to_port(dev);
- if (dev->phydev)
- phy_stop(dev->phydev);
+ phylink_stop(dp->pl);
dsa_port_disable(dp, dev->phydev);
@@ -272,10 +271,7 @@
break;
}
- if (!dev->phydev)
- return -ENODEV;
-
- return phy_mii_ioctl(dev->phydev, ifr, cmd);
+ return phylink_mii_ioctl(p->dp->pl, ifr, cmd);
}
static int dsa_slave_port_attr_set(struct net_device *dev,
@@ -498,14 +494,11 @@
ds->ops->get_regs(ds, dp->index, regs, _p);
}
-static u32 dsa_slave_get_link(struct net_device *dev)
+static int dsa_slave_nway_reset(struct net_device *dev)
{
- if (!dev->phydev)
- return -ENODEV;
+ struct dsa_port *dp = dsa_slave_to_port(dev);
- genphy_update_link(dev->phydev);
-
- return dev->phydev->link;
+ return phylink_ethtool_nway_reset(dp->pl);
}
static int dsa_slave_get_eeprom_len(struct net_device *dev)
@@ -560,7 +553,8 @@
strncpy(data + 2 * len, "rx_packets", len);
strncpy(data + 3 * len, "rx_bytes", len);
if (ds->ops->get_strings)
- ds->ops->get_strings(ds, dp->index, data + 4 * len);
+ ds->ops->get_strings(ds, dp->index, stringset,
+ data + 4 * len);
}
}
@@ -605,7 +599,7 @@
count = 4;
if (ds->ops->get_sset_count)
- count += ds->ops->get_sset_count(ds, dp->index);
+ count += ds->ops->get_sset_count(ds, dp->index, sset);
return count;
}
@@ -618,6 +612,8 @@
struct dsa_port *dp = dsa_slave_to_port(dev);
struct dsa_switch *ds = dp->ds;
+ phylink_ethtool_get_wol(dp->pl, w);
+
if (ds->ops->get_wol)
ds->ops->get_wol(ds, dp->index, w);
}
@@ -628,6 +624,8 @@
struct dsa_switch *ds = dp->ds;
int ret = -EOPNOTSUPP;
+ phylink_ethtool_set_wol(dp->pl, w);
+
if (ds->ops->set_wol)
ret = ds->ops->set_wol(ds, dp->index, w);
@@ -651,13 +649,7 @@
if (ret)
return ret;
- if (e->eee_enabled) {
- ret = phy_init_eee(dev->phydev, 0);
- if (ret)
- return ret;
- }
-
- return phy_ethtool_set_eee(dev->phydev, e);
+ return phylink_ethtool_set_eee(dp->pl, e);
}
static int dsa_slave_get_eee(struct net_device *dev, struct ethtool_eee *e)
@@ -677,7 +669,23 @@
if (ret)
return ret;
- return phy_ethtool_get_eee(dev->phydev, e);
+ return phylink_ethtool_get_eee(dp->pl, e);
+}
+
+static int dsa_slave_get_link_ksettings(struct net_device *dev,
+ struct ethtool_link_ksettings *cmd)
+{
+ struct dsa_port *dp = dsa_slave_to_port(dev);
+
+ return phylink_ethtool_ksettings_get(dp->pl, cmd);
+}
+
+static int dsa_slave_set_link_ksettings(struct net_device *dev,
+ const struct ethtool_link_ksettings *cmd)
+{
+ struct dsa_port *dp = dsa_slave_to_port(dev);
+
+ return phylink_ethtool_ksettings_set(dp->pl, cmd);
}
#ifdef CONFIG_NET_POLL_CONTROLLER
@@ -980,8 +988,8 @@
.get_drvinfo = dsa_slave_get_drvinfo,
.get_regs_len = dsa_slave_get_regs_len,
.get_regs = dsa_slave_get_regs,
- .nway_reset = phy_ethtool_nway_reset,
- .get_link = dsa_slave_get_link,
+ .nway_reset = dsa_slave_nway_reset,
+ .get_link = ethtool_op_get_link,
.get_eeprom_len = dsa_slave_get_eeprom_len,
.get_eeprom = dsa_slave_get_eeprom,
.set_eeprom = dsa_slave_set_eeprom,
@@ -992,8 +1000,8 @@
.get_wol = dsa_slave_get_wol,
.set_eee = dsa_slave_set_eee,
.get_eee = dsa_slave_get_eee,
- .get_link_ksettings = phy_ethtool_get_link_ksettings,
- .set_link_ksettings = phy_ethtool_set_link_ksettings,
+ .get_link_ksettings = dsa_slave_get_link_ksettings,
+ .set_link_ksettings = dsa_slave_set_link_ksettings,
.get_rxnfc = dsa_slave_get_rxnfc,
.set_rxnfc = dsa_slave_set_rxnfc,
.get_ts_info = dsa_slave_get_ts_info,
@@ -1052,56 +1060,122 @@
.name = "dsa",
};
-static void dsa_slave_adjust_link(struct net_device *dev)
+static void dsa_slave_phylink_validate(struct net_device *dev,
+ unsigned long *supported,
+ struct phylink_link_state *state)
{
struct dsa_port *dp = dsa_slave_to_port(dev);
- struct dsa_slave_priv *p = netdev_priv(dev);
struct dsa_switch *ds = dp->ds;
- unsigned int status_changed = 0;
- if (p->old_link != dev->phydev->link) {
- status_changed = 1;
- p->old_link = dev->phydev->link;
- }
+ if (!ds->ops->phylink_validate)
+ return;
- if (p->old_duplex != dev->phydev->duplex) {
- status_changed = 1;
- p->old_duplex = dev->phydev->duplex;
- }
-
- if (p->old_pause != dev->phydev->pause) {
- status_changed = 1;
- p->old_pause = dev->phydev->pause;
- }
-
- if (ds->ops->adjust_link && status_changed)
- ds->ops->adjust_link(ds, dp->index, dev->phydev);
-
- if (status_changed)
- phy_print_status(dev->phydev);
+ ds->ops->phylink_validate(ds, dp->index, supported, state);
}
-static int dsa_slave_fixed_link_update(struct net_device *dev,
- struct fixed_phy_status *status)
+static int dsa_slave_phylink_mac_link_state(struct net_device *dev,
+ struct phylink_link_state *state)
{
- struct dsa_switch *ds;
- struct dsa_port *dp;
+ struct dsa_port *dp = dsa_slave_to_port(dev);
+ struct dsa_switch *ds = dp->ds;
- if (dev) {
- dp = dsa_slave_to_port(dev);
- ds = dp->ds;
- if (ds->ops->fixed_link_update)
- ds->ops->fixed_link_update(ds, dp->index, status);
+ /* Only called for SGMII and 802.3z */
+ if (!ds->ops->phylink_mac_link_state)
+ return -EOPNOTSUPP;
+
+ return ds->ops->phylink_mac_link_state(ds, dp->index, state);
+}
+
+static void dsa_slave_phylink_mac_config(struct net_device *dev,
+ unsigned int mode,
+ const struct phylink_link_state *state)
+{
+ struct dsa_port *dp = dsa_slave_to_port(dev);
+ struct dsa_switch *ds = dp->ds;
+
+ if (!ds->ops->phylink_mac_config)
+ return;
+
+ ds->ops->phylink_mac_config(ds, dp->index, mode, state);
+}
+
+static void dsa_slave_phylink_mac_an_restart(struct net_device *dev)
+{
+ struct dsa_port *dp = dsa_slave_to_port(dev);
+ struct dsa_switch *ds = dp->ds;
+
+ if (!ds->ops->phylink_mac_an_restart)
+ return;
+
+ ds->ops->phylink_mac_an_restart(ds, dp->index);
+}
+
+static void dsa_slave_phylink_mac_link_down(struct net_device *dev,
+ unsigned int mode,
+ phy_interface_t interface)
+{
+ struct dsa_port *dp = dsa_slave_to_port(dev);
+ struct dsa_switch *ds = dp->ds;
+
+ if (!ds->ops->phylink_mac_link_down) {
+ if (ds->ops->adjust_link && dev->phydev)
+ ds->ops->adjust_link(ds, dp->index, dev->phydev);
+ return;
}
- return 0;
+ ds->ops->phylink_mac_link_down(ds, dp->index, mode, interface);
+}
+
+static void dsa_slave_phylink_mac_link_up(struct net_device *dev,
+ unsigned int mode,
+ phy_interface_t interface,
+ struct phy_device *phydev)
+{
+ struct dsa_port *dp = dsa_slave_to_port(dev);
+ struct dsa_switch *ds = dp->ds;
+
+ if (!ds->ops->phylink_mac_link_up) {
+ if (ds->ops->adjust_link && dev->phydev)
+ ds->ops->adjust_link(ds, dp->index, dev->phydev);
+ return;
+ }
+
+ ds->ops->phylink_mac_link_up(ds, dp->index, mode, interface, phydev);
+}
+
+static const struct phylink_mac_ops dsa_slave_phylink_mac_ops = {
+ .validate = dsa_slave_phylink_validate,
+ .mac_link_state = dsa_slave_phylink_mac_link_state,
+ .mac_config = dsa_slave_phylink_mac_config,
+ .mac_an_restart = dsa_slave_phylink_mac_an_restart,
+ .mac_link_down = dsa_slave_phylink_mac_link_down,
+ .mac_link_up = dsa_slave_phylink_mac_link_up,
+};
+
+void dsa_port_phylink_mac_change(struct dsa_switch *ds, int port, bool up)
+{
+ const struct dsa_port *dp = dsa_to_port(ds, port);
+
+ phylink_mac_change(dp->pl, up);
+}
+EXPORT_SYMBOL_GPL(dsa_port_phylink_mac_change);
+
+static void dsa_slave_phylink_fixed_state(struct net_device *dev,
+ struct phylink_link_state *state)
+{
+ struct dsa_port *dp = dsa_slave_to_port(dev);
+ struct dsa_switch *ds = dp->ds;
+
+ /* No need to check that this operation is valid, the callback would
+ * not be called if it was not.
+ */
+ ds->ops->phylink_fixed_state(ds, dp->index, state);
}
/* slave device setup *******************************************************/
static int dsa_slave_phy_connect(struct net_device *slave_dev, int addr)
{
struct dsa_port *dp = dsa_slave_to_port(slave_dev);
- struct dsa_slave_priv *p = netdev_priv(slave_dev);
struct dsa_switch *ds = dp->ds;
slave_dev->phydev = mdiobus_get_phy(ds->slave_mii_bus, addr);
@@ -1110,75 +1184,54 @@
return -ENODEV;
}
- /* Use already configured phy mode */
- if (p->phy_interface == PHY_INTERFACE_MODE_NA)
- p->phy_interface = slave_dev->phydev->interface;
-
- return phy_connect_direct(slave_dev, slave_dev->phydev,
- dsa_slave_adjust_link, p->phy_interface);
+ return phylink_connect_phy(dp->pl, slave_dev->phydev);
}
static int dsa_slave_phy_setup(struct net_device *slave_dev)
{
struct dsa_port *dp = dsa_slave_to_port(slave_dev);
- struct dsa_slave_priv *p = netdev_priv(slave_dev);
struct device_node *port_dn = dp->dn;
struct dsa_switch *ds = dp->ds;
- struct device_node *phy_dn;
- bool phy_is_fixed = false;
u32 phy_flags = 0;
int mode, ret;
mode = of_get_phy_mode(port_dn);
if (mode < 0)
mode = PHY_INTERFACE_MODE_NA;
- p->phy_interface = mode;
- phy_dn = of_parse_phandle(port_dn, "phy-handle", 0);
- if (!phy_dn && of_phy_is_fixed_link(port_dn)) {
- /* In the case of a fixed PHY, the DT node associated
- * to the fixed PHY is the Port DT node
- */
- ret = of_phy_register_fixed_link(port_dn);
- if (ret) {
- netdev_err(slave_dev, "failed to register fixed PHY: %d\n", ret);
- return ret;
- }
- phy_is_fixed = true;
- phy_dn = of_node_get(port_dn);
+ dp->pl = phylink_create(slave_dev, of_fwnode_handle(port_dn), mode,
+ &dsa_slave_phylink_mac_ops);
+ if (IS_ERR(dp->pl)) {
+ netdev_err(slave_dev,
+ "error creating PHYLINK: %ld\n", PTR_ERR(dp->pl));
+ return PTR_ERR(dp->pl);
}
+ /* Register only if the switch provides such a callback, since this
+ * callback takes precedence over polling the link GPIO in PHYLINK
+ * (see phylink_get_fixed_state).
+ */
+ if (ds->ops->phylink_fixed_state)
+ phylink_fixed_state_cb(dp->pl, dsa_slave_phylink_fixed_state);
+
if (ds->ops->get_phy_flags)
phy_flags = ds->ops->get_phy_flags(ds, dp->index);
- if (phy_dn) {
- slave_dev->phydev = of_phy_connect(slave_dev, phy_dn,
- dsa_slave_adjust_link,
- phy_flags,
- p->phy_interface);
- of_node_put(phy_dn);
- }
-
- if (slave_dev->phydev && phy_is_fixed)
- fixed_phy_set_link_update(slave_dev->phydev,
- dsa_slave_fixed_link_update);
-
- /* We could not connect to a designated PHY, so use the switch internal
- * MDIO bus instead
- */
- if (!slave_dev->phydev) {
+ ret = phylink_of_phy_connect(dp->pl, port_dn, phy_flags);
+ if (ret == -ENODEV) {
+ /* We could not connect to a designated PHY or SFP, so use the
+ * switch internal MDIO bus instead
+ */
ret = dsa_slave_phy_connect(slave_dev, dp->index);
if (ret) {
- netdev_err(slave_dev, "failed to connect to port %d: %d\n",
+ netdev_err(slave_dev,
+ "failed to connect to port %d: %d\n",
dp->index, ret);
- if (phy_is_fixed)
- of_phy_deregister_fixed_link(port_dn);
+ phylink_destroy(dp->pl);
return ret;
}
}
- phy_attached_info(slave_dev->phydev);
-
return 0;
}
@@ -1193,29 +1246,26 @@
int dsa_slave_suspend(struct net_device *slave_dev)
{
- struct dsa_slave_priv *p = netdev_priv(slave_dev);
+ struct dsa_port *dp = dsa_slave_to_port(slave_dev);
netif_device_detach(slave_dev);
- if (slave_dev->phydev) {
- phy_stop(slave_dev->phydev);
- p->old_pause = -1;
- p->old_link = -1;
- p->old_duplex = -1;
- phy_suspend(slave_dev->phydev);
- }
+ rtnl_lock();
+ phylink_stop(dp->pl);
+ rtnl_unlock();
return 0;
}
int dsa_slave_resume(struct net_device *slave_dev)
{
+ struct dsa_port *dp = dsa_slave_to_port(slave_dev);
+
netif_device_attach(slave_dev);
- if (slave_dev->phydev) {
- phy_resume(slave_dev->phydev);
- phy_start(slave_dev->phydev);
- }
+ rtnl_lock();
+ phylink_start(dp->pl);
+ rtnl_unlock();
return 0;
}
@@ -1280,11 +1330,6 @@
p->dp = port;
INIT_LIST_HEAD(&p->mall_tc_list);
p->xmit = cpu_dp->tag_ops->xmit;
-
- p->old_pause = -1;
- p->old_link = -1;
- p->old_duplex = -1;
-
port->slave = slave_dev;
netif_carrier_off(slave_dev);
@@ -1307,9 +1352,10 @@
return 0;
out_phy:
- phy_disconnect(slave_dev->phydev);
- if (of_phy_is_fixed_link(port->dn))
- of_phy_deregister_fixed_link(port->dn);
+ rtnl_lock();
+ phylink_disconnect_phy(p->dp->pl);
+ rtnl_unlock();
+ phylink_destroy(p->dp->pl);
out_free:
free_percpu(p->stats64);
free_netdev(slave_dev);
@@ -1321,17 +1367,15 @@
{
struct dsa_port *dp = dsa_slave_to_port(slave_dev);
struct dsa_slave_priv *p = netdev_priv(slave_dev);
- struct device_node *port_dn = dp->dn;
netif_carrier_off(slave_dev);
- if (slave_dev->phydev) {
- phy_disconnect(slave_dev->phydev);
+ rtnl_lock();
+ phylink_disconnect_phy(dp->pl);
+ rtnl_unlock();
- if (of_phy_is_fixed_link(port_dn))
- of_phy_deregister_fixed_link(port_dn);
- }
dsa_slave_notify(slave_dev, DSA_PORT_UNREGISTER);
unregister_netdev(slave_dev);
+ phylink_destroy(dp->pl);
free_percpu(p->stats64);
free_netdev(slave_dev);
}
@@ -1394,6 +1438,9 @@
switch (switchdev_work->event) {
case SWITCHDEV_FDB_ADD_TO_DEVICE:
fdb_info = &switchdev_work->fdb_info;
+ if (!fdb_info->added_by_user)
+ break;
+
err = dsa_port_fdb_add(dp, fdb_info->addr, fdb_info->vid);
if (err) {
netdev_dbg(dev, "fdb add failed err=%d\n", err);
@@ -1405,6 +1452,9 @@
case SWITCHDEV_FDB_DEL_TO_DEVICE:
fdb_info = &switchdev_work->fdb_info;
+ if (!fdb_info->added_by_user)
+ break;
+
err = dsa_port_fdb_del(dp, fdb_info->addr, fdb_info->vid);
if (err) {
netdev_dbg(dev, "fdb del failed err=%d\n", err);
@@ -1457,8 +1507,7 @@
switch (event) {
case SWITCHDEV_FDB_ADD_TO_DEVICE: /* fall through */
case SWITCHDEV_FDB_DEL_TO_DEVICE:
- if (dsa_slave_switchdev_fdb_work_init(switchdev_work,
- ptr))
+ if (dsa_slave_switchdev_fdb_work_init(switchdev_work, ptr))
goto err_fdb_work_init;
dev_hold(dev);
break;
diff --git a/net/ethernet/eth.c b/net/ethernet/eth.c
index eaeba9b..ee28440 100644
--- a/net/ethernet/eth.c
+++ b/net/ethernet/eth.c
@@ -128,15 +128,15 @@
{
const unsigned int flags = FLOW_DISSECTOR_F_PARSE_1ST_FRAG;
const struct ethhdr *eth = (const struct ethhdr *)data;
- struct flow_keys keys;
+ struct flow_keys_basic keys;
/* this should never happen, but better safe than sorry */
if (unlikely(len < sizeof(*eth)))
return len;
/* parse any remaining L2/L3 headers, check for L4 */
- if (!skb_flow_dissect_flow_keys_buf(&keys, data, eth->h_proto,
- sizeof(*eth), len, flags))
+ if (!skb_flow_dissect_flow_keys_basic(NULL, &keys, data, eth->h_proto,
+ sizeof(*eth), len, flags))
return max_t(u32, keys.control.thoff, sizeof(*eth));
/* parse for any L4 headers */
diff --git a/net/ipv4/Makefile b/net/ipv4/Makefile
index a07b7dd..eec9569 100644
--- a/net/ipv4/Makefile
+++ b/net/ipv4/Makefile
@@ -13,7 +13,10 @@
tcp_offload.o datagram.o raw.o udp.o udplite.o \
udp_offload.o arp.o icmp.o devinet.o af_inet.o igmp.o \
fib_frontend.o fib_semantics.o fib_trie.o fib_notifier.o \
- inet_fragment.o ping.o ip_tunnel_core.o gre_offload.o
+ inet_fragment.o ping.o ip_tunnel_core.o gre_offload.o \
+ metrics.o netlink.o
+
+obj-$(CONFIG_BPFILTER) += bpfilter/
obj-$(CONFIG_NET_IP_TUNNEL) += ip_tunnel.o
obj-$(CONFIG_SYSCTL) += sysctl_net_ipv4.o
diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index eaed036..b403499 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -994,7 +994,9 @@
.getsockopt = sock_common_getsockopt,
.sendmsg = inet_sendmsg,
.recvmsg = inet_recvmsg,
- .mmap = sock_no_mmap,
+#ifdef CONFIG_MMU
+ .mmap = tcp_mmap,
+#endif
.sendpage = inet_sendpage,
.splice_read = tcp_splice_read,
.read_sock = tcp_read_sock,
@@ -1006,6 +1008,7 @@
.compat_getsockopt = compat_sock_common_getsockopt,
.compat_ioctl = inet_compat_ioctl,
#endif
+ .set_rcvlowat = tcp_set_rcvlowat,
};
EXPORT_SYMBOL(inet_stream_ops);
diff --git a/net/ipv4/bpfilter/Makefile b/net/ipv4/bpfilter/Makefile
new file mode 100644
index 0000000..ce262d7
--- /dev/null
+++ b/net/ipv4/bpfilter/Makefile
@@ -0,0 +1,2 @@
+obj-$(CONFIG_BPFILTER) += sockopt.o
+
diff --git a/net/ipv4/bpfilter/sockopt.c b/net/ipv4/bpfilter/sockopt.c
new file mode 100644
index 0000000..42a96d2
--- /dev/null
+++ b/net/ipv4/bpfilter/sockopt.c
@@ -0,0 +1,42 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <linux/uaccess.h>
+#include <linux/bpfilter.h>
+#include <uapi/linux/bpf.h>
+#include <linux/wait.h>
+#include <linux/kmod.h>
+
+int (*bpfilter_process_sockopt)(struct sock *sk, int optname,
+ char __user *optval,
+ unsigned int optlen, bool is_set);
+EXPORT_SYMBOL_GPL(bpfilter_process_sockopt);
+
+int bpfilter_mbox_request(struct sock *sk, int optname, char __user *optval,
+ unsigned int optlen, bool is_set)
+{
+ if (!bpfilter_process_sockopt) {
+ int err = request_module("bpfilter");
+
+ if (err)
+ return err;
+ if (!bpfilter_process_sockopt)
+ return -ECHILD;
+ }
+ return bpfilter_process_sockopt(sk, optname, optval, optlen, is_set);
+}
+
+int bpfilter_ip_set_sockopt(struct sock *sk, int optname, char __user *optval,
+ unsigned int optlen)
+{
+ return bpfilter_mbox_request(sk, optname, optval, optlen, true);
+}
+
+int bpfilter_ip_get_sockopt(struct sock *sk, int optname, char __user *optval,
+ int __user *optlen)
+{
+ int len;
+
+ if (get_user(len, optlen))
+ return -EFAULT;
+
+ return bpfilter_mbox_request(sk, optname, optval, len, false);
+}
diff --git a/net/ipv4/fib_frontend.c b/net/ipv4/fib_frontend.c
index e66172a..b69e282 100644
--- a/net/ipv4/fib_frontend.c
+++ b/net/ipv4/fib_frontend.c
@@ -354,8 +354,6 @@
fl4.fl4_dport = 0;
}
- trace_fib_validate_source(dev, &fl4);
-
if (fib_lookup(net, &fl4, &res, 0))
goto last_resort;
if (res.type != RTN_UNICAST &&
@@ -650,6 +648,9 @@
[RTA_UID] = { .type = NLA_U32 },
[RTA_MARK] = { .type = NLA_U32 },
[RTA_TABLE] = { .type = NLA_U32 },
+ [RTA_IP_PROTO] = { .type = NLA_U8 },
+ [RTA_SPORT] = { .type = NLA_U16 },
+ [RTA_DPORT] = { .type = NLA_U16 },
};
static int rtm_to_fib_config(struct net *net, struct sk_buff *skb,
diff --git a/net/ipv4/fib_rules.c b/net/ipv4/fib_rules.c
index 737d11b..f8eb78d 100644
--- a/net/ipv4/fib_rules.c
+++ b/net/ipv4/fib_rules.c
@@ -213,14 +213,17 @@
static int fib4_rule_configure(struct fib_rule *rule, struct sk_buff *skb,
struct fib_rule_hdr *frh,
- struct nlattr **tb)
+ struct nlattr **tb,
+ struct netlink_ext_ack *extack)
{
struct net *net = sock_net(skb->sk);
int err = -EINVAL;
struct fib4_rule *rule4 = (struct fib4_rule *) rule;
- if (frh->tos & ~IPTOS_TOS_MASK)
+ if (frh->tos & ~IPTOS_TOS_MASK) {
+ NL_SET_ERR_MSG(extack, "Invalid tos");
goto errout;
+ }
/* split local/main if they are not already split */
err = fib_unmerge(net);
diff --git a/net/ipv4/fib_semantics.c b/net/ipv4/fib_semantics.c
index c27122f..6608db2 100644
--- a/net/ipv4/fib_semantics.c
+++ b/net/ipv4/fib_semantics.c
@@ -1019,47 +1019,8 @@
static int
fib_convert_metrics(struct fib_info *fi, const struct fib_config *cfg)
{
- bool ecn_ca = false;
- struct nlattr *nla;
- int remaining;
-
- if (!cfg->fc_mx)
- return 0;
-
- nla_for_each_attr(nla, cfg->fc_mx, cfg->fc_mx_len, remaining) {
- int type = nla_type(nla);
- u32 val;
-
- if (!type)
- continue;
- if (type > RTAX_MAX)
- return -EINVAL;
-
- if (type == RTAX_CC_ALGO) {
- char tmp[TCP_CA_NAME_MAX];
-
- nla_strlcpy(tmp, nla, sizeof(tmp));
- val = tcp_ca_get_key_by_name(fi->fib_net, tmp, &ecn_ca);
- if (val == TCP_CA_UNSPEC)
- return -EINVAL;
- } else {
- val = nla_get_u32(nla);
- }
- if (type == RTAX_ADVMSS && val > 65535 - 40)
- val = 65535 - 40;
- if (type == RTAX_MTU && val > 65535 - 15)
- val = 65535 - 15;
- if (type == RTAX_HOPLIMIT && val > 255)
- val = 255;
- if (type == RTAX_FEATURES && (val & ~RTAX_FEATURE_MASK))
- return -EINVAL;
- fi->fib_metrics->metrics[type - 1] = val;
- }
-
- if (ecn_ca)
- fi->fib_metrics->metrics[RTAX_FEATURES - 1] |= DST_FEATURE_ECN_CA;
-
- return 0;
+ return ip_metrics_convert(fi->fib_net, cfg->fc_mx, cfg->fc_mx_len,
+ fi->fib_metrics->metrics);
}
struct fib_info *fib_create_info(struct fib_config *cfg,
diff --git a/net/ipv4/fib_trie.c b/net/ipv4/fib_trie.c
index 3dcffd3..65c340f 100644
--- a/net/ipv4/fib_trie.c
+++ b/net/ipv4/fib_trie.c
@@ -1326,14 +1326,14 @@
unsigned long index;
t_key cindex;
- trace_fib_table_lookup(tb->tb_id, flp);
-
pn = t->kv;
cindex = 0;
n = get_child_rcu(pn, cindex);
- if (!n)
+ if (!n) {
+ trace_fib_table_lookup(tb->tb_id, flp, NULL, -EAGAIN);
return -EAGAIN;
+ }
#ifdef CONFIG_IP_FIB_TRIE_STATS
this_cpu_inc(stats->gets);
@@ -1416,8 +1416,11 @@
* nothing for us to do as we do not have any
* further nodes to parse.
*/
- if (IS_TRIE(pn))
+ if (IS_TRIE(pn)) {
+ trace_fib_table_lookup(tb->tb_id, flp,
+ NULL, -EAGAIN);
return -EAGAIN;
+ }
#ifdef CONFIG_IP_FIB_TRIE_STATS
this_cpu_inc(stats->backtrack);
#endif
@@ -1459,6 +1462,7 @@
#ifdef CONFIG_IP_FIB_TRIE_STATS
this_cpu_inc(stats->semantic_match_passed);
#endif
+ trace_fib_table_lookup(tb->tb_id, flp, NULL, err);
return err;
}
if (fi->fib_flags & RTNH_F_DEAD)
@@ -1494,7 +1498,7 @@
#ifdef CONFIG_IP_FIB_TRIE_STATS
this_cpu_inc(stats->semantic_match_passed);
#endif
- trace_fib_table_lookup_nh(nh);
+ trace_fib_table_lookup(tb->tb_id, flp, nh, err);
return err;
}
diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
index 881ac6d..33a88e0 100644
--- a/net/ipv4/inet_connection_sock.c
+++ b/net/ipv4/inet_connection_sock.c
@@ -27,11 +27,6 @@
#include <net/sock_reuseport.h>
#include <net/addrconf.h>
-#ifdef INET_CSK_DEBUG
-const char inet_csk_timer_bug_msg[] = "inet_csk BUG: unknown timer value\n";
-EXPORT_SYMBOL(inet_csk_timer_bug_msg);
-#endif
-
#if IS_ENABLED(CONFIG_IPV6)
/* match_wildcard == true: IPV6_ADDR_ANY equals to any IPv6 addresses if IPv6
* only, and any IPv4 addresses if not IPv6 only
diff --git a/net/ipv4/ip_gre.c b/net/ipv4/ip_gre.c
index f200b30..2d8efee 100644
--- a/net/ipv4/ip_gre.c
+++ b/net/ipv4/ip_gre.c
@@ -578,6 +578,8 @@
int tunnel_hlen;
int version;
__be16 df;
+ int nhoff;
+ int thoff;
tun_info = skb_tunnel_info(skb);
if (unlikely(!tun_info || !(tun_info->mode & IP_TUNNEL_INFO_TX) ||
@@ -605,6 +607,16 @@
truncate = true;
}
+ nhoff = skb_network_header(skb) - skb_mac_header(skb);
+ if (skb->protocol == htons(ETH_P_IP) &&
+ (ntohs(ip_hdr(skb)->tot_len) > skb->len - nhoff))
+ truncate = true;
+
+ thoff = skb_transport_header(skb) - skb_mac_header(skb);
+ if (skb->protocol == htons(ETH_P_IPV6) &&
+ (ntohs(ipv6_hdr(skb)->payload_len) > skb->len - thoff))
+ truncate = true;
+
if (version == 1) {
erspan_build_header(skb, ntohl(tunnel_id_to_key32(key->tun_id)),
ntohl(md->u.index), truncate, true);
diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index d54abc0..af5a830 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -878,11 +878,14 @@
struct rtable *rt = (struct rtable *)cork->dst;
unsigned int wmem_alloc_delta = 0;
u32 tskey = 0;
+ bool paged;
skb = skb_peek_tail(queue);
exthdrlen = !skb ? rt->dst.header_len : 0;
- mtu = cork->fragsize;
+ mtu = cork->gso_size ? IP_MAX_MTU : cork->fragsize;
+ paged = !!cork->gso_size;
+
if (cork->tx_flags & SKBTX_ANY_SW_TSTAMP &&
sk->sk_tsflags & SOF_TIMESTAMPING_OPT_ID)
tskey = sk->sk_tskey++;
@@ -906,8 +909,8 @@
if (transhdrlen &&
length + fragheaderlen <= mtu &&
rt->dst.dev->features & (NETIF_F_HW_CSUM | NETIF_F_IP_CSUM) &&
- !(flags & MSG_MORE) &&
- !exthdrlen)
+ (!(flags & MSG_MORE) || cork->gso_size) &&
+ (!exthdrlen || (rt->dst.dev->features & NETIF_F_HW_ESP_TX_CSUM)))
csummode = CHECKSUM_PARTIAL;
cork->length += length;
@@ -933,6 +936,7 @@
unsigned int fraglen;
unsigned int fraggap;
unsigned int alloclen;
+ unsigned int pagedlen = 0;
struct sk_buff *skb_prev;
alloc_new_skb:
skb_prev = skb;
@@ -953,8 +957,12 @@
if ((flags & MSG_MORE) &&
!(rt->dst.dev->features&NETIF_F_SG))
alloclen = mtu;
- else
+ else if (!paged)
alloclen = fraglen;
+ else {
+ alloclen = min_t(int, fraglen, MAX_HEADER);
+ pagedlen = fraglen - alloclen;
+ }
alloclen += exthdrlen;
@@ -998,7 +1006,7 @@
/*
* Find where to start putting bytes.
*/
- data = skb_put(skb, fraglen + exthdrlen);
+ data = skb_put(skb, fraglen + exthdrlen - pagedlen);
skb_set_network_header(skb, exthdrlen);
skb->transport_header = (skb->network_header +
fragheaderlen);
@@ -1014,7 +1022,7 @@
pskb_trim_unique(skb_prev, maxfraglen);
}
- copy = datalen - transhdrlen - fraggap;
+ copy = datalen - transhdrlen - fraggap - pagedlen;
if (copy > 0 && getfrag(from, data + transhdrlen, offset, copy, fraggap, skb) < 0) {
err = -EFAULT;
kfree_skb(skb);
@@ -1022,7 +1030,7 @@
}
offset += copy;
- length -= datalen - fraggap;
+ length -= copy + transhdrlen;
transhdrlen = 0;
exthdrlen = 0;
csummode = CHECKSUM_NONE;
@@ -1136,6 +1144,8 @@
*rtp = NULL;
cork->fragsize = ip_sk_use_pmtu(sk) ?
dst_mtu(&rt->dst) : rt->dst.dev->mtu;
+
+ cork->gso_size = sk->sk_type == SOCK_DGRAM ? ipc->gso_size : 0;
cork->dst = &rt->dst;
cork->length = 0;
cork->ttl = ipc->ttl;
@@ -1215,7 +1225,7 @@
return -EOPNOTSUPP;
hh_len = LL_RESERVED_SPACE(rt->dst.dev);
- mtu = cork->fragsize;
+ mtu = cork->gso_size ? IP_MAX_MTU : cork->fragsize;
fragheaderlen = sizeof(struct iphdr) + (opt ? opt->optlen : 0);
maxfraglen = ((mtu - fragheaderlen) & ~7) + fragheaderlen;
@@ -1471,9 +1481,8 @@
int len, int odd, struct sk_buff *skb),
void *from, int length, int transhdrlen,
struct ipcm_cookie *ipc, struct rtable **rtp,
- unsigned int flags)
+ struct inet_cork *cork, unsigned int flags)
{
- struct inet_cork cork;
struct sk_buff_head queue;
int err;
@@ -1482,22 +1491,22 @@
__skb_queue_head_init(&queue);
- cork.flags = 0;
- cork.addr = 0;
- cork.opt = NULL;
- err = ip_setup_cork(sk, &cork, ipc, rtp);
+ cork->flags = 0;
+ cork->addr = 0;
+ cork->opt = NULL;
+ err = ip_setup_cork(sk, cork, ipc, rtp);
if (err)
return ERR_PTR(err);
- err = __ip_append_data(sk, fl4, &queue, &cork,
+ err = __ip_append_data(sk, fl4, &queue, cork,
¤t->task_frag, getfrag,
from, length, transhdrlen, flags);
if (err) {
- __ip_flush_pending_frames(sk, &queue, &cork);
+ __ip_flush_pending_frames(sk, &queue, cork);
return ERR_PTR(err);
}
- return __ip_make_skb(sk, fl4, &queue, &cork);
+ return __ip_make_skb(sk, fl4, &queue, cork);
}
/*
@@ -1553,7 +1562,7 @@
oif = skb->skb_iif;
flowi4_init_output(&fl4, oif,
- IP4_REPLY_MARK(net, skb->mark),
+ IP4_REPLY_MARK(net, skb->mark) ?: sk->sk_mark,
RT_TOS(arg->tos),
RT_SCOPE_UNIVERSE, ip_hdr(skb)->protocol,
ip_reply_arg_flowi_flags(arg),
diff --git a/net/ipv4/ip_sockglue.c b/net/ipv4/ip_sockglue.c
index 57bbb06..fc32fdb 100644
--- a/net/ipv4/ip_sockglue.c
+++ b/net/ipv4/ip_sockglue.c
@@ -47,6 +47,8 @@
#include <linux/errqueue.h>
#include <linux/uaccess.h>
+#include <linux/bpfilter.h>
+
/*
* SOL_IP control messages.
*/
@@ -1242,6 +1244,11 @@
return -ENOPROTOOPT;
err = do_ip_setsockopt(sk, level, optname, optval, optlen);
+#ifdef CONFIG_BPFILTER
+ if (optname >= BPFILTER_IPT_SO_SET_REPLACE &&
+ optname < BPFILTER_IPT_SET_MAX)
+ err = bpfilter_ip_set_sockopt(sk, optname, optval, optlen);
+#endif
#ifdef CONFIG_NETFILTER
/* we need to exclude all possible ENOPROTOOPTs except default case */
if (err == -ENOPROTOOPT && optname != IP_HDRINCL &&
@@ -1550,6 +1557,11 @@
int err;
err = do_ip_getsockopt(sk, level, optname, optval, optlen, 0);
+#ifdef CONFIG_BPFILTER
+ if (optname >= BPFILTER_IPT_SO_GET_INFO &&
+ optname < BPFILTER_IPT_GET_MAX)
+ err = bpfilter_ip_get_sockopt(sk, optname, optval, optlen);
+#endif
#ifdef CONFIG_NETFILTER
/* we need to exclude all possible ENOPROTOOPTs except default case */
if (err == -ENOPROTOOPT && optname != IP_PKTOPTIONS &&
@@ -1582,6 +1594,11 @@
err = do_ip_getsockopt(sk, level, optname, optval, optlen,
MSG_CMSG_COMPAT);
+#ifdef CONFIG_BPFILTER
+ if (optname >= BPFILTER_IPT_SO_GET_INFO &&
+ optname < BPFILTER_IPT_GET_MAX)
+ err = bpfilter_ip_get_sockopt(sk, optname, optval, optlen);
+#endif
#ifdef CONFIG_NETFILTER
/* we need to exclude all possible ENOPROTOOPTs except default case */
if (err == -ENOPROTOOPT && optname != IP_PKTOPTIONS &&
diff --git a/net/ipv4/ip_tunnel_core.c b/net/ipv4/ip_tunnel_core.c
index 2f39479..dde671e 100644
--- a/net/ipv4/ip_tunnel_core.c
+++ b/net/ipv4/ip_tunnel_core.c
@@ -423,17 +423,17 @@
lwtunnel_encap_add_ops(&ip6_tun_lwt_ops, LWTUNNEL_ENCAP_IP6);
}
-struct static_key ip_tunnel_metadata_cnt = STATIC_KEY_INIT_FALSE;
+DEFINE_STATIC_KEY_FALSE(ip_tunnel_metadata_cnt);
EXPORT_SYMBOL(ip_tunnel_metadata_cnt);
void ip_tunnel_need_metadata(void)
{
- static_key_slow_inc(&ip_tunnel_metadata_cnt);
+ static_branch_inc(&ip_tunnel_metadata_cnt);
}
EXPORT_SYMBOL_GPL(ip_tunnel_need_metadata);
void ip_tunnel_unneed_metadata(void)
{
- static_key_slow_dec(&ip_tunnel_metadata_cnt);
+ static_branch_dec(&ip_tunnel_metadata_cnt);
}
EXPORT_SYMBOL_GPL(ip_tunnel_unneed_metadata);
diff --git a/net/ipv4/ipconfig.c b/net/ipv4/ipconfig.c
index 43f620f..86c9f75 100644
--- a/net/ipv4/ipconfig.c
+++ b/net/ipv4/ipconfig.c
@@ -28,6 +28,9 @@
*
* Multiple Nameservers in /proc/net/pnp
* -- Josef Siemes <jsiemes@web.de>, Aug 2002
+ *
+ * NTP servers in /proc/net/ipconfig/ntp_servers
+ * -- Chris Novakovic <chris@chrisn.me.uk>, April 2018
*/
#include <linux/types.h>
@@ -93,6 +96,7 @@
#define CONF_TIMEOUT_MAX (HZ*30) /* Maximum allowed timeout */
#define CONF_NAMESERVERS_MAX 3 /* Maximum number of nameservers
- '3' from resolv.h */
+#define CONF_NTP_SERVERS_MAX 3 /* Maximum number of NTP servers */
#define NONE cpu_to_be32(INADDR_NONE)
#define ANY cpu_to_be32(INADDR_ANY)
@@ -152,6 +156,7 @@
#define ic_proto_used 0
#endif
static __be32 ic_nameservers[CONF_NAMESERVERS_MAX]; /* DNS Server IP addresses */
+static __be32 ic_ntp_servers[CONF_NTP_SERVERS_MAX]; /* NTP server IP addresses */
static u8 ic_domain[64]; /* DNS (not NIS) domain name */
/*
@@ -576,6 +581,15 @@
ic_nameservers[i] = NONE;
}
+/* Predefine NTP servers */
+static inline void __init ic_ntp_servers_predef(void)
+{
+ int i;
+
+ for (i = 0; i < CONF_NTP_SERVERS_MAX; i++)
+ ic_ntp_servers[i] = NONE;
+}
+
/*
* DHCP/BOOTP support.
*/
@@ -671,6 +685,7 @@
17, /* Boot path */
26, /* MTU */
40, /* NIS domain name */
+ 42, /* NTP servers */
};
*e++ = 55; /* Parameter request list */
@@ -721,9 +736,11 @@
*e++ = 3; /* Default gateway request */
*e++ = 4;
e += 4;
- *e++ = 5; /* Name server request */
- *e++ = 8;
- e += 8;
+#if CONF_NAMESERVERS_MAX > 0
+ *e++ = 6; /* (DNS) name server request */
+ *e++ = 4 * CONF_NAMESERVERS_MAX;
+ e += 4 * CONF_NAMESERVERS_MAX;
+#endif
*e++ = 12; /* Host name request */
*e++ = 32;
e += 32;
@@ -748,7 +765,13 @@
*/
static inline void __init ic_bootp_init(void)
{
+ /* Re-initialise all name servers and NTP servers to NONE, in case any
+ * were set via the "ip=" or "nfsaddrs=" kernel command line parameters:
+ * any IP addresses specified there will already have been decoded but
+ * are no longer needed
+ */
ic_nameservers_predef();
+ ic_ntp_servers_predef();
dev_add_pack(&bootp_packet_type);
}
@@ -912,6 +935,15 @@
ic_bootp_string(utsname()->domainname, ext+1, *ext,
__NEW_UTS_LEN);
break;
+ case 42: /* NTP servers */
+ servers = *ext / 4;
+ if (servers > CONF_NTP_SERVERS_MAX)
+ servers = CONF_NTP_SERVERS_MAX;
+ for (i = 0; i < servers; i++) {
+ if (ic_ntp_servers[i] == NONE)
+ memcpy(&ic_ntp_servers[i], ext+1+4*i, 4);
+ }
+ break;
}
}
@@ -1257,7 +1289,10 @@
#endif /* IPCONFIG_DYNAMIC */
#ifdef CONFIG_PROC_FS
+/* proc_dir_entry for /proc/net/ipconfig */
+static struct proc_dir_entry *ipconfig_dir;
+/* Name servers: */
static int pnp_seq_show(struct seq_file *seq, void *v)
{
int i;
@@ -1294,6 +1329,62 @@
.llseek = seq_lseek,
.release = single_release,
};
+
+/* Create the /proc/net/ipconfig directory */
+static int __init ipconfig_proc_net_init(void)
+{
+ ipconfig_dir = proc_net_mkdir(&init_net, "ipconfig", init_net.proc_net);
+ if (!ipconfig_dir)
+ return -ENOMEM;
+
+ return 0;
+}
+
+/* Create a new file under /proc/net/ipconfig */
+static int ipconfig_proc_net_create(const char *name,
+ const struct file_operations *fops)
+{
+ char *pname;
+ struct proc_dir_entry *p;
+
+ if (!ipconfig_dir)
+ return -ENOMEM;
+
+ pname = kasprintf(GFP_KERNEL, "%s%s", "ipconfig/", name);
+ if (!pname)
+ return -ENOMEM;
+
+ p = proc_create(pname, 0444, init_net.proc_net, fops);
+ kfree(pname);
+ if (!p)
+ return -ENOMEM;
+
+ return 0;
+}
+
+/* Write NTP server IP addresses to /proc/net/ipconfig/ntp_servers */
+static int ntp_servers_seq_show(struct seq_file *seq, void *v)
+{
+ int i;
+
+ for (i = 0; i < CONF_NTP_SERVERS_MAX; i++) {
+ if (ic_ntp_servers[i] != NONE)
+ seq_printf(seq, "%pI4\n", &ic_ntp_servers[i]);
+ }
+ return 0;
+}
+
+static int ntp_servers_seq_open(struct inode *inode, struct file *file)
+{
+ return single_open(file, ntp_servers_seq_show, NULL);
+}
+
+static const struct file_operations ntp_servers_seq_fops = {
+ .open = ntp_servers_seq_open,
+ .read = seq_read,
+ .llseek = seq_lseek,
+ .release = single_release,
+};
#endif /* CONFIG_PROC_FS */
/*
@@ -1368,8 +1459,20 @@
int err;
unsigned int i;
+ /* Initialise all name servers and NTP servers to NONE (but only if the
+ * "ip=" or "nfsaddrs=" kernel command line parameters weren't decoded,
+ * otherwise we'll overwrite the IP addresses specified there)
+ */
+ if (ic_set_manually == 0) {
+ ic_nameservers_predef();
+ ic_ntp_servers_predef();
+ }
+
#ifdef CONFIG_PROC_FS
proc_create("pnp", 0444, init_net.proc_net, &pnp_seq_fops);
+
+ if (ipconfig_proc_net_init() == 0)
+ ipconfig_proc_net_create("ntp_servers", &ntp_servers_seq_fops);
#endif /* CONFIG_PROC_FS */
if (!ic_enable)
@@ -1481,16 +1584,32 @@
&ic_servaddr, &root_server_addr, root_server_path);
if (ic_dev_mtu)
pr_cont(", mtu=%d", ic_dev_mtu);
- for (i = 0; i < CONF_NAMESERVERS_MAX; i++)
+ /* Name servers (if any): */
+ for (i = 0; i < CONF_NAMESERVERS_MAX; i++) {
if (ic_nameservers[i] != NONE) {
- pr_cont(" nameserver%u=%pI4",
- i, &ic_nameservers[i]);
- break;
+ if (i == 0)
+ pr_info(" nameserver%u=%pI4",
+ i, &ic_nameservers[i]);
+ else
+ pr_cont(", nameserver%u=%pI4",
+ i, &ic_nameservers[i]);
}
- for (i++; i < CONF_NAMESERVERS_MAX; i++)
- if (ic_nameservers[i] != NONE)
- pr_cont(", nameserver%u=%pI4", i, &ic_nameservers[i]);
- pr_cont("\n");
+ if (i + 1 == CONF_NAMESERVERS_MAX)
+ pr_cont("\n");
+ }
+ /* NTP servers (if any): */
+ for (i = 0; i < CONF_NTP_SERVERS_MAX; i++) {
+ if (ic_ntp_servers[i] != NONE) {
+ if (i == 0)
+ pr_info(" ntpserver%u=%pI4",
+ i, &ic_ntp_servers[i]);
+ else
+ pr_cont(", ntpserver%u=%pI4",
+ i, &ic_ntp_servers[i]);
+ }
+ if (i + 1 == CONF_NTP_SERVERS_MAX)
+ pr_cont("\n");
+ }
#endif /* !SILENT */
/*
@@ -1588,7 +1707,9 @@
return 1;
}
+ /* Initialise all name servers and NTP servers to NONE */
ic_nameservers_predef();
+ ic_ntp_servers_predef();
/* Parse string for static IP assignment. */
ip = addrs;
@@ -1647,6 +1768,13 @@
ic_nameservers[1] = NONE;
}
break;
+ case 9:
+ if (CONF_NTP_SERVERS_MAX >= 1) {
+ ic_ntp_servers[0] = in_aton(ip);
+ if (ic_ntp_servers[0] == ANY)
+ ic_ntp_servers[0] = NONE;
+ }
+ break;
}
}
ip = cp;
diff --git a/net/ipv4/ipmr.c b/net/ipv4/ipmr.c
index 2fb4de3..38e092e 100644
--- a/net/ipv4/ipmr.c
+++ b/net/ipv4/ipmr.c
@@ -201,7 +201,8 @@
};
static int ipmr_rule_configure(struct fib_rule *rule, struct sk_buff *skb,
- struct fib_rule_hdr *frh, struct nlattr **tb)
+ struct fib_rule_hdr *frh, struct nlattr **tb,
+ struct netlink_ext_ack *extack)
{
return 0;
}
diff --git a/net/ipv4/metrics.c b/net/ipv4/metrics.c
new file mode 100644
index 0000000..5121c64
--- /dev/null
+++ b/net/ipv4/metrics.c
@@ -0,0 +1,53 @@
+#include <linux/netlink.h>
+#include <linux/rtnetlink.h>
+#include <linux/types.h>
+#include <net/ip.h>
+#include <net/net_namespace.h>
+#include <net/tcp.h>
+
+int ip_metrics_convert(struct net *net, struct nlattr *fc_mx, int fc_mx_len,
+ u32 *metrics)
+{
+ bool ecn_ca = false;
+ struct nlattr *nla;
+ int remaining;
+
+ if (!fc_mx)
+ return 0;
+
+ nla_for_each_attr(nla, fc_mx, fc_mx_len, remaining) {
+ int type = nla_type(nla);
+ u32 val;
+
+ if (!type)
+ continue;
+ if (type > RTAX_MAX)
+ return -EINVAL;
+
+ if (type == RTAX_CC_ALGO) {
+ char tmp[TCP_CA_NAME_MAX];
+
+ nla_strlcpy(tmp, nla, sizeof(tmp));
+ val = tcp_ca_get_key_by_name(net, tmp, &ecn_ca);
+ if (val == TCP_CA_UNSPEC)
+ return -EINVAL;
+ } else {
+ val = nla_get_u32(nla);
+ }
+ if (type == RTAX_ADVMSS && val > 65535 - 40)
+ val = 65535 - 40;
+ if (type == RTAX_MTU && val > 65535 - 15)
+ val = 65535 - 15;
+ if (type == RTAX_HOPLIMIT && val > 255)
+ val = 255;
+ if (type == RTAX_FEATURES && (val & ~RTAX_FEATURE_MASK))
+ return -EINVAL;
+ metrics[type - 1] = val;
+ }
+
+ if (ecn_ca)
+ metrics[RTAX_FEATURES - 1] |= DST_FEATURE_ECN_CA;
+
+ return 0;
+}
+EXPORT_SYMBOL_GPL(ip_metrics_convert);
diff --git a/net/ipv4/netfilter/ip_tables.c b/net/ipv4/netfilter/ip_tables.c
index e85f35b..38ab97b 100644
--- a/net/ipv4/netfilter/ip_tables.c
+++ b/net/ipv4/netfilter/ip_tables.c
@@ -301,7 +301,7 @@
counter = xt_get_this_cpu_counter(&e->counters);
ADD_COUNTER(*counter, skb->len, 1);
- t = ipt_get_target(e);
+ t = ipt_get_target_c(e);
WARN_ON(!t->u.kernel.target);
#if IS_ENABLED(CONFIG_NETFILTER_XT_TARGET_TRACE)
@@ -1783,6 +1783,8 @@
/* set res now, will see skbs right after nf_register_net_hooks */
WRITE_ONCE(*res, new_table);
+ if (!ops)
+ return 0;
ret = nf_register_net_hooks(net, ops, hweight32(table->valid_hooks));
if (ret != 0) {
@@ -1800,7 +1802,8 @@
void ipt_unregister_table(struct net *net, struct xt_table *table,
const struct nf_hook_ops *ops)
{
- nf_unregister_net_hooks(net, ops, hweight32(table->valid_hooks));
+ if (ops)
+ nf_unregister_net_hooks(net, ops, hweight32(table->valid_hooks));
__ipt_unregister_table(net, table);
}
diff --git a/net/ipv4/netfilter/ipt_MASQUERADE.c b/net/ipv4/netfilter/ipt_MASQUERADE.c
index a03e4e7..ce1512b 100644
--- a/net/ipv4/netfilter/ipt_MASQUERADE.c
+++ b/net/ipv4/netfilter/ipt_MASQUERADE.c
@@ -47,7 +47,7 @@
static unsigned int
masquerade_tg(struct sk_buff *skb, const struct xt_action_param *par)
{
- struct nf_nat_range range;
+ struct nf_nat_range2 range;
const struct nf_nat_ipv4_multi_range_compat *mr;
mr = par->targinfo;
diff --git a/net/ipv4/netfilter/iptable_nat.c b/net/ipv4/netfilter/iptable_nat.c
index 0f7255c..a317445 100644
--- a/net/ipv4/netfilter/iptable_nat.c
+++ b/net/ipv4/netfilter/iptable_nat.c
@@ -33,75 +33,63 @@
static unsigned int iptable_nat_do_chain(void *priv,
struct sk_buff *skb,
- const struct nf_hook_state *state,
- struct nf_conn *ct)
+ const struct nf_hook_state *state)
{
return ipt_do_table(skb, state, state->net->ipv4.nat_table);
}
-static unsigned int iptable_nat_ipv4_fn(void *priv,
- struct sk_buff *skb,
- const struct nf_hook_state *state)
-{
- return nf_nat_ipv4_fn(priv, skb, state, iptable_nat_do_chain);
-}
-
-static unsigned int iptable_nat_ipv4_in(void *priv,
- struct sk_buff *skb,
- const struct nf_hook_state *state)
-{
- return nf_nat_ipv4_in(priv, skb, state, iptable_nat_do_chain);
-}
-
-static unsigned int iptable_nat_ipv4_out(void *priv,
- struct sk_buff *skb,
- const struct nf_hook_state *state)
-{
- return nf_nat_ipv4_out(priv, skb, state, iptable_nat_do_chain);
-}
-
-static unsigned int iptable_nat_ipv4_local_fn(void *priv,
- struct sk_buff *skb,
- const struct nf_hook_state *state)
-{
- return nf_nat_ipv4_local_fn(priv, skb, state, iptable_nat_do_chain);
-}
-
static const struct nf_hook_ops nf_nat_ipv4_ops[] = {
- /* Before packet filtering, change destination */
{
- .hook = iptable_nat_ipv4_in,
+ .hook = iptable_nat_do_chain,
.pf = NFPROTO_IPV4,
- .nat_hook = true,
.hooknum = NF_INET_PRE_ROUTING,
.priority = NF_IP_PRI_NAT_DST,
},
- /* After packet filtering, change source */
{
- .hook = iptable_nat_ipv4_out,
+ .hook = iptable_nat_do_chain,
.pf = NFPROTO_IPV4,
- .nat_hook = true,
.hooknum = NF_INET_POST_ROUTING,
.priority = NF_IP_PRI_NAT_SRC,
},
- /* Before packet filtering, change destination */
{
- .hook = iptable_nat_ipv4_local_fn,
+ .hook = iptable_nat_do_chain,
.pf = NFPROTO_IPV4,
- .nat_hook = true,
.hooknum = NF_INET_LOCAL_OUT,
.priority = NF_IP_PRI_NAT_DST,
},
- /* After packet filtering, change source */
{
- .hook = iptable_nat_ipv4_fn,
+ .hook = iptable_nat_do_chain,
.pf = NFPROTO_IPV4,
- .nat_hook = true,
.hooknum = NF_INET_LOCAL_IN,
.priority = NF_IP_PRI_NAT_SRC,
},
};
+static int ipt_nat_register_lookups(struct net *net)
+{
+ int i, ret;
+
+ for (i = 0; i < ARRAY_SIZE(nf_nat_ipv4_ops); i++) {
+ ret = nf_nat_l3proto_ipv4_register_fn(net, &nf_nat_ipv4_ops[i]);
+ if (ret) {
+ while (i)
+ nf_nat_l3proto_ipv4_unregister_fn(net, &nf_nat_ipv4_ops[--i]);
+
+ return ret;
+ }
+ }
+
+ return 0;
+}
+
+static void ipt_nat_unregister_lookups(struct net *net)
+{
+ int i;
+
+ for (i = 0; i < ARRAY_SIZE(nf_nat_ipv4_ops); i++)
+ nf_nat_l3proto_ipv4_unregister_fn(net, &nf_nat_ipv4_ops[i]);
+}
+
static int __net_init iptable_nat_table_init(struct net *net)
{
struct ipt_replace *repl;
@@ -114,7 +102,18 @@
if (repl == NULL)
return -ENOMEM;
ret = ipt_register_table(net, &nf_nat_ipv4_table, repl,
- nf_nat_ipv4_ops, &net->ipv4.nat_table);
+ NULL, &net->ipv4.nat_table);
+ if (ret < 0) {
+ kfree(repl);
+ return ret;
+ }
+
+ ret = ipt_nat_register_lookups(net);
+ if (ret < 0) {
+ ipt_unregister_table(net, net->ipv4.nat_table, NULL);
+ net->ipv4.nat_table = NULL;
+ }
+
kfree(repl);
return ret;
}
@@ -123,7 +122,8 @@
{
if (!net->ipv4.nat_table)
return;
- ipt_unregister_table(net, net->ipv4.nat_table, nf_nat_ipv4_ops);
+ ipt_nat_unregister_lookups(net);
+ ipt_unregister_table(net, net->ipv4.nat_table, NULL);
net->ipv4.nat_table = NULL;
}
diff --git a/net/ipv4/netfilter/nf_flow_table_ipv4.c b/net/ipv4/netfilter/nf_flow_table_ipv4.c
index 0cd46bf..e1e56d7 100644
--- a/net/ipv4/netfilter/nf_flow_table_ipv4.c
+++ b/net/ipv4/netfilter/nf_flow_table_ipv4.c
@@ -2,265 +2,12 @@
#include <linux/init.h>
#include <linux/module.h>
#include <linux/netfilter.h>
-#include <linux/rhashtable.h>
-#include <linux/ip.h>
-#include <linux/netdevice.h>
-#include <net/ip.h>
-#include <net/neighbour.h>
#include <net/netfilter/nf_flow_table.h>
#include <net/netfilter/nf_tables.h>
-/* For layer 4 checksum field offset. */
-#include <linux/tcp.h>
-#include <linux/udp.h>
-
-static int nf_flow_nat_ip_tcp(struct sk_buff *skb, unsigned int thoff,
- __be32 addr, __be32 new_addr)
-{
- struct tcphdr *tcph;
-
- if (!pskb_may_pull(skb, thoff + sizeof(*tcph)) ||
- skb_try_make_writable(skb, thoff + sizeof(*tcph)))
- return -1;
-
- tcph = (void *)(skb_network_header(skb) + thoff);
- inet_proto_csum_replace4(&tcph->check, skb, addr, new_addr, true);
-
- return 0;
-}
-
-static int nf_flow_nat_ip_udp(struct sk_buff *skb, unsigned int thoff,
- __be32 addr, __be32 new_addr)
-{
- struct udphdr *udph;
-
- if (!pskb_may_pull(skb, thoff + sizeof(*udph)) ||
- skb_try_make_writable(skb, thoff + sizeof(*udph)))
- return -1;
-
- udph = (void *)(skb_network_header(skb) + thoff);
- if (udph->check || skb->ip_summed == CHECKSUM_PARTIAL) {
- inet_proto_csum_replace4(&udph->check, skb, addr,
- new_addr, true);
- if (!udph->check)
- udph->check = CSUM_MANGLED_0;
- }
-
- return 0;
-}
-
-static int nf_flow_nat_ip_l4proto(struct sk_buff *skb, struct iphdr *iph,
- unsigned int thoff, __be32 addr,
- __be32 new_addr)
-{
- switch (iph->protocol) {
- case IPPROTO_TCP:
- if (nf_flow_nat_ip_tcp(skb, thoff, addr, new_addr) < 0)
- return NF_DROP;
- break;
- case IPPROTO_UDP:
- if (nf_flow_nat_ip_udp(skb, thoff, addr, new_addr) < 0)
- return NF_DROP;
- break;
- }
-
- return 0;
-}
-
-static int nf_flow_snat_ip(const struct flow_offload *flow, struct sk_buff *skb,
- struct iphdr *iph, unsigned int thoff,
- enum flow_offload_tuple_dir dir)
-{
- __be32 addr, new_addr;
-
- switch (dir) {
- case FLOW_OFFLOAD_DIR_ORIGINAL:
- addr = iph->saddr;
- new_addr = flow->tuplehash[FLOW_OFFLOAD_DIR_REPLY].tuple.dst_v4.s_addr;
- iph->saddr = new_addr;
- break;
- case FLOW_OFFLOAD_DIR_REPLY:
- addr = iph->daddr;
- new_addr = flow->tuplehash[FLOW_OFFLOAD_DIR_ORIGINAL].tuple.src_v4.s_addr;
- iph->daddr = new_addr;
- break;
- default:
- return -1;
- }
- csum_replace4(&iph->check, addr, new_addr);
-
- return nf_flow_nat_ip_l4proto(skb, iph, thoff, addr, new_addr);
-}
-
-static int nf_flow_dnat_ip(const struct flow_offload *flow, struct sk_buff *skb,
- struct iphdr *iph, unsigned int thoff,
- enum flow_offload_tuple_dir dir)
-{
- __be32 addr, new_addr;
-
- switch (dir) {
- case FLOW_OFFLOAD_DIR_ORIGINAL:
- addr = iph->daddr;
- new_addr = flow->tuplehash[FLOW_OFFLOAD_DIR_REPLY].tuple.src_v4.s_addr;
- iph->daddr = new_addr;
- break;
- case FLOW_OFFLOAD_DIR_REPLY:
- addr = iph->saddr;
- new_addr = flow->tuplehash[FLOW_OFFLOAD_DIR_ORIGINAL].tuple.dst_v4.s_addr;
- iph->saddr = new_addr;
- break;
- default:
- return -1;
- }
- csum_replace4(&iph->check, addr, new_addr);
-
- return nf_flow_nat_ip_l4proto(skb, iph, thoff, addr, new_addr);
-}
-
-static int nf_flow_nat_ip(const struct flow_offload *flow, struct sk_buff *skb,
- enum flow_offload_tuple_dir dir)
-{
- struct iphdr *iph = ip_hdr(skb);
- unsigned int thoff = iph->ihl * 4;
-
- if (flow->flags & FLOW_OFFLOAD_SNAT &&
- (nf_flow_snat_port(flow, skb, thoff, iph->protocol, dir) < 0 ||
- nf_flow_snat_ip(flow, skb, iph, thoff, dir) < 0))
- return -1;
- if (flow->flags & FLOW_OFFLOAD_DNAT &&
- (nf_flow_dnat_port(flow, skb, thoff, iph->protocol, dir) < 0 ||
- nf_flow_dnat_ip(flow, skb, iph, thoff, dir) < 0))
- return -1;
-
- return 0;
-}
-
-static bool ip_has_options(unsigned int thoff)
-{
- return thoff != sizeof(struct iphdr);
-}
-
-static int nf_flow_tuple_ip(struct sk_buff *skb, const struct net_device *dev,
- struct flow_offload_tuple *tuple)
-{
- struct flow_ports *ports;
- unsigned int thoff;
- struct iphdr *iph;
-
- if (!pskb_may_pull(skb, sizeof(*iph)))
- return -1;
-
- iph = ip_hdr(skb);
- thoff = iph->ihl * 4;
-
- if (ip_is_fragment(iph) ||
- unlikely(ip_has_options(thoff)))
- return -1;
-
- if (iph->protocol != IPPROTO_TCP &&
- iph->protocol != IPPROTO_UDP)
- return -1;
-
- thoff = iph->ihl * 4;
- if (!pskb_may_pull(skb, thoff + sizeof(*ports)))
- return -1;
-
- ports = (struct flow_ports *)(skb_network_header(skb) + thoff);
-
- tuple->src_v4.s_addr = iph->saddr;
- tuple->dst_v4.s_addr = iph->daddr;
- tuple->src_port = ports->source;
- tuple->dst_port = ports->dest;
- tuple->l3proto = AF_INET;
- tuple->l4proto = iph->protocol;
- tuple->iifidx = dev->ifindex;
-
- return 0;
-}
-
-/* Based on ip_exceeds_mtu(). */
-static bool __nf_flow_exceeds_mtu(const struct sk_buff *skb, unsigned int mtu)
-{
- if (skb->len <= mtu)
- return false;
-
- if ((ip_hdr(skb)->frag_off & htons(IP_DF)) == 0)
- return false;
-
- if (skb_is_gso(skb) && skb_gso_validate_network_len(skb, mtu))
- return false;
-
- return true;
-}
-
-static bool nf_flow_exceeds_mtu(struct sk_buff *skb, const struct rtable *rt)
-{
- u32 mtu;
-
- mtu = ip_dst_mtu_maybe_forward(&rt->dst, true);
- if (__nf_flow_exceeds_mtu(skb, mtu))
- return true;
-
- return false;
-}
-
-unsigned int
-nf_flow_offload_ip_hook(void *priv, struct sk_buff *skb,
- const struct nf_hook_state *state)
-{
- struct flow_offload_tuple_rhash *tuplehash;
- struct nf_flowtable *flow_table = priv;
- struct flow_offload_tuple tuple = {};
- enum flow_offload_tuple_dir dir;
- struct flow_offload *flow;
- struct net_device *outdev;
- const struct rtable *rt;
- struct iphdr *iph;
- __be32 nexthop;
-
- if (skb->protocol != htons(ETH_P_IP))
- return NF_ACCEPT;
-
- if (nf_flow_tuple_ip(skb, state->in, &tuple) < 0)
- return NF_ACCEPT;
-
- tuplehash = flow_offload_lookup(flow_table, &tuple);
- if (tuplehash == NULL)
- return NF_ACCEPT;
-
- outdev = dev_get_by_index_rcu(state->net, tuplehash->tuple.oifidx);
- if (!outdev)
- return NF_ACCEPT;
-
- dir = tuplehash->tuple.dir;
- flow = container_of(tuplehash, struct flow_offload, tuplehash[dir]);
-
- rt = (const struct rtable *)flow->tuplehash[dir].tuple.dst_cache;
- if (unlikely(nf_flow_exceeds_mtu(skb, rt)))
- return NF_ACCEPT;
-
- if (skb_try_make_writable(skb, sizeof(*iph)))
- return NF_DROP;
-
- if (flow->flags & (FLOW_OFFLOAD_SNAT | FLOW_OFFLOAD_DNAT) &&
- nf_flow_nat_ip(flow, skb, dir) < 0)
- return NF_DROP;
-
- flow->timeout = (u32)jiffies + NF_FLOW_TIMEOUT;
- iph = ip_hdr(skb);
- ip_decrease_ttl(iph);
-
- skb->dev = outdev;
- nexthop = rt_nexthop(rt, flow->tuplehash[!dir].tuple.src_v4.s_addr);
- neigh_xmit(NEIGH_ARP_TABLE, outdev, &nexthop, skb);
-
- return NF_STOLEN;
-}
-EXPORT_SYMBOL_GPL(nf_flow_offload_ip_hook);
static struct nf_flowtable_type flowtable_ipv4 = {
.family = NFPROTO_IPV4,
- .params = &nf_flow_offload_rhash_params,
- .gc = nf_flow_offload_work_gc,
+ .init = nf_flow_table_init,
.free = nf_flow_table_free,
.hook = nf_flow_offload_ip_hook,
.owner = THIS_MODULE,
diff --git a/net/ipv4/netfilter/nf_nat_h323.c b/net/ipv4/netfilter/nf_nat_h323.c
index ac8342d..4e6b53a 100644
--- a/net/ipv4/netfilter/nf_nat_h323.c
+++ b/net/ipv4/netfilter/nf_nat_h323.c
@@ -395,7 +395,7 @@
static void ip_nat_q931_expect(struct nf_conn *new,
struct nf_conntrack_expect *this)
{
- struct nf_nat_range range;
+ struct nf_nat_range2 range;
if (this->tuple.src.u3.ip != 0) { /* Only accept calls from GK */
nf_nat_follow_master(new, this);
@@ -497,7 +497,7 @@
static void ip_nat_callforwarding_expect(struct nf_conn *new,
struct nf_conntrack_expect *this)
{
- struct nf_nat_range range;
+ struct nf_nat_range2 range;
/* This must be a fresh one. */
BUG_ON(new->status & IPS_NAT_DONE_MASK);
diff --git a/net/ipv4/netfilter/nf_nat_l3proto_ipv4.c b/net/ipv4/netfilter/nf_nat_l3proto_ipv4.c
index f7ff6a36..6115bf1 100644
--- a/net/ipv4/netfilter/nf_nat_l3proto_ipv4.c
+++ b/net/ipv4/netfilter/nf_nat_l3proto_ipv4.c
@@ -63,7 +63,7 @@
#endif /* CONFIG_XFRM */
static bool nf_nat_ipv4_in_range(const struct nf_conntrack_tuple *t,
- const struct nf_nat_range *range)
+ const struct nf_nat_range2 *range)
{
return ntohl(t->src.u3.ip) >= ntohl(range->min_addr.ip) &&
ntohl(t->src.u3.ip) <= ntohl(range->max_addr.ip);
@@ -143,7 +143,7 @@
#if IS_ENABLED(CONFIG_NF_CT_NETLINK)
static int nf_nat_ipv4_nlattr_to_range(struct nlattr *tb[],
- struct nf_nat_range *range)
+ struct nf_nat_range2 *range)
{
if (tb[CTA_NAT_V4_MINIP]) {
range->min_addr.ip = nla_get_be32(tb[CTA_NAT_V4_MINIP]);
@@ -241,34 +241,18 @@
}
EXPORT_SYMBOL_GPL(nf_nat_icmp_reply_translation);
-unsigned int
+static unsigned int
nf_nat_ipv4_fn(void *priv, struct sk_buff *skb,
- const struct nf_hook_state *state,
- unsigned int (*do_chain)(void *priv,
- struct sk_buff *skb,
- const struct nf_hook_state *state,
- struct nf_conn *ct))
+ const struct nf_hook_state *state)
{
struct nf_conn *ct;
enum ip_conntrack_info ctinfo;
- struct nf_conn_nat *nat;
- /* maniptype == SRC for postrouting. */
- enum nf_nat_manip_type maniptype = HOOK2MANIP(state->hook);
ct = nf_ct_get(skb, &ctinfo);
- /* Can't track? It's not due to stress, or conntrack would
- * have dropped it. Hence it's the user's responsibilty to
- * packet filter it out, or implement conntrack/NAT for that
- * protocol. 8) --RR
- */
if (!ct)
return NF_ACCEPT;
- nat = nfct_nat(ct);
-
- switch (ctinfo) {
- case IP_CT_RELATED:
- case IP_CT_RELATED_REPLY:
+ if (ctinfo == IP_CT_RELATED || ctinfo == IP_CT_RELATED_REPLY) {
if (ip_hdr(skb)->protocol == IPPROTO_ICMP) {
if (!nf_nat_icmp_reply_translation(skb, ct, ctinfo,
state->hook))
@@ -276,78 +260,30 @@
else
return NF_ACCEPT;
}
- /* Only ICMPs can be IP_CT_IS_REPLY: */
- /* fall through */
- case IP_CT_NEW:
- /* Seen it before? This can happen for loopback, retrans,
- * or local packets.
- */
- if (!nf_nat_initialized(ct, maniptype)) {
- unsigned int ret;
-
- ret = do_chain(priv, skb, state, ct);
- if (ret != NF_ACCEPT)
- return ret;
-
- if (nf_nat_initialized(ct, HOOK2MANIP(state->hook)))
- break;
-
- ret = nf_nat_alloc_null_binding(ct, state->hook);
- if (ret != NF_ACCEPT)
- return ret;
- } else {
- pr_debug("Already setup manip %s for ct %p\n",
- maniptype == NF_NAT_MANIP_SRC ? "SRC" : "DST",
- ct);
- if (nf_nat_oif_changed(state->hook, ctinfo, nat,
- state->out))
- goto oif_changed;
- }
- break;
-
- default:
- /* ESTABLISHED */
- WARN_ON(ctinfo != IP_CT_ESTABLISHED &&
- ctinfo != IP_CT_ESTABLISHED_REPLY);
- if (nf_nat_oif_changed(state->hook, ctinfo, nat, state->out))
- goto oif_changed;
}
- return nf_nat_packet(ct, ctinfo, state->hook, skb);
-
-oif_changed:
- nf_ct_kill_acct(ct, ctinfo, skb);
- return NF_DROP;
+ return nf_nat_inet_fn(priv, skb, state);
}
EXPORT_SYMBOL_GPL(nf_nat_ipv4_fn);
-unsigned int
+static unsigned int
nf_nat_ipv4_in(void *priv, struct sk_buff *skb,
- const struct nf_hook_state *state,
- unsigned int (*do_chain)(void *priv,
- struct sk_buff *skb,
- const struct nf_hook_state *state,
- struct nf_conn *ct))
+ const struct nf_hook_state *state)
{
unsigned int ret;
__be32 daddr = ip_hdr(skb)->daddr;
- ret = nf_nat_ipv4_fn(priv, skb, state, do_chain);
+ ret = nf_nat_ipv4_fn(priv, skb, state);
if (ret != NF_DROP && ret != NF_STOLEN &&
daddr != ip_hdr(skb)->daddr)
skb_dst_drop(skb);
return ret;
}
-EXPORT_SYMBOL_GPL(nf_nat_ipv4_in);
-unsigned int
+static unsigned int
nf_nat_ipv4_out(void *priv, struct sk_buff *skb,
- const struct nf_hook_state *state,
- unsigned int (*do_chain)(void *priv,
- struct sk_buff *skb,
- const struct nf_hook_state *state,
- struct nf_conn *ct))
+ const struct nf_hook_state *state)
{
#ifdef CONFIG_XFRM
const struct nf_conn *ct;
@@ -356,7 +292,7 @@
#endif
unsigned int ret;
- ret = nf_nat_ipv4_fn(priv, skb, state, do_chain);
+ ret = nf_nat_ipv4_fn(priv, skb, state);
#ifdef CONFIG_XFRM
if (ret != NF_DROP && ret != NF_STOLEN &&
!(IPCB(skb)->flags & IPSKB_XFRM_TRANSFORMED) &&
@@ -376,22 +312,17 @@
#endif
return ret;
}
-EXPORT_SYMBOL_GPL(nf_nat_ipv4_out);
-unsigned int
+static unsigned int
nf_nat_ipv4_local_fn(void *priv, struct sk_buff *skb,
- const struct nf_hook_state *state,
- unsigned int (*do_chain)(void *priv,
- struct sk_buff *skb,
- const struct nf_hook_state *state,
- struct nf_conn *ct))
+ const struct nf_hook_state *state)
{
const struct nf_conn *ct;
enum ip_conntrack_info ctinfo;
unsigned int ret;
int err;
- ret = nf_nat_ipv4_fn(priv, skb, state, do_chain);
+ ret = nf_nat_ipv4_fn(priv, skb, state);
if (ret != NF_DROP && ret != NF_STOLEN &&
(ct = nf_ct_get(skb, &ctinfo)) != NULL) {
enum ip_conntrack_dir dir = CTINFO2DIR(ctinfo);
@@ -415,7 +346,49 @@
}
return ret;
}
-EXPORT_SYMBOL_GPL(nf_nat_ipv4_local_fn);
+
+static const struct nf_hook_ops nf_nat_ipv4_ops[] = {
+ /* Before packet filtering, change destination */
+ {
+ .hook = nf_nat_ipv4_in,
+ .pf = NFPROTO_IPV4,
+ .hooknum = NF_INET_PRE_ROUTING,
+ .priority = NF_IP_PRI_NAT_DST,
+ },
+ /* After packet filtering, change source */
+ {
+ .hook = nf_nat_ipv4_out,
+ .pf = NFPROTO_IPV4,
+ .hooknum = NF_INET_POST_ROUTING,
+ .priority = NF_IP_PRI_NAT_SRC,
+ },
+ /* Before packet filtering, change destination */
+ {
+ .hook = nf_nat_ipv4_local_fn,
+ .pf = NFPROTO_IPV4,
+ .hooknum = NF_INET_LOCAL_OUT,
+ .priority = NF_IP_PRI_NAT_DST,
+ },
+ /* After packet filtering, change source */
+ {
+ .hook = nf_nat_ipv4_fn,
+ .pf = NFPROTO_IPV4,
+ .hooknum = NF_INET_LOCAL_IN,
+ .priority = NF_IP_PRI_NAT_SRC,
+ },
+};
+
+int nf_nat_l3proto_ipv4_register_fn(struct net *net, const struct nf_hook_ops *ops)
+{
+ return nf_nat_register_fn(net, ops, nf_nat_ipv4_ops, ARRAY_SIZE(nf_nat_ipv4_ops));
+}
+EXPORT_SYMBOL_GPL(nf_nat_l3proto_ipv4_register_fn);
+
+void nf_nat_l3proto_ipv4_unregister_fn(struct net *net, const struct nf_hook_ops *ops)
+{
+ nf_nat_unregister_fn(net, ops, ARRAY_SIZE(nf_nat_ipv4_ops));
+}
+EXPORT_SYMBOL_GPL(nf_nat_l3proto_ipv4_unregister_fn);
static int __init nf_nat_l3proto_ipv4_init(void)
{
diff --git a/net/ipv4/netfilter/nf_nat_masquerade_ipv4.c b/net/ipv4/netfilter/nf_nat_masquerade_ipv4.c
index 0c366aa..f538c500 100644
--- a/net/ipv4/netfilter/nf_nat_masquerade_ipv4.c
+++ b/net/ipv4/netfilter/nf_nat_masquerade_ipv4.c
@@ -24,13 +24,13 @@
unsigned int
nf_nat_masquerade_ipv4(struct sk_buff *skb, unsigned int hooknum,
- const struct nf_nat_range *range,
+ const struct nf_nat_range2 *range,
const struct net_device *out)
{
struct nf_conn *ct;
struct nf_conn_nat *nat;
enum ip_conntrack_info ctinfo;
- struct nf_nat_range newrange;
+ struct nf_nat_range2 newrange;
const struct rtable *rt;
__be32 newsrc, nh;
diff --git a/net/ipv4/netfilter/nf_nat_pptp.c b/net/ipv4/netfilter/nf_nat_pptp.c
index 8a69363..5d259a1 100644
--- a/net/ipv4/netfilter/nf_nat_pptp.c
+++ b/net/ipv4/netfilter/nf_nat_pptp.c
@@ -48,7 +48,7 @@
struct nf_conntrack_tuple t = {};
const struct nf_ct_pptp_master *ct_pptp_info;
const struct nf_nat_pptp *nat_pptp_info;
- struct nf_nat_range range;
+ struct nf_nat_range2 range;
struct nf_conn_nat *nat;
nat = nf_ct_nat_ext_add(ct);
diff --git a/net/ipv4/netfilter/nf_nat_proto_gre.c b/net/ipv4/netfilter/nf_nat_proto_gre.c
index edf0500..00fda63 100644
--- a/net/ipv4/netfilter/nf_nat_proto_gre.c
+++ b/net/ipv4/netfilter/nf_nat_proto_gre.c
@@ -41,7 +41,7 @@
static void
gre_unique_tuple(const struct nf_nat_l3proto *l3proto,
struct nf_conntrack_tuple *tuple,
- const struct nf_nat_range *range,
+ const struct nf_nat_range2 *range,
enum nf_nat_manip_type maniptype,
const struct nf_conn *ct)
{
diff --git a/net/ipv4/netfilter/nf_nat_proto_icmp.c b/net/ipv4/netfilter/nf_nat_proto_icmp.c
index 7b98baa..6d7cf1d 100644
--- a/net/ipv4/netfilter/nf_nat_proto_icmp.c
+++ b/net/ipv4/netfilter/nf_nat_proto_icmp.c
@@ -30,7 +30,7 @@
static void
icmp_unique_tuple(const struct nf_nat_l3proto *l3proto,
struct nf_conntrack_tuple *tuple,
- const struct nf_nat_range *range,
+ const struct nf_nat_range2 *range,
enum nf_nat_manip_type maniptype,
const struct nf_conn *ct)
{
diff --git a/net/ipv4/netfilter/nft_chain_nat_ipv4.c b/net/ipv4/netfilter/nft_chain_nat_ipv4.c
index b5464a3..a3c4ea3 100644
--- a/net/ipv4/netfilter/nft_chain_nat_ipv4.c
+++ b/net/ipv4/netfilter/nft_chain_nat_ipv4.c
@@ -27,9 +27,8 @@
#include <net/ip.h>
static unsigned int nft_nat_do_chain(void *priv,
- struct sk_buff *skb,
- const struct nf_hook_state *state,
- struct nf_conn *ct)
+ struct sk_buff *skb,
+ const struct nf_hook_state *state)
{
struct nft_pktinfo pkt;
@@ -39,42 +38,14 @@
return nft_do_chain(&pkt, priv);
}
-static unsigned int nft_nat_ipv4_fn(void *priv,
- struct sk_buff *skb,
- const struct nf_hook_state *state)
+static int nft_nat_ipv4_reg(struct net *net, const struct nf_hook_ops *ops)
{
- return nf_nat_ipv4_fn(priv, skb, state, nft_nat_do_chain);
+ return nf_nat_l3proto_ipv4_register_fn(net, ops);
}
-static unsigned int nft_nat_ipv4_in(void *priv,
- struct sk_buff *skb,
- const struct nf_hook_state *state)
+static void nft_nat_ipv4_unreg(struct net *net, const struct nf_hook_ops *ops)
{
- return nf_nat_ipv4_in(priv, skb, state, nft_nat_do_chain);
-}
-
-static unsigned int nft_nat_ipv4_out(void *priv,
- struct sk_buff *skb,
- const struct nf_hook_state *state)
-{
- return nf_nat_ipv4_out(priv, skb, state, nft_nat_do_chain);
-}
-
-static unsigned int nft_nat_ipv4_local_fn(void *priv,
- struct sk_buff *skb,
- const struct nf_hook_state *state)
-{
- return nf_nat_ipv4_local_fn(priv, skb, state, nft_nat_do_chain);
-}
-
-static int nft_nat_ipv4_init(struct nft_ctx *ctx)
-{
- return nf_ct_netns_get(ctx->net, ctx->family);
-}
-
-static void nft_nat_ipv4_free(struct nft_ctx *ctx)
-{
- nf_ct_netns_put(ctx->net, ctx->family);
+ nf_nat_l3proto_ipv4_unregister_fn(net, ops);
}
static const struct nft_chain_type nft_chain_nat_ipv4 = {
@@ -87,13 +58,13 @@
(1 << NF_INET_LOCAL_OUT) |
(1 << NF_INET_LOCAL_IN),
.hooks = {
- [NF_INET_PRE_ROUTING] = nft_nat_ipv4_in,
- [NF_INET_POST_ROUTING] = nft_nat_ipv4_out,
- [NF_INET_LOCAL_OUT] = nft_nat_ipv4_local_fn,
- [NF_INET_LOCAL_IN] = nft_nat_ipv4_fn,
+ [NF_INET_PRE_ROUTING] = nft_nat_do_chain,
+ [NF_INET_POST_ROUTING] = nft_nat_do_chain,
+ [NF_INET_LOCAL_OUT] = nft_nat_do_chain,
+ [NF_INET_LOCAL_IN] = nft_nat_do_chain,
},
- .init = nft_nat_ipv4_init,
- .free = nft_nat_ipv4_free,
+ .ops_register = nft_nat_ipv4_reg,
+ .ops_unregister = nft_nat_ipv4_unreg,
};
static int __init nft_chain_nat_init(void)
diff --git a/net/ipv4/netfilter/nft_masq_ipv4.c b/net/ipv4/netfilter/nft_masq_ipv4.c
index f186772..f1193e1 100644
--- a/net/ipv4/netfilter/nft_masq_ipv4.c
+++ b/net/ipv4/netfilter/nft_masq_ipv4.c
@@ -21,7 +21,7 @@
const struct nft_pktinfo *pkt)
{
struct nft_masq *priv = nft_expr_priv(expr);
- struct nf_nat_range range;
+ struct nf_nat_range2 range;
memset(&range, 0, sizeof(range));
range.flags = priv->flags;
diff --git a/net/ipv4/netlink.c b/net/ipv4/netlink.c
new file mode 100644
index 0000000..f86bb4f
--- /dev/null
+++ b/net/ipv4/netlink.c
@@ -0,0 +1,23 @@
+#include <linux/netlink.h>
+#include <linux/rtnetlink.h>
+#include <linux/types.h>
+#include <net/net_namespace.h>
+#include <net/netlink.h>
+#include <net/ip.h>
+
+int rtm_getroute_parse_ip_proto(struct nlattr *attr, u8 *ip_proto,
+ struct netlink_ext_ack *extack)
+{
+ *ip_proto = nla_get_u8(attr);
+
+ switch (*ip_proto) {
+ case IPPROTO_TCP:
+ case IPPROTO_UDP:
+ case IPPROTO_ICMP:
+ return 0;
+ default:
+ NL_SET_ERR_MSG(extack, "Unsupported ip proto");
+ return -EOPNOTSUPP;
+ }
+}
+EXPORT_SYMBOL_GPL(rtm_getroute_parse_ip_proto);
diff --git a/net/ipv4/proc.c b/net/ipv4/proc.c
index a058de6..6c1ff89 100644
--- a/net/ipv4/proc.c
+++ b/net/ipv4/proc.c
@@ -296,6 +296,9 @@
SNMP_MIB_ITEM("TCPKeepAlive", LINUX_MIB_TCPKEEPALIVE),
SNMP_MIB_ITEM("TCPMTUPFail", LINUX_MIB_TCPMTUPFAIL),
SNMP_MIB_ITEM("TCPMTUPSuccess", LINUX_MIB_TCPMTUPSUCCESS),
+ SNMP_MIB_ITEM("TCPDelivered", LINUX_MIB_TCPDELIVERED),
+ SNMP_MIB_ITEM("TCPDeliveredCE", LINUX_MIB_TCPDELIVEREDCE),
+ SNMP_MIB_ITEM("TCPAckCompressed", LINUX_MIB_TCPACKCOMPRESSED),
SNMP_MIB_SENTINEL
};
diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 2cfa1b5..45ad258 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -1352,6 +1352,37 @@
return NULL;
}
+/* MTU selection:
+ * 1. mtu on route is locked - use it
+ * 2. mtu from nexthop exception
+ * 3. mtu from egress device
+ */
+
+u32 ip_mtu_from_fib_result(struct fib_result *res, __be32 daddr)
+{
+ struct fib_info *fi = res->fi;
+ struct fib_nh *nh = &fi->fib_nh[res->nh_sel];
+ struct net_device *dev = nh->nh_dev;
+ u32 mtu = 0;
+
+ if (dev_net(dev)->ipv4.sysctl_ip_fwd_use_pmtu ||
+ fi->fib_metrics->metrics[RTAX_LOCK - 1] & (1 << RTAX_MTU))
+ mtu = fi->fib_mtu;
+
+ if (likely(!mtu)) {
+ struct fib_nh_exception *fnhe;
+
+ fnhe = find_exception(nh, daddr);
+ if (fnhe && !time_after_eq(jiffies, fnhe->fnhe_expires))
+ mtu = fnhe->fnhe_pmtu;
+ }
+
+ if (likely(!mtu))
+ mtu = min(READ_ONCE(dev->mtu), IP_MAX_MTU);
+
+ return mtu - lwtunnel_headroom(nh->nh_lwtstate, mtu);
+}
+
static bool rt_bind_exception(struct rtable *rt, struct fib_nh_exception *fnhe,
__be32 daddr, const bool do_cache)
{
@@ -2574,11 +2605,10 @@
EXPORT_SYMBOL_GPL(ip_route_output_flow);
/* called with rcu_read_lock held */
-static int rt_fill_info(struct net *net, __be32 dst, __be32 src, u32 table_id,
- struct flowi4 *fl4, struct sk_buff *skb, u32 portid,
- u32 seq)
+static int rt_fill_info(struct net *net, __be32 dst, __be32 src,
+ struct rtable *rt, u32 table_id, struct flowi4 *fl4,
+ struct sk_buff *skb, u32 portid, u32 seq)
{
- struct rtable *rt = skb_rtable(skb);
struct rtmsg *r;
struct nlmsghdr *nlh;
unsigned long expires = 0;
@@ -2674,7 +2704,7 @@
}
} else
#endif
- if (nla_put_u32(skb, RTA_IIF, skb->dev->ifindex))
+ if (nla_put_u32(skb, RTA_IIF, fl4->flowi4_iif))
goto nla_put_failure;
}
@@ -2689,43 +2719,93 @@
return -EMSGSIZE;
}
+static struct sk_buff *inet_rtm_getroute_build_skb(__be32 src, __be32 dst,
+ u8 ip_proto, __be16 sport,
+ __be16 dport)
+{
+ struct sk_buff *skb;
+ struct iphdr *iph;
+
+ skb = alloc_skb(NLMSG_GOODSIZE, GFP_KERNEL);
+ if (!skb)
+ return NULL;
+
+ /* Reserve room for dummy headers, this skb can pass
+ * through good chunk of routing engine.
+ */
+ skb_reset_mac_header(skb);
+ skb_reset_network_header(skb);
+ skb->protocol = htons(ETH_P_IP);
+ iph = skb_put(skb, sizeof(struct iphdr));
+ iph->protocol = ip_proto;
+ iph->saddr = src;
+ iph->daddr = dst;
+ iph->version = 0x4;
+ iph->frag_off = 0;
+ iph->ihl = 0x5;
+ skb_set_transport_header(skb, skb->len);
+
+ switch (iph->protocol) {
+ case IPPROTO_UDP: {
+ struct udphdr *udph;
+
+ udph = skb_put_zero(skb, sizeof(struct udphdr));
+ udph->source = sport;
+ udph->dest = dport;
+ udph->len = sizeof(struct udphdr);
+ udph->check = 0;
+ break;
+ }
+ case IPPROTO_TCP: {
+ struct tcphdr *tcph;
+
+ tcph = skb_put_zero(skb, sizeof(struct tcphdr));
+ tcph->source = sport;
+ tcph->dest = dport;
+ tcph->doff = sizeof(struct tcphdr) / 4;
+ tcph->rst = 1;
+ tcph->check = ~tcp_v4_check(sizeof(struct tcphdr),
+ src, dst, 0);
+ break;
+ }
+ case IPPROTO_ICMP: {
+ struct icmphdr *icmph;
+
+ icmph = skb_put_zero(skb, sizeof(struct icmphdr));
+ icmph->type = ICMP_ECHO;
+ icmph->code = 0;
+ }
+ }
+
+ return skb;
+}
+
static int inet_rtm_getroute(struct sk_buff *in_skb, struct nlmsghdr *nlh,
struct netlink_ext_ack *extack)
{
struct net *net = sock_net(in_skb->sk);
- struct rtmsg *rtm;
struct nlattr *tb[RTA_MAX+1];
+ u32 table_id = RT_TABLE_MAIN;
+ __be16 sport = 0, dport = 0;
struct fib_result res = {};
+ u8 ip_proto = IPPROTO_UDP;
struct rtable *rt = NULL;
+ struct sk_buff *skb;
+ struct rtmsg *rtm;
struct flowi4 fl4;
__be32 dst = 0;
__be32 src = 0;
+ kuid_t uid;
u32 iif;
int err;
int mark;
- struct sk_buff *skb;
- u32 table_id = RT_TABLE_MAIN;
- kuid_t uid;
err = nlmsg_parse(nlh, sizeof(*rtm), tb, RTA_MAX, rtm_ipv4_policy,
extack);
if (err < 0)
- goto errout;
+ return err;
rtm = nlmsg_data(nlh);
-
- skb = alloc_skb(NLMSG_GOODSIZE, GFP_KERNEL);
- if (!skb) {
- err = -ENOBUFS;
- goto errout;
- }
-
- /* Reserve room for dummy headers, this skb can pass
- through good chunk of routing engine.
- */
- skb_reset_mac_header(skb);
- skb_reset_network_header(skb);
-
src = tb[RTA_SRC] ? nla_get_in_addr(tb[RTA_SRC]) : 0;
dst = tb[RTA_DST] ? nla_get_in_addr(tb[RTA_DST]) : 0;
iif = tb[RTA_IIF] ? nla_get_u32(tb[RTA_IIF]) : 0;
@@ -2735,14 +2815,22 @@
else
uid = (iif ? INVALID_UID : current_uid());
- /* Bugfix: need to give ip_route_input enough of an IP header to
- * not gag.
- */
- ip_hdr(skb)->protocol = IPPROTO_UDP;
- ip_hdr(skb)->saddr = src;
- ip_hdr(skb)->daddr = dst;
+ if (tb[RTA_IP_PROTO]) {
+ err = rtm_getroute_parse_ip_proto(tb[RTA_IP_PROTO],
+ &ip_proto, extack);
+ if (err)
+ return err;
+ }
- skb_reserve(skb, MAX_HEADER + sizeof(struct iphdr));
+ if (tb[RTA_SPORT])
+ sport = nla_get_be16(tb[RTA_SPORT]);
+
+ if (tb[RTA_DPORT])
+ dport = nla_get_be16(tb[RTA_DPORT]);
+
+ skb = inet_rtm_getroute_build_skb(src, dst, ip_proto, sport, dport);
+ if (!skb)
+ return -ENOBUFS;
memset(&fl4, 0, sizeof(fl4));
fl4.daddr = dst;
@@ -2751,6 +2839,11 @@
fl4.flowi4_oif = tb[RTA_OIF] ? nla_get_u32(tb[RTA_OIF]) : 0;
fl4.flowi4_mark = mark;
fl4.flowi4_uid = uid;
+ if (sport)
+ fl4.fl4_sport = sport;
+ if (dport)
+ fl4.fl4_dport = dport;
+ fl4.flowi4_proto = ip_proto;
rcu_read_lock();
@@ -2760,10 +2853,10 @@
dev = dev_get_by_index_rcu(net, iif);
if (!dev) {
err = -ENODEV;
- goto errout_free;
+ goto errout_rcu;
}
- skb->protocol = htons(ETH_P_IP);
+ fl4.flowi4_iif = iif; /* for rt_fill_info */
skb->dev = dev;
skb->mark = mark;
err = ip_route_input_rcu(skb, dst, src, rtm->rtm_tos,
@@ -2783,7 +2876,7 @@
}
if (err)
- goto errout_free;
+ goto errout_rcu;
if (rtm->rtm_flags & RTM_F_NOTIFY)
rt->rt_flags |= RTCF_NOTIFY;
@@ -2791,34 +2884,40 @@
if (rtm->rtm_flags & RTM_F_LOOKUP_TABLE)
table_id = res.table ? res.table->tb_id : 0;
+ /* reset skb for netlink reply msg */
+ skb_trim(skb, 0);
+ skb_reset_network_header(skb);
+ skb_reset_transport_header(skb);
+ skb_reset_mac_header(skb);
+
if (rtm->rtm_flags & RTM_F_FIB_MATCH) {
if (!res.fi) {
err = fib_props[res.type].error;
if (!err)
err = -EHOSTUNREACH;
- goto errout_free;
+ goto errout_rcu;
}
err = fib_dump_info(skb, NETLINK_CB(in_skb).portid,
nlh->nlmsg_seq, RTM_NEWROUTE, table_id,
rt->rt_type, res.prefix, res.prefixlen,
fl4.flowi4_tos, res.fi, 0);
} else {
- err = rt_fill_info(net, dst, src, table_id, &fl4, skb,
+ err = rt_fill_info(net, dst, src, rt, table_id, &fl4, skb,
NETLINK_CB(in_skb).portid, nlh->nlmsg_seq);
}
if (err < 0)
- goto errout_free;
+ goto errout_rcu;
rcu_read_unlock();
err = rtnl_unicast(skb, net, NETLINK_CB(in_skb).portid);
-errout:
- return err;
errout_free:
+ return err;
+errout_rcu:
rcu_read_unlock();
kfree_skb(skb);
- goto errout;
+ goto errout_free;
}
void ip_rt_multicast_event(struct in_device *in_dev)
diff --git a/net/ipv4/sysctl_net_ipv4.c b/net/ipv4/sysctl_net_ipv4.c
index 4b195ba..d2eed3d 100644
--- a/net/ipv4/sysctl_net_ipv4.c
+++ b/net/ipv4/sysctl_net_ipv4.c
@@ -46,6 +46,7 @@
static int tcp_syn_retries_max = MAX_TCP_SYNCNT;
static int ip_ping_group_range_min[] = { 0, 0 };
static int ip_ping_group_range_max[] = { GID_T_MAX, GID_T_MAX };
+static int comp_sack_nr_max = 255;
/* obsolete */
static int sysctl_tcp_low_latency __read_mostly;
@@ -1152,6 +1153,22 @@
.extra1 = &one,
},
{
+ .procname = "tcp_comp_sack_delay_ns",
+ .data = &init_net.ipv4.sysctl_tcp_comp_sack_delay_ns,
+ .maxlen = sizeof(unsigned long),
+ .mode = 0644,
+ .proc_handler = proc_doulongvec_minmax,
+ },
+ {
+ .procname = "tcp_comp_sack_nr",
+ .data = &init_net.ipv4.sysctl_tcp_comp_sack_nr,
+ .maxlen = sizeof(int),
+ .mode = 0644,
+ .proc_handler = proc_dointvec_minmax,
+ .extra1 = &zero,
+ .extra2 = &comp_sack_nr_max,
+ },
+ {
.procname = "udp_rmem_min",
.data = &init_net.ipv4.sysctl_udp_rmem_min,
.maxlen = sizeof(init_net.ipv4.sysctl_udp_rmem_min),
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index c9d00ef..0a2ea0b 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -1702,6 +1702,139 @@
}
EXPORT_SYMBOL(tcp_peek_len);
+/* Make sure sk_rcvbuf is big enough to satisfy SO_RCVLOWAT hint */
+int tcp_set_rcvlowat(struct sock *sk, int val)
+{
+ sk->sk_rcvlowat = val ? : 1;
+
+ /* Check if we need to signal EPOLLIN right now */
+ tcp_data_ready(sk);
+
+ if (sk->sk_userlocks & SOCK_RCVBUF_LOCK)
+ return 0;
+
+ /* val comes from user space and might be close to INT_MAX */
+ val <<= 1;
+ if (val < 0)
+ val = INT_MAX;
+
+ val = min(val, sock_net(sk)->ipv4.sysctl_tcp_rmem[2]);
+ if (val > sk->sk_rcvbuf) {
+ sk->sk_rcvbuf = val;
+ tcp_sk(sk)->window_clamp = tcp_win_from_space(sk, val);
+ }
+ return 0;
+}
+EXPORT_SYMBOL(tcp_set_rcvlowat);
+
+#ifdef CONFIG_MMU
+static const struct vm_operations_struct tcp_vm_ops = {
+};
+
+int tcp_mmap(struct file *file, struct socket *sock,
+ struct vm_area_struct *vma)
+{
+ if (vma->vm_flags & (VM_WRITE | VM_EXEC))
+ return -EPERM;
+ vma->vm_flags &= ~(VM_MAYWRITE | VM_MAYEXEC);
+
+ /* Instruct vm_insert_page() to not down_read(mmap_sem) */
+ vma->vm_flags |= VM_MIXEDMAP;
+
+ vma->vm_ops = &tcp_vm_ops;
+ return 0;
+}
+EXPORT_SYMBOL(tcp_mmap);
+
+static int tcp_zerocopy_receive(struct sock *sk,
+ struct tcp_zerocopy_receive *zc)
+{
+ unsigned long address = (unsigned long)zc->address;
+ const skb_frag_t *frags = NULL;
+ u32 length = 0, seq, offset;
+ struct vm_area_struct *vma;
+ struct sk_buff *skb = NULL;
+ struct tcp_sock *tp;
+ int ret;
+
+ if (address & (PAGE_SIZE - 1) || address != zc->address)
+ return -EINVAL;
+
+ if (sk->sk_state == TCP_LISTEN)
+ return -ENOTCONN;
+
+ sock_rps_record_flow(sk);
+
+ down_read(¤t->mm->mmap_sem);
+
+ ret = -EINVAL;
+ vma = find_vma(current->mm, address);
+ if (!vma || vma->vm_start > address || vma->vm_ops != &tcp_vm_ops)
+ goto out;
+ zc->length = min_t(unsigned long, zc->length, vma->vm_end - address);
+
+ tp = tcp_sk(sk);
+ seq = tp->copied_seq;
+ zc->length = min_t(u32, zc->length, tcp_inq(sk));
+ zc->length &= ~(PAGE_SIZE - 1);
+
+ zap_page_range(vma, address, zc->length);
+
+ zc->recv_skip_hint = 0;
+ ret = 0;
+ while (length + PAGE_SIZE <= zc->length) {
+ if (zc->recv_skip_hint < PAGE_SIZE) {
+ if (skb) {
+ skb = skb->next;
+ offset = seq - TCP_SKB_CB(skb)->seq;
+ } else {
+ skb = tcp_recv_skb(sk, seq, &offset);
+ }
+
+ zc->recv_skip_hint = skb->len - offset;
+ offset -= skb_headlen(skb);
+ if ((int)offset < 0 || skb_has_frag_list(skb))
+ break;
+ frags = skb_shinfo(skb)->frags;
+ while (offset) {
+ if (frags->size > offset)
+ goto out;
+ offset -= frags->size;
+ frags++;
+ }
+ }
+ if (frags->size != PAGE_SIZE || frags->page_offset)
+ break;
+ ret = vm_insert_page(vma, address + length,
+ skb_frag_page(frags));
+ if (ret)
+ break;
+ length += PAGE_SIZE;
+ seq += PAGE_SIZE;
+ zc->recv_skip_hint -= PAGE_SIZE;
+ frags++;
+ }
+out:
+ up_read(¤t->mm->mmap_sem);
+ if (length) {
+ tp->copied_seq = seq;
+ tcp_rcv_space_adjust(sk);
+
+ /* Clean up data we have read: This will do ACK frames. */
+ tcp_recv_skb(sk, seq, &offset);
+ tcp_cleanup_rbuf(sk, length);
+ ret = 0;
+ if (length == zc->length)
+ zc->recv_skip_hint = 0;
+ } else {
+ if (!zc->recv_skip_hint && sock_flag(sk, SOCK_DONE))
+ ret = -EIO;
+ }
+ zc->length = length;
+ return ret;
+}
+#endif
+
static void tcp_update_recv_tstamps(struct sk_buff *skb,
struct scm_timestamping *tss)
{
@@ -1757,6 +1890,22 @@
}
}
+static int tcp_inq_hint(struct sock *sk)
+{
+ const struct tcp_sock *tp = tcp_sk(sk);
+ u32 copied_seq = READ_ONCE(tp->copied_seq);
+ u32 rcv_nxt = READ_ONCE(tp->rcv_nxt);
+ int inq;
+
+ inq = rcv_nxt - copied_seq;
+ if (unlikely(inq < 0 || copied_seq != READ_ONCE(tp->copied_seq))) {
+ lock_sock(sk);
+ inq = tp->rcv_nxt - tp->copied_seq;
+ release_sock(sk);
+ }
+ return inq;
+}
+
/*
* This routine copies from a sock struct into the user buffer.
*
@@ -1773,13 +1922,14 @@
u32 peek_seq;
u32 *seq;
unsigned long used;
- int err;
+ int err, inq;
int target; /* Read at least this many bytes */
long timeo;
struct sk_buff *skb, *last;
u32 urg_hole = 0;
struct scm_timestamping tss;
bool has_tss = false;
+ bool has_cmsg;
if (unlikely(flags & MSG_ERRQUEUE))
return inet_recv_error(sk, msg, len, addr_len);
@@ -1794,6 +1944,7 @@
if (sk->sk_state == TCP_LISTEN)
goto out;
+ has_cmsg = tp->recvmsg_inq;
timeo = sock_rcvtimeo(sk, nonblock);
/* Urgent data needs to be handled specially. */
@@ -1980,6 +2131,7 @@
if (TCP_SKB_CB(skb)->has_rxtstamp) {
tcp_update_recv_tstamps(skb, &tss);
has_tss = true;
+ has_cmsg = true;
}
if (TCP_SKB_CB(skb)->tcp_flags & TCPHDR_FIN)
goto found_fin_ok;
@@ -1999,13 +2151,20 @@
* on connected socket. I was just happy when found this 8) --ANK
*/
- if (has_tss)
- tcp_recv_timestamp(msg, sk, &tss);
-
/* Clean up data we have read: This will do ACK frames. */
tcp_cleanup_rbuf(sk, copied);
release_sock(sk);
+
+ if (has_cmsg) {
+ if (has_tss)
+ tcp_recv_timestamp(msg, sk, &tss);
+ if (tp->recvmsg_inq) {
+ inq = tcp_inq_hint(sk);
+ put_cmsg(msg, SOL_TCP, TCP_CM_INQ, sizeof(inq), &inq);
+ }
+ }
+
return copied;
out:
@@ -2422,6 +2581,7 @@
tp->snd_ssthresh = TCP_INFINITE_SSTHRESH;
tp->snd_cwnd_cnt = 0;
tp->window_clamp = 0;
+ tp->delivered_ce = 0;
tcp_set_ca_state(sk, TCP_CA_Open);
tp->is_sack_reneg = 0;
tcp_clear_retrans(tp);
@@ -2435,6 +2595,7 @@
dst_release(sk->sk_rx_dst);
sk->sk_rx_dst = NULL;
tcp_saved_syn_free(tp);
+ tp->compressed_ack = 0;
/* Clean up fastopen related fields */
tcp_free_fastopen_req(tp);
@@ -2873,6 +3034,12 @@
tp->notsent_lowat = val;
sk->sk_write_space(sk);
break;
+ case TCP_INQ:
+ if (val > 1 || val < 0)
+ err = -EINVAL;
+ else
+ tp->recvmsg_inq = val;
+ break;
default:
err = -ENOPROTOOPT;
break;
@@ -3031,6 +3198,8 @@
rate64 = tcp_compute_delivery_rate(tp);
if (rate64)
info->tcpi_delivery_rate = rate64;
+ info->tcpi_delivered = tp->delivered;
+ info->tcpi_delivered_ce = tp->delivered_ce;
unlock_sock_fast(sk, slow);
}
EXPORT_SYMBOL_GPL(tcp_get_info);
@@ -3044,7 +3213,7 @@
u32 rate;
stats = alloc_skb(7 * nla_total_size_64bit(sizeof(u64)) +
- 5 * nla_total_size(sizeof(u32)) +
+ 7 * nla_total_size(sizeof(u32)) +
3 * nla_total_size(sizeof(u8)), GFP_ATOMIC);
if (!stats)
return NULL;
@@ -3075,9 +3244,12 @@
nla_put_u8(stats, TCP_NLA_RECUR_RETRANS, inet_csk(sk)->icsk_retransmits);
nla_put_u8(stats, TCP_NLA_DELIVERY_RATE_APP_LMT, !!tp->rate_app_limited);
nla_put_u32(stats, TCP_NLA_SND_SSTHRESH, tp->snd_ssthresh);
+ nla_put_u32(stats, TCP_NLA_DELIVERED, tp->delivered);
+ nla_put_u32(stats, TCP_NLA_DELIVERED_CE, tp->delivered_ce);
nla_put_u32(stats, TCP_NLA_SNDQ_SIZE, tp->write_seq - tp->snd_una);
nla_put_u8(stats, TCP_NLA_CA_STATE, inet_csk(sk)->icsk_ca_state);
+
return stats;
}
@@ -3293,6 +3465,9 @@
case TCP_NOTSENT_LOWAT:
val = tp->notsent_lowat;
break;
+ case TCP_INQ:
+ val = tp->recvmsg_inq;
+ break;
case TCP_SAVE_SYN:
val = tp->save_syn;
break;
@@ -3329,6 +3504,25 @@
}
return 0;
}
+#ifdef CONFIG_MMU
+ case TCP_ZEROCOPY_RECEIVE: {
+ struct tcp_zerocopy_receive zc;
+ int err;
+
+ if (get_user(len, optlen))
+ return -EFAULT;
+ if (len != sizeof(zc))
+ return -EINVAL;
+ if (copy_from_user(&zc, optval, len))
+ return -EFAULT;
+ lock_sock(sk);
+ err = tcp_zerocopy_receive(sk, &zc);
+ release_sock(sk);
+ if (!err && copy_to_user(optval, &zc, len))
+ err = -EFAULT;
+ return err;
+ }
+#endif
default:
return -ENOPROTOOPT;
}
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index e51c644..1191cac 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -111,6 +111,25 @@
#define REXMIT_LOST 1 /* retransmit packets marked lost */
#define REXMIT_NEW 2 /* FRTO-style transmit of unsent/new packets */
+#if IS_ENABLED(CONFIG_TLS_DEVICE)
+static DEFINE_STATIC_KEY_FALSE(clean_acked_data_enabled);
+
+void clean_acked_data_enable(struct inet_connection_sock *icsk,
+ void (*cad)(struct sock *sk, u32 ack_seq))
+{
+ icsk->icsk_clean_acked = cad;
+ static_branch_inc(&clean_acked_data_enabled);
+}
+EXPORT_SYMBOL_GPL(clean_acked_data_enable);
+
+void clean_acked_data_disable(struct inet_connection_sock *icsk)
+{
+ static_branch_dec(&clean_acked_data_enabled);
+ icsk->icsk_clean_acked = NULL;
+}
+EXPORT_SYMBOL_GPL(clean_acked_data_disable);
+#endif
+
static void tcp_gro_dev_warn(struct sock *sk, const struct sk_buff *skb,
unsigned int len)
{
@@ -184,21 +203,23 @@
}
}
-static void tcp_incr_quickack(struct sock *sk)
+static void tcp_incr_quickack(struct sock *sk, unsigned int max_quickacks)
{
struct inet_connection_sock *icsk = inet_csk(sk);
unsigned int quickacks = tcp_sk(sk)->rcv_wnd / (2 * icsk->icsk_ack.rcv_mss);
if (quickacks == 0)
quickacks = 2;
+ quickacks = min(quickacks, max_quickacks);
if (quickacks > icsk->icsk_ack.quick)
- icsk->icsk_ack.quick = min(quickacks, TCP_MAX_QUICKACKS);
+ icsk->icsk_ack.quick = quickacks;
}
-static void tcp_enter_quickack_mode(struct sock *sk)
+static void tcp_enter_quickack_mode(struct sock *sk, unsigned int max_quickacks)
{
struct inet_connection_sock *icsk = inet_csk(sk);
- tcp_incr_quickack(sk);
+
+ tcp_incr_quickack(sk, max_quickacks);
icsk->icsk_ack.pingpong = 0;
icsk->icsk_ack.ato = TCP_ATO_MIN;
}
@@ -242,7 +263,7 @@
* it is probably a retransmit.
*/
if (tp->ecn_flags & TCP_ECN_SEEN)
- tcp_enter_quickack_mode((struct sock *)tp);
+ tcp_enter_quickack_mode((struct sock *)tp, 1);
break;
case INET_ECN_CE:
if (tcp_ca_needs_ecn((struct sock *)tp))
@@ -250,7 +271,7 @@
if (!(tp->ecn_flags & TCP_ECN_DEMAND_CWR)) {
/* Better not delay acks, sender can have a very low cwnd */
- tcp_enter_quickack_mode((struct sock *)tp);
+ tcp_enter_quickack_mode((struct sock *)tp, 1);
tp->ecn_flags |= TCP_ECN_DEMAND_CWR;
}
tp->ecn_flags |= TCP_ECN_SEEN;
@@ -582,6 +603,8 @@
u32 copied;
int time;
+ trace_tcp_rcv_space_adjust(sk);
+
tcp_mstamp_refresh(tp);
time = tcp_stamp_us_delta(tp->tcp_mstamp, tp->rcvq_space.time);
if (time < (tp->rcv_rtt_est.rtt_us >> 3) || tp->rcv_rtt_est.rtt_us == 0)
@@ -665,7 +688,7 @@
/* The _first_ data packet received, initialize
* delayed ACK engine.
*/
- tcp_incr_quickack(sk);
+ tcp_incr_quickack(sk, TCP_MAX_QUICKACKS);
icsk->icsk_ack.ato = TCP_ATO_MIN;
} else {
int m = now - icsk->icsk_ack.lrcvtime;
@@ -681,7 +704,7 @@
/* Too long gap. Apparently sender failed to
* restart window, so that we send ACKs quickly.
*/
- tcp_incr_quickack(sk);
+ tcp_incr_quickack(sk, TCP_MAX_QUICKACKS);
sk_mem_reclaim(sk);
}
}
@@ -1896,19 +1919,54 @@
tp->undo_retrans = tp->retrans_out ? : -1;
}
-/* Enter Loss state. If we detect SACK reneging, forget all SACK information
+static bool tcp_is_rack(const struct sock *sk)
+{
+ return sock_net(sk)->ipv4.sysctl_tcp_recovery & TCP_RACK_LOSS_DETECTION;
+}
+
+/* If we detect SACK reneging, forget all SACK information
* and reset tags completely, otherwise preserve SACKs. If receiver
* dropped its ofo queue, we will know this due to reneging detection.
*/
+static void tcp_timeout_mark_lost(struct sock *sk)
+{
+ struct tcp_sock *tp = tcp_sk(sk);
+ struct sk_buff *skb, *head;
+ bool is_reneg; /* is receiver reneging on SACKs? */
+
+ head = tcp_rtx_queue_head(sk);
+ is_reneg = head && (TCP_SKB_CB(head)->sacked & TCPCB_SACKED_ACKED);
+ if (is_reneg) {
+ NET_INC_STATS(sock_net(sk), LINUX_MIB_TCPSACKRENEGING);
+ tp->sacked_out = 0;
+ /* Mark SACK reneging until we recover from this loss event. */
+ tp->is_sack_reneg = 1;
+ } else if (tcp_is_reno(tp)) {
+ tcp_reset_reno_sack(tp);
+ }
+
+ skb = head;
+ skb_rbtree_walk_from(skb) {
+ if (is_reneg)
+ TCP_SKB_CB(skb)->sacked &= ~TCPCB_SACKED_ACKED;
+ else if (tcp_is_rack(sk) && skb != head &&
+ tcp_rack_skb_timeout(tp, skb, 0) > 0)
+ continue; /* Don't mark recently sent ones lost yet */
+ tcp_mark_skb_lost(sk, skb);
+ }
+ tcp_verify_left_out(tp);
+ tcp_clear_all_retrans_hints(tp);
+}
+
+/* Enter Loss state. */
void tcp_enter_loss(struct sock *sk)
{
const struct inet_connection_sock *icsk = inet_csk(sk);
struct tcp_sock *tp = tcp_sk(sk);
struct net *net = sock_net(sk);
- struct sk_buff *skb;
bool new_recovery = icsk->icsk_ca_state < TCP_CA_Recovery;
- bool is_reneg; /* is receiver reneging on SACKs? */
- bool mark_lost;
+
+ tcp_timeout_mark_lost(sk);
/* Reduce ssthresh if it has not yet been made inside this window. */
if (icsk->icsk_ca_state <= TCP_CA_Disorder ||
@@ -1920,40 +1978,10 @@
tcp_ca_event(sk, CA_EVENT_LOSS);
tcp_init_undo(tp);
}
- tp->snd_cwnd = 1;
+ tp->snd_cwnd = tcp_packets_in_flight(tp) + 1;
tp->snd_cwnd_cnt = 0;
tp->snd_cwnd_stamp = tcp_jiffies32;
- tp->retrans_out = 0;
- tp->lost_out = 0;
-
- if (tcp_is_reno(tp))
- tcp_reset_reno_sack(tp);
-
- skb = tcp_rtx_queue_head(sk);
- is_reneg = skb && (TCP_SKB_CB(skb)->sacked & TCPCB_SACKED_ACKED);
- if (is_reneg) {
- NET_INC_STATS(sock_net(sk), LINUX_MIB_TCPSACKRENEGING);
- tp->sacked_out = 0;
- /* Mark SACK reneging until we recover from this loss event. */
- tp->is_sack_reneg = 1;
- }
- tcp_clear_all_retrans_hints(tp);
-
- skb_rbtree_walk_from(skb) {
- mark_lost = (!(TCP_SKB_CB(skb)->sacked & TCPCB_SACKED_ACKED) ||
- is_reneg);
- if (mark_lost)
- tcp_sum_lost(tp, skb);
- TCP_SKB_CB(skb)->sacked &= (~TCPCB_TAGBITS)|TCPCB_SACKED_ACKED;
- if (mark_lost) {
- TCP_SKB_CB(skb)->sacked &= ~TCPCB_SACKED_ACKED;
- TCP_SKB_CB(skb)->sacked |= TCPCB_LOST;
- tp->lost_out += tcp_skb_pcount(skb);
- }
- }
- tcp_verify_left_out(tp);
-
/* Timeout in disordered state after receiving substantial DUPACKs
* suggests that the degree of reordering is over-estimated.
*/
@@ -2120,7 +2148,7 @@
return true;
/* Not-A-Trick#2 : Classic rule... */
- if (tcp_dupack_heuristics(tp) > tp->reordering)
+ if (!tcp_is_rack(sk) && tcp_dupack_heuristics(tp) > tp->reordering)
return true;
return false;
@@ -2197,9 +2225,7 @@
{
struct tcp_sock *tp = tcp_sk(sk);
- if (tcp_is_reno(tp)) {
- tcp_mark_head_lost(sk, 1, 1);
- } else {
+ if (tcp_is_sack(tp)) {
int sacked_upto = tp->sacked_out - tp->reordering;
if (sacked_upto >= 0)
tcp_mark_head_lost(sk, sacked_upto, 0);
@@ -2697,12 +2723,16 @@
return false;
}
-static void tcp_rack_identify_loss(struct sock *sk, int *ack_flag)
+static void tcp_identify_packet_loss(struct sock *sk, int *ack_flag)
{
struct tcp_sock *tp = tcp_sk(sk);
- /* Use RACK to detect loss */
- if (sock_net(sk)->ipv4.sysctl_tcp_recovery & TCP_RACK_LOSS_DETECTION) {
+ if (tcp_rtx_queue_empty(sk))
+ return;
+
+ if (unlikely(tcp_is_reno(tp))) {
+ tcp_newreno_mark_lost(sk, *ack_flag & FLAG_SND_UNA_ADVANCED);
+ } else if (tcp_is_rack(sk)) {
u32 prior_retrans = tp->retrans_out;
tcp_rack_mark_lost(sk);
@@ -2798,11 +2828,11 @@
tcp_try_keep_open(sk);
return;
}
- tcp_rack_identify_loss(sk, ack_flag);
+ tcp_identify_packet_loss(sk, ack_flag);
break;
case TCP_CA_Loss:
tcp_process_loss(sk, flag, is_dupack, rexmit);
- tcp_rack_identify_loss(sk, ack_flag);
+ tcp_identify_packet_loss(sk, ack_flag);
if (!(icsk->icsk_ca_state == TCP_CA_Open ||
(*ack_flag & FLAG_LOST_RETRANS)))
return;
@@ -2819,7 +2849,7 @@
if (icsk->icsk_ca_state <= TCP_CA_Disorder)
tcp_try_undo_dsack(sk);
- tcp_rack_identify_loss(sk, ack_flag);
+ tcp_identify_packet_loss(sk, ack_flag);
if (!tcp_time_to_recover(sk, flag)) {
tcp_try_to_open(sk, flag);
return;
@@ -2841,7 +2871,7 @@
fast_rexmit = 1;
}
- if (do_lost)
+ if (!tcp_is_rack(sk) && do_lost)
tcp_update_scoreboard(sk, fast_rexmit);
*rexmit = REXMIT_LOST;
}
@@ -3496,6 +3526,22 @@
tcp_xmit_retransmit_queue(sk);
}
+/* Returns the number of packets newly acked or sacked by the current ACK */
+static u32 tcp_newly_delivered(struct sock *sk, u32 prior_delivered, int flag)
+{
+ const struct net *net = sock_net(sk);
+ struct tcp_sock *tp = tcp_sk(sk);
+ u32 delivered;
+
+ delivered = tp->delivered - prior_delivered;
+ NET_ADD_STATS(net, LINUX_MIB_TCPDELIVERED, delivered);
+ if (flag & FLAG_ECE) {
+ tp->delivered_ce += delivered;
+ NET_ADD_STATS(net, LINUX_MIB_TCPDELIVEREDCE, delivered);
+ }
+ return delivered;
+}
+
/* This routine deals with incoming acks, but not outgoing ones. */
static int tcp_ack(struct sock *sk, const struct sk_buff *skb, int flag)
{
@@ -3542,6 +3588,12 @@
if (after(ack, prior_snd_una)) {
flag |= FLAG_SND_UNA_ADVANCED;
icsk->icsk_retransmits = 0;
+
+#if IS_ENABLED(CONFIG_TLS_DEVICE)
+ if (static_branch_unlikely(&clean_acked_data_enabled))
+ if (icsk->icsk_clean_acked)
+ icsk->icsk_clean_acked(sk, ack);
+#endif
}
prior_fack = tcp_is_sack(tp) ? tcp_highest_sack_seq(tp) : tp->snd_una;
@@ -3619,7 +3671,7 @@
if ((flag & FLAG_FORWARD_PROGRESS) || !(flag & FLAG_NOT_DUP))
sk_dst_confirm(sk);
- delivered = tp->delivered - delivered; /* freshly ACKed or SACKed */
+ delivered = tcp_newly_delivered(sk, delivered, flag);
lost = tp->lost - lost; /* freshly marked lost */
rs.is_ack_delayed = !!(flag & FLAG_ACK_MAYBE_DELAYED);
tcp_rate_gen(sk, delivered, lost, is_sack_reneg, sack_state.rate);
@@ -3629,9 +3681,11 @@
no_queue:
/* If data was DSACKed, see if we can undo a cwnd reduction. */
- if (flag & FLAG_DSACKING_ACK)
+ if (flag & FLAG_DSACKING_ACK) {
tcp_fastretrans_alert(sk, prior_snd_una, is_dupack, &flag,
&rexmit);
+ tcp_newly_delivered(sk, delivered, flag);
+ }
/* If this ack opens up a zero window, clear backoff. It was
* being used to time the probes, and is probably far higher than
* it needs to be for normal retransmission.
@@ -3655,6 +3709,7 @@
&sack_state);
tcp_fastretrans_alert(sk, prior_snd_una, is_dupack, &flag,
&rexmit);
+ tcp_newly_delivered(sk, delivered, flag);
tcp_xmit_recovery(sk, rexmit);
}
@@ -4126,7 +4181,7 @@
if (TCP_SKB_CB(skb)->end_seq != TCP_SKB_CB(skb)->seq &&
before(TCP_SKB_CB(skb)->seq, tp->rcv_nxt)) {
NET_INC_STATS(sock_net(sk), LINUX_MIB_DELAYEDACKLOST);
- tcp_enter_quickack_mode(sk);
+ tcp_enter_quickack_mode(sk, TCP_MAX_QUICKACKS);
if (tcp_is_sack(tp) && sock_net(sk)->ipv4.sysctl_tcp_dsack) {
u32 end_seq = TCP_SKB_CB(skb)->end_seq;
@@ -4196,6 +4251,8 @@
* If the sack array is full, forget about the last one.
*/
if (this_sack >= TCP_NUM_SACKS) {
+ if (tp->compressed_ack)
+ tcp_send_ack(sk);
this_sack--;
tp->rx_opt.num_sacks--;
sp--;
@@ -4573,6 +4630,17 @@
}
+void tcp_data_ready(struct sock *sk)
+{
+ const struct tcp_sock *tp = tcp_sk(sk);
+ int avail = tp->rcv_nxt - tp->copied_seq;
+
+ if (avail < sk->sk_rcvlowat && !sock_flag(sk, SOCK_DONE))
+ return;
+
+ sk->sk_data_ready(sk);
+}
+
static void tcp_data_queue(struct sock *sk, struct sk_buff *skb)
{
struct tcp_sock *tp = tcp_sk(sk);
@@ -4630,7 +4698,7 @@
if (eaten > 0)
kfree_skb_partial(skb, fragstolen);
if (!sock_flag(sk, SOCK_DEAD))
- sk->sk_data_ready(sk);
+ tcp_data_ready(sk);
return;
}
@@ -4640,7 +4708,7 @@
tcp_dsack_set(sk, TCP_SKB_CB(skb)->seq, TCP_SKB_CB(skb)->end_seq);
out_of_window:
- tcp_enter_quickack_mode(sk);
+ tcp_enter_quickack_mode(sk, TCP_MAX_QUICKACKS);
inet_csk_schedule_ack(sk);
drop:
tcp_drop(sk, skb);
@@ -4651,8 +4719,6 @@
if (!before(TCP_SKB_CB(skb)->seq, tp->rcv_nxt + tcp_receive_window(tp)))
goto out_of_window;
- tcp_enter_quickack_mode(sk);
-
if (before(TCP_SKB_CB(skb)->seq, tp->rcv_nxt)) {
/* Partial packet, seq < rcv_next < end_seq */
SOCK_DEBUG(sk, "partial packet: rcv_next %X seq %X - %X\n",
@@ -5019,23 +5085,48 @@
static void __tcp_ack_snd_check(struct sock *sk, int ofo_possible)
{
struct tcp_sock *tp = tcp_sk(sk);
+ unsigned long rtt, delay;
/* More than one full frame received... */
if (((tp->rcv_nxt - tp->rcv_wup) > inet_csk(sk)->icsk_ack.rcv_mss &&
/* ... and right edge of window advances far enough.
- * (tcp_recvmsg() will send ACK otherwise). Or...
+ * (tcp_recvmsg() will send ACK otherwise).
+ * If application uses SO_RCVLOWAT, we want send ack now if
+ * we have not received enough bytes to satisfy the condition.
*/
- __tcp_select_window(sk) >= tp->rcv_wnd) ||
+ (tp->rcv_nxt - tp->copied_seq < sk->sk_rcvlowat ||
+ __tcp_select_window(sk) >= tp->rcv_wnd)) ||
/* We ACK each frame or... */
- tcp_in_quickack_mode(sk) ||
- /* We have out of order data. */
- (ofo_possible && !RB_EMPTY_ROOT(&tp->out_of_order_queue))) {
- /* Then ack it now */
+ tcp_in_quickack_mode(sk)) {
+send_now:
tcp_send_ack(sk);
- } else {
- /* Else, send delayed ack. */
- tcp_send_delayed_ack(sk);
+ return;
}
+
+ if (!ofo_possible || RB_EMPTY_ROOT(&tp->out_of_order_queue)) {
+ tcp_send_delayed_ack(sk);
+ return;
+ }
+
+ if (!tcp_is_sack(tp) ||
+ tp->compressed_ack >= sock_net(sk)->ipv4.sysctl_tcp_comp_sack_nr)
+ goto send_now;
+ tp->compressed_ack++;
+
+ if (hrtimer_is_queued(&tp->compressed_ack_timer))
+ return;
+
+ /* compress ack timer : 5 % of rtt, but no more than tcp_comp_sack_delay_ns */
+
+ rtt = tp->rcv_rtt_est.rtt_us;
+ if (tp->srtt_us && tp->srtt_us < rtt)
+ rtt = tp->srtt_us;
+
+ delay = min_t(unsigned long, sock_net(sk)->ipv4.sysctl_tcp_comp_sack_delay_ns,
+ rtt * (NSEC_PER_USEC >> 3)/20);
+ sock_hold(sk);
+ hrtimer_start(&tp->compressed_ack_timer, ns_to_ktime(delay),
+ HRTIMER_MODE_REL_PINNED_SOFT);
}
static inline void tcp_ack_snd_check(struct sock *sk)
@@ -5428,7 +5519,7 @@
no_ack:
if (eaten)
kfree_skb_partial(skb, fragstolen);
- sk->sk_data_ready(sk);
+ tcp_data_ready(sk);
return;
}
}
@@ -5550,9 +5641,12 @@
return true;
}
tp->syn_data_acked = tp->syn_data;
- if (tp->syn_data_acked)
- NET_INC_STATS(sock_net(sk),
- LINUX_MIB_TCPFASTOPENACTIVE);
+ if (tp->syn_data_acked) {
+ NET_INC_STATS(sock_net(sk), LINUX_MIB_TCPFASTOPENACTIVE);
+ /* SYN-data is counted as two separate packets in tcp_ack() */
+ if (tp->delivered > 1)
+ --tp->delivered;
+ }
tcp_fastopen_add_skb(sk, synack);
@@ -5698,7 +5792,7 @@
* to stand against the temptation 8) --ANK
*/
inet_csk_schedule_ack(sk);
- tcp_enter_quickack_mode(sk);
+ tcp_enter_quickack_mode(sk, TCP_MAX_QUICKACKS);
inet_csk_reset_xmit_timer(sk, ICSK_TIME_DACK,
TCP_DELACK_MAX, TCP_RTO_MAX);
@@ -5884,6 +5978,7 @@
}
switch (sk->sk_state) {
case TCP_SYN_RECV:
+ tp->delivered++; /* SYN-ACK delivery isn't tracked in tcp_ack */
if (!tp->srtt_us)
tcp_synack_rtt_meas(sk, req);
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index f70586b..adbdb50 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -621,6 +621,7 @@
struct sock *sk1 = NULL;
#endif
struct net *net;
+ struct sock *ctl_sk;
/* Never send a reset in response to a reset. */
if (th->rst)
@@ -723,11 +724,16 @@
arg.tos = ip_hdr(skb)->tos;
arg.uid = sock_net_uid(net, sk && sk_fullsock(sk) ? sk : NULL);
local_bh_disable();
- ip_send_unicast_reply(*this_cpu_ptr(net->ipv4.tcp_sk),
+ ctl_sk = *this_cpu_ptr(net->ipv4.tcp_sk);
+ if (sk)
+ ctl_sk->sk_mark = (sk->sk_state == TCP_TIME_WAIT) ?
+ inet_twsk(sk)->tw_mark : sk->sk_mark;
+ ip_send_unicast_reply(ctl_sk,
skb, &TCP_SKB_CB(skb)->header.h4.opt,
ip_hdr(skb)->saddr, ip_hdr(skb)->daddr,
&arg, arg.iov[0].iov_len);
+ ctl_sk->sk_mark = 0;
__TCP_INC_STATS(net, TCP_MIB_OUTSEGS);
__TCP_INC_STATS(net, TCP_MIB_OUTRSTS);
local_bh_enable();
@@ -759,6 +765,7 @@
} rep;
struct net *net = sock_net(sk);
struct ip_reply_arg arg;
+ struct sock *ctl_sk;
memset(&rep.th, 0, sizeof(struct tcphdr));
memset(&arg, 0, sizeof(arg));
@@ -809,11 +816,16 @@
arg.tos = tos;
arg.uid = sock_net_uid(net, sk_fullsock(sk) ? sk : NULL);
local_bh_disable();
- ip_send_unicast_reply(*this_cpu_ptr(net->ipv4.tcp_sk),
+ ctl_sk = *this_cpu_ptr(net->ipv4.tcp_sk);
+ if (sk)
+ ctl_sk->sk_mark = (sk->sk_state == TCP_TIME_WAIT) ?
+ inet_twsk(sk)->tw_mark : sk->sk_mark;
+ ip_send_unicast_reply(ctl_sk,
skb, &TCP_SKB_CB(skb)->header.h4.opt,
ip_hdr(skb)->saddr, ip_hdr(skb)->daddr,
&arg, arg.iov[0].iov_len);
+ ctl_sk->sk_mark = 0;
__TCP_INC_STATS(net, TCP_MIB_OUTSEGS);
local_bh_enable();
}
@@ -2560,6 +2572,8 @@
init_net.ipv4.sysctl_tcp_wmem,
sizeof(init_net.ipv4.sysctl_tcp_wmem));
}
+ net->ipv4.sysctl_tcp_comp_sack_delay_ns = NSEC_PER_MSEC;
+ net->ipv4.sysctl_tcp_comp_sack_nr = 44;
net->ipv4.sysctl_tcp_fastopen = TFO_CLIENT_ENABLE;
spin_lock_init(&net->ipv4.tcp_fastopen_ctx_lock);
net->ipv4.sysctl_tcp_fastopen_blackhole_timeout = 60 * 60;
diff --git a/net/ipv4/tcp_minisocks.c b/net/ipv4/tcp_minisocks.c
index 57b5468..f867658 100644
--- a/net/ipv4/tcp_minisocks.c
+++ b/net/ipv4/tcp_minisocks.c
@@ -263,6 +263,7 @@
struct inet_sock *inet = inet_sk(sk);
tw->tw_transparent = inet->transparent;
+ tw->tw_mark = sk->sk_mark;
tw->tw_rcv_wscale = tp->rx_opt.rcv_wscale;
tcptw->tw_rcv_nxt = tp->rcv_nxt;
tcptw->tw_snd_nxt = tp->snd_nxt;
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index d07e34f..8e08b40 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -162,6 +162,15 @@
/* Account for an ACK we sent. */
static inline void tcp_event_ack_sent(struct sock *sk, unsigned int pkts)
{
+ struct tcp_sock *tp = tcp_sk(sk);
+
+ if (unlikely(tp->compressed_ack)) {
+ NET_ADD_STATS(sock_net(sk), LINUX_MIB_TCPACKCOMPRESSED,
+ tp->compressed_ack);
+ tp->compressed_ack = 0;
+ if (hrtimer_try_to_cancel(&tp->compressed_ack_timer) == 1)
+ __sock_put(sk);
+ }
tcp_dec_quickack_mode(sk, pkts);
inet_csk_clear_xmit_timer(sk, ICSK_TIME_DACK);
}
@@ -229,11 +238,9 @@
}
}
- if (mss > (1 << *rcv_wscale)) {
- if (!init_rcv_wnd) /* Use default unless specified otherwise */
- init_rcv_wnd = tcp_default_init_rwnd(mss);
- *rcv_wnd = min(*rcv_wnd, init_rcv_wnd * mss);
- }
+ if (!init_rcv_wnd) /* Use default unless specified otherwise */
+ init_rcv_wnd = tcp_default_init_rwnd(mss);
+ *rcv_wnd = min(*rcv_wnd, init_rcv_wnd * mss);
/* Set the clamp no higher than max representable value */
(*window_clamp) = min_t(__u32, U16_MAX << (*rcv_wscale), *window_clamp);
@@ -585,14 +592,15 @@
unsigned int remaining = MAX_TCP_OPTION_SPACE;
struct tcp_fastopen_request *fastopen = tp->fastopen_req;
-#ifdef CONFIG_TCP_MD5SIG
- *md5 = tp->af_specific->md5_lookup(sk, sk);
- if (*md5) {
- opts->options |= OPTION_MD5;
- remaining -= TCPOLEN_MD5SIG_ALIGNED;
- }
-#else
*md5 = NULL;
+#ifdef CONFIG_TCP_MD5SIG
+ if (unlikely(rcu_access_pointer(tp->md5sig_info))) {
+ *md5 = tp->af_specific->md5_lookup(sk, sk);
+ if (*md5) {
+ opts->options |= OPTION_MD5;
+ remaining -= TCPOLEN_MD5SIG_ALIGNED;
+ }
+ }
#endif
/* We always get an MSS option. The option bytes which will be seen in
@@ -720,14 +728,15 @@
opts->options = 0;
-#ifdef CONFIG_TCP_MD5SIG
- *md5 = tp->af_specific->md5_lookup(sk, sk);
- if (unlikely(*md5)) {
- opts->options |= OPTION_MD5;
- size += TCPOLEN_MD5SIG_ALIGNED;
- }
-#else
*md5 = NULL;
+#ifdef CONFIG_TCP_MD5SIG
+ if (unlikely(rcu_access_pointer(tp->md5sig_info))) {
+ *md5 = tp->af_specific->md5_lookup(sk, sk);
+ if (*md5) {
+ opts->options |= OPTION_MD5;
+ size += TCPOLEN_MD5SIG_ALIGNED;
+ }
+ }
#endif
if (likely(tp->rx_opt.tstamp_ok)) {
@@ -772,7 +781,7 @@
};
static DEFINE_PER_CPU(struct tsq_tasklet, tsq_tasklet);
-static void tcp_tsq_handler(struct sock *sk)
+static void tcp_tsq_write(struct sock *sk)
{
if ((1 << sk->sk_state) &
(TCPF_ESTABLISHED | TCPF_FIN_WAIT1 | TCPF_CLOSING |
@@ -789,6 +798,16 @@
0, GFP_ATOMIC);
}
}
+
+static void tcp_tsq_handler(struct sock *sk)
+{
+ bh_lock_sock(sk);
+ if (!sock_owned_by_user(sk))
+ tcp_tsq_write(sk);
+ else if (!test_and_set_bit(TCP_TSQ_DEFERRED, &sk->sk_tsq_flags))
+ sock_hold(sk);
+ bh_unlock_sock(sk);
+}
/*
* One tasklet per cpu tries to send more skbs.
* We run in tasklet context but need to disable irqs when
@@ -816,16 +835,7 @@
smp_mb__before_atomic();
clear_bit(TSQ_QUEUED, &sk->sk_tsq_flags);
- if (!sk->sk_lock.owned &&
- test_bit(TCP_TSQ_DEFERRED, &sk->sk_tsq_flags)) {
- bh_lock_sock(sk);
- if (!sock_owned_by_user(sk)) {
- clear_bit(TCP_TSQ_DEFERRED, &sk->sk_tsq_flags);
- tcp_tsq_handler(sk);
- }
- bh_unlock_sock(sk);
- }
-
+ tcp_tsq_handler(sk);
sk_free(sk);
}
}
@@ -853,9 +863,10 @@
nflags = flags & ~TCP_DEFERRED_ALL;
} while (cmpxchg(&sk->sk_tsq_flags, flags, nflags) != flags);
- if (flags & TCPF_TSQ_DEFERRED)
- tcp_tsq_handler(sk);
-
+ if (flags & TCPF_TSQ_DEFERRED) {
+ tcp_tsq_write(sk);
+ __sock_put(sk);
+ }
/* Here begins the tricky part :
* We are called from release_sock() with :
* 1) BH disabled
@@ -929,7 +940,7 @@
if (!(oval & TSQF_THROTTLED) || (oval & TSQF_QUEUED))
goto out;
- nval = (oval & ~TSQF_THROTTLED) | TSQF_QUEUED | TCPF_TSQ_DEFERRED;
+ nval = (oval & ~TSQF_THROTTLED) | TSQF_QUEUED;
nval = cmpxchg(&sk->sk_tsq_flags, oval, nval);
if (nval != oval)
continue;
@@ -948,37 +959,17 @@
sk_free(sk);
}
-/* Note: Called under hard irq.
- * We can not call TCP stack right away.
+/* Note: Called under soft irq.
+ * We can call TCP stack right away, unless socket is owned by user.
*/
enum hrtimer_restart tcp_pace_kick(struct hrtimer *timer)
{
struct tcp_sock *tp = container_of(timer, struct tcp_sock, pacing_timer);
struct sock *sk = (struct sock *)tp;
- unsigned long nval, oval;
- for (oval = READ_ONCE(sk->sk_tsq_flags);; oval = nval) {
- struct tsq_tasklet *tsq;
- bool empty;
+ tcp_tsq_handler(sk);
+ sock_put(sk);
- if (oval & TSQF_QUEUED)
- break;
-
- nval = (oval & ~TSQF_THROTTLED) | TSQF_QUEUED | TCPF_TSQ_DEFERRED;
- nval = cmpxchg(&sk->sk_tsq_flags, oval, nval);
- if (nval != oval)
- continue;
-
- if (!refcount_inc_not_zero(&sk->sk_wmem_alloc))
- break;
- /* queue this socket to tasklet queue */
- tsq = this_cpu_ptr(&tsq_tasklet);
- empty = list_empty(&tsq->head);
- list_add(&tp->tsq_node, &tsq->head);
- if (empty)
- tasklet_schedule(&tsq->tasklet);
- break;
- }
return HRTIMER_NORESTART;
}
@@ -1011,7 +1002,8 @@
do_div(len_ns, rate);
hrtimer_start(&tcp_sk(sk)->pacing_timer,
ktime_add_ns(ktime_get(), len_ns),
- HRTIMER_MODE_ABS_PINNED);
+ HRTIMER_MODE_ABS_PINNED_SOFT);
+ sock_hold(sk);
}
static void tcp_update_skb_after_send(struct tcp_sock *tp, struct sk_buff *skb)
@@ -1078,7 +1070,7 @@
/* if no packet is in qdisc/device queue, then allow XPS to select
* another queue. We can be called from tcp_tsq_handler()
- * which holds one reference to sk_wmem_alloc.
+ * which holds one reference to sk.
*
* TODO: Ideally, in-flight pure ACK packets should not matter here.
* One way to get this would be to set skb->truesize = 2 on them.
@@ -2185,7 +2177,7 @@
static bool tcp_pacing_check(const struct sock *sk)
{
return tcp_needs_internal_pacing(sk) &&
- hrtimer_active(&tcp_sk(sk)->pacing_timer);
+ hrtimer_is_queued(&tcp_sk(sk)->pacing_timer);
}
/* TCP Small Queues :
@@ -2365,8 +2357,6 @@
skb, limit, mss_now, gfp)))
break;
- if (test_bit(TCP_TSQ_DEFERRED, &sk->sk_tsq_flags))
- clear_bit(TCP_TSQ_DEFERRED, &sk->sk_tsq_flags);
if (tcp_small_queue_check(sk, skb, 0))
break;
diff --git a/net/ipv4/tcp_recovery.c b/net/ipv4/tcp_recovery.c
index 3a81720..71593e4 100644
--- a/net/ipv4/tcp_recovery.c
+++ b/net/ipv4/tcp_recovery.c
@@ -2,7 +2,7 @@
#include <linux/tcp.h>
#include <net/tcp.h>
-static void tcp_rack_mark_skb_lost(struct sock *sk, struct sk_buff *skb)
+void tcp_mark_skb_lost(struct sock *sk, struct sk_buff *skb)
{
struct tcp_sock *tp = tcp_sk(sk);
@@ -21,6 +21,38 @@
return t1 > t2 || (t1 == t2 && after(seq1, seq2));
}
+static u32 tcp_rack_reo_wnd(const struct sock *sk)
+{
+ struct tcp_sock *tp = tcp_sk(sk);
+
+ if (!tp->rack.reord) {
+ /* If reordering has not been observed, be aggressive during
+ * the recovery or starting the recovery by DUPACK threshold.
+ */
+ if (inet_csk(sk)->icsk_ca_state >= TCP_CA_Recovery)
+ return 0;
+
+ if (tp->sacked_out >= tp->reordering &&
+ !(sock_net(sk)->ipv4.sysctl_tcp_recovery & TCP_RACK_NO_DUPTHRESH))
+ return 0;
+ }
+
+ /* To be more reordering resilient, allow min_rtt/4 settling delay.
+ * Use min_rtt instead of the smoothed RTT because reordering is
+ * often a path property and less related to queuing or delayed ACKs.
+ * Upon receiving DSACKs, linearly increase the window up to the
+ * smoothed RTT.
+ */
+ return min((tcp_min_rtt(tp) >> 2) * tp->rack.reo_wnd_steps,
+ tp->srtt_us >> 3);
+}
+
+s32 tcp_rack_skb_timeout(struct tcp_sock *tp, struct sk_buff *skb, u32 reo_wnd)
+{
+ return tp->rack.rtt_us + reo_wnd -
+ tcp_stamp_us_delta(tp->tcp_mstamp, skb->skb_mstamp);
+}
+
/* RACK loss detection (IETF draft draft-ietf-tcpm-rack-01):
*
* Marks a packet lost, if some packet sent later has been (s)acked.
@@ -44,23 +76,11 @@
static void tcp_rack_detect_loss(struct sock *sk, u32 *reo_timeout)
{
struct tcp_sock *tp = tcp_sk(sk);
- u32 min_rtt = tcp_min_rtt(tp);
struct sk_buff *skb, *n;
u32 reo_wnd;
*reo_timeout = 0;
- /* To be more reordering resilient, allow min_rtt/4 settling delay
- * (lower-bounded to 1000uS). We use min_rtt instead of the smoothed
- * RTT because reordering is often a path property and less related
- * to queuing or delayed ACKs.
- */
- reo_wnd = 1000;
- if ((tp->rack.reord || inet_csk(sk)->icsk_ca_state < TCP_CA_Recovery) &&
- min_rtt != ~0U) {
- reo_wnd = max((min_rtt >> 2) * tp->rack.reo_wnd_steps, reo_wnd);
- reo_wnd = min(reo_wnd, tp->srtt_us >> 3);
- }
-
+ reo_wnd = tcp_rack_reo_wnd(sk);
list_for_each_entry_safe(skb, n, &tp->tsorted_sent_queue,
tcp_tsorted_anchor) {
struct tcp_skb_cb *scb = TCP_SKB_CB(skb);
@@ -78,10 +98,9 @@
/* A packet is lost if it has not been s/acked beyond
* the recent RTT plus the reordering window.
*/
- remaining = tp->rack.rtt_us + reo_wnd -
- tcp_stamp_us_delta(tp->tcp_mstamp, skb->skb_mstamp);
+ remaining = tcp_rack_skb_timeout(tp, skb, reo_wnd);
if (remaining <= 0) {
- tcp_rack_mark_skb_lost(sk, skb);
+ tcp_mark_skb_lost(sk, skb);
list_del_init(&skb->tcp_tsorted_anchor);
} else {
/* Record maximum wait time */
@@ -202,3 +221,30 @@
tp->rack.reo_wnd_steps = 1;
}
}
+
+/* RFC6582 NewReno recovery for non-SACK connection. It simply retransmits
+ * the next unacked packet upon receiving
+ * a) three or more DUPACKs to start the fast recovery
+ * b) an ACK acknowledging new data during the fast recovery.
+ */
+void tcp_newreno_mark_lost(struct sock *sk, bool snd_una_advanced)
+{
+ const u8 state = inet_csk(sk)->icsk_ca_state;
+ struct tcp_sock *tp = tcp_sk(sk);
+
+ if ((state < TCP_CA_Recovery && tp->sacked_out >= tp->reordering) ||
+ (state == TCP_CA_Recovery && snd_una_advanced)) {
+ struct sk_buff *skb = tcp_rtx_queue_head(sk);
+ u32 mss;
+
+ if (TCP_SKB_CB(skb)->sacked & TCPCB_LOST)
+ return;
+
+ mss = tcp_skb_mss(skb);
+ if (tcp_skb_pcount(skb) > 1 && skb->len > mss)
+ tcp_fragment(sk, TCP_FRAG_IN_RTX_QUEUE, skb,
+ mss, mss, GFP_ATOMIC);
+
+ tcp_skb_mark_lost_uncond_verify(tp, skb);
+ }
+}
diff --git a/net/ipv4/tcp_timer.c b/net/ipv4/tcp_timer.c
index f7d9448..3b36117 100644
--- a/net/ipv4/tcp_timer.c
+++ b/net/ipv4/tcp_timer.c
@@ -708,11 +708,36 @@
sock_put(sk);
}
+static enum hrtimer_restart tcp_compressed_ack_kick(struct hrtimer *timer)
+{
+ struct tcp_sock *tp = container_of(timer, struct tcp_sock, compressed_ack_timer);
+ struct sock *sk = (struct sock *)tp;
+
+ bh_lock_sock(sk);
+ if (!sock_owned_by_user(sk)) {
+ if (tp->compressed_ack)
+ tcp_send_ack(sk);
+ } else {
+ if (!test_and_set_bit(TCP_DELACK_TIMER_DEFERRED,
+ &sk->sk_tsq_flags))
+ sock_hold(sk);
+ }
+ bh_unlock_sock(sk);
+
+ sock_put(sk);
+
+ return HRTIMER_NORESTART;
+}
+
void tcp_init_xmit_timers(struct sock *sk)
{
inet_csk_init_xmit_timers(sk, &tcp_write_timer, &tcp_delack_timer,
&tcp_keepalive_timer);
hrtimer_init(&tcp_sk(sk)->pacing_timer, CLOCK_MONOTONIC,
- HRTIMER_MODE_ABS_PINNED);
+ HRTIMER_MODE_ABS_PINNED_SOFT);
tcp_sk(sk)->pacing_timer.function = tcp_pace_kick;
+
+ hrtimer_init(&tcp_sk(sk)->compressed_ack_timer, CLOCK_MONOTONIC,
+ HRTIMER_MODE_REL_PINNED_SOFT);
+ tcp_sk(sk)->compressed_ack_timer.function = tcp_compressed_ack_kick;
}
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index b61a770..d71f1f3 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -757,7 +757,8 @@
}
EXPORT_SYMBOL(udp_set_csum);
-static int udp_send_skb(struct sk_buff *skb, struct flowi4 *fl4)
+static int udp_send_skb(struct sk_buff *skb, struct flowi4 *fl4,
+ struct inet_cork *cork)
{
struct sock *sk = skb->sk;
struct inet_sock *inet = inet_sk(sk);
@@ -777,6 +778,27 @@
uh->len = htons(len);
uh->check = 0;
+ if (cork->gso_size) {
+ const int hlen = skb_network_header_len(skb) +
+ sizeof(struct udphdr);
+
+ if (hlen + cork->gso_size > cork->fragsize)
+ return -EINVAL;
+ if (skb->len > cork->gso_size * UDP_MAX_SEGMENTS)
+ return -EINVAL;
+ if (sk->sk_no_check_tx)
+ return -EINVAL;
+ if (skb->ip_summed != CHECKSUM_PARTIAL || is_udplite ||
+ dst_xfrm(skb_dst(skb)))
+ return -EIO;
+
+ skb_shinfo(skb)->gso_size = cork->gso_size;
+ skb_shinfo(skb)->gso_type = SKB_GSO_UDP_L4;
+ skb_shinfo(skb)->gso_segs = DIV_ROUND_UP(len - sizeof(uh),
+ cork->gso_size);
+ goto csum_partial;
+ }
+
if (is_udplite) /* UDP-Lite */
csum = udplite_csum(skb);
@@ -786,6 +808,7 @@
goto send;
} else if (skb->ip_summed == CHECKSUM_PARTIAL) { /* UDP hardware csum */
+csum_partial:
udp4_hwcsum(skb, fl4->saddr, fl4->daddr);
goto send;
@@ -828,7 +851,7 @@
if (!skb)
goto out;
- err = udp_send_skb(skb, fl4);
+ err = udp_send_skb(skb, fl4, &inet->cork.base);
out:
up->len = 0;
@@ -837,6 +860,43 @@
}
EXPORT_SYMBOL(udp_push_pending_frames);
+static int __udp_cmsg_send(struct cmsghdr *cmsg, u16 *gso_size)
+{
+ switch (cmsg->cmsg_type) {
+ case UDP_SEGMENT:
+ if (cmsg->cmsg_len != CMSG_LEN(sizeof(__u16)))
+ return -EINVAL;
+ *gso_size = *(__u16 *)CMSG_DATA(cmsg);
+ return 0;
+ default:
+ return -EINVAL;
+ }
+}
+
+int udp_cmsg_send(struct sock *sk, struct msghdr *msg, u16 *gso_size)
+{
+ struct cmsghdr *cmsg;
+ bool need_ip = false;
+ int err;
+
+ for_each_cmsghdr(cmsg, msg) {
+ if (!CMSG_OK(msg, cmsg))
+ return -EINVAL;
+
+ if (cmsg->cmsg_level != SOL_UDP) {
+ need_ip = true;
+ continue;
+ }
+
+ err = __udp_cmsg_send(cmsg, gso_size);
+ if (err)
+ return err;
+ }
+
+ return need_ip;
+}
+EXPORT_SYMBOL_GPL(udp_cmsg_send);
+
int udp_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
{
struct inet_sock *inet = inet_sk(sk);
@@ -922,10 +982,14 @@
ipc.sockc.tsflags = sk->sk_tsflags;
ipc.addr = inet->inet_saddr;
ipc.oif = sk->sk_bound_dev_if;
+ ipc.gso_size = up->gso_size;
if (msg->msg_controllen) {
- err = ip_cmsg_send(sk, msg, &ipc, sk->sk_family == AF_INET6);
- if (unlikely(err)) {
+ err = udp_cmsg_send(sk, msg, &ipc.gso_size);
+ if (err > 0)
+ err = ip_cmsg_send(sk, msg, &ipc,
+ sk->sk_family == AF_INET6);
+ if (unlikely(err < 0)) {
kfree(ipc.opt);
return err;
}
@@ -1032,12 +1096,14 @@
/* Lockless fast path for the non-corking case. */
if (!corkreq) {
+ struct inet_cork cork;
+
skb = ip_make_skb(sk, fl4, getfrag, msg, ulen,
sizeof(struct udphdr), &ipc, &rt,
- msg->msg_flags);
+ &cork, msg->msg_flags);
err = PTR_ERR(skb);
if (!IS_ERR_OR_NULL(skb))
- err = udp_send_skb(skb, fl4);
+ err = udp_send_skb(skb, fl4, &cork);
goto out;
}
@@ -1813,10 +1879,10 @@
return 0;
}
-static struct static_key udp_encap_needed __read_mostly;
+static DEFINE_STATIC_KEY_FALSE(udp_encap_needed_key);
void udp_encap_enable(void)
{
- static_key_enable(&udp_encap_needed);
+ static_branch_enable(&udp_encap_needed_key);
}
EXPORT_SYMBOL(udp_encap_enable);
@@ -1840,7 +1906,7 @@
goto drop;
nf_reset(skb);
- if (static_key_false(&udp_encap_needed) && up->encap_type) {
+ if (static_branch_unlikely(&udp_encap_needed_key) && up->encap_type) {
int (*encap_rcv)(struct sock *sk, struct sk_buff *skb);
/*
@@ -2303,7 +2369,7 @@
bool slow = lock_sock_fast(sk);
udp_flush_pending_frames(sk);
unlock_sock_fast(sk, slow);
- if (static_key_false(&udp_encap_needed) && up->encap_type) {
+ if (static_branch_unlikely(&udp_encap_needed_key) && up->encap_type) {
void (*encap_destroy)(struct sock *sk);
encap_destroy = READ_ONCE(up->encap_destroy);
if (encap_destroy)
@@ -2368,6 +2434,12 @@
up->no_check6_rx = valbool;
break;
+ case UDP_SEGMENT:
+ if (val < 0 || val > USHRT_MAX)
+ return -EINVAL;
+ up->gso_size = val;
+ break;
+
/*
* UDP-Lite's partial checksum coverage (RFC 3828).
*/
@@ -2458,6 +2530,10 @@
val = up->no_check6_rx;
break;
+ case UDP_SEGMENT:
+ val = up->gso_size;
+ break;
+
/* The following two cannot be changed on UDP sockets, the return is
* always 0 (which corresponds to the full checksum coverage of UDP). */
case UDPLITE_SEND_CSCOV:
diff --git a/net/ipv4/udp_offload.c b/net/ipv4/udp_offload.c
index ea6e6e7..92dc9e5 100644
--- a/net/ipv4/udp_offload.c
+++ b/net/ipv4/udp_offload.c
@@ -187,6 +187,102 @@
}
EXPORT_SYMBOL(skb_udp_tunnel_segment);
+struct sk_buff *__udp_gso_segment(struct sk_buff *gso_skb,
+ netdev_features_t features)
+{
+ struct sock *sk = gso_skb->sk;
+ unsigned int sum_truesize = 0;
+ struct sk_buff *segs, *seg;
+ struct udphdr *uh;
+ unsigned int mss;
+ bool copy_dtor;
+ __sum16 check;
+ __be16 newlen;
+
+ mss = skb_shinfo(gso_skb)->gso_size;
+ if (gso_skb->len <= sizeof(*uh) + mss)
+ return ERR_PTR(-EINVAL);
+
+ skb_pull(gso_skb, sizeof(*uh));
+
+ /* clear destructor to avoid skb_segment assigning it to tail */
+ copy_dtor = gso_skb->destructor == sock_wfree;
+ if (copy_dtor)
+ gso_skb->destructor = NULL;
+
+ segs = skb_segment(gso_skb, features);
+ if (unlikely(IS_ERR_OR_NULL(segs))) {
+ if (copy_dtor)
+ gso_skb->destructor = sock_wfree;
+ return segs;
+ }
+
+ /* GSO partial and frag_list segmentation only requires splitting
+ * the frame into an MSS multiple and possibly a remainder, both
+ * cases return a GSO skb. So update the mss now.
+ */
+ if (skb_is_gso(segs))
+ mss *= skb_shinfo(segs)->gso_segs;
+
+ seg = segs;
+ uh = udp_hdr(seg);
+
+ /* compute checksum adjustment based on old length versus new */
+ newlen = htons(sizeof(*uh) + mss);
+ check = csum16_add(csum16_sub(uh->check, uh->len), newlen);
+
+ for (;;) {
+ if (copy_dtor) {
+ seg->destructor = sock_wfree;
+ seg->sk = sk;
+ sum_truesize += seg->truesize;
+ }
+
+ if (!seg->next)
+ break;
+
+ uh->len = newlen;
+ uh->check = check;
+
+ if (seg->ip_summed == CHECKSUM_PARTIAL)
+ gso_reset_checksum(seg, ~check);
+ else
+ uh->check = gso_make_checksum(seg, ~check) ? :
+ CSUM_MANGLED_0;
+
+ seg = seg->next;
+ uh = udp_hdr(seg);
+ }
+
+ /* last packet can be partial gso_size, account for that in checksum */
+ newlen = htons(skb_tail_pointer(seg) - skb_transport_header(seg) +
+ seg->data_len);
+ check = csum16_add(csum16_sub(uh->check, uh->len), newlen);
+
+ uh->len = newlen;
+ uh->check = check;
+
+ if (seg->ip_summed == CHECKSUM_PARTIAL)
+ gso_reset_checksum(seg, ~check);
+ else
+ uh->check = gso_make_checksum(seg, ~check) ? : CSUM_MANGLED_0;
+
+ /* update refcount for the packet */
+ if (copy_dtor) {
+ int delta = sum_truesize - gso_skb->truesize;
+
+ /* In some pathological cases, delta can be negative.
+ * We need to either use refcount_add() or refcount_sub_and_test()
+ */
+ if (likely(delta >= 0))
+ refcount_add(delta, &sk->sk_wmem_alloc);
+ else
+ WARN_ON_ONCE(refcount_sub_and_test(-delta, &sk->sk_wmem_alloc));
+ }
+ return segs;
+}
+EXPORT_SYMBOL_GPL(__udp_gso_segment);
+
static struct sk_buff *udp4_ufo_fragment(struct sk_buff *skb,
netdev_features_t features)
{
@@ -203,12 +299,15 @@
goto out;
}
- if (!(skb_shinfo(skb)->gso_type & SKB_GSO_UDP))
+ if (!(skb_shinfo(skb)->gso_type & (SKB_GSO_UDP | SKB_GSO_UDP_L4)))
goto out;
if (!pskb_may_pull(skb, sizeof(struct udphdr)))
goto out;
+ if (skb_shinfo(skb)->gso_type & SKB_GSO_UDP_L4)
+ return __udp_gso_segment(skb, features);
+
mss = skb_shinfo(skb)->gso_size;
if (unlikely(skb->len <= mss))
goto out;
diff --git a/net/ipv6/Kconfig b/net/ipv6/Kconfig
index 11e4e80..0eff755 100644
--- a/net/ipv6/Kconfig
+++ b/net/ipv6/Kconfig
@@ -329,4 +329,9 @@
If unsure, say N.
+config IPV6_SEG6_BPF
+ def_bool y
+ depends on IPV6_SEG6_LWTUNNEL
+ depends on IPV6 = y
+
endif # IPV6
diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index 78cef00..fbfd71a 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -170,7 +170,7 @@
unsigned long event);
static int addrconf_ifdown(struct net_device *dev, int how);
-static struct rt6_info *addrconf_get_prefix_route(const struct in6_addr *pfx,
+static struct fib6_info *addrconf_get_prefix_route(const struct in6_addr *pfx,
int plen,
const struct net_device *dev,
u32 flags, u32 noflags);
@@ -916,7 +916,6 @@
pr_warn("Freeing alive inet6 address %p\n", ifp);
return;
}
- ip6_rt_put(ifp->rt);
kfree_rcu(ifp, rcu);
}
@@ -995,7 +994,7 @@
gfp_t gfp_flags = can_block ? GFP_KERNEL : GFP_ATOMIC;
struct net *net = dev_net(idev->dev);
struct inet6_ifaddr *ifa = NULL;
- struct rt6_info *rt = NULL;
+ struct fib6_info *f6i = NULL;
int err = 0;
int addr_type = ipv6_addr_type(addr);
@@ -1037,16 +1036,16 @@
goto out;
}
- rt = addrconf_dst_alloc(idev, addr, false);
- if (IS_ERR(rt)) {
- err = PTR_ERR(rt);
- rt = NULL;
+ f6i = addrconf_f6i_alloc(net, idev, addr, false, gfp_flags);
+ if (IS_ERR(f6i)) {
+ err = PTR_ERR(f6i);
+ f6i = NULL;
goto out;
}
if (net->ipv6.devconf_all->disable_policy ||
idev->cnf.disable_policy)
- rt->dst.flags |= DST_NOPOLICY;
+ f6i->dst_nopolicy = true;
neigh_parms_data_state_setall(idev->nd_parms);
@@ -1068,7 +1067,7 @@
ifa->cstamp = ifa->tstamp = jiffies;
ifa->tokenized = false;
- ifa->rt = rt;
+ ifa->rt = f6i;
ifa->idev = idev;
in6_dev_hold(idev);
@@ -1102,8 +1101,8 @@
inet6addr_notifier_call_chain(NETDEV_UP, ifa);
out:
if (unlikely(err < 0)) {
- if (rt)
- ip6_rt_put(rt);
+ fib6_info_release(f6i);
+
if (ifa) {
if (ifa->idev)
in6_dev_put(ifa->idev);
@@ -1179,19 +1178,19 @@
static void
cleanup_prefix_route(struct inet6_ifaddr *ifp, unsigned long expires, bool del_rt)
{
- struct rt6_info *rt;
+ struct fib6_info *f6i;
- rt = addrconf_get_prefix_route(&ifp->addr,
+ f6i = addrconf_get_prefix_route(&ifp->addr,
ifp->prefix_len,
ifp->idev->dev,
0, RTF_GATEWAY | RTF_DEFAULT);
- if (rt) {
+ if (f6i) {
if (del_rt)
- ip6_del_rt(rt);
+ ip6_del_rt(dev_net(ifp->idev->dev), f6i);
else {
- if (!(rt->rt6i_flags & RTF_EXPIRES))
- rt6_set_expires(rt, expires);
- ip6_rt_put(rt);
+ if (!(f6i->fib6_flags & RTF_EXPIRES))
+ fib6_set_expires(f6i, expires);
+ fib6_info_release(f6i);
}
}
}
@@ -2320,7 +2319,7 @@
static void
addrconf_prefix_route(struct in6_addr *pfx, int plen, struct net_device *dev,
- unsigned long expires, u32 flags)
+ unsigned long expires, u32 flags, gfp_t gfp_flags)
{
struct fib6_config cfg = {
.fc_table = l3mdev_fib_table(dev) ? : RT6_TABLE_PREFIX,
@@ -2331,6 +2330,7 @@
.fc_flags = RTF_UP | flags,
.fc_nlinfo.nl_net = dev_net(dev),
.fc_protocol = RTPROT_KERNEL,
+ .fc_type = RTN_UNICAST,
};
cfg.fc_dst = *pfx;
@@ -2344,17 +2344,17 @@
cfg.fc_flags |= RTF_NONEXTHOP;
#endif
- ip6_route_add(&cfg, NULL);
+ ip6_route_add(&cfg, gfp_flags, NULL);
}
-static struct rt6_info *addrconf_get_prefix_route(const struct in6_addr *pfx,
+static struct fib6_info *addrconf_get_prefix_route(const struct in6_addr *pfx,
int plen,
const struct net_device *dev,
u32 flags, u32 noflags)
{
struct fib6_node *fn;
- struct rt6_info *rt = NULL;
+ struct fib6_info *rt = NULL;
struct fib6_table *table;
u32 tb_id = l3mdev_fib_table(dev) ? : RT6_TABLE_PREFIX;
@@ -2368,14 +2368,13 @@
goto out;
for_each_fib6_node_rt_rcu(fn) {
- if (rt->dst.dev->ifindex != dev->ifindex)
+ if (rt->fib6_nh.nh_dev->ifindex != dev->ifindex)
continue;
- if ((rt->rt6i_flags & flags) != flags)
+ if ((rt->fib6_flags & flags) != flags)
continue;
- if ((rt->rt6i_flags & noflags) != 0)
+ if ((rt->fib6_flags & noflags) != 0)
continue;
- if (!dst_hold_safe(&rt->dst))
- rt = NULL;
+ fib6_info_hold(rt);
break;
}
out:
@@ -2394,12 +2393,13 @@
.fc_ifindex = dev->ifindex,
.fc_dst_len = 8,
.fc_flags = RTF_UP,
+ .fc_type = RTN_UNICAST,
.fc_nlinfo.nl_net = dev_net(dev),
};
ipv6_addr_set(&cfg.fc_dst, htonl(0xFF000000), 0, 0, 0);
- ip6_route_add(&cfg, NULL);
+ ip6_route_add(&cfg, GFP_ATOMIC, NULL);
}
static struct inet6_dev *addrconf_add_dev(struct net_device *dev)
@@ -2529,7 +2529,6 @@
if (IS_ERR_OR_NULL(ifp))
return -1;
- update_lft = 0;
create = 1;
spin_lock_bh(&ifp->lock);
ifp->flags |= IFA_F_MANAGETEMPADDR;
@@ -2551,7 +2550,7 @@
stored_lft = ifp->valid_lft - (now - ifp->tstamp) / HZ;
else
stored_lft = 0;
- if (!update_lft && !create && stored_lft) {
+ if (!create && stored_lft) {
const u32 minimum_lft = min_t(u32,
stored_lft, MIN_VALID_LIFETIME);
valid_lft = max(valid_lft, minimum_lft);
@@ -2642,7 +2641,7 @@
*/
if (pinfo->onlink) {
- struct rt6_info *rt;
+ struct fib6_info *rt;
unsigned long rt_expires;
/* Avoid arithmetic overflow. Really, we could
@@ -2667,13 +2666,13 @@
if (rt) {
/* Autoconf prefix route */
if (valid_lft == 0) {
- ip6_del_rt(rt);
+ ip6_del_rt(net, rt);
rt = NULL;
} else if (addrconf_finite_timeout(rt_expires)) {
/* not infinity */
- rt6_set_expires(rt, jiffies + rt_expires);
+ fib6_set_expires(rt, jiffies + rt_expires);
} else {
- rt6_clean_expires(rt);
+ fib6_clean_expires(rt);
}
} else if (valid_lft) {
clock_t expires = 0;
@@ -2684,9 +2683,9 @@
expires = jiffies_to_clock_t(rt_expires);
}
addrconf_prefix_route(&pinfo->prefix, pinfo->prefix_len,
- dev, expires, flags);
+ dev, expires, flags, GFP_ATOMIC);
}
- ip6_rt_put(rt);
+ fib6_info_release(rt);
}
/* Try to figure out our local address for this prefix */
@@ -2899,9 +2898,14 @@
if (!IS_ERR(ifp)) {
if (!(ifa_flags & IFA_F_NOPREFIXROUTE)) {
addrconf_prefix_route(&ifp->addr, ifp->prefix_len, dev,
- expires, flags);
+ expires, flags, GFP_KERNEL);
}
+ /* Send a netlink notification if DAD is enabled and
+ * optimistic flag is not set
+ */
+ if (!(ifp->flags & (IFA_F_OPTIMISTIC | IFA_F_NODAD)))
+ ipv6_ifa_notify(0, ifp);
/*
* Note that section 3.1 of RFC 4429 indicates
* that the Optimistic flag should not be set for
@@ -3047,7 +3051,8 @@
if (addr.s6_addr32[3]) {
add_addr(idev, &addr, plen, scope);
- addrconf_prefix_route(&addr, plen, idev->dev, 0, pflags);
+ addrconf_prefix_route(&addr, plen, idev->dev, 0, pflags,
+ GFP_ATOMIC);
return;
}
@@ -3072,7 +3077,7 @@
add_addr(idev, &addr, plen, flag);
addrconf_prefix_route(&addr, plen, idev->dev, 0,
- pflags);
+ pflags, GFP_ATOMIC);
}
}
}
@@ -3112,7 +3117,8 @@
ifp = ipv6_add_addr(idev, addr, NULL, 64, IFA_LINK, addr_flags,
INFINITY_LIFE_TIME, INFINITY_LIFE_TIME, true, NULL);
if (!IS_ERR(ifp)) {
- addrconf_prefix_route(&ifp->addr, ifp->prefix_len, idev->dev, 0, 0);
+ addrconf_prefix_route(&ifp->addr, ifp->prefix_len, idev->dev,
+ 0, 0, GFP_ATOMIC);
addrconf_dad_start(ifp);
in6_ifa_put(ifp);
}
@@ -3227,7 +3233,8 @@
addrconf_add_linklocal(idev, &addr,
IFA_F_STABLE_PRIVACY);
else if (prefix_route)
- addrconf_prefix_route(&addr, 64, idev->dev, 0, 0);
+ addrconf_prefix_route(&addr, 64, idev->dev,
+ 0, 0, GFP_KERNEL);
break;
case IN6_ADDR_GEN_MODE_EUI64:
/* addrconf_add_linklocal also adds a prefix_route and we
@@ -3237,7 +3244,8 @@
if (ipv6_generate_eui64(addr.s6_addr + 8, idev->dev) == 0)
addrconf_add_linklocal(idev, &addr, 0);
else if (prefix_route)
- addrconf_prefix_route(&addr, 64, idev->dev, 0, 0);
+ addrconf_prefix_route(&addr, 64, idev->dev,
+ 0, 0, GFP_KERNEL);
break;
case IN6_ADDR_GEN_MODE_NONE:
default:
@@ -3329,32 +3337,34 @@
}
#endif
-static int fixup_permanent_addr(struct inet6_dev *idev,
+static int fixup_permanent_addr(struct net *net,
+ struct inet6_dev *idev,
struct inet6_ifaddr *ifp)
{
- /* !rt6i_node means the host route was removed from the
+ /* !fib6_node means the host route was removed from the
* FIB, for example, if 'lo' device is taken down. In that
* case regenerate the host route.
*/
- if (!ifp->rt || !ifp->rt->rt6i_node) {
- struct rt6_info *rt, *prev;
+ if (!ifp->rt || !ifp->rt->fib6_node) {
+ struct fib6_info *f6i, *prev;
- rt = addrconf_dst_alloc(idev, &ifp->addr, false);
- if (IS_ERR(rt))
- return PTR_ERR(rt);
+ f6i = addrconf_f6i_alloc(net, idev, &ifp->addr, false,
+ GFP_ATOMIC);
+ if (IS_ERR(f6i))
+ return PTR_ERR(f6i);
/* ifp->rt can be accessed outside of rtnl */
spin_lock(&ifp->lock);
prev = ifp->rt;
- ifp->rt = rt;
+ ifp->rt = f6i;
spin_unlock(&ifp->lock);
- ip6_rt_put(prev);
+ fib6_info_release(prev);
}
if (!(ifp->flags & IFA_F_NOPREFIXROUTE)) {
addrconf_prefix_route(&ifp->addr, ifp->prefix_len,
- idev->dev, 0, 0);
+ idev->dev, 0, 0, GFP_ATOMIC);
}
if (ifp->state == INET6_IFADDR_STATE_PREDAD)
@@ -3363,7 +3373,7 @@
return 0;
}
-static void addrconf_permanent_addr(struct net_device *dev)
+static void addrconf_permanent_addr(struct net *net, struct net_device *dev)
{
struct inet6_ifaddr *ifp, *tmp;
struct inet6_dev *idev;
@@ -3376,7 +3386,7 @@
list_for_each_entry_safe(ifp, tmp, &idev->addr_list, if_list) {
if ((ifp->flags & IFA_F_PERMANENT) &&
- fixup_permanent_addr(idev, ifp) < 0) {
+ fixup_permanent_addr(net, idev, ifp) < 0) {
write_unlock_bh(&idev->lock);
in6_ifa_hold(ifp);
ipv6_del_addr(ifp);
@@ -3445,7 +3455,7 @@
if (event == NETDEV_UP) {
/* restore routes for permanent addresses */
- addrconf_permanent_addr(dev);
+ addrconf_permanent_addr(net, dev);
if (!addrconf_link_ready(dev)) {
/* device is not ready yet. */
@@ -3612,8 +3622,7 @@
struct net *net = dev_net(dev);
struct inet6_dev *idev;
struct inet6_ifaddr *ifa, *tmp;
- int _keep_addr;
- bool keep_addr;
+ bool keep_addr = false;
int state, i;
ASSERT_RTNL();
@@ -3639,15 +3648,18 @@
}
- /* aggregate the system setting and interface setting */
- _keep_addr = net->ipv6.devconf_all->keep_addr_on_down;
- if (!_keep_addr)
- _keep_addr = idev->cnf.keep_addr_on_down;
-
/* combine the user config with event to determine if permanent
* addresses are to be removed from address hash table
*/
- keep_addr = !(how || _keep_addr <= 0 || idev->cnf.disable_ipv6);
+ if (!how && !idev->cnf.disable_ipv6) {
+ /* aggregate the system setting and interface setting */
+ int _keep_addr = net->ipv6.devconf_all->keep_addr_on_down;
+
+ if (!_keep_addr)
+ _keep_addr = idev->cnf.keep_addr_on_down;
+
+ keep_addr = (_keep_addr > 0);
+ }
/* Step 2: clear hash table */
for (i = 0; i < IN6_ADDR_HSIZE; i++) {
@@ -3697,13 +3709,8 @@
write_lock_bh(&idev->lock);
}
- /* re-combine the user config with event to determine if permanent
- * addresses are to be removed from the interface list
- */
- keep_addr = (!how && _keep_addr > 0 && !idev->cnf.disable_ipv6);
-
list_for_each_entry_safe(ifa, tmp, &idev->addr_list, if_list) {
- struct rt6_info *rt = NULL;
+ struct fib6_info *rt = NULL;
bool keep;
addrconf_del_dad_work(ifa);
@@ -3731,7 +3738,7 @@
spin_unlock_bh(&ifa->lock);
if (rt)
- ip6_del_rt(rt);
+ ip6_del_rt(net, rt);
if (state != INET6_IFADDR_STATE_DEAD) {
__ipv6_ifa_notify(RTM_DELADDR, ifa);
@@ -3849,6 +3856,7 @@
struct inet6_dev *idev = ifp->idev;
struct net_device *dev = idev->dev;
bool bump_id, notify = false;
+ struct net *net;
addrconf_join_solict(dev, &ifp->addr);
@@ -3859,8 +3867,9 @@
if (ifp->state == INET6_IFADDR_STATE_DEAD)
goto out;
+ net = dev_net(dev);
if (dev->flags&(IFF_NOARP|IFF_LOOPBACK) ||
- (dev_net(dev)->ipv6.devconf_all->accept_dad < 1 &&
+ (net->ipv6.devconf_all->accept_dad < 1 &&
idev->cnf.accept_dad < 1) ||
!(ifp->flags&IFA_F_TENTATIVE) ||
ifp->flags & IFA_F_NODAD) {
@@ -3896,8 +3905,8 @@
* Frames right away
*/
if (ifp->flags & IFA_F_OPTIMISTIC) {
- ip6_ins_rt(ifp->rt);
- if (ipv6_use_optimistic_addr(dev_net(dev), idev)) {
+ ip6_ins_rt(net, ifp->rt);
+ if (ipv6_use_optimistic_addr(net, idev)) {
/* Because optimistic nodes can use this address,
* notify listeners. If DAD fails, RTM_DELADDR is sent.
*/
@@ -4563,8 +4572,9 @@
ipv6_ifa_notify(0, ifp);
if (!(ifa_flags & IFA_F_NOPREFIXROUTE)) {
- addrconf_prefix_route(&ifp->addr, ifp->prefix_len, ifp->idev->dev,
- expires, flags);
+ addrconf_prefix_route(&ifp->addr, ifp->prefix_len,
+ ifp->idev->dev, expires, flags,
+ GFP_KERNEL);
} else if (had_prefixroute) {
enum cleanup_prefix_rt_t action;
unsigned long rt_expires;
@@ -4804,9 +4814,10 @@
static int inet6_fill_ifacaddr(struct sk_buff *skb, struct ifacaddr6 *ifaca,
u32 portid, u32 seq, int event, unsigned int flags)
{
+ struct net_device *dev = fib6_info_nh_dev(ifaca->aca_rt);
+ int ifindex = dev ? dev->ifindex : 1;
struct nlmsghdr *nlh;
u8 scope = RT_SCOPE_UNIVERSE;
- int ifindex = ifaca->aca_idev->dev->ifindex;
if (ipv6_addr_scope(&ifaca->aca_addr) & IFA_SITE)
scope = RT_SCOPE_SITE;
@@ -5029,14 +5040,6 @@
struct net *net = dev_net(ifa->idev->dev);
int err = -ENOBUFS;
- /* Don't send DELADDR notification for TENTATIVE address,
- * since NEWADDR notification is sent only after removing
- * TENTATIVE flag, if DAD has not failed.
- */
- if (ifa->flags & IFA_F_TENTATIVE && !(ifa->flags & IFA_F_DADFAILED) &&
- event == RTM_DELADDR)
- return;
-
skb = nlmsg_new(inet6_ifaddr_msgsize(), GFP_ATOMIC);
if (!skb)
goto errout;
@@ -5607,29 +5610,30 @@
* our DAD process, so we don't need
* to do it again
*/
- if (!rcu_access_pointer(ifp->rt->rt6i_node))
- ip6_ins_rt(ifp->rt);
+ if (!rcu_access_pointer(ifp->rt->fib6_node))
+ ip6_ins_rt(net, ifp->rt);
if (ifp->idev->cnf.forwarding)
addrconf_join_anycast(ifp);
if (!ipv6_addr_any(&ifp->peer_addr))
addrconf_prefix_route(&ifp->peer_addr, 128,
- ifp->idev->dev, 0, 0);
+ ifp->idev->dev, 0, 0,
+ GFP_ATOMIC);
break;
case RTM_DELADDR:
if (ifp->idev->cnf.forwarding)
addrconf_leave_anycast(ifp);
addrconf_leave_solict(ifp->idev, &ifp->addr);
if (!ipv6_addr_any(&ifp->peer_addr)) {
- struct rt6_info *rt;
+ struct fib6_info *rt;
rt = addrconf_get_prefix_route(&ifp->peer_addr, 128,
ifp->idev->dev, 0, 0);
if (rt)
- ip6_del_rt(rt);
+ ip6_del_rt(net, rt);
}
if (ifp->rt) {
- if (dst_hold_safe(&ifp->rt->dst))
- ip6_del_rt(ifp->rt);
+ ip6_del_rt(net, ifp->rt);
+ ifp->rt = NULL;
}
rt_genid_bump_ipv6(net);
break;
@@ -5976,11 +5980,11 @@
list_for_each_entry(ifa, &idev->addr_list, if_list) {
spin_lock(&ifa->lock);
if (ifa->rt) {
- struct rt6_info *rt = ifa->rt;
+ struct fib6_info *rt = ifa->rt;
int cpu;
rcu_read_lock();
- addrconf_set_nopolicy(ifa->rt, val);
+ ifa->rt->dst_nopolicy = val ? true : false;
if (rt->rt6i_pcpu) {
for_each_possible_cpu(cpu) {
struct rt6_info **rtp;
diff --git a/net/ipv6/addrconf_core.c b/net/ipv6/addrconf_core.c
index 32b564d..5cd0029 100644
--- a/net/ipv6/addrconf_core.c
+++ b/net/ipv6/addrconf_core.c
@@ -134,8 +134,47 @@
return -EAFNOSUPPORT;
}
+static struct fib6_table *eafnosupport_fib6_get_table(struct net *net, u32 id)
+{
+ return NULL;
+}
+
+static struct fib6_info *
+eafnosupport_fib6_table_lookup(struct net *net, struct fib6_table *table,
+ int oif, struct flowi6 *fl6, int flags)
+{
+ return NULL;
+}
+
+static struct fib6_info *
+eafnosupport_fib6_lookup(struct net *net, int oif, struct flowi6 *fl6,
+ int flags)
+{
+ return NULL;
+}
+
+static struct fib6_info *
+eafnosupport_fib6_multipath_select(const struct net *net, struct fib6_info *f6i,
+ struct flowi6 *fl6, int oif,
+ const struct sk_buff *skb, int strict)
+{
+ return f6i;
+}
+
+static u32
+eafnosupport_ip6_mtu_from_fib6(struct fib6_info *f6i, struct in6_addr *daddr,
+ struct in6_addr *saddr)
+{
+ return 0;
+}
+
const struct ipv6_stub *ipv6_stub __read_mostly = &(struct ipv6_stub) {
- .ipv6_dst_lookup = eafnosupport_ipv6_dst_lookup,
+ .ipv6_dst_lookup = eafnosupport_ipv6_dst_lookup,
+ .fib6_get_table = eafnosupport_fib6_get_table,
+ .fib6_table_lookup = eafnosupport_fib6_table_lookup,
+ .fib6_lookup = eafnosupport_fib6_lookup,
+ .fib6_multipath_select = eafnosupport_fib6_multipath_select,
+ .ip6_mtu_from_fib6 = eafnosupport_ip6_mtu_from_fib6,
};
EXPORT_SYMBOL_GPL(ipv6_stub);
diff --git a/net/ipv6/af_inet6.c b/net/ipv6/af_inet6.c
index 8da0b51..9ed0eae 100644
--- a/net/ipv6/af_inet6.c
+++ b/net/ipv6/af_inet6.c
@@ -273,33 +273,8 @@
goto out;
}
-
-/* bind for INET6 API */
-int inet6_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len)
-{
- struct sock *sk = sock->sk;
- int err = 0;
-
- /* If the socket has its own bind function then use it. */
- if (sk->sk_prot->bind)
- return sk->sk_prot->bind(sk, uaddr, addr_len);
-
- if (addr_len < SIN6_LEN_RFC2133)
- return -EINVAL;
-
- /* BPF prog is run before any checks are done so that if the prog
- * changes context in a wrong way it will be caught.
- */
- err = BPF_CGROUP_RUN_PROG_INET6_BIND(sk, uaddr);
- if (err)
- return err;
-
- return __inet6_bind(sk, uaddr, addr_len, false, true);
-}
-EXPORT_SYMBOL(inet6_bind);
-
-int __inet6_bind(struct sock *sk, struct sockaddr *uaddr, int addr_len,
- bool force_bind_address_no_port, bool with_lock)
+static int __inet6_bind(struct sock *sk, struct sockaddr *uaddr, int addr_len,
+ bool force_bind_address_no_port, bool with_lock)
{
struct sockaddr_in6 *addr = (struct sockaddr_in6 *)uaddr;
struct inet_sock *inet = inet_sk(sk);
@@ -444,6 +419,30 @@
goto out;
}
+/* bind for INET6 API */
+int inet6_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len)
+{
+ struct sock *sk = sock->sk;
+ int err = 0;
+
+ /* If the socket has its own bind function then use it. */
+ if (sk->sk_prot->bind)
+ return sk->sk_prot->bind(sk, uaddr, addr_len);
+
+ if (addr_len < SIN6_LEN_RFC2133)
+ return -EINVAL;
+
+ /* BPF prog is run before any checks are done so that if the prog
+ * changes context in a wrong way it will be caught.
+ */
+ err = BPF_CGROUP_RUN_PROG_INET6_BIND(sk, uaddr);
+ if (err)
+ return err;
+
+ return __inet6_bind(sk, uaddr, addr_len, false, true);
+}
+EXPORT_SYMBOL(inet6_bind);
+
int inet6_release(struct socket *sock)
{
struct sock *sk = sock->sk;
@@ -579,7 +578,9 @@
.getsockopt = sock_common_getsockopt, /* ok */
.sendmsg = inet_sendmsg, /* ok */
.recvmsg = inet_recvmsg, /* ok */
- .mmap = sock_no_mmap,
+#ifdef CONFIG_MMU
+ .mmap = tcp_mmap,
+#endif
.sendpage = inet_sendpage,
.sendmsg_locked = tcp_sendmsg_locked,
.sendpage_locked = tcp_sendpage_locked,
@@ -590,6 +591,7 @@
.compat_setsockopt = compat_sock_common_setsockopt,
.compat_getsockopt = compat_sock_common_getsockopt,
#endif
+ .set_rcvlowat = tcp_set_rcvlowat,
};
const struct proto_ops inet6_dgram_ops = {
@@ -887,7 +889,12 @@
static const struct ipv6_stub ipv6_stub_impl = {
.ipv6_sock_mc_join = ipv6_sock_mc_join,
.ipv6_sock_mc_drop = ipv6_sock_mc_drop,
- .ipv6_dst_lookup = ip6_dst_lookup,
+ .ipv6_dst_lookup = ip6_dst_lookup,
+ .fib6_get_table = fib6_get_table,
+ .fib6_table_lookup = fib6_table_lookup,
+ .fib6_lookup = fib6_lookup,
+ .fib6_multipath_select = fib6_multipath_select,
+ .ip6_mtu_from_fib6 = ip6_mtu_from_fib6,
.udpv6_encap_enable = udpv6_encap_enable,
.ndisc_send_na = ndisc_send_na,
.nd_tbl = &nd_tbl,
diff --git a/net/ipv6/anycast.c b/net/ipv6/anycast.c
index bbcabbb..0250d19 100644
--- a/net/ipv6/anycast.c
+++ b/net/ipv6/anycast.c
@@ -212,16 +212,14 @@
static void aca_put(struct ifacaddr6 *ac)
{
if (refcount_dec_and_test(&ac->aca_refcnt)) {
- in6_dev_put(ac->aca_idev);
- dst_release(&ac->aca_rt->dst);
+ fib6_info_release(ac->aca_rt);
kfree(ac);
}
}
-static struct ifacaddr6 *aca_alloc(struct rt6_info *rt,
+static struct ifacaddr6 *aca_alloc(struct fib6_info *f6i,
const struct in6_addr *addr)
{
- struct inet6_dev *idev = rt->rt6i_idev;
struct ifacaddr6 *aca;
aca = kzalloc(sizeof(*aca), GFP_ATOMIC);
@@ -229,9 +227,8 @@
return NULL;
aca->aca_addr = *addr;
- in6_dev_hold(idev);
- aca->aca_idev = idev;
- aca->aca_rt = rt;
+ fib6_info_hold(f6i);
+ aca->aca_rt = f6i;
aca->aca_users = 1;
/* aca_tstamp should be updated upon changes */
aca->aca_cstamp = aca->aca_tstamp = jiffies;
@@ -246,7 +243,8 @@
int __ipv6_dev_ac_inc(struct inet6_dev *idev, const struct in6_addr *addr)
{
struct ifacaddr6 *aca;
- struct rt6_info *rt;
+ struct fib6_info *f6i;
+ struct net *net;
int err;
ASSERT_RTNL();
@@ -265,14 +263,15 @@
}
}
- rt = addrconf_dst_alloc(idev, addr, true);
- if (IS_ERR(rt)) {
- err = PTR_ERR(rt);
+ net = dev_net(idev->dev);
+ f6i = addrconf_f6i_alloc(net, idev, addr, true, GFP_ATOMIC);
+ if (IS_ERR(f6i)) {
+ err = PTR_ERR(f6i);
goto out;
}
- aca = aca_alloc(rt, addr);
+ aca = aca_alloc(f6i, addr);
if (!aca) {
- ip6_rt_put(rt);
+ fib6_info_release(f6i);
err = -ENOMEM;
goto out;
}
@@ -286,7 +285,7 @@
aca_get(aca);
write_unlock_bh(&idev->lock);
- ip6_ins_rt(rt);
+ ip6_ins_rt(net, f6i);
addrconf_join_solict(idev->dev, &aca->aca_addr);
@@ -328,8 +327,7 @@
write_unlock_bh(&idev->lock);
addrconf_leave_solict(idev, &aca->aca_addr);
- dst_hold(&aca->aca_rt->dst);
- ip6_del_rt(aca->aca_rt);
+ ip6_del_rt(dev_net(idev->dev), aca->aca_rt);
aca_put(aca);
return 0;
@@ -356,8 +354,7 @@
addrconf_leave_solict(idev, &aca->aca_addr);
- dst_hold(&aca->aca_rt->dst);
- ip6_del_rt(aca->aca_rt);
+ ip6_del_rt(dev_net(idev->dev), aca->aca_rt);
aca_put(aca);
diff --git a/net/ipv6/exthdrs.c b/net/ipv6/exthdrs.c
index bc68eb6..5bc2bf3 100644
--- a/net/ipv6/exthdrs.c
+++ b/net/ipv6/exthdrs.c
@@ -280,6 +280,7 @@
static int ipv6_destopt_rcv(struct sk_buff *skb)
{
+ struct inet6_dev *idev = __in6_dev_get(skb->dev);
struct inet6_skb_parm *opt = IP6CB(skb);
#if IS_ENABLED(CONFIG_IPV6_MIP6)
__u16 dstbuf;
@@ -291,7 +292,7 @@
if (!pskb_may_pull(skb, skb_transport_offset(skb) + 8) ||
!pskb_may_pull(skb, (skb_transport_offset(skb) +
((skb_transport_header(skb)[1] + 1) << 3)))) {
- __IP6_INC_STATS(dev_net(dst->dev), ip6_dst_idev(dst),
+ __IP6_INC_STATS(dev_net(dst->dev), idev,
IPSTATS_MIB_INHDRERRORS);
fail_and_free:
kfree_skb(skb);
@@ -319,8 +320,7 @@
return 1;
}
- __IP6_INC_STATS(dev_net(dst->dev),
- ip6_dst_idev(dst), IPSTATS_MIB_INHDRERRORS);
+ __IP6_INC_STATS(net, idev, IPSTATS_MIB_INHDRERRORS);
return -1;
}
@@ -416,8 +416,7 @@
}
if (hdr->segments_left >= (hdr->hdrlen >> 1)) {
- __IP6_INC_STATS(net, ip6_dst_idev(skb_dst(skb)),
- IPSTATS_MIB_INHDRERRORS);
+ __IP6_INC_STATS(net, idev, IPSTATS_MIB_INHDRERRORS);
icmpv6_param_prob(skb, ICMPV6_HDR_FIELD,
((&hdr->segments_left) -
skb_network_header(skb)));
@@ -456,8 +455,7 @@
if (skb_dst(skb)->dev->flags & IFF_LOOPBACK) {
if (ipv6_hdr(skb)->hop_limit <= 1) {
- __IP6_INC_STATS(net, ip6_dst_idev(skb_dst(skb)),
- IPSTATS_MIB_INHDRERRORS);
+ __IP6_INC_STATS(net, idev, IPSTATS_MIB_INHDRERRORS);
icmpv6_send(skb, ICMPV6_TIME_EXCEED,
ICMPV6_EXC_HOPLIMIT, 0);
kfree_skb(skb);
@@ -481,10 +479,10 @@
/* called with rcu_read_lock() */
static int ipv6_rthdr_rcv(struct sk_buff *skb)
{
+ struct inet6_dev *idev = __in6_dev_get(skb->dev);
struct inet6_skb_parm *opt = IP6CB(skb);
struct in6_addr *addr = NULL;
struct in6_addr daddr;
- struct inet6_dev *idev;
int n, i;
struct ipv6_rt_hdr *hdr;
struct rt0_hdr *rthdr;
@@ -498,8 +496,7 @@
if (!pskb_may_pull(skb, skb_transport_offset(skb) + 8) ||
!pskb_may_pull(skb, (skb_transport_offset(skb) +
((skb_transport_header(skb)[1] + 1) << 3)))) {
- __IP6_INC_STATS(net, ip6_dst_idev(skb_dst(skb)),
- IPSTATS_MIB_INHDRERRORS);
+ __IP6_INC_STATS(net, idev, IPSTATS_MIB_INHDRERRORS);
kfree_skb(skb);
return -1;
}
@@ -508,8 +505,7 @@
if (ipv6_addr_is_multicast(&ipv6_hdr(skb)->daddr) ||
skb->pkt_type != PACKET_HOST) {
- __IP6_INC_STATS(net, ip6_dst_idev(skb_dst(skb)),
- IPSTATS_MIB_INADDRERRORS);
+ __IP6_INC_STATS(net, idev, IPSTATS_MIB_INADDRERRORS);
kfree_skb(skb);
return -1;
}
@@ -527,7 +523,7 @@
* processed by own
*/
if (!addr) {
- __IP6_INC_STATS(net, ip6_dst_idev(skb_dst(skb)),
+ __IP6_INC_STATS(net, idev,
IPSTATS_MIB_INADDRERRORS);
kfree_skb(skb);
return -1;
@@ -553,8 +549,7 @@
goto unknown_rh;
/* Silently discard invalid RTH type 2 */
if (hdr->hdrlen != 2 || hdr->segments_left != 1) {
- __IP6_INC_STATS(net, ip6_dst_idev(skb_dst(skb)),
- IPSTATS_MIB_INHDRERRORS);
+ __IP6_INC_STATS(net, idev, IPSTATS_MIB_INHDRERRORS);
kfree_skb(skb);
return -1;
}
@@ -572,8 +567,7 @@
n = hdr->hdrlen >> 1;
if (hdr->segments_left > n) {
- __IP6_INC_STATS(net, ip6_dst_idev(skb_dst(skb)),
- IPSTATS_MIB_INHDRERRORS);
+ __IP6_INC_STATS(net, idev, IPSTATS_MIB_INHDRERRORS);
icmpv6_param_prob(skb, ICMPV6_HDR_FIELD,
((&hdr->segments_left) -
skb_network_header(skb)));
@@ -609,14 +603,12 @@
if (xfrm6_input_addr(skb, (xfrm_address_t *)addr,
(xfrm_address_t *)&ipv6_hdr(skb)->saddr,
IPPROTO_ROUTING) < 0) {
- __IP6_INC_STATS(net, ip6_dst_idev(skb_dst(skb)),
- IPSTATS_MIB_INADDRERRORS);
+ __IP6_INC_STATS(net, idev, IPSTATS_MIB_INADDRERRORS);
kfree_skb(skb);
return -1;
}
if (!ipv6_chk_home_addr(dev_net(skb_dst(skb)->dev), addr)) {
- __IP6_INC_STATS(net, ip6_dst_idev(skb_dst(skb)),
- IPSTATS_MIB_INADDRERRORS);
+ __IP6_INC_STATS(net, idev, IPSTATS_MIB_INADDRERRORS);
kfree_skb(skb);
return -1;
}
@@ -627,8 +619,7 @@
}
if (ipv6_addr_is_multicast(addr)) {
- __IP6_INC_STATS(net, ip6_dst_idev(skb_dst(skb)),
- IPSTATS_MIB_INADDRERRORS);
+ __IP6_INC_STATS(net, idev, IPSTATS_MIB_INADDRERRORS);
kfree_skb(skb);
return -1;
}
@@ -647,8 +638,7 @@
if (skb_dst(skb)->dev->flags&IFF_LOOPBACK) {
if (ipv6_hdr(skb)->hop_limit <= 1) {
- __IP6_INC_STATS(net, ip6_dst_idev(skb_dst(skb)),
- IPSTATS_MIB_INHDRERRORS);
+ __IP6_INC_STATS(net, idev, IPSTATS_MIB_INHDRERRORS);
icmpv6_send(skb, ICMPV6_TIME_EXCEED, ICMPV6_EXC_HOPLIMIT,
0);
kfree_skb(skb);
@@ -663,7 +653,7 @@
return -1;
unknown_rh:
- __IP6_INC_STATS(net, ip6_dst_idev(skb_dst(skb)), IPSTATS_MIB_INHDRERRORS);
+ __IP6_INC_STATS(net, idev, IPSTATS_MIB_INHDRERRORS);
icmpv6_param_prob(skb, ICMPV6_HDR_FIELD,
(&hdr->type) - skb_network_header(skb));
return -1;
@@ -755,34 +745,31 @@
static bool ipv6_hop_jumbo(struct sk_buff *skb, int optoff)
{
const unsigned char *nh = skb_network_header(skb);
+ struct inet6_dev *idev = __in6_dev_get_safely(skb->dev);
struct net *net = ipv6_skb_net(skb);
u32 pkt_len;
if (nh[optoff + 1] != 4 || (optoff & 3) != 2) {
net_dbg_ratelimited("ipv6_hop_jumbo: wrong jumbo opt length/alignment %d\n",
nh[optoff+1]);
- __IP6_INC_STATS(net, ipv6_skb_idev(skb),
- IPSTATS_MIB_INHDRERRORS);
+ __IP6_INC_STATS(net, idev, IPSTATS_MIB_INHDRERRORS);
goto drop;
}
pkt_len = ntohl(*(__be32 *)(nh + optoff + 2));
if (pkt_len <= IPV6_MAXPLEN) {
- __IP6_INC_STATS(net, ipv6_skb_idev(skb),
- IPSTATS_MIB_INHDRERRORS);
+ __IP6_INC_STATS(net, idev, IPSTATS_MIB_INHDRERRORS);
icmpv6_param_prob(skb, ICMPV6_HDR_FIELD, optoff+2);
return false;
}
if (ipv6_hdr(skb)->payload_len) {
- __IP6_INC_STATS(net, ipv6_skb_idev(skb),
- IPSTATS_MIB_INHDRERRORS);
+ __IP6_INC_STATS(net, idev, IPSTATS_MIB_INHDRERRORS);
icmpv6_param_prob(skb, ICMPV6_HDR_FIELD, optoff);
return false;
}
if (pkt_len > skb->len - sizeof(struct ipv6hdr)) {
- __IP6_INC_STATS(net, ipv6_skb_idev(skb),
- IPSTATS_MIB_INTRUNCATEDPKTS);
+ __IP6_INC_STATS(net, idev, IPSTATS_MIB_INTRUNCATEDPKTS);
goto drop;
}
diff --git a/net/ipv6/exthdrs_core.c b/net/ipv6/exthdrs_core.c
index b643f5c..ae365df 100644
--- a/net/ipv6/exthdrs_core.c
+++ b/net/ipv6/exthdrs_core.c
@@ -161,7 +161,7 @@
* if target < 0. "last header" is transport protocol header, ESP, or
* "No next header".
*
- * Note that *offset is used as input/output parameter. an if it is not zero,
+ * Note that *offset is used as input/output parameter, and if it is not zero,
* then it must be a valid offset to an inner IPv6 header. This can be used
* to explore inner IPv6 header, eg. ICMPv6 error messages.
*
diff --git a/net/ipv6/fib6_rules.c b/net/ipv6/fib6_rules.c
index df113c7..f590446 100644
--- a/net/ipv6/fib6_rules.c
+++ b/net/ipv6/fib6_rules.c
@@ -60,6 +60,39 @@
return fib_rules_seq_read(net, AF_INET6);
}
+/* called with rcu lock held; no reference taken on fib6_info */
+struct fib6_info *fib6_lookup(struct net *net, int oif, struct flowi6 *fl6,
+ int flags)
+{
+ struct fib6_info *f6i;
+ int err;
+
+ if (net->ipv6.fib6_has_custom_rules) {
+ struct fib_lookup_arg arg = {
+ .lookup_ptr = fib6_table_lookup,
+ .lookup_data = &oif,
+ .flags = FIB_LOOKUP_NOREF,
+ };
+
+ l3mdev_update_flow(net, flowi6_to_flowi(fl6));
+
+ err = fib_rules_lookup(net->ipv6.fib6_rules_ops,
+ flowi6_to_flowi(fl6), flags, &arg);
+ if (err)
+ return ERR_PTR(err);
+
+ f6i = arg.result ? : net->ipv6.fib6_null_entry;
+ } else {
+ f6i = fib6_table_lookup(net, net->ipv6.fib6_local_tbl,
+ oif, fl6, flags);
+ if (!f6i || f6i == net->ipv6.fib6_null_entry)
+ f6i = fib6_table_lookup(net, net->ipv6.fib6_main_tbl,
+ oif, fl6, flags);
+ }
+
+ return f6i;
+}
+
struct dst_entry *fib6_rule_lookup(struct net *net, struct flowi6 *fl6,
const struct sk_buff *skb,
int flags, pol_lookup_t lookup)
@@ -96,8 +129,73 @@
return &net->ipv6.ip6_null_entry->dst;
}
-static int fib6_rule_action(struct fib_rule *rule, struct flowi *flp,
- int flags, struct fib_lookup_arg *arg)
+static int fib6_rule_saddr(struct net *net, struct fib_rule *rule, int flags,
+ struct flowi6 *flp6, const struct net_device *dev)
+{
+ struct fib6_rule *r = (struct fib6_rule *)rule;
+
+ /* If we need to find a source address for this traffic,
+ * we check the result if it meets requirement of the rule.
+ */
+ if ((rule->flags & FIB_RULE_FIND_SADDR) &&
+ r->src.plen && !(flags & RT6_LOOKUP_F_HAS_SADDR)) {
+ struct in6_addr saddr;
+
+ if (ipv6_dev_get_saddr(net, dev, &flp6->daddr,
+ rt6_flags2srcprefs(flags), &saddr))
+ return -EAGAIN;
+
+ if (!ipv6_prefix_equal(&saddr, &r->src.addr, r->src.plen))
+ return -EAGAIN;
+
+ flp6->saddr = saddr;
+ }
+
+ return 0;
+}
+
+static int fib6_rule_action_alt(struct fib_rule *rule, struct flowi *flp,
+ int flags, struct fib_lookup_arg *arg)
+{
+ struct flowi6 *flp6 = &flp->u.ip6;
+ struct net *net = rule->fr_net;
+ struct fib6_table *table;
+ struct fib6_info *f6i;
+ int err = -EAGAIN, *oif;
+ u32 tb_id;
+
+ switch (rule->action) {
+ case FR_ACT_TO_TBL:
+ break;
+ case FR_ACT_UNREACHABLE:
+ return -ENETUNREACH;
+ case FR_ACT_PROHIBIT:
+ return -EACCES;
+ case FR_ACT_BLACKHOLE:
+ default:
+ return -EINVAL;
+ }
+
+ tb_id = fib_rule_get_table(rule, arg);
+ table = fib6_get_table(net, tb_id);
+ if (!table)
+ return -EAGAIN;
+
+ oif = (int *)arg->lookup_data;
+ f6i = fib6_table_lookup(net, table, *oif, flp6, flags);
+ if (f6i != net->ipv6.fib6_null_entry) {
+ err = fib6_rule_saddr(net, rule, flags, flp6,
+ fib6_info_nh_dev(f6i));
+
+ if (likely(!err))
+ arg->result = f6i;
+ }
+
+ return err;
+}
+
+static int __fib6_rule_action(struct fib_rule *rule, struct flowi *flp,
+ int flags, struct fib_lookup_arg *arg)
{
struct flowi6 *flp6 = &flp->u.ip6;
struct rt6_info *rt = NULL;
@@ -134,27 +232,12 @@
rt = lookup(net, table, flp6, arg->lookup_data, flags);
if (rt != net->ipv6.ip6_null_entry) {
- struct fib6_rule *r = (struct fib6_rule *)rule;
+ err = fib6_rule_saddr(net, rule, flags, flp6,
+ ip6_dst_idev(&rt->dst)->dev);
- /*
- * If we need to find a source address for this traffic,
- * we check the result if it meets requirement of the rule.
- */
- if ((rule->flags & FIB_RULE_FIND_SADDR) &&
- r->src.plen && !(flags & RT6_LOOKUP_F_HAS_SADDR)) {
- struct in6_addr saddr;
+ if (err == -EAGAIN)
+ goto again;
- if (ipv6_dev_get_saddr(net,
- ip6_dst_idev(&rt->dst)->dev,
- &flp6->daddr,
- rt6_flags2srcprefs(flags),
- &saddr))
- goto again;
- if (!ipv6_prefix_equal(&saddr, &r->src.addr,
- r->src.plen))
- goto again;
- flp6->saddr = saddr;
- }
err = rt->dst.error;
if (err != -EAGAIN)
goto out;
@@ -172,6 +255,15 @@
return err;
}
+static int fib6_rule_action(struct fib_rule *rule, struct flowi *flp,
+ int flags, struct fib_lookup_arg *arg)
+{
+ if (arg->lookup_ptr == fib6_table_lookup)
+ return fib6_rule_action_alt(rule, flp, flags, arg);
+
+ return __fib6_rule_action(rule, flp, flags, arg);
+}
+
static bool fib6_rule_suppress(struct fib_rule *rule, struct fib_lookup_arg *arg)
{
struct rt6_info *rt = (struct rt6_info *) arg->result;
@@ -245,15 +337,18 @@
static int fib6_rule_configure(struct fib_rule *rule, struct sk_buff *skb,
struct fib_rule_hdr *frh,
- struct nlattr **tb)
+ struct nlattr **tb,
+ struct netlink_ext_ack *extack)
{
int err = -EINVAL;
struct net *net = sock_net(skb->sk);
struct fib6_rule *rule6 = (struct fib6_rule *) rule;
if (rule->action == FR_ACT_TO_TBL && !rule->l3mdev) {
- if (rule->table == RT6_TABLE_UNSPEC)
+ if (rule->table == RT6_TABLE_UNSPEC) {
+ NL_SET_ERR_MSG(extack, "Invalid table");
goto errout;
+ }
if (fib6_new_table(net, rule->table) == NULL) {
err = -ENOBUFS;
diff --git a/net/ipv6/ip6_fib.c b/net/ipv6/ip6_fib.c
index deab2db..f9132a6 100644
--- a/net/ipv6/ip6_fib.c
+++ b/net/ipv6/ip6_fib.c
@@ -43,7 +43,7 @@
struct fib6_cleaner {
struct fib6_walker w;
struct net *net;
- int (*func)(struct rt6_info *, void *arg);
+ int (*func)(struct fib6_info *, void *arg);
int sernum;
void *arg;
};
@@ -54,7 +54,7 @@
#define FWS_INIT FWS_L
#endif
-static struct rt6_info *fib6_find_prefix(struct net *net,
+static struct fib6_info *fib6_find_prefix(struct net *net,
struct fib6_table *table,
struct fib6_node *fn);
static struct fib6_node *fib6_repair_tree(struct net *net,
@@ -105,13 +105,12 @@
FIB6_NO_SERNUM_CHANGE = 0,
};
-void fib6_update_sernum(struct rt6_info *rt)
+void fib6_update_sernum(struct net *net, struct fib6_info *f6i)
{
- struct net *net = dev_net(rt->dst.dev);
struct fib6_node *fn;
- fn = rcu_dereference_protected(rt->rt6i_node,
- lockdep_is_held(&rt->rt6i_table->tb6_lock));
+ fn = rcu_dereference_protected(f6i->fib6_node,
+ lockdep_is_held(&f6i->fib6_table->tb6_lock));
if (fn)
fn->fn_sernum = fib6_new_sernum(net);
}
@@ -146,6 +145,69 @@
addr[fn_bit >> 5];
}
+struct fib6_info *fib6_info_alloc(gfp_t gfp_flags)
+{
+ struct fib6_info *f6i;
+
+ f6i = kzalloc(sizeof(*f6i), gfp_flags);
+ if (!f6i)
+ return NULL;
+
+ f6i->rt6i_pcpu = alloc_percpu_gfp(struct rt6_info *, gfp_flags);
+ if (!f6i->rt6i_pcpu) {
+ kfree(f6i);
+ return NULL;
+ }
+
+ INIT_LIST_HEAD(&f6i->fib6_siblings);
+ f6i->fib6_metrics = (struct dst_metrics *)&dst_default_metrics;
+
+ atomic_inc(&f6i->fib6_ref);
+
+ return f6i;
+}
+
+void fib6_info_destroy(struct fib6_info *f6i)
+{
+ struct rt6_exception_bucket *bucket;
+ struct dst_metrics *m;
+
+ WARN_ON(f6i->fib6_node);
+
+ bucket = rcu_dereference_protected(f6i->rt6i_exception_bucket, 1);
+ if (bucket) {
+ f6i->rt6i_exception_bucket = NULL;
+ kfree(bucket);
+ }
+
+ if (f6i->rt6i_pcpu) {
+ int cpu;
+
+ for_each_possible_cpu(cpu) {
+ struct rt6_info **ppcpu_rt;
+ struct rt6_info *pcpu_rt;
+
+ ppcpu_rt = per_cpu_ptr(f6i->rt6i_pcpu, cpu);
+ pcpu_rt = *ppcpu_rt;
+ if (pcpu_rt) {
+ dst_dev_put(&pcpu_rt->dst);
+ dst_release(&pcpu_rt->dst);
+ *ppcpu_rt = NULL;
+ }
+ }
+ }
+
+ if (f6i->fib6_nh.nh_dev)
+ dev_put(f6i->fib6_nh.nh_dev);
+
+ m = f6i->fib6_metrics;
+ if (m != &dst_default_metrics && refcount_dec_and_test(&m->refcnt))
+ kfree(m);
+
+ kfree(f6i);
+}
+EXPORT_SYMBOL_GPL(fib6_info_destroy);
+
static struct fib6_node *node_alloc(struct net *net)
{
struct fib6_node *fn;
@@ -176,28 +238,6 @@
net->ipv6.rt6_stats->fib_nodes--;
}
-void rt6_free_pcpu(struct rt6_info *non_pcpu_rt)
-{
- int cpu;
-
- if (!non_pcpu_rt->rt6i_pcpu)
- return;
-
- for_each_possible_cpu(cpu) {
- struct rt6_info **ppcpu_rt;
- struct rt6_info *pcpu_rt;
-
- ppcpu_rt = per_cpu_ptr(non_pcpu_rt->rt6i_pcpu, cpu);
- pcpu_rt = *ppcpu_rt;
- if (pcpu_rt) {
- dst_dev_put(&pcpu_rt->dst);
- dst_release(&pcpu_rt->dst);
- *ppcpu_rt = NULL;
- }
- }
-}
-EXPORT_SYMBOL_GPL(rt6_free_pcpu);
-
static void fib6_free_table(struct fib6_table *table)
{
inetpeer_invalidate_tree(&table->tb6_peers);
@@ -232,7 +272,7 @@
if (table) {
table->tb6_id = id;
rcu_assign_pointer(table->tb6_root.leaf,
- net->ipv6.ip6_null_entry);
+ net->ipv6.fib6_null_entry);
table->tb6_root.fn_flags = RTN_ROOT | RTN_TL_ROOT | RTN_RTINFO;
inet_peer_base_init(&table->tb6_peers);
}
@@ -314,6 +354,13 @@
return &rt->dst;
}
+/* called with rcu lock held; no reference taken on fib6_info */
+struct fib6_info *fib6_lookup(struct net *net, int oif, struct flowi6 *fl6,
+ int flags)
+{
+ return fib6_table_lookup(net, net->ipv6.fib6_main_tbl, oif, fl6, flags);
+}
+
static void __net_init fib6_tables_init(struct net *net)
{
fib6_link_table(net, net->ipv6.fib6_main_tbl);
@@ -340,7 +387,7 @@
static int call_fib6_entry_notifier(struct notifier_block *nb, struct net *net,
enum fib_event_type event_type,
- struct rt6_info *rt)
+ struct fib6_info *rt)
{
struct fib6_entry_notifier_info info = {
.rt = rt,
@@ -351,7 +398,7 @@
static int call_fib6_entry_notifiers(struct net *net,
enum fib_event_type event_type,
- struct rt6_info *rt,
+ struct fib6_info *rt,
struct netlink_ext_ack *extack)
{
struct fib6_entry_notifier_info info = {
@@ -359,7 +406,7 @@
.rt = rt,
};
- rt->rt6i_table->fib_seq++;
+ rt->fib6_table->fib_seq++;
return call_fib6_notifiers(net, event_type, &info.info);
}
@@ -368,16 +415,16 @@
struct notifier_block *nb;
};
-static void fib6_rt_dump(struct rt6_info *rt, struct fib6_dump_arg *arg)
+static void fib6_rt_dump(struct fib6_info *rt, struct fib6_dump_arg *arg)
{
- if (rt == arg->net->ipv6.ip6_null_entry)
+ if (rt == arg->net->ipv6.fib6_null_entry)
return;
call_fib6_entry_notifier(arg->nb, arg->net, FIB_EVENT_ENTRY_ADD, rt);
}
static int fib6_node_dump(struct fib6_walker *w)
{
- struct rt6_info *rt;
+ struct fib6_info *rt;
for_each_fib6_walker_rt(w)
fib6_rt_dump(rt, w->args);
@@ -426,7 +473,7 @@
static int fib6_dump_node(struct fib6_walker *w)
{
int res;
- struct rt6_info *rt;
+ struct fib6_info *rt;
for_each_fib6_walker_rt(w) {
res = rt6_dump_route(rt, w->args);
@@ -441,10 +488,10 @@
* last sibling of this route (no need to dump the
* sibling routes again)
*/
- if (rt->rt6i_nsiblings)
- rt = list_last_entry(&rt->rt6i_siblings,
- struct rt6_info,
- rt6i_siblings);
+ if (rt->fib6_nsiblings)
+ rt = list_last_entry(&rt->fib6_siblings,
+ struct fib6_info,
+ fib6_siblings);
}
w->leaf = NULL;
return 0;
@@ -579,6 +626,24 @@
return res;
}
+void fib6_metric_set(struct fib6_info *f6i, int metric, u32 val)
+{
+ if (!f6i)
+ return;
+
+ if (f6i->fib6_metrics == &dst_default_metrics) {
+ struct dst_metrics *p = kzalloc(sizeof(*p), GFP_ATOMIC);
+
+ if (!p)
+ return;
+
+ refcount_set(&p->refcnt, 1);
+ f6i->fib6_metrics = p;
+ }
+
+ f6i->fib6_metrics->metrics[metric - 1] = val;
+}
+
/*
* Routing Table
*
@@ -608,7 +673,7 @@
fn = root;
do {
- struct rt6_info *leaf = rcu_dereference_protected(fn->leaf,
+ struct fib6_info *leaf = rcu_dereference_protected(fn->leaf,
lockdep_is_held(&table->tb6_lock));
key = (struct rt6key *)((u8 *)leaf + offset);
@@ -637,11 +702,11 @@
/* clean up an intermediate node */
if (!(fn->fn_flags & RTN_RTINFO)) {
RCU_INIT_POINTER(fn->leaf, NULL);
- rt6_release(leaf);
+ fib6_info_release(leaf);
/* remove null_entry in the root node */
} else if (fn->fn_flags & RTN_TL_ROOT &&
rcu_access_pointer(fn->leaf) ==
- net->ipv6.ip6_null_entry) {
+ net->ipv6.fib6_null_entry) {
RCU_INIT_POINTER(fn->leaf, NULL);
}
@@ -750,7 +815,7 @@
RCU_INIT_POINTER(in->parent, pn);
in->leaf = fn->leaf;
atomic_inc(&rcu_dereference_protected(in->leaf,
- lockdep_is_held(&table->tb6_lock))->rt6i_ref);
+ lockdep_is_held(&table->tb6_lock))->fib6_ref);
/* update parent pointer */
if (dir)
@@ -802,44 +867,37 @@
return ln;
}
-static void fib6_copy_metrics(u32 *mp, const struct mx6_config *mxc)
+static void fib6_drop_pcpu_from(struct fib6_info *f6i,
+ const struct fib6_table *table)
{
- int i;
+ int cpu;
- for (i = 0; i < RTAX_MAX; i++) {
- if (test_bit(i, mxc->mx_valid))
- mp[i] = mxc->mx[i];
+ /* release the reference to this fib entry from
+ * all of its cached pcpu routes
+ */
+ for_each_possible_cpu(cpu) {
+ struct rt6_info **ppcpu_rt;
+ struct rt6_info *pcpu_rt;
+
+ ppcpu_rt = per_cpu_ptr(f6i->rt6i_pcpu, cpu);
+ pcpu_rt = *ppcpu_rt;
+ if (pcpu_rt) {
+ struct fib6_info *from;
+
+ from = rcu_dereference_protected(pcpu_rt->from,
+ lockdep_is_held(&table->tb6_lock));
+ rcu_assign_pointer(pcpu_rt->from, NULL);
+ fib6_info_release(from);
+ }
}
}
-static int fib6_commit_metrics(struct dst_entry *dst, struct mx6_config *mxc)
-{
- if (!mxc->mx)
- return 0;
-
- if (dst->flags & DST_HOST) {
- u32 *mp = dst_metrics_write_ptr(dst);
-
- if (unlikely(!mp))
- return -ENOMEM;
-
- fib6_copy_metrics(mp, mxc);
- } else {
- dst_init_metrics(dst, mxc->mx, false);
-
- /* We've stolen mx now. */
- mxc->mx = NULL;
- }
-
- return 0;
-}
-
-static void fib6_purge_rt(struct rt6_info *rt, struct fib6_node *fn,
+static void fib6_purge_rt(struct fib6_info *rt, struct fib6_node *fn,
struct net *net)
{
- struct fib6_table *table = rt->rt6i_table;
+ struct fib6_table *table = rt->fib6_table;
- if (atomic_read(&rt->rt6i_ref) != 1) {
+ if (atomic_read(&rt->fib6_ref) != 1) {
/* This route is used as dummy address holder in some split
* nodes. It is not leaked, but it still holds other resources,
* which must be released in time. So, scan ascendant nodes
@@ -847,18 +905,22 @@
* to still alive ones.
*/
while (fn) {
- struct rt6_info *leaf = rcu_dereference_protected(fn->leaf,
+ struct fib6_info *leaf = rcu_dereference_protected(fn->leaf,
lockdep_is_held(&table->tb6_lock));
- struct rt6_info *new_leaf;
+ struct fib6_info *new_leaf;
if (!(fn->fn_flags & RTN_RTINFO) && leaf == rt) {
new_leaf = fib6_find_prefix(net, table, fn);
- atomic_inc(&new_leaf->rt6i_ref);
+ atomic_inc(&new_leaf->fib6_ref);
+
rcu_assign_pointer(fn->leaf, new_leaf);
- rt6_release(rt);
+ fib6_info_release(rt);
}
fn = rcu_dereference_protected(fn->parent,
lockdep_is_held(&table->tb6_lock));
}
+
+ if (rt->rt6i_pcpu)
+ fib6_drop_pcpu_from(rt, table);
}
}
@@ -866,37 +928,37 @@
* Insert routing information in a node.
*/
-static int fib6_add_rt2node(struct fib6_node *fn, struct rt6_info *rt,
- struct nl_info *info, struct mx6_config *mxc,
+static int fib6_add_rt2node(struct fib6_node *fn, struct fib6_info *rt,
+ struct nl_info *info,
struct netlink_ext_ack *extack)
{
- struct rt6_info *leaf = rcu_dereference_protected(fn->leaf,
- lockdep_is_held(&rt->rt6i_table->tb6_lock));
- struct rt6_info *iter = NULL;
- struct rt6_info __rcu **ins;
- struct rt6_info __rcu **fallback_ins = NULL;
+ struct fib6_info *leaf = rcu_dereference_protected(fn->leaf,
+ lockdep_is_held(&rt->fib6_table->tb6_lock));
+ struct fib6_info *iter = NULL, *match = NULL;
+ struct fib6_info __rcu **ins;
int replace = (info->nlh &&
(info->nlh->nlmsg_flags & NLM_F_REPLACE));
+ int append = (info->nlh &&
+ (info->nlh->nlmsg_flags & NLM_F_APPEND));
int add = (!info->nlh ||
(info->nlh->nlmsg_flags & NLM_F_CREATE));
int found = 0;
- bool rt_can_ecmp = rt6_qualify_for_ecmp(rt);
u16 nlflags = NLM_F_EXCL;
int err;
- if (info->nlh && (info->nlh->nlmsg_flags & NLM_F_APPEND))
+ if (append)
nlflags |= NLM_F_APPEND;
ins = &fn->leaf;
for (iter = leaf; iter;
- iter = rcu_dereference_protected(iter->rt6_next,
- lockdep_is_held(&rt->rt6i_table->tb6_lock))) {
+ iter = rcu_dereference_protected(iter->fib6_next,
+ lockdep_is_held(&rt->fib6_table->tb6_lock))) {
/*
* Search for duplicates
*/
- if (iter->rt6i_metric == rt->rt6i_metric) {
+ if (iter->fib6_metric == rt->fib6_metric) {
/*
* Same priority level
*/
@@ -906,56 +968,32 @@
nlflags &= ~NLM_F_EXCL;
if (replace) {
- if (rt_can_ecmp == rt6_qualify_for_ecmp(iter)) {
- found++;
- break;
- }
- if (rt_can_ecmp)
- fallback_ins = fallback_ins ?: ins;
- goto next_iter;
+ found++;
+ break;
}
if (rt6_duplicate_nexthop(iter, rt)) {
- if (rt->rt6i_nsiblings)
- rt->rt6i_nsiblings = 0;
- if (!(iter->rt6i_flags & RTF_EXPIRES))
+ if (rt->fib6_nsiblings)
+ rt->fib6_nsiblings = 0;
+ if (!(iter->fib6_flags & RTF_EXPIRES))
return -EEXIST;
- if (!(rt->rt6i_flags & RTF_EXPIRES))
- rt6_clean_expires(iter);
+ if (!(rt->fib6_flags & RTF_EXPIRES))
+ fib6_clean_expires(iter);
else
- rt6_set_expires(iter, rt->dst.expires);
- iter->rt6i_pmtu = rt->rt6i_pmtu;
+ fib6_set_expires(iter, rt->expires);
+ fib6_metric_set(iter, RTAX_MTU, rt->fib6_pmtu);
return -EEXIST;
}
- /* If we have the same destination and the same metric,
- * but not the same gateway, then the route we try to
- * add is sibling to this route, increment our counter
- * of siblings, and later we will add our route to the
- * list.
- * Only static routes (which don't have flag
- * RTF_EXPIRES) are used for ECMPv6.
- *
- * To avoid long list, we only had siblings if the
- * route have a gateway.
- */
- if (rt_can_ecmp &&
- rt6_qualify_for_ecmp(iter))
- rt->rt6i_nsiblings++;
+
+ /* first route that matches */
+ if (!match)
+ match = iter;
}
- if (iter->rt6i_metric > rt->rt6i_metric)
+ if (iter->fib6_metric > rt->fib6_metric)
break;
-next_iter:
- ins = &iter->rt6_next;
- }
-
- if (fallback_ins && !found) {
- /* No ECMP-able route found, replace first non-ECMP one */
- ins = fallback_ins;
- iter = rcu_dereference_protected(*ins,
- lockdep_is_held(&rt->rt6i_table->tb6_lock));
- found++;
+ ins = &iter->fib6_next;
}
/* Reset round-robin state, if necessary */
@@ -963,59 +1001,56 @@
fn->rr_ptr = NULL;
/* Link this route to others same route. */
- if (rt->rt6i_nsiblings) {
- unsigned int rt6i_nsiblings;
- struct rt6_info *sibling, *temp_sibling;
+ if (append && match) {
+ struct fib6_info *sibling, *temp_sibling;
- /* Find the first route that have the same metric */
- sibling = leaf;
- while (sibling) {
- if (sibling->rt6i_metric == rt->rt6i_metric &&
- rt6_qualify_for_ecmp(sibling)) {
- list_add_tail(&rt->rt6i_siblings,
- &sibling->rt6i_siblings);
- break;
- }
- sibling = rcu_dereference_protected(sibling->rt6_next,
- lockdep_is_held(&rt->rt6i_table->tb6_lock));
+ if (rt->fib6_flags & RTF_REJECT) {
+ NL_SET_ERR_MSG(extack,
+ "Can not append a REJECT route");
+ return -EINVAL;
+ } else if (match->fib6_flags & RTF_REJECT) {
+ NL_SET_ERR_MSG(extack,
+ "Can not append to a REJECT route");
+ return -EINVAL;
}
+ rt->fib6_nsiblings = match->fib6_nsiblings;
+ list_add_tail(&rt->fib6_siblings, &match->fib6_siblings);
+ match->fib6_nsiblings++;
+
/* For each sibling in the list, increment the counter of
* siblings. BUG() if counters does not match, list of siblings
* is broken!
*/
- rt6i_nsiblings = 0;
list_for_each_entry_safe(sibling, temp_sibling,
- &rt->rt6i_siblings, rt6i_siblings) {
- sibling->rt6i_nsiblings++;
- BUG_ON(sibling->rt6i_nsiblings != rt->rt6i_nsiblings);
- rt6i_nsiblings++;
+ &match->fib6_siblings, fib6_siblings) {
+ sibling->fib6_nsiblings++;
+ BUG_ON(sibling->fib6_nsiblings != match->fib6_nsiblings);
}
- BUG_ON(rt6i_nsiblings != rt->rt6i_nsiblings);
- rt6_multipath_rebalance(temp_sibling);
+
+ rt6_multipath_rebalance(match);
}
/*
* insert node
*/
if (!replace) {
+ enum fib_event_type event;
+
if (!add)
pr_warn("NLM_F_CREATE should be set when creating new route\n");
add:
nlflags |= NLM_F_CREATE;
- err = fib6_commit_metrics(&rt->dst, mxc);
+
+ event = append ? FIB_EVENT_ENTRY_APPEND : FIB_EVENT_ENTRY_ADD;
+ err = call_fib6_entry_notifiers(info->nl_net, event, rt,
+ extack);
if (err)
return err;
- err = call_fib6_entry_notifiers(info->nl_net,
- FIB_EVENT_ENTRY_ADD,
- rt, extack);
- if (err)
- return err;
-
- rcu_assign_pointer(rt->rt6_next, iter);
- atomic_inc(&rt->rt6i_ref);
- rcu_assign_pointer(rt->rt6i_node, fn);
+ rcu_assign_pointer(rt->fib6_next, iter);
+ atomic_inc(&rt->fib6_ref);
+ rcu_assign_pointer(rt->fib6_node, fn);
rcu_assign_pointer(*ins, rt);
if (!info->skip_notify)
inet6_rt_notify(RTM_NEWROUTE, rt, info, nlflags);
@@ -1027,7 +1062,7 @@
}
} else {
- int nsiblings;
+ struct fib6_info *tmp;
if (!found) {
if (add)
@@ -1036,67 +1071,72 @@
return -ENOENT;
}
- err = fib6_commit_metrics(&rt->dst, mxc);
- if (err)
- return err;
-
err = call_fib6_entry_notifiers(info->nl_net,
FIB_EVENT_ENTRY_REPLACE,
rt, extack);
if (err)
return err;
- atomic_inc(&rt->rt6i_ref);
- rcu_assign_pointer(rt->rt6i_node, fn);
- rt->rt6_next = iter->rt6_next;
+ /* if route being replaced has siblings, set tmp to
+ * last one, otherwise tmp is current route. this is
+ * used to set fib6_next for new route
+ */
+ if (iter->fib6_nsiblings)
+ tmp = list_last_entry(&iter->fib6_siblings,
+ struct fib6_info,
+ fib6_siblings);
+ else
+ tmp = iter;
+
+ /* insert new route */
+ atomic_inc(&rt->fib6_ref);
+ rcu_assign_pointer(rt->fib6_node, fn);
+ rt->fib6_next = tmp->fib6_next;
rcu_assign_pointer(*ins, rt);
+
if (!info->skip_notify)
inet6_rt_notify(RTM_NEWROUTE, rt, info, NLM_F_REPLACE);
if (!(fn->fn_flags & RTN_RTINFO)) {
info->nl_net->ipv6.rt6_stats->fib_route_nodes++;
fn->fn_flags |= RTN_RTINFO;
}
- nsiblings = iter->rt6i_nsiblings;
- iter->rt6i_node = NULL;
- fib6_purge_rt(iter, fn, info->nl_net);
- if (rcu_access_pointer(fn->rr_ptr) == iter)
- fn->rr_ptr = NULL;
- rt6_release(iter);
- if (nsiblings) {
+ /* delete old route */
+ rt = iter;
+
+ if (rt->fib6_nsiblings) {
+ struct fib6_info *tmp;
+
/* Replacing an ECMP route, remove all siblings */
- ins = &rt->rt6_next;
- iter = rcu_dereference_protected(*ins,
- lockdep_is_held(&rt->rt6i_table->tb6_lock));
- while (iter) {
- if (iter->rt6i_metric > rt->rt6i_metric)
- break;
- if (rt6_qualify_for_ecmp(iter)) {
- *ins = iter->rt6_next;
- iter->rt6i_node = NULL;
- fib6_purge_rt(iter, fn, info->nl_net);
- if (rcu_access_pointer(fn->rr_ptr) == iter)
- fn->rr_ptr = NULL;
- rt6_release(iter);
- nsiblings--;
- info->nl_net->ipv6.rt6_stats->fib_rt_entries--;
- } else {
- ins = &iter->rt6_next;
- }
- iter = rcu_dereference_protected(*ins,
- lockdep_is_held(&rt->rt6i_table->tb6_lock));
+ list_for_each_entry_safe(iter, tmp, &rt->fib6_siblings,
+ fib6_siblings) {
+ iter->fib6_node = NULL;
+ fib6_purge_rt(iter, fn, info->nl_net);
+ if (rcu_access_pointer(fn->rr_ptr) == iter)
+ fn->rr_ptr = NULL;
+ fib6_info_release(iter);
+
+ rt->fib6_nsiblings--;
+ info->nl_net->ipv6.rt6_stats->fib_rt_entries--;
}
- WARN_ON(nsiblings != 0);
}
+
+ WARN_ON(rt->fib6_nsiblings != 0);
+
+ rt->fib6_node = NULL;
+ fib6_purge_rt(rt, fn, info->nl_net);
+ if (rcu_access_pointer(fn->rr_ptr) == rt)
+ fn->rr_ptr = NULL;
+ fib6_info_release(rt);
}
return 0;
}
-static void fib6_start_gc(struct net *net, struct rt6_info *rt)
+static void fib6_start_gc(struct net *net, struct fib6_info *rt)
{
if (!timer_pending(&net->ipv6.ip6_fib_timer) &&
- (rt->rt6i_flags & (RTF_EXPIRES | RTF_CACHE)))
+ (rt->fib6_flags & RTF_EXPIRES))
mod_timer(&net->ipv6.ip6_fib_timer,
jiffies + net->ipv6.sysctl.ip6_rt_gc_interval);
}
@@ -1108,22 +1148,22 @@
jiffies + net->ipv6.sysctl.ip6_rt_gc_interval);
}
-static void __fib6_update_sernum_upto_root(struct rt6_info *rt,
+static void __fib6_update_sernum_upto_root(struct fib6_info *rt,
int sernum)
{
- struct fib6_node *fn = rcu_dereference_protected(rt->rt6i_node,
- lockdep_is_held(&rt->rt6i_table->tb6_lock));
+ struct fib6_node *fn = rcu_dereference_protected(rt->fib6_node,
+ lockdep_is_held(&rt->fib6_table->tb6_lock));
/* paired with smp_rmb() in rt6_get_cookie_safe() */
smp_wmb();
while (fn) {
fn->fn_sernum = sernum;
fn = rcu_dereference_protected(fn->parent,
- lockdep_is_held(&rt->rt6i_table->tb6_lock));
+ lockdep_is_held(&rt->fib6_table->tb6_lock));
}
}
-void fib6_update_sernum_upto_root(struct net *net, struct rt6_info *rt)
+void fib6_update_sernum_upto_root(struct net *net, struct fib6_info *rt)
{
__fib6_update_sernum_upto_root(rt, fib6_new_sernum(net));
}
@@ -1135,22 +1175,16 @@
* Need to own table->tb6_lock
*/
-int fib6_add(struct fib6_node *root, struct rt6_info *rt,
- struct nl_info *info, struct mx6_config *mxc,
- struct netlink_ext_ack *extack)
+int fib6_add(struct fib6_node *root, struct fib6_info *rt,
+ struct nl_info *info, struct netlink_ext_ack *extack)
{
- struct fib6_table *table = rt->rt6i_table;
+ struct fib6_table *table = rt->fib6_table;
struct fib6_node *fn, *pn = NULL;
int err = -ENOMEM;
int allow_create = 1;
int replace_required = 0;
int sernum = fib6_new_sernum(info->nl_net);
- if (WARN_ON_ONCE(!atomic_read(&rt->dst.__refcnt)))
- return -EINVAL;
- if (WARN_ON_ONCE(rt->rt6i_flags & RTF_CACHE))
- return -EINVAL;
-
if (info->nlh) {
if (!(info->nlh->nlmsg_flags & NLM_F_CREATE))
allow_create = 0;
@@ -1161,8 +1195,8 @@
pr_warn("RTM_NEWROUTE with no NLM_F_CREATE or NLM_F_REPLACE\n");
fn = fib6_add_1(info->nl_net, table, root,
- &rt->rt6i_dst.addr, rt->rt6i_dst.plen,
- offsetof(struct rt6_info, rt6i_dst), allow_create,
+ &rt->fib6_dst.addr, rt->fib6_dst.plen,
+ offsetof(struct fib6_info, fib6_dst), allow_create,
replace_required, extack);
if (IS_ERR(fn)) {
err = PTR_ERR(fn);
@@ -1173,7 +1207,7 @@
pn = fn;
#ifdef CONFIG_IPV6_SUBTREES
- if (rt->rt6i_src.plen) {
+ if (rt->fib6_src.plen) {
struct fib6_node *sn;
if (!rcu_access_pointer(fn->subtree)) {
@@ -1194,16 +1228,16 @@
if (!sfn)
goto failure;
- atomic_inc(&info->nl_net->ipv6.ip6_null_entry->rt6i_ref);
+ atomic_inc(&info->nl_net->ipv6.fib6_null_entry->fib6_ref);
rcu_assign_pointer(sfn->leaf,
- info->nl_net->ipv6.ip6_null_entry);
+ info->nl_net->ipv6.fib6_null_entry);
sfn->fn_flags = RTN_ROOT;
/* Now add the first leaf node to new subtree */
sn = fib6_add_1(info->nl_net, table, sfn,
- &rt->rt6i_src.addr, rt->rt6i_src.plen,
- offsetof(struct rt6_info, rt6i_src),
+ &rt->fib6_src.addr, rt->fib6_src.plen,
+ offsetof(struct fib6_info, fib6_src),
allow_create, replace_required, extack);
if (IS_ERR(sn)) {
@@ -1221,8 +1255,8 @@
rcu_assign_pointer(fn->subtree, sfn);
} else {
sn = fib6_add_1(info->nl_net, table, FIB6_SUBTREE(fn),
- &rt->rt6i_src.addr, rt->rt6i_src.plen,
- offsetof(struct rt6_info, rt6i_src),
+ &rt->fib6_src.addr, rt->fib6_src.plen,
+ offsetof(struct fib6_info, fib6_src),
allow_create, replace_required, extack);
if (IS_ERR(sn)) {
@@ -1235,9 +1269,9 @@
if (fn->fn_flags & RTN_TL_ROOT) {
/* put back null_entry for root node */
rcu_assign_pointer(fn->leaf,
- info->nl_net->ipv6.ip6_null_entry);
+ info->nl_net->ipv6.fib6_null_entry);
} else {
- atomic_inc(&rt->rt6i_ref);
+ atomic_inc(&rt->fib6_ref);
rcu_assign_pointer(fn->leaf, rt);
}
}
@@ -1245,7 +1279,7 @@
}
#endif
- err = fib6_add_rt2node(fn, rt, info, mxc, extack);
+ err = fib6_add_rt2node(fn, rt, info, extack);
if (!err) {
__fib6_update_sernum_upto_root(rt, sernum);
fib6_start_gc(info->nl_net, rt);
@@ -1259,13 +1293,13 @@
* super-tree leaf node we have to find a new one for it.
*/
if (pn != fn) {
- struct rt6_info *pn_leaf =
+ struct fib6_info *pn_leaf =
rcu_dereference_protected(pn->leaf,
lockdep_is_held(&table->tb6_lock));
if (pn_leaf == rt) {
pn_leaf = NULL;
RCU_INIT_POINTER(pn->leaf, NULL);
- atomic_dec(&rt->rt6i_ref);
+ fib6_info_release(rt);
}
if (!pn_leaf && !(pn->fn_flags & RTN_RTINFO)) {
pn_leaf = fib6_find_prefix(info->nl_net, table,
@@ -1274,10 +1308,10 @@
if (!pn_leaf) {
WARN_ON(!pn_leaf);
pn_leaf =
- info->nl_net->ipv6.ip6_null_entry;
+ info->nl_net->ipv6.fib6_null_entry;
}
#endif
- atomic_inc(&pn_leaf->rt6i_ref);
+ fib6_info_hold(pn_leaf);
rcu_assign_pointer(pn->leaf, pn_leaf);
}
}
@@ -1299,10 +1333,6 @@
(fn->fn_flags & RTN_TL_ROOT &&
!rcu_access_pointer(fn->leaf))))
fib6_repair_tree(info->nl_net, table, fn);
- /* Always release dst as dst->__refcnt is guaranteed
- * to be taken before entering this function
- */
- dst_release_immediate(&rt->dst);
return err;
}
@@ -1312,12 +1342,12 @@
*/
struct lookup_args {
- int offset; /* key offset on rt6_info */
+ int offset; /* key offset on fib6_info */
const struct in6_addr *addr; /* search key */
};
-static struct fib6_node *fib6_lookup_1(struct fib6_node *root,
- struct lookup_args *args)
+static struct fib6_node *fib6_node_lookup_1(struct fib6_node *root,
+ struct lookup_args *args)
{
struct fib6_node *fn;
__be32 dir;
@@ -1350,7 +1380,7 @@
struct fib6_node *subtree = FIB6_SUBTREE(fn);
if (subtree || fn->fn_flags & RTN_RTINFO) {
- struct rt6_info *leaf = rcu_dereference(fn->leaf);
+ struct fib6_info *leaf = rcu_dereference(fn->leaf);
struct rt6key *key;
if (!leaf)
@@ -1362,7 +1392,8 @@
#ifdef CONFIG_IPV6_SUBTREES
if (subtree) {
struct fib6_node *sfn;
- sfn = fib6_lookup_1(subtree, args + 1);
+ sfn = fib6_node_lookup_1(subtree,
+ args + 1);
if (!sfn)
goto backtrack;
fn = sfn;
@@ -1384,18 +1415,19 @@
/* called with rcu_read_lock() held
*/
-struct fib6_node *fib6_lookup(struct fib6_node *root, const struct in6_addr *daddr,
- const struct in6_addr *saddr)
+struct fib6_node *fib6_node_lookup(struct fib6_node *root,
+ const struct in6_addr *daddr,
+ const struct in6_addr *saddr)
{
struct fib6_node *fn;
struct lookup_args args[] = {
{
- .offset = offsetof(struct rt6_info, rt6i_dst),
+ .offset = offsetof(struct fib6_info, fib6_dst),
.addr = daddr,
},
#ifdef CONFIG_IPV6_SUBTREES
{
- .offset = offsetof(struct rt6_info, rt6i_src),
+ .offset = offsetof(struct fib6_info, fib6_src),
.addr = saddr,
},
#endif
@@ -1404,7 +1436,7 @@
}
};
- fn = fib6_lookup_1(root, daddr ? args : args + 1);
+ fn = fib6_node_lookup_1(root, daddr ? args : args + 1);
if (!fn || fn->fn_flags & RTN_TL_ROOT)
fn = root;
@@ -1431,7 +1463,7 @@
struct fib6_node *fn, *prev = NULL;
for (fn = root; fn ; ) {
- struct rt6_info *leaf = rcu_dereference(fn->leaf);
+ struct fib6_info *leaf = rcu_dereference(fn->leaf);
struct rt6key *key;
/* This node is being deleted */
@@ -1480,7 +1512,7 @@
struct fib6_node *fn;
fn = fib6_locate_1(root, daddr, dst_len,
- offsetof(struct rt6_info, rt6i_dst),
+ offsetof(struct fib6_info, fib6_dst),
exact_match);
#ifdef CONFIG_IPV6_SUBTREES
@@ -1491,7 +1523,7 @@
if (subtree) {
fn = fib6_locate_1(subtree, saddr, src_len,
- offsetof(struct rt6_info, rt6i_src),
+ offsetof(struct fib6_info, fib6_src),
exact_match);
}
}
@@ -1510,14 +1542,14 @@
*
*/
-static struct rt6_info *fib6_find_prefix(struct net *net,
+static struct fib6_info *fib6_find_prefix(struct net *net,
struct fib6_table *table,
struct fib6_node *fn)
{
struct fib6_node *child_left, *child_right;
if (fn->fn_flags & RTN_ROOT)
- return net->ipv6.ip6_null_entry;
+ return net->ipv6.fib6_null_entry;
while (fn) {
child_left = rcu_dereference_protected(fn->left,
@@ -1554,7 +1586,7 @@
/* Set fn->leaf to null_entry for root node. */
if (fn->fn_flags & RTN_TL_ROOT) {
- rcu_assign_pointer(fn->leaf, net->ipv6.ip6_null_entry);
+ rcu_assign_pointer(fn->leaf, net->ipv6.fib6_null_entry);
return fn;
}
@@ -1569,11 +1601,11 @@
lockdep_is_held(&table->tb6_lock));
struct fib6_node *pn_l = rcu_dereference_protected(pn->left,
lockdep_is_held(&table->tb6_lock));
- struct rt6_info *fn_leaf = rcu_dereference_protected(fn->leaf,
+ struct fib6_info *fn_leaf = rcu_dereference_protected(fn->leaf,
lockdep_is_held(&table->tb6_lock));
- struct rt6_info *pn_leaf = rcu_dereference_protected(pn->leaf,
+ struct fib6_info *pn_leaf = rcu_dereference_protected(pn->leaf,
lockdep_is_held(&table->tb6_lock));
- struct rt6_info *new_fn_leaf;
+ struct fib6_info *new_fn_leaf;
RT6_TRACE("fixing tree: plen=%d iter=%d\n", fn->fn_bit, iter);
iter++;
@@ -1599,10 +1631,10 @@
#if RT6_DEBUG >= 2
if (!new_fn_leaf) {
WARN_ON(!new_fn_leaf);
- new_fn_leaf = net->ipv6.ip6_null_entry;
+ new_fn_leaf = net->ipv6.fib6_null_entry;
}
#endif
- atomic_inc(&new_fn_leaf->rt6i_ref);
+ fib6_info_hold(new_fn_leaf);
rcu_assign_pointer(fn->leaf, new_fn_leaf);
return pn;
}
@@ -1658,26 +1690,24 @@
return pn;
RCU_INIT_POINTER(pn->leaf, NULL);
- rt6_release(pn_leaf);
+ fib6_info_release(pn_leaf);
fn = pn;
}
}
static void fib6_del_route(struct fib6_table *table, struct fib6_node *fn,
- struct rt6_info __rcu **rtp, struct nl_info *info)
+ struct fib6_info __rcu **rtp, struct nl_info *info)
{
struct fib6_walker *w;
- struct rt6_info *rt = rcu_dereference_protected(*rtp,
+ struct fib6_info *rt = rcu_dereference_protected(*rtp,
lockdep_is_held(&table->tb6_lock));
struct net *net = info->nl_net;
RT6_TRACE("fib6_del_route\n");
- WARN_ON_ONCE(rt->rt6i_flags & RTF_CACHE);
-
/* Unlink it */
- *rtp = rt->rt6_next;
- rt->rt6i_node = NULL;
+ *rtp = rt->fib6_next;
+ rt->fib6_node = NULL;
net->ipv6.rt6_stats->fib_rt_entries--;
net->ipv6.rt6_stats->fib_discarded_routes++;
@@ -1689,14 +1719,14 @@
fn->rr_ptr = NULL;
/* Remove this entry from other siblings */
- if (rt->rt6i_nsiblings) {
- struct rt6_info *sibling, *next_sibling;
+ if (rt->fib6_nsiblings) {
+ struct fib6_info *sibling, *next_sibling;
list_for_each_entry_safe(sibling, next_sibling,
- &rt->rt6i_siblings, rt6i_siblings)
- sibling->rt6i_nsiblings--;
- rt->rt6i_nsiblings = 0;
- list_del_init(&rt->rt6i_siblings);
+ &rt->fib6_siblings, fib6_siblings)
+ sibling->fib6_nsiblings--;
+ rt->fib6_nsiblings = 0;
+ list_del_init(&rt->fib6_siblings);
rt6_multipath_rebalance(next_sibling);
}
@@ -1705,7 +1735,7 @@
FOR_WALKERS(net, w) {
if (w->state == FWS_C && w->leaf == rt) {
RT6_TRACE("walker %p adjusted by delroute\n", w);
- w->leaf = rcu_dereference_protected(rt->rt6_next,
+ w->leaf = rcu_dereference_protected(rt->fib6_next,
lockdep_is_held(&table->tb6_lock));
if (!w->leaf)
w->state = FWS_U;
@@ -1730,46 +1760,36 @@
call_fib6_entry_notifiers(net, FIB_EVENT_ENTRY_DEL, rt, NULL);
if (!info->skip_notify)
inet6_rt_notify(RTM_DELROUTE, rt, info, 0);
- rt6_release(rt);
+ fib6_info_release(rt);
}
/* Need to own table->tb6_lock */
-int fib6_del(struct rt6_info *rt, struct nl_info *info)
+int fib6_del(struct fib6_info *rt, struct nl_info *info)
{
- struct fib6_node *fn = rcu_dereference_protected(rt->rt6i_node,
- lockdep_is_held(&rt->rt6i_table->tb6_lock));
- struct fib6_table *table = rt->rt6i_table;
+ struct fib6_node *fn = rcu_dereference_protected(rt->fib6_node,
+ lockdep_is_held(&rt->fib6_table->tb6_lock));
+ struct fib6_table *table = rt->fib6_table;
struct net *net = info->nl_net;
- struct rt6_info __rcu **rtp;
- struct rt6_info __rcu **rtp_next;
+ struct fib6_info __rcu **rtp;
+ struct fib6_info __rcu **rtp_next;
-#if RT6_DEBUG >= 2
- if (rt->dst.obsolete > 0) {
- WARN_ON(fn);
- return -ENOENT;
- }
-#endif
- if (!fn || rt == net->ipv6.ip6_null_entry)
+ if (!fn || rt == net->ipv6.fib6_null_entry)
return -ENOENT;
WARN_ON(!(fn->fn_flags & RTN_RTINFO));
- /* remove cached dst from exception table */
- if (rt->rt6i_flags & RTF_CACHE)
- return rt6_remove_exception_rt(rt);
-
/*
* Walk the leaf entries looking for ourself
*/
for (rtp = &fn->leaf; *rtp; rtp = rtp_next) {
- struct rt6_info *cur = rcu_dereference_protected(*rtp,
+ struct fib6_info *cur = rcu_dereference_protected(*rtp,
lockdep_is_held(&table->tb6_lock));
if (rt == cur) {
fib6_del_route(table, fn, rtp, info);
return 0;
}
- rtp_next = &cur->rt6_next;
+ rtp_next = &cur->fib6_next;
}
return -ENOENT;
}
@@ -1907,7 +1927,7 @@
static int fib6_clean_node(struct fib6_walker *w)
{
int res;
- struct rt6_info *rt;
+ struct fib6_info *rt;
struct fib6_cleaner *c = container_of(w, struct fib6_cleaner, w);
struct nl_info info = {
.nl_net = c->net,
@@ -1932,17 +1952,17 @@
#if RT6_DEBUG >= 2
pr_debug("%s: del failed: rt=%p@%p err=%d\n",
__func__, rt,
- rcu_access_pointer(rt->rt6i_node),
+ rcu_access_pointer(rt->fib6_node),
res);
#endif
continue;
}
return 0;
} else if (res == -2) {
- if (WARN_ON(!rt->rt6i_nsiblings))
+ if (WARN_ON(!rt->fib6_nsiblings))
continue;
- rt = list_last_entry(&rt->rt6i_siblings,
- struct rt6_info, rt6i_siblings);
+ rt = list_last_entry(&rt->fib6_siblings,
+ struct fib6_info, fib6_siblings);
continue;
}
WARN_ON(res != 0);
@@ -1961,7 +1981,7 @@
*/
static void fib6_clean_tree(struct net *net, struct fib6_node *root,
- int (*func)(struct rt6_info *, void *arg),
+ int (*func)(struct fib6_info *, void *arg),
int sernum, void *arg)
{
struct fib6_cleaner c;
@@ -1979,7 +1999,7 @@
}
static void __fib6_clean_all(struct net *net,
- int (*func)(struct rt6_info *, void *),
+ int (*func)(struct fib6_info *, void *),
int sernum, void *arg)
{
struct fib6_table *table;
@@ -1999,7 +2019,7 @@
rcu_read_unlock();
}
-void fib6_clean_all(struct net *net, int (*func)(struct rt6_info *, void *),
+void fib6_clean_all(struct net *net, int (*func)(struct fib6_info *, void *),
void *arg)
{
__fib6_clean_all(net, func, FIB6_NO_SERNUM_CHANGE, arg);
@@ -2016,7 +2036,7 @@
* Garbage collection
*/
-static int fib6_age(struct rt6_info *rt, void *arg)
+static int fib6_age(struct fib6_info *rt, void *arg)
{
struct fib6_gc_args *gc_args = arg;
unsigned long now = jiffies;
@@ -2026,8 +2046,8 @@
* Routes are expired even if they are in use.
*/
- if (rt->rt6i_flags & RTF_EXPIRES && rt->dst.expires) {
- if (time_after(now, rt->dst.expires)) {
+ if (rt->fib6_flags & RTF_EXPIRES && rt->expires) {
+ if (time_after(now, rt->expires)) {
RT6_TRACE("expiring %p\n", rt);
return -1;
}
@@ -2110,7 +2130,7 @@
net->ipv6.fib6_main_tbl->tb6_id = RT6_TABLE_MAIN;
rcu_assign_pointer(net->ipv6.fib6_main_tbl->tb6_root.leaf,
- net->ipv6.ip6_null_entry);
+ net->ipv6.fib6_null_entry);
net->ipv6.fib6_main_tbl->tb6_root.fn_flags =
RTN_ROOT | RTN_TL_ROOT | RTN_RTINFO;
inet_peer_base_init(&net->ipv6.fib6_main_tbl->tb6_peers);
@@ -2122,7 +2142,7 @@
goto out_fib6_main_tbl;
net->ipv6.fib6_local_tbl->tb6_id = RT6_TABLE_LOCAL;
rcu_assign_pointer(net->ipv6.fib6_local_tbl->tb6_root.leaf,
- net->ipv6.ip6_null_entry);
+ net->ipv6.fib6_null_entry);
net->ipv6.fib6_local_tbl->tb6_root.fn_flags =
RTN_ROOT | RTN_TL_ROOT | RTN_RTINFO;
inet_peer_base_init(&net->ipv6.fib6_local_tbl->tb6_peers);
@@ -2220,25 +2240,26 @@
static int ipv6_route_seq_show(struct seq_file *seq, void *v)
{
- struct rt6_info *rt = v;
+ struct fib6_info *rt = v;
struct ipv6_route_iter *iter = seq->private;
+ const struct net_device *dev;
- seq_printf(seq, "%pi6 %02x ", &rt->rt6i_dst.addr, rt->rt6i_dst.plen);
+ seq_printf(seq, "%pi6 %02x ", &rt->fib6_dst.addr, rt->fib6_dst.plen);
#ifdef CONFIG_IPV6_SUBTREES
- seq_printf(seq, "%pi6 %02x ", &rt->rt6i_src.addr, rt->rt6i_src.plen);
+ seq_printf(seq, "%pi6 %02x ", &rt->fib6_src.addr, rt->fib6_src.plen);
#else
seq_puts(seq, "00000000000000000000000000000000 00 ");
#endif
- if (rt->rt6i_flags & RTF_GATEWAY)
- seq_printf(seq, "%pi6", &rt->rt6i_gateway);
+ if (rt->fib6_flags & RTF_GATEWAY)
+ seq_printf(seq, "%pi6", &rt->fib6_nh.nh_gw);
else
seq_puts(seq, "00000000000000000000000000000000");
+ dev = rt->fib6_nh.nh_dev;
seq_printf(seq, " %08x %08x %08x %08x %8s\n",
- rt->rt6i_metric, atomic_read(&rt->dst.__refcnt),
- rt->dst.__use, rt->rt6i_flags,
- rt->dst.dev ? rt->dst.dev->name : "");
+ rt->fib6_metric, atomic_read(&rt->fib6_ref), 0,
+ rt->fib6_flags, dev ? dev->name : "");
iter->w.leaf = NULL;
return 0;
}
@@ -2252,7 +2273,7 @@
do {
iter->w.leaf = rcu_dereference_protected(
- iter->w.leaf->rt6_next,
+ iter->w.leaf->fib6_next,
lockdep_is_held(&iter->tbl->tb6_lock));
iter->skip--;
if (!iter->skip && iter->w.leaf)
@@ -2311,14 +2332,14 @@
static void *ipv6_route_seq_next(struct seq_file *seq, void *v, loff_t *pos)
{
int r;
- struct rt6_info *n;
+ struct fib6_info *n;
struct net *net = seq_file_net(seq);
struct ipv6_route_iter *iter = seq->private;
if (!v)
goto iter_table;
- n = rcu_dereference_bh(((struct rt6_info *)v)->rt6_next);
+ n = rcu_dereference_bh(((struct fib6_info *)v)->fib6_next);
if (n) {
++*pos;
return n;
diff --git a/net/ipv6/ip6_gre.c b/net/ipv6/ip6_gre.c
index 458de35..c8cf2fd 100644
--- a/net/ipv6/ip6_gre.c
+++ b/net/ipv6/ip6_gre.c
@@ -848,7 +848,7 @@
}
/**
- * ip6_tnl_addr_conflict - compare packet addresses to tunnel's own
+ * ip6gre_tnl_addr_conflict - compare packet addresses to tunnel's own
* @t: the outgoing tunnel device
* @hdr: IPv6 header from the incoming packet
*
@@ -937,6 +937,8 @@
struct flowi6 fl6;
int err = -EINVAL;
__u32 mtu;
+ int nhoff;
+ int thoff;
if (!ip6_tnl_xmit_ctl(t, &t->parms.laddr, &t->parms.raddr))
goto tx_err;
@@ -949,6 +951,16 @@
truncate = true;
}
+ nhoff = skb_network_header(skb) - skb_mac_header(skb);
+ if (skb->protocol == htons(ETH_P_IP) &&
+ (ntohs(ip_hdr(skb)->tot_len) > skb->len - nhoff))
+ truncate = true;
+
+ thoff = skb_transport_header(skb) - skb_mac_header(skb);
+ if (skb->protocol == htons(ETH_P_IPV6) &&
+ (ntohs(ipv6_hdr(skb)->payload_len) > skb->len - thoff))
+ truncate = true;
+
if (skb_cow_head(skb, dev->needed_headroom ?: t->hlen))
goto tx_err;
@@ -1376,6 +1388,7 @@
{
struct ip6_tnl *t = netdev_priv(dev);
+ gro_cells_destroy(&t->gro_cells);
dst_cache_destroy(&t->dst_cache);
free_percpu(dev->tstats);
}
@@ -1443,11 +1456,12 @@
return -ENOMEM;
ret = dst_cache_init(&tunnel->dst_cache, GFP_KERNEL);
- if (ret) {
- free_percpu(dev->tstats);
- dev->tstats = NULL;
- return ret;
- }
+ if (ret)
+ goto cleanup_alloc_pcpu_stats;
+
+ ret = gro_cells_init(&tunnel->gro_cells, dev);
+ if (ret)
+ goto cleanup_dst_cache_init;
t_hlen = ip6gre_calc_hlen(tunnel);
dev->mtu = ETH_DATA_LEN - t_hlen;
@@ -1463,6 +1477,13 @@
ip6gre_tnl_init_features(dev);
return 0;
+
+cleanup_dst_cache_init:
+ dst_cache_destroy(&tunnel->dst_cache);
+cleanup_alloc_pcpu_stats:
+ free_percpu(dev->tstats);
+ dev->tstats = NULL;
+ return ret;
}
static int ip6gre_tunnel_init(struct net_device *dev)
@@ -1822,11 +1843,12 @@
return -ENOMEM;
ret = dst_cache_init(&tunnel->dst_cache, GFP_KERNEL);
- if (ret) {
- free_percpu(dev->tstats);
- dev->tstats = NULL;
- return ret;
- }
+ if (ret)
+ goto cleanup_alloc_pcpu_stats;
+
+ ret = gro_cells_init(&tunnel->gro_cells, dev);
+ if (ret)
+ goto cleanup_dst_cache_init;
t_hlen = ip6erspan_calc_hlen(tunnel);
dev->mtu = ETH_DATA_LEN - t_hlen;
@@ -1839,6 +1861,13 @@
ip6erspan_tnl_link_config(tunnel, 1);
return 0;
+
+cleanup_dst_cache_init:
+ dst_cache_destroy(&tunnel->dst_cache);
+cleanup_alloc_pcpu_stats:
+ free_percpu(dev->tstats);
+ dev->tstats = NULL;
+ return ret;
}
static const struct net_device_ops ip6erspan_netdev_ops = {
diff --git a/net/ipv6/ip6_input.c b/net/ipv6/ip6_input.c
index 9ee208a..f08d344 100644
--- a/net/ipv6/ip6_input.c
+++ b/net/ipv6/ip6_input.c
@@ -336,7 +336,7 @@
bool deliver;
__IP6_UPD_PO_STATS(dev_net(skb_dst(skb)->dev),
- ip6_dst_idev(skb_dst(skb)), IPSTATS_MIB_INMCAST,
+ __in6_dev_get_safely(skb->dev), IPSTATS_MIB_INMCAST,
skb->len);
hdr = ipv6_hdr(skb);
diff --git a/net/ipv6/ip6_offload.c b/net/ipv6/ip6_offload.c
index 4a87f94..5b3f2f8 100644
--- a/net/ipv6/ip6_offload.c
+++ b/net/ipv6/ip6_offload.c
@@ -88,9 +88,11 @@
if (skb->encapsulation &&
skb_shinfo(skb)->gso_type & (SKB_GSO_IPXIP4 | SKB_GSO_IPXIP6))
- udpfrag = proto == IPPROTO_UDP && encap;
+ udpfrag = proto == IPPROTO_UDP && encap &&
+ (skb_shinfo(skb)->gso_type & SKB_GSO_UDP);
else
- udpfrag = proto == IPPROTO_UDP && !skb->encapsulation;
+ udpfrag = proto == IPPROTO_UDP && !skb->encapsulation &&
+ (skb_shinfo(skb)->gso_type & SKB_GSO_UDP);
ops = rcu_dereference(inet6_offloads[proto]);
if (likely(ops && ops->callbacks.gso_segment)) {
diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index 7b6d168..60b0d16 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -383,28 +383,6 @@
return dst_output(net, sk, skb);
}
-unsigned int ip6_dst_mtu_forward(const struct dst_entry *dst)
-{
- unsigned int mtu;
- struct inet6_dev *idev;
-
- if (dst_metric_locked(dst, RTAX_MTU)) {
- mtu = dst_metric_raw(dst, RTAX_MTU);
- if (mtu)
- return mtu;
- }
-
- mtu = IPV6_MIN_MTU;
- rcu_read_lock();
- idev = __in6_dev_get(dst->dev);
- if (idev)
- mtu = idev->cnf.mtu6;
- rcu_read_unlock();
-
- return mtu;
-}
-EXPORT_SYMBOL_GPL(ip6_dst_mtu_forward);
-
static bool ip6_pkt_too_big(const struct sk_buff *skb, unsigned int mtu)
{
if (skb->len <= mtu)
@@ -425,6 +403,7 @@
int ip6_forward(struct sk_buff *skb)
{
+ struct inet6_dev *idev = __in6_dev_get_safely(skb->dev);
struct dst_entry *dst = skb_dst(skb);
struct ipv6hdr *hdr = ipv6_hdr(skb);
struct inet6_skb_parm *opt = IP6CB(skb);
@@ -444,8 +423,7 @@
goto drop;
if (!xfrm6_policy_check(NULL, XFRM_POLICY_FWD, skb)) {
- __IP6_INC_STATS(net, ip6_dst_idev(dst),
- IPSTATS_MIB_INDISCARDS);
+ __IP6_INC_STATS(net, idev, IPSTATS_MIB_INDISCARDS);
goto drop;
}
@@ -476,8 +454,7 @@
/* Force OUTPUT device used as source address */
skb->dev = dst->dev;
icmpv6_send(skb, ICMPV6_TIME_EXCEED, ICMPV6_EXC_HOPLIMIT, 0);
- __IP6_INC_STATS(net, ip6_dst_idev(dst),
- IPSTATS_MIB_INHDRERRORS);
+ __IP6_INC_STATS(net, idev, IPSTATS_MIB_INHDRERRORS);
kfree_skb(skb);
return -ETIMEDOUT;
@@ -490,15 +467,13 @@
if (proxied > 0)
return ip6_input(skb);
else if (proxied < 0) {
- __IP6_INC_STATS(net, ip6_dst_idev(dst),
- IPSTATS_MIB_INDISCARDS);
+ __IP6_INC_STATS(net, idev, IPSTATS_MIB_INDISCARDS);
goto drop;
}
}
if (!xfrm6_route_forward(skb)) {
- __IP6_INC_STATS(net, ip6_dst_idev(dst),
- IPSTATS_MIB_INDISCARDS);
+ __IP6_INC_STATS(net, idev, IPSTATS_MIB_INDISCARDS);
goto drop;
}
dst = skb_dst(skb);
@@ -554,8 +529,7 @@
/* Again, force OUTPUT device used as source address */
skb->dev = dst->dev;
icmpv6_send(skb, ICMPV6_PKT_TOOBIG, 0, mtu);
- __IP6_INC_STATS(net, ip6_dst_idev(dst),
- IPSTATS_MIB_INTOOBIGERRORS);
+ __IP6_INC_STATS(net, idev, IPSTATS_MIB_INTOOBIGERRORS);
__IP6_INC_STATS(net, ip6_dst_idev(dst),
IPSTATS_MIB_FRAGFAILS);
kfree_skb(skb);
@@ -579,7 +553,7 @@
ip6_forward_finish);
error:
- __IP6_INC_STATS(net, ip6_dst_idev(dst), IPSTATS_MIB_INADDRERRORS);
+ __IP6_INC_STATS(net, idev, IPSTATS_MIB_INADDRERRORS);
drop:
kfree_skb(skb);
return -EINVAL;
@@ -966,15 +940,21 @@
* that's why we try it again later.
*/
if (ipv6_addr_any(&fl6->saddr) && (!*dst || !(*dst)->error)) {
+ struct fib6_info *from;
struct rt6_info *rt;
bool had_dst = *dst != NULL;
if (!had_dst)
*dst = ip6_route_output(net, sk, fl6);
rt = (*dst)->error ? NULL : (struct rt6_info *)*dst;
- err = ip6_route_get_saddr(net, rt, &fl6->daddr,
+
+ rcu_read_lock();
+ from = rt ? rcu_dereference(rt->from) : NULL;
+ err = ip6_route_get_saddr(net, from, &fl6->daddr,
sk ? inet6_sk(sk)->srcprefs : 0,
&fl6->saddr);
+ rcu_read_unlock();
+
if (err)
goto out_err_release;
@@ -1238,6 +1218,8 @@
if (mtu < IPV6_MIN_MTU)
return -EINVAL;
cork->base.fragsize = mtu;
+ cork->base.gso_size = sk->sk_type == SOCK_DGRAM ? ipc6->gso_size : 0;
+
if (dst_allfrag(xfrm_dst_path(&rt->dst)))
cork->base.flags |= IPCORK_ALLFRAG;
cork->base.length = 0;
@@ -1272,6 +1254,7 @@
int csummode = CHECKSUM_NONE;
unsigned int maxnonfragsize, headersize;
unsigned int wmem_alloc_delta = 0;
+ bool paged;
skb = skb_peek_tail(queue);
if (!skb) {
@@ -1279,7 +1262,8 @@
dst_exthdrlen = rt->dst.header_len - rt->rt6i_nfheader_len;
}
- mtu = cork->fragsize;
+ paged = !!cork->gso_size;
+ mtu = cork->gso_size ? IP6_MAX_MTU : cork->fragsize;
orig_mtu = mtu;
hh_len = LL_RESERVED_SPACE(rt->dst.dev);
@@ -1327,7 +1311,7 @@
if (transhdrlen && sk->sk_protocol == IPPROTO_UDP &&
headersize == sizeof(struct ipv6hdr) &&
length <= mtu - headersize &&
- !(flags & MSG_MORE) &&
+ (!(flags & MSG_MORE) || cork->gso_size) &&
rt->dst.dev->features & (NETIF_F_IPV6_CSUM | NETIF_F_HW_CSUM))
csummode = CHECKSUM_PARTIAL;
@@ -1370,6 +1354,7 @@
unsigned int fraglen;
unsigned int fraggap;
unsigned int alloclen;
+ unsigned int pagedlen = 0;
alloc_new_skb:
/* There's no room in the current skb */
if (skb)
@@ -1392,11 +1377,17 @@
if (datalen > (cork->length <= mtu && !(cork->flags & IPCORK_ALLFRAG) ? mtu : maxfraglen) - fragheaderlen)
datalen = maxfraglen - fragheaderlen - rt->dst.trailer_len;
+ fraglen = datalen + fragheaderlen;
+
if ((flags & MSG_MORE) &&
!(rt->dst.dev->features&NETIF_F_SG))
alloclen = mtu;
- else
- alloclen = datalen + fragheaderlen;
+ else if (!paged)
+ alloclen = fraglen;
+ else {
+ alloclen = min_t(int, fraglen, MAX_HEADER);
+ pagedlen = fraglen - alloclen;
+ }
alloclen += dst_exthdrlen;
@@ -1418,7 +1409,7 @@
*/
alloclen += sizeof(struct frag_hdr);
- copy = datalen - transhdrlen - fraggap;
+ copy = datalen - transhdrlen - fraggap - pagedlen;
if (copy < 0) {
err = -EINVAL;
goto error;
@@ -1457,7 +1448,7 @@
/*
* Find where to start putting bytes
*/
- data = skb_put(skb, fraglen);
+ data = skb_put(skb, fraglen - pagedlen);
skb_set_network_header(skb, exthdrlen);
data += fragheaderlen;
skb->transport_header = (skb->network_header +
@@ -1480,7 +1471,7 @@
}
offset += copy;
- length -= datalen - fraggap;
+ length -= copy + transhdrlen;
transhdrlen = 0;
exthdrlen = 0;
dst_exthdrlen = 0;
@@ -1754,9 +1745,9 @@
void *from, int length, int transhdrlen,
struct ipcm6_cookie *ipc6, struct flowi6 *fl6,
struct rt6_info *rt, unsigned int flags,
+ struct inet_cork_full *cork,
const struct sockcm_cookie *sockc)
{
- struct inet_cork_full cork;
struct inet6_cork v6_cork;
struct sk_buff_head queue;
int exthdrlen = (ipc6->opt ? ipc6->opt->opt_flen : 0);
@@ -1767,27 +1758,27 @@
__skb_queue_head_init(&queue);
- cork.base.flags = 0;
- cork.base.addr = 0;
- cork.base.opt = NULL;
- cork.base.dst = NULL;
+ cork->base.flags = 0;
+ cork->base.addr = 0;
+ cork->base.opt = NULL;
+ cork->base.dst = NULL;
v6_cork.opt = NULL;
- err = ip6_setup_cork(sk, &cork, &v6_cork, ipc6, rt, fl6);
+ err = ip6_setup_cork(sk, cork, &v6_cork, ipc6, rt, fl6);
if (err) {
- ip6_cork_release(&cork, &v6_cork);
+ ip6_cork_release(cork, &v6_cork);
return ERR_PTR(err);
}
if (ipc6->dontfrag < 0)
ipc6->dontfrag = inet6_sk(sk)->dontfrag;
- err = __ip6_append_data(sk, fl6, &queue, &cork.base, &v6_cork,
+ err = __ip6_append_data(sk, fl6, &queue, &cork->base, &v6_cork,
¤t->task_frag, getfrag, from,
length + exthdrlen, transhdrlen + exthdrlen,
flags, ipc6, sockc);
if (err) {
- __ip6_flush_pending_frames(sk, &queue, &cork, &v6_cork);
+ __ip6_flush_pending_frames(sk, &queue, cork, &v6_cork);
return ERR_PTR(err);
}
- return __ip6_make_skb(sk, &queue, &cork, &v6_cork);
+ return __ip6_make_skb(sk, &queue, cork, &v6_cork);
}
diff --git a/net/ipv6/ip6_vti.c b/net/ipv6/ip6_vti.c
index ca957dd..b7f28de 100644
--- a/net/ipv6/ip6_vti.c
+++ b/net/ipv6/ip6_vti.c
@@ -743,7 +743,7 @@
}
/**
- * vti6_tnl_ioctl - configure vti6 tunnels from userspace
+ * vti6_ioctl - configure vti6 tunnels from userspace
* @dev: virtual device associated with tunnel
* @ifr: parameters passed from userspace
* @cmd: command to be performed
diff --git a/net/ipv6/ip6mr.c b/net/ipv6/ip6mr.c
index 298fd8b..20a419e 100644
--- a/net/ipv6/ip6mr.c
+++ b/net/ipv6/ip6mr.c
@@ -180,7 +180,8 @@
};
static int ip6mr_rule_configure(struct fib_rule *rule, struct sk_buff *skb,
- struct fib_rule_hdr *frh, struct nlattr **tb)
+ struct fib_rule_hdr *frh, struct nlattr **tb,
+ struct netlink_ext_ack *extack)
{
return 0;
}
diff --git a/net/ipv6/ndisc.c b/net/ipv6/ndisc.c
index 9de4dfb1..9ac5366 100644
--- a/net/ipv6/ndisc.c
+++ b/net/ipv6/ndisc.c
@@ -1155,7 +1155,8 @@
struct ra_msg *ra_msg = (struct ra_msg *)skb_transport_header(skb);
struct neighbour *neigh = NULL;
struct inet6_dev *in6_dev;
- struct rt6_info *rt = NULL;
+ struct fib6_info *rt = NULL;
+ struct net *net;
int lifetime;
struct ndisc_options ndopts;
int optlen;
@@ -1253,9 +1254,9 @@
/* Do not accept RA with source-addr found on local machine unless
* accept_ra_from_local is set to true.
*/
+ net = dev_net(in6_dev->dev);
if (!in6_dev->cnf.accept_ra_from_local &&
- ipv6_chk_addr(dev_net(in6_dev->dev), &ipv6_hdr(skb)->saddr,
- in6_dev->dev, 0)) {
+ ipv6_chk_addr(net, &ipv6_hdr(skb)->saddr, in6_dev->dev, 0)) {
ND_PRINTK(2, info,
"RA from local address detected on dev: %s: default router ignored\n",
skb->dev->name);
@@ -1272,20 +1273,22 @@
pref = ICMPV6_ROUTER_PREF_MEDIUM;
#endif
- rt = rt6_get_dflt_router(&ipv6_hdr(skb)->saddr, skb->dev);
+ rt = rt6_get_dflt_router(net, &ipv6_hdr(skb)->saddr, skb->dev);
if (rt) {
- neigh = dst_neigh_lookup(&rt->dst, &ipv6_hdr(skb)->saddr);
+ neigh = ip6_neigh_lookup(&rt->fib6_nh.nh_gw,
+ rt->fib6_nh.nh_dev, NULL,
+ &ipv6_hdr(skb)->saddr);
if (!neigh) {
ND_PRINTK(0, err,
"RA: %s got default router without neighbour\n",
__func__);
- ip6_rt_put(rt);
+ fib6_info_release(rt);
return;
}
}
if (rt && lifetime == 0) {
- ip6_del_rt(rt);
+ ip6_del_rt(net, rt);
rt = NULL;
}
@@ -1294,7 +1297,8 @@
if (!rt && lifetime) {
ND_PRINTK(3, info, "RA: adding default router\n");
- rt = rt6_add_dflt_router(&ipv6_hdr(skb)->saddr, skb->dev, pref);
+ rt = rt6_add_dflt_router(net, &ipv6_hdr(skb)->saddr,
+ skb->dev, pref);
if (!rt) {
ND_PRINTK(0, err,
"RA: %s failed to add default route\n",
@@ -1302,28 +1306,29 @@
return;
}
- neigh = dst_neigh_lookup(&rt->dst, &ipv6_hdr(skb)->saddr);
+ neigh = ip6_neigh_lookup(&rt->fib6_nh.nh_gw,
+ rt->fib6_nh.nh_dev, NULL,
+ &ipv6_hdr(skb)->saddr);
if (!neigh) {
ND_PRINTK(0, err,
"RA: %s got default router without neighbour\n",
__func__);
- ip6_rt_put(rt);
+ fib6_info_release(rt);
return;
}
neigh->flags |= NTF_ROUTER;
} else if (rt) {
- rt->rt6i_flags = (rt->rt6i_flags & ~RTF_PREF_MASK) | RTF_PREF(pref);
+ rt->fib6_flags = (rt->fib6_flags & ~RTF_PREF_MASK) | RTF_PREF(pref);
}
if (rt)
- rt6_set_expires(rt, jiffies + (HZ * lifetime));
+ fib6_set_expires(rt, jiffies + (HZ * lifetime));
if (in6_dev->cnf.accept_ra_min_hop_limit < 256 &&
ra_msg->icmph.icmp6_hop_limit) {
if (in6_dev->cnf.accept_ra_min_hop_limit <= ra_msg->icmph.icmp6_hop_limit) {
in6_dev->cnf.hop_limit = ra_msg->icmph.icmp6_hop_limit;
- if (rt)
- dst_metric_set(&rt->dst, RTAX_HOPLIMIT,
- ra_msg->icmph.icmp6_hop_limit);
+ fib6_metric_set(rt, RTAX_HOPLIMIT,
+ ra_msg->icmph.icmp6_hop_limit);
} else {
ND_PRINTK(2, warn, "RA: Got route advertisement with lower hop_limit than minimum\n");
}
@@ -1475,10 +1480,7 @@
ND_PRINTK(2, warn, "RA: invalid mtu: %d\n", mtu);
} else if (in6_dev->cnf.mtu6 != mtu) {
in6_dev->cnf.mtu6 = mtu;
-
- if (rt)
- dst_metric_set(&rt->dst, RTAX_MTU, mtu);
-
+ fib6_metric_set(rt, RTAX_MTU, mtu);
rt6_mtu_change(skb->dev, mtu);
}
}
@@ -1497,7 +1499,7 @@
ND_PRINTK(2, warn, "RA: invalid RA options\n");
}
out:
- ip6_rt_put(rt);
+ fib6_info_release(rt);
if (neigh)
neigh_release(neigh);
}
diff --git a/net/ipv6/netfilter/ip6_tables.c b/net/ipv6/netfilter/ip6_tables.c
index 97f79dc..0758b5b 100644
--- a/net/ipv6/netfilter/ip6_tables.c
+++ b/net/ipv6/netfilter/ip6_tables.c
@@ -529,7 +529,6 @@
.family = NFPROTO_IPV6,
};
- t = ip6t_get_target(e);
return xt_check_target(&par, t->u.target_size - sizeof(*t),
e->ipv6.proto,
e->ipv6.invflags & IP6T_INV_PROTO);
@@ -1794,6 +1793,8 @@
/* set res now, will see skbs right after nf_register_net_hooks */
WRITE_ONCE(*res, new_table);
+ if (!ops)
+ return 0;
ret = nf_register_net_hooks(net, ops, hweight32(table->valid_hooks));
if (ret != 0) {
@@ -1811,7 +1812,8 @@
void ip6t_unregister_table(struct net *net, struct xt_table *table,
const struct nf_hook_ops *ops)
{
- nf_unregister_net_hooks(net, ops, hweight32(table->valid_hooks));
+ if (ops)
+ nf_unregister_net_hooks(net, ops, hweight32(table->valid_hooks));
__ip6t_unregister_table(net, table);
}
diff --git a/net/ipv6/netfilter/ip6t_MASQUERADE.c b/net/ipv6/netfilter/ip6t_MASQUERADE.c
index 92c0047..491f808 100644
--- a/net/ipv6/netfilter/ip6t_MASQUERADE.c
+++ b/net/ipv6/netfilter/ip6t_MASQUERADE.c
@@ -29,7 +29,7 @@
static int masquerade_tg6_checkentry(const struct xt_tgchk_param *par)
{
- const struct nf_nat_range *range = par->targinfo;
+ const struct nf_nat_range2 *range = par->targinfo;
if (range->flags & NF_NAT_RANGE_MAP_IPS)
return -EINVAL;
diff --git a/net/ipv6/netfilter/ip6t_rpfilter.c b/net/ipv6/netfilter/ip6t_rpfilter.c
index d12f511..0fe61ed 100644
--- a/net/ipv6/netfilter/ip6t_rpfilter.c
+++ b/net/ipv6/netfilter/ip6t_rpfilter.c
@@ -48,6 +48,8 @@
}
fl6.flowi6_mark = flags & XT_RPFILTER_VALID_MARK ? skb->mark : 0;
+ if ((flags & XT_RPFILTER_LOOSE) == 0)
+ fl6.flowi6_oif = dev->ifindex;
rt = (void *)ip6_route_lookup(net, &fl6, skb, lookup_flags);
if (rt->dst.error)
diff --git a/net/ipv6/netfilter/ip6t_srh.c b/net/ipv6/netfilter/ip6t_srh.c
index 33719d5..1059894 100644
--- a/net/ipv6/netfilter/ip6t_srh.c
+++ b/net/ipv6/netfilter/ip6t_srh.c
@@ -117,6 +117,130 @@
return true;
}
+static bool srh1_mt6(const struct sk_buff *skb, struct xt_action_param *par)
+{
+ int hdrlen, psidoff, nsidoff, lsidoff, srhoff = 0;
+ const struct ip6t_srh1 *srhinfo = par->matchinfo;
+ struct in6_addr *psid, *nsid, *lsid;
+ struct in6_addr _psid, _nsid, _lsid;
+ struct ipv6_sr_hdr *srh;
+ struct ipv6_sr_hdr _srh;
+
+ if (ipv6_find_hdr(skb, &srhoff, IPPROTO_ROUTING, NULL, NULL) < 0)
+ return false;
+ srh = skb_header_pointer(skb, srhoff, sizeof(_srh), &_srh);
+ if (!srh)
+ return false;
+
+ hdrlen = ipv6_optlen(srh);
+ if (skb->len - srhoff < hdrlen)
+ return false;
+
+ if (srh->type != IPV6_SRCRT_TYPE_4)
+ return false;
+
+ if (srh->segments_left > srh->first_segment)
+ return false;
+
+ /* Next Header matching */
+ if (srhinfo->mt_flags & IP6T_SRH_NEXTHDR)
+ if (NF_SRH_INVF(srhinfo, IP6T_SRH_INV_NEXTHDR,
+ !(srh->nexthdr == srhinfo->next_hdr)))
+ return false;
+
+ /* Header Extension Length matching */
+ if (srhinfo->mt_flags & IP6T_SRH_LEN_EQ)
+ if (NF_SRH_INVF(srhinfo, IP6T_SRH_INV_LEN_EQ,
+ !(srh->hdrlen == srhinfo->hdr_len)))
+ return false;
+ if (srhinfo->mt_flags & IP6T_SRH_LEN_GT)
+ if (NF_SRH_INVF(srhinfo, IP6T_SRH_INV_LEN_GT,
+ !(srh->hdrlen > srhinfo->hdr_len)))
+ return false;
+ if (srhinfo->mt_flags & IP6T_SRH_LEN_LT)
+ if (NF_SRH_INVF(srhinfo, IP6T_SRH_INV_LEN_LT,
+ !(srh->hdrlen < srhinfo->hdr_len)))
+ return false;
+
+ /* Segments Left matching */
+ if (srhinfo->mt_flags & IP6T_SRH_SEGS_EQ)
+ if (NF_SRH_INVF(srhinfo, IP6T_SRH_INV_SEGS_EQ,
+ !(srh->segments_left == srhinfo->segs_left)))
+ return false;
+ if (srhinfo->mt_flags & IP6T_SRH_SEGS_GT)
+ if (NF_SRH_INVF(srhinfo, IP6T_SRH_INV_SEGS_GT,
+ !(srh->segments_left > srhinfo->segs_left)))
+ return false;
+ if (srhinfo->mt_flags & IP6T_SRH_SEGS_LT)
+ if (NF_SRH_INVF(srhinfo, IP6T_SRH_INV_SEGS_LT,
+ !(srh->segments_left < srhinfo->segs_left)))
+ return false;
+
+ /**
+ * Last Entry matching
+ * Last_Entry field was introduced in revision 6 of the SRH draft.
+ * It was called First_Segment in the previous revision
+ */
+ if (srhinfo->mt_flags & IP6T_SRH_LAST_EQ)
+ if (NF_SRH_INVF(srhinfo, IP6T_SRH_INV_LAST_EQ,
+ !(srh->first_segment == srhinfo->last_entry)))
+ return false;
+ if (srhinfo->mt_flags & IP6T_SRH_LAST_GT)
+ if (NF_SRH_INVF(srhinfo, IP6T_SRH_INV_LAST_GT,
+ !(srh->first_segment > srhinfo->last_entry)))
+ return false;
+ if (srhinfo->mt_flags & IP6T_SRH_LAST_LT)
+ if (NF_SRH_INVF(srhinfo, IP6T_SRH_INV_LAST_LT,
+ !(srh->first_segment < srhinfo->last_entry)))
+ return false;
+
+ /**
+ * Tag matchig
+ * Tag field was introduced in revision 6 of the SRH draft
+ */
+ if (srhinfo->mt_flags & IP6T_SRH_TAG)
+ if (NF_SRH_INVF(srhinfo, IP6T_SRH_INV_TAG,
+ !(srh->tag == srhinfo->tag)))
+ return false;
+
+ /* Previous SID matching */
+ if (srhinfo->mt_flags & IP6T_SRH_PSID) {
+ if (srh->segments_left == srh->first_segment)
+ return false;
+ psidoff = srhoff + sizeof(struct ipv6_sr_hdr) +
+ ((srh->segments_left + 1) * sizeof(struct in6_addr));
+ psid = skb_header_pointer(skb, psidoff, sizeof(_psid), &_psid);
+ if (NF_SRH_INVF(srhinfo, IP6T_SRH_INV_PSID,
+ ipv6_masked_addr_cmp(psid, &srhinfo->psid_msk,
+ &srhinfo->psid_addr)))
+ return false;
+ }
+
+ /* Next SID matching */
+ if (srhinfo->mt_flags & IP6T_SRH_NSID) {
+ if (srh->segments_left == 0)
+ return false;
+ nsidoff = srhoff + sizeof(struct ipv6_sr_hdr) +
+ ((srh->segments_left - 1) * sizeof(struct in6_addr));
+ nsid = skb_header_pointer(skb, nsidoff, sizeof(_nsid), &_nsid);
+ if (NF_SRH_INVF(srhinfo, IP6T_SRH_INV_NSID,
+ ipv6_masked_addr_cmp(nsid, &srhinfo->nsid_msk,
+ &srhinfo->nsid_addr)))
+ return false;
+ }
+
+ /* Last SID matching */
+ if (srhinfo->mt_flags & IP6T_SRH_LSID) {
+ lsidoff = srhoff + sizeof(struct ipv6_sr_hdr);
+ lsid = skb_header_pointer(skb, lsidoff, sizeof(_lsid), &_lsid);
+ if (NF_SRH_INVF(srhinfo, IP6T_SRH_INV_LSID,
+ ipv6_masked_addr_cmp(lsid, &srhinfo->lsid_msk,
+ &srhinfo->lsid_addr)))
+ return false;
+ }
+ return true;
+}
+
static int srh_mt6_check(const struct xt_mtchk_param *par)
{
const struct ip6t_srh *srhinfo = par->matchinfo;
@@ -136,23 +260,54 @@
return 0;
}
-static struct xt_match srh_mt6_reg __read_mostly = {
- .name = "srh",
- .family = NFPROTO_IPV6,
- .match = srh_mt6,
- .matchsize = sizeof(struct ip6t_srh),
- .checkentry = srh_mt6_check,
- .me = THIS_MODULE,
+static int srh1_mt6_check(const struct xt_mtchk_param *par)
+{
+ const struct ip6t_srh1 *srhinfo = par->matchinfo;
+
+ if (srhinfo->mt_flags & ~IP6T_SRH_MASK) {
+ pr_info_ratelimited("unknown srh match flags %X\n",
+ srhinfo->mt_flags);
+ return -EINVAL;
+ }
+
+ if (srhinfo->mt_invflags & ~IP6T_SRH_INV_MASK) {
+ pr_info_ratelimited("unknown srh invflags %X\n",
+ srhinfo->mt_invflags);
+ return -EINVAL;
+ }
+
+ return 0;
+}
+
+static struct xt_match srh_mt6_reg[] __read_mostly = {
+ {
+ .name = "srh",
+ .revision = 0,
+ .family = NFPROTO_IPV6,
+ .match = srh_mt6,
+ .matchsize = sizeof(struct ip6t_srh),
+ .checkentry = srh_mt6_check,
+ .me = THIS_MODULE,
+ },
+ {
+ .name = "srh",
+ .revision = 1,
+ .family = NFPROTO_IPV6,
+ .match = srh1_mt6,
+ .matchsize = sizeof(struct ip6t_srh1),
+ .checkentry = srh1_mt6_check,
+ .me = THIS_MODULE,
+ }
};
static int __init srh_mt6_init(void)
{
- return xt_register_match(&srh_mt6_reg);
+ return xt_register_matches(srh_mt6_reg, ARRAY_SIZE(srh_mt6_reg));
}
static void __exit srh_mt6_exit(void)
{
- xt_unregister_match(&srh_mt6_reg);
+ xt_unregister_matches(srh_mt6_reg, ARRAY_SIZE(srh_mt6_reg));
}
module_init(srh_mt6_init);
diff --git a/net/ipv6/netfilter/ip6table_nat.c b/net/ipv6/netfilter/ip6table_nat.c
index 47306e4..67ba70a 100644
--- a/net/ipv6/netfilter/ip6table_nat.c
+++ b/net/ipv6/netfilter/ip6table_nat.c
@@ -35,75 +35,63 @@
static unsigned int ip6table_nat_do_chain(void *priv,
struct sk_buff *skb,
- const struct nf_hook_state *state,
- struct nf_conn *ct)
+ const struct nf_hook_state *state)
{
return ip6t_do_table(skb, state, state->net->ipv6.ip6table_nat);
}
-static unsigned int ip6table_nat_fn(void *priv,
- struct sk_buff *skb,
- const struct nf_hook_state *state)
-{
- return nf_nat_ipv6_fn(priv, skb, state, ip6table_nat_do_chain);
-}
-
-static unsigned int ip6table_nat_in(void *priv,
- struct sk_buff *skb,
- const struct nf_hook_state *state)
-{
- return nf_nat_ipv6_in(priv, skb, state, ip6table_nat_do_chain);
-}
-
-static unsigned int ip6table_nat_out(void *priv,
- struct sk_buff *skb,
- const struct nf_hook_state *state)
-{
- return nf_nat_ipv6_out(priv, skb, state, ip6table_nat_do_chain);
-}
-
-static unsigned int ip6table_nat_local_fn(void *priv,
- struct sk_buff *skb,
- const struct nf_hook_state *state)
-{
- return nf_nat_ipv6_local_fn(priv, skb, state, ip6table_nat_do_chain);
-}
-
static const struct nf_hook_ops nf_nat_ipv6_ops[] = {
- /* Before packet filtering, change destination */
{
- .hook = ip6table_nat_in,
+ .hook = ip6table_nat_do_chain,
.pf = NFPROTO_IPV6,
- .nat_hook = true,
.hooknum = NF_INET_PRE_ROUTING,
.priority = NF_IP6_PRI_NAT_DST,
},
- /* After packet filtering, change source */
{
- .hook = ip6table_nat_out,
+ .hook = ip6table_nat_do_chain,
.pf = NFPROTO_IPV6,
- .nat_hook = true,
.hooknum = NF_INET_POST_ROUTING,
.priority = NF_IP6_PRI_NAT_SRC,
},
- /* Before packet filtering, change destination */
{
- .hook = ip6table_nat_local_fn,
+ .hook = ip6table_nat_do_chain,
.pf = NFPROTO_IPV6,
- .nat_hook = true,
.hooknum = NF_INET_LOCAL_OUT,
.priority = NF_IP6_PRI_NAT_DST,
},
- /* After packet filtering, change source */
{
- .hook = ip6table_nat_fn,
- .nat_hook = true,
+ .hook = ip6table_nat_do_chain,
.pf = NFPROTO_IPV6,
.hooknum = NF_INET_LOCAL_IN,
.priority = NF_IP6_PRI_NAT_SRC,
},
};
+static int ip6t_nat_register_lookups(struct net *net)
+{
+ int i, ret;
+
+ for (i = 0; i < ARRAY_SIZE(nf_nat_ipv6_ops); i++) {
+ ret = nf_nat_l3proto_ipv6_register_fn(net, &nf_nat_ipv6_ops[i]);
+ if (ret) {
+ while (i)
+ nf_nat_l3proto_ipv6_unregister_fn(net, &nf_nat_ipv6_ops[--i]);
+
+ return ret;
+ }
+ }
+
+ return 0;
+}
+
+static void ip6t_nat_unregister_lookups(struct net *net)
+{
+ int i;
+
+ for (i = 0; i < ARRAY_SIZE(nf_nat_ipv6_ops); i++)
+ nf_nat_l3proto_ipv6_unregister_fn(net, &nf_nat_ipv6_ops[i]);
+}
+
static int __net_init ip6table_nat_table_init(struct net *net)
{
struct ip6t_replace *repl;
@@ -116,7 +104,17 @@
if (repl == NULL)
return -ENOMEM;
ret = ip6t_register_table(net, &nf_nat_ipv6_table, repl,
- nf_nat_ipv6_ops, &net->ipv6.ip6table_nat);
+ NULL, &net->ipv6.ip6table_nat);
+ if (ret < 0) {
+ kfree(repl);
+ return ret;
+ }
+
+ ret = ip6t_nat_register_lookups(net);
+ if (ret < 0) {
+ ip6t_unregister_table(net, net->ipv6.ip6table_nat, NULL);
+ net->ipv6.ip6table_nat = NULL;
+ }
kfree(repl);
return ret;
}
@@ -125,7 +123,8 @@
{
if (!net->ipv6.ip6table_nat)
return;
- ip6t_unregister_table(net, net->ipv6.ip6table_nat, nf_nat_ipv6_ops);
+ ip6t_nat_unregister_lookups(net);
+ ip6t_unregister_table(net, net->ipv6.ip6table_nat, NULL);
net->ipv6.ip6table_nat = NULL;
}
diff --git a/net/ipv6/netfilter/nf_flow_table_ipv6.c b/net/ipv6/netfilter/nf_flow_table_ipv6.c
index 207cb35..c511d20 100644
--- a/net/ipv6/netfilter/nf_flow_table_ipv6.c
+++ b/net/ipv6/netfilter/nf_flow_table_ipv6.c
@@ -3,256 +3,12 @@
#include <linux/module.h>
#include <linux/netfilter.h>
#include <linux/rhashtable.h>
-#include <linux/ipv6.h>
-#include <linux/netdevice.h>
-#include <net/ipv6.h>
-#include <net/ip6_route.h>
-#include <net/neighbour.h>
#include <net/netfilter/nf_flow_table.h>
#include <net/netfilter/nf_tables.h>
-/* For layer 4 checksum field offset. */
-#include <linux/tcp.h>
-#include <linux/udp.h>
-
-static int nf_flow_nat_ipv6_tcp(struct sk_buff *skb, unsigned int thoff,
- struct in6_addr *addr,
- struct in6_addr *new_addr)
-{
- struct tcphdr *tcph;
-
- if (!pskb_may_pull(skb, thoff + sizeof(*tcph)) ||
- skb_try_make_writable(skb, thoff + sizeof(*tcph)))
- return -1;
-
- tcph = (void *)(skb_network_header(skb) + thoff);
- inet_proto_csum_replace16(&tcph->check, skb, addr->s6_addr32,
- new_addr->s6_addr32, true);
-
- return 0;
-}
-
-static int nf_flow_nat_ipv6_udp(struct sk_buff *skb, unsigned int thoff,
- struct in6_addr *addr,
- struct in6_addr *new_addr)
-{
- struct udphdr *udph;
-
- if (!pskb_may_pull(skb, thoff + sizeof(*udph)) ||
- skb_try_make_writable(skb, thoff + sizeof(*udph)))
- return -1;
-
- udph = (void *)(skb_network_header(skb) + thoff);
- if (udph->check || skb->ip_summed == CHECKSUM_PARTIAL) {
- inet_proto_csum_replace16(&udph->check, skb, addr->s6_addr32,
- new_addr->s6_addr32, true);
- if (!udph->check)
- udph->check = CSUM_MANGLED_0;
- }
-
- return 0;
-}
-
-static int nf_flow_nat_ipv6_l4proto(struct sk_buff *skb, struct ipv6hdr *ip6h,
- unsigned int thoff, struct in6_addr *addr,
- struct in6_addr *new_addr)
-{
- switch (ip6h->nexthdr) {
- case IPPROTO_TCP:
- if (nf_flow_nat_ipv6_tcp(skb, thoff, addr, new_addr) < 0)
- return NF_DROP;
- break;
- case IPPROTO_UDP:
- if (nf_flow_nat_ipv6_udp(skb, thoff, addr, new_addr) < 0)
- return NF_DROP;
- break;
- }
-
- return 0;
-}
-
-static int nf_flow_snat_ipv6(const struct flow_offload *flow,
- struct sk_buff *skb, struct ipv6hdr *ip6h,
- unsigned int thoff,
- enum flow_offload_tuple_dir dir)
-{
- struct in6_addr addr, new_addr;
-
- switch (dir) {
- case FLOW_OFFLOAD_DIR_ORIGINAL:
- addr = ip6h->saddr;
- new_addr = flow->tuplehash[FLOW_OFFLOAD_DIR_REPLY].tuple.dst_v6;
- ip6h->saddr = new_addr;
- break;
- case FLOW_OFFLOAD_DIR_REPLY:
- addr = ip6h->daddr;
- new_addr = flow->tuplehash[FLOW_OFFLOAD_DIR_ORIGINAL].tuple.src_v6;
- ip6h->daddr = new_addr;
- break;
- default:
- return -1;
- }
-
- return nf_flow_nat_ipv6_l4proto(skb, ip6h, thoff, &addr, &new_addr);
-}
-
-static int nf_flow_dnat_ipv6(const struct flow_offload *flow,
- struct sk_buff *skb, struct ipv6hdr *ip6h,
- unsigned int thoff,
- enum flow_offload_tuple_dir dir)
-{
- struct in6_addr addr, new_addr;
-
- switch (dir) {
- case FLOW_OFFLOAD_DIR_ORIGINAL:
- addr = ip6h->daddr;
- new_addr = flow->tuplehash[FLOW_OFFLOAD_DIR_REPLY].tuple.src_v6;
- ip6h->daddr = new_addr;
- break;
- case FLOW_OFFLOAD_DIR_REPLY:
- addr = ip6h->saddr;
- new_addr = flow->tuplehash[FLOW_OFFLOAD_DIR_ORIGINAL].tuple.dst_v6;
- ip6h->saddr = new_addr;
- break;
- default:
- return -1;
- }
-
- return nf_flow_nat_ipv6_l4proto(skb, ip6h, thoff, &addr, &new_addr);
-}
-
-static int nf_flow_nat_ipv6(const struct flow_offload *flow,
- struct sk_buff *skb,
- enum flow_offload_tuple_dir dir)
-{
- struct ipv6hdr *ip6h = ipv6_hdr(skb);
- unsigned int thoff = sizeof(*ip6h);
-
- if (flow->flags & FLOW_OFFLOAD_SNAT &&
- (nf_flow_snat_port(flow, skb, thoff, ip6h->nexthdr, dir) < 0 ||
- nf_flow_snat_ipv6(flow, skb, ip6h, thoff, dir) < 0))
- return -1;
- if (flow->flags & FLOW_OFFLOAD_DNAT &&
- (nf_flow_dnat_port(flow, skb, thoff, ip6h->nexthdr, dir) < 0 ||
- nf_flow_dnat_ipv6(flow, skb, ip6h, thoff, dir) < 0))
- return -1;
-
- return 0;
-}
-
-static int nf_flow_tuple_ipv6(struct sk_buff *skb, const struct net_device *dev,
- struct flow_offload_tuple *tuple)
-{
- struct flow_ports *ports;
- struct ipv6hdr *ip6h;
- unsigned int thoff;
-
- if (!pskb_may_pull(skb, sizeof(*ip6h)))
- return -1;
-
- ip6h = ipv6_hdr(skb);
-
- if (ip6h->nexthdr != IPPROTO_TCP &&
- ip6h->nexthdr != IPPROTO_UDP)
- return -1;
-
- thoff = sizeof(*ip6h);
- if (!pskb_may_pull(skb, thoff + sizeof(*ports)))
- return -1;
-
- ports = (struct flow_ports *)(skb_network_header(skb) + thoff);
-
- tuple->src_v6 = ip6h->saddr;
- tuple->dst_v6 = ip6h->daddr;
- tuple->src_port = ports->source;
- tuple->dst_port = ports->dest;
- tuple->l3proto = AF_INET6;
- tuple->l4proto = ip6h->nexthdr;
- tuple->iifidx = dev->ifindex;
-
- return 0;
-}
-
-/* Based on ip_exceeds_mtu(). */
-static bool __nf_flow_exceeds_mtu(const struct sk_buff *skb, unsigned int mtu)
-{
- if (skb->len <= mtu)
- return false;
-
- if (skb_is_gso(skb) && skb_gso_validate_network_len(skb, mtu))
- return false;
-
- return true;
-}
-
-static bool nf_flow_exceeds_mtu(struct sk_buff *skb, const struct rt6_info *rt)
-{
- u32 mtu;
-
- mtu = ip6_dst_mtu_forward(&rt->dst);
- if (__nf_flow_exceeds_mtu(skb, mtu))
- return true;
-
- return false;
-}
-
-unsigned int
-nf_flow_offload_ipv6_hook(void *priv, struct sk_buff *skb,
- const struct nf_hook_state *state)
-{
- struct flow_offload_tuple_rhash *tuplehash;
- struct nf_flowtable *flow_table = priv;
- struct flow_offload_tuple tuple = {};
- enum flow_offload_tuple_dir dir;
- struct flow_offload *flow;
- struct net_device *outdev;
- struct in6_addr *nexthop;
- struct ipv6hdr *ip6h;
- struct rt6_info *rt;
-
- if (skb->protocol != htons(ETH_P_IPV6))
- return NF_ACCEPT;
-
- if (nf_flow_tuple_ipv6(skb, state->in, &tuple) < 0)
- return NF_ACCEPT;
-
- tuplehash = flow_offload_lookup(flow_table, &tuple);
- if (tuplehash == NULL)
- return NF_ACCEPT;
-
- outdev = dev_get_by_index_rcu(state->net, tuplehash->tuple.oifidx);
- if (!outdev)
- return NF_ACCEPT;
-
- dir = tuplehash->tuple.dir;
- flow = container_of(tuplehash, struct flow_offload, tuplehash[dir]);
-
- rt = (struct rt6_info *)flow->tuplehash[dir].tuple.dst_cache;
- if (unlikely(nf_flow_exceeds_mtu(skb, rt)))
- return NF_ACCEPT;
-
- if (skb_try_make_writable(skb, sizeof(*ip6h)))
- return NF_DROP;
-
- if (flow->flags & (FLOW_OFFLOAD_SNAT | FLOW_OFFLOAD_DNAT) &&
- nf_flow_nat_ipv6(flow, skb, dir) < 0)
- return NF_DROP;
-
- flow->timeout = (u32)jiffies + NF_FLOW_TIMEOUT;
- ip6h = ipv6_hdr(skb);
- ip6h->hop_limit--;
-
- skb->dev = outdev;
- nexthop = rt6_nexthop(rt, &flow->tuplehash[!dir].tuple.src_v6);
- neigh_xmit(NEIGH_ND_TABLE, outdev, nexthop, skb);
-
- return NF_STOLEN;
-}
-EXPORT_SYMBOL_GPL(nf_flow_offload_ipv6_hook);
static struct nf_flowtable_type flowtable_ipv6 = {
.family = NFPROTO_IPV6,
- .params = &nf_flow_offload_rhash_params,
- .gc = nf_flow_offload_work_gc,
+ .init = nf_flow_table_init,
.free = nf_flow_table_free,
.hook = nf_flow_offload_ipv6_hook,
.owner = THIS_MODULE,
diff --git a/net/ipv6/netfilter/nf_nat_l3proto_ipv6.c b/net/ipv6/netfilter/nf_nat_l3proto_ipv6.c
index 6b7f075..ca6d386 100644
--- a/net/ipv6/netfilter/nf_nat_l3proto_ipv6.c
+++ b/net/ipv6/netfilter/nf_nat_l3proto_ipv6.c
@@ -62,7 +62,7 @@
#endif
static bool nf_nat_ipv6_in_range(const struct nf_conntrack_tuple *t,
- const struct nf_nat_range *range)
+ const struct nf_nat_range2 *range)
{
return ipv6_addr_cmp(&t->src.u3.in6, &range->min_addr.in6) >= 0 &&
ipv6_addr_cmp(&t->src.u3.in6, &range->max_addr.in6) <= 0;
@@ -151,7 +151,7 @@
#if IS_ENABLED(CONFIG_NF_CT_NETLINK)
static int nf_nat_ipv6_nlattr_to_range(struct nlattr *tb[],
- struct nf_nat_range *range)
+ struct nf_nat_range2 *range)
{
if (tb[CTA_NAT_V6_MINIP]) {
nla_memcpy(&range->min_addr.ip6, tb[CTA_NAT_V6_MINIP],
@@ -252,18 +252,12 @@
}
EXPORT_SYMBOL_GPL(nf_nat_icmpv6_reply_translation);
-unsigned int
+static unsigned int
nf_nat_ipv6_fn(void *priv, struct sk_buff *skb,
- const struct nf_hook_state *state,
- unsigned int (*do_chain)(void *priv,
- struct sk_buff *skb,
- const struct nf_hook_state *state,
- struct nf_conn *ct))
+ const struct nf_hook_state *state)
{
struct nf_conn *ct;
enum ip_conntrack_info ctinfo;
- struct nf_conn_nat *nat;
- enum nf_nat_manip_type maniptype = HOOK2MANIP(state->hook);
__be16 frag_off;
int hdrlen;
u8 nexthdr;
@@ -277,11 +271,7 @@
if (!ct)
return NF_ACCEPT;
- nat = nfct_nat(ct);
-
- switch (ctinfo) {
- case IP_CT_RELATED:
- case IP_CT_RELATED_REPLY:
+ if (ctinfo == IP_CT_RELATED || ctinfo == IP_CT_RELATED_REPLY) {
nexthdr = ipv6_hdr(skb)->nexthdr;
hdrlen = ipv6_skip_exthdr(skb, sizeof(struct ipv6hdr),
&nexthdr, &frag_off);
@@ -294,77 +284,29 @@
else
return NF_ACCEPT;
}
- /* Only ICMPs can be IP_CT_IS_REPLY: */
- /* fall through */
- case IP_CT_NEW:
- /* Seen it before? This can happen for loopback, retrans,
- * or local packets.
- */
- if (!nf_nat_initialized(ct, maniptype)) {
- unsigned int ret;
-
- ret = do_chain(priv, skb, state, ct);
- if (ret != NF_ACCEPT)
- return ret;
-
- if (nf_nat_initialized(ct, HOOK2MANIP(state->hook)))
- break;
-
- ret = nf_nat_alloc_null_binding(ct, state->hook);
- if (ret != NF_ACCEPT)
- return ret;
- } else {
- pr_debug("Already setup manip %s for ct %p\n",
- maniptype == NF_NAT_MANIP_SRC ? "SRC" : "DST",
- ct);
- if (nf_nat_oif_changed(state->hook, ctinfo, nat, state->out))
- goto oif_changed;
- }
- break;
-
- default:
- /* ESTABLISHED */
- WARN_ON(ctinfo != IP_CT_ESTABLISHED &&
- ctinfo != IP_CT_ESTABLISHED_REPLY);
- if (nf_nat_oif_changed(state->hook, ctinfo, nat, state->out))
- goto oif_changed;
}
- return nf_nat_packet(ct, ctinfo, state->hook, skb);
-
-oif_changed:
- nf_ct_kill_acct(ct, ctinfo, skb);
- return NF_DROP;
+ return nf_nat_inet_fn(priv, skb, state);
}
-EXPORT_SYMBOL_GPL(nf_nat_ipv6_fn);
-unsigned int
+static unsigned int
nf_nat_ipv6_in(void *priv, struct sk_buff *skb,
- const struct nf_hook_state *state,
- unsigned int (*do_chain)(void *priv,
- struct sk_buff *skb,
- const struct nf_hook_state *state,
- struct nf_conn *ct))
+ const struct nf_hook_state *state)
{
unsigned int ret;
struct in6_addr daddr = ipv6_hdr(skb)->daddr;
- ret = nf_nat_ipv6_fn(priv, skb, state, do_chain);
+ ret = nf_nat_ipv6_fn(priv, skb, state);
if (ret != NF_DROP && ret != NF_STOLEN &&
ipv6_addr_cmp(&daddr, &ipv6_hdr(skb)->daddr))
skb_dst_drop(skb);
return ret;
}
-EXPORT_SYMBOL_GPL(nf_nat_ipv6_in);
-unsigned int
+static unsigned int
nf_nat_ipv6_out(void *priv, struct sk_buff *skb,
- const struct nf_hook_state *state,
- unsigned int (*do_chain)(void *priv,
- struct sk_buff *skb,
- const struct nf_hook_state *state,
- struct nf_conn *ct))
+ const struct nf_hook_state *state)
{
#ifdef CONFIG_XFRM
const struct nf_conn *ct;
@@ -373,7 +315,7 @@
#endif
unsigned int ret;
- ret = nf_nat_ipv6_fn(priv, skb, state, do_chain);
+ ret = nf_nat_ipv6_fn(priv, skb, state);
#ifdef CONFIG_XFRM
if (ret != NF_DROP && ret != NF_STOLEN &&
!(IP6CB(skb)->flags & IP6SKB_XFRM_TRANSFORMED) &&
@@ -393,22 +335,17 @@
#endif
return ret;
}
-EXPORT_SYMBOL_GPL(nf_nat_ipv6_out);
-unsigned int
+static unsigned int
nf_nat_ipv6_local_fn(void *priv, struct sk_buff *skb,
- const struct nf_hook_state *state,
- unsigned int (*do_chain)(void *priv,
- struct sk_buff *skb,
- const struct nf_hook_state *state,
- struct nf_conn *ct))
+ const struct nf_hook_state *state)
{
const struct nf_conn *ct;
enum ip_conntrack_info ctinfo;
unsigned int ret;
int err;
- ret = nf_nat_ipv6_fn(priv, skb, state, do_chain);
+ ret = nf_nat_ipv6_fn(priv, skb, state);
if (ret != NF_DROP && ret != NF_STOLEN &&
(ct = nf_ct_get(skb, &ctinfo)) != NULL) {
enum ip_conntrack_dir dir = CTINFO2DIR(ctinfo);
@@ -432,7 +369,49 @@
}
return ret;
}
-EXPORT_SYMBOL_GPL(nf_nat_ipv6_local_fn);
+
+static const struct nf_hook_ops nf_nat_ipv6_ops[] = {
+ /* Before packet filtering, change destination */
+ {
+ .hook = nf_nat_ipv6_in,
+ .pf = NFPROTO_IPV6,
+ .hooknum = NF_INET_PRE_ROUTING,
+ .priority = NF_IP6_PRI_NAT_DST,
+ },
+ /* After packet filtering, change source */
+ {
+ .hook = nf_nat_ipv6_out,
+ .pf = NFPROTO_IPV6,
+ .hooknum = NF_INET_POST_ROUTING,
+ .priority = NF_IP6_PRI_NAT_SRC,
+ },
+ /* Before packet filtering, change destination */
+ {
+ .hook = nf_nat_ipv6_local_fn,
+ .pf = NFPROTO_IPV6,
+ .hooknum = NF_INET_LOCAL_OUT,
+ .priority = NF_IP6_PRI_NAT_DST,
+ },
+ /* After packet filtering, change source */
+ {
+ .hook = nf_nat_ipv6_fn,
+ .pf = NFPROTO_IPV6,
+ .hooknum = NF_INET_LOCAL_IN,
+ .priority = NF_IP6_PRI_NAT_SRC,
+ },
+};
+
+int nf_nat_l3proto_ipv6_register_fn(struct net *net, const struct nf_hook_ops *ops)
+{
+ return nf_nat_register_fn(net, ops, nf_nat_ipv6_ops, ARRAY_SIZE(nf_nat_ipv6_ops));
+}
+EXPORT_SYMBOL_GPL(nf_nat_l3proto_ipv6_register_fn);
+
+void nf_nat_l3proto_ipv6_unregister_fn(struct net *net, const struct nf_hook_ops *ops)
+{
+ nf_nat_unregister_fn(net, ops, ARRAY_SIZE(nf_nat_ipv6_ops));
+}
+EXPORT_SYMBOL_GPL(nf_nat_l3proto_ipv6_unregister_fn);
static int __init nf_nat_l3proto_ipv6_init(void)
{
diff --git a/net/ipv6/netfilter/nf_nat_masquerade_ipv6.c b/net/ipv6/netfilter/nf_nat_masquerade_ipv6.c
index 98f61fc..9dfc2b9 100644
--- a/net/ipv6/netfilter/nf_nat_masquerade_ipv6.c
+++ b/net/ipv6/netfilter/nf_nat_masquerade_ipv6.c
@@ -26,14 +26,14 @@
static atomic_t v6_worker_count;
unsigned int
-nf_nat_masquerade_ipv6(struct sk_buff *skb, const struct nf_nat_range *range,
+nf_nat_masquerade_ipv6(struct sk_buff *skb, const struct nf_nat_range2 *range,
const struct net_device *out)
{
enum ip_conntrack_info ctinfo;
struct nf_conn_nat *nat;
struct in6_addr src;
struct nf_conn *ct;
- struct nf_nat_range newrange;
+ struct nf_nat_range2 newrange;
ct = nf_ct_get(skb, &ctinfo);
WARN_ON(!(ct && (ctinfo == IP_CT_NEW || ctinfo == IP_CT_RELATED ||
diff --git a/net/ipv6/netfilter/nf_nat_proto_icmpv6.c b/net/ipv6/netfilter/nf_nat_proto_icmpv6.c
index 57593b00..d9bf42b 100644
--- a/net/ipv6/netfilter/nf_nat_proto_icmpv6.c
+++ b/net/ipv6/netfilter/nf_nat_proto_icmpv6.c
@@ -32,7 +32,7 @@
static void
icmpv6_unique_tuple(const struct nf_nat_l3proto *l3proto,
struct nf_conntrack_tuple *tuple,
- const struct nf_nat_range *range,
+ const struct nf_nat_range2 *range,
enum nf_nat_manip_type maniptype,
const struct nf_conn *ct)
{
diff --git a/net/ipv6/netfilter/nft_chain_nat_ipv6.c b/net/ipv6/netfilter/nft_chain_nat_ipv6.c
index 3557b11..8a081ad 100644
--- a/net/ipv6/netfilter/nft_chain_nat_ipv6.c
+++ b/net/ipv6/netfilter/nft_chain_nat_ipv6.c
@@ -26,8 +26,7 @@
static unsigned int nft_nat_do_chain(void *priv,
struct sk_buff *skb,
- const struct nf_hook_state *state,
- struct nf_conn *ct)
+ const struct nf_hook_state *state)
{
struct nft_pktinfo pkt;
@@ -37,42 +36,14 @@
return nft_do_chain(&pkt, priv);
}
-static unsigned int nft_nat_ipv6_fn(void *priv,
- struct sk_buff *skb,
- const struct nf_hook_state *state)
+static int nft_nat_ipv6_reg(struct net *net, const struct nf_hook_ops *ops)
{
- return nf_nat_ipv6_fn(priv, skb, state, nft_nat_do_chain);
+ return nf_nat_l3proto_ipv6_register_fn(net, ops);
}
-static unsigned int nft_nat_ipv6_in(void *priv,
- struct sk_buff *skb,
- const struct nf_hook_state *state)
+static void nft_nat_ipv6_unreg(struct net *net, const struct nf_hook_ops *ops)
{
- return nf_nat_ipv6_in(priv, skb, state, nft_nat_do_chain);
-}
-
-static unsigned int nft_nat_ipv6_out(void *priv,
- struct sk_buff *skb,
- const struct nf_hook_state *state)
-{
- return nf_nat_ipv6_out(priv, skb, state, nft_nat_do_chain);
-}
-
-static unsigned int nft_nat_ipv6_local_fn(void *priv,
- struct sk_buff *skb,
- const struct nf_hook_state *state)
-{
- return nf_nat_ipv6_local_fn(priv, skb, state, nft_nat_do_chain);
-}
-
-static int nft_nat_ipv6_init(struct nft_ctx *ctx)
-{
- return nf_ct_netns_get(ctx->net, ctx->family);
-}
-
-static void nft_nat_ipv6_free(struct nft_ctx *ctx)
-{
- nf_ct_netns_put(ctx->net, ctx->family);
+ nf_nat_l3proto_ipv6_unregister_fn(net, ops);
}
static const struct nft_chain_type nft_chain_nat_ipv6 = {
@@ -85,13 +56,13 @@
(1 << NF_INET_LOCAL_OUT) |
(1 << NF_INET_LOCAL_IN),
.hooks = {
- [NF_INET_PRE_ROUTING] = nft_nat_ipv6_in,
- [NF_INET_POST_ROUTING] = nft_nat_ipv6_out,
- [NF_INET_LOCAL_OUT] = nft_nat_ipv6_local_fn,
- [NF_INET_LOCAL_IN] = nft_nat_ipv6_fn,
+ [NF_INET_PRE_ROUTING] = nft_nat_do_chain,
+ [NF_INET_POST_ROUTING] = nft_nat_do_chain,
+ [NF_INET_LOCAL_OUT] = nft_nat_do_chain,
+ [NF_INET_LOCAL_IN] = nft_nat_do_chain,
},
- .init = nft_nat_ipv6_init,
- .free = nft_nat_ipv6_free,
+ .ops_register = nft_nat_ipv6_reg,
+ .ops_unregister = nft_nat_ipv6_unreg,
};
static int __init nft_chain_nat_ipv6_init(void)
diff --git a/net/ipv6/netfilter/nft_masq_ipv6.c b/net/ipv6/netfilter/nft_masq_ipv6.c
index 4146536..dd0122f 100644
--- a/net/ipv6/netfilter/nft_masq_ipv6.c
+++ b/net/ipv6/netfilter/nft_masq_ipv6.c
@@ -22,7 +22,7 @@
const struct nft_pktinfo *pkt)
{
struct nft_masq *priv = nft_expr_priv(expr);
- struct nf_nat_range range;
+ struct nf_nat_range2 range;
memset(&range, 0, sizeof(range));
range.flags = priv->flags;
diff --git a/net/ipv6/netfilter/nft_redir_ipv6.c b/net/ipv6/netfilter/nft_redir_ipv6.c
index a27e424..7426986 100644
--- a/net/ipv6/netfilter/nft_redir_ipv6.c
+++ b/net/ipv6/netfilter/nft_redir_ipv6.c
@@ -22,7 +22,7 @@
const struct nft_pktinfo *pkt)
{
struct nft_redir *priv = nft_expr_priv(expr);
- struct nf_nat_range range;
+ struct nf_nat_range2 range;
memset(&range, 0, sizeof(range));
if (priv->sreg_proto_min) {
diff --git a/net/ipv6/reassembly.c b/net/ipv6/reassembly.c
index 4979610..b939b94 100644
--- a/net/ipv6/reassembly.c
+++ b/net/ipv6/reassembly.c
@@ -163,7 +163,8 @@
}
static int ip6_frag_queue(struct frag_queue *fq, struct sk_buff *skb,
- struct frag_hdr *fhdr, int nhoff)
+ struct frag_hdr *fhdr, int nhoff,
+ u32 *prob_offset)
{
struct sk_buff *prev, *next;
struct net_device *dev;
@@ -179,11 +180,7 @@
((u8 *)(fhdr + 1) - (u8 *)(ipv6_hdr(skb) + 1)));
if ((unsigned int)end > IPV6_MAXPLEN) {
- __IP6_INC_STATS(net, ip6_dst_idev(skb_dst(skb)),
- IPSTATS_MIB_INHDRERRORS);
- icmpv6_param_prob(skb, ICMPV6_HDR_FIELD,
- ((u8 *)&fhdr->frag_off -
- skb_network_header(skb)));
+ *prob_offset = (u8 *)&fhdr->frag_off - skb_network_header(skb);
return -1;
}
@@ -214,10 +211,7 @@
/* RFC2460 says always send parameter problem in
* this case. -DaveM
*/
- __IP6_INC_STATS(net, ip6_dst_idev(skb_dst(skb)),
- IPSTATS_MIB_INHDRERRORS);
- icmpv6_param_prob(skb, ICMPV6_HDR_FIELD,
- offsetof(struct ipv6hdr, payload_len));
+ *prob_offset = offsetof(struct ipv6hdr, payload_len);
return -1;
}
if (end > fq->q.len) {
@@ -519,15 +513,22 @@
iif = skb->dev ? skb->dev->ifindex : 0;
fq = fq_find(net, fhdr->identification, hdr, iif);
if (fq) {
+ u32 prob_offset = 0;
int ret;
spin_lock(&fq->q.lock);
fq->iif = iif;
- ret = ip6_frag_queue(fq, skb, fhdr, IP6CB(skb)->nhoff);
+ ret = ip6_frag_queue(fq, skb, fhdr, IP6CB(skb)->nhoff,
+ &prob_offset);
spin_unlock(&fq->q.lock);
inet_frag_put(&fq->q);
+ if (prob_offset) {
+ __IP6_INC_STATS(net, __in6_dev_get_safely(skb->dev),
+ IPSTATS_MIB_INHDRERRORS);
+ icmpv6_param_prob(skb, ICMPV6_HDR_FIELD, prob_offset);
+ }
return ret;
}
@@ -536,7 +537,7 @@
return -1;
fail_hdr:
- __IP6_INC_STATS(net, ip6_dst_idev(skb_dst(skb)),
+ __IP6_INC_STATS(net, __in6_dev_get_safely(skb->dev),
IPSTATS_MIB_INHDRERRORS);
icmpv6_param_prob(skb, ICMPV6_HDR_FIELD, skb_network_header_len(skb));
return -1;
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index f4d6173..22c4de2 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -63,14 +63,20 @@
#include <net/lwtunnel.h>
#include <net/ip_tunnels.h>
#include <net/l3mdev.h>
-#include <trace/events/fib6.h>
-
+#include <net/ip.h>
#include <linux/uaccess.h>
#ifdef CONFIG_SYSCTL
#include <linux/sysctl.h>
#endif
+static int ip6_rt_type_to_error(u8 fib6_type);
+
+#define CREATE_TRACE_POINTS
+#include <trace/events/fib6.h>
+EXPORT_TRACEPOINT_SYMBOL_GPL(fib6_table_lookup);
+#undef CREATE_TRACE_POINTS
+
enum rt6_nud_state {
RT6_NUD_FAIL_HARD = -3,
RT6_NUD_FAIL_PROBE = -2,
@@ -78,7 +84,6 @@
RT6_NUD_SUCCEED = 1
};
-static void ip6_rt_copy_init(struct rt6_info *rt, struct rt6_info *ort);
static struct dst_entry *ip6_dst_check(struct dst_entry *dst, u32 cookie);
static unsigned int ip6_default_advmss(const struct dst_entry *dst);
static unsigned int ip6_mtu(const struct dst_entry *dst);
@@ -97,25 +102,24 @@
struct sk_buff *skb, u32 mtu);
static void rt6_do_redirect(struct dst_entry *dst, struct sock *sk,
struct sk_buff *skb);
-static void rt6_dst_from_metrics_check(struct rt6_info *rt);
-static int rt6_score_route(struct rt6_info *rt, int oif, int strict);
-static size_t rt6_nlmsg_size(struct rt6_info *rt);
-static int rt6_fill_node(struct net *net,
- struct sk_buff *skb, struct rt6_info *rt,
- struct in6_addr *dst, struct in6_addr *src,
+static int rt6_score_route(struct fib6_info *rt, int oif, int strict);
+static size_t rt6_nlmsg_size(struct fib6_info *rt);
+static int rt6_fill_node(struct net *net, struct sk_buff *skb,
+ struct fib6_info *rt, struct dst_entry *dst,
+ struct in6_addr *dest, struct in6_addr *src,
int iif, int type, u32 portid, u32 seq,
unsigned int flags);
-static struct rt6_info *rt6_find_cached_rt(struct rt6_info *rt,
+static struct rt6_info *rt6_find_cached_rt(struct fib6_info *rt,
struct in6_addr *daddr,
struct in6_addr *saddr);
#ifdef CONFIG_IPV6_ROUTE_INFO
-static struct rt6_info *rt6_add_route_info(struct net *net,
+static struct fib6_info *rt6_add_route_info(struct net *net,
const struct in6_addr *prefix, int prefixlen,
const struct in6_addr *gwaddr,
struct net_device *dev,
unsigned int pref);
-static struct rt6_info *rt6_get_route_info(struct net *net,
+static struct fib6_info *rt6_get_route_info(struct net *net,
const struct in6_addr *prefix, int prefixlen,
const struct in6_addr *gwaddr,
struct net_device *dev);
@@ -184,29 +188,10 @@
}
}
-static u32 *rt6_pcpu_cow_metrics(struct rt6_info *rt)
-{
- return dst_metrics_write_ptr(&rt->from->dst);
-}
-
-static u32 *ipv6_cow_metrics(struct dst_entry *dst, unsigned long old)
-{
- struct rt6_info *rt = (struct rt6_info *)dst;
-
- if (rt->rt6i_flags & RTF_PCPU)
- return rt6_pcpu_cow_metrics(rt);
- else if (rt->rt6i_flags & RTF_CACHE)
- return NULL;
- else
- return dst_cow_metrics_generic(dst, old);
-}
-
-static inline const void *choose_neigh_daddr(struct rt6_info *rt,
+static inline const void *choose_neigh_daddr(const struct in6_addr *p,
struct sk_buff *skb,
const void *daddr)
{
- struct in6_addr *p = &rt->rt6i_gateway;
-
if (!ipv6_addr_any(p))
return (const void *) p;
else if (skb)
@@ -214,18 +199,27 @@
return daddr;
}
-static struct neighbour *ip6_neigh_lookup(const struct dst_entry *dst,
- struct sk_buff *skb,
- const void *daddr)
+struct neighbour *ip6_neigh_lookup(const struct in6_addr *gw,
+ struct net_device *dev,
+ struct sk_buff *skb,
+ const void *daddr)
{
- struct rt6_info *rt = (struct rt6_info *) dst;
struct neighbour *n;
- daddr = choose_neigh_daddr(rt, skb, daddr);
- n = __ipv6_neigh_lookup(dst->dev, daddr);
+ daddr = choose_neigh_daddr(gw, skb, daddr);
+ n = __ipv6_neigh_lookup(dev, daddr);
if (n)
return n;
- return neigh_create(&nd_tbl, daddr, dst->dev);
+ return neigh_create(&nd_tbl, daddr, dev);
+}
+
+static struct neighbour *ip6_dst_neigh_lookup(const struct dst_entry *dst,
+ struct sk_buff *skb,
+ const void *daddr)
+{
+ const struct rt6_info *rt = container_of(dst, struct rt6_info, dst);
+
+ return ip6_neigh_lookup(&rt->rt6i_gateway, dst->dev, skb, daddr);
}
static void ip6_confirm_neigh(const struct dst_entry *dst, const void *daddr)
@@ -233,7 +227,7 @@
struct net_device *dev = dst->dev;
struct rt6_info *rt = (struct rt6_info *)dst;
- daddr = choose_neigh_daddr(rt, NULL, daddr);
+ daddr = choose_neigh_daddr(&rt->rt6i_gateway, NULL, daddr);
if (!daddr)
return;
if (dev->flags & (IFF_NOARP | IFF_LOOPBACK))
@@ -250,7 +244,7 @@
.check = ip6_dst_check,
.default_advmss = ip6_default_advmss,
.mtu = ip6_mtu,
- .cow_metrics = ipv6_cow_metrics,
+ .cow_metrics = dst_cow_metrics_generic,
.destroy = ip6_dst_destroy,
.ifdown = ip6_dst_ifdown,
.negative_advice = ip6_negative_advice,
@@ -258,7 +252,7 @@
.update_pmtu = ip6_rt_update_pmtu,
.redirect = rt6_do_redirect,
.local_out = __ip6_local_out,
- .neigh_lookup = ip6_neigh_lookup,
+ .neigh_lookup = ip6_dst_neigh_lookup,
.confirm_neigh = ip6_confirm_neigh,
};
@@ -288,13 +282,22 @@
.update_pmtu = ip6_rt_blackhole_update_pmtu,
.redirect = ip6_rt_blackhole_redirect,
.cow_metrics = dst_cow_metrics_generic,
- .neigh_lookup = ip6_neigh_lookup,
+ .neigh_lookup = ip6_dst_neigh_lookup,
};
static const u32 ip6_template_metrics[RTAX_MAX] = {
[RTAX_HOPLIMIT - 1] = 0,
};
+static const struct fib6_info fib6_null_entry_template = {
+ .fib6_flags = (RTF_REJECT | RTF_NONEXTHOP),
+ .fib6_protocol = RTPROT_KERNEL,
+ .fib6_metric = ~(u32)0,
+ .fib6_ref = ATOMIC_INIT(1),
+ .fib6_type = RTN_UNREACHABLE,
+ .fib6_metrics = (struct dst_metrics *)&dst_default_metrics,
+};
+
static const struct rt6_info ip6_null_entry_template = {
.dst = {
.__refcnt = ATOMIC_INIT(1),
@@ -305,9 +308,6 @@
.output = ip6_pkt_discard_out,
},
.rt6i_flags = (RTF_REJECT | RTF_NONEXTHOP),
- .rt6i_protocol = RTPROT_KERNEL,
- .rt6i_metric = ~(u32) 0,
- .rt6i_ref = ATOMIC_INIT(1),
};
#ifdef CONFIG_IPV6_MULTIPLE_TABLES
@@ -322,9 +322,6 @@
.output = ip6_pkt_prohibit_out,
},
.rt6i_flags = (RTF_REJECT | RTF_NONEXTHOP),
- .rt6i_protocol = RTPROT_KERNEL,
- .rt6i_metric = ~(u32) 0,
- .rt6i_ref = ATOMIC_INIT(1),
};
static const struct rt6_info ip6_blk_hole_entry_template = {
@@ -337,9 +334,6 @@
.output = dst_discard_out,
},
.rt6i_flags = (RTF_REJECT | RTF_NONEXTHOP),
- .rt6i_protocol = RTPROT_KERNEL,
- .rt6i_metric = ~(u32) 0,
- .rt6i_ref = ATOMIC_INIT(1),
};
#endif
@@ -349,14 +343,12 @@
struct dst_entry *dst = &rt->dst;
memset(dst + 1, 0, sizeof(*rt) - sizeof(*dst));
- INIT_LIST_HEAD(&rt->rt6i_siblings);
INIT_LIST_HEAD(&rt->rt6i_uncached);
}
/* allocate dst with ip6_dst_ops */
-static struct rt6_info *__ip6_dst_alloc(struct net *net,
- struct net_device *dev,
- int flags)
+struct rt6_info *ip6_dst_alloc(struct net *net, struct net_device *dev,
+ int flags)
{
struct rt6_info *rt = dst_alloc(&net->ipv6.ip6_dst_ops, dev,
1, DST_OBSOLETE_FORCE_CHK, flags);
@@ -368,34 +360,15 @@
return rt;
}
-
-struct rt6_info *ip6_dst_alloc(struct net *net,
- struct net_device *dev,
- int flags)
-{
- struct rt6_info *rt = __ip6_dst_alloc(net, dev, flags);
-
- if (rt) {
- rt->rt6i_pcpu = alloc_percpu_gfp(struct rt6_info *, GFP_ATOMIC);
- if (!rt->rt6i_pcpu) {
- dst_release_immediate(&rt->dst);
- return NULL;
- }
- }
-
- return rt;
-}
EXPORT_SYMBOL(ip6_dst_alloc);
static void ip6_dst_destroy(struct dst_entry *dst)
{
struct rt6_info *rt = (struct rt6_info *)dst;
- struct rt6_exception_bucket *bucket;
- struct rt6_info *from = rt->from;
+ struct fib6_info *from;
struct inet6_dev *idev;
dst_destroy_metrics_generic(dst);
- free_percpu(rt->rt6i_pcpu);
rt6_uncached_list_del(rt);
idev = rt->rt6i_idev;
@@ -403,14 +376,12 @@
rt->rt6i_idev = NULL;
in6_dev_put(idev);
}
- bucket = rcu_dereference_protected(rt->rt6i_exception_bucket, 1);
- if (bucket) {
- rt->rt6i_exception_bucket = NULL;
- kfree(bucket);
- }
- rt->from = NULL;
- dst_release(&from->dst);
+ rcu_read_lock();
+ from = rcu_dereference(rt->from);
+ rcu_assign_pointer(rt->from, NULL);
+ fib6_info_release(from);
+ rcu_read_unlock();
}
static void ip6_dst_ifdown(struct dst_entry *dst, struct net_device *dev,
@@ -440,23 +411,27 @@
static bool rt6_check_expired(const struct rt6_info *rt)
{
+ struct fib6_info *from;
+
+ from = rcu_dereference(rt->from);
+
if (rt->rt6i_flags & RTF_EXPIRES) {
if (time_after(jiffies, rt->dst.expires))
return true;
- } else if (rt->from) {
+ } else if (from) {
return rt->dst.obsolete != DST_OBSOLETE_FORCE_CHK ||
- rt6_check_expired(rt->from);
+ fib6_check_expired(from);
}
return false;
}
-static struct rt6_info *rt6_multipath_select(const struct net *net,
- struct rt6_info *match,
- struct flowi6 *fl6, int oif,
- const struct sk_buff *skb,
- int strict)
+struct fib6_info *fib6_multipath_select(const struct net *net,
+ struct fib6_info *match,
+ struct flowi6 *fl6, int oif,
+ const struct sk_buff *skb,
+ int strict)
{
- struct rt6_info *sibling, *next_sibling;
+ struct fib6_info *sibling, *next_sibling;
/* We might have already computed the hash for ICMPv6 errors. In such
* case it will always be non-zero. Otherwise now is the time to do it.
@@ -464,12 +439,15 @@
if (!fl6->mp_hash)
fl6->mp_hash = rt6_multipath_hash(net, fl6, skb, NULL);
- if (fl6->mp_hash <= atomic_read(&match->rt6i_nh_upper_bound))
+ if (fl6->mp_hash <= atomic_read(&match->fib6_nh.nh_upper_bound))
return match;
- list_for_each_entry_safe(sibling, next_sibling, &match->rt6i_siblings,
- rt6i_siblings) {
- if (fl6->mp_hash > atomic_read(&sibling->rt6i_nh_upper_bound))
+ list_for_each_entry_safe(sibling, next_sibling, &match->fib6_siblings,
+ fib6_siblings) {
+ int nh_upper_bound;
+
+ nh_upper_bound = atomic_read(&sibling->fib6_nh.nh_upper_bound);
+ if (fl6->mp_hash > nh_upper_bound)
continue;
if (rt6_score_route(sibling, oif, strict) < 0)
break;
@@ -484,38 +462,27 @@
* Route lookup. rcu_read_lock() should be held.
*/
-static inline struct rt6_info *rt6_device_match(struct net *net,
- struct rt6_info *rt,
+static inline struct fib6_info *rt6_device_match(struct net *net,
+ struct fib6_info *rt,
const struct in6_addr *saddr,
int oif,
int flags)
{
- struct rt6_info *local = NULL;
- struct rt6_info *sprt;
+ struct fib6_info *sprt;
- if (!oif && ipv6_addr_any(saddr) && !(rt->rt6i_nh_flags & RTNH_F_DEAD))
+ if (!oif && ipv6_addr_any(saddr) &&
+ !(rt->fib6_nh.nh_flags & RTNH_F_DEAD))
return rt;
- for (sprt = rt; sprt; sprt = rcu_dereference(sprt->rt6_next)) {
- struct net_device *dev = sprt->dst.dev;
+ for (sprt = rt; sprt; sprt = rcu_dereference(sprt->fib6_next)) {
+ const struct net_device *dev = sprt->fib6_nh.nh_dev;
- if (sprt->rt6i_nh_flags & RTNH_F_DEAD)
+ if (sprt->fib6_nh.nh_flags & RTNH_F_DEAD)
continue;
if (oif) {
if (dev->ifindex == oif)
return sprt;
- if (dev->flags & IFF_LOOPBACK) {
- if (!sprt->rt6i_idev ||
- sprt->rt6i_idev->dev->ifindex != oif) {
- if (flags & RT6_LOOKUP_F_IFACE)
- continue;
- if (local &&
- local->rt6i_idev->dev->ifindex == oif)
- continue;
- }
- local = sprt;
- }
} else {
if (ipv6_chk_addr(net, saddr, dev,
flags & RT6_LOOKUP_F_IFACE))
@@ -523,15 +490,10 @@
}
}
- if (oif) {
- if (local)
- return local;
+ if (oif && flags & RT6_LOOKUP_F_IFACE)
+ return net->ipv6.fib6_null_entry;
- if (flags & RT6_LOOKUP_F_IFACE)
- return net->ipv6.ip6_null_entry;
- }
-
- return rt->rt6i_nh_flags & RTNH_F_DEAD ? net->ipv6.ip6_null_entry : rt;
+ return rt->fib6_nh.nh_flags & RTNH_F_DEAD ? net->ipv6.fib6_null_entry : rt;
}
#ifdef CONFIG_IPV6_ROUTER_PREF
@@ -553,10 +515,13 @@
kfree(work);
}
-static void rt6_probe(struct rt6_info *rt)
+static void rt6_probe(struct fib6_info *rt)
{
struct __rt6_probe_work *work;
+ const struct in6_addr *nh_gw;
struct neighbour *neigh;
+ struct net_device *dev;
+
/*
* Okay, this does not seem to be appropriate
* for now, however, we need to check if it
@@ -565,20 +530,25 @@
* Router Reachability Probe MUST be rate-limited
* to no more than one per minute.
*/
- if (!rt || !(rt->rt6i_flags & RTF_GATEWAY))
+ if (!rt || !(rt->fib6_flags & RTF_GATEWAY))
return;
+
+ nh_gw = &rt->fib6_nh.nh_gw;
+ dev = rt->fib6_nh.nh_dev;
rcu_read_lock_bh();
- neigh = __ipv6_neigh_lookup_noref(rt->dst.dev, &rt->rt6i_gateway);
+ neigh = __ipv6_neigh_lookup_noref(dev, nh_gw);
if (neigh) {
+ struct inet6_dev *idev;
+
if (neigh->nud_state & NUD_VALID)
goto out;
+ idev = __in6_dev_get(dev);
work = NULL;
write_lock(&neigh->lock);
if (!(neigh->nud_state & NUD_VALID) &&
time_after(jiffies,
- neigh->updated +
- rt->rt6i_idev->cnf.rtr_probe_interval)) {
+ neigh->updated + idev->cnf.rtr_probe_interval)) {
work = kmalloc(sizeof(*work), GFP_ATOMIC);
if (work)
__neigh_set_probe_once(neigh);
@@ -590,9 +560,9 @@
if (work) {
INIT_WORK(&work->work, rt6_probe_deferred);
- work->target = rt->rt6i_gateway;
- dev_hold(rt->dst.dev);
- work->dev = rt->dst.dev;
+ work->target = *nh_gw;
+ dev_hold(dev);
+ work->dev = dev;
schedule_work(&work->work);
}
@@ -600,7 +570,7 @@
rcu_read_unlock_bh();
}
#else
-static inline void rt6_probe(struct rt6_info *rt)
+static inline void rt6_probe(struct fib6_info *rt)
{
}
#endif
@@ -608,28 +578,27 @@
/*
* Default Router Selection (RFC 2461 6.3.6)
*/
-static inline int rt6_check_dev(struct rt6_info *rt, int oif)
+static inline int rt6_check_dev(struct fib6_info *rt, int oif)
{
- struct net_device *dev = rt->dst.dev;
+ const struct net_device *dev = rt->fib6_nh.nh_dev;
+
if (!oif || dev->ifindex == oif)
return 2;
- if ((dev->flags & IFF_LOOPBACK) &&
- rt->rt6i_idev && rt->rt6i_idev->dev->ifindex == oif)
- return 1;
return 0;
}
-static inline enum rt6_nud_state rt6_check_neigh(struct rt6_info *rt)
+static inline enum rt6_nud_state rt6_check_neigh(struct fib6_info *rt)
{
- struct neighbour *neigh;
enum rt6_nud_state ret = RT6_NUD_FAIL_HARD;
+ struct neighbour *neigh;
- if (rt->rt6i_flags & RTF_NONEXTHOP ||
- !(rt->rt6i_flags & RTF_GATEWAY))
+ if (rt->fib6_flags & RTF_NONEXTHOP ||
+ !(rt->fib6_flags & RTF_GATEWAY))
return RT6_NUD_SUCCEED;
rcu_read_lock_bh();
- neigh = __ipv6_neigh_lookup_noref(rt->dst.dev, &rt->rt6i_gateway);
+ neigh = __ipv6_neigh_lookup_noref(rt->fib6_nh.nh_dev,
+ &rt->fib6_nh.nh_gw);
if (neigh) {
read_lock(&neigh->lock);
if (neigh->nud_state & NUD_VALID)
@@ -650,8 +619,7 @@
return ret;
}
-static int rt6_score_route(struct rt6_info *rt, int oif,
- int strict)
+static int rt6_score_route(struct fib6_info *rt, int oif, int strict)
{
int m;
@@ -659,7 +627,7 @@
if (!m && (strict & RT6_LOOKUP_F_IFACE))
return RT6_NUD_FAIL_HARD;
#ifdef CONFIG_IPV6_ROUTER_PREF
- m |= IPV6_DECODE_PREF(IPV6_EXTRACT_PREF(rt->rt6i_flags)) << 2;
+ m |= IPV6_DECODE_PREF(IPV6_EXTRACT_PREF(rt->fib6_flags)) << 2;
#endif
if (strict & RT6_LOOKUP_F_REACHABLE) {
int n = rt6_check_neigh(rt);
@@ -669,23 +637,37 @@
return m;
}
-static struct rt6_info *find_match(struct rt6_info *rt, int oif, int strict,
- int *mpri, struct rt6_info *match,
+/* called with rc_read_lock held */
+static inline bool fib6_ignore_linkdown(const struct fib6_info *f6i)
+{
+ const struct net_device *dev = fib6_info_nh_dev(f6i);
+ bool rc = false;
+
+ if (dev) {
+ const struct inet6_dev *idev = __in6_dev_get(dev);
+
+ rc = !!idev->cnf.ignore_routes_with_linkdown;
+ }
+
+ return rc;
+}
+
+static struct fib6_info *find_match(struct fib6_info *rt, int oif, int strict,
+ int *mpri, struct fib6_info *match,
bool *do_rr)
{
int m;
bool match_do_rr = false;
- struct inet6_dev *idev = rt->rt6i_idev;
- if (rt->rt6i_nh_flags & RTNH_F_DEAD)
+ if (rt->fib6_nh.nh_flags & RTNH_F_DEAD)
goto out;
- if (idev->cnf.ignore_routes_with_linkdown &&
- rt->rt6i_nh_flags & RTNH_F_LINKDOWN &&
+ if (fib6_ignore_linkdown(rt) &&
+ rt->fib6_nh.nh_flags & RTNH_F_LINKDOWN &&
!(strict & RT6_LOOKUP_F_IGNORE_LINKSTATE))
goto out;
- if (rt6_check_expired(rt))
+ if (fib6_check_expired(rt))
goto out;
m = rt6_score_route(rt, oif, strict);
@@ -709,19 +691,19 @@
return match;
}
-static struct rt6_info *find_rr_leaf(struct fib6_node *fn,
- struct rt6_info *leaf,
- struct rt6_info *rr_head,
+static struct fib6_info *find_rr_leaf(struct fib6_node *fn,
+ struct fib6_info *leaf,
+ struct fib6_info *rr_head,
u32 metric, int oif, int strict,
bool *do_rr)
{
- struct rt6_info *rt, *match, *cont;
+ struct fib6_info *rt, *match, *cont;
int mpri = -1;
match = NULL;
cont = NULL;
- for (rt = rr_head; rt; rt = rcu_dereference(rt->rt6_next)) {
- if (rt->rt6i_metric != metric) {
+ for (rt = rr_head; rt; rt = rcu_dereference(rt->fib6_next)) {
+ if (rt->fib6_metric != metric) {
cont = rt;
break;
}
@@ -730,8 +712,8 @@
}
for (rt = leaf; rt && rt != rr_head;
- rt = rcu_dereference(rt->rt6_next)) {
- if (rt->rt6i_metric != metric) {
+ rt = rcu_dereference(rt->fib6_next)) {
+ if (rt->fib6_metric != metric) {
cont = rt;
break;
}
@@ -742,22 +724,22 @@
if (match || !cont)
return match;
- for (rt = cont; rt; rt = rcu_dereference(rt->rt6_next))
+ for (rt = cont; rt; rt = rcu_dereference(rt->fib6_next))
match = find_match(rt, oif, strict, &mpri, match, do_rr);
return match;
}
-static struct rt6_info *rt6_select(struct net *net, struct fib6_node *fn,
+static struct fib6_info *rt6_select(struct net *net, struct fib6_node *fn,
int oif, int strict)
{
- struct rt6_info *leaf = rcu_dereference(fn->leaf);
- struct rt6_info *match, *rt0;
+ struct fib6_info *leaf = rcu_dereference(fn->leaf);
+ struct fib6_info *match, *rt0;
bool do_rr = false;
int key_plen;
- if (!leaf || leaf == net->ipv6.ip6_null_entry)
- return net->ipv6.ip6_null_entry;
+ if (!leaf || leaf == net->ipv6.fib6_null_entry)
+ return net->ipv6.fib6_null_entry;
rt0 = rcu_dereference(fn->rr_ptr);
if (!rt0)
@@ -768,39 +750,39 @@
* (This might happen if all routes under fn are deleted from
* the tree and fib6_repair_tree() is called on the node.)
*/
- key_plen = rt0->rt6i_dst.plen;
+ key_plen = rt0->fib6_dst.plen;
#ifdef CONFIG_IPV6_SUBTREES
- if (rt0->rt6i_src.plen)
- key_plen = rt0->rt6i_src.plen;
+ if (rt0->fib6_src.plen)
+ key_plen = rt0->fib6_src.plen;
#endif
if (fn->fn_bit != key_plen)
- return net->ipv6.ip6_null_entry;
+ return net->ipv6.fib6_null_entry;
- match = find_rr_leaf(fn, leaf, rt0, rt0->rt6i_metric, oif, strict,
+ match = find_rr_leaf(fn, leaf, rt0, rt0->fib6_metric, oif, strict,
&do_rr);
if (do_rr) {
- struct rt6_info *next = rcu_dereference(rt0->rt6_next);
+ struct fib6_info *next = rcu_dereference(rt0->fib6_next);
/* no entries matched; do round-robin */
- if (!next || next->rt6i_metric != rt0->rt6i_metric)
+ if (!next || next->fib6_metric != rt0->fib6_metric)
next = leaf;
if (next != rt0) {
- spin_lock_bh(&leaf->rt6i_table->tb6_lock);
+ spin_lock_bh(&leaf->fib6_table->tb6_lock);
/* make sure next is not being deleted from the tree */
- if (next->rt6i_node)
+ if (next->fib6_node)
rcu_assign_pointer(fn->rr_ptr, next);
- spin_unlock_bh(&leaf->rt6i_table->tb6_lock);
+ spin_unlock_bh(&leaf->fib6_table->tb6_lock);
}
}
- return match ? match : net->ipv6.ip6_null_entry;
+ return match ? match : net->ipv6.fib6_null_entry;
}
-static bool rt6_is_gw_or_nonexthop(const struct rt6_info *rt)
+static bool rt6_is_gw_or_nonexthop(const struct fib6_info *rt)
{
- return (rt->rt6i_flags & (RTF_NONEXTHOP | RTF_GATEWAY));
+ return (rt->fib6_flags & (RTF_NONEXTHOP | RTF_GATEWAY));
}
#ifdef CONFIG_IPV6_ROUTE_INFO
@@ -812,7 +794,7 @@
struct in6_addr prefix_buf, *prefix;
unsigned int pref;
unsigned long lifetime;
- struct rt6_info *rt;
+ struct fib6_info *rt;
if (len < sizeof(struct route_info)) {
return -EINVAL;
@@ -850,13 +832,13 @@
}
if (rinfo->prefix_len == 0)
- rt = rt6_get_dflt_router(gwaddr, dev);
+ rt = rt6_get_dflt_router(net, gwaddr, dev);
else
rt = rt6_get_route_info(net, prefix, rinfo->prefix_len,
gwaddr, dev);
if (rt && !lifetime) {
- ip6_del_rt(rt);
+ ip6_del_rt(net, rt);
rt = NULL;
}
@@ -864,21 +846,162 @@
rt = rt6_add_route_info(net, prefix, rinfo->prefix_len, gwaddr,
dev, pref);
else if (rt)
- rt->rt6i_flags = RTF_ROUTEINFO |
- (rt->rt6i_flags & ~RTF_PREF_MASK) | RTF_PREF(pref);
+ rt->fib6_flags = RTF_ROUTEINFO |
+ (rt->fib6_flags & ~RTF_PREF_MASK) | RTF_PREF(pref);
if (rt) {
if (!addrconf_finite_timeout(lifetime))
- rt6_clean_expires(rt);
+ fib6_clean_expires(rt);
else
- rt6_set_expires(rt, jiffies + HZ * lifetime);
+ fib6_set_expires(rt, jiffies + HZ * lifetime);
- ip6_rt_put(rt);
+ fib6_info_release(rt);
}
return 0;
}
#endif
+/*
+ * Misc support functions
+ */
+
+/* called with rcu_lock held */
+static struct net_device *ip6_rt_get_dev_rcu(struct fib6_info *rt)
+{
+ struct net_device *dev = rt->fib6_nh.nh_dev;
+
+ if (rt->fib6_flags & (RTF_LOCAL | RTF_ANYCAST)) {
+ /* for copies of local routes, dst->dev needs to be the
+ * device if it is a master device, the master device if
+ * device is enslaved, and the loopback as the default
+ */
+ if (netif_is_l3_slave(dev) &&
+ !rt6_need_strict(&rt->fib6_dst.addr))
+ dev = l3mdev_master_dev_rcu(dev);
+ else if (!netif_is_l3_master(dev))
+ dev = dev_net(dev)->loopback_dev;
+ /* last case is netif_is_l3_master(dev) is true in which
+ * case we want dev returned to be dev
+ */
+ }
+
+ return dev;
+}
+
+static const int fib6_prop[RTN_MAX + 1] = {
+ [RTN_UNSPEC] = 0,
+ [RTN_UNICAST] = 0,
+ [RTN_LOCAL] = 0,
+ [RTN_BROADCAST] = 0,
+ [RTN_ANYCAST] = 0,
+ [RTN_MULTICAST] = 0,
+ [RTN_BLACKHOLE] = -EINVAL,
+ [RTN_UNREACHABLE] = -EHOSTUNREACH,
+ [RTN_PROHIBIT] = -EACCES,
+ [RTN_THROW] = -EAGAIN,
+ [RTN_NAT] = -EINVAL,
+ [RTN_XRESOLVE] = -EINVAL,
+};
+
+static int ip6_rt_type_to_error(u8 fib6_type)
+{
+ return fib6_prop[fib6_type];
+}
+
+static unsigned short fib6_info_dst_flags(struct fib6_info *rt)
+{
+ unsigned short flags = 0;
+
+ if (rt->dst_nocount)
+ flags |= DST_NOCOUNT;
+ if (rt->dst_nopolicy)
+ flags |= DST_NOPOLICY;
+ if (rt->dst_host)
+ flags |= DST_HOST;
+
+ return flags;
+}
+
+static void ip6_rt_init_dst_reject(struct rt6_info *rt, struct fib6_info *ort)
+{
+ rt->dst.error = ip6_rt_type_to_error(ort->fib6_type);
+
+ switch (ort->fib6_type) {
+ case RTN_BLACKHOLE:
+ rt->dst.output = dst_discard_out;
+ rt->dst.input = dst_discard;
+ break;
+ case RTN_PROHIBIT:
+ rt->dst.output = ip6_pkt_prohibit_out;
+ rt->dst.input = ip6_pkt_prohibit;
+ break;
+ case RTN_THROW:
+ case RTN_UNREACHABLE:
+ default:
+ rt->dst.output = ip6_pkt_discard_out;
+ rt->dst.input = ip6_pkt_discard;
+ break;
+ }
+}
+
+static void ip6_rt_init_dst(struct rt6_info *rt, struct fib6_info *ort)
+{
+ rt->dst.flags |= fib6_info_dst_flags(ort);
+
+ if (ort->fib6_flags & RTF_REJECT) {
+ ip6_rt_init_dst_reject(rt, ort);
+ return;
+ }
+
+ rt->dst.error = 0;
+ rt->dst.output = ip6_output;
+
+ if (ort->fib6_type == RTN_LOCAL) {
+ rt->dst.input = ip6_input;
+ } else if (ipv6_addr_type(&ort->fib6_dst.addr) & IPV6_ADDR_MULTICAST) {
+ rt->dst.input = ip6_mc_input;
+ } else {
+ rt->dst.input = ip6_forward;
+ }
+
+ if (ort->fib6_nh.nh_lwtstate) {
+ rt->dst.lwtstate = lwtstate_get(ort->fib6_nh.nh_lwtstate);
+ lwtunnel_set_redirect(&rt->dst);
+ }
+
+ rt->dst.lastuse = jiffies;
+}
+
+static void rt6_set_from(struct rt6_info *rt, struct fib6_info *from)
+{
+ rt->rt6i_flags &= ~RTF_EXPIRES;
+ fib6_info_hold(from);
+ rcu_assign_pointer(rt->from, from);
+ dst_init_metrics(&rt->dst, from->fib6_metrics->metrics, true);
+ if (from->fib6_metrics != &dst_default_metrics) {
+ rt->dst._metrics |= DST_METRICS_REFCOUNTED;
+ refcount_inc(&from->fib6_metrics->refcnt);
+ }
+}
+
+static void ip6_rt_copy_init(struct rt6_info *rt, struct fib6_info *ort)
+{
+ struct net_device *dev = fib6_info_nh_dev(ort);
+
+ ip6_rt_init_dst(rt, ort);
+
+ rt->rt6i_dst = ort->fib6_dst;
+ rt->rt6i_idev = dev ? in6_dev_get(dev) : NULL;
+ rt->rt6i_gateway = ort->fib6_nh.nh_gw;
+ rt->rt6i_flags = ort->fib6_flags;
+ rt6_set_from(rt, ort);
+#ifdef CONFIG_IPV6_SUBTREES
+ rt->rt6i_src = ort->fib6_src;
+#endif
+ rt->rt6i_prefsrc = ort->fib6_prefsrc;
+ rt->dst.lwtstate = lwtstate_get(ort->fib6_nh.nh_lwtstate);
+}
+
static struct fib6_node* fib6_backtrack(struct fib6_node *fn,
struct in6_addr *saddr)
{
@@ -889,7 +1012,7 @@
pn = rcu_dereference(fn->parent);
sn = FIB6_SUBTREE(pn);
if (sn && sn != fn)
- fn = fib6_lookup(sn, NULL, saddr);
+ fn = fib6_node_lookup(sn, NULL, saddr);
else
fn = pn;
if (fn->fn_flags & RTN_RTINFO)
@@ -914,50 +1037,74 @@
return false;
}
+/* called with rcu_lock held */
+static struct rt6_info *ip6_create_rt_rcu(struct fib6_info *rt)
+{
+ unsigned short flags = fib6_info_dst_flags(rt);
+ struct net_device *dev = rt->fib6_nh.nh_dev;
+ struct rt6_info *nrt;
+
+ nrt = ip6_dst_alloc(dev_net(dev), dev, flags);
+ if (nrt)
+ ip6_rt_copy_init(nrt, rt);
+
+ return nrt;
+}
+
static struct rt6_info *ip6_pol_route_lookup(struct net *net,
struct fib6_table *table,
struct flowi6 *fl6,
const struct sk_buff *skb,
int flags)
{
- struct rt6_info *rt, *rt_cache;
+ struct fib6_info *f6i;
struct fib6_node *fn;
+ struct rt6_info *rt;
if (fl6->flowi6_flags & FLOWI_FLAG_SKIP_NH_OIF)
flags &= ~RT6_LOOKUP_F_IFACE;
rcu_read_lock();
- fn = fib6_lookup(&table->tb6_root, &fl6->daddr, &fl6->saddr);
+ fn = fib6_node_lookup(&table->tb6_root, &fl6->daddr, &fl6->saddr);
restart:
- rt = rcu_dereference(fn->leaf);
- if (!rt) {
- rt = net->ipv6.ip6_null_entry;
+ f6i = rcu_dereference(fn->leaf);
+ if (!f6i) {
+ f6i = net->ipv6.fib6_null_entry;
} else {
- rt = rt6_device_match(net, rt, &fl6->saddr,
+ f6i = rt6_device_match(net, f6i, &fl6->saddr,
fl6->flowi6_oif, flags);
- if (rt->rt6i_nsiblings && fl6->flowi6_oif == 0)
- rt = rt6_multipath_select(net, rt, fl6, fl6->flowi6_oif,
- skb, flags);
+ if (f6i->fib6_nsiblings && fl6->flowi6_oif == 0)
+ f6i = fib6_multipath_select(net, f6i, fl6,
+ fl6->flowi6_oif, skb,
+ flags);
}
- if (rt == net->ipv6.ip6_null_entry) {
+ if (f6i == net->ipv6.fib6_null_entry) {
fn = fib6_backtrack(fn, &fl6->saddr);
if (fn)
goto restart;
}
- /* Search through exception table */
- rt_cache = rt6_find_cached_rt(rt, &fl6->daddr, &fl6->saddr);
- if (rt_cache)
- rt = rt_cache;
- if (ip6_hold_safe(net, &rt, true))
- dst_use_noref(&rt->dst, jiffies);
+ trace_fib6_table_lookup(net, f6i, table, fl6);
+
+ /* Search through exception table */
+ rt = rt6_find_cached_rt(f6i, &fl6->daddr, &fl6->saddr);
+ if (rt) {
+ if (ip6_hold_safe(net, &rt, true))
+ dst_use_noref(&rt->dst, jiffies);
+ } else if (f6i == net->ipv6.fib6_null_entry) {
+ rt = net->ipv6.ip6_null_entry;
+ dst_hold(&rt->dst);
+ } else {
+ rt = ip6_create_rt_rcu(f6i);
+ if (!rt) {
+ rt = net->ipv6.ip6_null_entry;
+ dst_hold(&rt->dst);
+ }
+ }
rcu_read_unlock();
- trace_fib6_table_lookup(net, rt, table, fl6);
-
return rt;
-
}
struct dst_entry *ip6_route_lookup(struct net *net, struct flowi6 *fl6,
@@ -999,55 +1146,28 @@
* Caller must hold dst before calling it.
*/
-static int __ip6_ins_rt(struct rt6_info *rt, struct nl_info *info,
- struct mx6_config *mxc,
+static int __ip6_ins_rt(struct fib6_info *rt, struct nl_info *info,
struct netlink_ext_ack *extack)
{
int err;
struct fib6_table *table;
- table = rt->rt6i_table;
+ table = rt->fib6_table;
spin_lock_bh(&table->tb6_lock);
- err = fib6_add(&table->tb6_root, rt, info, mxc, extack);
+ err = fib6_add(&table->tb6_root, rt, info, extack);
spin_unlock_bh(&table->tb6_lock);
return err;
}
-int ip6_ins_rt(struct rt6_info *rt)
+int ip6_ins_rt(struct net *net, struct fib6_info *rt)
{
- struct nl_info info = { .nl_net = dev_net(rt->dst.dev), };
- struct mx6_config mxc = { .mx = NULL, };
+ struct nl_info info = { .nl_net = net, };
- /* Hold dst to account for the reference from the fib6 tree */
- dst_hold(&rt->dst);
- return __ip6_ins_rt(rt, &info, &mxc, NULL);
+ return __ip6_ins_rt(rt, &info, NULL);
}
-/* called with rcu_lock held */
-static struct net_device *ip6_rt_get_dev_rcu(struct rt6_info *rt)
-{
- struct net_device *dev = rt->dst.dev;
-
- if (rt->rt6i_flags & (RTF_LOCAL | RTF_ANYCAST)) {
- /* for copies of local routes, dst->dev needs to be the
- * device if it is a master device, the master device if
- * device is enslaved, and the loopback as the default
- */
- if (netif_is_l3_slave(dev) &&
- !rt6_need_strict(&rt->rt6i_dst.addr))
- dev = l3mdev_master_dev_rcu(dev);
- else if (!netif_is_l3_master(dev))
- dev = dev_net(dev)->loopback_dev;
- /* last case is netif_is_l3_master(dev) is true in which
- * case we want dev returned to be dev
- */
- }
-
- return dev;
-}
-
-static struct rt6_info *ip6_rt_cache_alloc(struct rt6_info *ort,
+static struct rt6_info *ip6_rt_cache_alloc(struct fib6_info *ort,
const struct in6_addr *daddr,
const struct in6_addr *saddr)
{
@@ -1058,26 +1178,20 @@
* Clone the route.
*/
- if (ort->rt6i_flags & (RTF_CACHE | RTF_PCPU))
- ort = ort->from;
-
- rcu_read_lock();
dev = ip6_rt_get_dev_rcu(ort);
- rt = __ip6_dst_alloc(dev_net(dev), dev, 0);
- rcu_read_unlock();
+ rt = ip6_dst_alloc(dev_net(dev), dev, 0);
if (!rt)
return NULL;
ip6_rt_copy_init(rt, ort);
rt->rt6i_flags |= RTF_CACHE;
- rt->rt6i_metric = 0;
rt->dst.flags |= DST_HOST;
rt->rt6i_dst.addr = *daddr;
rt->rt6i_dst.plen = 128;
if (!rt6_is_gw_or_nonexthop(ort)) {
- if (ort->rt6i_dst.plen != 128 &&
- ipv6_addr_equal(&ort->rt6i_dst.addr, daddr))
+ if (ort->fib6_dst.plen != 128 &&
+ ipv6_addr_equal(&ort->fib6_dst.addr, daddr))
rt->rt6i_flags |= RTF_ANYCAST;
#ifdef CONFIG_IPV6_SUBTREES
if (rt->rt6i_src.plen && saddr) {
@@ -1090,45 +1204,44 @@
return rt;
}
-static struct rt6_info *ip6_rt_pcpu_alloc(struct rt6_info *rt)
+static struct rt6_info *ip6_rt_pcpu_alloc(struct fib6_info *rt)
{
+ unsigned short flags = fib6_info_dst_flags(rt);
struct net_device *dev;
struct rt6_info *pcpu_rt;
rcu_read_lock();
dev = ip6_rt_get_dev_rcu(rt);
- pcpu_rt = __ip6_dst_alloc(dev_net(dev), dev, rt->dst.flags);
+ pcpu_rt = ip6_dst_alloc(dev_net(dev), dev, flags);
rcu_read_unlock();
if (!pcpu_rt)
return NULL;
ip6_rt_copy_init(pcpu_rt, rt);
- pcpu_rt->rt6i_protocol = rt->rt6i_protocol;
pcpu_rt->rt6i_flags |= RTF_PCPU;
return pcpu_rt;
}
/* It should be called with rcu_read_lock() acquired */
-static struct rt6_info *rt6_get_pcpu_route(struct rt6_info *rt)
+static struct rt6_info *rt6_get_pcpu_route(struct fib6_info *rt)
{
struct rt6_info *pcpu_rt, **p;
p = this_cpu_ptr(rt->rt6i_pcpu);
pcpu_rt = *p;
- if (pcpu_rt && ip6_hold_safe(NULL, &pcpu_rt, false))
- rt6_dst_from_metrics_check(pcpu_rt);
+ if (pcpu_rt)
+ ip6_hold_safe(NULL, &pcpu_rt, false);
return pcpu_rt;
}
-static struct rt6_info *rt6_make_pcpu_route(struct rt6_info *rt)
+static struct rt6_info *rt6_make_pcpu_route(struct net *net,
+ struct fib6_info *rt)
{
struct rt6_info *pcpu_rt, *prev, **p;
pcpu_rt = ip6_rt_pcpu_alloc(rt);
if (!pcpu_rt) {
- struct net *net = dev_net(rt->dst.dev);
-
dst_hold(&net->ipv6.ip6_null_entry->dst);
return net->ipv6.ip6_null_entry;
}
@@ -1138,7 +1251,6 @@
prev = cmpxchg(p, NULL, pcpu_rt);
BUG_ON(prev);
- rt6_dst_from_metrics_check(pcpu_rt);
return pcpu_rt;
}
@@ -1158,9 +1270,8 @@
return;
net = dev_net(rt6_ex->rt6i->dst.dev);
- rt6_ex->rt6i->rt6i_node = NULL;
hlist_del_rcu(&rt6_ex->hlist);
- rt6_release(rt6_ex->rt6i);
+ dst_release(&rt6_ex->rt6i->dst);
kfree_rcu(rt6_ex, rcu);
WARN_ON_ONCE(!bucket->depth);
bucket->depth--;
@@ -1268,20 +1379,36 @@
return NULL;
}
-static int rt6_insert_exception(struct rt6_info *nrt,
- struct rt6_info *ort)
+static unsigned int fib6_mtu(const struct fib6_info *rt)
{
- struct net *net = dev_net(ort->dst.dev);
+ unsigned int mtu;
+
+ if (rt->fib6_pmtu) {
+ mtu = rt->fib6_pmtu;
+ } else {
+ struct net_device *dev = fib6_info_nh_dev(rt);
+ struct inet6_dev *idev;
+
+ rcu_read_lock();
+ idev = __in6_dev_get(dev);
+ mtu = idev->cnf.mtu6;
+ rcu_read_unlock();
+ }
+
+ mtu = min_t(unsigned int, mtu, IP6_MAX_MTU);
+
+ return mtu - lwtunnel_headroom(rt->fib6_nh.nh_lwtstate, mtu);
+}
+
+static int rt6_insert_exception(struct rt6_info *nrt,
+ struct fib6_info *ort)
+{
+ struct net *net = dev_net(nrt->dst.dev);
struct rt6_exception_bucket *bucket;
struct in6_addr *src_key = NULL;
struct rt6_exception *rt6_ex;
int err = 0;
- /* ort can't be a cache or pcpu route */
- if (ort->rt6i_flags & (RTF_CACHE | RTF_PCPU))
- ort = ort->from;
- WARN_ON_ONCE(ort->rt6i_flags & (RTF_CACHE | RTF_PCPU));
-
spin_lock_bh(&rt6_exception_lock);
if (ort->exception_bucket_flushed) {
@@ -1308,19 +1435,19 @@
* Otherwise, the exception table is indexed by
* a hash of only rt6i_dst.
*/
- if (ort->rt6i_src.plen)
+ if (ort->fib6_src.plen)
src_key = &nrt->rt6i_src.addr;
#endif
/* Update rt6i_prefsrc as it could be changed
* in rt6_remove_prefsrc()
*/
- nrt->rt6i_prefsrc = ort->rt6i_prefsrc;
+ nrt->rt6i_prefsrc = ort->fib6_prefsrc;
/* rt6_mtu_change() might lower mtu on ort.
* Only insert this exception route if its mtu
* is less than ort's mtu value.
*/
- if (nrt->rt6i_pmtu >= dst_mtu(&ort->dst)) {
+ if (dst_metric_raw(&nrt->dst, RTAX_MTU) >= fib6_mtu(ort)) {
err = -EINVAL;
goto out;
}
@@ -1337,8 +1464,6 @@
}
rt6_ex->rt6i = nrt;
rt6_ex->stamp = jiffies;
- atomic_inc(&nrt->rt6i_ref);
- nrt->rt6i_node = ort->rt6i_node;
hlist_add_head_rcu(&rt6_ex->hlist, &bucket->chain);
bucket->depth++;
net->ipv6.rt6_stats->fib_rt_cache++;
@@ -1351,16 +1476,16 @@
/* Update fn->fn_sernum to invalidate all cached dst */
if (!err) {
- spin_lock_bh(&ort->rt6i_table->tb6_lock);
- fib6_update_sernum(ort);
- spin_unlock_bh(&ort->rt6i_table->tb6_lock);
+ spin_lock_bh(&ort->fib6_table->tb6_lock);
+ fib6_update_sernum(net, ort);
+ spin_unlock_bh(&ort->fib6_table->tb6_lock);
fib6_force_start_gc(net);
}
return err;
}
-void rt6_flush_exceptions(struct rt6_info *rt)
+void rt6_flush_exceptions(struct fib6_info *rt)
{
struct rt6_exception_bucket *bucket;
struct rt6_exception *rt6_ex;
@@ -1390,7 +1515,7 @@
/* Find cached rt in the hash table inside passed in rt
* Caller has to hold rcu_read_lock()
*/
-static struct rt6_info *rt6_find_cached_rt(struct rt6_info *rt,
+static struct rt6_info *rt6_find_cached_rt(struct fib6_info *rt,
struct in6_addr *daddr,
struct in6_addr *saddr)
{
@@ -1408,7 +1533,7 @@
* Otherwise, the exception table is indexed by
* a hash of only rt6i_dst.
*/
- if (rt->rt6i_src.plen)
+ if (rt->fib6_src.plen)
src_key = saddr;
#endif
rt6_ex = __rt6_find_exception_rcu(&bucket, daddr, src_key);
@@ -1420,14 +1545,15 @@
}
/* Remove the passed in cached rt from the hash table that contains it */
-int rt6_remove_exception_rt(struct rt6_info *rt)
+static int rt6_remove_exception_rt(struct rt6_info *rt)
{
struct rt6_exception_bucket *bucket;
- struct rt6_info *from = rt->from;
struct in6_addr *src_key = NULL;
struct rt6_exception *rt6_ex;
+ struct fib6_info *from;
int err;
+ from = rcu_dereference(rt->from);
if (!from ||
!(rt->rt6i_flags & RTF_CACHE))
return -EINVAL;
@@ -1445,7 +1571,7 @@
* Otherwise, the exception table is indexed by
* a hash of only rt6i_dst.
*/
- if (from->rt6i_src.plen)
+ if (from->fib6_src.plen)
src_key = &rt->rt6i_src.addr;
#endif
rt6_ex = __rt6_find_exception_spinlock(&bucket,
@@ -1468,7 +1594,7 @@
static void rt6_update_exception_stamp_rt(struct rt6_info *rt)
{
struct rt6_exception_bucket *bucket;
- struct rt6_info *from = rt->from;
+ struct fib6_info *from = rt->from;
struct in6_addr *src_key = NULL;
struct rt6_exception *rt6_ex;
@@ -1486,7 +1612,7 @@
* Otherwise, the exception table is indexed by
* a hash of only rt6i_dst.
*/
- if (from->rt6i_src.plen)
+ if (from->fib6_src.plen)
src_key = &rt->rt6i_src.addr;
#endif
rt6_ex = __rt6_find_exception_rcu(&bucket,
@@ -1498,7 +1624,7 @@
rcu_read_unlock();
}
-static void rt6_exceptions_remove_prefsrc(struct rt6_info *rt)
+static void rt6_exceptions_remove_prefsrc(struct fib6_info *rt)
{
struct rt6_exception_bucket *bucket;
struct rt6_exception *rt6_ex;
@@ -1540,7 +1666,7 @@
}
static void rt6_exceptions_update_pmtu(struct inet6_dev *idev,
- struct rt6_info *rt, int mtu)
+ struct fib6_info *rt, int mtu)
{
struct rt6_exception_bucket *bucket;
struct rt6_exception *rt6_ex;
@@ -1557,12 +1683,12 @@
struct rt6_info *entry = rt6_ex->rt6i;
/* For RTF_CACHE with rt6i_pmtu == 0 (i.e. a redirected
- * route), the metrics of its rt->dst.from have already
+ * route), the metrics of its rt->from have already
* been updated.
*/
- if (entry->rt6i_pmtu &&
+ if (dst_metric_raw(&entry->dst, RTAX_MTU) &&
rt6_mtu_change_route_allowed(idev, entry, mtu))
- entry->rt6i_pmtu = mtu;
+ dst_metric_set(&entry->dst, RTAX_MTU, mtu);
}
bucket++;
}
@@ -1570,7 +1696,7 @@
#define RTF_CACHE_GATEWAY (RTF_GATEWAY | RTF_CACHE)
-static void rt6_exceptions_clean_tohost(struct rt6_info *rt,
+static void rt6_exceptions_clean_tohost(struct fib6_info *rt,
struct in6_addr *gateway)
{
struct rt6_exception_bucket *bucket;
@@ -1649,7 +1775,7 @@
gc_args->more++;
}
-void rt6_age_exceptions(struct rt6_info *rt,
+void rt6_age_exceptions(struct fib6_info *rt,
struct fib6_gc_args *gc_args,
unsigned long now)
{
@@ -1680,32 +1806,22 @@
rcu_read_unlock_bh();
}
-struct rt6_info *ip6_pol_route(struct net *net, struct fib6_table *table,
- int oif, struct flowi6 *fl6,
- const struct sk_buff *skb, int flags)
+/* must be called with rcu lock held */
+struct fib6_info *fib6_table_lookup(struct net *net, struct fib6_table *table,
+ int oif, struct flowi6 *fl6, int strict)
{
struct fib6_node *fn, *saved_fn;
- struct rt6_info *rt, *rt_cache;
- int strict = 0;
+ struct fib6_info *f6i;
- strict |= flags & RT6_LOOKUP_F_IFACE;
- strict |= flags & RT6_LOOKUP_F_IGNORE_LINKSTATE;
- if (net->ipv6.devconf_all->forwarding == 0)
- strict |= RT6_LOOKUP_F_REACHABLE;
-
- rcu_read_lock();
-
- fn = fib6_lookup(&table->tb6_root, &fl6->daddr, &fl6->saddr);
+ fn = fib6_node_lookup(&table->tb6_root, &fl6->daddr, &fl6->saddr);
saved_fn = fn;
if (fl6->flowi6_flags & FLOWI_FLAG_SKIP_NH_OIF)
oif = 0;
redo_rt6_select:
- rt = rt6_select(net, fn, oif, strict);
- if (rt->rt6i_nsiblings)
- rt = rt6_multipath_select(net, rt, fl6, oif, skb, strict);
- if (rt == net->ipv6.ip6_null_entry) {
+ f6i = rt6_select(net, fn, oif, strict);
+ if (f6i == net->ipv6.fib6_null_entry) {
fn = fib6_backtrack(fn, &fl6->saddr);
if (fn)
goto redo_rt6_select;
@@ -1717,45 +1833,57 @@
}
}
- /*Search through exception table */
- rt_cache = rt6_find_cached_rt(rt, &fl6->daddr, &fl6->saddr);
- if (rt_cache)
- rt = rt_cache;
+ trace_fib6_table_lookup(net, f6i, table, fl6);
- if (rt == net->ipv6.ip6_null_entry) {
+ return f6i;
+}
+
+struct rt6_info *ip6_pol_route(struct net *net, struct fib6_table *table,
+ int oif, struct flowi6 *fl6,
+ const struct sk_buff *skb, int flags)
+{
+ struct fib6_info *f6i;
+ struct rt6_info *rt;
+ int strict = 0;
+
+ strict |= flags & RT6_LOOKUP_F_IFACE;
+ strict |= flags & RT6_LOOKUP_F_IGNORE_LINKSTATE;
+ if (net->ipv6.devconf_all->forwarding == 0)
+ strict |= RT6_LOOKUP_F_REACHABLE;
+
+ rcu_read_lock();
+
+ f6i = fib6_table_lookup(net, table, oif, fl6, strict);
+ if (f6i->fib6_nsiblings)
+ f6i = fib6_multipath_select(net, f6i, fl6, oif, skb, strict);
+
+ if (f6i == net->ipv6.fib6_null_entry) {
+ rt = net->ipv6.ip6_null_entry;
rcu_read_unlock();
dst_hold(&rt->dst);
- trace_fib6_table_lookup(net, rt, table, fl6);
return rt;
- } else if (rt->rt6i_flags & RTF_CACHE) {
- if (ip6_hold_safe(net, &rt, true)) {
+ }
+
+ /*Search through exception table */
+ rt = rt6_find_cached_rt(f6i, &fl6->daddr, &fl6->saddr);
+ if (rt) {
+ if (ip6_hold_safe(net, &rt, true))
dst_use_noref(&rt->dst, jiffies);
- rt6_dst_from_metrics_check(rt);
- }
+
rcu_read_unlock();
- trace_fib6_table_lookup(net, rt, table, fl6);
return rt;
} else if (unlikely((fl6->flowi6_flags & FLOWI_FLAG_KNOWN_NH) &&
- !(rt->rt6i_flags & RTF_GATEWAY))) {
+ !(f6i->fib6_flags & RTF_GATEWAY))) {
/* Create a RTF_CACHE clone which will not be
* owned by the fib6 tree. It is for the special case where
* the daddr in the skb during the neighbor look-up is different
* from the fl6->daddr used to look-up route here.
*/
-
struct rt6_info *uncached_rt;
- if (ip6_hold_safe(net, &rt, true)) {
- dst_use_noref(&rt->dst, jiffies);
- } else {
- rcu_read_unlock();
- uncached_rt = rt;
- goto uncached_rt_out;
- }
- rcu_read_unlock();
+ uncached_rt = ip6_rt_cache_alloc(f6i, &fl6->daddr, NULL);
- uncached_rt = ip6_rt_cache_alloc(rt, &fl6->daddr, NULL);
- dst_release(&rt->dst);
+ rcu_read_unlock();
if (uncached_rt) {
/* Uncached_rt's refcnt is taken during ip6_rt_cache_alloc()
@@ -1768,36 +1896,21 @@
dst_hold(&uncached_rt->dst);
}
-uncached_rt_out:
- trace_fib6_table_lookup(net, uncached_rt, table, fl6);
return uncached_rt;
-
} else {
/* Get a percpu copy */
struct rt6_info *pcpu_rt;
- dst_use_noref(&rt->dst, jiffies);
local_bh_disable();
- pcpu_rt = rt6_get_pcpu_route(rt);
+ pcpu_rt = rt6_get_pcpu_route(f6i);
- if (!pcpu_rt) {
- /* atomic_inc_not_zero() is needed when using rcu */
- if (atomic_inc_not_zero(&rt->rt6i_ref)) {
- /* No dst_hold() on rt is needed because grabbing
- * rt->rt6i_ref makes sure rt can't be released.
- */
- pcpu_rt = rt6_make_pcpu_route(rt);
- rt6_release(rt);
- } else {
- /* rt is already removed from tree */
- pcpu_rt = net->ipv6.ip6_null_entry;
- dst_hold(&pcpu_rt->dst);
- }
- }
+ if (!pcpu_rt)
+ pcpu_rt = rt6_make_pcpu_route(net, f6i);
+
local_bh_enable();
rcu_read_unlock();
- trace_fib6_table_lookup(net, pcpu_rt, table, fl6);
+
return pcpu_rt;
}
}
@@ -2020,7 +2133,6 @@
rt->rt6i_idev = in6_dev_get(loopback_dev);
rt->rt6i_gateway = ort->rt6i_gateway;
rt->rt6i_flags = ort->rt6i_flags & ~RTF_PCPU;
- rt->rt6i_metric = 0;
memcpy(&rt->rt6i_dst, &ort->rt6i_dst, sizeof(struct rt6key));
#ifdef CONFIG_IPV6_SUBTREES
@@ -2036,18 +2148,27 @@
* Destination cache support functions
*/
-static void rt6_dst_from_metrics_check(struct rt6_info *rt)
-{
- if (rt->from &&
- dst_metrics_ptr(&rt->dst) != dst_metrics_ptr(&rt->from->dst))
- dst_init_metrics(&rt->dst, dst_metrics_ptr(&rt->from->dst), true);
-}
-
-static struct dst_entry *rt6_check(struct rt6_info *rt, u32 cookie)
+static bool fib6_check(struct fib6_info *f6i, u32 cookie)
{
u32 rt_cookie = 0;
- if (!rt6_get_cookie_safe(rt, &rt_cookie) || rt_cookie != cookie)
+ if (!fib6_get_cookie_safe(f6i, &rt_cookie) || rt_cookie != cookie)
+ return false;
+
+ if (fib6_check_expired(f6i))
+ return false;
+
+ return true;
+}
+
+static struct dst_entry *rt6_check(struct rt6_info *rt,
+ struct fib6_info *from,
+ u32 cookie)
+{
+ u32 rt_cookie = 0;
+
+ if ((from && !fib6_get_cookie_safe(from, &rt_cookie)) ||
+ rt_cookie != cookie)
return NULL;
if (rt6_check_expired(rt))
@@ -2056,11 +2177,13 @@
return &rt->dst;
}
-static struct dst_entry *rt6_dst_from_check(struct rt6_info *rt, u32 cookie)
+static struct dst_entry *rt6_dst_from_check(struct rt6_info *rt,
+ struct fib6_info *from,
+ u32 cookie)
{
if (!__rt6_check_expired(rt) &&
rt->dst.obsolete == DST_OBSOLETE_FORCE_CHK &&
- rt6_check(rt->from, cookie))
+ fib6_check(from, cookie))
return &rt->dst;
else
return NULL;
@@ -2068,22 +2191,30 @@
static struct dst_entry *ip6_dst_check(struct dst_entry *dst, u32 cookie)
{
+ struct dst_entry *dst_ret;
+ struct fib6_info *from;
struct rt6_info *rt;
- rt = (struct rt6_info *) dst;
+ rt = container_of(dst, struct rt6_info, dst);
+
+ rcu_read_lock();
/* All IPV6 dsts are created with ->obsolete set to the value
* DST_OBSOLETE_FORCE_CHK which forces validation calls down
* into this function always.
*/
- rt6_dst_from_metrics_check(rt);
+ from = rcu_dereference(rt->from);
- if (rt->rt6i_flags & RTF_PCPU ||
- (unlikely(!list_empty(&rt->rt6i_uncached)) && rt->from))
- return rt6_dst_from_check(rt, cookie);
+ if (from && (rt->rt6i_flags & RTF_PCPU ||
+ unlikely(!list_empty(&rt->rt6i_uncached))))
+ dst_ret = rt6_dst_from_check(rt, from, cookie);
else
- return rt6_check(rt, cookie);
+ dst_ret = rt6_check(rt, from, cookie);
+
+ rcu_read_unlock();
+
+ return dst_ret;
}
static struct dst_entry *ip6_negative_advice(struct dst_entry *dst)
@@ -2092,10 +2223,12 @@
if (rt) {
if (rt->rt6i_flags & RTF_CACHE) {
+ rcu_read_lock();
if (rt6_check_expired(rt)) {
- ip6_del_rt(rt);
+ rt6_remove_exception_rt(rt);
dst = NULL;
}
+ rcu_read_unlock();
} else {
dst_release(dst);
dst = NULL;
@@ -2112,35 +2245,60 @@
rt = (struct rt6_info *) skb_dst(skb);
if (rt) {
+ rcu_read_lock();
if (rt->rt6i_flags & RTF_CACHE) {
if (dst_hold_safe(&rt->dst))
- ip6_del_rt(rt);
+ rt6_remove_exception_rt(rt);
} else {
+ struct fib6_info *from;
struct fib6_node *fn;
- rcu_read_lock();
- fn = rcu_dereference(rt->rt6i_node);
- if (fn && (rt->rt6i_flags & RTF_DEFAULT))
- fn->fn_sernum = -1;
- rcu_read_unlock();
+ from = rcu_dereference(rt->from);
+ if (from) {
+ fn = rcu_dereference(from->fib6_node);
+ if (fn && (rt->rt6i_flags & RTF_DEFAULT))
+ fn->fn_sernum = -1;
+ }
}
+ rcu_read_unlock();
}
}
+static void rt6_update_expires(struct rt6_info *rt0, int timeout)
+{
+ if (!(rt0->rt6i_flags & RTF_EXPIRES)) {
+ struct fib6_info *from;
+
+ rcu_read_lock();
+ from = rcu_dereference(rt0->from);
+ if (from)
+ rt0->dst.expires = from->expires;
+ rcu_read_unlock();
+ }
+
+ dst_set_expires(&rt0->dst, timeout);
+ rt0->rt6i_flags |= RTF_EXPIRES;
+}
+
static void rt6_do_update_pmtu(struct rt6_info *rt, u32 mtu)
{
struct net *net = dev_net(rt->dst.dev);
+ dst_metric_set(&rt->dst, RTAX_MTU, mtu);
rt->rt6i_flags |= RTF_MODIFIED;
- rt->rt6i_pmtu = mtu;
rt6_update_expires(rt, net->ipv6.sysctl.ip6_rt_mtu_expires);
}
static bool rt6_cache_allowed_for_pmtu(const struct rt6_info *rt)
{
+ bool from_set;
+
+ rcu_read_lock();
+ from_set = !!rcu_dereference(rt->from);
+ rcu_read_unlock();
+
return !(rt->rt6i_flags & RTF_CACHE) &&
- (rt->rt6i_flags & RTF_PCPU ||
- rcu_access_pointer(rt->rt6i_node));
+ (rt->rt6i_flags & RTF_PCPU || from_set);
}
static void __ip6_rt_update_pmtu(struct dst_entry *dst, const struct sock *sk,
@@ -2176,14 +2334,18 @@
if (rt6->rt6i_flags & RTF_CACHE)
rt6_update_exception_stamp_rt(rt6);
} else if (daddr) {
+ struct fib6_info *from;
struct rt6_info *nrt6;
- nrt6 = ip6_rt_cache_alloc(rt6, daddr, saddr);
+ rcu_read_lock();
+ from = rcu_dereference(rt6->from);
+ nrt6 = ip6_rt_cache_alloc(from, daddr, saddr);
if (nrt6) {
rt6_do_update_pmtu(nrt6, mtu);
- if (rt6_insert_exception(nrt6, rt6))
+ if (rt6_insert_exception(nrt6, from))
dst_release_immediate(&nrt6->dst);
}
+ rcu_read_unlock();
}
}
@@ -2264,7 +2426,8 @@
int flags)
{
struct ip6rd_flowi *rdfl = (struct ip6rd_flowi *)fl6;
- struct rt6_info *rt, *rt_cache;
+ struct rt6_info *ret = NULL, *rt_cache;
+ struct fib6_info *rt;
struct fib6_node *fn;
/* Get the "current" route for this destination and
@@ -2278,32 +2441,32 @@
*/
rcu_read_lock();
- fn = fib6_lookup(&table->tb6_root, &fl6->daddr, &fl6->saddr);
+ fn = fib6_node_lookup(&table->tb6_root, &fl6->daddr, &fl6->saddr);
restart:
for_each_fib6_node_rt_rcu(fn) {
- if (rt->rt6i_nh_flags & RTNH_F_DEAD)
+ if (rt->fib6_nh.nh_flags & RTNH_F_DEAD)
continue;
- if (rt6_check_expired(rt))
+ if (fib6_check_expired(rt))
continue;
- if (rt->dst.error)
+ if (rt->fib6_flags & RTF_REJECT)
break;
- if (!(rt->rt6i_flags & RTF_GATEWAY))
+ if (!(rt->fib6_flags & RTF_GATEWAY))
continue;
- if (fl6->flowi6_oif != rt->dst.dev->ifindex)
+ if (fl6->flowi6_oif != rt->fib6_nh.nh_dev->ifindex)
continue;
/* rt_cache's gateway might be different from its 'parent'
* in the case of an ip redirect.
* So we keep searching in the exception table if the gateway
* is different.
*/
- if (!ipv6_addr_equal(&rdfl->gateway, &rt->rt6i_gateway)) {
+ if (!ipv6_addr_equal(&rdfl->gateway, &rt->fib6_nh.nh_gw)) {
rt_cache = rt6_find_cached_rt(rt,
&fl6->daddr,
&fl6->saddr);
if (rt_cache &&
ipv6_addr_equal(&rdfl->gateway,
&rt_cache->rt6i_gateway)) {
- rt = rt_cache;
+ ret = rt_cache;
break;
}
continue;
@@ -2312,25 +2475,28 @@
}
if (!rt)
- rt = net->ipv6.ip6_null_entry;
- else if (rt->dst.error) {
- rt = net->ipv6.ip6_null_entry;
+ rt = net->ipv6.fib6_null_entry;
+ else if (rt->fib6_flags & RTF_REJECT) {
+ ret = net->ipv6.ip6_null_entry;
goto out;
}
- if (rt == net->ipv6.ip6_null_entry) {
+ if (rt == net->ipv6.fib6_null_entry) {
fn = fib6_backtrack(fn, &fl6->saddr);
if (fn)
goto restart;
}
out:
- ip6_hold_safe(net, &rt, true);
+ if (ret)
+ dst_hold(&ret->dst);
+ else
+ ret = ip6_create_rt_rcu(rt);
rcu_read_unlock();
trace_fib6_table_lookup(net, rt, table, fl6);
- return rt;
+ return ret;
};
static struct dst_entry *ip6_route_redirect(struct net *net,
@@ -2422,12 +2588,8 @@
static unsigned int ip6_mtu(const struct dst_entry *dst)
{
- const struct rt6_info *rt = (const struct rt6_info *)dst;
- unsigned int mtu = rt->rt6i_pmtu;
struct inet6_dev *idev;
-
- if (mtu)
- goto out;
+ unsigned int mtu;
mtu = dst_metric_raw(dst, RTAX_MTU);
if (mtu)
@@ -2447,6 +2609,54 @@
return mtu - lwtunnel_headroom(dst->lwtstate, mtu);
}
+/* MTU selection:
+ * 1. mtu on route is locked - use it
+ * 2. mtu from nexthop exception
+ * 3. mtu from egress device
+ *
+ * based on ip6_dst_mtu_forward and exception logic of
+ * rt6_find_cached_rt; called with rcu_read_lock
+ */
+u32 ip6_mtu_from_fib6(struct fib6_info *f6i, struct in6_addr *daddr,
+ struct in6_addr *saddr)
+{
+ struct rt6_exception_bucket *bucket;
+ struct rt6_exception *rt6_ex;
+ struct in6_addr *src_key;
+ struct inet6_dev *idev;
+ u32 mtu = 0;
+
+ if (unlikely(fib6_metric_locked(f6i, RTAX_MTU))) {
+ mtu = f6i->fib6_pmtu;
+ if (mtu)
+ goto out;
+ }
+
+ src_key = NULL;
+#ifdef CONFIG_IPV6_SUBTREES
+ if (f6i->fib6_src.plen)
+ src_key = saddr;
+#endif
+
+ bucket = rcu_dereference(f6i->rt6i_exception_bucket);
+ rt6_ex = __rt6_find_exception_rcu(&bucket, daddr, src_key);
+ if (rt6_ex && !rt6_check_expired(rt6_ex->rt6i))
+ mtu = dst_metric_raw(&rt6_ex->rt6i->dst, RTAX_MTU);
+
+ if (likely(!mtu)) {
+ struct net_device *dev = fib6_info_nh_dev(f6i);
+
+ mtu = IPV6_MIN_MTU;
+ idev = __in6_dev_get(dev);
+ if (idev && idev->cnf.mtu6 > mtu)
+ mtu = idev->cnf.mtu6;
+ }
+
+ mtu = min_t(unsigned int, mtu, IP6_MAX_MTU);
+out:
+ return mtu - lwtunnel_headroom(fib6_info_nh_lwt(f6i), mtu);
+}
+
struct dst_entry *icmp6_dst_alloc(struct net_device *dev,
struct flowi6 *fl6)
{
@@ -2511,60 +2721,22 @@
return entries > rt_max_size;
}
-static int ip6_convert_metrics(struct mx6_config *mxc,
- const struct fib6_config *cfg)
+static int ip6_convert_metrics(struct net *net, struct fib6_info *rt,
+ struct fib6_config *cfg)
{
- struct net *net = cfg->fc_nlinfo.nl_net;
- bool ecn_ca = false;
- struct nlattr *nla;
- int remaining;
- u32 *mp;
+ struct dst_metrics *p;
if (!cfg->fc_mx)
return 0;
- mp = kzalloc(sizeof(u32) * RTAX_MAX, GFP_KERNEL);
- if (unlikely(!mp))
+ p = kzalloc(sizeof(*rt->fib6_metrics), GFP_KERNEL);
+ if (unlikely(!p))
return -ENOMEM;
- nla_for_each_attr(nla, cfg->fc_mx, cfg->fc_mx_len, remaining) {
- int type = nla_type(nla);
- u32 val;
+ refcount_set(&p->refcnt, 1);
+ rt->fib6_metrics = p;
- if (!type)
- continue;
- if (unlikely(type > RTAX_MAX))
- goto err;
-
- if (type == RTAX_CC_ALGO) {
- char tmp[TCP_CA_NAME_MAX];
-
- nla_strlcpy(tmp, nla, sizeof(tmp));
- val = tcp_ca_get_key_by_name(net, tmp, &ecn_ca);
- if (val == TCP_CA_UNSPEC)
- goto err;
- } else {
- val = nla_get_u32(nla);
- }
- if (type == RTAX_HOPLIMIT && val > 255)
- val = 255;
- if (type == RTAX_FEATURES && (val & ~RTAX_FEATURE_MASK))
- goto err;
-
- mp[type - 1] = val;
- __set_bit(type - 1, mxc->mx_valid);
- }
-
- if (ecn_ca) {
- __set_bit(RTAX_FEATURES - 1, mxc->mx_valid);
- mp[RTAX_FEATURES - 1] |= DST_FEATURE_ECN_CA;
- }
-
- mxc->mx = mp;
- return 0;
- err:
- kfree(mp);
- return -EINVAL;
+ return ip_metrics_convert(net, cfg->fc_mx, cfg->fc_mx_len, p->metrics);
}
static struct rt6_info *ip6_nh_lookup_table(struct net *net,
@@ -2750,11 +2922,12 @@
return err;
}
-static struct rt6_info *ip6_route_info_create(struct fib6_config *cfg,
+static struct fib6_info *ip6_route_info_create(struct fib6_config *cfg,
+ gfp_t gfp_flags,
struct netlink_ext_ack *extack)
{
struct net *net = cfg->fc_nlinfo.nl_net;
- struct rt6_info *rt = NULL;
+ struct fib6_info *rt = NULL;
struct net_device *dev = NULL;
struct inet6_dev *idev = NULL;
struct fib6_table *table;
@@ -2773,6 +2946,11 @@
goto out;
}
+ if (cfg->fc_type > RTN_MAX) {
+ NL_SET_ERR_MSG(extack, "Invalid route type");
+ goto out;
+ }
+
if (cfg->fc_dst_len > 128) {
NL_SET_ERR_MSG(extack, "Invalid prefix length");
goto out;
@@ -2831,35 +3009,30 @@
if (!table)
goto out;
- rt = ip6_dst_alloc(net, NULL,
- (cfg->fc_flags & RTF_ADDRCONF) ? 0 : DST_NOCOUNT);
-
- if (!rt) {
- err = -ENOMEM;
+ err = -ENOMEM;
+ rt = fib6_info_alloc(gfp_flags);
+ if (!rt)
goto out;
- }
+
+ if (cfg->fc_flags & RTF_ADDRCONF)
+ rt->dst_nocount = true;
+
+ err = ip6_convert_metrics(net, rt, cfg);
+ if (err < 0)
+ goto out;
if (cfg->fc_flags & RTF_EXPIRES)
- rt6_set_expires(rt, jiffies +
+ fib6_set_expires(rt, jiffies +
clock_t_to_jiffies(cfg->fc_expires));
else
- rt6_clean_expires(rt);
+ fib6_clean_expires(rt);
if (cfg->fc_protocol == RTPROT_UNSPEC)
cfg->fc_protocol = RTPROT_BOOT;
- rt->rt6i_protocol = cfg->fc_protocol;
+ rt->fib6_protocol = cfg->fc_protocol;
addr_type = ipv6_addr_type(&cfg->fc_dst);
- if (addr_type & IPV6_ADDR_MULTICAST)
- rt->dst.input = ip6_mc_input;
- else if (cfg->fc_flags & RTF_LOCAL)
- rt->dst.input = ip6_input;
- else
- rt->dst.input = ip6_forward;
-
- rt->dst.output = ip6_output;
-
if (cfg->fc_encap) {
struct lwtunnel_state *lwtstate;
@@ -2868,22 +3041,23 @@
&lwtstate, extack);
if (err)
goto out;
- rt->dst.lwtstate = lwtstate_get(lwtstate);
- lwtunnel_set_redirect(&rt->dst);
+ rt->fib6_nh.nh_lwtstate = lwtstate_get(lwtstate);
}
- ipv6_addr_prefix(&rt->rt6i_dst.addr, &cfg->fc_dst, cfg->fc_dst_len);
- rt->rt6i_dst.plen = cfg->fc_dst_len;
- if (rt->rt6i_dst.plen == 128)
- rt->dst.flags |= DST_HOST;
+ ipv6_addr_prefix(&rt->fib6_dst.addr, &cfg->fc_dst, cfg->fc_dst_len);
+ rt->fib6_dst.plen = cfg->fc_dst_len;
+ if (rt->fib6_dst.plen == 128)
+ rt->dst_host = true;
#ifdef CONFIG_IPV6_SUBTREES
- ipv6_addr_prefix(&rt->rt6i_src.addr, &cfg->fc_src, cfg->fc_src_len);
- rt->rt6i_src.plen = cfg->fc_src_len;
+ ipv6_addr_prefix(&rt->fib6_src.addr, &cfg->fc_src, cfg->fc_src_len);
+ rt->fib6_src.plen = cfg->fc_src_len;
#endif
- rt->rt6i_metric = cfg->fc_metric;
- rt->rt6i_nh_weight = 1;
+ rt->fib6_metric = cfg->fc_metric;
+ rt->fib6_nh.nh_weight = 1;
+
+ rt->fib6_type = cfg->fc_type;
/* We cannot add true routes via loopback here,
they would result in kernel looping; promote them to reject routes
@@ -2906,28 +3080,7 @@
goto out;
}
}
- rt->rt6i_flags = RTF_REJECT|RTF_NONEXTHOP;
- switch (cfg->fc_type) {
- case RTN_BLACKHOLE:
- rt->dst.error = -EINVAL;
- rt->dst.output = dst_discard_out;
- rt->dst.input = dst_discard;
- break;
- case RTN_PROHIBIT:
- rt->dst.error = -EACCES;
- rt->dst.output = ip6_pkt_prohibit_out;
- rt->dst.input = ip6_pkt_prohibit;
- break;
- case RTN_THROW:
- case RTN_UNREACHABLE:
- default:
- rt->dst.error = (cfg->fc_type == RTN_THROW) ? -EAGAIN
- : (cfg->fc_type == RTN_UNREACHABLE)
- ? -EHOSTUNREACH : -ENETUNREACH;
- rt->dst.output = ip6_pkt_discard_out;
- rt->dst.input = ip6_pkt_discard;
- break;
- }
+ rt->fib6_flags = RTF_REJECT|RTF_NONEXTHOP;
goto install_route;
}
@@ -2936,7 +3089,7 @@
if (err)
goto out;
- rt->rt6i_gateway = cfg->fc_gateway;
+ rt->fib6_nh.nh_gw = cfg->fc_gateway;
}
err = -ENODEV;
@@ -2961,96 +3114,82 @@
err = -EINVAL;
goto out;
}
- rt->rt6i_prefsrc.addr = cfg->fc_prefsrc;
- rt->rt6i_prefsrc.plen = 128;
+ rt->fib6_prefsrc.addr = cfg->fc_prefsrc;
+ rt->fib6_prefsrc.plen = 128;
} else
- rt->rt6i_prefsrc.plen = 0;
+ rt->fib6_prefsrc.plen = 0;
- rt->rt6i_flags = cfg->fc_flags;
+ rt->fib6_flags = cfg->fc_flags;
install_route:
- if (!(rt->rt6i_flags & (RTF_LOCAL | RTF_ANYCAST)) &&
+ if (!(rt->fib6_flags & (RTF_LOCAL | RTF_ANYCAST)) &&
!netif_carrier_ok(dev))
- rt->rt6i_nh_flags |= RTNH_F_LINKDOWN;
- rt->rt6i_nh_flags |= (cfg->fc_flags & RTNH_F_ONLINK);
- rt->dst.dev = dev;
- rt->rt6i_idev = idev;
- rt->rt6i_table = table;
+ rt->fib6_nh.nh_flags |= RTNH_F_LINKDOWN;
+ rt->fib6_nh.nh_flags |= (cfg->fc_flags & RTNH_F_ONLINK);
+ rt->fib6_nh.nh_dev = dev;
+ rt->fib6_table = table;
cfg->fc_nlinfo.nl_net = dev_net(dev);
+ if (idev)
+ in6_dev_put(idev);
+
return rt;
out:
if (dev)
dev_put(dev);
if (idev)
in6_dev_put(idev);
- if (rt)
- dst_release_immediate(&rt->dst);
+ fib6_info_release(rt);
return ERR_PTR(err);
}
-int ip6_route_add(struct fib6_config *cfg,
+int ip6_route_add(struct fib6_config *cfg, gfp_t gfp_flags,
struct netlink_ext_ack *extack)
{
- struct mx6_config mxc = { .mx = NULL, };
- struct rt6_info *rt;
+ struct fib6_info *rt;
int err;
- rt = ip6_route_info_create(cfg, extack);
- if (IS_ERR(rt)) {
- err = PTR_ERR(rt);
- rt = NULL;
- goto out;
- }
+ rt = ip6_route_info_create(cfg, gfp_flags, extack);
+ if (IS_ERR(rt))
+ return PTR_ERR(rt);
- err = ip6_convert_metrics(&mxc, cfg);
- if (err)
- goto out;
-
- err = __ip6_ins_rt(rt, &cfg->fc_nlinfo, &mxc, extack);
-
- kfree(mxc.mx);
-
- return err;
-out:
- if (rt)
- dst_release_immediate(&rt->dst);
+ err = __ip6_ins_rt(rt, &cfg->fc_nlinfo, extack);
+ fib6_info_release(rt);
return err;
}
-static int __ip6_del_rt(struct rt6_info *rt, struct nl_info *info)
+static int __ip6_del_rt(struct fib6_info *rt, struct nl_info *info)
{
- int err;
+ struct net *net = info->nl_net;
struct fib6_table *table;
- struct net *net = dev_net(rt->dst.dev);
+ int err;
- if (rt == net->ipv6.ip6_null_entry) {
+ if (rt == net->ipv6.fib6_null_entry) {
err = -ENOENT;
goto out;
}
- table = rt->rt6i_table;
+ table = rt->fib6_table;
spin_lock_bh(&table->tb6_lock);
err = fib6_del(rt, info);
spin_unlock_bh(&table->tb6_lock);
out:
- ip6_rt_put(rt);
+ fib6_info_release(rt);
return err;
}
-int ip6_del_rt(struct rt6_info *rt)
+int ip6_del_rt(struct net *net, struct fib6_info *rt)
{
- struct nl_info info = {
- .nl_net = dev_net(rt->dst.dev),
- };
+ struct nl_info info = { .nl_net = net };
+
return __ip6_del_rt(rt, &info);
}
-static int __ip6_del_rt_siblings(struct rt6_info *rt, struct fib6_config *cfg)
+static int __ip6_del_rt_siblings(struct fib6_info *rt, struct fib6_config *cfg)
{
struct nl_info *info = &cfg->fc_nlinfo;
struct net *net = info->nl_net;
@@ -3058,20 +3197,20 @@
struct fib6_table *table;
int err = -ENOENT;
- if (rt == net->ipv6.ip6_null_entry)
+ if (rt == net->ipv6.fib6_null_entry)
goto out_put;
- table = rt->rt6i_table;
+ table = rt->fib6_table;
spin_lock_bh(&table->tb6_lock);
- if (rt->rt6i_nsiblings && cfg->fc_delete_all_nh) {
- struct rt6_info *sibling, *next_sibling;
+ if (rt->fib6_nsiblings && cfg->fc_delete_all_nh) {
+ struct fib6_info *sibling, *next_sibling;
/* prefer to send a single notification with all hops */
skb = nlmsg_new(rt6_nlmsg_size(rt), gfp_any());
if (skb) {
u32 seq = info->nlh ? info->nlh->nlmsg_seq : 0;
- if (rt6_fill_node(net, skb, rt,
+ if (rt6_fill_node(net, skb, rt, NULL,
NULL, NULL, 0, RTM_DELROUTE,
info->portid, seq, 0) < 0) {
kfree_skb(skb);
@@ -3081,8 +3220,8 @@
}
list_for_each_entry_safe(sibling, next_sibling,
- &rt->rt6i_siblings,
- rt6i_siblings) {
+ &rt->fib6_siblings,
+ fib6_siblings) {
err = fib6_del(sibling, info);
if (err)
goto out_unlock;
@@ -3093,7 +3232,7 @@
out_unlock:
spin_unlock_bh(&table->tb6_lock);
out_put:
- ip6_rt_put(rt);
+ fib6_info_release(rt);
if (skb) {
rtnl_notify(skb, net, info->portid, RTNLGRP_IPV6_ROUTE,
@@ -3102,11 +3241,28 @@
return err;
}
+static int ip6_del_cached_rt(struct rt6_info *rt, struct fib6_config *cfg)
+{
+ int rc = -ESRCH;
+
+ if (cfg->fc_ifindex && rt->dst.dev->ifindex != cfg->fc_ifindex)
+ goto out;
+
+ if (cfg->fc_flags & RTF_GATEWAY &&
+ !ipv6_addr_equal(&cfg->fc_gateway, &rt->rt6i_gateway))
+ goto out;
+ if (dst_hold_safe(&rt->dst))
+ rc = rt6_remove_exception_rt(rt);
+out:
+ return rc;
+}
+
static int ip6_route_del(struct fib6_config *cfg,
struct netlink_ext_ack *extack)
{
- struct rt6_info *rt, *rt_cache;
+ struct rt6_info *rt_cache;
struct fib6_table *table;
+ struct fib6_info *rt;
struct fib6_node *fn;
int err = -ESRCH;
@@ -3126,25 +3282,31 @@
if (fn) {
for_each_fib6_node_rt_rcu(fn) {
if (cfg->fc_flags & RTF_CACHE) {
+ int rc;
+
rt_cache = rt6_find_cached_rt(rt, &cfg->fc_dst,
&cfg->fc_src);
- if (!rt_cache)
- continue;
- rt = rt_cache;
+ if (rt_cache) {
+ rc = ip6_del_cached_rt(rt_cache, cfg);
+ if (rc != -ESRCH) {
+ rcu_read_unlock();
+ return rc;
+ }
+ }
+ continue;
}
if (cfg->fc_ifindex &&
- (!rt->dst.dev ||
- rt->dst.dev->ifindex != cfg->fc_ifindex))
+ (!rt->fib6_nh.nh_dev ||
+ rt->fib6_nh.nh_dev->ifindex != cfg->fc_ifindex))
continue;
if (cfg->fc_flags & RTF_GATEWAY &&
- !ipv6_addr_equal(&cfg->fc_gateway, &rt->rt6i_gateway))
+ !ipv6_addr_equal(&cfg->fc_gateway, &rt->fib6_nh.nh_gw))
continue;
- if (cfg->fc_metric && cfg->fc_metric != rt->rt6i_metric)
+ if (cfg->fc_metric && cfg->fc_metric != rt->fib6_metric)
continue;
- if (cfg->fc_protocol && cfg->fc_protocol != rt->rt6i_protocol)
+ if (cfg->fc_protocol && cfg->fc_protocol != rt->fib6_protocol)
continue;
- if (!dst_hold_safe(&rt->dst))
- break;
+ fib6_info_hold(rt);
rcu_read_unlock();
/* if gateway was specified only delete the one hop */
@@ -3166,6 +3328,7 @@
struct ndisc_options ndopts;
struct inet6_dev *in6_dev;
struct neighbour *neigh;
+ struct fib6_info *from;
struct rd_msg *msg;
int optlen, on_link;
u8 *lladdr;
@@ -3247,7 +3410,12 @@
NEIGH_UPDATE_F_ISROUTER)),
NDISC_REDIRECT, &ndopts);
- nrt = ip6_rt_cache_alloc(rt, &msg->dest, NULL);
+ rcu_read_lock();
+ from = rcu_dereference(rt->from);
+ fib6_info_hold(from);
+ rcu_read_unlock();
+
+ nrt = ip6_rt_cache_alloc(from, &msg->dest, NULL);
if (!nrt)
goto out;
@@ -3255,14 +3423,13 @@
if (on_link)
nrt->rt6i_flags &= ~RTF_GATEWAY;
- nrt->rt6i_protocol = RTPROT_REDIRECT;
nrt->rt6i_gateway = *(struct in6_addr *)neigh->primary_key;
/* No need to remove rt from the exception table if rt is
* a cached route because rt6_insert_exception() will
* takes care of it
*/
- if (rt6_insert_exception(nrt, rt)) {
+ if (rt6_insert_exception(nrt, from)) {
dst_release_immediate(&nrt->dst);
goto out;
}
@@ -3274,47 +3441,12 @@
call_netevent_notifiers(NETEVENT_REDIRECT, &netevent);
out:
+ fib6_info_release(from);
neigh_release(neigh);
}
-/*
- * Misc support functions
- */
-
-static void rt6_set_from(struct rt6_info *rt, struct rt6_info *from)
-{
- BUG_ON(from->from);
-
- rt->rt6i_flags &= ~RTF_EXPIRES;
- dst_hold(&from->dst);
- rt->from = from;
- dst_init_metrics(&rt->dst, dst_metrics_ptr(&from->dst), true);
-}
-
-static void ip6_rt_copy_init(struct rt6_info *rt, struct rt6_info *ort)
-{
- rt->dst.input = ort->dst.input;
- rt->dst.output = ort->dst.output;
- rt->rt6i_dst = ort->rt6i_dst;
- rt->dst.error = ort->dst.error;
- rt->rt6i_idev = ort->rt6i_idev;
- if (rt->rt6i_idev)
- in6_dev_hold(rt->rt6i_idev);
- rt->dst.lastuse = jiffies;
- rt->rt6i_gateway = ort->rt6i_gateway;
- rt->rt6i_flags = ort->rt6i_flags;
- rt6_set_from(rt, ort);
- rt->rt6i_metric = ort->rt6i_metric;
-#ifdef CONFIG_IPV6_SUBTREES
- rt->rt6i_src = ort->rt6i_src;
-#endif
- rt->rt6i_prefsrc = ort->rt6i_prefsrc;
- rt->rt6i_table = ort->rt6i_table;
- rt->dst.lwtstate = lwtstate_get(ort->dst.lwtstate);
-}
-
#ifdef CONFIG_IPV6_ROUTE_INFO
-static struct rt6_info *rt6_get_route_info(struct net *net,
+static struct fib6_info *rt6_get_route_info(struct net *net,
const struct in6_addr *prefix, int prefixlen,
const struct in6_addr *gwaddr,
struct net_device *dev)
@@ -3322,7 +3454,7 @@
u32 tb_id = l3mdev_fib_table(dev) ? : RT6_TABLE_INFO;
int ifindex = dev->ifindex;
struct fib6_node *fn;
- struct rt6_info *rt = NULL;
+ struct fib6_info *rt = NULL;
struct fib6_table *table;
table = fib6_get_table(net, tb_id);
@@ -3335,13 +3467,13 @@
goto out;
for_each_fib6_node_rt_rcu(fn) {
- if (rt->dst.dev->ifindex != ifindex)
+ if (rt->fib6_nh.nh_dev->ifindex != ifindex)
continue;
- if ((rt->rt6i_flags & (RTF_ROUTEINFO|RTF_GATEWAY)) != (RTF_ROUTEINFO|RTF_GATEWAY))
+ if ((rt->fib6_flags & (RTF_ROUTEINFO|RTF_GATEWAY)) != (RTF_ROUTEINFO|RTF_GATEWAY))
continue;
- if (!ipv6_addr_equal(&rt->rt6i_gateway, gwaddr))
+ if (!ipv6_addr_equal(&rt->fib6_nh.nh_gw, gwaddr))
continue;
- ip6_hold_safe(NULL, &rt, false);
+ fib6_info_hold(rt);
break;
}
out:
@@ -3349,7 +3481,7 @@
return rt;
}
-static struct rt6_info *rt6_add_route_info(struct net *net,
+static struct fib6_info *rt6_add_route_info(struct net *net,
const struct in6_addr *prefix, int prefixlen,
const struct in6_addr *gwaddr,
struct net_device *dev,
@@ -3362,6 +3494,7 @@
.fc_flags = RTF_GATEWAY | RTF_ADDRCONF | RTF_ROUTEINFO |
RTF_UP | RTF_PREF(pref),
.fc_protocol = RTPROT_RA,
+ .fc_type = RTN_UNICAST,
.fc_nlinfo.portid = 0,
.fc_nlinfo.nlh = NULL,
.fc_nlinfo.nl_net = net,
@@ -3375,36 +3508,39 @@
if (!prefixlen)
cfg.fc_flags |= RTF_DEFAULT;
- ip6_route_add(&cfg, NULL);
+ ip6_route_add(&cfg, GFP_ATOMIC, NULL);
return rt6_get_route_info(net, prefix, prefixlen, gwaddr, dev);
}
#endif
-struct rt6_info *rt6_get_dflt_router(const struct in6_addr *addr, struct net_device *dev)
+struct fib6_info *rt6_get_dflt_router(struct net *net,
+ const struct in6_addr *addr,
+ struct net_device *dev)
{
u32 tb_id = l3mdev_fib_table(dev) ? : RT6_TABLE_DFLT;
- struct rt6_info *rt;
+ struct fib6_info *rt;
struct fib6_table *table;
- table = fib6_get_table(dev_net(dev), tb_id);
+ table = fib6_get_table(net, tb_id);
if (!table)
return NULL;
rcu_read_lock();
for_each_fib6_node_rt_rcu(&table->tb6_root) {
- if (dev == rt->dst.dev &&
- ((rt->rt6i_flags & (RTF_ADDRCONF | RTF_DEFAULT)) == (RTF_ADDRCONF | RTF_DEFAULT)) &&
- ipv6_addr_equal(&rt->rt6i_gateway, addr))
+ if (dev == rt->fib6_nh.nh_dev &&
+ ((rt->fib6_flags & (RTF_ADDRCONF | RTF_DEFAULT)) == (RTF_ADDRCONF | RTF_DEFAULT)) &&
+ ipv6_addr_equal(&rt->fib6_nh.nh_gw, addr))
break;
}
if (rt)
- ip6_hold_safe(NULL, &rt, false);
+ fib6_info_hold(rt);
rcu_read_unlock();
return rt;
}
-struct rt6_info *rt6_add_dflt_router(const struct in6_addr *gwaddr,
+struct fib6_info *rt6_add_dflt_router(struct net *net,
+ const struct in6_addr *gwaddr,
struct net_device *dev,
unsigned int pref)
{
@@ -3415,14 +3551,15 @@
.fc_flags = RTF_GATEWAY | RTF_ADDRCONF | RTF_DEFAULT |
RTF_UP | RTF_EXPIRES | RTF_PREF(pref),
.fc_protocol = RTPROT_RA,
+ .fc_type = RTN_UNICAST,
.fc_nlinfo.portid = 0,
.fc_nlinfo.nlh = NULL,
- .fc_nlinfo.nl_net = dev_net(dev),
+ .fc_nlinfo.nl_net = net,
};
cfg.fc_gateway = *gwaddr;
- if (!ip6_route_add(&cfg, NULL)) {
+ if (!ip6_route_add(&cfg, GFP_ATOMIC, NULL)) {
struct fib6_table *table;
table = fib6_get_table(dev_net(dev), cfg.fc_table);
@@ -3430,24 +3567,25 @@
table->flags |= RT6_TABLE_HAS_DFLT_ROUTER;
}
- return rt6_get_dflt_router(gwaddr, dev);
+ return rt6_get_dflt_router(net, gwaddr, dev);
}
-static void __rt6_purge_dflt_routers(struct fib6_table *table)
+static void __rt6_purge_dflt_routers(struct net *net,
+ struct fib6_table *table)
{
- struct rt6_info *rt;
+ struct fib6_info *rt;
restart:
rcu_read_lock();
for_each_fib6_node_rt_rcu(&table->tb6_root) {
- if (rt->rt6i_flags & (RTF_DEFAULT | RTF_ADDRCONF) &&
- (!rt->rt6i_idev || rt->rt6i_idev->cnf.accept_ra != 2)) {
- if (dst_hold_safe(&rt->dst)) {
- rcu_read_unlock();
- ip6_del_rt(rt);
- } else {
- rcu_read_unlock();
- }
+ struct net_device *dev = fib6_info_nh_dev(rt);
+ struct inet6_dev *idev = dev ? __in6_dev_get(dev) : NULL;
+
+ if (rt->fib6_flags & (RTF_DEFAULT | RTF_ADDRCONF) &&
+ (!idev || idev->cnf.accept_ra != 2)) {
+ fib6_info_hold(rt);
+ rcu_read_unlock();
+ ip6_del_rt(net, rt);
goto restart;
}
}
@@ -3468,7 +3606,7 @@
head = &net->ipv6.fib_table_hash[h];
hlist_for_each_entry_rcu(table, head, tb6_hlist) {
if (table->flags & RT6_TABLE_HAS_DFLT_ROUTER)
- __rt6_purge_dflt_routers(table);
+ __rt6_purge_dflt_routers(net, table);
}
}
@@ -3489,6 +3627,7 @@
cfg->fc_dst_len = rtmsg->rtmsg_dst_len;
cfg->fc_src_len = rtmsg->rtmsg_src_len;
cfg->fc_flags = rtmsg->rtmsg_flags;
+ cfg->fc_type = rtmsg->rtmsg_type;
cfg->fc_nlinfo.nl_net = net;
@@ -3518,7 +3657,7 @@
rtnl_lock();
switch (cmd) {
case SIOCADDRT:
- err = ip6_route_add(&cfg, NULL);
+ err = ip6_route_add(&cfg, GFP_KERNEL, NULL);
break;
case SIOCDELRT:
err = ip6_route_del(&cfg, NULL);
@@ -3546,7 +3685,8 @@
case IPSTATS_MIB_INNOROUTES:
type = ipv6_addr_type(&ipv6_hdr(skb)->daddr);
if (type == IPV6_ADDR_ANY) {
- IP6_INC_STATS(dev_net(dst->dev), ip6_dst_idev(dst),
+ IP6_INC_STATS(dev_net(dst->dev),
+ __in6_dev_get_safely(skb->dev),
IPSTATS_MIB_INADDRERRORS);
break;
}
@@ -3587,40 +3727,40 @@
* Allocate a dst for local (unicast / anycast) address.
*/
-struct rt6_info *addrconf_dst_alloc(struct inet6_dev *idev,
- const struct in6_addr *addr,
- bool anycast)
+struct fib6_info *addrconf_f6i_alloc(struct net *net,
+ struct inet6_dev *idev,
+ const struct in6_addr *addr,
+ bool anycast, gfp_t gfp_flags)
{
u32 tb_id;
- struct net *net = dev_net(idev->dev);
struct net_device *dev = idev->dev;
- struct rt6_info *rt;
+ struct fib6_info *f6i;
- rt = ip6_dst_alloc(net, dev, DST_NOCOUNT);
- if (!rt)
+ f6i = fib6_info_alloc(gfp_flags);
+ if (!f6i)
return ERR_PTR(-ENOMEM);
- in6_dev_hold(idev);
+ f6i->dst_nocount = true;
+ f6i->dst_host = true;
+ f6i->fib6_protocol = RTPROT_KERNEL;
+ f6i->fib6_flags = RTF_UP | RTF_NONEXTHOP;
+ if (anycast) {
+ f6i->fib6_type = RTN_ANYCAST;
+ f6i->fib6_flags |= RTF_ANYCAST;
+ } else {
+ f6i->fib6_type = RTN_LOCAL;
+ f6i->fib6_flags |= RTF_LOCAL;
+ }
- rt->dst.flags |= DST_HOST;
- rt->dst.input = ip6_input;
- rt->dst.output = ip6_output;
- rt->rt6i_idev = idev;
-
- rt->rt6i_protocol = RTPROT_KERNEL;
- rt->rt6i_flags = RTF_UP | RTF_NONEXTHOP;
- if (anycast)
- rt->rt6i_flags |= RTF_ANYCAST;
- else
- rt->rt6i_flags |= RTF_LOCAL;
-
- rt->rt6i_gateway = *addr;
- rt->rt6i_dst.addr = *addr;
- rt->rt6i_dst.plen = 128;
+ f6i->fib6_nh.nh_gw = *addr;
+ dev_hold(dev);
+ f6i->fib6_nh.nh_dev = dev;
+ f6i->fib6_dst.addr = *addr;
+ f6i->fib6_dst.plen = 128;
tb_id = l3mdev_fib_table(idev->dev) ? : RT6_TABLE_LOCAL;
- rt->rt6i_table = fib6_get_table(net, tb_id);
+ f6i->fib6_table = fib6_get_table(net, tb_id);
- return rt;
+ return f6i;
}
/* remove deleted ip from prefsrc entries */
@@ -3630,18 +3770,18 @@
struct in6_addr *addr;
};
-static int fib6_remove_prefsrc(struct rt6_info *rt, void *arg)
+static int fib6_remove_prefsrc(struct fib6_info *rt, void *arg)
{
struct net_device *dev = ((struct arg_dev_net_ip *)arg)->dev;
struct net *net = ((struct arg_dev_net_ip *)arg)->net;
struct in6_addr *addr = ((struct arg_dev_net_ip *)arg)->addr;
- if (((void *)rt->dst.dev == dev || !dev) &&
- rt != net->ipv6.ip6_null_entry &&
- ipv6_addr_equal(addr, &rt->rt6i_prefsrc.addr)) {
+ if (((void *)rt->fib6_nh.nh_dev == dev || !dev) &&
+ rt != net->ipv6.fib6_null_entry &&
+ ipv6_addr_equal(addr, &rt->fib6_prefsrc.addr)) {
spin_lock_bh(&rt6_exception_lock);
/* remove prefsrc entry */
- rt->rt6i_prefsrc.plen = 0;
+ rt->fib6_prefsrc.plen = 0;
/* need to update cache as well */
rt6_exceptions_remove_prefsrc(rt);
spin_unlock_bh(&rt6_exception_lock);
@@ -3663,12 +3803,12 @@
#define RTF_RA_ROUTER (RTF_ADDRCONF | RTF_DEFAULT | RTF_GATEWAY)
/* Remove routers and update dst entries when gateway turn into host. */
-static int fib6_clean_tohost(struct rt6_info *rt, void *arg)
+static int fib6_clean_tohost(struct fib6_info *rt, void *arg)
{
struct in6_addr *gateway = (struct in6_addr *)arg;
- if (((rt->rt6i_flags & RTF_RA_ROUTER) == RTF_RA_ROUTER) &&
- ipv6_addr_equal(gateway, &rt->rt6i_gateway)) {
+ if (((rt->fib6_flags & RTF_RA_ROUTER) == RTF_RA_ROUTER) &&
+ ipv6_addr_equal(gateway, &rt->fib6_nh.nh_gw)) {
return -1;
}
@@ -3694,85 +3834,85 @@
};
};
-static struct rt6_info *rt6_multipath_first_sibling(const struct rt6_info *rt)
+static struct fib6_info *rt6_multipath_first_sibling(const struct fib6_info *rt)
{
- struct rt6_info *iter;
+ struct fib6_info *iter;
struct fib6_node *fn;
- fn = rcu_dereference_protected(rt->rt6i_node,
- lockdep_is_held(&rt->rt6i_table->tb6_lock));
+ fn = rcu_dereference_protected(rt->fib6_node,
+ lockdep_is_held(&rt->fib6_table->tb6_lock));
iter = rcu_dereference_protected(fn->leaf,
- lockdep_is_held(&rt->rt6i_table->tb6_lock));
+ lockdep_is_held(&rt->fib6_table->tb6_lock));
while (iter) {
- if (iter->rt6i_metric == rt->rt6i_metric &&
- rt6_qualify_for_ecmp(iter))
+ if (iter->fib6_metric == rt->fib6_metric &&
+ iter->fib6_nsiblings)
return iter;
- iter = rcu_dereference_protected(iter->rt6_next,
- lockdep_is_held(&rt->rt6i_table->tb6_lock));
+ iter = rcu_dereference_protected(iter->fib6_next,
+ lockdep_is_held(&rt->fib6_table->tb6_lock));
}
return NULL;
}
-static bool rt6_is_dead(const struct rt6_info *rt)
+static bool rt6_is_dead(const struct fib6_info *rt)
{
- if (rt->rt6i_nh_flags & RTNH_F_DEAD ||
- (rt->rt6i_nh_flags & RTNH_F_LINKDOWN &&
- rt->rt6i_idev->cnf.ignore_routes_with_linkdown))
+ if (rt->fib6_nh.nh_flags & RTNH_F_DEAD ||
+ (rt->fib6_nh.nh_flags & RTNH_F_LINKDOWN &&
+ fib6_ignore_linkdown(rt)))
return true;
return false;
}
-static int rt6_multipath_total_weight(const struct rt6_info *rt)
+static int rt6_multipath_total_weight(const struct fib6_info *rt)
{
- struct rt6_info *iter;
+ struct fib6_info *iter;
int total = 0;
if (!rt6_is_dead(rt))
- total += rt->rt6i_nh_weight;
+ total += rt->fib6_nh.nh_weight;
- list_for_each_entry(iter, &rt->rt6i_siblings, rt6i_siblings) {
+ list_for_each_entry(iter, &rt->fib6_siblings, fib6_siblings) {
if (!rt6_is_dead(iter))
- total += iter->rt6i_nh_weight;
+ total += iter->fib6_nh.nh_weight;
}
return total;
}
-static void rt6_upper_bound_set(struct rt6_info *rt, int *weight, int total)
+static void rt6_upper_bound_set(struct fib6_info *rt, int *weight, int total)
{
int upper_bound = -1;
if (!rt6_is_dead(rt)) {
- *weight += rt->rt6i_nh_weight;
+ *weight += rt->fib6_nh.nh_weight;
upper_bound = DIV_ROUND_CLOSEST_ULL((u64) (*weight) << 31,
total) - 1;
}
- atomic_set(&rt->rt6i_nh_upper_bound, upper_bound);
+ atomic_set(&rt->fib6_nh.nh_upper_bound, upper_bound);
}
-static void rt6_multipath_upper_bound_set(struct rt6_info *rt, int total)
+static void rt6_multipath_upper_bound_set(struct fib6_info *rt, int total)
{
- struct rt6_info *iter;
+ struct fib6_info *iter;
int weight = 0;
rt6_upper_bound_set(rt, &weight, total);
- list_for_each_entry(iter, &rt->rt6i_siblings, rt6i_siblings)
+ list_for_each_entry(iter, &rt->fib6_siblings, fib6_siblings)
rt6_upper_bound_set(iter, &weight, total);
}
-void rt6_multipath_rebalance(struct rt6_info *rt)
+void rt6_multipath_rebalance(struct fib6_info *rt)
{
- struct rt6_info *first;
+ struct fib6_info *first;
int total;
/* In case the entire multipath route was marked for flushing,
* then there is no need to rebalance upon the removal of every
* sibling route.
*/
- if (!rt->rt6i_nsiblings || rt->should_flush)
+ if (!rt->fib6_nsiblings || rt->should_flush)
return;
/* During lookup routes are evaluated in order, so we need to
@@ -3787,14 +3927,14 @@
rt6_multipath_upper_bound_set(first, total);
}
-static int fib6_ifup(struct rt6_info *rt, void *p_arg)
+static int fib6_ifup(struct fib6_info *rt, void *p_arg)
{
const struct arg_netdev_event *arg = p_arg;
- const struct net *net = dev_net(arg->dev);
+ struct net *net = dev_net(arg->dev);
- if (rt != net->ipv6.ip6_null_entry && rt->dst.dev == arg->dev) {
- rt->rt6i_nh_flags &= ~arg->nh_flags;
- fib6_update_sernum_upto_root(dev_net(rt->dst.dev), rt);
+ if (rt != net->ipv6.fib6_null_entry && rt->fib6_nh.nh_dev == arg->dev) {
+ rt->fib6_nh.nh_flags &= ~arg->nh_flags;
+ fib6_update_sernum_upto_root(net, rt);
rt6_multipath_rebalance(rt);
}
@@ -3816,95 +3956,96 @@
fib6_clean_all(dev_net(dev), fib6_ifup, &arg);
}
-static bool rt6_multipath_uses_dev(const struct rt6_info *rt,
+static bool rt6_multipath_uses_dev(const struct fib6_info *rt,
const struct net_device *dev)
{
- struct rt6_info *iter;
+ struct fib6_info *iter;
- if (rt->dst.dev == dev)
+ if (rt->fib6_nh.nh_dev == dev)
return true;
- list_for_each_entry(iter, &rt->rt6i_siblings, rt6i_siblings)
- if (iter->dst.dev == dev)
+ list_for_each_entry(iter, &rt->fib6_siblings, fib6_siblings)
+ if (iter->fib6_nh.nh_dev == dev)
return true;
return false;
}
-static void rt6_multipath_flush(struct rt6_info *rt)
+static void rt6_multipath_flush(struct fib6_info *rt)
{
- struct rt6_info *iter;
+ struct fib6_info *iter;
rt->should_flush = 1;
- list_for_each_entry(iter, &rt->rt6i_siblings, rt6i_siblings)
+ list_for_each_entry(iter, &rt->fib6_siblings, fib6_siblings)
iter->should_flush = 1;
}
-static unsigned int rt6_multipath_dead_count(const struct rt6_info *rt,
+static unsigned int rt6_multipath_dead_count(const struct fib6_info *rt,
const struct net_device *down_dev)
{
- struct rt6_info *iter;
+ struct fib6_info *iter;
unsigned int dead = 0;
- if (rt->dst.dev == down_dev || rt->rt6i_nh_flags & RTNH_F_DEAD)
+ if (rt->fib6_nh.nh_dev == down_dev ||
+ rt->fib6_nh.nh_flags & RTNH_F_DEAD)
dead++;
- list_for_each_entry(iter, &rt->rt6i_siblings, rt6i_siblings)
- if (iter->dst.dev == down_dev ||
- iter->rt6i_nh_flags & RTNH_F_DEAD)
+ list_for_each_entry(iter, &rt->fib6_siblings, fib6_siblings)
+ if (iter->fib6_nh.nh_dev == down_dev ||
+ iter->fib6_nh.nh_flags & RTNH_F_DEAD)
dead++;
return dead;
}
-static void rt6_multipath_nh_flags_set(struct rt6_info *rt,
+static void rt6_multipath_nh_flags_set(struct fib6_info *rt,
const struct net_device *dev,
unsigned int nh_flags)
{
- struct rt6_info *iter;
+ struct fib6_info *iter;
- if (rt->dst.dev == dev)
- rt->rt6i_nh_flags |= nh_flags;
- list_for_each_entry(iter, &rt->rt6i_siblings, rt6i_siblings)
- if (iter->dst.dev == dev)
- iter->rt6i_nh_flags |= nh_flags;
+ if (rt->fib6_nh.nh_dev == dev)
+ rt->fib6_nh.nh_flags |= nh_flags;
+ list_for_each_entry(iter, &rt->fib6_siblings, fib6_siblings)
+ if (iter->fib6_nh.nh_dev == dev)
+ iter->fib6_nh.nh_flags |= nh_flags;
}
/* called with write lock held for table with rt */
-static int fib6_ifdown(struct rt6_info *rt, void *p_arg)
+static int fib6_ifdown(struct fib6_info *rt, void *p_arg)
{
const struct arg_netdev_event *arg = p_arg;
const struct net_device *dev = arg->dev;
- const struct net *net = dev_net(dev);
+ struct net *net = dev_net(dev);
- if (rt == net->ipv6.ip6_null_entry)
+ if (rt == net->ipv6.fib6_null_entry)
return 0;
switch (arg->event) {
case NETDEV_UNREGISTER:
- return rt->dst.dev == dev ? -1 : 0;
+ return rt->fib6_nh.nh_dev == dev ? -1 : 0;
case NETDEV_DOWN:
if (rt->should_flush)
return -1;
- if (!rt->rt6i_nsiblings)
- return rt->dst.dev == dev ? -1 : 0;
+ if (!rt->fib6_nsiblings)
+ return rt->fib6_nh.nh_dev == dev ? -1 : 0;
if (rt6_multipath_uses_dev(rt, dev)) {
unsigned int count;
count = rt6_multipath_dead_count(rt, dev);
- if (rt->rt6i_nsiblings + 1 == count) {
+ if (rt->fib6_nsiblings + 1 == count) {
rt6_multipath_flush(rt);
return -1;
}
rt6_multipath_nh_flags_set(rt, dev, RTNH_F_DEAD |
RTNH_F_LINKDOWN);
- fib6_update_sernum(rt);
+ fib6_update_sernum(net, rt);
rt6_multipath_rebalance(rt);
}
return -2;
case NETDEV_CHANGE:
- if (rt->dst.dev != dev ||
- rt->rt6i_flags & (RTF_LOCAL | RTF_ANYCAST))
+ if (rt->fib6_nh.nh_dev != dev ||
+ rt->fib6_flags & (RTF_LOCAL | RTF_ANYCAST))
break;
- rt->rt6i_nh_flags |= RTNH_F_LINKDOWN;
+ rt->fib6_nh.nh_flags |= RTNH_F_LINKDOWN;
rt6_multipath_rebalance(rt);
break;
}
@@ -3936,7 +4077,7 @@
unsigned int mtu;
};
-static int rt6_mtu_change_route(struct rt6_info *rt, void *p_arg)
+static int rt6_mtu_change_route(struct fib6_info *rt, void *p_arg)
{
struct rt6_mtu_change_arg *arg = (struct rt6_mtu_change_arg *) p_arg;
struct inet6_dev *idev;
@@ -3956,12 +4097,15 @@
Since RFC 1981 doesn't include administrative MTU increase
update PMTU increase is a MUST. (i.e. jumbo frame)
*/
- if (rt->dst.dev == arg->dev &&
- !dst_metric_locked(&rt->dst, RTAX_MTU)) {
+ if (rt->fib6_nh.nh_dev == arg->dev &&
+ !fib6_metric_locked(rt, RTAX_MTU)) {
+ u32 mtu = rt->fib6_pmtu;
+
+ if (mtu >= arg->mtu ||
+ (mtu < arg->mtu && mtu == idev->cnf.mtu6))
+ fib6_metric_set(rt, RTAX_MTU, arg->mtu);
+
spin_lock_bh(&rt6_exception_lock);
- if (dst_metric_raw(&rt->dst, RTAX_MTU) &&
- rt6_mtu_change_route_allowed(idev, rt, arg->mtu))
- dst_metric_set(&rt->dst, RTAX_MTU, arg->mtu);
rt6_exceptions_update_pmtu(idev, rt, arg->mtu);
spin_unlock_bh(&rt6_exception_lock);
}
@@ -3993,6 +4137,9 @@
[RTA_UID] = { .type = NLA_U32 },
[RTA_MARK] = { .type = NLA_U32 },
[RTA_TABLE] = { .type = NLA_U32 },
+ [RTA_IP_PROTO] = { .type = NLA_U8 },
+ [RTA_SPORT] = { .type = NLA_U16 },
+ [RTA_DPORT] = { .type = NLA_U16 },
};
static int rtm_to_fib6_config(struct sk_buff *skb, struct nlmsghdr *nlh,
@@ -4122,9 +4269,8 @@
}
struct rt6_nh {
- struct rt6_info *rt6_info;
+ struct fib6_info *fib6_info;
struct fib6_config r_cfg;
- struct mx6_config mxc;
struct list_head next;
};
@@ -4139,23 +4285,25 @@
}
}
-static int ip6_route_info_append(struct list_head *rt6_nh_list,
- struct rt6_info *rt, struct fib6_config *r_cfg)
+static int ip6_route_info_append(struct net *net,
+ struct list_head *rt6_nh_list,
+ struct fib6_info *rt,
+ struct fib6_config *r_cfg)
{
struct rt6_nh *nh;
int err = -EEXIST;
list_for_each_entry(nh, rt6_nh_list, next) {
- /* check if rt6_info already exists */
- if (rt6_duplicate_nexthop(nh->rt6_info, rt))
+ /* check if fib6_info already exists */
+ if (rt6_duplicate_nexthop(nh->fib6_info, rt))
return err;
}
nh = kzalloc(sizeof(*nh), GFP_KERNEL);
if (!nh)
return -ENOMEM;
- nh->rt6_info = rt;
- err = ip6_convert_metrics(&nh->mxc, r_cfg);
+ nh->fib6_info = rt;
+ err = ip6_convert_metrics(net, rt, r_cfg);
if (err) {
kfree(nh);
return err;
@@ -4166,8 +4314,8 @@
return 0;
}
-static void ip6_route_mpath_notify(struct rt6_info *rt,
- struct rt6_info *rt_last,
+static void ip6_route_mpath_notify(struct fib6_info *rt,
+ struct fib6_info *rt_last,
struct nl_info *info,
__u16 nlflags)
{
@@ -4177,10 +4325,10 @@
* nexthop. Since sibling routes are always added at the end of
* the list, find the first sibling of the last route appended
*/
- if ((nlflags & NLM_F_APPEND) && rt_last && rt_last->rt6i_nsiblings) {
- rt = list_first_entry(&rt_last->rt6i_siblings,
- struct rt6_info,
- rt6i_siblings);
+ if ((nlflags & NLM_F_APPEND) && rt_last && rt_last->fib6_nsiblings) {
+ rt = list_first_entry(&rt_last->fib6_siblings,
+ struct fib6_info,
+ fib6_siblings);
}
if (rt)
@@ -4190,11 +4338,11 @@
static int ip6_route_multipath_add(struct fib6_config *cfg,
struct netlink_ext_ack *extack)
{
- struct rt6_info *rt_notif = NULL, *rt_last = NULL;
+ struct fib6_info *rt_notif = NULL, *rt_last = NULL;
struct nl_info *info = &cfg->fc_nlinfo;
struct fib6_config r_cfg;
struct rtnexthop *rtnh;
- struct rt6_info *rt;
+ struct fib6_info *rt;
struct rt6_nh *err_nh;
struct rt6_nh *nh, *nh_safe;
__u16 nlflags;
@@ -4214,7 +4362,7 @@
rtnh = (struct rtnexthop *)cfg->fc_mp;
/* Parse a Multipath Entry and build a list (rt6_nh_list) of
- * rt6_info structs per nexthop
+ * fib6_info structs per nexthop
*/
while (rtnh_ok(rtnh, remaining)) {
memcpy(&r_cfg, cfg, sizeof(*cfg));
@@ -4237,18 +4385,19 @@
}
r_cfg.fc_flags |= (rtnh->rtnh_flags & RTNH_F_ONLINK);
- rt = ip6_route_info_create(&r_cfg, extack);
+ rt = ip6_route_info_create(&r_cfg, GFP_KERNEL, extack);
if (IS_ERR(rt)) {
err = PTR_ERR(rt);
rt = NULL;
goto cleanup;
}
- rt->rt6i_nh_weight = rtnh->rtnh_hops + 1;
+ rt->fib6_nh.nh_weight = rtnh->rtnh_hops + 1;
- err = ip6_route_info_append(&rt6_nh_list, rt, &r_cfg);
+ err = ip6_route_info_append(info->nl_net, &rt6_nh_list,
+ rt, &r_cfg);
if (err) {
- dst_release_immediate(&rt->dst);
+ fib6_info_release(rt);
goto cleanup;
}
@@ -4263,14 +4412,16 @@
err_nh = NULL;
list_for_each_entry(nh, &rt6_nh_list, next) {
- rt_last = nh->rt6_info;
- err = __ip6_ins_rt(nh->rt6_info, info, &nh->mxc, extack);
+ rt_last = nh->fib6_info;
+ err = __ip6_ins_rt(nh->fib6_info, info, extack);
+ fib6_info_release(nh->fib6_info);
+
/* save reference to first route for notification */
if (!rt_notif && !err)
- rt_notif = nh->rt6_info;
+ rt_notif = nh->fib6_info;
- /* nh->rt6_info is used or freed at this point, reset to NULL*/
- nh->rt6_info = NULL;
+ /* nh->fib6_info is used or freed at this point, reset to NULL*/
+ nh->fib6_info = NULL;
if (err) {
if (replace && nhn)
ip6_print_replace_route_err(&rt6_nh_list);
@@ -4287,6 +4438,7 @@
*/
cfg->fc_nlinfo.nlh->nlmsg_flags &= ~(NLM_F_EXCL |
NLM_F_REPLACE);
+ cfg->fc_nlinfo.nlh->nlmsg_flags |= NLM_F_APPEND;
nhn++;
}
@@ -4311,9 +4463,8 @@
cleanup:
list_for_each_entry_safe(nh, nh_safe, &rt6_nh_list, next) {
- if (nh->rt6_info)
- dst_release_immediate(&nh->rt6_info->dst);
- kfree(nh->mxc.mx);
+ if (nh->fib6_info)
+ fib6_info_release(nh->fib6_info);
list_del(&nh->next);
kfree(nh);
}
@@ -4390,20 +4541,20 @@
if (cfg.fc_mp)
return ip6_route_multipath_add(&cfg, extack);
else
- return ip6_route_add(&cfg, extack);
+ return ip6_route_add(&cfg, GFP_KERNEL, extack);
}
-static size_t rt6_nlmsg_size(struct rt6_info *rt)
+static size_t rt6_nlmsg_size(struct fib6_info *rt)
{
int nexthop_len = 0;
- if (rt->rt6i_nsiblings) {
+ if (rt->fib6_nsiblings) {
nexthop_len = nla_total_size(0) /* RTA_MULTIPATH */
+ NLA_ALIGN(sizeof(struct rtnexthop))
+ nla_total_size(16) /* RTA_GATEWAY */
- + lwtunnel_get_encap_size(rt->dst.lwtstate);
+ + lwtunnel_get_encap_size(rt->fib6_nh.nh_lwtstate);
- nexthop_len *= rt->rt6i_nsiblings;
+ nexthop_len *= rt->fib6_nsiblings;
}
return NLMSG_ALIGN(sizeof(struct rtmsg))
@@ -4419,38 +4570,41 @@
+ nla_total_size(sizeof(struct rta_cacheinfo))
+ nla_total_size(TCP_CA_NAME_MAX) /* RTAX_CC_ALGO */
+ nla_total_size(1) /* RTA_PREF */
- + lwtunnel_get_encap_size(rt->dst.lwtstate)
+ + lwtunnel_get_encap_size(rt->fib6_nh.nh_lwtstate)
+ nexthop_len;
}
-static int rt6_nexthop_info(struct sk_buff *skb, struct rt6_info *rt,
+static int rt6_nexthop_info(struct sk_buff *skb, struct fib6_info *rt,
unsigned int *flags, bool skip_oif)
{
- if (rt->rt6i_nh_flags & RTNH_F_DEAD)
+ if (rt->fib6_nh.nh_flags & RTNH_F_DEAD)
*flags |= RTNH_F_DEAD;
- if (rt->rt6i_nh_flags & RTNH_F_LINKDOWN) {
+ if (rt->fib6_nh.nh_flags & RTNH_F_LINKDOWN) {
*flags |= RTNH_F_LINKDOWN;
- if (rt->rt6i_idev->cnf.ignore_routes_with_linkdown)
+
+ rcu_read_lock();
+ if (fib6_ignore_linkdown(rt))
*flags |= RTNH_F_DEAD;
+ rcu_read_unlock();
}
- if (rt->rt6i_flags & RTF_GATEWAY) {
- if (nla_put_in6_addr(skb, RTA_GATEWAY, &rt->rt6i_gateway) < 0)
+ if (rt->fib6_flags & RTF_GATEWAY) {
+ if (nla_put_in6_addr(skb, RTA_GATEWAY, &rt->fib6_nh.nh_gw) < 0)
goto nla_put_failure;
}
- *flags |= (rt->rt6i_nh_flags & RTNH_F_ONLINK);
- if (rt->rt6i_nh_flags & RTNH_F_OFFLOAD)
+ *flags |= (rt->fib6_nh.nh_flags & RTNH_F_ONLINK);
+ if (rt->fib6_nh.nh_flags & RTNH_F_OFFLOAD)
*flags |= RTNH_F_OFFLOAD;
/* not needed for multipath encoding b/c it has a rtnexthop struct */
- if (!skip_oif && rt->dst.dev &&
- nla_put_u32(skb, RTA_OIF, rt->dst.dev->ifindex))
+ if (!skip_oif && rt->fib6_nh.nh_dev &&
+ nla_put_u32(skb, RTA_OIF, rt->fib6_nh.nh_dev->ifindex))
goto nla_put_failure;
- if (rt->dst.lwtstate &&
- lwtunnel_fill_encap(skb, rt->dst.lwtstate) < 0)
+ if (rt->fib6_nh.nh_lwtstate &&
+ lwtunnel_fill_encap(skb, rt->fib6_nh.nh_lwtstate) < 0)
goto nla_put_failure;
return 0;
@@ -4460,8 +4614,9 @@
}
/* add multipath next hop */
-static int rt6_add_nexthop(struct sk_buff *skb, struct rt6_info *rt)
+static int rt6_add_nexthop(struct sk_buff *skb, struct fib6_info *rt)
{
+ const struct net_device *dev = rt->fib6_nh.nh_dev;
struct rtnexthop *rtnh;
unsigned int flags = 0;
@@ -4469,8 +4624,8 @@
if (!rtnh)
goto nla_put_failure;
- rtnh->rtnh_hops = rt->rt6i_nh_weight - 1;
- rtnh->rtnh_ifindex = rt->dst.dev ? rt->dst.dev->ifindex : 0;
+ rtnh->rtnh_hops = rt->fib6_nh.nh_weight - 1;
+ rtnh->rtnh_ifindex = dev ? dev->ifindex : 0;
if (rt6_nexthop_info(skb, rt, &flags, true) < 0)
goto nla_put_failure;
@@ -4486,16 +4641,16 @@
return -EMSGSIZE;
}
-static int rt6_fill_node(struct net *net,
- struct sk_buff *skb, struct rt6_info *rt,
- struct in6_addr *dst, struct in6_addr *src,
+static int rt6_fill_node(struct net *net, struct sk_buff *skb,
+ struct fib6_info *rt, struct dst_entry *dst,
+ struct in6_addr *dest, struct in6_addr *src,
int iif, int type, u32 portid, u32 seq,
unsigned int flags)
{
- u32 metrics[RTAX_MAX];
struct rtmsg *rtm;
struct nlmsghdr *nlh;
- long expires;
+ long expires = 0;
+ u32 *pmetrics;
u32 table;
nlh = nlmsg_put(skb, portid, seq, type, sizeof(*rtm), flags);
@@ -4504,53 +4659,31 @@
rtm = nlmsg_data(nlh);
rtm->rtm_family = AF_INET6;
- rtm->rtm_dst_len = rt->rt6i_dst.plen;
- rtm->rtm_src_len = rt->rt6i_src.plen;
+ rtm->rtm_dst_len = rt->fib6_dst.plen;
+ rtm->rtm_src_len = rt->fib6_src.plen;
rtm->rtm_tos = 0;
- if (rt->rt6i_table)
- table = rt->rt6i_table->tb6_id;
+ if (rt->fib6_table)
+ table = rt->fib6_table->tb6_id;
else
table = RT6_TABLE_UNSPEC;
rtm->rtm_table = table;
if (nla_put_u32(skb, RTA_TABLE, table))
goto nla_put_failure;
- if (rt->rt6i_flags & RTF_REJECT) {
- switch (rt->dst.error) {
- case -EINVAL:
- rtm->rtm_type = RTN_BLACKHOLE;
- break;
- case -EACCES:
- rtm->rtm_type = RTN_PROHIBIT;
- break;
- case -EAGAIN:
- rtm->rtm_type = RTN_THROW;
- break;
- default:
- rtm->rtm_type = RTN_UNREACHABLE;
- break;
- }
- }
- else if (rt->rt6i_flags & RTF_LOCAL)
- rtm->rtm_type = RTN_LOCAL;
- else if (rt->rt6i_flags & RTF_ANYCAST)
- rtm->rtm_type = RTN_ANYCAST;
- else if (rt->dst.dev && (rt->dst.dev->flags & IFF_LOOPBACK))
- rtm->rtm_type = RTN_LOCAL;
- else
- rtm->rtm_type = RTN_UNICAST;
+
+ rtm->rtm_type = rt->fib6_type;
rtm->rtm_flags = 0;
rtm->rtm_scope = RT_SCOPE_UNIVERSE;
- rtm->rtm_protocol = rt->rt6i_protocol;
+ rtm->rtm_protocol = rt->fib6_protocol;
- if (rt->rt6i_flags & RTF_CACHE)
+ if (rt->fib6_flags & RTF_CACHE)
rtm->rtm_flags |= RTM_F_CLONED;
- if (dst) {
- if (nla_put_in6_addr(skb, RTA_DST, dst))
+ if (dest) {
+ if (nla_put_in6_addr(skb, RTA_DST, dest))
goto nla_put_failure;
rtm->rtm_dst_len = 128;
} else if (rtm->rtm_dst_len)
- if (nla_put_in6_addr(skb, RTA_DST, &rt->rt6i_dst.addr))
+ if (nla_put_in6_addr(skb, RTA_DST, &rt->fib6_dst.addr))
goto nla_put_failure;
#ifdef CONFIG_IPV6_SUBTREES
if (src) {
@@ -4558,12 +4691,12 @@
goto nla_put_failure;
rtm->rtm_src_len = 128;
} else if (rtm->rtm_src_len &&
- nla_put_in6_addr(skb, RTA_SRC, &rt->rt6i_src.addr))
+ nla_put_in6_addr(skb, RTA_SRC, &rt->fib6_src.addr))
goto nla_put_failure;
#endif
if (iif) {
#ifdef CONFIG_IPV6_MROUTE
- if (ipv6_addr_is_multicast(&rt->rt6i_dst.addr)) {
+ if (ipv6_addr_is_multicast(&rt->fib6_dst.addr)) {
int err = ip6mr_get_route(net, skb, rtm, portid);
if (err == 0)
@@ -4574,34 +4707,32 @@
#endif
if (nla_put_u32(skb, RTA_IIF, iif))
goto nla_put_failure;
- } else if (dst) {
+ } else if (dest) {
struct in6_addr saddr_buf;
- if (ip6_route_get_saddr(net, rt, dst, 0, &saddr_buf) == 0 &&
+ if (ip6_route_get_saddr(net, rt, dest, 0, &saddr_buf) == 0 &&
nla_put_in6_addr(skb, RTA_PREFSRC, &saddr_buf))
goto nla_put_failure;
}
- if (rt->rt6i_prefsrc.plen) {
+ if (rt->fib6_prefsrc.plen) {
struct in6_addr saddr_buf;
- saddr_buf = rt->rt6i_prefsrc.addr;
+ saddr_buf = rt->fib6_prefsrc.addr;
if (nla_put_in6_addr(skb, RTA_PREFSRC, &saddr_buf))
goto nla_put_failure;
}
- memcpy(metrics, dst_metrics_ptr(&rt->dst), sizeof(metrics));
- if (rt->rt6i_pmtu)
- metrics[RTAX_MTU - 1] = rt->rt6i_pmtu;
- if (rtnetlink_put_metrics(skb, metrics) < 0)
+ pmetrics = dst ? dst_metrics_ptr(dst) : rt->fib6_metrics->metrics;
+ if (rtnetlink_put_metrics(skb, pmetrics) < 0)
goto nla_put_failure;
- if (nla_put_u32(skb, RTA_PRIORITY, rt->rt6i_metric))
+ if (nla_put_u32(skb, RTA_PRIORITY, rt->fib6_metric))
goto nla_put_failure;
/* For multipath routes, walk the siblings list and add
* each as a nexthop within RTA_MULTIPATH.
*/
- if (rt->rt6i_nsiblings) {
- struct rt6_info *sibling, *next_sibling;
+ if (rt->fib6_nsiblings) {
+ struct fib6_info *sibling, *next_sibling;
struct nlattr *mp;
mp = nla_nest_start(skb, RTA_MULTIPATH);
@@ -4612,7 +4743,7 @@
goto nla_put_failure;
list_for_each_entry_safe(sibling, next_sibling,
- &rt->rt6i_siblings, rt6i_siblings) {
+ &rt->fib6_siblings, fib6_siblings) {
if (rt6_add_nexthop(skb, sibling) < 0)
goto nla_put_failure;
}
@@ -4623,12 +4754,15 @@
goto nla_put_failure;
}
- expires = (rt->rt6i_flags & RTF_EXPIRES) ? rt->dst.expires - jiffies : 0;
+ if (rt->fib6_flags & RTF_EXPIRES) {
+ expires = dst ? dst->expires : rt->expires;
+ expires -= jiffies;
+ }
- if (rtnl_put_cacheinfo(skb, &rt->dst, 0, expires, rt->dst.error) < 0)
+ if (rtnl_put_cacheinfo(skb, dst, 0, expires, dst ? dst->error : 0) < 0)
goto nla_put_failure;
- if (nla_put_u8(skb, RTA_PREF, IPV6_EXTRACT_PREF(rt->rt6i_flags)))
+ if (nla_put_u8(skb, RTA_PREF, IPV6_EXTRACT_PREF(rt->fib6_flags)))
goto nla_put_failure;
@@ -4640,12 +4774,12 @@
return -EMSGSIZE;
}
-int rt6_dump_route(struct rt6_info *rt, void *p_arg)
+int rt6_dump_route(struct fib6_info *rt, void *p_arg)
{
struct rt6_rtnl_dump_arg *arg = (struct rt6_rtnl_dump_arg *) p_arg;
struct net *net = arg->net;
- if (rt == net->ipv6.ip6_null_entry)
+ if (rt == net->ipv6.fib6_null_entry)
return 0;
if (nlmsg_len(arg->cb->nlh) >= sizeof(struct rtmsg)) {
@@ -4653,16 +4787,15 @@
/* user wants prefix routes only */
if (rtm->rtm_flags & RTM_F_PREFIX &&
- !(rt->rt6i_flags & RTF_PREFIX_RT)) {
+ !(rt->fib6_flags & RTF_PREFIX_RT)) {
/* success since this is not a prefix route */
return 1;
}
}
- return rt6_fill_node(net,
- arg->skb, rt, NULL, NULL, 0, RTM_NEWROUTE,
- NETLINK_CB(arg->cb->skb).portid, arg->cb->nlh->nlmsg_seq,
- NLM_F_MULTI);
+ return rt6_fill_node(net, arg->skb, rt, NULL, NULL, NULL, 0,
+ RTM_NEWROUTE, NETLINK_CB(arg->cb->skb).portid,
+ arg->cb->nlh->nlmsg_seq, NLM_F_MULTI);
}
static int inet6_rtm_getroute(struct sk_buff *in_skb, struct nlmsghdr *nlh,
@@ -4671,6 +4804,7 @@
struct net *net = sock_net(in_skb->sk);
struct nlattr *tb[RTA_MAX+1];
int err, iif = 0, oif = 0;
+ struct fib6_info *from;
struct dst_entry *dst;
struct rt6_info *rt;
struct sk_buff *skb;
@@ -4718,6 +4852,19 @@
else
fl6.flowi6_uid = iif ? INVALID_UID : current_uid();
+ if (tb[RTA_SPORT])
+ fl6.fl6_sport = nla_get_be16(tb[RTA_SPORT]);
+
+ if (tb[RTA_DPORT])
+ fl6.fl6_dport = nla_get_be16(tb[RTA_DPORT]);
+
+ if (tb[RTA_IP_PROTO]) {
+ err = rtm_getroute_parse_ip_proto(tb[RTA_IP_PROTO],
+ &fl6.flowi6_proto, extack);
+ if (err)
+ goto errout;
+ }
+
if (iif) {
struct net_device *dev;
int flags = 0;
@@ -4759,14 +4906,6 @@
goto errout;
}
- if (fibmatch && rt->from) {
- struct rt6_info *ort = rt->from;
-
- dst_hold(&ort->dst);
- ip6_rt_put(rt);
- rt = ort;
- }
-
skb = alloc_skb(NLMSG_GOODSIZE, GFP_KERNEL);
if (!skb) {
ip6_rt_put(rt);
@@ -4775,14 +4914,21 @@
}
skb_dst_set(skb, &rt->dst);
+
+ rcu_read_lock();
+ from = rcu_dereference(rt->from);
+
if (fibmatch)
- err = rt6_fill_node(net, skb, rt, NULL, NULL, iif,
+ err = rt6_fill_node(net, skb, from, NULL, NULL, NULL, iif,
RTM_NEWROUTE, NETLINK_CB(in_skb).portid,
nlh->nlmsg_seq, 0);
else
- err = rt6_fill_node(net, skb, rt, &fl6.daddr, &fl6.saddr, iif,
- RTM_NEWROUTE, NETLINK_CB(in_skb).portid,
- nlh->nlmsg_seq, 0);
+ err = rt6_fill_node(net, skb, from, dst, &fl6.daddr,
+ &fl6.saddr, iif, RTM_NEWROUTE,
+ NETLINK_CB(in_skb).portid, nlh->nlmsg_seq,
+ 0);
+ rcu_read_unlock();
+
if (err < 0) {
kfree_skb(skb);
goto errout;
@@ -4793,7 +4939,7 @@
return err;
}
-void inet6_rt_notify(int event, struct rt6_info *rt, struct nl_info *info,
+void inet6_rt_notify(int event, struct fib6_info *rt, struct nl_info *info,
unsigned int nlm_flags)
{
struct sk_buff *skb;
@@ -4808,8 +4954,8 @@
if (!skb)
goto errout;
- err = rt6_fill_node(net, skb, rt, NULL, NULL, 0,
- event, info->portid, seq, nlm_flags);
+ err = rt6_fill_node(net, skb, rt, NULL, NULL, NULL, 0,
+ event, info->portid, seq, nlm_flags);
if (err < 0) {
/* -EMSGSIZE implies BUG in rt6_nlmsg_size() */
WARN_ON(err == -EMSGSIZE);
@@ -4834,6 +4980,7 @@
return NOTIFY_OK;
if (event == NETDEV_REGISTER) {
+ net->ipv6.fib6_null_entry->fib6_nh.nh_dev = dev;
net->ipv6.ip6_null_entry->dst.dev = dev;
net->ipv6.ip6_null_entry->rt6i_idev = in6_dev_get(dev);
#ifdef CONFIG_IPV6_MULTIPLE_TABLES
@@ -5030,11 +5177,17 @@
if (dst_entries_init(&net->ipv6.ip6_dst_ops) < 0)
goto out_ip6_dst_ops;
+ net->ipv6.fib6_null_entry = kmemdup(&fib6_null_entry_template,
+ sizeof(*net->ipv6.fib6_null_entry),
+ GFP_KERNEL);
+ if (!net->ipv6.fib6_null_entry)
+ goto out_ip6_dst_entries;
+
net->ipv6.ip6_null_entry = kmemdup(&ip6_null_entry_template,
sizeof(*net->ipv6.ip6_null_entry),
GFP_KERNEL);
if (!net->ipv6.ip6_null_entry)
- goto out_ip6_dst_entries;
+ goto out_fib6_null_entry;
net->ipv6.ip6_null_entry->dst.ops = &net->ipv6.ip6_dst_ops;
dst_init_metrics(&net->ipv6.ip6_null_entry->dst,
ip6_template_metrics, true);
@@ -5081,6 +5234,8 @@
out_ip6_null_entry:
kfree(net->ipv6.ip6_null_entry);
#endif
+out_fib6_null_entry:
+ kfree(net->ipv6.fib6_null_entry);
out_ip6_dst_entries:
dst_entries_destroy(&net->ipv6.ip6_dst_ops);
out_ip6_dst_ops:
@@ -5089,6 +5244,7 @@
static void __net_exit ip6_route_net_exit(struct net *net)
{
+ kfree(net->ipv6.fib6_null_entry);
kfree(net->ipv6.ip6_null_entry);
#ifdef CONFIG_IPV6_MULTIPLE_TABLES
kfree(net->ipv6.ip6_prohibit_entry);
@@ -5159,6 +5315,7 @@
/* Registering of the loopback is done before this portion of code,
* the loopback reference in rt6_info will not be taken, do it
* manually for init_net */
+ init_net.ipv6.fib6_null_entry->fib6_nh.nh_dev = init_net.loopback_dev;
init_net.ipv6.ip6_null_entry->dst.dev = init_net.loopback_dev;
init_net.ipv6.ip6_null_entry->rt6i_idev = in6_dev_get(init_net.loopback_dev);
#ifdef CONFIG_IPV6_MULTIPLE_TABLES
diff --git a/net/ipv6/seg6_iptunnel.c b/net/ipv6/seg6_iptunnel.c
index 5fe1394..eab39bd 100644
--- a/net/ipv6/seg6_iptunnel.c
+++ b/net/ipv6/seg6_iptunnel.c
@@ -91,6 +91,24 @@
rcu_read_unlock();
}
+/* Compute flowlabel for outer IPv6 header */
+static __be32 seg6_make_flowlabel(struct net *net, struct sk_buff *skb,
+ struct ipv6hdr *inner_hdr)
+{
+ int do_flowlabel = net->ipv6.sysctl.seg6_flowlabel;
+ __be32 flowlabel = 0;
+ u32 hash;
+
+ if (do_flowlabel > 0) {
+ hash = skb_get_hash(skb);
+ rol32(hash, 16);
+ flowlabel = (__force __be32)hash & IPV6_FLOWLABEL_MASK;
+ } else if (!do_flowlabel && skb->protocol == htons(ETH_P_IPV6)) {
+ flowlabel = ip6_flowlabel(inner_hdr);
+ }
+ return flowlabel;
+}
+
/* encapsulate an IPv6 packet within an outer IPv6 header with a given SRH */
int seg6_do_srh_encap(struct sk_buff *skb, struct ipv6_sr_hdr *osrh, int proto)
{
@@ -99,6 +117,7 @@
struct ipv6hdr *hdr, *inner_hdr;
struct ipv6_sr_hdr *isrh;
int hdrlen, tot_len, err;
+ __be32 flowlabel;
hdrlen = (osrh->hdrlen + 1) << 3;
tot_len = hdrlen + sizeof(*hdr);
@@ -108,6 +127,7 @@
return err;
inner_hdr = ipv6_hdr(skb);
+ flowlabel = seg6_make_flowlabel(net, skb, inner_hdr);
skb_push(skb, tot_len);
skb_reset_network_header(skb);
@@ -121,10 +141,10 @@
if (skb->protocol == htons(ETH_P_IPV6)) {
ip6_flow_hdr(hdr, ip6_tclass(ip6_flowinfo(inner_hdr)),
- ip6_flowlabel(inner_hdr));
+ flowlabel);
hdr->hop_limit = inner_hdr->hop_limit;
} else {
- ip6_flow_hdr(hdr, 0, 0);
+ ip6_flow_hdr(hdr, 0, flowlabel);
hdr->hop_limit = ip6_dst_hoplimit(skb_dst(skb));
}
diff --git a/net/ipv6/seg6_local.c b/net/ipv6/seg6_local.c
index 4572232..cd6e4ca 100644
--- a/net/ipv6/seg6_local.c
+++ b/net/ipv6/seg6_local.c
@@ -1,8 +1,9 @@
/*
* SR-IPv6 implementation
*
- * Author:
+ * Authors:
* David Lebrun <david.lebrun@uclouvain.be>
+ * eBPF support: Mathieu Xhonneux <m.xhonneux@gmail.com>
*
*
* This program is free software; you can redistribute it and/or
@@ -30,7 +31,9 @@
#ifdef CONFIG_IPV6_SEG6_HMAC
#include <net/seg6_hmac.h>
#endif
+#include <net/seg6_local.h>
#include <linux/etherdevice.h>
+#include <linux/bpf.h>
struct seg6_local_lwt;
@@ -41,6 +44,11 @@
int static_headroom;
};
+struct bpf_lwt_prog {
+ struct bpf_prog *prog;
+ char *name;
+};
+
struct seg6_local_lwt {
int action;
struct ipv6_sr_hdr *srh;
@@ -49,6 +57,7 @@
struct in6_addr nh6;
int iif;
int oif;
+ struct bpf_lwt_prog bpf;
int headroom;
struct seg6_action_desc *desc;
@@ -140,8 +149,8 @@
*daddr = *addr;
}
-static void lookup_nexthop(struct sk_buff *skb, struct in6_addr *nhaddr,
- u32 tbl_id)
+int seg6_lookup_nexthop(struct sk_buff *skb, struct in6_addr *nhaddr,
+ u32 tbl_id)
{
struct net *net = dev_net(skb->dev);
struct ipv6hdr *hdr = ipv6_hdr(skb);
@@ -187,6 +196,7 @@
skb_dst_drop(skb);
skb_dst_set(skb, dst);
+ return dst->error;
}
/* regular endpoint function */
@@ -200,7 +210,7 @@
advance_nextseg(srh, &ipv6_hdr(skb)->daddr);
- lookup_nexthop(skb, NULL, 0);
+ seg6_lookup_nexthop(skb, NULL, 0);
return dst_input(skb);
@@ -220,7 +230,7 @@
advance_nextseg(srh, &ipv6_hdr(skb)->daddr);
- lookup_nexthop(skb, &slwt->nh6, 0);
+ seg6_lookup_nexthop(skb, &slwt->nh6, 0);
return dst_input(skb);
@@ -239,7 +249,7 @@
advance_nextseg(srh, &ipv6_hdr(skb)->daddr);
- lookup_nexthop(skb, NULL, slwt->table);
+ seg6_lookup_nexthop(skb, NULL, slwt->table);
return dst_input(skb);
@@ -331,7 +341,7 @@
if (!ipv6_addr_any(&slwt->nh6))
nhaddr = &slwt->nh6;
- lookup_nexthop(skb, nhaddr, 0);
+ seg6_lookup_nexthop(skb, nhaddr, 0);
return dst_input(skb);
drop:
@@ -380,7 +390,7 @@
if (!pskb_may_pull(skb, sizeof(struct ipv6hdr)))
goto drop;
- lookup_nexthop(skb, NULL, slwt->table);
+ seg6_lookup_nexthop(skb, NULL, slwt->table);
return dst_input(skb);
@@ -406,7 +416,7 @@
ipv6_hdr(skb)->payload_len = htons(skb->len - sizeof(struct ipv6hdr));
skb_set_transport_header(skb, sizeof(struct ipv6hdr));
- lookup_nexthop(skb, NULL, 0);
+ seg6_lookup_nexthop(skb, NULL, 0);
return dst_input(skb);
@@ -438,7 +448,7 @@
ipv6_hdr(skb)->payload_len = htons(skb->len - sizeof(struct ipv6hdr));
skb_set_transport_header(skb, sizeof(struct ipv6hdr));
- lookup_nexthop(skb, NULL, 0);
+ seg6_lookup_nexthop(skb, NULL, 0);
return dst_input(skb);
@@ -447,6 +457,71 @@
return err;
}
+DEFINE_PER_CPU(struct seg6_bpf_srh_state, seg6_bpf_srh_states);
+
+static int input_action_end_bpf(struct sk_buff *skb,
+ struct seg6_local_lwt *slwt)
+{
+ struct seg6_bpf_srh_state *srh_state =
+ this_cpu_ptr(&seg6_bpf_srh_states);
+ struct seg6_bpf_srh_state local_srh_state;
+ struct ipv6_sr_hdr *srh;
+ int srhoff = 0;
+ int ret;
+
+ srh = get_and_validate_srh(skb);
+ if (!srh)
+ goto drop;
+ advance_nextseg(srh, &ipv6_hdr(skb)->daddr);
+
+ /* preempt_disable is needed to protect the per-CPU buffer srh_state,
+ * which is also accessed by the bpf_lwt_seg6_* helpers
+ */
+ preempt_disable();
+ srh_state->hdrlen = srh->hdrlen << 3;
+ srh_state->valid = 1;
+
+ rcu_read_lock();
+ bpf_compute_data_pointers(skb);
+ ret = bpf_prog_run_save_cb(slwt->bpf.prog, skb);
+ rcu_read_unlock();
+
+ local_srh_state = *srh_state;
+ preempt_enable();
+
+ switch (ret) {
+ case BPF_OK:
+ case BPF_REDIRECT:
+ break;
+ case BPF_DROP:
+ goto drop;
+ default:
+ pr_warn_once("bpf-seg6local: Illegal return value %u\n", ret);
+ goto drop;
+ }
+
+ if (unlikely((local_srh_state.hdrlen & 7) != 0))
+ goto drop;
+
+ if (ipv6_find_hdr(skb, &srhoff, IPPROTO_ROUTING, NULL, NULL) < 0)
+ goto drop;
+ srh = (struct ipv6_sr_hdr *)(skb->data + srhoff);
+ srh->hdrlen = (u8)(local_srh_state.hdrlen >> 3);
+
+ if (!local_srh_state.valid &&
+ unlikely(!seg6_validate_srh(srh, (srh->hdrlen + 1) << 3)))
+ goto drop;
+
+ if (ret != BPF_REDIRECT)
+ seg6_lookup_nexthop(skb, NULL, 0);
+
+ return dst_input(skb);
+
+drop:
+ kfree_skb(skb);
+ return -EINVAL;
+}
+
static struct seg6_action_desc seg6_action_table[] = {
{
.action = SEG6_LOCAL_ACTION_END,
@@ -493,7 +568,13 @@
.attrs = (1 << SEG6_LOCAL_SRH),
.input = input_action_end_b6_encap,
.static_headroom = sizeof(struct ipv6hdr),
- }
+ },
+ {
+ .action = SEG6_LOCAL_ACTION_END_BPF,
+ .attrs = (1 << SEG6_LOCAL_BPF),
+ .input = input_action_end_bpf,
+ },
+
};
static struct seg6_action_desc *__get_action_desc(int action)
@@ -538,6 +619,7 @@
.len = sizeof(struct in6_addr) },
[SEG6_LOCAL_IIF] = { .type = NLA_U32 },
[SEG6_LOCAL_OIF] = { .type = NLA_U32 },
+ [SEG6_LOCAL_BPF] = { .type = NLA_NESTED },
};
static int parse_nla_srh(struct nlattr **attrs, struct seg6_local_lwt *slwt)
@@ -715,6 +797,75 @@
return 0;
}
+#define MAX_PROG_NAME 256
+static const struct nla_policy bpf_prog_policy[SEG6_LOCAL_BPF_PROG_MAX + 1] = {
+ [SEG6_LOCAL_BPF_PROG] = { .type = NLA_U32, },
+ [SEG6_LOCAL_BPF_PROG_NAME] = { .type = NLA_NUL_STRING,
+ .len = MAX_PROG_NAME },
+};
+
+static int parse_nla_bpf(struct nlattr **attrs, struct seg6_local_lwt *slwt)
+{
+ struct nlattr *tb[SEG6_LOCAL_BPF_PROG_MAX + 1];
+ struct bpf_prog *p;
+ int ret;
+ u32 fd;
+
+ ret = nla_parse_nested(tb, SEG6_LOCAL_BPF_PROG_MAX,
+ attrs[SEG6_LOCAL_BPF], bpf_prog_policy, NULL);
+ if (ret < 0)
+ return ret;
+
+ if (!tb[SEG6_LOCAL_BPF_PROG] || !tb[SEG6_LOCAL_BPF_PROG_NAME])
+ return -EINVAL;
+
+ slwt->bpf.name = nla_memdup(tb[SEG6_LOCAL_BPF_PROG_NAME], GFP_KERNEL);
+ if (!slwt->bpf.name)
+ return -ENOMEM;
+
+ fd = nla_get_u32(tb[SEG6_LOCAL_BPF_PROG]);
+ p = bpf_prog_get_type(fd, BPF_PROG_TYPE_LWT_SEG6LOCAL);
+ if (IS_ERR(p)) {
+ kfree(slwt->bpf.name);
+ return PTR_ERR(p);
+ }
+
+ slwt->bpf.prog = p;
+ return 0;
+}
+
+static int put_nla_bpf(struct sk_buff *skb, struct seg6_local_lwt *slwt)
+{
+ struct nlattr *nest;
+
+ if (!slwt->bpf.prog)
+ return 0;
+
+ nest = nla_nest_start(skb, SEG6_LOCAL_BPF);
+ if (!nest)
+ return -EMSGSIZE;
+
+ if (nla_put_u32(skb, SEG6_LOCAL_BPF_PROG, slwt->bpf.prog->aux->id))
+ return -EMSGSIZE;
+
+ if (slwt->bpf.name &&
+ nla_put_string(skb, SEG6_LOCAL_BPF_PROG_NAME, slwt->bpf.name))
+ return -EMSGSIZE;
+
+ return nla_nest_end(skb, nest);
+}
+
+static int cmp_nla_bpf(struct seg6_local_lwt *a, struct seg6_local_lwt *b)
+{
+ if (!a->bpf.name && !b->bpf.name)
+ return 0;
+
+ if (!a->bpf.name || !b->bpf.name)
+ return 1;
+
+ return strcmp(a->bpf.name, b->bpf.name);
+}
+
struct seg6_action_param {
int (*parse)(struct nlattr **attrs, struct seg6_local_lwt *slwt);
int (*put)(struct sk_buff *skb, struct seg6_local_lwt *slwt);
@@ -745,6 +896,11 @@
[SEG6_LOCAL_OIF] = { .parse = parse_nla_oif,
.put = put_nla_oif,
.cmp = cmp_nla_oif },
+
+ [SEG6_LOCAL_BPF] = { .parse = parse_nla_bpf,
+ .put = put_nla_bpf,
+ .cmp = cmp_nla_bpf },
+
};
static int parse_nla_action(struct nlattr **attrs, struct seg6_local_lwt *slwt)
@@ -830,6 +986,13 @@
struct seg6_local_lwt *slwt = seg6_local_lwtunnel(lwt);
kfree(slwt->srh);
+
+ if (slwt->desc->attrs & (1 << SEG6_LOCAL_BPF)) {
+ kfree(slwt->bpf.name);
+ bpf_prog_put(slwt->bpf.prog);
+ }
+
+ return;
}
static int seg6_local_fill_encap(struct sk_buff *skb,
@@ -882,6 +1045,11 @@
if (attrs & (1 << SEG6_LOCAL_OIF))
nlsize += nla_total_size(4);
+ if (attrs & (1 << SEG6_LOCAL_BPF))
+ nlsize += nla_total_size(sizeof(struct nlattr)) +
+ nla_total_size(MAX_PROG_NAME) +
+ nla_total_size(4);
+
return nlsize;
}
diff --git a/net/ipv6/sysctl_net_ipv6.c b/net/ipv6/sysctl_net_ipv6.c
index 6fbdef6..e15cd37 100644
--- a/net/ipv6/sysctl_net_ipv6.c
+++ b/net/ipv6/sysctl_net_ipv6.c
@@ -152,6 +152,13 @@
.extra1 = &zero,
.extra2 = &one,
},
+ {
+ .procname = "seg6_flowlabel",
+ .data = &init_net.ipv6.sysctl.seg6_flowlabel,
+ .maxlen = sizeof(int),
+ .mode = 0644,
+ .proc_handler = proc_dointvec
+ },
{ }
};
@@ -217,6 +224,7 @@
ipv6_table[12].data = &net->ipv6.sysctl.max_dst_opts_len;
ipv6_table[13].data = &net->ipv6.sysctl.max_hbh_opts_len;
ipv6_table[14].data = &net->ipv6.sysctl.multipath_hash_policy,
+ ipv6_table[15].data = &net->ipv6.sysctl.seg6_flowlabel;
ipv6_route_table = ipv6_route_sysctl_init(net);
if (!ipv6_route_table)
diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
index 6d664d8..7d47c2b 100644
--- a/net/ipv6/tcp_ipv6.c
+++ b/net/ipv6/tcp_ipv6.c
@@ -803,6 +803,7 @@
unsigned int tot_len = sizeof(struct tcphdr);
struct dst_entry *dst;
__be32 *topt;
+ __u32 mark = 0;
if (tsecr)
tot_len += TCPOLEN_TSTAMP_ALIGNED;
@@ -871,7 +872,10 @@
fl6.flowi6_oif = oif;
}
- fl6.flowi6_mark = IP6_REPLY_MARK(net, skb->mark);
+ if (sk)
+ mark = (sk->sk_state == TCP_TIME_WAIT) ?
+ inet_twsk(sk)->tw_mark : sk->sk_mark;
+ fl6.flowi6_mark = IP6_REPLY_MARK(net, skb->mark) ?: mark;
fl6.fl6_dport = t1->dest;
fl6.fl6_sport = t1->source;
fl6.flowi6_uid = sock_net_uid(net, sk && sk_fullsock(sk) ? sk : NULL);
diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index ea07300..426c9d2 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -546,10 +546,10 @@
__udp6_lib_err(skb, opt, type, code, offset, info, &udp_table);
}
-static struct static_key udpv6_encap_needed __read_mostly;
+static DEFINE_STATIC_KEY_FALSE(udpv6_encap_needed_key);
void udpv6_encap_enable(void)
{
- static_key_enable(&udpv6_encap_needed);
+ static_branch_enable(&udpv6_encap_needed_key);
}
EXPORT_SYMBOL(udpv6_encap_enable);
@@ -561,7 +561,7 @@
if (!xfrm6_policy_check(sk, XFRM_POLICY_IN, skb))
goto drop;
- if (static_key_false(&udpv6_encap_needed) && up->encap_type) {
+ if (static_branch_unlikely(&udpv6_encap_needed_key) && up->encap_type) {
int (*encap_rcv)(struct sock *sk, struct sk_buff *skb);
/*
@@ -1023,7 +1023,8 @@
* Sending
*/
-static int udp_v6_send_skb(struct sk_buff *skb, struct flowi6 *fl6)
+static int udp_v6_send_skb(struct sk_buff *skb, struct flowi6 *fl6,
+ struct inet_cork *cork)
{
struct sock *sk = skb->sk;
struct udphdr *uh;
@@ -1042,12 +1043,32 @@
uh->len = htons(len);
uh->check = 0;
+ if (cork->gso_size) {
+ const int hlen = skb_network_header_len(skb) +
+ sizeof(struct udphdr);
+
+ if (hlen + cork->gso_size > cork->fragsize)
+ return -EINVAL;
+ if (skb->len > cork->gso_size * UDP_MAX_SEGMENTS)
+ return -EINVAL;
+ if (udp_sk(sk)->no_check6_tx)
+ return -EINVAL;
+ if (skb->ip_summed != CHECKSUM_PARTIAL || is_udplite ||
+ dst_xfrm(skb_dst(skb)))
+ return -EIO;
+
+ skb_shinfo(skb)->gso_size = cork->gso_size;
+ skb_shinfo(skb)->gso_type = SKB_GSO_UDP_L4;
+ goto csum_partial;
+ }
+
if (is_udplite)
csum = udplite_csum(skb);
else if (udp_sk(sk)->no_check6_tx) { /* UDP csum disabled */
skb->ip_summed = CHECKSUM_NONE;
goto send;
} else if (skb->ip_summed == CHECKSUM_PARTIAL) { /* UDP hardware csum */
+csum_partial:
udp6_hwcsum_outgoing(sk, skb, &fl6->saddr, &fl6->daddr, len);
goto send;
} else
@@ -1093,7 +1114,7 @@
if (!skb)
goto out;
- err = udp_v6_send_skb(skb, &fl6);
+ err = udp_v6_send_skb(skb, &fl6, &inet_sk(sk)->cork.base);
out:
up->len = 0;
@@ -1127,6 +1148,7 @@
ipc6.hlimit = -1;
ipc6.tclass = -1;
ipc6.dontfrag = -1;
+ ipc6.gso_size = up->gso_size;
sockc.tsflags = sk->sk_tsflags;
/* destination address check */
@@ -1259,7 +1281,10 @@
opt->tot_len = sizeof(*opt);
ipc6.opt = opt;
- err = ip6_datagram_send_ctl(sock_net(sk), sk, msg, &fl6, &ipc6, &sockc);
+ err = udp_cmsg_send(sk, msg, &ipc6.gso_size);
+ if (err > 0)
+ err = ip6_datagram_send_ctl(sock_net(sk), sk, msg, &fl6,
+ &ipc6, &sockc);
if (err < 0) {
fl6_sock_release(flowlabel);
return err;
@@ -1324,15 +1349,16 @@
/* Lockless fast path for the non-corking case */
if (!corkreq) {
+ struct inet_cork_full cork;
struct sk_buff *skb;
skb = ip6_make_skb(sk, getfrag, msg, ulen,
sizeof(struct udphdr), &ipc6,
&fl6, (struct rt6_info *)dst,
- msg->msg_flags, &sockc);
+ msg->msg_flags, &cork, &sockc);
err = PTR_ERR(skb);
if (!IS_ERR_OR_NULL(skb))
- err = udp_v6_send_skb(skb, &fl6);
+ err = udp_v6_send_skb(skb, &fl6, &cork.base);
goto out;
}
@@ -1402,7 +1428,7 @@
udp_v6_flush_pending_frames(sk);
release_sock(sk);
- if (static_key_false(&udpv6_encap_needed) && up->encap_type) {
+ if (static_branch_unlikely(&udpv6_encap_needed_key) && up->encap_type) {
void (*encap_destroy)(struct sock *sk);
encap_destroy = READ_ONCE(up->encap_destroy);
if (encap_destroy)
diff --git a/net/ipv6/udp_offload.c b/net/ipv6/udp_offload.c
index 2a04dc9..03a2ff3 100644
--- a/net/ipv6/udp_offload.c
+++ b/net/ipv6/udp_offload.c
@@ -42,12 +42,15 @@
const struct ipv6hdr *ipv6h;
struct udphdr *uh;
- if (!(skb_shinfo(skb)->gso_type & SKB_GSO_UDP))
+ if (!(skb_shinfo(skb)->gso_type & (SKB_GSO_UDP | SKB_GSO_UDP_L4)))
goto out;
if (!pskb_may_pull(skb, sizeof(struct udphdr)))
goto out;
+ if (skb_shinfo(skb)->gso_type & SKB_GSO_UDP_L4)
+ return __udp_gso_segment(skb, features);
+
/* Do software UFO. Complete and fill in the UDP checksum as HW cannot
* do checksum of UDP packets sent as multiple IP fragments.
*/
diff --git a/net/ipv6/xfrm6_policy.c b/net/ipv6/xfrm6_policy.c
index 416fe67..2cff209 100644
--- a/net/ipv6/xfrm6_policy.c
+++ b/net/ipv6/xfrm6_policy.c
@@ -107,8 +107,6 @@
* it was magically lost, so this code needs audit */
xdst->u.rt6.rt6i_flags = rt->rt6i_flags & (RTF_ANYCAST |
RTF_LOCAL);
- xdst->u.rt6.rt6i_metric = rt->rt6i_metric;
- xdst->u.rt6.rt6i_node = rt->rt6i_node;
xdst->route_cookie = rt6_get_cookie(rt);
xdst->u.rt6.rt6i_gateway = rt->rt6i_gateway;
xdst->u.rt6.rt6i_dst = rt->rt6i_dst;
diff --git a/net/ipv6/xfrm6_state.c b/net/ipv6/xfrm6_state.c
index 16f4347..5bdca3d 100644
--- a/net/ipv6/xfrm6_state.c
+++ b/net/ipv6/xfrm6_state.c
@@ -60,11 +60,9 @@
static int
__xfrm6_sort(void **dst, void **src, int n, int (*cmp)(void *p), int maxclass)
{
- int i;
+ int count[XFRM_MAX_DEPTH] = { };
int class[XFRM_MAX_DEPTH];
- int count[maxclass];
-
- memset(count, 0, sizeof(count));
+ int i;
for (i = 0; i < n; i++) {
int c;
diff --git a/net/l2tp/l2tp_debugfs.c b/net/l2tp/l2tp_debugfs.c
index 7f1e842..e87686f 100644
--- a/net/l2tp/l2tp_debugfs.c
+++ b/net/l2tp/l2tp_debugfs.c
@@ -57,6 +57,10 @@
static void l2tp_dfs_next_session(struct l2tp_dfs_seq_data *pd)
{
+ /* Drop reference taken during previous invocation */
+ if (pd->session)
+ l2tp_session_dec_refcount(pd->session);
+
pd->session = l2tp_session_get_nth(pd->tunnel, pd->session_idx);
pd->session_idx++;
@@ -105,11 +109,16 @@
if (!pd || pd == SEQ_START_TOKEN)
return;
- /* Drop reference taken by last invocation of l2tp_dfs_next_tunnel() */
+ /* Drop reference taken by last invocation of l2tp_dfs_next_session()
+ * or l2tp_dfs_next_tunnel().
+ */
+ if (pd->session) {
+ l2tp_session_dec_refcount(pd->session);
+ pd->session = NULL;
+ }
if (pd->tunnel) {
l2tp_tunnel_dec_refcount(pd->tunnel);
pd->tunnel = NULL;
- pd->session = NULL;
}
}
@@ -250,13 +259,10 @@
goto out;
}
- /* Show the tunnel or session context */
- if (!pd->session) {
+ if (!pd->session)
l2tp_dfs_seq_tunnel_show(m, pd->tunnel);
- } else {
+ else
l2tp_dfs_seq_session_show(m, pd->session);
- l2tp_session_dec_refcount(pd->session);
- }
out:
return 0;
diff --git a/net/l2tp/l2tp_ppp.c b/net/l2tp/l2tp_ppp.c
index 1fd9e14..f951c76 100644
--- a/net/l2tp/l2tp_ppp.c
+++ b/net/l2tp/l2tp_ppp.c
@@ -1576,6 +1576,10 @@
static void pppol2tp_next_session(struct net *net, struct pppol2tp_seq_data *pd)
{
+ /* Drop reference taken during previous invocation */
+ if (pd->session)
+ l2tp_session_dec_refcount(pd->session);
+
pd->session = l2tp_session_get_nth(pd->tunnel, pd->session_idx);
pd->session_idx++;
@@ -1624,11 +1628,16 @@
if (!pd || pd == SEQ_START_TOKEN)
return;
- /* Drop reference taken by last invocation of pppol2tp_next_tunnel() */
+ /* Drop reference taken by last invocation of pppol2tp_next_session()
+ * or pppol2tp_next_tunnel().
+ */
+ if (pd->session) {
+ l2tp_session_dec_refcount(pd->session);
+ pd->session = NULL;
+ }
if (pd->tunnel) {
l2tp_tunnel_dec_refcount(pd->tunnel);
pd->tunnel = NULL;
- pd->session = NULL;
}
}
@@ -1723,14 +1732,10 @@
goto out;
}
- /* Show the tunnel or session context.
- */
- if (!pd->session) {
+ if (!pd->session)
pppol2tp_seq_tunnel_show(m, pd->tunnel);
- } else {
+ else
pppol2tp_seq_session_show(m, pd->session);
- l2tp_session_dec_refcount(pd->session);
- }
out:
return 0;
diff --git a/net/mac80211/cfg.c b/net/mac80211/cfg.c
index 85dbaa8..bdf6fa7 100644
--- a/net/mac80211/cfg.c
+++ b/net/mac80211/cfg.c
@@ -695,7 +695,7 @@
if (sta) {
ret = 0;
memcpy(mac, sta->sta.addr, ETH_ALEN);
- sta_set_sinfo(sta, sinfo);
+ sta_set_sinfo(sta, sinfo, true);
}
mutex_unlock(&local->sta_mtx);
@@ -724,7 +724,7 @@
sta = sta_info_get_bss(sdata, mac);
if (sta) {
ret = 0;
- sta_set_sinfo(sta, sinfo);
+ sta_set_sinfo(sta, sinfo, true);
}
mutex_unlock(&local->sta_mtx);
@@ -2376,6 +2376,11 @@
(WIPHY_PARAM_RETRY_SHORT | WIPHY_PARAM_RETRY_LONG))
ieee80211_hw_config(local, IEEE80211_CONF_CHANGE_RETRY_LIMITS);
+ if (changed & (WIPHY_PARAM_TXQ_LIMIT |
+ WIPHY_PARAM_TXQ_MEMORY_LIMIT |
+ WIPHY_PARAM_TXQ_QUANTUM))
+ ieee80211_txq_set_params(local);
+
return 0;
}
@@ -3705,6 +3710,99 @@
return 0;
}
+void ieee80211_fill_txq_stats(struct cfg80211_txq_stats *txqstats,
+ struct txq_info *txqi)
+{
+ if (!(txqstats->filled & BIT(NL80211_TXQ_STATS_BACKLOG_BYTES))) {
+ txqstats->filled |= BIT(NL80211_TXQ_STATS_BACKLOG_BYTES);
+ txqstats->backlog_bytes = txqi->tin.backlog_bytes;
+ }
+
+ if (!(txqstats->filled & BIT(NL80211_TXQ_STATS_BACKLOG_PACKETS))) {
+ txqstats->filled |= BIT(NL80211_TXQ_STATS_BACKLOG_PACKETS);
+ txqstats->backlog_packets = txqi->tin.backlog_packets;
+ }
+
+ if (!(txqstats->filled & BIT(NL80211_TXQ_STATS_FLOWS))) {
+ txqstats->filled |= BIT(NL80211_TXQ_STATS_FLOWS);
+ txqstats->flows = txqi->tin.flows;
+ }
+
+ if (!(txqstats->filled & BIT(NL80211_TXQ_STATS_DROPS))) {
+ txqstats->filled |= BIT(NL80211_TXQ_STATS_DROPS);
+ txqstats->drops = txqi->cstats.drop_count;
+ }
+
+ if (!(txqstats->filled & BIT(NL80211_TXQ_STATS_ECN_MARKS))) {
+ txqstats->filled |= BIT(NL80211_TXQ_STATS_ECN_MARKS);
+ txqstats->ecn_marks = txqi->cstats.ecn_mark;
+ }
+
+ if (!(txqstats->filled & BIT(NL80211_TXQ_STATS_OVERLIMIT))) {
+ txqstats->filled |= BIT(NL80211_TXQ_STATS_OVERLIMIT);
+ txqstats->overlimit = txqi->tin.overlimit;
+ }
+
+ if (!(txqstats->filled & BIT(NL80211_TXQ_STATS_COLLISIONS))) {
+ txqstats->filled |= BIT(NL80211_TXQ_STATS_COLLISIONS);
+ txqstats->collisions = txqi->tin.collisions;
+ }
+
+ if (!(txqstats->filled & BIT(NL80211_TXQ_STATS_TX_BYTES))) {
+ txqstats->filled |= BIT(NL80211_TXQ_STATS_TX_BYTES);
+ txqstats->tx_bytes = txqi->tin.tx_bytes;
+ }
+
+ if (!(txqstats->filled & BIT(NL80211_TXQ_STATS_TX_PACKETS))) {
+ txqstats->filled |= BIT(NL80211_TXQ_STATS_TX_PACKETS);
+ txqstats->tx_packets = txqi->tin.tx_packets;
+ }
+}
+
+static int ieee80211_get_txq_stats(struct wiphy *wiphy,
+ struct wireless_dev *wdev,
+ struct cfg80211_txq_stats *txqstats)
+{
+ struct ieee80211_local *local = wiphy_priv(wiphy);
+ struct ieee80211_sub_if_data *sdata;
+ int ret = 0;
+
+ if (!local->ops->wake_tx_queue)
+ return 1;
+
+ spin_lock_bh(&local->fq.lock);
+ rcu_read_lock();
+
+ if (wdev) {
+ sdata = IEEE80211_WDEV_TO_SUB_IF(wdev);
+ if (!sdata->vif.txq) {
+ ret = 1;
+ goto out;
+ }
+ ieee80211_fill_txq_stats(txqstats, to_txq_info(sdata->vif.txq));
+ } else {
+ /* phy stats */
+ txqstats->filled |= BIT(NL80211_TXQ_STATS_BACKLOG_PACKETS) |
+ BIT(NL80211_TXQ_STATS_BACKLOG_BYTES) |
+ BIT(NL80211_TXQ_STATS_OVERLIMIT) |
+ BIT(NL80211_TXQ_STATS_OVERMEMORY) |
+ BIT(NL80211_TXQ_STATS_COLLISIONS) |
+ BIT(NL80211_TXQ_STATS_MAX_FLOWS);
+ txqstats->backlog_packets = local->fq.backlog;
+ txqstats->backlog_bytes = local->fq.memory_usage;
+ txqstats->overlimit = local->fq.overlimit;
+ txqstats->overmemory = local->fq.overmemory;
+ txqstats->collisions = local->fq.collisions;
+ txqstats->max_flows = local->fq.flows_cnt;
+ }
+
+out:
+ rcu_read_unlock();
+ spin_unlock_bh(&local->fq.lock);
+
+ return ret;
+}
+
const struct cfg80211_ops mac80211_config_ops = {
.add_virtual_intf = ieee80211_add_iface,
.del_virtual_intf = ieee80211_del_iface,
@@ -3798,4 +3896,5 @@
.del_nan_func = ieee80211_del_nan_func,
.set_multicast_to_unicast = ieee80211_set_multicast_to_unicast,
.tx_control_port = ieee80211_tx_control_port,
+ .get_txq_stats = ieee80211_get_txq_stats,
};
diff --git a/net/mac80211/driver-ops.h b/net/mac80211/driver-ops.h
index 4d82fe7..8f69980 100644
--- a/net/mac80211/driver-ops.h
+++ b/net/mac80211/driver-ops.h
@@ -2,6 +2,7 @@
/*
* Portions of this file
* Copyright(c) 2016 Intel Deutschland GmbH
+* Copyright (C) 2018 Intel Corporation
*/
#ifndef __MAC80211_DRIVER_OPS
@@ -813,7 +814,8 @@
}
static inline void drv_mgd_prepare_tx(struct ieee80211_local *local,
- struct ieee80211_sub_if_data *sdata)
+ struct ieee80211_sub_if_data *sdata,
+ u16 duration)
{
might_sleep();
@@ -821,9 +823,9 @@
return;
WARN_ON_ONCE(sdata->vif.type != NL80211_IFTYPE_STATION);
- trace_drv_mgd_prepare_tx(local, sdata);
+ trace_drv_mgd_prepare_tx(local, sdata, duration);
if (local->ops->mgd_prepare_tx)
- local->ops->mgd_prepare_tx(&local->hw, &sdata->vif);
+ local->ops->mgd_prepare_tx(&local->hw, &sdata->vif, duration);
trace_drv_return_void(local);
}
diff --git a/net/mac80211/ethtool.c b/net/mac80211/ethtool.c
index 9cc986d..690c142 100644
--- a/net/mac80211/ethtool.c
+++ b/net/mac80211/ethtool.c
@@ -4,6 +4,7 @@
* Copied from cfg.c - originally
* Copyright 2006-2010 Johannes Berg <johannes@sipsolutions.net>
* Copyright 2014 Intel Corporation (Author: Johannes Berg)
+ * Copyright (C) 2018 Intel Corporation
*
* This file is GPLv2 as found in COPYING.
*/
@@ -106,8 +107,8 @@
if (!(sta && !WARN_ON(sta->sdata->dev != dev)))
goto do_survey;
- sinfo.filled = 0;
- sta_set_sinfo(sta, &sinfo);
+ memset(&sinfo, 0, sizeof(sinfo));
+ sta_set_sinfo(sta, &sinfo, false);
i = 0;
ADD_STA_STATS(sta);
@@ -116,11 +117,11 @@
if (sinfo.filled & BIT(NL80211_STA_INFO_TX_BITRATE))
- data[i] = 100000 *
+ data[i] = 100000ULL *
cfg80211_calculate_bitrate(&sinfo.txrate);
i++;
if (sinfo.filled & BIT(NL80211_STA_INFO_RX_BITRATE))
- data[i] = 100000 *
+ data[i] = 100000ULL *
cfg80211_calculate_bitrate(&sinfo.rxrate);
i++;
@@ -133,8 +134,8 @@
if (sta->sdata->dev != dev)
continue;
- sinfo.filled = 0;
- sta_set_sinfo(sta, &sinfo);
+ memset(&sinfo, 0, sizeof(sinfo));
+ sta_set_sinfo(sta, &sinfo, false);
i = 0;
ADD_STA_STATS(sta);
}
diff --git a/net/mac80211/ht.c b/net/mac80211/ht.c
index c78036a..26a7ba3 100644
--- a/net/mac80211/ht.c
+++ b/net/mac80211/ht.c
@@ -301,26 +301,27 @@
___ieee80211_stop_tx_ba_session(sta, i, reason);
mutex_unlock(&sta->ampdu_mlme.mtx);
- /* stopping might queue the work again - so cancel only afterwards */
- cancel_work_sync(&sta->ampdu_mlme.work);
-
/*
* In case the tear down is part of a reconfigure due to HW restart
* request, it is possible that the low level driver requested to stop
* the BA session, so handle it to properly clean tid_tx data.
*/
- mutex_lock(&sta->ampdu_mlme.mtx);
- for (i = 0; i < IEEE80211_NUM_TIDS; i++) {
- struct tid_ampdu_tx *tid_tx =
- rcu_dereference_protected_tid_tx(sta, i);
+ if(reason == AGG_STOP_DESTROY_STA) {
+ cancel_work_sync(&sta->ampdu_mlme.work);
- if (!tid_tx)
- continue;
+ mutex_lock(&sta->ampdu_mlme.mtx);
+ for (i = 0; i < IEEE80211_NUM_TIDS; i++) {
+ struct tid_ampdu_tx *tid_tx =
+ rcu_dereference_protected_tid_tx(sta, i);
- if (test_and_clear_bit(HT_AGG_STATE_STOP_CB, &tid_tx->state))
- ieee80211_stop_tx_ba_cb(sta, i, tid_tx);
+ if (!tid_tx)
+ continue;
+
+ if (test_and_clear_bit(HT_AGG_STATE_STOP_CB, &tid_tx->state))
+ ieee80211_stop_tx_ba_cb(sta, i, tid_tx);
+ }
+ mutex_unlock(&sta->ampdu_mlme.mtx);
}
- mutex_unlock(&sta->ampdu_mlme.mtx);
}
void ieee80211_ba_session_work(struct work_struct *work)
@@ -328,16 +329,11 @@
struct sta_info *sta =
container_of(work, struct sta_info, ampdu_mlme.work);
struct tid_ampdu_tx *tid_tx;
+ bool blocked;
int tid;
- /*
- * When this flag is set, new sessions should be
- * blocked, and existing sessions will be torn
- * down by the code that set the flag, so this
- * need not run.
- */
- if (test_sta_flag(sta, WLAN_STA_BLOCK_BA))
- return;
+ /* When this flag is set, new sessions should be blocked. */
+ blocked = test_sta_flag(sta, WLAN_STA_BLOCK_BA);
mutex_lock(&sta->ampdu_mlme.mtx);
for (tid = 0; tid < IEEE80211_NUM_TIDS; tid++) {
@@ -352,7 +348,8 @@
sta, tid, WLAN_BACK_RECIPIENT,
WLAN_REASON_UNSPECIFIED, true);
- if (test_and_clear_bit(tid,
+ if (!blocked &&
+ test_and_clear_bit(tid,
sta->ampdu_mlme.tid_rx_manage_offl))
___ieee80211_start_rx_ba_session(sta, 0, 0, 0, 1, tid,
IEEE80211_MAX_AMPDU_BUF,
@@ -367,7 +364,7 @@
spin_lock_bh(&sta->lock);
tid_tx = sta->ampdu_mlme.tid_start_tx[tid];
- if (tid_tx) {
+ if (!blocked && tid_tx) {
/*
* Assign it over to the normal tid_tx array
* where it "goes live".
@@ -390,7 +387,8 @@
if (!tid_tx)
continue;
- if (test_and_clear_bit(HT_AGG_STATE_START_CB, &tid_tx->state))
+ if (!blocked &&
+ test_and_clear_bit(HT_AGG_STATE_START_CB, &tid_tx->state))
ieee80211_start_tx_ba_cb(sta, tid, tid_tx);
if (test_and_clear_bit(HT_AGG_STATE_WANT_STOP, &tid_tx->state))
___ieee80211_stop_tx_ba_session(sta, tid,
diff --git a/net/mac80211/ieee80211_i.h b/net/mac80211/ieee80211_i.h
index 6372dbd..d1978aa 100644
--- a/net/mac80211/ieee80211_i.h
+++ b/net/mac80211/ieee80211_i.h
@@ -2012,6 +2012,7 @@
}
int ieee80211_txq_setup_flows(struct ieee80211_local *local);
+void ieee80211_txq_set_params(struct ieee80211_local *local);
void ieee80211_txq_teardown_flows(struct ieee80211_local *local);
void ieee80211_txq_init(struct ieee80211_sub_if_data *sdata,
struct sta_info *sta,
@@ -2020,6 +2021,8 @@
struct txq_info *txqi);
void ieee80211_txq_remove_vlan(struct ieee80211_local *local,
struct ieee80211_sub_if_data *sdata);
+void ieee80211_fill_txq_stats(struct cfg80211_txq_stats *txqstats,
+ struct txq_info *txqi);
void ieee80211_send_auth(struct ieee80211_sub_if_data *sdata,
u16 transaction, u16 auth_alg, u16 status,
const u8 *extra, size_t extra_len, const u8 *bssid,
diff --git a/net/mac80211/main.c b/net/mac80211/main.c
index 9ea17af..4d2e797 100644
--- a/net/mac80211/main.c
+++ b/net/mac80211/main.c
@@ -565,6 +565,9 @@
if (!ops->set_key)
wiphy->flags |= WIPHY_FLAG_IBSS_RSN;
+ if (ops->wake_tx_queue)
+ wiphy_ext_feature_set(wiphy, NL80211_EXT_FEATURE_TXQS);
+
wiphy_ext_feature_set(wiphy, NL80211_EXT_FEATURE_RRM);
wiphy->bss_priv_size = sizeof(struct ieee80211_bss);
diff --git a/net/mac80211/mlme.c b/net/mac80211/mlme.c
index 2330687..a59187c 100644
--- a/net/mac80211/mlme.c
+++ b/net/mac80211/mlme.c
@@ -864,7 +864,7 @@
return;
}
- drv_mgd_prepare_tx(local, sdata);
+ drv_mgd_prepare_tx(local, sdata, 0);
IEEE80211_SKB_CB(skb)->flags |= IEEE80211_TX_INTFL_DONT_ENCRYPT;
if (ieee80211_hw_check(&local->hw, REPORTS_TX_ACK_STATUS))
@@ -2022,7 +2022,7 @@
*/
if (ieee80211_hw_check(&local->hw, DEAUTH_NEED_MGD_TX_PREP) &&
!ifmgd->have_beacon)
- drv_mgd_prepare_tx(sdata->local, sdata);
+ drv_mgd_prepare_tx(sdata->local, sdata, 0);
ieee80211_send_deauth_disassoc(sdata, ifmgd->bssid, stype,
reason, tx, frame_buf);
@@ -2560,7 +2560,7 @@
if (!elems.challenge)
return;
auth_data->expected_transaction = 4;
- drv_mgd_prepare_tx(sdata->local, sdata);
+ drv_mgd_prepare_tx(sdata->local, sdata, 0);
if (ieee80211_hw_check(&local->hw, REPORTS_TX_ACK_STATUS))
tx_flags = IEEE80211_TX_CTL_REQ_TX_STATUS |
IEEE80211_TX_INTFL_MLME_CONN_TX;
@@ -3769,6 +3769,7 @@
u32 tx_flags = 0;
u16 trans = 1;
u16 status = 0;
+ u16 prepare_tx_duration = 0;
sdata_assert_lock(sdata);
@@ -3790,7 +3791,11 @@
return -ETIMEDOUT;
}
- drv_mgd_prepare_tx(local, sdata);
+ if (auth_data->algorithm == WLAN_AUTH_SAE)
+ prepare_tx_duration =
+ jiffies_to_msecs(IEEE80211_AUTH_TIMEOUT_SAE);
+
+ drv_mgd_prepare_tx(local, sdata, prepare_tx_duration);
sdata_info(sdata, "send auth to %pM (try %d/%d)\n",
auth_data->bss->bssid, auth_data->tries,
@@ -4994,7 +4999,7 @@
req->bssid, req->reason_code,
ieee80211_get_reason_code_string(req->reason_code));
- drv_mgd_prepare_tx(sdata->local, sdata);
+ drv_mgd_prepare_tx(sdata->local, sdata, 0);
ieee80211_send_deauth_disassoc(sdata, req->bssid,
IEEE80211_STYPE_DEAUTH,
req->reason_code, tx,
@@ -5014,7 +5019,7 @@
req->bssid, req->reason_code,
ieee80211_get_reason_code_string(req->reason_code));
- drv_mgd_prepare_tx(sdata->local, sdata);
+ drv_mgd_prepare_tx(sdata->local, sdata, 0);
ieee80211_send_deauth_disassoc(sdata, req->bssid,
IEEE80211_STYPE_DEAUTH,
req->reason_code, tx,
diff --git a/net/mac80211/rx.c b/net/mac80211/rx.c
index 03102af..0a38cc1 100644
--- a/net/mac80211/rx.c
+++ b/net/mac80211/rx.c
@@ -5,6 +5,7 @@
* Copyright 2007-2010 Johannes Berg <johannes@sipsolutions.net>
* Copyright 2013-2014 Intel Mobile Communications GmbH
* Copyright(c) 2015 - 2017 Intel Deutschland GmbH
+ * Copyright (C) 2018 Intel Corporation
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License version 2 as
@@ -97,27 +98,27 @@
*/
static void remove_monitor_info(struct sk_buff *skb,
unsigned int present_fcs_len,
- unsigned int rtap_vendor_space)
+ unsigned int rtap_space)
{
if (present_fcs_len)
__pskb_trim(skb, skb->len - present_fcs_len);
- __pskb_pull(skb, rtap_vendor_space);
+ __pskb_pull(skb, rtap_space);
}
static inline bool should_drop_frame(struct sk_buff *skb, int present_fcs_len,
- unsigned int rtap_vendor_space)
+ unsigned int rtap_space)
{
struct ieee80211_rx_status *status = IEEE80211_SKB_RXCB(skb);
struct ieee80211_hdr *hdr;
- hdr = (void *)(skb->data + rtap_vendor_space);
+ hdr = (void *)(skb->data + rtap_space);
if (status->flag & (RX_FLAG_FAILED_FCS_CRC |
RX_FLAG_FAILED_PLCP_CRC |
RX_FLAG_ONLY_MONITOR))
return true;
- if (unlikely(skb->len < 16 + present_fcs_len + rtap_vendor_space))
+ if (unlikely(skb->len < 16 + present_fcs_len + rtap_space))
return true;
if (ieee80211_is_ctl(hdr->frame_control) &&
@@ -199,7 +200,7 @@
static void ieee80211_handle_mu_mimo_mon(struct ieee80211_sub_if_data *sdata,
struct sk_buff *skb,
- int rtap_vendor_space)
+ int rtap_space)
{
struct {
struct ieee80211_hdr_3addr hdr;
@@ -212,14 +213,14 @@
BUILD_BUG_ON(sizeof(action) != IEEE80211_MIN_ACTION_SIZE + 1);
- if (skb->len < rtap_vendor_space + sizeof(action) +
+ if (skb->len < rtap_space + sizeof(action) +
VHT_MUMIMO_GROUPS_DATA_LEN)
return;
if (!is_valid_ether_addr(sdata->u.mntr.mu_follow_addr))
return;
- skb_copy_bits(skb, rtap_vendor_space, &action, sizeof(action));
+ skb_copy_bits(skb, rtap_space, &action, sizeof(action));
if (!ieee80211_is_action(action.hdr.frame_control))
return;
@@ -545,7 +546,7 @@
ieee80211_make_monitor_skb(struct ieee80211_local *local,
struct sk_buff **origskb,
struct ieee80211_rate *rate,
- int rtap_vendor_space, bool use_origskb)
+ int rtap_space, bool use_origskb)
{
struct ieee80211_rx_status *status = IEEE80211_SKB_RXCB(*origskb);
int rt_hdrlen, needed_headroom;
@@ -553,7 +554,7 @@
/* room for the radiotap header based on driver features */
rt_hdrlen = ieee80211_rx_radiotap_hdrlen(local, status, *origskb);
- needed_headroom = rt_hdrlen - rtap_vendor_space;
+ needed_headroom = rt_hdrlen - rtap_space;
if (use_origskb) {
/* only need to expand headroom if necessary */
@@ -607,7 +608,7 @@
struct ieee80211_sub_if_data *sdata;
struct sk_buff *monskb = NULL;
int present_fcs_len = 0;
- unsigned int rtap_vendor_space = 0;
+ unsigned int rtap_space = 0;
struct ieee80211_sub_if_data *monitor_sdata =
rcu_dereference(local->monitor_sdata);
bool only_monitor = false;
@@ -615,7 +616,7 @@
if (unlikely(status->flag & RX_FLAG_RADIOTAP_VENDOR_DATA)) {
struct ieee80211_vendor_radiotap *rtap = (void *)origskb->data;
- rtap_vendor_space = sizeof(*rtap) + rtap->len + rtap->pad;
+ rtap_space += sizeof(*rtap) + rtap->len + rtap->pad;
}
/*
@@ -638,13 +639,12 @@
}
/* ensure hdr->frame_control and vendor radiotap data are in skb head */
- if (!pskb_may_pull(origskb, 2 + rtap_vendor_space)) {
+ if (!pskb_may_pull(origskb, 2 + rtap_space)) {
dev_kfree_skb(origskb);
return NULL;
}
- only_monitor = should_drop_frame(origskb, present_fcs_len,
- rtap_vendor_space);
+ only_monitor = should_drop_frame(origskb, present_fcs_len, rtap_space);
if (!local->monitors || (status->flag & RX_FLAG_SKIP_MONITOR)) {
if (only_monitor) {
@@ -652,12 +652,11 @@
return NULL;
}
- remove_monitor_info(origskb, present_fcs_len,
- rtap_vendor_space);
+ remove_monitor_info(origskb, present_fcs_len, rtap_space);
return origskb;
}
- ieee80211_handle_mu_mimo_mon(monitor_sdata, origskb, rtap_vendor_space);
+ ieee80211_handle_mu_mimo_mon(monitor_sdata, origskb, rtap_space);
list_for_each_entry_rcu(sdata, &local->mon_list, u.mntr.list) {
bool last_monitor = list_is_last(&sdata->u.mntr.list,
@@ -665,8 +664,7 @@
if (!monskb)
monskb = ieee80211_make_monitor_skb(local, &origskb,
- rate,
- rtap_vendor_space,
+ rate, rtap_space,
only_monitor &&
last_monitor);
@@ -698,7 +696,7 @@
if (!origskb)
return NULL;
- remove_monitor_info(origskb, present_fcs_len, rtap_vendor_space);
+ remove_monitor_info(origskb, present_fcs_len, rtap_space);
return origskb;
}
diff --git a/net/mac80211/sta_info.c b/net/mac80211/sta_info.c
index 655c3d8..6428f1a 100644
--- a/net/mac80211/sta_info.c
+++ b/net/mac80211/sta_info.c
@@ -3,6 +3,7 @@
* Copyright 2006-2007 Jiri Benc <jbenc@suse.cz>
* Copyright 2013-2014 Intel Mobile Communications GmbH
* Copyright (C) 2015 - 2017 Intel Deutschland GmbH
+ * Copyright (C) 2018 Intel Corporation
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License version 2 as
@@ -357,6 +358,7 @@
sta->last_connected = ktime_get_seconds();
ewma_signal_init(&sta->rx_stats_avg.signal);
+ ewma_avg_signal_init(&sta->status_stats.avg_ack_signal);
for (i = 0; i < ARRAY_SIZE(sta->rx_stats_avg.chain_signal); i++)
ewma_signal_init(&sta->rx_stats_avg.chain_signal[i]);
@@ -1006,7 +1008,7 @@
sinfo = kzalloc(sizeof(*sinfo), GFP_KERNEL);
if (sinfo)
- sta_set_sinfo(sta, sinfo);
+ sta_set_sinfo(sta, sinfo, true);
cfg80211_del_sta_sinfo(sdata->dev, sta->sta.addr, sinfo, GFP_KERNEL);
kfree(sinfo);
@@ -1992,7 +1994,6 @@
int band = STA_STATS_GET(LEGACY_BAND, rate);
int rate_idx = STA_STATS_GET(LEGACY_IDX, rate);
- rinfo->flags = 0;
sband = local->hw.wiphy->bands[band];
brate = sband->bitrates[rate_idx].bitrate;
if (rinfo->bw == RATE_INFO_BW_5)
@@ -2051,6 +2052,18 @@
tidstats->filled |= BIT(NL80211_TID_STATS_TX_MSDU_FAILED);
tidstats->tx_msdu_failed = sta->status_stats.msdu_failed[tid];
}
+
+ if (local->ops->wake_tx_queue && tid < IEEE80211_NUM_TIDS) {
+ spin_lock_bh(&local->fq.lock);
+ rcu_read_lock();
+
+ tidstats->filled |= BIT(NL80211_TID_STATS_TXQ_STATS);
+ ieee80211_fill_txq_stats(&tidstats->txq_stats,
+ to_txq_info(sta->sta.txq[tid]));
+
+ rcu_read_unlock();
+ spin_unlock_bh(&local->fq.lock);
+ }
}
static inline u64 sta_get_stats_bytes(struct ieee80211_sta_rx_stats *rxstats)
@@ -2066,7 +2079,8 @@
return value;
}
-void sta_set_sinfo(struct sta_info *sta, struct station_info *sinfo)
+void sta_set_sinfo(struct sta_info *sta, struct station_info *sinfo,
+ bool tidstats)
{
struct ieee80211_sub_if_data *sdata = sta->sdata;
struct ieee80211_local *local = sdata->local;
@@ -2220,11 +2234,12 @@
sinfo->filled |= BIT(NL80211_STA_INFO_RX_BITRATE);
}
- sinfo->filled |= BIT(NL80211_STA_INFO_TID_STATS);
- for (i = 0; i < IEEE80211_NUM_TIDS + 1; i++) {
- struct cfg80211_tid_stats *tidstats = &sinfo->pertid[i];
+ if (tidstats && !cfg80211_sinfo_alloc_tid_stats(sinfo, GFP_KERNEL)) {
+ for (i = 0; i < IEEE80211_NUM_TIDS + 1; i++) {
+ struct cfg80211_tid_stats *tidstats = &sinfo->pertid[i];
- sta_set_tidstats(sta, tidstats, i);
+ sta_set_tidstats(sta, tidstats, i);
+ }
}
if (ieee80211_vif_is_mesh(&sdata->vif)) {
@@ -2294,6 +2309,15 @@
sinfo->ack_signal = sta->status_stats.last_ack_signal;
sinfo->filled |= BIT_ULL(NL80211_STA_INFO_ACK_SIGNAL);
}
+
+ if (ieee80211_hw_check(&sta->local->hw, REPORTS_TX_ACK_STATUS) &&
+ !(sinfo->filled & BIT_ULL(NL80211_STA_INFO_DATA_ACK_SIGNAL_AVG))) {
+ sinfo->avg_ack_signal =
+ -(s8)ewma_avg_signal_read(
+ &sta->status_stats.avg_ack_signal);
+ sinfo->filled |=
+ BIT_ULL(NL80211_STA_INFO_DATA_ACK_SIGNAL_AVG);
+ }
}
u32 sta_get_expected_throughput(struct sta_info *sta)
diff --git a/net/mac80211/sta_info.h b/net/mac80211/sta_info.h
index f64eb86..81b35f6 100644
--- a/net/mac80211/sta_info.h
+++ b/net/mac80211/sta_info.h
@@ -119,6 +119,7 @@
#define HT_AGG_STATE_START_CB 6
#define HT_AGG_STATE_STOP_CB 7
+DECLARE_EWMA(avg_signal, 10, 8)
enum ieee80211_agg_stop_reason {
AGG_STOP_DECLINED,
AGG_STOP_LOCAL_REQUEST,
@@ -550,6 +551,7 @@
unsigned long last_ack;
s8 last_ack_signal;
bool ack_signal_filled;
+ struct ewma_avg_signal avg_ack_signal;
} status_stats;
/* Updated from TX path only, no locking requirements */
@@ -742,7 +744,8 @@
void sta_set_rate_info_tx(struct sta_info *sta,
const struct ieee80211_tx_rate *rate,
struct rate_info *rinfo);
-void sta_set_sinfo(struct sta_info *sta, struct station_info *sinfo);
+void sta_set_sinfo(struct sta_info *sta, struct station_info *sinfo,
+ bool tidstats);
u32 sta_get_expected_throughput(struct sta_info *sta);
diff --git a/net/mac80211/status.c b/net/mac80211/status.c
index 743e89c..9a6d720 100644
--- a/net/mac80211/status.c
+++ b/net/mac80211/status.c
@@ -195,6 +195,8 @@
sta->status_stats.last_ack_signal =
(s8)txinfo->status.ack_signal;
sta->status_stats.ack_signal_filled = true;
+ ewma_avg_signal_add(&sta->status_stats.avg_ack_signal,
+ -txinfo->status.ack_signal);
}
}
diff --git a/net/mac80211/trace.h b/net/mac80211/trace.h
index 591ad02..80a7edf 100644
--- a/net/mac80211/trace.h
+++ b/net/mac80211/trace.h
@@ -2,6 +2,7 @@
/*
* Portions of this file
* Copyright(c) 2016 Intel Deutschland GmbH
+* Copyright (C) 2018 Intel Corporation
*/
#if !defined(__MAC80211_DRIVER_TRACE) || defined(TRACE_HEADER_MULTI_READ)
@@ -1413,11 +1414,29 @@
TP_ARGS(local, sta, tids, num_frames, reason, more_data)
);
-DEFINE_EVENT(local_sdata_evt, drv_mgd_prepare_tx,
+TRACE_EVENT(drv_mgd_prepare_tx,
TP_PROTO(struct ieee80211_local *local,
- struct ieee80211_sub_if_data *sdata),
+ struct ieee80211_sub_if_data *sdata,
+ u16 duration),
- TP_ARGS(local, sdata)
+ TP_ARGS(local, sdata, duration),
+
+ TP_STRUCT__entry(
+ LOCAL_ENTRY
+ VIF_ENTRY
+ __field(u32, duration)
+ ),
+
+ TP_fast_assign(
+ LOCAL_ASSIGN;
+ VIF_ASSIGN;
+ __entry->duration = duration;
+ ),
+
+ TP_printk(
+ LOCAL_PR_FMT VIF_PR_FMT " duration: %u",
+ LOCAL_PR_ARG, VIF_PR_ARG, __entry->duration
+ )
);
DEFINE_EVENT(local_sdata_evt, drv_mgd_protect_tdls_discover,
diff --git a/net/mac80211/tx.c b/net/mac80211/tx.c
index 05a265c..44b5dfe 100644
--- a/net/mac80211/tx.c
+++ b/net/mac80211/tx.c
@@ -1460,6 +1460,24 @@
ieee80211_purge_tx_queue(&local->hw, &txqi->frags);
}
+void ieee80211_txq_set_params(struct ieee80211_local *local)
+{
+ if (local->hw.wiphy->txq_limit)
+ local->fq.limit = local->hw.wiphy->txq_limit;
+ else
+ local->hw.wiphy->txq_limit = local->fq.limit;
+
+ if (local->hw.wiphy->txq_memory_limit)
+ local->fq.memory_limit = local->hw.wiphy->txq_memory_limit;
+ else
+ local->hw.wiphy->txq_memory_limit = local->fq.memory_limit;
+
+ if (local->hw.wiphy->txq_quantum)
+ local->fq.quantum = local->hw.wiphy->txq_quantum;
+ else
+ local->hw.wiphy->txq_quantum = local->fq.quantum;
+}
+
int ieee80211_txq_setup_flows(struct ieee80211_local *local)
{
struct fq *fq = &local->fq;
@@ -1509,6 +1527,8 @@
for (i = 0; i < fq->flows_cnt; i++)
codel_vars_init(&local->cvars[i]);
+ ieee80211_txq_set_params(local);
+
return 0;
}
@@ -4085,6 +4105,31 @@
}
EXPORT_SYMBOL(ieee80211_csa_update_counter);
+void ieee80211_csa_set_counter(struct ieee80211_vif *vif, u8 counter)
+{
+ struct ieee80211_sub_if_data *sdata = vif_to_sdata(vif);
+ struct beacon_data *beacon = NULL;
+
+ rcu_read_lock();
+
+ if (sdata->vif.type == NL80211_IFTYPE_AP)
+ beacon = rcu_dereference(sdata->u.ap.beacon);
+ else if (sdata->vif.type == NL80211_IFTYPE_ADHOC)
+ beacon = rcu_dereference(sdata->u.ibss.presp);
+ else if (ieee80211_vif_is_mesh(&sdata->vif))
+ beacon = rcu_dereference(sdata->u.mesh.beacon);
+
+ if (!beacon)
+ goto unlock;
+
+ if (counter < beacon->csa_current_counter)
+ beacon->csa_current_counter = counter;
+
+unlock:
+ rcu_read_unlock();
+}
+EXPORT_SYMBOL(ieee80211_csa_set_counter);
+
bool ieee80211_csa_is_complete(struct ieee80211_vif *vif)
{
struct ieee80211_sub_if_data *sdata = vif_to_sdata(vif);
diff --git a/net/mac80211/util.c b/net/mac80211/util.c
index 11f9cfc..2d82c88 100644
--- a/net/mac80211/util.c
+++ b/net/mac80211/util.c
@@ -2793,12 +2793,13 @@
memset(&ri, 0, sizeof(ri));
+ ri.bw = status->bw;
+
/* Fill cfg80211 rate info */
switch (status->encoding) {
case RX_ENC_HT:
ri.mcs = status->rate_idx;
ri.flags |= RATE_INFO_FLAGS_MCS;
- ri.bw = status->bw;
if (status->enc_flags & RX_ENC_FLAG_SHORT_GI)
ri.flags |= RATE_INFO_FLAGS_SHORT_GI;
break;
@@ -2806,7 +2807,6 @@
ri.flags |= RATE_INFO_FLAGS_VHT_MCS;
ri.mcs = status->rate_idx;
ri.nss = status->nss;
- ri.bw = status->bw;
if (status->enc_flags & RX_ENC_FLAG_SHORT_GI)
ri.flags |= RATE_INFO_FLAGS_SHORT_GI;
break;
@@ -2818,8 +2818,6 @@
int shift = 0;
int bitrate;
- ri.bw = status->bw;
-
switch (status->bw) {
case RATE_INFO_BW_10:
shift = 1;
diff --git a/net/ncsi/internal.h b/net/ncsi/internal.h
index 8da8431..8055e39 100644
--- a/net/ncsi/internal.h
+++ b/net/ncsi/internal.h
@@ -68,15 +68,6 @@
NCSI_MODE_MAX
};
-enum {
- NCSI_FILTER_BASE = 0,
- NCSI_FILTER_VLAN = 0,
- NCSI_FILTER_UC,
- NCSI_FILTER_MC,
- NCSI_FILTER_MIXED,
- NCSI_FILTER_MAX
-};
-
struct ncsi_channel_version {
u32 version; /* Supported BCD encoded NCSI version */
u32 alpha2; /* Supported BCD encoded NCSI version */
@@ -98,11 +89,18 @@
u32 data[8]; /* Data entries */
};
-struct ncsi_channel_filter {
- u32 index; /* Index of channel filters */
- u32 total; /* Total entries in the filter table */
- u64 bitmap; /* Bitmap of valid entries */
- u32 data[]; /* Data for the valid entries */
+struct ncsi_channel_mac_filter {
+ u8 n_uc;
+ u8 n_mc;
+ u8 n_mixed;
+ u64 bitmap;
+ unsigned char *addrs;
+};
+
+struct ncsi_channel_vlan_filter {
+ u8 n_vids;
+ u64 bitmap;
+ u16 *vids;
};
struct ncsi_channel_stats {
@@ -186,7 +184,9 @@
struct ncsi_channel_version version;
struct ncsi_channel_cap caps[NCSI_CAP_MAX];
struct ncsi_channel_mode modes[NCSI_MODE_MAX];
- struct ncsi_channel_filter *filters[NCSI_FILTER_MAX];
+ /* Filtering Settings */
+ struct ncsi_channel_mac_filter mac_filter;
+ struct ncsi_channel_vlan_filter vlan_filter;
struct ncsi_channel_stats stats;
struct {
struct timer_list timer;
@@ -320,10 +320,6 @@
list_for_each_entry_rcu(nc, &np->channels, node)
/* Resources */
-u32 *ncsi_get_filter(struct ncsi_channel *nc, int table, int index);
-int ncsi_find_filter(struct ncsi_channel *nc, int table, void *data);
-int ncsi_add_filter(struct ncsi_channel *nc, int table, void *data);
-int ncsi_remove_filter(struct ncsi_channel *nc, int table, int index);
void ncsi_start_channel_monitor(struct ncsi_channel *nc);
void ncsi_stop_channel_monitor(struct ncsi_channel *nc);
struct ncsi_channel *ncsi_find_channel(struct ncsi_package *np,
diff --git a/net/ncsi/ncsi-manage.c b/net/ncsi/ncsi-manage.c
index c3695ba..5561e22 100644
--- a/net/ncsi/ncsi-manage.c
+++ b/net/ncsi/ncsi-manage.c
@@ -27,125 +27,6 @@
LIST_HEAD(ncsi_dev_list);
DEFINE_SPINLOCK(ncsi_dev_lock);
-static inline int ncsi_filter_size(int table)
-{
- int sizes[] = { 2, 6, 6, 6 };
-
- BUILD_BUG_ON(ARRAY_SIZE(sizes) != NCSI_FILTER_MAX);
- if (table < NCSI_FILTER_BASE || table >= NCSI_FILTER_MAX)
- return -EINVAL;
-
- return sizes[table];
-}
-
-u32 *ncsi_get_filter(struct ncsi_channel *nc, int table, int index)
-{
- struct ncsi_channel_filter *ncf;
- int size;
-
- ncf = nc->filters[table];
- if (!ncf)
- return NULL;
-
- size = ncsi_filter_size(table);
- if (size < 0)
- return NULL;
-
- return ncf->data + size * index;
-}
-
-/* Find the first active filter in a filter table that matches the given
- * data parameter. If data is NULL, this returns the first active filter.
- */
-int ncsi_find_filter(struct ncsi_channel *nc, int table, void *data)
-{
- struct ncsi_channel_filter *ncf;
- void *bitmap;
- int index, size;
- unsigned long flags;
-
- ncf = nc->filters[table];
- if (!ncf)
- return -ENXIO;
-
- size = ncsi_filter_size(table);
- if (size < 0)
- return size;
-
- spin_lock_irqsave(&nc->lock, flags);
- bitmap = (void *)&ncf->bitmap;
- index = -1;
- while ((index = find_next_bit(bitmap, ncf->total, index + 1))
- < ncf->total) {
- if (!data || !memcmp(ncf->data + size * index, data, size)) {
- spin_unlock_irqrestore(&nc->lock, flags);
- return index;
- }
- }
- spin_unlock_irqrestore(&nc->lock, flags);
-
- return -ENOENT;
-}
-
-int ncsi_add_filter(struct ncsi_channel *nc, int table, void *data)
-{
- struct ncsi_channel_filter *ncf;
- int index, size;
- void *bitmap;
- unsigned long flags;
-
- size = ncsi_filter_size(table);
- if (size < 0)
- return size;
-
- index = ncsi_find_filter(nc, table, data);
- if (index >= 0)
- return index;
-
- ncf = nc->filters[table];
- if (!ncf)
- return -ENODEV;
-
- spin_lock_irqsave(&nc->lock, flags);
- bitmap = (void *)&ncf->bitmap;
- do {
- index = find_next_zero_bit(bitmap, ncf->total, 0);
- if (index >= ncf->total) {
- spin_unlock_irqrestore(&nc->lock, flags);
- return -ENOSPC;
- }
- } while (test_and_set_bit(index, bitmap));
-
- memcpy(ncf->data + size * index, data, size);
- spin_unlock_irqrestore(&nc->lock, flags);
-
- return index;
-}
-
-int ncsi_remove_filter(struct ncsi_channel *nc, int table, int index)
-{
- struct ncsi_channel_filter *ncf;
- int size;
- void *bitmap;
- unsigned long flags;
-
- size = ncsi_filter_size(table);
- if (size < 0)
- return size;
-
- ncf = nc->filters[table];
- if (!ncf || index >= ncf->total)
- return -ENODEV;
-
- spin_lock_irqsave(&nc->lock, flags);
- bitmap = (void *)&ncf->bitmap;
- if (test_and_clear_bit(index, bitmap))
- memset(ncf->data + size * index, 0, size);
- spin_unlock_irqrestore(&nc->lock, flags);
-
- return 0;
-}
-
static void ncsi_report_link(struct ncsi_dev_priv *ndp, bool force_down)
{
struct ncsi_dev *nd = &ndp->ndev;
@@ -339,20 +220,13 @@
static void ncsi_remove_channel(struct ncsi_channel *nc)
{
struct ncsi_package *np = nc->package;
- struct ncsi_channel_filter *ncf;
unsigned long flags;
- int i;
+
+ spin_lock_irqsave(&nc->lock, flags);
/* Release filters */
- spin_lock_irqsave(&nc->lock, flags);
- for (i = 0; i < NCSI_FILTER_MAX; i++) {
- ncf = nc->filters[i];
- if (!ncf)
- continue;
-
- nc->filters[i] = NULL;
- kfree(ncf);
- }
+ kfree(nc->mac_filter.addrs);
+ kfree(nc->vlan_filter.vids);
nc->state = NCSI_CHANNEL_INACTIVE;
spin_unlock_irqrestore(&nc->lock, flags);
@@ -670,32 +544,26 @@
static int clear_one_vid(struct ncsi_dev_priv *ndp, struct ncsi_channel *nc,
struct ncsi_cmd_arg *nca)
{
+ struct ncsi_channel_vlan_filter *ncf;
+ unsigned long flags;
+ void *bitmap;
int index;
- u32 *data;
u16 vid;
- index = ncsi_find_filter(nc, NCSI_FILTER_VLAN, NULL);
- if (index < 0) {
- /* Filter table empty */
+ ncf = &nc->vlan_filter;
+ bitmap = &ncf->bitmap;
+
+ spin_lock_irqsave(&nc->lock, flags);
+ index = find_next_bit(bitmap, ncf->n_vids, 0);
+ if (index >= ncf->n_vids) {
+ spin_unlock_irqrestore(&nc->lock, flags);
return -1;
}
+ vid = ncf->vids[index];
- data = ncsi_get_filter(nc, NCSI_FILTER_VLAN, index);
- if (!data) {
- netdev_err(ndp->ndev.dev,
- "NCSI: failed to retrieve filter %d\n", index);
- /* Set the VLAN id to 0 - this will still disable the entry in
- * the filter table, but we won't know what it was.
- */
- vid = 0;
- } else {
- vid = *(u16 *)data;
- }
-
- netdev_printk(KERN_DEBUG, ndp->ndev.dev,
- "NCSI: removed vlan tag %u at index %d\n",
- vid, index + 1);
- ncsi_remove_filter(nc, NCSI_FILTER_VLAN, index);
+ clear_bit(index, bitmap);
+ ncf->vids[index] = 0;
+ spin_unlock_irqrestore(&nc->lock, flags);
nca->type = NCSI_PKT_CMD_SVF;
nca->words[1] = vid;
@@ -711,45 +579,55 @@
static int set_one_vid(struct ncsi_dev_priv *ndp, struct ncsi_channel *nc,
struct ncsi_cmd_arg *nca)
{
+ struct ncsi_channel_vlan_filter *ncf;
struct vlan_vid *vlan = NULL;
- int index = 0;
+ unsigned long flags;
+ int i, index;
+ void *bitmap;
+ u16 vid;
+ if (list_empty(&ndp->vlan_vids))
+ return -1;
+
+ ncf = &nc->vlan_filter;
+ bitmap = &ncf->bitmap;
+
+ spin_lock_irqsave(&nc->lock, flags);
+
+ rcu_read_lock();
list_for_each_entry_rcu(vlan, &ndp->vlan_vids, list) {
- index = ncsi_find_filter(nc, NCSI_FILTER_VLAN, &vlan->vid);
- if (index < 0) {
- /* New tag to add */
- netdev_printk(KERN_DEBUG, ndp->ndev.dev,
- "NCSI: new vlan id to set: %u\n",
- vlan->vid);
+ vid = vlan->vid;
+ for (i = 0; i < ncf->n_vids; i++)
+ if (ncf->vids[i] == vid) {
+ vid = 0;
+ break;
+ }
+ if (vid)
break;
- }
- netdev_printk(KERN_DEBUG, ndp->ndev.dev,
- "vid %u already at filter pos %d\n",
- vlan->vid, index);
}
+ rcu_read_unlock();
- if (!vlan || index >= 0) {
- netdev_printk(KERN_DEBUG, ndp->ndev.dev,
- "no vlan ids left to set\n");
+ if (!vid) {
+ /* No VLAN ID is not set */
+ spin_unlock_irqrestore(&nc->lock, flags);
return -1;
}
- index = ncsi_add_filter(nc, NCSI_FILTER_VLAN, &vlan->vid);
- if (index < 0) {
+ index = find_next_zero_bit(bitmap, ncf->n_vids, 0);
+ if (index < 0 || index >= ncf->n_vids) {
netdev_err(ndp->ndev.dev,
- "Failed to add new VLAN tag, error %d\n", index);
- if (index == -ENOSPC)
- netdev_err(ndp->ndev.dev,
- "Channel %u already has all VLAN filters set\n",
- nc->id);
+ "Channel %u already has all VLAN filters set\n",
+ nc->id);
+ spin_unlock_irqrestore(&nc->lock, flags);
return -1;
}
- netdev_printk(KERN_DEBUG, ndp->ndev.dev,
- "NCSI: set vid %u in packet, index %u\n",
- vlan->vid, index + 1);
+ ncf->vids[index] = vid;
+ set_bit(index, bitmap);
+ spin_unlock_irqrestore(&nc->lock, flags);
+
nca->type = NCSI_PKT_CMD_SVF;
- nca->words[1] = vlan->vid;
+ nca->words[1] = vid;
/* HW filter index starts at 1 */
nca->bytes[6] = index + 1;
nca->bytes[7] = 0x01;
diff --git a/net/ncsi/ncsi-netlink.c b/net/ncsi/ncsi-netlink.c
index 8d7e849..b09ef77 100644
--- a/net/ncsi/ncsi-netlink.c
+++ b/net/ncsi/ncsi-netlink.c
@@ -58,10 +58,9 @@
struct ncsi_dev_priv *ndp,
struct ncsi_channel *nc)
{
- struct nlattr *vid_nest;
- struct ncsi_channel_filter *ncf;
+ struct ncsi_channel_vlan_filter *ncf;
struct ncsi_channel_mode *m;
- u32 *data;
+ struct nlattr *vid_nest;
int i;
nla_put_u32(skb, NCSI_CHANNEL_ATTR_ID, nc->id);
@@ -79,18 +78,13 @@
vid_nest = nla_nest_start(skb, NCSI_CHANNEL_ATTR_VLAN_LIST);
if (!vid_nest)
return -ENOMEM;
- ncf = nc->filters[NCSI_FILTER_VLAN];
+ ncf = &nc->vlan_filter;
i = -1;
- if (ncf) {
- while ((i = find_next_bit((void *)&ncf->bitmap, ncf->total,
- i + 1)) < ncf->total) {
- data = ncsi_get_filter(nc, NCSI_FILTER_VLAN, i);
- /* Uninitialised channels will have 'zero' vlan ids */
- if (!data || !*data)
- continue;
+ while ((i = find_next_bit((void *)&ncf->bitmap, ncf->n_vids,
+ i + 1)) < ncf->n_vids) {
+ if (ncf->vids[i])
nla_put_u16(skb, NCSI_CHANNEL_ATTR_VLAN_ID,
- *(u16 *)data);
- }
+ ncf->vids[i]);
}
nla_nest_end(skb, vid_nest);
diff --git a/net/ncsi/ncsi-rsp.c b/net/ncsi/ncsi-rsp.c
index efd933f..a6b7c7d 100644
--- a/net/ncsi/ncsi-rsp.c
+++ b/net/ncsi/ncsi-rsp.c
@@ -334,9 +334,9 @@
struct ncsi_rsp_pkt *rsp;
struct ncsi_dev_priv *ndp = nr->ndp;
struct ncsi_channel *nc;
- struct ncsi_channel_filter *ncf;
- unsigned short vlan;
- int ret;
+ struct ncsi_channel_vlan_filter *ncf;
+ unsigned long flags;
+ void *bitmap;
/* Find the package and channel */
rsp = (struct ncsi_rsp_pkt *)skb_network_header(nr->rsp);
@@ -346,22 +346,23 @@
return -ENODEV;
cmd = (struct ncsi_cmd_svf_pkt *)skb_network_header(nr->cmd);
- ncf = nc->filters[NCSI_FILTER_VLAN];
- if (!ncf)
- return -ENOENT;
- if (cmd->index >= ncf->total)
+ ncf = &nc->vlan_filter;
+ if (cmd->index == 0 || cmd->index > ncf->n_vids)
return -ERANGE;
- /* Add or remove the VLAN filter */
+ /* Add or remove the VLAN filter. Remember HW indexes from 1 */
+ spin_lock_irqsave(&nc->lock, flags);
+ bitmap = &ncf->bitmap;
if (!(cmd->enable & 0x1)) {
- /* HW indexes from 1 */
- ret = ncsi_remove_filter(nc, NCSI_FILTER_VLAN, cmd->index - 1);
+ if (test_and_clear_bit(cmd->index - 1, bitmap))
+ ncf->vids[cmd->index - 1] = 0;
} else {
- vlan = ntohs(cmd->vlan);
- ret = ncsi_add_filter(nc, NCSI_FILTER_VLAN, &vlan);
+ set_bit(cmd->index - 1, bitmap);
+ ncf->vids[cmd->index - 1] = ntohs(cmd->vlan);
}
+ spin_unlock_irqrestore(&nc->lock, flags);
- return ret;
+ return 0;
}
static int ncsi_rsp_handler_ev(struct ncsi_request *nr)
@@ -422,8 +423,12 @@
struct ncsi_rsp_pkt *rsp;
struct ncsi_dev_priv *ndp = nr->ndp;
struct ncsi_channel *nc;
- struct ncsi_channel_filter *ncf;
+ struct ncsi_channel_mac_filter *ncf;
+ unsigned long flags;
void *bitmap;
+ bool enabled;
+ int index;
+
/* Find the package and channel */
rsp = (struct ncsi_rsp_pkt *)skb_network_header(nr->rsp);
@@ -436,31 +441,24 @@
* isn't supported yet.
*/
cmd = (struct ncsi_cmd_sma_pkt *)skb_network_header(nr->cmd);
- switch (cmd->at_e >> 5) {
- case 0x0: /* UC address */
- ncf = nc->filters[NCSI_FILTER_UC];
- break;
- case 0x1: /* MC address */
- ncf = nc->filters[NCSI_FILTER_MC];
- break;
- default:
- return -EINVAL;
- }
+ enabled = cmd->at_e & 0x1;
+ ncf = &nc->mac_filter;
+ bitmap = &ncf->bitmap;
- /* Sanity check on the filter */
- if (!ncf)
- return -ENOENT;
- else if (cmd->index >= ncf->total)
+ if (cmd->index == 0 ||
+ cmd->index > ncf->n_uc + ncf->n_mc + ncf->n_mixed)
return -ERANGE;
- bitmap = &ncf->bitmap;
- if (cmd->at_e & 0x1) {
- set_bit(cmd->index, bitmap);
- memcpy(ncf->data + 6 * cmd->index, cmd->mac, 6);
+ index = (cmd->index - 1) * ETH_ALEN;
+ spin_lock_irqsave(&nc->lock, flags);
+ if (enabled) {
+ set_bit(cmd->index - 1, bitmap);
+ memcpy(&ncf->addrs[index], cmd->mac, ETH_ALEN);
} else {
- clear_bit(cmd->index, bitmap);
- memset(ncf->data + 6 * cmd->index, 0, 6);
+ clear_bit(cmd->index - 1, bitmap);
+ memset(&ncf->addrs[index], 0, ETH_ALEN);
}
+ spin_unlock_irqrestore(&nc->lock, flags);
return 0;
}
@@ -631,9 +629,7 @@
struct ncsi_rsp_gc_pkt *rsp;
struct ncsi_dev_priv *ndp = nr->ndp;
struct ncsi_channel *nc;
- struct ncsi_channel_filter *ncf;
- size_t size, entry_size;
- int cnt, i;
+ size_t size;
/* Find the channel */
rsp = (struct ncsi_rsp_gc_pkt *)skb_network_header(nr->rsp);
@@ -655,64 +651,40 @@
nc->caps[NCSI_CAP_VLAN].cap = rsp->vlan_mode &
NCSI_CAP_VLAN_MASK;
- /* Build filters */
- for (i = 0; i < NCSI_FILTER_MAX; i++) {
- switch (i) {
- case NCSI_FILTER_VLAN:
- cnt = rsp->vlan_cnt;
- entry_size = 2;
- break;
- case NCSI_FILTER_MIXED:
- cnt = rsp->mixed_cnt;
- entry_size = 6;
- break;
- case NCSI_FILTER_MC:
- cnt = rsp->mc_cnt;
- entry_size = 6;
- break;
- case NCSI_FILTER_UC:
- cnt = rsp->uc_cnt;
- entry_size = 6;
- break;
- default:
- continue;
- }
+ size = (rsp->uc_cnt + rsp->mc_cnt + rsp->mixed_cnt) * ETH_ALEN;
+ nc->mac_filter.addrs = kzalloc(size, GFP_KERNEL);
+ if (!nc->mac_filter.addrs)
+ return -ENOMEM;
+ nc->mac_filter.n_uc = rsp->uc_cnt;
+ nc->mac_filter.n_mc = rsp->mc_cnt;
+ nc->mac_filter.n_mixed = rsp->mixed_cnt;
- if (!cnt || nc->filters[i])
- continue;
-
- size = sizeof(*ncf) + cnt * entry_size;
- ncf = kzalloc(size, GFP_ATOMIC);
- if (!ncf) {
- pr_warn("%s: Cannot alloc filter table (%d)\n",
- __func__, i);
- return -ENOMEM;
- }
-
- ncf->index = i;
- ncf->total = cnt;
- if (i == NCSI_FILTER_VLAN) {
- /* Set VLAN filters active so they are cleared in
- * first configuration state
- */
- ncf->bitmap = U64_MAX;
- } else {
- ncf->bitmap = 0x0ul;
- }
- nc->filters[i] = ncf;
- }
+ nc->vlan_filter.vids = kcalloc(rsp->vlan_cnt,
+ sizeof(*nc->vlan_filter.vids),
+ GFP_KERNEL);
+ if (!nc->vlan_filter.vids)
+ return -ENOMEM;
+ /* Set VLAN filters active so they are cleared in the first
+ * configuration state
+ */
+ nc->vlan_filter.bitmap = U64_MAX;
+ nc->vlan_filter.n_vids = rsp->vlan_cnt;
return 0;
}
static int ncsi_rsp_handler_gp(struct ncsi_request *nr)
{
- struct ncsi_rsp_gp_pkt *rsp;
+ struct ncsi_channel_vlan_filter *ncvf;
+ struct ncsi_channel_mac_filter *ncmf;
struct ncsi_dev_priv *ndp = nr->ndp;
+ struct ncsi_rsp_gp_pkt *rsp;
struct ncsi_channel *nc;
- unsigned short enable, vlan;
+ unsigned short enable;
unsigned char *pdata;
- int table, i;
+ unsigned long flags;
+ void *bitmap;
+ int i;
/* Find the channel */
rsp = (struct ncsi_rsp_gp_pkt *)skb_network_header(nr->rsp);
@@ -746,36 +718,33 @@
/* MAC addresses filter table */
pdata = (unsigned char *)rsp + 48;
enable = rsp->mac_enable;
+ ncmf = &nc->mac_filter;
+ spin_lock_irqsave(&nc->lock, flags);
+ bitmap = &ncmf->bitmap;
for (i = 0; i < rsp->mac_cnt; i++, pdata += 6) {
- if (i >= (nc->filters[NCSI_FILTER_UC]->total +
- nc->filters[NCSI_FILTER_MC]->total))
- table = NCSI_FILTER_MIXED;
- else if (i >= nc->filters[NCSI_FILTER_UC]->total)
- table = NCSI_FILTER_MC;
- else
- table = NCSI_FILTER_UC;
-
if (!(enable & (0x1 << i)))
- continue;
+ clear_bit(i, bitmap);
+ else
+ set_bit(i, bitmap);
- if (ncsi_find_filter(nc, table, pdata) >= 0)
- continue;
-
- ncsi_add_filter(nc, table, pdata);
+ memcpy(&ncmf->addrs[i * ETH_ALEN], pdata, ETH_ALEN);
}
+ spin_unlock_irqrestore(&nc->lock, flags);
/* VLAN filter table */
enable = ntohs(rsp->vlan_enable);
+ ncvf = &nc->vlan_filter;
+ bitmap = &ncvf->bitmap;
+ spin_lock_irqsave(&nc->lock, flags);
for (i = 0; i < rsp->vlan_cnt; i++, pdata += 2) {
if (!(enable & (0x1 << i)))
- continue;
+ clear_bit(i, bitmap);
+ else
+ set_bit(i, bitmap);
- vlan = ntohs(*(__be16 *)pdata);
- if (ncsi_find_filter(nc, NCSI_FILTER_VLAN, &vlan) >= 0)
- continue;
-
- ncsi_add_filter(nc, NCSI_FILTER_VLAN, &vlan);
+ ncvf->vids[i] = ntohs(*(__be16 *)pdata);
}
+ spin_unlock_irqrestore(&nc->lock, flags);
return 0;
}
diff --git a/net/netfilter/Kconfig b/net/netfilter/Kconfig
index 44d8a55..a5b60e6 100644
--- a/net/netfilter/Kconfig
+++ b/net/netfilter/Kconfig
@@ -444,6 +444,9 @@
endif # NF_CONNTRACK
+config NF_OSF
+ tristate
+
config NF_TABLES
select NETFILTER_NETLINK
tristate "Netfilter nf_tables support"
@@ -474,24 +477,6 @@
help
This option enables support for the "netdev" table.
-config NFT_EXTHDR
- tristate "Netfilter nf_tables exthdr module"
- help
- This option adds the "exthdr" expression that you can use to match
- IPv6 extension headers and tcp options.
-
-config NFT_META
- tristate "Netfilter nf_tables meta module"
- help
- This option adds the "meta" expression that you can use to match and
- to set packet metainformation such as the packet mark.
-
-config NFT_RT
- tristate "Netfilter nf_tables routing module"
- help
- This option adds the "rt" expression that you can use to match
- packet routing information such as the packet nexthop.
-
config NFT_NUMGEN
tristate "Netfilter nf_tables number generator module"
help
@@ -667,8 +652,7 @@
config NF_FLOW_TABLE_INET
tristate "Netfilter flow table mixed IPv4/IPv6 module"
- depends on NF_FLOW_TABLE_IPV4
- depends on NF_FLOW_TABLE_IPV6
+ depends on NF_FLOW_TABLE
help
This option adds the flow table mixed IPv4/IPv6 support.
@@ -1378,6 +1362,7 @@
config NETFILTER_XT_MATCH_OSF
tristate '"osf" Passive OS fingerprint match'
depends on NETFILTER_ADVANCED && NETFILTER_NETLINK
+ select NF_OSF
help
This option selects the Passive OS Fingerprinting match module
that allows to passively match the remote operating system by
diff --git a/net/netfilter/Makefile b/net/netfilter/Makefile
index fd32bd2..1aa710b 100644
--- a/net/netfilter/Makefile
+++ b/net/netfilter/Makefile
@@ -76,13 +76,10 @@
nf_tables-objs := nf_tables_core.o nf_tables_api.o nft_chain_filter.o \
nf_tables_trace.o nft_immediate.o nft_cmp.o nft_range.o \
nft_bitwise.o nft_byteorder.o nft_payload.o nft_lookup.o \
- nft_dynset.o
+ nft_dynset.o nft_meta.o nft_rt.o nft_exthdr.o
obj-$(CONFIG_NF_TABLES) += nf_tables.o
obj-$(CONFIG_NFT_COMPAT) += nft_compat.o
-obj-$(CONFIG_NFT_EXTHDR) += nft_exthdr.o
-obj-$(CONFIG_NFT_META) += nft_meta.o
-obj-$(CONFIG_NFT_RT) += nft_rt.o
obj-$(CONFIG_NFT_NUMGEN) += nft_numgen.o
obj-$(CONFIG_NFT_CT) += nft_ct.o
obj-$(CONFIG_NFT_FLOW_OFFLOAD) += nft_flow_offload.o
@@ -104,6 +101,7 @@
obj-$(CONFIG_NFT_FIB) += nft_fib.o
obj-$(CONFIG_NFT_FIB_INET) += nft_fib_inet.o
obj-$(CONFIG_NFT_FIB_NETDEV) += nft_fib_netdev.o
+obj-$(CONFIG_NF_OSF) += nf_osf.o
# nf_tables netdev
obj-$(CONFIG_NFT_DUP_NETDEV) += nft_dup_netdev.o
@@ -111,6 +109,8 @@
# flow table infrastructure
obj-$(CONFIG_NF_FLOW_TABLE) += nf_flow_table.o
+nf_flow_table-objs := nf_flow_table_core.o nf_flow_table_ip.o
+
obj-$(CONFIG_NF_FLOW_TABLE_INET) += nf_flow_table_inet.o
# generic X tables
diff --git a/net/netfilter/core.c b/net/netfilter/core.c
index 206fb2c..168af54 100644
--- a/net/netfilter/core.c
+++ b/net/netfilter/core.c
@@ -138,11 +138,6 @@
continue;
}
- if (reg->nat_hook && orig_ops[i]->nat_hook) {
- kvfree(new);
- return ERR_PTR(-EBUSY);
- }
-
if (inserted || reg->priority > orig_ops[i]->priority) {
new_ops[nhooks] = (void *)orig_ops[i];
new->hooks[nhooks] = old->hooks[i];
@@ -186,9 +181,31 @@
#endif
}
+int nf_hook_entries_insert_raw(struct nf_hook_entries __rcu **pp,
+ const struct nf_hook_ops *reg)
+{
+ struct nf_hook_entries *new_hooks;
+ struct nf_hook_entries *p;
+
+ p = rcu_dereference_raw(*pp);
+ new_hooks = nf_hook_entries_grow(p, reg);
+ if (IS_ERR(new_hooks))
+ return PTR_ERR(new_hooks);
+
+ hooks_validate(new_hooks);
+
+ rcu_assign_pointer(*pp, new_hooks);
+
+ BUG_ON(p == new_hooks);
+ nf_hook_entries_free(p);
+ return 0;
+}
+EXPORT_SYMBOL_GPL(nf_hook_entries_insert_raw);
+
/*
* __nf_hook_entries_try_shrink - try to shrink hook array
*
+ * @old -- current hook blob at @pp
* @pp -- location of hook blob
*
* Hook unregistration must always succeed, so to-be-removed hooks
@@ -201,14 +218,14 @@
*
* Returns address to free, or NULL.
*/
-static void *__nf_hook_entries_try_shrink(struct nf_hook_entries __rcu **pp)
+static void *__nf_hook_entries_try_shrink(struct nf_hook_entries *old,
+ struct nf_hook_entries __rcu **pp)
{
- struct nf_hook_entries *old, *new = NULL;
unsigned int i, j, skip = 0, hook_entries;
+ struct nf_hook_entries *new = NULL;
struct nf_hook_ops **orig_ops;
struct nf_hook_ops **new_ops;
- old = nf_entry_dereference(*pp);
if (WARN_ON_ONCE(!old))
return NULL;
@@ -347,11 +364,10 @@
* This cannot fail, hook unregistration must always succeed.
* Therefore replace the to-be-removed hook with a dummy hook.
*/
-static void nf_remove_net_hook(struct nf_hook_entries *old,
- const struct nf_hook_ops *unreg, int pf)
+static bool nf_remove_net_hook(struct nf_hook_entries *old,
+ const struct nf_hook_ops *unreg)
{
struct nf_hook_ops **orig_ops;
- bool found = false;
unsigned int i;
orig_ops = nf_hook_entries_get_hook_ops(old);
@@ -360,21 +376,10 @@
continue;
WRITE_ONCE(old->hooks[i].hook, accept_all);
WRITE_ONCE(orig_ops[i], &dummy_ops);
- found = true;
- break;
+ return true;
}
- if (found) {
-#ifdef CONFIG_NETFILTER_INGRESS
- if (pf == NFPROTO_NETDEV && unreg->hooknum == NF_NETDEV_INGRESS)
- net_dec_ingress_queue();
-#endif
-#ifdef HAVE_JUMP_LABEL
- static_key_slow_dec(&nf_hooks_needed[pf][unreg->hooknum]);
-#endif
- } else {
- WARN_ONCE(1, "hook not found, pf %d num %d", pf, unreg->hooknum);
- }
+ return false;
}
static void __nf_unregister_net_hook(struct net *net, int pf,
@@ -395,9 +400,19 @@
return;
}
- nf_remove_net_hook(p, reg, pf);
+ if (nf_remove_net_hook(p, reg)) {
+#ifdef CONFIG_NETFILTER_INGRESS
+ if (pf == NFPROTO_NETDEV && reg->hooknum == NF_NETDEV_INGRESS)
+ net_dec_ingress_queue();
+#endif
+#ifdef HAVE_JUMP_LABEL
+ static_key_slow_dec(&nf_hooks_needed[pf][reg->hooknum]);
+#endif
+ } else {
+ WARN_ONCE(1, "hook not found, pf %d num %d", pf, reg->hooknum);
+ }
- p = __nf_hook_entries_try_shrink(pp);
+ p = __nf_hook_entries_try_shrink(p, pp);
mutex_unlock(&nf_hook_mutex);
if (!p)
return;
@@ -417,6 +432,19 @@
}
EXPORT_SYMBOL(nf_unregister_net_hook);
+void nf_hook_entries_delete_raw(struct nf_hook_entries __rcu **pp,
+ const struct nf_hook_ops *reg)
+{
+ struct nf_hook_entries *p;
+
+ p = rcu_dereference_raw(*pp);
+ if (nf_remove_net_hook(p, reg)) {
+ p = __nf_hook_entries_try_shrink(p, pp);
+ nf_hook_entries_free(p);
+ }
+}
+EXPORT_SYMBOL_GPL(nf_hook_entries_delete_raw);
+
int nf_register_net_hook(struct net *net, const struct nf_hook_ops *reg)
{
int err;
@@ -535,6 +563,9 @@
struct nfnl_ct_hook __rcu *nfnl_ct_hook __read_mostly;
EXPORT_SYMBOL_GPL(nfnl_ct_hook);
+struct nf_ct_hook __rcu *nf_ct_hook __read_mostly;
+EXPORT_SYMBOL_GPL(nf_ct_hook);
+
#if IS_ENABLED(CONFIG_NF_CONNTRACK)
/* This does not belong here, but locally generated errors need it if connection
tracking in use: without this, connection may not be in hash table, and hence
@@ -543,6 +574,9 @@
__rcu __read_mostly;
EXPORT_SYMBOL(ip_ct_attach);
+struct nf_nat_hook __rcu *nf_nat_hook __read_mostly;
+EXPORT_SYMBOL_GPL(nf_nat_hook);
+
void nf_ct_attach(struct sk_buff *new, const struct sk_buff *skb)
{
void (*attach)(struct sk_buff *, const struct sk_buff *);
@@ -557,17 +591,14 @@
}
EXPORT_SYMBOL(nf_ct_attach);
-void (*nf_ct_destroy)(struct nf_conntrack *) __rcu __read_mostly;
-EXPORT_SYMBOL(nf_ct_destroy);
-
void nf_conntrack_destroy(struct nf_conntrack *nfct)
{
- void (*destroy)(struct nf_conntrack *);
+ struct nf_ct_hook *ct_hook;
rcu_read_lock();
- destroy = rcu_dereference(nf_ct_destroy);
- BUG_ON(destroy == NULL);
- destroy(nfct);
+ ct_hook = rcu_dereference(nf_ct_hook);
+ BUG_ON(ct_hook == NULL);
+ ct_hook->destroy(nfct);
rcu_read_unlock();
}
EXPORT_SYMBOL(nf_conntrack_destroy);
@@ -580,11 +611,6 @@
EXPORT_SYMBOL_GPL(nf_ct_zone_dflt);
#endif /* CONFIG_NF_CONNTRACK */
-#ifdef CONFIG_NF_NAT_NEEDED
-void (*nf_nat_decode_session_hook)(struct sk_buff *, struct flowi *);
-EXPORT_SYMBOL(nf_nat_decode_session_hook);
-#endif
-
static void __net_init
__netfilter_net_init(struct nf_hook_entries __rcu **e, int max)
{
diff --git a/net/netfilter/ipvs/Kconfig b/net/netfilter/ipvs/Kconfig
index b32fb0d..05dc1b7 100644
--- a/net/netfilter/ipvs/Kconfig
+++ b/net/netfilter/ipvs/Kconfig
@@ -225,6 +225,25 @@
If you want to compile it in kernel, say Y. To compile it as a
module, choose M here. If unsure, say N.
+config IP_VS_MH
+ tristate "maglev hashing scheduling"
+ ---help---
+ The maglev consistent hashing scheduling algorithm provides the
+ Google's Maglev hashing algorithm as a IPVS scheduler. It assigns
+ network connections to the servers through looking up a statically
+ assigned special hash table called the lookup table. Maglev hashing
+ is to assign a preference list of all the lookup table positions
+ to each destination.
+
+ Through this operation, The maglev hashing gives an almost equal
+ share of the lookup table to each of the destinations and provides
+ minimal disruption by using the lookup table. When the set of
+ destinations changes, a connection will likely be sent to the same
+ destination as it was before.
+
+ If you want to compile it in kernel, say Y. To compile it as a
+ module, choose M here. If unsure, say N.
+
config IP_VS_SED
tristate "shortest expected delay scheduling"
---help---
@@ -266,6 +285,24 @@
needs to be large enough to effectively fit all the destinations
multiplied by their respective weights.
+comment 'IPVS MH scheduler'
+
+config IP_VS_MH_TAB_INDEX
+ int "IPVS maglev hashing table index of size (the prime numbers)"
+ range 8 17
+ default 12
+ ---help---
+ The maglev hashing scheduler maps source IPs to destinations
+ stored in a hash table. This table is assigned by a preference
+ list of the positions to each destination until all slots in
+ the table are filled. The index determines the prime for size of
+ the table as 251, 509, 1021, 2039, 4093, 8191, 16381, 32749,
+ 65521 or 131071. When using weights to allow destinations to
+ receive more connections, the table is assigned an amount
+ proportional to the weights specified. The table needs to be large
+ enough to effectively fit all the destinations multiplied by their
+ respective weights.
+
comment 'IPVS application helper'
config IP_VS_FTP
diff --git a/net/netfilter/ipvs/Makefile b/net/netfilter/ipvs/Makefile
index c552993f..bfce267 100644
--- a/net/netfilter/ipvs/Makefile
+++ b/net/netfilter/ipvs/Makefile
@@ -33,6 +33,7 @@
obj-$(CONFIG_IP_VS_LBLCR) += ip_vs_lblcr.o
obj-$(CONFIG_IP_VS_DH) += ip_vs_dh.o
obj-$(CONFIG_IP_VS_SH) += ip_vs_sh.o
+obj-$(CONFIG_IP_VS_MH) += ip_vs_mh.o
obj-$(CONFIG_IP_VS_SED) += ip_vs_sed.o
obj-$(CONFIG_IP_VS_NQ) += ip_vs_nq.o
diff --git a/net/netfilter/ipvs/ip_vs_ctl.c b/net/netfilter/ipvs/ip_vs_ctl.c
index f360988..d4f68d0 100644
--- a/net/netfilter/ipvs/ip_vs_ctl.c
+++ b/net/netfilter/ipvs/ip_vs_ctl.c
@@ -821,6 +821,10 @@
if (add && udest->af != svc->af)
ipvs->mixed_address_family_dests++;
+ /* keep the last_weight with latest non-0 weight */
+ if (add || udest->weight != 0)
+ atomic_set(&dest->last_weight, udest->weight);
+
/* set the weight and the flags */
atomic_set(&dest->weight, udest->weight);
conn_flags = udest->conn_flags & IP_VS_CONN_F_DEST_MASK;
diff --git a/net/netfilter/ipvs/ip_vs_dh.c b/net/netfilter/ipvs/ip_vs_dh.c
index 75f798f..07459e7 100644
--- a/net/netfilter/ipvs/ip_vs_dh.c
+++ b/net/netfilter/ipvs/ip_vs_dh.c
@@ -43,6 +43,7 @@
#include <linux/module.h>
#include <linux/kernel.h>
#include <linux/skbuff.h>
+#include <linux/hash.h>
#include <net/ip_vs.h>
@@ -81,7 +82,7 @@
addr_fold = addr->ip6[0]^addr->ip6[1]^
addr->ip6[2]^addr->ip6[3];
#endif
- return (ntohl(addr_fold)*2654435761UL) & IP_VS_DH_TAB_MASK;
+ return hash_32(ntohl(addr_fold), IP_VS_DH_TAB_BITS);
}
diff --git a/net/netfilter/ipvs/ip_vs_lblc.c b/net/netfilter/ipvs/ip_vs_lblc.c
index 3057e45..b9f375e 100644
--- a/net/netfilter/ipvs/ip_vs_lblc.c
+++ b/net/netfilter/ipvs/ip_vs_lblc.c
@@ -48,6 +48,7 @@
#include <linux/kernel.h>
#include <linux/skbuff.h>
#include <linux/jiffies.h>
+#include <linux/hash.h>
/* for sysctl */
#include <linux/fs.h>
@@ -160,7 +161,7 @@
addr_fold = addr->ip6[0]^addr->ip6[1]^
addr->ip6[2]^addr->ip6[3];
#endif
- return (ntohl(addr_fold)*2654435761UL) & IP_VS_LBLC_TAB_MASK;
+ return hash_32(ntohl(addr_fold), IP_VS_LBLC_TAB_BITS);
}
@@ -371,6 +372,7 @@
tbl->counter = 1;
tbl->dead = false;
tbl->svc = svc;
+ atomic_set(&tbl->entries, 0);
/*
* Hook periodic timer for garbage collection
diff --git a/net/netfilter/ipvs/ip_vs_lblcr.c b/net/netfilter/ipvs/ip_vs_lblcr.c
index 92adc04..542c494 100644
--- a/net/netfilter/ipvs/ip_vs_lblcr.c
+++ b/net/netfilter/ipvs/ip_vs_lblcr.c
@@ -47,6 +47,7 @@
#include <linux/jiffies.h>
#include <linux/list.h>
#include <linux/slab.h>
+#include <linux/hash.h>
/* for sysctl */
#include <linux/fs.h>
@@ -323,7 +324,7 @@
addr_fold = addr->ip6[0]^addr->ip6[1]^
addr->ip6[2]^addr->ip6[3];
#endif
- return (ntohl(addr_fold)*2654435761UL) & IP_VS_LBLCR_TAB_MASK;
+ return hash_32(ntohl(addr_fold), IP_VS_LBLCR_TAB_BITS);
}
@@ -534,6 +535,7 @@
tbl->counter = 1;
tbl->dead = false;
tbl->svc = svc;
+ atomic_set(&tbl->entries, 0);
/*
* Hook periodic timer for garbage collection
diff --git a/net/netfilter/ipvs/ip_vs_mh.c b/net/netfilter/ipvs/ip_vs_mh.c
new file mode 100644
index 0000000..0f795b1
--- /dev/null
+++ b/net/netfilter/ipvs/ip_vs_mh.c
@@ -0,0 +1,540 @@
+// SPDX-License-Identifier: GPL-2.0
+/* IPVS: Maglev Hashing scheduling module
+ *
+ * Authors: Inju Song <inju.song@navercorp.com>
+ *
+ */
+
+/* The mh algorithm is to assign a preference list of all the lookup
+ * table positions to each destination and populate the table with
+ * the most-preferred position of destinations. Then it is to select
+ * destination with the hash key of source IP address through looking
+ * up a the lookup table.
+ *
+ * The algorithm is detailed in:
+ * [3.4 Consistent Hasing]
+https://www.usenix.org/system/files/conference/nsdi16/nsdi16-paper-eisenbud.pdf
+ *
+ */
+
+#define KMSG_COMPONENT "IPVS"
+#define pr_fmt(fmt) KMSG_COMPONENT ": " fmt
+
+#include <linux/ip.h>
+#include <linux/slab.h>
+#include <linux/module.h>
+#include <linux/kernel.h>
+#include <linux/skbuff.h>
+
+#include <net/ip_vs.h>
+
+#include <linux/siphash.h>
+#include <linux/bitops.h>
+#include <linux/gcd.h>
+
+#define IP_VS_SVC_F_SCHED_MH_FALLBACK IP_VS_SVC_F_SCHED1 /* MH fallback */
+#define IP_VS_SVC_F_SCHED_MH_PORT IP_VS_SVC_F_SCHED2 /* MH use port */
+
+struct ip_vs_mh_lookup {
+ struct ip_vs_dest __rcu *dest; /* real server (cache) */
+};
+
+struct ip_vs_mh_dest_setup {
+ unsigned int offset; /* starting offset */
+ unsigned int skip; /* skip */
+ unsigned int perm; /* next_offset */
+ int turns; /* weight / gcd() and rshift */
+};
+
+/* Available prime numbers for MH table */
+static int primes[] = {251, 509, 1021, 2039, 4093,
+ 8191, 16381, 32749, 65521, 131071};
+
+/* For IPVS MH entry hash table */
+#ifndef CONFIG_IP_VS_MH_TAB_INDEX
+#define CONFIG_IP_VS_MH_TAB_INDEX 12
+#endif
+#define IP_VS_MH_TAB_BITS (CONFIG_IP_VS_MH_TAB_INDEX / 2)
+#define IP_VS_MH_TAB_INDEX (CONFIG_IP_VS_MH_TAB_INDEX - 8)
+#define IP_VS_MH_TAB_SIZE primes[IP_VS_MH_TAB_INDEX]
+
+struct ip_vs_mh_state {
+ struct rcu_head rcu_head;
+ struct ip_vs_mh_lookup *lookup;
+ struct ip_vs_mh_dest_setup *dest_setup;
+ hsiphash_key_t hash1, hash2;
+ int gcd;
+ int rshift;
+};
+
+static inline void generate_hash_secret(hsiphash_key_t *hash1,
+ hsiphash_key_t *hash2)
+{
+ hash1->key[0] = 2654435761UL;
+ hash1->key[1] = 2654435761UL;
+
+ hash2->key[0] = 2654446892UL;
+ hash2->key[1] = 2654446892UL;
+}
+
+/* Helper function to determine if server is unavailable */
+static inline bool is_unavailable(struct ip_vs_dest *dest)
+{
+ return atomic_read(&dest->weight) <= 0 ||
+ dest->flags & IP_VS_DEST_F_OVERLOAD;
+}
+
+/* Returns hash value for IPVS MH entry */
+static inline unsigned int
+ip_vs_mh_hashkey(int af, const union nf_inet_addr *addr,
+ __be16 port, hsiphash_key_t *key, unsigned int offset)
+{
+ unsigned int v;
+ __be32 addr_fold = addr->ip;
+
+#ifdef CONFIG_IP_VS_IPV6
+ if (af == AF_INET6)
+ addr_fold = addr->ip6[0] ^ addr->ip6[1] ^
+ addr->ip6[2] ^ addr->ip6[3];
+#endif
+ v = (offset + ntohs(port) + ntohl(addr_fold));
+ return hsiphash(&v, sizeof(v), key);
+}
+
+/* Reset all the hash buckets of the specified table. */
+static void ip_vs_mh_reset(struct ip_vs_mh_state *s)
+{
+ int i;
+ struct ip_vs_mh_lookup *l;
+ struct ip_vs_dest *dest;
+
+ l = &s->lookup[0];
+ for (i = 0; i < IP_VS_MH_TAB_SIZE; i++) {
+ dest = rcu_dereference_protected(l->dest, 1);
+ if (dest) {
+ ip_vs_dest_put(dest);
+ RCU_INIT_POINTER(l->dest, NULL);
+ }
+ l++;
+ }
+}
+
+static int ip_vs_mh_permutate(struct ip_vs_mh_state *s,
+ struct ip_vs_service *svc)
+{
+ struct list_head *p;
+ struct ip_vs_mh_dest_setup *ds;
+ struct ip_vs_dest *dest;
+ int lw;
+
+ /* If gcd is smaller then 1, number of dests or
+ * all last_weight of dests are zero. So, skip
+ * permutation for the dests.
+ */
+ if (s->gcd < 1)
+ return 0;
+
+ /* Set dest_setup for the dests permutation */
+ p = &svc->destinations;
+ ds = &s->dest_setup[0];
+ while ((p = p->next) != &svc->destinations) {
+ dest = list_entry(p, struct ip_vs_dest, n_list);
+
+ ds->offset = ip_vs_mh_hashkey(svc->af, &dest->addr,
+ dest->port, &s->hash1, 0) %
+ IP_VS_MH_TAB_SIZE;
+ ds->skip = ip_vs_mh_hashkey(svc->af, &dest->addr,
+ dest->port, &s->hash2, 0) %
+ (IP_VS_MH_TAB_SIZE - 1) + 1;
+ ds->perm = ds->offset;
+
+ lw = atomic_read(&dest->last_weight);
+ ds->turns = ((lw / s->gcd) >> s->rshift) ? : (lw != 0);
+ ds++;
+ }
+
+ return 0;
+}
+
+static int ip_vs_mh_populate(struct ip_vs_mh_state *s,
+ struct ip_vs_service *svc)
+{
+ int n, c, dt_count;
+ unsigned long *table;
+ struct list_head *p;
+ struct ip_vs_mh_dest_setup *ds;
+ struct ip_vs_dest *dest, *new_dest;
+
+ /* If gcd is smaller then 1, number of dests or
+ * all last_weight of dests are zero. So, skip
+ * the population for the dests and reset lookup table.
+ */
+ if (s->gcd < 1) {
+ ip_vs_mh_reset(s);
+ return 0;
+ }
+
+ table = kcalloc(BITS_TO_LONGS(IP_VS_MH_TAB_SIZE),
+ sizeof(unsigned long), GFP_KERNEL);
+ if (!table)
+ return -ENOMEM;
+
+ p = &svc->destinations;
+ n = 0;
+ dt_count = 0;
+ while (n < IP_VS_MH_TAB_SIZE) {
+ if (p == &svc->destinations)
+ p = p->next;
+
+ ds = &s->dest_setup[0];
+ while (p != &svc->destinations) {
+ /* Ignore added server with zero weight */
+ if (ds->turns < 1) {
+ p = p->next;
+ ds++;
+ continue;
+ }
+
+ c = ds->perm;
+ while (test_bit(c, table)) {
+ /* Add skip, mod IP_VS_MH_TAB_SIZE */
+ ds->perm += ds->skip;
+ if (ds->perm >= IP_VS_MH_TAB_SIZE)
+ ds->perm -= IP_VS_MH_TAB_SIZE;
+ c = ds->perm;
+ }
+
+ __set_bit(c, table);
+
+ dest = rcu_dereference_protected(s->lookup[c].dest, 1);
+ new_dest = list_entry(p, struct ip_vs_dest, n_list);
+ if (dest != new_dest) {
+ if (dest)
+ ip_vs_dest_put(dest);
+ ip_vs_dest_hold(new_dest);
+ RCU_INIT_POINTER(s->lookup[c].dest, new_dest);
+ }
+
+ if (++n == IP_VS_MH_TAB_SIZE)
+ goto out;
+
+ if (++dt_count >= ds->turns) {
+ dt_count = 0;
+ p = p->next;
+ ds++;
+ }
+ }
+ }
+
+out:
+ kfree(table);
+ return 0;
+}
+
+/* Get ip_vs_dest associated with supplied parameters. */
+static inline struct ip_vs_dest *
+ip_vs_mh_get(struct ip_vs_service *svc, struct ip_vs_mh_state *s,
+ const union nf_inet_addr *addr, __be16 port)
+{
+ unsigned int hash = ip_vs_mh_hashkey(svc->af, addr, port, &s->hash1, 0)
+ % IP_VS_MH_TAB_SIZE;
+ struct ip_vs_dest *dest = rcu_dereference(s->lookup[hash].dest);
+
+ return (!dest || is_unavailable(dest)) ? NULL : dest;
+}
+
+/* As ip_vs_mh_get, but with fallback if selected server is unavailable */
+static inline struct ip_vs_dest *
+ip_vs_mh_get_fallback(struct ip_vs_service *svc, struct ip_vs_mh_state *s,
+ const union nf_inet_addr *addr, __be16 port)
+{
+ unsigned int offset, roffset;
+ unsigned int hash, ihash;
+ struct ip_vs_dest *dest;
+
+ /* First try the dest it's supposed to go to */
+ ihash = ip_vs_mh_hashkey(svc->af, addr, port,
+ &s->hash1, 0) % IP_VS_MH_TAB_SIZE;
+ dest = rcu_dereference(s->lookup[ihash].dest);
+ if (!dest)
+ return NULL;
+ if (!is_unavailable(dest))
+ return dest;
+
+ IP_VS_DBG_BUF(6, "MH: selected unavailable server %s:%u, reselecting",
+ IP_VS_DBG_ADDR(dest->af, &dest->addr), ntohs(dest->port));
+
+ /* If the original dest is unavailable, loop around the table
+ * starting from ihash to find a new dest
+ */
+ for (offset = 0; offset < IP_VS_MH_TAB_SIZE; offset++) {
+ roffset = (offset + ihash) % IP_VS_MH_TAB_SIZE;
+ hash = ip_vs_mh_hashkey(svc->af, addr, port, &s->hash1,
+ roffset) % IP_VS_MH_TAB_SIZE;
+ dest = rcu_dereference(s->lookup[hash].dest);
+ if (!dest)
+ break;
+ if (!is_unavailable(dest))
+ return dest;
+ IP_VS_DBG_BUF(6,
+ "MH: selected unavailable server %s:%u (offset %u), reselecting",
+ IP_VS_DBG_ADDR(dest->af, &dest->addr),
+ ntohs(dest->port), roffset);
+ }
+
+ return NULL;
+}
+
+/* Assign all the hash buckets of the specified table with the service. */
+static int ip_vs_mh_reassign(struct ip_vs_mh_state *s,
+ struct ip_vs_service *svc)
+{
+ int ret;
+
+ if (svc->num_dests > IP_VS_MH_TAB_SIZE)
+ return -EINVAL;
+
+ if (svc->num_dests >= 1) {
+ s->dest_setup = kcalloc(svc->num_dests,
+ sizeof(struct ip_vs_mh_dest_setup),
+ GFP_KERNEL);
+ if (!s->dest_setup)
+ return -ENOMEM;
+ }
+
+ ip_vs_mh_permutate(s, svc);
+
+ ret = ip_vs_mh_populate(s, svc);
+ if (ret < 0)
+ goto out;
+
+ IP_VS_DBG_BUF(6, "MH: reassign lookup table of %s:%u\n",
+ IP_VS_DBG_ADDR(svc->af, &svc->addr),
+ ntohs(svc->port));
+
+out:
+ if (svc->num_dests >= 1) {
+ kfree(s->dest_setup);
+ s->dest_setup = NULL;
+ }
+ return ret;
+}
+
+static int ip_vs_mh_gcd_weight(struct ip_vs_service *svc)
+{
+ struct ip_vs_dest *dest;
+ int weight;
+ int g = 0;
+
+ list_for_each_entry(dest, &svc->destinations, n_list) {
+ weight = atomic_read(&dest->last_weight);
+ if (weight > 0) {
+ if (g > 0)
+ g = gcd(weight, g);
+ else
+ g = weight;
+ }
+ }
+ return g;
+}
+
+/* To avoid assigning huge weight for the MH table,
+ * calculate shift value with gcd.
+ */
+static int ip_vs_mh_shift_weight(struct ip_vs_service *svc, int gcd)
+{
+ struct ip_vs_dest *dest;
+ int new_weight, weight = 0;
+ int mw, shift;
+
+ /* If gcd is smaller then 1, number of dests or
+ * all last_weight of dests are zero. So, return
+ * shift value as zero.
+ */
+ if (gcd < 1)
+ return 0;
+
+ list_for_each_entry(dest, &svc->destinations, n_list) {
+ new_weight = atomic_read(&dest->last_weight);
+ if (new_weight > weight)
+ weight = new_weight;
+ }
+
+ /* Because gcd is greater than zero,
+ * the maximum weight and gcd are always greater than zero
+ */
+ mw = weight / gcd;
+
+ /* shift = occupied bits of weight/gcd - MH highest bits */
+ shift = fls(mw) - IP_VS_MH_TAB_BITS;
+ return (shift >= 0) ? shift : 0;
+}
+
+static void ip_vs_mh_state_free(struct rcu_head *head)
+{
+ struct ip_vs_mh_state *s;
+
+ s = container_of(head, struct ip_vs_mh_state, rcu_head);
+ kfree(s->lookup);
+ kfree(s);
+}
+
+static int ip_vs_mh_init_svc(struct ip_vs_service *svc)
+{
+ int ret;
+ struct ip_vs_mh_state *s;
+
+ /* Allocate the MH table for this service */
+ s = kzalloc(sizeof(*s), GFP_KERNEL);
+ if (!s)
+ return -ENOMEM;
+
+ s->lookup = kcalloc(IP_VS_MH_TAB_SIZE, sizeof(struct ip_vs_mh_lookup),
+ GFP_KERNEL);
+ if (!s->lookup) {
+ kfree(s);
+ return -ENOMEM;
+ }
+
+ generate_hash_secret(&s->hash1, &s->hash2);
+ s->gcd = ip_vs_mh_gcd_weight(svc);
+ s->rshift = ip_vs_mh_shift_weight(svc, s->gcd);
+
+ IP_VS_DBG(6,
+ "MH lookup table (memory=%zdbytes) allocated for current service\n",
+ sizeof(struct ip_vs_mh_lookup) * IP_VS_MH_TAB_SIZE);
+
+ /* Assign the lookup table with current dests */
+ ret = ip_vs_mh_reassign(s, svc);
+ if (ret < 0) {
+ ip_vs_mh_reset(s);
+ ip_vs_mh_state_free(&s->rcu_head);
+ return ret;
+ }
+
+ /* No more failures, attach state */
+ svc->sched_data = s;
+ return 0;
+}
+
+static void ip_vs_mh_done_svc(struct ip_vs_service *svc)
+{
+ struct ip_vs_mh_state *s = svc->sched_data;
+
+ /* Got to clean up lookup entry here */
+ ip_vs_mh_reset(s);
+
+ call_rcu(&s->rcu_head, ip_vs_mh_state_free);
+ IP_VS_DBG(6, "MH lookup table (memory=%zdbytes) released\n",
+ sizeof(struct ip_vs_mh_lookup) * IP_VS_MH_TAB_SIZE);
+}
+
+static int ip_vs_mh_dest_changed(struct ip_vs_service *svc,
+ struct ip_vs_dest *dest)
+{
+ struct ip_vs_mh_state *s = svc->sched_data;
+
+ s->gcd = ip_vs_mh_gcd_weight(svc);
+ s->rshift = ip_vs_mh_shift_weight(svc, s->gcd);
+
+ /* Assign the lookup table with the updated service */
+ return ip_vs_mh_reassign(s, svc);
+}
+
+/* Helper function to get port number */
+static inline __be16
+ip_vs_mh_get_port(const struct sk_buff *skb, struct ip_vs_iphdr *iph)
+{
+ __be16 _ports[2], *ports;
+
+ /* At this point we know that we have a valid packet of some kind.
+ * Because ICMP packets are only guaranteed to have the first 8
+ * bytes, let's just grab the ports. Fortunately they're in the
+ * same position for all three of the protocols we care about.
+ */
+ switch (iph->protocol) {
+ case IPPROTO_TCP:
+ case IPPROTO_UDP:
+ case IPPROTO_SCTP:
+ ports = skb_header_pointer(skb, iph->len, sizeof(_ports),
+ &_ports);
+ if (unlikely(!ports))
+ return 0;
+
+ if (likely(!ip_vs_iph_inverse(iph)))
+ return ports[0];
+ else
+ return ports[1];
+ default:
+ return 0;
+ }
+}
+
+/* Maglev Hashing scheduling */
+static struct ip_vs_dest *
+ip_vs_mh_schedule(struct ip_vs_service *svc, const struct sk_buff *skb,
+ struct ip_vs_iphdr *iph)
+{
+ struct ip_vs_dest *dest;
+ struct ip_vs_mh_state *s;
+ __be16 port = 0;
+ const union nf_inet_addr *hash_addr;
+
+ hash_addr = ip_vs_iph_inverse(iph) ? &iph->daddr : &iph->saddr;
+
+ IP_VS_DBG(6, "%s : Scheduling...\n", __func__);
+
+ if (svc->flags & IP_VS_SVC_F_SCHED_MH_PORT)
+ port = ip_vs_mh_get_port(skb, iph);
+
+ s = (struct ip_vs_mh_state *)svc->sched_data;
+
+ if (svc->flags & IP_VS_SVC_F_SCHED_MH_FALLBACK)
+ dest = ip_vs_mh_get_fallback(svc, s, hash_addr, port);
+ else
+ dest = ip_vs_mh_get(svc, s, hash_addr, port);
+
+ if (!dest) {
+ ip_vs_scheduler_err(svc, "no destination available");
+ return NULL;
+ }
+
+ IP_VS_DBG_BUF(6, "MH: source IP address %s:%u --> server %s:%u\n",
+ IP_VS_DBG_ADDR(svc->af, hash_addr),
+ ntohs(port),
+ IP_VS_DBG_ADDR(dest->af, &dest->addr),
+ ntohs(dest->port));
+
+ return dest;
+}
+
+/* IPVS MH Scheduler structure */
+static struct ip_vs_scheduler ip_vs_mh_scheduler = {
+ .name = "mh",
+ .refcnt = ATOMIC_INIT(0),
+ .module = THIS_MODULE,
+ .n_list = LIST_HEAD_INIT(ip_vs_mh_scheduler.n_list),
+ .init_service = ip_vs_mh_init_svc,
+ .done_service = ip_vs_mh_done_svc,
+ .add_dest = ip_vs_mh_dest_changed,
+ .del_dest = ip_vs_mh_dest_changed,
+ .upd_dest = ip_vs_mh_dest_changed,
+ .schedule = ip_vs_mh_schedule,
+};
+
+static int __init ip_vs_mh_init(void)
+{
+ return register_ip_vs_scheduler(&ip_vs_mh_scheduler);
+}
+
+static void __exit ip_vs_mh_cleanup(void)
+{
+ unregister_ip_vs_scheduler(&ip_vs_mh_scheduler);
+ rcu_barrier();
+}
+
+module_init(ip_vs_mh_init);
+module_exit(ip_vs_mh_cleanup);
+MODULE_DESCRIPTION("Maglev hashing ipvs scheduler");
+MODULE_LICENSE("GPL v2");
+MODULE_AUTHOR("Inju Song <inju.song@navercorp.com>");
diff --git a/net/netfilter/ipvs/ip_vs_proto_tcp.c b/net/netfilter/ipvs/ip_vs_proto_tcp.c
index bcd9b7b..569631d 100644
--- a/net/netfilter/ipvs/ip_vs_proto_tcp.c
+++ b/net/netfilter/ipvs/ip_vs_proto_tcp.c
@@ -436,7 +436,7 @@
return tcp_state_active_table[state];
}
-static struct tcp_states_t tcp_states [] = {
+static struct tcp_states_t tcp_states[] = {
/* INPUT */
/* sNO, sES, sSS, sSR, sFW, sTW, sCL, sCW, sLA, sLI, sSA */
/*syn*/ {{sSR, sES, sES, sSR, sSR, sSR, sSR, sSR, sSR, sSR, sSR }},
@@ -459,7 +459,7 @@
/*rst*/ {{sCL, sCL, sCL, sSR, sCL, sCL, sCL, sCL, sLA, sLI, sCL }},
};
-static struct tcp_states_t tcp_states_dos [] = {
+static struct tcp_states_t tcp_states_dos[] = {
/* INPUT */
/* sNO, sES, sSS, sSR, sFW, sTW, sCL, sCW, sLA, sLI, sSA */
/*syn*/ {{sSR, sES, sES, sSR, sSR, sSR, sSR, sSR, sSR, sSR, sSA }},
diff --git a/net/netfilter/ipvs/ip_vs_sh.c b/net/netfilter/ipvs/ip_vs_sh.c
index 16aaac6..1e01c78 100644
--- a/net/netfilter/ipvs/ip_vs_sh.c
+++ b/net/netfilter/ipvs/ip_vs_sh.c
@@ -96,7 +96,8 @@
addr_fold = addr->ip6[0]^addr->ip6[1]^
addr->ip6[2]^addr->ip6[3];
#endif
- return (offset + (ntohs(port) + ntohl(addr_fold))*2654435761UL) &
+ return (offset + hash_32(ntohs(port) + ntohl(addr_fold),
+ IP_VS_SH_TAB_BITS)) &
IP_VS_SH_TAB_MASK;
}
diff --git a/net/netfilter/ipvs/ip_vs_xmit.c b/net/netfilter/ipvs/ip_vs_xmit.c
index 4527921..ba0a0fd 100644
--- a/net/netfilter/ipvs/ip_vs_xmit.c
+++ b/net/netfilter/ipvs/ip_vs_xmit.c
@@ -266,12 +266,13 @@
/* check and decrement ttl */
if (ipv6_hdr(skb)->hop_limit <= 1) {
+ struct inet6_dev *idev = __in6_dev_get_safely(skb->dev);
+
/* Force OUTPUT device used as source address */
skb->dev = dst->dev;
icmpv6_send(skb, ICMPV6_TIME_EXCEED,
ICMPV6_EXC_HOPLIMIT, 0);
- __IP6_INC_STATS(net, ip6_dst_idev(dst),
- IPSTATS_MIB_INHDRERRORS);
+ __IP6_INC_STATS(net, idev, IPSTATS_MIB_INHDRERRORS);
return false;
}
diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
index 41ff04e..3465da2 100644
--- a/net/netfilter/nf_conntrack_core.c
+++ b/net/netfilter/nf_conntrack_core.c
@@ -58,11 +58,6 @@
#include "nf_internals.h"
-int (*nfnetlink_parse_nat_setup_hook)(struct nf_conn *ct,
- enum nf_nat_manip_type manip,
- const struct nlattr *attr) __read_mostly;
-EXPORT_SYMBOL_GPL(nfnetlink_parse_nat_setup_hook);
-
__cacheline_aligned_in_smp spinlock_t nf_conntrack_locks[CONNTRACK_LOCKS];
EXPORT_SYMBOL_GPL(nf_conntrack_locks);
@@ -186,6 +181,7 @@
EXPORT_SYMBOL_GPL(nf_conntrack_htable_size);
unsigned int nf_conntrack_max __read_mostly;
+EXPORT_SYMBOL_GPL(nf_conntrack_max);
seqcount_t nf_conntrack_generation __read_mostly;
static unsigned int nf_conntrack_hash_rnd __read_mostly;
@@ -1611,6 +1607,82 @@
nf_conntrack_get(skb_nfct(nskb));
}
+static int nf_conntrack_update(struct net *net, struct sk_buff *skb)
+{
+ const struct nf_conntrack_l3proto *l3proto;
+ const struct nf_conntrack_l4proto *l4proto;
+ struct nf_conntrack_tuple_hash *h;
+ struct nf_conntrack_tuple tuple;
+ enum ip_conntrack_info ctinfo;
+ struct nf_nat_hook *nat_hook;
+ unsigned int dataoff, status;
+ struct nf_conn *ct;
+ u16 l3num;
+ u8 l4num;
+
+ ct = nf_ct_get(skb, &ctinfo);
+ if (!ct || nf_ct_is_confirmed(ct))
+ return 0;
+
+ l3num = nf_ct_l3num(ct);
+ l3proto = nf_ct_l3proto_find_get(l3num);
+
+ if (l3proto->get_l4proto(skb, skb_network_offset(skb), &dataoff,
+ &l4num) <= 0)
+ return -1;
+
+ l4proto = nf_ct_l4proto_find_get(l3num, l4num);
+
+ if (!nf_ct_get_tuple(skb, skb_network_offset(skb), dataoff, l3num,
+ l4num, net, &tuple, l3proto, l4proto))
+ return -1;
+
+ if (ct->status & IPS_SRC_NAT) {
+ memcpy(tuple.src.u3.all,
+ ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple.src.u3.all,
+ sizeof(tuple.src.u3.all));
+ tuple.src.u.all =
+ ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple.src.u.all;
+ }
+
+ if (ct->status & IPS_DST_NAT) {
+ memcpy(tuple.dst.u3.all,
+ ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple.dst.u3.all,
+ sizeof(tuple.dst.u3.all));
+ tuple.dst.u.all =
+ ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple.dst.u.all;
+ }
+
+ h = nf_conntrack_find_get(net, nf_ct_zone(ct), &tuple);
+ if (!h)
+ return 0;
+
+ /* Store status bits of the conntrack that is clashing to re-do NAT
+ * mangling according to what it has been done already to this packet.
+ */
+ status = ct->status;
+
+ nf_ct_put(ct);
+ ct = nf_ct_tuplehash_to_ctrack(h);
+ nf_ct_set(skb, ct, ctinfo);
+
+ nat_hook = rcu_dereference(nf_nat_hook);
+ if (!nat_hook)
+ return 0;
+
+ if (status & IPS_SRC_NAT &&
+ nat_hook->manip_pkt(skb, ct, NF_NAT_MANIP_SRC,
+ IP_CT_DIR_ORIGINAL) == NF_DROP)
+ return -1;
+
+ if (status & IPS_DST_NAT &&
+ nat_hook->manip_pkt(skb, ct, NF_NAT_MANIP_DST,
+ IP_CT_DIR_ORIGINAL) == NF_DROP)
+ return -1;
+
+ return 0;
+}
+
/* Bring out ya dead! */
static struct nf_conn *
get_next_corpse(int (*iter)(struct nf_conn *i, void *data),
@@ -1812,8 +1884,7 @@
void nf_conntrack_cleanup_end(void)
{
- RCU_INIT_POINTER(nf_ct_destroy, NULL);
-
+ RCU_INIT_POINTER(nf_ct_hook, NULL);
cancel_delayed_work_sync(&conntrack_gc_work.dwork);
nf_ct_free_hashtable(nf_conntrack_hash, nf_conntrack_htable_size);
@@ -2130,11 +2201,16 @@
return ret;
}
+static struct nf_ct_hook nf_conntrack_hook = {
+ .update = nf_conntrack_update,
+ .destroy = destroy_conntrack,
+};
+
void nf_conntrack_init_end(void)
{
/* For use by REJECT target */
RCU_INIT_POINTER(ip_ct_attach, nf_conntrack_attach);
- RCU_INIT_POINTER(nf_ct_destroy, destroy_conntrack);
+ RCU_INIT_POINTER(nf_ct_hook, &nf_conntrack_hook);
}
/*
diff --git a/net/netfilter/nf_conntrack_ftp.c b/net/netfilter/nf_conntrack_ftp.c
index f0e9a75..a11c304 100644
--- a/net/netfilter/nf_conntrack_ftp.c
+++ b/net/netfilter/nf_conntrack_ftp.c
@@ -566,8 +566,7 @@
.timeout = 5 * 60,
};
-/* don't make this __exit, since it's called from __init ! */
-static void nf_conntrack_ftp_fini(void)
+static void __exit nf_conntrack_ftp_fini(void)
{
nf_conntrack_helpers_unregister(ftp, ports_c * 2);
kfree(ftp_buffer);
diff --git a/net/netfilter/nf_conntrack_irc.c b/net/netfilter/nf_conntrack_irc.c
index 5523acc..4099f4d 100644
--- a/net/netfilter/nf_conntrack_irc.c
+++ b/net/netfilter/nf_conntrack_irc.c
@@ -232,8 +232,6 @@
static struct nf_conntrack_helper irc[MAX_PORTS] __read_mostly;
static struct nf_conntrack_expect_policy irc_exp_policy;
-static void nf_conntrack_irc_fini(void);
-
static int __init nf_conntrack_irc_init(void)
{
int i, ret;
@@ -276,9 +274,7 @@
return 0;
}
-/* This function is intentionally _NOT_ defined as __exit, because
- * it is needed by the init function */
-static void nf_conntrack_irc_fini(void)
+static void __exit nf_conntrack_irc_fini(void)
{
nf_conntrack_helpers_unregister(irc, ports_c);
kfree(irc_buffer);
diff --git a/net/netfilter/nf_conntrack_netlink.c b/net/netfilter/nf_conntrack_netlink.c
index 4c1d0c5..39327a4 100644
--- a/net/netfilter/nf_conntrack_netlink.c
+++ b/net/netfilter/nf_conntrack_netlink.c
@@ -1431,11 +1431,11 @@
enum nf_nat_manip_type manip,
const struct nlattr *attr)
{
- typeof(nfnetlink_parse_nat_setup_hook) parse_nat_setup;
+ struct nf_nat_hook *nat_hook;
int err;
- parse_nat_setup = rcu_dereference(nfnetlink_parse_nat_setup_hook);
- if (!parse_nat_setup) {
+ nat_hook = rcu_dereference(nf_nat_hook);
+ if (!nat_hook) {
#ifdef CONFIG_MODULES
rcu_read_unlock();
nfnl_unlock(NFNL_SUBSYS_CTNETLINK);
@@ -1446,13 +1446,13 @@
}
nfnl_lock(NFNL_SUBSYS_CTNETLINK);
rcu_read_lock();
- if (nfnetlink_parse_nat_setup_hook)
+ if (nat_hook->parse_nat_setup)
return -EAGAIN;
#endif
return -EOPNOTSUPP;
}
- err = parse_nat_setup(ct, manip, attr);
+ err = nat_hook->parse_nat_setup(ct, manip, attr);
if (err == -EAGAIN) {
#ifdef CONFIG_MODULES
rcu_read_unlock();
@@ -2205,6 +2205,9 @@
if (nla_put_be32(skb, CTA_STATS_GLOBAL_ENTRIES, htonl(nr_conntracks)))
goto nla_put_failure;
+ if (nla_put_be32(skb, CTA_STATS_GLOBAL_MAX_ENTRIES, htonl(nf_conntrack_max)))
+ goto nla_put_failure;
+
nlmsg_end(skb, nlh);
return skb->len;
diff --git a/net/netfilter/nf_conntrack_sane.c b/net/netfilter/nf_conntrack_sane.c
index ae457f3..5072ff9 100644
--- a/net/netfilter/nf_conntrack_sane.c
+++ b/net/netfilter/nf_conntrack_sane.c
@@ -173,8 +173,7 @@
.timeout = 5 * 60,
};
-/* don't make this __exit, since it's called from __init ! */
-static void nf_conntrack_sane_fini(void)
+static void __exit nf_conntrack_sane_fini(void)
{
nf_conntrack_helpers_unregister(sane, ports_c * 2);
kfree(sane_buffer);
diff --git a/net/netfilter/nf_conntrack_sip.c b/net/netfilter/nf_conntrack_sip.c
index 908e51e..c8d2b66 100644
--- a/net/netfilter/nf_conntrack_sip.c
+++ b/net/netfilter/nf_conntrack_sip.c
@@ -1617,7 +1617,7 @@
},
};
-static void nf_conntrack_sip_fini(void)
+static void __exit nf_conntrack_sip_fini(void)
{
nf_conntrack_helpers_unregister(sip, ports_c * 4);
}
diff --git a/net/netfilter/nf_conntrack_tftp.c b/net/netfilter/nf_conntrack_tftp.c
index 0ec6779..548b673 100644
--- a/net/netfilter/nf_conntrack_tftp.c
+++ b/net/netfilter/nf_conntrack_tftp.c
@@ -104,7 +104,7 @@
.timeout = 5 * 60,
};
-static void nf_conntrack_tftp_fini(void)
+static void __exit nf_conntrack_tftp_fini(void)
{
nf_conntrack_helpers_unregister(tftp, ports_c * 2);
}
diff --git a/net/netfilter/nf_flow_table.c b/net/netfilter/nf_flow_table_core.c
similarity index 67%
rename from net/netfilter/nf_flow_table.c
rename to net/netfilter/nf_flow_table_core.c
index ec410ca..eb0d165 100644
--- a/net/netfilter/nf_flow_table.c
+++ b/net/netfilter/nf_flow_table_core.c
@@ -4,6 +4,8 @@
#include <linux/netfilter.h>
#include <linux/rhashtable.h>
#include <linux/netdevice.h>
+#include <net/ip.h>
+#include <net/ip6_route.h>
#include <net/netfilter/nf_tables.h>
#include <net/netfilter/nf_flow_table.h>
#include <net/netfilter/nf_conntrack.h>
@@ -16,6 +18,43 @@
struct rcu_head rcu_head;
};
+static DEFINE_MUTEX(flowtable_lock);
+static LIST_HEAD(flowtables);
+
+static void
+flow_offload_fill_dir(struct flow_offload *flow, struct nf_conn *ct,
+ struct nf_flow_route *route,
+ enum flow_offload_tuple_dir dir)
+{
+ struct flow_offload_tuple *ft = &flow->tuplehash[dir].tuple;
+ struct nf_conntrack_tuple *ctt = &ct->tuplehash[dir].tuple;
+ struct dst_entry *dst = route->tuple[dir].dst;
+
+ ft->dir = dir;
+
+ switch (ctt->src.l3num) {
+ case NFPROTO_IPV4:
+ ft->src_v4 = ctt->src.u3.in;
+ ft->dst_v4 = ctt->dst.u3.in;
+ ft->mtu = ip_dst_mtu_maybe_forward(dst, true);
+ break;
+ case NFPROTO_IPV6:
+ ft->src_v6 = ctt->src.u3.in6;
+ ft->dst_v6 = ctt->dst.u3.in6;
+ ft->mtu = ip6_dst_mtu_forward(dst);
+ break;
+ }
+
+ ft->l3proto = ctt->src.l3num;
+ ft->l4proto = ctt->dst.protonum;
+ ft->src_port = ctt->src.u.tcp.port;
+ ft->dst_port = ctt->dst.u.tcp.port;
+
+ ft->iifidx = route->tuple[dir].ifindex;
+ ft->oifidx = route->tuple[!dir].ifindex;
+ ft->dst_cache = dst;
+}
+
struct flow_offload *
flow_offload_alloc(struct nf_conn *ct, struct nf_flow_route *route)
{
@@ -40,69 +79,12 @@
entry->ct = ct;
- switch (ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple.src.l3num) {
- case NFPROTO_IPV4:
- flow->tuplehash[FLOW_OFFLOAD_DIR_ORIGINAL].tuple.src_v4 =
- ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple.src.u3.in;
- flow->tuplehash[FLOW_OFFLOAD_DIR_ORIGINAL].tuple.dst_v4 =
- ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple.dst.u3.in;
- flow->tuplehash[FLOW_OFFLOAD_DIR_REPLY].tuple.src_v4 =
- ct->tuplehash[IP_CT_DIR_REPLY].tuple.src.u3.in;
- flow->tuplehash[FLOW_OFFLOAD_DIR_REPLY].tuple.dst_v4 =
- ct->tuplehash[IP_CT_DIR_REPLY].tuple.dst.u3.in;
- break;
- case NFPROTO_IPV6:
- flow->tuplehash[FLOW_OFFLOAD_DIR_ORIGINAL].tuple.src_v6 =
- ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple.src.u3.in6;
- flow->tuplehash[FLOW_OFFLOAD_DIR_ORIGINAL].tuple.dst_v6 =
- ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple.dst.u3.in6;
- flow->tuplehash[FLOW_OFFLOAD_DIR_REPLY].tuple.src_v6 =
- ct->tuplehash[IP_CT_DIR_REPLY].tuple.src.u3.in6;
- flow->tuplehash[FLOW_OFFLOAD_DIR_REPLY].tuple.dst_v6 =
- ct->tuplehash[IP_CT_DIR_REPLY].tuple.dst.u3.in6;
- break;
- }
-
- flow->tuplehash[FLOW_OFFLOAD_DIR_ORIGINAL].tuple.l3proto =
- ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple.src.l3num;
- flow->tuplehash[FLOW_OFFLOAD_DIR_ORIGINAL].tuple.l4proto =
- ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple.dst.protonum;
- flow->tuplehash[FLOW_OFFLOAD_DIR_REPLY].tuple.l3proto =
- ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple.src.l3num;
- flow->tuplehash[FLOW_OFFLOAD_DIR_REPLY].tuple.l4proto =
- ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple.dst.protonum;
-
- flow->tuplehash[FLOW_OFFLOAD_DIR_ORIGINAL].tuple.dst_cache =
- route->tuple[FLOW_OFFLOAD_DIR_ORIGINAL].dst;
- flow->tuplehash[FLOW_OFFLOAD_DIR_REPLY].tuple.dst_cache =
- route->tuple[FLOW_OFFLOAD_DIR_REPLY].dst;
-
- flow->tuplehash[FLOW_OFFLOAD_DIR_ORIGINAL].tuple.src_port =
- ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple.src.u.tcp.port;
- flow->tuplehash[FLOW_OFFLOAD_DIR_ORIGINAL].tuple.dst_port =
- ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple.dst.u.tcp.port;
- flow->tuplehash[FLOW_OFFLOAD_DIR_REPLY].tuple.src_port =
- ct->tuplehash[IP_CT_DIR_REPLY].tuple.src.u.tcp.port;
- flow->tuplehash[FLOW_OFFLOAD_DIR_REPLY].tuple.dst_port =
- ct->tuplehash[IP_CT_DIR_REPLY].tuple.dst.u.tcp.port;
-
- flow->tuplehash[FLOW_OFFLOAD_DIR_ORIGINAL].tuple.dir =
- FLOW_OFFLOAD_DIR_ORIGINAL;
- flow->tuplehash[FLOW_OFFLOAD_DIR_REPLY].tuple.dir =
- FLOW_OFFLOAD_DIR_REPLY;
-
- flow->tuplehash[FLOW_OFFLOAD_DIR_ORIGINAL].tuple.iifidx =
- route->tuple[FLOW_OFFLOAD_DIR_ORIGINAL].ifindex;
- flow->tuplehash[FLOW_OFFLOAD_DIR_ORIGINAL].tuple.oifidx =
- route->tuple[FLOW_OFFLOAD_DIR_REPLY].ifindex;
- flow->tuplehash[FLOW_OFFLOAD_DIR_REPLY].tuple.iifidx =
- route->tuple[FLOW_OFFLOAD_DIR_REPLY].ifindex;
- flow->tuplehash[FLOW_OFFLOAD_DIR_REPLY].tuple.oifidx =
- route->tuple[FLOW_OFFLOAD_DIR_ORIGINAL].ifindex;
+ flow_offload_fill_dir(flow, ct, route, FLOW_OFFLOAD_DIR_ORIGINAL);
+ flow_offload_fill_dir(flow, ct, route, FLOW_OFFLOAD_DIR_REPLY);
if (ct->status & IPS_SRC_NAT)
flow->flags |= FLOW_OFFLOAD_SNAT;
- else if (ct->status & IPS_DST_NAT)
+ if (ct->status & IPS_DST_NAT)
flow->flags |= FLOW_OFFLOAD_DNAT;
return flow;
@@ -118,6 +100,43 @@
}
EXPORT_SYMBOL_GPL(flow_offload_alloc);
+static void flow_offload_fixup_tcp(struct ip_ct_tcp *tcp)
+{
+ tcp->state = TCP_CONNTRACK_ESTABLISHED;
+ tcp->seen[0].td_maxwin = 0;
+ tcp->seen[1].td_maxwin = 0;
+}
+
+static void flow_offload_fixup_ct_state(struct nf_conn *ct)
+{
+ const struct nf_conntrack_l4proto *l4proto;
+ struct net *net = nf_ct_net(ct);
+ unsigned int *timeouts;
+ unsigned int timeout;
+ int l4num;
+
+ l4num = nf_ct_protonum(ct);
+ if (l4num == IPPROTO_TCP)
+ flow_offload_fixup_tcp(&ct->proto.tcp);
+
+ l4proto = __nf_ct_l4proto_find(nf_ct_l3num(ct), l4num);
+ if (!l4proto)
+ return;
+
+ timeouts = l4proto->get_timeouts(net);
+ if (!timeouts)
+ return;
+
+ if (l4num == IPPROTO_TCP)
+ timeout = timeouts[TCP_CONNTRACK_ESTABLISHED];
+ else if (l4num == IPPROTO_UDP)
+ timeout = timeouts[UDP_CT_REPLIED];
+ else
+ return;
+
+ ct->timeout = nfct_time_stamp + timeout;
+}
+
void flow_offload_free(struct flow_offload *flow)
{
struct flow_offload_entry *e;
@@ -125,17 +144,46 @@
dst_release(flow->tuplehash[FLOW_OFFLOAD_DIR_ORIGINAL].tuple.dst_cache);
dst_release(flow->tuplehash[FLOW_OFFLOAD_DIR_REPLY].tuple.dst_cache);
e = container_of(flow, struct flow_offload_entry, flow);
- nf_ct_delete(e->ct, 0, 0);
+ if (flow->flags & FLOW_OFFLOAD_DYING)
+ nf_ct_delete(e->ct, 0, 0);
nf_ct_put(e->ct);
kfree_rcu(e, rcu_head);
}
EXPORT_SYMBOL_GPL(flow_offload_free);
-void flow_offload_dead(struct flow_offload *flow)
+static u32 flow_offload_hash(const void *data, u32 len, u32 seed)
{
- flow->flags |= FLOW_OFFLOAD_DYING;
+ const struct flow_offload_tuple *tuple = data;
+
+ return jhash(tuple, offsetof(struct flow_offload_tuple, dir), seed);
}
-EXPORT_SYMBOL_GPL(flow_offload_dead);
+
+static u32 flow_offload_hash_obj(const void *data, u32 len, u32 seed)
+{
+ const struct flow_offload_tuple_rhash *tuplehash = data;
+
+ return jhash(&tuplehash->tuple, offsetof(struct flow_offload_tuple, dir), seed);
+}
+
+static int flow_offload_hash_cmp(struct rhashtable_compare_arg *arg,
+ const void *ptr)
+{
+ const struct flow_offload_tuple *tuple = arg->key;
+ const struct flow_offload_tuple_rhash *x = ptr;
+
+ if (memcmp(&x->tuple, tuple, offsetof(struct flow_offload_tuple, dir)))
+ return 1;
+
+ return 0;
+}
+
+static const struct rhashtable_params nf_flow_offload_rhash_params = {
+ .head_offset = offsetof(struct flow_offload_tuple_rhash, node),
+ .hashfn = flow_offload_hash,
+ .obj_hashfn = flow_offload_hash_obj,
+ .obj_cmpfn = flow_offload_hash_cmp,
+ .automatic_shrinking = true,
+};
int flow_offload_add(struct nf_flowtable *flow_table, struct flow_offload *flow)
{
@@ -143,10 +191,10 @@
rhashtable_insert_fast(&flow_table->rhashtable,
&flow->tuplehash[FLOW_OFFLOAD_DIR_ORIGINAL].node,
- *flow_table->type->params);
+ nf_flow_offload_rhash_params);
rhashtable_insert_fast(&flow_table->rhashtable,
&flow->tuplehash[FLOW_OFFLOAD_DIR_REPLY].node,
- *flow_table->type->params);
+ nf_flow_offload_rhash_params);
return 0;
}
EXPORT_SYMBOL_GPL(flow_offload_add);
@@ -154,22 +202,51 @@
static void flow_offload_del(struct nf_flowtable *flow_table,
struct flow_offload *flow)
{
+ struct flow_offload_entry *e;
+
rhashtable_remove_fast(&flow_table->rhashtable,
&flow->tuplehash[FLOW_OFFLOAD_DIR_ORIGINAL].node,
- *flow_table->type->params);
+ nf_flow_offload_rhash_params);
rhashtable_remove_fast(&flow_table->rhashtable,
&flow->tuplehash[FLOW_OFFLOAD_DIR_REPLY].node,
- *flow_table->type->params);
+ nf_flow_offload_rhash_params);
+
+ e = container_of(flow, struct flow_offload_entry, flow);
+ clear_bit(IPS_OFFLOAD_BIT, &e->ct->status);
flow_offload_free(flow);
}
+void flow_offload_teardown(struct flow_offload *flow)
+{
+ struct flow_offload_entry *e;
+
+ flow->flags |= FLOW_OFFLOAD_TEARDOWN;
+
+ e = container_of(flow, struct flow_offload_entry, flow);
+ flow_offload_fixup_ct_state(e->ct);
+}
+EXPORT_SYMBOL_GPL(flow_offload_teardown);
+
struct flow_offload_tuple_rhash *
flow_offload_lookup(struct nf_flowtable *flow_table,
struct flow_offload_tuple *tuple)
{
- return rhashtable_lookup_fast(&flow_table->rhashtable, tuple,
- *flow_table->type->params);
+ struct flow_offload_tuple_rhash *tuplehash;
+ struct flow_offload *flow;
+ int dir;
+
+ tuplehash = rhashtable_lookup_fast(&flow_table->rhashtable, tuple,
+ nf_flow_offload_rhash_params);
+ if (!tuplehash)
+ return NULL;
+
+ dir = tuplehash->tuple.dir;
+ flow = container_of(tuplehash, struct flow_offload, tuplehash[dir]);
+ if (flow->flags & (FLOW_OFFLOAD_DYING | FLOW_OFFLOAD_TEARDOWN))
+ return NULL;
+
+ return tuplehash;
}
EXPORT_SYMBOL_GPL(flow_offload_lookup);
@@ -216,11 +293,6 @@
return (__s32)(flow->timeout - (u32)jiffies) <= 0;
}
-static inline bool nf_flow_is_dying(const struct flow_offload *flow)
-{
- return flow->flags & FLOW_OFFLOAD_DYING;
-}
-
static int nf_flow_offload_gc_step(struct nf_flowtable *flow_table)
{
struct flow_offload_tuple_rhash *tuplehash;
@@ -248,7 +320,8 @@
flow = container_of(tuplehash, struct flow_offload, tuplehash[0]);
if (nf_flow_has_expired(flow) ||
- nf_flow_is_dying(flow))
+ (flow->flags & (FLOW_OFFLOAD_DYING |
+ FLOW_OFFLOAD_TEARDOWN)))
flow_offload_del(flow_table, flow);
}
out:
@@ -258,7 +331,7 @@
return 1;
}
-void nf_flow_offload_work_gc(struct work_struct *work)
+static void nf_flow_offload_work_gc(struct work_struct *work)
{
struct nf_flowtable *flow_table;
@@ -266,42 +339,6 @@
nf_flow_offload_gc_step(flow_table);
queue_delayed_work(system_power_efficient_wq, &flow_table->gc_work, HZ);
}
-EXPORT_SYMBOL_GPL(nf_flow_offload_work_gc);
-
-static u32 flow_offload_hash(const void *data, u32 len, u32 seed)
-{
- const struct flow_offload_tuple *tuple = data;
-
- return jhash(tuple, offsetof(struct flow_offload_tuple, dir), seed);
-}
-
-static u32 flow_offload_hash_obj(const void *data, u32 len, u32 seed)
-{
- const struct flow_offload_tuple_rhash *tuplehash = data;
-
- return jhash(&tuplehash->tuple, offsetof(struct flow_offload_tuple, dir), seed);
-}
-
-static int flow_offload_hash_cmp(struct rhashtable_compare_arg *arg,
- const void *ptr)
-{
- const struct flow_offload_tuple *tuple = arg->key;
- const struct flow_offload_tuple_rhash *x = ptr;
-
- if (memcmp(&x->tuple, tuple, offsetof(struct flow_offload_tuple, dir)))
- return 1;
-
- return 0;
-}
-
-const struct rhashtable_params nf_flow_offload_rhash_params = {
- .head_offset = offsetof(struct flow_offload_tuple_rhash, node),
- .hashfn = flow_offload_hash,
- .obj_hashfn = flow_offload_hash_obj,
- .obj_cmpfn = flow_offload_hash_cmp,
- .automatic_shrinking = true,
-};
-EXPORT_SYMBOL_GPL(nf_flow_offload_rhash_params);
static int nf_flow_nat_port_tcp(struct sk_buff *skb, unsigned int thoff,
__be16 port, __be16 new_port)
@@ -419,33 +456,69 @@
}
EXPORT_SYMBOL_GPL(nf_flow_dnat_port);
+int nf_flow_table_init(struct nf_flowtable *flowtable)
+{
+ int err;
+
+ INIT_DEFERRABLE_WORK(&flowtable->gc_work, nf_flow_offload_work_gc);
+
+ err = rhashtable_init(&flowtable->rhashtable,
+ &nf_flow_offload_rhash_params);
+ if (err < 0)
+ return err;
+
+ queue_delayed_work(system_power_efficient_wq,
+ &flowtable->gc_work, HZ);
+
+ mutex_lock(&flowtable_lock);
+ list_add(&flowtable->list, &flowtables);
+ mutex_unlock(&flowtable_lock);
+
+ return 0;
+}
+EXPORT_SYMBOL_GPL(nf_flow_table_init);
+
static void nf_flow_table_do_cleanup(struct flow_offload *flow, void *data)
{
struct net_device *dev = data;
- if (dev && flow->tuplehash[0].tuple.iifidx != dev->ifindex)
+ if (!dev) {
+ flow_offload_teardown(flow);
return;
+ }
- flow_offload_dead(flow);
+ if (flow->tuplehash[0].tuple.iifidx == dev->ifindex ||
+ flow->tuplehash[1].tuple.iifidx == dev->ifindex)
+ flow_offload_dead(flow);
}
static void nf_flow_table_iterate_cleanup(struct nf_flowtable *flowtable,
- void *data)
+ struct net_device *dev)
{
- nf_flow_table_iterate(flowtable, nf_flow_table_do_cleanup, data);
+ nf_flow_table_iterate(flowtable, nf_flow_table_do_cleanup, dev);
flush_delayed_work(&flowtable->gc_work);
}
void nf_flow_table_cleanup(struct net *net, struct net_device *dev)
{
- nft_flow_table_iterate(net, nf_flow_table_iterate_cleanup, dev);
+ struct nf_flowtable *flowtable;
+
+ mutex_lock(&flowtable_lock);
+ list_for_each_entry(flowtable, &flowtables, list)
+ nf_flow_table_iterate_cleanup(flowtable, dev);
+ mutex_unlock(&flowtable_lock);
}
EXPORT_SYMBOL_GPL(nf_flow_table_cleanup);
void nf_flow_table_free(struct nf_flowtable *flow_table)
{
+ mutex_lock(&flowtable_lock);
+ list_del(&flow_table->list);
+ mutex_unlock(&flowtable_lock);
+ cancel_delayed_work_sync(&flow_table->gc_work);
nf_flow_table_iterate(flow_table, nf_flow_table_do_cleanup, NULL);
WARN_ON(!nf_flow_offload_gc_step(flow_table));
+ rhashtable_destroy(&flow_table->rhashtable);
}
EXPORT_SYMBOL_GPL(nf_flow_table_free);
diff --git a/net/netfilter/nf_flow_table_inet.c b/net/netfilter/nf_flow_table_inet.c
index 375a188..99771aa 100644
--- a/net/netfilter/nf_flow_table_inet.c
+++ b/net/netfilter/nf_flow_table_inet.c
@@ -22,8 +22,7 @@
static struct nf_flowtable_type flowtable_inet = {
.family = NFPROTO_INET,
- .params = &nf_flow_offload_rhash_params,
- .gc = nf_flow_offload_work_gc,
+ .init = nf_flow_table_init,
.free = nf_flow_table_free,
.hook = nf_flow_offload_inet_hook,
.owner = THIS_MODULE,
diff --git a/net/netfilter/nf_flow_table_ip.c b/net/netfilter/nf_flow_table_ip.c
new file mode 100644
index 0000000..82451b7
--- /dev/null
+++ b/net/netfilter/nf_flow_table_ip.c
@@ -0,0 +1,487 @@
+#include <linux/kernel.h>
+#include <linux/init.h>
+#include <linux/module.h>
+#include <linux/netfilter.h>
+#include <linux/rhashtable.h>
+#include <linux/ip.h>
+#include <linux/ipv6.h>
+#include <linux/netdevice.h>
+#include <net/ip.h>
+#include <net/ipv6.h>
+#include <net/ip6_route.h>
+#include <net/neighbour.h>
+#include <net/netfilter/nf_flow_table.h>
+/* For layer 4 checksum field offset. */
+#include <linux/tcp.h>
+#include <linux/udp.h>
+
+static int nf_flow_state_check(struct flow_offload *flow, int proto,
+ struct sk_buff *skb, unsigned int thoff)
+{
+ struct tcphdr *tcph;
+
+ if (proto != IPPROTO_TCP)
+ return 0;
+
+ if (!pskb_may_pull(skb, thoff + sizeof(*tcph)))
+ return -1;
+
+ tcph = (void *)(skb_network_header(skb) + thoff);
+ if (unlikely(tcph->fin || tcph->rst)) {
+ flow_offload_teardown(flow);
+ return -1;
+ }
+
+ return 0;
+}
+
+static int nf_flow_nat_ip_tcp(struct sk_buff *skb, unsigned int thoff,
+ __be32 addr, __be32 new_addr)
+{
+ struct tcphdr *tcph;
+
+ if (!pskb_may_pull(skb, thoff + sizeof(*tcph)) ||
+ skb_try_make_writable(skb, thoff + sizeof(*tcph)))
+ return -1;
+
+ tcph = (void *)(skb_network_header(skb) + thoff);
+ inet_proto_csum_replace4(&tcph->check, skb, addr, new_addr, true);
+
+ return 0;
+}
+
+static int nf_flow_nat_ip_udp(struct sk_buff *skb, unsigned int thoff,
+ __be32 addr, __be32 new_addr)
+{
+ struct udphdr *udph;
+
+ if (!pskb_may_pull(skb, thoff + sizeof(*udph)) ||
+ skb_try_make_writable(skb, thoff + sizeof(*udph)))
+ return -1;
+
+ udph = (void *)(skb_network_header(skb) + thoff);
+ if (udph->check || skb->ip_summed == CHECKSUM_PARTIAL) {
+ inet_proto_csum_replace4(&udph->check, skb, addr,
+ new_addr, true);
+ if (!udph->check)
+ udph->check = CSUM_MANGLED_0;
+ }
+
+ return 0;
+}
+
+static int nf_flow_nat_ip_l4proto(struct sk_buff *skb, struct iphdr *iph,
+ unsigned int thoff, __be32 addr,
+ __be32 new_addr)
+{
+ switch (iph->protocol) {
+ case IPPROTO_TCP:
+ if (nf_flow_nat_ip_tcp(skb, thoff, addr, new_addr) < 0)
+ return NF_DROP;
+ break;
+ case IPPROTO_UDP:
+ if (nf_flow_nat_ip_udp(skb, thoff, addr, new_addr) < 0)
+ return NF_DROP;
+ break;
+ }
+
+ return 0;
+}
+
+static int nf_flow_snat_ip(const struct flow_offload *flow, struct sk_buff *skb,
+ struct iphdr *iph, unsigned int thoff,
+ enum flow_offload_tuple_dir dir)
+{
+ __be32 addr, new_addr;
+
+ switch (dir) {
+ case FLOW_OFFLOAD_DIR_ORIGINAL:
+ addr = iph->saddr;
+ new_addr = flow->tuplehash[FLOW_OFFLOAD_DIR_REPLY].tuple.dst_v4.s_addr;
+ iph->saddr = new_addr;
+ break;
+ case FLOW_OFFLOAD_DIR_REPLY:
+ addr = iph->daddr;
+ new_addr = flow->tuplehash[FLOW_OFFLOAD_DIR_ORIGINAL].tuple.src_v4.s_addr;
+ iph->daddr = new_addr;
+ break;
+ default:
+ return -1;
+ }
+ csum_replace4(&iph->check, addr, new_addr);
+
+ return nf_flow_nat_ip_l4proto(skb, iph, thoff, addr, new_addr);
+}
+
+static int nf_flow_dnat_ip(const struct flow_offload *flow, struct sk_buff *skb,
+ struct iphdr *iph, unsigned int thoff,
+ enum flow_offload_tuple_dir dir)
+{
+ __be32 addr, new_addr;
+
+ switch (dir) {
+ case FLOW_OFFLOAD_DIR_ORIGINAL:
+ addr = iph->daddr;
+ new_addr = flow->tuplehash[FLOW_OFFLOAD_DIR_REPLY].tuple.src_v4.s_addr;
+ iph->daddr = new_addr;
+ break;
+ case FLOW_OFFLOAD_DIR_REPLY:
+ addr = iph->saddr;
+ new_addr = flow->tuplehash[FLOW_OFFLOAD_DIR_ORIGINAL].tuple.dst_v4.s_addr;
+ iph->saddr = new_addr;
+ break;
+ default:
+ return -1;
+ }
+ csum_replace4(&iph->check, addr, new_addr);
+
+ return nf_flow_nat_ip_l4proto(skb, iph, thoff, addr, new_addr);
+}
+
+static int nf_flow_nat_ip(const struct flow_offload *flow, struct sk_buff *skb,
+ unsigned int thoff, enum flow_offload_tuple_dir dir)
+{
+ struct iphdr *iph = ip_hdr(skb);
+
+ if (flow->flags & FLOW_OFFLOAD_SNAT &&
+ (nf_flow_snat_port(flow, skb, thoff, iph->protocol, dir) < 0 ||
+ nf_flow_snat_ip(flow, skb, iph, thoff, dir) < 0))
+ return -1;
+ if (flow->flags & FLOW_OFFLOAD_DNAT &&
+ (nf_flow_dnat_port(flow, skb, thoff, iph->protocol, dir) < 0 ||
+ nf_flow_dnat_ip(flow, skb, iph, thoff, dir) < 0))
+ return -1;
+
+ return 0;
+}
+
+static bool ip_has_options(unsigned int thoff)
+{
+ return thoff != sizeof(struct iphdr);
+}
+
+static int nf_flow_tuple_ip(struct sk_buff *skb, const struct net_device *dev,
+ struct flow_offload_tuple *tuple)
+{
+ struct flow_ports *ports;
+ unsigned int thoff;
+ struct iphdr *iph;
+
+ if (!pskb_may_pull(skb, sizeof(*iph)))
+ return -1;
+
+ iph = ip_hdr(skb);
+ thoff = iph->ihl * 4;
+
+ if (ip_is_fragment(iph) ||
+ unlikely(ip_has_options(thoff)))
+ return -1;
+
+ if (iph->protocol != IPPROTO_TCP &&
+ iph->protocol != IPPROTO_UDP)
+ return -1;
+
+ thoff = iph->ihl * 4;
+ if (!pskb_may_pull(skb, thoff + sizeof(*ports)))
+ return -1;
+
+ ports = (struct flow_ports *)(skb_network_header(skb) + thoff);
+
+ tuple->src_v4.s_addr = iph->saddr;
+ tuple->dst_v4.s_addr = iph->daddr;
+ tuple->src_port = ports->source;
+ tuple->dst_port = ports->dest;
+ tuple->l3proto = AF_INET;
+ tuple->l4proto = iph->protocol;
+ tuple->iifidx = dev->ifindex;
+
+ return 0;
+}
+
+/* Based on ip_exceeds_mtu(). */
+static bool nf_flow_exceeds_mtu(const struct sk_buff *skb, unsigned int mtu)
+{
+ if (skb->len <= mtu)
+ return false;
+
+ if (skb_is_gso(skb) && skb_gso_validate_network_len(skb, mtu))
+ return false;
+
+ return true;
+}
+
+unsigned int
+nf_flow_offload_ip_hook(void *priv, struct sk_buff *skb,
+ const struct nf_hook_state *state)
+{
+ struct flow_offload_tuple_rhash *tuplehash;
+ struct nf_flowtable *flow_table = priv;
+ struct flow_offload_tuple tuple = {};
+ enum flow_offload_tuple_dir dir;
+ struct flow_offload *flow;
+ struct net_device *outdev;
+ const struct rtable *rt;
+ unsigned int thoff;
+ struct iphdr *iph;
+ __be32 nexthop;
+
+ if (skb->protocol != htons(ETH_P_IP))
+ return NF_ACCEPT;
+
+ if (nf_flow_tuple_ip(skb, state->in, &tuple) < 0)
+ return NF_ACCEPT;
+
+ tuplehash = flow_offload_lookup(flow_table, &tuple);
+ if (tuplehash == NULL)
+ return NF_ACCEPT;
+
+ outdev = dev_get_by_index_rcu(state->net, tuplehash->tuple.oifidx);
+ if (!outdev)
+ return NF_ACCEPT;
+
+ dir = tuplehash->tuple.dir;
+ flow = container_of(tuplehash, struct flow_offload, tuplehash[dir]);
+ rt = (const struct rtable *)flow->tuplehash[dir].tuple.dst_cache;
+
+ if (unlikely(nf_flow_exceeds_mtu(skb, flow->tuplehash[dir].tuple.mtu)) &&
+ (ip_hdr(skb)->frag_off & htons(IP_DF)) != 0)
+ return NF_ACCEPT;
+
+ if (skb_try_make_writable(skb, sizeof(*iph)))
+ return NF_DROP;
+
+ thoff = ip_hdr(skb)->ihl * 4;
+ if (nf_flow_state_check(flow, ip_hdr(skb)->protocol, skb, thoff))
+ return NF_ACCEPT;
+
+ if (flow->flags & (FLOW_OFFLOAD_SNAT | FLOW_OFFLOAD_DNAT) &&
+ nf_flow_nat_ip(flow, skb, thoff, dir) < 0)
+ return NF_DROP;
+
+ flow->timeout = (u32)jiffies + NF_FLOW_TIMEOUT;
+ iph = ip_hdr(skb);
+ ip_decrease_ttl(iph);
+
+ skb->dev = outdev;
+ nexthop = rt_nexthop(rt, flow->tuplehash[!dir].tuple.src_v4.s_addr);
+ neigh_xmit(NEIGH_ARP_TABLE, outdev, &nexthop, skb);
+
+ return NF_STOLEN;
+}
+EXPORT_SYMBOL_GPL(nf_flow_offload_ip_hook);
+
+static int nf_flow_nat_ipv6_tcp(struct sk_buff *skb, unsigned int thoff,
+ struct in6_addr *addr,
+ struct in6_addr *new_addr)
+{
+ struct tcphdr *tcph;
+
+ if (!pskb_may_pull(skb, thoff + sizeof(*tcph)) ||
+ skb_try_make_writable(skb, thoff + sizeof(*tcph)))
+ return -1;
+
+ tcph = (void *)(skb_network_header(skb) + thoff);
+ inet_proto_csum_replace16(&tcph->check, skb, addr->s6_addr32,
+ new_addr->s6_addr32, true);
+
+ return 0;
+}
+
+static int nf_flow_nat_ipv6_udp(struct sk_buff *skb, unsigned int thoff,
+ struct in6_addr *addr,
+ struct in6_addr *new_addr)
+{
+ struct udphdr *udph;
+
+ if (!pskb_may_pull(skb, thoff + sizeof(*udph)) ||
+ skb_try_make_writable(skb, thoff + sizeof(*udph)))
+ return -1;
+
+ udph = (void *)(skb_network_header(skb) + thoff);
+ if (udph->check || skb->ip_summed == CHECKSUM_PARTIAL) {
+ inet_proto_csum_replace16(&udph->check, skb, addr->s6_addr32,
+ new_addr->s6_addr32, true);
+ if (!udph->check)
+ udph->check = CSUM_MANGLED_0;
+ }
+
+ return 0;
+}
+
+static int nf_flow_nat_ipv6_l4proto(struct sk_buff *skb, struct ipv6hdr *ip6h,
+ unsigned int thoff, struct in6_addr *addr,
+ struct in6_addr *new_addr)
+{
+ switch (ip6h->nexthdr) {
+ case IPPROTO_TCP:
+ if (nf_flow_nat_ipv6_tcp(skb, thoff, addr, new_addr) < 0)
+ return NF_DROP;
+ break;
+ case IPPROTO_UDP:
+ if (nf_flow_nat_ipv6_udp(skb, thoff, addr, new_addr) < 0)
+ return NF_DROP;
+ break;
+ }
+
+ return 0;
+}
+
+static int nf_flow_snat_ipv6(const struct flow_offload *flow,
+ struct sk_buff *skb, struct ipv6hdr *ip6h,
+ unsigned int thoff,
+ enum flow_offload_tuple_dir dir)
+{
+ struct in6_addr addr, new_addr;
+
+ switch (dir) {
+ case FLOW_OFFLOAD_DIR_ORIGINAL:
+ addr = ip6h->saddr;
+ new_addr = flow->tuplehash[FLOW_OFFLOAD_DIR_REPLY].tuple.dst_v6;
+ ip6h->saddr = new_addr;
+ break;
+ case FLOW_OFFLOAD_DIR_REPLY:
+ addr = ip6h->daddr;
+ new_addr = flow->tuplehash[FLOW_OFFLOAD_DIR_ORIGINAL].tuple.src_v6;
+ ip6h->daddr = new_addr;
+ break;
+ default:
+ return -1;
+ }
+
+ return nf_flow_nat_ipv6_l4proto(skb, ip6h, thoff, &addr, &new_addr);
+}
+
+static int nf_flow_dnat_ipv6(const struct flow_offload *flow,
+ struct sk_buff *skb, struct ipv6hdr *ip6h,
+ unsigned int thoff,
+ enum flow_offload_tuple_dir dir)
+{
+ struct in6_addr addr, new_addr;
+
+ switch (dir) {
+ case FLOW_OFFLOAD_DIR_ORIGINAL:
+ addr = ip6h->daddr;
+ new_addr = flow->tuplehash[FLOW_OFFLOAD_DIR_REPLY].tuple.src_v6;
+ ip6h->daddr = new_addr;
+ break;
+ case FLOW_OFFLOAD_DIR_REPLY:
+ addr = ip6h->saddr;
+ new_addr = flow->tuplehash[FLOW_OFFLOAD_DIR_ORIGINAL].tuple.dst_v6;
+ ip6h->saddr = new_addr;
+ break;
+ default:
+ return -1;
+ }
+
+ return nf_flow_nat_ipv6_l4proto(skb, ip6h, thoff, &addr, &new_addr);
+}
+
+static int nf_flow_nat_ipv6(const struct flow_offload *flow,
+ struct sk_buff *skb,
+ enum flow_offload_tuple_dir dir)
+{
+ struct ipv6hdr *ip6h = ipv6_hdr(skb);
+ unsigned int thoff = sizeof(*ip6h);
+
+ if (flow->flags & FLOW_OFFLOAD_SNAT &&
+ (nf_flow_snat_port(flow, skb, thoff, ip6h->nexthdr, dir) < 0 ||
+ nf_flow_snat_ipv6(flow, skb, ip6h, thoff, dir) < 0))
+ return -1;
+ if (flow->flags & FLOW_OFFLOAD_DNAT &&
+ (nf_flow_dnat_port(flow, skb, thoff, ip6h->nexthdr, dir) < 0 ||
+ nf_flow_dnat_ipv6(flow, skb, ip6h, thoff, dir) < 0))
+ return -1;
+
+ return 0;
+}
+
+static int nf_flow_tuple_ipv6(struct sk_buff *skb, const struct net_device *dev,
+ struct flow_offload_tuple *tuple)
+{
+ struct flow_ports *ports;
+ struct ipv6hdr *ip6h;
+ unsigned int thoff;
+
+ if (!pskb_may_pull(skb, sizeof(*ip6h)))
+ return -1;
+
+ ip6h = ipv6_hdr(skb);
+
+ if (ip6h->nexthdr != IPPROTO_TCP &&
+ ip6h->nexthdr != IPPROTO_UDP)
+ return -1;
+
+ thoff = sizeof(*ip6h);
+ if (!pskb_may_pull(skb, thoff + sizeof(*ports)))
+ return -1;
+
+ ports = (struct flow_ports *)(skb_network_header(skb) + thoff);
+
+ tuple->src_v6 = ip6h->saddr;
+ tuple->dst_v6 = ip6h->daddr;
+ tuple->src_port = ports->source;
+ tuple->dst_port = ports->dest;
+ tuple->l3proto = AF_INET6;
+ tuple->l4proto = ip6h->nexthdr;
+ tuple->iifidx = dev->ifindex;
+
+ return 0;
+}
+
+unsigned int
+nf_flow_offload_ipv6_hook(void *priv, struct sk_buff *skb,
+ const struct nf_hook_state *state)
+{
+ struct flow_offload_tuple_rhash *tuplehash;
+ struct nf_flowtable *flow_table = priv;
+ struct flow_offload_tuple tuple = {};
+ enum flow_offload_tuple_dir dir;
+ struct flow_offload *flow;
+ struct net_device *outdev;
+ struct in6_addr *nexthop;
+ struct ipv6hdr *ip6h;
+ struct rt6_info *rt;
+
+ if (skb->protocol != htons(ETH_P_IPV6))
+ return NF_ACCEPT;
+
+ if (nf_flow_tuple_ipv6(skb, state->in, &tuple) < 0)
+ return NF_ACCEPT;
+
+ tuplehash = flow_offload_lookup(flow_table, &tuple);
+ if (tuplehash == NULL)
+ return NF_ACCEPT;
+
+ outdev = dev_get_by_index_rcu(state->net, tuplehash->tuple.oifidx);
+ if (!outdev)
+ return NF_ACCEPT;
+
+ dir = tuplehash->tuple.dir;
+ flow = container_of(tuplehash, struct flow_offload, tuplehash[dir]);
+ rt = (struct rt6_info *)flow->tuplehash[dir].tuple.dst_cache;
+
+ if (unlikely(nf_flow_exceeds_mtu(skb, flow->tuplehash[dir].tuple.mtu)))
+ return NF_ACCEPT;
+
+ if (nf_flow_state_check(flow, ipv6_hdr(skb)->nexthdr, skb,
+ sizeof(*ip6h)))
+ return NF_ACCEPT;
+
+ if (skb_try_make_writable(skb, sizeof(*ip6h)))
+ return NF_DROP;
+
+ if (flow->flags & (FLOW_OFFLOAD_SNAT | FLOW_OFFLOAD_DNAT) &&
+ nf_flow_nat_ipv6(flow, skb, dir) < 0)
+ return NF_DROP;
+
+ flow->timeout = (u32)jiffies + NF_FLOW_TIMEOUT;
+ ip6h = ipv6_hdr(skb);
+ ip6h->hop_limit--;
+
+ skb->dev = outdev;
+ nexthop = rt6_nexthop(rt, &flow->tuplehash[!dir].tuple.src_v6);
+ neigh_xmit(NEIGH_ND_TABLE, outdev, nexthop, skb);
+
+ return NF_STOLEN;
+}
+EXPORT_SYMBOL_GPL(nf_flow_offload_ipv6_hook);
diff --git a/net/netfilter/nf_internals.h b/net/netfilter/nf_internals.h
index 18f6d7a..e15779fd 100644
--- a/net/netfilter/nf_internals.h
+++ b/net/netfilter/nf_internals.h
@@ -15,4 +15,9 @@
/* nf_log.c */
int __init netfilter_log_init(void);
+/* core.c */
+void nf_hook_entries_delete_raw(struct nf_hook_entries __rcu **pp,
+ const struct nf_hook_ops *reg);
+int nf_hook_entries_insert_raw(struct nf_hook_entries __rcu **pp,
+ const struct nf_hook_ops *reg);
#endif
diff --git a/net/netfilter/nf_nat_core.c b/net/netfilter/nf_nat_core.c
index 617693f..821f8d8 100644
--- a/net/netfilter/nf_nat_core.c
+++ b/net/netfilter/nf_nat_core.c
@@ -32,6 +32,8 @@
#include <net/netfilter/nf_conntrack_zones.h>
#include <linux/netfilter/nf_nat.h>
+#include "nf_internals.h"
+
static spinlock_t nf_nat_locks[CONNTRACK_LOCKS];
static DEFINE_MUTEX(nf_nat_proto_mutex);
@@ -39,11 +41,27 @@
__read_mostly;
static const struct nf_nat_l4proto __rcu **nf_nat_l4protos[NFPROTO_NUMPROTO]
__read_mostly;
+static unsigned int nat_net_id __read_mostly;
static struct hlist_head *nf_nat_bysource __read_mostly;
static unsigned int nf_nat_htable_size __read_mostly;
static unsigned int nf_nat_hash_rnd __read_mostly;
+struct nf_nat_lookup_hook_priv {
+ struct nf_hook_entries __rcu *entries;
+
+ struct rcu_head rcu_head;
+};
+
+struct nf_nat_hooks_net {
+ struct nf_hook_ops *nat_hook_ops;
+ unsigned int users;
+};
+
+struct nat_net {
+ struct nf_nat_hooks_net nat_proto_net[NFPROTO_NUMPROTO];
+};
+
inline const struct nf_nat_l3proto *
__nf_nat_l3proto_find(u8 family)
{
@@ -157,7 +175,7 @@
static int in_range(const struct nf_nat_l3proto *l3proto,
const struct nf_nat_l4proto *l4proto,
const struct nf_conntrack_tuple *tuple,
- const struct nf_nat_range *range)
+ const struct nf_nat_range2 *range)
{
/* If we are supposed to map IPs, then we must be in the
* range specified, otherwise let this drag us onto a new src IP.
@@ -194,7 +212,7 @@
const struct nf_nat_l4proto *l4proto,
const struct nf_conntrack_tuple *tuple,
struct nf_conntrack_tuple *result,
- const struct nf_nat_range *range)
+ const struct nf_nat_range2 *range)
{
unsigned int h = hash_by_src(net, tuple);
const struct nf_conn *ct;
@@ -224,7 +242,7 @@
static void
find_best_ips_proto(const struct nf_conntrack_zone *zone,
struct nf_conntrack_tuple *tuple,
- const struct nf_nat_range *range,
+ const struct nf_nat_range2 *range,
const struct nf_conn *ct,
enum nf_nat_manip_type maniptype)
{
@@ -298,7 +316,7 @@
static void
get_unique_tuple(struct nf_conntrack_tuple *tuple,
const struct nf_conntrack_tuple *orig_tuple,
- const struct nf_nat_range *range,
+ const struct nf_nat_range2 *range,
struct nf_conn *ct,
enum nf_nat_manip_type maniptype)
{
@@ -349,9 +367,10 @@
/* Only bother mapping if it's not already in range and unique */
if (!(range->flags & NF_NAT_RANGE_PROTO_RANDOM_ALL)) {
if (range->flags & NF_NAT_RANGE_PROTO_SPECIFIED) {
- if (l4proto->in_range(tuple, maniptype,
- &range->min_proto,
- &range->max_proto) &&
+ if (!(range->flags & NF_NAT_RANGE_PROTO_OFFSET) &&
+ l4proto->in_range(tuple, maniptype,
+ &range->min_proto,
+ &range->max_proto) &&
(range->min_proto.all == range->max_proto.all ||
!nf_nat_used_tuple(tuple, ct)))
goto out;
@@ -360,7 +379,7 @@
}
}
- /* Last change: get protocol to try to obtain unique tuple. */
+ /* Last chance: get protocol to try to obtain unique tuple. */
l4proto->unique_tuple(l3proto, tuple, range, maniptype, ct);
out:
rcu_read_unlock();
@@ -381,7 +400,7 @@
unsigned int
nf_nat_setup_info(struct nf_conn *ct,
- const struct nf_nat_range *range,
+ const struct nf_nat_range2 *range,
enum nf_nat_manip_type maniptype)
{
struct net *net = nf_ct_net(ct);
@@ -459,7 +478,7 @@
(manip == NF_NAT_MANIP_SRC ?
ct->tuplehash[IP_CT_DIR_REPLY].tuple.dst.u3 :
ct->tuplehash[IP_CT_DIR_REPLY].tuple.src.u3);
- struct nf_nat_range range = {
+ struct nf_nat_range2 range = {
.flags = NF_NAT_RANGE_MAP_IPS,
.min_addr = ip,
.max_addr = ip,
@@ -474,17 +493,36 @@
}
EXPORT_SYMBOL_GPL(nf_nat_alloc_null_binding);
+static unsigned int nf_nat_manip_pkt(struct sk_buff *skb, struct nf_conn *ct,
+ enum nf_nat_manip_type mtype,
+ enum ip_conntrack_dir dir)
+{
+ const struct nf_nat_l3proto *l3proto;
+ const struct nf_nat_l4proto *l4proto;
+ struct nf_conntrack_tuple target;
+
+ /* We are aiming to look like inverse of other direction. */
+ nf_ct_invert_tuplepr(&target, &ct->tuplehash[!dir].tuple);
+
+ l3proto = __nf_nat_l3proto_find(target.src.l3num);
+ l4proto = __nf_nat_l4proto_find(target.src.l3num,
+ target.dst.protonum);
+ if (!l3proto->manip_pkt(skb, 0, l4proto, &target, mtype))
+ return NF_DROP;
+
+ return NF_ACCEPT;
+}
+
/* Do packet manipulations according to nf_nat_setup_info. */
unsigned int nf_nat_packet(struct nf_conn *ct,
enum ip_conntrack_info ctinfo,
unsigned int hooknum,
struct sk_buff *skb)
{
- const struct nf_nat_l3proto *l3proto;
- const struct nf_nat_l4proto *l4proto;
- enum ip_conntrack_dir dir = CTINFO2DIR(ctinfo);
- unsigned long statusbit;
enum nf_nat_manip_type mtype = HOOK2MANIP(hooknum);
+ enum ip_conntrack_dir dir = CTINFO2DIR(ctinfo);
+ unsigned int verdict = NF_ACCEPT;
+ unsigned long statusbit;
if (mtype == NF_NAT_MANIP_SRC)
statusbit = IPS_SRC_NAT;
@@ -496,22 +534,88 @@
statusbit ^= IPS_NAT_MASK;
/* Non-atomic: these bits don't change. */
- if (ct->status & statusbit) {
- struct nf_conntrack_tuple target;
+ if (ct->status & statusbit)
+ verdict = nf_nat_manip_pkt(skb, ct, mtype, dir);
- /* We are aiming to look like inverse of other direction. */
- nf_ct_invert_tuplepr(&target, &ct->tuplehash[!dir].tuple);
-
- l3proto = __nf_nat_l3proto_find(target.src.l3num);
- l4proto = __nf_nat_l4proto_find(target.src.l3num,
- target.dst.protonum);
- if (!l3proto->manip_pkt(skb, 0, l4proto, &target, mtype))
- return NF_DROP;
- }
- return NF_ACCEPT;
+ return verdict;
}
EXPORT_SYMBOL_GPL(nf_nat_packet);
+unsigned int
+nf_nat_inet_fn(void *priv, struct sk_buff *skb,
+ const struct nf_hook_state *state)
+{
+ struct nf_conn *ct;
+ enum ip_conntrack_info ctinfo;
+ struct nf_conn_nat *nat;
+ /* maniptype == SRC for postrouting. */
+ enum nf_nat_manip_type maniptype = HOOK2MANIP(state->hook);
+
+ ct = nf_ct_get(skb, &ctinfo);
+ /* Can't track? It's not due to stress, or conntrack would
+ * have dropped it. Hence it's the user's responsibilty to
+ * packet filter it out, or implement conntrack/NAT for that
+ * protocol. 8) --RR
+ */
+ if (!ct)
+ return NF_ACCEPT;
+
+ nat = nfct_nat(ct);
+
+ switch (ctinfo) {
+ case IP_CT_RELATED:
+ case IP_CT_RELATED_REPLY:
+ /* Only ICMPs can be IP_CT_IS_REPLY. Fallthrough */
+ case IP_CT_NEW:
+ /* Seen it before? This can happen for loopback, retrans,
+ * or local packets.
+ */
+ if (!nf_nat_initialized(ct, maniptype)) {
+ struct nf_nat_lookup_hook_priv *lpriv = priv;
+ struct nf_hook_entries *e = rcu_dereference(lpriv->entries);
+ unsigned int ret;
+ int i;
+
+ if (!e)
+ goto null_bind;
+
+ for (i = 0; i < e->num_hook_entries; i++) {
+ ret = e->hooks[i].hook(e->hooks[i].priv, skb,
+ state);
+ if (ret != NF_ACCEPT)
+ return ret;
+ if (nf_nat_initialized(ct, maniptype))
+ goto do_nat;
+ }
+null_bind:
+ ret = nf_nat_alloc_null_binding(ct, state->hook);
+ if (ret != NF_ACCEPT)
+ return ret;
+ } else {
+ pr_debug("Already setup manip %s for ct %p (status bits 0x%lx)\n",
+ maniptype == NF_NAT_MANIP_SRC ? "SRC" : "DST",
+ ct, ct->status);
+ if (nf_nat_oif_changed(state->hook, ctinfo, nat,
+ state->out))
+ goto oif_changed;
+ }
+ break;
+ default:
+ /* ESTABLISHED */
+ WARN_ON(ctinfo != IP_CT_ESTABLISHED &&
+ ctinfo != IP_CT_ESTABLISHED_REPLY);
+ if (nf_nat_oif_changed(state->hook, ctinfo, nat, state->out))
+ goto oif_changed;
+ }
+do_nat:
+ return nf_nat_packet(ct, ctinfo, state->hook, skb);
+
+oif_changed:
+ nf_ct_kill_acct(ct, ctinfo, skb);
+ return NF_DROP;
+}
+EXPORT_SYMBOL_GPL(nf_nat_inet_fn);
+
struct nf_nat_proto_clean {
u8 l3proto;
u8 l4proto;
@@ -702,7 +806,7 @@
static int nfnetlink_parse_nat_proto(struct nlattr *attr,
const struct nf_conn *ct,
- struct nf_nat_range *range)
+ struct nf_nat_range2 *range)
{
struct nlattr *tb[CTA_PROTONAT_MAX+1];
const struct nf_nat_l4proto *l4proto;
@@ -730,7 +834,7 @@
static int
nfnetlink_parse_nat(const struct nlattr *nat,
- const struct nf_conn *ct, struct nf_nat_range *range,
+ const struct nf_conn *ct, struct nf_nat_range2 *range,
const struct nf_nat_l3proto *l3proto)
{
struct nlattr *tb[CTA_NAT_MAX+1];
@@ -758,7 +862,7 @@
enum nf_nat_manip_type manip,
const struct nlattr *attr)
{
- struct nf_nat_range range;
+ struct nf_nat_range2 range;
const struct nf_nat_l3proto *l3proto;
int err;
@@ -800,6 +904,146 @@
.expectfn = nf_nat_follow_master,
};
+int nf_nat_register_fn(struct net *net, const struct nf_hook_ops *ops,
+ const struct nf_hook_ops *orig_nat_ops, unsigned int ops_count)
+{
+ struct nat_net *nat_net = net_generic(net, nat_net_id);
+ struct nf_nat_hooks_net *nat_proto_net;
+ struct nf_nat_lookup_hook_priv *priv;
+ unsigned int hooknum = ops->hooknum;
+ struct nf_hook_ops *nat_ops;
+ int i, ret;
+
+ if (WARN_ON_ONCE(ops->pf >= ARRAY_SIZE(nat_net->nat_proto_net)))
+ return -EINVAL;
+
+ nat_proto_net = &nat_net->nat_proto_net[ops->pf];
+
+ for (i = 0; i < ops_count; i++) {
+ if (WARN_ON(orig_nat_ops[i].pf != ops->pf))
+ return -EINVAL;
+ if (orig_nat_ops[i].hooknum == hooknum) {
+ hooknum = i;
+ break;
+ }
+ }
+
+ if (WARN_ON_ONCE(i == ops_count))
+ return -EINVAL;
+
+ mutex_lock(&nf_nat_proto_mutex);
+ if (!nat_proto_net->nat_hook_ops) {
+ WARN_ON(nat_proto_net->users != 0);
+
+ nat_ops = kmemdup(orig_nat_ops, sizeof(*orig_nat_ops) * ops_count, GFP_KERNEL);
+ if (!nat_ops) {
+ mutex_unlock(&nf_nat_proto_mutex);
+ return -ENOMEM;
+ }
+
+ for (i = 0; i < ops_count; i++) {
+ priv = kzalloc(sizeof(*priv), GFP_KERNEL);
+ if (priv) {
+ nat_ops[i].priv = priv;
+ continue;
+ }
+ mutex_unlock(&nf_nat_proto_mutex);
+ while (i)
+ kfree(nat_ops[--i].priv);
+ kfree(nat_ops);
+ return -ENOMEM;
+ }
+
+ ret = nf_register_net_hooks(net, nat_ops, ops_count);
+ if (ret < 0) {
+ mutex_unlock(&nf_nat_proto_mutex);
+ for (i = 0; i < ops_count; i++)
+ kfree(nat_ops[i].priv);
+ kfree(nat_ops);
+ return ret;
+ }
+
+ nat_proto_net->nat_hook_ops = nat_ops;
+ }
+
+ nat_ops = nat_proto_net->nat_hook_ops;
+ priv = nat_ops[hooknum].priv;
+ if (WARN_ON_ONCE(!priv)) {
+ mutex_unlock(&nf_nat_proto_mutex);
+ return -EOPNOTSUPP;
+ }
+
+ ret = nf_hook_entries_insert_raw(&priv->entries, ops);
+ if (ret == 0)
+ nat_proto_net->users++;
+
+ mutex_unlock(&nf_nat_proto_mutex);
+ return ret;
+}
+EXPORT_SYMBOL_GPL(nf_nat_register_fn);
+
+void nf_nat_unregister_fn(struct net *net, const struct nf_hook_ops *ops,
+ unsigned int ops_count)
+{
+ struct nat_net *nat_net = net_generic(net, nat_net_id);
+ struct nf_nat_hooks_net *nat_proto_net;
+ struct nf_nat_lookup_hook_priv *priv;
+ struct nf_hook_ops *nat_ops;
+ int hooknum = ops->hooknum;
+ int i;
+
+ if (ops->pf >= ARRAY_SIZE(nat_net->nat_proto_net))
+ return;
+
+ nat_proto_net = &nat_net->nat_proto_net[ops->pf];
+
+ mutex_lock(&nf_nat_proto_mutex);
+ if (WARN_ON(nat_proto_net->users == 0))
+ goto unlock;
+
+ nat_proto_net->users--;
+
+ nat_ops = nat_proto_net->nat_hook_ops;
+ for (i = 0; i < ops_count; i++) {
+ if (nat_ops[i].hooknum == hooknum) {
+ hooknum = i;
+ break;
+ }
+ }
+ if (WARN_ON_ONCE(i == ops_count))
+ goto unlock;
+ priv = nat_ops[hooknum].priv;
+ nf_hook_entries_delete_raw(&priv->entries, ops);
+
+ if (nat_proto_net->users == 0) {
+ nf_unregister_net_hooks(net, nat_ops, ops_count);
+
+ for (i = 0; i < ops_count; i++) {
+ priv = nat_ops[i].priv;
+ kfree_rcu(priv, rcu_head);
+ }
+
+ nat_proto_net->nat_hook_ops = NULL;
+ kfree(nat_ops);
+ }
+unlock:
+ mutex_unlock(&nf_nat_proto_mutex);
+}
+EXPORT_SYMBOL_GPL(nf_nat_unregister_fn);
+
+static struct pernet_operations nat_net_ops = {
+ .id = &nat_net_id,
+ .size = sizeof(struct nat_net),
+};
+
+struct nf_nat_hook nat_hook = {
+ .parse_nat_setup = nfnetlink_parse_nat_setup,
+#ifdef CONFIG_XFRM
+ .decode_session = __nf_nat_decode_session,
+#endif
+ .manip_pkt = nf_nat_manip_pkt,
+};
+
static int __init nf_nat_init(void)
{
int ret, i;
@@ -823,15 +1067,17 @@
for (i = 0; i < CONNTRACK_LOCKS; i++)
spin_lock_init(&nf_nat_locks[i]);
+ ret = register_pernet_subsys(&nat_net_ops);
+ if (ret < 0) {
+ nf_ct_extend_unregister(&nat_extend);
+ return ret;
+ }
+
nf_ct_helper_expectfn_register(&follow_master_nat);
- BUG_ON(nfnetlink_parse_nat_setup_hook != NULL);
- RCU_INIT_POINTER(nfnetlink_parse_nat_setup_hook,
- nfnetlink_parse_nat_setup);
-#ifdef CONFIG_XFRM
- BUG_ON(nf_nat_decode_session_hook != NULL);
- RCU_INIT_POINTER(nf_nat_decode_session_hook, __nf_nat_decode_session);
-#endif
+ WARN_ON(nf_nat_hook != NULL);
+ RCU_INIT_POINTER(nf_nat_hook, &nat_hook);
+
return 0;
}
@@ -844,16 +1090,15 @@
nf_ct_extend_unregister(&nat_extend);
nf_ct_helper_expectfn_unregister(&follow_master_nat);
- RCU_INIT_POINTER(nfnetlink_parse_nat_setup_hook, NULL);
-#ifdef CONFIG_XFRM
- RCU_INIT_POINTER(nf_nat_decode_session_hook, NULL);
-#endif
+ RCU_INIT_POINTER(nf_nat_hook, NULL);
+
synchronize_rcu();
for (i = 0; i < NFPROTO_NUMPROTO; i++)
kfree(nf_nat_l4protos[i]);
synchronize_net();
nf_ct_free_hashtable(nf_nat_bysource, nf_nat_htable_size);
+ unregister_pernet_subsys(&nat_net_ops);
}
MODULE_LICENSE("GPL");
diff --git a/net/netfilter/nf_nat_helper.c b/net/netfilter/nf_nat_helper.c
index 607a373..99606ba 100644
--- a/net/netfilter/nf_nat_helper.c
+++ b/net/netfilter/nf_nat_helper.c
@@ -191,7 +191,7 @@
void nf_nat_follow_master(struct nf_conn *ct,
struct nf_conntrack_expect *exp)
{
- struct nf_nat_range range;
+ struct nf_nat_range2 range;
/* This must be a fresh one. */
BUG_ON(ct->status & IPS_NAT_DONE_MASK);
diff --git a/net/netfilter/nf_nat_proto_common.c b/net/netfilter/nf_nat_proto_common.c
index 7d7466d..5d849d8 100644
--- a/net/netfilter/nf_nat_proto_common.c
+++ b/net/netfilter/nf_nat_proto_common.c
@@ -36,7 +36,7 @@
void nf_nat_l4proto_unique_tuple(const struct nf_nat_l3proto *l3proto,
struct nf_conntrack_tuple *tuple,
- const struct nf_nat_range *range,
+ const struct nf_nat_range2 *range,
enum nf_nat_manip_type maniptype,
const struct nf_conn *ct,
u16 *rover)
@@ -83,6 +83,8 @@
: tuple->src.u.all);
} else if (range->flags & NF_NAT_RANGE_PROTO_RANDOM_FULLY) {
off = prandom_u32();
+ } else if (range->flags & NF_NAT_RANGE_PROTO_OFFSET) {
+ off = (ntohs(*portptr) - ntohs(range->base_proto.all));
} else {
off = *rover;
}
@@ -91,7 +93,8 @@
*portptr = htons(min + off % range_size);
if (++i != range_size && nf_nat_used_tuple(tuple, ct))
continue;
- if (!(range->flags & NF_NAT_RANGE_PROTO_RANDOM_ALL))
+ if (!(range->flags & (NF_NAT_RANGE_PROTO_RANDOM_ALL|
+ NF_NAT_RANGE_PROTO_OFFSET)))
*rover = off;
return;
}
@@ -100,7 +103,7 @@
#if IS_ENABLED(CONFIG_NF_CT_NETLINK)
int nf_nat_l4proto_nlattr_to_range(struct nlattr *tb[],
- struct nf_nat_range *range)
+ struct nf_nat_range2 *range)
{
if (tb[CTA_PROTONAT_PORT_MIN]) {
range->min_proto.all = nla_get_be16(tb[CTA_PROTONAT_PORT_MIN]);
diff --git a/net/netfilter/nf_nat_proto_dccp.c b/net/netfilter/nf_nat_proto_dccp.c
index 269fcd5..67ea0d8 100644
--- a/net/netfilter/nf_nat_proto_dccp.c
+++ b/net/netfilter/nf_nat_proto_dccp.c
@@ -23,7 +23,7 @@
static void
dccp_unique_tuple(const struct nf_nat_l3proto *l3proto,
struct nf_conntrack_tuple *tuple,
- const struct nf_nat_range *range,
+ const struct nf_nat_range2 *range,
enum nf_nat_manip_type maniptype,
const struct nf_conn *ct)
{
diff --git a/net/netfilter/nf_nat_proto_sctp.c b/net/netfilter/nf_nat_proto_sctp.c
index c57ee32..1c5d9b6 100644
--- a/net/netfilter/nf_nat_proto_sctp.c
+++ b/net/netfilter/nf_nat_proto_sctp.c
@@ -17,7 +17,7 @@
static void
sctp_unique_tuple(const struct nf_nat_l3proto *l3proto,
struct nf_conntrack_tuple *tuple,
- const struct nf_nat_range *range,
+ const struct nf_nat_range2 *range,
enum nf_nat_manip_type maniptype,
const struct nf_conn *ct)
{
diff --git a/net/netfilter/nf_nat_proto_tcp.c b/net/netfilter/nf_nat_proto_tcp.c
index 4f8820f..f15fcd4 100644
--- a/net/netfilter/nf_nat_proto_tcp.c
+++ b/net/netfilter/nf_nat_proto_tcp.c
@@ -23,7 +23,7 @@
static void
tcp_unique_tuple(const struct nf_nat_l3proto *l3proto,
struct nf_conntrack_tuple *tuple,
- const struct nf_nat_range *range,
+ const struct nf_nat_range2 *range,
enum nf_nat_manip_type maniptype,
const struct nf_conn *ct)
{
diff --git a/net/netfilter/nf_nat_proto_udp.c b/net/netfilter/nf_nat_proto_udp.c
index edd4a77..5790f70 100644
--- a/net/netfilter/nf_nat_proto_udp.c
+++ b/net/netfilter/nf_nat_proto_udp.c
@@ -22,7 +22,7 @@
static void
udp_unique_tuple(const struct nf_nat_l3proto *l3proto,
struct nf_conntrack_tuple *tuple,
- const struct nf_nat_range *range,
+ const struct nf_nat_range2 *range,
enum nf_nat_manip_type maniptype,
const struct nf_conn *ct)
{
@@ -100,7 +100,7 @@
static void
udplite_unique_tuple(const struct nf_nat_l3proto *l3proto,
struct nf_conntrack_tuple *tuple,
- const struct nf_nat_range *range,
+ const struct nf_nat_range2 *range,
enum nf_nat_manip_type maniptype,
const struct nf_conn *ct)
{
diff --git a/net/netfilter/nf_nat_proto_unknown.c b/net/netfilter/nf_nat_proto_unknown.c
index 6e494d5..c5db3e2 100644
--- a/net/netfilter/nf_nat_proto_unknown.c
+++ b/net/netfilter/nf_nat_proto_unknown.c
@@ -27,7 +27,7 @@
static void unknown_unique_tuple(const struct nf_nat_l3proto *l3proto,
struct nf_conntrack_tuple *tuple,
- const struct nf_nat_range *range,
+ const struct nf_nat_range2 *range,
enum nf_nat_manip_type maniptype,
const struct nf_conn *ct)
{
diff --git a/net/netfilter/nf_nat_redirect.c b/net/netfilter/nf_nat_redirect.c
index 25b06b9..7c4bb0a 100644
--- a/net/netfilter/nf_nat_redirect.c
+++ b/net/netfilter/nf_nat_redirect.c
@@ -36,7 +36,7 @@
struct nf_conn *ct;
enum ip_conntrack_info ctinfo;
__be32 newdst;
- struct nf_nat_range newrange;
+ struct nf_nat_range2 newrange;
WARN_ON(hooknum != NF_INET_PRE_ROUTING &&
hooknum != NF_INET_LOCAL_OUT);
@@ -82,10 +82,10 @@
static const struct in6_addr loopback_addr = IN6ADDR_LOOPBACK_INIT;
unsigned int
-nf_nat_redirect_ipv6(struct sk_buff *skb, const struct nf_nat_range *range,
+nf_nat_redirect_ipv6(struct sk_buff *skb, const struct nf_nat_range2 *range,
unsigned int hooknum)
{
- struct nf_nat_range newrange;
+ struct nf_nat_range2 newrange;
struct in6_addr newdst;
enum ip_conntrack_info ctinfo;
struct nf_conn *ct;
diff --git a/net/netfilter/nf_nat_sip.c b/net/netfilter/nf_nat_sip.c
index 791fac4..1f30860 100644
--- a/net/netfilter/nf_nat_sip.c
+++ b/net/netfilter/nf_nat_sip.c
@@ -316,7 +316,7 @@
static void nf_nat_sip_expected(struct nf_conn *ct,
struct nf_conntrack_expect *exp)
{
- struct nf_nat_range range;
+ struct nf_nat_range2 range;
/* This must be a fresh one. */
BUG_ON(ct->status & IPS_NAT_DONE_MASK);
diff --git a/net/netfilter/nf_osf.c b/net/netfilter/nf_osf.c
new file mode 100644
index 0000000..5ba5c7b
--- /dev/null
+++ b/net/netfilter/nf_osf.c
@@ -0,0 +1,218 @@
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+#include <linux/module.h>
+#include <linux/kernel.h>
+
+#include <linux/capability.h>
+#include <linux/if.h>
+#include <linux/inetdevice.h>
+#include <linux/ip.h>
+#include <linux/list.h>
+#include <linux/rculist.h>
+#include <linux/skbuff.h>
+#include <linux/slab.h>
+#include <linux/tcp.h>
+
+#include <net/ip.h>
+#include <net/tcp.h>
+
+#include <linux/netfilter/nfnetlink.h>
+#include <linux/netfilter/x_tables.h>
+#include <net/netfilter/nf_log.h>
+#include <linux/netfilter/nf_osf.h>
+
+static inline int nf_osf_ttl(const struct sk_buff *skb,
+ const struct nf_osf_info *info,
+ unsigned char f_ttl)
+{
+ const struct iphdr *ip = ip_hdr(skb);
+
+ if (info->flags & NF_OSF_TTL) {
+ if (info->ttl == NF_OSF_TTL_TRUE)
+ return ip->ttl == f_ttl;
+ if (info->ttl == NF_OSF_TTL_NOCHECK)
+ return 1;
+ else if (ip->ttl <= f_ttl)
+ return 1;
+ else {
+ struct in_device *in_dev = __in_dev_get_rcu(skb->dev);
+ int ret = 0;
+
+ for_ifa(in_dev) {
+ if (inet_ifa_match(ip->saddr, ifa)) {
+ ret = (ip->ttl == f_ttl);
+ break;
+ }
+ }
+ endfor_ifa(in_dev);
+
+ return ret;
+ }
+ }
+
+ return ip->ttl == f_ttl;
+}
+
+bool
+nf_osf_match(const struct sk_buff *skb, u_int8_t family,
+ int hooknum, struct net_device *in, struct net_device *out,
+ const struct nf_osf_info *info, struct net *net,
+ const struct list_head *nf_osf_fingers)
+{
+ const unsigned char *optp = NULL, *_optp = NULL;
+ unsigned int optsize = 0, check_WSS = 0;
+ int fmatch = FMATCH_WRONG, fcount = 0;
+ const struct iphdr *ip = ip_hdr(skb);
+ const struct nf_osf_user_finger *f;
+ unsigned char opts[MAX_IPOPTLEN];
+ const struct nf_osf_finger *kf;
+ u16 window, totlen, mss = 0;
+ const struct tcphdr *tcp;
+ struct tcphdr _tcph;
+ bool df;
+
+ tcp = skb_header_pointer(skb, ip_hdrlen(skb), sizeof(struct tcphdr), &_tcph);
+ if (!tcp)
+ return false;
+
+ if (!tcp->syn)
+ return false;
+
+ totlen = ntohs(ip->tot_len);
+ df = ntohs(ip->frag_off) & IP_DF;
+ window = ntohs(tcp->window);
+
+ if (tcp->doff * 4 > sizeof(struct tcphdr)) {
+ optsize = tcp->doff * 4 - sizeof(struct tcphdr);
+
+ _optp = optp = skb_header_pointer(skb, ip_hdrlen(skb) +
+ sizeof(struct tcphdr), optsize, opts);
+ }
+
+ list_for_each_entry_rcu(kf, &nf_osf_fingers[df], finger_entry) {
+ int foptsize, optnum;
+
+ f = &kf->finger;
+
+ if (!(info->flags & NF_OSF_LOG) && strcmp(info->genre, f->genre))
+ continue;
+
+ optp = _optp;
+ fmatch = FMATCH_WRONG;
+
+ if (totlen != f->ss || !nf_osf_ttl(skb, info, f->ttl))
+ continue;
+
+ /*
+ * Should not happen if userspace parser was written correctly.
+ */
+ if (f->wss.wc >= OSF_WSS_MAX)
+ continue;
+
+ /* Check options */
+
+ foptsize = 0;
+ for (optnum = 0; optnum < f->opt_num; ++optnum)
+ foptsize += f->opt[optnum].length;
+
+ if (foptsize > MAX_IPOPTLEN ||
+ optsize > MAX_IPOPTLEN ||
+ optsize != foptsize)
+ continue;
+
+ check_WSS = f->wss.wc;
+
+ for (optnum = 0; optnum < f->opt_num; ++optnum) {
+ if (f->opt[optnum].kind == (*optp)) {
+ __u32 len = f->opt[optnum].length;
+ const __u8 *optend = optp + len;
+
+ fmatch = FMATCH_OK;
+
+ switch (*optp) {
+ case OSFOPT_MSS:
+ mss = optp[3];
+ mss <<= 8;
+ mss |= optp[2];
+
+ mss = ntohs((__force __be16)mss);
+ break;
+ case OSFOPT_TS:
+ break;
+ }
+
+ optp = optend;
+ } else
+ fmatch = FMATCH_OPT_WRONG;
+
+ if (fmatch != FMATCH_OK)
+ break;
+ }
+
+ if (fmatch != FMATCH_OPT_WRONG) {
+ fmatch = FMATCH_WRONG;
+
+ switch (check_WSS) {
+ case OSF_WSS_PLAIN:
+ if (f->wss.val == 0 || window == f->wss.val)
+ fmatch = FMATCH_OK;
+ break;
+ case OSF_WSS_MSS:
+ /*
+ * Some smart modems decrease mangle MSS to
+ * SMART_MSS_2, so we check standard, decreased
+ * and the one provided in the fingerprint MSS
+ * values.
+ */
+#define SMART_MSS_1 1460
+#define SMART_MSS_2 1448
+ if (window == f->wss.val * mss ||
+ window == f->wss.val * SMART_MSS_1 ||
+ window == f->wss.val * SMART_MSS_2)
+ fmatch = FMATCH_OK;
+ break;
+ case OSF_WSS_MTU:
+ if (window == f->wss.val * (mss + 40) ||
+ window == f->wss.val * (SMART_MSS_1 + 40) ||
+ window == f->wss.val * (SMART_MSS_2 + 40))
+ fmatch = FMATCH_OK;
+ break;
+ case OSF_WSS_MODULO:
+ if ((window % f->wss.val) == 0)
+ fmatch = FMATCH_OK;
+ break;
+ }
+ }
+
+ if (fmatch != FMATCH_OK)
+ continue;
+
+ fcount++;
+
+ if (info->flags & NF_OSF_LOG)
+ nf_log_packet(net, family, hooknum, skb,
+ in, out, NULL,
+ "%s [%s:%s] : %pI4:%d -> %pI4:%d hops=%d\n",
+ f->genre, f->version, f->subtype,
+ &ip->saddr, ntohs(tcp->source),
+ &ip->daddr, ntohs(tcp->dest),
+ f->ttl - ip->ttl);
+
+ if ((info->flags & NF_OSF_LOG) &&
+ info->loglevel == NF_OSF_LOGLEVEL_FIRST)
+ break;
+ }
+
+ if (!fcount && (info->flags & NF_OSF_LOG))
+ nf_log_packet(net, family, hooknum, skb, in, out, NULL,
+ "Remote OS is not known: %pI4:%u -> %pI4:%u\n",
+ &ip->saddr, ntohs(tcp->source),
+ &ip->daddr, ntohs(tcp->dest));
+
+ if (fcount)
+ fmatch = FMATCH_OK;
+
+ return fmatch == FMATCH_OK;
+}
+EXPORT_SYMBOL_GPL(nf_osf_match);
+
+MODULE_LICENSE("GPL");
diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c
index 91e80aa..87b2a77 100644
--- a/net/netfilter/nf_tables_api.c
+++ b/net/netfilter/nf_tables_api.c
@@ -74,88 +74,43 @@
kfree(trans);
}
-/* removal requests are queued in the commit_list, but not acted upon
- * until after all new rules are in place.
- *
- * Therefore, nf_register_net_hook(net, &nat_hook) runs before pending
- * nf_unregister_net_hook().
- *
- * nf_register_net_hook thus fails if a nat hook is already in place
- * even if the conflicting hook is about to be removed.
- *
- * If collision is detected, search commit_log for DELCHAIN matching
- * the new nat hooknum; if we find one collision is temporary:
- *
- * Either transaction is aborted (new/colliding hook is removed), or
- * transaction is committed (old hook is removed).
- */
-static bool nf_tables_allow_nat_conflict(const struct net *net,
- const struct nf_hook_ops *ops)
-{
- const struct nft_trans *trans;
- bool ret = false;
-
- if (!ops->nat_hook)
- return false;
-
- list_for_each_entry(trans, &net->nft.commit_list, list) {
- const struct nf_hook_ops *pending_ops;
- const struct nft_chain *pending;
-
- if (trans->msg_type != NFT_MSG_NEWCHAIN &&
- trans->msg_type != NFT_MSG_DELCHAIN)
- continue;
-
- pending = trans->ctx.chain;
- if (!nft_is_base_chain(pending))
- continue;
-
- pending_ops = &nft_base_chain(pending)->ops;
- if (pending_ops->nat_hook &&
- pending_ops->pf == ops->pf &&
- pending_ops->hooknum == ops->hooknum) {
- /* other hook registration already pending? */
- if (trans->msg_type == NFT_MSG_NEWCHAIN)
- return false;
-
- ret = true;
- }
- }
-
- return ret;
-}
-
static int nf_tables_register_hook(struct net *net,
const struct nft_table *table,
struct nft_chain *chain)
{
- struct nf_hook_ops *ops;
- int ret;
+ const struct nft_base_chain *basechain;
+ const struct nf_hook_ops *ops;
if (table->flags & NFT_TABLE_F_DORMANT ||
!nft_is_base_chain(chain))
return 0;
- ops = &nft_base_chain(chain)->ops;
- ret = nf_register_net_hook(net, ops);
- if (ret == -EBUSY && nf_tables_allow_nat_conflict(net, ops)) {
- ops->nat_hook = false;
- ret = nf_register_net_hook(net, ops);
- ops->nat_hook = true;
- }
+ basechain = nft_base_chain(chain);
+ ops = &basechain->ops;
- return ret;
+ if (basechain->type->ops_register)
+ return basechain->type->ops_register(net, ops);
+
+ return nf_register_net_hook(net, ops);
}
static void nf_tables_unregister_hook(struct net *net,
const struct nft_table *table,
struct nft_chain *chain)
{
+ const struct nft_base_chain *basechain;
+ const struct nf_hook_ops *ops;
+
if (table->flags & NFT_TABLE_F_DORMANT ||
!nft_is_base_chain(chain))
return;
+ basechain = nft_base_chain(chain);
+ ops = &basechain->ops;
- nf_unregister_net_hook(net, &nft_base_chain(chain)->ops);
+ if (basechain->type->ops_unregister)
+ return basechain->type->ops_unregister(net, ops);
+
+ nf_unregister_net_hook(net, ops);
}
static int nft_trans_table_add(struct nft_ctx *ctx, int msg_type)
@@ -415,13 +370,17 @@
{
struct nft_table *table;
+ if (nla == NULL)
+ return ERR_PTR(-EINVAL);
+
list_for_each_entry(table, &net->nft.tables, list) {
if (!nla_strcmp(nla, table->name) &&
table->family == family &&
nft_active_genmask(table, genmask))
return table;
}
- return NULL;
+
+ return ERR_PTR(-ENOENT);
}
static struct nft_table *nft_table_lookup_byhandle(const struct net *net,
@@ -435,37 +394,6 @@
nft_active_genmask(table, genmask))
return table;
}
- return NULL;
-}
-
-static struct nft_table *nf_tables_table_lookup(const struct net *net,
- const struct nlattr *nla,
- u8 family, u8 genmask)
-{
- struct nft_table *table;
-
- if (nla == NULL)
- return ERR_PTR(-EINVAL);
-
- table = nft_table_lookup(net, nla, family, genmask);
- if (table != NULL)
- return table;
-
- return ERR_PTR(-ENOENT);
-}
-
-static struct nft_table *nf_tables_table_lookup_byhandle(const struct net *net,
- const struct nlattr *nla,
- u8 genmask)
-{
- struct nft_table *table;
-
- if (nla == NULL)
- return ERR_PTR(-EINVAL);
-
- table = nft_table_lookup_byhandle(net, nla, genmask);
- if (table != NULL)
- return table;
return ERR_PTR(-ENOENT);
}
@@ -637,10 +565,11 @@
return netlink_dump_start(nlsk, skb, nlh, &c);
}
- table = nf_tables_table_lookup(net, nla[NFTA_TABLE_NAME], family,
- genmask);
- if (IS_ERR(table))
+ table = nft_table_lookup(net, nla[NFTA_TABLE_NAME], family, genmask);
+ if (IS_ERR(table)) {
+ NL_SET_BAD_ATTR(extack, nla[NFTA_TABLE_NAME]);
return PTR_ERR(table);
+ }
skb2 = alloc_skb(NLMSG_GOODSIZE, GFP_KERNEL);
if (!skb2)
@@ -756,21 +685,23 @@
{
const struct nfgenmsg *nfmsg = nlmsg_data(nlh);
u8 genmask = nft_genmask_next(net);
- const struct nlattr *name;
- struct nft_table *table;
int family = nfmsg->nfgen_family;
+ const struct nlattr *attr;
+ struct nft_table *table;
u32 flags = 0;
struct nft_ctx ctx;
int err;
- name = nla[NFTA_TABLE_NAME];
- table = nf_tables_table_lookup(net, name, family, genmask);
+ attr = nla[NFTA_TABLE_NAME];
+ table = nft_table_lookup(net, attr, family, genmask);
if (IS_ERR(table)) {
if (PTR_ERR(table) != -ENOENT)
return PTR_ERR(table);
} else {
- if (nlh->nlmsg_flags & NLM_F_EXCL)
+ if (nlh->nlmsg_flags & NLM_F_EXCL) {
+ NL_SET_BAD_ATTR(extack, attr);
return -EEXIST;
+ }
if (nlh->nlmsg_flags & NLM_F_REPLACE)
return -EOPNOTSUPP;
@@ -789,7 +720,7 @@
if (table == NULL)
goto err_kzalloc;
- table->name = nla_strdup(name, GFP_KERNEL);
+ table->name = nla_strdup(attr, GFP_KERNEL);
if (table->name == NULL)
goto err_strdup;
@@ -912,8 +843,9 @@
{
const struct nfgenmsg *nfmsg = nlmsg_data(nlh);
u8 genmask = nft_genmask_next(net);
- struct nft_table *table;
int family = nfmsg->nfgen_family;
+ const struct nlattr *attr;
+ struct nft_table *table;
struct nft_ctx ctx;
nft_ctx_init(&ctx, net, skb, nlh, 0, NULL, NULL, nla);
@@ -921,16 +853,18 @@
(!nla[NFTA_TABLE_NAME] && !nla[NFTA_TABLE_HANDLE]))
return nft_flush(&ctx, family);
- if (nla[NFTA_TABLE_HANDLE])
- table = nf_tables_table_lookup_byhandle(net,
- nla[NFTA_TABLE_HANDLE],
- genmask);
- else
- table = nf_tables_table_lookup(net, nla[NFTA_TABLE_NAME],
- family, genmask);
+ if (nla[NFTA_TABLE_HANDLE]) {
+ attr = nla[NFTA_TABLE_HANDLE];
+ table = nft_table_lookup_byhandle(net, attr, genmask);
+ } else {
+ attr = nla[NFTA_TABLE_NAME];
+ table = nft_table_lookup(net, attr, family, genmask);
+ }
- if (IS_ERR(table))
+ if (IS_ERR(table)) {
+ NL_SET_BAD_ATTR(extack, attr);
return PTR_ERR(table);
+ }
if (nlh->nlmsg_flags & NLM_F_NONREC &&
table->use > 0)
@@ -978,8 +912,7 @@
*/
static struct nft_chain *
-nf_tables_chain_lookup_byhandle(const struct nft_table *table, u64 handle,
- u8 genmask)
+nft_chain_lookup_byhandle(const struct nft_table *table, u64 handle, u8 genmask)
{
struct nft_chain *chain;
@@ -992,9 +925,8 @@
return ERR_PTR(-ENOENT);
}
-static struct nft_chain *nf_tables_chain_lookup(const struct nft_table *table,
- const struct nlattr *nla,
- u8 genmask)
+static struct nft_chain *nft_chain_lookup(const struct nft_table *table,
+ const struct nlattr *nla, u8 genmask)
{
struct nft_chain *chain;
@@ -1223,14 +1155,17 @@
return netlink_dump_start(nlsk, skb, nlh, &c);
}
- table = nf_tables_table_lookup(net, nla[NFTA_CHAIN_TABLE], family,
- genmask);
- if (IS_ERR(table))
+ table = nft_table_lookup(net, nla[NFTA_CHAIN_TABLE], family, genmask);
+ if (IS_ERR(table)) {
+ NL_SET_BAD_ATTR(extack, nla[NFTA_CHAIN_TABLE]);
return PTR_ERR(table);
+ }
- chain = nf_tables_chain_lookup(table, nla[NFTA_CHAIN_NAME], genmask);
- if (IS_ERR(chain))
+ chain = nft_chain_lookup(table, nla[NFTA_CHAIN_NAME], genmask);
+ if (IS_ERR(chain)) {
+ NL_SET_BAD_ATTR(extack, nla[NFTA_CHAIN_NAME]);
return PTR_ERR(chain);
+ }
skb2 = alloc_skb(NLMSG_GOODSIZE, GFP_KERNEL);
if (!skb2)
@@ -1311,8 +1246,6 @@
if (nft_is_base_chain(chain)) {
struct nft_base_chain *basechain = nft_base_chain(chain);
- if (basechain->type->free)
- basechain->type->free(ctx);
module_put(basechain->type->owner);
free_percpu(basechain->stats);
if (basechain->stats)
@@ -1445,9 +1378,6 @@
}
basechain->type = hook.type;
- if (basechain->type->init)
- basechain->type->init(ctx);
-
chain = &basechain->chain;
ops = &basechain->ops;
@@ -1458,9 +1388,6 @@
ops->hook = hook.type->hooks[ops->hooknum];
ops->dev = hook.dev;
- if (basechain->type->type == NFT_CHAIN_T_NAT)
- ops->nat_hook = true;
-
chain->flags |= NFT_BASE_CHAIN;
basechain->policy = policy;
} else {
@@ -1542,8 +1469,7 @@
nla[NFTA_CHAIN_NAME]) {
struct nft_chain *chain2;
- chain2 = nf_tables_chain_lookup(table, nla[NFTA_CHAIN_NAME],
- genmask);
+ chain2 = nft_chain_lookup(table, nla[NFTA_CHAIN_NAME], genmask);
if (!IS_ERR(chain2))
return -EEXIST;
}
@@ -1593,9 +1519,9 @@
struct netlink_ext_ack *extack)
{
const struct nfgenmsg *nfmsg = nlmsg_data(nlh);
- const struct nlattr * uninitialized_var(name);
u8 genmask = nft_genmask_next(net);
int family = nfmsg->nfgen_family;
+ const struct nlattr *attr;
struct nft_table *table;
struct nft_chain *chain;
u8 policy = NF_ACCEPT;
@@ -1605,36 +1531,46 @@
create = nlh->nlmsg_flags & NLM_F_CREATE ? true : false;
- table = nf_tables_table_lookup(net, nla[NFTA_CHAIN_TABLE], family,
- genmask);
- if (IS_ERR(table))
+ table = nft_table_lookup(net, nla[NFTA_CHAIN_TABLE], family, genmask);
+ if (IS_ERR(table)) {
+ NL_SET_BAD_ATTR(extack, nla[NFTA_CHAIN_TABLE]);
return PTR_ERR(table);
+ }
chain = NULL;
- name = nla[NFTA_CHAIN_NAME];
+ attr = nla[NFTA_CHAIN_NAME];
if (nla[NFTA_CHAIN_HANDLE]) {
handle = be64_to_cpu(nla_get_be64(nla[NFTA_CHAIN_HANDLE]));
- chain = nf_tables_chain_lookup_byhandle(table, handle, genmask);
- if (IS_ERR(chain))
- return PTR_ERR(chain);
- } else {
- chain = nf_tables_chain_lookup(table, name, genmask);
+ chain = nft_chain_lookup_byhandle(table, handle, genmask);
if (IS_ERR(chain)) {
- if (PTR_ERR(chain) != -ENOENT)
+ NL_SET_BAD_ATTR(extack, nla[NFTA_CHAIN_HANDLE]);
+ return PTR_ERR(chain);
+ }
+ attr = nla[NFTA_CHAIN_HANDLE];
+ } else {
+ chain = nft_chain_lookup(table, attr, genmask);
+ if (IS_ERR(chain)) {
+ if (PTR_ERR(chain) != -ENOENT) {
+ NL_SET_BAD_ATTR(extack, attr);
return PTR_ERR(chain);
+ }
chain = NULL;
}
}
if (nla[NFTA_CHAIN_POLICY]) {
if (chain != NULL &&
- !nft_is_base_chain(chain))
+ !nft_is_base_chain(chain)) {
+ NL_SET_BAD_ATTR(extack, nla[NFTA_CHAIN_POLICY]);
return -EOPNOTSUPP;
+ }
if (chain == NULL &&
- nla[NFTA_CHAIN_HOOK] == NULL)
+ nla[NFTA_CHAIN_HOOK] == NULL) {
+ NL_SET_BAD_ATTR(extack, nla[NFTA_CHAIN_POLICY]);
return -EOPNOTSUPP;
+ }
policy = ntohl(nla_get_be32(nla[NFTA_CHAIN_POLICY]));
switch (policy) {
@@ -1649,8 +1585,10 @@
nft_ctx_init(&ctx, net, skb, nlh, family, table, chain, nla);
if (chain != NULL) {
- if (nlh->nlmsg_flags & NLM_F_EXCL)
+ if (nlh->nlmsg_flags & NLM_F_EXCL) {
+ NL_SET_BAD_ATTR(extack, attr);
return -EEXIST;
+ }
if (nlh->nlmsg_flags & NLM_F_REPLACE)
return -EOPNOTSUPP;
@@ -1667,28 +1605,34 @@
{
const struct nfgenmsg *nfmsg = nlmsg_data(nlh);
u8 genmask = nft_genmask_next(net);
+ int family = nfmsg->nfgen_family;
+ const struct nlattr *attr;
struct nft_table *table;
struct nft_chain *chain;
struct nft_rule *rule;
- int family = nfmsg->nfgen_family;
struct nft_ctx ctx;
u64 handle;
u32 use;
int err;
- table = nf_tables_table_lookup(net, nla[NFTA_CHAIN_TABLE], family,
- genmask);
- if (IS_ERR(table))
+ table = nft_table_lookup(net, nla[NFTA_CHAIN_TABLE], family, genmask);
+ if (IS_ERR(table)) {
+ NL_SET_BAD_ATTR(extack, nla[NFTA_CHAIN_TABLE]);
return PTR_ERR(table);
+ }
if (nla[NFTA_CHAIN_HANDLE]) {
- handle = be64_to_cpu(nla_get_be64(nla[NFTA_CHAIN_HANDLE]));
- chain = nf_tables_chain_lookup_byhandle(table, handle, genmask);
+ attr = nla[NFTA_CHAIN_HANDLE];
+ handle = be64_to_cpu(nla_get_be64(attr));
+ chain = nft_chain_lookup_byhandle(table, handle, genmask);
} else {
- chain = nf_tables_chain_lookup(table, nla[NFTA_CHAIN_NAME], genmask);
+ attr = nla[NFTA_CHAIN_NAME];
+ chain = nft_chain_lookup(table, attr, genmask);
}
- if (IS_ERR(chain))
+ if (IS_ERR(chain)) {
+ NL_SET_BAD_ATTR(extack, attr);
return PTR_ERR(chain);
+ }
if (nlh->nlmsg_flags & NLM_F_NONREC &&
chain->use > 0)
@@ -1710,8 +1654,10 @@
/* There are rules and elements that are still holding references to us,
* we cannot do a recursive removal in this case.
*/
- if (use > 0)
+ if (use > 0) {
+ NL_SET_BAD_ATTR(extack, attr);
return -EBUSY;
+ }
return nft_delchain(&ctx);
}
@@ -1968,8 +1914,8 @@
* Rules
*/
-static struct nft_rule *__nf_tables_rule_lookup(const struct nft_chain *chain,
- u64 handle)
+static struct nft_rule *__nft_rule_lookup(const struct nft_chain *chain,
+ u64 handle)
{
struct nft_rule *rule;
@@ -1982,13 +1928,13 @@
return ERR_PTR(-ENOENT);
}
-static struct nft_rule *nf_tables_rule_lookup(const struct nft_chain *chain,
- const struct nlattr *nla)
+static struct nft_rule *nft_rule_lookup(const struct nft_chain *chain,
+ const struct nlattr *nla)
{
if (nla == NULL)
return ERR_PTR(-EINVAL);
- return __nf_tables_rule_lookup(chain, be64_to_cpu(nla_get_be64(nla)));
+ return __nft_rule_lookup(chain, be64_to_cpu(nla_get_be64(nla)));
}
static const struct nla_policy nft_rule_policy[NFTA_RULE_MAX + 1] = {
@@ -2220,18 +2166,23 @@
return netlink_dump_start(nlsk, skb, nlh, &c);
}
- table = nf_tables_table_lookup(net, nla[NFTA_RULE_TABLE], family,
- genmask);
- if (IS_ERR(table))
+ table = nft_table_lookup(net, nla[NFTA_RULE_TABLE], family, genmask);
+ if (IS_ERR(table)) {
+ NL_SET_BAD_ATTR(extack, nla[NFTA_RULE_TABLE]);
return PTR_ERR(table);
+ }
- chain = nf_tables_chain_lookup(table, nla[NFTA_RULE_CHAIN], genmask);
- if (IS_ERR(chain))
+ chain = nft_chain_lookup(table, nla[NFTA_RULE_CHAIN], genmask);
+ if (IS_ERR(chain)) {
+ NL_SET_BAD_ATTR(extack, nla[NFTA_RULE_CHAIN]);
return PTR_ERR(chain);
+ }
- rule = nf_tables_rule_lookup(chain, nla[NFTA_RULE_HANDLE]);
- if (IS_ERR(rule))
+ rule = nft_rule_lookup(chain, nla[NFTA_RULE_HANDLE]);
+ if (IS_ERR(rule)) {
+ NL_SET_BAD_ATTR(extack, nla[NFTA_RULE_HANDLE]);
return PTR_ERR(rule);
+ }
skb2 = alloc_skb(NLMSG_GOODSIZE, GFP_KERNEL);
if (!skb2)
@@ -2301,23 +2252,30 @@
create = nlh->nlmsg_flags & NLM_F_CREATE ? true : false;
- table = nf_tables_table_lookup(net, nla[NFTA_RULE_TABLE], family,
- genmask);
- if (IS_ERR(table))
+ table = nft_table_lookup(net, nla[NFTA_RULE_TABLE], family, genmask);
+ if (IS_ERR(table)) {
+ NL_SET_BAD_ATTR(extack, nla[NFTA_RULE_TABLE]);
return PTR_ERR(table);
+ }
- chain = nf_tables_chain_lookup(table, nla[NFTA_RULE_CHAIN], genmask);
- if (IS_ERR(chain))
+ chain = nft_chain_lookup(table, nla[NFTA_RULE_CHAIN], genmask);
+ if (IS_ERR(chain)) {
+ NL_SET_BAD_ATTR(extack, nla[NFTA_RULE_CHAIN]);
return PTR_ERR(chain);
+ }
if (nla[NFTA_RULE_HANDLE]) {
handle = be64_to_cpu(nla_get_be64(nla[NFTA_RULE_HANDLE]));
- rule = __nf_tables_rule_lookup(chain, handle);
- if (IS_ERR(rule))
+ rule = __nft_rule_lookup(chain, handle);
+ if (IS_ERR(rule)) {
+ NL_SET_BAD_ATTR(extack, nla[NFTA_RULE_HANDLE]);
return PTR_ERR(rule);
+ }
- if (nlh->nlmsg_flags & NLM_F_EXCL)
+ if (nlh->nlmsg_flags & NLM_F_EXCL) {
+ NL_SET_BAD_ATTR(extack, nla[NFTA_RULE_HANDLE]);
return -EEXIST;
+ }
if (nlh->nlmsg_flags & NLM_F_REPLACE)
old_rule = rule;
else
@@ -2336,9 +2294,11 @@
return -EOPNOTSUPP;
pos_handle = be64_to_cpu(nla_get_be64(nla[NFTA_RULE_POSITION]));
- old_rule = __nf_tables_rule_lookup(chain, pos_handle);
- if (IS_ERR(old_rule))
+ old_rule = __nft_rule_lookup(chain, pos_handle);
+ if (IS_ERR(old_rule)) {
+ NL_SET_BAD_ATTR(extack, nla[NFTA_RULE_POSITION]);
return PTR_ERR(old_rule);
+ }
}
nft_ctx_init(&ctx, net, skb, nlh, family, table, chain, nla);
@@ -2476,32 +2436,37 @@
int family = nfmsg->nfgen_family, err = 0;
struct nft_ctx ctx;
- table = nf_tables_table_lookup(net, nla[NFTA_RULE_TABLE], family,
- genmask);
- if (IS_ERR(table))
+ table = nft_table_lookup(net, nla[NFTA_RULE_TABLE], family, genmask);
+ if (IS_ERR(table)) {
+ NL_SET_BAD_ATTR(extack, nla[NFTA_RULE_TABLE]);
return PTR_ERR(table);
+ }
if (nla[NFTA_RULE_CHAIN]) {
- chain = nf_tables_chain_lookup(table, nla[NFTA_RULE_CHAIN],
- genmask);
- if (IS_ERR(chain))
+ chain = nft_chain_lookup(table, nla[NFTA_RULE_CHAIN], genmask);
+ if (IS_ERR(chain)) {
+ NL_SET_BAD_ATTR(extack, nla[NFTA_RULE_CHAIN]);
return PTR_ERR(chain);
+ }
}
nft_ctx_init(&ctx, net, skb, nlh, family, table, chain, nla);
if (chain) {
if (nla[NFTA_RULE_HANDLE]) {
- rule = nf_tables_rule_lookup(chain,
- nla[NFTA_RULE_HANDLE]);
- if (IS_ERR(rule))
+ rule = nft_rule_lookup(chain, nla[NFTA_RULE_HANDLE]);
+ if (IS_ERR(rule)) {
+ NL_SET_BAD_ATTR(extack, nla[NFTA_RULE_HANDLE]);
return PTR_ERR(rule);
+ }
err = nft_delrule(&ctx, rule);
} else if (nla[NFTA_RULE_ID]) {
rule = nft_rule_lookup_byid(net, nla[NFTA_RULE_ID]);
- if (IS_ERR(rule))
+ if (IS_ERR(rule)) {
+ NL_SET_BAD_ATTR(extack, nla[NFTA_RULE_ID]);
return PTR_ERR(rule);
+ }
err = nft_delrule(&ctx, rule);
} else {
@@ -2546,14 +2511,12 @@
EXPORT_SYMBOL_GPL(nft_unregister_set);
#define NFT_SET_FEATURES (NFT_SET_INTERVAL | NFT_SET_MAP | \
- NFT_SET_TIMEOUT | NFT_SET_OBJECT)
+ NFT_SET_TIMEOUT | NFT_SET_OBJECT | \
+ NFT_SET_EVAL)
-static bool nft_set_ops_candidate(const struct nft_set_ops *ops, u32 flags)
+static bool nft_set_ops_candidate(const struct nft_set_type *type, u32 flags)
{
- if ((flags & NFT_SET_EVAL) && !ops->update)
- return false;
-
- return (flags & ops->features) == (flags & NFT_SET_FEATURES);
+ return (flags & type->features) == (flags & NFT_SET_FEATURES);
}
/*
@@ -2590,14 +2553,9 @@
best.space = ~0;
list_for_each_entry(type, &nf_tables_set_types, list) {
- if (!type->select_ops)
- ops = type->ops;
- else
- ops = type->select_ops(ctx, desc, flags);
- if (!ops)
- continue;
+ ops = &type->ops;
- if (!nft_set_ops_candidate(ops, flags))
+ if (!nft_set_ops_candidate(type, flags))
continue;
if (!ops->estimate(desc, flags, &est))
continue;
@@ -2628,7 +2586,7 @@
if (!try_module_get(type->owner))
continue;
if (bops != NULL)
- module_put(bops->type->owner);
+ module_put(to_set_type(bops)->owner);
bops = ops;
best = est;
@@ -2669,6 +2627,7 @@
const struct sk_buff *skb,
const struct nlmsghdr *nlh,
const struct nlattr * const nla[],
+ struct netlink_ext_ack *extack,
u8 genmask)
{
const struct nfgenmsg *nfmsg = nlmsg_data(nlh);
@@ -2676,18 +2635,20 @@
struct nft_table *table = NULL;
if (nla[NFTA_SET_TABLE] != NULL) {
- table = nf_tables_table_lookup(net, nla[NFTA_SET_TABLE],
- family, genmask);
- if (IS_ERR(table))
+ table = nft_table_lookup(net, nla[NFTA_SET_TABLE], family,
+ genmask);
+ if (IS_ERR(table)) {
+ NL_SET_BAD_ATTR(extack, nla[NFTA_SET_TABLE]);
return PTR_ERR(table);
+ }
}
nft_ctx_init(ctx, net, skb, nlh, family, table, NULL, nla);
return 0;
}
-static struct nft_set *nf_tables_set_lookup(const struct nft_table *table,
- const struct nlattr *nla, u8 genmask)
+static struct nft_set *nft_set_lookup(const struct nft_table *table,
+ const struct nlattr *nla, u8 genmask)
{
struct nft_set *set;
@@ -2702,14 +2663,12 @@
return ERR_PTR(-ENOENT);
}
-static struct nft_set *nf_tables_set_lookup_byhandle(const struct nft_table *table,
- const struct nlattr *nla, u8 genmask)
+static struct nft_set *nft_set_lookup_byhandle(const struct nft_table *table,
+ const struct nlattr *nla,
+ u8 genmask)
{
struct nft_set *set;
- if (nla == NULL)
- return ERR_PTR(-EINVAL);
-
list_for_each_entry(set, &table->sets, list) {
if (be64_to_cpu(nla_get_be64(nla)) == set->handle &&
nft_active_genmask(set, genmask))
@@ -2718,9 +2677,8 @@
return ERR_PTR(-ENOENT);
}
-static struct nft_set *nf_tables_set_lookup_byid(const struct net *net,
- const struct nlattr *nla,
- u8 genmask)
+static struct nft_set *nft_set_lookup_byid(const struct net *net,
+ const struct nlattr *nla, u8 genmask)
{
struct nft_trans *trans;
u32 id = ntohl(nla_get_be32(nla));
@@ -2744,12 +2702,12 @@
{
struct nft_set *set;
- set = nf_tables_set_lookup(table, nla_set_name, genmask);
+ set = nft_set_lookup(table, nla_set_name, genmask);
if (IS_ERR(set)) {
if (!nla_set_id)
return set;
- set = nf_tables_set_lookup_byid(net, nla_set_id, genmask);
+ set = nft_set_lookup_byid(net, nla_set_id, genmask);
}
return set;
}
@@ -2809,6 +2767,27 @@
return 0;
}
+static int nf_msecs_to_jiffies64(const struct nlattr *nla, u64 *result)
+{
+ u64 ms = be64_to_cpu(nla_get_be64(nla));
+ u64 max = (u64)(~((u64)0));
+
+ max = div_u64(max, NSEC_PER_MSEC);
+ if (ms >= max)
+ return -ERANGE;
+
+ ms *= NSEC_PER_MSEC;
+ *result = nsecs_to_jiffies64(ms);
+ return 0;
+}
+
+static u64 nf_jiffies64_to_msecs(u64 input)
+{
+ u64 ms = jiffies64_to_nsecs(input);
+
+ return cpu_to_be64(div_u64(ms, NSEC_PER_MSEC));
+}
+
static int nf_tables_fill_set(struct sk_buff *skb, const struct nft_ctx *ctx,
const struct nft_set *set, u16 event, u16 flags)
{
@@ -2856,7 +2835,7 @@
if (set->timeout &&
nla_put_be64(skb, NFTA_SET_TIMEOUT,
- cpu_to_be64(jiffies_to_msecs(set->timeout)),
+ nf_jiffies64_to_msecs(set->timeout),
NFTA_SET_PAD))
goto nla_put_failure;
if (set->gc_int &&
@@ -2994,7 +2973,8 @@
int err;
/* Verify existence before starting dump */
- err = nft_ctx_init_from_setattr(&ctx, net, skb, nlh, nla, genmask);
+ err = nft_ctx_init_from_setattr(&ctx, net, skb, nlh, nla, extack,
+ genmask);
if (err < 0)
return err;
@@ -3021,7 +3001,7 @@
if (!nla[NFTA_SET_TABLE])
return -EINVAL;
- set = nf_tables_set_lookup(ctx.table, nla[NFTA_SET_NAME], genmask);
+ set = nft_set_lookup(ctx.table, nla[NFTA_SET_NAME], genmask);
if (IS_ERR(set))
return PTR_ERR(set);
@@ -3151,8 +3131,10 @@
if (nla[NFTA_SET_TIMEOUT] != NULL) {
if (!(flags & NFT_SET_TIMEOUT))
return -EINVAL;
- timeout = msecs_to_jiffies(be64_to_cpu(nla_get_be64(
- nla[NFTA_SET_TIMEOUT])));
+
+ err = nf_msecs_to_jiffies64(nla[NFTA_SET_TIMEOUT], &timeout);
+ if (err)
+ return err;
}
gc_int = 0;
if (nla[NFTA_SET_GC_INTERVAL] != NULL) {
@@ -3173,22 +3155,28 @@
create = nlh->nlmsg_flags & NLM_F_CREATE ? true : false;
- table = nf_tables_table_lookup(net, nla[NFTA_SET_TABLE], family,
- genmask);
- if (IS_ERR(table))
+ table = nft_table_lookup(net, nla[NFTA_SET_TABLE], family, genmask);
+ if (IS_ERR(table)) {
+ NL_SET_BAD_ATTR(extack, nla[NFTA_SET_TABLE]);
return PTR_ERR(table);
+ }
nft_ctx_init(&ctx, net, skb, nlh, family, table, NULL, nla);
- set = nf_tables_set_lookup(table, nla[NFTA_SET_NAME], genmask);
+ set = nft_set_lookup(table, nla[NFTA_SET_NAME], genmask);
if (IS_ERR(set)) {
- if (PTR_ERR(set) != -ENOENT)
+ if (PTR_ERR(set) != -ENOENT) {
+ NL_SET_BAD_ATTR(extack, nla[NFTA_SET_NAME]);
return PTR_ERR(set);
+ }
} else {
- if (nlh->nlmsg_flags & NLM_F_EXCL)
+ if (nlh->nlmsg_flags & NLM_F_EXCL) {
+ NL_SET_BAD_ATTR(extack, nla[NFTA_SET_NAME]);
return -EEXIST;
+ }
if (nlh->nlmsg_flags & NLM_F_REPLACE)
return -EOPNOTSUPP;
+
return 0;
}
@@ -3265,14 +3253,14 @@
err2:
kvfree(set);
err1:
- module_put(ops->type->owner);
+ module_put(to_set_type(ops)->owner);
return err;
}
static void nft_set_destroy(struct nft_set *set)
{
set->ops->destroy(set);
- module_put(set->ops->type->owner);
+ module_put(to_set_type(set->ops)->owner);
kfree(set->name);
kvfree(set);
}
@@ -3291,6 +3279,7 @@
{
const struct nfgenmsg *nfmsg = nlmsg_data(nlh);
u8 genmask = nft_genmask_next(net);
+ const struct nlattr *attr;
struct nft_set *set;
struct nft_ctx ctx;
int err;
@@ -3300,20 +3289,28 @@
if (nla[NFTA_SET_TABLE] == NULL)
return -EINVAL;
- err = nft_ctx_init_from_setattr(&ctx, net, skb, nlh, nla, genmask);
+ err = nft_ctx_init_from_setattr(&ctx, net, skb, nlh, nla, extack,
+ genmask);
if (err < 0)
return err;
- if (nla[NFTA_SET_HANDLE])
- set = nf_tables_set_lookup_byhandle(ctx.table, nla[NFTA_SET_HANDLE], genmask);
- else
- set = nf_tables_set_lookup(ctx.table, nla[NFTA_SET_NAME], genmask);
- if (IS_ERR(set))
- return PTR_ERR(set);
+ if (nla[NFTA_SET_HANDLE]) {
+ attr = nla[NFTA_SET_HANDLE];
+ set = nft_set_lookup_byhandle(ctx.table, attr, genmask);
+ } else {
+ attr = nla[NFTA_SET_NAME];
+ set = nft_set_lookup(ctx.table, attr, genmask);
+ }
+ if (IS_ERR(set)) {
+ NL_SET_BAD_ATTR(extack, attr);
+ return PTR_ERR(set);
+ }
if (!list_empty(&set->bindings) ||
- (nlh->nlmsg_flags & NLM_F_NONREC && atomic_read(&set->nelems) > 0))
+ (nlh->nlmsg_flags & NLM_F_NONREC && atomic_read(&set->nelems) > 0)) {
+ NL_SET_BAD_ATTR(extack, attr);
return -EBUSY;
+ }
return nft_delset(&ctx, set);
}
@@ -3403,8 +3400,8 @@
.align = __alignof__(u64),
},
[NFT_SET_EXT_EXPIRATION] = {
- .len = sizeof(unsigned long),
- .align = __alignof__(unsigned long),
+ .len = sizeof(u64),
+ .align = __alignof__(u64),
},
[NFT_SET_EXT_USERDATA] = {
.len = sizeof(struct nft_userdata),
@@ -3441,16 +3438,19 @@
const struct sk_buff *skb,
const struct nlmsghdr *nlh,
const struct nlattr * const nla[],
+ struct netlink_ext_ack *extack,
u8 genmask)
{
const struct nfgenmsg *nfmsg = nlmsg_data(nlh);
int family = nfmsg->nfgen_family;
struct nft_table *table;
- table = nf_tables_table_lookup(net, nla[NFTA_SET_ELEM_LIST_TABLE],
- family, genmask);
- if (IS_ERR(table))
+ table = nft_table_lookup(net, nla[NFTA_SET_ELEM_LIST_TABLE], family,
+ genmask);
+ if (IS_ERR(table)) {
+ NL_SET_BAD_ATTR(extack, nla[NFTA_SET_ELEM_LIST_TABLE]);
return PTR_ERR(table);
+ }
nft_ctx_init(ctx, net, skb, nlh, family, table, NULL, nla);
return 0;
@@ -3494,22 +3494,21 @@
if (nft_set_ext_exists(ext, NFT_SET_EXT_TIMEOUT) &&
nla_put_be64(skb, NFTA_SET_ELEM_TIMEOUT,
- cpu_to_be64(jiffies_to_msecs(
- *nft_set_ext_timeout(ext))),
+ nf_jiffies64_to_msecs(*nft_set_ext_timeout(ext)),
NFTA_SET_ELEM_PAD))
goto nla_put_failure;
if (nft_set_ext_exists(ext, NFT_SET_EXT_EXPIRATION)) {
- unsigned long expires, now = jiffies;
+ u64 expires, now = get_jiffies_64();
expires = *nft_set_ext_expiration(ext);
- if (time_before(now, expires))
+ if (time_before64(now, expires))
expires -= now;
else
expires = 0;
if (nla_put_be64(skb, NFTA_SET_ELEM_EXPIRATION,
- cpu_to_be64(jiffies_to_msecs(expires)),
+ nf_jiffies64_to_msecs(expires),
NFTA_SET_ELEM_PAD))
goto nla_put_failure;
}
@@ -3780,12 +3779,12 @@
struct nft_ctx ctx;
int rem, err = 0;
- err = nft_ctx_init_from_elemattr(&ctx, net, skb, nlh, nla, genmask);
+ err = nft_ctx_init_from_elemattr(&ctx, net, skb, nlh, nla, extack,
+ genmask);
if (err < 0)
return err;
- set = nf_tables_set_lookup(ctx.table, nla[NFTA_SET_ELEM_LIST_SET],
- genmask);
+ set = nft_set_lookup(ctx.table, nla[NFTA_SET_ELEM_LIST_SET], genmask);
if (IS_ERR(set))
return PTR_ERR(set);
@@ -3884,7 +3883,7 @@
memcpy(nft_set_ext_data(ext), data, set->dlen);
if (nft_set_ext_exists(ext, NFT_SET_EXT_EXPIRATION))
*nft_set_ext_expiration(ext) =
- jiffies + timeout;
+ get_jiffies_64() + timeout;
if (nft_set_ext_exists(ext, NFT_SET_EXT_TIMEOUT))
*nft_set_ext_timeout(ext) = timeout;
@@ -3971,8 +3970,10 @@
if (nla[NFTA_SET_ELEM_TIMEOUT] != NULL) {
if (!(set->flags & NFT_SET_TIMEOUT))
return -EINVAL;
- timeout = msecs_to_jiffies(be64_to_cpu(nla_get_be64(
- nla[NFTA_SET_ELEM_TIMEOUT])));
+ err = nf_msecs_to_jiffies64(nla[NFTA_SET_ELEM_TIMEOUT],
+ &timeout);
+ if (err)
+ return err;
} else if (set->flags & NFT_SET_TIMEOUT) {
timeout = set->timeout;
}
@@ -3997,8 +3998,8 @@
err = -EINVAL;
goto err2;
}
- obj = nf_tables_obj_lookup(ctx->table, nla[NFTA_SET_ELEM_OBJREF],
- set->objtype, genmask);
+ obj = nft_obj_lookup(ctx->table, nla[NFTA_SET_ELEM_OBJREF],
+ set->objtype, genmask);
if (IS_ERR(obj)) {
err = PTR_ERR(obj);
goto err2;
@@ -4137,7 +4138,8 @@
if (nla[NFTA_SET_ELEM_LIST_ELEMENTS] == NULL)
return -EINVAL;
- err = nft_ctx_init_from_elemattr(&ctx, net, skb, nlh, nla, genmask);
+ err = nft_ctx_init_from_elemattr(&ctx, net, skb, nlh, nla, extack,
+ genmask);
if (err < 0)
return err;
@@ -4325,12 +4327,12 @@
struct nft_ctx ctx;
int rem, err = 0;
- err = nft_ctx_init_from_elemattr(&ctx, net, skb, nlh, nla, genmask);
+ err = nft_ctx_init_from_elemattr(&ctx, net, skb, nlh, nla, extack,
+ genmask);
if (err < 0)
return err;
- set = nf_tables_set_lookup(ctx.table, nla[NFTA_SET_ELEM_LIST_SET],
- genmask);
+ set = nft_set_lookup(ctx.table, nla[NFTA_SET_ELEM_LIST_SET], genmask);
if (IS_ERR(set))
return PTR_ERR(set);
if (!list_empty(&set->bindings) && set->flags & NFT_SET_CONSTANT)
@@ -4418,9 +4420,9 @@
}
EXPORT_SYMBOL_GPL(nft_unregister_obj);
-struct nft_object *nf_tables_obj_lookup(const struct nft_table *table,
- const struct nlattr *nla,
- u32 objtype, u8 genmask)
+struct nft_object *nft_obj_lookup(const struct nft_table *table,
+ const struct nlattr *nla, u32 objtype,
+ u8 genmask)
{
struct nft_object *obj;
@@ -4432,11 +4434,11 @@
}
return ERR_PTR(-ENOENT);
}
-EXPORT_SYMBOL_GPL(nf_tables_obj_lookup);
+EXPORT_SYMBOL_GPL(nft_obj_lookup);
-static struct nft_object *nf_tables_obj_lookup_byhandle(const struct nft_table *table,
- const struct nlattr *nla,
- u32 objtype, u8 genmask)
+static struct nft_object *nft_obj_lookup_byhandle(const struct nft_table *table,
+ const struct nlattr *nla,
+ u32 objtype, u8 genmask)
{
struct nft_object *obj;
@@ -4580,22 +4582,25 @@
!nla[NFTA_OBJ_DATA])
return -EINVAL;
- table = nf_tables_table_lookup(net, nla[NFTA_OBJ_TABLE], family,
- genmask);
- if (IS_ERR(table))
+ table = nft_table_lookup(net, nla[NFTA_OBJ_TABLE], family, genmask);
+ if (IS_ERR(table)) {
+ NL_SET_BAD_ATTR(extack, nla[NFTA_OBJ_TABLE]);
return PTR_ERR(table);
+ }
objtype = ntohl(nla_get_be32(nla[NFTA_OBJ_TYPE]));
- obj = nf_tables_obj_lookup(table, nla[NFTA_OBJ_NAME], objtype, genmask);
+ obj = nft_obj_lookup(table, nla[NFTA_OBJ_NAME], objtype, genmask);
if (IS_ERR(obj)) {
err = PTR_ERR(obj);
- if (err != -ENOENT)
+ if (err != -ENOENT) {
+ NL_SET_BAD_ATTR(extack, nla[NFTA_OBJ_NAME]);
return err;
-
+ }
} else {
- if (nlh->nlmsg_flags & NLM_F_EXCL)
+ if (nlh->nlmsg_flags & NLM_F_EXCL) {
+ NL_SET_BAD_ATTR(extack, nla[NFTA_OBJ_NAME]);
return -EEXIST;
-
+ }
return 0;
}
@@ -4806,15 +4811,18 @@
!nla[NFTA_OBJ_TYPE])
return -EINVAL;
- table = nf_tables_table_lookup(net, nla[NFTA_OBJ_TABLE], family,
- genmask);
- if (IS_ERR(table))
+ table = nft_table_lookup(net, nla[NFTA_OBJ_TABLE], family, genmask);
+ if (IS_ERR(table)) {
+ NL_SET_BAD_ATTR(extack, nla[NFTA_OBJ_TABLE]);
return PTR_ERR(table);
+ }
objtype = ntohl(nla_get_be32(nla[NFTA_OBJ_TYPE]));
- obj = nf_tables_obj_lookup(table, nla[NFTA_OBJ_NAME], objtype, genmask);
- if (IS_ERR(obj))
+ obj = nft_obj_lookup(table, nla[NFTA_OBJ_NAME], objtype, genmask);
+ if (IS_ERR(obj)) {
+ NL_SET_BAD_ATTR(extack, nla[NFTA_OBJ_NAME]);
return PTR_ERR(obj);
+ }
skb2 = alloc_skb(NLMSG_GOODSIZE, GFP_KERNEL);
if (!skb2)
@@ -4853,6 +4861,7 @@
const struct nfgenmsg *nfmsg = nlmsg_data(nlh);
u8 genmask = nft_genmask_next(net);
int family = nfmsg->nfgen_family;
+ const struct nlattr *attr;
struct nft_table *table;
struct nft_object *obj;
struct nft_ctx ctx;
@@ -4862,22 +4871,29 @@
(!nla[NFTA_OBJ_NAME] && !nla[NFTA_OBJ_HANDLE]))
return -EINVAL;
- table = nf_tables_table_lookup(net, nla[NFTA_OBJ_TABLE], family,
- genmask);
- if (IS_ERR(table))
+ table = nft_table_lookup(net, nla[NFTA_OBJ_TABLE], family, genmask);
+ if (IS_ERR(table)) {
+ NL_SET_BAD_ATTR(extack, nla[NFTA_OBJ_TABLE]);
return PTR_ERR(table);
+ }
objtype = ntohl(nla_get_be32(nla[NFTA_OBJ_TYPE]));
- if (nla[NFTA_OBJ_HANDLE])
- obj = nf_tables_obj_lookup_byhandle(table, nla[NFTA_OBJ_HANDLE],
- objtype, genmask);
- else
- obj = nf_tables_obj_lookup(table, nla[NFTA_OBJ_NAME],
- objtype, genmask);
- if (IS_ERR(obj))
+ if (nla[NFTA_OBJ_HANDLE]) {
+ attr = nla[NFTA_OBJ_HANDLE];
+ obj = nft_obj_lookup_byhandle(table, attr, objtype, genmask);
+ } else {
+ attr = nla[NFTA_OBJ_NAME];
+ obj = nft_obj_lookup(table, attr, objtype, genmask);
+ }
+
+ if (IS_ERR(obj)) {
+ NL_SET_BAD_ATTR(extack, attr);
return PTR_ERR(obj);
- if (obj->use > 0)
+ }
+ if (obj->use > 0) {
+ NL_SET_BAD_ATTR(extack, attr);
return -EBUSY;
+ }
nft_ctx_init(&ctx, net, skb, nlh, family, table, NULL, nla);
@@ -4948,9 +4964,8 @@
[NFTA_FLOWTABLE_HANDLE] = { .type = NLA_U64 },
};
-struct nft_flowtable *nf_tables_flowtable_lookup(const struct nft_table *table,
- const struct nlattr *nla,
- u8 genmask)
+struct nft_flowtable *nft_flowtable_lookup(const struct nft_table *table,
+ const struct nlattr *nla, u8 genmask)
{
struct nft_flowtable *flowtable;
@@ -4961,11 +4976,11 @@
}
return ERR_PTR(-ENOENT);
}
-EXPORT_SYMBOL_GPL(nf_tables_flowtable_lookup);
+EXPORT_SYMBOL_GPL(nft_flowtable_lookup);
static struct nft_flowtable *
-nf_tables_flowtable_lookup_byhandle(const struct nft_table *table,
- const struct nlattr *nla, u8 genmask)
+nft_flowtable_lookup_byhandle(const struct nft_table *table,
+ const struct nlattr *nla, u8 genmask)
{
struct nft_flowtable *flowtable;
@@ -5064,7 +5079,7 @@
flowtable->ops[i].pf = NFPROTO_NETDEV;
flowtable->ops[i].hooknum = hooknum;
flowtable->ops[i].priority = priority;
- flowtable->ops[i].priv = &flowtable->data.rhashtable;
+ flowtable->ops[i].priv = &flowtable->data;
flowtable->ops[i].hook = flowtable->data.type->hook;
flowtable->ops[i].dev = dev_array[i];
flowtable->dev_name[i] = kstrdup(dev_array[i]->name,
@@ -5105,23 +5120,6 @@
return ERR_PTR(-ENOENT);
}
-void nft_flow_table_iterate(struct net *net,
- void (*iter)(struct nf_flowtable *flowtable, void *data),
- void *data)
-{
- struct nft_flowtable *flowtable;
- const struct nft_table *table;
-
- nfnl_lock(NFNL_SUBSYS_NFTABLES);
- list_for_each_entry(table, &net->nft.tables, list) {
- list_for_each_entry(flowtable, &table->flowtables, list) {
- iter(&flowtable->data, data);
- }
- }
- nfnl_unlock(NFNL_SUBSYS_NFTABLES);
-}
-EXPORT_SYMBOL_GPL(nft_flow_table_iterate);
-
static void nft_unregister_flowtable_net_hooks(struct net *net,
struct nft_flowtable *flowtable)
{
@@ -5155,20 +5153,26 @@
!nla[NFTA_FLOWTABLE_HOOK])
return -EINVAL;
- table = nf_tables_table_lookup(net, nla[NFTA_FLOWTABLE_TABLE],
- family, genmask);
- if (IS_ERR(table))
+ table = nft_table_lookup(net, nla[NFTA_FLOWTABLE_TABLE], family,
+ genmask);
+ if (IS_ERR(table)) {
+ NL_SET_BAD_ATTR(extack, nla[NFTA_FLOWTABLE_TABLE]);
return PTR_ERR(table);
+ }
- flowtable = nf_tables_flowtable_lookup(table, nla[NFTA_FLOWTABLE_NAME],
- genmask);
+ flowtable = nft_flowtable_lookup(table, nla[NFTA_FLOWTABLE_NAME],
+ genmask);
if (IS_ERR(flowtable)) {
err = PTR_ERR(flowtable);
- if (err != -ENOENT)
+ if (err != -ENOENT) {
+ NL_SET_BAD_ATTR(extack, nla[NFTA_FLOWTABLE_NAME]);
return err;
+ }
} else {
- if (nlh->nlmsg_flags & NLM_F_EXCL)
+ if (nlh->nlmsg_flags & NLM_F_EXCL) {
+ NL_SET_BAD_ATTR(extack, nla[NFTA_FLOWTABLE_NAME]);
return -EEXIST;
+ }
return 0;
}
@@ -5195,14 +5199,14 @@
}
flowtable->data.type = type;
- err = rhashtable_init(&flowtable->data.rhashtable, type->params);
+ err = type->init(&flowtable->data);
if (err < 0)
goto err3;
err = nf_tables_flowtable_parse_hook(&ctx, nla[NFTA_FLOWTABLE_HOOK],
flowtable);
if (err < 0)
- goto err3;
+ goto err4;
for (i = 0; i < flowtable->ops_len; i++) {
if (!flowtable->ops[i].dev)
@@ -5216,37 +5220,35 @@
if (flowtable->ops[i].dev == ft->ops[k].dev &&
flowtable->ops[i].pf == ft->ops[k].pf) {
err = -EBUSY;
- goto err4;
+ goto err5;
}
}
}
err = nf_register_net_hook(net, &flowtable->ops[i]);
if (err < 0)
- goto err4;
+ goto err5;
}
err = nft_trans_flowtable_add(&ctx, NFT_MSG_NEWFLOWTABLE, flowtable);
if (err < 0)
- goto err5;
-
- INIT_DEFERRABLE_WORK(&flowtable->data.gc_work, type->gc);
- queue_delayed_work(system_power_efficient_wq,
- &flowtable->data.gc_work, HZ);
+ goto err6;
list_add_tail_rcu(&flowtable->list, &table->flowtables);
table->use++;
return 0;
-err5:
+err6:
i = flowtable->ops_len;
-err4:
+err5:
for (k = i - 1; k >= 0; k--) {
kfree(flowtable->dev_name[k]);
nf_unregister_net_hook(net, &flowtable->ops[k]);
}
kfree(flowtable->ops);
+err4:
+ flowtable->data.type->free(&flowtable->data);
err3:
module_put(type->owner);
err2:
@@ -5266,6 +5268,7 @@
u8 genmask = nft_genmask_next(net);
int family = nfmsg->nfgen_family;
struct nft_flowtable *flowtable;
+ const struct nlattr *attr;
struct nft_table *table;
struct nft_ctx ctx;
@@ -5274,23 +5277,29 @@
!nla[NFTA_FLOWTABLE_HANDLE]))
return -EINVAL;
- table = nf_tables_table_lookup(net, nla[NFTA_FLOWTABLE_TABLE],
- family, genmask);
- if (IS_ERR(table))
+ table = nft_table_lookup(net, nla[NFTA_FLOWTABLE_TABLE], family,
+ genmask);
+ if (IS_ERR(table)) {
+ NL_SET_BAD_ATTR(extack, nla[NFTA_FLOWTABLE_TABLE]);
return PTR_ERR(table);
+ }
- if (nla[NFTA_FLOWTABLE_HANDLE])
- flowtable = nf_tables_flowtable_lookup_byhandle(table,
- nla[NFTA_FLOWTABLE_HANDLE],
- genmask);
- else
- flowtable = nf_tables_flowtable_lookup(table,
- nla[NFTA_FLOWTABLE_NAME],
- genmask);
- if (IS_ERR(flowtable))
- return PTR_ERR(flowtable);
- if (flowtable->use > 0)
+ if (nla[NFTA_FLOWTABLE_HANDLE]) {
+ attr = nla[NFTA_FLOWTABLE_HANDLE];
+ flowtable = nft_flowtable_lookup_byhandle(table, attr, genmask);
+ } else {
+ attr = nla[NFTA_FLOWTABLE_NAME];
+ flowtable = nft_flowtable_lookup(table, attr, genmask);
+ }
+
+ if (IS_ERR(flowtable)) {
+ NL_SET_BAD_ATTR(extack, attr);
+ return PTR_ERR(flowtable);
+ }
+ if (flowtable->use > 0) {
+ NL_SET_BAD_ATTR(extack, attr);
return -EBUSY;
+ }
nft_ctx_init(&ctx, net, skb, nlh, family, table, NULL, nla);
@@ -5471,13 +5480,13 @@
if (!nla[NFTA_FLOWTABLE_NAME])
return -EINVAL;
- table = nf_tables_table_lookup(net, nla[NFTA_FLOWTABLE_TABLE],
- family, genmask);
+ table = nft_table_lookup(net, nla[NFTA_FLOWTABLE_TABLE], family,
+ genmask);
if (IS_ERR(table))
return PTR_ERR(table);
- flowtable = nf_tables_flowtable_lookup(table, nla[NFTA_FLOWTABLE_NAME],
- genmask);
+ flowtable = nft_flowtable_lookup(table, nla[NFTA_FLOWTABLE_NAME],
+ genmask);
if (IS_ERR(flowtable))
return PTR_ERR(flowtable);
@@ -5530,11 +5539,9 @@
static void nf_tables_flowtable_destroy(struct nft_flowtable *flowtable)
{
- cancel_delayed_work_sync(&flowtable->data.gc_work);
kfree(flowtable->ops);
kfree(flowtable->name);
flowtable->data.type->free(&flowtable->data);
- rhashtable_destroy(&flowtable->data.rhashtable);
module_put(flowtable->data.type->owner);
}
@@ -6459,8 +6466,8 @@
case NFT_GOTO:
if (!tb[NFTA_VERDICT_CHAIN])
return -EINVAL;
- chain = nf_tables_chain_lookup(ctx->table,
- tb[NFTA_VERDICT_CHAIN], genmask);
+ chain = nft_chain_lookup(ctx->table, tb[NFTA_VERDICT_CHAIN],
+ genmask);
if (IS_ERR(chain))
return PTR_ERR(chain);
if (nft_is_base_chain(chain))
diff --git a/net/netfilter/nf_tables_core.c b/net/netfilter/nf_tables_core.c
index 942702a..4f46d2f 100644
--- a/net/netfilter/nf_tables_core.c
+++ b/net/netfilter/nf_tables_core.c
@@ -41,7 +41,7 @@
static noinline void __nft_trace_packet(struct nft_traceinfo *info,
const struct nft_chain *chain,
- int rulenum, enum nft_trace_types type)
+ enum nft_trace_types type)
{
const struct nft_pktinfo *pkt = info->pkt;
@@ -52,22 +52,16 @@
info->type = type;
nft_trace_notify(info);
-
- nf_log_trace(nft_net(pkt), nft_pf(pkt), nft_hook(pkt), pkt->skb,
- nft_in(pkt), nft_out(pkt), &trace_loginfo,
- "TRACE: %s:%s:%s:%u ",
- chain->table->name, chain->name, comments[type], rulenum);
}
static inline void nft_trace_packet(struct nft_traceinfo *info,
const struct nft_chain *chain,
const struct nft_rule *rule,
- int rulenum,
enum nft_trace_types type)
{
if (static_branch_unlikely(&nft_trace_enabled)) {
info->rule = rule;
- __nft_trace_packet(info, chain, rulenum, type);
+ __nft_trace_packet(info, chain, type);
}
}
@@ -140,7 +134,6 @@
struct nft_jumpstack {
const struct nft_chain *chain;
const struct nft_rule *rule;
- int rulenum;
};
unsigned int
@@ -153,7 +146,6 @@
struct nft_regs regs;
unsigned int stackptr = 0;
struct nft_jumpstack jumpstack[NFT_JUMP_STACK_SIZE];
- int rulenum;
unsigned int gencursor = nft_genmask_cur(net);
struct nft_traceinfo info;
@@ -161,7 +153,6 @@
if (static_branch_unlikely(&nft_trace_enabled))
nft_trace_init(&info, pkt, ®s.verdict, basechain);
do_chain:
- rulenum = 0;
rule = list_entry(&chain->rules, struct nft_rule, list);
next_rule:
regs.verdict.code = NFT_CONTINUE;
@@ -171,8 +162,6 @@
if (unlikely(rule->genmask & gencursor))
continue;
- rulenum++;
-
nft_rule_for_each_expr(expr, last, rule) {
if (expr->ops == &nft_cmp_fast_ops)
nft_cmp_fast_eval(expr, ®s);
@@ -190,7 +179,7 @@
continue;
case NFT_CONTINUE:
nft_trace_packet(&info, chain, rule,
- rulenum, NFT_TRACETYPE_RULE);
+ NFT_TRACETYPE_RULE);
continue;
}
break;
@@ -202,7 +191,7 @@
case NF_QUEUE:
case NF_STOLEN:
nft_trace_packet(&info, chain, rule,
- rulenum, NFT_TRACETYPE_RULE);
+ NFT_TRACETYPE_RULE);
return regs.verdict.code;
}
@@ -211,21 +200,19 @@
BUG_ON(stackptr >= NFT_JUMP_STACK_SIZE);
jumpstack[stackptr].chain = chain;
jumpstack[stackptr].rule = rule;
- jumpstack[stackptr].rulenum = rulenum;
stackptr++;
/* fall through */
case NFT_GOTO:
nft_trace_packet(&info, chain, rule,
- rulenum, NFT_TRACETYPE_RULE);
+ NFT_TRACETYPE_RULE);
chain = regs.verdict.chain;
goto do_chain;
case NFT_CONTINUE:
- rulenum++;
/* fall through */
case NFT_RETURN:
nft_trace_packet(&info, chain, rule,
- rulenum, NFT_TRACETYPE_RETURN);
+ NFT_TRACETYPE_RETURN);
break;
default:
WARN_ON(1);
@@ -235,12 +222,10 @@
stackptr--;
chain = jumpstack[stackptr].chain;
rule = jumpstack[stackptr].rule;
- rulenum = jumpstack[stackptr].rulenum;
goto next_rule;
}
- nft_trace_packet(&info, basechain, NULL, -1,
- NFT_TRACETYPE_POLICY);
+ nft_trace_packet(&info, basechain, NULL, NFT_TRACETYPE_POLICY);
if (static_branch_unlikely(&nft_counters_enabled))
nft_update_chain_stats(basechain, pkt);
@@ -258,6 +243,9 @@
&nft_payload_type,
&nft_dynset_type,
&nft_range_type,
+ &nft_meta_type,
+ &nft_rt_type,
+ &nft_exthdr_type,
};
int __init nf_tables_core_module_init(void)
diff --git a/net/netfilter/nfnetlink_log.c b/net/netfilter/nfnetlink_log.c
index 7b46aa4..e5cc4d9 100644
--- a/net/netfilter/nfnetlink_log.c
+++ b/net/netfilter/nfnetlink_log.c
@@ -37,7 +37,6 @@
#include <net/sock.h>
#include <net/netfilter/nf_log.h>
#include <net/netns/generic.h>
-#include <net/netfilter/nfnetlink_log.h>
#include <linux/atomic.h>
#include <linux/refcount.h>
@@ -47,6 +46,7 @@
#include "../bridge/br_private.h"
#endif
+#define NFULNL_COPY_DISABLED 0xff
#define NFULNL_NLBUFSIZ_DEFAULT NLMSG_GOODSIZE
#define NFULNL_TIMEOUT_DEFAULT 100 /* every second */
#define NFULNL_QTHRESH_DEFAULT 100 /* 100 packets */
@@ -618,7 +618,7 @@
};
/* log handler for internal netfilter logging api */
-void
+static void
nfulnl_log_packet(struct net *net,
u_int8_t pf,
unsigned int hooknum,
@@ -633,7 +633,7 @@
struct nfulnl_instance *inst;
const struct nf_loginfo *li;
unsigned int qthreshold;
- unsigned int plen;
+ unsigned int plen = 0;
struct nfnl_log_net *log = nfnl_log_pernet(net);
const struct nfnl_ct_hook *nfnl_ct = NULL;
struct nf_conn *ct = NULL;
@@ -648,7 +648,6 @@
if (!inst)
return;
- plen = 0;
if (prefix)
plen = strlen(prefix) + 1;
@@ -760,7 +759,6 @@
/* FIXME: statistics */
goto unlock_and_release;
}
-EXPORT_SYMBOL_GPL(nfulnl_log_packet);
static int
nfulnl_rcv_nl_event(struct notifier_block *this,
diff --git a/net/netfilter/nfnetlink_queue.c b/net/netfilter/nfnetlink_queue.c
index 74a0463..2c17304 100644
--- a/net/netfilter/nfnetlink_queue.c
+++ b/net/netfilter/nfnetlink_queue.c
@@ -227,6 +227,25 @@
return entry;
}
+static void nfqnl_reinject(struct nf_queue_entry *entry, unsigned int verdict)
+{
+ struct nf_ct_hook *ct_hook;
+ int err;
+
+ if (verdict == NF_ACCEPT ||
+ verdict == NF_STOP) {
+ rcu_read_lock();
+ ct_hook = rcu_dereference(nf_ct_hook);
+ if (ct_hook) {
+ err = ct_hook->update(entry->state.net, entry->skb);
+ if (err < 0)
+ verdict = NF_DROP;
+ }
+ rcu_read_unlock();
+ }
+ nf_reinject(entry, verdict);
+}
+
static void
nfqnl_flush(struct nfqnl_instance *queue, nfqnl_cmpfn cmpfn, unsigned long data)
{
@@ -237,7 +256,7 @@
if (!cmpfn || cmpfn(entry, data)) {
list_del(&entry->list);
queue->queue_total--;
- nf_reinject(entry, NF_DROP);
+ nfqnl_reinject(entry, NF_DROP);
}
}
spin_unlock_bh(&queue->lock);
@@ -686,7 +705,7 @@
err_out_unlock:
spin_unlock_bh(&queue->lock);
if (failopen)
- nf_reinject(entry, NF_ACCEPT);
+ nfqnl_reinject(entry, NF_ACCEPT);
err_out:
return err;
}
@@ -1085,7 +1104,8 @@
list_for_each_entry_safe(entry, tmp, &batch_list, list) {
if (nfqa[NFQA_MARK])
entry->skb->mark = ntohl(nla_get_be32(nfqa[NFQA_MARK]));
- nf_reinject(entry, verdict);
+
+ nfqnl_reinject(entry, verdict);
}
return 0;
}
@@ -1208,7 +1228,7 @@
if (nfqa[NFQA_MARK])
entry->skb->mark = ntohl(nla_get_be32(nfqa[NFQA_MARK]));
- nf_reinject(entry, verdict);
+ nfqnl_reinject(entry, verdict);
return 0;
}
diff --git a/net/netfilter/nft_dynset.c b/net/netfilter/nft_dynset.c
index 04863fa..b07a3fd 100644
--- a/net/netfilter/nft_dynset.c
+++ b/net/netfilter/nft_dynset.c
@@ -36,7 +36,7 @@
u64 timeout;
void *elem;
- if (set->size && !atomic_add_unless(&set->nelems, 1, set->size))
+ if (!atomic_add_unless(&set->nelems, 1, set->size))
return NULL;
timeout = priv->timeout ? : set->timeout;
@@ -81,7 +81,7 @@
if (priv->op == NFT_DYNSET_OP_UPDATE &&
nft_set_ext_exists(ext, NFT_SET_EXT_EXPIRATION)) {
timeout = priv->timeout ? : set->timeout;
- *nft_set_ext_expiration(ext) = jiffies + timeout;
+ *nft_set_ext_expiration(ext) = get_jiffies_64() + timeout;
}
if (sexpr != NULL)
@@ -216,6 +216,9 @@
if (err < 0)
goto err1;
+ if (set->size == 0)
+ set->size = 0xffff;
+
priv->set = set;
return 0;
diff --git a/net/netfilter/nft_exthdr.c b/net/netfilter/nft_exthdr.c
index 47ec104..a940c9f 100644
--- a/net/netfilter/nft_exthdr.c
+++ b/net/netfilter/nft_exthdr.c
@@ -10,11 +10,10 @@
#include <asm/unaligned.h>
#include <linux/kernel.h>
-#include <linux/init.h>
-#include <linux/module.h>
#include <linux/netlink.h>
#include <linux/netfilter.h>
#include <linux/netfilter/nf_tables.h>
+#include <net/netfilter/nf_tables_core.h>
#include <net/netfilter/nf_tables.h>
#include <net/tcp.h>
@@ -353,7 +352,6 @@
return nft_exthdr_dump_common(skb, priv);
}
-static struct nft_expr_type nft_exthdr_type;
static const struct nft_expr_ops nft_exthdr_ipv6_ops = {
.type = &nft_exthdr_type,
.size = NFT_EXPR_SIZE(sizeof(struct nft_exthdr)),
@@ -407,27 +405,10 @@
return ERR_PTR(-EOPNOTSUPP);
}
-static struct nft_expr_type nft_exthdr_type __read_mostly = {
+struct nft_expr_type nft_exthdr_type __read_mostly = {
.name = "exthdr",
.select_ops = nft_exthdr_select_ops,
.policy = nft_exthdr_policy,
.maxattr = NFTA_EXTHDR_MAX,
.owner = THIS_MODULE,
};
-
-static int __init nft_exthdr_module_init(void)
-{
- return nft_register_expr(&nft_exthdr_type);
-}
-
-static void __exit nft_exthdr_module_exit(void)
-{
- nft_unregister_expr(&nft_exthdr_type);
-}
-
-module_init(nft_exthdr_module_init);
-module_exit(nft_exthdr_module_exit);
-
-MODULE_LICENSE("GPL");
-MODULE_AUTHOR("Patrick McHardy <kaber@trash.net>");
-MODULE_ALIAS_NFT_EXPR("exthdr");
diff --git a/net/netfilter/nft_flow_offload.c b/net/netfilter/nft_flow_offload.c
index b65829b..d6bab8c 100644
--- a/net/netfilter/nft_flow_offload.c
+++ b/net/netfilter/nft_flow_offload.c
@@ -142,9 +142,8 @@
if (!tb[NFTA_FLOW_TABLE_NAME])
return -EINVAL;
- flowtable = nf_tables_flowtable_lookup(ctx->table,
- tb[NFTA_FLOW_TABLE_NAME],
- genmask);
+ flowtable = nft_flowtable_lookup(ctx->table, tb[NFTA_FLOW_TABLE_NAME],
+ genmask);
if (IS_ERR(flowtable))
return PTR_ERR(flowtable);
diff --git a/net/netfilter/nft_hash.c b/net/netfilter/nft_hash.c
index 24f2f75..f0fc21f 100644
--- a/net/netfilter/nft_hash.c
+++ b/net/netfilter/nft_hash.c
@@ -25,6 +25,7 @@
u32 modulus;
u32 seed;
u32 offset;
+ struct nft_set *map;
};
static void nft_jhash_eval(const struct nft_expr *expr,
@@ -35,14 +36,39 @@
const void *data = ®s->data[priv->sreg];
u32 h;
- h = reciprocal_scale(jhash(data, priv->len, priv->seed), priv->modulus);
+ h = reciprocal_scale(jhash(data, priv->len, priv->seed),
+ priv->modulus);
+
regs->data[priv->dreg] = h + priv->offset;
}
+static void nft_jhash_map_eval(const struct nft_expr *expr,
+ struct nft_regs *regs,
+ const struct nft_pktinfo *pkt)
+{
+ struct nft_jhash *priv = nft_expr_priv(expr);
+ const void *data = ®s->data[priv->sreg];
+ const struct nft_set *map = priv->map;
+ const struct nft_set_ext *ext;
+ u32 result;
+ bool found;
+
+ result = reciprocal_scale(jhash(data, priv->len, priv->seed),
+ priv->modulus) + priv->offset;
+
+ found = map->ops->lookup(nft_net(pkt), map, &result, &ext);
+ if (!found)
+ return;
+
+ nft_data_copy(®s->data[priv->dreg],
+ nft_set_ext_data(ext), map->dlen);
+}
+
struct nft_symhash {
enum nft_registers dreg:8;
u32 modulus;
u32 offset;
+ struct nft_set *map;
};
static void nft_symhash_eval(const struct nft_expr *expr,
@@ -58,6 +84,28 @@
regs->data[priv->dreg] = h + priv->offset;
}
+static void nft_symhash_map_eval(const struct nft_expr *expr,
+ struct nft_regs *regs,
+ const struct nft_pktinfo *pkt)
+{
+ struct nft_symhash *priv = nft_expr_priv(expr);
+ struct sk_buff *skb = pkt->skb;
+ const struct nft_set *map = priv->map;
+ const struct nft_set_ext *ext;
+ u32 result;
+ bool found;
+
+ result = reciprocal_scale(__skb_get_hash_symmetric(skb),
+ priv->modulus) + priv->offset;
+
+ found = map->ops->lookup(nft_net(pkt), map, &result, &ext);
+ if (!found)
+ return;
+
+ nft_data_copy(®s->data[priv->dreg],
+ nft_set_ext_data(ext), map->dlen);
+}
+
static const struct nla_policy nft_hash_policy[NFTA_HASH_MAX + 1] = {
[NFTA_HASH_SREG] = { .type = NLA_U32 },
[NFTA_HASH_DREG] = { .type = NLA_U32 },
@@ -66,6 +114,9 @@
[NFTA_HASH_SEED] = { .type = NLA_U32 },
[NFTA_HASH_OFFSET] = { .type = NLA_U32 },
[NFTA_HASH_TYPE] = { .type = NLA_U32 },
+ [NFTA_HASH_SET_NAME] = { .type = NLA_STRING,
+ .len = NFT_SET_MAXNAMELEN - 1 },
+ [NFTA_HASH_SET_ID] = { .type = NLA_U32 },
};
static int nft_jhash_init(const struct nft_ctx *ctx,
@@ -97,7 +148,7 @@
priv->len = len;
priv->modulus = ntohl(nla_get_be32(tb[NFTA_HASH_MODULUS]));
- if (priv->modulus <= 1)
+ if (priv->modulus < 1)
return -ERANGE;
if (priv->offset + priv->modulus - 1 < priv->offset)
@@ -115,6 +166,23 @@
NFT_DATA_VALUE, sizeof(u32));
}
+static int nft_jhash_map_init(const struct nft_ctx *ctx,
+ const struct nft_expr *expr,
+ const struct nlattr * const tb[])
+{
+ struct nft_jhash *priv = nft_expr_priv(expr);
+ u8 genmask = nft_genmask_next(ctx->net);
+
+ nft_jhash_init(ctx, expr, tb);
+ priv->map = nft_set_lookup_global(ctx->net, ctx->table,
+ tb[NFTA_HASH_SET_NAME],
+ tb[NFTA_HASH_SET_ID], genmask);
+ if (IS_ERR(priv->map))
+ return PTR_ERR(priv->map);
+
+ return 0;
+}
+
static int nft_symhash_init(const struct nft_ctx *ctx,
const struct nft_expr *expr,
const struct nlattr * const tb[])
@@ -141,6 +209,23 @@
NFT_DATA_VALUE, sizeof(u32));
}
+static int nft_symhash_map_init(const struct nft_ctx *ctx,
+ const struct nft_expr *expr,
+ const struct nlattr * const tb[])
+{
+ struct nft_jhash *priv = nft_expr_priv(expr);
+ u8 genmask = nft_genmask_next(ctx->net);
+
+ nft_symhash_init(ctx, expr, tb);
+ priv->map = nft_set_lookup_global(ctx->net, ctx->table,
+ tb[NFTA_HASH_SET_NAME],
+ tb[NFTA_HASH_SET_ID], genmask);
+ if (IS_ERR(priv->map))
+ return PTR_ERR(priv->map);
+
+ return 0;
+}
+
static int nft_jhash_dump(struct sk_buff *skb,
const struct nft_expr *expr)
{
@@ -168,6 +253,18 @@
return -1;
}
+static int nft_jhash_map_dump(struct sk_buff *skb,
+ const struct nft_expr *expr)
+{
+ const struct nft_jhash *priv = nft_expr_priv(expr);
+
+ if (nft_jhash_dump(skb, expr) ||
+ nla_put_string(skb, NFTA_HASH_SET_NAME, priv->map->name))
+ return -1;
+
+ return 0;
+}
+
static int nft_symhash_dump(struct sk_buff *skb,
const struct nft_expr *expr)
{
@@ -188,6 +285,18 @@
return -1;
}
+static int nft_symhash_map_dump(struct sk_buff *skb,
+ const struct nft_expr *expr)
+{
+ const struct nft_symhash *priv = nft_expr_priv(expr);
+
+ if (nft_symhash_dump(skb, expr) ||
+ nla_put_string(skb, NFTA_HASH_SET_NAME, priv->map->name))
+ return -1;
+
+ return 0;
+}
+
static struct nft_expr_type nft_hash_type;
static const struct nft_expr_ops nft_jhash_ops = {
.type = &nft_hash_type,
@@ -197,6 +306,14 @@
.dump = nft_jhash_dump,
};
+static const struct nft_expr_ops nft_jhash_map_ops = {
+ .type = &nft_hash_type,
+ .size = NFT_EXPR_SIZE(sizeof(struct nft_jhash)),
+ .eval = nft_jhash_map_eval,
+ .init = nft_jhash_map_init,
+ .dump = nft_jhash_map_dump,
+};
+
static const struct nft_expr_ops nft_symhash_ops = {
.type = &nft_hash_type,
.size = NFT_EXPR_SIZE(sizeof(struct nft_symhash)),
@@ -205,6 +322,14 @@
.dump = nft_symhash_dump,
};
+static const struct nft_expr_ops nft_symhash_map_ops = {
+ .type = &nft_hash_type,
+ .size = NFT_EXPR_SIZE(sizeof(struct nft_symhash)),
+ .eval = nft_symhash_map_eval,
+ .init = nft_symhash_map_init,
+ .dump = nft_symhash_map_dump,
+};
+
static const struct nft_expr_ops *
nft_hash_select_ops(const struct nft_ctx *ctx,
const struct nlattr * const tb[])
@@ -217,8 +342,12 @@
type = ntohl(nla_get_be32(tb[NFTA_HASH_TYPE]));
switch (type) {
case NFT_HASH_SYM:
+ if (tb[NFTA_HASH_SET_NAME])
+ return &nft_symhash_map_ops;
return &nft_symhash_ops;
case NFT_HASH_JENKINS:
+ if (tb[NFTA_HASH_SET_NAME])
+ return &nft_jhash_map_ops;
return &nft_jhash_ops;
default:
break;
diff --git a/net/netfilter/nft_meta.c b/net/netfilter/nft_meta.c
index 8fb91940..5348bd0 100644
--- a/net/netfilter/nft_meta.c
+++ b/net/netfilter/nft_meta.c
@@ -1,5 +1,7 @@
/*
* Copyright (c) 2008-2009 Patrick McHardy <kaber@trash.net>
+ * Copyright (c) 2014 Intel Corporation
+ * Author: Tomasz Bursztyka <tomasz.bursztyka@linux.intel.com>
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License version 2 as
@@ -9,8 +11,6 @@
*/
#include <linux/kernel.h>
-#include <linux/init.h>
-#include <linux/module.h>
#include <linux/netlink.h>
#include <linux/netfilter.h>
#include <linux/netfilter/nf_tables.h>
@@ -24,21 +24,35 @@
#include <net/tcp_states.h> /* for TCP_TIME_WAIT */
#include <net/netfilter/nf_tables.h>
#include <net/netfilter/nf_tables_core.h>
-#include <net/netfilter/nft_meta.h>
#include <uapi/linux/netfilter_bridge.h> /* NF_BR_PRE_ROUTING */
+struct nft_meta {
+ enum nft_meta_keys key:8;
+ union {
+ enum nft_registers dreg:8;
+ enum nft_registers sreg:8;
+ };
+};
+
static DEFINE_PER_CPU(struct rnd_state, nft_prandom_state);
-void nft_meta_get_eval(const struct nft_expr *expr,
- struct nft_regs *regs,
- const struct nft_pktinfo *pkt)
+#ifdef CONFIG_NF_TABLES_BRIDGE
+#include "../bridge/br_private.h"
+#endif
+
+static void nft_meta_get_eval(const struct nft_expr *expr,
+ struct nft_regs *regs,
+ const struct nft_pktinfo *pkt)
{
const struct nft_meta *priv = nft_expr_priv(expr);
const struct sk_buff *skb = pkt->skb;
const struct net_device *in = nft_in(pkt), *out = nft_out(pkt);
struct sock *sk;
u32 *dest = ®s->data[priv->dreg];
+#ifdef CONFIG_NF_TABLES_BRIDGE
+ const struct net_bridge_port *p;
+#endif
switch (priv->key) {
case NFT_META_LEN:
@@ -215,6 +229,18 @@
nft_reg_store8(dest, !!skb->sp);
break;
#endif
+#ifdef CONFIG_NF_TABLES_BRIDGE
+ case NFT_META_BRI_IIFNAME:
+ if (in == NULL || (p = br_port_get_rcu(in)) == NULL)
+ goto err;
+ strncpy((char *)dest, p->br->dev->name, IFNAMSIZ);
+ return;
+ case NFT_META_BRI_OIFNAME:
+ if (out == NULL || (p = br_port_get_rcu(out)) == NULL)
+ goto err;
+ strncpy((char *)dest, p->br->dev->name, IFNAMSIZ);
+ return;
+#endif
default:
WARN_ON(1);
goto err;
@@ -224,11 +250,10 @@
err:
regs->verdict.code = NFT_BREAK;
}
-EXPORT_SYMBOL_GPL(nft_meta_get_eval);
-void nft_meta_set_eval(const struct nft_expr *expr,
- struct nft_regs *regs,
- const struct nft_pktinfo *pkt)
+static void nft_meta_set_eval(const struct nft_expr *expr,
+ struct nft_regs *regs,
+ const struct nft_pktinfo *pkt)
{
const struct nft_meta *meta = nft_expr_priv(expr);
struct sk_buff *skb = pkt->skb;
@@ -258,18 +283,16 @@
WARN_ON(1);
}
}
-EXPORT_SYMBOL_GPL(nft_meta_set_eval);
-const struct nla_policy nft_meta_policy[NFTA_META_MAX + 1] = {
+static const struct nla_policy nft_meta_policy[NFTA_META_MAX + 1] = {
[NFTA_META_DREG] = { .type = NLA_U32 },
[NFTA_META_KEY] = { .type = NLA_U32 },
[NFTA_META_SREG] = { .type = NLA_U32 },
};
-EXPORT_SYMBOL_GPL(nft_meta_policy);
-int nft_meta_get_init(const struct nft_ctx *ctx,
- const struct nft_expr *expr,
- const struct nlattr * const tb[])
+static int nft_meta_get_init(const struct nft_ctx *ctx,
+ const struct nft_expr *expr,
+ const struct nlattr * const tb[])
{
struct nft_meta *priv = nft_expr_priv(expr);
unsigned int len;
@@ -318,6 +341,14 @@
len = sizeof(u8);
break;
#endif
+#ifdef CONFIG_NF_TABLES_BRIDGE
+ case NFT_META_BRI_IIFNAME:
+ case NFT_META_BRI_OIFNAME:
+ if (ctx->family != NFPROTO_BRIDGE)
+ return -EOPNOTSUPP;
+ len = IFNAMSIZ;
+ break;
+#endif
default:
return -EOPNOTSUPP;
}
@@ -326,7 +357,6 @@
return nft_validate_register_store(ctx, priv->dreg, NULL,
NFT_DATA_VALUE, len);
}
-EXPORT_SYMBOL_GPL(nft_meta_get_init);
static int nft_meta_get_validate(const struct nft_ctx *ctx,
const struct nft_expr *expr,
@@ -360,9 +390,9 @@
#endif
}
-int nft_meta_set_validate(const struct nft_ctx *ctx,
- const struct nft_expr *expr,
- const struct nft_data **data)
+static int nft_meta_set_validate(const struct nft_ctx *ctx,
+ const struct nft_expr *expr,
+ const struct nft_data **data)
{
struct nft_meta *priv = nft_expr_priv(expr);
unsigned int hooks;
@@ -388,11 +418,10 @@
return nft_chain_validate_hooks(ctx->chain, hooks);
}
-EXPORT_SYMBOL_GPL(nft_meta_set_validate);
-int nft_meta_set_init(const struct nft_ctx *ctx,
- const struct nft_expr *expr,
- const struct nlattr * const tb[])
+static int nft_meta_set_init(const struct nft_ctx *ctx,
+ const struct nft_expr *expr,
+ const struct nlattr * const tb[])
{
struct nft_meta *priv = nft_expr_priv(expr);
unsigned int len;
@@ -424,10 +453,9 @@
return 0;
}
-EXPORT_SYMBOL_GPL(nft_meta_set_init);
-int nft_meta_get_dump(struct sk_buff *skb,
- const struct nft_expr *expr)
+static int nft_meta_get_dump(struct sk_buff *skb,
+ const struct nft_expr *expr)
{
const struct nft_meta *priv = nft_expr_priv(expr);
@@ -440,10 +468,8 @@
nla_put_failure:
return -1;
}
-EXPORT_SYMBOL_GPL(nft_meta_get_dump);
-int nft_meta_set_dump(struct sk_buff *skb,
- const struct nft_expr *expr)
+static int nft_meta_set_dump(struct sk_buff *skb, const struct nft_expr *expr)
{
const struct nft_meta *priv = nft_expr_priv(expr);
@@ -457,19 +483,16 @@
nla_put_failure:
return -1;
}
-EXPORT_SYMBOL_GPL(nft_meta_set_dump);
-void nft_meta_set_destroy(const struct nft_ctx *ctx,
- const struct nft_expr *expr)
+static void nft_meta_set_destroy(const struct nft_ctx *ctx,
+ const struct nft_expr *expr)
{
const struct nft_meta *priv = nft_expr_priv(expr);
if (priv->key == NFT_META_NFTRACE)
static_branch_dec(&nft_trace_enabled);
}
-EXPORT_SYMBOL_GPL(nft_meta_set_destroy);
-static struct nft_expr_type nft_meta_type;
static const struct nft_expr_ops nft_meta_get_ops = {
.type = &nft_meta_type,
.size = NFT_EXPR_SIZE(sizeof(struct nft_meta)),
@@ -508,27 +531,10 @@
return ERR_PTR(-EINVAL);
}
-static struct nft_expr_type nft_meta_type __read_mostly = {
+struct nft_expr_type nft_meta_type __read_mostly = {
.name = "meta",
.select_ops = nft_meta_select_ops,
.policy = nft_meta_policy,
.maxattr = NFTA_META_MAX,
.owner = THIS_MODULE,
};
-
-static int __init nft_meta_module_init(void)
-{
- return nft_register_expr(&nft_meta_type);
-}
-
-static void __exit nft_meta_module_exit(void)
-{
- nft_unregister_expr(&nft_meta_type);
-}
-
-module_init(nft_meta_module_init);
-module_exit(nft_meta_module_exit);
-
-MODULE_LICENSE("GPL");
-MODULE_AUTHOR("Patrick McHardy <kaber@trash.net>");
-MODULE_ALIAS_NFT_EXPR("meta");
diff --git a/net/netfilter/nft_nat.c b/net/netfilter/nft_nat.c
index 1f36954..c15807d 100644
--- a/net/netfilter/nft_nat.c
+++ b/net/netfilter/nft_nat.c
@@ -43,7 +43,7 @@
const struct nft_nat *priv = nft_expr_priv(expr);
enum ip_conntrack_info ctinfo;
struct nf_conn *ct = nf_ct_get(pkt->skb, &ctinfo);
- struct nf_nat_range range;
+ struct nf_nat_range2 range;
memset(&range, 0, sizeof(range));
if (priv->sreg_addr_min) {
diff --git a/net/netfilter/nft_numgen.c b/net/netfilter/nft_numgen.c
index 5a3a52c..cdbc62a 100644
--- a/net/netfilter/nft_numgen.c
+++ b/net/netfilter/nft_numgen.c
@@ -24,13 +24,11 @@
u32 modulus;
atomic_t counter;
u32 offset;
+ struct nft_set *map;
};
-static void nft_ng_inc_eval(const struct nft_expr *expr,
- struct nft_regs *regs,
- const struct nft_pktinfo *pkt)
+static u32 nft_ng_inc_gen(struct nft_ng_inc *priv)
{
- struct nft_ng_inc *priv = nft_expr_priv(expr);
u32 nval, oval;
do {
@@ -38,7 +36,36 @@
nval = (oval + 1 < priv->modulus) ? oval + 1 : 0;
} while (atomic_cmpxchg(&priv->counter, oval, nval) != oval);
- regs->data[priv->dreg] = nval + priv->offset;
+ return nval + priv->offset;
+}
+
+static void nft_ng_inc_eval(const struct nft_expr *expr,
+ struct nft_regs *regs,
+ const struct nft_pktinfo *pkt)
+{
+ struct nft_ng_inc *priv = nft_expr_priv(expr);
+
+ regs->data[priv->dreg] = nft_ng_inc_gen(priv);
+}
+
+static void nft_ng_inc_map_eval(const struct nft_expr *expr,
+ struct nft_regs *regs,
+ const struct nft_pktinfo *pkt)
+{
+ struct nft_ng_inc *priv = nft_expr_priv(expr);
+ const struct nft_set *map = priv->map;
+ const struct nft_set_ext *ext;
+ u32 result;
+ bool found;
+
+ result = nft_ng_inc_gen(priv);
+ found = map->ops->lookup(nft_net(pkt), map, &result, &ext);
+
+ if (!found)
+ return;
+
+ nft_data_copy(®s->data[priv->dreg],
+ nft_set_ext_data(ext), map->dlen);
}
static const struct nla_policy nft_ng_policy[NFTA_NG_MAX + 1] = {
@@ -46,6 +73,9 @@
[NFTA_NG_MODULUS] = { .type = NLA_U32 },
[NFTA_NG_TYPE] = { .type = NLA_U32 },
[NFTA_NG_OFFSET] = { .type = NLA_U32 },
+ [NFTA_NG_SET_NAME] = { .type = NLA_STRING,
+ .len = NFT_SET_MAXNAMELEN - 1 },
+ [NFTA_NG_SET_ID] = { .type = NLA_U32 },
};
static int nft_ng_inc_init(const struct nft_ctx *ctx,
@@ -71,6 +101,25 @@
NFT_DATA_VALUE, sizeof(u32));
}
+static int nft_ng_inc_map_init(const struct nft_ctx *ctx,
+ const struct nft_expr *expr,
+ const struct nlattr * const tb[])
+{
+ struct nft_ng_inc *priv = nft_expr_priv(expr);
+ u8 genmask = nft_genmask_next(ctx->net);
+
+ nft_ng_inc_init(ctx, expr, tb);
+
+ priv->map = nft_set_lookup_global(ctx->net, ctx->table,
+ tb[NFTA_NG_SET_NAME],
+ tb[NFTA_NG_SET_ID], genmask);
+
+ if (IS_ERR(priv->map))
+ return PTR_ERR(priv->map);
+
+ return 0;
+}
+
static int nft_ng_dump(struct sk_buff *skb, enum nft_registers dreg,
u32 modulus, enum nft_ng_types type, u32 offset)
{
@@ -97,22 +146,63 @@
priv->offset);
}
+static int nft_ng_inc_map_dump(struct sk_buff *skb,
+ const struct nft_expr *expr)
+{
+ const struct nft_ng_inc *priv = nft_expr_priv(expr);
+
+ if (nft_ng_dump(skb, priv->dreg, priv->modulus,
+ NFT_NG_INCREMENTAL, priv->offset) ||
+ nla_put_string(skb, NFTA_NG_SET_NAME, priv->map->name))
+ goto nla_put_failure;
+
+ return 0;
+
+nla_put_failure:
+ return -1;
+}
+
struct nft_ng_random {
enum nft_registers dreg:8;
u32 modulus;
u32 offset;
+ struct nft_set *map;
};
+static u32 nft_ng_random_gen(struct nft_ng_random *priv)
+{
+ struct rnd_state *state = this_cpu_ptr(&nft_numgen_prandom_state);
+
+ return reciprocal_scale(prandom_u32_state(state), priv->modulus) +
+ priv->offset;
+}
+
static void nft_ng_random_eval(const struct nft_expr *expr,
struct nft_regs *regs,
const struct nft_pktinfo *pkt)
{
struct nft_ng_random *priv = nft_expr_priv(expr);
- struct rnd_state *state = this_cpu_ptr(&nft_numgen_prandom_state);
- u32 val;
- val = reciprocal_scale(prandom_u32_state(state), priv->modulus);
- regs->data[priv->dreg] = val + priv->offset;
+ regs->data[priv->dreg] = nft_ng_random_gen(priv);
+}
+
+static void nft_ng_random_map_eval(const struct nft_expr *expr,
+ struct nft_regs *regs,
+ const struct nft_pktinfo *pkt)
+{
+ struct nft_ng_random *priv = nft_expr_priv(expr);
+ const struct nft_set *map = priv->map;
+ const struct nft_set_ext *ext;
+ u32 result;
+ bool found;
+
+ result = nft_ng_random_gen(priv);
+ found = map->ops->lookup(nft_net(pkt), map, &result, &ext);
+ if (!found)
+ return;
+
+ nft_data_copy(®s->data[priv->dreg],
+ nft_set_ext_data(ext), map->dlen);
}
static int nft_ng_random_init(const struct nft_ctx *ctx,
@@ -139,6 +229,23 @@
NFT_DATA_VALUE, sizeof(u32));
}
+static int nft_ng_random_map_init(const struct nft_ctx *ctx,
+ const struct nft_expr *expr,
+ const struct nlattr * const tb[])
+{
+ struct nft_ng_random *priv = nft_expr_priv(expr);
+ u8 genmask = nft_genmask_next(ctx->net);
+
+ nft_ng_random_init(ctx, expr, tb);
+ priv->map = nft_set_lookup_global(ctx->net, ctx->table,
+ tb[NFTA_NG_SET_NAME],
+ tb[NFTA_NG_SET_ID], genmask);
+ if (IS_ERR(priv->map))
+ return PTR_ERR(priv->map);
+
+ return 0;
+}
+
static int nft_ng_random_dump(struct sk_buff *skb, const struct nft_expr *expr)
{
const struct nft_ng_random *priv = nft_expr_priv(expr);
@@ -147,6 +254,22 @@
priv->offset);
}
+static int nft_ng_random_map_dump(struct sk_buff *skb,
+ const struct nft_expr *expr)
+{
+ const struct nft_ng_random *priv = nft_expr_priv(expr);
+
+ if (nft_ng_dump(skb, priv->dreg, priv->modulus,
+ NFT_NG_RANDOM, priv->offset) ||
+ nla_put_string(skb, NFTA_NG_SET_NAME, priv->map->name))
+ goto nla_put_failure;
+
+ return 0;
+
+nla_put_failure:
+ return -1;
+}
+
static struct nft_expr_type nft_ng_type;
static const struct nft_expr_ops nft_ng_inc_ops = {
.type = &nft_ng_type,
@@ -156,6 +279,14 @@
.dump = nft_ng_inc_dump,
};
+static const struct nft_expr_ops nft_ng_inc_map_ops = {
+ .type = &nft_ng_type,
+ .size = NFT_EXPR_SIZE(sizeof(struct nft_ng_inc)),
+ .eval = nft_ng_inc_map_eval,
+ .init = nft_ng_inc_map_init,
+ .dump = nft_ng_inc_map_dump,
+};
+
static const struct nft_expr_ops nft_ng_random_ops = {
.type = &nft_ng_type,
.size = NFT_EXPR_SIZE(sizeof(struct nft_ng_random)),
@@ -164,6 +295,14 @@
.dump = nft_ng_random_dump,
};
+static const struct nft_expr_ops nft_ng_random_map_ops = {
+ .type = &nft_ng_type,
+ .size = NFT_EXPR_SIZE(sizeof(struct nft_ng_random)),
+ .eval = nft_ng_random_map_eval,
+ .init = nft_ng_random_map_init,
+ .dump = nft_ng_random_map_dump,
+};
+
static const struct nft_expr_ops *
nft_ng_select_ops(const struct nft_ctx *ctx, const struct nlattr * const tb[])
{
@@ -178,8 +317,12 @@
switch (type) {
case NFT_NG_INCREMENTAL:
+ if (tb[NFTA_NG_SET_NAME])
+ return &nft_ng_inc_map_ops;
return &nft_ng_inc_ops;
case NFT_NG_RANDOM:
+ if (tb[NFTA_NG_SET_NAME])
+ return &nft_ng_random_map_ops;
return &nft_ng_random_ops;
}
diff --git a/net/netfilter/nft_objref.c b/net/netfilter/nft_objref.c
index 0b02407..cdf348f 100644
--- a/net/netfilter/nft_objref.c
+++ b/net/netfilter/nft_objref.c
@@ -38,8 +38,8 @@
return -EINVAL;
objtype = ntohl(nla_get_be32(tb[NFTA_OBJREF_IMM_TYPE]));
- obj = nf_tables_obj_lookup(ctx->table, tb[NFTA_OBJREF_IMM_NAME], objtype,
- genmask);
+ obj = nft_obj_lookup(ctx->table, tb[NFTA_OBJREF_IMM_NAME], objtype,
+ genmask);
if (IS_ERR(obj))
return -ENOENT;
diff --git a/net/netfilter/nft_rt.c b/net/netfilter/nft_rt.c
index 11a2071..76dba9f 100644
--- a/net/netfilter/nft_rt.c
+++ b/net/netfilter/nft_rt.c
@@ -7,8 +7,6 @@
*/
#include <linux/kernel.h>
-#include <linux/init.h>
-#include <linux/module.h>
#include <linux/netlink.h>
#include <linux/netfilter.h>
#include <linux/netfilter/nf_tables.h>
@@ -179,7 +177,6 @@
return nft_chain_validate_hooks(ctx->chain, hooks);
}
-static struct nft_expr_type nft_rt_type;
static const struct nft_expr_ops nft_rt_get_ops = {
.type = &nft_rt_type,
.size = NFT_EXPR_SIZE(sizeof(struct nft_rt)),
@@ -189,27 +186,10 @@
.validate = nft_rt_validate,
};
-static struct nft_expr_type nft_rt_type __read_mostly = {
+struct nft_expr_type nft_rt_type __read_mostly = {
.name = "rt",
.ops = &nft_rt_get_ops,
.policy = nft_rt_policy,
.maxattr = NFTA_RT_MAX,
.owner = THIS_MODULE,
};
-
-static int __init nft_rt_module_init(void)
-{
- return nft_register_expr(&nft_rt_type);
-}
-
-static void __exit nft_rt_module_exit(void)
-{
- nft_unregister_expr(&nft_rt_type);
-}
-
-module_init(nft_rt_module_init);
-module_exit(nft_rt_module_exit);
-
-MODULE_LICENSE("GPL");
-MODULE_AUTHOR("Anders K. Pedersen <akp@cohaesio.com>");
-MODULE_ALIAS_NFT_EXPR("rt");
diff --git a/net/netfilter/nft_set_bitmap.c b/net/netfilter/nft_set_bitmap.c
index 45fb275..d6626e0 100644
--- a/net/netfilter/nft_set_bitmap.c
+++ b/net/netfilter/nft_set_bitmap.c
@@ -296,27 +296,23 @@
return true;
}
-static struct nft_set_type nft_bitmap_type;
-static struct nft_set_ops nft_bitmap_ops __read_mostly = {
- .type = &nft_bitmap_type,
- .privsize = nft_bitmap_privsize,
- .elemsize = offsetof(struct nft_bitmap_elem, ext),
- .estimate = nft_bitmap_estimate,
- .init = nft_bitmap_init,
- .destroy = nft_bitmap_destroy,
- .insert = nft_bitmap_insert,
- .remove = nft_bitmap_remove,
- .deactivate = nft_bitmap_deactivate,
- .flush = nft_bitmap_flush,
- .activate = nft_bitmap_activate,
- .lookup = nft_bitmap_lookup,
- .walk = nft_bitmap_walk,
- .get = nft_bitmap_get,
-};
-
static struct nft_set_type nft_bitmap_type __read_mostly = {
- .ops = &nft_bitmap_ops,
.owner = THIS_MODULE,
+ .ops = {
+ .privsize = nft_bitmap_privsize,
+ .elemsize = offsetof(struct nft_bitmap_elem, ext),
+ .estimate = nft_bitmap_estimate,
+ .init = nft_bitmap_init,
+ .destroy = nft_bitmap_destroy,
+ .insert = nft_bitmap_insert,
+ .remove = nft_bitmap_remove,
+ .deactivate = nft_bitmap_deactivate,
+ .flush = nft_bitmap_flush,
+ .activate = nft_bitmap_activate,
+ .lookup = nft_bitmap_lookup,
+ .walk = nft_bitmap_walk,
+ .get = nft_bitmap_get,
+ },
};
static int __init nft_bitmap_module_init(void)
diff --git a/net/netfilter/nft_set_hash.c b/net/netfilter/nft_set_hash.c
index fc9c6d5..dbf1f4ad 100644
--- a/net/netfilter/nft_set_hash.c
+++ b/net/netfilter/nft_set_hash.c
@@ -605,6 +605,12 @@
static bool nft_hash_estimate(const struct nft_set_desc *desc, u32 features,
struct nft_set_estimate *est)
{
+ if (!desc->size)
+ return false;
+
+ if (desc->klen == 4)
+ return false;
+
est->size = sizeof(struct nft_hash) +
nft_hash_buckets(desc->size) * sizeof(struct hlist_head) +
desc->size * sizeof(struct nft_hash_elem);
@@ -614,91 +620,100 @@
return true;
}
-static struct nft_set_type nft_hash_type;
-static struct nft_set_ops nft_rhash_ops __read_mostly = {
- .type = &nft_hash_type,
- .privsize = nft_rhash_privsize,
- .elemsize = offsetof(struct nft_rhash_elem, ext),
- .estimate = nft_rhash_estimate,
- .init = nft_rhash_init,
- .destroy = nft_rhash_destroy,
- .insert = nft_rhash_insert,
- .activate = nft_rhash_activate,
- .deactivate = nft_rhash_deactivate,
- .flush = nft_rhash_flush,
- .remove = nft_rhash_remove,
- .lookup = nft_rhash_lookup,
- .update = nft_rhash_update,
- .walk = nft_rhash_walk,
- .get = nft_rhash_get,
- .features = NFT_SET_MAP | NFT_SET_OBJECT | NFT_SET_TIMEOUT,
-};
-
-static struct nft_set_ops nft_hash_ops __read_mostly = {
- .type = &nft_hash_type,
- .privsize = nft_hash_privsize,
- .elemsize = offsetof(struct nft_hash_elem, ext),
- .estimate = nft_hash_estimate,
- .init = nft_hash_init,
- .destroy = nft_hash_destroy,
- .insert = nft_hash_insert,
- .activate = nft_hash_activate,
- .deactivate = nft_hash_deactivate,
- .flush = nft_hash_flush,
- .remove = nft_hash_remove,
- .lookup = nft_hash_lookup,
- .walk = nft_hash_walk,
- .get = nft_hash_get,
- .features = NFT_SET_MAP | NFT_SET_OBJECT,
-};
-
-static struct nft_set_ops nft_hash_fast_ops __read_mostly = {
- .type = &nft_hash_type,
- .privsize = nft_hash_privsize,
- .elemsize = offsetof(struct nft_hash_elem, ext),
- .estimate = nft_hash_estimate,
- .init = nft_hash_init,
- .destroy = nft_hash_destroy,
- .insert = nft_hash_insert,
- .activate = nft_hash_activate,
- .deactivate = nft_hash_deactivate,
- .flush = nft_hash_flush,
- .remove = nft_hash_remove,
- .lookup = nft_hash_lookup_fast,
- .walk = nft_hash_walk,
- .get = nft_hash_get,
- .features = NFT_SET_MAP | NFT_SET_OBJECT,
-};
-
-static const struct nft_set_ops *
-nft_hash_select_ops(const struct nft_ctx *ctx, const struct nft_set_desc *desc,
- u32 flags)
+static bool nft_hash_fast_estimate(const struct nft_set_desc *desc, u32 features,
+ struct nft_set_estimate *est)
{
- if (desc->size && !(flags & (NFT_SET_EVAL | NFT_SET_TIMEOUT))) {
- switch (desc->klen) {
- case 4:
- return &nft_hash_fast_ops;
- default:
- return &nft_hash_ops;
- }
- }
+ if (!desc->size)
+ return false;
- return &nft_rhash_ops;
+ if (desc->klen != 4)
+ return false;
+
+ est->size = sizeof(struct nft_hash) +
+ nft_hash_buckets(desc->size) * sizeof(struct hlist_head) +
+ desc->size * sizeof(struct nft_hash_elem);
+ est->lookup = NFT_SET_CLASS_O_1;
+ est->space = NFT_SET_CLASS_O_N;
+
+ return true;
}
-static struct nft_set_type nft_hash_type __read_mostly = {
- .select_ops = nft_hash_select_ops,
+static struct nft_set_type nft_rhash_type __read_mostly = {
.owner = THIS_MODULE,
+ .features = NFT_SET_MAP | NFT_SET_OBJECT |
+ NFT_SET_TIMEOUT | NFT_SET_EVAL,
+ .ops = {
+ .privsize = nft_rhash_privsize,
+ .elemsize = offsetof(struct nft_rhash_elem, ext),
+ .estimate = nft_rhash_estimate,
+ .init = nft_rhash_init,
+ .destroy = nft_rhash_destroy,
+ .insert = nft_rhash_insert,
+ .activate = nft_rhash_activate,
+ .deactivate = nft_rhash_deactivate,
+ .flush = nft_rhash_flush,
+ .remove = nft_rhash_remove,
+ .lookup = nft_rhash_lookup,
+ .update = nft_rhash_update,
+ .walk = nft_rhash_walk,
+ .get = nft_rhash_get,
+ },
+};
+
+static struct nft_set_type nft_hash_type __read_mostly = {
+ .owner = THIS_MODULE,
+ .features = NFT_SET_MAP | NFT_SET_OBJECT,
+ .ops = {
+ .privsize = nft_hash_privsize,
+ .elemsize = offsetof(struct nft_hash_elem, ext),
+ .estimate = nft_hash_estimate,
+ .init = nft_hash_init,
+ .destroy = nft_hash_destroy,
+ .insert = nft_hash_insert,
+ .activate = nft_hash_activate,
+ .deactivate = nft_hash_deactivate,
+ .flush = nft_hash_flush,
+ .remove = nft_hash_remove,
+ .lookup = nft_hash_lookup,
+ .walk = nft_hash_walk,
+ .get = nft_hash_get,
+ },
+};
+
+static struct nft_set_type nft_hash_fast_type __read_mostly = {
+ .owner = THIS_MODULE,
+ .features = NFT_SET_MAP | NFT_SET_OBJECT,
+ .ops = {
+ .privsize = nft_hash_privsize,
+ .elemsize = offsetof(struct nft_hash_elem, ext),
+ .estimate = nft_hash_fast_estimate,
+ .init = nft_hash_init,
+ .destroy = nft_hash_destroy,
+ .insert = nft_hash_insert,
+ .activate = nft_hash_activate,
+ .deactivate = nft_hash_deactivate,
+ .flush = nft_hash_flush,
+ .remove = nft_hash_remove,
+ .lookup = nft_hash_lookup_fast,
+ .walk = nft_hash_walk,
+ .get = nft_hash_get,
+ },
};
static int __init nft_hash_module_init(void)
{
- return nft_register_set(&nft_hash_type);
+ if (nft_register_set(&nft_hash_fast_type) ||
+ nft_register_set(&nft_hash_type) ||
+ nft_register_set(&nft_rhash_type))
+ return 1;
+ return 0;
}
static void __exit nft_hash_module_exit(void)
{
+ nft_unregister_set(&nft_rhash_type);
nft_unregister_set(&nft_hash_type);
+ nft_unregister_set(&nft_hash_fast_type);
}
module_init(nft_hash_module_init);
diff --git a/net/netfilter/nft_set_rbtree.c b/net/netfilter/nft_set_rbtree.c
index e6f08bc..d260ce2 100644
--- a/net/netfilter/nft_set_rbtree.c
+++ b/net/netfilter/nft_set_rbtree.c
@@ -22,6 +22,7 @@
struct rb_root root;
rwlock_t lock;
seqcount_t count;
+ struct delayed_work gc_work;
};
struct nft_rbtree_elem {
@@ -265,6 +266,7 @@
struct nft_rbtree_elem *rbe = elem->priv;
nft_set_elem_change_active(net, set, &rbe->ext);
+ nft_set_elem_clear_busy(&rbe->ext);
}
static bool nft_rbtree_flush(const struct net *net,
@@ -272,8 +274,12 @@
{
struct nft_rbtree_elem *rbe = priv;
- nft_set_elem_change_active(net, set, &rbe->ext);
- return true;
+ if (!nft_set_elem_mark_busy(&rbe->ext) ||
+ !nft_is_active(net, &rbe->ext)) {
+ nft_set_elem_change_active(net, set, &rbe->ext);
+ return true;
+ }
+ return false;
}
static void *nft_rbtree_deactivate(const struct net *net,
@@ -347,6 +353,62 @@
read_unlock_bh(&priv->lock);
}
+static void nft_rbtree_gc(struct work_struct *work)
+{
+ struct nft_set_gc_batch *gcb = NULL;
+ struct rb_node *node, *prev = NULL;
+ struct nft_rbtree_elem *rbe;
+ struct nft_rbtree *priv;
+ struct nft_set *set;
+ int i;
+
+ priv = container_of(work, struct nft_rbtree, gc_work.work);
+ set = nft_set_container_of(priv);
+
+ write_lock_bh(&priv->lock);
+ write_seqcount_begin(&priv->count);
+ for (node = rb_first(&priv->root); node != NULL; node = rb_next(node)) {
+ rbe = rb_entry(node, struct nft_rbtree_elem, node);
+
+ if (nft_rbtree_interval_end(rbe)) {
+ prev = node;
+ continue;
+ }
+ if (!nft_set_elem_expired(&rbe->ext))
+ continue;
+ if (nft_set_elem_mark_busy(&rbe->ext))
+ continue;
+
+ gcb = nft_set_gc_batch_check(set, gcb, GFP_ATOMIC);
+ if (!gcb)
+ goto out;
+
+ atomic_dec(&set->nelems);
+ nft_set_gc_batch_add(gcb, rbe);
+
+ if (prev) {
+ rbe = rb_entry(prev, struct nft_rbtree_elem, node);
+ atomic_dec(&set->nelems);
+ nft_set_gc_batch_add(gcb, rbe);
+ }
+ node = rb_next(node);
+ }
+out:
+ if (gcb) {
+ for (i = 0; i < gcb->head.cnt; i++) {
+ rbe = gcb->elems[i];
+ rb_erase(&rbe->node, &priv->root);
+ }
+ }
+ write_seqcount_end(&priv->count);
+ write_unlock_bh(&priv->lock);
+
+ nft_set_gc_batch_complete(gcb);
+
+ queue_delayed_work(system_power_efficient_wq, &priv->gc_work,
+ nft_set_gc_interval(set));
+}
+
static unsigned int nft_rbtree_privsize(const struct nlattr * const nla[],
const struct nft_set_desc *desc)
{
@@ -362,6 +424,12 @@
rwlock_init(&priv->lock);
seqcount_init(&priv->count);
priv->root = RB_ROOT;
+
+ INIT_DEFERRABLE_WORK(&priv->gc_work, nft_rbtree_gc);
+ if (set->flags & NFT_SET_TIMEOUT)
+ queue_delayed_work(system_power_efficient_wq, &priv->gc_work,
+ nft_set_gc_interval(set));
+
return 0;
}
@@ -371,6 +439,7 @@
struct nft_rbtree_elem *rbe;
struct rb_node *node;
+ cancel_delayed_work_sync(&priv->gc_work);
while ((node = priv->root.rb_node) != NULL) {
rb_erase(node, &priv->root);
rbe = rb_entry(node, struct nft_rbtree_elem, node);
@@ -393,28 +462,24 @@
return true;
}
-static struct nft_set_type nft_rbtree_type;
-static struct nft_set_ops nft_rbtree_ops __read_mostly = {
- .type = &nft_rbtree_type,
- .privsize = nft_rbtree_privsize,
- .elemsize = offsetof(struct nft_rbtree_elem, ext),
- .estimate = nft_rbtree_estimate,
- .init = nft_rbtree_init,
- .destroy = nft_rbtree_destroy,
- .insert = nft_rbtree_insert,
- .remove = nft_rbtree_remove,
- .deactivate = nft_rbtree_deactivate,
- .flush = nft_rbtree_flush,
- .activate = nft_rbtree_activate,
- .lookup = nft_rbtree_lookup,
- .walk = nft_rbtree_walk,
- .get = nft_rbtree_get,
- .features = NFT_SET_INTERVAL | NFT_SET_MAP | NFT_SET_OBJECT,
-};
-
static struct nft_set_type nft_rbtree_type __read_mostly = {
- .ops = &nft_rbtree_ops,
.owner = THIS_MODULE,
+ .features = NFT_SET_INTERVAL | NFT_SET_MAP | NFT_SET_OBJECT | NFT_SET_TIMEOUT,
+ .ops = {
+ .privsize = nft_rbtree_privsize,
+ .elemsize = offsetof(struct nft_rbtree_elem, ext),
+ .estimate = nft_rbtree_estimate,
+ .init = nft_rbtree_init,
+ .destroy = nft_rbtree_destroy,
+ .insert = nft_rbtree_insert,
+ .remove = nft_rbtree_remove,
+ .deactivate = nft_rbtree_deactivate,
+ .flush = nft_rbtree_flush,
+ .activate = nft_rbtree_activate,
+ .lookup = nft_rbtree_lookup,
+ .walk = nft_rbtree_walk,
+ .get = nft_rbtree_get,
+ },
};
static int __init nft_rbtree_module_init(void)
diff --git a/net/netfilter/xt_NETMAP.c b/net/netfilter/xt_NETMAP.c
index 58aa9dd..1d43787 100644
--- a/net/netfilter/xt_NETMAP.c
+++ b/net/netfilter/xt_NETMAP.c
@@ -21,8 +21,8 @@
static unsigned int
netmap_tg6(struct sk_buff *skb, const struct xt_action_param *par)
{
- const struct nf_nat_range *range = par->targinfo;
- struct nf_nat_range newrange;
+ const struct nf_nat_range2 *range = par->targinfo;
+ struct nf_nat_range2 newrange;
struct nf_conn *ct;
enum ip_conntrack_info ctinfo;
union nf_inet_addr new_addr, netmask;
@@ -56,7 +56,7 @@
static int netmap_tg6_checkentry(const struct xt_tgchk_param *par)
{
- const struct nf_nat_range *range = par->targinfo;
+ const struct nf_nat_range2 *range = par->targinfo;
if (!(range->flags & NF_NAT_RANGE_MAP_IPS))
return -EINVAL;
@@ -75,7 +75,7 @@
enum ip_conntrack_info ctinfo;
__be32 new_ip, netmask;
const struct nf_nat_ipv4_multi_range_compat *mr = par->targinfo;
- struct nf_nat_range newrange;
+ struct nf_nat_range2 newrange;
WARN_ON(xt_hooknum(par) != NF_INET_PRE_ROUTING &&
xt_hooknum(par) != NF_INET_POST_ROUTING &&
diff --git a/net/netfilter/xt_NFLOG.c b/net/netfilter/xt_NFLOG.c
index c7f8958..1ed0cac 100644
--- a/net/netfilter/xt_NFLOG.c
+++ b/net/netfilter/xt_NFLOG.c
@@ -13,7 +13,6 @@
#include <linux/netfilter/x_tables.h>
#include <linux/netfilter/xt_NFLOG.h>
#include <net/netfilter/nf_log.h>
-#include <net/netfilter/nfnetlink_log.h>
MODULE_AUTHOR("Patrick McHardy <kaber@trash.net>");
MODULE_DESCRIPTION("Xtables: packet logging to netlink using NFLOG");
@@ -37,8 +36,9 @@
if (info->flags & XT_NFLOG_F_COPY_LEN)
li.u.ulog.flags |= NF_LOG_F_COPY_LEN;
- nfulnl_log_packet(net, xt_family(par), xt_hooknum(par), skb,
- xt_in(par), xt_out(par), &li, info->prefix);
+ nf_log_packet(net, xt_family(par), xt_hooknum(par), skb, xt_in(par),
+ xt_out(par), &li, "%s", info->prefix);
+
return XT_CONTINUE;
}
@@ -50,7 +50,13 @@
return -EINVAL;
if (info->prefix[sizeof(info->prefix) - 1] != '\0')
return -EINVAL;
- return 0;
+
+ return nf_logger_find_get(par->family, NF_LOG_TYPE_ULOG);
+}
+
+static void nflog_tg_destroy(const struct xt_tgdtor_param *par)
+{
+ nf_logger_put(par->family, NF_LOG_TYPE_ULOG);
}
static struct xt_target nflog_tg_reg __read_mostly = {
@@ -58,6 +64,7 @@
.revision = 0,
.family = NFPROTO_UNSPEC,
.checkentry = nflog_tg_check,
+ .destroy = nflog_tg_destroy,
.target = nflog_tg,
.targetsize = sizeof(struct xt_nflog_info),
.me = THIS_MODULE,
diff --git a/net/netfilter/xt_REDIRECT.c b/net/netfilter/xt_REDIRECT.c
index 98a4c6d..5ce9461 100644
--- a/net/netfilter/xt_REDIRECT.c
+++ b/net/netfilter/xt_REDIRECT.c
@@ -36,7 +36,7 @@
static int redirect_tg6_checkentry(const struct xt_tgchk_param *par)
{
- const struct nf_nat_range *range = par->targinfo;
+ const struct nf_nat_range2 *range = par->targinfo;
if (range->flags & NF_NAT_RANGE_MAP_IPS)
return -EINVAL;
diff --git a/net/netfilter/xt_nat.c b/net/netfilter/xt_nat.c
index bdb689c..8af9707 100644
--- a/net/netfilter/xt_nat.c
+++ b/net/netfilter/xt_nat.c
@@ -37,11 +37,12 @@
nf_ct_netns_put(par->net, par->family);
}
-static void xt_nat_convert_range(struct nf_nat_range *dst,
+static void xt_nat_convert_range(struct nf_nat_range2 *dst,
const struct nf_nat_ipv4_range *src)
{
memset(&dst->min_addr, 0, sizeof(dst->min_addr));
memset(&dst->max_addr, 0, sizeof(dst->max_addr));
+ memset(&dst->base_proto, 0, sizeof(dst->base_proto));
dst->flags = src->flags;
dst->min_addr.ip = src->min_ip;
@@ -54,7 +55,7 @@
xt_snat_target_v0(struct sk_buff *skb, const struct xt_action_param *par)
{
const struct nf_nat_ipv4_multi_range_compat *mr = par->targinfo;
- struct nf_nat_range range;
+ struct nf_nat_range2 range;
enum ip_conntrack_info ctinfo;
struct nf_conn *ct;
@@ -71,7 +72,7 @@
xt_dnat_target_v0(struct sk_buff *skb, const struct xt_action_param *par)
{
const struct nf_nat_ipv4_multi_range_compat *mr = par->targinfo;
- struct nf_nat_range range;
+ struct nf_nat_range2 range;
enum ip_conntrack_info ctinfo;
struct nf_conn *ct;
@@ -86,7 +87,44 @@
static unsigned int
xt_snat_target_v1(struct sk_buff *skb, const struct xt_action_param *par)
{
- const struct nf_nat_range *range = par->targinfo;
+ const struct nf_nat_range *range_v1 = par->targinfo;
+ struct nf_nat_range2 range;
+ enum ip_conntrack_info ctinfo;
+ struct nf_conn *ct;
+
+ ct = nf_ct_get(skb, &ctinfo);
+ WARN_ON(!(ct != NULL &&
+ (ctinfo == IP_CT_NEW || ctinfo == IP_CT_RELATED ||
+ ctinfo == IP_CT_RELATED_REPLY)));
+
+ memcpy(&range, range_v1, sizeof(*range_v1));
+ memset(&range.base_proto, 0, sizeof(range.base_proto));
+
+ return nf_nat_setup_info(ct, &range, NF_NAT_MANIP_SRC);
+}
+
+static unsigned int
+xt_dnat_target_v1(struct sk_buff *skb, const struct xt_action_param *par)
+{
+ const struct nf_nat_range *range_v1 = par->targinfo;
+ struct nf_nat_range2 range;
+ enum ip_conntrack_info ctinfo;
+ struct nf_conn *ct;
+
+ ct = nf_ct_get(skb, &ctinfo);
+ WARN_ON(!(ct != NULL &&
+ (ctinfo == IP_CT_NEW || ctinfo == IP_CT_RELATED)));
+
+ memcpy(&range, range_v1, sizeof(*range_v1));
+ memset(&range.base_proto, 0, sizeof(range.base_proto));
+
+ return nf_nat_setup_info(ct, &range, NF_NAT_MANIP_DST);
+}
+
+static unsigned int
+xt_snat_target_v2(struct sk_buff *skb, const struct xt_action_param *par)
+{
+ const struct nf_nat_range2 *range = par->targinfo;
enum ip_conntrack_info ctinfo;
struct nf_conn *ct;
@@ -99,9 +137,9 @@
}
static unsigned int
-xt_dnat_target_v1(struct sk_buff *skb, const struct xt_action_param *par)
+xt_dnat_target_v2(struct sk_buff *skb, const struct xt_action_param *par)
{
- const struct nf_nat_range *range = par->targinfo;
+ const struct nf_nat_range2 *range = par->targinfo;
enum ip_conntrack_info ctinfo;
struct nf_conn *ct;
@@ -163,6 +201,28 @@
(1 << NF_INET_LOCAL_OUT),
.me = THIS_MODULE,
},
+ {
+ .name = "SNAT",
+ .revision = 2,
+ .checkentry = xt_nat_checkentry,
+ .destroy = xt_nat_destroy,
+ .target = xt_snat_target_v2,
+ .targetsize = sizeof(struct nf_nat_range2),
+ .table = "nat",
+ .hooks = (1 << NF_INET_POST_ROUTING) |
+ (1 << NF_INET_LOCAL_IN),
+ .me = THIS_MODULE,
+ },
+ {
+ .name = "DNAT",
+ .revision = 2,
+ .target = xt_dnat_target_v2,
+ .targetsize = sizeof(struct nf_nat_range2),
+ .table = "nat",
+ .hooks = (1 << NF_INET_PRE_ROUTING) |
+ (1 << NF_INET_LOCAL_OUT),
+ .me = THIS_MODULE,
+ },
};
static int __init xt_nat_init(void)
diff --git a/net/netfilter/xt_osf.c b/net/netfilter/xt_osf.c
index a34f314..9cfef73 100644
--- a/net/netfilter/xt_osf.c
+++ b/net/netfilter/xt_osf.c
@@ -37,21 +37,6 @@
#include <net/netfilter/nf_log.h>
#include <linux/netfilter/xt_osf.h>
-struct xt_osf_finger {
- struct rcu_head rcu_head;
- struct list_head finger_entry;
- struct xt_osf_user_finger finger;
-};
-
-enum osf_fmatch_states {
- /* Packet does not match the fingerprint */
- FMATCH_WRONG = 0,
- /* Packet matches the fingerprint */
- FMATCH_OK,
- /* Options do not match the fingerprint, but header does */
- FMATCH_OPT_WRONG,
-};
-
/*
* Indexed by dont-fragment bit.
* It is the only constant value in the fingerprint.
@@ -164,200 +149,17 @@
.cb = xt_osf_nfnetlink_callbacks,
};
-static inline int xt_osf_ttl(const struct sk_buff *skb, const struct xt_osf_info *info,
- unsigned char f_ttl)
-{
- const struct iphdr *ip = ip_hdr(skb);
-
- if (info->flags & XT_OSF_TTL) {
- if (info->ttl == XT_OSF_TTL_TRUE)
- return ip->ttl == f_ttl;
- if (info->ttl == XT_OSF_TTL_NOCHECK)
- return 1;
- else if (ip->ttl <= f_ttl)
- return 1;
- else {
- struct in_device *in_dev = __in_dev_get_rcu(skb->dev);
- int ret = 0;
-
- for_ifa(in_dev) {
- if (inet_ifa_match(ip->saddr, ifa)) {
- ret = (ip->ttl == f_ttl);
- break;
- }
- }
- endfor_ifa(in_dev);
-
- return ret;
- }
- }
-
- return ip->ttl == f_ttl;
-}
-
static bool
xt_osf_match_packet(const struct sk_buff *skb, struct xt_action_param *p)
{
const struct xt_osf_info *info = p->matchinfo;
- const struct iphdr *ip = ip_hdr(skb);
- const struct tcphdr *tcp;
- struct tcphdr _tcph;
- int fmatch = FMATCH_WRONG, fcount = 0;
- unsigned int optsize = 0, check_WSS = 0;
- u16 window, totlen, mss = 0;
- bool df;
- const unsigned char *optp = NULL, *_optp = NULL;
- unsigned char opts[MAX_IPOPTLEN];
- const struct xt_osf_finger *kf;
- const struct xt_osf_user_finger *f;
struct net *net = xt_net(p);
if (!info)
return false;
- tcp = skb_header_pointer(skb, ip_hdrlen(skb), sizeof(struct tcphdr), &_tcph);
- if (!tcp)
- return false;
-
- if (!tcp->syn)
- return false;
-
- totlen = ntohs(ip->tot_len);
- df = ntohs(ip->frag_off) & IP_DF;
- window = ntohs(tcp->window);
-
- if (tcp->doff * 4 > sizeof(struct tcphdr)) {
- optsize = tcp->doff * 4 - sizeof(struct tcphdr);
-
- _optp = optp = skb_header_pointer(skb, ip_hdrlen(skb) +
- sizeof(struct tcphdr), optsize, opts);
- }
-
- list_for_each_entry_rcu(kf, &xt_osf_fingers[df], finger_entry) {
- int foptsize, optnum;
-
- f = &kf->finger;
-
- if (!(info->flags & XT_OSF_LOG) && strcmp(info->genre, f->genre))
- continue;
-
- optp = _optp;
- fmatch = FMATCH_WRONG;
-
- if (totlen != f->ss || !xt_osf_ttl(skb, info, f->ttl))
- continue;
-
- /*
- * Should not happen if userspace parser was written correctly.
- */
- if (f->wss.wc >= OSF_WSS_MAX)
- continue;
-
- /* Check options */
-
- foptsize = 0;
- for (optnum = 0; optnum < f->opt_num; ++optnum)
- foptsize += f->opt[optnum].length;
-
- if (foptsize > MAX_IPOPTLEN ||
- optsize > MAX_IPOPTLEN ||
- optsize != foptsize)
- continue;
-
- check_WSS = f->wss.wc;
-
- for (optnum = 0; optnum < f->opt_num; ++optnum) {
- if (f->opt[optnum].kind == (*optp)) {
- __u32 len = f->opt[optnum].length;
- const __u8 *optend = optp + len;
-
- fmatch = FMATCH_OK;
-
- switch (*optp) {
- case OSFOPT_MSS:
- mss = optp[3];
- mss <<= 8;
- mss |= optp[2];
-
- mss = ntohs((__force __be16)mss);
- break;
- case OSFOPT_TS:
- break;
- }
-
- optp = optend;
- } else
- fmatch = FMATCH_OPT_WRONG;
-
- if (fmatch != FMATCH_OK)
- break;
- }
-
- if (fmatch != FMATCH_OPT_WRONG) {
- fmatch = FMATCH_WRONG;
-
- switch (check_WSS) {
- case OSF_WSS_PLAIN:
- if (f->wss.val == 0 || window == f->wss.val)
- fmatch = FMATCH_OK;
- break;
- case OSF_WSS_MSS:
- /*
- * Some smart modems decrease mangle MSS to
- * SMART_MSS_2, so we check standard, decreased
- * and the one provided in the fingerprint MSS
- * values.
- */
-#define SMART_MSS_1 1460
-#define SMART_MSS_2 1448
- if (window == f->wss.val * mss ||
- window == f->wss.val * SMART_MSS_1 ||
- window == f->wss.val * SMART_MSS_2)
- fmatch = FMATCH_OK;
- break;
- case OSF_WSS_MTU:
- if (window == f->wss.val * (mss + 40) ||
- window == f->wss.val * (SMART_MSS_1 + 40) ||
- window == f->wss.val * (SMART_MSS_2 + 40))
- fmatch = FMATCH_OK;
- break;
- case OSF_WSS_MODULO:
- if ((window % f->wss.val) == 0)
- fmatch = FMATCH_OK;
- break;
- }
- }
-
- if (fmatch != FMATCH_OK)
- continue;
-
- fcount++;
-
- if (info->flags & XT_OSF_LOG)
- nf_log_packet(net, xt_family(p), xt_hooknum(p), skb,
- xt_in(p), xt_out(p), NULL,
- "%s [%s:%s] : %pI4:%d -> %pI4:%d hops=%d\n",
- f->genre, f->version, f->subtype,
- &ip->saddr, ntohs(tcp->source),
- &ip->daddr, ntohs(tcp->dest),
- f->ttl - ip->ttl);
-
- if ((info->flags & XT_OSF_LOG) &&
- info->loglevel == XT_OSF_LOGLEVEL_FIRST)
- break;
- }
-
- if (!fcount && (info->flags & XT_OSF_LOG))
- nf_log_packet(net, xt_family(p), xt_hooknum(p), skb, xt_in(p),
- xt_out(p), NULL,
- "Remote OS is not known: %pI4:%u -> %pI4:%u\n",
- &ip->saddr, ntohs(tcp->source),
- &ip->daddr, ntohs(tcp->dest));
-
- if (fcount)
- fmatch = FMATCH_OK;
-
- return fmatch == FMATCH_OK;
+ return nf_osf_match(skb, xt_family(p), xt_hooknum(p), xt_in(p),
+ xt_out(p), info, net, xt_osf_fingers);
}
static struct xt_match xt_osf_match = {
diff --git a/net/openvswitch/Kconfig b/net/openvswitch/Kconfig
index 2650205..89da951 100644
--- a/net/openvswitch/Kconfig
+++ b/net/openvswitch/Kconfig
@@ -9,7 +9,8 @@
(NF_CONNTRACK && ((!NF_DEFRAG_IPV6 || NF_DEFRAG_IPV6) && \
(!NF_NAT || NF_NAT) && \
(!NF_NAT_IPV4 || NF_NAT_IPV4) && \
- (!NF_NAT_IPV6 || NF_NAT_IPV6)))
+ (!NF_NAT_IPV6 || NF_NAT_IPV6) && \
+ (!NETFILTER_CONNCOUNT || NETFILTER_CONNCOUNT)))
select LIBCRC32C
select MPLS
select NET_MPLS_GSO
diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c
index c5904f6..284aca2 100644
--- a/net/openvswitch/conntrack.c
+++ b/net/openvswitch/conntrack.c
@@ -16,8 +16,11 @@
#include <linux/tcp.h>
#include <linux/udp.h>
#include <linux/sctp.h>
+#include <linux/static_key.h>
#include <net/ip.h>
+#include <net/genetlink.h>
#include <net/netfilter/nf_conntrack_core.h>
+#include <net/netfilter/nf_conntrack_count.h>
#include <net/netfilter/nf_conntrack_helper.h>
#include <net/netfilter/nf_conntrack_labels.h>
#include <net/netfilter/nf_conntrack_seqadj.h>
@@ -72,10 +75,35 @@
struct md_mark mark;
struct md_labels labels;
#ifdef CONFIG_NF_NAT_NEEDED
- struct nf_nat_range range; /* Only present for SRC NAT and DST NAT. */
+ struct nf_nat_range2 range; /* Only present for SRC NAT and DST NAT. */
#endif
};
+#if IS_ENABLED(CONFIG_NETFILTER_CONNCOUNT)
+#define OVS_CT_LIMIT_UNLIMITED 0
+#define OVS_CT_LIMIT_DEFAULT OVS_CT_LIMIT_UNLIMITED
+#define CT_LIMIT_HASH_BUCKETS 512
+static DEFINE_STATIC_KEY_FALSE(ovs_ct_limit_enabled);
+
+struct ovs_ct_limit {
+ /* Elements in ovs_ct_limit_info->limits hash table */
+ struct hlist_node hlist_node;
+ struct rcu_head rcu;
+ u16 zone;
+ u32 limit;
+};
+
+struct ovs_ct_limit_info {
+ u32 default_limit;
+ struct hlist_head *limits;
+ struct nf_conncount_data *data;
+};
+
+static const struct nla_policy ct_limit_policy[OVS_CT_LIMIT_ATTR_MAX + 1] = {
+ [OVS_CT_LIMIT_ATTR_ZONE_LIMIT] = { .type = NLA_NESTED, },
+};
+#endif
+
static bool labels_nonzero(const struct ovs_key_ct_labels *labels);
static void __ovs_ct_free_action(struct ovs_conntrack_info *ct_info);
@@ -710,7 +738,7 @@
*/
static int ovs_ct_nat_execute(struct sk_buff *skb, struct nf_conn *ct,
enum ip_conntrack_info ctinfo,
- const struct nf_nat_range *range,
+ const struct nf_nat_range2 *range,
enum nf_nat_manip_type maniptype)
{
int hooknum, nh_off, err = NF_ACCEPT;
@@ -1036,6 +1064,89 @@
return false;
}
+#if IS_ENABLED(CONFIG_NETFILTER_CONNCOUNT)
+static struct hlist_head *ct_limit_hash_bucket(
+ const struct ovs_ct_limit_info *info, u16 zone)
+{
+ return &info->limits[zone & (CT_LIMIT_HASH_BUCKETS - 1)];
+}
+
+/* Call with ovs_mutex */
+static void ct_limit_set(const struct ovs_ct_limit_info *info,
+ struct ovs_ct_limit *new_ct_limit)
+{
+ struct ovs_ct_limit *ct_limit;
+ struct hlist_head *head;
+
+ head = ct_limit_hash_bucket(info, new_ct_limit->zone);
+ hlist_for_each_entry_rcu(ct_limit, head, hlist_node) {
+ if (ct_limit->zone == new_ct_limit->zone) {
+ hlist_replace_rcu(&ct_limit->hlist_node,
+ &new_ct_limit->hlist_node);
+ kfree_rcu(ct_limit, rcu);
+ return;
+ }
+ }
+
+ hlist_add_head_rcu(&new_ct_limit->hlist_node, head);
+}
+
+/* Call with ovs_mutex */
+static void ct_limit_del(const struct ovs_ct_limit_info *info, u16 zone)
+{
+ struct ovs_ct_limit *ct_limit;
+ struct hlist_head *head;
+ struct hlist_node *n;
+
+ head = ct_limit_hash_bucket(info, zone);
+ hlist_for_each_entry_safe(ct_limit, n, head, hlist_node) {
+ if (ct_limit->zone == zone) {
+ hlist_del_rcu(&ct_limit->hlist_node);
+ kfree_rcu(ct_limit, rcu);
+ return;
+ }
+ }
+}
+
+/* Call with RCU read lock */
+static u32 ct_limit_get(const struct ovs_ct_limit_info *info, u16 zone)
+{
+ struct ovs_ct_limit *ct_limit;
+ struct hlist_head *head;
+
+ head = ct_limit_hash_bucket(info, zone);
+ hlist_for_each_entry_rcu(ct_limit, head, hlist_node) {
+ if (ct_limit->zone == zone)
+ return ct_limit->limit;
+ }
+
+ return info->default_limit;
+}
+
+static int ovs_ct_check_limit(struct net *net,
+ const struct ovs_conntrack_info *info,
+ const struct nf_conntrack_tuple *tuple)
+{
+ struct ovs_net *ovs_net = net_generic(net, ovs_net_id);
+ const struct ovs_ct_limit_info *ct_limit_info = ovs_net->ct_limit_info;
+ u32 per_zone_limit, connections;
+ u32 conncount_key;
+
+ conncount_key = info->zone.id;
+
+ per_zone_limit = ct_limit_get(ct_limit_info, info->zone.id);
+ if (per_zone_limit == OVS_CT_LIMIT_UNLIMITED)
+ return 0;
+
+ connections = nf_conncount_count(net, ct_limit_info->data,
+ &conncount_key, tuple, &info->zone);
+ if (connections > per_zone_limit)
+ return -ENOMEM;
+
+ return 0;
+}
+#endif
+
/* Lookup connection and confirm if unconfirmed. */
static int ovs_ct_commit(struct net *net, struct sw_flow_key *key,
const struct ovs_conntrack_info *info,
@@ -1054,6 +1165,21 @@
if (!ct)
return 0;
+#if IS_ENABLED(CONFIG_NETFILTER_CONNCOUNT)
+ if (static_branch_unlikely(&ovs_ct_limit_enabled)) {
+ if (!nf_ct_is_confirmed(ct)) {
+ err = ovs_ct_check_limit(net, info,
+ &ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple);
+ if (err) {
+ net_warn_ratelimited("openvswitch: zone: %u "
+ "execeeds conntrack limit\n",
+ info->zone.id);
+ return err;
+ }
+ }
+ }
+#endif
+
/* Set the conntrack event mask if given. NEW and DELETE events have
* their own groups, but the NFNLGRP_CONNTRACK_UPDATE group listener
* typically would receive many kinds of updates. Setting the event
@@ -1655,7 +1781,420 @@
nf_ct_tmpl_free(ct_info->ct);
}
-void ovs_ct_init(struct net *net)
+#if IS_ENABLED(CONFIG_NETFILTER_CONNCOUNT)
+static int ovs_ct_limit_init(struct net *net, struct ovs_net *ovs_net)
+{
+ int i, err;
+
+ ovs_net->ct_limit_info = kmalloc(sizeof(*ovs_net->ct_limit_info),
+ GFP_KERNEL);
+ if (!ovs_net->ct_limit_info)
+ return -ENOMEM;
+
+ ovs_net->ct_limit_info->default_limit = OVS_CT_LIMIT_DEFAULT;
+ ovs_net->ct_limit_info->limits =
+ kmalloc_array(CT_LIMIT_HASH_BUCKETS, sizeof(struct hlist_head),
+ GFP_KERNEL);
+ if (!ovs_net->ct_limit_info->limits) {
+ kfree(ovs_net->ct_limit_info);
+ return -ENOMEM;
+ }
+
+ for (i = 0; i < CT_LIMIT_HASH_BUCKETS; i++)
+ INIT_HLIST_HEAD(&ovs_net->ct_limit_info->limits[i]);
+
+ ovs_net->ct_limit_info->data =
+ nf_conncount_init(net, NFPROTO_INET, sizeof(u32));
+
+ if (IS_ERR(ovs_net->ct_limit_info->data)) {
+ err = PTR_ERR(ovs_net->ct_limit_info->data);
+ kfree(ovs_net->ct_limit_info->limits);
+ kfree(ovs_net->ct_limit_info);
+ pr_err("openvswitch: failed to init nf_conncount %d\n", err);
+ return err;
+ }
+ return 0;
+}
+
+static void ovs_ct_limit_exit(struct net *net, struct ovs_net *ovs_net)
+{
+ const struct ovs_ct_limit_info *info = ovs_net->ct_limit_info;
+ int i;
+
+ nf_conncount_destroy(net, NFPROTO_INET, info->data);
+ for (i = 0; i < CT_LIMIT_HASH_BUCKETS; ++i) {
+ struct hlist_head *head = &info->limits[i];
+ struct ovs_ct_limit *ct_limit;
+
+ hlist_for_each_entry_rcu(ct_limit, head, hlist_node)
+ kfree_rcu(ct_limit, rcu);
+ }
+ kfree(ovs_net->ct_limit_info->limits);
+ kfree(ovs_net->ct_limit_info);
+}
+
+static struct sk_buff *
+ovs_ct_limit_cmd_reply_start(struct genl_info *info, u8 cmd,
+ struct ovs_header **ovs_reply_header)
+{
+ struct ovs_header *ovs_header = info->userhdr;
+ struct sk_buff *skb;
+
+ skb = genlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL);
+ if (!skb)
+ return ERR_PTR(-ENOMEM);
+
+ *ovs_reply_header = genlmsg_put(skb, info->snd_portid,
+ info->snd_seq,
+ &dp_ct_limit_genl_family, 0, cmd);
+
+ if (!*ovs_reply_header) {
+ nlmsg_free(skb);
+ return ERR_PTR(-EMSGSIZE);
+ }
+ (*ovs_reply_header)->dp_ifindex = ovs_header->dp_ifindex;
+
+ return skb;
+}
+
+static bool check_zone_id(int zone_id, u16 *pzone)
+{
+ if (zone_id >= 0 && zone_id <= 65535) {
+ *pzone = (u16)zone_id;
+ return true;
+ }
+ return false;
+}
+
+static int ovs_ct_limit_set_zone_limit(struct nlattr *nla_zone_limit,
+ struct ovs_ct_limit_info *info)
+{
+ struct ovs_zone_limit *zone_limit;
+ int rem;
+ u16 zone;
+
+ rem = NLA_ALIGN(nla_len(nla_zone_limit));
+ zone_limit = (struct ovs_zone_limit *)nla_data(nla_zone_limit);
+
+ while (rem >= sizeof(*zone_limit)) {
+ if (unlikely(zone_limit->zone_id ==
+ OVS_ZONE_LIMIT_DEFAULT_ZONE)) {
+ ovs_lock();
+ info->default_limit = zone_limit->limit;
+ ovs_unlock();
+ } else if (unlikely(!check_zone_id(
+ zone_limit->zone_id, &zone))) {
+ OVS_NLERR(true, "zone id is out of range");
+ } else {
+ struct ovs_ct_limit *ct_limit;
+
+ ct_limit = kmalloc(sizeof(*ct_limit), GFP_KERNEL);
+ if (!ct_limit)
+ return -ENOMEM;
+
+ ct_limit->zone = zone;
+ ct_limit->limit = zone_limit->limit;
+
+ ovs_lock();
+ ct_limit_set(info, ct_limit);
+ ovs_unlock();
+ }
+ rem -= NLA_ALIGN(sizeof(*zone_limit));
+ zone_limit = (struct ovs_zone_limit *)((u8 *)zone_limit +
+ NLA_ALIGN(sizeof(*zone_limit)));
+ }
+
+ if (rem)
+ OVS_NLERR(true, "set zone limit has %d unknown bytes", rem);
+
+ return 0;
+}
+
+static int ovs_ct_limit_del_zone_limit(struct nlattr *nla_zone_limit,
+ struct ovs_ct_limit_info *info)
+{
+ struct ovs_zone_limit *zone_limit;
+ int rem;
+ u16 zone;
+
+ rem = NLA_ALIGN(nla_len(nla_zone_limit));
+ zone_limit = (struct ovs_zone_limit *)nla_data(nla_zone_limit);
+
+ while (rem >= sizeof(*zone_limit)) {
+ if (unlikely(zone_limit->zone_id ==
+ OVS_ZONE_LIMIT_DEFAULT_ZONE)) {
+ ovs_lock();
+ info->default_limit = OVS_CT_LIMIT_DEFAULT;
+ ovs_unlock();
+ } else if (unlikely(!check_zone_id(
+ zone_limit->zone_id, &zone))) {
+ OVS_NLERR(true, "zone id is out of range");
+ } else {
+ ovs_lock();
+ ct_limit_del(info, zone);
+ ovs_unlock();
+ }
+ rem -= NLA_ALIGN(sizeof(*zone_limit));
+ zone_limit = (struct ovs_zone_limit *)((u8 *)zone_limit +
+ NLA_ALIGN(sizeof(*zone_limit)));
+ }
+
+ if (rem)
+ OVS_NLERR(true, "del zone limit has %d unknown bytes", rem);
+
+ return 0;
+}
+
+static int ovs_ct_limit_get_default_limit(struct ovs_ct_limit_info *info,
+ struct sk_buff *reply)
+{
+ struct ovs_zone_limit zone_limit;
+ int err;
+
+ zone_limit.zone_id = OVS_ZONE_LIMIT_DEFAULT_ZONE;
+ zone_limit.limit = info->default_limit;
+ err = nla_put_nohdr(reply, sizeof(zone_limit), &zone_limit);
+ if (err)
+ return err;
+
+ return 0;
+}
+
+static int __ovs_ct_limit_get_zone_limit(struct net *net,
+ struct nf_conncount_data *data,
+ u16 zone_id, u32 limit,
+ struct sk_buff *reply)
+{
+ struct nf_conntrack_zone ct_zone;
+ struct ovs_zone_limit zone_limit;
+ u32 conncount_key = zone_id;
+
+ zone_limit.zone_id = zone_id;
+ zone_limit.limit = limit;
+ nf_ct_zone_init(&ct_zone, zone_id, NF_CT_DEFAULT_ZONE_DIR, 0);
+
+ zone_limit.count = nf_conncount_count(net, data, &conncount_key, NULL,
+ &ct_zone);
+ return nla_put_nohdr(reply, sizeof(zone_limit), &zone_limit);
+}
+
+static int ovs_ct_limit_get_zone_limit(struct net *net,
+ struct nlattr *nla_zone_limit,
+ struct ovs_ct_limit_info *info,
+ struct sk_buff *reply)
+{
+ struct ovs_zone_limit *zone_limit;
+ int rem, err;
+ u32 limit;
+ u16 zone;
+
+ rem = NLA_ALIGN(nla_len(nla_zone_limit));
+ zone_limit = (struct ovs_zone_limit *)nla_data(nla_zone_limit);
+
+ while (rem >= sizeof(*zone_limit)) {
+ if (unlikely(zone_limit->zone_id ==
+ OVS_ZONE_LIMIT_DEFAULT_ZONE)) {
+ err = ovs_ct_limit_get_default_limit(info, reply);
+ if (err)
+ return err;
+ } else if (unlikely(!check_zone_id(zone_limit->zone_id,
+ &zone))) {
+ OVS_NLERR(true, "zone id is out of range");
+ } else {
+ rcu_read_lock();
+ limit = ct_limit_get(info, zone);
+ rcu_read_unlock();
+
+ err = __ovs_ct_limit_get_zone_limit(
+ net, info->data, zone, limit, reply);
+ if (err)
+ return err;
+ }
+ rem -= NLA_ALIGN(sizeof(*zone_limit));
+ zone_limit = (struct ovs_zone_limit *)((u8 *)zone_limit +
+ NLA_ALIGN(sizeof(*zone_limit)));
+ }
+
+ if (rem)
+ OVS_NLERR(true, "get zone limit has %d unknown bytes", rem);
+
+ return 0;
+}
+
+static int ovs_ct_limit_get_all_zone_limit(struct net *net,
+ struct ovs_ct_limit_info *info,
+ struct sk_buff *reply)
+{
+ struct ovs_ct_limit *ct_limit;
+ struct hlist_head *head;
+ int i, err = 0;
+
+ err = ovs_ct_limit_get_default_limit(info, reply);
+ if (err)
+ return err;
+
+ rcu_read_lock();
+ for (i = 0; i < CT_LIMIT_HASH_BUCKETS; ++i) {
+ head = &info->limits[i];
+ hlist_for_each_entry_rcu(ct_limit, head, hlist_node) {
+ err = __ovs_ct_limit_get_zone_limit(net, info->data,
+ ct_limit->zone, ct_limit->limit, reply);
+ if (err)
+ goto exit_err;
+ }
+ }
+
+exit_err:
+ rcu_read_unlock();
+ return err;
+}
+
+static int ovs_ct_limit_cmd_set(struct sk_buff *skb, struct genl_info *info)
+{
+ struct nlattr **a = info->attrs;
+ struct sk_buff *reply;
+ struct ovs_header *ovs_reply_header;
+ struct ovs_net *ovs_net = net_generic(sock_net(skb->sk), ovs_net_id);
+ struct ovs_ct_limit_info *ct_limit_info = ovs_net->ct_limit_info;
+ int err;
+
+ reply = ovs_ct_limit_cmd_reply_start(info, OVS_CT_LIMIT_CMD_SET,
+ &ovs_reply_header);
+ if (IS_ERR(reply))
+ return PTR_ERR(reply);
+
+ if (!a[OVS_CT_LIMIT_ATTR_ZONE_LIMIT]) {
+ err = -EINVAL;
+ goto exit_err;
+ }
+
+ err = ovs_ct_limit_set_zone_limit(a[OVS_CT_LIMIT_ATTR_ZONE_LIMIT],
+ ct_limit_info);
+ if (err)
+ goto exit_err;
+
+ static_branch_enable(&ovs_ct_limit_enabled);
+
+ genlmsg_end(reply, ovs_reply_header);
+ return genlmsg_reply(reply, info);
+
+exit_err:
+ nlmsg_free(reply);
+ return err;
+}
+
+static int ovs_ct_limit_cmd_del(struct sk_buff *skb, struct genl_info *info)
+{
+ struct nlattr **a = info->attrs;
+ struct sk_buff *reply;
+ struct ovs_header *ovs_reply_header;
+ struct ovs_net *ovs_net = net_generic(sock_net(skb->sk), ovs_net_id);
+ struct ovs_ct_limit_info *ct_limit_info = ovs_net->ct_limit_info;
+ int err;
+
+ reply = ovs_ct_limit_cmd_reply_start(info, OVS_CT_LIMIT_CMD_DEL,
+ &ovs_reply_header);
+ if (IS_ERR(reply))
+ return PTR_ERR(reply);
+
+ if (!a[OVS_CT_LIMIT_ATTR_ZONE_LIMIT]) {
+ err = -EINVAL;
+ goto exit_err;
+ }
+
+ err = ovs_ct_limit_del_zone_limit(a[OVS_CT_LIMIT_ATTR_ZONE_LIMIT],
+ ct_limit_info);
+ if (err)
+ goto exit_err;
+
+ genlmsg_end(reply, ovs_reply_header);
+ return genlmsg_reply(reply, info);
+
+exit_err:
+ nlmsg_free(reply);
+ return err;
+}
+
+static int ovs_ct_limit_cmd_get(struct sk_buff *skb, struct genl_info *info)
+{
+ struct nlattr **a = info->attrs;
+ struct nlattr *nla_reply;
+ struct sk_buff *reply;
+ struct ovs_header *ovs_reply_header;
+ struct net *net = sock_net(skb->sk);
+ struct ovs_net *ovs_net = net_generic(net, ovs_net_id);
+ struct ovs_ct_limit_info *ct_limit_info = ovs_net->ct_limit_info;
+ int err;
+
+ reply = ovs_ct_limit_cmd_reply_start(info, OVS_CT_LIMIT_CMD_GET,
+ &ovs_reply_header);
+ if (IS_ERR(reply))
+ return PTR_ERR(reply);
+
+ nla_reply = nla_nest_start(reply, OVS_CT_LIMIT_ATTR_ZONE_LIMIT);
+
+ if (a[OVS_CT_LIMIT_ATTR_ZONE_LIMIT]) {
+ err = ovs_ct_limit_get_zone_limit(
+ net, a[OVS_CT_LIMIT_ATTR_ZONE_LIMIT], ct_limit_info,
+ reply);
+ if (err)
+ goto exit_err;
+ } else {
+ err = ovs_ct_limit_get_all_zone_limit(net, ct_limit_info,
+ reply);
+ if (err)
+ goto exit_err;
+ }
+
+ nla_nest_end(reply, nla_reply);
+ genlmsg_end(reply, ovs_reply_header);
+ return genlmsg_reply(reply, info);
+
+exit_err:
+ nlmsg_free(reply);
+ return err;
+}
+
+static struct genl_ops ct_limit_genl_ops[] = {
+ { .cmd = OVS_CT_LIMIT_CMD_SET,
+ .flags = GENL_ADMIN_PERM, /* Requires CAP_NET_ADMIN
+ * privilege. */
+ .policy = ct_limit_policy,
+ .doit = ovs_ct_limit_cmd_set,
+ },
+ { .cmd = OVS_CT_LIMIT_CMD_DEL,
+ .flags = GENL_ADMIN_PERM, /* Requires CAP_NET_ADMIN
+ * privilege. */
+ .policy = ct_limit_policy,
+ .doit = ovs_ct_limit_cmd_del,
+ },
+ { .cmd = OVS_CT_LIMIT_CMD_GET,
+ .flags = 0, /* OK for unprivileged users. */
+ .policy = ct_limit_policy,
+ .doit = ovs_ct_limit_cmd_get,
+ },
+};
+
+static const struct genl_multicast_group ovs_ct_limit_multicast_group = {
+ .name = OVS_CT_LIMIT_MCGROUP,
+};
+
+struct genl_family dp_ct_limit_genl_family __ro_after_init = {
+ .hdrsize = sizeof(struct ovs_header),
+ .name = OVS_CT_LIMIT_FAMILY,
+ .version = OVS_CT_LIMIT_VERSION,
+ .maxattr = OVS_CT_LIMIT_ATTR_MAX,
+ .netnsok = true,
+ .parallel_ops = true,
+ .ops = ct_limit_genl_ops,
+ .n_ops = ARRAY_SIZE(ct_limit_genl_ops),
+ .mcgrps = &ovs_ct_limit_multicast_group,
+ .n_mcgrps = 1,
+ .module = THIS_MODULE,
+};
+#endif
+
+int ovs_ct_init(struct net *net)
{
unsigned int n_bits = sizeof(struct ovs_key_ct_labels) * BITS_PER_BYTE;
struct ovs_net *ovs_net = net_generic(net, ovs_net_id);
@@ -1666,12 +2205,22 @@
} else {
ovs_net->xt_label = true;
}
+
+#if IS_ENABLED(CONFIG_NETFILTER_CONNCOUNT)
+ return ovs_ct_limit_init(net, ovs_net);
+#else
+ return 0;
+#endif
}
void ovs_ct_exit(struct net *net)
{
struct ovs_net *ovs_net = net_generic(net, ovs_net_id);
+#if IS_ENABLED(CONFIG_NETFILTER_CONNCOUNT)
+ ovs_ct_limit_exit(net, ovs_net);
+#endif
+
if (ovs_net->xt_label)
nf_connlabels_put(net);
}
diff --git a/net/openvswitch/conntrack.h b/net/openvswitch/conntrack.h
index 399dfdd..900dadd 100644
--- a/net/openvswitch/conntrack.h
+++ b/net/openvswitch/conntrack.h
@@ -17,10 +17,11 @@
#include "flow.h"
struct ovs_conntrack_info;
+struct ovs_ct_limit_info;
enum ovs_key_attr;
#if IS_ENABLED(CONFIG_NF_CONNTRACK)
-void ovs_ct_init(struct net *);
+int ovs_ct_init(struct net *);
void ovs_ct_exit(struct net *);
bool ovs_ct_verify(struct net *, enum ovs_key_attr attr);
int ovs_ct_copy_action(struct net *, const struct nlattr *,
@@ -44,7 +45,7 @@
#else
#include <linux/errno.h>
-static inline void ovs_ct_init(struct net *net) { }
+static inline int ovs_ct_init(struct net *net) { return 0; }
static inline void ovs_ct_exit(struct net *net) { }
@@ -104,4 +105,8 @@
#define CT_SUPPORTED_MASK 0
#endif /* CONFIG_NF_CONNTRACK */
+
+#if IS_ENABLED(CONFIG_NETFILTER_CONNCOUNT)
+extern struct genl_family dp_ct_limit_genl_family;
+#endif
#endif /* ovs_conntrack.h */
diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c
index 015e24e..a61818e 100644
--- a/net/openvswitch/datapath.c
+++ b/net/openvswitch/datapath.c
@@ -2288,6 +2288,9 @@
&dp_flow_genl_family,
&dp_packet_genl_family,
&dp_meter_genl_family,
+#if IS_ENABLED(CONFIG_NETFILTER_CONNCOUNT)
+ &dp_ct_limit_genl_family,
+#endif
};
static void dp_unregister_genl(int n_families)
@@ -2323,8 +2326,7 @@
INIT_LIST_HEAD(&ovs_net->dps);
INIT_WORK(&ovs_net->dp_notify_work, ovs_dp_notify_wq);
- ovs_ct_init(net);
- return 0;
+ return ovs_ct_init(net);
}
static void __net_exit list_vports_from_net(struct net *net, struct net *dnet,
@@ -2469,3 +2471,4 @@
MODULE_ALIAS_GENL_FAMILY(OVS_FLOW_FAMILY);
MODULE_ALIAS_GENL_FAMILY(OVS_PACKET_FAMILY);
MODULE_ALIAS_GENL_FAMILY(OVS_METER_FAMILY);
+MODULE_ALIAS_GENL_FAMILY(OVS_CT_LIMIT_FAMILY);
diff --git a/net/openvswitch/datapath.h b/net/openvswitch/datapath.h
index 523d655..c9eb267 100644
--- a/net/openvswitch/datapath.h
+++ b/net/openvswitch/datapath.h
@@ -144,6 +144,9 @@
struct ovs_net {
struct list_head dps;
struct work_struct dp_notify_work;
+#if IS_ENABLED(CONFIG_NETFILTER_CONNCOUNT)
+ struct ovs_ct_limit_info *ct_limit_info;
+#endif
/* Module reference for configuring conntrack. */
bool xt_label;
diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index acb7b86..b00aa95 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -209,7 +209,7 @@
static void prb_fill_vlan_info(struct tpacket_kbdq_core *,
struct tpacket3_hdr *);
static void packet_flush_mclist(struct sock *sk);
-static void packet_pick_tx_queue(struct net_device *dev, struct sk_buff *skb);
+static u16 packet_pick_tx_queue(struct sk_buff *skb);
struct packet_skb_cb {
union {
@@ -243,40 +243,7 @@
static int packet_direct_xmit(struct sk_buff *skb)
{
- struct net_device *dev = skb->dev;
- struct sk_buff *orig_skb = skb;
- struct netdev_queue *txq;
- int ret = NETDEV_TX_BUSY;
- bool again = false;
-
- if (unlikely(!netif_running(dev) ||
- !netif_carrier_ok(dev)))
- goto drop;
-
- skb = validate_xmit_skb_list(skb, dev, &again);
- if (skb != orig_skb)
- goto drop;
-
- packet_pick_tx_queue(dev, skb);
- txq = skb_get_tx_queue(dev, skb);
-
- local_bh_disable();
-
- HARD_TX_LOCK(dev, txq, smp_processor_id());
- if (!netif_xmit_frozen_or_drv_stopped(txq))
- ret = netdev_start_xmit(skb, dev, txq, false);
- HARD_TX_UNLOCK(dev, txq);
-
- local_bh_enable();
-
- if (!dev_xmit_complete(ret))
- kfree_skb(skb);
-
- return ret;
-drop:
- atomic_long_inc(&dev->tx_dropped);
- kfree_skb_list(skb);
- return NET_XMIT_DROP;
+ return dev_direct_xmit(skb, packet_pick_tx_queue(skb));
}
static struct net_device *packet_cached_dev_get(struct packet_sock *po)
@@ -313,8 +280,9 @@
return (u16) raw_smp_processor_id() % dev->real_num_tx_queues;
}
-static void packet_pick_tx_queue(struct net_device *dev, struct sk_buff *skb)
+static u16 packet_pick_tx_queue(struct sk_buff *skb)
{
+ struct net_device *dev = skb->dev;
const struct net_device_ops *ops = dev->netdev_ops;
u16 queue_index;
@@ -326,7 +294,7 @@
queue_index = __packet_pick_tx_queue(dev, skb);
}
- skb_set_queue_mapping(skb, queue_index);
+ return queue_index;
}
/* __register_prot_hook must be invoked through register_prot_hook
diff --git a/net/qrtr/Kconfig b/net/qrtr/Kconfig
index 326fd97..1944834 100644
--- a/net/qrtr/Kconfig
+++ b/net/qrtr/Kconfig
@@ -21,4 +21,11 @@
Say Y here to support SMD based ipcrouter channels. SMD is the
most common transport for IPC Router.
+config QRTR_TUN
+ tristate "TUN device for Qualcomm IPC Router"
+ ---help---
+ Say Y here to expose a character device that allows user space to
+ implement endpoints of QRTR, for purpose of tunneling data to other
+ hosts or testing purposes.
+
endif # QRTR
diff --git a/net/qrtr/Makefile b/net/qrtr/Makefile
index ab09e40..be012bf 100644
--- a/net/qrtr/Makefile
+++ b/net/qrtr/Makefile
@@ -2,3 +2,5 @@
obj-$(CONFIG_QRTR_SMD) += qrtr-smd.o
qrtr-smd-y := smd.o
+obj-$(CONFIG_QRTR_TUN) += qrtr-tun.o
+qrtr-tun-y := tun.o
diff --git a/net/qrtr/tun.c b/net/qrtr/tun.c
new file mode 100644
index 0000000..ccff1e5
--- /dev/null
+++ b/net/qrtr/tun.c
@@ -0,0 +1,161 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2018, Linaro Ltd */
+
+#include <linux/miscdevice.h>
+#include <linux/module.h>
+#include <linux/poll.h>
+#include <linux/skbuff.h>
+#include <linux/uaccess.h>
+
+#include "qrtr.h"
+
+struct qrtr_tun {
+ struct qrtr_endpoint ep;
+
+ struct sk_buff_head queue;
+ wait_queue_head_t readq;
+};
+
+static int qrtr_tun_send(struct qrtr_endpoint *ep, struct sk_buff *skb)
+{
+ struct qrtr_tun *tun = container_of(ep, struct qrtr_tun, ep);
+
+ skb_queue_tail(&tun->queue, skb);
+
+ /* wake up any blocking processes, waiting for new data */
+ wake_up_interruptible(&tun->readq);
+
+ return 0;
+}
+
+static int qrtr_tun_open(struct inode *inode, struct file *filp)
+{
+ struct qrtr_tun *tun;
+
+ tun = kzalloc(sizeof(*tun), GFP_KERNEL);
+ if (!tun)
+ return -ENOMEM;
+
+ skb_queue_head_init(&tun->queue);
+ init_waitqueue_head(&tun->readq);
+
+ tun->ep.xmit = qrtr_tun_send;
+
+ filp->private_data = tun;
+
+ return qrtr_endpoint_register(&tun->ep, QRTR_EP_NID_AUTO);
+}
+
+static ssize_t qrtr_tun_read_iter(struct kiocb *iocb, struct iov_iter *to)
+{
+ struct file *filp = iocb->ki_filp;
+ struct qrtr_tun *tun = filp->private_data;
+ struct sk_buff *skb;
+ int count;
+
+ while (!(skb = skb_dequeue(&tun->queue))) {
+ if (filp->f_flags & O_NONBLOCK)
+ return -EAGAIN;
+
+ /* Wait until we get data or the endpoint goes away */
+ if (wait_event_interruptible(tun->readq,
+ !skb_queue_empty(&tun->queue)))
+ return -ERESTARTSYS;
+ }
+
+ count = min_t(size_t, iov_iter_count(to), skb->len);
+ if (copy_to_iter(skb->data, count, to) != count)
+ count = -EFAULT;
+
+ kfree_skb(skb);
+
+ return count;
+}
+
+static ssize_t qrtr_tun_write_iter(struct kiocb *iocb, struct iov_iter *from)
+{
+ struct file *filp = iocb->ki_filp;
+ struct qrtr_tun *tun = filp->private_data;
+ size_t len = iov_iter_count(from);
+ ssize_t ret;
+ void *kbuf;
+
+ kbuf = kzalloc(len, GFP_KERNEL);
+ if (!kbuf)
+ return -ENOMEM;
+
+ if (!copy_from_iter_full(kbuf, len, from))
+ return -EFAULT;
+
+ ret = qrtr_endpoint_post(&tun->ep, kbuf, len);
+
+ return ret < 0 ? ret : len;
+}
+
+static __poll_t qrtr_tun_poll(struct file *filp, poll_table *wait)
+{
+ struct qrtr_tun *tun = filp->private_data;
+ __poll_t mask = 0;
+
+ poll_wait(filp, &tun->readq, wait);
+
+ if (!skb_queue_empty(&tun->queue))
+ mask |= EPOLLIN | EPOLLRDNORM;
+
+ return mask;
+}
+
+static int qrtr_tun_release(struct inode *inode, struct file *filp)
+{
+ struct qrtr_tun *tun = filp->private_data;
+ struct sk_buff *skb;
+
+ qrtr_endpoint_unregister(&tun->ep);
+
+ /* Discard all SKBs */
+ while (!skb_queue_empty(&tun->queue)) {
+ skb = skb_dequeue(&tun->queue);
+ kfree_skb(skb);
+ }
+
+ kfree(tun);
+
+ return 0;
+}
+
+static const struct file_operations qrtr_tun_ops = {
+ .owner = THIS_MODULE,
+ .open = qrtr_tun_open,
+ .poll = qrtr_tun_poll,
+ .read_iter = qrtr_tun_read_iter,
+ .write_iter = qrtr_tun_write_iter,
+ .release = qrtr_tun_release,
+};
+
+static struct miscdevice qrtr_tun_miscdev = {
+ MISC_DYNAMIC_MINOR,
+ "qrtr-tun",
+ &qrtr_tun_ops,
+};
+
+static int __init qrtr_tun_init(void)
+{
+ int ret;
+
+ ret = misc_register(&qrtr_tun_miscdev);
+ if (ret)
+ pr_err("failed to register Qualcomm IPC Router tun device\n");
+
+ return ret;
+}
+
+static void __exit qrtr_tun_exit(void)
+{
+ misc_deregister(&qrtr_tun_miscdev);
+}
+
+module_init(qrtr_tun_init);
+module_exit(qrtr_tun_exit);
+
+MODULE_DESCRIPTION("Qualcomm IPC Router TUN device");
+MODULE_LICENSE("GPL v2");
diff --git a/net/rfkill/core.c b/net/rfkill/core.c
index 59d0eb9..a7a4e6f 100644
--- a/net/rfkill/core.c
+++ b/net/rfkill/core.c
@@ -178,9 +178,10 @@
}
static struct led_trigger rfkill_any_led_trigger;
-static struct work_struct rfkill_any_work;
+static struct led_trigger rfkill_none_led_trigger;
+static struct work_struct rfkill_global_led_trigger_work;
-static void rfkill_any_led_trigger_worker(struct work_struct *work)
+static void rfkill_global_led_trigger_worker(struct work_struct *work)
{
enum led_brightness brightness = LED_OFF;
struct rfkill *rfkill;
@@ -195,30 +196,43 @@
mutex_unlock(&rfkill_global_mutex);
led_trigger_event(&rfkill_any_led_trigger, brightness);
+ led_trigger_event(&rfkill_none_led_trigger,
+ brightness == LED_OFF ? LED_FULL : LED_OFF);
}
-static void rfkill_any_led_trigger_event(void)
+static void rfkill_global_led_trigger_event(void)
{
- schedule_work(&rfkill_any_work);
+ schedule_work(&rfkill_global_led_trigger_work);
}
-static void rfkill_any_led_trigger_activate(struct led_classdev *led_cdev)
+static int rfkill_global_led_trigger_register(void)
{
- rfkill_any_led_trigger_event();
-}
+ int ret;
-static int rfkill_any_led_trigger_register(void)
-{
- INIT_WORK(&rfkill_any_work, rfkill_any_led_trigger_worker);
+ INIT_WORK(&rfkill_global_led_trigger_work,
+ rfkill_global_led_trigger_worker);
+
rfkill_any_led_trigger.name = "rfkill-any";
- rfkill_any_led_trigger.activate = rfkill_any_led_trigger_activate;
- return led_trigger_register(&rfkill_any_led_trigger);
+ ret = led_trigger_register(&rfkill_any_led_trigger);
+ if (ret)
+ return ret;
+
+ rfkill_none_led_trigger.name = "rfkill-none";
+ ret = led_trigger_register(&rfkill_none_led_trigger);
+ if (ret)
+ led_trigger_unregister(&rfkill_any_led_trigger);
+ else
+ /* Delay activation until all global triggers are registered */
+ rfkill_global_led_trigger_event();
+
+ return ret;
}
-static void rfkill_any_led_trigger_unregister(void)
+static void rfkill_global_led_trigger_unregister(void)
{
+ led_trigger_unregister(&rfkill_none_led_trigger);
led_trigger_unregister(&rfkill_any_led_trigger);
- cancel_work_sync(&rfkill_any_work);
+ cancel_work_sync(&rfkill_global_led_trigger_work);
}
#else
static void rfkill_led_trigger_event(struct rfkill *rfkill)
@@ -234,16 +248,16 @@
{
}
-static void rfkill_any_led_trigger_event(void)
+static void rfkill_global_led_trigger_event(void)
{
}
-static int rfkill_any_led_trigger_register(void)
+static int rfkill_global_led_trigger_register(void)
{
return 0;
}
-static void rfkill_any_led_trigger_unregister(void)
+static void rfkill_global_led_trigger_unregister(void)
{
}
#endif /* CONFIG_RFKILL_LEDS */
@@ -354,7 +368,7 @@
spin_unlock_irqrestore(&rfkill->lock, flags);
rfkill_led_trigger_event(rfkill);
- rfkill_any_led_trigger_event();
+ rfkill_global_led_trigger_event();
if (prev != curr)
rfkill_event(rfkill);
@@ -535,7 +549,7 @@
spin_unlock_irqrestore(&rfkill->lock, flags);
rfkill_led_trigger_event(rfkill);
- rfkill_any_led_trigger_event();
+ rfkill_global_led_trigger_event();
if (rfkill->registered && prev != blocked)
schedule_work(&rfkill->uevent_work);
@@ -579,7 +593,7 @@
schedule_work(&rfkill->uevent_work);
rfkill_led_trigger_event(rfkill);
- rfkill_any_led_trigger_event();
+ rfkill_global_led_trigger_event();
return blocked;
}
@@ -629,7 +643,7 @@
schedule_work(&rfkill->uevent_work);
rfkill_led_trigger_event(rfkill);
- rfkill_any_led_trigger_event();
+ rfkill_global_led_trigger_event();
}
}
EXPORT_SYMBOL(rfkill_set_states);
@@ -1046,7 +1060,7 @@
#endif
}
- rfkill_any_led_trigger_event();
+ rfkill_global_led_trigger_event();
rfkill_send_events(rfkill, RFKILL_OP_ADD);
mutex_unlock(&rfkill_global_mutex);
@@ -1079,7 +1093,7 @@
mutex_lock(&rfkill_global_mutex);
rfkill_send_events(rfkill, RFKILL_OP_DEL);
list_del_init(&rfkill->node);
- rfkill_any_led_trigger_event();
+ rfkill_global_led_trigger_event();
mutex_unlock(&rfkill_global_mutex);
rfkill_led_trigger_unregister(rfkill);
@@ -1332,7 +1346,7 @@
if (error)
goto error_misc;
- error = rfkill_any_led_trigger_register();
+ error = rfkill_global_led_trigger_register();
if (error)
goto error_led_trigger;
@@ -1346,7 +1360,7 @@
#ifdef CONFIG_RFKILL_INPUT
error_input:
- rfkill_any_led_trigger_unregister();
+ rfkill_global_led_trigger_unregister();
#endif
error_led_trigger:
misc_deregister(&rfkill_miscdev);
@@ -1362,7 +1376,7 @@
#ifdef CONFIG_RFKILL_INPUT
rfkill_handler_exit();
#endif
- rfkill_any_led_trigger_unregister();
+ rfkill_global_led_trigger_unregister();
misc_deregister(&rfkill_miscdev);
class_unregister(&rfkill_class);
}
diff --git a/net/sched/act_api.c b/net/sched/act_api.c
index 72251241..3f4cf93 100644
--- a/net/sched/act_api.c
+++ b/net/sched/act_api.c
@@ -77,9 +77,9 @@
static void tcf_idr_remove(struct tcf_idrinfo *idrinfo, struct tc_action *p)
{
- spin_lock_bh(&idrinfo->lock);
+ spin_lock(&idrinfo->lock);
idr_remove(&idrinfo->action_idr, p->tcfa_index);
- spin_unlock_bh(&idrinfo->lock);
+ spin_unlock(&idrinfo->lock);
gen_kill_estimator(&p->tcfa_rate_est);
free_tcf(p);
}
@@ -156,7 +156,7 @@
struct tc_action *p;
unsigned long id = 1;
- spin_lock_bh(&idrinfo->lock);
+ spin_lock(&idrinfo->lock);
s_i = cb->args[0];
@@ -191,7 +191,7 @@
if (index >= 0)
cb->args[0] = index + 1;
- spin_unlock_bh(&idrinfo->lock);
+ spin_unlock(&idrinfo->lock);
if (n_i) {
if (act_flags & TCA_FLAG_LARGE_DUMP_ON)
cb->args[1] = n_i;
@@ -261,9 +261,9 @@
{
struct tc_action *p = NULL;
- spin_lock_bh(&idrinfo->lock);
+ spin_lock(&idrinfo->lock);
p = idr_find(&idrinfo->action_idr, index);
- spin_unlock_bh(&idrinfo->lock);
+ spin_unlock(&idrinfo->lock);
return p;
}
@@ -323,7 +323,7 @@
}
spin_lock_init(&p->tcfa_lock);
idr_preload(GFP_KERNEL);
- spin_lock_bh(&idrinfo->lock);
+ spin_lock(&idrinfo->lock);
/* user doesn't specify an index */
if (!index) {
index = 1;
@@ -331,7 +331,7 @@
} else {
err = idr_alloc_u32(idr, NULL, &index, index, GFP_ATOMIC);
}
- spin_unlock_bh(&idrinfo->lock);
+ spin_unlock(&idrinfo->lock);
idr_preload_end();
if (err)
goto err3;
@@ -369,9 +369,9 @@
{
struct tcf_idrinfo *idrinfo = tn->idrinfo;
- spin_lock_bh(&idrinfo->lock);
+ spin_lock(&idrinfo->lock);
idr_replace(&idrinfo->action_idr, a, a->tcfa_index);
- spin_unlock_bh(&idrinfo->lock);
+ spin_unlock(&idrinfo->lock);
}
EXPORT_SYMBOL(tcf_idr_insert);
diff --git a/net/sched/act_csum.c b/net/sched/act_csum.c
index 7e28b2c..526a8e4 100644
--- a/net/sched/act_csum.c
+++ b/net/sched/act_csum.c
@@ -648,6 +648,11 @@
return tcf_idr_search(tn, a, index);
}
+static size_t tcf_csum_get_fill_size(const struct tc_action *act)
+{
+ return nla_total_size(sizeof(struct tc_csum));
+}
+
static struct tc_action_ops act_csum_ops = {
.kind = "csum",
.type = TCA_ACT_CSUM,
@@ -658,6 +663,7 @@
.cleanup = tcf_csum_cleanup,
.walk = tcf_csum_walker,
.lookup = tcf_csum_search,
+ .get_fill_size = tcf_csum_get_fill_size,
.size = sizeof(struct tcf_csum),
};
diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index a57e112..76303c4 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -103,9 +103,10 @@
}
EXPORT_SYMBOL(unregister_tcf_proto_ops);
-bool tcf_queue_work(struct work_struct *work)
+bool tcf_queue_work(struct rcu_work *rwork, work_func_t func)
{
- return queue_work(tc_filter_wq, work);
+ INIT_RCU_WORK(rwork, func);
+ return queue_rcu_work(tc_filter_wq, rwork);
}
EXPORT_SYMBOL(tcf_queue_work);
diff --git a/net/sched/cls_basic.c b/net/sched/cls_basic.c
index 6b7ab35..95367f3 100644
--- a/net/sched/cls_basic.c
+++ b/net/sched/cls_basic.c
@@ -35,10 +35,7 @@
struct tcf_result res;
struct tcf_proto *tp;
struct list_head link;
- union {
- struct work_struct work;
- struct rcu_head rcu;
- };
+ struct rcu_work rwork;
};
static int basic_classify(struct sk_buff *skb, const struct tcf_proto *tp,
@@ -97,21 +94,14 @@
static void basic_delete_filter_work(struct work_struct *work)
{
- struct basic_filter *f = container_of(work, struct basic_filter, work);
-
+ struct basic_filter *f = container_of(to_rcu_work(work),
+ struct basic_filter,
+ rwork);
rtnl_lock();
__basic_delete_filter(f);
rtnl_unlock();
}
-static void basic_delete_filter(struct rcu_head *head)
-{
- struct basic_filter *f = container_of(head, struct basic_filter, rcu);
-
- INIT_WORK(&f->work, basic_delete_filter_work);
- tcf_queue_work(&f->work);
-}
-
static void basic_destroy(struct tcf_proto *tp, struct netlink_ext_ack *extack)
{
struct basic_head *head = rtnl_dereference(tp->root);
@@ -122,7 +112,7 @@
tcf_unbind_filter(tp, &f->res);
idr_remove(&head->handle_idr, f->handle);
if (tcf_exts_get_net(&f->exts))
- call_rcu(&f->rcu, basic_delete_filter);
+ tcf_queue_work(&f->rwork, basic_delete_filter_work);
else
__basic_delete_filter(f);
}
@@ -140,7 +130,7 @@
tcf_unbind_filter(tp, &f->res);
idr_remove(&head->handle_idr, f->handle);
tcf_exts_get_net(&f->exts);
- call_rcu(&f->rcu, basic_delete_filter);
+ tcf_queue_work(&f->rwork, basic_delete_filter_work);
*last = list_empty(&head->flist);
return 0;
}
@@ -234,7 +224,7 @@
list_replace_rcu(&fold->link, &fnew->link);
tcf_unbind_filter(tp, &fold->res);
tcf_exts_get_net(&fold->exts);
- call_rcu(&fold->rcu, basic_delete_filter);
+ tcf_queue_work(&fold->rwork, basic_delete_filter_work);
} else {
list_add_rcu(&fnew->link, &head->flist);
}
diff --git a/net/sched/cls_bpf.c b/net/sched/cls_bpf.c
index b07c1fa..1aa7f65 100644
--- a/net/sched/cls_bpf.c
+++ b/net/sched/cls_bpf.c
@@ -49,10 +49,7 @@
struct sock_filter *bpf_ops;
const char *bpf_name;
struct tcf_proto *tp;
- union {
- struct work_struct work;
- struct rcu_head rcu;
- };
+ struct rcu_work rwork;
};
static const struct nla_policy bpf_policy[TCA_BPF_MAX + 1] = {
@@ -275,21 +272,14 @@
static void cls_bpf_delete_prog_work(struct work_struct *work)
{
- struct cls_bpf_prog *prog = container_of(work, struct cls_bpf_prog, work);
-
+ struct cls_bpf_prog *prog = container_of(to_rcu_work(work),
+ struct cls_bpf_prog,
+ rwork);
rtnl_lock();
__cls_bpf_delete_prog(prog);
rtnl_unlock();
}
-static void cls_bpf_delete_prog_rcu(struct rcu_head *rcu)
-{
- struct cls_bpf_prog *prog = container_of(rcu, struct cls_bpf_prog, rcu);
-
- INIT_WORK(&prog->work, cls_bpf_delete_prog_work);
- tcf_queue_work(&prog->work);
-}
-
static void __cls_bpf_delete(struct tcf_proto *tp, struct cls_bpf_prog *prog,
struct netlink_ext_ack *extack)
{
@@ -300,7 +290,7 @@
list_del_rcu(&prog->link);
tcf_unbind_filter(tp, &prog->res);
if (tcf_exts_get_net(&prog->exts))
- call_rcu(&prog->rcu, cls_bpf_delete_prog_rcu);
+ tcf_queue_work(&prog->rwork, cls_bpf_delete_prog_work);
else
__cls_bpf_delete_prog(prog);
}
@@ -526,7 +516,7 @@
list_replace_rcu(&oldprog->link, &prog->link);
tcf_unbind_filter(tp, &oldprog->res);
tcf_exts_get_net(&oldprog->exts);
- call_rcu(&oldprog->rcu, cls_bpf_delete_prog_rcu);
+ tcf_queue_work(&oldprog->rwork, cls_bpf_delete_prog_work);
} else {
list_add_rcu(&prog->link, &head->plist);
}
diff --git a/net/sched/cls_cgroup.c b/net/sched/cls_cgroup.c
index 762da5c..3bc01bd 100644
--- a/net/sched/cls_cgroup.c
+++ b/net/sched/cls_cgroup.c
@@ -23,10 +23,7 @@
struct tcf_exts exts;
struct tcf_ematch_tree ematches;
struct tcf_proto *tp;
- union {
- struct work_struct work;
- struct rcu_head rcu;
- };
+ struct rcu_work rwork;
};
static int cls_cgroup_classify(struct sk_buff *skb, const struct tcf_proto *tp,
@@ -70,24 +67,14 @@
static void cls_cgroup_destroy_work(struct work_struct *work)
{
- struct cls_cgroup_head *head = container_of(work,
+ struct cls_cgroup_head *head = container_of(to_rcu_work(work),
struct cls_cgroup_head,
- work);
+ rwork);
rtnl_lock();
__cls_cgroup_destroy(head);
rtnl_unlock();
}
-static void cls_cgroup_destroy_rcu(struct rcu_head *root)
-{
- struct cls_cgroup_head *head = container_of(root,
- struct cls_cgroup_head,
- rcu);
-
- INIT_WORK(&head->work, cls_cgroup_destroy_work);
- tcf_queue_work(&head->work);
-}
-
static int cls_cgroup_change(struct net *net, struct sk_buff *in_skb,
struct tcf_proto *tp, unsigned long base,
u32 handle, struct nlattr **tca,
@@ -134,7 +121,7 @@
rcu_assign_pointer(tp->root, new);
if (head) {
tcf_exts_get_net(&head->exts);
- call_rcu(&head->rcu, cls_cgroup_destroy_rcu);
+ tcf_queue_work(&head->rwork, cls_cgroup_destroy_work);
}
return 0;
errout:
@@ -151,7 +138,7 @@
/* Head can still be NULL due to cls_cgroup_init(). */
if (head) {
if (tcf_exts_get_net(&head->exts))
- call_rcu(&head->rcu, cls_cgroup_destroy_rcu);
+ tcf_queue_work(&head->rwork, cls_cgroup_destroy_work);
else
__cls_cgroup_destroy(head);
}
diff --git a/net/sched/cls_flow.c b/net/sched/cls_flow.c
index cd5fe38..2bb043c 100644
--- a/net/sched/cls_flow.c
+++ b/net/sched/cls_flow.c
@@ -57,10 +57,7 @@
u32 divisor;
u32 baseclass;
u32 hashrnd;
- union {
- struct work_struct work;
- struct rcu_head rcu;
- };
+ struct rcu_work rwork;
};
static inline u32 addr_fold(void *addr)
@@ -383,21 +380,14 @@
static void flow_destroy_filter_work(struct work_struct *work)
{
- struct flow_filter *f = container_of(work, struct flow_filter, work);
-
+ struct flow_filter *f = container_of(to_rcu_work(work),
+ struct flow_filter,
+ rwork);
rtnl_lock();
__flow_destroy_filter(f);
rtnl_unlock();
}
-static void flow_destroy_filter(struct rcu_head *head)
-{
- struct flow_filter *f = container_of(head, struct flow_filter, rcu);
-
- INIT_WORK(&f->work, flow_destroy_filter_work);
- tcf_queue_work(&f->work);
-}
-
static int flow_change(struct net *net, struct sk_buff *in_skb,
struct tcf_proto *tp, unsigned long base,
u32 handle, struct nlattr **tca,
@@ -563,7 +553,7 @@
if (fold) {
tcf_exts_get_net(&fold->exts);
- call_rcu(&fold->rcu, flow_destroy_filter);
+ tcf_queue_work(&fold->rwork, flow_destroy_filter_work);
}
return 0;
@@ -583,7 +573,7 @@
list_del_rcu(&f->list);
tcf_exts_get_net(&f->exts);
- call_rcu(&f->rcu, flow_destroy_filter);
+ tcf_queue_work(&f->rwork, flow_destroy_filter_work);
*last = list_empty(&head->filters);
return 0;
}
@@ -608,7 +598,7 @@
list_for_each_entry_safe(f, next, &head->filters, list) {
list_del_rcu(&f->list);
if (tcf_exts_get_net(&f->exts))
- call_rcu(&f->rcu, flow_destroy_filter);
+ tcf_queue_work(&f->rwork, flow_destroy_filter_work);
else
__flow_destroy_filter(f);
}
diff --git a/net/sched/cls_flower.c b/net/sched/cls_flower.c
index d964e60..4e74508 100644
--- a/net/sched/cls_flower.c
+++ b/net/sched/cls_flower.c
@@ -61,24 +61,24 @@
struct fl_flow_mask {
struct fl_flow_key key;
struct fl_flow_mask_range range;
- struct rcu_head rcu;
+ struct rhash_head ht_node;
+ struct rhashtable ht;
+ struct rhashtable_params filter_ht_params;
+ struct flow_dissector dissector;
+ struct list_head filters;
+ struct rcu_head rcu;
+ struct list_head list;
};
struct cls_fl_head {
struct rhashtable ht;
- struct fl_flow_mask mask;
- struct flow_dissector dissector;
- bool mask_assigned;
- struct list_head filters;
- struct rhashtable_params ht_params;
- union {
- struct work_struct work;
- struct rcu_head rcu;
- };
+ struct list_head masks;
+ struct rcu_work rwork;
struct idr handle_idr;
};
struct cls_fl_filter {
+ struct fl_flow_mask *mask;
struct rhash_head ht_node;
struct fl_flow_key mkey;
struct tcf_exts exts;
@@ -87,13 +87,17 @@
struct list_head list;
u32 handle;
u32 flags;
- union {
- struct work_struct work;
- struct rcu_head rcu;
- };
+ struct rcu_work rwork;
struct net_device *hw_dev;
};
+static const struct rhashtable_params mask_ht_params = {
+ .key_offset = offsetof(struct fl_flow_mask, key),
+ .key_len = sizeof(struct fl_flow_key),
+ .head_offset = offsetof(struct fl_flow_mask, ht_node),
+ .automatic_shrinking = true,
+};
+
static unsigned short int fl_mask_range(const struct fl_flow_mask *mask)
{
return mask->range.end - mask->range.start;
@@ -103,13 +107,19 @@
{
const u8 *bytes = (const u8 *) &mask->key;
size_t size = sizeof(mask->key);
- size_t i, first = 0, last = size - 1;
+ size_t i, first = 0, last;
- for (i = 0; i < sizeof(mask->key); i++) {
+ for (i = 0; i < size; i++) {
if (bytes[i]) {
- if (!first && i)
- first = i;
+ first = i;
+ break;
+ }
+ }
+ last = first;
+ for (i = size - 1; i != first; i--) {
+ if (bytes[i]) {
last = i;
+ break;
}
}
mask->range.start = rounddown(first, sizeof(long));
@@ -140,12 +150,11 @@
memset(fl_key_get_start(key, mask), 0, fl_mask_range(mask));
}
-static struct cls_fl_filter *fl_lookup(struct cls_fl_head *head,
+static struct cls_fl_filter *fl_lookup(struct fl_flow_mask *mask,
struct fl_flow_key *mkey)
{
- return rhashtable_lookup_fast(&head->ht,
- fl_key_get_start(mkey, &head->mask),
- head->ht_params);
+ return rhashtable_lookup_fast(&mask->ht, fl_key_get_start(mkey, mask),
+ mask->filter_ht_params);
}
static int fl_classify(struct sk_buff *skb, const struct tcf_proto *tp,
@@ -153,28 +162,28 @@
{
struct cls_fl_head *head = rcu_dereference_bh(tp->root);
struct cls_fl_filter *f;
+ struct fl_flow_mask *mask;
struct fl_flow_key skb_key;
struct fl_flow_key skb_mkey;
- if (!atomic_read(&head->ht.nelems))
- return -1;
+ list_for_each_entry_rcu(mask, &head->masks, list) {
+ fl_clear_masked_range(&skb_key, mask);
- fl_clear_masked_range(&skb_key, &head->mask);
+ skb_key.indev_ifindex = skb->skb_iif;
+ /* skb_flow_dissect() does not set n_proto in case an unknown
+ * protocol, so do it rather here.
+ */
+ skb_key.basic.n_proto = skb->protocol;
+ skb_flow_dissect_tunnel_info(skb, &mask->dissector, &skb_key);
+ skb_flow_dissect(skb, &mask->dissector, &skb_key, 0);
- skb_key.indev_ifindex = skb->skb_iif;
- /* skb_flow_dissect() does not set n_proto in case an unknown protocol,
- * so do it rather here.
- */
- skb_key.basic.n_proto = skb->protocol;
- skb_flow_dissect_tunnel_info(skb, &head->dissector, &skb_key);
- skb_flow_dissect(skb, &head->dissector, &skb_key, 0);
+ fl_set_masked_key(&skb_mkey, &skb_key, mask);
- fl_set_masked_key(&skb_mkey, &skb_key, &head->mask);
-
- f = fl_lookup(head, &skb_mkey);
- if (f && !tc_skip_sw(f->flags)) {
- *res = f->res;
- return tcf_exts_exec(skb, &f->exts, res);
+ f = fl_lookup(mask, &skb_mkey);
+ if (f && !tc_skip_sw(f->flags)) {
+ *res = f->res;
+ return tcf_exts_exec(skb, &f->exts, res);
+ }
}
return -1;
}
@@ -187,11 +196,28 @@
if (!head)
return -ENOBUFS;
- INIT_LIST_HEAD_RCU(&head->filters);
+ INIT_LIST_HEAD_RCU(&head->masks);
rcu_assign_pointer(tp->root, head);
idr_init(&head->handle_idr);
- return 0;
+ return rhashtable_init(&head->ht, &mask_ht_params);
+}
+
+static bool fl_mask_put(struct cls_fl_head *head, struct fl_flow_mask *mask,
+ bool async)
+{
+ if (!list_empty(&mask->filters))
+ return false;
+
+ rhashtable_remove_fast(&head->ht, &mask->ht_node, mask_ht_params);
+ rhashtable_destroy(&mask->ht);
+ list_del_rcu(&mask->list);
+ if (async)
+ kfree_rcu(mask, rcu);
+ else
+ kfree(mask);
+
+ return true;
}
static void __fl_destroy_filter(struct cls_fl_filter *f)
@@ -203,21 +229,14 @@
static void fl_destroy_filter_work(struct work_struct *work)
{
- struct cls_fl_filter *f = container_of(work, struct cls_fl_filter, work);
+ struct cls_fl_filter *f = container_of(to_rcu_work(work),
+ struct cls_fl_filter, rwork);
rtnl_lock();
__fl_destroy_filter(f);
rtnl_unlock();
}
-static void fl_destroy_filter(struct rcu_head *head)
-{
- struct cls_fl_filter *f = container_of(head, struct cls_fl_filter, rcu);
-
- INIT_WORK(&f->work, fl_destroy_filter_work);
- tcf_queue_work(&f->work);
-}
-
static void fl_hw_destroy_filter(struct tcf_proto *tp, struct cls_fl_filter *f,
struct netlink_ext_ack *extack)
{
@@ -234,8 +253,6 @@
}
static int fl_hw_replace_filter(struct tcf_proto *tp,
- struct flow_dissector *dissector,
- struct fl_flow_key *mask,
struct cls_fl_filter *f,
struct netlink_ext_ack *extack)
{
@@ -247,8 +264,8 @@
tc_cls_common_offload_init(&cls_flower.common, tp, f->flags, extack);
cls_flower.command = TC_CLSFLOWER_REPLACE;
cls_flower.cookie = (unsigned long) f;
- cls_flower.dissector = dissector;
- cls_flower.mask = mask;
+ cls_flower.dissector = &f->mask->dissector;
+ cls_flower.mask = &f->mask->key;
cls_flower.key = &f->mkey;
cls_flower.exts = &f->exts;
cls_flower.classid = f->res.classid;
@@ -283,51 +300,52 @@
&cls_flower, false);
}
-static void __fl_delete(struct tcf_proto *tp, struct cls_fl_filter *f,
+static bool __fl_delete(struct tcf_proto *tp, struct cls_fl_filter *f,
struct netlink_ext_ack *extack)
{
struct cls_fl_head *head = rtnl_dereference(tp->root);
+ bool async = tcf_exts_get_net(&f->exts);
+ bool last;
idr_remove(&head->handle_idr, f->handle);
list_del_rcu(&f->list);
+ last = fl_mask_put(head, f->mask, async);
if (!tc_skip_hw(f->flags))
fl_hw_destroy_filter(tp, f, extack);
tcf_unbind_filter(tp, &f->res);
- if (tcf_exts_get_net(&f->exts))
- call_rcu(&f->rcu, fl_destroy_filter);
+ if (async)
+ tcf_queue_work(&f->rwork, fl_destroy_filter_work);
else
__fl_destroy_filter(f);
+
+ return last;
}
static void fl_destroy_sleepable(struct work_struct *work)
{
- struct cls_fl_head *head = container_of(work, struct cls_fl_head,
- work);
- if (head->mask_assigned)
- rhashtable_destroy(&head->ht);
+ struct cls_fl_head *head = container_of(to_rcu_work(work),
+ struct cls_fl_head,
+ rwork);
kfree(head);
module_put(THIS_MODULE);
}
-static void fl_destroy_rcu(struct rcu_head *rcu)
-{
- struct cls_fl_head *head = container_of(rcu, struct cls_fl_head, rcu);
-
- INIT_WORK(&head->work, fl_destroy_sleepable);
- schedule_work(&head->work);
-}
-
static void fl_destroy(struct tcf_proto *tp, struct netlink_ext_ack *extack)
{
struct cls_fl_head *head = rtnl_dereference(tp->root);
+ struct fl_flow_mask *mask, *next_mask;
struct cls_fl_filter *f, *next;
- list_for_each_entry_safe(f, next, &head->filters, list)
- __fl_delete(tp, f, extack);
+ list_for_each_entry_safe(mask, next_mask, &head->masks, list) {
+ list_for_each_entry_safe(f, next, &mask->filters, list) {
+ if (__fl_delete(tp, f, extack))
+ break;
+ }
+ }
idr_destroy(&head->handle_idr);
__module_get(THIS_MODULE);
- call_rcu(&head->rcu, fl_destroy_rcu);
+ tcf_queue_work(&head->rwork, fl_destroy_sleepable);
}
static void *fl_get(struct tcf_proto *tp, u32 handle)
@@ -715,14 +733,14 @@
return ret;
}
-static bool fl_mask_eq(struct fl_flow_mask *mask1,
- struct fl_flow_mask *mask2)
+static void fl_mask_copy(struct fl_flow_mask *dst,
+ struct fl_flow_mask *src)
{
- const long *lmask1 = fl_key_get_start(&mask1->key, mask1);
- const long *lmask2 = fl_key_get_start(&mask2->key, mask2);
+ const void *psrc = fl_key_get_start(&src->key, src);
+ void *pdst = fl_key_get_start(&dst->key, src);
- return !memcmp(&mask1->range, &mask2->range, sizeof(mask1->range)) &&
- !memcmp(lmask1, lmask2, fl_mask_range(mask1));
+ memcpy(pdst, psrc, fl_mask_range(src));
+ dst->range = src->range;
}
static const struct rhashtable_params fl_ht_params = {
@@ -731,14 +749,13 @@
.automatic_shrinking = true,
};
-static int fl_init_hashtable(struct cls_fl_head *head,
- struct fl_flow_mask *mask)
+static int fl_init_mask_hashtable(struct fl_flow_mask *mask)
{
- head->ht_params = fl_ht_params;
- head->ht_params.key_len = fl_mask_range(mask);
- head->ht_params.key_offset += mask->range.start;
+ mask->filter_ht_params = fl_ht_params;
+ mask->filter_ht_params.key_len = fl_mask_range(mask);
+ mask->filter_ht_params.key_offset += mask->range.start;
- return rhashtable_init(&head->ht, &head->ht_params);
+ return rhashtable_init(&mask->ht, &mask->filter_ht_params);
}
#define FL_KEY_MEMBER_OFFSET(member) offsetof(struct fl_flow_key, member)
@@ -761,8 +778,7 @@
FL_KEY_SET(keys, cnt, id, member); \
} while(0);
-static void fl_init_dissector(struct cls_fl_head *head,
- struct fl_flow_mask *mask)
+static void fl_init_dissector(struct fl_flow_mask *mask)
{
struct flow_dissector_key keys[FLOW_DISSECTOR_KEY_MAX];
size_t cnt = 0;
@@ -802,32 +818,67 @@
FL_KEY_SET_IF_MASKED(&mask->key, keys, cnt,
FLOW_DISSECTOR_KEY_ENC_PORTS, enc_tp);
- skb_flow_dissector_init(&head->dissector, keys, cnt);
+ skb_flow_dissector_init(&mask->dissector, keys, cnt);
+}
+
+static struct fl_flow_mask *fl_create_new_mask(struct cls_fl_head *head,
+ struct fl_flow_mask *mask)
+{
+ struct fl_flow_mask *newmask;
+ int err;
+
+ newmask = kzalloc(sizeof(*newmask), GFP_KERNEL);
+ if (!newmask)
+ return ERR_PTR(-ENOMEM);
+
+ fl_mask_copy(newmask, mask);
+
+ err = fl_init_mask_hashtable(newmask);
+ if (err)
+ goto errout_free;
+
+ fl_init_dissector(newmask);
+
+ INIT_LIST_HEAD_RCU(&newmask->filters);
+
+ err = rhashtable_insert_fast(&head->ht, &newmask->ht_node,
+ mask_ht_params);
+ if (err)
+ goto errout_destroy;
+
+ list_add_tail_rcu(&newmask->list, &head->masks);
+
+ return newmask;
+
+errout_destroy:
+ rhashtable_destroy(&newmask->ht);
+errout_free:
+ kfree(newmask);
+
+ return ERR_PTR(err);
}
static int fl_check_assign_mask(struct cls_fl_head *head,
+ struct cls_fl_filter *fnew,
+ struct cls_fl_filter *fold,
struct fl_flow_mask *mask)
{
- int err;
+ struct fl_flow_mask *newmask;
- if (head->mask_assigned) {
- if (!fl_mask_eq(&head->mask, mask))
+ fnew->mask = rhashtable_lookup_fast(&head->ht, mask, mask_ht_params);
+ if (!fnew->mask) {
+ if (fold)
return -EINVAL;
- else
- return 0;
+
+ newmask = fl_create_new_mask(head, mask);
+ if (IS_ERR(newmask))
+ return PTR_ERR(newmask);
+
+ fnew->mask = newmask;
+ } else if (fold && fold->mask == fnew->mask) {
+ return -EINVAL;
}
- /* Mask is not assigned yet. So assign it and init hashtable
- * according to that.
- */
- err = fl_init_hashtable(head, mask);
- if (err)
- return err;
- memcpy(&head->mask, mask, sizeof(head->mask));
- head->mask_assigned = true;
-
- fl_init_dissector(head, mask);
-
return 0;
}
@@ -924,30 +975,26 @@
if (err)
goto errout_idr;
- err = fl_check_assign_mask(head, &mask);
+ err = fl_check_assign_mask(head, fnew, fold, &mask);
if (err)
goto errout_idr;
if (!tc_skip_sw(fnew->flags)) {
- if (!fold && fl_lookup(head, &fnew->mkey)) {
+ if (!fold && fl_lookup(fnew->mask, &fnew->mkey)) {
err = -EEXIST;
- goto errout_idr;
+ goto errout_mask;
}
- err = rhashtable_insert_fast(&head->ht, &fnew->ht_node,
- head->ht_params);
+ err = rhashtable_insert_fast(&fnew->mask->ht, &fnew->ht_node,
+ fnew->mask->filter_ht_params);
if (err)
- goto errout_idr;
+ goto errout_mask;
}
if (!tc_skip_hw(fnew->flags)) {
- err = fl_hw_replace_filter(tp,
- &head->dissector,
- &mask.key,
- fnew,
- extack);
+ err = fl_hw_replace_filter(tp, fnew, extack);
if (err)
- goto errout_idr;
+ goto errout_mask;
}
if (!tc_in_hw(fnew->flags))
@@ -955,8 +1002,9 @@
if (fold) {
if (!tc_skip_sw(fold->flags))
- rhashtable_remove_fast(&head->ht, &fold->ht_node,
- head->ht_params);
+ rhashtable_remove_fast(&fold->mask->ht,
+ &fold->ht_node,
+ fold->mask->filter_ht_params);
if (!tc_skip_hw(fold->flags))
fl_hw_destroy_filter(tp, fold, NULL);
}
@@ -968,14 +1016,17 @@
list_replace_rcu(&fold->list, &fnew->list);
tcf_unbind_filter(tp, &fold->res);
tcf_exts_get_net(&fold->exts);
- call_rcu(&fold->rcu, fl_destroy_filter);
+ tcf_queue_work(&fold->rwork, fl_destroy_filter_work);
} else {
- list_add_tail_rcu(&fnew->list, &head->filters);
+ list_add_tail_rcu(&fnew->list, &fnew->mask->filters);
}
kfree(tb);
return 0;
+errout_mask:
+ fl_mask_put(head, fnew->mask, false);
+
errout_idr:
if (fnew->handle)
idr_remove(&head->handle_idr, fnew->handle);
@@ -994,10 +1045,10 @@
struct cls_fl_filter *f = arg;
if (!tc_skip_sw(f->flags))
- rhashtable_remove_fast(&head->ht, &f->ht_node,
- head->ht_params);
+ rhashtable_remove_fast(&f->mask->ht, &f->ht_node,
+ f->mask->filter_ht_params);
__fl_delete(tp, f, extack);
- *last = list_empty(&head->filters);
+ *last = list_empty(&head->masks);
return 0;
}
@@ -1005,16 +1056,19 @@
{
struct cls_fl_head *head = rtnl_dereference(tp->root);
struct cls_fl_filter *f;
+ struct fl_flow_mask *mask;
- list_for_each_entry_rcu(f, &head->filters, list) {
- if (arg->count < arg->skip)
- goto skip;
- if (arg->fn(tp, f, arg) < 0) {
- arg->stop = 1;
- break;
- }
+ list_for_each_entry_rcu(mask, &head->masks, list) {
+ list_for_each_entry_rcu(f, &mask->filters, list) {
+ if (arg->count < arg->skip)
+ goto skip;
+ if (arg->fn(tp, f, arg) < 0) {
+ arg->stop = 1;
+ break;
+ }
skip:
- arg->count++;
+ arg->count++;
+ }
}
}
@@ -1150,7 +1204,6 @@
static int fl_dump(struct net *net, struct tcf_proto *tp, void *fh,
struct sk_buff *skb, struct tcmsg *t)
{
- struct cls_fl_head *head = rtnl_dereference(tp->root);
struct cls_fl_filter *f = fh;
struct nlattr *nest;
struct fl_flow_key *key, *mask;
@@ -1169,7 +1222,7 @@
goto nla_put_failure;
key = &f->key;
- mask = &head->mask.key;
+ mask = &f->mask->key;
if (mask->indev_ifindex) {
struct net_device *dev;
diff --git a/net/sched/cls_fw.c b/net/sched/cls_fw.c
index 8b20772..29eeeaf 100644
--- a/net/sched/cls_fw.c
+++ b/net/sched/cls_fw.c
@@ -47,10 +47,7 @@
#endif /* CONFIG_NET_CLS_IND */
struct tcf_exts exts;
struct tcf_proto *tp;
- union {
- struct work_struct work;
- struct rcu_head rcu;
- };
+ struct rcu_work rwork;
};
static u32 fw_hash(u32 handle)
@@ -134,21 +131,14 @@
static void fw_delete_filter_work(struct work_struct *work)
{
- struct fw_filter *f = container_of(work, struct fw_filter, work);
-
+ struct fw_filter *f = container_of(to_rcu_work(work),
+ struct fw_filter,
+ rwork);
rtnl_lock();
__fw_delete_filter(f);
rtnl_unlock();
}
-static void fw_delete_filter(struct rcu_head *head)
-{
- struct fw_filter *f = container_of(head, struct fw_filter, rcu);
-
- INIT_WORK(&f->work, fw_delete_filter_work);
- tcf_queue_work(&f->work);
-}
-
static void fw_destroy(struct tcf_proto *tp, struct netlink_ext_ack *extack)
{
struct fw_head *head = rtnl_dereference(tp->root);
@@ -164,7 +154,7 @@
rtnl_dereference(f->next));
tcf_unbind_filter(tp, &f->res);
if (tcf_exts_get_net(&f->exts))
- call_rcu(&f->rcu, fw_delete_filter);
+ tcf_queue_work(&f->rwork, fw_delete_filter_work);
else
__fw_delete_filter(f);
}
@@ -193,7 +183,7 @@
RCU_INIT_POINTER(*fp, rtnl_dereference(f->next));
tcf_unbind_filter(tp, &f->res);
tcf_exts_get_net(&f->exts);
- call_rcu(&f->rcu, fw_delete_filter);
+ tcf_queue_work(&f->rwork, fw_delete_filter_work);
ret = 0;
break;
}
@@ -316,7 +306,7 @@
rcu_assign_pointer(*fp, fnew);
tcf_unbind_filter(tp, &f->res);
tcf_exts_get_net(&f->exts);
- call_rcu(&f->rcu, fw_delete_filter);
+ tcf_queue_work(&f->rwork, fw_delete_filter_work);
*arg = fnew;
return err;
diff --git a/net/sched/cls_matchall.c b/net/sched/cls_matchall.c
index 2ba721a..47b207e 100644
--- a/net/sched/cls_matchall.c
+++ b/net/sched/cls_matchall.c
@@ -21,10 +21,7 @@
struct tcf_result res;
u32 handle;
u32 flags;
- union {
- struct work_struct work;
- struct rcu_head rcu;
- };
+ struct rcu_work rwork;
};
static int mall_classify(struct sk_buff *skb, const struct tcf_proto *tp,
@@ -53,22 +50,14 @@
static void mall_destroy_work(struct work_struct *work)
{
- struct cls_mall_head *head = container_of(work, struct cls_mall_head,
- work);
+ struct cls_mall_head *head = container_of(to_rcu_work(work),
+ struct cls_mall_head,
+ rwork);
rtnl_lock();
__mall_destroy(head);
rtnl_unlock();
}
-static void mall_destroy_rcu(struct rcu_head *rcu)
-{
- struct cls_mall_head *head = container_of(rcu, struct cls_mall_head,
- rcu);
-
- INIT_WORK(&head->work, mall_destroy_work);
- tcf_queue_work(&head->work);
-}
-
static void mall_destroy_hw_filter(struct tcf_proto *tp,
struct cls_mall_head *head,
unsigned long cookie,
@@ -126,7 +115,7 @@
mall_destroy_hw_filter(tp, head, (unsigned long) head, extack);
if (tcf_exts_get_net(&head->exts))
- call_rcu(&head->rcu, mall_destroy_rcu);
+ tcf_queue_work(&head->rwork, mall_destroy_work);
else
__mall_destroy(head);
}
diff --git a/net/sched/cls_route.c b/net/sched/cls_route.c
index 21a03a8..0404aa5 100644
--- a/net/sched/cls_route.c
+++ b/net/sched/cls_route.c
@@ -57,10 +57,7 @@
u32 handle;
struct route4_bucket *bkt;
struct tcf_proto *tp;
- union {
- struct work_struct work;
- struct rcu_head rcu;
- };
+ struct rcu_work rwork;
};
#define ROUTE4_FAILURE ((struct route4_filter *)(-1L))
@@ -266,19 +263,17 @@
static void route4_delete_filter_work(struct work_struct *work)
{
- struct route4_filter *f = container_of(work, struct route4_filter, work);
-
+ struct route4_filter *f = container_of(to_rcu_work(work),
+ struct route4_filter,
+ rwork);
rtnl_lock();
__route4_delete_filter(f);
rtnl_unlock();
}
-static void route4_delete_filter(struct rcu_head *head)
+static void route4_queue_work(struct route4_filter *f)
{
- struct route4_filter *f = container_of(head, struct route4_filter, rcu);
-
- INIT_WORK(&f->work, route4_delete_filter_work);
- tcf_queue_work(&f->work);
+ tcf_queue_work(&f->rwork, route4_delete_filter_work);
}
static void route4_destroy(struct tcf_proto *tp, struct netlink_ext_ack *extack)
@@ -304,7 +299,7 @@
RCU_INIT_POINTER(b->ht[h2], next);
tcf_unbind_filter(tp, &f->res);
if (tcf_exts_get_net(&f->exts))
- call_rcu(&f->rcu, route4_delete_filter);
+ route4_queue_work(f);
else
__route4_delete_filter(f);
}
@@ -349,7 +344,7 @@
/* Delete it */
tcf_unbind_filter(tp, &f->res);
tcf_exts_get_net(&f->exts);
- call_rcu(&f->rcu, route4_delete_filter);
+ tcf_queue_work(&f->rwork, route4_delete_filter_work);
/* Strip RTNL protected tree */
for (i = 0; i <= 32; i++) {
@@ -554,7 +549,7 @@
if (fold) {
tcf_unbind_filter(tp, &fold->res);
tcf_exts_get_net(&fold->exts);
- call_rcu(&fold->rcu, route4_delete_filter);
+ tcf_queue_work(&fold->rwork, route4_delete_filter_work);
}
return 0;
diff --git a/net/sched/cls_rsvp.h b/net/sched/cls_rsvp.h
index 4f12976..e9ccf7d 100644
--- a/net/sched/cls_rsvp.h
+++ b/net/sched/cls_rsvp.h
@@ -97,10 +97,7 @@
u32 handle;
struct rsvp_session *sess;
- union {
- struct work_struct work;
- struct rcu_head rcu;
- };
+ struct rcu_work rwork;
};
static inline unsigned int hash_dst(__be32 *dst, u8 protocol, u8 tunnelid)
@@ -294,21 +291,14 @@
static void rsvp_delete_filter_work(struct work_struct *work)
{
- struct rsvp_filter *f = container_of(work, struct rsvp_filter, work);
-
+ struct rsvp_filter *f = container_of(to_rcu_work(work),
+ struct rsvp_filter,
+ rwork);
rtnl_lock();
__rsvp_delete_filter(f);
rtnl_unlock();
}
-static void rsvp_delete_filter_rcu(struct rcu_head *head)
-{
- struct rsvp_filter *f = container_of(head, struct rsvp_filter, rcu);
-
- INIT_WORK(&f->work, rsvp_delete_filter_work);
- tcf_queue_work(&f->work);
-}
-
static void rsvp_delete_filter(struct tcf_proto *tp, struct rsvp_filter *f)
{
tcf_unbind_filter(tp, &f->res);
@@ -317,7 +307,7 @@
* in cleanup() callback
*/
if (tcf_exts_get_net(&f->exts))
- call_rcu(&f->rcu, rsvp_delete_filter_rcu);
+ tcf_queue_work(&f->rwork, rsvp_delete_filter_work);
else
__rsvp_delete_filter(f);
}
diff --git a/net/sched/cls_tcindex.c b/net/sched/cls_tcindex.c
index b49cc99..32f4bbd 100644
--- a/net/sched/cls_tcindex.c
+++ b/net/sched/cls_tcindex.c
@@ -28,20 +28,14 @@
struct tcindex_filter_result {
struct tcf_exts exts;
struct tcf_result res;
- union {
- struct work_struct work;
- struct rcu_head rcu;
- };
+ struct rcu_work rwork;
};
struct tcindex_filter {
u16 key;
struct tcindex_filter_result result;
struct tcindex_filter __rcu *next;
- union {
- struct work_struct work;
- struct rcu_head rcu;
- };
+ struct rcu_work rwork;
};
@@ -152,21 +146,14 @@
{
struct tcindex_filter_result *r;
- r = container_of(work, struct tcindex_filter_result, work);
+ r = container_of(to_rcu_work(work),
+ struct tcindex_filter_result,
+ rwork);
rtnl_lock();
__tcindex_destroy_rexts(r);
rtnl_unlock();
}
-static void tcindex_destroy_rexts(struct rcu_head *head)
-{
- struct tcindex_filter_result *r;
-
- r = container_of(head, struct tcindex_filter_result, rcu);
- INIT_WORK(&r->work, tcindex_destroy_rexts_work);
- tcf_queue_work(&r->work);
-}
-
static void __tcindex_destroy_fexts(struct tcindex_filter *f)
{
tcf_exts_destroy(&f->result.exts);
@@ -176,23 +163,15 @@
static void tcindex_destroy_fexts_work(struct work_struct *work)
{
- struct tcindex_filter *f = container_of(work, struct tcindex_filter,
- work);
+ struct tcindex_filter *f = container_of(to_rcu_work(work),
+ struct tcindex_filter,
+ rwork);
rtnl_lock();
__tcindex_destroy_fexts(f);
rtnl_unlock();
}
-static void tcindex_destroy_fexts(struct rcu_head *head)
-{
- struct tcindex_filter *f = container_of(head, struct tcindex_filter,
- rcu);
-
- INIT_WORK(&f->work, tcindex_destroy_fexts_work);
- tcf_queue_work(&f->work);
-}
-
static int tcindex_delete(struct tcf_proto *tp, void *arg, bool *last,
struct netlink_ext_ack *extack)
{
@@ -228,12 +207,12 @@
*/
if (f) {
if (tcf_exts_get_net(&f->result.exts))
- call_rcu(&f->rcu, tcindex_destroy_fexts);
+ tcf_queue_work(&f->rwork, tcindex_destroy_fexts_work);
else
__tcindex_destroy_fexts(f);
} else {
if (tcf_exts_get_net(&r->exts))
- call_rcu(&r->rcu, tcindex_destroy_rexts);
+ tcf_queue_work(&r->rwork, tcindex_destroy_rexts_work);
else
__tcindex_destroy_rexts(r);
}
diff --git a/net/sched/cls_u32.c b/net/sched/cls_u32.c
index bac47b5..fb861f9 100644
--- a/net/sched/cls_u32.c
+++ b/net/sched/cls_u32.c
@@ -68,10 +68,7 @@
u32 __percpu *pcpu_success;
#endif
struct tcf_proto *tp;
- union {
- struct work_struct work;
- struct rcu_head rcu;
- };
+ struct rcu_work rwork;
/* The 'sel' field MUST be the last field in structure to allow for
* tc_u32_keys allocated at end of structure.
*/
@@ -436,21 +433,14 @@
*/
static void u32_delete_key_work(struct work_struct *work)
{
- struct tc_u_knode *key = container_of(work, struct tc_u_knode, work);
-
+ struct tc_u_knode *key = container_of(to_rcu_work(work),
+ struct tc_u_knode,
+ rwork);
rtnl_lock();
u32_destroy_key(key->tp, key, false);
rtnl_unlock();
}
-static void u32_delete_key_rcu(struct rcu_head *rcu)
-{
- struct tc_u_knode *key = container_of(rcu, struct tc_u_knode, rcu);
-
- INIT_WORK(&key->work, u32_delete_key_work);
- tcf_queue_work(&key->work);
-}
-
/* u32_delete_key_freepf_rcu is the rcu callback variant
* that free's the entire structure including the statistics
* percpu variables. Only use this if the key is not a copy
@@ -460,21 +450,14 @@
*/
static void u32_delete_key_freepf_work(struct work_struct *work)
{
- struct tc_u_knode *key = container_of(work, struct tc_u_knode, work);
-
+ struct tc_u_knode *key = container_of(to_rcu_work(work),
+ struct tc_u_knode,
+ rwork);
rtnl_lock();
u32_destroy_key(key->tp, key, true);
rtnl_unlock();
}
-static void u32_delete_key_freepf_rcu(struct rcu_head *rcu)
-{
- struct tc_u_knode *key = container_of(rcu, struct tc_u_knode, rcu);
-
- INIT_WORK(&key->work, u32_delete_key_freepf_work);
- tcf_queue_work(&key->work);
-}
-
static int u32_delete_key(struct tcf_proto *tp, struct tc_u_knode *key)
{
struct tc_u_knode __rcu **kp;
@@ -491,7 +474,7 @@
tcf_unbind_filter(tp, &key->res);
idr_remove(&ht->handle_idr, key->handle);
tcf_exts_get_net(&key->exts);
- call_rcu(&key->rcu, u32_delete_key_freepf_rcu);
+ tcf_queue_work(&key->rwork, u32_delete_key_freepf_work);
return 0;
}
}
@@ -611,7 +594,7 @@
u32_remove_hw_knode(tp, n, extack);
idr_remove(&ht->handle_idr, n->handle);
if (tcf_exts_get_net(&n->exts))
- call_rcu(&n->rcu, u32_delete_key_freepf_rcu);
+ tcf_queue_work(&n->rwork, u32_delete_key_freepf_work);
else
u32_destroy_key(n->tp, n, true);
}
@@ -995,7 +978,7 @@
u32_replace_knode(tp, tp_c, new);
tcf_unbind_filter(tp, &n->res);
tcf_exts_get_net(&n->exts);
- call_rcu(&n->rcu, u32_delete_key_rcu);
+ tcf_queue_work(&n->rwork, u32_delete_key_work);
return 0;
}
diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
index 39c144b..760ab1b 100644
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -373,33 +373,24 @@
*/
static inline bool qdisc_restart(struct Qdisc *q, int *packets)
{
- bool more, validate, nolock = q->flags & TCQ_F_NOLOCK;
spinlock_t *root_lock = NULL;
struct netdev_queue *txq;
struct net_device *dev;
struct sk_buff *skb;
+ bool validate;
/* Dequeue packet */
- if (nolock && test_and_set_bit(__QDISC_STATE_RUNNING, &q->state))
- return false;
-
skb = dequeue_skb(q, &validate, packets);
- if (unlikely(!skb)) {
- if (nolock)
- clear_bit(__QDISC_STATE_RUNNING, &q->state);
+ if (unlikely(!skb))
return false;
- }
- if (!nolock)
+ if (!(q->flags & TCQ_F_NOLOCK))
root_lock = qdisc_lock(q);
dev = qdisc_dev(q);
txq = skb_get_tx_queue(dev, skb);
- more = sch_direct_xmit(skb, q, dev, txq, root_lock, validate);
- if (nolock)
- clear_bit(__QDISC_STATE_RUNNING, &q->state);
- return more;
+ return sch_direct_xmit(skb, q, dev, txq, root_lock, validate);
}
void __qdisc_run(struct Qdisc *q)
@@ -665,7 +656,7 @@
if (__skb_array_empty(q))
continue;
- skb = skb_array_consume_bh(q);
+ skb = __skb_array_consume(q);
}
if (likely(skb)) {
qdisc_qstats_cpu_backlog_dec(qdisc, skb);
@@ -706,7 +697,7 @@
if (!q->ring.queue)
continue;
- while ((skb = skb_array_consume_bh(q)) != NULL)
+ while ((skb = __skb_array_consume(q)) != NULL)
kfree_skb(skb);
}
@@ -867,6 +858,11 @@
lockdep_set_class(&sch->busylock,
dev->qdisc_tx_busylock ?: &qdisc_tx_busylock);
+ /* seqlock has the same scope of busylock, for NOLOCK qdisc */
+ spin_lock_init(&sch->seqlock);
+ lockdep_set_class(&sch->busylock,
+ dev->qdisc_tx_busylock ?: &qdisc_tx_busylock);
+
seqcount_init(&sch->running);
lockdep_set_class(&sch->running,
dev->qdisc_running_key ?: &qdisc_running_key);
@@ -1106,6 +1102,10 @@
qdisc = rtnl_dereference(dev_queue->qdisc);
if (qdisc) {
+ bool nolock = qdisc->flags & TCQ_F_NOLOCK;
+
+ if (nolock)
+ spin_lock_bh(&qdisc->seqlock);
spin_lock_bh(qdisc_lock(qdisc));
if (!(qdisc->flags & TCQ_F_BUILTIN))
@@ -1115,6 +1115,8 @@
qdisc_reset(qdisc);
spin_unlock_bh(qdisc_lock(qdisc));
+ if (nolock)
+ spin_unlock_bh(&qdisc->seqlock);
}
}
@@ -1131,17 +1133,13 @@
dev_queue = netdev_get_tx_queue(dev, i);
q = dev_queue->qdisc_sleeping;
- if (q->flags & TCQ_F_NOLOCK) {
- val = test_bit(__QDISC_STATE_SCHED, &q->state);
- } else {
- root_lock = qdisc_lock(q);
- spin_lock_bh(root_lock);
+ root_lock = qdisc_lock(q);
+ spin_lock_bh(root_lock);
- val = (qdisc_is_running(q) ||
- test_bit(__QDISC_STATE_SCHED, &q->state));
+ val = (qdisc_is_running(q) ||
+ test_bit(__QDISC_STATE_SCHED, &q->state));
- spin_unlock_bh(root_lock);
- }
+ spin_unlock_bh(root_lock);
if (val)
return true;
diff --git a/net/sctp/associola.c b/net/sctp/associola.c
index a47179d..5d5a162 100644
--- a/net/sctp/associola.c
+++ b/net/sctp/associola.c
@@ -652,33 +652,20 @@
*/
peer->param_flags = asoc->param_flags;
- sctp_transport_route(peer, NULL, sp);
-
/* Initialize the pmtu of the transport. */
- if (peer->param_flags & SPP_PMTUD_DISABLE) {
- if (asoc->pathmtu)
- peer->pathmtu = asoc->pathmtu;
- else
- peer->pathmtu = SCTP_DEFAULT_MAXSEGMENT;
- }
+ sctp_transport_route(peer, NULL, sp);
/* If this is the first transport addr on this association,
* initialize the association PMTU to the peer's PMTU.
* If not and the current association PMTU is higher than the new
* peer's PMTU, reset the association PMTU to the new peer's PMTU.
*/
- if (asoc->pathmtu)
- asoc->pathmtu = min_t(int, peer->pathmtu, asoc->pathmtu);
- else
- asoc->pathmtu = peer->pathmtu;
-
- pr_debug("%s: association:%p PMTU set to %d\n", __func__, asoc,
- asoc->pathmtu);
+ sctp_assoc_set_pmtu(asoc, asoc->pathmtu ?
+ min_t(int, peer->pathmtu, asoc->pathmtu) :
+ peer->pathmtu);
peer->pmtu_pending = 0;
- asoc->frag_point = sctp_frag_point(asoc, asoc->pathmtu);
-
/* The asoc->peer.port might not be meaningful yet, but
* initialize the packet structure anyway.
*/
@@ -988,31 +975,6 @@
return match;
}
-/* Is this the association we are looking for? */
-struct sctp_transport *sctp_assoc_is_match(struct sctp_association *asoc,
- struct net *net,
- const union sctp_addr *laddr,
- const union sctp_addr *paddr)
-{
- struct sctp_transport *transport;
-
- if ((htons(asoc->base.bind_addr.port) == laddr->v4.sin_port) &&
- (htons(asoc->peer.port) == paddr->v4.sin_port) &&
- net_eq(sock_net(asoc->base.sk), net)) {
- transport = sctp_assoc_lookup_paddr(asoc, paddr);
- if (!transport)
- goto out;
-
- if (sctp_bind_addr_match(&asoc->base.bind_addr, laddr,
- sctp_sk(asoc->base.sk)))
- goto out;
- }
- transport = NULL;
-
-out:
- return transport;
-}
-
/* Do delayed input processing. This is scheduled by sctp_rcv(). */
static void sctp_assoc_bh_rcv(struct work_struct *work)
{
@@ -1434,6 +1396,31 @@
}
}
+void sctp_assoc_update_frag_point(struct sctp_association *asoc)
+{
+ int frag = sctp_mtu_payload(sctp_sk(asoc->base.sk), asoc->pathmtu,
+ sctp_datachk_len(&asoc->stream));
+
+ if (asoc->user_frag)
+ frag = min_t(int, frag, asoc->user_frag);
+
+ frag = min_t(int, frag, SCTP_MAX_CHUNK_LEN -
+ sctp_datachk_len(&asoc->stream));
+
+ asoc->frag_point = SCTP_TRUNC4(frag);
+}
+
+void sctp_assoc_set_pmtu(struct sctp_association *asoc, __u32 pmtu)
+{
+ if (asoc->pathmtu != pmtu) {
+ asoc->pathmtu = pmtu;
+ sctp_assoc_update_frag_point(asoc);
+ }
+
+ pr_debug("%s: asoc:%p, pmtu:%d, frag_point:%d\n", __func__, asoc,
+ asoc->pathmtu, asoc->frag_point);
+}
+
/* Update the association's pmtu and frag_point by going through all the
* transports. This routine is called when a transport's PMTU has changed.
*/
@@ -1446,24 +1433,16 @@
return;
/* Get the lowest pmtu of all the transports. */
- list_for_each_entry(t, &asoc->peer.transport_addr_list,
- transports) {
+ list_for_each_entry(t, &asoc->peer.transport_addr_list, transports) {
if (t->pmtu_pending && t->dst) {
- sctp_transport_update_pmtu(
- t, SCTP_TRUNC4(dst_mtu(t->dst)));
+ sctp_transport_update_pmtu(t, sctp_dst_mtu(t->dst));
t->pmtu_pending = 0;
}
if (!pmtu || (t->pathmtu < pmtu))
pmtu = t->pathmtu;
}
- if (pmtu) {
- asoc->pathmtu = pmtu;
- asoc->frag_point = sctp_frag_point(asoc, pmtu);
- }
-
- pr_debug("%s: asoc:%p, pmtu:%d, frag_point:%d\n", __func__, asoc,
- asoc->pathmtu, asoc->frag_point);
+ sctp_assoc_set_pmtu(asoc, pmtu);
}
/* Should we send a SACK to update our peer? */
diff --git a/net/sctp/chunk.c b/net/sctp/chunk.c
index be296d6..79daa98 100644
--- a/net/sctp/chunk.c
+++ b/net/sctp/chunk.c
@@ -172,8 +172,6 @@
struct list_head *pos, *temp;
struct sctp_chunk *chunk;
struct sctp_datamsg *msg;
- struct sctp_sock *sp;
- struct sctp_af *af;
int err;
msg = sctp_datamsg_new(GFP_KERNEL);
@@ -192,12 +190,7 @@
/* This is the biggest possible DATA chunk that can fit into
* the packet
*/
- sp = sctp_sk(asoc->base.sk);
- af = sp->pf->af;
- max_data = asoc->pathmtu - af->net_header_len -
- sizeof(struct sctphdr) - sctp_datachk_len(&asoc->stream) -
- af->ip_options_len(asoc->base.sk);
- max_data = SCTP_TRUNC4(max_data);
+ max_data = asoc->frag_point;
/* If the the peer requested that we authenticate DATA chunks
* we need to account for bundling of the AUTH chunks along with
@@ -222,9 +215,6 @@
}
}
- /* Check what's our max considering the above */
- max_data = min_t(size_t, max_data, asoc->frag_point);
-
/* Set first_len and then account for possible bundles on first frag */
first_len = max_data;
diff --git a/net/sctp/output.c b/net/sctp/output.c
index 690d855..e672dee 100644
--- a/net/sctp/output.c
+++ b/net/sctp/output.c
@@ -90,8 +90,8 @@
{
struct sctp_transport *tp = packet->transport;
struct sctp_association *asoc = tp->asoc;
+ struct sctp_sock *sp = NULL;
struct sock *sk;
- size_t overhead = sizeof(struct ipv6hdr) + sizeof(struct sctphdr);
pr_debug("%s: packet:%p vtag:0x%x\n", __func__, packet, vtag);
packet->vtag = vtag;
@@ -102,28 +102,20 @@
/* set packet max_size with pathmtu, then calculate overhead */
packet->max_size = tp->pathmtu;
- if (asoc) {
- struct sctp_sock *sp = sctp_sk(asoc->base.sk);
- struct sctp_af *af = sp->pf->af;
- overhead = af->net_header_len +
- af->ip_options_len(asoc->base.sk);
- overhead += sizeof(struct sctphdr);
- packet->overhead = overhead;
- packet->size = overhead;
- } else {
- packet->overhead = overhead;
- packet->size = overhead;
- return;
+ if (asoc) {
+ sk = asoc->base.sk;
+ sp = sctp_sk(sk);
}
+ packet->overhead = sctp_mtu_payload(sp, 0, 0);
+ packet->size = packet->overhead;
+
+ if (!asoc)
+ return;
/* update dst or transport pathmtu if in need */
- sk = asoc->base.sk;
if (!sctp_transport_dst_check(tp)) {
- sctp_transport_route(tp, NULL, sctp_sk(sk));
- if (asoc->param_flags & SPP_PMTUD_ENABLE)
- sctp_assoc_sync_pmtu(asoc);
- } else if (!sctp_transport_pmtu_check(tp)) {
+ sctp_transport_route(tp, NULL, sp);
if (asoc->param_flags & SPP_PMTUD_ENABLE)
sctp_assoc_sync_pmtu(asoc);
}
diff --git a/net/sctp/outqueue.c b/net/sctp/outqueue.c
index f211b3d..d68aa33 100644
--- a/net/sctp/outqueue.c
+++ b/net/sctp/outqueue.c
@@ -601,14 +601,14 @@
/*
* Transmit DATA chunks on the retransmit queue. Upon return from
- * sctp_outq_flush_rtx() the packet 'pkt' may contain chunks which
+ * __sctp_outq_flush_rtx() the packet 'pkt' may contain chunks which
* need to be transmitted by the caller.
* We assume that pkt->transport has already been set.
*
* The return value is a normal kernel error return value.
*/
-static int sctp_outq_flush_rtx(struct sctp_outq *q, struct sctp_packet *pkt,
- int rtx_timeout, int *start_timer)
+static int __sctp_outq_flush_rtx(struct sctp_outq *q, struct sctp_packet *pkt,
+ int rtx_timeout, int *start_timer, gfp_t gfp)
{
struct sctp_transport *transport = pkt->transport;
struct sctp_chunk *chunk, *chunk1;
@@ -684,12 +684,12 @@
* control chunks are already freed so there
* is nothing we can do.
*/
- sctp_packet_transmit(pkt, GFP_ATOMIC);
+ sctp_packet_transmit(pkt, gfp);
goto redo;
}
/* Send this packet. */
- error = sctp_packet_transmit(pkt, GFP_ATOMIC);
+ error = sctp_packet_transmit(pkt, gfp);
/* If we are retransmitting, we should only
* send a single packet.
@@ -705,7 +705,7 @@
case SCTP_XMIT_RWND_FULL:
/* Send this packet. */
- error = sctp_packet_transmit(pkt, GFP_ATOMIC);
+ error = sctp_packet_transmit(pkt, gfp);
/* Stop sending DATA as there is no more room
* at the receiver.
@@ -715,7 +715,7 @@
case SCTP_XMIT_DELAY:
/* Send this packet. */
- error = sctp_packet_transmit(pkt, GFP_ATOMIC);
+ error = sctp_packet_transmit(pkt, gfp);
/* Stop sending DATA because of nagle delay. */
done = 1;
@@ -776,68 +776,43 @@
sctp_outq_flush(q, 0, gfp);
}
-
-/*
- * Try to flush an outqueue.
- *
- * Description: Send everything in q which we legally can, subject to
- * congestion limitations.
- * * Note: This function can be called from multiple contexts so appropriate
- * locking concerns must be made. Today we use the sock lock to protect
- * this function.
- */
-static void sctp_outq_flush(struct sctp_outq *q, int rtx_timeout, gfp_t gfp)
+static int sctp_packet_singleton(struct sctp_transport *transport,
+ struct sctp_chunk *chunk, gfp_t gfp)
{
- struct sctp_packet *packet;
+ const struct sctp_association *asoc = transport->asoc;
+ const __u16 sport = asoc->base.bind_addr.port;
+ const __u16 dport = asoc->peer.port;
+ const __u32 vtag = asoc->peer.i.init_tag;
struct sctp_packet singleton;
- struct sctp_association *asoc = q->asoc;
- __u16 sport = asoc->base.bind_addr.port;
- __u16 dport = asoc->peer.port;
- __u32 vtag = asoc->peer.i.init_tag;
- struct sctp_transport *transport = NULL;
- struct sctp_transport *new_transport;
- struct sctp_chunk *chunk, *tmp;
- enum sctp_xmit status;
- int error = 0;
- int start_timer = 0;
- int one_packet = 0;
+ sctp_packet_init(&singleton, transport, sport, dport);
+ sctp_packet_config(&singleton, vtag, 0);
+ sctp_packet_append_chunk(&singleton, chunk);
+ return sctp_packet_transmit(&singleton, gfp);
+}
+
+/* Struct to hold the context during sctp outq flush */
+struct sctp_flush_ctx {
+ struct sctp_outq *q;
+ /* Current transport being used. It's NOT the same as curr active one */
+ struct sctp_transport *transport;
/* These transports have chunks to send. */
struct list_head transport_list;
- struct list_head *ltransport;
+ struct sctp_association *asoc;
+ /* Packet on the current transport above */
+ struct sctp_packet *packet;
+ gfp_t gfp;
+};
- INIT_LIST_HEAD(&transport_list);
- packet = NULL;
+/* transport: current transport */
+static void sctp_outq_select_transport(struct sctp_flush_ctx *ctx,
+ struct sctp_chunk *chunk)
+{
+ struct sctp_transport *new_transport = chunk->transport;
- /*
- * 6.10 Bundling
- * ...
- * When bundling control chunks with DATA chunks, an
- * endpoint MUST place control chunks first in the outbound
- * SCTP packet. The transmitter MUST transmit DATA chunks
- * within a SCTP packet in increasing order of TSN.
- * ...
- */
-
- list_for_each_entry_safe(chunk, tmp, &q->control_chunk_list, list) {
- /* RFC 5061, 5.3
- * F1) This means that until such time as the ASCONF
- * containing the add is acknowledged, the sender MUST
- * NOT use the new IP address as a source for ANY SCTP
- * packet except on carrying an ASCONF Chunk.
- */
- if (asoc->src_out_of_asoc_ok &&
- chunk->chunk_hdr->type != SCTP_CID_ASCONF)
- continue;
-
- list_del_init(&chunk->list);
-
- /* Pick the right transport to use. */
- new_transport = chunk->transport;
-
- if (!new_transport) {
- /*
- * If we have a prior transport pointer, see if
+ if (!new_transport) {
+ if (!sctp_chunk_is_data(chunk)) {
+ /* If we have a prior transport pointer, see if
* the destination address of the chunk
* matches the destination address of the
* current transport. If not a match, then
@@ -846,22 +821,26 @@
* after processing ASCONFs, we may have new
* transports created.
*/
- if (transport &&
- sctp_cmp_addr_exact(&chunk->dest,
- &transport->ipaddr))
- new_transport = transport;
+ if (ctx->transport && sctp_cmp_addr_exact(&chunk->dest,
+ &ctx->transport->ipaddr))
+ new_transport = ctx->transport;
else
- new_transport = sctp_assoc_lookup_paddr(asoc,
- &chunk->dest);
+ new_transport = sctp_assoc_lookup_paddr(ctx->asoc,
+ &chunk->dest);
+ }
- /* if we still don't have a new transport, then
- * use the current active path.
- */
- if (!new_transport)
- new_transport = asoc->peer.active_path;
- } else if ((new_transport->state == SCTP_INACTIVE) ||
- (new_transport->state == SCTP_UNCONFIRMED) ||
- (new_transport->state == SCTP_PF)) {
+ /* if we still don't have a new transport, then
+ * use the current active path.
+ */
+ if (!new_transport)
+ new_transport = ctx->asoc->peer.active_path;
+ } else {
+ __u8 type;
+
+ switch (new_transport->state) {
+ case SCTP_INACTIVE:
+ case SCTP_UNCONFIRMED:
+ case SCTP_PF:
/* If the chunk is Heartbeat or Heartbeat Ack,
* send it to chunk->transport, even if it's
* inactive.
@@ -875,29 +854,64 @@
*
* ASCONF_ACKs also must be sent to the source.
*/
- if (chunk->chunk_hdr->type != SCTP_CID_HEARTBEAT &&
- chunk->chunk_hdr->type != SCTP_CID_HEARTBEAT_ACK &&
- chunk->chunk_hdr->type != SCTP_CID_ASCONF_ACK)
- new_transport = asoc->peer.active_path;
+ type = chunk->chunk_hdr->type;
+ if (type != SCTP_CID_HEARTBEAT &&
+ type != SCTP_CID_HEARTBEAT_ACK &&
+ type != SCTP_CID_ASCONF_ACK)
+ new_transport = ctx->asoc->peer.active_path;
+ break;
+ default:
+ break;
}
+ }
- /* Are we switching transports?
- * Take care of transport locks.
+ /* Are we switching transports? Take care of transport locks. */
+ if (new_transport != ctx->transport) {
+ ctx->transport = new_transport;
+ ctx->packet = &ctx->transport->packet;
+
+ if (list_empty(&ctx->transport->send_ready))
+ list_add_tail(&ctx->transport->send_ready,
+ &ctx->transport_list);
+
+ sctp_packet_config(ctx->packet,
+ ctx->asoc->peer.i.init_tag,
+ ctx->asoc->peer.ecn_capable);
+ /* We've switched transports, so apply the
+ * Burst limit to the new transport.
*/
- if (new_transport != transport) {
- transport = new_transport;
- if (list_empty(&transport->send_ready)) {
- list_add_tail(&transport->send_ready,
- &transport_list);
- }
- packet = &transport->packet;
- sctp_packet_config(packet, vtag,
- asoc->peer.ecn_capable);
- }
+ sctp_transport_burst_limited(ctx->transport);
+ }
+}
+
+static void sctp_outq_flush_ctrl(struct sctp_flush_ctx *ctx)
+{
+ struct sctp_chunk *chunk, *tmp;
+ enum sctp_xmit status;
+ int one_packet, error;
+
+ list_for_each_entry_safe(chunk, tmp, &ctx->q->control_chunk_list, list) {
+ one_packet = 0;
+
+ /* RFC 5061, 5.3
+ * F1) This means that until such time as the ASCONF
+ * containing the add is acknowledged, the sender MUST
+ * NOT use the new IP address as a source for ANY SCTP
+ * packet except on carrying an ASCONF Chunk.
+ */
+ if (ctx->asoc->src_out_of_asoc_ok &&
+ chunk->chunk_hdr->type != SCTP_CID_ASCONF)
+ continue;
+
+ list_del_init(&chunk->list);
+
+ /* Pick the right transport to use. Should always be true for
+ * the first chunk as we don't have a transport by then.
+ */
+ sctp_outq_select_transport(ctx, chunk);
switch (chunk->chunk_hdr->type) {
- /*
- * 6.10 Bundling
+ /* 6.10 Bundling
* ...
* An endpoint MUST NOT bundle INIT, INIT ACK or SHUTDOWN
* COMPLETE with any other chunks. [Send them immediately.]
@@ -905,20 +919,19 @@
case SCTP_CID_INIT:
case SCTP_CID_INIT_ACK:
case SCTP_CID_SHUTDOWN_COMPLETE:
- sctp_packet_init(&singleton, transport, sport, dport);
- sctp_packet_config(&singleton, vtag, 0);
- sctp_packet_append_chunk(&singleton, chunk);
- error = sctp_packet_transmit(&singleton, gfp);
+ error = sctp_packet_singleton(ctx->transport, chunk,
+ ctx->gfp);
if (error < 0) {
- asoc->base.sk->sk_err = -error;
+ ctx->asoc->base.sk->sk_err = -error;
return;
}
break;
case SCTP_CID_ABORT:
if (sctp_test_T_bit(chunk))
- packet->vtag = asoc->c.my_vtag;
+ ctx->packet->vtag = ctx->asoc->c.my_vtag;
/* fallthru */
+
/* The following chunks are "response" chunks, i.e.
* they are generated in response to something we
* received. If we are sending these, then we can
@@ -942,27 +955,27 @@
case SCTP_CID_FWD_TSN:
case SCTP_CID_I_FWD_TSN:
case SCTP_CID_RECONF:
- status = sctp_packet_transmit_chunk(packet, chunk,
- one_packet, gfp);
- if (status != SCTP_XMIT_OK) {
+ status = sctp_packet_transmit_chunk(ctx->packet, chunk,
+ one_packet, ctx->gfp);
+ if (status != SCTP_XMIT_OK) {
/* put the chunk back */
- list_add(&chunk->list, &q->control_chunk_list);
+ list_add(&chunk->list, &ctx->q->control_chunk_list);
break;
}
- asoc->stats.octrlchunks++;
+ ctx->asoc->stats.octrlchunks++;
/* PR-SCTP C5) If a FORWARD TSN is sent, the
* sender MUST assure that at least one T3-rtx
* timer is running.
*/
if (chunk->chunk_hdr->type == SCTP_CID_FWD_TSN ||
chunk->chunk_hdr->type == SCTP_CID_I_FWD_TSN) {
- sctp_transport_reset_t3_rtx(transport);
- transport->last_time_sent = jiffies;
+ sctp_transport_reset_t3_rtx(ctx->transport);
+ ctx->transport->last_time_sent = jiffies;
}
- if (chunk == asoc->strreset_chunk)
- sctp_transport_reset_reconf_timer(transport);
+ if (chunk == ctx->asoc->strreset_chunk)
+ sctp_transport_reset_reconf_timer(ctx->transport);
break;
@@ -971,232 +984,186 @@
BUG();
}
}
+}
- if (q->asoc->src_out_of_asoc_ok)
- goto sctp_flush_out;
+/* Returns false if new data shouldn't be sent */
+static bool sctp_outq_flush_rtx(struct sctp_flush_ctx *ctx,
+ int rtx_timeout)
+{
+ int error, start_timer = 0;
+
+ if (ctx->asoc->peer.retran_path->state == SCTP_UNCONFIRMED)
+ return false;
+
+ if (ctx->transport != ctx->asoc->peer.retran_path) {
+ /* Switch transports & prepare the packet. */
+ ctx->transport = ctx->asoc->peer.retran_path;
+ ctx->packet = &ctx->transport->packet;
+
+ if (list_empty(&ctx->transport->send_ready))
+ list_add_tail(&ctx->transport->send_ready,
+ &ctx->transport_list);
+
+ sctp_packet_config(ctx->packet, ctx->asoc->peer.i.init_tag,
+ ctx->asoc->peer.ecn_capable);
+ }
+
+ error = __sctp_outq_flush_rtx(ctx->q, ctx->packet, rtx_timeout,
+ &start_timer, ctx->gfp);
+ if (error < 0)
+ ctx->asoc->base.sk->sk_err = -error;
+
+ if (start_timer) {
+ sctp_transport_reset_t3_rtx(ctx->transport);
+ ctx->transport->last_time_sent = jiffies;
+ }
+
+ /* This can happen on COOKIE-ECHO resend. Only
+ * one chunk can get bundled with a COOKIE-ECHO.
+ */
+ if (ctx->packet->has_cookie_echo)
+ return false;
+
+ /* Don't send new data if there is still data
+ * waiting to retransmit.
+ */
+ if (!list_empty(&ctx->q->retransmit))
+ return false;
+
+ return true;
+}
+
+static void sctp_outq_flush_data(struct sctp_flush_ctx *ctx,
+ int rtx_timeout)
+{
+ struct sctp_chunk *chunk;
+ enum sctp_xmit status;
/* Is it OK to send data chunks? */
- switch (asoc->state) {
+ switch (ctx->asoc->state) {
case SCTP_STATE_COOKIE_ECHOED:
/* Only allow bundling when this packet has a COOKIE-ECHO
* chunk.
*/
- if (!packet || !packet->has_cookie_echo)
- break;
+ if (!ctx->packet || !ctx->packet->has_cookie_echo)
+ return;
/* fallthru */
case SCTP_STATE_ESTABLISHED:
case SCTP_STATE_SHUTDOWN_PENDING:
case SCTP_STATE_SHUTDOWN_RECEIVED:
- /*
- * RFC 2960 6.1 Transmission of DATA Chunks
- *
- * C) When the time comes for the sender to transmit,
- * before sending new DATA chunks, the sender MUST
- * first transmit any outstanding DATA chunks which
- * are marked for retransmission (limited by the
- * current cwnd).
- */
- if (!list_empty(&q->retransmit)) {
- if (asoc->peer.retran_path->state == SCTP_UNCONFIRMED)
- goto sctp_flush_out;
- if (transport == asoc->peer.retran_path)
- goto retran;
-
- /* Switch transports & prepare the packet. */
-
- transport = asoc->peer.retran_path;
-
- if (list_empty(&transport->send_ready)) {
- list_add_tail(&transport->send_ready,
- &transport_list);
- }
-
- packet = &transport->packet;
- sctp_packet_config(packet, vtag,
- asoc->peer.ecn_capable);
- retran:
- error = sctp_outq_flush_rtx(q, packet,
- rtx_timeout, &start_timer);
- if (error < 0)
- asoc->base.sk->sk_err = -error;
-
- if (start_timer) {
- sctp_transport_reset_t3_rtx(transport);
- transport->last_time_sent = jiffies;
- }
-
- /* This can happen on COOKIE-ECHO resend. Only
- * one chunk can get bundled with a COOKIE-ECHO.
- */
- if (packet->has_cookie_echo)
- goto sctp_flush_out;
-
- /* Don't send new data if there is still data
- * waiting to retransmit.
- */
- if (!list_empty(&q->retransmit))
- goto sctp_flush_out;
- }
-
- /* Apply Max.Burst limitation to the current transport in
- * case it will be used for new data. We are going to
- * rest it before we return, but we want to apply the limit
- * to the currently queued data.
- */
- if (transport)
- sctp_transport_burst_limited(transport);
-
- /* Finally, transmit new packets. */
- while ((chunk = sctp_outq_dequeue_data(q)) != NULL) {
- __u32 sid = ntohs(chunk->subh.data_hdr->stream);
-
- /* Has this chunk expired? */
- if (sctp_chunk_abandoned(chunk)) {
- sctp_sched_dequeue_done(q, chunk);
- sctp_chunk_fail(chunk, 0);
- sctp_chunk_free(chunk);
- continue;
- }
-
- if (asoc->stream.out[sid].state == SCTP_STREAM_CLOSED) {
- sctp_outq_head_data(q, chunk);
- goto sctp_flush_out;
- }
-
- /* If there is a specified transport, use it.
- * Otherwise, we want to use the active path.
- */
- new_transport = chunk->transport;
- if (!new_transport ||
- ((new_transport->state == SCTP_INACTIVE) ||
- (new_transport->state == SCTP_UNCONFIRMED) ||
- (new_transport->state == SCTP_PF)))
- new_transport = asoc->peer.active_path;
- if (new_transport->state == SCTP_UNCONFIRMED) {
- WARN_ONCE(1, "Attempt to send packet on unconfirmed path.");
- sctp_sched_dequeue_done(q, chunk);
- sctp_chunk_fail(chunk, 0);
- sctp_chunk_free(chunk);
- continue;
- }
-
- /* Change packets if necessary. */
- if (new_transport != transport) {
- transport = new_transport;
-
- /* Schedule to have this transport's
- * packet flushed.
- */
- if (list_empty(&transport->send_ready)) {
- list_add_tail(&transport->send_ready,
- &transport_list);
- }
-
- packet = &transport->packet;
- sctp_packet_config(packet, vtag,
- asoc->peer.ecn_capable);
- /* We've switched transports, so apply the
- * Burst limit to the new transport.
- */
- sctp_transport_burst_limited(transport);
- }
-
- pr_debug("%s: outq:%p, chunk:%p[%s], tx-tsn:0x%x skb->head:%p "
- "skb->users:%d\n",
- __func__, q, chunk, chunk && chunk->chunk_hdr ?
- sctp_cname(SCTP_ST_CHUNK(chunk->chunk_hdr->type)) :
- "illegal chunk", ntohl(chunk->subh.data_hdr->tsn),
- chunk->skb ? chunk->skb->head : NULL, chunk->skb ?
- refcount_read(&chunk->skb->users) : -1);
-
- /* Add the chunk to the packet. */
- status = sctp_packet_transmit_chunk(packet, chunk, 0, gfp);
-
- switch (status) {
- case SCTP_XMIT_PMTU_FULL:
- case SCTP_XMIT_RWND_FULL:
- case SCTP_XMIT_DELAY:
- /* We could not append this chunk, so put
- * the chunk back on the output queue.
- */
- pr_debug("%s: could not transmit tsn:0x%x, status:%d\n",
- __func__, ntohl(chunk->subh.data_hdr->tsn),
- status);
-
- sctp_outq_head_data(q, chunk);
- goto sctp_flush_out;
-
- case SCTP_XMIT_OK:
- /* The sender is in the SHUTDOWN-PENDING state,
- * The sender MAY set the I-bit in the DATA
- * chunk header.
- */
- if (asoc->state == SCTP_STATE_SHUTDOWN_PENDING)
- chunk->chunk_hdr->flags |= SCTP_DATA_SACK_IMM;
- if (chunk->chunk_hdr->flags & SCTP_DATA_UNORDERED)
- asoc->stats.ouodchunks++;
- else
- asoc->stats.oodchunks++;
-
- /* Only now it's safe to consider this
- * chunk as sent, sched-wise.
- */
- sctp_sched_dequeue_done(q, chunk);
-
- break;
-
- default:
- BUG();
- }
-
- /* BUG: We assume that the sctp_packet_transmit()
- * call below will succeed all the time and add the
- * chunk to the transmitted list and restart the
- * timers.
- * It is possible that the call can fail under OOM
- * conditions.
- *
- * Is this really a problem? Won't this behave
- * like a lost TSN?
- */
- list_add_tail(&chunk->transmitted_list,
- &transport->transmitted);
-
- sctp_transport_reset_t3_rtx(transport);
- transport->last_time_sent = jiffies;
-
- /* Only let one DATA chunk get bundled with a
- * COOKIE-ECHO chunk.
- */
- if (packet->has_cookie_echo)
- goto sctp_flush_out;
- }
break;
default:
- /* Do nothing. */
- break;
+ /* Do nothing. */
+ return;
}
-sctp_flush_out:
-
- /* Before returning, examine all the transports touched in
- * this call. Right now, we bluntly force clear all the
- * transports. Things might change after we implement Nagle.
- * But such an examination is still required.
+ /* RFC 2960 6.1 Transmission of DATA Chunks
*
- * --xguo
+ * C) When the time comes for the sender to transmit,
+ * before sending new DATA chunks, the sender MUST
+ * first transmit any outstanding DATA chunks which
+ * are marked for retransmission (limited by the
+ * current cwnd).
*/
- while ((ltransport = sctp_list_dequeue(&transport_list)) != NULL) {
- struct sctp_transport *t = list_entry(ltransport,
- struct sctp_transport,
- send_ready);
+ if (!list_empty(&ctx->q->retransmit) &&
+ !sctp_outq_flush_rtx(ctx, rtx_timeout))
+ return;
+
+ /* Apply Max.Burst limitation to the current transport in
+ * case it will be used for new data. We are going to
+ * rest it before we return, but we want to apply the limit
+ * to the currently queued data.
+ */
+ if (ctx->transport)
+ sctp_transport_burst_limited(ctx->transport);
+
+ /* Finally, transmit new packets. */
+ while ((chunk = sctp_outq_dequeue_data(ctx->q)) != NULL) {
+ __u32 sid = ntohs(chunk->subh.data_hdr->stream);
+
+ /* Has this chunk expired? */
+ if (sctp_chunk_abandoned(chunk)) {
+ sctp_sched_dequeue_done(ctx->q, chunk);
+ sctp_chunk_fail(chunk, 0);
+ sctp_chunk_free(chunk);
+ continue;
+ }
+
+ if (ctx->asoc->stream.out[sid].state == SCTP_STREAM_CLOSED) {
+ sctp_outq_head_data(ctx->q, chunk);
+ break;
+ }
+
+ sctp_outq_select_transport(ctx, chunk);
+
+ pr_debug("%s: outq:%p, chunk:%p[%s], tx-tsn:0x%x skb->head:%p skb->users:%d\n",
+ __func__, ctx->q, chunk, chunk && chunk->chunk_hdr ?
+ sctp_cname(SCTP_ST_CHUNK(chunk->chunk_hdr->type)) :
+ "illegal chunk", ntohl(chunk->subh.data_hdr->tsn),
+ chunk->skb ? chunk->skb->head : NULL, chunk->skb ?
+ refcount_read(&chunk->skb->users) : -1);
+
+ /* Add the chunk to the packet. */
+ status = sctp_packet_transmit_chunk(ctx->packet, chunk, 0,
+ ctx->gfp);
+ if (status != SCTP_XMIT_OK) {
+ /* We could not append this chunk, so put
+ * the chunk back on the output queue.
+ */
+ pr_debug("%s: could not transmit tsn:0x%x, status:%d\n",
+ __func__, ntohl(chunk->subh.data_hdr->tsn),
+ status);
+
+ sctp_outq_head_data(ctx->q, chunk);
+ break;
+ }
+
+ /* The sender is in the SHUTDOWN-PENDING state,
+ * The sender MAY set the I-bit in the DATA
+ * chunk header.
+ */
+ if (ctx->asoc->state == SCTP_STATE_SHUTDOWN_PENDING)
+ chunk->chunk_hdr->flags |= SCTP_DATA_SACK_IMM;
+ if (chunk->chunk_hdr->flags & SCTP_DATA_UNORDERED)
+ ctx->asoc->stats.ouodchunks++;
+ else
+ ctx->asoc->stats.oodchunks++;
+
+ /* Only now it's safe to consider this
+ * chunk as sent, sched-wise.
+ */
+ sctp_sched_dequeue_done(ctx->q, chunk);
+
+ list_add_tail(&chunk->transmitted_list,
+ &ctx->transport->transmitted);
+
+ sctp_transport_reset_t3_rtx(ctx->transport);
+ ctx->transport->last_time_sent = jiffies;
+
+ /* Only let one DATA chunk get bundled with a
+ * COOKIE-ECHO chunk.
+ */
+ if (ctx->packet->has_cookie_echo)
+ break;
+ }
+}
+
+static void sctp_outq_flush_transports(struct sctp_flush_ctx *ctx)
+{
+ struct list_head *ltransport;
+ struct sctp_packet *packet;
+ struct sctp_transport *t;
+ int error = 0;
+
+ while ((ltransport = sctp_list_dequeue(&ctx->transport_list)) != NULL) {
+ t = list_entry(ltransport, struct sctp_transport, send_ready);
packet = &t->packet;
if (!sctp_packet_empty(packet)) {
- error = sctp_packet_transmit(packet, gfp);
+ error = sctp_packet_transmit(packet, ctx->gfp);
if (error < 0)
- asoc->base.sk->sk_err = -error;
+ ctx->q->asoc->base.sk->sk_err = -error;
}
/* Clear the burst limited state, if any */
@@ -1204,6 +1171,47 @@
}
}
+/* Try to flush an outqueue.
+ *
+ * Description: Send everything in q which we legally can, subject to
+ * congestion limitations.
+ * * Note: This function can be called from multiple contexts so appropriate
+ * locking concerns must be made. Today we use the sock lock to protect
+ * this function.
+ */
+
+static void sctp_outq_flush(struct sctp_outq *q, int rtx_timeout, gfp_t gfp)
+{
+ struct sctp_flush_ctx ctx = {
+ .q = q,
+ .transport = NULL,
+ .transport_list = LIST_HEAD_INIT(ctx.transport_list),
+ .asoc = q->asoc,
+ .packet = NULL,
+ .gfp = gfp,
+ };
+
+ /* 6.10 Bundling
+ * ...
+ * When bundling control chunks with DATA chunks, an
+ * endpoint MUST place control chunks first in the outbound
+ * SCTP packet. The transmitter MUST transmit DATA chunks
+ * within a SCTP packet in increasing order of TSN.
+ * ...
+ */
+
+ sctp_outq_flush_ctrl(&ctx);
+
+ if (q->asoc->src_out_of_asoc_ok)
+ goto sctp_flush_out;
+
+ sctp_outq_flush_data(&ctx, rtx_timeout);
+
+sctp_flush_out:
+
+ sctp_outq_flush_transports(&ctx);
+}
+
/* Update unack_data based on the incoming SACK chunk */
static void sctp_sack_update_unack_data(struct sctp_association *assoc,
struct sctp_sackhdr *sack)
@@ -1457,7 +1465,7 @@
* the outstanding bytes for this chunk, so only
* count bytes associated with a transport.
*/
- if (transport) {
+ if (transport && !tchunk->tsn_gap_acked) {
/* If this chunk is being used for RTT
* measurement, calculate the RTT and update
* the RTO using this value.
@@ -1469,14 +1477,34 @@
* first instance of the packet or a later
* instance).
*/
- if (!tchunk->tsn_gap_acked &&
- !sctp_chunk_retransmitted(tchunk) &&
+ if (!sctp_chunk_retransmitted(tchunk) &&
tchunk->rtt_in_progress) {
tchunk->rtt_in_progress = 0;
rtt = jiffies - tchunk->sent_at;
sctp_transport_update_rto(transport,
rtt);
}
+
+ if (TSN_lte(tsn, sack_ctsn)) {
+ /*
+ * SFR-CACC algorithm:
+ * 2) If the SACK contains gap acks
+ * and the flag CHANGEOVER_ACTIVE is
+ * set the receiver of the SACK MUST
+ * take the following action:
+ *
+ * B) For each TSN t being acked that
+ * has not been acked in any SACK so
+ * far, set cacc_saw_newack to 1 for
+ * the destination that the TSN was
+ * sent to.
+ */
+ if (sack->num_gap_ack_blocks &&
+ q->asoc->peer.primary_path->cacc.
+ changeover_active)
+ transport->cacc.cacc_saw_newack
+ = 1;
+ }
}
/* If the chunk hasn't been marked as ACKED,
@@ -1508,28 +1536,6 @@
restart_timer = 1;
forward_progress = true;
- if (!tchunk->tsn_gap_acked) {
- /*
- * SFR-CACC algorithm:
- * 2) If the SACK contains gap acks
- * and the flag CHANGEOVER_ACTIVE is
- * set the receiver of the SACK MUST
- * take the following action:
- *
- * B) For each TSN t being acked that
- * has not been acked in any SACK so
- * far, set cacc_saw_newack to 1 for
- * the destination that the TSN was
- * sent to.
- */
- if (transport &&
- sack->num_gap_ack_blocks &&
- q->asoc->peer.primary_path->cacc.
- changeover_active)
- transport->cacc.cacc_saw_newack
- = 1;
- }
-
list_add_tail(&tchunk->transmitted_list,
&q->sacked);
} else {
@@ -1756,7 +1762,7 @@
if (TSN_lte(tsn, ctsn))
goto pass;
- /* 3.3.4 Selective Acknowledgement (SACK) (3):
+ /* 3.3.4 Selective Acknowledgment (SACK) (3):
*
* Gap Ack Blocks:
* These fields contain the Gap Ack Blocks. They are repeated
diff --git a/net/sctp/sm_make_chunk.c b/net/sctp/sm_make_chunk.c
index e62addb..4a4fd19 100644
--- a/net/sctp/sm_make_chunk.c
+++ b/net/sctp/sm_make_chunk.c
@@ -81,8 +81,6 @@
gfp_t gfp);
static void *sctp_addto_param(struct sctp_chunk *chunk, int len,
const void *data);
-static void *sctp_addto_chunk_fixed(struct sctp_chunk *, int len,
- const void *data);
/* Control chunk destructor */
static void sctp_control_release_owner(struct sk_buff *skb)
@@ -154,12 +152,11 @@
cpu_to_be16(sizeof(struct sctp_paramhdr)),
};
-/* A helper to initialize an op error inside a
- * provided chunk, as most cause codes will be embedded inside an
- * abort chunk.
+/* A helper to initialize an op error inside a provided chunk, as most
+ * cause codes will be embedded inside an abort chunk.
*/
-void sctp_init_cause(struct sctp_chunk *chunk, __be16 cause_code,
- size_t paylen)
+int sctp_init_cause(struct sctp_chunk *chunk, __be16 cause_code,
+ size_t paylen)
{
struct sctp_errhdr err;
__u16 len;
@@ -167,33 +164,16 @@
/* Cause code constants are now defined in network order. */
err.cause = cause_code;
len = sizeof(err) + paylen;
- err.length = htons(len);
- chunk->subh.err_hdr = sctp_addto_chunk(chunk, sizeof(err), &err);
-}
-
-/* A helper to initialize an op error inside a
- * provided chunk, as most cause codes will be embedded inside an
- * abort chunk. Differs from sctp_init_cause in that it won't oops
- * if there isn't enough space in the op error chunk
- */
-static int sctp_init_cause_fixed(struct sctp_chunk *chunk, __be16 cause_code,
- size_t paylen)
-{
- struct sctp_errhdr err;
- __u16 len;
-
- /* Cause code constants are now defined in network order. */
- err.cause = cause_code;
- len = sizeof(err) + paylen;
- err.length = htons(len);
+ err.length = htons(len);
if (skb_tailroom(chunk->skb) < len)
return -ENOSPC;
- chunk->subh.err_hdr = sctp_addto_chunk_fixed(chunk, sizeof(err), &err);
+ chunk->subh.err_hdr = sctp_addto_chunk(chunk, sizeof(err), &err);
return 0;
}
+
/* 3.3.2 Initiation (INIT) (1)
*
* This chunk is used to initiate a SCTP association between two
@@ -779,10 +759,9 @@
* association. This reports on which TSN's we've seen to date,
* including duplicates and gaps.
*/
-struct sctp_chunk *sctp_make_sack(const struct sctp_association *asoc)
+struct sctp_chunk *sctp_make_sack(struct sctp_association *asoc)
{
struct sctp_tsnmap *map = (struct sctp_tsnmap *)&asoc->peer.tsn_map;
- struct sctp_association *aptr = (struct sctp_association *)asoc;
struct sctp_gap_ack_block gabs[SCTP_MAX_GABS];
__u16 num_gabs, num_dup_tsns;
struct sctp_transport *trans;
@@ -857,7 +836,7 @@
/* Add the duplicate TSN information. */
if (num_dup_tsns) {
- aptr->stats.idupchunks += num_dup_tsns;
+ asoc->stats.idupchunks += num_dup_tsns;
sctp_addto_chunk(retval, sizeof(__u32) * num_dup_tsns,
sctp_tsnmap_get_dups(map));
}
@@ -869,11 +848,11 @@
* association so no transport will match after a wrap event like this,
* Until the next sack
*/
- if (++aptr->peer.sack_generation == 0) {
+ if (++asoc->peer.sack_generation == 0) {
list_for_each_entry(trans, &asoc->peer.transport_addr_list,
transports)
trans->sack_generation = 0;
- aptr->peer.sack_generation = 1;
+ asoc->peer.sack_generation = 1;
}
nodata:
return retval;
@@ -1258,20 +1237,26 @@
return retval;
}
-/* Create an Operation Error chunk of a fixed size,
- * specifically, max(asoc->pathmtu, SCTP_DEFAULT_MAXSEGMENT)
- * This is a helper function to allocate an error chunk for
- * for those invalid parameter codes in which we may not want
- * to report all the errors, if the incoming chunk is large
+/* Create an Operation Error chunk of a fixed size, specifically,
+ * min(asoc->pathmtu, SCTP_DEFAULT_MAXSEGMENT) - overheads.
+ * This is a helper function to allocate an error chunk for for those
+ * invalid parameter codes in which we may not want to report all the
+ * errors, if the incoming chunk is large. If it can't fit in a single
+ * packet, we ignore it.
*/
-static inline struct sctp_chunk *sctp_make_op_error_fixed(
+static inline struct sctp_chunk *sctp_make_op_error_limited(
const struct sctp_association *asoc,
const struct sctp_chunk *chunk)
{
- size_t size = asoc ? asoc->pathmtu : 0;
+ size_t size = SCTP_DEFAULT_MAXSEGMENT;
+ struct sctp_sock *sp = NULL;
- if (!size)
- size = SCTP_DEFAULT_MAXSEGMENT;
+ if (asoc) {
+ size = min_t(size_t, size, asoc->pathmtu);
+ sp = sctp_sk(asoc->base.sk);
+ }
+
+ size = sctp_mtu_payload(sp, size, sizeof(struct sctp_errhdr));
return sctp_make_op_error_space(asoc, chunk, size);
}
@@ -1523,18 +1508,6 @@
return target;
}
-/* Append bytes to the end of a chunk. Returns NULL if there isn't sufficient
- * space in the chunk
- */
-static void *sctp_addto_chunk_fixed(struct sctp_chunk *chunk,
- int len, const void *data)
-{
- if (skb_tailroom(chunk->skb) >= len)
- return sctp_addto_chunk(chunk, len, data);
- else
- return NULL;
-}
-
/* Append bytes from user space to the end of a chunk. Will panic if
* chunk is not big enough.
* Returns a kernel err value.
@@ -1829,6 +1802,9 @@
kt = ktime_get_real();
if (!asoc && ktime_before(bear_cookie->expiration, kt)) {
+ suseconds_t usecs = ktime_to_us(ktime_sub(kt, bear_cookie->expiration));
+ __be32 n = htonl(usecs);
+
/*
* Section 3.3.10.3 Stale Cookie Error (3)
*
@@ -1837,17 +1813,12 @@
* Stale Cookie Error: Indicates the receipt of a valid State
* Cookie that has expired.
*/
- len = ntohs(chunk->chunk_hdr->length);
- *errp = sctp_make_op_error_space(asoc, chunk, len);
- if (*errp) {
- suseconds_t usecs = ktime_to_us(ktime_sub(kt, bear_cookie->expiration));
- __be32 n = htonl(usecs);
-
- sctp_init_cause(*errp, SCTP_ERROR_STALE_COOKIE,
- sizeof(n));
- sctp_addto_chunk(*errp, sizeof(n), &n);
+ *errp = sctp_make_op_error(asoc, chunk,
+ SCTP_ERROR_STALE_COOKIE, &n,
+ sizeof(n), 0);
+ if (*errp)
*error = -SCTP_IERROR_STALE_COOKIE;
- } else
+ else
*error = -SCTP_IERROR_NOMEM;
goto fail;
@@ -1998,12 +1969,8 @@
if (*errp)
sctp_chunk_free(*errp);
- *errp = sctp_make_op_error_space(asoc, chunk, len);
-
- if (*errp) {
- sctp_init_cause(*errp, SCTP_ERROR_DNS_FAILED, len);
- sctp_addto_chunk(*errp, len, param.v);
- }
+ *errp = sctp_make_op_error(asoc, chunk, SCTP_ERROR_DNS_FAILED,
+ param.v, len, 0);
/* Stop processing this chunk. */
return 0;
@@ -2128,23 +2095,23 @@
/* Make an ERROR chunk, preparing enough room for
* returning multiple unknown parameters.
*/
- if (NULL == *errp)
- *errp = sctp_make_op_error_fixed(asoc, chunk);
-
- if (*errp) {
- if (!sctp_init_cause_fixed(*errp, SCTP_ERROR_UNKNOWN_PARAM,
- SCTP_PAD4(ntohs(param.p->length))))
- sctp_addto_chunk_fixed(*errp,
- SCTP_PAD4(ntohs(param.p->length)),
- param.v);
- } else {
- /* If there is no memory for generating the ERROR
- * report as specified, an ABORT will be triggered
- * to the peer and the association won't be
- * established.
- */
- retval = SCTP_IERROR_NOMEM;
+ if (!*errp) {
+ *errp = sctp_make_op_error_limited(asoc, chunk);
+ if (!*errp) {
+ /* If there is no memory for generating the
+ * ERROR report as specified, an ABORT will be
+ * triggered to the peer and the association
+ * won't be established.
+ */
+ retval = SCTP_IERROR_NOMEM;
+ break;
+ }
}
+
+ if (!sctp_init_cause(*errp, SCTP_ERROR_UNKNOWN_PARAM,
+ ntohs(param.p->length)))
+ sctp_addto_chunk(*errp, ntohs(param.p->length),
+ param.v);
break;
default:
break;
@@ -2220,10 +2187,10 @@
* MUST be aborted. The ABORT chunk SHOULD contain the error
* cause 'Protocol Violation'.
*/
- if (SCTP_AUTH_RANDOM_LENGTH !=
- ntohs(param.p->length) - sizeof(struct sctp_paramhdr)) {
+ if (SCTP_AUTH_RANDOM_LENGTH != ntohs(param.p->length) -
+ sizeof(struct sctp_paramhdr)) {
sctp_process_inv_paramlength(asoc, param.p,
- chunk, err_chunk);
+ chunk, err_chunk);
retval = SCTP_IERROR_ABORT;
}
break;
diff --git a/net/sctp/socket.c b/net/sctp/socket.c
index ae7e7c6..ce620e8 100644
--- a/net/sctp/socket.c
+++ b/net/sctp/socket.c
@@ -644,16 +644,15 @@
list_for_each_entry(trans,
&asoc->peer.transport_addr_list, transports) {
- /* Clear the source and route cache */
- sctp_transport_dst_release(trans);
trans->cwnd = min(4*asoc->pathmtu, max_t(__u32,
2*asoc->pathmtu, 4380));
trans->ssthresh = asoc->peer.i.a_rwnd;
trans->rto = asoc->rto_initial;
sctp_max_rto(asoc, trans);
trans->rtt = trans->srtt = trans->rttvar = 0;
+ /* Clear the source and route cache */
sctp_transport_route(trans, NULL,
- sctp_sk(asoc->base.sk));
+ sctp_sk(asoc->base.sk));
}
}
retval = sctp_send_asconf(asoc, chunk);
@@ -896,7 +895,6 @@
*/
list_for_each_entry(transport, &asoc->peer.transport_addr_list,
transports) {
- sctp_transport_dst_release(transport);
sctp_transport_route(transport, NULL,
sctp_sk(asoc->base.sk));
}
@@ -1894,6 +1892,7 @@
struct sctp_sndrcvinfo *sinfo)
{
struct sock *sk = asoc->base.sk;
+ struct sctp_sock *sp = sctp_sk(sk);
struct net *net = sock_net(sk);
struct sctp_datamsg *datamsg;
bool wait_connect = false;
@@ -1912,13 +1911,16 @@
goto err;
}
- if (sctp_sk(sk)->disable_fragments && msg_len > asoc->frag_point) {
+ if (sp->disable_fragments && msg_len > asoc->frag_point) {
err = -EMSGSIZE;
goto err;
}
- if (asoc->pmtu_pending)
- sctp_assoc_pending_pmtu(asoc);
+ if (asoc->pmtu_pending) {
+ if (sp->param_flags & SPP_PMTUD_ENABLE)
+ sctp_assoc_sync_pmtu(asoc);
+ asoc->pmtu_pending = 0;
+ }
if (sctp_wspace(asoc) < msg_len)
sctp_prsctp_prune(asoc, sinfo, msg_len - sctp_wspace(asoc));
@@ -1935,7 +1937,7 @@
if (err)
goto err;
- if (sctp_sk(sk)->strm_interleave) {
+ if (sp->strm_interleave) {
timeo = sock_sndtimeo(sk, 0);
err = sctp_wait_for_connect(asoc, &timeo);
if (err)
@@ -2538,7 +2540,7 @@
trans->pathmtu = params->spp_pathmtu;
sctp_assoc_sync_pmtu(asoc);
} else if (asoc) {
- asoc->pathmtu = params->spp_pathmtu;
+ sctp_assoc_set_pmtu(asoc, params->spp_pathmtu);
} else {
sp->pathmtu = params->spp_pathmtu;
}
@@ -3208,7 +3210,6 @@
static int sctp_setsockopt_maxseg(struct sock *sk, char __user *optval, unsigned int optlen)
{
struct sctp_sock *sp = sctp_sk(sk);
- struct sctp_af *af = sp->pf->af;
struct sctp_assoc_value params;
struct sctp_association *asoc;
int val;
@@ -3230,30 +3231,24 @@
return -EINVAL;
}
+ asoc = sctp_id2assoc(sk, params.assoc_id);
+
if (val) {
int min_len, max_len;
+ __u16 datasize = asoc ? sctp_datachk_len(&asoc->stream) :
+ sizeof(struct sctp_data_chunk);
- min_len = SCTP_DEFAULT_MINSEGMENT - af->net_header_len;
- min_len -= af->ip_options_len(sk);
- min_len -= sizeof(struct sctphdr) +
- sizeof(struct sctp_data_chunk);
-
- max_len = SCTP_MAX_CHUNK_LEN - sizeof(struct sctp_data_chunk);
+ min_len = sctp_mtu_payload(sp, SCTP_DEFAULT_MINSEGMENT,
+ datasize);
+ max_len = SCTP_MAX_CHUNK_LEN - datasize;
if (val < min_len || val > max_len)
return -EINVAL;
}
- asoc = sctp_id2assoc(sk, params.assoc_id);
if (asoc) {
- if (val == 0) {
- val = asoc->pathmtu - af->net_header_len;
- val -= af->ip_options_len(sk);
- val -= sizeof(struct sctphdr) +
- sctp_datachk_len(&asoc->stream);
- }
asoc->user_frag = val;
- asoc->frag_point = sctp_frag_point(asoc, asoc->pathmtu);
+ sctp_assoc_update_frag_point(asoc);
} else {
if (params.assoc_id && sctp_style(sk, UDP))
return -EINVAL;
diff --git a/net/sctp/transport.c b/net/sctp/transport.c
index 47f82bd..4a95e26 100644
--- a/net/sctp/transport.c
+++ b/net/sctp/transport.c
@@ -242,9 +242,18 @@
&transport->fl, sk);
}
- if (transport->dst) {
- transport->pathmtu = SCTP_TRUNC4(dst_mtu(transport->dst));
- } else
+ if (transport->param_flags & SPP_PMTUD_DISABLE) {
+ struct sctp_association *asoc = transport->asoc;
+
+ if (!transport->pathmtu && asoc && asoc->pathmtu)
+ transport->pathmtu = asoc->pathmtu;
+ if (transport->pathmtu)
+ return;
+ }
+
+ if (transport->dst)
+ transport->pathmtu = sctp_dst_mtu(transport->dst);
+ else
transport->pathmtu = SCTP_DEFAULT_MAXSEGMENT;
}
@@ -290,6 +299,7 @@
struct sctp_association *asoc = transport->asoc;
struct sctp_af *af = transport->af_specific;
+ sctp_transport_dst_release(transport);
af->get_dst(transport, saddr, &transport->fl, sctp_opt2sk(opt));
if (saddr)
@@ -297,21 +307,14 @@
else
af->get_saddr(opt, transport, &transport->fl);
- if ((transport->param_flags & SPP_PMTUD_DISABLE) && transport->pathmtu) {
- return;
- }
- if (transport->dst) {
- transport->pathmtu = SCTP_TRUNC4(dst_mtu(transport->dst));
+ sctp_transport_pmtu(transport, sctp_opt2sk(opt));
- /* Initialize sk->sk_rcv_saddr, if the transport is the
- * association's active path for getsockname().
- */
- if (asoc && (!asoc->peer.primary_path ||
- (transport == asoc->peer.active_path)))
- opt->pf->to_sk_saddr(&transport->saddr,
- asoc->base.sk);
- } else
- transport->pathmtu = SCTP_DEFAULT_MAXSEGMENT;
+ /* Initialize sk->sk_rcv_saddr, if the transport is the
+ * association's active path for getsockname().
+ */
+ if (transport->dst && asoc &&
+ (!asoc->peer.primary_path || transport == asoc->peer.active_path))
+ opt->pf->to_sk_saddr(&transport->saddr, asoc->base.sk);
}
/* Hold a reference to a transport. */
diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c
index 544bab4..2c369d4 100644
--- a/net/smc/af_smc.c
+++ b/net/smc/af_smc.c
@@ -8,8 +8,6 @@
*
* Initial restrictions:
* - support for alternate links postponed
- * - partial support for non-blocking sockets only
- * - support for urgent data postponed
*
* Copyright IBM Corp. 2016, 2018
*
@@ -29,6 +27,7 @@
#include <net/sock.h>
#include <net/tcp.h>
#include <net/smc.h>
+#include <asm/ioctls.h>
#include "smc.h"
#include "smc_clc.h"
@@ -45,11 +44,6 @@
* creation
*/
-struct smc_lgr_list smc_lgr_list = { /* established link groups */
- .lock = __SPIN_LOCK_UNLOCKED(smc_lgr_list.lock),
- .list = LIST_HEAD_INIT(smc_lgr_list.list),
-};
-
static void smc_tcp_listen_work(struct work_struct *);
static void smc_set_keepalive(struct sock *sk, int val)
@@ -192,8 +186,10 @@
sk->sk_protocol = protocol;
smc = smc_sk(sk);
INIT_WORK(&smc->tcp_listen_work, smc_tcp_listen_work);
+ INIT_DELAYED_WORK(&smc->conn.tx_work, smc_tx_work);
INIT_LIST_HEAD(&smc->accept_q);
spin_lock_init(&smc->accept_q_lock);
+ spin_lock_init(&smc->conn.send_lock);
sk->sk_prot->hash(sk);
sk_refcnt_debug_inc(sk);
@@ -292,19 +288,28 @@
smc_copy_sock_settings(&smc->sk, smc->clcsock->sk, SK_FLAGS_CLC_TO_SMC);
}
-/* register a new rmb */
-static int smc_reg_rmb(struct smc_link *link, struct smc_buf_desc *rmb_desc)
+/* register a new rmb, optionally send confirm_rkey msg to register with peer */
+static int smc_reg_rmb(struct smc_link *link, struct smc_buf_desc *rmb_desc,
+ bool conf_rkey)
{
/* register memory region for new rmb */
if (smc_wr_reg_send(link, rmb_desc->mr_rx[SMC_SINGLE_LINK])) {
rmb_desc->regerr = 1;
return -EFAULT;
}
+ if (!conf_rkey)
+ return 0;
+ /* exchange confirm_rkey msg with peer */
+ if (smc_llc_do_confirm_rkey(link, rmb_desc)) {
+ rmb_desc->regerr = 1;
+ return -EFAULT;
+ }
return 0;
}
static int smc_clnt_conf_first_link(struct smc_sock *smc)
{
+ struct net *net = sock_net(smc->clcsock->sk);
struct smc_link_group *lgr = smc->conn.lgr;
struct smc_link *link;
int rest;
@@ -332,7 +337,7 @@
smc_wr_remember_qp_attr(link);
- if (smc_reg_rmb(link, smc->conn.rmb_desc))
+ if (smc_reg_rmb(link, smc->conn.rmb_desc, false))
return SMC_CLC_DECL_INTERR;
/* send CONFIRM LINK response over RoCE fabric */
@@ -362,7 +367,7 @@
if (rc < 0)
return SMC_CLC_DECL_TCL;
- link->state = SMC_LNK_ACTIVE;
+ smc_llc_link_active(link, net->ipv4.sysctl_tcp_keepalive_time);
return 0;
}
@@ -370,10 +375,13 @@
static void smc_conn_save_peer_info(struct smc_sock *smc,
struct smc_clc_msg_accept_confirm *clc)
{
- smc->conn.peer_conn_idx = clc->conn_idx;
+ int bufsize = smc_uncompress_bufsize(clc->rmbe_size);
+
+ smc->conn.peer_rmbe_idx = clc->rmbe_idx;
smc->conn.local_tx_ctrl.token = ntohl(clc->rmbe_alert_token);
- smc->conn.peer_rmbe_size = smc_uncompress_bufsize(clc->rmbe_size);
+ smc->conn.peer_rmbe_size = bufsize;
atomic_set(&smc->conn.peer_rmbe_space, smc->conn.peer_rmbe_size);
+ smc->conn.tx_off = bufsize * (smc->conn.peer_rmbe_idx - 1);
}
static void smc_link_save_peer_info(struct smc_link *link,
@@ -386,160 +394,186 @@
link->peer_mtu = clc->qp_mtu;
}
-/* setup for RDMA connection of client */
-static int smc_connect_rdma(struct smc_sock *smc)
+/* fall back during connect */
+static int smc_connect_fallback(struct smc_sock *smc)
{
- struct smc_clc_msg_accept_confirm aclc;
- int local_contact = SMC_FIRST_CONTACT;
- struct smc_ib_device *smcibdev;
- struct smc_link *link;
- u8 srv_first_contact;
+ smc->use_fallback = true;
+ smc_copy_sock_settings_to_clc(smc);
+ if (smc->sk.sk_state == SMC_INIT)
+ smc->sk.sk_state = SMC_ACTIVE;
+ return 0;
+}
+
+/* decline and fall back during connect */
+static int smc_connect_decline_fallback(struct smc_sock *smc, int reason_code)
+{
+ int rc;
+
+ if (reason_code < 0) /* error, fallback is not possible */
+ return reason_code;
+ if (reason_code != SMC_CLC_DECL_REPLY) {
+ rc = smc_clc_send_decline(smc, reason_code);
+ if (rc < 0)
+ return rc;
+ }
+ return smc_connect_fallback(smc);
+}
+
+/* abort connecting */
+static int smc_connect_abort(struct smc_sock *smc, int reason_code,
+ int local_contact)
+{
+ if (local_contact == SMC_FIRST_CONTACT)
+ smc_lgr_forget(smc->conn.lgr);
+ mutex_unlock(&smc_create_lgr_pending);
+ smc_conn_free(&smc->conn);
+ if (reason_code < 0 && smc->sk.sk_state == SMC_INIT)
+ sock_put(&smc->sk); /* passive closing */
+ return reason_code;
+}
+
+/* check if there is a rdma device available for this connection. */
+/* called for connect and listen */
+static int smc_check_rdma(struct smc_sock *smc, struct smc_ib_device **ibdev,
+ u8 *ibport)
+{
int reason_code = 0;
- int rc = 0;
- u8 ibport;
-
- sock_hold(&smc->sk); /* sock put in passive closing */
-
- if (!tcp_sk(smc->clcsock->sk)->syn_smc) {
- /* peer has not signalled SMC-capability */
- smc->use_fallback = true;
- goto out_connected;
- }
-
- /* IPSec connections opt out of SMC-R optimizations */
- if (using_ipsec(smc)) {
- reason_code = SMC_CLC_DECL_IPSEC;
- goto decline_rdma;
- }
/* PNET table look up: search active ib_device and port
* within same PNETID that also contains the ethernet device
* used for the internal TCP socket
*/
- smc_pnet_find_roce_resource(smc->clcsock->sk, &smcibdev, &ibport);
- if (!smcibdev) {
+ smc_pnet_find_roce_resource(smc->clcsock->sk, ibdev, ibport);
+ if (!(*ibdev))
reason_code = SMC_CLC_DECL_CNFERR; /* configuration error */
- goto decline_rdma;
- }
+
+ return reason_code;
+}
+
+/* CLC handshake during connect */
+static int smc_connect_clc(struct smc_sock *smc,
+ struct smc_clc_msg_accept_confirm *aclc,
+ struct smc_ib_device *ibdev, u8 ibport)
+{
+ int rc = 0;
/* do inband token exchange */
- reason_code = smc_clc_send_proposal(smc, smcibdev, ibport);
- if (reason_code < 0) {
- rc = reason_code;
- goto out_err;
- }
- if (reason_code > 0) /* configuration error */
- goto decline_rdma;
+ rc = smc_clc_send_proposal(smc, ibdev, ibport);
+ if (rc)
+ return rc;
/* receive SMC Accept CLC message */
- reason_code = smc_clc_wait_msg(smc, &aclc, sizeof(aclc),
- SMC_CLC_ACCEPT);
- if (reason_code < 0) {
- rc = reason_code;
- goto out_err;
- }
- if (reason_code > 0)
- goto decline_rdma;
+ return smc_clc_wait_msg(smc, aclc, sizeof(*aclc), SMC_CLC_ACCEPT);
+}
- srv_first_contact = aclc.hdr.flag;
+/* setup for RDMA connection of client */
+static int smc_connect_rdma(struct smc_sock *smc,
+ struct smc_clc_msg_accept_confirm *aclc,
+ struct smc_ib_device *ibdev, u8 ibport)
+{
+ int local_contact = SMC_FIRST_CONTACT;
+ struct smc_link *link;
+ int reason_code = 0;
+
mutex_lock(&smc_create_lgr_pending);
- local_contact = smc_conn_create(smc, smcibdev, ibport, &aclc.lcl,
- srv_first_contact);
+ local_contact = smc_conn_create(smc, ibdev, ibport, &aclc->lcl,
+ aclc->hdr.flag);
if (local_contact < 0) {
- rc = local_contact;
- if (rc == -ENOMEM)
+ if (local_contact == -ENOMEM)
reason_code = SMC_CLC_DECL_MEM;/* insufficient memory*/
- else if (rc == -ENOLINK)
+ else if (local_contact == -ENOLINK)
reason_code = SMC_CLC_DECL_SYNCERR; /* synchr. error */
- goto decline_rdma_unlock;
+ else
+ reason_code = SMC_CLC_DECL_INTERR; /* other error */
+ return smc_connect_abort(smc, reason_code, 0);
}
link = &smc->conn.lgr->lnk[SMC_SINGLE_LINK];
- smc_conn_save_peer_info(smc, &aclc);
+ smc_conn_save_peer_info(smc, aclc);
/* create send buffer and rmb */
- rc = smc_buf_create(smc);
- if (rc) {
- reason_code = SMC_CLC_DECL_MEM;
- goto decline_rdma_unlock;
- }
+ if (smc_buf_create(smc))
+ return smc_connect_abort(smc, SMC_CLC_DECL_MEM, local_contact);
if (local_contact == SMC_FIRST_CONTACT)
- smc_link_save_peer_info(link, &aclc);
+ smc_link_save_peer_info(link, aclc);
- rc = smc_rmb_rtoken_handling(&smc->conn, &aclc);
- if (rc) {
- reason_code = SMC_CLC_DECL_INTERR;
- goto decline_rdma_unlock;
- }
+ if (smc_rmb_rtoken_handling(&smc->conn, aclc))
+ return smc_connect_abort(smc, SMC_CLC_DECL_INTERR,
+ local_contact);
smc_close_init(smc);
smc_rx_init(smc);
if (local_contact == SMC_FIRST_CONTACT) {
- rc = smc_ib_ready_link(link);
- if (rc) {
- reason_code = SMC_CLC_DECL_INTERR;
- goto decline_rdma_unlock;
- }
+ if (smc_ib_ready_link(link))
+ return smc_connect_abort(smc, SMC_CLC_DECL_INTERR,
+ local_contact);
} else {
- if (!smc->conn.rmb_desc->reused) {
- if (smc_reg_rmb(link, smc->conn.rmb_desc)) {
- reason_code = SMC_CLC_DECL_INTERR;
- goto decline_rdma_unlock;
- }
- }
+ if (!smc->conn.rmb_desc->reused &&
+ smc_reg_rmb(link, smc->conn.rmb_desc, true))
+ return smc_connect_abort(smc, SMC_CLC_DECL_INTERR,
+ local_contact);
}
smc_rmb_sync_sg_for_device(&smc->conn);
- rc = smc_clc_send_confirm(smc);
- if (rc)
- goto out_err_unlock;
+ reason_code = smc_clc_send_confirm(smc);
+ if (reason_code)
+ return smc_connect_abort(smc, reason_code, local_contact);
+
+ smc_tx_init(smc);
if (local_contact == SMC_FIRST_CONTACT) {
/* QP confirmation over RoCE fabric */
reason_code = smc_clnt_conf_first_link(smc);
- if (reason_code < 0) {
- rc = reason_code;
- goto out_err_unlock;
- }
- if (reason_code > 0)
- goto decline_rdma_unlock;
+ if (reason_code)
+ return smc_connect_abort(smc, reason_code,
+ local_contact);
}
-
mutex_unlock(&smc_create_lgr_pending);
- smc_tx_init(smc);
-out_connected:
smc_copy_sock_settings_to_clc(smc);
if (smc->sk.sk_state == SMC_INIT)
smc->sk.sk_state = SMC_ACTIVE;
- return rc ? rc : local_contact;
+ return 0;
+}
-decline_rdma_unlock:
- if (local_contact == SMC_FIRST_CONTACT)
- smc_lgr_forget(smc->conn.lgr);
- mutex_unlock(&smc_create_lgr_pending);
- smc_conn_free(&smc->conn);
-decline_rdma:
- /* RDMA setup failed, switch back to TCP */
- smc->use_fallback = true;
- if (reason_code && (reason_code != SMC_CLC_DECL_REPLY)) {
- rc = smc_clc_send_decline(smc, reason_code);
- if (rc < 0)
- goto out_err;
- }
- goto out_connected;
+/* perform steps before actually connecting */
+static int __smc_connect(struct smc_sock *smc)
+{
+ struct smc_clc_msg_accept_confirm aclc;
+ struct smc_ib_device *ibdev;
+ int rc = 0;
+ u8 ibport;
-out_err_unlock:
- if (local_contact == SMC_FIRST_CONTACT)
- smc_lgr_forget(smc->conn.lgr);
- mutex_unlock(&smc_create_lgr_pending);
- smc_conn_free(&smc->conn);
-out_err:
- if (smc->sk.sk_state == SMC_INIT)
- sock_put(&smc->sk); /* passive closing */
- return rc;
+ sock_hold(&smc->sk); /* sock put in passive closing */
+
+ if (smc->use_fallback)
+ return smc_connect_fallback(smc);
+
+ /* if peer has not signalled SMC-capability, fall back */
+ if (!tcp_sk(smc->clcsock->sk)->syn_smc)
+ return smc_connect_fallback(smc);
+
+ /* IPSec connections opt out of SMC-R optimizations */
+ if (using_ipsec(smc))
+ return smc_connect_decline_fallback(smc, SMC_CLC_DECL_IPSEC);
+
+ /* check if a RDMA device is available; if not, fall back */
+ if (smc_check_rdma(smc, &ibdev, &ibport))
+ return smc_connect_decline_fallback(smc, SMC_CLC_DECL_CNFERR);
+
+ /* perform CLC handshake */
+ rc = smc_connect_clc(smc, &aclc, ibdev, ibport);
+ if (rc)
+ return smc_connect_decline_fallback(smc, rc);
+
+ /* connect using rdma */
+ rc = smc_connect_rdma(smc, &aclc, ibdev, ibport);
+ if (rc)
+ return smc_connect_decline_fallback(smc, rc);
+
+ return 0;
}
static int smc_connect(struct socket *sock, struct sockaddr *addr,
@@ -575,8 +609,7 @@
if (rc)
goto out;
- /* setup RDMA connection */
- rc = smc_connect_rdma(smc);
+ rc = __smc_connect(smc);
if (rc < 0)
goto out;
else
@@ -716,6 +749,7 @@
static int smc_serv_conf_first_link(struct smc_sock *smc)
{
+ struct net *net = sock_net(smc->clcsock->sk);
struct smc_link_group *lgr = smc->conn.lgr;
struct smc_link *link;
int rest;
@@ -723,7 +757,7 @@
link = &lgr->lnk[SMC_SINGLE_LINK];
- if (smc_reg_rmb(link, smc->conn.rmb_desc))
+ if (smc_reg_rmb(link, smc->conn.rmb_desc, false))
return SMC_CLC_DECL_INTERR;
/* send CONFIRM LINK request to client over the RoCE fabric */
@@ -768,147 +802,17 @@
return rc;
}
- link->state = SMC_LNK_ACTIVE;
+ smc_llc_link_active(link, net->ipv4.sysctl_tcp_keepalive_time);
return 0;
}
-/* setup for RDMA connection of server */
-static void smc_listen_work(struct work_struct *work)
+/* listen worker: finish */
+static void smc_listen_out(struct smc_sock *new_smc)
{
- struct smc_sock *new_smc = container_of(work, struct smc_sock,
- smc_listen_work);
- struct smc_clc_msg_proposal_prefix *pclc_prfx;
- struct socket *newclcsock = new_smc->clcsock;
struct smc_sock *lsmc = new_smc->listen_smc;
- struct smc_clc_msg_accept_confirm cclc;
- int local_contact = SMC_REUSE_CONTACT;
struct sock *newsmcsk = &new_smc->sk;
- struct smc_clc_msg_proposal *pclc;
- struct smc_ib_device *smcibdev;
- u8 buf[SMC_CLC_MAX_LEN];
- struct smc_link *link;
- int reason_code = 0;
- int rc = 0;
- u8 ibport;
- /* check if peer is smc capable */
- if (!tcp_sk(newclcsock->sk)->syn_smc) {
- new_smc->use_fallback = true;
- goto out_connected;
- }
-
- /* do inband token exchange -
- *wait for and receive SMC Proposal CLC message
- */
- reason_code = smc_clc_wait_msg(new_smc, &buf, sizeof(buf),
- SMC_CLC_PROPOSAL);
- if (reason_code < 0)
- goto out_err;
- if (reason_code > 0)
- goto decline_rdma;
-
- /* IPSec connections opt out of SMC-R optimizations */
- if (using_ipsec(new_smc)) {
- reason_code = SMC_CLC_DECL_IPSEC;
- goto decline_rdma;
- }
-
- /* PNET table look up: search active ib_device and port
- * within same PNETID that also contains the ethernet device
- * used for the internal TCP socket
- */
- smc_pnet_find_roce_resource(newclcsock->sk, &smcibdev, &ibport);
- if (!smcibdev) {
- reason_code = SMC_CLC_DECL_CNFERR; /* configuration error */
- goto decline_rdma;
- }
-
- pclc = (struct smc_clc_msg_proposal *)&buf;
- pclc_prfx = smc_clc_proposal_get_prefix(pclc);
-
- rc = smc_clc_prfx_match(newclcsock, pclc_prfx);
- if (rc) {
- reason_code = SMC_CLC_DECL_CNFERR; /* configuration error */
- goto decline_rdma;
- }
-
- /* allocate connection / link group */
- mutex_lock(&smc_create_lgr_pending);
- local_contact = smc_conn_create(new_smc, smcibdev, ibport, &pclc->lcl,
- 0);
- if (local_contact < 0) {
- rc = local_contact;
- if (rc == -ENOMEM)
- reason_code = SMC_CLC_DECL_MEM;/* insufficient memory*/
- goto decline_rdma_unlock;
- }
- link = &new_smc->conn.lgr->lnk[SMC_SINGLE_LINK];
-
- /* create send buffer and rmb */
- rc = smc_buf_create(new_smc);
- if (rc) {
- reason_code = SMC_CLC_DECL_MEM;
- goto decline_rdma_unlock;
- }
-
- smc_close_init(new_smc);
- smc_rx_init(new_smc);
-
- if (local_contact != SMC_FIRST_CONTACT) {
- if (!new_smc->conn.rmb_desc->reused) {
- if (smc_reg_rmb(link, new_smc->conn.rmb_desc)) {
- reason_code = SMC_CLC_DECL_INTERR;
- goto decline_rdma_unlock;
- }
- }
- }
- smc_rmb_sync_sg_for_device(&new_smc->conn);
-
- rc = smc_clc_send_accept(new_smc, local_contact);
- if (rc)
- goto out_err_unlock;
-
- /* receive SMC Confirm CLC message */
- reason_code = smc_clc_wait_msg(new_smc, &cclc, sizeof(cclc),
- SMC_CLC_CONFIRM);
- if (reason_code < 0)
- goto out_err_unlock;
- if (reason_code > 0)
- goto decline_rdma_unlock;
- smc_conn_save_peer_info(new_smc, &cclc);
- if (local_contact == SMC_FIRST_CONTACT)
- smc_link_save_peer_info(link, &cclc);
-
- rc = smc_rmb_rtoken_handling(&new_smc->conn, &cclc);
- if (rc) {
- reason_code = SMC_CLC_DECL_INTERR;
- goto decline_rdma_unlock;
- }
-
- if (local_contact == SMC_FIRST_CONTACT) {
- rc = smc_ib_ready_link(link);
- if (rc) {
- reason_code = SMC_CLC_DECL_INTERR;
- goto decline_rdma_unlock;
- }
- /* QP confirmation over RoCE fabric */
- reason_code = smc_serv_conf_first_link(new_smc);
- if (reason_code < 0)
- /* peer is not aware of a problem */
- goto out_err_unlock;
- if (reason_code > 0)
- goto decline_rdma_unlock;
- }
-
- smc_tx_init(new_smc);
- mutex_unlock(&smc_create_lgr_pending);
-
-out_connected:
- sk_refcnt_debug_inc(newsmcsk);
- if (newsmcsk->sk_state == SMC_INIT)
- newsmcsk->sk_state = SMC_ACTIVE;
-enqueue:
lock_sock_nested(&lsmc->sk, SINGLE_DEPTH_NESTING);
if (lsmc->sk.sk_state == SMC_LISTEN) {
smc_accept_enqueue(&lsmc->sk, newsmcsk);
@@ -920,32 +824,222 @@
/* Wake up accept */
lsmc->sk.sk_data_ready(&lsmc->sk);
sock_put(&lsmc->sk); /* sock_hold in smc_tcp_listen_work */
- return;
+}
-decline_rdma_unlock:
- if (local_contact == SMC_FIRST_CONTACT)
- smc_lgr_forget(new_smc->conn.lgr);
- mutex_unlock(&smc_create_lgr_pending);
-decline_rdma:
- /* RDMA setup failed, switch back to TCP */
- smc_conn_free(&new_smc->conn);
- new_smc->use_fallback = true;
- if (reason_code && (reason_code != SMC_CLC_DECL_REPLY)) {
- if (smc_clc_send_decline(new_smc, reason_code) < 0)
- goto out_err;
- }
- goto out_connected;
+/* listen worker: finish in state connected */
+static void smc_listen_out_connected(struct smc_sock *new_smc)
+{
+ struct sock *newsmcsk = &new_smc->sk;
-out_err_unlock:
- if (local_contact == SMC_FIRST_CONTACT)
- smc_lgr_forget(new_smc->conn.lgr);
- mutex_unlock(&smc_create_lgr_pending);
-out_err:
+ sk_refcnt_debug_inc(newsmcsk);
+ if (newsmcsk->sk_state == SMC_INIT)
+ newsmcsk->sk_state = SMC_ACTIVE;
+
+ smc_listen_out(new_smc);
+}
+
+/* listen worker: finish in error state */
+static void smc_listen_out_err(struct smc_sock *new_smc)
+{
+ struct sock *newsmcsk = &new_smc->sk;
+
if (newsmcsk->sk_state == SMC_INIT)
sock_put(&new_smc->sk); /* passive closing */
newsmcsk->sk_state = SMC_CLOSED;
smc_conn_free(&new_smc->conn);
- goto enqueue; /* queue new sock with sk_err set */
+
+ smc_listen_out(new_smc);
+}
+
+/* listen worker: decline and fall back if possible */
+static void smc_listen_decline(struct smc_sock *new_smc, int reason_code,
+ int local_contact)
+{
+ /* RDMA setup failed, switch back to TCP */
+ if (local_contact == SMC_FIRST_CONTACT)
+ smc_lgr_forget(new_smc->conn.lgr);
+ if (reason_code < 0) { /* error, no fallback possible */
+ smc_listen_out_err(new_smc);
+ return;
+ }
+ smc_conn_free(&new_smc->conn);
+ new_smc->use_fallback = true;
+ if (reason_code && reason_code != SMC_CLC_DECL_REPLY) {
+ if (smc_clc_send_decline(new_smc, reason_code) < 0) {
+ smc_listen_out_err(new_smc);
+ return;
+ }
+ }
+ smc_listen_out_connected(new_smc);
+}
+
+/* listen worker: check prefixes */
+static int smc_listen_rdma_check(struct smc_sock *new_smc,
+ struct smc_clc_msg_proposal *pclc)
+{
+ struct smc_clc_msg_proposal_prefix *pclc_prfx;
+ struct socket *newclcsock = new_smc->clcsock;
+
+ pclc_prfx = smc_clc_proposal_get_prefix(pclc);
+ if (smc_clc_prfx_match(newclcsock, pclc_prfx))
+ return SMC_CLC_DECL_CNFERR;
+
+ return 0;
+}
+
+/* listen worker: initialize connection and buffers */
+static int smc_listen_rdma_init(struct smc_sock *new_smc,
+ struct smc_clc_msg_proposal *pclc,
+ struct smc_ib_device *ibdev, u8 ibport,
+ int *local_contact)
+{
+ /* allocate connection / link group */
+ *local_contact = smc_conn_create(new_smc, ibdev, ibport, &pclc->lcl, 0);
+ if (*local_contact < 0) {
+ if (*local_contact == -ENOMEM)
+ return SMC_CLC_DECL_MEM;/* insufficient memory*/
+ return SMC_CLC_DECL_INTERR; /* other error */
+ }
+
+ /* create send buffer and rmb */
+ if (smc_buf_create(new_smc))
+ return SMC_CLC_DECL_MEM;
+
+ return 0;
+}
+
+/* listen worker: register buffers */
+static int smc_listen_rdma_reg(struct smc_sock *new_smc, int local_contact)
+{
+ struct smc_link *link = &new_smc->conn.lgr->lnk[SMC_SINGLE_LINK];
+
+ if (local_contact != SMC_FIRST_CONTACT) {
+ if (!new_smc->conn.rmb_desc->reused) {
+ if (smc_reg_rmb(link, new_smc->conn.rmb_desc, true))
+ return SMC_CLC_DECL_INTERR;
+ }
+ }
+ smc_rmb_sync_sg_for_device(&new_smc->conn);
+
+ return 0;
+}
+
+/* listen worker: finish RDMA setup */
+static void smc_listen_rdma_finish(struct smc_sock *new_smc,
+ struct smc_clc_msg_accept_confirm *cclc,
+ int local_contact)
+{
+ struct smc_link *link = &new_smc->conn.lgr->lnk[SMC_SINGLE_LINK];
+ int reason_code = 0;
+
+ if (local_contact == SMC_FIRST_CONTACT)
+ smc_link_save_peer_info(link, cclc);
+
+ if (smc_rmb_rtoken_handling(&new_smc->conn, cclc)) {
+ reason_code = SMC_CLC_DECL_INTERR;
+ goto decline;
+ }
+
+ if (local_contact == SMC_FIRST_CONTACT) {
+ if (smc_ib_ready_link(link)) {
+ reason_code = SMC_CLC_DECL_INTERR;
+ goto decline;
+ }
+ /* QP confirmation over RoCE fabric */
+ reason_code = smc_serv_conf_first_link(new_smc);
+ if (reason_code)
+ goto decline;
+ }
+ return;
+
+decline:
+ mutex_unlock(&smc_create_lgr_pending);
+ smc_listen_decline(new_smc, reason_code, local_contact);
+}
+
+/* setup for RDMA connection of server */
+static void smc_listen_work(struct work_struct *work)
+{
+ struct smc_sock *new_smc = container_of(work, struct smc_sock,
+ smc_listen_work);
+ struct socket *newclcsock = new_smc->clcsock;
+ struct smc_clc_msg_accept_confirm cclc;
+ struct smc_clc_msg_proposal *pclc;
+ struct smc_ib_device *ibdev;
+ u8 buf[SMC_CLC_MAX_LEN];
+ int local_contact = 0;
+ int reason_code = 0;
+ int rc = 0;
+ u8 ibport;
+
+ if (new_smc->use_fallback) {
+ smc_listen_out_connected(new_smc);
+ return;
+ }
+
+ /* check if peer is smc capable */
+ if (!tcp_sk(newclcsock->sk)->syn_smc) {
+ new_smc->use_fallback = true;
+ smc_listen_out_connected(new_smc);
+ return;
+ }
+
+ /* do inband token exchange -
+ * wait for and receive SMC Proposal CLC message
+ */
+ pclc = (struct smc_clc_msg_proposal *)&buf;
+ reason_code = smc_clc_wait_msg(new_smc, pclc, SMC_CLC_MAX_LEN,
+ SMC_CLC_PROPOSAL);
+ if (reason_code) {
+ smc_listen_decline(new_smc, reason_code, 0);
+ return;
+ }
+
+ /* IPSec connections opt out of SMC-R optimizations */
+ if (using_ipsec(new_smc)) {
+ smc_listen_decline(new_smc, SMC_CLC_DECL_IPSEC, 0);
+ return;
+ }
+
+ mutex_lock(&smc_create_lgr_pending);
+ smc_close_init(new_smc);
+ smc_rx_init(new_smc);
+ smc_tx_init(new_smc);
+
+ /* check if RDMA is available */
+ if (smc_check_rdma(new_smc, &ibdev, &ibport) ||
+ smc_listen_rdma_check(new_smc, pclc) ||
+ smc_listen_rdma_init(new_smc, pclc, ibdev, ibport,
+ &local_contact) ||
+ smc_listen_rdma_reg(new_smc, local_contact)) {
+ /* SMC not supported, decline */
+ mutex_unlock(&smc_create_lgr_pending);
+ smc_listen_decline(new_smc, SMC_CLC_DECL_CNFERR, local_contact);
+ return;
+ }
+
+ /* send SMC Accept CLC message */
+ rc = smc_clc_send_accept(new_smc, local_contact);
+ if (rc) {
+ mutex_unlock(&smc_create_lgr_pending);
+ smc_listen_decline(new_smc, rc, local_contact);
+ return;
+ }
+
+ /* receive SMC Confirm CLC message */
+ reason_code = smc_clc_wait_msg(new_smc, &cclc, sizeof(cclc),
+ SMC_CLC_CONFIRM);
+ if (reason_code) {
+ mutex_unlock(&smc_create_lgr_pending);
+ smc_listen_decline(new_smc, reason_code, local_contact);
+ return;
+ }
+
+ /* finish worker */
+ smc_listen_rdma_finish(new_smc, &cclc, local_contact);
+ smc_conn_save_peer_info(new_smc, &cclc);
+ mutex_unlock(&smc_create_lgr_pending);
+ smc_listen_out_connected(new_smc);
}
static void smc_tcp_listen_work(struct work_struct *work)
@@ -965,7 +1059,7 @@
continue;
new_smc->listen_smc = lsmc;
- new_smc->use_fallback = false; /* assume rdma capability first*/
+ new_smc->use_fallback = lsmc->use_fallback;
sock_hold(lsk); /* sock_put in smc_listen_work */
INIT_WORK(&new_smc->smc_listen_work, smc_listen_work);
smc_copy_sock_settings_to_smc(new_smc);
@@ -1001,7 +1095,8 @@
* them to the clc socket -- copy smc socket options to clc socket
*/
smc_copy_sock_settings_to_clc(smc);
- tcp_sk(smc->clcsock->sk)->syn_smc = 1;
+ if (!smc->use_fallback)
+ tcp_sk(smc->clcsock->sk)->syn_smc = 1;
rc = kernel_listen(smc->clcsock, backlog);
if (rc)
@@ -1034,6 +1129,7 @@
if (lsmc->sk.sk_state != SMC_LISTEN) {
rc = -EINVAL;
+ release_sock(sk);
goto out;
}
@@ -1061,9 +1157,29 @@
if (!rc)
rc = sock_error(nsk);
+ release_sock(sk);
+ if (rc)
+ goto out;
+
+ if (lsmc->sockopt_defer_accept && !(flags & O_NONBLOCK)) {
+ /* wait till data arrives on the socket */
+ timeo = msecs_to_jiffies(lsmc->sockopt_defer_accept *
+ MSEC_PER_SEC);
+ if (smc_sk(nsk)->use_fallback) {
+ struct sock *clcsk = smc_sk(nsk)->clcsock->sk;
+
+ lock_sock(clcsk);
+ if (skb_queue_empty(&clcsk->sk_receive_queue))
+ sk_wait_data(clcsk, &timeo, NULL);
+ release_sock(clcsk);
+ } else if (!atomic_read(&smc_sk(nsk)->conn.bytes_to_rcv)) {
+ lock_sock(nsk);
+ smc_rx_wait(smc_sk(nsk), &timeo, smc_rx_data_available);
+ release_sock(nsk);
+ }
+ }
out:
- release_sock(sk);
sock_put(sk); /* sock_hold above */
return rc;
}
@@ -1094,6 +1210,16 @@
(sk->sk_state != SMC_APPCLOSEWAIT1) &&
(sk->sk_state != SMC_INIT))
goto out;
+
+ if (msg->msg_flags & MSG_FASTOPEN) {
+ if (sk->sk_state == SMC_INIT) {
+ smc->use_fallback = true;
+ } else {
+ rc = -EINVAL;
+ goto out;
+ }
+ }
+
if (smc->use_fallback)
rc = smc->clcsock->ops->sendmsg(smc->clcsock, msg, len);
else
@@ -1122,10 +1248,12 @@
goto out;
}
- if (smc->use_fallback)
+ if (smc->use_fallback) {
rc = smc->clcsock->ops->recvmsg(smc->clcsock, msg, len, flags);
- else
- rc = smc_rx_recvmsg(smc, msg, len, flags);
+ } else {
+ msg->msg_namelen = 0;
+ rc = smc_rx_recvmsg(smc, msg, NULL, len, flags);
+ }
out:
release_sock(sk);
@@ -1172,7 +1300,7 @@
if (sk->sk_state == SMC_INIT &&
mask & EPOLLOUT &&
smc->clcsock->sk->sk_state != TCP_CLOSE) {
- rc = smc_connect_rdma(smc);
+ rc = __smc_connect(smc);
if (rc < 0)
mask |= EPOLLERR;
/* success cases including fallback */
@@ -1208,6 +1336,8 @@
if (sk->sk_state == SMC_APPCLOSEWAIT1)
mask |= EPOLLIN;
}
+ if (smc->conn.urg_state == SMC_URG_VALID)
+ mask |= EPOLLPRI;
}
release_sock(sk);
@@ -1273,14 +1403,64 @@
{
struct sock *sk = sock->sk;
struct smc_sock *smc;
+ int val, rc;
smc = smc_sk(sk);
/* generic setsockopts reaching us here always apply to the
* CLC socket
*/
- return smc->clcsock->ops->setsockopt(smc->clcsock, level, optname,
- optval, optlen);
+ rc = smc->clcsock->ops->setsockopt(smc->clcsock, level, optname,
+ optval, optlen);
+ if (smc->clcsock->sk->sk_err) {
+ sk->sk_err = smc->clcsock->sk->sk_err;
+ sk->sk_error_report(sk);
+ }
+ if (rc)
+ return rc;
+
+ if (optlen < sizeof(int))
+ return rc;
+ get_user(val, (int __user *)optval);
+
+ lock_sock(sk);
+ switch (optname) {
+ case TCP_ULP:
+ case TCP_FASTOPEN:
+ case TCP_FASTOPEN_CONNECT:
+ case TCP_FASTOPEN_KEY:
+ case TCP_FASTOPEN_NO_COOKIE:
+ /* option not supported by SMC */
+ if (sk->sk_state == SMC_INIT) {
+ smc->use_fallback = true;
+ } else {
+ if (!smc->use_fallback)
+ rc = -EINVAL;
+ }
+ break;
+ case TCP_NODELAY:
+ if (sk->sk_state != SMC_INIT && sk->sk_state != SMC_LISTEN) {
+ if (val && !smc->use_fallback)
+ mod_delayed_work(system_wq, &smc->conn.tx_work,
+ 0);
+ }
+ break;
+ case TCP_CORK:
+ if (sk->sk_state != SMC_INIT && sk->sk_state != SMC_LISTEN) {
+ if (!val && !smc->use_fallback)
+ mod_delayed_work(system_wq, &smc->conn.tx_work,
+ 0);
+ }
+ break;
+ case TCP_DEFER_ACCEPT:
+ smc->sockopt_defer_accept = val;
+ break;
+ default:
+ break;
+ }
+ release_sock(sk);
+
+ return rc;
}
static int smc_getsockopt(struct socket *sock, int level, int optname,
@@ -1297,13 +1477,71 @@
static int smc_ioctl(struct socket *sock, unsigned int cmd,
unsigned long arg)
{
+ union smc_host_cursor cons, urg;
+ struct smc_connection *conn;
struct smc_sock *smc;
+ int answ;
smc = smc_sk(sock->sk);
- if (smc->use_fallback)
+ conn = &smc->conn;
+ if (smc->use_fallback) {
+ if (!smc->clcsock)
+ return -EBADF;
return smc->clcsock->ops->ioctl(smc->clcsock, cmd, arg);
- else
- return sock_no_ioctl(sock, cmd, arg);
+ }
+ switch (cmd) {
+ case SIOCINQ: /* same as FIONREAD */
+ if (smc->sk.sk_state == SMC_LISTEN)
+ return -EINVAL;
+ if (smc->sk.sk_state == SMC_INIT ||
+ smc->sk.sk_state == SMC_CLOSED)
+ answ = 0;
+ else
+ answ = atomic_read(&smc->conn.bytes_to_rcv);
+ break;
+ case SIOCOUTQ:
+ /* output queue size (not send + not acked) */
+ if (smc->sk.sk_state == SMC_LISTEN)
+ return -EINVAL;
+ if (smc->sk.sk_state == SMC_INIT ||
+ smc->sk.sk_state == SMC_CLOSED)
+ answ = 0;
+ else
+ answ = smc->conn.sndbuf_desc->len -
+ atomic_read(&smc->conn.sndbuf_space);
+ break;
+ case SIOCOUTQNSD:
+ /* output queue size (not send only) */
+ if (smc->sk.sk_state == SMC_LISTEN)
+ return -EINVAL;
+ if (smc->sk.sk_state == SMC_INIT ||
+ smc->sk.sk_state == SMC_CLOSED)
+ answ = 0;
+ else
+ answ = smc_tx_prepared_sends(&smc->conn);
+ break;
+ case SIOCATMARK:
+ if (smc->sk.sk_state == SMC_LISTEN)
+ return -EINVAL;
+ if (smc->sk.sk_state == SMC_INIT ||
+ smc->sk.sk_state == SMC_CLOSED) {
+ answ = 0;
+ } else {
+ smc_curs_write(&cons,
+ smc_curs_read(&conn->local_tx_ctrl.cons, conn),
+ conn);
+ smc_curs_write(&urg,
+ smc_curs_read(&conn->urg_curs, conn),
+ conn);
+ answ = smc_curs_diff(conn->rmb_desc->len,
+ &cons, &urg) == 1;
+ }
+ break;
+ default:
+ return -ENOIOCTLCMD;
+ }
+
+ return put_user(answ, (int __user *)arg);
}
static ssize_t smc_sendpage(struct socket *sock, struct page *page,
@@ -1330,9 +1568,15 @@
return rc;
}
+/* Map the affected portions of the rmbe into an spd, note the number of bytes
+ * to splice in conn->splice_pending, and press 'go'. Delays consumer cursor
+ * updates till whenever a respective page has been fully processed.
+ * Note that subsequent recv() calls have to wait till all splice() processing
+ * completed.
+ */
static ssize_t smc_splice_read(struct socket *sock, loff_t *ppos,
struct pipe_inode_info *pipe, size_t len,
- unsigned int flags)
+ unsigned int flags)
{
struct sock *sk = sock->sk;
struct smc_sock *smc;
@@ -1340,16 +1584,34 @@
smc = smc_sk(sk);
lock_sock(sk);
- if ((sk->sk_state != SMC_ACTIVE) && (sk->sk_state != SMC_CLOSED))
+
+ if (sk->sk_state == SMC_INIT ||
+ sk->sk_state == SMC_LISTEN ||
+ sk->sk_state == SMC_CLOSED)
goto out;
+
+ if (sk->sk_state == SMC_PEERFINCLOSEWAIT) {
+ rc = 0;
+ goto out;
+ }
+
if (smc->use_fallback) {
rc = smc->clcsock->ops->splice_read(smc->clcsock, ppos,
pipe, len, flags);
} else {
- rc = -EOPNOTSUPP;
+ if (*ppos) {
+ rc = -ESPIPE;
+ goto out;
+ }
+ if (flags & SPLICE_F_NONBLOCK)
+ flags = MSG_DONTWAIT;
+ else
+ flags = 0;
+ rc = smc_rx_recvmsg(smc, NULL, pipe, len, flags);
}
out:
release_sock(sk);
+
return rc;
}
@@ -1482,18 +1744,7 @@
static void __exit smc_exit(void)
{
- struct smc_link_group *lgr, *lg;
- LIST_HEAD(lgr_freeing_list);
-
- spin_lock_bh(&smc_lgr_list.lock);
- if (!list_empty(&smc_lgr_list.list))
- list_splice_init(&smc_lgr_list.list, &lgr_freeing_list);
- spin_unlock_bh(&smc_lgr_list.lock);
- list_for_each_entry_safe(lgr, lg, &lgr_freeing_list, list) {
- list_del_init(&lgr->list);
- cancel_delayed_work_sync(&lgr->free_work);
- smc_lgr_free(lgr); /* free link group */
- }
+ smc_core_exit();
static_branch_disable(&tcp_have_smc);
smc_ib_unregister_client();
sock_unregister(PF_SMC);
diff --git a/net/smc/smc.h b/net/smc/smc.h
index e4829a2..51ae1f1 100644
--- a/net/smc/smc.h
+++ b/net/smc/smc.h
@@ -114,11 +114,17 @@
u8 reserved[18];
} __aligned(8);
+enum smc_urg_state {
+ SMC_URG_VALID, /* data present */
+ SMC_URG_NOTYET, /* data pending */
+ SMC_URG_READ /* data was already read */
+};
+
struct smc_connection {
struct rb_node alert_node;
struct smc_link_group *lgr; /* link group of connection */
u32 alert_token_local; /* unique conn. id */
- u8 peer_conn_idx; /* from tcp handshake */
+ u8 peer_rmbe_idx; /* from tcp handshake */
int peer_rmbe_size; /* size of peer rx buffer */
atomic_t peer_rmbe_space;/* remaining free bytes in peer
* rmbe
@@ -126,9 +132,7 @@
int rtoken_idx; /* idx to peer RMB rkey/addr */
struct smc_buf_desc *sndbuf_desc; /* send buffer descriptor */
- int sndbuf_size; /* sndbuf size <== sock wmem */
struct smc_buf_desc *rmb_desc; /* RMBE descriptor */
- int rmbe_size; /* RMBE size <== sock rmem */
int rmbe_size_short;/* compressed notation */
int rmbe_update_limit;
/* lower limit for consumer
@@ -153,6 +157,7 @@
u16 tx_cdc_seq; /* sequence # for CDC send */
spinlock_t send_lock; /* protect wr_sends */
struct delayed_work tx_work; /* retry of smc_cdc_msg_send */
+ u32 tx_off; /* base offset in peer rmb */
struct smc_host_cdc_msg local_rx_ctrl; /* filled during event_handl.
* .prod cf. TCP rcv_nxt
@@ -161,9 +166,21 @@
union smc_host_cursor rx_curs_confirmed; /* confirmed to peer
* source of snd_una ?
*/
+ union smc_host_cursor urg_curs; /* points at urgent byte */
+ enum smc_urg_state urg_state;
+ bool urg_tx_pend; /* urgent data staged */
+ bool urg_rx_skip_pend;
+ /* indicate urgent oob data
+ * read, but previous regular
+ * data still pending
+ */
+ char urg_rx_byte; /* urgent byte */
atomic_t bytes_to_rcv; /* arrived data,
* not yet received
*/
+ atomic_t splice_pending; /* number of spliced bytes
+ * pending processing
+ */
#ifndef KERNEL_HAS_ATOMIC64
spinlock_t acurs_lock; /* protect cursors */
#endif
@@ -180,6 +197,10 @@
struct list_head accept_q; /* sockets to be accepted */
spinlock_t accept_q_lock; /* protects accept_q */
bool use_fallback; /* fallback to tcp */
+ int sockopt_defer_accept;
+ /* sockopt TCP_DEFER_ACCEPT
+ * value
+ */
u8 wait_close_tx_prepared : 1;
/* shutdown wr or close
* started, waiting for unsent
@@ -214,41 +235,6 @@
return be32_to_cpu(t);
}
-#define SMC_BUF_MIN_SIZE 16384 /* minimum size of an RMB */
-
-#define SMC_RMBE_SIZES 16 /* number of distinct sizes for an RMBE */
-/* theoretically, the RFC states that largest size would be 512K,
- * i.e. compressed 5 and thus 6 sizes (0..5), despite
- * struct smc_clc_msg_accept_confirm.rmbe_size being a 4 bit value (0..15)
- */
-
-/* convert the RMB size into the compressed notation - minimum 16K.
- * In contrast to plain ilog2, this rounds towards the next power of 2,
- * so the socket application gets at least its desired sndbuf / rcvbuf size.
- */
-static inline u8 smc_compress_bufsize(int size)
-{
- u8 compressed;
-
- if (size <= SMC_BUF_MIN_SIZE)
- return 0;
-
- size = (size - 1) >> 14;
- compressed = ilog2(size) + 1;
- if (compressed >= SMC_RMBE_SIZES)
- compressed = SMC_RMBE_SIZES - 1;
- return compressed;
-}
-
-/* convert the RMB size from compressed notation into integer */
-static inline int smc_uncompress_bufsize(u8 compressed)
-{
- u32 size;
-
- size = 0x00000001 << (((int)compressed) + 14);
- return (int)size;
-}
-
#ifdef CONFIG_XFRM
static inline bool using_ipsec(struct smc_sock *smc)
{
@@ -262,12 +248,6 @@
}
#endif
-struct smc_clc_msg_local;
-
-void smc_conn_free(struct smc_connection *conn);
-int smc_conn_create(struct smc_sock *smc,
- struct smc_ib_device *smcibdev, u8 ibport,
- struct smc_clc_msg_local *lcl, int srv_first_contact);
struct sock *smc_accept_dequeue(struct sock *parent, struct socket *new_sock);
void smc_close_non_accepted(struct sock *sk);
diff --git a/net/smc/smc_cdc.c b/net/smc/smc_cdc.c
index b42395d..a7e8d63 100644
--- a/net/smc/smc_cdc.c
+++ b/net/smc/smc_cdc.c
@@ -44,13 +44,13 @@
smc = container_of(cdcpend->conn, struct smc_sock, conn);
bh_lock_sock(&smc->sk);
if (!wc_status) {
- diff = smc_curs_diff(cdcpend->conn->sndbuf_size,
+ diff = smc_curs_diff(cdcpend->conn->sndbuf_desc->len,
&cdcpend->conn->tx_curs_fin,
&cdcpend->cursor);
/* sndbuf_space is decreased in smc_sendmsg */
smp_mb__before_atomic();
atomic_add(diff, &cdcpend->conn->sndbuf_space);
- /* guarantee 0 <= sndbuf_space <= sndbuf_size */
+ /* guarantee 0 <= sndbuf_space <= sndbuf_desc->len */
smp_mb__after_atomic();
smc_curs_write(&cdcpend->conn->tx_curs_fin,
smc_curs_read(&cdcpend->cursor, cdcpend->conn),
@@ -82,7 +82,7 @@
sizeof(struct smc_cdc_msg) > SMC_WR_BUF_SIZE,
"must increase SMC_WR_BUF_SIZE to at least sizeof(struct smc_cdc_msg)");
BUILD_BUG_ON_MSG(
- offsetof(struct smc_cdc_msg, reserved) > SMC_WR_TX_SIZE,
+ sizeof(struct smc_cdc_msg) != SMC_WR_TX_SIZE,
"must adapt SMC_WR_TX_SIZE to sizeof(struct smc_cdc_msg); if not all smc_wr upper layer protocols use the same message size any more, must start to set link->wr_tx_sges[i].length on each individual smc_wr_tx_send()");
BUILD_BUG_ON_MSG(
sizeof(struct smc_cdc_tx_pend) > SMC_WR_TX_PEND_PRIV_SIZE,
@@ -164,20 +164,35 @@
return (s16)(seq1 - seq2) < 0;
}
+static void smc_cdc_handle_urg_data_arrival(struct smc_sock *smc,
+ int *diff_prod)
+{
+ struct smc_connection *conn = &smc->conn;
+ char *base;
+
+ /* new data included urgent business */
+ smc_curs_write(&conn->urg_curs,
+ smc_curs_read(&conn->local_rx_ctrl.prod, conn),
+ conn);
+ conn->urg_state = SMC_URG_VALID;
+ if (!sock_flag(&smc->sk, SOCK_URGINLINE))
+ /* we'll skip the urgent byte, so don't account for it */
+ (*diff_prod)--;
+ base = (char *)conn->rmb_desc->cpu_addr;
+ if (conn->urg_curs.count)
+ conn->urg_rx_byte = *(base + conn->urg_curs.count - 1);
+ else
+ conn->urg_rx_byte = *(base + conn->rmb_desc->len - 1);
+ sk_send_sigurg(&smc->sk);
+}
+
static void smc_cdc_msg_recv_action(struct smc_sock *smc,
- struct smc_link *link,
struct smc_cdc_msg *cdc)
{
union smc_host_cursor cons_old, prod_old;
struct smc_connection *conn = &smc->conn;
int diff_cons, diff_prod;
- if (!cdc->prod_flags.failover_validation) {
- if (smc_cdc_before(ntohs(cdc->seqno),
- conn->local_rx_ctrl.seqno))
- /* received seqno is old */
- return;
- }
smc_curs_write(&prod_old,
smc_curs_read(&conn->local_rx_ctrl.prod, conn),
conn);
@@ -198,18 +213,28 @@
smp_mb__after_atomic();
}
- diff_prod = smc_curs_diff(conn->rmbe_size, &prod_old,
+ diff_prod = smc_curs_diff(conn->rmb_desc->len, &prod_old,
&conn->local_rx_ctrl.prod);
if (diff_prod) {
+ if (conn->local_rx_ctrl.prod_flags.urg_data_present)
+ smc_cdc_handle_urg_data_arrival(smc, &diff_prod);
/* bytes_to_rcv is decreased in smc_recvmsg */
smp_mb__before_atomic();
atomic_add(diff_prod, &conn->bytes_to_rcv);
- /* guarantee 0 <= bytes_to_rcv <= rmbe_size */
+ /* guarantee 0 <= bytes_to_rcv <= rmb_desc->len */
smp_mb__after_atomic();
smc->sk.sk_data_ready(&smc->sk);
- } else if ((conn->local_rx_ctrl.prod_flags.write_blocked) ||
- (conn->local_rx_ctrl.prod_flags.cons_curs_upd_req)) {
- smc->sk.sk_data_ready(&smc->sk);
+ } else {
+ if (conn->local_rx_ctrl.prod_flags.write_blocked ||
+ conn->local_rx_ctrl.prod_flags.cons_curs_upd_req ||
+ conn->local_rx_ctrl.prod_flags.urg_data_pending) {
+ if (conn->local_rx_ctrl.prod_flags.urg_data_pending)
+ conn->urg_state = SMC_URG_NOTYET;
+ /* force immediate tx of current consumer cursor, but
+ * under send_lock to guarantee arrival in seqno-order
+ */
+ smc_tx_sndbuf_nonempty(conn);
+ }
}
/* piggy backed tx info */
@@ -219,6 +244,12 @@
/* trigger socket release if connection closed */
smc_close_wake_tx_prepared(smc);
}
+ if (diff_cons && conn->urg_tx_pend &&
+ atomic_read(&conn->peer_rmbe_space) == conn->peer_rmbe_size) {
+ /* urg data confirmed by peer, indicate we're ready for more */
+ conn->urg_tx_pend = false;
+ smc->sk.sk_write_space(&smc->sk);
+ }
if (conn->local_rx_ctrl.conn_state_flags.peer_conn_abort) {
smc->sk.sk_err = ECONNRESET;
@@ -236,26 +267,11 @@
}
/* called under tasklet context */
-static inline void smc_cdc_msg_recv(struct smc_cdc_msg *cdc,
- struct smc_link *link, u64 wr_id)
+static void smc_cdc_msg_recv(struct smc_sock *smc, struct smc_cdc_msg *cdc)
{
- struct smc_link_group *lgr = container_of(link, struct smc_link_group,
- lnk[SMC_SINGLE_LINK]);
- struct smc_connection *connection;
- struct smc_sock *smc;
-
- /* lookup connection */
- read_lock_bh(&lgr->conns_lock);
- connection = smc_lgr_find_conn(ntohl(cdc->token), lgr);
- if (!connection) {
- read_unlock_bh(&lgr->conns_lock);
- return;
- }
- smc = container_of(connection, struct smc_sock, conn);
sock_hold(&smc->sk);
- read_unlock_bh(&lgr->conns_lock);
bh_lock_sock(&smc->sk);
- smc_cdc_msg_recv_action(smc, link, cdc);
+ smc_cdc_msg_recv_action(smc, cdc);
bh_unlock_sock(&smc->sk);
sock_put(&smc->sk); /* no free sk in softirq-context */
}
@@ -266,12 +282,31 @@
{
struct smc_link *link = (struct smc_link *)wc->qp->qp_context;
struct smc_cdc_msg *cdc = buf;
+ struct smc_connection *conn;
+ struct smc_link_group *lgr;
+ struct smc_sock *smc;
if (wc->byte_len < offsetof(struct smc_cdc_msg, reserved))
return; /* short message */
if (cdc->len != SMC_WR_TX_SIZE)
return; /* invalid message */
- smc_cdc_msg_recv(cdc, link, wc->wr_id);
+
+ /* lookup connection */
+ lgr = container_of(link, struct smc_link_group, lnk[SMC_SINGLE_LINK]);
+ read_lock_bh(&lgr->conns_lock);
+ conn = smc_lgr_find_conn(ntohl(cdc->token), lgr);
+ read_unlock_bh(&lgr->conns_lock);
+ if (!conn)
+ return;
+ smc = container_of(conn, struct smc_sock, conn);
+
+ if (!cdc->prod_flags.failover_validation) {
+ if (smc_cdc_before(ntohs(cdc->seqno),
+ conn->local_rx_ctrl.seqno))
+ /* received seqno is old */
+ return;
+ }
+ smc_cdc_msg_recv(smc, cdc);
}
static struct smc_wr_rx_handler smc_cdc_rx_handlers[] = {
diff --git a/net/smc/smc_cdc.h b/net/smc/smc_cdc.h
index ab240b3..f60082f 100644
--- a/net/smc/smc_cdc.h
+++ b/net/smc/smc_cdc.h
@@ -48,7 +48,7 @@
struct smc_cdc_producer_flags prod_flags;
struct smc_cdc_conn_state_flags conn_state_flags;
u8 reserved[18];
-} __aligned(8);
+} __packed; /* format defined in RFC7609 */
static inline bool smc_cdc_rxed_any_close(struct smc_connection *conn)
{
@@ -146,6 +146,19 @@
return max_t(int, 0, (new->count - old->count));
}
+/* calculate cursor difference between old and new - returns negative
+ * value in case old > new
+ */
+static inline int smc_curs_comp(unsigned int size,
+ union smc_host_cursor *old,
+ union smc_host_cursor *new)
+{
+ if (old->wrap > new->wrap ||
+ (old->wrap == new->wrap && old->count > new->count))
+ return -smc_curs_diff(size, new, old);
+ return smc_curs_diff(size, old, new);
+}
+
static inline void smc_host_cursor_to_cdc(union smc_cdc_cursor *peer,
union smc_host_cursor *local,
struct smc_connection *conn)
diff --git a/net/smc/smc_clc.c b/net/smc/smc_clc.c
index 3a988c2..717449b 100644
--- a/net/smc/smc_clc.c
+++ b/net/smc/smc_clc.c
@@ -316,7 +316,7 @@
if (clcm->type == SMC_CLC_DECLINE) {
reason_code = SMC_CLC_DECL_REPLY;
if (((struct smc_clc_msg_decline *)buf)->hdr.flag) {
- smc->conn.lgr->sync_err = true;
+ smc->conn.lgr->sync_err = 1;
smc_lgr_terminate(smc->conn.lgr);
}
}
@@ -442,7 +442,7 @@
hton24(cclc.qpn, link->roce_qp->qp_num);
cclc.rmb_rkey =
htonl(conn->rmb_desc->mr_rx[SMC_SINGLE_LINK]->rkey);
- cclc.conn_idx = 1; /* for now: 1 RMB = 1 RMBE */
+ cclc.rmbe_idx = 1; /* for now: 1 RMB = 1 RMBE */
cclc.rmbe_alert_token = htonl(conn->alert_token_local);
cclc.qp_mtu = min(link->path_mtu, link->peer_mtu);
cclc.rmbe_size = conn->rmbe_size_short;
@@ -494,7 +494,7 @@
hton24(aclc.qpn, link->roce_qp->qp_num);
aclc.rmb_rkey =
htonl(conn->rmb_desc->mr_rx[SMC_SINGLE_LINK]->rkey);
- aclc.conn_idx = 1; /* as long as 1 RMB = 1 RMBE */
+ aclc.rmbe_idx = 1; /* as long as 1 RMB = 1 RMBE */
aclc.rmbe_alert_token = htonl(conn->alert_token_local);
aclc.qp_mtu = link->path_mtu;
aclc.rmbe_size = conn->rmbe_size_short,
diff --git a/net/smc/smc_clc.h b/net/smc/smc_clc.h
index 63bf1dc..41ff9ea 100644
--- a/net/smc/smc_clc.h
+++ b/net/smc/smc_clc.h
@@ -97,7 +97,7 @@
struct smc_clc_msg_local lcl;
u8 qpn[3]; /* QP number */
__be32 rmb_rkey; /* RMB rkey */
- u8 conn_idx; /* Connection index, which RMBE in RMB */
+ u8 rmbe_idx; /* Index of RMBE in RMB */
__be32 rmbe_alert_token;/* unique connection id */
#if defined(__BIG_ENDIAN_BITFIELD)
u8 rmbe_size : 4, /* RMBE buf size (compressed notation) */
diff --git a/net/smc/smc_core.c b/net/smc/smc_core.c
index d4bd01b..add82b0 100644
--- a/net/smc/smc_core.c
+++ b/net/smc/smc_core.c
@@ -28,12 +28,16 @@
#define SMC_LGR_NUM_INCR 256
#define SMC_LGR_FREE_DELAY_SERV (600 * HZ)
-#define SMC_LGR_FREE_DELAY_CLNT (SMC_LGR_FREE_DELAY_SERV + 10)
+#define SMC_LGR_FREE_DELAY_CLNT (SMC_LGR_FREE_DELAY_SERV + 10 * HZ)
-static u32 smc_lgr_num; /* unique link group number */
+static struct smc_lgr_list smc_lgr_list = { /* established link groups */
+ .lock = __SPIN_LOCK_UNLOCKED(smc_lgr_list.lock),
+ .list = LIST_HEAD_INIT(smc_lgr_list.list),
+ .num = 0,
+};
-static void smc_buf_free(struct smc_buf_desc *buf_desc, struct smc_link *lnk,
- bool is_rmb);
+static void smc_buf_free(struct smc_link_group *lgr, bool is_rmb,
+ struct smc_buf_desc *buf_desc);
static void smc_lgr_schedule_free_work(struct smc_link_group *lgr)
{
@@ -148,8 +152,11 @@
list_del_init(&lgr->list); /* remove from smc_lgr_list */
free:
spin_unlock_bh(&smc_lgr_list.lock);
- if (!delayed_work_pending(&lgr->free_work))
+ if (!delayed_work_pending(&lgr->free_work)) {
+ if (lgr->lnk[SMC_SINGLE_LINK].state != SMC_LNK_INACTIVE)
+ smc_llc_link_inactive(&lgr->lnk[SMC_SINGLE_LINK]);
smc_lgr_free(lgr);
+ }
}
/* create a new SMC link group */
@@ -169,7 +176,7 @@
goto out;
}
lgr->role = smc->listen_smc ? SMC_SERV : SMC_CLNT;
- lgr->sync_err = false;
+ lgr->sync_err = 0;
memcpy(lgr->peer_systemid, peer_systemid, SMC_SYSTEMID_LEN);
lgr->vlan_id = vlan_id;
rwlock_init(&lgr->sndbufs_lock);
@@ -178,8 +185,8 @@
INIT_LIST_HEAD(&lgr->sndbufs[i]);
INIT_LIST_HEAD(&lgr->rmbs[i]);
}
- smc_lgr_num += SMC_LGR_NUM_INCR;
- memcpy(&lgr->id, (u8 *)&smc_lgr_num, SMC_LGR_ID_SIZE);
+ smc_lgr_list.num += SMC_LGR_NUM_INCR;
+ memcpy(&lgr->id, (u8 *)&smc_lgr_list.num, SMC_LGR_ID_SIZE);
INIT_DELAYED_WORK(&lgr->free_work, smc_lgr_free_work);
lgr->conns_all = RB_ROOT;
@@ -194,9 +201,12 @@
smc_ib_setup_per_ibdev(smcibdev);
get_random_bytes(rndvec, sizeof(rndvec));
lnk->psn_initial = rndvec[0] + (rndvec[1] << 8) + (rndvec[2] << 16);
- rc = smc_wr_alloc_link_mem(lnk);
+ rc = smc_llc_link_init(lnk);
if (rc)
goto free_lgr;
+ rc = smc_wr_alloc_link_mem(lnk);
+ if (rc)
+ goto clear_llc_lnk;
rc = smc_ib_create_protection_domain(lnk);
if (rc)
goto free_link_mem;
@@ -206,10 +216,6 @@
rc = smc_wr_create_link(lnk);
if (rc)
goto destroy_qp;
- init_completion(&lnk->llc_confirm);
- init_completion(&lnk->llc_confirm_resp);
- init_completion(&lnk->llc_add);
- init_completion(&lnk->llc_add_resp);
smc->conn.lgr = lgr;
rwlock_init(&lgr->conns_lock);
@@ -224,6 +230,8 @@
smc_ib_dealloc_protection_domain(lnk);
free_link_mem:
smc_wr_free_link_mem(lnk);
+clear_llc_lnk:
+ smc_llc_link_clear(lnk);
free_lgr:
kfree(lgr);
out:
@@ -232,26 +240,21 @@
static void smc_buf_unuse(struct smc_connection *conn)
{
- if (conn->sndbuf_desc) {
+ if (conn->sndbuf_desc)
conn->sndbuf_desc->used = 0;
- conn->sndbuf_size = 0;
- }
if (conn->rmb_desc) {
if (!conn->rmb_desc->regerr) {
conn->rmb_desc->reused = 1;
conn->rmb_desc->used = 0;
- conn->rmbe_size = 0;
} else {
/* buf registration failed, reuse not possible */
struct smc_link_group *lgr = conn->lgr;
- struct smc_link *lnk;
write_lock_bh(&lgr->rmbs_lock);
list_del(&conn->rmb_desc->list);
write_unlock_bh(&lgr->rmbs_lock);
- lnk = &lgr->lnk[SMC_SINGLE_LINK];
- smc_buf_free(conn->rmb_desc, lnk, true);
+ smc_buf_free(lgr, true, conn->rmb_desc);
}
}
}
@@ -269,6 +272,7 @@
static void smc_link_clear(struct smc_link *lnk)
{
lnk->peer_qpn = 0;
+ smc_llc_link_clear(lnk);
smc_ib_modify_qp_reset(lnk);
smc_wr_free_link(lnk);
smc_ib_destroy_queue_pair(lnk);
@@ -276,9 +280,11 @@
smc_wr_free_link_mem(lnk);
}
-static void smc_buf_free(struct smc_buf_desc *buf_desc, struct smc_link *lnk,
- bool is_rmb)
+static void smc_buf_free(struct smc_link_group *lgr, bool is_rmb,
+ struct smc_buf_desc *buf_desc)
{
+ struct smc_link *lnk = &lgr->lnk[SMC_SINGLE_LINK];
+
if (is_rmb) {
if (buf_desc->mr_rx[SMC_SINGLE_LINK])
smc_ib_put_memory_region(
@@ -290,14 +296,13 @@
DMA_TO_DEVICE);
}
sg_free_table(&buf_desc->sgt[SMC_SINGLE_LINK]);
- if (buf_desc->cpu_addr)
- free_pages((unsigned long)buf_desc->cpu_addr, buf_desc->order);
+ if (buf_desc->pages)
+ __free_pages(buf_desc->pages, buf_desc->order);
kfree(buf_desc);
}
static void __smc_lgr_free_bufs(struct smc_link_group *lgr, bool is_rmb)
{
- struct smc_link *lnk = &lgr->lnk[SMC_SINGLE_LINK];
struct smc_buf_desc *buf_desc, *bf_desc;
struct list_head *buf_list;
int i;
@@ -310,7 +315,7 @@
list_for_each_entry_safe(buf_desc, bf_desc, buf_list,
list) {
list_del(&buf_desc->list);
- smc_buf_free(buf_desc, lnk, is_rmb);
+ smc_buf_free(lgr, is_rmb, buf_desc);
}
}
}
@@ -341,13 +346,18 @@
}
/* terminate linkgroup abnormally */
-void smc_lgr_terminate(struct smc_link_group *lgr)
+static void __smc_lgr_terminate(struct smc_link_group *lgr)
{
struct smc_connection *conn;
struct smc_sock *smc;
struct rb_node *node;
- smc_lgr_forget(lgr);
+ if (lgr->terminating)
+ return; /* lgr already terminating */
+ lgr->terminating = 1;
+ if (!list_empty(&lgr->list)) /* forget lgr */
+ list_del_init(&lgr->list);
+ smc_llc_link_inactive(&lgr->lnk[SMC_SINGLE_LINK]);
write_lock_bh(&lgr->conns_lock);
node = rb_first(&lgr->conns_all);
@@ -368,13 +378,35 @@
smc_lgr_schedule_free_work(lgr);
}
+void smc_lgr_terminate(struct smc_link_group *lgr)
+{
+ spin_lock_bh(&smc_lgr_list.lock);
+ __smc_lgr_terminate(lgr);
+ spin_unlock_bh(&smc_lgr_list.lock);
+}
+
+/* Called when IB port is terminated */
+void smc_port_terminate(struct smc_ib_device *smcibdev, u8 ibport)
+{
+ struct smc_link_group *lgr, *l;
+
+ spin_lock_bh(&smc_lgr_list.lock);
+ list_for_each_entry_safe(lgr, l, &smc_lgr_list.list, list) {
+ if (lgr->lnk[SMC_SINGLE_LINK].smcibdev == smcibdev &&
+ lgr->lnk[SMC_SINGLE_LINK].ibport == ibport)
+ __smc_lgr_terminate(lgr);
+ }
+ spin_unlock_bh(&smc_lgr_list.lock);
+}
+
/* Determine vlan of internal TCP socket.
* @vlan_id: address to store the determined vlan id into
*/
static int smc_vlan_by_tcpsk(struct socket *clcsock, unsigned short *vlan_id)
{
struct dst_entry *dst = sk_dst_get(clcsock->sk);
- int rc = 0;
+ struct net_device *ndev;
+ int i, nest_lvl, rc = 0;
*vlan_id = 0;
if (!dst) {
@@ -386,8 +418,27 @@
goto out_rel;
}
- if (is_vlan_dev(dst->dev))
- *vlan_id = vlan_dev_vlan_id(dst->dev);
+ ndev = dst->dev;
+ if (is_vlan_dev(ndev)) {
+ *vlan_id = vlan_dev_vlan_id(ndev);
+ goto out_rel;
+ }
+
+ rtnl_lock();
+ nest_lvl = dev_get_nest_level(ndev);
+ for (i = 0; i < nest_lvl; i++) {
+ struct list_head *lower = &ndev->adj_list.lower;
+
+ if (list_empty(lower))
+ break;
+ lower = lower->next;
+ ndev = (struct net_device *)netdev_lower_get_next(ndev, &lower);
+ if (is_vlan_dev(ndev)) {
+ *vlan_id = vlan_dev_vlan_id(ndev);
+ break;
+ }
+ }
+ rtnl_unlock();
out_rel:
dst_release(dst);
@@ -432,10 +483,10 @@
struct smc_clc_msg_local *lcl, int srv_first_contact)
{
struct smc_connection *conn = &smc->conn;
+ int local_contact = SMC_FIRST_CONTACT;
struct smc_link_group *lgr;
unsigned short vlan_id;
enum smc_lgr_role role;
- int local_contact = SMC_FIRST_CONTACT;
int rc = 0;
role = smc->listen_smc ? SMC_SERV : SMC_CLNT;
@@ -493,6 +544,7 @@
}
conn->local_tx_ctrl.common.type = SMC_CDC_MSG_TYPE;
conn->local_tx_ctrl.len = SMC_WR_TX_SIZE;
+ conn->urg_state = SMC_URG_READ;
#ifndef KERNEL_HAS_ATOMIC64
spin_lock_init(&conn->acurs_lock);
#endif
@@ -501,14 +553,39 @@
return rc ? rc : local_contact;
}
+/* convert the RMB size into the compressed notation - minimum 16K.
+ * In contrast to plain ilog2, this rounds towards the next power of 2,
+ * so the socket application gets at least its desired sndbuf / rcvbuf size.
+ */
+static u8 smc_compress_bufsize(int size)
+{
+ u8 compressed;
+
+ if (size <= SMC_BUF_MIN_SIZE)
+ return 0;
+
+ size = (size - 1) >> 14;
+ compressed = ilog2(size) + 1;
+ if (compressed >= SMC_RMBE_SIZES)
+ compressed = SMC_RMBE_SIZES - 1;
+ return compressed;
+}
+
+/* convert the RMB size from compressed notation into integer */
+int smc_uncompress_bufsize(u8 compressed)
+{
+ u32 size;
+
+ size = 0x00000001 << (((int)compressed) + 14);
+ return (int)size;
+}
+
/* try to reuse a sndbuf or rmb description slot for a certain
* buffer size; if not available, return NULL
*/
-static inline
-struct smc_buf_desc *smc_buf_get_slot(struct smc_link_group *lgr,
- int compressed_bufsize,
- rwlock_t *lock,
- struct list_head *buf_list)
+static struct smc_buf_desc *smc_buf_get_slot(int compressed_bufsize,
+ rwlock_t *lock,
+ struct list_head *buf_list)
{
struct smc_buf_desc *buf_slot;
@@ -544,23 +621,23 @@
if (!buf_desc)
return ERR_PTR(-ENOMEM);
- buf_desc->cpu_addr =
- (void *)__get_free_pages(GFP_KERNEL | __GFP_NOWARN |
- __GFP_NOMEMALLOC |
- __GFP_NORETRY | __GFP_ZERO,
- get_order(bufsize));
- if (!buf_desc->cpu_addr) {
+ buf_desc->order = get_order(bufsize);
+ buf_desc->pages = alloc_pages(GFP_KERNEL | __GFP_NOWARN |
+ __GFP_NOMEMALLOC | __GFP_COMP |
+ __GFP_NORETRY | __GFP_ZERO,
+ buf_desc->order);
+ if (!buf_desc->pages) {
kfree(buf_desc);
return ERR_PTR(-EAGAIN);
}
- buf_desc->order = get_order(bufsize);
+ buf_desc->cpu_addr = (void *)page_address(buf_desc->pages);
/* build the sg table from the pages */
lnk = &lgr->lnk[SMC_SINGLE_LINK];
rc = sg_alloc_table(&buf_desc->sgt[SMC_SINGLE_LINK], 1,
GFP_KERNEL);
if (rc) {
- smc_buf_free(buf_desc, lnk, is_rmb);
+ smc_buf_free(lgr, is_rmb, buf_desc);
return ERR_PTR(rc);
}
sg_set_buf(buf_desc->sgt[SMC_SINGLE_LINK].sgl,
@@ -571,7 +648,7 @@
is_rmb ? DMA_FROM_DEVICE : DMA_TO_DEVICE);
/* SMC protocol depends on mapping to one DMA address only */
if (rc != 1) {
- smc_buf_free(buf_desc, lnk, is_rmb);
+ smc_buf_free(lgr, is_rmb, buf_desc);
return ERR_PTR(-EAGAIN);
}
@@ -582,19 +659,20 @@
IB_ACCESS_LOCAL_WRITE,
buf_desc);
if (rc) {
- smc_buf_free(buf_desc, lnk, is_rmb);
+ smc_buf_free(lgr, is_rmb, buf_desc);
return ERR_PTR(rc);
}
}
+ buf_desc->len = bufsize;
return buf_desc;
}
static int __smc_buf_create(struct smc_sock *smc, bool is_rmb)
{
+ struct smc_buf_desc *buf_desc = ERR_PTR(-ENOMEM);
struct smc_connection *conn = &smc->conn;
struct smc_link_group *lgr = conn->lgr;
- struct smc_buf_desc *buf_desc = ERR_PTR(-ENOMEM);
struct list_head *buf_list;
int bufsize, bufsize_short;
int sk_buf_size;
@@ -622,7 +700,7 @@
continue;
/* check for reusable slot in the link group */
- buf_desc = smc_buf_get_slot(lgr, bufsize_short, lock, buf_list);
+ buf_desc = smc_buf_get_slot(bufsize_short, lock, buf_list);
if (buf_desc) {
memset(buf_desc->cpu_addr, 0, bufsize);
break; /* found reusable slot */
@@ -646,14 +724,12 @@
if (is_rmb) {
conn->rmb_desc = buf_desc;
- conn->rmbe_size = bufsize;
conn->rmbe_size_short = bufsize_short;
smc->sk.sk_rcvbuf = bufsize * 2;
atomic_set(&conn->bytes_to_rcv, 0);
conn->rmbe_update_limit = smc_rmb_wnd_update_limit(bufsize);
} else {
conn->sndbuf_desc = buf_desc;
- conn->sndbuf_size = bufsize;
smc->sk.sk_sndbuf = bufsize * 2;
atomic_set(&conn->sndbuf_space, bufsize);
}
@@ -709,8 +785,7 @@
/* create rmb */
rc = __smc_buf_create(smc, true);
if (rc)
- smc_buf_free(smc->conn.sndbuf_desc,
- &smc->conn.lgr->lnk[SMC_SINGLE_LINK], false);
+ smc_buf_free(smc->conn.lgr, false, smc->conn.sndbuf_desc);
return rc;
}
@@ -777,3 +852,21 @@
return conn->rtoken_idx;
return 0;
}
+
+/* Called (from smc_exit) when module is removed */
+void smc_core_exit(void)
+{
+ struct smc_link_group *lgr, *lg;
+ LIST_HEAD(lgr_freeing_list);
+
+ spin_lock_bh(&smc_lgr_list.lock);
+ if (!list_empty(&smc_lgr_list.list))
+ list_splice_init(&smc_lgr_list.list, &lgr_freeing_list);
+ spin_unlock_bh(&smc_lgr_list.lock);
+ list_for_each_entry_safe(lgr, lg, &lgr_freeing_list, list) {
+ list_del_init(&lgr->list);
+ smc_llc_link_inactive(&lgr->lnk[SMC_SINGLE_LINK]);
+ cancel_delayed_work_sync(&lgr->free_work);
+ smc_lgr_free(lgr); /* free link group */
+ }
+}
diff --git a/net/smc/smc_core.h b/net/smc/smc_core.h
index 5dfcb15..93cb352 100644
--- a/net/smc/smc_core.h
+++ b/net/smc/smc_core.h
@@ -23,10 +23,9 @@
struct smc_lgr_list { /* list of link group definition */
struct list_head list;
spinlock_t lock; /* protects list of link groups */
+ u32 num; /* unique link group number */
};
-extern struct smc_lgr_list smc_lgr_list; /* list of link groups */
-
enum smc_lgr_role { /* possible roles of a link group */
SMC_CLNT, /* client */
SMC_SERV /* server */
@@ -79,6 +78,7 @@
dma_addr_t wr_rx_dma_addr; /* DMA address of wr_rx_bufs */
u64 wr_rx_id; /* seq # of last recv WR */
u32 wr_rx_cnt; /* number of WR recv buffers */
+ unsigned long wr_rx_tstamp; /* jiffies when last buf rx */
struct ib_reg_wr wr_reg; /* WR register memory region */
wait_queue_head_t wr_reg_wait; /* wait for wr_reg result */
@@ -95,12 +95,18 @@
u8 link_id; /* unique # within link group */
enum smc_link_state state; /* state of link */
+ struct workqueue_struct *llc_wq; /* single thread work queue */
struct completion llc_confirm; /* wait for rx of conf link */
struct completion llc_confirm_resp; /* wait 4 rx of cnf lnk rsp */
int llc_confirm_rc; /* rc from confirm link msg */
int llc_confirm_resp_rc; /* rc from conf_resp msg */
struct completion llc_add; /* wait for rx of add link */
struct completion llc_add_resp; /* wait for rx of add link rsp*/
+ struct delayed_work llc_testlink_wrk; /* testlink worker */
+ struct completion llc_testlink_resp; /* wait for rx of testlink */
+ int llc_testlink_time; /* testlink interval */
+ struct completion llc_confirm_rkey; /* wait 4 rx of cnf rkey */
+ int llc_confirm_rkey_rc; /* rc from cnf rkey msg */
};
/* For now we just allow one parallel link per link group. The SMC protocol
@@ -116,6 +122,8 @@
struct smc_buf_desc {
struct list_head list;
void *cpu_addr; /* virtual address of buffer */
+ struct page *pages;
+ int len; /* length of buffer */
struct sg_table sgt[SMC_LINKS_PER_LGR_MAX];/* virtual buffer */
struct ib_mr *mr_rx[SMC_LINKS_PER_LGR_MAX];
/* for rmb only: memory region
@@ -133,6 +141,12 @@
};
#define SMC_LGR_ID_SIZE 4
+#define SMC_BUF_MIN_SIZE 16384 /* minimum size of an RMB */
+#define SMC_RMBE_SIZES 16 /* number of distinct RMBE sizes */
+/* theoretically, the RFC states that largest size would be 512K,
+ * i.e. compressed 5 and thus 6 sizes (0..5), despite
+ * struct smc_clc_msg_accept_confirm.rmbe_size being a 4 bit value (0..15)
+ */
struct smc_link_group {
struct list_head list;
@@ -158,7 +172,8 @@
u8 id[SMC_LGR_ID_SIZE]; /* unique lgr id */
struct delayed_work free_work; /* delayed freeing of an lgr */
- bool sync_err; /* lgr no longer fits to peer */
+ u8 sync_err : 1; /* lgr no longer fits to peer */
+ u8 terminating : 1;/* lgr is terminating */
};
/* Find the connection associated with the given alert token in the link group.
@@ -196,11 +211,14 @@
struct smc_sock;
struct smc_clc_msg_accept_confirm;
+struct smc_clc_msg_local;
void smc_lgr_free(struct smc_link_group *lgr);
void smc_lgr_forget(struct smc_link_group *lgr);
void smc_lgr_terminate(struct smc_link_group *lgr);
+void smc_port_terminate(struct smc_ib_device *smcibdev, u8 ibport);
int smc_buf_create(struct smc_sock *smc);
+int smc_uncompress_bufsize(u8 compressed);
int smc_rmb_rtoken_handling(struct smc_connection *conn,
struct smc_clc_msg_accept_confirm *clc);
int smc_rtoken_add(struct smc_link_group *lgr, __be64 nw_vaddr, __be32 nw_rkey);
@@ -209,4 +227,9 @@
void smc_sndbuf_sync_sg_for_device(struct smc_connection *conn);
void smc_rmb_sync_sg_for_cpu(struct smc_connection *conn);
void smc_rmb_sync_sg_for_device(struct smc_connection *conn);
+void smc_conn_free(struct smc_connection *conn);
+int smc_conn_create(struct smc_sock *smc,
+ struct smc_ib_device *smcibdev, u8 ibport,
+ struct smc_clc_msg_local *lcl, int srv_first_contact);
+void smc_core_exit(void);
#endif
diff --git a/net/smc/smc_diag.c b/net/smc/smc_diag.c
index 427b91c..8393544 100644
--- a/net/smc/smc_diag.c
+++ b/net/smc/smc_diag.c
@@ -38,17 +38,27 @@
{
struct smc_sock *smc = smc_sk(sk);
- r->diag_family = sk->sk_family;
if (!smc->clcsock)
return;
r->id.idiag_sport = htons(smc->clcsock->sk->sk_num);
r->id.idiag_dport = smc->clcsock->sk->sk_dport;
r->id.idiag_if = smc->clcsock->sk->sk_bound_dev_if;
sock_diag_save_cookie(sk, r->id.idiag_cookie);
- memset(&r->id.idiag_src, 0, sizeof(r->id.idiag_src));
- memset(&r->id.idiag_dst, 0, sizeof(r->id.idiag_dst));
- r->id.idiag_src[0] = smc->clcsock->sk->sk_rcv_saddr;
- r->id.idiag_dst[0] = smc->clcsock->sk->sk_daddr;
+ if (sk->sk_protocol == SMCPROTO_SMC) {
+ r->diag_family = PF_INET;
+ memset(&r->id.idiag_src, 0, sizeof(r->id.idiag_src));
+ memset(&r->id.idiag_dst, 0, sizeof(r->id.idiag_dst));
+ r->id.idiag_src[0] = smc->clcsock->sk->sk_rcv_saddr;
+ r->id.idiag_dst[0] = smc->clcsock->sk->sk_daddr;
+#if IS_ENABLED(CONFIG_IPV6)
+ } else if (sk->sk_protocol == SMCPROTO_SMC6) {
+ r->diag_family = PF_INET6;
+ memcpy(&r->id.idiag_src, &smc->clcsock->sk->sk_v6_rcv_saddr,
+ sizeof(smc->clcsock->sk->sk_v6_rcv_saddr));
+ memcpy(&r->id.idiag_dst, &smc->clcsock->sk->sk_v6_daddr,
+ sizeof(smc->clcsock->sk->sk_v6_daddr));
+#endif
+ }
}
static int smc_diag_msg_attrs_fill(struct sock *sk, struct sk_buff *skb,
@@ -91,8 +101,9 @@
struct smc_connection *conn = &smc->conn;
struct smc_diag_conninfo cinfo = {
.token = conn->alert_token_local,
- .sndbuf_size = conn->sndbuf_size,
- .rmbe_size = conn->rmbe_size,
+ .sndbuf_size = conn->sndbuf_desc ?
+ conn->sndbuf_desc->len : 0,
+ .rmbe_size = conn->rmb_desc ? conn->rmb_desc->len : 0,
.peer_rmbe_size = conn->peer_rmbe_size,
.rx_prod.wrap = conn->local_rx_ctrl.prod.wrap,
@@ -153,7 +164,8 @@
return -EMSGSIZE;
}
-static int smc_diag_dump(struct sk_buff *skb, struct netlink_callback *cb)
+static int smc_diag_dump_proto(struct proto *prot, struct sk_buff *skb,
+ struct netlink_callback *cb)
{
struct net *net = sock_net(skb->sk);
struct nlattr *bc = NULL;
@@ -161,8 +173,8 @@
struct sock *sk;
int rc = 0;
- read_lock(&smc_proto.h.smc_hash->lock);
- head = &smc_proto.h.smc_hash->ht;
+ read_lock(&prot->h.smc_hash->lock);
+ head = &prot->h.smc_hash->ht;
if (hlist_empty(head))
goto out;
@@ -175,7 +187,17 @@
}
out:
- read_unlock(&smc_proto.h.smc_hash->lock);
+ read_unlock(&prot->h.smc_hash->lock);
+ return rc;
+}
+
+static int smc_diag_dump(struct sk_buff *skb, struct netlink_callback *cb)
+{
+ int rc = 0;
+
+ rc = smc_diag_dump_proto(&smc_proto, skb, cb);
+ if (!rc)
+ rc = smc_diag_dump_proto(&smc_proto6, skb, cb);
return rc;
}
diff --git a/net/smc/smc_ib.c b/net/smc/smc_ib.c
index 26df554..0eed7ab 100644
--- a/net/smc/smc_ib.c
+++ b/net/smc/smc_ib.c
@@ -143,17 +143,6 @@
return rc;
}
-static void smc_ib_port_terminate(struct smc_ib_device *smcibdev, u8 ibport)
-{
- struct smc_link_group *lgr, *l;
-
- list_for_each_entry_safe(lgr, l, &smc_lgr_list.list, list) {
- if (lgr->lnk[SMC_SINGLE_LINK].smcibdev == smcibdev &&
- lgr->lnk[SMC_SINGLE_LINK].ibport == ibport)
- smc_lgr_terminate(lgr);
- }
-}
-
/* process context wrapper for might_sleep smc_ib_remember_port_attr */
static void smc_ib_port_event_work(struct work_struct *work)
{
@@ -165,7 +154,7 @@
smc_ib_remember_port_attr(smcibdev, port_idx + 1);
clear_bit(port_idx, &smcibdev->port_event_mask);
if (!smc_ib_port_active(smcibdev, port_idx + 1))
- smc_ib_port_terminate(smcibdev, port_idx + 1);
+ smc_port_terminate(smcibdev, port_idx + 1);
}
}
diff --git a/net/smc/smc_llc.c b/net/smc/smc_llc.c
index ea4b219..5800a6b 100644
--- a/net/smc/smc_llc.c
+++ b/net/smc/smc_llc.c
@@ -214,6 +214,50 @@
return rc;
}
+/* send LLC confirm rkey request */
+static int smc_llc_send_confirm_rkey(struct smc_link *link,
+ struct smc_buf_desc *rmb_desc)
+{
+ struct smc_llc_msg_confirm_rkey *rkeyllc;
+ struct smc_wr_tx_pend_priv *pend;
+ struct smc_wr_buf *wr_buf;
+ int rc;
+
+ rc = smc_llc_add_pending_send(link, &wr_buf, &pend);
+ if (rc)
+ return rc;
+ rkeyllc = (struct smc_llc_msg_confirm_rkey *)wr_buf;
+ memset(rkeyllc, 0, sizeof(*rkeyllc));
+ rkeyllc->hd.common.type = SMC_LLC_CONFIRM_RKEY;
+ rkeyllc->hd.length = sizeof(struct smc_llc_msg_confirm_rkey);
+ rkeyllc->rtoken[0].rmb_key =
+ htonl(rmb_desc->mr_rx[SMC_SINGLE_LINK]->rkey);
+ rkeyllc->rtoken[0].rmb_vaddr = cpu_to_be64(
+ (u64)sg_dma_address(rmb_desc->sgt[SMC_SINGLE_LINK].sgl));
+ /* send llc message */
+ rc = smc_wr_tx_send(link, pend);
+ return rc;
+}
+
+/* prepare an add link message */
+static void smc_llc_prep_add_link(struct smc_llc_msg_add_link *addllc,
+ struct smc_link *link, u8 mac[],
+ union ib_gid *gid,
+ enum smc_llc_reqresp reqresp)
+{
+ memset(addllc, 0, sizeof(*addllc));
+ addllc->hd.common.type = SMC_LLC_ADD_LINK;
+ addllc->hd.length = sizeof(struct smc_llc_msg_add_link);
+ if (reqresp == SMC_LLC_RESP) {
+ addllc->hd.flags |= SMC_LLC_FLAG_RESP;
+ /* always reject more links for now */
+ addllc->hd.flags |= SMC_LLC_FLAG_ADD_LNK_REJ;
+ addllc->hd.add_link_rej_rsn = SMC_LLC_REJ_RSN_NO_ALT_PATH;
+ }
+ memcpy(addllc->sender_mac, mac, ETH_ALEN);
+ memcpy(addllc->sender_gid, gid, SMC_GID_SIZE);
+}
+
/* send ADD LINK request or response */
int smc_llc_send_add_link(struct smc_link *link, u8 mac[],
union ib_gid *gid,
@@ -228,22 +272,28 @@
if (rc)
return rc;
addllc = (struct smc_llc_msg_add_link *)wr_buf;
- memset(addllc, 0, sizeof(*addllc));
- addllc->hd.common.type = SMC_LLC_ADD_LINK;
- addllc->hd.length = sizeof(struct smc_llc_msg_add_link);
- if (reqresp == SMC_LLC_RESP) {
- addllc->hd.flags |= SMC_LLC_FLAG_RESP;
- /* always reject more links for now */
- addllc->hd.flags |= SMC_LLC_FLAG_ADD_LNK_REJ;
- addllc->hd.add_link_rej_rsn = SMC_LLC_REJ_RSN_NO_ALT_PATH;
- }
- memcpy(addllc->sender_mac, mac, ETH_ALEN);
- memcpy(addllc->sender_gid, gid, SMC_GID_SIZE);
+ smc_llc_prep_add_link(addllc, link, mac, gid, reqresp);
/* send llc message */
rc = smc_wr_tx_send(link, pend);
return rc;
}
+/* prepare a delete link message */
+static void smc_llc_prep_delete_link(struct smc_llc_msg_del_link *delllc,
+ struct smc_link *link,
+ enum smc_llc_reqresp reqresp)
+{
+ memset(delllc, 0, sizeof(*delllc));
+ delllc->hd.common.type = SMC_LLC_DELETE_LINK;
+ delllc->hd.length = sizeof(struct smc_llc_msg_add_link);
+ if (reqresp == SMC_LLC_RESP)
+ delllc->hd.flags |= SMC_LLC_FLAG_RESP;
+ /* DEL_LINK_ALL because only 1 link supported */
+ delllc->hd.flags |= SMC_LLC_FLAG_DEL_LINK_ALL;
+ delllc->hd.flags |= SMC_LLC_FLAG_DEL_LINK_ORDERLY;
+ delllc->link_num = link->link_id;
+}
+
/* send DELETE LINK request or response */
int smc_llc_send_delete_link(struct smc_link *link,
enum smc_llc_reqresp reqresp)
@@ -257,23 +307,14 @@
if (rc)
return rc;
delllc = (struct smc_llc_msg_del_link *)wr_buf;
- memset(delllc, 0, sizeof(*delllc));
- delllc->hd.common.type = SMC_LLC_DELETE_LINK;
- delllc->hd.length = sizeof(struct smc_llc_msg_add_link);
- if (reqresp == SMC_LLC_RESP)
- delllc->hd.flags |= SMC_LLC_FLAG_RESP;
- /* DEL_LINK_ALL because only 1 link supported */
- delllc->hd.flags |= SMC_LLC_FLAG_DEL_LINK_ALL;
- delllc->hd.flags |= SMC_LLC_FLAG_DEL_LINK_ORDERLY;
- delllc->link_num = link->link_id;
+ smc_llc_prep_delete_link(delllc, link, reqresp);
/* send llc message */
rc = smc_wr_tx_send(link, pend);
return rc;
}
-/* send LLC test link request or response */
-int smc_llc_send_test_link(struct smc_link *link, u8 user_data[16],
- enum smc_llc_reqresp reqresp)
+/* send LLC test link request */
+static int smc_llc_send_test_link(struct smc_link *link, u8 user_data[16])
{
struct smc_llc_msg_test_link *testllc;
struct smc_wr_tx_pend_priv *pend;
@@ -287,28 +328,52 @@
memset(testllc, 0, sizeof(*testllc));
testllc->hd.common.type = SMC_LLC_TEST_LINK;
testllc->hd.length = sizeof(struct smc_llc_msg_test_link);
- if (reqresp == SMC_LLC_RESP)
- testllc->hd.flags |= SMC_LLC_FLAG_RESP;
memcpy(testllc->user_data, user_data, sizeof(testllc->user_data));
/* send llc message */
rc = smc_wr_tx_send(link, pend);
return rc;
}
-/* send a prepared message */
-static int smc_llc_send_message(struct smc_link *link, void *llcbuf, int llclen)
+struct smc_llc_send_work {
+ struct work_struct work;
+ struct smc_link *link;
+ int llclen;
+ union smc_llc_msg llcbuf;
+};
+
+/* worker that sends a prepared message */
+static void smc_llc_send_message_work(struct work_struct *work)
{
+ struct smc_llc_send_work *llcwrk = container_of(work,
+ struct smc_llc_send_work, work);
struct smc_wr_tx_pend_priv *pend;
struct smc_wr_buf *wr_buf;
int rc;
- rc = smc_llc_add_pending_send(link, &wr_buf, &pend);
+ if (llcwrk->link->state == SMC_LNK_INACTIVE)
+ goto out;
+ rc = smc_llc_add_pending_send(llcwrk->link, &wr_buf, &pend);
if (rc)
- return rc;
- memcpy(wr_buf, llcbuf, llclen);
- /* send llc message */
- rc = smc_wr_tx_send(link, pend);
- return rc;
+ goto out;
+ memcpy(wr_buf, &llcwrk->llcbuf, llcwrk->llclen);
+ smc_wr_tx_send(llcwrk->link, pend);
+out:
+ kfree(llcwrk);
+}
+
+/* copy llcbuf and schedule an llc send on link */
+static int smc_llc_send_message(struct smc_link *link, void *llcbuf, int llclen)
+{
+ struct smc_llc_send_work *wrk = kmalloc(sizeof(*wrk), GFP_ATOMIC);
+
+ if (!wrk)
+ return -ENOMEM;
+ INIT_WORK(&wrk->work, smc_llc_send_message_work);
+ wrk->link = link;
+ wrk->llclen = llclen;
+ memcpy(&wrk->llcbuf, llcbuf, llclen);
+ queue_work(link->llc_wq, &wrk->work);
+ return 0;
}
/********************************* receive ***********************************/
@@ -359,17 +424,18 @@
}
if (lgr->role == SMC_SERV) {
- smc_llc_send_add_link(link,
+ smc_llc_prep_add_link(llc, link,
link->smcibdev->mac[link->ibport - 1],
&link->smcibdev->gid[link->ibport - 1],
SMC_LLC_REQ);
} else {
- smc_llc_send_add_link(link,
+ smc_llc_prep_add_link(llc, link,
link->smcibdev->mac[link->ibport - 1],
&link->smcibdev->gid[link->ibport - 1],
SMC_LLC_RESP);
}
+ smc_llc_send_message(link, llc, sizeof(*llc));
}
}
@@ -385,9 +451,11 @@
} else {
if (lgr->role == SMC_SERV) {
smc_lgr_forget(lgr);
- smc_llc_send_delete_link(link, SMC_LLC_REQ);
+ smc_llc_prep_delete_link(llc, link, SMC_LLC_REQ);
+ smc_llc_send_message(link, llc, sizeof(*llc));
} else {
- smc_llc_send_delete_link(link, SMC_LLC_RESP);
+ smc_llc_prep_delete_link(llc, link, SMC_LLC_RESP);
+ smc_llc_send_message(link, llc, sizeof(*llc));
smc_lgr_terminate(lgr);
}
}
@@ -397,9 +465,11 @@
struct smc_llc_msg_test_link *llc)
{
if (llc->hd.flags & SMC_LLC_FLAG_RESP) {
- /* unused as long as we don't send this type of msg */
+ if (link->state == SMC_LNK_ACTIVE)
+ complete(&link->llc_testlink_resp);
} else {
- smc_llc_send_test_link(link, llc->user_data, SMC_LLC_RESP);
+ llc->hd.flags |= SMC_LLC_FLAG_RESP;
+ smc_llc_send_message(link, llc, sizeof(*llc));
}
}
@@ -412,7 +482,9 @@
lgr = container_of(link, struct smc_link_group, lnk[SMC_SINGLE_LINK]);
if (llc->hd.flags & SMC_LLC_FLAG_RESP) {
- /* unused as long as we don't send this type of msg */
+ link->llc_confirm_rkey_rc = llc->hd.flags &
+ SMC_LLC_FLAG_RKEY_NEG;
+ complete(&link->llc_confirm_rkey);
} else {
rc = smc_rtoken_add(lgr,
llc->rtoken[0].rmb_vaddr,
@@ -423,7 +495,7 @@
llc->hd.flags |= SMC_LLC_FLAG_RESP;
if (rc < 0)
llc->hd.flags |= SMC_LLC_FLAG_RKEY_NEG;
- smc_llc_send_message(link, (void *)llc, sizeof(*llc));
+ smc_llc_send_message(link, llc, sizeof(*llc));
}
}
@@ -435,7 +507,7 @@
} else {
/* ignore rtokens for other links, we have only one link */
llc->hd.flags |= SMC_LLC_FLAG_RESP;
- smc_llc_send_message(link, (void *)llc, sizeof(*llc));
+ smc_llc_send_message(link, llc, sizeof(*llc));
}
}
@@ -463,7 +535,7 @@
}
llc->hd.flags |= SMC_LLC_FLAG_RESP;
- smc_llc_send_message(link, (void *)llc, sizeof(*llc));
+ smc_llc_send_message(link, llc, sizeof(*llc));
}
}
@@ -476,6 +548,8 @@
return; /* short message */
if (llc->raw.hdr.length != sizeof(*llc))
return; /* invalid message */
+ if (link->state == SMC_LNK_INACTIVE)
+ return; /* link not active, drop msg */
switch (llc->raw.hdr.common.type) {
case SMC_LLC_TEST_LINK:
@@ -502,6 +576,100 @@
}
}
+/***************************** worker, utils *********************************/
+
+static void smc_llc_testlink_work(struct work_struct *work)
+{
+ struct smc_link *link = container_of(to_delayed_work(work),
+ struct smc_link, llc_testlink_wrk);
+ unsigned long next_interval;
+ struct smc_link_group *lgr;
+ unsigned long expire_time;
+ u8 user_data[16] = { 0 };
+ int rc;
+
+ lgr = container_of(link, struct smc_link_group, lnk[SMC_SINGLE_LINK]);
+ if (link->state != SMC_LNK_ACTIVE)
+ return; /* don't reschedule worker */
+ expire_time = link->wr_rx_tstamp + link->llc_testlink_time;
+ if (time_is_after_jiffies(expire_time)) {
+ next_interval = expire_time - jiffies;
+ goto out;
+ }
+ reinit_completion(&link->llc_testlink_resp);
+ smc_llc_send_test_link(link, user_data);
+ /* receive TEST LINK response over RoCE fabric */
+ rc = wait_for_completion_interruptible_timeout(&link->llc_testlink_resp,
+ SMC_LLC_WAIT_TIME);
+ if (rc <= 0) {
+ smc_lgr_terminate(lgr);
+ return;
+ }
+ next_interval = link->llc_testlink_time;
+out:
+ queue_delayed_work(link->llc_wq, &link->llc_testlink_wrk,
+ next_interval);
+}
+
+int smc_llc_link_init(struct smc_link *link)
+{
+ struct smc_link_group *lgr = container_of(link, struct smc_link_group,
+ lnk[SMC_SINGLE_LINK]);
+ link->llc_wq = alloc_ordered_workqueue("llc_wq-%x:%x)", WQ_MEM_RECLAIM,
+ *((u32 *)lgr->id),
+ link->link_id);
+ if (!link->llc_wq)
+ return -ENOMEM;
+ init_completion(&link->llc_confirm);
+ init_completion(&link->llc_confirm_resp);
+ init_completion(&link->llc_add);
+ init_completion(&link->llc_add_resp);
+ init_completion(&link->llc_confirm_rkey);
+ init_completion(&link->llc_testlink_resp);
+ INIT_DELAYED_WORK(&link->llc_testlink_wrk, smc_llc_testlink_work);
+ return 0;
+}
+
+void smc_llc_link_active(struct smc_link *link, int testlink_time)
+{
+ link->state = SMC_LNK_ACTIVE;
+ if (testlink_time) {
+ link->llc_testlink_time = testlink_time * HZ;
+ queue_delayed_work(link->llc_wq, &link->llc_testlink_wrk,
+ link->llc_testlink_time);
+ }
+}
+
+/* called in tasklet context */
+void smc_llc_link_inactive(struct smc_link *link)
+{
+ link->state = SMC_LNK_INACTIVE;
+ cancel_delayed_work(&link->llc_testlink_wrk);
+}
+
+/* called in worker context */
+void smc_llc_link_clear(struct smc_link *link)
+{
+ flush_workqueue(link->llc_wq);
+ destroy_workqueue(link->llc_wq);
+}
+
+/* register a new rtoken at the remote peer */
+int smc_llc_do_confirm_rkey(struct smc_link *link,
+ struct smc_buf_desc *rmb_desc)
+{
+ int rc;
+
+ reinit_completion(&link->llc_confirm_rkey);
+ smc_llc_send_confirm_rkey(link, rmb_desc);
+ /* receive CONFIRM RKEY response from server over RoCE fabric */
+ rc = wait_for_completion_interruptible_timeout(&link->llc_confirm_rkey,
+ SMC_LLC_WAIT_TIME);
+ if (rc <= 0 || link->llc_confirm_rkey_rc)
+ return -EFAULT;
+ return 0;
+}
+
/***************************** init, exit, misc ******************************/
static struct smc_wr_rx_handler smc_llc_rx_handlers[] = {
diff --git a/net/smc/smc_llc.h b/net/smc/smc_llc.h
index e4a7d5e..65c8645 100644
--- a/net/smc/smc_llc.h
+++ b/net/smc/smc_llc.h
@@ -42,8 +42,12 @@
enum smc_llc_reqresp reqresp);
int smc_llc_send_delete_link(struct smc_link *link,
enum smc_llc_reqresp reqresp);
-int smc_llc_send_test_link(struct smc_link *lnk, u8 user_data[16],
- enum smc_llc_reqresp reqresp);
+int smc_llc_link_init(struct smc_link *link);
+void smc_llc_link_active(struct smc_link *link, int testlink_time);
+void smc_llc_link_inactive(struct smc_link *link);
+void smc_llc_link_clear(struct smc_link *link);
+int smc_llc_do_confirm_rkey(struct smc_link *link,
+ struct smc_buf_desc *rmb_desc);
int smc_llc_init(void) __init;
#endif /* SMC_LLC_H */
diff --git a/net/smc/smc_rx.c b/net/smc/smc_rx.c
index eff4e0d..3d77b38 100644
--- a/net/smc/smc_rx.c
+++ b/net/smc/smc_rx.c
@@ -22,11 +22,10 @@
#include "smc_tx.h" /* smc_tx_consumer_update() */
#include "smc_rx.h"
-/* callback implementation for sk.sk_data_ready()
- * to wakeup rcvbuf consumers that blocked with smc_rx_wait_data().
+/* callback implementation to wakeup consumers blocked with smc_rx_wait().
* indirectly called by smc_cdc_msg_recv_action().
*/
-static void smc_rx_data_ready(struct sock *sk)
+static void smc_rx_wake_up(struct sock *sk)
{
struct socket_wq *wq;
@@ -44,28 +43,180 @@
rcu_read_unlock();
}
+/* Update consumer cursor
+ * @conn connection to update
+ * @cons consumer cursor
+ * @len number of Bytes consumed
+ * Returns:
+ * 1 if we should end our receive, 0 otherwise
+ */
+static int smc_rx_update_consumer(struct smc_sock *smc,
+ union smc_host_cursor cons, size_t len)
+{
+ struct smc_connection *conn = &smc->conn;
+ struct sock *sk = &smc->sk;
+ bool force = false;
+ int diff, rc = 0;
+
+ smc_curs_add(conn->rmb_desc->len, &cons, len);
+
+ /* did we process urgent data? */
+ if (conn->urg_state == SMC_URG_VALID || conn->urg_rx_skip_pend) {
+ diff = smc_curs_comp(conn->rmb_desc->len, &cons,
+ &conn->urg_curs);
+ if (sock_flag(sk, SOCK_URGINLINE)) {
+ if (diff == 0) {
+ force = true;
+ rc = 1;
+ conn->urg_state = SMC_URG_READ;
+ }
+ } else {
+ if (diff == 1) {
+ /* skip urgent byte */
+ force = true;
+ smc_curs_add(conn->rmb_desc->len, &cons, 1);
+ conn->urg_rx_skip_pend = false;
+ } else if (diff < -1)
+ /* we read past urgent byte */
+ conn->urg_state = SMC_URG_READ;
+ }
+ }
+
+ smc_curs_write(&conn->local_tx_ctrl.cons, smc_curs_read(&cons, conn),
+ conn);
+
+ /* send consumer cursor update if required */
+ /* similar to advertising new TCP rcv_wnd if required */
+ smc_tx_consumer_update(conn, force);
+
+ return rc;
+}
+
+static void smc_rx_update_cons(struct smc_sock *smc, size_t len)
+{
+ struct smc_connection *conn = &smc->conn;
+ union smc_host_cursor cons;
+
+ smc_curs_write(&cons, smc_curs_read(&conn->local_tx_ctrl.cons, conn),
+ conn);
+ smc_rx_update_consumer(smc, cons, len);
+}
+
+struct smc_spd_priv {
+ struct smc_sock *smc;
+ size_t len;
+};
+
+static void smc_rx_pipe_buf_release(struct pipe_inode_info *pipe,
+ struct pipe_buffer *buf)
+{
+ struct smc_spd_priv *priv = (struct smc_spd_priv *)buf->private;
+ struct smc_sock *smc = priv->smc;
+ struct smc_connection *conn;
+ struct sock *sk = &smc->sk;
+
+ if (sk->sk_state == SMC_CLOSED ||
+ sk->sk_state == SMC_PEERFINCLOSEWAIT ||
+ sk->sk_state == SMC_APPFINCLOSEWAIT)
+ goto out;
+ conn = &smc->conn;
+ lock_sock(sk);
+ smc_rx_update_cons(smc, priv->len);
+ release_sock(sk);
+ if (atomic_sub_and_test(priv->len, &conn->splice_pending))
+ smc_rx_wake_up(sk);
+out:
+ kfree(priv);
+ put_page(buf->page);
+ sock_put(sk);
+}
+
+static int smc_rx_pipe_buf_nosteal(struct pipe_inode_info *pipe,
+ struct pipe_buffer *buf)
+{
+ return 1;
+}
+
+static const struct pipe_buf_operations smc_pipe_ops = {
+ .can_merge = 0,
+ .confirm = generic_pipe_buf_confirm,
+ .release = smc_rx_pipe_buf_release,
+ .steal = smc_rx_pipe_buf_nosteal,
+ .get = generic_pipe_buf_get
+};
+
+static void smc_rx_spd_release(struct splice_pipe_desc *spd,
+ unsigned int i)
+{
+ put_page(spd->pages[i]);
+}
+
+static int smc_rx_splice(struct pipe_inode_info *pipe, char *src, size_t len,
+ struct smc_sock *smc)
+{
+ struct splice_pipe_desc spd;
+ struct partial_page partial;
+ struct smc_spd_priv *priv;
+ struct page *page;
+ int bytes;
+
+ page = virt_to_page(smc->conn.rmb_desc->cpu_addr);
+ priv = kzalloc(sizeof(*priv), GFP_KERNEL);
+ if (!priv)
+ return -ENOMEM;
+ priv->len = len;
+ priv->smc = smc;
+ partial.offset = src - (char *)smc->conn.rmb_desc->cpu_addr;
+ partial.len = len;
+ partial.private = (unsigned long)priv;
+
+ spd.nr_pages_max = 1;
+ spd.nr_pages = 1;
+ spd.pages = &page;
+ spd.partial = &partial;
+ spd.ops = &smc_pipe_ops;
+ spd.spd_release = smc_rx_spd_release;
+
+ bytes = splice_to_pipe(pipe, &spd);
+ if (bytes > 0) {
+ sock_hold(&smc->sk);
+ get_page(smc->conn.rmb_desc->pages);
+ atomic_add(bytes, &smc->conn.splice_pending);
+ }
+
+ return bytes;
+}
+
+static int smc_rx_data_available_and_no_splice_pend(struct smc_connection *conn)
+{
+ return atomic_read(&conn->bytes_to_rcv) &&
+ !atomic_read(&conn->splice_pending);
+}
+
/* blocks rcvbuf consumer until >=len bytes available or timeout or interrupted
* @smc smc socket
* @timeo pointer to max seconds to wait, pointer to value 0 for no timeout
+ * @fcrit add'l criterion to evaluate as function pointer
* Returns:
* 1 if at least 1 byte available in rcvbuf or if socket error/shutdown.
* 0 otherwise (nothing in rcvbuf nor timeout, e.g. interrupted).
*/
-static int smc_rx_wait_data(struct smc_sock *smc, long *timeo)
+int smc_rx_wait(struct smc_sock *smc, long *timeo,
+ int (*fcrit)(struct smc_connection *conn))
{
DEFINE_WAIT_FUNC(wait, woken_wake_function);
struct smc_connection *conn = &smc->conn;
struct sock *sk = &smc->sk;
int rc;
- if (atomic_read(&conn->bytes_to_rcv))
+ if (fcrit(conn))
return 1;
sk_set_bit(SOCKWQ_ASYNC_WAITDATA, sk);
add_wait_queue(sk_sleep(sk), &wait);
rc = sk_wait_event(sk, timeo,
sk->sk_err ||
sk->sk_shutdown & RCV_SHUTDOWN ||
- atomic_read(&conn->bytes_to_rcv) ||
+ fcrit(conn) ||
smc_cdc_rxed_any_close_or_senddone(conn),
&wait);
remove_wait_queue(sk_sleep(sk), &wait);
@@ -73,65 +224,115 @@
return rc;
}
-/* rcvbuf consumer: main API called by socket layer.
- * called under sk lock.
+static int smc_rx_recv_urg(struct smc_sock *smc, struct msghdr *msg, int len,
+ int flags)
+{
+ struct smc_connection *conn = &smc->conn;
+ union smc_host_cursor cons;
+ struct sock *sk = &smc->sk;
+ int rc = 0;
+
+ if (sock_flag(sk, SOCK_URGINLINE) ||
+ !(conn->urg_state == SMC_URG_VALID) ||
+ conn->urg_state == SMC_URG_READ)
+ return -EINVAL;
+
+ if (conn->urg_state == SMC_URG_VALID) {
+ if (!(flags & MSG_PEEK))
+ smc->conn.urg_state = SMC_URG_READ;
+ msg->msg_flags |= MSG_OOB;
+ if (len > 0) {
+ if (!(flags & MSG_TRUNC))
+ rc = memcpy_to_msg(msg, &conn->urg_rx_byte, 1);
+ len = 1;
+ smc_curs_write(&cons,
+ smc_curs_read(&conn->local_tx_ctrl.cons,
+ conn),
+ conn);
+ if (smc_curs_diff(conn->rmb_desc->len, &cons,
+ &conn->urg_curs) > 1)
+ conn->urg_rx_skip_pend = true;
+ /* Urgent Byte was already accounted for, but trigger
+ * skipping the urgent byte in non-inline case
+ */
+ if (!(flags & MSG_PEEK))
+ smc_rx_update_consumer(smc, cons, 0);
+ } else {
+ msg->msg_flags |= MSG_TRUNC;
+ }
+
+ return rc ? -EFAULT : len;
+ }
+
+ if (sk->sk_state == SMC_CLOSED || sk->sk_shutdown & RCV_SHUTDOWN)
+ return 0;
+
+ return -EAGAIN;
+}
+
+/* smc_rx_recvmsg - receive data from RMBE
+ * @msg: copy data to receive buffer
+ * @pipe: copy data to pipe if set - indicates splice() call
+ *
+ * rcvbuf consumer: main API called by socket layer.
+ * Called under sk lock.
*/
-int smc_rx_recvmsg(struct smc_sock *smc, struct msghdr *msg, size_t len,
- int flags)
+int smc_rx_recvmsg(struct smc_sock *smc, struct msghdr *msg,
+ struct pipe_inode_info *pipe, size_t len, int flags)
{
size_t copylen, read_done = 0, read_remaining = len;
size_t chunk_len, chunk_off, chunk_len_sum;
struct smc_connection *conn = &smc->conn;
+ int (*func)(struct smc_connection *conn);
union smc_host_cursor cons;
int readable, chunk;
char *rcvbuf_base;
struct sock *sk;
+ int splbytes;
long timeo;
int target; /* Read at least these many bytes */
int rc;
if (unlikely(flags & MSG_ERRQUEUE))
return -EINVAL; /* future work for sk.sk_family == AF_SMC */
- if (flags & MSG_OOB)
- return -EINVAL; /* future work */
sk = &smc->sk;
if (sk->sk_state == SMC_LISTEN)
return -ENOTCONN;
+ if (flags & MSG_OOB)
+ return smc_rx_recv_urg(smc, msg, len, flags);
timeo = sock_rcvtimeo(sk, flags & MSG_DONTWAIT);
target = sock_rcvlowat(sk, flags & MSG_WAITALL, len);
- msg->msg_namelen = 0;
/* we currently use 1 RMBE per RMB, so RMBE == RMB base addr */
rcvbuf_base = conn->rmb_desc->cpu_addr;
do { /* while (read_remaining) */
- if (read_done >= target)
+ if (read_done >= target || (pipe && read_done))
break;
if (atomic_read(&conn->bytes_to_rcv))
goto copy;
+ else if (conn->urg_state == SMC_URG_VALID)
+ /* we received a single urgent Byte - skip */
+ smc_rx_update_cons(smc, 0);
+
+ if (sk->sk_shutdown & RCV_SHUTDOWN ||
+ smc_cdc_rxed_any_close_or_senddone(conn) ||
+ conn->local_tx_ctrl.conn_state_flags.peer_conn_abort)
+ break;
if (read_done) {
if (sk->sk_err ||
sk->sk_state == SMC_CLOSED ||
- sk->sk_shutdown & RCV_SHUTDOWN ||
!timeo ||
- signal_pending(current) ||
- smc_cdc_rxed_any_close_or_senddone(conn) ||
- conn->local_tx_ctrl.conn_state_flags.
- peer_conn_abort)
+ signal_pending(current))
break;
} else {
if (sk->sk_err) {
read_done = sock_error(sk);
break;
}
- if (sk->sk_shutdown & RCV_SHUTDOWN ||
- smc_cdc_rxed_any_close_or_senddone(conn) ||
- conn->local_tx_ctrl.conn_state_flags.
- peer_conn_abort)
- break;
if (sk->sk_state == SMC_CLOSED) {
if (!sock_flag(sk, SOCK_DONE)) {
/* This occurs when user tries to read
@@ -150,32 +351,56 @@
return -EAGAIN;
}
- if (!atomic_read(&conn->bytes_to_rcv)) {
- smc_rx_wait_data(smc, &timeo);
+ if (!smc_rx_data_available(conn)) {
+ smc_rx_wait(smc, &timeo, smc_rx_data_available);
continue;
}
copy:
/* initialize variables for 1st iteration of subsequent loop */
- /* could be just 1 byte, even after smc_rx_wait_data above */
+ /* could be just 1 byte, even after waiting on data above */
readable = atomic_read(&conn->bytes_to_rcv);
- /* not more than what user space asked for */
- copylen = min_t(size_t, read_remaining, readable);
+ splbytes = atomic_read(&conn->splice_pending);
+ if (!readable || (msg && splbytes)) {
+ if (splbytes)
+ func = smc_rx_data_available_and_no_splice_pend;
+ else
+ func = smc_rx_data_available;
+ smc_rx_wait(smc, &timeo, func);
+ continue;
+ }
+
smc_curs_write(&cons,
smc_curs_read(&conn->local_tx_ctrl.cons, conn),
conn);
+ /* subsequent splice() calls pick up where previous left */
+ if (splbytes)
+ smc_curs_add(conn->rmb_desc->len, &cons, splbytes);
+ if (conn->urg_state == SMC_URG_VALID &&
+ sock_flag(&smc->sk, SOCK_URGINLINE) &&
+ readable > 1)
+ readable--; /* always stop at urgent Byte */
+ /* not more than what user space asked for */
+ copylen = min_t(size_t, read_remaining, readable);
/* determine chunks where to read from rcvbuf */
/* either unwrapped case, or 1st chunk of wrapped case */
- chunk_len = min_t(size_t,
- copylen, conn->rmbe_size - cons.count);
+ chunk_len = min_t(size_t, copylen, conn->rmb_desc->len -
+ cons.count);
chunk_len_sum = chunk_len;
chunk_off = cons.count;
smc_rmb_sync_sg_for_cpu(conn);
for (chunk = 0; chunk < 2; chunk++) {
if (!(flags & MSG_TRUNC)) {
- rc = memcpy_to_msg(msg, rcvbuf_base + chunk_off,
- chunk_len);
- if (rc) {
+ if (msg) {
+ rc = memcpy_to_msg(msg, rcvbuf_base +
+ chunk_off,
+ chunk_len);
+ } else {
+ rc = smc_rx_splice(pipe, rcvbuf_base +
+ chunk_off, chunk_len,
+ smc);
+ }
+ if (rc < 0) {
if (!read_done)
read_done = -EFAULT;
smc_rmb_sync_sg_for_device(conn);
@@ -196,18 +421,13 @@
/* update cursors */
if (!(flags & MSG_PEEK)) {
- smc_curs_add(conn->rmbe_size, &cons, copylen);
/* increased in recv tasklet smc_cdc_msg_rcv() */
smp_mb__before_atomic();
atomic_sub(copylen, &conn->bytes_to_rcv);
- /* guarantee 0 <= bytes_to_rcv <= rmbe_size */
+ /* guarantee 0 <= bytes_to_rcv <= rmb_desc->len */
smp_mb__after_atomic();
- smc_curs_write(&conn->local_tx_ctrl.cons,
- smc_curs_read(&cons, conn),
- conn);
- /* send consumer cursor update if required */
- /* similar to advertising new TCP rcv_wnd if required */
- smc_tx_consumer_update(conn);
+ if (msg && smc_rx_update_consumer(smc, cons, copylen))
+ goto out;
}
} while (read_remaining);
out:
@@ -217,5 +437,7 @@
/* Initialize receive properties on connection establishment. NB: not __init! */
void smc_rx_init(struct smc_sock *smc)
{
- smc->sk.sk_data_ready = smc_rx_data_ready;
+ smc->sk.sk_data_ready = smc_rx_wake_up;
+ atomic_set(&smc->conn.splice_pending, 0);
+ smc->conn.urg_state = SMC_URG_READ;
}
diff --git a/net/smc/smc_rx.h b/net/smc/smc_rx.h
index 3a32b59..db823c9 100644
--- a/net/smc/smc_rx.h
+++ b/net/smc/smc_rx.h
@@ -18,7 +18,14 @@
#include "smc.h"
void smc_rx_init(struct smc_sock *smc);
-int smc_rx_recvmsg(struct smc_sock *smc, struct msghdr *msg, size_t len,
- int flags);
+
+int smc_rx_recvmsg(struct smc_sock *smc, struct msghdr *msg,
+ struct pipe_inode_info *pipe, size_t len, int flags);
+int smc_rx_wait(struct smc_sock *smc, long *timeo,
+ int (*fcrit)(struct smc_connection *conn));
+static inline int smc_rx_data_available(struct smc_connection *conn)
+{
+ return atomic_read(&conn->bytes_to_rcv);
+}
#endif /* SMC_RX_H */
diff --git a/net/smc/smc_tx.c b/net/smc/smc_tx.c
index 72f004c..cee6664 100644
--- a/net/smc/smc_tx.c
+++ b/net/smc/smc_tx.c
@@ -19,6 +19,7 @@
#include <linux/sched/signal.h>
#include <net/sock.h>
+#include <net/tcp.h>
#include "smc.h"
#include "smc_wr.h"
@@ -26,11 +27,12 @@
#include "smc_tx.h"
#define SMC_TX_WORK_DELAY HZ
+#define SMC_TX_CORK_DELAY (HZ >> 2) /* 250 ms */
/***************************** sndbuf producer *******************************/
/* callback implementation for sk.sk_write_space()
- * to wakeup sndbuf producers that blocked with smc_tx_wait_memory().
+ * to wakeup sndbuf producers that blocked with smc_tx_wait().
* called under sk_socket lock.
*/
static void smc_tx_write_space(struct sock *sk)
@@ -54,7 +56,7 @@
}
}
-/* Wakeup sndbuf producers that blocked with smc_tx_wait_memory().
+/* Wakeup sndbuf producers that blocked with smc_tx_wait().
* Cf. tcp_data_snd_check()=>tcp_check_space()=>tcp_new_space().
*/
void smc_tx_sndbuf_nonfull(struct smc_sock *smc)
@@ -64,8 +66,10 @@
smc->sk.sk_write_space(&smc->sk);
}
-/* blocks sndbuf producer until at least one byte of free space available */
-static int smc_tx_wait_memory(struct smc_sock *smc, int flags)
+/* blocks sndbuf producer until at least one byte of free space available
+ * or urgent Byte was consumed
+ */
+static int smc_tx_wait(struct smc_sock *smc, int flags)
{
DEFINE_WAIT_FUNC(wait, woken_wake_function);
struct smc_connection *conn = &smc->conn;
@@ -101,20 +105,28 @@
break;
}
sk_clear_bit(SOCKWQ_ASYNC_NOSPACE, sk);
- if (atomic_read(&conn->sndbuf_space))
- break; /* at least 1 byte of free space available */
+ if (atomic_read(&conn->sndbuf_space) && !conn->urg_tx_pend)
+ break; /* at least 1 byte of free & no urgent data */
set_bit(SOCK_NOSPACE, &sk->sk_socket->flags);
sk_wait_event(sk, &timeo,
sk->sk_err ||
(sk->sk_shutdown & SEND_SHUTDOWN) ||
smc_cdc_rxed_any_close(conn) ||
- atomic_read(&conn->sndbuf_space),
+ (atomic_read(&conn->sndbuf_space) &&
+ !conn->urg_tx_pend),
&wait);
}
remove_wait_queue(sk_sleep(sk), &wait);
return rc;
}
+static bool smc_tx_is_corked(struct smc_sock *smc)
+{
+ struct tcp_sock *tp = tcp_sk(smc->clcsock->sk);
+
+ return (tp->nonagle & TCP_NAGLE_CORK) ? true : false;
+}
+
/* sndbuf producer: main API called by socket layer.
* called under sock lock.
*/
@@ -148,8 +160,11 @@
if (smc_cdc_rxed_any_close(conn))
return send_done ?: -ECONNRESET;
- if (!atomic_read(&conn->sndbuf_space)) {
- rc = smc_tx_wait_memory(smc, msg->msg_flags);
+ if (msg->msg_flags & MSG_OOB)
+ conn->local_tx_ctrl.prod_flags.urg_data_pending = 1;
+
+ if (!atomic_read(&conn->sndbuf_space) || conn->urg_tx_pend) {
+ rc = smc_tx_wait(smc, msg->msg_flags);
if (rc) {
if (send_done)
return send_done;
@@ -159,7 +174,7 @@
}
/* initialize variables for 1st iteration of subsequent loop */
- /* could be just 1 byte, even after smc_tx_wait_memory above */
+ /* could be just 1 byte, even after smc_tx_wait above */
writespace = atomic_read(&conn->sndbuf_space);
/* not more than what user space asked for */
copylen = min_t(size_t, send_remaining, writespace);
@@ -171,8 +186,8 @@
tx_cnt_prep = prep.count;
/* determine chunks where to write into sndbuf */
/* either unwrapped case, or 1st chunk of wrapped case */
- chunk_len = min_t(size_t,
- copylen, conn->sndbuf_size - tx_cnt_prep);
+ chunk_len = min_t(size_t, copylen, conn->sndbuf_desc->len -
+ tx_cnt_prep);
chunk_len_sum = chunk_len;
chunk_off = tx_cnt_prep;
smc_sndbuf_sync_sg_for_cpu(conn);
@@ -197,19 +212,30 @@
}
smc_sndbuf_sync_sg_for_device(conn);
/* update cursors */
- smc_curs_add(conn->sndbuf_size, &prep, copylen);
+ smc_curs_add(conn->sndbuf_desc->len, &prep, copylen);
smc_curs_write(&conn->tx_curs_prep,
smc_curs_read(&prep, conn),
conn);
/* increased in send tasklet smc_cdc_tx_handler() */
smp_mb__before_atomic();
atomic_sub(copylen, &conn->sndbuf_space);
- /* guarantee 0 <= sndbuf_space <= sndbuf_size */
+ /* guarantee 0 <= sndbuf_space <= sndbuf_desc->len */
smp_mb__after_atomic();
/* since we just produced more new data into sndbuf,
* trigger sndbuf consumer: RDMA write into peer RMBE and CDC
*/
- smc_tx_sndbuf_nonempty(conn);
+ if ((msg->msg_flags & MSG_OOB) && !send_remaining)
+ conn->urg_tx_pend = true;
+ if ((msg->msg_flags & MSG_MORE || smc_tx_is_corked(smc)) &&
+ (atomic_read(&conn->sndbuf_space) >
+ (conn->sndbuf_desc->len >> 1)))
+ /* for a corked socket defer the RDMA writes if there
+ * is still sufficient sndbuf_space available
+ */
+ schedule_delayed_work(&conn->tx_work,
+ SMC_TX_CORK_DELAY);
+ else
+ smc_tx_sndbuf_nonempty(conn);
} /* while (msg_data_left(msg)) */
return send_done;
@@ -243,7 +269,7 @@
rdma_wr.remote_addr =
lgr->rtokens[conn->rtoken_idx][SMC_SINGLE_LINK].dma_addr +
/* RMBE within RMB */
- ((conn->peer_conn_idx - 1) * conn->peer_rmbe_size) +
+ conn->tx_off +
/* offset within RMBE */
peer_rmbe_offset;
rdma_wr.rkey = lgr->rtokens[conn->rtoken_idx][SMC_SINGLE_LINK].rkey;
@@ -268,7 +294,7 @@
atomic_sub(len, &conn->peer_rmbe_space);
/* guarantee 0 <= peer_rmbe_space <= peer_rmbe_size */
smp_mb__after_atomic();
- smc_curs_add(conn->sndbuf_size, sent, len);
+ smc_curs_add(conn->sndbuf_desc->len, sent, len);
}
/* sndbuf consumer: prepare all necessary (src&dst) chunks of data transmit;
@@ -281,6 +307,7 @@
union smc_host_cursor sent, prep, prod, cons;
struct ib_sge sges[SMC_IB_MAX_SEND_SGE];
struct smc_link_group *lgr = conn->lgr;
+ struct smc_cdc_producer_flags *pflags;
int to_send, rmbespace;
struct smc_link *link;
dma_addr_t dma_addr;
@@ -291,7 +318,7 @@
smc_curs_write(&sent, smc_curs_read(&conn->tx_curs_sent, conn), conn);
smc_curs_write(&prep, smc_curs_read(&conn->tx_curs_prep, conn), conn);
/* cf. wmem_alloc - (snd_max - snd_una) */
- to_send = smc_curs_diff(conn->sndbuf_size, &sent, &prep);
+ to_send = smc_curs_diff(conn->sndbuf_desc->len, &sent, &prep);
if (to_send <= 0)
return 0;
@@ -308,7 +335,8 @@
conn);
/* if usable snd_wnd closes ask peer to advertise once it opens again */
- conn->local_tx_ctrl.prod_flags.write_blocked = (to_send >= rmbespace);
+ pflags = &conn->local_tx_ctrl.prod_flags;
+ pflags->write_blocked = (to_send >= rmbespace);
/* cf. usable snd_wnd */
len = min(to_send, rmbespace);
@@ -333,12 +361,12 @@
dst_len_sum = dst_len;
src_off = sent.count;
/* dst_len determines the maximum src_len */
- if (sent.count + dst_len <= conn->sndbuf_size) {
+ if (sent.count + dst_len <= conn->sndbuf_desc->len) {
/* unwrapped src case: single chunk of entire dst_len */
src_len = dst_len;
} else {
/* wrapped src case: 2 chunks of sum dst_len; start with 1st: */
- src_len = conn->sndbuf_size - sent.count;
+ src_len = conn->sndbuf_desc->len - sent.count;
}
src_len_sum = src_len;
dma_addr = sg_dma_address(conn->sndbuf_desc->sgt[SMC_SINGLE_LINK].sgl);
@@ -350,8 +378,8 @@
sges[srcchunk].lkey = link->roce_pd->local_dma_lkey;
num_sges++;
src_off += src_len;
- if (src_off >= conn->sndbuf_size)
- src_off -= conn->sndbuf_size;
+ if (src_off >= conn->sndbuf_desc->len)
+ src_off -= conn->sndbuf_desc->len;
/* modulo in send ring */
if (src_len_sum == dst_len)
break; /* either on 1st or 2nd iteration */
@@ -369,10 +397,12 @@
dst_len = len - dst_len; /* remainder */
dst_len_sum += dst_len;
src_len = min_t(int,
- dst_len, conn->sndbuf_size - sent.count);
+ dst_len, conn->sndbuf_desc->len - sent.count);
src_len_sum = src_len;
}
+ if (conn->urg_tx_pend && len == to_send)
+ pflags->urg_data_present = 1;
smc_tx_advance_cursors(conn, &prod, &sent, len);
/* update connection's cursors with advanced local cursors */
smc_curs_write(&conn->local_tx_ctrl.prod,
@@ -392,6 +422,7 @@
*/
int smc_tx_sndbuf_nonempty(struct smc_connection *conn)
{
+ struct smc_cdc_producer_flags *pflags;
struct smc_cdc_tx_pend *pend;
struct smc_wr_buf *wr_buf;
int rc;
@@ -409,20 +440,27 @@
}
rc = 0;
if (conn->alert_token_local) /* connection healthy */
- schedule_delayed_work(&conn->tx_work,
- SMC_TX_WORK_DELAY);
+ mod_delayed_work(system_wq, &conn->tx_work,
+ SMC_TX_WORK_DELAY);
}
goto out_unlock;
}
- rc = smc_tx_rdma_writes(conn);
- if (rc) {
- smc_wr_tx_put_slot(&conn->lgr->lnk[SMC_SINGLE_LINK],
- (struct smc_wr_tx_pend_priv *)pend);
- goto out_unlock;
+ if (!conn->local_tx_ctrl.prod_flags.urg_data_present) {
+ rc = smc_tx_rdma_writes(conn);
+ if (rc) {
+ smc_wr_tx_put_slot(&conn->lgr->lnk[SMC_SINGLE_LINK],
+ (struct smc_wr_tx_pend_priv *)pend);
+ goto out_unlock;
+ }
}
rc = smc_cdc_msg_send(conn, wr_buf, pend);
+ pflags = &conn->local_tx_ctrl.prod_flags;
+ if (!rc && pflags->urg_data_present) {
+ pflags->urg_data_pending = 0;
+ pflags->urg_data_present = 0;
+ }
out_unlock:
spin_unlock_bh(&conn->send_lock);
@@ -432,7 +470,7 @@
/* Wakeup sndbuf consumers from process context
* since there is more data to transmit
*/
-static void smc_tx_work(struct work_struct *work)
+void smc_tx_work(struct work_struct *work)
{
struct smc_connection *conn = container_of(to_delayed_work(work),
struct smc_connection,
@@ -455,7 +493,7 @@
release_sock(&smc->sk);
}
-void smc_tx_consumer_update(struct smc_connection *conn)
+void smc_tx_consumer_update(struct smc_connection *conn, bool force)
{
union smc_host_cursor cfed, cons;
int to_confirm;
@@ -466,11 +504,12 @@
smc_curs_write(&cfed,
smc_curs_read(&conn->rx_curs_confirmed, conn),
conn);
- to_confirm = smc_curs_diff(conn->rmbe_size, &cfed, &cons);
+ to_confirm = smc_curs_diff(conn->rmb_desc->len, &cfed, &cons);
if (conn->local_rx_ctrl.prod_flags.cons_curs_upd_req ||
+ force ||
((to_confirm > conn->rmbe_update_limit) &&
- ((to_confirm > (conn->rmbe_size / 2)) ||
+ ((to_confirm > (conn->rmb_desc->len / 2)) ||
conn->local_rx_ctrl.prod_flags.write_blocked))) {
if ((smc_cdc_get_slot_and_msg_send(conn) < 0) &&
conn->alert_token_local) { /* connection healthy */
@@ -494,6 +533,4 @@
void smc_tx_init(struct smc_sock *smc)
{
smc->sk.sk_write_space = smc_tx_write_space;
- INIT_DELAYED_WORK(&smc->conn.tx_work, smc_tx_work);
- spin_lock_init(&smc->conn.send_lock);
}
diff --git a/net/smc/smc_tx.h b/net/smc/smc_tx.h
index 7825596..9d223890 100644
--- a/net/smc/smc_tx.h
+++ b/net/smc/smc_tx.h
@@ -24,13 +24,14 @@
smc_curs_write(&sent, smc_curs_read(&conn->tx_curs_sent, conn), conn);
smc_curs_write(&prep, smc_curs_read(&conn->tx_curs_prep, conn), conn);
- return smc_curs_diff(conn->sndbuf_size, &sent, &prep);
+ return smc_curs_diff(conn->sndbuf_desc->len, &sent, &prep);
}
+void smc_tx_work(struct work_struct *work);
void smc_tx_init(struct smc_sock *smc);
int smc_tx_sendmsg(struct smc_sock *smc, struct msghdr *msg, size_t len);
int smc_tx_sndbuf_nonempty(struct smc_connection *conn);
void smc_tx_sndbuf_nonfull(struct smc_sock *smc);
-void smc_tx_consumer_update(struct smc_connection *conn);
+void smc_tx_consumer_update(struct smc_connection *conn, bool force);
#endif /* SMC_TX_H */
diff --git a/net/smc/smc_wr.c b/net/smc/smc_wr.c
index 1b8af23..cc7c1bb 100644
--- a/net/smc/smc_wr.c
+++ b/net/smc/smc_wr.c
@@ -376,6 +376,7 @@
for (i = 0; i < num; i++) {
link = wc[i].qp->qp_context;
if (wc[i].status == IB_WC_SUCCESS) {
+ link->wr_rx_tstamp = jiffies;
smc_wr_rx_demultiplex(&wc[i]);
smc_wr_rx_post(link); /* refill WR RX */
} else {
diff --git a/net/tipc/bearer.c b/net/tipc/bearer.c
index f7d47c8..2dfb492 100644
--- a/net/tipc/bearer.c
+++ b/net/tipc/bearer.c
@@ -697,6 +697,9 @@
goto prop_msg_full;
if (nla_put_u32(msg->skb, TIPC_NLA_PROP_WIN, bearer->window))
goto prop_msg_full;
+ if (bearer->media->type_id == TIPC_MEDIA_TYPE_UDP)
+ if (nla_put_u32(msg->skb, TIPC_NLA_PROP_MTU, bearer->mtu))
+ goto prop_msg_full;
nla_nest_end(msg->skb, prop);
@@ -979,12 +982,23 @@
if (props[TIPC_NLA_PROP_TOL]) {
b->tolerance = nla_get_u32(props[TIPC_NLA_PROP_TOL]);
- tipc_node_apply_tolerance(net, b);
+ tipc_node_apply_property(net, b, TIPC_NLA_PROP_TOL);
}
if (props[TIPC_NLA_PROP_PRIO])
b->priority = nla_get_u32(props[TIPC_NLA_PROP_PRIO]);
if (props[TIPC_NLA_PROP_WIN])
b->window = nla_get_u32(props[TIPC_NLA_PROP_WIN]);
+ if (props[TIPC_NLA_PROP_MTU]) {
+ if (b->media->type_id != TIPC_MEDIA_TYPE_UDP)
+ return -EINVAL;
+#ifdef CONFIG_TIPC_MEDIA_UDP
+ if (tipc_udp_mtu_bad(nla_get_u32
+ (props[TIPC_NLA_PROP_MTU])))
+ return -EINVAL;
+ b->mtu = nla_get_u32(props[TIPC_NLA_PROP_MTU]);
+ tipc_node_apply_property(net, b, TIPC_NLA_PROP_MTU);
+#endif
+ }
}
return 0;
@@ -1029,6 +1043,9 @@
goto prop_msg_full;
if (nla_put_u32(msg->skb, TIPC_NLA_PROP_WIN, media->window))
goto prop_msg_full;
+ if (media->type_id == TIPC_MEDIA_TYPE_UDP)
+ if (nla_put_u32(msg->skb, TIPC_NLA_PROP_MTU, media->mtu))
+ goto prop_msg_full;
nla_nest_end(msg->skb, prop);
nla_nest_end(msg->skb, attrs);
@@ -1158,6 +1175,16 @@
m->priority = nla_get_u32(props[TIPC_NLA_PROP_PRIO]);
if (props[TIPC_NLA_PROP_WIN])
m->window = nla_get_u32(props[TIPC_NLA_PROP_WIN]);
+ if (props[TIPC_NLA_PROP_MTU]) {
+ if (m->type_id != TIPC_MEDIA_TYPE_UDP)
+ return -EINVAL;
+#ifdef CONFIG_TIPC_MEDIA_UDP
+ if (tipc_udp_mtu_bad(nla_get_u32
+ (props[TIPC_NLA_PROP_MTU])))
+ return -EINVAL;
+ m->mtu = nla_get_u32(props[TIPC_NLA_PROP_MTU]);
+#endif
+ }
}
return 0;
diff --git a/net/tipc/bearer.h b/net/tipc/bearer.h
index 6efcee6..394290c 100644
--- a/net/tipc/bearer.h
+++ b/net/tipc/bearer.h
@@ -94,6 +94,8 @@
* @priority: default link (and bearer) priority
* @tolerance: default time (in ms) before declaring link failure
* @window: default window (in packets) before declaring link congestion
+ * @mtu: max packet size bearer can support for media type not dependent on
+ * underlying device MTU
* @type_id: TIPC media identifier
* @hwaddr_len: TIPC media address len
* @name: media name
@@ -118,6 +120,7 @@
u32 priority;
u32 tolerance;
u32 window;
+ u32 mtu;
u32 type_id;
u32 hwaddr_len;
char name[TIPC_MAX_MEDIA_NAME];
diff --git a/net/tipc/name_table.c b/net/tipc/name_table.c
index dd1c4fa..bebe88c 100644
--- a/net/tipc/name_table.c
+++ b/net/tipc/name_table.c
@@ -136,12 +136,12 @@
}
/**
- * tipc_service_find_range - find service range matching a service instance
+ * tipc_service_first_range - find first service range in tree matching instance
*
* Very time-critical, so binary search through range rb tree
*/
-static struct service_range *tipc_service_find_range(struct tipc_service *sc,
- u32 instance)
+static struct service_range *tipc_service_first_range(struct tipc_service *sc,
+ u32 instance)
{
struct rb_node *n = sc->ranges.rb_node;
struct service_range *sr;
@@ -158,6 +158,30 @@
return NULL;
}
+/* tipc_service_find_range - find service range matching publication parameters
+ */
+static struct service_range *tipc_service_find_range(struct tipc_service *sc,
+ u32 lower, u32 upper)
+{
+ struct rb_node *n = sc->ranges.rb_node;
+ struct service_range *sr;
+
+ sr = tipc_service_first_range(sc, lower);
+ if (!sr)
+ return NULL;
+
+ /* Look for exact match */
+ for (n = &sr->tree_node; n; n = rb_next(n)) {
+ sr = container_of(n, struct service_range, tree_node);
+ if (sr->upper == upper)
+ break;
+ }
+ if (!n || sr->lower != lower || sr->upper != upper)
+ return NULL;
+
+ return sr;
+}
+
static struct service_range *tipc_service_create_range(struct tipc_service *sc,
u32 lower, u32 upper)
{
@@ -238,54 +262,19 @@
/**
* tipc_service_remove_publ - remove a publication from a service
*/
-static struct publication *tipc_service_remove_publ(struct net *net,
- struct tipc_service *sc,
- u32 lower, u32 upper,
- u32 node, u32 key,
- struct service_range **rng)
+static struct publication *tipc_service_remove_publ(struct service_range *sr,
+ u32 node, u32 key)
{
- struct tipc_subscription *sub, *tmp;
- struct service_range *sr;
struct publication *p;
- bool found = false;
- bool last = false;
- struct rb_node *n;
- sr = tipc_service_find_range(sc, lower);
- if (!sr)
- return NULL;
-
- /* Find exact matching service range */
- for (n = &sr->tree_node; n; n = rb_next(n)) {
- sr = container_of(n, struct service_range, tree_node);
- if (sr->upper == upper)
- break;
- }
- if (!n || sr->lower != lower || sr->upper != upper)
- return NULL;
-
- /* Find publication, if it exists */
list_for_each_entry(p, &sr->all_publ, all_publ) {
if (p->key != key || (node && node != p->node))
continue;
- found = true;
- break;
+ list_del(&p->all_publ);
+ list_del(&p->local_publ);
+ return p;
}
- if (!found)
- return NULL;
-
- list_del(&p->all_publ);
- list_del(&p->local_publ);
- if (list_empty(&sr->all_publ))
- last = true;
-
- /* Notify any waiting subscriptions */
- list_for_each_entry_safe(sub, tmp, &sc->subscriptions, service_list) {
- tipc_sub_report_overlap(sub, p->lower, p->upper, TIPC_WITHDRAWN,
- p->port, p->node, p->scope, last);
- }
- *rng = sr;
- return p;
+ return NULL;
}
/**
@@ -376,17 +365,31 @@
u32 node, u32 key)
{
struct tipc_service *sc = tipc_service_find(net, type);
+ struct tipc_subscription *sub, *tmp;
struct service_range *sr = NULL;
struct publication *p = NULL;
+ bool last;
if (!sc)
return NULL;
spin_lock_bh(&sc->lock);
- p = tipc_service_remove_publ(net, sc, lower, upper, node, key, &sr);
+ sr = tipc_service_find_range(sc, lower, upper);
+ if (!sr)
+ goto exit;
+ p = tipc_service_remove_publ(sr, node, key);
+ if (!p)
+ goto exit;
+
+ /* Notify any waiting subscriptions */
+ last = list_empty(&sr->all_publ);
+ list_for_each_entry_safe(sub, tmp, &sc->subscriptions, service_list) {
+ tipc_sub_report_overlap(sub, lower, upper, TIPC_WITHDRAWN,
+ p->port, node, p->scope, last);
+ }
/* Remove service range item if this was its last publication */
- if (sr && list_empty(&sr->all_publ)) {
+ if (list_empty(&sr->all_publ)) {
rb_erase(&sr->tree_node, &sc->ranges);
kfree(sr);
}
@@ -396,6 +399,7 @@
hlist_del_init_rcu(&sc->service_list);
kfree_rcu(sc, rcu);
}
+exit:
spin_unlock_bh(&sc->lock);
return p;
}
@@ -437,7 +441,7 @@
goto not_found;
spin_lock_bh(&sc->lock);
- sr = tipc_service_find_range(sc, instance);
+ sr = tipc_service_first_range(sc, instance);
if (unlikely(!sr))
goto no_match;
@@ -484,7 +488,7 @@
spin_lock_bh(&sc->lock);
- sr = tipc_service_find_range(sc, instance);
+ sr = tipc_service_first_range(sc, instance);
if (!sr)
goto no_match;
@@ -756,8 +760,7 @@
spin_lock_bh(&sc->lock);
rbtree_postorder_for_each_entry_safe(sr, tmpr, &sc->ranges, tree_node) {
list_for_each_entry_safe(p, tmp, &sr->all_publ, all_publ) {
- tipc_service_remove_publ(net, sc, p->lower, p->upper,
- p->node, p->key, &sr);
+ tipc_service_remove_publ(sr, p->node, p->key);
kfree_rcu(p, rcu);
}
rb_erase(&sr->tree_node, &sc->ranges);
diff --git a/net/tipc/node.c b/net/tipc/node.c
index f29549d..6a44eb8 100644
--- a/net/tipc/node.c
+++ b/net/tipc/node.c
@@ -195,6 +195,27 @@
return mtu;
}
+bool tipc_node_get_id(struct net *net, u32 addr, u8 *id)
+{
+ u8 *own_id = tipc_own_id(net);
+ struct tipc_node *n;
+
+ if (!own_id)
+ return true;
+
+ if (addr == tipc_own_addr(net)) {
+ memcpy(id, own_id, TIPC_NODEID_LEN);
+ return true;
+ }
+ n = tipc_node_find(net, addr);
+ if (!n)
+ return false;
+
+ memcpy(id, &n->peer_id, TIPC_NODEID_LEN);
+ tipc_node_put(n);
+ return true;
+}
+
u16 tipc_node_get_capabilities(struct net *net, u32 addr)
{
struct tipc_node *n;
@@ -1681,7 +1702,8 @@
kfree_skb(skb);
}
-void tipc_node_apply_tolerance(struct net *net, struct tipc_bearer *b)
+void tipc_node_apply_property(struct net *net, struct tipc_bearer *b,
+ int prop)
{
struct tipc_net *tn = tipc_net(net);
int bearer_id = b->identity;
@@ -1696,8 +1718,13 @@
list_for_each_entry_rcu(n, &tn->node_list, list) {
tipc_node_write_lock(n);
e = &n->links[bearer_id];
- if (e->link)
- tipc_link_set_tolerance(e->link, b->tolerance, &xmitq);
+ if (e->link) {
+ if (prop == TIPC_NLA_PROP_TOL)
+ tipc_link_set_tolerance(e->link, b->tolerance,
+ &xmitq);
+ else if (prop == TIPC_NLA_PROP_MTU)
+ tipc_link_set_mtu(e->link, b->mtu);
+ }
tipc_node_write_unlock(n);
tipc_bearer_xmit(net, bearer_id, &xmitq, &e->maddr);
}
diff --git a/net/tipc/node.h b/net/tipc/node.h
index f24b835..846c8f2 100644
--- a/net/tipc/node.h
+++ b/net/tipc/node.h
@@ -60,6 +60,7 @@
#define INVALID_BEARER_ID -1
void tipc_node_stop(struct net *net);
+bool tipc_node_get_id(struct net *net, u32 addr, u8 *id);
u32 tipc_node_try_addr(struct net *net, u8 *id, u32 addr);
void tipc_node_check_dest(struct net *net, u32 onode, u8 *peer_id128,
struct tipc_bearer *bearer,
@@ -67,7 +68,7 @@
struct tipc_media_addr *maddr,
bool *respond, bool *dupl_addr);
void tipc_node_delete_links(struct net *net, int bearer_id);
-void tipc_node_apply_tolerance(struct net *net, struct tipc_bearer *b);
+void tipc_node_apply_property(struct net *net, struct tipc_bearer *b, int prop);
int tipc_node_get_linkname(struct net *net, u32 bearer_id, u32 node,
char *linkname, size_t len);
int tipc_node_xmit(struct net *net, struct sk_buff_head *list, u32 dnode,
diff --git a/net/tipc/socket.c b/net/tipc/socket.c
index 6be2157..930852c 100644
--- a/net/tipc/socket.c
+++ b/net/tipc/socket.c
@@ -2974,7 +2974,8 @@
static int tipc_ioctl(struct socket *sock, unsigned int cmd, unsigned long arg)
{
- struct sock *sk = sock->sk;
+ struct net *net = sock_net(sock->sk);
+ struct tipc_sioc_nodeid_req nr = {0};
struct tipc_sioc_ln_req lnr;
void __user *argp = (void __user *)arg;
@@ -2982,7 +2983,7 @@
case SIOCGETLINKNAME:
if (copy_from_user(&lnr, argp, sizeof(lnr)))
return -EFAULT;
- if (!tipc_node_get_linkname(sock_net(sk),
+ if (!tipc_node_get_linkname(net,
lnr.bearer_id & 0xffff, lnr.peer,
lnr.linkname, TIPC_MAX_LINK_NAME)) {
if (copy_to_user(argp, &lnr, sizeof(lnr)))
@@ -2990,6 +2991,14 @@
return 0;
}
return -EADDRNOTAVAIL;
+ case SIOCGETNODEID:
+ if (copy_from_user(&nr, argp, sizeof(nr)))
+ return -EFAULT;
+ if (!tipc_node_get_id(net, nr.peer, nr.node_id))
+ return -EADDRNOTAVAIL;
+ if (copy_to_user(argp, &nr, sizeof(nr)))
+ return -EFAULT;
+ return 0;
default:
return -ENOIOCTLCMD;
}
diff --git a/net/tipc/udp_media.c b/net/tipc/udp_media.c
index e7d91f5..9783101 100644
--- a/net/tipc/udp_media.c
+++ b/net/tipc/udp_media.c
@@ -713,8 +713,7 @@
err = -EINVAL;
goto err;
}
- b->mtu = dev->mtu - sizeof(struct iphdr)
- - sizeof(struct udphdr);
+ b->mtu = b->media->mtu;
#if IS_ENABLED(CONFIG_IPV6)
} else if (local.proto == htons(ETH_P_IPV6)) {
udp_conf.family = AF_INET6;
@@ -803,6 +802,7 @@
.priority = TIPC_DEF_LINK_PRI,
.tolerance = TIPC_DEF_LINK_TOL,
.window = TIPC_DEF_LINK_WIN,
+ .mtu = TIPC_DEF_LINK_UDP_MTU,
.type_id = TIPC_MEDIA_TYPE_UDP,
.hwaddr_len = 0,
.name = "udp"
diff --git a/net/tipc/udp_media.h b/net/tipc/udp_media.h
index 281bbae..e7455cc 100644
--- a/net/tipc/udp_media.h
+++ b/net/tipc/udp_media.h
@@ -38,9 +38,23 @@
#ifndef _TIPC_UDP_MEDIA_H
#define _TIPC_UDP_MEDIA_H
+#include <linux/ip.h>
+#include <linux/udp.h>
+
int tipc_udp_nl_bearer_add(struct tipc_bearer *b, struct nlattr *attr);
int tipc_udp_nl_add_bearer_data(struct tipc_nl_msg *msg, struct tipc_bearer *b);
int tipc_udp_nl_dump_remoteip(struct sk_buff *skb, struct netlink_callback *cb);
+/* check if configured MTU is too low for tipc headers */
+static inline bool tipc_udp_mtu_bad(u32 mtu)
+{
+ if (mtu >= (TIPC_MIN_BEARER_MTU + sizeof(struct iphdr) +
+ sizeof(struct udphdr)))
+ return false;
+
+ pr_warn("MTU too low for tipc bearer\n");
+ return true;
+}
+
#endif
#endif
diff --git a/net/tls/Kconfig b/net/tls/Kconfig
index 89b8745a..73f05ec 100644
--- a/net/tls/Kconfig
+++ b/net/tls/Kconfig
@@ -14,3 +14,13 @@
encryption handling of the TLS protocol to be done in-kernel.
If unsure, say N.
+
+config TLS_DEVICE
+ bool "Transport Layer Security HW offload"
+ depends on TLS
+ select SOCK_VALIDATE_XMIT
+ default n
+ help
+ Enable kernel support for HW offload of the TLS protocol.
+
+ If unsure, say N.
diff --git a/net/tls/Makefile b/net/tls/Makefile
index a930fd1..4d6b728 100644
--- a/net/tls/Makefile
+++ b/net/tls/Makefile
@@ -5,3 +5,5 @@
obj-$(CONFIG_TLS) += tls.o
tls-y := tls_main.o tls_sw.o
+
+tls-$(CONFIG_TLS_DEVICE) += tls_device.o tls_device_fallback.o
diff --git a/net/tls/tls_device.c b/net/tls/tls_device.c
new file mode 100644
index 0000000..a7a8f8e
--- /dev/null
+++ b/net/tls/tls_device.c
@@ -0,0 +1,766 @@
+/* Copyright (c) 2018, Mellanox Technologies All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses. You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ * - Redistributions of source code must retain the above
+ * copyright notice, this list of conditions and the following
+ * disclaimer.
+ *
+ * - Redistributions in binary form must reproduce the above
+ * copyright notice, this list of conditions and the following
+ * disclaimer in the documentation and/or other materials
+ * provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#include <crypto/aead.h>
+#include <linux/highmem.h>
+#include <linux/module.h>
+#include <linux/netdevice.h>
+#include <net/dst.h>
+#include <net/inet_connection_sock.h>
+#include <net/tcp.h>
+#include <net/tls.h>
+
+/* device_offload_lock is used to synchronize tls_dev_add
+ * against NETDEV_DOWN notifications.
+ */
+static DECLARE_RWSEM(device_offload_lock);
+
+static void tls_device_gc_task(struct work_struct *work);
+
+static DECLARE_WORK(tls_device_gc_work, tls_device_gc_task);
+static LIST_HEAD(tls_device_gc_list);
+static LIST_HEAD(tls_device_list);
+static DEFINE_SPINLOCK(tls_device_lock);
+
+static void tls_device_free_ctx(struct tls_context *ctx)
+{
+ struct tls_offload_context *offload_ctx = tls_offload_ctx(ctx);
+
+ kfree(offload_ctx);
+ kfree(ctx);
+}
+
+static void tls_device_gc_task(struct work_struct *work)
+{
+ struct tls_context *ctx, *tmp;
+ unsigned long flags;
+ LIST_HEAD(gc_list);
+
+ spin_lock_irqsave(&tls_device_lock, flags);
+ list_splice_init(&tls_device_gc_list, &gc_list);
+ spin_unlock_irqrestore(&tls_device_lock, flags);
+
+ list_for_each_entry_safe(ctx, tmp, &gc_list, list) {
+ struct net_device *netdev = ctx->netdev;
+
+ if (netdev) {
+ netdev->tlsdev_ops->tls_dev_del(netdev, ctx,
+ TLS_OFFLOAD_CTX_DIR_TX);
+ dev_put(netdev);
+ }
+
+ list_del(&ctx->list);
+ tls_device_free_ctx(ctx);
+ }
+}
+
+static void tls_device_queue_ctx_destruction(struct tls_context *ctx)
+{
+ unsigned long flags;
+
+ spin_lock_irqsave(&tls_device_lock, flags);
+ list_move_tail(&ctx->list, &tls_device_gc_list);
+
+ /* schedule_work inside the spinlock
+ * to make sure tls_device_down waits for that work.
+ */
+ schedule_work(&tls_device_gc_work);
+
+ spin_unlock_irqrestore(&tls_device_lock, flags);
+}
+
+/* We assume that the socket is already connected */
+static struct net_device *get_netdev_for_sock(struct sock *sk)
+{
+ struct dst_entry *dst = sk_dst_get(sk);
+ struct net_device *netdev = NULL;
+
+ if (likely(dst)) {
+ netdev = dst->dev;
+ dev_hold(netdev);
+ }
+
+ dst_release(dst);
+
+ return netdev;
+}
+
+static void destroy_record(struct tls_record_info *record)
+{
+ int nr_frags = record->num_frags;
+ skb_frag_t *frag;
+
+ while (nr_frags-- > 0) {
+ frag = &record->frags[nr_frags];
+ __skb_frag_unref(frag);
+ }
+ kfree(record);
+}
+
+static void delete_all_records(struct tls_offload_context *offload_ctx)
+{
+ struct tls_record_info *info, *temp;
+
+ list_for_each_entry_safe(info, temp, &offload_ctx->records_list, list) {
+ list_del(&info->list);
+ destroy_record(info);
+ }
+
+ offload_ctx->retransmit_hint = NULL;
+}
+
+static void tls_icsk_clean_acked(struct sock *sk, u32 acked_seq)
+{
+ struct tls_context *tls_ctx = tls_get_ctx(sk);
+ struct tls_record_info *info, *temp;
+ struct tls_offload_context *ctx;
+ u64 deleted_records = 0;
+ unsigned long flags;
+
+ if (!tls_ctx)
+ return;
+
+ ctx = tls_offload_ctx(tls_ctx);
+
+ spin_lock_irqsave(&ctx->lock, flags);
+ info = ctx->retransmit_hint;
+ if (info && !before(acked_seq, info->end_seq)) {
+ ctx->retransmit_hint = NULL;
+ list_del(&info->list);
+ destroy_record(info);
+ deleted_records++;
+ }
+
+ list_for_each_entry_safe(info, temp, &ctx->records_list, list) {
+ if (before(acked_seq, info->end_seq))
+ break;
+ list_del(&info->list);
+
+ destroy_record(info);
+ deleted_records++;
+ }
+
+ ctx->unacked_record_sn += deleted_records;
+ spin_unlock_irqrestore(&ctx->lock, flags);
+}
+
+/* At this point, there should be no references on this
+ * socket and no in-flight SKBs associated with this
+ * socket, so it is safe to free all the resources.
+ */
+void tls_device_sk_destruct(struct sock *sk)
+{
+ struct tls_context *tls_ctx = tls_get_ctx(sk);
+ struct tls_offload_context *ctx = tls_offload_ctx(tls_ctx);
+
+ if (ctx->open_record)
+ destroy_record(ctx->open_record);
+
+ delete_all_records(ctx);
+ crypto_free_aead(ctx->aead_send);
+ ctx->sk_destruct(sk);
+ clean_acked_data_disable(inet_csk(sk));
+
+ if (refcount_dec_and_test(&tls_ctx->refcount))
+ tls_device_queue_ctx_destruction(tls_ctx);
+}
+EXPORT_SYMBOL(tls_device_sk_destruct);
+
+static void tls_append_frag(struct tls_record_info *record,
+ struct page_frag *pfrag,
+ int size)
+{
+ skb_frag_t *frag;
+
+ frag = &record->frags[record->num_frags - 1];
+ if (frag->page.p == pfrag->page &&
+ frag->page_offset + frag->size == pfrag->offset) {
+ frag->size += size;
+ } else {
+ ++frag;
+ frag->page.p = pfrag->page;
+ frag->page_offset = pfrag->offset;
+ frag->size = size;
+ ++record->num_frags;
+ get_page(pfrag->page);
+ }
+
+ pfrag->offset += size;
+ record->len += size;
+}
+
+static int tls_push_record(struct sock *sk,
+ struct tls_context *ctx,
+ struct tls_offload_context *offload_ctx,
+ struct tls_record_info *record,
+ struct page_frag *pfrag,
+ int flags,
+ unsigned char record_type)
+{
+ struct tcp_sock *tp = tcp_sk(sk);
+ struct page_frag dummy_tag_frag;
+ skb_frag_t *frag;
+ int i;
+
+ /* fill prepend */
+ frag = &record->frags[0];
+ tls_fill_prepend(ctx,
+ skb_frag_address(frag),
+ record->len - ctx->tx.prepend_size,
+ record_type);
+
+ /* HW doesn't care about the data in the tag, because it fills it. */
+ dummy_tag_frag.page = skb_frag_page(frag);
+ dummy_tag_frag.offset = 0;
+
+ tls_append_frag(record, &dummy_tag_frag, ctx->tx.tag_size);
+ record->end_seq = tp->write_seq + record->len;
+ spin_lock_irq(&offload_ctx->lock);
+ list_add_tail(&record->list, &offload_ctx->records_list);
+ spin_unlock_irq(&offload_ctx->lock);
+ offload_ctx->open_record = NULL;
+ set_bit(TLS_PENDING_CLOSED_RECORD, &ctx->flags);
+ tls_advance_record_sn(sk, &ctx->tx);
+
+ for (i = 0; i < record->num_frags; i++) {
+ frag = &record->frags[i];
+ sg_unmark_end(&offload_ctx->sg_tx_data[i]);
+ sg_set_page(&offload_ctx->sg_tx_data[i], skb_frag_page(frag),
+ frag->size, frag->page_offset);
+ sk_mem_charge(sk, frag->size);
+ get_page(skb_frag_page(frag));
+ }
+ sg_mark_end(&offload_ctx->sg_tx_data[record->num_frags - 1]);
+
+ /* all ready, send */
+ return tls_push_sg(sk, ctx, offload_ctx->sg_tx_data, 0, flags);
+}
+
+static int tls_create_new_record(struct tls_offload_context *offload_ctx,
+ struct page_frag *pfrag,
+ size_t prepend_size)
+{
+ struct tls_record_info *record;
+ skb_frag_t *frag;
+
+ record = kmalloc(sizeof(*record), GFP_KERNEL);
+ if (!record)
+ return -ENOMEM;
+
+ frag = &record->frags[0];
+ __skb_frag_set_page(frag, pfrag->page);
+ frag->page_offset = pfrag->offset;
+ skb_frag_size_set(frag, prepend_size);
+
+ get_page(pfrag->page);
+ pfrag->offset += prepend_size;
+
+ record->num_frags = 1;
+ record->len = prepend_size;
+ offload_ctx->open_record = record;
+ return 0;
+}
+
+static int tls_do_allocation(struct sock *sk,
+ struct tls_offload_context *offload_ctx,
+ struct page_frag *pfrag,
+ size_t prepend_size)
+{
+ int ret;
+
+ if (!offload_ctx->open_record) {
+ if (unlikely(!skb_page_frag_refill(prepend_size, pfrag,
+ sk->sk_allocation))) {
+ sk->sk_prot->enter_memory_pressure(sk);
+ sk_stream_moderate_sndbuf(sk);
+ return -ENOMEM;
+ }
+
+ ret = tls_create_new_record(offload_ctx, pfrag, prepend_size);
+ if (ret)
+ return ret;
+
+ if (pfrag->size > pfrag->offset)
+ return 0;
+ }
+
+ if (!sk_page_frag_refill(sk, pfrag))
+ return -ENOMEM;
+
+ return 0;
+}
+
+static int tls_push_data(struct sock *sk,
+ struct iov_iter *msg_iter,
+ size_t size, int flags,
+ unsigned char record_type)
+{
+ struct tls_context *tls_ctx = tls_get_ctx(sk);
+ struct tls_offload_context *ctx = tls_offload_ctx(tls_ctx);
+ int tls_push_record_flags = flags | MSG_SENDPAGE_NOTLAST;
+ int more = flags & (MSG_SENDPAGE_NOTLAST | MSG_MORE);
+ struct tls_record_info *record = ctx->open_record;
+ struct page_frag *pfrag;
+ size_t orig_size = size;
+ u32 max_open_record_len;
+ int copy, rc = 0;
+ bool done = false;
+ long timeo;
+
+ if (flags &
+ ~(MSG_MORE | MSG_DONTWAIT | MSG_NOSIGNAL | MSG_SENDPAGE_NOTLAST))
+ return -ENOTSUPP;
+
+ if (sk->sk_err)
+ return -sk->sk_err;
+
+ timeo = sock_sndtimeo(sk, flags & MSG_DONTWAIT);
+ rc = tls_complete_pending_work(sk, tls_ctx, flags, &timeo);
+ if (rc < 0)
+ return rc;
+
+ pfrag = sk_page_frag(sk);
+
+ /* TLS_HEADER_SIZE is not counted as part of the TLS record, and
+ * we need to leave room for an authentication tag.
+ */
+ max_open_record_len = TLS_MAX_PAYLOAD_SIZE +
+ tls_ctx->tx.prepend_size;
+ do {
+ rc = tls_do_allocation(sk, ctx, pfrag,
+ tls_ctx->tx.prepend_size);
+ if (rc) {
+ rc = sk_stream_wait_memory(sk, &timeo);
+ if (!rc)
+ continue;
+
+ record = ctx->open_record;
+ if (!record)
+ break;
+handle_error:
+ if (record_type != TLS_RECORD_TYPE_DATA) {
+ /* avoid sending partial
+ * record with type !=
+ * application_data
+ */
+ size = orig_size;
+ destroy_record(record);
+ ctx->open_record = NULL;
+ } else if (record->len > tls_ctx->tx.prepend_size) {
+ goto last_record;
+ }
+
+ break;
+ }
+
+ record = ctx->open_record;
+ copy = min_t(size_t, size, (pfrag->size - pfrag->offset));
+ copy = min_t(size_t, copy, (max_open_record_len - record->len));
+
+ if (copy_from_iter_nocache(page_address(pfrag->page) +
+ pfrag->offset,
+ copy, msg_iter) != copy) {
+ rc = -EFAULT;
+ goto handle_error;
+ }
+ tls_append_frag(record, pfrag, copy);
+
+ size -= copy;
+ if (!size) {
+last_record:
+ tls_push_record_flags = flags;
+ if (more) {
+ tls_ctx->pending_open_record_frags =
+ record->num_frags;
+ break;
+ }
+
+ done = true;
+ }
+
+ if (done || record->len >= max_open_record_len ||
+ (record->num_frags >= MAX_SKB_FRAGS - 1)) {
+ rc = tls_push_record(sk,
+ tls_ctx,
+ ctx,
+ record,
+ pfrag,
+ tls_push_record_flags,
+ record_type);
+ if (rc < 0)
+ break;
+ }
+ } while (!done);
+
+ if (orig_size - size > 0)
+ rc = orig_size - size;
+
+ return rc;
+}
+
+int tls_device_sendmsg(struct sock *sk, struct msghdr *msg, size_t size)
+{
+ unsigned char record_type = TLS_RECORD_TYPE_DATA;
+ int rc;
+
+ lock_sock(sk);
+
+ if (unlikely(msg->msg_controllen)) {
+ rc = tls_proccess_cmsg(sk, msg, &record_type);
+ if (rc)
+ goto out;
+ }
+
+ rc = tls_push_data(sk, &msg->msg_iter, size,
+ msg->msg_flags, record_type);
+
+out:
+ release_sock(sk);
+ return rc;
+}
+
+int tls_device_sendpage(struct sock *sk, struct page *page,
+ int offset, size_t size, int flags)
+{
+ struct iov_iter msg_iter;
+ char *kaddr = kmap(page);
+ struct kvec iov;
+ int rc;
+
+ if (flags & MSG_SENDPAGE_NOTLAST)
+ flags |= MSG_MORE;
+
+ lock_sock(sk);
+
+ if (flags & MSG_OOB) {
+ rc = -ENOTSUPP;
+ goto out;
+ }
+
+ iov.iov_base = kaddr + offset;
+ iov.iov_len = size;
+ iov_iter_kvec(&msg_iter, WRITE | ITER_KVEC, &iov, 1, size);
+ rc = tls_push_data(sk, &msg_iter, size,
+ flags, TLS_RECORD_TYPE_DATA);
+ kunmap(page);
+
+out:
+ release_sock(sk);
+ return rc;
+}
+
+struct tls_record_info *tls_get_record(struct tls_offload_context *context,
+ u32 seq, u64 *p_record_sn)
+{
+ u64 record_sn = context->hint_record_sn;
+ struct tls_record_info *info;
+
+ info = context->retransmit_hint;
+ if (!info ||
+ before(seq, info->end_seq - info->len)) {
+ /* if retransmit_hint is irrelevant start
+ * from the beggining of the list
+ */
+ info = list_first_entry(&context->records_list,
+ struct tls_record_info, list);
+ record_sn = context->unacked_record_sn;
+ }
+
+ list_for_each_entry_from(info, &context->records_list, list) {
+ if (before(seq, info->end_seq)) {
+ if (!context->retransmit_hint ||
+ after(info->end_seq,
+ context->retransmit_hint->end_seq)) {
+ context->hint_record_sn = record_sn;
+ context->retransmit_hint = info;
+ }
+ *p_record_sn = record_sn;
+ return info;
+ }
+ record_sn++;
+ }
+
+ return NULL;
+}
+EXPORT_SYMBOL(tls_get_record);
+
+static int tls_device_push_pending_record(struct sock *sk, int flags)
+{
+ struct iov_iter msg_iter;
+
+ iov_iter_kvec(&msg_iter, WRITE | ITER_KVEC, NULL, 0, 0);
+ return tls_push_data(sk, &msg_iter, 0, flags, TLS_RECORD_TYPE_DATA);
+}
+
+int tls_set_device_offload(struct sock *sk, struct tls_context *ctx)
+{
+ u16 nonce_size, tag_size, iv_size, rec_seq_size;
+ struct tls_record_info *start_marker_record;
+ struct tls_offload_context *offload_ctx;
+ struct tls_crypto_info *crypto_info;
+ struct net_device *netdev;
+ char *iv, *rec_seq;
+ struct sk_buff *skb;
+ int rc = -EINVAL;
+ __be64 rcd_sn;
+
+ if (!ctx)
+ goto out;
+
+ if (ctx->priv_ctx_tx) {
+ rc = -EEXIST;
+ goto out;
+ }
+
+ start_marker_record = kmalloc(sizeof(*start_marker_record), GFP_KERNEL);
+ if (!start_marker_record) {
+ rc = -ENOMEM;
+ goto out;
+ }
+
+ offload_ctx = kzalloc(TLS_OFFLOAD_CONTEXT_SIZE, GFP_KERNEL);
+ if (!offload_ctx) {
+ rc = -ENOMEM;
+ goto free_marker_record;
+ }
+
+ crypto_info = &ctx->crypto_send;
+ switch (crypto_info->cipher_type) {
+ case TLS_CIPHER_AES_GCM_128:
+ nonce_size = TLS_CIPHER_AES_GCM_128_IV_SIZE;
+ tag_size = TLS_CIPHER_AES_GCM_128_TAG_SIZE;
+ iv_size = TLS_CIPHER_AES_GCM_128_IV_SIZE;
+ iv = ((struct tls12_crypto_info_aes_gcm_128 *)crypto_info)->iv;
+ rec_seq_size = TLS_CIPHER_AES_GCM_128_REC_SEQ_SIZE;
+ rec_seq =
+ ((struct tls12_crypto_info_aes_gcm_128 *)crypto_info)->rec_seq;
+ break;
+ default:
+ rc = -EINVAL;
+ goto free_offload_ctx;
+ }
+
+ ctx->tx.prepend_size = TLS_HEADER_SIZE + nonce_size;
+ ctx->tx.tag_size = tag_size;
+ ctx->tx.overhead_size = ctx->tx.prepend_size + ctx->tx.tag_size;
+ ctx->tx.iv_size = iv_size;
+ ctx->tx.iv = kmalloc(iv_size + TLS_CIPHER_AES_GCM_128_SALT_SIZE,
+ GFP_KERNEL);
+ if (!ctx->tx.iv) {
+ rc = -ENOMEM;
+ goto free_offload_ctx;
+ }
+
+ memcpy(ctx->tx.iv + TLS_CIPHER_AES_GCM_128_SALT_SIZE, iv, iv_size);
+
+ ctx->tx.rec_seq_size = rec_seq_size;
+ ctx->tx.rec_seq = kmalloc(rec_seq_size, GFP_KERNEL);
+ if (!ctx->tx.rec_seq) {
+ rc = -ENOMEM;
+ goto free_iv;
+ }
+ memcpy(ctx->tx.rec_seq, rec_seq, rec_seq_size);
+
+ rc = tls_sw_fallback_init(sk, offload_ctx, crypto_info);
+ if (rc)
+ goto free_rec_seq;
+
+ /* start at rec_seq - 1 to account for the start marker record */
+ memcpy(&rcd_sn, ctx->tx.rec_seq, sizeof(rcd_sn));
+ offload_ctx->unacked_record_sn = be64_to_cpu(rcd_sn) - 1;
+
+ start_marker_record->end_seq = tcp_sk(sk)->write_seq;
+ start_marker_record->len = 0;
+ start_marker_record->num_frags = 0;
+
+ INIT_LIST_HEAD(&offload_ctx->records_list);
+ list_add_tail(&start_marker_record->list, &offload_ctx->records_list);
+ spin_lock_init(&offload_ctx->lock);
+ sg_init_table(offload_ctx->sg_tx_data,
+ ARRAY_SIZE(offload_ctx->sg_tx_data));
+
+ clean_acked_data_enable(inet_csk(sk), &tls_icsk_clean_acked);
+ ctx->push_pending_record = tls_device_push_pending_record;
+ offload_ctx->sk_destruct = sk->sk_destruct;
+
+ /* TLS offload is greatly simplified if we don't send
+ * SKBs where only part of the payload needs to be encrypted.
+ * So mark the last skb in the write queue as end of record.
+ */
+ skb = tcp_write_queue_tail(sk);
+ if (skb)
+ TCP_SKB_CB(skb)->eor = 1;
+
+ refcount_set(&ctx->refcount, 1);
+
+ /* We support starting offload on multiple sockets
+ * concurrently, so we only need a read lock here.
+ * This lock must precede get_netdev_for_sock to prevent races between
+ * NETDEV_DOWN and setsockopt.
+ */
+ down_read(&device_offload_lock);
+ netdev = get_netdev_for_sock(sk);
+ if (!netdev) {
+ pr_err_ratelimited("%s: netdev not found\n", __func__);
+ rc = -EINVAL;
+ goto release_lock;
+ }
+
+ if (!(netdev->features & NETIF_F_HW_TLS_TX)) {
+ rc = -ENOTSUPP;
+ goto release_netdev;
+ }
+
+ /* Avoid offloading if the device is down
+ * We don't want to offload new flows after
+ * the NETDEV_DOWN event
+ */
+ if (!(netdev->flags & IFF_UP)) {
+ rc = -EINVAL;
+ goto release_netdev;
+ }
+
+ ctx->priv_ctx_tx = offload_ctx;
+ rc = netdev->tlsdev_ops->tls_dev_add(netdev, sk, TLS_OFFLOAD_CTX_DIR_TX,
+ &ctx->crypto_send,
+ tcp_sk(sk)->write_seq);
+ if (rc)
+ goto release_netdev;
+
+ ctx->netdev = netdev;
+
+ spin_lock_irq(&tls_device_lock);
+ list_add_tail(&ctx->list, &tls_device_list);
+ spin_unlock_irq(&tls_device_lock);
+
+ sk->sk_validate_xmit_skb = tls_validate_xmit_skb;
+ /* following this assignment tls_is_sk_tx_device_offloaded
+ * will return true and the context might be accessed
+ * by the netdev's xmit function.
+ */
+ smp_store_release(&sk->sk_destruct,
+ &tls_device_sk_destruct);
+ up_read(&device_offload_lock);
+ goto out;
+
+release_netdev:
+ dev_put(netdev);
+release_lock:
+ up_read(&device_offload_lock);
+ clean_acked_data_disable(inet_csk(sk));
+ crypto_free_aead(offload_ctx->aead_send);
+free_rec_seq:
+ kfree(ctx->tx.rec_seq);
+free_iv:
+ kfree(ctx->tx.iv);
+free_offload_ctx:
+ kfree(offload_ctx);
+ ctx->priv_ctx_tx = NULL;
+free_marker_record:
+ kfree(start_marker_record);
+out:
+ return rc;
+}
+
+static int tls_device_down(struct net_device *netdev)
+{
+ struct tls_context *ctx, *tmp;
+ unsigned long flags;
+ LIST_HEAD(list);
+
+ /* Request a write lock to block new offload attempts */
+ down_write(&device_offload_lock);
+
+ spin_lock_irqsave(&tls_device_lock, flags);
+ list_for_each_entry_safe(ctx, tmp, &tls_device_list, list) {
+ if (ctx->netdev != netdev ||
+ !refcount_inc_not_zero(&ctx->refcount))
+ continue;
+
+ list_move(&ctx->list, &list);
+ }
+ spin_unlock_irqrestore(&tls_device_lock, flags);
+
+ list_for_each_entry_safe(ctx, tmp, &list, list) {
+ netdev->tlsdev_ops->tls_dev_del(netdev, ctx,
+ TLS_OFFLOAD_CTX_DIR_TX);
+ ctx->netdev = NULL;
+ dev_put(netdev);
+ list_del_init(&ctx->list);
+
+ if (refcount_dec_and_test(&ctx->refcount))
+ tls_device_free_ctx(ctx);
+ }
+
+ up_write(&device_offload_lock);
+
+ flush_work(&tls_device_gc_work);
+
+ return NOTIFY_DONE;
+}
+
+static int tls_dev_event(struct notifier_block *this, unsigned long event,
+ void *ptr)
+{
+ struct net_device *dev = netdev_notifier_info_to_dev(ptr);
+
+ if (!(dev->features & NETIF_F_HW_TLS_TX))
+ return NOTIFY_DONE;
+
+ switch (event) {
+ case NETDEV_REGISTER:
+ case NETDEV_FEAT_CHANGE:
+ if (dev->tlsdev_ops &&
+ dev->tlsdev_ops->tls_dev_add &&
+ dev->tlsdev_ops->tls_dev_del)
+ return NOTIFY_DONE;
+ else
+ return NOTIFY_BAD;
+ case NETDEV_DOWN:
+ return tls_device_down(dev);
+ }
+ return NOTIFY_DONE;
+}
+
+static struct notifier_block tls_dev_notifier = {
+ .notifier_call = tls_dev_event,
+};
+
+void __init tls_device_init(void)
+{
+ register_netdevice_notifier(&tls_dev_notifier);
+}
+
+void __exit tls_device_cleanup(void)
+{
+ unregister_netdevice_notifier(&tls_dev_notifier);
+ flush_work(&tls_device_gc_work);
+}
diff --git a/net/tls/tls_device_fallback.c b/net/tls/tls_device_fallback.c
new file mode 100644
index 0000000..748914a
--- /dev/null
+++ b/net/tls/tls_device_fallback.c
@@ -0,0 +1,450 @@
+/* Copyright (c) 2018, Mellanox Technologies All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses. You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ * - Redistributions of source code must retain the above
+ * copyright notice, this list of conditions and the following
+ * disclaimer.
+ *
+ * - Redistributions in binary form must reproduce the above
+ * copyright notice, this list of conditions and the following
+ * disclaimer in the documentation and/or other materials
+ * provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#include <net/tls.h>
+#include <crypto/aead.h>
+#include <crypto/scatterwalk.h>
+#include <net/ip6_checksum.h>
+
+static void chain_to_walk(struct scatterlist *sg, struct scatter_walk *walk)
+{
+ struct scatterlist *src = walk->sg;
+ int diff = walk->offset - src->offset;
+
+ sg_set_page(sg, sg_page(src),
+ src->length - diff, walk->offset);
+
+ scatterwalk_crypto_chain(sg, sg_next(src), 0, 2);
+}
+
+static int tls_enc_record(struct aead_request *aead_req,
+ struct crypto_aead *aead, char *aad,
+ char *iv, __be64 rcd_sn,
+ struct scatter_walk *in,
+ struct scatter_walk *out, int *in_len)
+{
+ unsigned char buf[TLS_HEADER_SIZE + TLS_CIPHER_AES_GCM_128_IV_SIZE];
+ struct scatterlist sg_in[3];
+ struct scatterlist sg_out[3];
+ u16 len;
+ int rc;
+
+ len = min_t(int, *in_len, ARRAY_SIZE(buf));
+
+ scatterwalk_copychunks(buf, in, len, 0);
+ scatterwalk_copychunks(buf, out, len, 1);
+
+ *in_len -= len;
+ if (!*in_len)
+ return 0;
+
+ scatterwalk_pagedone(in, 0, 1);
+ scatterwalk_pagedone(out, 1, 1);
+
+ len = buf[4] | (buf[3] << 8);
+ len -= TLS_CIPHER_AES_GCM_128_IV_SIZE;
+
+ tls_make_aad(aad, len - TLS_CIPHER_AES_GCM_128_TAG_SIZE,
+ (char *)&rcd_sn, sizeof(rcd_sn), buf[0]);
+
+ memcpy(iv + TLS_CIPHER_AES_GCM_128_SALT_SIZE, buf + TLS_HEADER_SIZE,
+ TLS_CIPHER_AES_GCM_128_IV_SIZE);
+
+ sg_init_table(sg_in, ARRAY_SIZE(sg_in));
+ sg_init_table(sg_out, ARRAY_SIZE(sg_out));
+ sg_set_buf(sg_in, aad, TLS_AAD_SPACE_SIZE);
+ sg_set_buf(sg_out, aad, TLS_AAD_SPACE_SIZE);
+ chain_to_walk(sg_in + 1, in);
+ chain_to_walk(sg_out + 1, out);
+
+ *in_len -= len;
+ if (*in_len < 0) {
+ *in_len += TLS_CIPHER_AES_GCM_128_TAG_SIZE;
+ /* the input buffer doesn't contain the entire record.
+ * trim len accordingly. The resulting authentication tag
+ * will contain garbage, but we don't care, so we won't
+ * include any of it in the output skb
+ * Note that we assume the output buffer length
+ * is larger then input buffer length + tag size
+ */
+ if (*in_len < 0)
+ len += *in_len;
+
+ *in_len = 0;
+ }
+
+ if (*in_len) {
+ scatterwalk_copychunks(NULL, in, len, 2);
+ scatterwalk_pagedone(in, 0, 1);
+ scatterwalk_copychunks(NULL, out, len, 2);
+ scatterwalk_pagedone(out, 1, 1);
+ }
+
+ len -= TLS_CIPHER_AES_GCM_128_TAG_SIZE;
+ aead_request_set_crypt(aead_req, sg_in, sg_out, len, iv);
+
+ rc = crypto_aead_encrypt(aead_req);
+
+ return rc;
+}
+
+static void tls_init_aead_request(struct aead_request *aead_req,
+ struct crypto_aead *aead)
+{
+ aead_request_set_tfm(aead_req, aead);
+ aead_request_set_ad(aead_req, TLS_AAD_SPACE_SIZE);
+}
+
+static struct aead_request *tls_alloc_aead_request(struct crypto_aead *aead,
+ gfp_t flags)
+{
+ unsigned int req_size = sizeof(struct aead_request) +
+ crypto_aead_reqsize(aead);
+ struct aead_request *aead_req;
+
+ aead_req = kzalloc(req_size, flags);
+ if (aead_req)
+ tls_init_aead_request(aead_req, aead);
+ return aead_req;
+}
+
+static int tls_enc_records(struct aead_request *aead_req,
+ struct crypto_aead *aead, struct scatterlist *sg_in,
+ struct scatterlist *sg_out, char *aad, char *iv,
+ u64 rcd_sn, int len)
+{
+ struct scatter_walk out, in;
+ int rc;
+
+ scatterwalk_start(&in, sg_in);
+ scatterwalk_start(&out, sg_out);
+
+ do {
+ rc = tls_enc_record(aead_req, aead, aad, iv,
+ cpu_to_be64(rcd_sn), &in, &out, &len);
+ rcd_sn++;
+
+ } while (rc == 0 && len);
+
+ scatterwalk_done(&in, 0, 0);
+ scatterwalk_done(&out, 1, 0);
+
+ return rc;
+}
+
+/* Can't use icsk->icsk_af_ops->send_check here because the ip addresses
+ * might have been changed by NAT.
+ */
+static void update_chksum(struct sk_buff *skb, int headln)
+{
+ struct tcphdr *th = tcp_hdr(skb);
+ int datalen = skb->len - headln;
+ const struct ipv6hdr *ipv6h;
+ const struct iphdr *iph;
+
+ /* We only changed the payload so if we are using partial we don't
+ * need to update anything.
+ */
+ if (likely(skb->ip_summed == CHECKSUM_PARTIAL))
+ return;
+
+ skb->ip_summed = CHECKSUM_PARTIAL;
+ skb->csum_start = skb_transport_header(skb) - skb->head;
+ skb->csum_offset = offsetof(struct tcphdr, check);
+
+ if (skb->sk->sk_family == AF_INET6) {
+ ipv6h = ipv6_hdr(skb);
+ th->check = ~csum_ipv6_magic(&ipv6h->saddr, &ipv6h->daddr,
+ datalen, IPPROTO_TCP, 0);
+ } else {
+ iph = ip_hdr(skb);
+ th->check = ~csum_tcpudp_magic(iph->saddr, iph->daddr, datalen,
+ IPPROTO_TCP, 0);
+ }
+}
+
+static void complete_skb(struct sk_buff *nskb, struct sk_buff *skb, int headln)
+{
+ skb_copy_header(nskb, skb);
+
+ skb_put(nskb, skb->len);
+ memcpy(nskb->data, skb->data, headln);
+ update_chksum(nskb, headln);
+
+ nskb->destructor = skb->destructor;
+ nskb->sk = skb->sk;
+ skb->destructor = NULL;
+ skb->sk = NULL;
+ refcount_add(nskb->truesize - skb->truesize,
+ &nskb->sk->sk_wmem_alloc);
+}
+
+/* This function may be called after the user socket is already
+ * closed so make sure we don't use anything freed during
+ * tls_sk_proto_close here
+ */
+
+static int fill_sg_in(struct scatterlist *sg_in,
+ struct sk_buff *skb,
+ struct tls_offload_context *ctx,
+ u64 *rcd_sn,
+ s32 *sync_size,
+ int *resync_sgs)
+{
+ int tcp_payload_offset = skb_transport_offset(skb) + tcp_hdrlen(skb);
+ int payload_len = skb->len - tcp_payload_offset;
+ u32 tcp_seq = ntohl(tcp_hdr(skb)->seq);
+ struct tls_record_info *record;
+ unsigned long flags;
+ int remaining;
+ int i;
+
+ spin_lock_irqsave(&ctx->lock, flags);
+ record = tls_get_record(ctx, tcp_seq, rcd_sn);
+ if (!record) {
+ spin_unlock_irqrestore(&ctx->lock, flags);
+ WARN(1, "Record not found for seq %u\n", tcp_seq);
+ return -EINVAL;
+ }
+
+ *sync_size = tcp_seq - tls_record_start_seq(record);
+ if (*sync_size < 0) {
+ int is_start_marker = tls_record_is_start_marker(record);
+
+ spin_unlock_irqrestore(&ctx->lock, flags);
+ /* This should only occur if the relevant record was
+ * already acked. In that case it should be ok
+ * to drop the packet and avoid retransmission.
+ *
+ * There is a corner case where the packet contains
+ * both an acked and a non-acked record.
+ * We currently don't handle that case and rely
+ * on TCP to retranmit a packet that doesn't contain
+ * already acked payload.
+ */
+ if (!is_start_marker)
+ *sync_size = 0;
+ return -EINVAL;
+ }
+
+ remaining = *sync_size;
+ for (i = 0; remaining > 0; i++) {
+ skb_frag_t *frag = &record->frags[i];
+
+ __skb_frag_ref(frag);
+ sg_set_page(sg_in + i, skb_frag_page(frag),
+ skb_frag_size(frag), frag->page_offset);
+
+ remaining -= skb_frag_size(frag);
+
+ if (remaining < 0)
+ sg_in[i].length += remaining;
+ }
+ *resync_sgs = i;
+
+ spin_unlock_irqrestore(&ctx->lock, flags);
+ if (skb_to_sgvec(skb, &sg_in[i], tcp_payload_offset, payload_len) < 0)
+ return -EINVAL;
+
+ return 0;
+}
+
+static void fill_sg_out(struct scatterlist sg_out[3], void *buf,
+ struct tls_context *tls_ctx,
+ struct sk_buff *nskb,
+ int tcp_payload_offset,
+ int payload_len,
+ int sync_size,
+ void *dummy_buf)
+{
+ sg_set_buf(&sg_out[0], dummy_buf, sync_size);
+ sg_set_buf(&sg_out[1], nskb->data + tcp_payload_offset, payload_len);
+ /* Add room for authentication tag produced by crypto */
+ dummy_buf += sync_size;
+ sg_set_buf(&sg_out[2], dummy_buf, TLS_CIPHER_AES_GCM_128_TAG_SIZE);
+}
+
+static struct sk_buff *tls_enc_skb(struct tls_context *tls_ctx,
+ struct scatterlist sg_out[3],
+ struct scatterlist *sg_in,
+ struct sk_buff *skb,
+ s32 sync_size, u64 rcd_sn)
+{
+ int tcp_payload_offset = skb_transport_offset(skb) + tcp_hdrlen(skb);
+ struct tls_offload_context *ctx = tls_offload_ctx(tls_ctx);
+ int payload_len = skb->len - tcp_payload_offset;
+ void *buf, *iv, *aad, *dummy_buf;
+ struct aead_request *aead_req;
+ struct sk_buff *nskb = NULL;
+ int buf_len;
+
+ aead_req = tls_alloc_aead_request(ctx->aead_send, GFP_ATOMIC);
+ if (!aead_req)
+ return NULL;
+
+ buf_len = TLS_CIPHER_AES_GCM_128_SALT_SIZE +
+ TLS_CIPHER_AES_GCM_128_IV_SIZE +
+ TLS_AAD_SPACE_SIZE +
+ sync_size +
+ TLS_CIPHER_AES_GCM_128_TAG_SIZE;
+ buf = kmalloc(buf_len, GFP_ATOMIC);
+ if (!buf)
+ goto free_req;
+
+ iv = buf;
+ memcpy(iv, tls_ctx->crypto_send_aes_gcm_128.salt,
+ TLS_CIPHER_AES_GCM_128_SALT_SIZE);
+ aad = buf + TLS_CIPHER_AES_GCM_128_SALT_SIZE +
+ TLS_CIPHER_AES_GCM_128_IV_SIZE;
+ dummy_buf = aad + TLS_AAD_SPACE_SIZE;
+
+ nskb = alloc_skb(skb_headroom(skb) + skb->len, GFP_ATOMIC);
+ if (!nskb)
+ goto free_buf;
+
+ skb_reserve(nskb, skb_headroom(skb));
+
+ fill_sg_out(sg_out, buf, tls_ctx, nskb, tcp_payload_offset,
+ payload_len, sync_size, dummy_buf);
+
+ if (tls_enc_records(aead_req, ctx->aead_send, sg_in, sg_out, aad, iv,
+ rcd_sn, sync_size + payload_len) < 0)
+ goto free_nskb;
+
+ complete_skb(nskb, skb, tcp_payload_offset);
+
+ /* validate_xmit_skb_list assumes that if the skb wasn't segmented
+ * nskb->prev will point to the skb itself
+ */
+ nskb->prev = nskb;
+
+free_buf:
+ kfree(buf);
+free_req:
+ kfree(aead_req);
+ return nskb;
+free_nskb:
+ kfree_skb(nskb);
+ nskb = NULL;
+ goto free_buf;
+}
+
+static struct sk_buff *tls_sw_fallback(struct sock *sk, struct sk_buff *skb)
+{
+ int tcp_payload_offset = skb_transport_offset(skb) + tcp_hdrlen(skb);
+ struct tls_context *tls_ctx = tls_get_ctx(sk);
+ struct tls_offload_context *ctx = tls_offload_ctx(tls_ctx);
+ int payload_len = skb->len - tcp_payload_offset;
+ struct scatterlist *sg_in, sg_out[3];
+ struct sk_buff *nskb = NULL;
+ int sg_in_max_elements;
+ int resync_sgs = 0;
+ s32 sync_size = 0;
+ u64 rcd_sn;
+
+ /* worst case is:
+ * MAX_SKB_FRAGS in tls_record_info
+ * MAX_SKB_FRAGS + 1 in SKB head and frags.
+ */
+ sg_in_max_elements = 2 * MAX_SKB_FRAGS + 1;
+
+ if (!payload_len)
+ return skb;
+
+ sg_in = kmalloc_array(sg_in_max_elements, sizeof(*sg_in), GFP_ATOMIC);
+ if (!sg_in)
+ goto free_orig;
+
+ sg_init_table(sg_in, sg_in_max_elements);
+ sg_init_table(sg_out, ARRAY_SIZE(sg_out));
+
+ if (fill_sg_in(sg_in, skb, ctx, &rcd_sn, &sync_size, &resync_sgs)) {
+ /* bypass packets before kernel TLS socket option was set */
+ if (sync_size < 0 && payload_len <= -sync_size)
+ nskb = skb_get(skb);
+ goto put_sg;
+ }
+
+ nskb = tls_enc_skb(tls_ctx, sg_out, sg_in, skb, sync_size, rcd_sn);
+
+put_sg:
+ while (resync_sgs)
+ put_page(sg_page(&sg_in[--resync_sgs]));
+ kfree(sg_in);
+free_orig:
+ kfree_skb(skb);
+ return nskb;
+}
+
+struct sk_buff *tls_validate_xmit_skb(struct sock *sk,
+ struct net_device *dev,
+ struct sk_buff *skb)
+{
+ if (dev == tls_get_ctx(sk)->netdev)
+ return skb;
+
+ return tls_sw_fallback(sk, skb);
+}
+
+int tls_sw_fallback_init(struct sock *sk,
+ struct tls_offload_context *offload_ctx,
+ struct tls_crypto_info *crypto_info)
+{
+ const u8 *key;
+ int rc;
+
+ offload_ctx->aead_send =
+ crypto_alloc_aead("gcm(aes)", 0, CRYPTO_ALG_ASYNC);
+ if (IS_ERR(offload_ctx->aead_send)) {
+ rc = PTR_ERR(offload_ctx->aead_send);
+ pr_err_ratelimited("crypto_alloc_aead failed rc=%d\n", rc);
+ offload_ctx->aead_send = NULL;
+ goto err_out;
+ }
+
+ key = ((struct tls12_crypto_info_aes_gcm_128 *)crypto_info)->key;
+
+ rc = crypto_aead_setkey(offload_ctx->aead_send, key,
+ TLS_CIPHER_AES_GCM_128_KEY_SIZE);
+ if (rc)
+ goto free_aead;
+
+ rc = crypto_aead_setauthsize(offload_ctx->aead_send,
+ TLS_CIPHER_AES_GCM_128_TAG_SIZE);
+ if (rc)
+ goto free_aead;
+
+ return 0;
+free_aead:
+ crypto_free_aead(offload_ctx->aead_send);
+err_out:
+ return rc;
+}
diff --git a/net/tls/tls_main.c b/net/tls/tls_main.c
index 20cd93be62..301f224 100644
--- a/net/tls/tls_main.c
+++ b/net/tls/tls_main.c
@@ -51,12 +51,12 @@
TLSV6,
TLS_NUM_PROTS,
};
-
enum {
TLS_BASE,
- TLS_SW_TX,
- TLS_SW_RX,
- TLS_SW_RXTX,
+ TLS_SW,
+#ifdef CONFIG_TLS_DEVICE
+ TLS_HW,
+#endif
TLS_HW_RECORD,
TLS_NUM_CONFIG,
};
@@ -65,14 +65,14 @@
static DEFINE_MUTEX(tcpv6_prot_mutex);
static LIST_HEAD(device_list);
static DEFINE_MUTEX(device_mutex);
-static struct proto tls_prots[TLS_NUM_PROTS][TLS_NUM_CONFIG];
+static struct proto tls_prots[TLS_NUM_PROTS][TLS_NUM_CONFIG][TLS_NUM_CONFIG];
static struct proto_ops tls_sw_proto_ops;
-static inline void update_sk_prot(struct sock *sk, struct tls_context *ctx)
+static void update_sk_prot(struct sock *sk, struct tls_context *ctx)
{
int ip_ver = sk->sk_family == AF_INET6 ? TLSV6 : TLSV4;
- sk->sk_prot = &tls_prots[ip_ver][ctx->conf];
+ sk->sk_prot = &tls_prots[ip_ver][ctx->tx_conf][ctx->rx_conf];
}
int wait_on_pending_writer(struct sock *sk, long *timeo)
@@ -254,7 +254,8 @@
lock_sock(sk);
sk_proto_close = ctx->sk_proto_close;
- if (ctx->conf == TLS_BASE || ctx->conf == TLS_HW_RECORD) {
+ if ((ctx->tx_conf == TLS_HW_RECORD && ctx->rx_conf == TLS_HW_RECORD) ||
+ (ctx->tx_conf == TLS_BASE && ctx->rx_conf == TLS_BASE)) {
free_ctx = true;
goto skip_tx_cleanup;
}
@@ -275,15 +276,26 @@
}
}
- kfree(ctx->tx.rec_seq);
- kfree(ctx->tx.iv);
- kfree(ctx->rx.rec_seq);
- kfree(ctx->rx.iv);
+ /* We need these for tls_sw_fallback handling of other packets */
+ if (ctx->tx_conf == TLS_SW) {
+ kfree(ctx->tx.rec_seq);
+ kfree(ctx->tx.iv);
+ tls_sw_free_resources_tx(sk);
+ }
- if (ctx->conf == TLS_SW_TX ||
- ctx->conf == TLS_SW_RX ||
- ctx->conf == TLS_SW_RXTX) {
- tls_sw_free_resources(sk);
+ if (ctx->rx_conf == TLS_SW) {
+ kfree(ctx->rx.rec_seq);
+ kfree(ctx->rx.iv);
+ tls_sw_free_resources_rx(sk);
+ }
+
+#ifdef CONFIG_TLS_DEVICE
+ if (ctx->tx_conf != TLS_HW) {
+#else
+ {
+#endif
+ kfree(ctx);
+ ctx = NULL;
}
skip_tx_cleanup:
@@ -446,25 +458,29 @@
goto err_crypto_info;
}
- /* currently SW is default, we will have ethtool in future */
if (tx) {
- rc = tls_set_sw_offload(sk, ctx, 1);
- if (ctx->conf == TLS_SW_RX)
- conf = TLS_SW_RXTX;
- else
- conf = TLS_SW_TX;
+#ifdef CONFIG_TLS_DEVICE
+ rc = tls_set_device_offload(sk, ctx);
+ conf = TLS_HW;
+ if (rc) {
+#else
+ {
+#endif
+ rc = tls_set_sw_offload(sk, ctx, 1);
+ conf = TLS_SW;
+ }
} else {
rc = tls_set_sw_offload(sk, ctx, 0);
- if (ctx->conf == TLS_SW_TX)
- conf = TLS_SW_RXTX;
- else
- conf = TLS_SW_RX;
+ conf = TLS_SW;
}
if (rc)
goto err_crypto_info;
- ctx->conf = conf;
+ if (tx)
+ ctx->tx_conf = conf;
+ else
+ ctx->rx_conf = conf;
update_sk_prot(sk, ctx);
if (tx) {
ctx->sk_write_space = sk->sk_write_space;
@@ -540,7 +556,8 @@
ctx->hash = sk->sk_prot->hash;
ctx->unhash = sk->sk_prot->unhash;
ctx->sk_proto_close = sk->sk_prot->close;
- ctx->conf = TLS_HW_RECORD;
+ ctx->rx_conf = TLS_HW_RECORD;
+ ctx->tx_conf = TLS_HW_RECORD;
update_sk_prot(sk, ctx);
rc = 1;
break;
@@ -584,29 +601,40 @@
return err;
}
-static void build_protos(struct proto *prot, struct proto *base)
+static void build_protos(struct proto prot[TLS_NUM_CONFIG][TLS_NUM_CONFIG],
+ struct proto *base)
{
- prot[TLS_BASE] = *base;
- prot[TLS_BASE].setsockopt = tls_setsockopt;
- prot[TLS_BASE].getsockopt = tls_getsockopt;
- prot[TLS_BASE].close = tls_sk_proto_close;
+ prot[TLS_BASE][TLS_BASE] = *base;
+ prot[TLS_BASE][TLS_BASE].setsockopt = tls_setsockopt;
+ prot[TLS_BASE][TLS_BASE].getsockopt = tls_getsockopt;
+ prot[TLS_BASE][TLS_BASE].close = tls_sk_proto_close;
- prot[TLS_SW_TX] = prot[TLS_BASE];
- prot[TLS_SW_TX].sendmsg = tls_sw_sendmsg;
- prot[TLS_SW_TX].sendpage = tls_sw_sendpage;
+ prot[TLS_SW][TLS_BASE] = prot[TLS_BASE][TLS_BASE];
+ prot[TLS_SW][TLS_BASE].sendmsg = tls_sw_sendmsg;
+ prot[TLS_SW][TLS_BASE].sendpage = tls_sw_sendpage;
- prot[TLS_SW_RX] = prot[TLS_BASE];
- prot[TLS_SW_RX].recvmsg = tls_sw_recvmsg;
- prot[TLS_SW_RX].close = tls_sk_proto_close;
+ prot[TLS_BASE][TLS_SW] = prot[TLS_BASE][TLS_BASE];
+ prot[TLS_BASE][TLS_SW].recvmsg = tls_sw_recvmsg;
+ prot[TLS_BASE][TLS_SW].close = tls_sk_proto_close;
- prot[TLS_SW_RXTX] = prot[TLS_SW_TX];
- prot[TLS_SW_RXTX].recvmsg = tls_sw_recvmsg;
- prot[TLS_SW_RXTX].close = tls_sk_proto_close;
+ prot[TLS_SW][TLS_SW] = prot[TLS_SW][TLS_BASE];
+ prot[TLS_SW][TLS_SW].recvmsg = tls_sw_recvmsg;
+ prot[TLS_SW][TLS_SW].close = tls_sk_proto_close;
- prot[TLS_HW_RECORD] = *base;
- prot[TLS_HW_RECORD].hash = tls_hw_hash;
- prot[TLS_HW_RECORD].unhash = tls_hw_unhash;
- prot[TLS_HW_RECORD].close = tls_sk_proto_close;
+#ifdef CONFIG_TLS_DEVICE
+ prot[TLS_HW][TLS_BASE] = prot[TLS_BASE][TLS_BASE];
+ prot[TLS_HW][TLS_BASE].sendmsg = tls_device_sendmsg;
+ prot[TLS_HW][TLS_BASE].sendpage = tls_device_sendpage;
+
+ prot[TLS_HW][TLS_SW] = prot[TLS_BASE][TLS_SW];
+ prot[TLS_HW][TLS_SW].sendmsg = tls_device_sendmsg;
+ prot[TLS_HW][TLS_SW].sendpage = tls_device_sendpage;
+#endif
+
+ prot[TLS_HW_RECORD][TLS_HW_RECORD] = *base;
+ prot[TLS_HW_RECORD][TLS_HW_RECORD].hash = tls_hw_hash;
+ prot[TLS_HW_RECORD][TLS_HW_RECORD].unhash = tls_hw_unhash;
+ prot[TLS_HW_RECORD][TLS_HW_RECORD].close = tls_sk_proto_close;
}
static int tls_init(struct sock *sk)
@@ -637,7 +665,7 @@
ctx->getsockopt = sk->sk_prot->getsockopt;
ctx->sk_proto_close = sk->sk_prot->close;
- /* Build IPv6 TLS whenever the address of tcpv6_prot changes */
+ /* Build IPv6 TLS whenever the address of tcpv6 _prot changes */
if (ip_ver == TLSV6 &&
unlikely(sk->sk_prot != smp_load_acquire(&saved_tcpv6_prot))) {
mutex_lock(&tcpv6_prot_mutex);
@@ -648,7 +676,8 @@
mutex_unlock(&tcpv6_prot_mutex);
}
- ctx->conf = TLS_BASE;
+ ctx->tx_conf = TLS_BASE;
+ ctx->rx_conf = TLS_BASE;
update_sk_prot(sk, ctx);
out:
return rc;
@@ -686,6 +715,9 @@
tls_sw_proto_ops.poll = tls_sw_poll;
tls_sw_proto_ops.splice_read = tls_sw_splice_read;
+#ifdef CONFIG_TLS_DEVICE
+ tls_device_init();
+#endif
tcp_register_ulp(&tcp_tls_ulp_ops);
return 0;
@@ -694,6 +726,9 @@
static void __exit tls_unregister(void)
{
tcp_unregister_ulp(&tcp_tls_ulp_ops);
+#ifdef CONFIG_TLS_DEVICE
+ tls_device_cleanup();
+#endif
}
module_init(tls_register);
diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
index e1c93ce..839e1e1 100644
--- a/net/tls/tls_sw.c
+++ b/net/tls/tls_sw.c
@@ -52,7 +52,7 @@
gfp_t flags)
{
struct tls_context *tls_ctx = tls_get_ctx(sk);
- struct tls_sw_context *ctx = tls_sw_ctx(tls_ctx);
+ struct tls_sw_context_rx *ctx = tls_sw_ctx_rx(tls_ctx);
struct strp_msg *rxm = strp_msg(skb);
struct aead_request *aead_req;
@@ -122,7 +122,7 @@
static void trim_both_sgl(struct sock *sk, int target_size)
{
struct tls_context *tls_ctx = tls_get_ctx(sk);
- struct tls_sw_context *ctx = tls_sw_ctx(tls_ctx);
+ struct tls_sw_context_tx *ctx = tls_sw_ctx_tx(tls_ctx);
trim_sg(sk, ctx->sg_plaintext_data,
&ctx->sg_plaintext_num_elem,
@@ -141,7 +141,7 @@
static int alloc_encrypted_sg(struct sock *sk, int len)
{
struct tls_context *tls_ctx = tls_get_ctx(sk);
- struct tls_sw_context *ctx = tls_sw_ctx(tls_ctx);
+ struct tls_sw_context_tx *ctx = tls_sw_ctx_tx(tls_ctx);
int rc = 0;
rc = sk_alloc_sg(sk, len,
@@ -155,7 +155,7 @@
static int alloc_plaintext_sg(struct sock *sk, int len)
{
struct tls_context *tls_ctx = tls_get_ctx(sk);
- struct tls_sw_context *ctx = tls_sw_ctx(tls_ctx);
+ struct tls_sw_context_tx *ctx = tls_sw_ctx_tx(tls_ctx);
int rc = 0;
rc = sk_alloc_sg(sk, len, ctx->sg_plaintext_data, 0,
@@ -181,7 +181,7 @@
static void tls_free_both_sg(struct sock *sk)
{
struct tls_context *tls_ctx = tls_get_ctx(sk);
- struct tls_sw_context *ctx = tls_sw_ctx(tls_ctx);
+ struct tls_sw_context_tx *ctx = tls_sw_ctx_tx(tls_ctx);
free_sg(sk, ctx->sg_encrypted_data, &ctx->sg_encrypted_num_elem,
&ctx->sg_encrypted_size);
@@ -191,7 +191,7 @@
}
static int tls_do_encryption(struct tls_context *tls_ctx,
- struct tls_sw_context *ctx, size_t data_len,
+ struct tls_sw_context_tx *ctx, size_t data_len,
gfp_t flags)
{
unsigned int req_size = sizeof(struct aead_request) +
@@ -227,7 +227,7 @@
unsigned char record_type)
{
struct tls_context *tls_ctx = tls_get_ctx(sk);
- struct tls_sw_context *ctx = tls_sw_ctx(tls_ctx);
+ struct tls_sw_context_tx *ctx = tls_sw_ctx_tx(tls_ctx);
int rc;
sg_mark_end(ctx->sg_plaintext_data + ctx->sg_plaintext_num_elem - 1);
@@ -339,7 +339,7 @@
int bytes)
{
struct tls_context *tls_ctx = tls_get_ctx(sk);
- struct tls_sw_context *ctx = tls_sw_ctx(tls_ctx);
+ struct tls_sw_context_tx *ctx = tls_sw_ctx_tx(tls_ctx);
struct scatterlist *sg = ctx->sg_plaintext_data;
int copy, i, rc = 0;
@@ -367,7 +367,7 @@
int tls_sw_sendmsg(struct sock *sk, struct msghdr *msg, size_t size)
{
struct tls_context *tls_ctx = tls_get_ctx(sk);
- struct tls_sw_context *ctx = tls_sw_ctx(tls_ctx);
+ struct tls_sw_context_tx *ctx = tls_sw_ctx_tx(tls_ctx);
int ret = 0;
int required_size;
long timeo = sock_sndtimeo(sk, msg->msg_flags & MSG_DONTWAIT);
@@ -522,7 +522,7 @@
int offset, size_t size, int flags)
{
struct tls_context *tls_ctx = tls_get_ctx(sk);
- struct tls_sw_context *ctx = tls_sw_ctx(tls_ctx);
+ struct tls_sw_context_tx *ctx = tls_sw_ctx_tx(tls_ctx);
int ret = 0;
long timeo = sock_sndtimeo(sk, flags & MSG_DONTWAIT);
bool eor;
@@ -636,7 +636,7 @@
long timeo, int *err)
{
struct tls_context *tls_ctx = tls_get_ctx(sk);
- struct tls_sw_context *ctx = tls_sw_ctx(tls_ctx);
+ struct tls_sw_context_rx *ctx = tls_sw_ctx_rx(tls_ctx);
struct sk_buff *skb;
DEFINE_WAIT_FUNC(wait, woken_wake_function);
@@ -674,7 +674,7 @@
struct scatterlist *sgout)
{
struct tls_context *tls_ctx = tls_get_ctx(sk);
- struct tls_sw_context *ctx = tls_sw_ctx(tls_ctx);
+ struct tls_sw_context_rx *ctx = tls_sw_ctx_rx(tls_ctx);
char iv[TLS_CIPHER_AES_GCM_128_SALT_SIZE + MAX_IV_SIZE];
struct scatterlist sgin_arr[MAX_SKB_FRAGS + 2];
struct scatterlist *sgin = &sgin_arr[0];
@@ -692,8 +692,7 @@
if (!sgout) {
nsg = skb_cow_data(skb, 0, &unused) + 1;
sgin = kmalloc_array(nsg, sizeof(*sgin), sk->sk_allocation);
- if (!sgout)
- sgout = sgin;
+ sgout = sgin;
}
sg_init_table(sgin, nsg);
@@ -723,7 +722,7 @@
unsigned int len)
{
struct tls_context *tls_ctx = tls_get_ctx(sk);
- struct tls_sw_context *ctx = tls_sw_ctx(tls_ctx);
+ struct tls_sw_context_rx *ctx = tls_sw_ctx_rx(tls_ctx);
struct strp_msg *rxm = strp_msg(skb);
if (len < rxm->full_len) {
@@ -749,7 +748,7 @@
int *addr_len)
{
struct tls_context *tls_ctx = tls_get_ctx(sk);
- struct tls_sw_context *ctx = tls_sw_ctx(tls_ctx);
+ struct tls_sw_context_rx *ctx = tls_sw_ctx_rx(tls_ctx);
unsigned char control;
struct strp_msg *rxm;
struct sk_buff *skb;
@@ -869,7 +868,7 @@
size_t len, unsigned int flags)
{
struct tls_context *tls_ctx = tls_get_ctx(sock->sk);
- struct tls_sw_context *ctx = tls_sw_ctx(tls_ctx);
+ struct tls_sw_context_rx *ctx = tls_sw_ctx_rx(tls_ctx);
struct strp_msg *rxm = NULL;
struct sock *sk = sock->sk;
struct sk_buff *skb;
@@ -922,7 +921,7 @@
unsigned int ret;
struct sock *sk = sock->sk;
struct tls_context *tls_ctx = tls_get_ctx(sk);
- struct tls_sw_context *ctx = tls_sw_ctx(tls_ctx);
+ struct tls_sw_context_rx *ctx = tls_sw_ctx_rx(tls_ctx);
/* Grab POLLOUT and POLLHUP from the underlying socket */
ret = ctx->sk_poll(file, sock, wait);
@@ -938,7 +937,7 @@
static int tls_read_size(struct strparser *strp, struct sk_buff *skb)
{
struct tls_context *tls_ctx = tls_get_ctx(strp->sk);
- struct tls_sw_context *ctx = tls_sw_ctx(tls_ctx);
+ struct tls_sw_context_rx *ctx = tls_sw_ctx_rx(tls_ctx);
char header[tls_ctx->rx.prepend_size];
struct strp_msg *rxm = strp_msg(skb);
size_t cipher_overhead;
@@ -987,7 +986,7 @@
static void tls_queue(struct strparser *strp, struct sk_buff *skb)
{
struct tls_context *tls_ctx = tls_get_ctx(strp->sk);
- struct tls_sw_context *ctx = tls_sw_ctx(tls_ctx);
+ struct tls_sw_context_rx *ctx = tls_sw_ctx_rx(tls_ctx);
struct strp_msg *rxm;
rxm = strp_msg(skb);
@@ -1003,18 +1002,28 @@
static void tls_data_ready(struct sock *sk)
{
struct tls_context *tls_ctx = tls_get_ctx(sk);
- struct tls_sw_context *ctx = tls_sw_ctx(tls_ctx);
+ struct tls_sw_context_rx *ctx = tls_sw_ctx_rx(tls_ctx);
strp_data_ready(&ctx->strp);
}
-void tls_sw_free_resources(struct sock *sk)
+void tls_sw_free_resources_tx(struct sock *sk)
{
struct tls_context *tls_ctx = tls_get_ctx(sk);
- struct tls_sw_context *ctx = tls_sw_ctx(tls_ctx);
+ struct tls_sw_context_tx *ctx = tls_sw_ctx_tx(tls_ctx);
if (ctx->aead_send)
crypto_free_aead(ctx->aead_send);
+ tls_free_both_sg(sk);
+
+ kfree(ctx);
+}
+
+void tls_sw_free_resources_rx(struct sock *sk)
+{
+ struct tls_context *tls_ctx = tls_get_ctx(sk);
+ struct tls_sw_context_rx *ctx = tls_sw_ctx_rx(tls_ctx);
+
if (ctx->aead_recv) {
if (ctx->recv_pkt) {
kfree_skb(ctx->recv_pkt);
@@ -1030,10 +1039,7 @@
lock_sock(sk);
}
- tls_free_both_sg(sk);
-
kfree(ctx);
- kfree(tls_ctx);
}
int tls_set_sw_offload(struct sock *sk, struct tls_context *ctx, int tx)
@@ -1041,7 +1047,8 @@
char keyval[TLS_CIPHER_AES_GCM_128_KEY_SIZE];
struct tls_crypto_info *crypto_info;
struct tls12_crypto_info_aes_gcm_128 *gcm_128_info;
- struct tls_sw_context *sw_ctx;
+ struct tls_sw_context_tx *sw_ctx_tx = NULL;
+ struct tls_sw_context_rx *sw_ctx_rx = NULL;
struct cipher_context *cctx;
struct crypto_aead **aead;
struct strp_callbacks cb;
@@ -1054,27 +1061,32 @@
goto out;
}
- if (!ctx->priv_ctx) {
- sw_ctx = kzalloc(sizeof(*sw_ctx), GFP_KERNEL);
- if (!sw_ctx) {
+ if (tx) {
+ sw_ctx_tx = kzalloc(sizeof(*sw_ctx_tx), GFP_KERNEL);
+ if (!sw_ctx_tx) {
rc = -ENOMEM;
goto out;
}
- crypto_init_wait(&sw_ctx->async_wait);
+ crypto_init_wait(&sw_ctx_tx->async_wait);
+ ctx->priv_ctx_tx = sw_ctx_tx;
} else {
- sw_ctx = ctx->priv_ctx;
+ sw_ctx_rx = kzalloc(sizeof(*sw_ctx_rx), GFP_KERNEL);
+ if (!sw_ctx_rx) {
+ rc = -ENOMEM;
+ goto out;
+ }
+ crypto_init_wait(&sw_ctx_rx->async_wait);
+ ctx->priv_ctx_rx = sw_ctx_rx;
}
- ctx->priv_ctx = (struct tls_offload_context *)sw_ctx;
-
if (tx) {
crypto_info = &ctx->crypto_send;
cctx = &ctx->tx;
- aead = &sw_ctx->aead_send;
+ aead = &sw_ctx_tx->aead_send;
} else {
crypto_info = &ctx->crypto_recv;
cctx = &ctx->rx;
- aead = &sw_ctx->aead_recv;
+ aead = &sw_ctx_rx->aead_recv;
}
switch (crypto_info->cipher_type) {
@@ -1121,22 +1133,24 @@
}
memcpy(cctx->rec_seq, rec_seq, rec_seq_size);
- if (tx) {
- sg_init_table(sw_ctx->sg_encrypted_data,
- ARRAY_SIZE(sw_ctx->sg_encrypted_data));
- sg_init_table(sw_ctx->sg_plaintext_data,
- ARRAY_SIZE(sw_ctx->sg_plaintext_data));
+ if (sw_ctx_tx) {
+ sg_init_table(sw_ctx_tx->sg_encrypted_data,
+ ARRAY_SIZE(sw_ctx_tx->sg_encrypted_data));
+ sg_init_table(sw_ctx_tx->sg_plaintext_data,
+ ARRAY_SIZE(sw_ctx_tx->sg_plaintext_data));
- sg_init_table(sw_ctx->sg_aead_in, 2);
- sg_set_buf(&sw_ctx->sg_aead_in[0], sw_ctx->aad_space,
- sizeof(sw_ctx->aad_space));
- sg_unmark_end(&sw_ctx->sg_aead_in[1]);
- sg_chain(sw_ctx->sg_aead_in, 2, sw_ctx->sg_plaintext_data);
- sg_init_table(sw_ctx->sg_aead_out, 2);
- sg_set_buf(&sw_ctx->sg_aead_out[0], sw_ctx->aad_space,
- sizeof(sw_ctx->aad_space));
- sg_unmark_end(&sw_ctx->sg_aead_out[1]);
- sg_chain(sw_ctx->sg_aead_out, 2, sw_ctx->sg_encrypted_data);
+ sg_init_table(sw_ctx_tx->sg_aead_in, 2);
+ sg_set_buf(&sw_ctx_tx->sg_aead_in[0], sw_ctx_tx->aad_space,
+ sizeof(sw_ctx_tx->aad_space));
+ sg_unmark_end(&sw_ctx_tx->sg_aead_in[1]);
+ sg_chain(sw_ctx_tx->sg_aead_in, 2,
+ sw_ctx_tx->sg_plaintext_data);
+ sg_init_table(sw_ctx_tx->sg_aead_out, 2);
+ sg_set_buf(&sw_ctx_tx->sg_aead_out[0], sw_ctx_tx->aad_space,
+ sizeof(sw_ctx_tx->aad_space));
+ sg_unmark_end(&sw_ctx_tx->sg_aead_out[1]);
+ sg_chain(sw_ctx_tx->sg_aead_out, 2,
+ sw_ctx_tx->sg_encrypted_data);
}
if (!*aead) {
@@ -1161,22 +1175,22 @@
if (rc)
goto free_aead;
- if (!tx) {
+ if (sw_ctx_rx) {
/* Set up strparser */
memset(&cb, 0, sizeof(cb));
cb.rcv_msg = tls_queue;
cb.parse_msg = tls_read_size;
- strp_init(&sw_ctx->strp, sk, &cb);
+ strp_init(&sw_ctx_rx->strp, sk, &cb);
write_lock_bh(&sk->sk_callback_lock);
- sw_ctx->saved_data_ready = sk->sk_data_ready;
+ sw_ctx_rx->saved_data_ready = sk->sk_data_ready;
sk->sk_data_ready = tls_data_ready;
write_unlock_bh(&sk->sk_callback_lock);
- sw_ctx->sk_poll = sk->sk_socket->ops->poll;
+ sw_ctx_rx->sk_poll = sk->sk_socket->ops->poll;
- strp_check_rcv(&sw_ctx->strp);
+ strp_check_rcv(&sw_ctx_rx->strp);
}
goto out;
@@ -1188,11 +1202,16 @@
kfree(cctx->rec_seq);
cctx->rec_seq = NULL;
free_iv:
- kfree(ctx->tx.iv);
- ctx->tx.iv = NULL;
+ kfree(cctx->iv);
+ cctx->iv = NULL;
free_priv:
- kfree(ctx->priv_ctx);
- ctx->priv_ctx = NULL;
+ if (tx) {
+ kfree(ctx->priv_ctx_tx);
+ ctx->priv_ctx_tx = NULL;
+ } else {
+ kfree(ctx->priv_ctx_rx);
+ ctx->priv_ctx_rx = NULL;
+ }
out:
return rc;
}
diff --git a/net/wireless/core.c b/net/wireless/core.c
index c0fd8a8..5fe35aa 100644
--- a/net/wireless/core.c
+++ b/net/wireless/core.c
@@ -725,6 +725,10 @@
(!rdev->ops->set_pmk || !rdev->ops->del_pmk)))
return -EINVAL;
+ if (WARN_ON(!(rdev->wiphy.flags & WIPHY_FLAG_SUPPORTS_FW_ROAM) &&
+ rdev->ops->update_connect_params))
+ return -EINVAL;
+
if (wiphy->addresses)
memcpy(wiphy->perm_addr, wiphy->addresses[0].addr, ETH_ALEN);
diff --git a/net/wireless/nl80211.c b/net/wireless/nl80211.c
index 7c5135a..07514ca 100644
--- a/net/wireless/nl80211.c
+++ b/net/wireless/nl80211.c
@@ -4,6 +4,7 @@
* Copyright 2006-2010 Johannes Berg <johannes@sipsolutions.net>
* Copyright 2013-2014 Intel Mobile Communications GmbH
* Copyright 2015-2017 Intel Deutschland GmbH
+ * Copyright (C) 2018 Intel Corporation
*/
#include <linux/if.h>
@@ -423,6 +424,10 @@
[NL80211_ATTR_PMK] = { .type = NLA_BINARY, .len = PMK_MAX_LEN },
[NL80211_ATTR_SCHED_SCAN_MULTI] = { .type = NLA_FLAG },
[NL80211_ATTR_EXTERNAL_AUTH_SUPPORT] = { .type = NLA_FLAG },
+
+ [NL80211_ATTR_TXQ_LIMIT] = { .type = NLA_U32 },
+ [NL80211_ATTR_TXQ_MEMORY_LIMIT] = { .type = NLA_U32 },
+ [NL80211_ATTR_TXQ_QUANTUM] = { .type = NLA_U32 },
};
/* policy for the key attributes */
@@ -645,7 +650,43 @@
return genlmsg_put(skb, portid, seq, &nl80211_fam, flags, cmd);
}
-static int nl80211_msg_put_channel(struct sk_buff *msg,
+static int nl80211_msg_put_wmm_rules(struct sk_buff *msg,
+ const struct ieee80211_reg_rule *rule)
+{
+ int j;
+ struct nlattr *nl_wmm_rules =
+ nla_nest_start(msg, NL80211_FREQUENCY_ATTR_WMM);
+
+ if (!nl_wmm_rules)
+ goto nla_put_failure;
+
+ for (j = 0; j < IEEE80211_NUM_ACS; j++) {
+ struct nlattr *nl_wmm_rule = nla_nest_start(msg, j);
+
+ if (!nl_wmm_rule)
+ goto nla_put_failure;
+
+ if (nla_put_u16(msg, NL80211_WMMR_CW_MIN,
+ rule->wmm_rule->client[j].cw_min) ||
+ nla_put_u16(msg, NL80211_WMMR_CW_MAX,
+ rule->wmm_rule->client[j].cw_max) ||
+ nla_put_u8(msg, NL80211_WMMR_AIFSN,
+ rule->wmm_rule->client[j].aifsn) ||
+ nla_put_u8(msg, NL80211_WMMR_TXOP,
+ rule->wmm_rule->client[j].cot))
+ goto nla_put_failure;
+
+ nla_nest_end(msg, nl_wmm_rule);
+ }
+ nla_nest_end(msg, nl_wmm_rules);
+
+ return 0;
+
+nla_put_failure:
+ return -ENOBUFS;
+}
+
+static int nl80211_msg_put_channel(struct sk_buff *msg, struct wiphy *wiphy,
struct ieee80211_channel *chan,
bool large)
{
@@ -721,12 +762,55 @@
DBM_TO_MBM(chan->max_power)))
goto nla_put_failure;
+ if (large) {
+ const struct ieee80211_reg_rule *rule =
+ freq_reg_info(wiphy, chan->center_freq);
+
+ if (!IS_ERR(rule) && rule->wmm_rule) {
+ if (nl80211_msg_put_wmm_rules(msg, rule))
+ goto nla_put_failure;
+ }
+ }
+
return 0;
nla_put_failure:
return -ENOBUFS;
}
+static bool nl80211_put_txq_stats(struct sk_buff *msg,
+ struct cfg80211_txq_stats *txqstats,
+ int attrtype)
+{
+ struct nlattr *txqattr;
+
+#define PUT_TXQVAL_U32(attr, memb) do { \
+ if (txqstats->filled & BIT(NL80211_TXQ_STATS_ ## attr) && \
+ nla_put_u32(msg, NL80211_TXQ_STATS_ ## attr, txqstats->memb)) \
+ return false; \
+ } while (0)
+
+ txqattr = nla_nest_start(msg, attrtype);
+ if (!txqattr)
+ return false;
+
+ PUT_TXQVAL_U32(BACKLOG_BYTES, backlog_bytes);
+ PUT_TXQVAL_U32(BACKLOG_PACKETS, backlog_packets);
+ PUT_TXQVAL_U32(FLOWS, flows);
+ PUT_TXQVAL_U32(DROPS, drops);
+ PUT_TXQVAL_U32(ECN_MARKS, ecn_marks);
+ PUT_TXQVAL_U32(OVERLIMIT, overlimit);
+ PUT_TXQVAL_U32(OVERMEMORY, overmemory);
+ PUT_TXQVAL_U32(COLLISIONS, collisions);
+ PUT_TXQVAL_U32(TX_BYTES, tx_bytes);
+ PUT_TXQVAL_U32(TX_PACKETS, tx_packets);
+ PUT_TXQVAL_U32(MAX_FLOWS, max_flows);
+ nla_nest_end(msg, txqattr);
+
+#undef PUT_TXQVAL_U32
+ return true;
+}
+
/* netlink command implementations */
struct key_parse {
@@ -1631,7 +1715,7 @@
chan = &sband->channels[i];
if (nl80211_msg_put_channel(
- msg, chan,
+ msg, &rdev->wiphy, chan,
state->split))
goto nla_put_failure;
@@ -1926,6 +2010,28 @@
rdev->wiphy.nan_supported_bands))
goto nla_put_failure;
+ if (wiphy_ext_feature_isset(&rdev->wiphy,
+ NL80211_EXT_FEATURE_TXQS)) {
+ struct cfg80211_txq_stats txqstats = {};
+ int res;
+
+ res = rdev_get_txq_stats(rdev, NULL, &txqstats);
+ if (!res &&
+ !nl80211_put_txq_stats(msg, &txqstats,
+ NL80211_ATTR_TXQ_STATS))
+ goto nla_put_failure;
+
+ if (nla_put_u32(msg, NL80211_ATTR_TXQ_LIMIT,
+ rdev->wiphy.txq_limit))
+ goto nla_put_failure;
+ if (nla_put_u32(msg, NL80211_ATTR_TXQ_MEMORY_LIMIT,
+ rdev->wiphy.txq_memory_limit))
+ goto nla_put_failure;
+ if (nla_put_u32(msg, NL80211_ATTR_TXQ_QUANTUM,
+ rdev->wiphy.txq_quantum))
+ goto nla_put_failure;
+ }
+
/* done */
state->split_start = 0;
break;
@@ -2303,6 +2409,7 @@
u8 retry_short = 0, retry_long = 0;
u32 frag_threshold = 0, rts_threshold = 0;
u8 coverage_class = 0;
+ u32 txq_limit = 0, txq_memory_limit = 0, txq_quantum = 0;
ASSERT_RTNL();
@@ -2509,10 +2616,38 @@
changed |= WIPHY_PARAM_DYN_ACK;
}
+ if (info->attrs[NL80211_ATTR_TXQ_LIMIT]) {
+ if (!wiphy_ext_feature_isset(&rdev->wiphy,
+ NL80211_EXT_FEATURE_TXQS))
+ return -EOPNOTSUPP;
+ txq_limit = nla_get_u32(
+ info->attrs[NL80211_ATTR_TXQ_LIMIT]);
+ changed |= WIPHY_PARAM_TXQ_LIMIT;
+ }
+
+ if (info->attrs[NL80211_ATTR_TXQ_MEMORY_LIMIT]) {
+ if (!wiphy_ext_feature_isset(&rdev->wiphy,
+ NL80211_EXT_FEATURE_TXQS))
+ return -EOPNOTSUPP;
+ txq_memory_limit = nla_get_u32(
+ info->attrs[NL80211_ATTR_TXQ_MEMORY_LIMIT]);
+ changed |= WIPHY_PARAM_TXQ_MEMORY_LIMIT;
+ }
+
+ if (info->attrs[NL80211_ATTR_TXQ_QUANTUM]) {
+ if (!wiphy_ext_feature_isset(&rdev->wiphy,
+ NL80211_EXT_FEATURE_TXQS))
+ return -EOPNOTSUPP;
+ txq_quantum = nla_get_u32(
+ info->attrs[NL80211_ATTR_TXQ_QUANTUM]);
+ changed |= WIPHY_PARAM_TXQ_QUANTUM;
+ }
+
if (changed) {
u8 old_retry_short, old_retry_long;
u32 old_frag_threshold, old_rts_threshold;
u8 old_coverage_class;
+ u32 old_txq_limit, old_txq_memory_limit, old_txq_quantum;
if (!rdev->ops->set_wiphy_params)
return -EOPNOTSUPP;
@@ -2522,6 +2657,9 @@
old_frag_threshold = rdev->wiphy.frag_threshold;
old_rts_threshold = rdev->wiphy.rts_threshold;
old_coverage_class = rdev->wiphy.coverage_class;
+ old_txq_limit = rdev->wiphy.txq_limit;
+ old_txq_memory_limit = rdev->wiphy.txq_memory_limit;
+ old_txq_quantum = rdev->wiphy.txq_quantum;
if (changed & WIPHY_PARAM_RETRY_SHORT)
rdev->wiphy.retry_short = retry_short;
@@ -2533,6 +2671,12 @@
rdev->wiphy.rts_threshold = rts_threshold;
if (changed & WIPHY_PARAM_COVERAGE_CLASS)
rdev->wiphy.coverage_class = coverage_class;
+ if (changed & WIPHY_PARAM_TXQ_LIMIT)
+ rdev->wiphy.txq_limit = txq_limit;
+ if (changed & WIPHY_PARAM_TXQ_MEMORY_LIMIT)
+ rdev->wiphy.txq_memory_limit = txq_memory_limit;
+ if (changed & WIPHY_PARAM_TXQ_QUANTUM)
+ rdev->wiphy.txq_quantum = txq_quantum;
result = rdev_set_wiphy_params(rdev, changed);
if (result) {
@@ -2541,6 +2685,9 @@
rdev->wiphy.frag_threshold = old_frag_threshold;
rdev->wiphy.rts_threshold = old_rts_threshold;
rdev->wiphy.coverage_class = old_coverage_class;
+ rdev->wiphy.txq_limit = old_txq_limit;
+ rdev->wiphy.txq_memory_limit = old_txq_memory_limit;
+ rdev->wiphy.txq_quantum = old_txq_quantum;
return result;
}
}
@@ -2662,6 +2809,16 @@
}
wdev_unlock(wdev);
+ if (rdev->ops->get_txq_stats) {
+ struct cfg80211_txq_stats txqstats = {};
+ int ret = rdev_get_txq_stats(rdev, wdev, &txqstats);
+
+ if (ret == 0 &&
+ !nl80211_put_txq_stats(msg, &txqstats,
+ NL80211_ATTR_TXQ_STATS))
+ goto nla_put_failure;
+ }
+
genlmsg_end(msg, hdr);
return 0;
@@ -4494,11 +4651,14 @@
PUT_SINFO_U64(BEACON_RX, rx_beacon);
PUT_SINFO(BEACON_SIGNAL_AVG, rx_beacon_signal_avg, u8);
PUT_SINFO(ACK_SIGNAL, ack_signal, u8);
+ if (wiphy_ext_feature_isset(&rdev->wiphy,
+ NL80211_EXT_FEATURE_DATA_ACK_SIGNAL_SUPPORT))
+ PUT_SINFO(DATA_ACK_SIGNAL_AVG, avg_ack_signal, s8);
#undef PUT_SINFO
#undef PUT_SINFO_U64
- if (sinfo->filled & BIT(NL80211_STA_INFO_TID_STATS)) {
+ if (sinfo->pertid) {
struct nlattr *tidsattr;
int tid;
@@ -4532,6 +4692,12 @@
PUT_TIDVAL_U64(TX_MSDU_FAILED, tx_msdu_failed);
#undef PUT_TIDVAL_U64
+ if ((tidstats->filled &
+ BIT(NL80211_TID_STATS_TXQ_STATS)) &&
+ !nl80211_put_txq_stats(msg, &tidstats->txq_stats,
+ NL80211_TID_STATS_TXQ_STATS))
+ goto nla_put_failure;
+
nla_nest_end(msg, tidattr);
}
@@ -4545,10 +4711,12 @@
sinfo->assoc_req_ies))
goto nla_put_failure;
+ cfg80211_sinfo_release_content(sinfo);
genlmsg_end(msg, hdr);
return 0;
nla_put_failure:
+ cfg80211_sinfo_release_content(sinfo);
genlmsg_cancel(msg, hdr);
return -EMSGSIZE;
}
@@ -4630,8 +4798,10 @@
return err;
msg = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL);
- if (!msg)
+ if (!msg) {
+ cfg80211_sinfo_release_content(&sinfo);
return -ENOMEM;
+ }
if (nl80211_send_station(msg, NL80211_CMD_NEW_STATION,
info->snd_portid, info->snd_seq, 0,
@@ -7930,7 +8100,15 @@
wdev_lock(wdev);
spin_lock_bh(&rdev->bss_lock);
- cfg80211_bss_expire(rdev);
+
+ /*
+ * dump_scan will be called multiple times to break up the scan results
+ * into multiple messages. It is unlikely that any more bss-es will be
+ * expired after the first call, so only call only call this on the
+ * first dump_scan invocation.
+ */
+ if (start == 0)
+ cfg80211_bss_expire(rdev);
cb->seq = rdev->bss_generation;
@@ -8336,6 +8514,10 @@
const u8 *bssid, *ssid;
int err, ssid_len = 0;
+ if (dev->ieee80211_ptr->conn_owner_nlportid &&
+ dev->ieee80211_ptr->conn_owner_nlportid != info->snd_portid)
+ return -EPERM;
+
if (!is_valid_ie_attr(info->attrs[NL80211_ATTR_IE]))
return -EINVAL;
@@ -8458,6 +8640,10 @@
u16 reason_code;
bool local_state_change;
+ if (dev->ieee80211_ptr->conn_owner_nlportid &&
+ dev->ieee80211_ptr->conn_owner_nlportid != info->snd_portid)
+ return -EPERM;
+
if (!is_valid_ie_attr(info->attrs[NL80211_ATTR_IE]))
return -EINVAL;
@@ -8505,6 +8691,10 @@
u16 reason_code;
bool local_state_change;
+ if (dev->ieee80211_ptr->conn_owner_nlportid &&
+ dev->ieee80211_ptr->conn_owner_nlportid != info->snd_portid)
+ return -EPERM;
+
if (!is_valid_ie_attr(info->attrs[NL80211_ATTR_IE]))
return -EINVAL;
@@ -9251,6 +9441,8 @@
struct cfg80211_registered_device *rdev = info->user_ptr[0];
struct net_device *dev = info->user_ptr[1];
struct wireless_dev *wdev = dev->ieee80211_ptr;
+ bool fils_sk_offload;
+ u32 auth_type;
u32 changed = 0;
int ret;
@@ -9265,6 +9457,56 @@
changed |= UPDATE_ASSOC_IES;
}
+ fils_sk_offload = wiphy_ext_feature_isset(&rdev->wiphy,
+ NL80211_EXT_FEATURE_FILS_SK_OFFLOAD);
+
+ /*
+ * when driver supports fils-sk offload all attributes must be
+ * provided. So the else covers "fils-sk-not-all" and
+ * "no-fils-sk-any".
+ */
+ if (fils_sk_offload &&
+ info->attrs[NL80211_ATTR_FILS_ERP_USERNAME] &&
+ info->attrs[NL80211_ATTR_FILS_ERP_REALM] &&
+ info->attrs[NL80211_ATTR_FILS_ERP_NEXT_SEQ_NUM] &&
+ info->attrs[NL80211_ATTR_FILS_ERP_RRK]) {
+ connect.fils_erp_username =
+ nla_data(info->attrs[NL80211_ATTR_FILS_ERP_USERNAME]);
+ connect.fils_erp_username_len =
+ nla_len(info->attrs[NL80211_ATTR_FILS_ERP_USERNAME]);
+ connect.fils_erp_realm =
+ nla_data(info->attrs[NL80211_ATTR_FILS_ERP_REALM]);
+ connect.fils_erp_realm_len =
+ nla_len(info->attrs[NL80211_ATTR_FILS_ERP_REALM]);
+ connect.fils_erp_next_seq_num =
+ nla_get_u16(
+ info->attrs[NL80211_ATTR_FILS_ERP_NEXT_SEQ_NUM]);
+ connect.fils_erp_rrk =
+ nla_data(info->attrs[NL80211_ATTR_FILS_ERP_RRK]);
+ connect.fils_erp_rrk_len =
+ nla_len(info->attrs[NL80211_ATTR_FILS_ERP_RRK]);
+ changed |= UPDATE_FILS_ERP_INFO;
+ } else if (info->attrs[NL80211_ATTR_FILS_ERP_USERNAME] ||
+ info->attrs[NL80211_ATTR_FILS_ERP_REALM] ||
+ info->attrs[NL80211_ATTR_FILS_ERP_NEXT_SEQ_NUM] ||
+ info->attrs[NL80211_ATTR_FILS_ERP_RRK]) {
+ return -EINVAL;
+ }
+
+ if (info->attrs[NL80211_ATTR_AUTH_TYPE]) {
+ auth_type = nla_get_u32(info->attrs[NL80211_ATTR_AUTH_TYPE]);
+ if (!nl80211_valid_auth_type(rdev, auth_type,
+ NL80211_CMD_CONNECT))
+ return -EINVAL;
+
+ if (auth_type == NL80211_AUTHTYPE_FILS_SK &&
+ fils_sk_offload && !(changed & UPDATE_FILS_ERP_INFO))
+ return -EINVAL;
+
+ connect.auth_type = auth_type;
+ changed |= UPDATE_AUTH_TYPE;
+ }
+
wdev_lock(dev->ieee80211_ptr);
if (!wdev->current_bss)
ret = -ENOLINK;
@@ -9282,6 +9524,10 @@
u16 reason;
int ret;
+ if (dev->ieee80211_ptr->conn_owner_nlportid &&
+ dev->ieee80211_ptr->conn_owner_nlportid != info->snd_portid)
+ return -EPERM;
+
if (!info->attrs[NL80211_ATTR_REASON_CODE])
reason = WLAN_REASON_DEAUTH_LEAVING;
else
@@ -14028,8 +14274,8 @@
void *hdr;
msg = nlmsg_new(100 + cr->req_ie_len + cr->resp_ie_len +
- cr->fils_kek_len + cr->pmk_len +
- (cr->pmkid ? WLAN_PMKID_LEN : 0), gfp);
+ cr->fils.kek_len + cr->fils.pmk_len +
+ (cr->fils.pmkid ? WLAN_PMKID_LEN : 0), gfp);
if (!msg)
return;
@@ -14055,17 +14301,17 @@
(cr->resp_ie &&
nla_put(msg, NL80211_ATTR_RESP_IE, cr->resp_ie_len,
cr->resp_ie)) ||
- (cr->update_erp_next_seq_num &&
+ (cr->fils.update_erp_next_seq_num &&
nla_put_u16(msg, NL80211_ATTR_FILS_ERP_NEXT_SEQ_NUM,
- cr->fils_erp_next_seq_num)) ||
+ cr->fils.erp_next_seq_num)) ||
(cr->status == WLAN_STATUS_SUCCESS &&
- ((cr->fils_kek &&
- nla_put(msg, NL80211_ATTR_FILS_KEK, cr->fils_kek_len,
- cr->fils_kek)) ||
- (cr->pmk &&
- nla_put(msg, NL80211_ATTR_PMK, cr->pmk_len, cr->pmk)) ||
- (cr->pmkid &&
- nla_put(msg, NL80211_ATTR_PMKID, WLAN_PMKID_LEN, cr->pmkid)))))
+ ((cr->fils.kek &&
+ nla_put(msg, NL80211_ATTR_FILS_KEK, cr->fils.kek_len,
+ cr->fils.kek)) ||
+ (cr->fils.pmk &&
+ nla_put(msg, NL80211_ATTR_PMK, cr->fils.pmk_len, cr->fils.pmk)) ||
+ (cr->fils.pmkid &&
+ nla_put(msg, NL80211_ATTR_PMKID, WLAN_PMKID_LEN, cr->fils.pmkid)))))
goto nla_put_failure;
genlmsg_end(msg, hdr);
@@ -14086,7 +14332,9 @@
void *hdr;
const u8 *bssid = info->bss ? info->bss->bssid : info->bssid;
- msg = nlmsg_new(100 + info->req_ie_len + info->resp_ie_len, gfp);
+ msg = nlmsg_new(100 + info->req_ie_len + info->resp_ie_len +
+ info->fils.kek_len + info->fils.pmk_len +
+ (info->fils.pmkid ? WLAN_PMKID_LEN : 0), gfp);
if (!msg)
return;
@@ -14104,7 +14352,17 @@
info->req_ie)) ||
(info->resp_ie &&
nla_put(msg, NL80211_ATTR_RESP_IE, info->resp_ie_len,
- info->resp_ie)))
+ info->resp_ie)) ||
+ (info->fils.update_erp_next_seq_num &&
+ nla_put_u16(msg, NL80211_ATTR_FILS_ERP_NEXT_SEQ_NUM,
+ info->fils.erp_next_seq_num)) ||
+ (info->fils.kek &&
+ nla_put(msg, NL80211_ATTR_FILS_KEK, info->fils.kek_len,
+ info->fils.kek)) ||
+ (info->fils.pmk &&
+ nla_put(msg, NL80211_ATTR_PMK, info->fils.pmk_len, info->fils.pmk)) ||
+ (info->fils.pmkid &&
+ nla_put(msg, NL80211_ATTR_PMKID, WLAN_PMKID_LEN, info->fils.pmkid)))
goto nla_put_failure;
genlmsg_end(msg, hdr);
@@ -14321,7 +14579,8 @@
nl_freq = nla_nest_start(msg, NL80211_ATTR_FREQ_BEFORE);
if (!nl_freq)
goto nla_put_failure;
- if (nl80211_msg_put_channel(msg, channel_before, false))
+
+ if (nl80211_msg_put_channel(msg, wiphy, channel_before, false))
goto nla_put_failure;
nla_nest_end(msg, nl_freq);
@@ -14329,7 +14588,8 @@
nl_freq = nla_nest_start(msg, NL80211_ATTR_FREQ_AFTER);
if (!nl_freq)
goto nla_put_failure;
- if (nl80211_msg_put_channel(msg, channel_after, false))
+
+ if (nl80211_msg_put_channel(msg, wiphy, channel_after, false))
goto nla_put_failure;
nla_nest_end(msg, nl_freq);
@@ -14456,8 +14716,10 @@
trace_cfg80211_del_sta(dev, mac_addr);
msg = nlmsg_new(NLMSG_DEFAULT_SIZE, gfp);
- if (!msg)
+ if (!msg) {
+ cfg80211_sinfo_release_content(sinfo);
return;
+ }
if (nl80211_send_station(msg, NL80211_CMD_DEL_STATION, 0, 0, 0,
rdev, dev, mac_addr, sinfo) < 0) {
diff --git a/net/wireless/rdev-ops.h b/net/wireless/rdev-ops.h
index 87479a5..364f5d6 100644
--- a/net/wireless/rdev-ops.h
+++ b/net/wireless/rdev-ops.h
@@ -586,6 +586,18 @@
return ret;
}
+static inline int
+rdev_get_txq_stats(struct cfg80211_registered_device *rdev,
+ struct wireless_dev *wdev,
+ struct cfg80211_txq_stats *txqstats)
+{
+ int ret;
+ trace_rdev_get_txq_stats(&rdev->wiphy, wdev);
+ ret = rdev->ops->get_txq_stats(&rdev->wiphy, wdev, txqstats);
+ trace_rdev_return_int(&rdev->wiphy, ret);
+ return ret;
+}
+
static inline void rdev_rfkill_poll(struct cfg80211_registered_device *rdev)
{
trace_rdev_rfkill_poll(&rdev->wiphy);
diff --git a/net/wireless/reg.c b/net/wireless/reg.c
index 5fcec5c..bbe6298 100644
--- a/net/wireless/reg.c
+++ b/net/wireless/reg.c
@@ -1656,7 +1656,7 @@
case NL80211_REGDOM_SET_BY_DRIVER:
return "driver";
case NL80211_REGDOM_SET_BY_COUNTRY_IE:
- return "country IE";
+ return "country element";
default:
WARN_ON(1);
return "bug";
@@ -2622,7 +2622,7 @@
* This doesn't happen yet, not sure we
* ever want to support it for this case.
*/
- WARN_ONCE(1, "Unexpected intersection for country IEs");
+ WARN_ONCE(1, "Unexpected intersection for country elements");
return REG_REQ_IGNORE;
}
@@ -2772,6 +2772,21 @@
reg_free_request(reg_request);
}
+static void notify_self_managed_wiphys(struct regulatory_request *request)
+{
+ struct cfg80211_registered_device *rdev;
+ struct wiphy *wiphy;
+
+ list_for_each_entry(rdev, &cfg80211_rdev_list, list) {
+ wiphy = &rdev->wiphy;
+ if (wiphy->regulatory_flags & REGULATORY_WIPHY_SELF_MANAGED &&
+ request->initiator == NL80211_REGDOM_SET_BY_USER &&
+ request->user_reg_hint_type ==
+ NL80211_USER_REG_HINT_CELL_BASE)
+ reg_call_notifier(wiphy, request);
+ }
+}
+
static bool reg_only_self_managed_wiphys(void)
{
struct cfg80211_registered_device *rdev;
@@ -2823,6 +2838,7 @@
spin_unlock(®_requests_lock);
+ notify_self_managed_wiphys(reg_request);
if (reg_only_self_managed_wiphys()) {
reg_free_request(reg_request);
return;
@@ -3387,7 +3403,7 @@
case NL80211_DFS_JP:
return true;
default:
- pr_debug("Ignoring uknown DFS master region: %d\n", dfs_region);
+ pr_debug("Ignoring unknown DFS master region: %d\n", dfs_region);
return false;
}
}
@@ -3702,17 +3718,26 @@
void wiphy_regulatory_register(struct wiphy *wiphy)
{
- struct regulatory_request *lr;
+ struct regulatory_request *lr = get_last_request();
- /* self-managed devices ignore external hints */
- if (wiphy->regulatory_flags & REGULATORY_WIPHY_SELF_MANAGED)
+ /* self-managed devices ignore beacon hints and country IE */
+ if (wiphy->regulatory_flags & REGULATORY_WIPHY_SELF_MANAGED) {
wiphy->regulatory_flags |= REGULATORY_DISABLE_BEACON_HINTS |
REGULATORY_COUNTRY_IE_IGNORE;
+ /*
+ * The last request may have been received before this
+ * registration call. Call the driver notifier if
+ * initiator is USER and user type is CELL_BASE.
+ */
+ if (lr->initiator == NL80211_REGDOM_SET_BY_USER &&
+ lr->user_reg_hint_type == NL80211_USER_REG_HINT_CELL_BASE)
+ reg_call_notifier(wiphy, lr);
+ }
+
if (!reg_dev_ignore_cell_hint(wiphy))
reg_num_devs_support_basehint++;
- lr = get_last_request();
wiphy_update_regulatory(wiphy, lr->initiator);
wiphy_all_share_dfs_chan_state(wiphy);
}
diff --git a/net/wireless/sme.c b/net/wireless/sme.c
index 5df6b33..d536b07 100644
--- a/net/wireless/sme.c
+++ b/net/wireless/sme.c
@@ -803,8 +803,8 @@
ev = kzalloc(sizeof(*ev) + (params->bssid ? ETH_ALEN : 0) +
params->req_ie_len + params->resp_ie_len +
- params->fils_kek_len + params->pmk_len +
- (params->pmkid ? WLAN_PMKID_LEN : 0), gfp);
+ params->fils.kek_len + params->fils.pmk_len +
+ (params->fils.pmkid ? WLAN_PMKID_LEN : 0), gfp);
if (!ev) {
cfg80211_put_bss(wdev->wiphy, params->bss);
return;
@@ -831,27 +831,29 @@
params->resp_ie_len);
next += params->resp_ie_len;
}
- if (params->fils_kek_len) {
- ev->cr.fils_kek = next;
- ev->cr.fils_kek_len = params->fils_kek_len;
- memcpy((void *)ev->cr.fils_kek, params->fils_kek,
- params->fils_kek_len);
- next += params->fils_kek_len;
+ if (params->fils.kek_len) {
+ ev->cr.fils.kek = next;
+ ev->cr.fils.kek_len = params->fils.kek_len;
+ memcpy((void *)ev->cr.fils.kek, params->fils.kek,
+ params->fils.kek_len);
+ next += params->fils.kek_len;
}
- if (params->pmk_len) {
- ev->cr.pmk = next;
- ev->cr.pmk_len = params->pmk_len;
- memcpy((void *)ev->cr.pmk, params->pmk, params->pmk_len);
- next += params->pmk_len;
+ if (params->fils.pmk_len) {
+ ev->cr.fils.pmk = next;
+ ev->cr.fils.pmk_len = params->fils.pmk_len;
+ memcpy((void *)ev->cr.fils.pmk, params->fils.pmk,
+ params->fils.pmk_len);
+ next += params->fils.pmk_len;
}
- if (params->pmkid) {
- ev->cr.pmkid = next;
- memcpy((void *)ev->cr.pmkid, params->pmkid, WLAN_PMKID_LEN);
+ if (params->fils.pmkid) {
+ ev->cr.fils.pmkid = next;
+ memcpy((void *)ev->cr.fils.pmkid, params->fils.pmkid,
+ WLAN_PMKID_LEN);
next += WLAN_PMKID_LEN;
}
- ev->cr.update_erp_next_seq_num = params->update_erp_next_seq_num;
- if (params->update_erp_next_seq_num)
- ev->cr.fils_erp_next_seq_num = params->fils_erp_next_seq_num;
+ ev->cr.fils.update_erp_next_seq_num = params->fils.update_erp_next_seq_num;
+ if (params->fils.update_erp_next_seq_num)
+ ev->cr.fils.erp_next_seq_num = params->fils.erp_next_seq_num;
if (params->bss)
cfg80211_hold_bss(bss_from_pub(params->bss));
ev->cr.bss = params->bss;
@@ -930,6 +932,7 @@
struct cfg80211_registered_device *rdev = wiphy_to_rdev(wdev->wiphy);
struct cfg80211_event *ev;
unsigned long flags;
+ u8 *next;
if (!info->bss) {
info->bss = cfg80211_get_bss(wdev->wiphy, info->channel,
@@ -942,19 +945,52 @@
if (WARN_ON(!info->bss))
return;
- ev = kzalloc(sizeof(*ev) + info->req_ie_len + info->resp_ie_len, gfp);
+ ev = kzalloc(sizeof(*ev) + info->req_ie_len + info->resp_ie_len +
+ info->fils.kek_len + info->fils.pmk_len +
+ (info->fils.pmkid ? WLAN_PMKID_LEN : 0), gfp);
if (!ev) {
cfg80211_put_bss(wdev->wiphy, info->bss);
return;
}
ev->type = EVENT_ROAMED;
- ev->rm.req_ie = ((u8 *)ev) + sizeof(*ev);
- ev->rm.req_ie_len = info->req_ie_len;
- memcpy((void *)ev->rm.req_ie, info->req_ie, info->req_ie_len);
- ev->rm.resp_ie = ((u8 *)ev) + sizeof(*ev) + info->req_ie_len;
- ev->rm.resp_ie_len = info->resp_ie_len;
- memcpy((void *)ev->rm.resp_ie, info->resp_ie, info->resp_ie_len);
+ next = ((u8 *)ev) + sizeof(*ev);
+ if (info->req_ie_len) {
+ ev->rm.req_ie = next;
+ ev->rm.req_ie_len = info->req_ie_len;
+ memcpy((void *)ev->rm.req_ie, info->req_ie, info->req_ie_len);
+ next += info->req_ie_len;
+ }
+ if (info->resp_ie_len) {
+ ev->rm.resp_ie = next;
+ ev->rm.resp_ie_len = info->resp_ie_len;
+ memcpy((void *)ev->rm.resp_ie, info->resp_ie,
+ info->resp_ie_len);
+ next += info->resp_ie_len;
+ }
+ if (info->fils.kek_len) {
+ ev->rm.fils.kek = next;
+ ev->rm.fils.kek_len = info->fils.kek_len;
+ memcpy((void *)ev->rm.fils.kek, info->fils.kek,
+ info->fils.kek_len);
+ next += info->fils.kek_len;
+ }
+ if (info->fils.pmk_len) {
+ ev->rm.fils.pmk = next;
+ ev->rm.fils.pmk_len = info->fils.pmk_len;
+ memcpy((void *)ev->rm.fils.pmk, info->fils.pmk,
+ info->fils.pmk_len);
+ next += info->fils.pmk_len;
+ }
+ if (info->fils.pmkid) {
+ ev->rm.fils.pmkid = next;
+ memcpy((void *)ev->rm.fils.pmkid, info->fils.pmkid,
+ WLAN_PMKID_LEN);
+ next += WLAN_PMKID_LEN;
+ }
+ ev->rm.fils.update_erp_next_seq_num = info->fils.update_erp_next_seq_num;
+ if (info->fils.update_erp_next_seq_num)
+ ev->rm.fils.erp_next_seq_num = info->fils.erp_next_seq_num;
ev->rm.bss = info->bss;
spin_lock_irqsave(&wdev->event_lock, flags);
diff --git a/net/wireless/trace.h b/net/wireless/trace.h
index 55fb279..2b417a2 100644
--- a/net/wireless/trace.h
+++ b/net/wireless/trace.h
@@ -3243,6 +3243,20 @@
WIPHY_PR_ARG, NETDEV_PR_ARG,
BOOL_TO_STR(__entry->enabled))
);
+
+TRACE_EVENT(rdev_get_txq_stats,
+ TP_PROTO(struct wiphy *wiphy, struct wireless_dev *wdev),
+ TP_ARGS(wiphy, wdev),
+ TP_STRUCT__entry(
+ WIPHY_ENTRY
+ WDEV_ENTRY
+ ),
+ TP_fast_assign(
+ WIPHY_ASSIGN;
+ WDEV_ASSIGN;
+ ),
+ TP_printk(WIPHY_PR_FMT ", " WDEV_PR_FMT, WIPHY_PR_ARG, WDEV_PR_ARG)
+);
#endif /* !__RDEV_OPS_TRACE || TRACE_HEADER_MULTI_READ */
#undef TRACE_INCLUDE_PATH
diff --git a/net/wireless/util.c b/net/wireless/util.c
index d112e9a..b5bb1c3 100644
--- a/net/wireless/util.c
+++ b/net/wireless/util.c
@@ -1787,6 +1787,17 @@
return false;
}
+int cfg80211_sinfo_alloc_tid_stats(struct station_info *sinfo, gfp_t gfp)
+{
+ sinfo->pertid = kcalloc(sizeof(*(sinfo->pertid)),
+ IEEE80211_NUM_TIDS + 1, gfp);
+ if (!sinfo->pertid)
+ return -ENOMEM;
+
+ return 0;
+}
+EXPORT_SYMBOL(cfg80211_sinfo_alloc_tid_stats);
+
/* See IEEE 802.1H for LLC/SNAP encapsulation/decapsulation */
/* Ethernet-II snap header (RFC1042 for most EtherTypes) */
const unsigned char rfc1042_header[] __aligned(2) =
diff --git a/net/xdp/Kconfig b/net/xdp/Kconfig
new file mode 100644
index 0000000..90e4a71
--- /dev/null
+++ b/net/xdp/Kconfig
@@ -0,0 +1,7 @@
+config XDP_SOCKETS
+ bool "XDP sockets"
+ depends on BPF_SYSCALL
+ default n
+ help
+ XDP sockets allows a channel between XDP programs and
+ userspace applications.
diff --git a/net/xdp/Makefile b/net/xdp/Makefile
new file mode 100644
index 0000000..04f0731
--- /dev/null
+++ b/net/xdp/Makefile
@@ -0,0 +1 @@
+obj-$(CONFIG_XDP_SOCKETS) += xsk.o xdp_umem.o xsk_queue.o
diff --git a/net/xdp/xdp_umem.c b/net/xdp/xdp_umem.c
new file mode 100644
index 0000000..8799881
--- /dev/null
+++ b/net/xdp/xdp_umem.c
@@ -0,0 +1,248 @@
+// SPDX-License-Identifier: GPL-2.0
+/* XDP user-space packet buffer
+ * Copyright(c) 2018 Intel Corporation.
+ */
+
+#include <linux/init.h>
+#include <linux/sched/mm.h>
+#include <linux/sched/signal.h>
+#include <linux/sched/task.h>
+#include <linux/uaccess.h>
+#include <linux/slab.h>
+#include <linux/bpf.h>
+#include <linux/mm.h>
+
+#include "xdp_umem.h"
+
+#define XDP_UMEM_MIN_FRAME_SIZE 2048
+
+static void xdp_umem_unpin_pages(struct xdp_umem *umem)
+{
+ unsigned int i;
+
+ for (i = 0; i < umem->npgs; i++) {
+ struct page *page = umem->pgs[i];
+
+ set_page_dirty_lock(page);
+ put_page(page);
+ }
+
+ kfree(umem->pgs);
+ umem->pgs = NULL;
+}
+
+static void xdp_umem_unaccount_pages(struct xdp_umem *umem)
+{
+ atomic_long_sub(umem->npgs, &umem->user->locked_vm);
+ free_uid(umem->user);
+}
+
+static void xdp_umem_release(struct xdp_umem *umem)
+{
+ struct task_struct *task;
+ struct mm_struct *mm;
+
+ if (umem->fq) {
+ xskq_destroy(umem->fq);
+ umem->fq = NULL;
+ }
+
+ if (umem->cq) {
+ xskq_destroy(umem->cq);
+ umem->cq = NULL;
+ }
+
+ xdp_umem_unpin_pages(umem);
+
+ task = get_pid_task(umem->pid, PIDTYPE_PID);
+ put_pid(umem->pid);
+ if (!task)
+ goto out;
+ mm = get_task_mm(task);
+ put_task_struct(task);
+ if (!mm)
+ goto out;
+
+ mmput(mm);
+ xdp_umem_unaccount_pages(umem);
+out:
+ kfree(umem);
+}
+
+static void xdp_umem_release_deferred(struct work_struct *work)
+{
+ struct xdp_umem *umem = container_of(work, struct xdp_umem, work);
+
+ xdp_umem_release(umem);
+}
+
+void xdp_get_umem(struct xdp_umem *umem)
+{
+ refcount_inc(&umem->users);
+}
+
+void xdp_put_umem(struct xdp_umem *umem)
+{
+ if (!umem)
+ return;
+
+ if (refcount_dec_and_test(&umem->users)) {
+ INIT_WORK(&umem->work, xdp_umem_release_deferred);
+ schedule_work(&umem->work);
+ }
+}
+
+static int xdp_umem_pin_pages(struct xdp_umem *umem)
+{
+ unsigned int gup_flags = FOLL_WRITE;
+ long npgs;
+ int err;
+
+ umem->pgs = kcalloc(umem->npgs, sizeof(*umem->pgs), GFP_KERNEL);
+ if (!umem->pgs)
+ return -ENOMEM;
+
+ down_write(¤t->mm->mmap_sem);
+ npgs = get_user_pages(umem->address, umem->npgs,
+ gup_flags, &umem->pgs[0], NULL);
+ up_write(¤t->mm->mmap_sem);
+
+ if (npgs != umem->npgs) {
+ if (npgs >= 0) {
+ umem->npgs = npgs;
+ err = -ENOMEM;
+ goto out_pin;
+ }
+ err = npgs;
+ goto out_pgs;
+ }
+ return 0;
+
+out_pin:
+ xdp_umem_unpin_pages(umem);
+out_pgs:
+ kfree(umem->pgs);
+ umem->pgs = NULL;
+ return err;
+}
+
+static int xdp_umem_account_pages(struct xdp_umem *umem)
+{
+ unsigned long lock_limit, new_npgs, old_npgs;
+
+ if (capable(CAP_IPC_LOCK))
+ return 0;
+
+ lock_limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT;
+ umem->user = get_uid(current_user());
+
+ do {
+ old_npgs = atomic_long_read(&umem->user->locked_vm);
+ new_npgs = old_npgs + umem->npgs;
+ if (new_npgs > lock_limit) {
+ free_uid(umem->user);
+ umem->user = NULL;
+ return -ENOBUFS;
+ }
+ } while (atomic_long_cmpxchg(&umem->user->locked_vm, old_npgs,
+ new_npgs) != old_npgs);
+ return 0;
+}
+
+static int xdp_umem_reg(struct xdp_umem *umem, struct xdp_umem_reg *mr)
+{
+ u32 frame_size = mr->frame_size, frame_headroom = mr->frame_headroom;
+ u64 addr = mr->addr, size = mr->len;
+ unsigned int nframes, nfpp;
+ int size_chk, err;
+
+ if (frame_size < XDP_UMEM_MIN_FRAME_SIZE || frame_size > PAGE_SIZE) {
+ /* Strictly speaking we could support this, if:
+ * - huge pages, or*
+ * - using an IOMMU, or
+ * - making sure the memory area is consecutive
+ * but for now, we simply say "computer says no".
+ */
+ return -EINVAL;
+ }
+
+ if (!is_power_of_2(frame_size))
+ return -EINVAL;
+
+ if (!PAGE_ALIGNED(addr)) {
+ /* Memory area has to be page size aligned. For
+ * simplicity, this might change.
+ */
+ return -EINVAL;
+ }
+
+ if ((addr + size) < addr)
+ return -EINVAL;
+
+ nframes = (unsigned int)div_u64(size, frame_size);
+ if (nframes == 0 || nframes > UINT_MAX)
+ return -EINVAL;
+
+ nfpp = PAGE_SIZE / frame_size;
+ if (nframes < nfpp || nframes % nfpp)
+ return -EINVAL;
+
+ frame_headroom = ALIGN(frame_headroom, 64);
+
+ size_chk = frame_size - frame_headroom - XDP_PACKET_HEADROOM;
+ if (size_chk < 0)
+ return -EINVAL;
+
+ umem->pid = get_task_pid(current, PIDTYPE_PID);
+ umem->size = (size_t)size;
+ umem->address = (unsigned long)addr;
+ umem->props.frame_size = frame_size;
+ umem->props.nframes = nframes;
+ umem->frame_headroom = frame_headroom;
+ umem->npgs = size / PAGE_SIZE;
+ umem->pgs = NULL;
+ umem->user = NULL;
+
+ umem->frame_size_log2 = ilog2(frame_size);
+ umem->nfpp_mask = nfpp - 1;
+ umem->nfpplog2 = ilog2(nfpp);
+ refcount_set(&umem->users, 1);
+
+ err = xdp_umem_account_pages(umem);
+ if (err)
+ goto out;
+
+ err = xdp_umem_pin_pages(umem);
+ if (err)
+ goto out_account;
+ return 0;
+
+out_account:
+ xdp_umem_unaccount_pages(umem);
+out:
+ put_pid(umem->pid);
+ return err;
+}
+
+struct xdp_umem *xdp_umem_create(struct xdp_umem_reg *mr)
+{
+ struct xdp_umem *umem;
+ int err;
+
+ umem = kzalloc(sizeof(*umem), GFP_KERNEL);
+ if (!umem)
+ return ERR_PTR(-ENOMEM);
+
+ err = xdp_umem_reg(umem, mr);
+ if (err) {
+ kfree(umem);
+ return ERR_PTR(err);
+ }
+
+ return umem;
+}
+
+bool xdp_umem_validate_queues(struct xdp_umem *umem)
+{
+ return umem->fq && umem->cq;
+}
diff --git a/net/xdp/xdp_umem.h b/net/xdp/xdp_umem.h
new file mode 100644
index 0000000..0881cf4
--- /dev/null
+++ b/net/xdp/xdp_umem.h
@@ -0,0 +1,57 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/* XDP user-space packet buffer
+ * Copyright(c) 2018 Intel Corporation.
+ */
+
+#ifndef XDP_UMEM_H_
+#define XDP_UMEM_H_
+
+#include <linux/mm.h>
+#include <linux/if_xdp.h>
+#include <linux/workqueue.h>
+
+#include "xsk_queue.h"
+#include "xdp_umem_props.h"
+
+struct xdp_umem {
+ struct xsk_queue *fq;
+ struct xsk_queue *cq;
+ struct page **pgs;
+ struct xdp_umem_props props;
+ u32 npgs;
+ u32 frame_headroom;
+ u32 nfpp_mask;
+ u32 nfpplog2;
+ u32 frame_size_log2;
+ struct user_struct *user;
+ struct pid *pid;
+ unsigned long address;
+ size_t size;
+ refcount_t users;
+ struct work_struct work;
+};
+
+static inline char *xdp_umem_get_data(struct xdp_umem *umem, u32 idx)
+{
+ u64 pg, off;
+ char *data;
+
+ pg = idx >> umem->nfpplog2;
+ off = (idx & umem->nfpp_mask) << umem->frame_size_log2;
+
+ data = page_address(umem->pgs[pg]);
+ return data + off;
+}
+
+static inline char *xdp_umem_get_data_with_headroom(struct xdp_umem *umem,
+ u32 idx)
+{
+ return xdp_umem_get_data(umem, idx) + umem->frame_headroom;
+}
+
+bool xdp_umem_validate_queues(struct xdp_umem *umem);
+void xdp_get_umem(struct xdp_umem *umem);
+void xdp_put_umem(struct xdp_umem *umem);
+struct xdp_umem *xdp_umem_create(struct xdp_umem_reg *mr);
+
+#endif /* XDP_UMEM_H_ */
diff --git a/net/xdp/xdp_umem_props.h b/net/xdp/xdp_umem_props.h
new file mode 100644
index 0000000..2cf8ec4
--- /dev/null
+++ b/net/xdp/xdp_umem_props.h
@@ -0,0 +1,14 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/* XDP user-space packet buffer
+ * Copyright(c) 2018 Intel Corporation.
+ */
+
+#ifndef XDP_UMEM_PROPS_H_
+#define XDP_UMEM_PROPS_H_
+
+struct xdp_umem_props {
+ u32 frame_size;
+ u32 nframes;
+};
+
+#endif /* XDP_UMEM_PROPS_H_ */
diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
new file mode 100644
index 0000000..cce0e4f
--- /dev/null
+++ b/net/xdp/xsk.c
@@ -0,0 +1,674 @@
+// SPDX-License-Identifier: GPL-2.0
+/* XDP sockets
+ *
+ * AF_XDP sockets allows a channel between XDP programs and userspace
+ * applications.
+ * Copyright(c) 2018 Intel Corporation.
+ *
+ * Author(s): Björn Töpel <bjorn.topel@intel.com>
+ * Magnus Karlsson <magnus.karlsson@intel.com>
+ */
+
+#define pr_fmt(fmt) "AF_XDP: %s: " fmt, __func__
+
+#include <linux/if_xdp.h>
+#include <linux/init.h>
+#include <linux/sched/mm.h>
+#include <linux/sched/signal.h>
+#include <linux/sched/task.h>
+#include <linux/socket.h>
+#include <linux/file.h>
+#include <linux/uaccess.h>
+#include <linux/net.h>
+#include <linux/netdevice.h>
+#include <net/xdp_sock.h>
+#include <net/xdp.h>
+
+#include "xsk_queue.h"
+#include "xdp_umem.h"
+
+#define TX_BATCH_SIZE 16
+
+static struct xdp_sock *xdp_sk(struct sock *sk)
+{
+ return (struct xdp_sock *)sk;
+}
+
+bool xsk_is_setup_for_bpf_map(struct xdp_sock *xs)
+{
+ return !!xs->rx;
+}
+
+static int __xsk_rcv(struct xdp_sock *xs, struct xdp_buff *xdp)
+{
+ u32 *id, len = xdp->data_end - xdp->data;
+ void *buffer;
+ int err = 0;
+
+ if (xs->dev != xdp->rxq->dev || xs->queue_id != xdp->rxq->queue_index)
+ return -EINVAL;
+
+ id = xskq_peek_id(xs->umem->fq);
+ if (!id)
+ return -ENOSPC;
+
+ buffer = xdp_umem_get_data_with_headroom(xs->umem, *id);
+ memcpy(buffer, xdp->data, len);
+ err = xskq_produce_batch_desc(xs->rx, *id, len,
+ xs->umem->frame_headroom);
+ if (!err)
+ xskq_discard_id(xs->umem->fq);
+
+ return err;
+}
+
+int xsk_rcv(struct xdp_sock *xs, struct xdp_buff *xdp)
+{
+ int err;
+
+ err = __xsk_rcv(xs, xdp);
+ if (likely(!err))
+ xdp_return_buff(xdp);
+ else
+ xs->rx_dropped++;
+
+ return err;
+}
+
+void xsk_flush(struct xdp_sock *xs)
+{
+ xskq_produce_flush_desc(xs->rx);
+ xs->sk.sk_data_ready(&xs->sk);
+}
+
+int xsk_generic_rcv(struct xdp_sock *xs, struct xdp_buff *xdp)
+{
+ int err;
+
+ err = __xsk_rcv(xs, xdp);
+ if (!err)
+ xsk_flush(xs);
+ else
+ xs->rx_dropped++;
+
+ return err;
+}
+
+static void xsk_destruct_skb(struct sk_buff *skb)
+{
+ u32 id = (u32)(long)skb_shinfo(skb)->destructor_arg;
+ struct xdp_sock *xs = xdp_sk(skb->sk);
+
+ WARN_ON_ONCE(xskq_produce_id(xs->umem->cq, id));
+
+ sock_wfree(skb);
+}
+
+static int xsk_generic_xmit(struct sock *sk, struct msghdr *m,
+ size_t total_len)
+{
+ bool need_wait = !(m->msg_flags & MSG_DONTWAIT);
+ u32 max_batch = TX_BATCH_SIZE;
+ struct xdp_sock *xs = xdp_sk(sk);
+ bool sent_frame = false;
+ struct xdp_desc desc;
+ struct sk_buff *skb;
+ int err = 0;
+
+ if (unlikely(!xs->tx))
+ return -ENOBUFS;
+ if (need_wait)
+ return -EOPNOTSUPP;
+
+ mutex_lock(&xs->mutex);
+
+ while (xskq_peek_desc(xs->tx, &desc)) {
+ char *buffer;
+ u32 id, len;
+
+ if (max_batch-- == 0) {
+ err = -EAGAIN;
+ goto out;
+ }
+
+ if (xskq_reserve_id(xs->umem->cq)) {
+ err = -EAGAIN;
+ goto out;
+ }
+
+ len = desc.len;
+ if (unlikely(len > xs->dev->mtu)) {
+ err = -EMSGSIZE;
+ goto out;
+ }
+
+ if (xs->queue_id >= xs->dev->real_num_tx_queues) {
+ err = -ENXIO;
+ goto out;
+ }
+
+ skb = sock_alloc_send_skb(sk, len, !need_wait, &err);
+ if (unlikely(!skb)) {
+ err = -EAGAIN;
+ goto out;
+ }
+
+ skb_put(skb, len);
+ id = desc.idx;
+ buffer = xdp_umem_get_data(xs->umem, id) + desc.offset;
+ err = skb_store_bits(skb, 0, buffer, len);
+ if (unlikely(err)) {
+ kfree_skb(skb);
+ goto out;
+ }
+
+ skb->dev = xs->dev;
+ skb->priority = sk->sk_priority;
+ skb->mark = sk->sk_mark;
+ skb_shinfo(skb)->destructor_arg = (void *)(long)id;
+ skb->destructor = xsk_destruct_skb;
+
+ err = dev_direct_xmit(skb, xs->queue_id);
+ /* Ignore NET_XMIT_CN as packet might have been sent */
+ if (err == NET_XMIT_DROP || err == NETDEV_TX_BUSY) {
+ err = -EAGAIN;
+ /* SKB consumed by dev_direct_xmit() */
+ goto out;
+ }
+
+ sent_frame = true;
+ xskq_discard_desc(xs->tx);
+ }
+
+out:
+ if (sent_frame)
+ sk->sk_write_space(sk);
+
+ mutex_unlock(&xs->mutex);
+ return err;
+}
+
+static int xsk_sendmsg(struct socket *sock, struct msghdr *m, size_t total_len)
+{
+ struct sock *sk = sock->sk;
+ struct xdp_sock *xs = xdp_sk(sk);
+
+ if (unlikely(!xs->dev))
+ return -ENXIO;
+ if (unlikely(!(xs->dev->flags & IFF_UP)))
+ return -ENETDOWN;
+
+ return xsk_generic_xmit(sk, m, total_len);
+}
+
+static unsigned int xsk_poll(struct file *file, struct socket *sock,
+ struct poll_table_struct *wait)
+{
+ unsigned int mask = datagram_poll(file, sock, wait);
+ struct sock *sk = sock->sk;
+ struct xdp_sock *xs = xdp_sk(sk);
+
+ if (xs->rx && !xskq_empty_desc(xs->rx))
+ mask |= POLLIN | POLLRDNORM;
+ if (xs->tx && !xskq_full_desc(xs->tx))
+ mask |= POLLOUT | POLLWRNORM;
+
+ return mask;
+}
+
+static int xsk_init_queue(u32 entries, struct xsk_queue **queue,
+ bool umem_queue)
+{
+ struct xsk_queue *q;
+
+ if (entries == 0 || *queue || !is_power_of_2(entries))
+ return -EINVAL;
+
+ q = xskq_create(entries, umem_queue);
+ if (!q)
+ return -ENOMEM;
+
+ /* Make sure queue is ready before it can be seen by others */
+ smp_wmb();
+ *queue = q;
+ return 0;
+}
+
+static int xsk_release(struct socket *sock)
+{
+ struct sock *sk = sock->sk;
+ struct xdp_sock *xs = xdp_sk(sk);
+ struct net *net;
+
+ if (!sk)
+ return 0;
+
+ net = sock_net(sk);
+
+ local_bh_disable();
+ sock_prot_inuse_add(net, sk->sk_prot, -1);
+ local_bh_enable();
+
+ if (xs->dev) {
+ /* Wait for driver to stop using the xdp socket. */
+ synchronize_net();
+ dev_put(xs->dev);
+ xs->dev = NULL;
+ }
+
+ sock_orphan(sk);
+ sock->sk = NULL;
+
+ sk_refcnt_debug_release(sk);
+ sock_put(sk);
+
+ return 0;
+}
+
+static struct socket *xsk_lookup_xsk_from_fd(int fd)
+{
+ struct socket *sock;
+ int err;
+
+ sock = sockfd_lookup(fd, &err);
+ if (!sock)
+ return ERR_PTR(-ENOTSOCK);
+
+ if (sock->sk->sk_family != PF_XDP) {
+ sockfd_put(sock);
+ return ERR_PTR(-ENOPROTOOPT);
+ }
+
+ return sock;
+}
+
+static int xsk_bind(struct socket *sock, struct sockaddr *addr, int addr_len)
+{
+ struct sockaddr_xdp *sxdp = (struct sockaddr_xdp *)addr;
+ struct sock *sk = sock->sk;
+ struct xdp_sock *xs = xdp_sk(sk);
+ struct net_device *dev;
+ int err = 0;
+
+ if (addr_len < sizeof(struct sockaddr_xdp))
+ return -EINVAL;
+ if (sxdp->sxdp_family != AF_XDP)
+ return -EINVAL;
+
+ mutex_lock(&xs->mutex);
+ if (xs->dev) {
+ err = -EBUSY;
+ goto out_release;
+ }
+
+ dev = dev_get_by_index(sock_net(sk), sxdp->sxdp_ifindex);
+ if (!dev) {
+ err = -ENODEV;
+ goto out_release;
+ }
+
+ if (!xs->rx && !xs->tx) {
+ err = -EINVAL;
+ goto out_unlock;
+ }
+
+ if ((xs->rx && sxdp->sxdp_queue_id >= dev->real_num_rx_queues) ||
+ (xs->tx && sxdp->sxdp_queue_id >= dev->real_num_tx_queues)) {
+ err = -EINVAL;
+ goto out_unlock;
+ }
+
+ if (sxdp->sxdp_flags & XDP_SHARED_UMEM) {
+ struct xdp_sock *umem_xs;
+ struct socket *sock;
+
+ if (xs->umem) {
+ /* We have already our own. */
+ err = -EINVAL;
+ goto out_unlock;
+ }
+
+ sock = xsk_lookup_xsk_from_fd(sxdp->sxdp_shared_umem_fd);
+ if (IS_ERR(sock)) {
+ err = PTR_ERR(sock);
+ goto out_unlock;
+ }
+
+ umem_xs = xdp_sk(sock->sk);
+ if (!umem_xs->umem) {
+ /* No umem to inherit. */
+ err = -EBADF;
+ sockfd_put(sock);
+ goto out_unlock;
+ } else if (umem_xs->dev != dev ||
+ umem_xs->queue_id != sxdp->sxdp_queue_id) {
+ err = -EINVAL;
+ sockfd_put(sock);
+ goto out_unlock;
+ }
+
+ xdp_get_umem(umem_xs->umem);
+ xs->umem = umem_xs->umem;
+ sockfd_put(sock);
+ } else if (!xs->umem || !xdp_umem_validate_queues(xs->umem)) {
+ err = -EINVAL;
+ goto out_unlock;
+ } else {
+ /* This xsk has its own umem. */
+ xskq_set_umem(xs->umem->fq, &xs->umem->props);
+ xskq_set_umem(xs->umem->cq, &xs->umem->props);
+ }
+
+ xs->dev = dev;
+ xs->queue_id = sxdp->sxdp_queue_id;
+
+ xskq_set_umem(xs->rx, &xs->umem->props);
+ xskq_set_umem(xs->tx, &xs->umem->props);
+
+out_unlock:
+ if (err)
+ dev_put(dev);
+out_release:
+ mutex_unlock(&xs->mutex);
+ return err;
+}
+
+static int xsk_setsockopt(struct socket *sock, int level, int optname,
+ char __user *optval, unsigned int optlen)
+{
+ struct sock *sk = sock->sk;
+ struct xdp_sock *xs = xdp_sk(sk);
+ int err;
+
+ if (level != SOL_XDP)
+ return -ENOPROTOOPT;
+
+ switch (optname) {
+ case XDP_RX_RING:
+ case XDP_TX_RING:
+ {
+ struct xsk_queue **q;
+ int entries;
+
+ if (optlen < sizeof(entries))
+ return -EINVAL;
+ if (copy_from_user(&entries, optval, sizeof(entries)))
+ return -EFAULT;
+
+ mutex_lock(&xs->mutex);
+ q = (optname == XDP_TX_RING) ? &xs->tx : &xs->rx;
+ err = xsk_init_queue(entries, q, false);
+ mutex_unlock(&xs->mutex);
+ return err;
+ }
+ case XDP_UMEM_REG:
+ {
+ struct xdp_umem_reg mr;
+ struct xdp_umem *umem;
+
+ if (copy_from_user(&mr, optval, sizeof(mr)))
+ return -EFAULT;
+
+ mutex_lock(&xs->mutex);
+ if (xs->umem) {
+ mutex_unlock(&xs->mutex);
+ return -EBUSY;
+ }
+
+ umem = xdp_umem_create(&mr);
+ if (IS_ERR(umem)) {
+ mutex_unlock(&xs->mutex);
+ return PTR_ERR(umem);
+ }
+
+ /* Make sure umem is ready before it can be seen by others */
+ smp_wmb();
+ xs->umem = umem;
+ mutex_unlock(&xs->mutex);
+ return 0;
+ }
+ case XDP_UMEM_FILL_RING:
+ case XDP_UMEM_COMPLETION_RING:
+ {
+ struct xsk_queue **q;
+ int entries;
+
+ if (copy_from_user(&entries, optval, sizeof(entries)))
+ return -EFAULT;
+
+ mutex_lock(&xs->mutex);
+ if (!xs->umem) {
+ mutex_unlock(&xs->mutex);
+ return -EINVAL;
+ }
+
+ q = (optname == XDP_UMEM_FILL_RING) ? &xs->umem->fq :
+ &xs->umem->cq;
+ err = xsk_init_queue(entries, q, true);
+ mutex_unlock(&xs->mutex);
+ return err;
+ }
+ default:
+ break;
+ }
+
+ return -ENOPROTOOPT;
+}
+
+static int xsk_getsockopt(struct socket *sock, int level, int optname,
+ char __user *optval, int __user *optlen)
+{
+ struct sock *sk = sock->sk;
+ struct xdp_sock *xs = xdp_sk(sk);
+ int len;
+
+ if (level != SOL_XDP)
+ return -ENOPROTOOPT;
+
+ if (get_user(len, optlen))
+ return -EFAULT;
+ if (len < 0)
+ return -EINVAL;
+
+ switch (optname) {
+ case XDP_STATISTICS:
+ {
+ struct xdp_statistics stats;
+
+ if (len < sizeof(stats))
+ return -EINVAL;
+
+ mutex_lock(&xs->mutex);
+ stats.rx_dropped = xs->rx_dropped;
+ stats.rx_invalid_descs = xskq_nb_invalid_descs(xs->rx);
+ stats.tx_invalid_descs = xskq_nb_invalid_descs(xs->tx);
+ mutex_unlock(&xs->mutex);
+
+ if (copy_to_user(optval, &stats, sizeof(stats)))
+ return -EFAULT;
+ if (put_user(sizeof(stats), optlen))
+ return -EFAULT;
+
+ return 0;
+ }
+ case XDP_MMAP_OFFSETS:
+ {
+ struct xdp_mmap_offsets off;
+
+ if (len < sizeof(off))
+ return -EINVAL;
+
+ off.rx.producer = offsetof(struct xdp_rxtx_ring, ptrs.producer);
+ off.rx.consumer = offsetof(struct xdp_rxtx_ring, ptrs.consumer);
+ off.rx.desc = offsetof(struct xdp_rxtx_ring, desc);
+ off.tx.producer = offsetof(struct xdp_rxtx_ring, ptrs.producer);
+ off.tx.consumer = offsetof(struct xdp_rxtx_ring, ptrs.consumer);
+ off.tx.desc = offsetof(struct xdp_rxtx_ring, desc);
+
+ off.fr.producer = offsetof(struct xdp_umem_ring, ptrs.producer);
+ off.fr.consumer = offsetof(struct xdp_umem_ring, ptrs.consumer);
+ off.fr.desc = offsetof(struct xdp_umem_ring, desc);
+ off.cr.producer = offsetof(struct xdp_umem_ring, ptrs.producer);
+ off.cr.consumer = offsetof(struct xdp_umem_ring, ptrs.consumer);
+ off.cr.desc = offsetof(struct xdp_umem_ring, desc);
+
+ len = sizeof(off);
+ if (copy_to_user(optval, &off, len))
+ return -EFAULT;
+ if (put_user(len, optlen))
+ return -EFAULT;
+
+ return 0;
+ }
+ default:
+ break;
+ }
+
+ return -EOPNOTSUPP;
+}
+
+static int xsk_mmap(struct file *file, struct socket *sock,
+ struct vm_area_struct *vma)
+{
+ unsigned long offset = vma->vm_pgoff << PAGE_SHIFT;
+ unsigned long size = vma->vm_end - vma->vm_start;
+ struct xdp_sock *xs = xdp_sk(sock->sk);
+ struct xsk_queue *q = NULL;
+ struct xdp_umem *umem;
+ unsigned long pfn;
+ struct page *qpg;
+
+ if (offset == XDP_PGOFF_RX_RING) {
+ q = READ_ONCE(xs->rx);
+ } else if (offset == XDP_PGOFF_TX_RING) {
+ q = READ_ONCE(xs->tx);
+ } else {
+ umem = READ_ONCE(xs->umem);
+ if (!umem)
+ return -EINVAL;
+
+ if (offset == XDP_UMEM_PGOFF_FILL_RING)
+ q = READ_ONCE(umem->fq);
+ else if (offset == XDP_UMEM_PGOFF_COMPLETION_RING)
+ q = READ_ONCE(umem->cq);
+ }
+
+ if (!q)
+ return -EINVAL;
+
+ qpg = virt_to_head_page(q->ring);
+ if (size > (PAGE_SIZE << compound_order(qpg)))
+ return -EINVAL;
+
+ pfn = virt_to_phys(q->ring) >> PAGE_SHIFT;
+ return remap_pfn_range(vma, vma->vm_start, pfn,
+ size, vma->vm_page_prot);
+}
+
+static struct proto xsk_proto = {
+ .name = "XDP",
+ .owner = THIS_MODULE,
+ .obj_size = sizeof(struct xdp_sock),
+};
+
+static const struct proto_ops xsk_proto_ops = {
+ .family = PF_XDP,
+ .owner = THIS_MODULE,
+ .release = xsk_release,
+ .bind = xsk_bind,
+ .connect = sock_no_connect,
+ .socketpair = sock_no_socketpair,
+ .accept = sock_no_accept,
+ .getname = sock_no_getname,
+ .poll = xsk_poll,
+ .ioctl = sock_no_ioctl,
+ .listen = sock_no_listen,
+ .shutdown = sock_no_shutdown,
+ .setsockopt = xsk_setsockopt,
+ .getsockopt = xsk_getsockopt,
+ .sendmsg = xsk_sendmsg,
+ .recvmsg = sock_no_recvmsg,
+ .mmap = xsk_mmap,
+ .sendpage = sock_no_sendpage,
+};
+
+static void xsk_destruct(struct sock *sk)
+{
+ struct xdp_sock *xs = xdp_sk(sk);
+
+ if (!sock_flag(sk, SOCK_DEAD))
+ return;
+
+ xskq_destroy(xs->rx);
+ xskq_destroy(xs->tx);
+ xdp_put_umem(xs->umem);
+
+ sk_refcnt_debug_dec(sk);
+}
+
+static int xsk_create(struct net *net, struct socket *sock, int protocol,
+ int kern)
+{
+ struct sock *sk;
+ struct xdp_sock *xs;
+
+ if (!ns_capable(net->user_ns, CAP_NET_RAW))
+ return -EPERM;
+ if (sock->type != SOCK_RAW)
+ return -ESOCKTNOSUPPORT;
+
+ if (protocol)
+ return -EPROTONOSUPPORT;
+
+ sock->state = SS_UNCONNECTED;
+
+ sk = sk_alloc(net, PF_XDP, GFP_KERNEL, &xsk_proto, kern);
+ if (!sk)
+ return -ENOBUFS;
+
+ sock->ops = &xsk_proto_ops;
+
+ sock_init_data(sock, sk);
+
+ sk->sk_family = PF_XDP;
+
+ sk->sk_destruct = xsk_destruct;
+ sk_refcnt_debug_inc(sk);
+
+ xs = xdp_sk(sk);
+ mutex_init(&xs->mutex);
+
+ local_bh_disable();
+ sock_prot_inuse_add(net, &xsk_proto, 1);
+ local_bh_enable();
+
+ return 0;
+}
+
+static const struct net_proto_family xsk_family_ops = {
+ .family = PF_XDP,
+ .create = xsk_create,
+ .owner = THIS_MODULE,
+};
+
+static int __init xsk_init(void)
+{
+ int err;
+
+ err = proto_register(&xsk_proto, 0 /* no slab */);
+ if (err)
+ goto out;
+
+ err = sock_register(&xsk_family_ops);
+ if (err)
+ goto out_proto;
+
+ return 0;
+
+out_proto:
+ proto_unregister(&xsk_proto);
+out:
+ return err;
+}
+
+fs_initcall(xsk_init);
diff --git a/net/xdp/xsk_queue.c b/net/xdp/xsk_queue.c
new file mode 100644
index 0000000..ebe85e5
--- /dev/null
+++ b/net/xdp/xsk_queue.c
@@ -0,0 +1,63 @@
+// SPDX-License-Identifier: GPL-2.0
+/* XDP user-space ring structure
+ * Copyright(c) 2018 Intel Corporation.
+ */
+
+#include <linux/slab.h>
+
+#include "xsk_queue.h"
+
+void xskq_set_umem(struct xsk_queue *q, struct xdp_umem_props *umem_props)
+{
+ if (!q)
+ return;
+
+ q->umem_props = *umem_props;
+}
+
+static u32 xskq_umem_get_ring_size(struct xsk_queue *q)
+{
+ return sizeof(struct xdp_umem_ring) + q->nentries * sizeof(u32);
+}
+
+static u32 xskq_rxtx_get_ring_size(struct xsk_queue *q)
+{
+ return sizeof(struct xdp_ring) + q->nentries * sizeof(struct xdp_desc);
+}
+
+struct xsk_queue *xskq_create(u32 nentries, bool umem_queue)
+{
+ struct xsk_queue *q;
+ gfp_t gfp_flags;
+ size_t size;
+
+ q = kzalloc(sizeof(*q), GFP_KERNEL);
+ if (!q)
+ return NULL;
+
+ q->nentries = nentries;
+ q->ring_mask = nentries - 1;
+
+ gfp_flags = GFP_KERNEL | __GFP_ZERO | __GFP_NOWARN |
+ __GFP_COMP | __GFP_NORETRY;
+ size = umem_queue ? xskq_umem_get_ring_size(q) :
+ xskq_rxtx_get_ring_size(q);
+
+ q->ring = (struct xdp_ring *)__get_free_pages(gfp_flags,
+ get_order(size));
+ if (!q->ring) {
+ kfree(q);
+ return NULL;
+ }
+
+ return q;
+}
+
+void xskq_destroy(struct xsk_queue *q)
+{
+ if (!q)
+ return;
+
+ page_frag_free(q->ring);
+ kfree(q);
+}
diff --git a/net/xdp/xsk_queue.h b/net/xdp/xsk_queue.h
new file mode 100644
index 0000000..cb8e5be
--- /dev/null
+++ b/net/xdp/xsk_queue.h
@@ -0,0 +1,255 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/* XDP user-space ring structure
+ * Copyright(c) 2018 Intel Corporation.
+ */
+
+#ifndef _LINUX_XSK_QUEUE_H
+#define _LINUX_XSK_QUEUE_H
+
+#include <linux/types.h>
+#include <linux/if_xdp.h>
+
+#include "xdp_umem_props.h"
+
+#define RX_BATCH_SIZE 16
+
+struct xdp_ring {
+ u32 producer ____cacheline_aligned_in_smp;
+ u32 consumer ____cacheline_aligned_in_smp;
+};
+
+/* Used for the RX and TX queues for packets */
+struct xdp_rxtx_ring {
+ struct xdp_ring ptrs;
+ struct xdp_desc desc[0] ____cacheline_aligned_in_smp;
+};
+
+/* Used for the fill and completion queues for buffers */
+struct xdp_umem_ring {
+ struct xdp_ring ptrs;
+ u32 desc[0] ____cacheline_aligned_in_smp;
+};
+
+struct xsk_queue {
+ struct xdp_umem_props umem_props;
+ u32 ring_mask;
+ u32 nentries;
+ u32 prod_head;
+ u32 prod_tail;
+ u32 cons_head;
+ u32 cons_tail;
+ struct xdp_ring *ring;
+ u64 invalid_descs;
+};
+
+/* Common functions operating for both RXTX and umem queues */
+
+static inline u64 xskq_nb_invalid_descs(struct xsk_queue *q)
+{
+ return q ? q->invalid_descs : 0;
+}
+
+static inline u32 xskq_nb_avail(struct xsk_queue *q, u32 dcnt)
+{
+ u32 entries = q->prod_tail - q->cons_tail;
+
+ if (entries == 0) {
+ /* Refresh the local pointer */
+ q->prod_tail = READ_ONCE(q->ring->producer);
+ entries = q->prod_tail - q->cons_tail;
+ }
+
+ return (entries > dcnt) ? dcnt : entries;
+}
+
+static inline u32 xskq_nb_free(struct xsk_queue *q, u32 producer, u32 dcnt)
+{
+ u32 free_entries = q->nentries - (producer - q->cons_tail);
+
+ if (free_entries >= dcnt)
+ return free_entries;
+
+ /* Refresh the local tail pointer */
+ q->cons_tail = READ_ONCE(q->ring->consumer);
+ return q->nentries - (producer - q->cons_tail);
+}
+
+/* UMEM queue */
+
+static inline bool xskq_is_valid_id(struct xsk_queue *q, u32 idx)
+{
+ if (unlikely(idx >= q->umem_props.nframes)) {
+ q->invalid_descs++;
+ return false;
+ }
+ return true;
+}
+
+static inline u32 *xskq_validate_id(struct xsk_queue *q)
+{
+ while (q->cons_tail != q->cons_head) {
+ struct xdp_umem_ring *ring = (struct xdp_umem_ring *)q->ring;
+ unsigned int idx = q->cons_tail & q->ring_mask;
+
+ if (xskq_is_valid_id(q, ring->desc[idx]))
+ return &ring->desc[idx];
+
+ q->cons_tail++;
+ }
+
+ return NULL;
+}
+
+static inline u32 *xskq_peek_id(struct xsk_queue *q)
+{
+ struct xdp_umem_ring *ring;
+
+ if (q->cons_tail == q->cons_head) {
+ WRITE_ONCE(q->ring->consumer, q->cons_tail);
+ q->cons_head = q->cons_tail + xskq_nb_avail(q, RX_BATCH_SIZE);
+
+ /* Order consumer and data */
+ smp_rmb();
+
+ return xskq_validate_id(q);
+ }
+
+ ring = (struct xdp_umem_ring *)q->ring;
+ return &ring->desc[q->cons_tail & q->ring_mask];
+}
+
+static inline void xskq_discard_id(struct xsk_queue *q)
+{
+ q->cons_tail++;
+ (void)xskq_validate_id(q);
+}
+
+static inline int xskq_produce_id(struct xsk_queue *q, u32 id)
+{
+ struct xdp_umem_ring *ring = (struct xdp_umem_ring *)q->ring;
+
+ ring->desc[q->prod_tail++ & q->ring_mask] = id;
+
+ /* Order producer and data */
+ smp_wmb();
+
+ WRITE_ONCE(q->ring->producer, q->prod_tail);
+ return 0;
+}
+
+static inline int xskq_reserve_id(struct xsk_queue *q)
+{
+ if (xskq_nb_free(q, q->prod_head, 1) == 0)
+ return -ENOSPC;
+
+ q->prod_head++;
+ return 0;
+}
+
+/* Rx/Tx queue */
+
+static inline bool xskq_is_valid_desc(struct xsk_queue *q, struct xdp_desc *d)
+{
+ u32 buff_len;
+
+ if (unlikely(d->idx >= q->umem_props.nframes)) {
+ q->invalid_descs++;
+ return false;
+ }
+
+ buff_len = q->umem_props.frame_size;
+ if (unlikely(d->len > buff_len || d->len == 0 ||
+ d->offset > buff_len || d->offset + d->len > buff_len)) {
+ q->invalid_descs++;
+ return false;
+ }
+
+ return true;
+}
+
+static inline struct xdp_desc *xskq_validate_desc(struct xsk_queue *q,
+ struct xdp_desc *desc)
+{
+ while (q->cons_tail != q->cons_head) {
+ struct xdp_rxtx_ring *ring = (struct xdp_rxtx_ring *)q->ring;
+ unsigned int idx = q->cons_tail & q->ring_mask;
+
+ if (xskq_is_valid_desc(q, &ring->desc[idx])) {
+ if (desc)
+ *desc = ring->desc[idx];
+ return desc;
+ }
+
+ q->cons_tail++;
+ }
+
+ return NULL;
+}
+
+static inline struct xdp_desc *xskq_peek_desc(struct xsk_queue *q,
+ struct xdp_desc *desc)
+{
+ struct xdp_rxtx_ring *ring;
+
+ if (q->cons_tail == q->cons_head) {
+ WRITE_ONCE(q->ring->consumer, q->cons_tail);
+ q->cons_head = q->cons_tail + xskq_nb_avail(q, RX_BATCH_SIZE);
+
+ /* Order consumer and data */
+ smp_rmb();
+
+ return xskq_validate_desc(q, desc);
+ }
+
+ ring = (struct xdp_rxtx_ring *)q->ring;
+ *desc = ring->desc[q->cons_tail & q->ring_mask];
+ return desc;
+}
+
+static inline void xskq_discard_desc(struct xsk_queue *q)
+{
+ q->cons_tail++;
+ (void)xskq_validate_desc(q, NULL);
+}
+
+static inline int xskq_produce_batch_desc(struct xsk_queue *q,
+ u32 id, u32 len, u16 offset)
+{
+ struct xdp_rxtx_ring *ring = (struct xdp_rxtx_ring *)q->ring;
+ unsigned int idx;
+
+ if (xskq_nb_free(q, q->prod_head, 1) == 0)
+ return -ENOSPC;
+
+ idx = (q->prod_head++) & q->ring_mask;
+ ring->desc[idx].idx = id;
+ ring->desc[idx].len = len;
+ ring->desc[idx].offset = offset;
+
+ return 0;
+}
+
+static inline void xskq_produce_flush_desc(struct xsk_queue *q)
+{
+ /* Order producer and data */
+ smp_wmb();
+
+ q->prod_tail = q->prod_head,
+ WRITE_ONCE(q->ring->producer, q->prod_tail);
+}
+
+static inline bool xskq_full_desc(struct xsk_queue *q)
+{
+ return xskq_nb_avail(q, q->nentries) == q->nentries;
+}
+
+static inline bool xskq_empty_desc(struct xsk_queue *q)
+{
+ return xskq_nb_free(q, q->prod_tail, 1) == q->nentries;
+}
+
+void xskq_set_umem(struct xsk_queue *q, struct xdp_umem_props *umem_props);
+struct xsk_queue *xskq_create(u32 nentries, bool umem_queue);
+void xskq_destroy(struct xsk_queue *q_ops);
+
+#endif /* _LINUX_XSK_QUEUE_H */
diff --git a/net/xfrm/xfrm_state.c b/net/xfrm/xfrm_state.c
index 6c177ae..8308281 100644
--- a/net/xfrm/xfrm_state.c
+++ b/net/xfrm/xfrm_state.c
@@ -42,6 +42,7 @@
static unsigned int xfrm_state_hashmax __read_mostly = 1 * 1024 * 1024;
static __read_mostly seqcount_t xfrm_state_hash_generation = SEQCNT_ZERO(xfrm_state_hash_generation);
+static struct kmem_cache *xfrm_state_cache __ro_after_init;
static DECLARE_WORK(xfrm_state_gc_work, xfrm_state_gc_task);
static HLIST_HEAD(xfrm_state_gc_list);
@@ -451,7 +452,7 @@
}
xfrm_dev_state_free(x);
security_xfrm_state_free(x);
- kfree(x);
+ kmem_cache_free(xfrm_state_cache, x);
}
static void xfrm_state_gc_task(struct work_struct *work)
@@ -563,7 +564,7 @@
{
struct xfrm_state *x;
- x = kzalloc(sizeof(struct xfrm_state), GFP_ATOMIC);
+ x = kmem_cache_alloc(xfrm_state_cache, GFP_ATOMIC | __GFP_ZERO);
if (x) {
write_pnet(&x->xs_net, net);
@@ -2313,6 +2314,10 @@
{
unsigned int sz;
+ if (net_eq(net, &init_net))
+ xfrm_state_cache = KMEM_CACHE(xfrm_state,
+ SLAB_HWCACHE_ALIGN | SLAB_PANIC);
+
INIT_LIST_HEAD(&net->xfrm.state_all);
sz = sizeof(struct hlist_head) * 8;
diff --git a/samples/bpf/Makefile b/samples/bpf/Makefile
index 0929476..1303af1 100644
--- a/samples/bpf/Makefile
+++ b/samples/bpf/Makefile
@@ -1,4 +1,8 @@
# SPDX-License-Identifier: GPL-2.0
+
+BPF_SAMPLES_PATH ?= $(abspath $(srctree)/$(src))
+TOOLS_PATH := $(BPF_SAMPLES_PATH)/../../tools
+
# List of programs to build
hostprogs-y := test_lru_dist
hostprogs-y += sock_example
@@ -44,57 +48,65 @@
hostprogs-y += xdp_rxq_info
hostprogs-y += syscall_tp
hostprogs-y += cpustat
+hostprogs-y += xdp_adjust_tail
+hostprogs-y += xdpsock
+hostprogs-y += xdp_fwd
+hostprogs-y += task_fd_query
# Libbpf dependencies
-LIBBPF := ../../tools/lib/bpf/bpf.o ../../tools/lib/bpf/nlattr.o
-CGROUP_HELPERS := ../../tools/testing/selftests/bpf/cgroup_helpers.o
+LIBBPF = $(TOOLS_PATH)/lib/bpf/libbpf.a
-test_lru_dist-objs := test_lru_dist.o $(LIBBPF)
-sock_example-objs := sock_example.o $(LIBBPF)
-fds_example-objs := bpf_load.o $(LIBBPF) fds_example.o
-sockex1-objs := bpf_load.o $(LIBBPF) sockex1_user.o
-sockex2-objs := bpf_load.o $(LIBBPF) sockex2_user.o
-sockex3-objs := bpf_load.o $(LIBBPF) sockex3_user.o
-tracex1-objs := bpf_load.o $(LIBBPF) tracex1_user.o
-tracex2-objs := bpf_load.o $(LIBBPF) tracex2_user.o
-tracex3-objs := bpf_load.o $(LIBBPF) tracex3_user.o
-tracex4-objs := bpf_load.o $(LIBBPF) tracex4_user.o
-tracex5-objs := bpf_load.o $(LIBBPF) tracex5_user.o
-tracex6-objs := bpf_load.o $(LIBBPF) tracex6_user.o
-tracex7-objs := bpf_load.o $(LIBBPF) tracex7_user.o
-load_sock_ops-objs := bpf_load.o $(LIBBPF) load_sock_ops.o
-test_probe_write_user-objs := bpf_load.o $(LIBBPF) test_probe_write_user_user.o
-trace_output-objs := bpf_load.o $(LIBBPF) trace_output_user.o
-lathist-objs := bpf_load.o $(LIBBPF) lathist_user.o
-offwaketime-objs := bpf_load.o $(LIBBPF) offwaketime_user.o
-spintest-objs := bpf_load.o $(LIBBPF) spintest_user.o
-map_perf_test-objs := bpf_load.o $(LIBBPF) map_perf_test_user.o
-test_overhead-objs := bpf_load.o $(LIBBPF) test_overhead_user.o
-test_cgrp2_array_pin-objs := $(LIBBPF) test_cgrp2_array_pin.o
-test_cgrp2_attach-objs := $(LIBBPF) test_cgrp2_attach.o
-test_cgrp2_attach2-objs := $(LIBBPF) test_cgrp2_attach2.o $(CGROUP_HELPERS)
-test_cgrp2_sock-objs := $(LIBBPF) test_cgrp2_sock.o
-test_cgrp2_sock2-objs := bpf_load.o $(LIBBPF) test_cgrp2_sock2.o
-xdp1-objs := bpf_load.o $(LIBBPF) xdp1_user.o
+CGROUP_HELPERS := ../../tools/testing/selftests/bpf/cgroup_helpers.o
+TRACE_HELPERS := ../../tools/testing/selftests/bpf/trace_helpers.o
+
+fds_example-objs := bpf_load.o fds_example.o
+sockex1-objs := bpf_load.o sockex1_user.o
+sockex2-objs := bpf_load.o sockex2_user.o
+sockex3-objs := bpf_load.o sockex3_user.o
+tracex1-objs := bpf_load.o tracex1_user.o
+tracex2-objs := bpf_load.o tracex2_user.o
+tracex3-objs := bpf_load.o tracex3_user.o
+tracex4-objs := bpf_load.o tracex4_user.o
+tracex5-objs := bpf_load.o tracex5_user.o
+tracex6-objs := bpf_load.o tracex6_user.o
+tracex7-objs := bpf_load.o tracex7_user.o
+load_sock_ops-objs := bpf_load.o load_sock_ops.o
+test_probe_write_user-objs := bpf_load.o test_probe_write_user_user.o
+trace_output-objs := bpf_load.o trace_output_user.o $(TRACE_HELPERS)
+lathist-objs := bpf_load.o lathist_user.o
+offwaketime-objs := bpf_load.o offwaketime_user.o $(TRACE_HELPERS)
+spintest-objs := bpf_load.o spintest_user.o $(TRACE_HELPERS)
+map_perf_test-objs := bpf_load.o map_perf_test_user.o
+test_overhead-objs := bpf_load.o test_overhead_user.o
+test_cgrp2_array_pin-objs := test_cgrp2_array_pin.o
+test_cgrp2_attach-objs := test_cgrp2_attach.o
+test_cgrp2_attach2-objs := test_cgrp2_attach2.o $(CGROUP_HELPERS)
+test_cgrp2_sock-objs := test_cgrp2_sock.o
+test_cgrp2_sock2-objs := bpf_load.o test_cgrp2_sock2.o
+xdp1-objs := xdp1_user.o
# reuse xdp1 source intentionally
-xdp2-objs := bpf_load.o $(LIBBPF) xdp1_user.o
-xdp_router_ipv4-objs := bpf_load.o $(LIBBPF) xdp_router_ipv4_user.o
-test_current_task_under_cgroup-objs := bpf_load.o $(LIBBPF) $(CGROUP_HELPERS) \
+xdp2-objs := xdp1_user.o
+xdp_router_ipv4-objs := bpf_load.o xdp_router_ipv4_user.o
+test_current_task_under_cgroup-objs := bpf_load.o $(CGROUP_HELPERS) \
test_current_task_under_cgroup_user.o
-trace_event-objs := bpf_load.o $(LIBBPF) trace_event_user.o
-sampleip-objs := bpf_load.o $(LIBBPF) sampleip_user.o
-tc_l2_redirect-objs := bpf_load.o $(LIBBPF) tc_l2_redirect_user.o
-lwt_len_hist-objs := bpf_load.o $(LIBBPF) lwt_len_hist_user.o
-xdp_tx_iptunnel-objs := bpf_load.o $(LIBBPF) xdp_tx_iptunnel_user.o
-test_map_in_map-objs := bpf_load.o $(LIBBPF) test_map_in_map_user.o
-per_socket_stats_example-objs := $(LIBBPF) cookie_uid_helper_example.o
-xdp_redirect-objs := bpf_load.o $(LIBBPF) xdp_redirect_user.o
-xdp_redirect_map-objs := bpf_load.o $(LIBBPF) xdp_redirect_map_user.o
-xdp_redirect_cpu-objs := bpf_load.o $(LIBBPF) xdp_redirect_cpu_user.o
-xdp_monitor-objs := bpf_load.o $(LIBBPF) xdp_monitor_user.o
-xdp_rxq_info-objs := bpf_load.o $(LIBBPF) xdp_rxq_info_user.o
-syscall_tp-objs := bpf_load.o $(LIBBPF) syscall_tp_user.o
-cpustat-objs := bpf_load.o $(LIBBPF) cpustat_user.o
+trace_event-objs := bpf_load.o trace_event_user.o $(TRACE_HELPERS)
+sampleip-objs := bpf_load.o sampleip_user.o $(TRACE_HELPERS)
+tc_l2_redirect-objs := bpf_load.o tc_l2_redirect_user.o
+lwt_len_hist-objs := bpf_load.o lwt_len_hist_user.o
+xdp_tx_iptunnel-objs := bpf_load.o xdp_tx_iptunnel_user.o
+test_map_in_map-objs := bpf_load.o test_map_in_map_user.o
+per_socket_stats_example-objs := cookie_uid_helper_example.o
+xdp_redirect-objs := bpf_load.o xdp_redirect_user.o
+xdp_redirect_map-objs := bpf_load.o xdp_redirect_map_user.o
+xdp_redirect_cpu-objs := bpf_load.o xdp_redirect_cpu_user.o
+xdp_monitor-objs := bpf_load.o xdp_monitor_user.o
+xdp_rxq_info-objs := xdp_rxq_info_user.o
+syscall_tp-objs := bpf_load.o syscall_tp_user.o
+cpustat-objs := bpf_load.o cpustat_user.o
+xdp_adjust_tail-objs := xdp_adjust_tail_user.o
+xdpsock-objs := bpf_load.o xdpsock_user.o
+xdp_fwd-objs := bpf_load.o xdp_fwd_user.o
+task_fd_query-objs := bpf_load.o task_fd_query_user.o $(TRACE_HELPERS)
# Tell kbuild to always build the programs
always := $(hostprogs-y)
@@ -112,7 +124,6 @@
always += test_probe_write_user_kern.o
always += trace_output_kern.o
always += tcbpf1_kern.o
-always += tcbpf2_kern.o
always += tc_l2_redirect_kern.o
always += lathist_kern.o
always += offwaketime_kern.o
@@ -148,6 +159,10 @@
always += xdp2skb_meta_kern.o
always += syscall_tp_kern.o
always += cpustat_kern.o
+always += xdp_adjust_tail_kern.o
+always += xdpsock_kern.o
+always += xdp_fwd_kern.o
+always += task_fd_query_kern.o
HOSTCFLAGS += -I$(objtree)/usr/include
HOSTCFLAGS += -I$(srctree)/tools/lib/
@@ -156,43 +171,21 @@
HOSTCFLAGS += -I$(srctree)/tools/perf
HOSTCFLAGS_bpf_load.o += -I$(objtree)/usr/include -Wno-unused-variable
-HOSTLOADLIBES_fds_example += -lelf
-HOSTLOADLIBES_sockex1 += -lelf
-HOSTLOADLIBES_sockex2 += -lelf
-HOSTLOADLIBES_sockex3 += -lelf
-HOSTLOADLIBES_tracex1 += -lelf
-HOSTLOADLIBES_tracex2 += -lelf
-HOSTLOADLIBES_tracex3 += -lelf
-HOSTLOADLIBES_tracex4 += -lelf -lrt
-HOSTLOADLIBES_tracex5 += -lelf
-HOSTLOADLIBES_tracex6 += -lelf
-HOSTLOADLIBES_tracex7 += -lelf
-HOSTLOADLIBES_test_cgrp2_sock2 += -lelf
-HOSTLOADLIBES_load_sock_ops += -lelf
-HOSTLOADLIBES_test_probe_write_user += -lelf
-HOSTLOADLIBES_trace_output += -lelf -lrt
-HOSTLOADLIBES_lathist += -lelf
-HOSTLOADLIBES_offwaketime += -lelf
-HOSTLOADLIBES_spintest += -lelf
-HOSTLOADLIBES_map_perf_test += -lelf -lrt
-HOSTLOADLIBES_test_overhead += -lelf -lrt
-HOSTLOADLIBES_xdp1 += -lelf
-HOSTLOADLIBES_xdp2 += -lelf
-HOSTLOADLIBES_xdp_router_ipv4 += -lelf
-HOSTLOADLIBES_test_current_task_under_cgroup += -lelf
-HOSTLOADLIBES_trace_event += -lelf
-HOSTLOADLIBES_sampleip += -lelf
-HOSTLOADLIBES_tc_l2_redirect += -l elf
-HOSTLOADLIBES_lwt_len_hist += -l elf
-HOSTLOADLIBES_xdp_tx_iptunnel += -lelf
-HOSTLOADLIBES_test_map_in_map += -lelf
-HOSTLOADLIBES_xdp_redirect += -lelf
-HOSTLOADLIBES_xdp_redirect_map += -lelf
-HOSTLOADLIBES_xdp_redirect_cpu += -lelf
-HOSTLOADLIBES_xdp_monitor += -lelf
-HOSTLOADLIBES_xdp_rxq_info += -lelf
-HOSTLOADLIBES_syscall_tp += -lelf
-HOSTLOADLIBES_cpustat += -lelf
+HOSTCFLAGS_trace_helpers.o += -I$(srctree)/tools/lib/bpf/
+
+HOSTCFLAGS_trace_output_user.o += -I$(srctree)/tools/lib/bpf/
+HOSTCFLAGS_offwaketime_user.o += -I$(srctree)/tools/lib/bpf/
+HOSTCFLAGS_spintest_user.o += -I$(srctree)/tools/lib/bpf/
+HOSTCFLAGS_trace_event_user.o += -I$(srctree)/tools/lib/bpf/
+HOSTCFLAGS_sampleip_user.o += -I$(srctree)/tools/lib/bpf/
+HOSTCFLAGS_task_fd_query_user.o += -I$(srctree)/tools/lib/bpf/
+
+HOST_LOADLIBES += $(LIBBPF) -lelf
+HOSTLOADLIBES_tracex4 += -lrt
+HOSTLOADLIBES_trace_output += -lrt
+HOSTLOADLIBES_map_perf_test += -lrt
+HOSTLOADLIBES_test_overhead += -lrt
+HOSTLOADLIBES_xdpsock += -pthread
# Allows pointing LLC/CLANG to a LLVM backend with bpf support, redefine on cmdline:
# make samples/bpf/ LLC=~/git/llvm/build/bin/llc CLANG=~/git/llvm/build/bin/clang
@@ -206,15 +199,16 @@
endif
# Trick to allow make to be run from this directory
-all: $(LIBBPF)
- $(MAKE) -C ../../ $(CURDIR)/
+all:
+ $(MAKE) -C ../../ $(CURDIR)/ BPF_SAMPLES_PATH=$(CURDIR)
clean:
$(MAKE) -C ../../ M=$(CURDIR) clean
@rm -f *~
$(LIBBPF): FORCE
- $(MAKE) -C $(dir $@) $(notdir $@)
+# Fix up variables inherited from Kbuild that tools/ build system won't like
+ $(MAKE) -C $(dir $@) RM='rm -rf' LDFLAGS= srctree=$(BPF_SAMPLES_PATH)/../../ O=
$(obj)/syscall_nrs.s: $(src)/syscall_nrs.c
$(call if_changed_dep,cc_s_c)
@@ -245,7 +239,8 @@
exit 2; \
else true; fi
-$(src)/*.c: verify_target_bpf
+$(BPF_SAMPLES_PATH)/*.c: verify_target_bpf $(LIBBPF)
+$(src)/*.c: verify_target_bpf $(LIBBPF)
$(obj)/tracex5_kern.o: $(obj)/syscall_nrs.h
@@ -253,7 +248,8 @@
# But, there is no easy way to fix it, so just exclude it since it is
# useless for BPF samples.
$(obj)/%.o: $(src)/%.c
- $(CLANG) $(NOSTDINC_FLAGS) $(LINUXINCLUDE) $(EXTRA_CFLAGS) -I$(obj) \
+ @echo " CLANG-bpf " $@
+ $(Q)$(CLANG) $(NOSTDINC_FLAGS) $(LINUXINCLUDE) $(EXTRA_CFLAGS) -I$(obj) \
-I$(srctree)/tools/testing/selftests/bpf/ \
-D__KERNEL__ -D__BPF_TRACING__ -Wno-unused-value -Wno-pointer-sign \
-D__TARGET_ARCH_$(ARCH) -Wno-compare-distinct-pointer-types \
diff --git a/samples/bpf/libbpf.h b/samples/bpf/bpf_insn.h
similarity index 97%
rename from samples/bpf/libbpf.h
rename to samples/bpf/bpf_insn.h
index 18bfee5..20dc5ce 100644
--- a/samples/bpf/libbpf.h
+++ b/samples/bpf/bpf_insn.h
@@ -1,9 +1,7 @@
/* SPDX-License-Identifier: GPL-2.0 */
-/* eBPF mini library */
-#ifndef __LIBBPF_H
-#define __LIBBPF_H
-
-#include <bpf/bpf.h>
+/* eBPF instruction mini library */
+#ifndef __BPF_INSN_H
+#define __BPF_INSN_H
struct bpf_insn;
diff --git a/samples/bpf/bpf_load.c b/samples/bpf/bpf_load.c
index bebe418..89161c9 100644
--- a/samples/bpf/bpf_load.c
+++ b/samples/bpf/bpf_load.c
@@ -24,7 +24,7 @@
#include <poll.h>
#include <ctype.h>
#include <assert.h>
-#include "libbpf.h"
+#include <bpf/bpf.h>
#include "bpf_load.h"
#include "perf-sys.h"
@@ -145,6 +145,9 @@
}
if (is_kprobe || is_kretprobe) {
+ bool need_normal_check = true;
+ const char *event_prefix = "";
+
if (is_kprobe)
event += 7;
else
@@ -158,18 +161,33 @@
if (isdigit(*event))
return populate_prog_array(event, fd);
- snprintf(buf, sizeof(buf),
- "echo '%c:%s %s' >> /sys/kernel/debug/tracing/kprobe_events",
- is_kprobe ? 'p' : 'r', event, event);
- err = system(buf);
- if (err < 0) {
- printf("failed to create kprobe '%s' error '%s'\n",
- event, strerror(errno));
- return -1;
+#ifdef __x86_64__
+ if (strncmp(event, "sys_", 4) == 0) {
+ snprintf(buf, sizeof(buf),
+ "echo '%c:__x64_%s __x64_%s' >> /sys/kernel/debug/tracing/kprobe_events",
+ is_kprobe ? 'p' : 'r', event, event);
+ err = system(buf);
+ if (err >= 0) {
+ need_normal_check = false;
+ event_prefix = "__x64_";
+ }
+ }
+#endif
+ if (need_normal_check) {
+ snprintf(buf, sizeof(buf),
+ "echo '%c:%s %s' >> /sys/kernel/debug/tracing/kprobe_events",
+ is_kprobe ? 'p' : 'r', event, event);
+ err = system(buf);
+ if (err < 0) {
+ printf("failed to create kprobe '%s' error '%s'\n",
+ event, strerror(errno));
+ return -1;
+ }
}
strcpy(buf, DEBUGFS);
strcat(buf, "events/kprobes/");
+ strcat(buf, event_prefix);
strcat(buf, event);
strcat(buf, "/id");
} else if (is_tracepoint) {
@@ -402,7 +420,7 @@
/* Keeping compatible with ELF maps section changes
* ------------------------------------------------
- * The program size of struct bpf_map_def is known by loader
+ * The program size of struct bpf_load_map_def is known by loader
* code, but struct stored in ELF file can be different.
*
* Unfortunately sym[i].st_size is zero. To calculate the
@@ -411,7 +429,7 @@
* symbols.
*/
map_sz_elf = data_maps->d_size / nr_maps;
- map_sz_copy = sizeof(struct bpf_map_def);
+ map_sz_copy = sizeof(struct bpf_load_map_def);
if (map_sz_elf < map_sz_copy) {
/*
* Backward compat, loading older ELF file with
@@ -430,8 +448,8 @@
/* Memcpy relevant part of ELF maps data to loader maps */
for (i = 0; i < nr_maps; i++) {
+ struct bpf_load_map_def *def;
unsigned char *addr, *end;
- struct bpf_map_def *def;
const char *map_name;
size_t offset;
@@ -446,9 +464,9 @@
/* Symbol value is offset into ELF maps section data area */
offset = sym[i].st_value;
- def = (struct bpf_map_def *)(data_maps->d_buf + offset);
+ def = (struct bpf_load_map_def *)(data_maps->d_buf + offset);
maps[i].elf_offset = offset;
- memset(&maps[i].def, 0, sizeof(struct bpf_map_def));
+ memset(&maps[i].def, 0, sizeof(struct bpf_load_map_def));
memcpy(&maps[i].def, def, map_sz_copy);
/* Verify no newer features were requested */
@@ -549,7 +567,6 @@
if (nr_maps < 0) {
printf("Error: Failed loading ELF maps (errno:%d):%s\n",
nr_maps, strerror(-nr_maps));
- ret = 1;
goto done;
}
if (load_maps(map_data, nr_maps, fixup_map))
@@ -615,7 +632,6 @@
}
}
- ret = 0;
done:
close(fd);
return ret;
@@ -650,66 +666,3 @@
}
}
}
-
-#define MAX_SYMS 300000
-static struct ksym syms[MAX_SYMS];
-static int sym_cnt;
-
-static int ksym_cmp(const void *p1, const void *p2)
-{
- return ((struct ksym *)p1)->addr - ((struct ksym *)p2)->addr;
-}
-
-int load_kallsyms(void)
-{
- FILE *f = fopen("/proc/kallsyms", "r");
- char func[256], buf[256];
- char symbol;
- void *addr;
- int i = 0;
-
- if (!f)
- return -ENOENT;
-
- while (!feof(f)) {
- if (!fgets(buf, sizeof(buf), f))
- break;
- if (sscanf(buf, "%p %c %s", &addr, &symbol, func) != 3)
- break;
- if (!addr)
- continue;
- syms[i].addr = (long) addr;
- syms[i].name = strdup(func);
- i++;
- }
- sym_cnt = i;
- qsort(syms, sym_cnt, sizeof(struct ksym), ksym_cmp);
- return 0;
-}
-
-struct ksym *ksym_search(long key)
-{
- int start = 0, end = sym_cnt;
- int result;
-
- while (start < end) {
- size_t mid = start + (end - start) / 2;
-
- result = key - syms[mid].addr;
- if (result < 0)
- end = mid;
- else if (result > 0)
- start = mid + 1;
- else
- return &syms[mid];
- }
-
- if (start >= 1 && syms[start - 1].addr < key &&
- key < syms[start].addr)
- /* valid ksym */
- return &syms[start - 1];
-
- /* out of range. return _stext */
- return &syms[0];
-}
-
diff --git a/samples/bpf/bpf_load.h b/samples/bpf/bpf_load.h
index 453c200..814894a 100644
--- a/samples/bpf/bpf_load.h
+++ b/samples/bpf/bpf_load.h
@@ -2,12 +2,12 @@
#ifndef __BPF_LOAD_H
#define __BPF_LOAD_H
-#include "libbpf.h"
+#include <bpf/bpf.h>
#define MAX_MAPS 32
#define MAX_PROGS 32
-struct bpf_map_def {
+struct bpf_load_map_def {
unsigned int type;
unsigned int key_size;
unsigned int value_size;
@@ -21,7 +21,7 @@
int fd;
char *name;
size_t elf_offset;
- struct bpf_map_def def;
+ struct bpf_load_map_def def;
};
typedef void (*fixup_map_cb)(struct bpf_map_data *map, int idx);
@@ -54,12 +54,5 @@
int load_bpf_file_fixup_map(const char *path, fixup_map_cb fixup_map);
void read_trace_pipe(void);
-struct ksym {
- long addr;
- char *name;
-};
-
-int load_kallsyms(void);
-struct ksym *ksym_search(long key);
int bpf_set_link_xdp_fd(int ifindex, int fd, __u32 flags);
#endif
diff --git a/samples/bpf/cookie_uid_helper_example.c b/samples/bpf/cookie_uid_helper_example.c
index 8eca27e..deb0e3e 100644
--- a/samples/bpf/cookie_uid_helper_example.c
+++ b/samples/bpf/cookie_uid_helper_example.c
@@ -51,7 +51,7 @@
#include <sys/types.h>
#include <unistd.h>
#include <bpf/bpf.h>
-#include "libbpf.h"
+#include "bpf_insn.h"
#define PORT 8888
diff --git a/samples/bpf/cpustat_user.c b/samples/bpf/cpustat_user.c
index 2b4cd1a..869a994 100644
--- a/samples/bpf/cpustat_user.c
+++ b/samples/bpf/cpustat_user.c
@@ -17,7 +17,7 @@
#include <sys/resource.h>
#include <sys/wait.h>
-#include "libbpf.h"
+#include <bpf/bpf.h>
#include "bpf_load.h"
#define MAX_CPU 8
diff --git a/samples/bpf/fds_example.c b/samples/bpf/fds_example.c
index e29bd52..9854854 100644
--- a/samples/bpf/fds_example.c
+++ b/samples/bpf/fds_example.c
@@ -12,8 +12,10 @@
#include <sys/types.h>
#include <sys/socket.h>
+#include <bpf/bpf.h>
+
+#include "bpf_insn.h"
#include "bpf_load.h"
-#include "libbpf.h"
#include "sock_example.h"
#define BPF_F_PIN (1 << 0)
diff --git a/samples/bpf/lathist_user.c b/samples/bpf/lathist_user.c
index 6477bad..c8e88cc 100644
--- a/samples/bpf/lathist_user.c
+++ b/samples/bpf/lathist_user.c
@@ -10,7 +10,7 @@
#include <stdlib.h>
#include <signal.h>
#include <linux/bpf.h>
-#include "libbpf.h"
+#include <bpf/bpf.h>
#include "bpf_load.h"
#define MAX_ENTRIES 20
diff --git a/samples/bpf/load_sock_ops.c b/samples/bpf/load_sock_ops.c
index e5da6cf..8ecb41e 100644
--- a/samples/bpf/load_sock_ops.c
+++ b/samples/bpf/load_sock_ops.c
@@ -8,7 +8,7 @@
#include <stdlib.h>
#include <string.h>
#include <linux/bpf.h>
-#include "libbpf.h"
+#include <bpf/bpf.h>
#include "bpf_load.h"
#include <unistd.h>
#include <errno.h>
diff --git a/samples/bpf/lwt_len_hist_user.c b/samples/bpf/lwt_len_hist_user.c
index 7fcb94c..587b68b 100644
--- a/samples/bpf/lwt_len_hist_user.c
+++ b/samples/bpf/lwt_len_hist_user.c
@@ -9,7 +9,7 @@
#include <errno.h>
#include <arpa/inet.h>
-#include "libbpf.h"
+#include <bpf/bpf.h>
#include "bpf_util.h"
#define MAX_INDEX 64
diff --git a/samples/bpf/map_perf_test_user.c b/samples/bpf/map_perf_test_user.c
index 519d9af..38b7b1a 100644
--- a/samples/bpf/map_perf_test_user.c
+++ b/samples/bpf/map_perf_test_user.c
@@ -21,7 +21,7 @@
#include <arpa/inet.h>
#include <errno.h>
-#include "libbpf.h"
+#include <bpf/bpf.h>
#include "bpf_load.h"
#define TEST_BIT(t) (1U << (t))
diff --git a/samples/bpf/offwaketime_user.c b/samples/bpf/offwaketime_user.c
index 512f87a..f06063a 100644
--- a/samples/bpf/offwaketime_user.c
+++ b/samples/bpf/offwaketime_user.c
@@ -17,6 +17,7 @@
#include <sys/resource.h>
#include "libbpf.h"
#include "bpf_load.h"
+#include "trace_helpers.h"
#define PRINT_RAW_ADDR 0
diff --git a/samples/bpf/sampleip_user.c b/samples/bpf/sampleip_user.c
index 4ed690b..60c2b73 100644
--- a/samples/bpf/sampleip_user.c
+++ b/samples/bpf/sampleip_user.c
@@ -22,6 +22,7 @@
#include "libbpf.h"
#include "bpf_load.h"
#include "perf-sys.h"
+#include "trace_helpers.h"
#define DEFAULT_FREQ 99
#define DEFAULT_SECS 5
diff --git a/samples/bpf/sock_example.c b/samples/bpf/sock_example.c
index 6fc6e19..60ec467 100644
--- a/samples/bpf/sock_example.c
+++ b/samples/bpf/sock_example.c
@@ -9,10 +9,10 @@
* if (value)
* (*(u64*)value) += 1;
*
- * - attaches this program to eth0 raw socket
+ * - attaches this program to loopback interface "lo" raw socket
*
* - every second user space reads map[tcp], map[udp], map[icmp] to see
- * how many packets of given protocol were seen on eth0
+ * how many packets of given protocol were seen on "lo"
*/
#include <stdio.h>
#include <unistd.h>
@@ -26,7 +26,8 @@
#include <linux/if_ether.h>
#include <linux/ip.h>
#include <stddef.h>
-#include "libbpf.h"
+#include <bpf/bpf.h>
+#include "bpf_insn.h"
#include "sock_example.h"
char bpf_log_buf[BPF_LOG_BUF_SIZE];
diff --git a/samples/bpf/sock_example.h b/samples/bpf/sock_example.h
index 772d5da..a27d757 100644
--- a/samples/bpf/sock_example.h
+++ b/samples/bpf/sock_example.h
@@ -9,7 +9,6 @@
#include <net/if.h>
#include <linux/if_packet.h>
#include <arpa/inet.h>
-#include "libbpf.h"
static inline int open_raw_sock(const char *name)
{
diff --git a/samples/bpf/sockex1_user.c b/samples/bpf/sockex1_user.c
index 2be935c..93ec01c 100644
--- a/samples/bpf/sockex1_user.c
+++ b/samples/bpf/sockex1_user.c
@@ -2,7 +2,7 @@
#include <stdio.h>
#include <assert.h>
#include <linux/bpf.h>
-#include "libbpf.h"
+#include <bpf/bpf.h>
#include "bpf_load.h"
#include "sock_example.h"
#include <unistd.h>
diff --git a/samples/bpf/sockex2_user.c b/samples/bpf/sockex2_user.c
index 44fe080..1d5c6e9 100644
--- a/samples/bpf/sockex2_user.c
+++ b/samples/bpf/sockex2_user.c
@@ -2,7 +2,7 @@
#include <stdio.h>
#include <assert.h>
#include <linux/bpf.h>
-#include "libbpf.h"
+#include <bpf/bpf.h>
#include "bpf_load.h"
#include "sock_example.h"
#include <unistd.h>
diff --git a/samples/bpf/sockex3_user.c b/samples/bpf/sockex3_user.c
index 495ee02..5ba3ae9 100644
--- a/samples/bpf/sockex3_user.c
+++ b/samples/bpf/sockex3_user.c
@@ -2,7 +2,7 @@
#include <stdio.h>
#include <assert.h>
#include <linux/bpf.h>
-#include "libbpf.h"
+#include <bpf/bpf.h>
#include "bpf_load.h"
#include "sock_example.h"
#include <unistd.h>
diff --git a/samples/bpf/spintest_user.c b/samples/bpf/spintest_user.c
index 3d73621..8d3e9cf 100644
--- a/samples/bpf/spintest_user.c
+++ b/samples/bpf/spintest_user.c
@@ -7,6 +7,7 @@
#include <sys/resource.h>
#include "libbpf.h"
#include "bpf_load.h"
+#include "trace_helpers.h"
int main(int ac, char **argv)
{
diff --git a/samples/bpf/syscall_tp_user.c b/samples/bpf/syscall_tp_user.c
index 9169d32..1a1d005 100644
--- a/samples/bpf/syscall_tp_user.c
+++ b/samples/bpf/syscall_tp_user.c
@@ -16,7 +16,7 @@
#include <assert.h>
#include <stdbool.h>
#include <sys/resource.h>
-#include "libbpf.h"
+#include <bpf/bpf.h>
#include "bpf_load.h"
/* This program verifies bpf attachment to tracepoint sys_enter_* and sys_exit_*.
diff --git a/samples/bpf/task_fd_query_kern.c b/samples/bpf/task_fd_query_kern.c
new file mode 100644
index 0000000..f4b0a9e
--- /dev/null
+++ b/samples/bpf/task_fd_query_kern.c
@@ -0,0 +1,19 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <linux/version.h>
+#include <linux/ptrace.h>
+#include <uapi/linux/bpf.h>
+#include "bpf_helpers.h"
+
+SEC("kprobe/blk_start_request")
+int bpf_prog1(struct pt_regs *ctx)
+{
+ return 0;
+}
+
+SEC("kretprobe/blk_account_io_completion")
+int bpf_prog2(struct pt_regs *ctx)
+{
+ return 0;
+}
+char _license[] SEC("license") = "GPL";
+u32 _version SEC("version") = LINUX_VERSION_CODE;
diff --git a/samples/bpf/task_fd_query_user.c b/samples/bpf/task_fd_query_user.c
new file mode 100644
index 0000000..8381d79
--- /dev/null
+++ b/samples/bpf/task_fd_query_user.c
@@ -0,0 +1,382 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <signal.h>
+#include <unistd.h>
+#include <stdbool.h>
+#include <string.h>
+#include <stdint.h>
+#include <fcntl.h>
+#include <linux/bpf.h>
+#include <sys/ioctl.h>
+#include <sys/resource.h>
+#include <sys/types.h>
+#include <sys/stat.h>
+
+#include "libbpf.h"
+#include "bpf_load.h"
+#include "bpf_util.h"
+#include "perf-sys.h"
+#include "trace_helpers.h"
+
+#define CHECK_PERROR_RET(condition) ({ \
+ int __ret = !!(condition); \
+ if (__ret) { \
+ printf("FAIL: %s:\n", __func__); \
+ perror(" "); \
+ return -1; \
+ } \
+})
+
+#define CHECK_AND_RET(condition) ({ \
+ int __ret = !!(condition); \
+ if (__ret) \
+ return -1; \
+})
+
+static __u64 ptr_to_u64(void *ptr)
+{
+ return (__u64) (unsigned long) ptr;
+}
+
+#define PMU_TYPE_FILE "/sys/bus/event_source/devices/%s/type"
+static int bpf_find_probe_type(const char *event_type)
+{
+ char buf[256];
+ int fd, ret;
+
+ ret = snprintf(buf, sizeof(buf), PMU_TYPE_FILE, event_type);
+ CHECK_PERROR_RET(ret < 0 || ret >= sizeof(buf));
+
+ fd = open(buf, O_RDONLY);
+ CHECK_PERROR_RET(fd < 0);
+
+ ret = read(fd, buf, sizeof(buf));
+ close(fd);
+ CHECK_PERROR_RET(ret < 0 || ret >= sizeof(buf));
+
+ errno = 0;
+ ret = (int)strtol(buf, NULL, 10);
+ CHECK_PERROR_RET(errno);
+ return ret;
+}
+
+#define PMU_RETPROBE_FILE "/sys/bus/event_source/devices/%s/format/retprobe"
+static int bpf_get_retprobe_bit(const char *event_type)
+{
+ char buf[256];
+ int fd, ret;
+
+ ret = snprintf(buf, sizeof(buf), PMU_RETPROBE_FILE, event_type);
+ CHECK_PERROR_RET(ret < 0 || ret >= sizeof(buf));
+
+ fd = open(buf, O_RDONLY);
+ CHECK_PERROR_RET(fd < 0);
+
+ ret = read(fd, buf, sizeof(buf));
+ close(fd);
+ CHECK_PERROR_RET(ret < 0 || ret >= sizeof(buf));
+ CHECK_PERROR_RET(strlen(buf) < strlen("config:"));
+
+ errno = 0;
+ ret = (int)strtol(buf + strlen("config:"), NULL, 10);
+ CHECK_PERROR_RET(errno);
+ return ret;
+}
+
+static int test_debug_fs_kprobe(int prog_fd_idx, const char *fn_name,
+ __u32 expected_fd_type)
+{
+ __u64 probe_offset, probe_addr;
+ __u32 len, prog_id, fd_type;
+ char buf[256];
+ int err;
+
+ len = sizeof(buf);
+ err = bpf_task_fd_query(getpid(), event_fd[prog_fd_idx], 0, buf, &len,
+ &prog_id, &fd_type, &probe_offset,
+ &probe_addr);
+ if (err < 0) {
+ printf("FAIL: %s, for event_fd idx %d, fn_name %s\n",
+ __func__, prog_fd_idx, fn_name);
+ perror(" :");
+ return -1;
+ }
+ if (strcmp(buf, fn_name) != 0 ||
+ fd_type != expected_fd_type ||
+ probe_offset != 0x0 || probe_addr != 0x0) {
+ printf("FAIL: bpf_trace_event_query(event_fd[%d]):\n",
+ prog_fd_idx);
+ printf("buf: %s, fd_type: %u, probe_offset: 0x%llx,"
+ " probe_addr: 0x%llx\n",
+ buf, fd_type, probe_offset, probe_addr);
+ return -1;
+ }
+ return 0;
+}
+
+static int test_nondebug_fs_kuprobe_common(const char *event_type,
+ const char *name, __u64 offset, __u64 addr, bool is_return,
+ char *buf, __u32 *buf_len, __u32 *prog_id, __u32 *fd_type,
+ __u64 *probe_offset, __u64 *probe_addr)
+{
+ int is_return_bit = bpf_get_retprobe_bit(event_type);
+ int type = bpf_find_probe_type(event_type);
+ struct perf_event_attr attr = {};
+ int fd;
+
+ if (type < 0 || is_return_bit < 0) {
+ printf("FAIL: %s incorrect type (%d) or is_return_bit (%d)\n",
+ __func__, type, is_return_bit);
+ return -1;
+ }
+
+ attr.sample_period = 1;
+ attr.wakeup_events = 1;
+ if (is_return)
+ attr.config |= 1 << is_return_bit;
+
+ if (name) {
+ attr.config1 = ptr_to_u64((void *)name);
+ attr.config2 = offset;
+ } else {
+ attr.config1 = 0;
+ attr.config2 = addr;
+ }
+ attr.size = sizeof(attr);
+ attr.type = type;
+
+ fd = sys_perf_event_open(&attr, -1, 0, -1, 0);
+ CHECK_PERROR_RET(fd < 0);
+
+ CHECK_PERROR_RET(ioctl(fd, PERF_EVENT_IOC_ENABLE, 0) < 0);
+ CHECK_PERROR_RET(ioctl(fd, PERF_EVENT_IOC_SET_BPF, prog_fd[0]) < 0);
+ CHECK_PERROR_RET(bpf_task_fd_query(getpid(), fd, 0, buf, buf_len,
+ prog_id, fd_type, probe_offset, probe_addr) < 0);
+
+ return 0;
+}
+
+static int test_nondebug_fs_probe(const char *event_type, const char *name,
+ __u64 offset, __u64 addr, bool is_return,
+ __u32 expected_fd_type,
+ __u32 expected_ret_fd_type,
+ char *buf, __u32 buf_len)
+{
+ __u64 probe_offset, probe_addr;
+ __u32 prog_id, fd_type;
+ int err;
+
+ err = test_nondebug_fs_kuprobe_common(event_type, name,
+ offset, addr, is_return,
+ buf, &buf_len, &prog_id,
+ &fd_type, &probe_offset,
+ &probe_addr);
+ if (err < 0) {
+ printf("FAIL: %s, "
+ "for name %s, offset 0x%llx, addr 0x%llx, is_return %d\n",
+ __func__, name ? name : "", offset, addr, is_return);
+ perror(" :");
+ return -1;
+ }
+ if ((is_return && fd_type != expected_ret_fd_type) ||
+ (!is_return && fd_type != expected_fd_type)) {
+ printf("FAIL: %s, incorrect fd_type %u\n",
+ __func__, fd_type);
+ return -1;
+ }
+ if (name) {
+ if (strcmp(name, buf) != 0) {
+ printf("FAIL: %s, incorrect buf %s\n", __func__, buf);
+ return -1;
+ }
+ if (probe_offset != offset) {
+ printf("FAIL: %s, incorrect probe_offset 0x%llx\n",
+ __func__, probe_offset);
+ return -1;
+ }
+ } else {
+ if (buf_len != 0) {
+ printf("FAIL: %s, incorrect buf %p\n",
+ __func__, buf);
+ return -1;
+ }
+
+ if (probe_addr != addr) {
+ printf("FAIL: %s, incorrect probe_addr 0x%llx\n",
+ __func__, probe_addr);
+ return -1;
+ }
+ }
+ return 0;
+}
+
+static int test_debug_fs_uprobe(char *binary_path, long offset, bool is_return)
+{
+ const char *event_type = "uprobe";
+ struct perf_event_attr attr = {};
+ char buf[256], event_alias[256];
+ __u64 probe_offset, probe_addr;
+ __u32 len, prog_id, fd_type;
+ int err, res, kfd, efd;
+ ssize_t bytes;
+
+ snprintf(buf, sizeof(buf), "/sys/kernel/debug/tracing/%s_events",
+ event_type);
+ kfd = open(buf, O_WRONLY | O_APPEND, 0);
+ CHECK_PERROR_RET(kfd < 0);
+
+ res = snprintf(event_alias, sizeof(event_alias), "test_%d", getpid());
+ CHECK_PERROR_RET(res < 0 || res >= sizeof(event_alias));
+
+ res = snprintf(buf, sizeof(buf), "%c:%ss/%s %s:0x%lx",
+ is_return ? 'r' : 'p', event_type, event_alias,
+ binary_path, offset);
+ CHECK_PERROR_RET(res < 0 || res >= sizeof(buf));
+ CHECK_PERROR_RET(write(kfd, buf, strlen(buf)) < 0);
+
+ close(kfd);
+ kfd = -1;
+
+ snprintf(buf, sizeof(buf), "/sys/kernel/debug/tracing/events/%ss/%s/id",
+ event_type, event_alias);
+ efd = open(buf, O_RDONLY, 0);
+ CHECK_PERROR_RET(efd < 0);
+
+ bytes = read(efd, buf, sizeof(buf));
+ CHECK_PERROR_RET(bytes <= 0 || bytes >= sizeof(buf));
+ close(efd);
+ buf[bytes] = '\0';
+
+ attr.config = strtol(buf, NULL, 0);
+ attr.type = PERF_TYPE_TRACEPOINT;
+ attr.sample_period = 1;
+ attr.wakeup_events = 1;
+ kfd = sys_perf_event_open(&attr, -1, 0, -1, PERF_FLAG_FD_CLOEXEC);
+ CHECK_PERROR_RET(kfd < 0);
+ CHECK_PERROR_RET(ioctl(kfd, PERF_EVENT_IOC_SET_BPF, prog_fd[0]) < 0);
+ CHECK_PERROR_RET(ioctl(kfd, PERF_EVENT_IOC_ENABLE, 0) < 0);
+
+ len = sizeof(buf);
+ err = bpf_task_fd_query(getpid(), kfd, 0, buf, &len,
+ &prog_id, &fd_type, &probe_offset,
+ &probe_addr);
+ if (err < 0) {
+ printf("FAIL: %s, binary_path %s\n", __func__, binary_path);
+ perror(" :");
+ return -1;
+ }
+ if ((is_return && fd_type != BPF_FD_TYPE_URETPROBE) ||
+ (!is_return && fd_type != BPF_FD_TYPE_UPROBE)) {
+ printf("FAIL: %s, incorrect fd_type %u\n", __func__,
+ fd_type);
+ return -1;
+ }
+ if (strcmp(binary_path, buf) != 0) {
+ printf("FAIL: %s, incorrect buf %s\n", __func__, buf);
+ return -1;
+ }
+ if (probe_offset != offset) {
+ printf("FAIL: %s, incorrect probe_offset 0x%llx\n", __func__,
+ probe_offset);
+ return -1;
+ }
+
+ close(kfd);
+ return 0;
+}
+
+int main(int argc, char **argv)
+{
+ struct rlimit r = {1024*1024, RLIM_INFINITY};
+ extern char __executable_start;
+ char filename[256], buf[256];
+ __u64 uprobe_file_offset;
+
+ snprintf(filename, sizeof(filename), "%s_kern.o", argv[0]);
+ if (setrlimit(RLIMIT_MEMLOCK, &r)) {
+ perror("setrlimit(RLIMIT_MEMLOCK)");
+ return 1;
+ }
+
+ if (load_kallsyms()) {
+ printf("failed to process /proc/kallsyms\n");
+ return 1;
+ }
+
+ if (load_bpf_file(filename)) {
+ printf("%s", bpf_log_buf);
+ return 1;
+ }
+
+ /* test two functions in the corresponding *_kern.c file */
+ CHECK_AND_RET(test_debug_fs_kprobe(0, "blk_start_request",
+ BPF_FD_TYPE_KPROBE));
+ CHECK_AND_RET(test_debug_fs_kprobe(1, "blk_account_io_completion",
+ BPF_FD_TYPE_KRETPROBE));
+
+ /* test nondebug fs kprobe */
+ CHECK_AND_RET(test_nondebug_fs_probe("kprobe", "bpf_check", 0x0, 0x0,
+ false, BPF_FD_TYPE_KPROBE,
+ BPF_FD_TYPE_KRETPROBE,
+ buf, sizeof(buf)));
+#ifdef __x86_64__
+ /* set a kprobe on "bpf_check + 0x5", which is x64 specific */
+ CHECK_AND_RET(test_nondebug_fs_probe("kprobe", "bpf_check", 0x5, 0x0,
+ false, BPF_FD_TYPE_KPROBE,
+ BPF_FD_TYPE_KRETPROBE,
+ buf, sizeof(buf)));
+#endif
+ CHECK_AND_RET(test_nondebug_fs_probe("kprobe", "bpf_check", 0x0, 0x0,
+ true, BPF_FD_TYPE_KPROBE,
+ BPF_FD_TYPE_KRETPROBE,
+ buf, sizeof(buf)));
+ CHECK_AND_RET(test_nondebug_fs_probe("kprobe", NULL, 0x0,
+ ksym_get_addr("bpf_check"), false,
+ BPF_FD_TYPE_KPROBE,
+ BPF_FD_TYPE_KRETPROBE,
+ buf, sizeof(buf)));
+ CHECK_AND_RET(test_nondebug_fs_probe("kprobe", NULL, 0x0,
+ ksym_get_addr("bpf_check"), false,
+ BPF_FD_TYPE_KPROBE,
+ BPF_FD_TYPE_KRETPROBE,
+ NULL, 0));
+ CHECK_AND_RET(test_nondebug_fs_probe("kprobe", NULL, 0x0,
+ ksym_get_addr("bpf_check"), true,
+ BPF_FD_TYPE_KPROBE,
+ BPF_FD_TYPE_KRETPROBE,
+ buf, sizeof(buf)));
+ CHECK_AND_RET(test_nondebug_fs_probe("kprobe", NULL, 0x0,
+ ksym_get_addr("bpf_check"), true,
+ BPF_FD_TYPE_KPROBE,
+ BPF_FD_TYPE_KRETPROBE,
+ 0, 0));
+
+ /* test nondebug fs uprobe */
+ /* the calculation of uprobe file offset is based on gcc 7.3.1 on x64
+ * and the default linker script, which defines __executable_start as
+ * the start of the .text section. The calculation could be different
+ * on different systems with different compilers. The right way is
+ * to parse the ELF file. We took a shortcut here.
+ */
+ uprobe_file_offset = (__u64)main - (__u64)&__executable_start;
+ CHECK_AND_RET(test_nondebug_fs_probe("uprobe", (char *)argv[0],
+ uprobe_file_offset, 0x0, false,
+ BPF_FD_TYPE_UPROBE,
+ BPF_FD_TYPE_URETPROBE,
+ buf, sizeof(buf)));
+ CHECK_AND_RET(test_nondebug_fs_probe("uprobe", (char *)argv[0],
+ uprobe_file_offset, 0x0, true,
+ BPF_FD_TYPE_UPROBE,
+ BPF_FD_TYPE_URETPROBE,
+ buf, sizeof(buf)));
+
+ /* test debug fs uprobe */
+ CHECK_AND_RET(test_debug_fs_uprobe((char *)argv[0], uprobe_file_offset,
+ false));
+ CHECK_AND_RET(test_debug_fs_uprobe((char *)argv[0], uprobe_file_offset,
+ true));
+
+ return 0;
+}
diff --git a/samples/bpf/tc_l2_redirect_user.c b/samples/bpf/tc_l2_redirect_user.c
index 28995a7..7ec45c3 100644
--- a/samples/bpf/tc_l2_redirect_user.c
+++ b/samples/bpf/tc_l2_redirect_user.c
@@ -13,7 +13,7 @@
#include <string.h>
#include <errno.h>
-#include "libbpf.h"
+#include <bpf/bpf.h>
static void usage(void)
{
diff --git a/samples/bpf/test_cgrp2_array_pin.c b/samples/bpf/test_cgrp2_array_pin.c
index 8a1b8b5..2421842 100644
--- a/samples/bpf/test_cgrp2_array_pin.c
+++ b/samples/bpf/test_cgrp2_array_pin.c
@@ -14,7 +14,7 @@
#include <errno.h>
#include <fcntl.h>
-#include "libbpf.h"
+#include <bpf/bpf.h>
static void usage(void)
{
diff --git a/samples/bpf/test_cgrp2_attach.c b/samples/bpf/test_cgrp2_attach.c
index 4bfcaf9..20fbd12 100644
--- a/samples/bpf/test_cgrp2_attach.c
+++ b/samples/bpf/test_cgrp2_attach.c
@@ -28,8 +28,9 @@
#include <fcntl.h>
#include <linux/bpf.h>
+#include <bpf/bpf.h>
-#include "libbpf.h"
+#include "bpf_insn.h"
enum {
MAP_KEY_PACKETS,
diff --git a/samples/bpf/test_cgrp2_attach2.c b/samples/bpf/test_cgrp2_attach2.c
index 1af412e..b453e6a 100644
--- a/samples/bpf/test_cgrp2_attach2.c
+++ b/samples/bpf/test_cgrp2_attach2.c
@@ -24,8 +24,9 @@
#include <unistd.h>
#include <linux/bpf.h>
+#include <bpf/bpf.h>
-#include "libbpf.h"
+#include "bpf_insn.h"
#include "cgroup_helpers.h"
#define FOO "/foo"
diff --git a/samples/bpf/test_cgrp2_sock.c b/samples/bpf/test_cgrp2_sock.c
index e79594d..b0811da 100644
--- a/samples/bpf/test_cgrp2_sock.c
+++ b/samples/bpf/test_cgrp2_sock.c
@@ -21,8 +21,9 @@
#include <net/if.h>
#include <inttypes.h>
#include <linux/bpf.h>
+#include <bpf/bpf.h>
-#include "libbpf.h"
+#include "bpf_insn.h"
char bpf_log_buf[BPF_LOG_BUF_SIZE];
diff --git a/samples/bpf/test_cgrp2_sock2.c b/samples/bpf/test_cgrp2_sock2.c
index e53f1f6..3b5be23 100644
--- a/samples/bpf/test_cgrp2_sock2.c
+++ b/samples/bpf/test_cgrp2_sock2.c
@@ -19,8 +19,9 @@
#include <fcntl.h>
#include <net/if.h>
#include <linux/bpf.h>
+#include <bpf/bpf.h>
-#include "libbpf.h"
+#include "bpf_insn.h"
#include "bpf_load.h"
static int usage(const char *argv0)
diff --git a/samples/bpf/test_current_task_under_cgroup_user.c b/samples/bpf/test_current_task_under_cgroup_user.c
index 65b5fb5..4be4874 100644
--- a/samples/bpf/test_current_task_under_cgroup_user.c
+++ b/samples/bpf/test_current_task_under_cgroup_user.c
@@ -9,7 +9,7 @@
#include <stdio.h>
#include <linux/bpf.h>
#include <unistd.h>
-#include "libbpf.h"
+#include <bpf/bpf.h>
#include "bpf_load.h"
#include <linux/bpf.h>
#include "cgroup_helpers.h"
diff --git a/samples/bpf/test_lru_dist.c b/samples/bpf/test_lru_dist.c
index 73c3571..eec3e25 100644
--- a/samples/bpf/test_lru_dist.c
+++ b/samples/bpf/test_lru_dist.c
@@ -21,7 +21,7 @@
#include <stdlib.h>
#include <time.h>
-#include "libbpf.h"
+#include <bpf/bpf.h>
#include "bpf_util.h"
#define min(a, b) ((a) < (b) ? (a) : (b))
diff --git a/samples/bpf/test_map_in_map_user.c b/samples/bpf/test_map_in_map_user.c
index 1aca185..e308858 100644
--- a/samples/bpf/test_map_in_map_user.c
+++ b/samples/bpf/test_map_in_map_user.c
@@ -13,7 +13,7 @@
#include <errno.h>
#include <stdlib.h>
#include <stdio.h>
-#include "libbpf.h"
+#include <bpf/bpf.h>
#include "bpf_load.h"
#define PORT_A (map_fd[0])
diff --git a/samples/bpf/test_overhead_user.c b/samples/bpf/test_overhead_user.c
index e1d35e0..6caf47a 100644
--- a/samples/bpf/test_overhead_user.c
+++ b/samples/bpf/test_overhead_user.c
@@ -19,7 +19,7 @@
#include <string.h>
#include <time.h>
#include <sys/resource.h>
-#include "libbpf.h"
+#include <bpf/bpf.h>
#include "bpf_load.h"
#define MAX_CNT 1000000
diff --git a/samples/bpf/test_probe_write_user_user.c b/samples/bpf/test_probe_write_user_user.c
index bf8e3a9..045eb5e 100644
--- a/samples/bpf/test_probe_write_user_user.c
+++ b/samples/bpf/test_probe_write_user_user.c
@@ -3,7 +3,7 @@
#include <assert.h>
#include <linux/bpf.h>
#include <unistd.h>
-#include "libbpf.h"
+#include <bpf/bpf.h>
#include "bpf_load.h"
#include <sys/socket.h>
#include <string.h>
diff --git a/samples/bpf/test_tunnel_bpf.sh b/samples/bpf/test_tunnel_bpf.sh
deleted file mode 100755
index c265863..0000000
--- a/samples/bpf/test_tunnel_bpf.sh
+++ /dev/null
@@ -1,319 +0,0 @@
-#!/bin/bash
-# SPDX-License-Identifier: GPL-2.0
-# In Namespace 0 (at_ns0) using native tunnel
-# Overlay IP: 10.1.1.100
-# local 192.16.1.100 remote 192.16.1.200
-# veth0 IP: 172.16.1.100, tunnel dev <type>00
-
-# Out of Namespace using BPF set/get on lwtunnel
-# Overlay IP: 10.1.1.200
-# local 172.16.1.200 remote 172.16.1.100
-# veth1 IP: 172.16.1.200, tunnel dev <type>11
-
-function config_device {
- ip netns add at_ns0
- ip link add veth0 type veth peer name veth1
- ip link set veth0 netns at_ns0
- ip netns exec at_ns0 ip addr add 172.16.1.100/24 dev veth0
- ip netns exec at_ns0 ip link set dev veth0 up
- ip link set dev veth1 up mtu 1500
- ip addr add dev veth1 172.16.1.200/24
-}
-
-function add_gre_tunnel {
- # in namespace
- ip netns exec at_ns0 \
- ip link add dev $DEV_NS type $TYPE seq key 2 \
- local 172.16.1.100 remote 172.16.1.200
- ip netns exec at_ns0 ip link set dev $DEV_NS up
- ip netns exec at_ns0 ip addr add dev $DEV_NS 10.1.1.100/24
-
- # out of namespace
- ip link add dev $DEV type $TYPE key 2 external
- ip link set dev $DEV up
- ip addr add dev $DEV 10.1.1.200/24
-}
-
-function add_ip6gretap_tunnel {
-
- # assign ipv6 address
- ip netns exec at_ns0 ip addr add ::11/96 dev veth0
- ip netns exec at_ns0 ip link set dev veth0 up
- ip addr add dev veth1 ::22/96
- ip link set dev veth1 up
-
- # in namespace
- ip netns exec at_ns0 \
- ip link add dev $DEV_NS type $TYPE seq flowlabel 0xbcdef key 2 \
- local ::11 remote ::22
-
- ip netns exec at_ns0 ip addr add dev $DEV_NS 10.1.1.100/24
- ip netns exec at_ns0 ip addr add dev $DEV_NS fc80::100/96
- ip netns exec at_ns0 ip link set dev $DEV_NS up
-
- # out of namespace
- ip link add dev $DEV type $TYPE external
- ip addr add dev $DEV 10.1.1.200/24
- ip addr add dev $DEV fc80::200/24
- ip link set dev $DEV up
-}
-
-function add_erspan_tunnel {
- # in namespace
- if [ "$1" == "v1" ]; then
- ip netns exec at_ns0 \
- ip link add dev $DEV_NS type $TYPE seq key 2 \
- local 172.16.1.100 remote 172.16.1.200 \
- erspan_ver 1 erspan 123
- else
- ip netns exec at_ns0 \
- ip link add dev $DEV_NS type $TYPE seq key 2 \
- local 172.16.1.100 remote 172.16.1.200 \
- erspan_ver 2 erspan_dir egress erspan_hwid 3
- fi
- ip netns exec at_ns0 ip link set dev $DEV_NS up
- ip netns exec at_ns0 ip addr add dev $DEV_NS 10.1.1.100/24
-
- # out of namespace
- ip link add dev $DEV type $TYPE external
- ip link set dev $DEV up
- ip addr add dev $DEV 10.1.1.200/24
-}
-
-function add_ip6erspan_tunnel {
-
- # assign ipv6 address
- ip netns exec at_ns0 ip addr add ::11/96 dev veth0
- ip netns exec at_ns0 ip link set dev veth0 up
- ip addr add dev veth1 ::22/96
- ip link set dev veth1 up
-
- # in namespace
- if [ "$1" == "v1" ]; then
- ip netns exec at_ns0 \
- ip link add dev $DEV_NS type $TYPE seq key 2 \
- local ::11 remote ::22 \
- erspan_ver 1 erspan 123
- else
- ip netns exec at_ns0 \
- ip link add dev $DEV_NS type $TYPE seq key 2 \
- local ::11 remote ::22 \
- erspan_ver 2 erspan_dir egress erspan_hwid 7
- fi
- ip netns exec at_ns0 ip addr add dev $DEV_NS 10.1.1.100/24
- ip netns exec at_ns0 ip link set dev $DEV_NS up
-
- # out of namespace
- ip link add dev $DEV type $TYPE external
- ip addr add dev $DEV 10.1.1.200/24
- ip link set dev $DEV up
-}
-
-function add_vxlan_tunnel {
- # Set static ARP entry here because iptables set-mark works
- # on L3 packet, as a result not applying to ARP packets,
- # causing errors at get_tunnel_{key/opt}.
-
- # in namespace
- ip netns exec at_ns0 \
- ip link add dev $DEV_NS type $TYPE id 2 dstport 4789 gbp remote 172.16.1.200
- ip netns exec at_ns0 ip link set dev $DEV_NS address 52:54:00:d9:01:00 up
- ip netns exec at_ns0 ip addr add dev $DEV_NS 10.1.1.100/24
- ip netns exec at_ns0 arp -s 10.1.1.200 52:54:00:d9:02:00
- ip netns exec at_ns0 iptables -A OUTPUT -j MARK --set-mark 0x800FF
-
- # out of namespace
- ip link add dev $DEV type $TYPE external gbp dstport 4789
- ip link set dev $DEV address 52:54:00:d9:02:00 up
- ip addr add dev $DEV 10.1.1.200/24
- arp -s 10.1.1.100 52:54:00:d9:01:00
-}
-
-function add_geneve_tunnel {
- # in namespace
- ip netns exec at_ns0 \
- ip link add dev $DEV_NS type $TYPE id 2 dstport 6081 remote 172.16.1.200
- ip netns exec at_ns0 ip link set dev $DEV_NS up
- ip netns exec at_ns0 ip addr add dev $DEV_NS 10.1.1.100/24
-
- # out of namespace
- ip link add dev $DEV type $TYPE dstport 6081 external
- ip link set dev $DEV up
- ip addr add dev $DEV 10.1.1.200/24
-}
-
-function add_ipip_tunnel {
- # in namespace
- ip netns exec at_ns0 \
- ip link add dev $DEV_NS type $TYPE local 172.16.1.100 remote 172.16.1.200
- ip netns exec at_ns0 ip link set dev $DEV_NS up
- ip netns exec at_ns0 ip addr add dev $DEV_NS 10.1.1.100/24
-
- # out of namespace
- ip link add dev $DEV type $TYPE external
- ip link set dev $DEV up
- ip addr add dev $DEV 10.1.1.200/24
-}
-
-function attach_bpf {
- DEV=$1
- SET_TUNNEL=$2
- GET_TUNNEL=$3
- tc qdisc add dev $DEV clsact
- tc filter add dev $DEV egress bpf da obj tcbpf2_kern.o sec $SET_TUNNEL
- tc filter add dev $DEV ingress bpf da obj tcbpf2_kern.o sec $GET_TUNNEL
-}
-
-function test_gre {
- TYPE=gretap
- DEV_NS=gretap00
- DEV=gretap11
- config_device
- add_gre_tunnel
- attach_bpf $DEV gre_set_tunnel gre_get_tunnel
- ping -c 1 10.1.1.100
- ip netns exec at_ns0 ping -c 1 10.1.1.200
- cleanup
-}
-
-function test_ip6gre {
- TYPE=ip6gre
- DEV_NS=ip6gre00
- DEV=ip6gre11
- config_device
- # reuse the ip6gretap function
- add_ip6gretap_tunnel
- attach_bpf $DEV ip6gretap_set_tunnel ip6gretap_get_tunnel
- # underlay
- ping6 -c 4 ::11
- # overlay: ipv4 over ipv6
- ip netns exec at_ns0 ping -c 1 10.1.1.200
- ping -c 1 10.1.1.100
- # overlay: ipv6 over ipv6
- ip netns exec at_ns0 ping6 -c 1 fc80::200
- cleanup
-}
-
-function test_ip6gretap {
- TYPE=ip6gretap
- DEV_NS=ip6gretap00
- DEV=ip6gretap11
- config_device
- add_ip6gretap_tunnel
- attach_bpf $DEV ip6gretap_set_tunnel ip6gretap_get_tunnel
- # underlay
- ping6 -c 4 ::11
- # overlay: ipv4 over ipv6
- ip netns exec at_ns0 ping -i .2 -c 1 10.1.1.200
- ping -c 1 10.1.1.100
- # overlay: ipv6 over ipv6
- ip netns exec at_ns0 ping6 -c 1 fc80::200
- cleanup
-}
-
-function test_erspan {
- TYPE=erspan
- DEV_NS=erspan00
- DEV=erspan11
- config_device
- add_erspan_tunnel $1
- attach_bpf $DEV erspan_set_tunnel erspan_get_tunnel
- ping -c 1 10.1.1.100
- ip netns exec at_ns0 ping -c 1 10.1.1.200
- cleanup
-}
-
-function test_ip6erspan {
- TYPE=ip6erspan
- DEV_NS=ip6erspan00
- DEV=ip6erspan11
- config_device
- add_ip6erspan_tunnel $1
- attach_bpf $DEV ip4ip6erspan_set_tunnel ip4ip6erspan_get_tunnel
- ping6 -c 3 ::11
- ip netns exec at_ns0 ping -c 1 10.1.1.200
- cleanup
-}
-
-function test_vxlan {
- TYPE=vxlan
- DEV_NS=vxlan00
- DEV=vxlan11
- config_device
- add_vxlan_tunnel
- attach_bpf $DEV vxlan_set_tunnel vxlan_get_tunnel
- ping -c 1 10.1.1.100
- ip netns exec at_ns0 ping -c 1 10.1.1.200
- cleanup
-}
-
-function test_geneve {
- TYPE=geneve
- DEV_NS=geneve00
- DEV=geneve11
- config_device
- add_geneve_tunnel
- attach_bpf $DEV geneve_set_tunnel geneve_get_tunnel
- ping -c 1 10.1.1.100
- ip netns exec at_ns0 ping -c 1 10.1.1.200
- cleanup
-}
-
-function test_ipip {
- TYPE=ipip
- DEV_NS=ipip00
- DEV=ipip11
- config_device
- tcpdump -nei veth1 &
- cat /sys/kernel/debug/tracing/trace_pipe &
- add_ipip_tunnel
- ethtool -K veth1 gso off gro off rx off tx off
- ip link set dev veth1 mtu 1500
- attach_bpf $DEV ipip_set_tunnel ipip_get_tunnel
- ping -c 1 10.1.1.100
- ip netns exec at_ns0 ping -c 1 10.1.1.200
- ip netns exec at_ns0 iperf -sD -p 5200 > /dev/null
- sleep 0.2
- iperf -c 10.1.1.100 -n 5k -p 5200
- cleanup
-}
-
-function cleanup {
- set +ex
- pkill iperf
- ip netns delete at_ns0
- ip link del veth1
- ip link del ipip11
- ip link del gretap11
- ip link del ip6gre11
- ip link del ip6gretap11
- ip link del vxlan11
- ip link del geneve11
- ip link del erspan11
- ip link del ip6erspan11
- pkill tcpdump
- pkill cat
- set -ex
-}
-
-trap cleanup 0 2 3 6 9
-cleanup
-echo "Testing GRE tunnel..."
-test_gre
-echo "Testing IP6GRE tunnel..."
-test_ip6gre
-echo "Testing IP6GRETAP tunnel..."
-test_ip6gretap
-echo "Testing ERSPAN tunnel..."
-test_erspan v1
-test_erspan v2
-echo "Testing IP6ERSPAN tunnel..."
-test_ip6erspan v1
-test_ip6erspan v2
-echo "Testing VXLAN tunnel..."
-test_vxlan
-echo "Testing GENEVE tunnel..."
-test_geneve
-echo "Testing IPIP tunnel..."
-test_ipip
-echo "*** PASS ***"
diff --git a/samples/bpf/trace_event_user.c b/samples/bpf/trace_event_user.c
index 56f7a25..1fa1bec 100644
--- a/samples/bpf/trace_event_user.c
+++ b/samples/bpf/trace_event_user.c
@@ -21,6 +21,7 @@
#include "libbpf.h"
#include "bpf_load.h"
#include "perf-sys.h"
+#include "trace_helpers.h"
#define SAMPLE_FREQ 50
diff --git a/samples/bpf/trace_output_user.c b/samples/bpf/trace_output_user.c
index ccca1e3..4837d73 100644
--- a/samples/bpf/trace_output_user.c
+++ b/samples/bpf/trace_output_user.c
@@ -18,103 +18,13 @@
#include <sys/mman.h>
#include <time.h>
#include <signal.h>
-#include "libbpf.h"
+#include <libbpf.h>
#include "bpf_load.h"
#include "perf-sys.h"
+#include "trace_helpers.h"
static int pmu_fd;
-int page_size;
-int page_cnt = 8;
-volatile struct perf_event_mmap_page *header;
-
-typedef void (*print_fn)(void *data, int size);
-
-static int perf_event_mmap(int fd)
-{
- void *base;
- int mmap_size;
-
- page_size = getpagesize();
- mmap_size = page_size * (page_cnt + 1);
-
- base = mmap(NULL, mmap_size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
- if (base == MAP_FAILED) {
- printf("mmap err\n");
- return -1;
- }
-
- header = base;
- return 0;
-}
-
-static int perf_event_poll(int fd)
-{
- struct pollfd pfd = { .fd = fd, .events = POLLIN };
-
- return poll(&pfd, 1, 1000);
-}
-
-struct perf_event_sample {
- struct perf_event_header header;
- __u32 size;
- char data[];
-};
-
-static void perf_event_read(print_fn fn)
-{
- __u64 data_tail = header->data_tail;
- __u64 data_head = header->data_head;
- __u64 buffer_size = page_cnt * page_size;
- void *base, *begin, *end;
- char buf[256];
-
- asm volatile("" ::: "memory"); /* in real code it should be smp_rmb() */
- if (data_head == data_tail)
- return;
-
- base = ((char *)header) + page_size;
-
- begin = base + data_tail % buffer_size;
- end = base + data_head % buffer_size;
-
- while (begin != end) {
- struct perf_event_sample *e;
-
- e = begin;
- if (begin + e->header.size > base + buffer_size) {
- long len = base + buffer_size - begin;
-
- assert(len < e->header.size);
- memcpy(buf, begin, len);
- memcpy(buf + len, base, e->header.size - len);
- e = (void *) buf;
- begin = base + e->header.size - len;
- } else if (begin + e->header.size == base + buffer_size) {
- begin = base;
- } else {
- begin += e->header.size;
- }
-
- if (e->header.type == PERF_RECORD_SAMPLE) {
- fn(e->data, e->size);
- } else if (e->header.type == PERF_RECORD_LOST) {
- struct {
- struct perf_event_header header;
- __u64 id;
- __u64 lost;
- } *lost = (void *) e;
- printf("lost %lld events\n", lost->lost);
- } else {
- printf("unknown event type=%d size=%d\n",
- e->header.type, e->header.size);
- }
- }
-
- __sync_synchronize(); /* smp_mb() */
- header->data_tail = data_head;
-}
-
static __u64 time_get_ns(void)
{
struct timespec ts;
@@ -127,7 +37,7 @@
#define MAX_CNT 100000ll
-static void print_bpf_output(void *data, int size)
+static int print_bpf_output(void *data, int size)
{
static __u64 cnt;
struct {
@@ -138,7 +48,7 @@
if (e->cookie != 0x12345678) {
printf("BUG pid %llx cookie %llx sized %d\n",
e->pid, e->cookie, size);
- kill(0, SIGINT);
+ return LIBBPF_PERF_EVENT_ERROR;
}
cnt++;
@@ -146,8 +56,10 @@
if (cnt == MAX_CNT) {
printf("recv %lld events per sec\n",
MAX_CNT * 1000000000ll / (time_get_ns() - start_time));
- kill(0, SIGINT);
+ return LIBBPF_PERF_EVENT_DONE;
}
+
+ return LIBBPF_PERF_EVENT_CONT;
}
static void test_bpf_perf_event(void)
@@ -170,6 +82,7 @@
{
char filename[256];
FILE *f;
+ int ret;
snprintf(filename, sizeof(filename), "%s_kern.o", argv[0]);
@@ -187,10 +100,7 @@
(void) f;
start_time = time_get_ns();
- for (;;) {
- perf_event_poll(pmu_fd);
- perf_event_read(print_bpf_output);
- }
-
- return 0;
+ ret = perf_event_poller(pmu_fd, print_bpf_output);
+ kill(0, SIGINT);
+ return ret;
}
diff --git a/samples/bpf/tracex1_user.c b/samples/bpf/tracex1_user.c
index 3dcb475..af8c206 100644
--- a/samples/bpf/tracex1_user.c
+++ b/samples/bpf/tracex1_user.c
@@ -2,7 +2,7 @@
#include <stdio.h>
#include <linux/bpf.h>
#include <unistd.h>
-#include "libbpf.h"
+#include <bpf/bpf.h>
#include "bpf_load.h"
int main(int ac, char **argv)
diff --git a/samples/bpf/tracex2_user.c b/samples/bpf/tracex2_user.c
index efb5e61..1a81e6a 100644
--- a/samples/bpf/tracex2_user.c
+++ b/samples/bpf/tracex2_user.c
@@ -7,7 +7,7 @@
#include <string.h>
#include <sys/resource.h>
-#include "libbpf.h"
+#include <bpf/bpf.h>
#include "bpf_load.h"
#include "bpf_util.h"
diff --git a/samples/bpf/tracex3_user.c b/samples/bpf/tracex3_user.c
index fe37223..6c6b10f 100644
--- a/samples/bpf/tracex3_user.c
+++ b/samples/bpf/tracex3_user.c
@@ -13,7 +13,7 @@
#include <linux/bpf.h>
#include <sys/resource.h>
-#include "libbpf.h"
+#include <bpf/bpf.h>
#include "bpf_load.h"
#include "bpf_util.h"
diff --git a/samples/bpf/tracex4_user.c b/samples/bpf/tracex4_user.c
index 22c644f..14625c8 100644
--- a/samples/bpf/tracex4_user.c
+++ b/samples/bpf/tracex4_user.c
@@ -14,7 +14,7 @@
#include <linux/bpf.h>
#include <sys/resource.h>
-#include "libbpf.h"
+#include <bpf/bpf.h>
#include "bpf_load.h"
struct pair {
diff --git a/samples/bpf/tracex5_user.c b/samples/bpf/tracex5_user.c
index 4e2774b..c4ab91c 100644
--- a/samples/bpf/tracex5_user.c
+++ b/samples/bpf/tracex5_user.c
@@ -5,7 +5,7 @@
#include <linux/filter.h>
#include <linux/seccomp.h>
#include <sys/prctl.h>
-#include "libbpf.h"
+#include <bpf/bpf.h>
#include "bpf_load.h"
#include <sys/resource.h>
diff --git a/samples/bpf/tracex6_user.c b/samples/bpf/tracex6_user.c
index 89ab8d4..4bb3c83 100644
--- a/samples/bpf/tracex6_user.c
+++ b/samples/bpf/tracex6_user.c
@@ -16,7 +16,7 @@
#include <unistd.h>
#include "bpf_load.h"
-#include "libbpf.h"
+#include <bpf/bpf.h>
#include "perf-sys.h"
#define SAMPLE_PERIOD 0x7fffffffffffffffULL
diff --git a/samples/bpf/tracex7_user.c b/samples/bpf/tracex7_user.c
index 8a52ac4..ea6dae7 100644
--- a/samples/bpf/tracex7_user.c
+++ b/samples/bpf/tracex7_user.c
@@ -3,7 +3,7 @@
#include <stdio.h>
#include <linux/bpf.h>
#include <unistd.h>
-#include "libbpf.h"
+#include <bpf/bpf.h>
#include "bpf_load.h"
int main(int argc, char **argv)
diff --git a/samples/bpf/xdp1_user.c b/samples/bpf/xdp1_user.c
index b901ee2..b02c5315 100644
--- a/samples/bpf/xdp1_user.c
+++ b/samples/bpf/xdp1_user.c
@@ -16,9 +16,9 @@
#include <libgen.h>
#include <sys/resource.h>
-#include "bpf_load.h"
#include "bpf_util.h"
-#include "libbpf.h"
+#include "bpf/bpf.h"
+#include "bpf/libbpf.h"
static int ifindex;
static __u32 xdp_flags;
@@ -31,7 +31,7 @@
/* simple per-protocol drop counter
*/
-static void poll_stats(int interval)
+static void poll_stats(int map_fd, int interval)
{
unsigned int nr_cpus = bpf_num_possible_cpus();
const unsigned int nr_keys = 256;
@@ -47,7 +47,7 @@
for (key = 0; key < nr_keys; key++) {
__u64 sum = 0;
- assert(bpf_map_lookup_elem(map_fd[0], &key, values) == 0);
+ assert(bpf_map_lookup_elem(map_fd, &key, values) == 0);
for (i = 0; i < nr_cpus; i++)
sum += (values[i] - prev[key][i]);
if (sum)
@@ -71,9 +71,14 @@
int main(int argc, char **argv)
{
struct rlimit r = {RLIM_INFINITY, RLIM_INFINITY};
+ struct bpf_prog_load_attr prog_load_attr = {
+ .prog_type = BPF_PROG_TYPE_XDP,
+ };
const char *optstr = "SN";
+ int prog_fd, map_fd, opt;
+ struct bpf_object *obj;
+ struct bpf_map *map;
char filename[256];
- int opt;
while ((opt = getopt(argc, argv, optstr)) != -1) {
switch (opt) {
@@ -102,13 +107,19 @@
ifindex = strtoul(argv[optind], NULL, 0);
snprintf(filename, sizeof(filename), "%s_kern.o", argv[0]);
+ prog_load_attr.file = filename;
- if (load_bpf_file(filename)) {
- printf("%s", bpf_log_buf);
+ if (bpf_prog_load_xattr(&prog_load_attr, &obj, &prog_fd))
+ return 1;
+
+ map = bpf_map__next(NULL, obj);
+ if (!map) {
+ printf("finding a map in obj file failed\n");
return 1;
}
+ map_fd = bpf_map__fd(map);
- if (!prog_fd[0]) {
+ if (!prog_fd) {
printf("load_bpf_file: %s\n", strerror(errno));
return 1;
}
@@ -116,12 +127,12 @@
signal(SIGINT, int_exit);
signal(SIGTERM, int_exit);
- if (bpf_set_link_xdp_fd(ifindex, prog_fd[0], xdp_flags) < 0) {
+ if (bpf_set_link_xdp_fd(ifindex, prog_fd, xdp_flags) < 0) {
printf("link set xdp fd failed\n");
return 1;
}
- poll_stats(2);
+ poll_stats(map_fd, 2);
return 0;
}
diff --git a/samples/bpf/xdp_adjust_tail_kern.c b/samples/bpf/xdp_adjust_tail_kern.c
new file mode 100644
index 0000000..411fdb2
--- /dev/null
+++ b/samples/bpf/xdp_adjust_tail_kern.c
@@ -0,0 +1,152 @@
+/* SPDX-License-Identifier: GPL-2.0
+ * Copyright (c) 2018 Facebook
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of version 2 of the GNU General Public
+ * License as published by the Free Software Foundation.
+ *
+ * This program shows how to use bpf_xdp_adjust_tail() by
+ * generating ICMPv4 "packet to big" (unreachable/ df bit set frag needed
+ * to be more preice in case of v4)" where receiving packets bigger then
+ * 600 bytes.
+ */
+#define KBUILD_MODNAME "foo"
+#include <uapi/linux/bpf.h>
+#include <linux/in.h>
+#include <linux/if_ether.h>
+#include <linux/if_packet.h>
+#include <linux/if_vlan.h>
+#include <linux/ip.h>
+#include <linux/icmp.h>
+#include "bpf_helpers.h"
+
+#define DEFAULT_TTL 64
+#define MAX_PCKT_SIZE 600
+#define ICMP_TOOBIG_SIZE 98
+#define ICMP_TOOBIG_PAYLOAD_SIZE 92
+
+struct bpf_map_def SEC("maps") icmpcnt = {
+ .type = BPF_MAP_TYPE_ARRAY,
+ .key_size = sizeof(__u32),
+ .value_size = sizeof(__u64),
+ .max_entries = 1,
+};
+
+static __always_inline void count_icmp(void)
+{
+ u64 key = 0;
+ u64 *icmp_count;
+
+ icmp_count = bpf_map_lookup_elem(&icmpcnt, &key);
+ if (icmp_count)
+ *icmp_count += 1;
+}
+
+static __always_inline void swap_mac(void *data, struct ethhdr *orig_eth)
+{
+ struct ethhdr *eth;
+
+ eth = data;
+ memcpy(eth->h_source, orig_eth->h_dest, ETH_ALEN);
+ memcpy(eth->h_dest, orig_eth->h_source, ETH_ALEN);
+ eth->h_proto = orig_eth->h_proto;
+}
+
+static __always_inline __u16 csum_fold_helper(__u32 csum)
+{
+ return ~((csum & 0xffff) + (csum >> 16));
+}
+
+static __always_inline void ipv4_csum(void *data_start, int data_size,
+ __u32 *csum)
+{
+ *csum = bpf_csum_diff(0, 0, data_start, data_size, *csum);
+ *csum = csum_fold_helper(*csum);
+}
+
+static __always_inline int send_icmp4_too_big(struct xdp_md *xdp)
+{
+ int headroom = (int)sizeof(struct iphdr) + (int)sizeof(struct icmphdr);
+
+ if (bpf_xdp_adjust_head(xdp, 0 - headroom))
+ return XDP_DROP;
+ void *data = (void *)(long)xdp->data;
+ void *data_end = (void *)(long)xdp->data_end;
+
+ if (data + (ICMP_TOOBIG_SIZE + headroom) > data_end)
+ return XDP_DROP;
+
+ struct iphdr *iph, *orig_iph;
+ struct icmphdr *icmp_hdr;
+ struct ethhdr *orig_eth;
+ __u32 csum = 0;
+ __u64 off = 0;
+
+ orig_eth = data + headroom;
+ swap_mac(data, orig_eth);
+ off += sizeof(struct ethhdr);
+ iph = data + off;
+ off += sizeof(struct iphdr);
+ icmp_hdr = data + off;
+ off += sizeof(struct icmphdr);
+ orig_iph = data + off;
+ icmp_hdr->type = ICMP_DEST_UNREACH;
+ icmp_hdr->code = ICMP_FRAG_NEEDED;
+ icmp_hdr->un.frag.mtu = htons(MAX_PCKT_SIZE-sizeof(struct ethhdr));
+ icmp_hdr->checksum = 0;
+ ipv4_csum(icmp_hdr, ICMP_TOOBIG_PAYLOAD_SIZE, &csum);
+ icmp_hdr->checksum = csum;
+ iph->ttl = DEFAULT_TTL;
+ iph->daddr = orig_iph->saddr;
+ iph->saddr = orig_iph->daddr;
+ iph->version = 4;
+ iph->ihl = 5;
+ iph->protocol = IPPROTO_ICMP;
+ iph->tos = 0;
+ iph->tot_len = htons(
+ ICMP_TOOBIG_SIZE + headroom - sizeof(struct ethhdr));
+ iph->check = 0;
+ csum = 0;
+ ipv4_csum(iph, sizeof(struct iphdr), &csum);
+ iph->check = csum;
+ count_icmp();
+ return XDP_TX;
+}
+
+
+static __always_inline int handle_ipv4(struct xdp_md *xdp)
+{
+ void *data_end = (void *)(long)xdp->data_end;
+ void *data = (void *)(long)xdp->data;
+ int pckt_size = data_end - data;
+ int offset;
+
+ if (pckt_size > MAX_PCKT_SIZE) {
+ offset = pckt_size - ICMP_TOOBIG_SIZE;
+ if (bpf_xdp_adjust_tail(xdp, 0 - offset))
+ return XDP_PASS;
+ return send_icmp4_too_big(xdp);
+ }
+ return XDP_PASS;
+}
+
+SEC("xdp_icmp")
+int _xdp_icmp(struct xdp_md *xdp)
+{
+ void *data_end = (void *)(long)xdp->data_end;
+ void *data = (void *)(long)xdp->data;
+ struct ethhdr *eth = data;
+ __u16 h_proto;
+
+ if (eth + 1 > data_end)
+ return XDP_DROP;
+
+ h_proto = eth->h_proto;
+
+ if (h_proto == htons(ETH_P_IP))
+ return handle_ipv4(xdp);
+ else
+ return XDP_PASS;
+}
+
+char _license[] SEC("license") = "GPL";
diff --git a/samples/bpf/xdp_adjust_tail_user.c b/samples/bpf/xdp_adjust_tail_user.c
new file mode 100644
index 0000000..3042ce3
--- /dev/null
+++ b/samples/bpf/xdp_adjust_tail_user.c
@@ -0,0 +1,150 @@
+/* SPDX-License-Identifier: GPL-2.0
+ * Copyright (c) 2018 Facebook
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of version 2 of the GNU General Public
+ * License as published by the Free Software Foundation.
+ */
+#include <linux/bpf.h>
+#include <linux/if_link.h>
+#include <assert.h>
+#include <errno.h>
+#include <signal.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <sys/resource.h>
+#include <arpa/inet.h>
+#include <netinet/ether.h>
+#include <unistd.h>
+#include <time.h>
+#include "bpf/bpf.h"
+#include "bpf/libbpf.h"
+
+#define STATS_INTERVAL_S 2U
+
+static int ifindex = -1;
+static __u32 xdp_flags;
+
+static void int_exit(int sig)
+{
+ if (ifindex > -1)
+ bpf_set_link_xdp_fd(ifindex, -1, xdp_flags);
+ exit(0);
+}
+
+/* simple "icmp packet too big sent" counter
+ */
+static void poll_stats(unsigned int map_fd, unsigned int kill_after_s)
+{
+ time_t started_at = time(NULL);
+ __u64 value = 0;
+ int key = 0;
+
+
+ while (!kill_after_s || time(NULL) - started_at <= kill_after_s) {
+ sleep(STATS_INTERVAL_S);
+
+ assert(bpf_map_lookup_elem(map_fd, &key, &value) == 0);
+
+ printf("icmp \"packet too big\" sent: %10llu pkts\n", value);
+ }
+}
+
+static void usage(const char *cmd)
+{
+ printf("Start a XDP prog which send ICMP \"packet too big\" \n"
+ "messages if ingress packet is bigger then MAX_SIZE bytes\n");
+ printf("Usage: %s [...]\n", cmd);
+ printf(" -i <ifindex> Interface Index\n");
+ printf(" -T <stop-after-X-seconds> Default: 0 (forever)\n");
+ printf(" -S use skb-mode\n");
+ printf(" -N enforce native mode\n");
+ printf(" -h Display this help\n");
+}
+
+int main(int argc, char **argv)
+{
+ struct rlimit r = {RLIM_INFINITY, RLIM_INFINITY};
+ struct bpf_prog_load_attr prog_load_attr = {
+ .prog_type = BPF_PROG_TYPE_XDP,
+ };
+ unsigned char opt_flags[256] = {};
+ unsigned int kill_after_s = 0;
+ const char *optstr = "i:T:SNh";
+ int i, prog_fd, map_fd, opt;
+ struct bpf_object *obj;
+ struct bpf_map *map;
+ char filename[256];
+
+ for (i = 0; i < strlen(optstr); i++)
+ if (optstr[i] != 'h' && 'a' <= optstr[i] && optstr[i] <= 'z')
+ opt_flags[(unsigned char)optstr[i]] = 1;
+
+ while ((opt = getopt(argc, argv, optstr)) != -1) {
+
+ switch (opt) {
+ case 'i':
+ ifindex = atoi(optarg);
+ break;
+ case 'T':
+ kill_after_s = atoi(optarg);
+ break;
+ case 'S':
+ xdp_flags |= XDP_FLAGS_SKB_MODE;
+ break;
+ case 'N':
+ xdp_flags |= XDP_FLAGS_DRV_MODE;
+ break;
+ default:
+ usage(argv[0]);
+ return 1;
+ }
+ opt_flags[opt] = 0;
+ }
+
+ for (i = 0; i < strlen(optstr); i++) {
+ if (opt_flags[(unsigned int)optstr[i]]) {
+ fprintf(stderr, "Missing argument -%c\n", optstr[i]);
+ usage(argv[0]);
+ return 1;
+ }
+ }
+
+ if (setrlimit(RLIMIT_MEMLOCK, &r)) {
+ perror("setrlimit(RLIMIT_MEMLOCK, RLIM_INFINITY)");
+ return 1;
+ }
+
+ snprintf(filename, sizeof(filename), "%s_kern.o", argv[0]);
+ prog_load_attr.file = filename;
+
+ if (bpf_prog_load_xattr(&prog_load_attr, &obj, &prog_fd))
+ return 1;
+
+ map = bpf_map__next(NULL, obj);
+ if (!map) {
+ printf("finding a map in obj file failed\n");
+ return 1;
+ }
+ map_fd = bpf_map__fd(map);
+
+ if (!prog_fd) {
+ printf("load_bpf_file: %s\n", strerror(errno));
+ return 1;
+ }
+
+ signal(SIGINT, int_exit);
+ signal(SIGTERM, int_exit);
+
+ if (bpf_set_link_xdp_fd(ifindex, prog_fd, xdp_flags) < 0) {
+ printf("link set xdp fd failed\n");
+ return 1;
+ }
+
+ poll_stats(map_fd, kill_after_s);
+
+ bpf_set_link_xdp_fd(ifindex, -1, xdp_flags);
+
+ return 0;
+}
diff --git a/samples/bpf/xdp_fwd_kern.c b/samples/bpf/xdp_fwd_kern.c
new file mode 100644
index 0000000..4a6be0f
--- /dev/null
+++ b/samples/bpf/xdp_fwd_kern.c
@@ -0,0 +1,138 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2017-18 David Ahern <dsahern@gmail.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of version 2 of the GNU General Public
+ * License as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * General Public License for more details.
+ */
+#define KBUILD_MODNAME "foo"
+#include <uapi/linux/bpf.h>
+#include <linux/in.h>
+#include <linux/if_ether.h>
+#include <linux/if_packet.h>
+#include <linux/if_vlan.h>
+#include <linux/ip.h>
+#include <linux/ipv6.h>
+
+#include "bpf_helpers.h"
+
+#define IPV6_FLOWINFO_MASK cpu_to_be32(0x0FFFFFFF)
+
+struct bpf_map_def SEC("maps") tx_port = {
+ .type = BPF_MAP_TYPE_DEVMAP,
+ .key_size = sizeof(int),
+ .value_size = sizeof(int),
+ .max_entries = 64,
+};
+
+/* from include/net/ip.h */
+static __always_inline int ip_decrease_ttl(struct iphdr *iph)
+{
+ u32 check = (__force u32)iph->check;
+
+ check += (__force u32)htons(0x0100);
+ iph->check = (__force __sum16)(check + (check >= 0xFFFF));
+ return --iph->ttl;
+}
+
+static __always_inline int xdp_fwd_flags(struct xdp_md *ctx, u32 flags)
+{
+ void *data_end = (void *)(long)ctx->data_end;
+ void *data = (void *)(long)ctx->data;
+ struct bpf_fib_lookup fib_params;
+ struct ethhdr *eth = data;
+ struct ipv6hdr *ip6h;
+ struct iphdr *iph;
+ int out_index;
+ u16 h_proto;
+ u64 nh_off;
+
+ nh_off = sizeof(*eth);
+ if (data + nh_off > data_end)
+ return XDP_DROP;
+
+ __builtin_memset(&fib_params, 0, sizeof(fib_params));
+
+ h_proto = eth->h_proto;
+ if (h_proto == htons(ETH_P_IP)) {
+ iph = data + nh_off;
+
+ if (iph + 1 > data_end)
+ return XDP_DROP;
+
+ if (iph->ttl <= 1)
+ return XDP_PASS;
+
+ fib_params.family = AF_INET;
+ fib_params.tos = iph->tos;
+ fib_params.l4_protocol = iph->protocol;
+ fib_params.sport = 0;
+ fib_params.dport = 0;
+ fib_params.tot_len = ntohs(iph->tot_len);
+ fib_params.ipv4_src = iph->saddr;
+ fib_params.ipv4_dst = iph->daddr;
+ } else if (h_proto == htons(ETH_P_IPV6)) {
+ struct in6_addr *src = (struct in6_addr *) fib_params.ipv6_src;
+ struct in6_addr *dst = (struct in6_addr *) fib_params.ipv6_dst;
+
+ ip6h = data + nh_off;
+ if (ip6h + 1 > data_end)
+ return XDP_DROP;
+
+ if (ip6h->hop_limit <= 1)
+ return XDP_PASS;
+
+ fib_params.family = AF_INET6;
+ fib_params.flowlabel = *(__be32 *)ip6h & IPV6_FLOWINFO_MASK;
+ fib_params.l4_protocol = ip6h->nexthdr;
+ fib_params.sport = 0;
+ fib_params.dport = 0;
+ fib_params.tot_len = ntohs(ip6h->payload_len);
+ *src = ip6h->saddr;
+ *dst = ip6h->daddr;
+ } else {
+ return XDP_PASS;
+ }
+
+ fib_params.ifindex = ctx->ingress_ifindex;
+
+ out_index = bpf_fib_lookup(ctx, &fib_params, sizeof(fib_params), flags);
+
+ /* verify egress index has xdp support
+ * TO-DO bpf_map_lookup_elem(&tx_port, &key) fails with
+ * cannot pass map_type 14 into func bpf_map_lookup_elem#1:
+ * NOTE: without verification that egress index supports XDP
+ * forwarding packets are dropped.
+ */
+ if (out_index > 0) {
+ if (h_proto == htons(ETH_P_IP))
+ ip_decrease_ttl(iph);
+ else if (h_proto == htons(ETH_P_IPV6))
+ ip6h->hop_limit--;
+
+ memcpy(eth->h_dest, fib_params.dmac, ETH_ALEN);
+ memcpy(eth->h_source, fib_params.smac, ETH_ALEN);
+ return bpf_redirect_map(&tx_port, out_index, 0);
+ }
+
+ return XDP_PASS;
+}
+
+SEC("xdp_fwd")
+int xdp_fwd_prog(struct xdp_md *ctx)
+{
+ return xdp_fwd_flags(ctx, 0);
+}
+
+SEC("xdp_fwd_direct")
+int xdp_fwd_direct_prog(struct xdp_md *ctx)
+{
+ return xdp_fwd_flags(ctx, BPF_FIB_LOOKUP_DIRECT);
+}
+
+char _license[] SEC("license") = "GPL";
diff --git a/samples/bpf/xdp_fwd_user.c b/samples/bpf/xdp_fwd_user.c
new file mode 100644
index 0000000..a87a204
--- /dev/null
+++ b/samples/bpf/xdp_fwd_user.c
@@ -0,0 +1,136 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2017-18 David Ahern <dsahern@gmail.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of version 2 of the GNU General Public
+ * License as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * General Public License for more details.
+ */
+
+#include <linux/bpf.h>
+#include <linux/if_link.h>
+#include <linux/limits.h>
+#include <net/if.h>
+#include <errno.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <stdbool.h>
+#include <string.h>
+#include <unistd.h>
+#include <fcntl.h>
+#include <libgen.h>
+
+#include "bpf_load.h"
+#include "bpf_util.h"
+#include <bpf/bpf.h>
+
+
+static int do_attach(int idx, int fd, const char *name)
+{
+ int err;
+
+ err = bpf_set_link_xdp_fd(idx, fd, 0);
+ if (err < 0)
+ printf("ERROR: failed to attach program to %s\n", name);
+
+ return err;
+}
+
+static int do_detach(int idx, const char *name)
+{
+ int err;
+
+ err = bpf_set_link_xdp_fd(idx, -1, 0);
+ if (err < 0)
+ printf("ERROR: failed to detach program from %s\n", name);
+
+ return err;
+}
+
+static void usage(const char *prog)
+{
+ fprintf(stderr,
+ "usage: %s [OPTS] interface-list\n"
+ "\nOPTS:\n"
+ " -d detach program\n"
+ " -D direct table lookups (skip fib rules)\n",
+ prog);
+}
+
+int main(int argc, char **argv)
+{
+ char filename[PATH_MAX];
+ int opt, i, idx, err;
+ int prog_id = 0;
+ int attach = 1;
+ int ret = 0;
+
+ while ((opt = getopt(argc, argv, ":dD")) != -1) {
+ switch (opt) {
+ case 'd':
+ attach = 0;
+ break;
+ case 'D':
+ prog_id = 1;
+ break;
+ default:
+ usage(basename(argv[0]));
+ return 1;
+ }
+ }
+
+ if (optind == argc) {
+ usage(basename(argv[0]));
+ return 1;
+ }
+
+ if (attach) {
+ snprintf(filename, sizeof(filename), "%s_kern.o", argv[0]);
+
+ if (access(filename, O_RDONLY) < 0) {
+ printf("error accessing file %s: %s\n",
+ filename, strerror(errno));
+ return 1;
+ }
+
+ if (load_bpf_file(filename)) {
+ printf("%s", bpf_log_buf);
+ return 1;
+ }
+
+ if (!prog_fd[prog_id]) {
+ printf("load_bpf_file: %s\n", strerror(errno));
+ return 1;
+ }
+ }
+ if (attach) {
+ for (i = 1; i < 64; ++i)
+ bpf_map_update_elem(map_fd[0], &i, &i, 0);
+ }
+
+ for (i = optind; i < argc; ++i) {
+ idx = if_nametoindex(argv[i]);
+ if (!idx)
+ idx = strtoul(argv[i], NULL, 0);
+
+ if (!idx) {
+ fprintf(stderr, "Invalid arg\n");
+ return 1;
+ }
+ if (!attach) {
+ err = do_detach(idx, argv[i]);
+ if (err)
+ ret = err;
+ } else {
+ err = do_attach(idx, prog_fd[prog_id], argv[i]);
+ if (err)
+ ret = err;
+ }
+ }
+
+ return ret;
+}
diff --git a/samples/bpf/xdp_monitor_kern.c b/samples/bpf/xdp_monitor_kern.c
index 211db8d..ad10fe7 100644
--- a/samples/bpf/xdp_monitor_kern.c
+++ b/samples/bpf/xdp_monitor_kern.c
@@ -125,6 +125,7 @@
u64 processed;
u64 dropped;
u64 info;
+ u64 err;
};
#define MAX_CPUS 64
@@ -208,3 +209,51 @@
return 0;
}
+
+struct bpf_map_def SEC("maps") devmap_xmit_cnt = {
+ .type = BPF_MAP_TYPE_PERCPU_ARRAY,
+ .key_size = sizeof(u32),
+ .value_size = sizeof(struct datarec),
+ .max_entries = 1,
+};
+
+/* Tracepoint: /sys/kernel/debug/tracing/events/xdp/xdp_devmap_xmit/format
+ * Code in: kernel/include/trace/events/xdp.h
+ */
+struct devmap_xmit_ctx {
+ u64 __pad; // First 8 bytes are not accessible by bpf code
+ int map_id; // offset:8; size:4; signed:1;
+ u32 act; // offset:12; size:4; signed:0;
+ u32 map_index; // offset:16; size:4; signed:0;
+ int drops; // offset:20; size:4; signed:1;
+ int sent; // offset:24; size:4; signed:1;
+ int from_ifindex; // offset:28; size:4; signed:1;
+ int to_ifindex; // offset:32; size:4; signed:1;
+ int err; // offset:36; size:4; signed:1;
+};
+
+SEC("tracepoint/xdp/xdp_devmap_xmit")
+int trace_xdp_devmap_xmit(struct devmap_xmit_ctx *ctx)
+{
+ struct datarec *rec;
+ u32 key = 0;
+
+ rec = bpf_map_lookup_elem(&devmap_xmit_cnt, &key);
+ if (!rec)
+ return 0;
+ rec->processed += ctx->sent;
+ rec->dropped += ctx->drops;
+
+ /* Record bulk events, then userspace can calc average bulk size */
+ rec->info += 1;
+
+ /* Record error cases, where no frame were sent */
+ if (ctx->err)
+ rec->err++;
+
+ /* Catch API error of drv ndo_xdp_xmit sent more than count */
+ if (ctx->drops < 0)
+ rec->err++;
+
+ return 1;
+}
diff --git a/samples/bpf/xdp_monitor_user.c b/samples/bpf/xdp_monitor_user.c
index eec1452..dd558cb 100644
--- a/samples/bpf/xdp_monitor_user.c
+++ b/samples/bpf/xdp_monitor_user.c
@@ -26,7 +26,7 @@
#include <net/if.h>
#include <time.h>
-#include "libbpf.h"
+#include <bpf/bpf.h>
#include "bpf_load.h"
#include "bpf_util.h"
@@ -58,7 +58,7 @@
printf(" flag (internal value:%d)",
*long_options[i].flag);
else
- printf("(internal short-option: -%c)",
+ printf("short-option: -%c",
long_options[i].val);
printf("\n");
}
@@ -117,6 +117,7 @@
__u64 processed;
__u64 dropped;
__u64 info;
+ __u64 err;
};
#define MAX_CPUS 64
@@ -141,6 +142,7 @@
struct record_u64 xdp_exception[XDP_ACTION_MAX];
struct record xdp_cpumap_kthread;
struct record xdp_cpumap_enqueue[MAX_CPUS];
+ struct record xdp_devmap_xmit;
};
static bool map_collect_record(int fd, __u32 key, struct record *rec)
@@ -151,6 +153,7 @@
__u64 sum_processed = 0;
__u64 sum_dropped = 0;
__u64 sum_info = 0;
+ __u64 sum_err = 0;
int i;
if ((bpf_map_lookup_elem(fd, &key, values)) != 0) {
@@ -169,10 +172,13 @@
sum_dropped += values[i].dropped;
rec->cpu[i].info = values[i].info;
sum_info += values[i].info;
+ rec->cpu[i].err = values[i].err;
+ sum_err += values[i].err;
}
rec->total.processed = sum_processed;
rec->total.dropped = sum_dropped;
rec->total.info = sum_info;
+ rec->total.err = sum_err;
return true;
}
@@ -273,6 +279,18 @@
return pps;
}
+static double calc_err(struct datarec *r, struct datarec *p, double period)
+{
+ __u64 packets = 0;
+ double pps = 0;
+
+ if (period > 0) {
+ packets = r->err - p->err;
+ pps = packets / period;
+ }
+ return pps;
+}
+
static void stats_print(struct stats_record *stats_rec,
struct stats_record *stats_prev,
bool err_only)
@@ -330,7 +348,7 @@
pps = calc_pps_u64(r, p, t);
if (pps > 0)
printf(fmt1, "Exception", i,
- 0.0, pps, err2str(rec_i));
+ 0.0, pps, action2str(rec_i));
}
pps = calc_pps_u64(&rec->total, &prev->total, t);
if (pps > 0)
@@ -397,7 +415,7 @@
info = calc_info(r, p, t);
if (info > 0)
i_str = "sched";
- if (pps > 0)
+ if (pps > 0 || drop > 0)
printf(fmt1, "cpumap-kthread",
i, pps, drop, info, i_str);
}
@@ -409,6 +427,50 @@
printf(fmt2, "cpumap-kthread", "total", pps, drop, info, i_str);
}
+ /* devmap ndo_xdp_xmit stats */
+ {
+ char *fmt1 = "%-15s %-7d %'-12.0f %'-12.0f %'-10.2f %s %s\n";
+ char *fmt2 = "%-15s %-7s %'-12.0f %'-12.0f %'-10.2f %s %s\n";
+ struct record *rec, *prev;
+ double drop, info, err;
+ char *i_str = "";
+ char *err_str = "";
+
+ rec = &stats_rec->xdp_devmap_xmit;
+ prev = &stats_prev->xdp_devmap_xmit;
+ t = calc_period(rec, prev);
+ for (i = 0; i < nr_cpus; i++) {
+ struct datarec *r = &rec->cpu[i];
+ struct datarec *p = &prev->cpu[i];
+
+ pps = calc_pps(r, p, t);
+ drop = calc_drop(r, p, t);
+ info = calc_info(r, p, t);
+ err = calc_err(r, p, t);
+ if (info > 0) {
+ i_str = "bulk-average";
+ info = (pps+drop) / info; /* calc avg bulk */
+ }
+ if (err > 0)
+ err_str = "drv-err";
+ if (pps > 0 || drop > 0)
+ printf(fmt1, "devmap-xmit",
+ i, pps, drop, info, i_str, err_str);
+ }
+ pps = calc_pps(&rec->total, &prev->total, t);
+ drop = calc_drop(&rec->total, &prev->total, t);
+ info = calc_info(&rec->total, &prev->total, t);
+ err = calc_err(&rec->total, &prev->total, t);
+ if (info > 0) {
+ i_str = "bulk-average";
+ info = (pps+drop) / info; /* calc avg bulk */
+ }
+ if (err > 0)
+ err_str = "drv-err";
+ printf(fmt2, "devmap-xmit", "total", pps, drop,
+ info, i_str, err_str);
+ }
+
printf("\n");
}
@@ -437,6 +499,9 @@
fd = map_data[3].fd; /* map3: cpumap_kthread_cnt */
map_collect_record(fd, 0, &rec->xdp_cpumap_kthread);
+ fd = map_data[4].fd; /* map4: devmap_xmit_cnt */
+ map_collect_record(fd, 0, &rec->xdp_devmap_xmit);
+
return true;
}
@@ -480,6 +545,7 @@
rec_sz = sizeof(struct datarec);
rec->xdp_cpumap_kthread.cpu = alloc_rec_per_cpu(rec_sz);
+ rec->xdp_devmap_xmit.cpu = alloc_rec_per_cpu(rec_sz);
for (i = 0; i < MAX_CPUS; i++)
rec->xdp_cpumap_enqueue[i].cpu = alloc_rec_per_cpu(rec_sz);
@@ -498,6 +564,7 @@
free(r->xdp_exception[i].cpu);
free(r->xdp_cpumap_kthread.cpu);
+ free(r->xdp_devmap_xmit.cpu);
for (i = 0; i < MAX_CPUS; i++)
free(r->xdp_cpumap_enqueue[i].cpu);
@@ -594,7 +661,7 @@
snprintf(bpf_obj_file, sizeof(bpf_obj_file), "%s_kern.o", argv[0]);
/* Parse commands line args */
- while ((opt = getopt_long(argc, argv, "h",
+ while ((opt = getopt_long(argc, argv, "hDSs:",
long_options, &longindex)) != -1) {
switch (opt) {
case 'D':
diff --git a/samples/bpf/xdp_redirect_cpu_user.c b/samples/bpf/xdp_redirect_cpu_user.c
index 23744a8..f6efaef 100644
--- a/samples/bpf/xdp_redirect_cpu_user.c
+++ b/samples/bpf/xdp_redirect_cpu_user.c
@@ -28,7 +28,7 @@
* use bpf/libbpf.h), but cannot as (currently) needed for XDP
* attaching to a device via bpf_set_link_xdp_fd()
*/
-#include "libbpf.h"
+#include <bpf/bpf.h>
#include "bpf_load.h"
#include "bpf_util.h"
diff --git a/samples/bpf/xdp_redirect_map_user.c b/samples/bpf/xdp_redirect_map_user.c
index 7eae07d72..4445e76 100644
--- a/samples/bpf/xdp_redirect_map_user.c
+++ b/samples/bpf/xdp_redirect_map_user.c
@@ -24,7 +24,7 @@
#include "bpf_load.h"
#include "bpf_util.h"
-#include "libbpf.h"
+#include <bpf/bpf.h>
static int ifindex_in;
static int ifindex_out;
diff --git a/samples/bpf/xdp_redirect_user.c b/samples/bpf/xdp_redirect_user.c
index b701b5c..81a69e3 100644
--- a/samples/bpf/xdp_redirect_user.c
+++ b/samples/bpf/xdp_redirect_user.c
@@ -24,7 +24,7 @@
#include "bpf_load.h"
#include "bpf_util.h"
-#include "libbpf.h"
+#include <bpf/bpf.h>
static int ifindex_in;
static int ifindex_out;
diff --git a/samples/bpf/xdp_router_ipv4_user.c b/samples/bpf/xdp_router_ipv4_user.c
index 6296741..b2b4dfa 100644
--- a/samples/bpf/xdp_router_ipv4_user.c
+++ b/samples/bpf/xdp_router_ipv4_user.c
@@ -16,7 +16,7 @@
#include <sys/socket.h>
#include <unistd.h>
#include "bpf_load.h"
-#include "libbpf.h"
+#include <bpf/bpf.h>
#include <arpa/inet.h>
#include <fcntl.h>
#include <poll.h>
diff --git a/samples/bpf/xdp_rxq_info_user.c b/samples/bpf/xdp_rxq_info_user.c
index 478d954..e4e9ba52 100644
--- a/samples/bpf/xdp_rxq_info_user.c
+++ b/samples/bpf/xdp_rxq_info_user.c
@@ -22,8 +22,8 @@
#include <arpa/inet.h>
#include <linux/if_link.h>
-#include "libbpf.h"
-#include "bpf_load.h"
+#include "bpf/bpf.h"
+#include "bpf/libbpf.h"
#include "bpf_util.h"
static int ifindex = -1;
@@ -32,6 +32,9 @@
static __u32 xdp_flags;
+static struct bpf_map *stats_global_map;
+static struct bpf_map *rx_queue_index_map;
+
/* Exit return codes */
#define EXIT_OK 0
#define EXIT_FAIL 1
@@ -174,7 +177,7 @@
static struct record *alloc_record_per_rxq(void)
{
- unsigned int nr_rxqs = map_data[2].def.max_entries;
+ unsigned int nr_rxqs = bpf_map__def(rx_queue_index_map)->max_entries;
struct record *array;
size_t size;
@@ -190,7 +193,7 @@
static struct stats_record *alloc_stats_record(void)
{
- unsigned int nr_rxqs = map_data[2].def.max_entries;
+ unsigned int nr_rxqs = bpf_map__def(rx_queue_index_map)->max_entries;
struct stats_record *rec;
int i;
@@ -210,7 +213,7 @@
static void free_stats_record(struct stats_record *r)
{
- unsigned int nr_rxqs = map_data[2].def.max_entries;
+ unsigned int nr_rxqs = bpf_map__def(rx_queue_index_map)->max_entries;
int i;
for (i = 0; i < nr_rxqs; i++)
@@ -254,11 +257,11 @@
{
int fd, i, max_rxqs;
- fd = map_data[1].fd; /* map: stats_global_map */
+ fd = bpf_map__fd(stats_global_map);
map_collect_percpu(fd, 0, &rec->stats);
- fd = map_data[2].fd; /* map: rx_queue_index_map */
- max_rxqs = map_data[2].def.max_entries;
+ fd = bpf_map__fd(rx_queue_index_map);
+ max_rxqs = bpf_map__def(rx_queue_index_map)->max_entries;
for (i = 0; i < max_rxqs; i++)
map_collect_percpu(fd, i, &rec->rxq[i]);
}
@@ -304,8 +307,8 @@
struct stats_record *stats_prev,
int action)
{
+ unsigned int nr_rxqs = bpf_map__def(rx_queue_index_map)->max_entries;
unsigned int nr_cpus = bpf_num_possible_cpus();
- unsigned int nr_rxqs = map_data[2].def.max_entries;
double pps = 0, err = 0;
struct record *rec, *prev;
double t;
@@ -419,31 +422,44 @@
int main(int argc, char **argv)
{
struct rlimit r = {10 * 1024 * 1024, RLIM_INFINITY};
+ struct bpf_prog_load_attr prog_load_attr = {
+ .prog_type = BPF_PROG_TYPE_XDP,
+ };
+ int prog_fd, map_fd, opt, err;
bool use_separators = true;
struct config cfg = { 0 };
+ struct bpf_object *obj;
+ struct bpf_map *map;
char filename[256];
int longindex = 0;
int interval = 2;
__u32 key = 0;
- int opt, err;
char action_str_buf[XDP_ACTION_MAX_STRLEN + 1 /* for \0 */] = { 0 };
int action = XDP_PASS; /* Default action */
char *action_str = NULL;
snprintf(filename, sizeof(filename), "%s_kern.o", argv[0]);
+ prog_load_attr.file = filename;
if (setrlimit(RLIMIT_MEMLOCK, &r)) {
perror("setrlimit(RLIMIT_MEMLOCK)");
return 1;
}
- if (load_bpf_file(filename)) {
- fprintf(stderr, "ERR in load_bpf_file(): %s", bpf_log_buf);
+ if (bpf_prog_load_xattr(&prog_load_attr, &obj, &prog_fd))
+ return EXIT_FAIL;
+
+ map = bpf_map__next(NULL, obj);
+ stats_global_map = bpf_map__next(map, obj);
+ rx_queue_index_map = bpf_map__next(stats_global_map, obj);
+ if (!map || !stats_global_map || !rx_queue_index_map) {
+ printf("finding a map in obj file failed\n");
return EXIT_FAIL;
}
+ map_fd = bpf_map__fd(map);
- if (!prog_fd[0]) {
+ if (!prog_fd) {
fprintf(stderr, "ERR: load_bpf_file: %s\n", strerror(errno));
return EXIT_FAIL;
}
@@ -512,7 +528,7 @@
setlocale(LC_NUMERIC, "en_US");
/* User-side setup ifindex in config_map */
- err = bpf_map_update_elem(map_fd[0], &key, &cfg, 0);
+ err = bpf_map_update_elem(map_fd, &key, &cfg, 0);
if (err) {
fprintf(stderr, "Store config failed (err:%d)\n", err);
exit(EXIT_FAIL_BPF);
@@ -521,7 +537,7 @@
/* Remove XDP program when program is interrupted */
signal(SIGINT, int_exit);
- if (bpf_set_link_xdp_fd(ifindex, prog_fd[0], xdp_flags) < 0) {
+ if (bpf_set_link_xdp_fd(ifindex, prog_fd, xdp_flags) < 0) {
fprintf(stderr, "link set xdp fd failed\n");
return EXIT_FAIL_XDP;
}
diff --git a/samples/bpf/xdp_tx_iptunnel_user.c b/samples/bpf/xdp_tx_iptunnel_user.c
index f0a7872..a4ccc33 100644
--- a/samples/bpf/xdp_tx_iptunnel_user.c
+++ b/samples/bpf/xdp_tx_iptunnel_user.c
@@ -18,7 +18,7 @@
#include <unistd.h>
#include <time.h>
#include "bpf_load.h"
-#include "libbpf.h"
+#include <bpf/bpf.h>
#include "bpf_util.h"
#include "xdp_tx_iptunnel_common.h"
diff --git a/samples/bpf/xdpsock.h b/samples/bpf/xdpsock.h
new file mode 100644
index 0000000..533ab81
--- /dev/null
+++ b/samples/bpf/xdpsock.h
@@ -0,0 +1,11 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef XDPSOCK_H_
+#define XDPSOCK_H_
+
+/* Power-of-2 number of sockets */
+#define MAX_SOCKS 4
+
+/* Round-robin receive */
+#define RR_LB 0
+
+#endif /* XDPSOCK_H_ */
diff --git a/samples/bpf/xdpsock_kern.c b/samples/bpf/xdpsock_kern.c
new file mode 100644
index 0000000..d8806c4
--- /dev/null
+++ b/samples/bpf/xdpsock_kern.c
@@ -0,0 +1,56 @@
+// SPDX-License-Identifier: GPL-2.0
+#define KBUILD_MODNAME "foo"
+#include <uapi/linux/bpf.h>
+#include "bpf_helpers.h"
+
+#include "xdpsock.h"
+
+struct bpf_map_def SEC("maps") qidconf_map = {
+ .type = BPF_MAP_TYPE_ARRAY,
+ .key_size = sizeof(int),
+ .value_size = sizeof(int),
+ .max_entries = 1,
+};
+
+struct bpf_map_def SEC("maps") xsks_map = {
+ .type = BPF_MAP_TYPE_XSKMAP,
+ .key_size = sizeof(int),
+ .value_size = sizeof(int),
+ .max_entries = 4,
+};
+
+struct bpf_map_def SEC("maps") rr_map = {
+ .type = BPF_MAP_TYPE_PERCPU_ARRAY,
+ .key_size = sizeof(int),
+ .value_size = sizeof(unsigned int),
+ .max_entries = 1,
+};
+
+SEC("xdp_sock")
+int xdp_sock_prog(struct xdp_md *ctx)
+{
+ int *qidconf, key = 0, idx;
+ unsigned int *rr;
+
+ qidconf = bpf_map_lookup_elem(&qidconf_map, &key);
+ if (!qidconf)
+ return XDP_ABORTED;
+
+ if (*qidconf != ctx->rx_queue_index)
+ return XDP_PASS;
+
+#if RR_LB /* NB! RR_LB is configured in xdpsock.h */
+ rr = bpf_map_lookup_elem(&rr_map, &key);
+ if (!rr)
+ return XDP_ABORTED;
+
+ *rr = (*rr + 1) & (MAX_SOCKS - 1);
+ idx = *rr;
+#else
+ idx = 0;
+#endif
+
+ return bpf_redirect_map(&xsks_map, idx, 0);
+}
+
+char _license[] SEC("license") = "GPL";
diff --git a/samples/bpf/xdpsock_user.c b/samples/bpf/xdpsock_user.c
new file mode 100644
index 0000000..e379eac
--- /dev/null
+++ b/samples/bpf/xdpsock_user.c
@@ -0,0 +1,967 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright(c) 2017 - 2018 Intel Corporation. */
+
+#include <assert.h>
+#include <errno.h>
+#include <getopt.h>
+#include <libgen.h>
+#include <linux/bpf.h>
+#include <linux/if_link.h>
+#include <linux/if_xdp.h>
+#include <linux/if_ether.h>
+#include <net/if.h>
+#include <signal.h>
+#include <stdbool.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <net/ethernet.h>
+#include <sys/resource.h>
+#include <sys/socket.h>
+#include <sys/mman.h>
+#include <time.h>
+#include <unistd.h>
+#include <pthread.h>
+#include <locale.h>
+#include <sys/types.h>
+#include <poll.h>
+
+#include "bpf_load.h"
+#include "bpf_util.h"
+#include <bpf/bpf.h>
+
+#include "xdpsock.h"
+
+#ifndef SOL_XDP
+#define SOL_XDP 283
+#endif
+
+#ifndef AF_XDP
+#define AF_XDP 44
+#endif
+
+#ifndef PF_XDP
+#define PF_XDP AF_XDP
+#endif
+
+#define NUM_FRAMES 131072
+#define FRAME_HEADROOM 0
+#define FRAME_SIZE 2048
+#define NUM_DESCS 1024
+#define BATCH_SIZE 16
+
+#define FQ_NUM_DESCS 1024
+#define CQ_NUM_DESCS 1024
+
+#define DEBUG_HEXDUMP 0
+
+typedef __u32 u32;
+
+static unsigned long prev_time;
+
+enum benchmark_type {
+ BENCH_RXDROP = 0,
+ BENCH_TXONLY = 1,
+ BENCH_L2FWD = 2,
+};
+
+static enum benchmark_type opt_bench = BENCH_RXDROP;
+static u32 opt_xdp_flags;
+static const char *opt_if = "";
+static int opt_ifindex;
+static int opt_queue;
+static int opt_poll;
+static int opt_shared_packet_buffer;
+static int opt_interval = 1;
+
+struct xdp_umem_uqueue {
+ u32 cached_prod;
+ u32 cached_cons;
+ u32 mask;
+ u32 size;
+ u32 *producer;
+ u32 *consumer;
+ u32 *ring;
+ void *map;
+};
+
+struct xdp_umem {
+ char (*frames)[FRAME_SIZE];
+ struct xdp_umem_uqueue fq;
+ struct xdp_umem_uqueue cq;
+ int fd;
+};
+
+struct xdp_uqueue {
+ u32 cached_prod;
+ u32 cached_cons;
+ u32 mask;
+ u32 size;
+ u32 *producer;
+ u32 *consumer;
+ struct xdp_desc *ring;
+ void *map;
+};
+
+struct xdpsock {
+ struct xdp_uqueue rx;
+ struct xdp_uqueue tx;
+ int sfd;
+ struct xdp_umem *umem;
+ u32 outstanding_tx;
+ unsigned long rx_npkts;
+ unsigned long tx_npkts;
+ unsigned long prev_rx_npkts;
+ unsigned long prev_tx_npkts;
+};
+
+#define MAX_SOCKS 4
+static int num_socks;
+struct xdpsock *xsks[MAX_SOCKS];
+
+static unsigned long get_nsecs(void)
+{
+ struct timespec ts;
+
+ clock_gettime(CLOCK_MONOTONIC, &ts);
+ return ts.tv_sec * 1000000000UL + ts.tv_nsec;
+}
+
+static void dump_stats(void);
+
+#define lassert(expr) \
+ do { \
+ if (!(expr)) { \
+ fprintf(stderr, "%s:%s:%i: Assertion failed: " \
+ #expr ": errno: %d/\"%s\"\n", \
+ __FILE__, __func__, __LINE__, \
+ errno, strerror(errno)); \
+ dump_stats(); \
+ exit(EXIT_FAILURE); \
+ } \
+ } while (0)
+
+#define barrier() __asm__ __volatile__("": : :"memory")
+#define u_smp_rmb() barrier()
+#define u_smp_wmb() barrier()
+#define likely(x) __builtin_expect(!!(x), 1)
+#define unlikely(x) __builtin_expect(!!(x), 0)
+
+static const char pkt_data[] =
+ "\x3c\xfd\xfe\x9e\x7f\x71\xec\xb1\xd7\x98\x3a\xc0\x08\x00\x45\x00"
+ "\x00\x2e\x00\x00\x00\x00\x40\x11\x88\x97\x05\x08\x07\x08\xc8\x14"
+ "\x1e\x04\x10\x92\x10\x92\x00\x1a\x6d\xa3\x34\x33\x1f\x69\x40\x6b"
+ "\x54\x59\xb6\x14\x2d\x11\x44\xbf\xaf\xd9\xbe\xaa";
+
+static inline u32 umem_nb_free(struct xdp_umem_uqueue *q, u32 nb)
+{
+ u32 free_entries = q->size - (q->cached_prod - q->cached_cons);
+
+ if (free_entries >= nb)
+ return free_entries;
+
+ /* Refresh the local tail pointer */
+ q->cached_cons = *q->consumer;
+
+ return q->size - (q->cached_prod - q->cached_cons);
+}
+
+static inline u32 xq_nb_free(struct xdp_uqueue *q, u32 ndescs)
+{
+ u32 free_entries = q->cached_cons - q->cached_prod;
+
+ if (free_entries >= ndescs)
+ return free_entries;
+
+ /* Refresh the local tail pointer */
+ q->cached_cons = *q->consumer + q->size;
+ return q->cached_cons - q->cached_prod;
+}
+
+static inline u32 umem_nb_avail(struct xdp_umem_uqueue *q, u32 nb)
+{
+ u32 entries = q->cached_prod - q->cached_cons;
+
+ if (entries == 0) {
+ q->cached_prod = *q->producer;
+ entries = q->cached_prod - q->cached_cons;
+ }
+
+ return (entries > nb) ? nb : entries;
+}
+
+static inline u32 xq_nb_avail(struct xdp_uqueue *q, u32 ndescs)
+{
+ u32 entries = q->cached_prod - q->cached_cons;
+
+ if (entries == 0) {
+ q->cached_prod = *q->producer;
+ entries = q->cached_prod - q->cached_cons;
+ }
+
+ return (entries > ndescs) ? ndescs : entries;
+}
+
+static inline int umem_fill_to_kernel_ex(struct xdp_umem_uqueue *fq,
+ struct xdp_desc *d,
+ size_t nb)
+{
+ u32 i;
+
+ if (umem_nb_free(fq, nb) < nb)
+ return -ENOSPC;
+
+ for (i = 0; i < nb; i++) {
+ u32 idx = fq->cached_prod++ & fq->mask;
+
+ fq->ring[idx] = d[i].idx;
+ }
+
+ u_smp_wmb();
+
+ *fq->producer = fq->cached_prod;
+
+ return 0;
+}
+
+static inline int umem_fill_to_kernel(struct xdp_umem_uqueue *fq, u32 *d,
+ size_t nb)
+{
+ u32 i;
+
+ if (umem_nb_free(fq, nb) < nb)
+ return -ENOSPC;
+
+ for (i = 0; i < nb; i++) {
+ u32 idx = fq->cached_prod++ & fq->mask;
+
+ fq->ring[idx] = d[i];
+ }
+
+ u_smp_wmb();
+
+ *fq->producer = fq->cached_prod;
+
+ return 0;
+}
+
+static inline size_t umem_complete_from_kernel(struct xdp_umem_uqueue *cq,
+ u32 *d, size_t nb)
+{
+ u32 idx, i, entries = umem_nb_avail(cq, nb);
+
+ u_smp_rmb();
+
+ for (i = 0; i < entries; i++) {
+ idx = cq->cached_cons++ & cq->mask;
+ d[i] = cq->ring[idx];
+ }
+
+ if (entries > 0) {
+ u_smp_wmb();
+
+ *cq->consumer = cq->cached_cons;
+ }
+
+ return entries;
+}
+
+static inline void *xq_get_data(struct xdpsock *xsk, __u32 idx, __u32 off)
+{
+ lassert(idx < NUM_FRAMES);
+ return &xsk->umem->frames[idx][off];
+}
+
+static inline int xq_enq(struct xdp_uqueue *uq,
+ const struct xdp_desc *descs,
+ unsigned int ndescs)
+{
+ struct xdp_desc *r = uq->ring;
+ unsigned int i;
+
+ if (xq_nb_free(uq, ndescs) < ndescs)
+ return -ENOSPC;
+
+ for (i = 0; i < ndescs; i++) {
+ u32 idx = uq->cached_prod++ & uq->mask;
+
+ r[idx].idx = descs[i].idx;
+ r[idx].len = descs[i].len;
+ r[idx].offset = descs[i].offset;
+ }
+
+ u_smp_wmb();
+
+ *uq->producer = uq->cached_prod;
+ return 0;
+}
+
+static inline int xq_enq_tx_only(struct xdp_uqueue *uq,
+ __u32 idx, unsigned int ndescs)
+{
+ struct xdp_desc *r = uq->ring;
+ unsigned int i;
+
+ if (xq_nb_free(uq, ndescs) < ndescs)
+ return -ENOSPC;
+
+ for (i = 0; i < ndescs; i++) {
+ u32 idx = uq->cached_prod++ & uq->mask;
+
+ r[idx].idx = idx + i;
+ r[idx].len = sizeof(pkt_data) - 1;
+ r[idx].offset = 0;
+ }
+
+ u_smp_wmb();
+
+ *uq->producer = uq->cached_prod;
+ return 0;
+}
+
+static inline int xq_deq(struct xdp_uqueue *uq,
+ struct xdp_desc *descs,
+ int ndescs)
+{
+ struct xdp_desc *r = uq->ring;
+ unsigned int idx;
+ int i, entries;
+
+ entries = xq_nb_avail(uq, ndescs);
+
+ u_smp_rmb();
+
+ for (i = 0; i < entries; i++) {
+ idx = uq->cached_cons++ & uq->mask;
+ descs[i] = r[idx];
+ }
+
+ if (entries > 0) {
+ u_smp_wmb();
+
+ *uq->consumer = uq->cached_cons;
+ }
+
+ return entries;
+}
+
+static void swap_mac_addresses(void *data)
+{
+ struct ether_header *eth = (struct ether_header *)data;
+ struct ether_addr *src_addr = (struct ether_addr *)ð->ether_shost;
+ struct ether_addr *dst_addr = (struct ether_addr *)ð->ether_dhost;
+ struct ether_addr tmp;
+
+ tmp = *src_addr;
+ *src_addr = *dst_addr;
+ *dst_addr = tmp;
+}
+
+#if DEBUG_HEXDUMP
+static void hex_dump(void *pkt, size_t length, const char *prefix)
+{
+ int i = 0;
+ const unsigned char *address = (unsigned char *)pkt;
+ const unsigned char *line = address;
+ size_t line_size = 32;
+ unsigned char c;
+
+ printf("length = %zu\n", length);
+ printf("%s | ", prefix);
+ while (length-- > 0) {
+ printf("%02X ", *address++);
+ if (!(++i % line_size) || (length == 0 && i % line_size)) {
+ if (length == 0) {
+ while (i++ % line_size)
+ printf("__ ");
+ }
+ printf(" | "); /* right close */
+ while (line < address) {
+ c = *line++;
+ printf("%c", (c < 33 || c == 255) ? 0x2E : c);
+ }
+ printf("\n");
+ if (length > 0)
+ printf("%s | ", prefix);
+ }
+ }
+ printf("\n");
+}
+#endif
+
+static size_t gen_eth_frame(char *frame)
+{
+ memcpy(frame, pkt_data, sizeof(pkt_data) - 1);
+ return sizeof(pkt_data) - 1;
+}
+
+static struct xdp_umem *xdp_umem_configure(int sfd)
+{
+ int fq_size = FQ_NUM_DESCS, cq_size = CQ_NUM_DESCS;
+ struct xdp_mmap_offsets off;
+ struct xdp_umem_reg mr;
+ struct xdp_umem *umem;
+ socklen_t optlen;
+ void *bufs;
+
+ umem = calloc(1, sizeof(*umem));
+ lassert(umem);
+
+ lassert(posix_memalign(&bufs, getpagesize(), /* PAGE_SIZE aligned */
+ NUM_FRAMES * FRAME_SIZE) == 0);
+
+ mr.addr = (__u64)bufs;
+ mr.len = NUM_FRAMES * FRAME_SIZE;
+ mr.frame_size = FRAME_SIZE;
+ mr.frame_headroom = FRAME_HEADROOM;
+
+ lassert(setsockopt(sfd, SOL_XDP, XDP_UMEM_REG, &mr, sizeof(mr)) == 0);
+ lassert(setsockopt(sfd, SOL_XDP, XDP_UMEM_FILL_RING, &fq_size,
+ sizeof(int)) == 0);
+ lassert(setsockopt(sfd, SOL_XDP, XDP_UMEM_COMPLETION_RING, &cq_size,
+ sizeof(int)) == 0);
+
+ optlen = sizeof(off);
+ lassert(getsockopt(sfd, SOL_XDP, XDP_MMAP_OFFSETS, &off,
+ &optlen) == 0);
+
+ umem->fq.map = mmap(0, off.fr.desc +
+ FQ_NUM_DESCS * sizeof(u32),
+ PROT_READ | PROT_WRITE,
+ MAP_SHARED | MAP_POPULATE, sfd,
+ XDP_UMEM_PGOFF_FILL_RING);
+ lassert(umem->fq.map != MAP_FAILED);
+
+ umem->fq.mask = FQ_NUM_DESCS - 1;
+ umem->fq.size = FQ_NUM_DESCS;
+ umem->fq.producer = umem->fq.map + off.fr.producer;
+ umem->fq.consumer = umem->fq.map + off.fr.consumer;
+ umem->fq.ring = umem->fq.map + off.fr.desc;
+
+ umem->cq.map = mmap(0, off.cr.desc +
+ CQ_NUM_DESCS * sizeof(u32),
+ PROT_READ | PROT_WRITE,
+ MAP_SHARED | MAP_POPULATE, sfd,
+ XDP_UMEM_PGOFF_COMPLETION_RING);
+ lassert(umem->cq.map != MAP_FAILED);
+
+ umem->cq.mask = CQ_NUM_DESCS - 1;
+ umem->cq.size = CQ_NUM_DESCS;
+ umem->cq.producer = umem->cq.map + off.cr.producer;
+ umem->cq.consumer = umem->cq.map + off.cr.consumer;
+ umem->cq.ring = umem->cq.map + off.cr.desc;
+
+ umem->frames = (char (*)[FRAME_SIZE])bufs;
+ umem->fd = sfd;
+
+ if (opt_bench == BENCH_TXONLY) {
+ int i;
+
+ for (i = 0; i < NUM_FRAMES; i++)
+ (void)gen_eth_frame(&umem->frames[i][0]);
+ }
+
+ return umem;
+}
+
+static struct xdpsock *xsk_configure(struct xdp_umem *umem)
+{
+ struct sockaddr_xdp sxdp = {};
+ struct xdp_mmap_offsets off;
+ int sfd, ndescs = NUM_DESCS;
+ struct xdpsock *xsk;
+ bool shared = true;
+ socklen_t optlen;
+ u32 i;
+
+ sfd = socket(PF_XDP, SOCK_RAW, 0);
+ lassert(sfd >= 0);
+
+ xsk = calloc(1, sizeof(*xsk));
+ lassert(xsk);
+
+ xsk->sfd = sfd;
+ xsk->outstanding_tx = 0;
+
+ if (!umem) {
+ shared = false;
+ xsk->umem = xdp_umem_configure(sfd);
+ } else {
+ xsk->umem = umem;
+ }
+
+ lassert(setsockopt(sfd, SOL_XDP, XDP_RX_RING,
+ &ndescs, sizeof(int)) == 0);
+ lassert(setsockopt(sfd, SOL_XDP, XDP_TX_RING,
+ &ndescs, sizeof(int)) == 0);
+ optlen = sizeof(off);
+ lassert(getsockopt(sfd, SOL_XDP, XDP_MMAP_OFFSETS, &off,
+ &optlen) == 0);
+
+ /* Rx */
+ xsk->rx.map = mmap(NULL,
+ off.rx.desc +
+ NUM_DESCS * sizeof(struct xdp_desc),
+ PROT_READ | PROT_WRITE,
+ MAP_SHARED | MAP_POPULATE, sfd,
+ XDP_PGOFF_RX_RING);
+ lassert(xsk->rx.map != MAP_FAILED);
+
+ if (!shared) {
+ for (i = 0; i < NUM_DESCS / 2; i++)
+ lassert(umem_fill_to_kernel(&xsk->umem->fq, &i, 1)
+ == 0);
+ }
+
+ /* Tx */
+ xsk->tx.map = mmap(NULL,
+ off.tx.desc +
+ NUM_DESCS * sizeof(struct xdp_desc),
+ PROT_READ | PROT_WRITE,
+ MAP_SHARED | MAP_POPULATE, sfd,
+ XDP_PGOFF_TX_RING);
+ lassert(xsk->tx.map != MAP_FAILED);
+
+ xsk->rx.mask = NUM_DESCS - 1;
+ xsk->rx.size = NUM_DESCS;
+ xsk->rx.producer = xsk->rx.map + off.rx.producer;
+ xsk->rx.consumer = xsk->rx.map + off.rx.consumer;
+ xsk->rx.ring = xsk->rx.map + off.rx.desc;
+
+ xsk->tx.mask = NUM_DESCS - 1;
+ xsk->tx.size = NUM_DESCS;
+ xsk->tx.producer = xsk->tx.map + off.tx.producer;
+ xsk->tx.consumer = xsk->tx.map + off.tx.consumer;
+ xsk->tx.ring = xsk->tx.map + off.tx.desc;
+
+ sxdp.sxdp_family = PF_XDP;
+ sxdp.sxdp_ifindex = opt_ifindex;
+ sxdp.sxdp_queue_id = opt_queue;
+ if (shared) {
+ sxdp.sxdp_flags = XDP_SHARED_UMEM;
+ sxdp.sxdp_shared_umem_fd = umem->fd;
+ }
+
+ lassert(bind(sfd, (struct sockaddr *)&sxdp, sizeof(sxdp)) == 0);
+
+ return xsk;
+}
+
+static void print_benchmark(bool running)
+{
+ const char *bench_str = "INVALID";
+
+ if (opt_bench == BENCH_RXDROP)
+ bench_str = "rxdrop";
+ else if (opt_bench == BENCH_TXONLY)
+ bench_str = "txonly";
+ else if (opt_bench == BENCH_L2FWD)
+ bench_str = "l2fwd";
+
+ printf("%s:%d %s ", opt_if, opt_queue, bench_str);
+ if (opt_xdp_flags & XDP_FLAGS_SKB_MODE)
+ printf("xdp-skb ");
+ else if (opt_xdp_flags & XDP_FLAGS_DRV_MODE)
+ printf("xdp-drv ");
+ else
+ printf(" ");
+
+ if (opt_poll)
+ printf("poll() ");
+
+ if (running) {
+ printf("running...");
+ fflush(stdout);
+ }
+}
+
+static void dump_stats(void)
+{
+ unsigned long now = get_nsecs();
+ long dt = now - prev_time;
+ int i;
+
+ prev_time = now;
+
+ for (i = 0; i < num_socks; i++) {
+ char *fmt = "%-15s %'-11.0f %'-11lu\n";
+ double rx_pps, tx_pps;
+
+ rx_pps = (xsks[i]->rx_npkts - xsks[i]->prev_rx_npkts) *
+ 1000000000. / dt;
+ tx_pps = (xsks[i]->tx_npkts - xsks[i]->prev_tx_npkts) *
+ 1000000000. / dt;
+
+ printf("\n sock%d@", i);
+ print_benchmark(false);
+ printf("\n");
+
+ printf("%-15s %-11s %-11s %-11.2f\n", "", "pps", "pkts",
+ dt / 1000000000.);
+ printf(fmt, "rx", rx_pps, xsks[i]->rx_npkts);
+ printf(fmt, "tx", tx_pps, xsks[i]->tx_npkts);
+
+ xsks[i]->prev_rx_npkts = xsks[i]->rx_npkts;
+ xsks[i]->prev_tx_npkts = xsks[i]->tx_npkts;
+ }
+}
+
+static void *poller(void *arg)
+{
+ (void)arg;
+ for (;;) {
+ sleep(opt_interval);
+ dump_stats();
+ }
+
+ return NULL;
+}
+
+static void int_exit(int sig)
+{
+ (void)sig;
+ dump_stats();
+ bpf_set_link_xdp_fd(opt_ifindex, -1, opt_xdp_flags);
+ exit(EXIT_SUCCESS);
+}
+
+static struct option long_options[] = {
+ {"rxdrop", no_argument, 0, 'r'},
+ {"txonly", no_argument, 0, 't'},
+ {"l2fwd", no_argument, 0, 'l'},
+ {"interface", required_argument, 0, 'i'},
+ {"queue", required_argument, 0, 'q'},
+ {"poll", no_argument, 0, 'p'},
+ {"shared-buffer", no_argument, 0, 's'},
+ {"xdp-skb", no_argument, 0, 'S'},
+ {"xdp-native", no_argument, 0, 'N'},
+ {"interval", required_argument, 0, 'n'},
+ {0, 0, 0, 0}
+};
+
+static void usage(const char *prog)
+{
+ const char *str =
+ " Usage: %s [OPTIONS]\n"
+ " Options:\n"
+ " -r, --rxdrop Discard all incoming packets (default)\n"
+ " -t, --txonly Only send packets\n"
+ " -l, --l2fwd MAC swap L2 forwarding\n"
+ " -i, --interface=n Run on interface n\n"
+ " -q, --queue=n Use queue n (default 0)\n"
+ " -p, --poll Use poll syscall\n"
+ " -s, --shared-buffer Use shared packet buffer\n"
+ " -S, --xdp-skb=n Use XDP skb-mod\n"
+ " -N, --xdp-native=n Enfore XDP native mode\n"
+ " -n, --interval=n Specify statistics update interval (default 1 sec).\n"
+ "\n";
+ fprintf(stderr, str, prog);
+ exit(EXIT_FAILURE);
+}
+
+static void parse_command_line(int argc, char **argv)
+{
+ int option_index, c;
+
+ opterr = 0;
+
+ for (;;) {
+ c = getopt_long(argc, argv, "rtli:q:psSNn:", long_options,
+ &option_index);
+ if (c == -1)
+ break;
+
+ switch (c) {
+ case 'r':
+ opt_bench = BENCH_RXDROP;
+ break;
+ case 't':
+ opt_bench = BENCH_TXONLY;
+ break;
+ case 'l':
+ opt_bench = BENCH_L2FWD;
+ break;
+ case 'i':
+ opt_if = optarg;
+ break;
+ case 'q':
+ opt_queue = atoi(optarg);
+ break;
+ case 's':
+ opt_shared_packet_buffer = 1;
+ break;
+ case 'p':
+ opt_poll = 1;
+ break;
+ case 'S':
+ opt_xdp_flags |= XDP_FLAGS_SKB_MODE;
+ break;
+ case 'N':
+ opt_xdp_flags |= XDP_FLAGS_DRV_MODE;
+ break;
+ case 'n':
+ opt_interval = atoi(optarg);
+ break;
+ default:
+ usage(basename(argv[0]));
+ }
+ }
+
+ opt_ifindex = if_nametoindex(opt_if);
+ if (!opt_ifindex) {
+ fprintf(stderr, "ERROR: interface \"%s\" does not exist\n",
+ opt_if);
+ usage(basename(argv[0]));
+ }
+}
+
+static void kick_tx(int fd)
+{
+ int ret;
+
+ ret = sendto(fd, NULL, 0, MSG_DONTWAIT, NULL, 0);
+ if (ret >= 0 || errno == ENOBUFS || errno == EAGAIN)
+ return;
+ lassert(0);
+}
+
+static inline void complete_tx_l2fwd(struct xdpsock *xsk)
+{
+ u32 descs[BATCH_SIZE];
+ unsigned int rcvd;
+ size_t ndescs;
+
+ if (!xsk->outstanding_tx)
+ return;
+
+ kick_tx(xsk->sfd);
+ ndescs = (xsk->outstanding_tx > BATCH_SIZE) ? BATCH_SIZE :
+ xsk->outstanding_tx;
+
+ /* re-add completed Tx buffers */
+ rcvd = umem_complete_from_kernel(&xsk->umem->cq, descs, ndescs);
+ if (rcvd > 0) {
+ umem_fill_to_kernel(&xsk->umem->fq, descs, rcvd);
+ xsk->outstanding_tx -= rcvd;
+ xsk->tx_npkts += rcvd;
+ }
+}
+
+static inline void complete_tx_only(struct xdpsock *xsk)
+{
+ u32 descs[BATCH_SIZE];
+ unsigned int rcvd;
+
+ if (!xsk->outstanding_tx)
+ return;
+
+ kick_tx(xsk->sfd);
+
+ rcvd = umem_complete_from_kernel(&xsk->umem->cq, descs, BATCH_SIZE);
+ if (rcvd > 0) {
+ xsk->outstanding_tx -= rcvd;
+ xsk->tx_npkts += rcvd;
+ }
+}
+
+static void rx_drop(struct xdpsock *xsk)
+{
+ struct xdp_desc descs[BATCH_SIZE];
+ unsigned int rcvd, i;
+
+ rcvd = xq_deq(&xsk->rx, descs, BATCH_SIZE);
+ if (!rcvd)
+ return;
+
+ for (i = 0; i < rcvd; i++) {
+ u32 idx = descs[i].idx;
+
+ lassert(idx < NUM_FRAMES);
+#if DEBUG_HEXDUMP
+ char *pkt;
+ char buf[32];
+
+ pkt = xq_get_data(xsk, idx, descs[i].offset);
+ sprintf(buf, "idx=%d", idx);
+ hex_dump(pkt, descs[i].len, buf);
+#endif
+ }
+
+ xsk->rx_npkts += rcvd;
+
+ umem_fill_to_kernel_ex(&xsk->umem->fq, descs, rcvd);
+}
+
+static void rx_drop_all(void)
+{
+ struct pollfd fds[MAX_SOCKS + 1];
+ int i, ret, timeout, nfds = 1;
+
+ memset(fds, 0, sizeof(fds));
+
+ for (i = 0; i < num_socks; i++) {
+ fds[i].fd = xsks[i]->sfd;
+ fds[i].events = POLLIN;
+ timeout = 1000; /* 1sn */
+ }
+
+ for (;;) {
+ if (opt_poll) {
+ ret = poll(fds, nfds, timeout);
+ if (ret <= 0)
+ continue;
+ }
+
+ for (i = 0; i < num_socks; i++)
+ rx_drop(xsks[i]);
+ }
+}
+
+static void tx_only(struct xdpsock *xsk)
+{
+ int timeout, ret, nfds = 1;
+ struct pollfd fds[nfds + 1];
+ unsigned int idx = 0;
+
+ memset(fds, 0, sizeof(fds));
+ fds[0].fd = xsk->sfd;
+ fds[0].events = POLLOUT;
+ timeout = 1000; /* 1sn */
+
+ for (;;) {
+ if (opt_poll) {
+ ret = poll(fds, nfds, timeout);
+ if (ret <= 0)
+ continue;
+
+ if (fds[0].fd != xsk->sfd ||
+ !(fds[0].revents & POLLOUT))
+ continue;
+ }
+
+ if (xq_nb_free(&xsk->tx, BATCH_SIZE) >= BATCH_SIZE) {
+ lassert(xq_enq_tx_only(&xsk->tx, idx, BATCH_SIZE) == 0);
+
+ xsk->outstanding_tx += BATCH_SIZE;
+ idx += BATCH_SIZE;
+ idx %= NUM_FRAMES;
+ }
+
+ complete_tx_only(xsk);
+ }
+}
+
+static void l2fwd(struct xdpsock *xsk)
+{
+ for (;;) {
+ struct xdp_desc descs[BATCH_SIZE];
+ unsigned int rcvd, i;
+ int ret;
+
+ for (;;) {
+ complete_tx_l2fwd(xsk);
+
+ rcvd = xq_deq(&xsk->rx, descs, BATCH_SIZE);
+ if (rcvd > 0)
+ break;
+ }
+
+ for (i = 0; i < rcvd; i++) {
+ char *pkt = xq_get_data(xsk, descs[i].idx,
+ descs[i].offset);
+
+ swap_mac_addresses(pkt);
+#if DEBUG_HEXDUMP
+ char buf[32];
+ u32 idx = descs[i].idx;
+
+ sprintf(buf, "idx=%d", idx);
+ hex_dump(pkt, descs[i].len, buf);
+#endif
+ }
+
+ xsk->rx_npkts += rcvd;
+
+ ret = xq_enq(&xsk->tx, descs, rcvd);
+ lassert(ret == 0);
+ xsk->outstanding_tx += rcvd;
+ }
+}
+
+int main(int argc, char **argv)
+{
+ struct rlimit r = {RLIM_INFINITY, RLIM_INFINITY};
+ char xdp_filename[256];
+ int i, ret, key = 0;
+ pthread_t pt;
+
+ parse_command_line(argc, argv);
+
+ if (setrlimit(RLIMIT_MEMLOCK, &r)) {
+ fprintf(stderr, "ERROR: setrlimit(RLIMIT_MEMLOCK) \"%s\"\n",
+ strerror(errno));
+ exit(EXIT_FAILURE);
+ }
+
+ snprintf(xdp_filename, sizeof(xdp_filename), "%s_kern.o", argv[0]);
+
+ if (load_bpf_file(xdp_filename)) {
+ fprintf(stderr, "ERROR: load_bpf_file %s\n", bpf_log_buf);
+ exit(EXIT_FAILURE);
+ }
+
+ if (!prog_fd[0]) {
+ fprintf(stderr, "ERROR: load_bpf_file: \"%s\"\n",
+ strerror(errno));
+ exit(EXIT_FAILURE);
+ }
+
+ if (bpf_set_link_xdp_fd(opt_ifindex, prog_fd[0], opt_xdp_flags) < 0) {
+ fprintf(stderr, "ERROR: link set xdp fd failed\n");
+ exit(EXIT_FAILURE);
+ }
+
+ ret = bpf_map_update_elem(map_fd[0], &key, &opt_queue, 0);
+ if (ret) {
+ fprintf(stderr, "ERROR: bpf_map_update_elem qidconf\n");
+ exit(EXIT_FAILURE);
+ }
+
+ /* Create sockets... */
+ xsks[num_socks++] = xsk_configure(NULL);
+
+#if RR_LB
+ for (i = 0; i < MAX_SOCKS - 1; i++)
+ xsks[num_socks++] = xsk_configure(xsks[0]->umem);
+#endif
+
+ /* ...and insert them into the map. */
+ for (i = 0; i < num_socks; i++) {
+ key = i;
+ ret = bpf_map_update_elem(map_fd[1], &key, &xsks[i]->sfd, 0);
+ if (ret) {
+ fprintf(stderr, "ERROR: bpf_map_update_elem %d\n", i);
+ exit(EXIT_FAILURE);
+ }
+ }
+
+ signal(SIGINT, int_exit);
+ signal(SIGTERM, int_exit);
+ signal(SIGABRT, int_exit);
+
+ setlocale(LC_ALL, "");
+
+ ret = pthread_create(&pt, NULL, poller, NULL);
+ lassert(ret == 0);
+
+ prev_time = get_nsecs();
+
+ if (opt_bench == BENCH_RXDROP)
+ rx_drop_all();
+ else if (opt_bench == BENCH_TXONLY)
+ tx_only(xsks[0]);
+ else
+ l2fwd(xsks[0]);
+
+ return 0;
+}
diff --git a/samples/sockmap/Makefile b/samples/sockmap/Makefile
deleted file mode 100644
index fa53f4d..0000000
--- a/samples/sockmap/Makefile
+++ /dev/null
@@ -1,78 +0,0 @@
-# List of programs to build
-hostprogs-y := sockmap
-
-# Libbpf dependencies
-LIBBPF := ../../tools/lib/bpf/bpf.o ../../tools/lib/bpf/nlattr.o
-
-HOSTCFLAGS += -I$(objtree)/usr/include
-HOSTCFLAGS += -I$(srctree)/tools/lib/
-HOSTCFLAGS += -I$(srctree)/tools/testing/selftests/bpf/
-HOSTCFLAGS += -I$(srctree)/tools/lib/ -I$(srctree)/tools/include
-HOSTCFLAGS += -I$(srctree)/tools/perf
-
-sockmap-objs := ../bpf/bpf_load.o $(LIBBPF) sockmap_user.o
-
-# Tell kbuild to always build the programs
-always := $(hostprogs-y)
-always += sockmap_kern.o
-
-HOSTLOADLIBES_sockmap += -lelf -lpthread
-
-# Allows pointing LLC/CLANG to a LLVM backend with bpf support, redefine on cmdline:
-# make samples/bpf/ LLC=~/git/llvm/build/bin/llc CLANG=~/git/llvm/build/bin/clang
-LLC ?= llc
-CLANG ?= clang
-
-# Trick to allow make to be run from this directory
-all:
- $(MAKE) -C ../../ $(CURDIR)/
-
-clean:
- $(MAKE) -C ../../ M=$(CURDIR) clean
- @rm -f *~
-
-$(obj)/syscall_nrs.s: $(src)/syscall_nrs.c
- $(call if_changed_dep,cc_s_c)
-
-$(obj)/syscall_nrs.h: $(obj)/syscall_nrs.s FORCE
- $(call filechk,offsets,__SYSCALL_NRS_H__)
-
-clean-files += syscall_nrs.h
-
-FORCE:
-
-
-# Verify LLVM compiler tools are available and bpf target is supported by llc
-.PHONY: verify_cmds verify_target_bpf $(CLANG) $(LLC)
-
-verify_cmds: $(CLANG) $(LLC)
- @for TOOL in $^ ; do \
- if ! (which -- "$${TOOL}" > /dev/null 2>&1); then \
- echo "*** ERROR: Cannot find LLVM tool $${TOOL}" ;\
- exit 1; \
- else true; fi; \
- done
-
-verify_target_bpf: verify_cmds
- @if ! (${LLC} -march=bpf -mattr=help > /dev/null 2>&1); then \
- echo "*** ERROR: LLVM (${LLC}) does not support 'bpf' target" ;\
- echo " NOTICE: LLVM version >= 3.7.1 required" ;\
- exit 2; \
- else true; fi
-
-$(src)/*.c: verify_target_bpf
-
-# asm/sysreg.h - inline assembly used by it is incompatible with llvm.
-# But, there is no easy way to fix it, so just exclude it since it is
-# useless for BPF samples.
-#
-# -target bpf option required with SK_MSG programs, this is to ensure
-# reading 'void *' data types for data and data_end are __u64 reads.
-$(obj)/%.o: $(src)/%.c
- $(CLANG) $(NOSTDINC_FLAGS) $(LINUXINCLUDE) $(EXTRA_CFLAGS) -I$(obj) \
- -D__KERNEL__ -D__ASM_SYSREG_H -Wno-unused-value -Wno-pointer-sign \
- -Wno-compare-distinct-pointer-types \
- -Wno-gnu-variable-sized-type-not-at-end \
- -Wno-address-of-packed-member -Wno-tautological-compare \
- -Wno-unknown-warning-option -O2 -target bpf \
- -emit-llvm -c $< -o -| $(LLC) -march=bpf -filetype=obj -o $@
diff --git a/samples/sockmap/sockmap_test.sh b/samples/sockmap/sockmap_test.sh
deleted file mode 100755
index ace75f0..0000000
--- a/samples/sockmap/sockmap_test.sh
+++ /dev/null
@@ -1,488 +0,0 @@
-#Test a bunch of positive cases to verify basic functionality
-for prog in "--txmsg_redir --txmsg_skb" "--txmsg_redir --txmsg_ingress" "--txmsg" "--txmsg_redir" "--txmsg_redir --txmsg_ingress" "--txmsg_drop"; do
-for t in "sendmsg" "sendpage"; do
-for r in 1 10 100; do
- for i in 1 10 100; do
- for l in 1 10 100; do
- TEST="./sockmap --cgroup /mnt/cgroup2/ -t $t -r $r -i $i -l $l $prog"
- echo $TEST
- $TEST
- sleep 2
- done
- done
-done
-done
-done
-
-#Test max iov
-t="sendmsg"
-r=1
-i=1024
-l=1
-prog="--txmsg"
-
-TEST="./sockmap --cgroup /mnt/cgroup2/ -t $t -r $r -i $i -l $l $prog"
-echo $TEST
-$TEST
-sleep 2
-prog="--txmsg_redir"
-TEST="./sockmap --cgroup /mnt/cgroup2/ -t $t -r $r -i $i -l $l $prog"
-echo $TEST
-$TEST
-
-# Test max iov with 1k send
-
-t="sendmsg"
-r=1
-i=1024
-l=1024
-prog="--txmsg"
-
-TEST="./sockmap --cgroup /mnt/cgroup2/ -t $t -r $r -i $i -l $l $prog"
-echo $TEST
-$TEST
-sleep 2
-prog="--txmsg_redir"
-TEST="./sockmap --cgroup /mnt/cgroup2/ -t $t -r $r -i $i -l $l $prog"
-echo $TEST
-$TEST
-sleep 2
-
-# Test apply with 1B
-r=1
-i=1024
-l=1024
-prog="--txmsg_apply 1"
-
-for t in "sendmsg" "sendpage"; do
- TEST="./sockmap --cgroup /mnt/cgroup2/ -t $t -r $r -i $i -l $l $prog"
- echo $TEST
- $TEST
- sleep 2
-done
-
-# Test apply with larger value than send
-r=1
-i=8
-l=1024
-prog="--txmsg_apply 2048"
-
-for t in "sendmsg" "sendpage"; do
- TEST="./sockmap --cgroup /mnt/cgroup2/ -t $t -r $r -i $i -l $l $prog"
- echo $TEST
- $TEST
- sleep 2
-done
-
-# Test apply with apply that never reaches limit
-r=1024
-i=1
-l=1
-prog="--txmsg_apply 2048"
-
-for t in "sendmsg" "sendpage"; do
- TEST="./sockmap --cgroup /mnt/cgroup2/ -t $t -r $r -i $i -l $l $prog"
- echo $TEST
- $TEST
- sleep 2
-done
-
-# Test apply and redirect with 1B
-r=1
-i=1024
-l=1024
-prog="--txmsg_redir --txmsg_apply 1"
-
-for t in "sendmsg" "sendpage"; do
- TEST="./sockmap --cgroup /mnt/cgroup2/ -t $t -r $r -i $i -l $l $prog"
- echo $TEST
- $TEST
- sleep 2
-done
-
-prog="--txmsg_redir --txmsg_apply 1 --txmsg_ingress"
-
-for t in "sendmsg" "sendpage"; do
- TEST="./sockmap --cgroup /mnt/cgroup2/ -t $t -r $r -i $i -l $l $prog"
- echo $TEST
- $TEST
- sleep 2
-done
-
-prog="--txmsg_redir --txmsg_apply 1 --txmsg_skb"
-
-for t in "sendmsg" "sendpage"; do
- TEST="./sockmap --cgroup /mnt/cgroup2/ -t $t -r $r -i $i -l $l $prog"
- echo $TEST
- $TEST
- sleep 2
-done
-
-
-# Test apply and redirect with larger value than send
-r=1
-i=8
-l=1024
-prog="--txmsg_redir --txmsg_apply 2048"
-
-for t in "sendmsg" "sendpage"; do
- TEST="./sockmap --cgroup /mnt/cgroup2/ -t $t -r $r -i $i -l $l $prog"
- echo $TEST
- $TEST
- sleep 2
-done
-
-prog="--txmsg_redir --txmsg_apply 2048 --txmsg_ingress"
-
-for t in "sendmsg" "sendpage"; do
- TEST="./sockmap --cgroup /mnt/cgroup2/ -t $t -r $r -i $i -l $l $prog"
- echo $TEST
- $TEST
- sleep 2
-done
-
-prog="--txmsg_redir --txmsg_apply 2048 --txmsg_skb"
-
-for t in "sendmsg" "sendpage"; do
- TEST="./sockmap --cgroup /mnt/cgroup2/ -t $t -r $r -i $i -l $l $prog"
- echo $TEST
- $TEST
- sleep 2
-done
-
-
-# Test apply and redirect with apply that never reaches limit
-r=1024
-i=1
-l=1
-prog="--txmsg_apply 2048"
-
-for t in "sendmsg" "sendpage"; do
- TEST="./sockmap --cgroup /mnt/cgroup2/ -t $t -r $r -i $i -l $l $prog"
- echo $TEST
- $TEST
- sleep 2
-done
-
-# Test cork with 1B not really useful but test it anyways
-r=1
-i=1024
-l=1024
-prog="--txmsg_cork 1"
-
-for t in "sendpage" "sendmsg"; do
- TEST="./sockmap --cgroup /mnt/cgroup2/ -t $t -r $r -i $i -l $l $prog"
- echo $TEST
- $TEST
- sleep 2
-done
-
-# Test cork with a more reasonable 100B
-r=1
-i=1000
-l=1000
-prog="--txmsg_cork 100"
-
-for t in "sendpage" "sendmsg"; do
- TEST="./sockmap --cgroup /mnt/cgroup2/ -t $t -r $r -i $i -l $l $prog"
- echo $TEST
- $TEST
- sleep 2
-done
-
-# Test cork with larger value than send
-r=1
-i=8
-l=1024
-prog="--txmsg_cork 2048"
-
-for t in "sendpage" "sendmsg"; do
- TEST="./sockmap --cgroup /mnt/cgroup2/ -t $t -r $r -i $i -l $l $prog"
- echo $TEST
- $TEST
- sleep 2
-done
-
-# Test cork with cork that never reaches limit
-r=1024
-i=1
-l=1
-prog="--txmsg_cork 2048"
-
-for t in "sendpage" "sendmsg"; do
- TEST="./sockmap --cgroup /mnt/cgroup2/ -t $t -r $r -i $i -l $l $prog"
- echo $TEST
- $TEST
- sleep 2
-done
-
-r=1
-i=1024
-l=1024
-prog="--txmsg_redir --txmsg_cork 1"
-
-for t in "sendpage" "sendmsg"; do
- TEST="./sockmap --cgroup /mnt/cgroup2/ -t $t -r $r -i $i -l $l $prog"
- echo $TEST
- $TEST
- sleep 2
-done
-
-# Test cork with a more reasonable 100B
-r=1
-i=1000
-l=1000
-prog="--txmsg_redir --txmsg_cork 100"
-
-for t in "sendpage" "sendmsg"; do
- TEST="./sockmap --cgroup /mnt/cgroup2/ -t $t -r $r -i $i -l $l $prog"
- echo $TEST
- $TEST
- sleep 2
-done
-
-# Test cork with larger value than send
-r=1
-i=8
-l=1024
-prog="--txmsg_redir --txmsg_cork 2048"
-
-for t in "sendpage" "sendmsg"; do
- TEST="./sockmap --cgroup /mnt/cgroup2/ -t $t -r $r -i $i -l $l $prog"
- echo $TEST
- $TEST
- sleep 2
-done
-
-# Test cork with cork that never reaches limit
-r=1024
-i=1
-l=1
-prog="--txmsg_cork 2048"
-
-for t in "sendpage" "sendmsg"; do
- TEST="./sockmap --cgroup /mnt/cgroup2/ -t $t -r $r -i $i -l $l $prog"
- echo $TEST
- $TEST
- sleep 2
-done
-
-
-# mix and match cork and apply not really useful but valid programs
-
-# Test apply < cork
-r=100
-i=1
-l=5
-prog="--txmsg_apply 10 --txmsg_cork 100"
-for t in "sendpage" "sendmsg"; do
- TEST="./sockmap --cgroup /mnt/cgroup2/ -t $t -r $r -i $i -l $l $prog"
- echo $TEST
- $TEST
- sleep 2
-done
-
-# Try again with larger sizes so we hit overflow case
-r=100
-i=1000
-l=2048
-prog="--txmsg_apply 4096 --txmsg_cork 8096"
-for t in "sendpage" "sendmsg"; do
- TEST="./sockmap --cgroup /mnt/cgroup2/ -t $t -r $r -i $i -l $l $prog"
- echo $TEST
- $TEST
- sleep 2
-done
-
-# Test apply > cork
-r=100
-i=1
-l=5
-prog="--txmsg_apply 100 --txmsg_cork 10"
-for t in "sendpage" "sendmsg"; do
- TEST="./sockmap --cgroup /mnt/cgroup2/ -t $t -r $r -i $i -l $l $prog"
- echo $TEST
- $TEST
- sleep 2
-done
-
-# Again with larger sizes so we hit overflow cases
-r=100
-i=1000
-l=2048
-prog="--txmsg_apply 8096 --txmsg_cork 4096"
-for t in "sendpage" "sendmsg"; do
- TEST="./sockmap --cgroup /mnt/cgroup2/ -t $t -r $r -i $i -l $l $prog"
- echo $TEST
- $TEST
- sleep 2
-done
-
-
-# Test apply = cork
-r=100
-i=1
-l=5
-prog="--txmsg_apply 10 --txmsg_cork 10"
-for t in "sendpage" "sendmsg"; do
- TEST="./sockmap --cgroup /mnt/cgroup2/ -t $t -r $r -i $i -l $l $prog"
- echo $TEST
- $TEST
- sleep 2
-done
-
-r=100
-i=1000
-l=2048
-prog="--txmsg_apply 4096 --txmsg_cork 4096"
-for t in "sendpage" "sendmsg"; do
- TEST="./sockmap --cgroup /mnt/cgroup2/ -t $t -r $r -i $i -l $l $prog"
- echo $TEST
- $TEST
- sleep 2
-done
-
-# Test apply < cork
-r=100
-i=1
-l=5
-prog="--txmsg_redir --txmsg_apply 10 --txmsg_cork 100"
-for t in "sendpage" "sendmsg"; do
- TEST="./sockmap --cgroup /mnt/cgroup2/ -t $t -r $r -i $i -l $l $prog"
- echo $TEST
- $TEST
- sleep 2
-done
-
-# Try again with larger sizes so we hit overflow case
-r=100
-i=1000
-l=2048
-prog="--txmsg_redir --txmsg_apply 4096 --txmsg_cork 8096"
-for t in "sendpage" "sendmsg"; do
- TEST="./sockmap --cgroup /mnt/cgroup2/ -t $t -r $r -i $i -l $l $prog"
- echo $TEST
- $TEST
- sleep 2
-done
-
-# Test apply > cork
-r=100
-i=1
-l=5
-prog="--txmsg_redir --txmsg_apply 100 --txmsg_cork 10"
-for t in "sendpage" "sendmsg"; do
- TEST="./sockmap --cgroup /mnt/cgroup2/ -t $t -r $r -i $i -l $l $prog"
- echo $TEST
- $TEST
- sleep 2
-done
-
-# Again with larger sizes so we hit overflow cases
-r=100
-i=1000
-l=2048
-prog="--txmsg_redir --txmsg_apply 8096 --txmsg_cork 4096"
-for t in "sendpage" "sendmsg"; do
- TEST="./sockmap --cgroup /mnt/cgroup2/ -t $t -r $r -i $i -l $l $prog"
- echo $TEST
- $TEST
- sleep 2
-done
-
-
-# Test apply = cork
-r=100
-i=1
-l=5
-prog="--txmsg_redir --txmsg_apply 10 --txmsg_cork 10"
-for t in "sendpage" "sendmsg"; do
- TEST="./sockmap --cgroup /mnt/cgroup2/ -t $t -r $r -i $i -l $l $prog"
- echo $TEST
- $TEST
- sleep 2
-done
-
-r=100
-i=1000
-l=2048
-prog="--txmsg_redir --txmsg_apply 4096 --txmsg_cork 4096"
-for t in "sendpage" "sendmsg"; do
- TEST="./sockmap --cgroup /mnt/cgroup2/ -t $t -r $r -i $i -l $l $prog"
- echo $TEST
- $TEST
- sleep 2
-done
-
-# Tests for bpf_msg_pull_data()
-for i in `seq 99 100 1600`; do
- TEST="./sockmap --cgroup /mnt/cgroup2/ -t sendpage -r 16 -i 1 -l 100 \
- --txmsg --txmsg_start 0 --txmsg_end $i --txmsg_cork 1600"
- echo $TEST
- $TEST
- sleep 2
-done
-
-for i in `seq 199 100 1600`; do
- TEST="./sockmap --cgroup /mnt/cgroup2/ -t sendpage -r 16 -i 1 -l 100 \
- --txmsg --txmsg_start 100 --txmsg_end $i --txmsg_cork 1600"
- echo $TEST
- $TEST
- sleep 2
-done
-
-TEST="./sockmap --cgroup /mnt/cgroup2/ -t sendpage -r 16 -i 1 -l 100 \
- --txmsg --txmsg_start 1500 --txmsg_end 1600 --txmsg_cork 1600"
-echo $TEST
-$TEST
-sleep 2
-
-TEST="./sockmap --cgroup /mnt/cgroup2/ -t sendpage -r 16 -i 1 -l 100 \
- --txmsg --txmsg_start 1111 --txmsg_end 1112 --txmsg_cork 1600"
-echo $TEST
-$TEST
-sleep 2
-
-TEST="./sockmap --cgroup /mnt/cgroup2/ -t sendpage -r 16 -i 1 -l 100 \
- --txmsg --txmsg_start 1111 --txmsg_end 0 --txmsg_cork 1600"
-echo $TEST
-$TEST
-sleep 2
-
-TEST="./sockmap --cgroup /mnt/cgroup2/ -t sendpage -r 16 -i 1 -l 100 \
- --txmsg --txmsg_start 0 --txmsg_end 1601 --txmsg_cork 1600"
-echo $TEST
-$TEST
-sleep 2
-
-TEST="./sockmap --cgroup /mnt/cgroup2/ -t sendpage -r 16 -i 1 -l 100 \
- --txmsg --txmsg_start 0 --txmsg_end 1601 --txmsg_cork 1602"
-echo $TEST
-$TEST
-sleep 2
-
-# Run through gamut again with start and end
-for prog in "--txmsg" "--txmsg_redir" "--txmsg_drop"; do
-for t in "sendmsg" "sendpage"; do
-for r in 1 10 100; do
- for i in 1 10 100; do
- for l in 1 10 100; do
- TEST="./sockmap --cgroup /mnt/cgroup2/ -t $t -r $r -i $i -l $l $prog --txmsg_start 1 --txmsg_end 2"
- echo $TEST
- $TEST
- sleep 2
- done
- done
-done
-done
-done
-
-# Some specific tests to cover specific code paths
-./sockmap --cgroup /mnt/cgroup2/ -t sendpage \
- -r 5 -i 1 -l 1 --txmsg_redir --txmsg_cork 5 --txmsg_apply 3
-./sockmap --cgroup /mnt/cgroup2/ -t sendmsg \
- -r 5 -i 1 -l 1 --txmsg_redir --txmsg_cork 5 --txmsg_apply 3
-./sockmap --cgroup /mnt/cgroup2/ -t sendpage \
- -r 5 -i 1 -l 1 --txmsg_redir --txmsg_cork 5 --txmsg_apply 5
-./sockmap --cgroup /mnt/cgroup2/ -t sendmsg \
- -r 5 -i 1 -l 1 --txmsg_redir --txmsg_cork 5 --txmsg_apply 5
diff --git a/samples/sockmap/sockmap_user.c b/samples/sockmap/sockmap_user.c
deleted file mode 100644
index 6f23349..0000000
--- a/samples/sockmap/sockmap_user.c
+++ /dev/null
@@ -1,894 +0,0 @@
-/* Copyright (c) 2017 Covalent IO, Inc. http://covalent.io
- *
- * This program is free software; you can redistribute it and/or
- * modify it under the terms of version 2 of the GNU General Public
- * License as published by the Free Software Foundation.
- *
- * This program is distributed in the hope that it will be useful, but
- * WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
- * General Public License for more details.
- */
-#include <stdio.h>
-#include <stdlib.h>
-#include <sys/socket.h>
-#include <sys/ioctl.h>
-#include <sys/select.h>
-#include <netinet/in.h>
-#include <arpa/inet.h>
-#include <unistd.h>
-#include <string.h>
-#include <errno.h>
-#include <sys/ioctl.h>
-#include <stdbool.h>
-#include <signal.h>
-#include <fcntl.h>
-#include <sys/wait.h>
-#include <time.h>
-
-#include <sys/time.h>
-#include <sys/resource.h>
-#include <sys/types.h>
-#include <sys/sendfile.h>
-
-#include <linux/netlink.h>
-#include <linux/socket.h>
-#include <linux/sock_diag.h>
-#include <linux/bpf.h>
-#include <linux/if_link.h>
-#include <assert.h>
-#include <libgen.h>
-
-#include <getopt.h>
-
-#include "../bpf/bpf_load.h"
-#include "../bpf/bpf_util.h"
-#include "../bpf/libbpf.h"
-
-int running;
-void running_handler(int a);
-
-/* randomly selected ports for testing on lo */
-#define S1_PORT 10000
-#define S2_PORT 10001
-
-/* global sockets */
-int s1, s2, c1, c2, p1, p2;
-
-int txmsg_pass;
-int txmsg_noisy;
-int txmsg_redir;
-int txmsg_redir_noisy;
-int txmsg_drop;
-int txmsg_apply;
-int txmsg_cork;
-int txmsg_start;
-int txmsg_end;
-int txmsg_ingress;
-int txmsg_skb;
-
-static const struct option long_options[] = {
- {"help", no_argument, NULL, 'h' },
- {"cgroup", required_argument, NULL, 'c' },
- {"rate", required_argument, NULL, 'r' },
- {"verbose", no_argument, NULL, 'v' },
- {"iov_count", required_argument, NULL, 'i' },
- {"length", required_argument, NULL, 'l' },
- {"test", required_argument, NULL, 't' },
- {"data_test", no_argument, NULL, 'd' },
- {"txmsg", no_argument, &txmsg_pass, 1 },
- {"txmsg_noisy", no_argument, &txmsg_noisy, 1 },
- {"txmsg_redir", no_argument, &txmsg_redir, 1 },
- {"txmsg_redir_noisy", no_argument, &txmsg_redir_noisy, 1},
- {"txmsg_drop", no_argument, &txmsg_drop, 1 },
- {"txmsg_apply", required_argument, NULL, 'a'},
- {"txmsg_cork", required_argument, NULL, 'k'},
- {"txmsg_start", required_argument, NULL, 's'},
- {"txmsg_end", required_argument, NULL, 'e'},
- {"txmsg_ingress", no_argument, &txmsg_ingress, 1 },
- {"txmsg_skb", no_argument, &txmsg_skb, 1 },
- {0, 0, NULL, 0 }
-};
-
-static void usage(char *argv[])
-{
- int i;
-
- printf(" Usage: %s --cgroup <cgroup_path>\n", argv[0]);
- printf(" options:\n");
- for (i = 0; long_options[i].name != 0; i++) {
- printf(" --%-12s", long_options[i].name);
- if (long_options[i].flag != NULL)
- printf(" flag (internal value:%d)\n",
- *long_options[i].flag);
- else
- printf(" -%c\n", long_options[i].val);
- }
- printf("\n");
-}
-
-static int sockmap_init_sockets(void)
-{
- int i, err, one = 1;
- struct sockaddr_in addr;
- int *fds[4] = {&s1, &s2, &c1, &c2};
-
- s1 = s2 = p1 = p2 = c1 = c2 = 0;
-
- /* Init sockets */
- for (i = 0; i < 4; i++) {
- *fds[i] = socket(AF_INET, SOCK_STREAM, 0);
- if (*fds[i] < 0) {
- perror("socket s1 failed()");
- return errno;
- }
- }
-
- /* Allow reuse */
- for (i = 0; i < 2; i++) {
- err = setsockopt(*fds[i], SOL_SOCKET, SO_REUSEADDR,
- (char *)&one, sizeof(one));
- if (err) {
- perror("setsockopt failed()");
- return errno;
- }
- }
-
- /* Non-blocking sockets */
- for (i = 0; i < 2; i++) {
- err = ioctl(*fds[i], FIONBIO, (char *)&one);
- if (err < 0) {
- perror("ioctl s1 failed()");
- return errno;
- }
- }
-
- /* Bind server sockets */
- memset(&addr, 0, sizeof(struct sockaddr_in));
- addr.sin_family = AF_INET;
- addr.sin_addr.s_addr = inet_addr("127.0.0.1");
-
- addr.sin_port = htons(S1_PORT);
- err = bind(s1, (struct sockaddr *)&addr, sizeof(addr));
- if (err < 0) {
- perror("bind s1 failed()\n");
- return errno;
- }
-
- addr.sin_port = htons(S2_PORT);
- err = bind(s2, (struct sockaddr *)&addr, sizeof(addr));
- if (err < 0) {
- perror("bind s2 failed()\n");
- return errno;
- }
-
- /* Listen server sockets */
- addr.sin_port = htons(S1_PORT);
- err = listen(s1, 32);
- if (err < 0) {
- perror("listen s1 failed()\n");
- return errno;
- }
-
- addr.sin_port = htons(S2_PORT);
- err = listen(s2, 32);
- if (err < 0) {
- perror("listen s1 failed()\n");
- return errno;
- }
-
- /* Initiate Connect */
- addr.sin_port = htons(S1_PORT);
- err = connect(c1, (struct sockaddr *)&addr, sizeof(addr));
- if (err < 0 && errno != EINPROGRESS) {
- perror("connect c1 failed()\n");
- return errno;
- }
-
- addr.sin_port = htons(S2_PORT);
- err = connect(c2, (struct sockaddr *)&addr, sizeof(addr));
- if (err < 0 && errno != EINPROGRESS) {
- perror("connect c2 failed()\n");
- return errno;
- } else if (err < 0) {
- err = 0;
- }
-
- /* Accept Connecrtions */
- p1 = accept(s1, NULL, NULL);
- if (p1 < 0) {
- perror("accept s1 failed()\n");
- return errno;
- }
-
- p2 = accept(s2, NULL, NULL);
- if (p2 < 0) {
- perror("accept s1 failed()\n");
- return errno;
- }
-
- printf("connected sockets: c1 <-> p1, c2 <-> p2\n");
- printf("cgroups binding: c1(%i) <-> s1(%i) - - - c2(%i) <-> s2(%i)\n",
- c1, s1, c2, s2);
- return 0;
-}
-
-struct msg_stats {
- size_t bytes_sent;
- size_t bytes_recvd;
- struct timespec start;
- struct timespec end;
-};
-
-struct sockmap_options {
- int verbose;
- bool base;
- bool sendpage;
- bool data_test;
- bool drop_expected;
-};
-
-static int msg_loop_sendpage(int fd, int iov_length, int cnt,
- struct msg_stats *s,
- struct sockmap_options *opt)
-{
- bool drop = opt->drop_expected;
- unsigned char k = 0;
- FILE *file;
- int i, fp;
-
- file = fopen(".sendpage_tst.tmp", "w+");
- for (i = 0; i < iov_length * cnt; i++, k++)
- fwrite(&k, sizeof(char), 1, file);
- fflush(file);
- fseek(file, 0, SEEK_SET);
- fclose(file);
-
- fp = open(".sendpage_tst.tmp", O_RDONLY);
- clock_gettime(CLOCK_MONOTONIC, &s->start);
- for (i = 0; i < cnt; i++) {
- int sent = sendfile(fd, fp, NULL, iov_length);
-
- if (!drop && sent < 0) {
- perror("send loop error:");
- close(fp);
- return sent;
- } else if (drop && sent >= 0) {
- printf("sendpage loop error expected: %i\n", sent);
- close(fp);
- return -EIO;
- }
-
- if (sent > 0)
- s->bytes_sent += sent;
- }
- clock_gettime(CLOCK_MONOTONIC, &s->end);
- close(fp);
- return 0;
-}
-
-static int msg_loop(int fd, int iov_count, int iov_length, int cnt,
- struct msg_stats *s, bool tx,
- struct sockmap_options *opt)
-{
- struct msghdr msg = {0};
- int err, i, flags = MSG_NOSIGNAL;
- struct iovec *iov;
- unsigned char k;
- bool data_test = opt->data_test;
- bool drop = opt->drop_expected;
-
- iov = calloc(iov_count, sizeof(struct iovec));
- if (!iov)
- return errno;
-
- k = 0;
- for (i = 0; i < iov_count; i++) {
- unsigned char *d = calloc(iov_length, sizeof(char));
-
- if (!d) {
- fprintf(stderr, "iov_count %i/%i OOM\n", i, iov_count);
- goto out_errno;
- }
- iov[i].iov_base = d;
- iov[i].iov_len = iov_length;
-
- if (data_test && tx) {
- int j;
-
- for (j = 0; j < iov_length; j++)
- d[j] = k++;
- }
- }
-
- msg.msg_iov = iov;
- msg.msg_iovlen = iov_count;
- k = 0;
-
- if (tx) {
- clock_gettime(CLOCK_MONOTONIC, &s->start);
- for (i = 0; i < cnt; i++) {
- int sent = sendmsg(fd, &msg, flags);
-
- if (!drop && sent < 0) {
- perror("send loop error:");
- goto out_errno;
- } else if (drop && sent >= 0) {
- printf("send loop error expected: %i\n", sent);
- errno = -EIO;
- goto out_errno;
- }
- if (sent > 0)
- s->bytes_sent += sent;
- }
- clock_gettime(CLOCK_MONOTONIC, &s->end);
- } else {
- int slct, recv, max_fd = fd;
- struct timeval timeout;
- float total_bytes;
- fd_set w;
-
- total_bytes = (float)iov_count * (float)iov_length * (float)cnt;
- err = clock_gettime(CLOCK_MONOTONIC, &s->start);
- if (err < 0)
- perror("recv start time: ");
- while (s->bytes_recvd < total_bytes) {
- timeout.tv_sec = 1;
- timeout.tv_usec = 0;
-
- /* FD sets */
- FD_ZERO(&w);
- FD_SET(fd, &w);
-
- slct = select(max_fd + 1, &w, NULL, NULL, &timeout);
- if (slct == -1) {
- perror("select()");
- clock_gettime(CLOCK_MONOTONIC, &s->end);
- goto out_errno;
- } else if (!slct) {
- fprintf(stderr, "unexpected timeout\n");
- errno = -EIO;
- clock_gettime(CLOCK_MONOTONIC, &s->end);
- goto out_errno;
- }
-
- recv = recvmsg(fd, &msg, flags);
- if (recv < 0) {
- if (errno != EWOULDBLOCK) {
- clock_gettime(CLOCK_MONOTONIC, &s->end);
- perror("recv failed()\n");
- goto out_errno;
- }
- }
-
- s->bytes_recvd += recv;
-
- if (data_test) {
- int j;
-
- for (i = 0; i < msg.msg_iovlen; i++) {
- unsigned char *d = iov[i].iov_base;
-
- for (j = 0;
- j < iov[i].iov_len && recv; j++) {
- if (d[j] != k++) {
- errno = -EIO;
- fprintf(stderr,
- "detected data corruption @iov[%i]:%i %02x != %02x, %02x ?= %02x\n",
- i, j, d[j], k - 1, d[j+1], k + 1);
- goto out_errno;
- }
- recv--;
- }
- }
- }
- }
- clock_gettime(CLOCK_MONOTONIC, &s->end);
- }
-
- for (i = 0; i < iov_count; i++)
- free(iov[i].iov_base);
- free(iov);
- return 0;
-out_errno:
- for (i = 0; i < iov_count; i++)
- free(iov[i].iov_base);
- free(iov);
- return errno;
-}
-
-static float giga = 1000000000;
-
-static inline float sentBps(struct msg_stats s)
-{
- return s.bytes_sent / (s.end.tv_sec - s.start.tv_sec);
-}
-
-static inline float recvdBps(struct msg_stats s)
-{
- return s.bytes_recvd / (s.end.tv_sec - s.start.tv_sec);
-}
-
-static int sendmsg_test(int iov_count, int iov_buf, int cnt,
- struct sockmap_options *opt)
-{
- float sent_Bps = 0, recvd_Bps = 0;
- int rx_fd, txpid, rxpid, err = 0;
- struct msg_stats s = {0};
- int status;
-
- errno = 0;
-
- if (opt->base)
- rx_fd = p1;
- else
- rx_fd = p2;
-
- rxpid = fork();
- if (rxpid == 0) {
- if (opt->drop_expected)
- exit(1);
-
- if (opt->sendpage)
- iov_count = 1;
- err = msg_loop(rx_fd, iov_count, iov_buf,
- cnt, &s, false, opt);
- if (err)
- fprintf(stderr,
- "msg_loop_rx: iov_count %i iov_buf %i cnt %i err %i\n",
- iov_count, iov_buf, cnt, err);
- shutdown(p2, SHUT_RDWR);
- shutdown(p1, SHUT_RDWR);
- if (s.end.tv_sec - s.start.tv_sec) {
- sent_Bps = sentBps(s);
- recvd_Bps = recvdBps(s);
- }
- fprintf(stdout,
- "rx_sendmsg: TX: %zuB %fB/s %fGB/s RX: %zuB %fB/s %fGB/s\n",
- s.bytes_sent, sent_Bps, sent_Bps/giga,
- s.bytes_recvd, recvd_Bps, recvd_Bps/giga);
- exit(1);
- } else if (rxpid == -1) {
- perror("msg_loop_rx: ");
- return errno;
- }
-
- txpid = fork();
- if (txpid == 0) {
- if (opt->sendpage)
- err = msg_loop_sendpage(c1, iov_buf, cnt, &s, opt);
- else
- err = msg_loop(c1, iov_count, iov_buf,
- cnt, &s, true, opt);
-
- if (err)
- fprintf(stderr,
- "msg_loop_tx: iov_count %i iov_buf %i cnt %i err %i\n",
- iov_count, iov_buf, cnt, err);
- shutdown(c1, SHUT_RDWR);
- if (s.end.tv_sec - s.start.tv_sec) {
- sent_Bps = sentBps(s);
- recvd_Bps = recvdBps(s);
- }
- fprintf(stdout,
- "tx_sendmsg: TX: %zuB %fB/s %f GB/s RX: %zuB %fB/s %fGB/s\n",
- s.bytes_sent, sent_Bps, sent_Bps/giga,
- s.bytes_recvd, recvd_Bps, recvd_Bps/giga);
- exit(1);
- } else if (txpid == -1) {
- perror("msg_loop_tx: ");
- return errno;
- }
-
- assert(waitpid(rxpid, &status, 0) == rxpid);
- assert(waitpid(txpid, &status, 0) == txpid);
- return err;
-}
-
-static int forever_ping_pong(int rate, struct sockmap_options *opt)
-{
- struct timeval timeout;
- char buf[1024] = {0};
- int sc;
-
- timeout.tv_sec = 10;
- timeout.tv_usec = 0;
-
- /* Ping/Pong data from client to server */
- sc = send(c1, buf, sizeof(buf), 0);
- if (sc < 0) {
- perror("send failed()\n");
- return sc;
- }
-
- do {
- int s, rc, i, max_fd = p2;
- fd_set w;
-
- /* FD sets */
- FD_ZERO(&w);
- FD_SET(c1, &w);
- FD_SET(c2, &w);
- FD_SET(p1, &w);
- FD_SET(p2, &w);
-
- s = select(max_fd + 1, &w, NULL, NULL, &timeout);
- if (s == -1) {
- perror("select()");
- break;
- } else if (!s) {
- fprintf(stderr, "unexpected timeout\n");
- break;
- }
-
- for (i = 0; i <= max_fd && s > 0; ++i) {
- if (!FD_ISSET(i, &w))
- continue;
-
- s--;
-
- rc = recv(i, buf, sizeof(buf), 0);
- if (rc < 0) {
- if (errno != EWOULDBLOCK) {
- perror("recv failed()\n");
- return rc;
- }
- }
-
- if (rc == 0) {
- close(i);
- break;
- }
-
- sc = send(i, buf, rc, 0);
- if (sc < 0) {
- perror("send failed()\n");
- return sc;
- }
- }
-
- if (rate)
- sleep(rate);
-
- if (opt->verbose) {
- printf(".");
- fflush(stdout);
-
- }
- } while (running);
-
- return 0;
-}
-
-enum {
- PING_PONG,
- SENDMSG,
- BASE,
- BASE_SENDPAGE,
- SENDPAGE,
-};
-
-int main(int argc, char **argv)
-{
- int iov_count = 1, length = 1024, rate = 1, tx_prog_fd;
- struct rlimit r = {10 * 1024 * 1024, RLIM_INFINITY};
- int opt, longindex, err, cg_fd = 0;
- struct sockmap_options options = {0};
- int test = PING_PONG;
- char filename[256];
-
- while ((opt = getopt_long(argc, argv, ":dhvc:r:i:l:t:",
- long_options, &longindex)) != -1) {
- switch (opt) {
- case 's':
- txmsg_start = atoi(optarg);
- break;
- case 'e':
- txmsg_end = atoi(optarg);
- break;
- case 'a':
- txmsg_apply = atoi(optarg);
- break;
- case 'k':
- txmsg_cork = atoi(optarg);
- break;
- case 'c':
- cg_fd = open(optarg, O_DIRECTORY, O_RDONLY);
- if (cg_fd < 0) {
- fprintf(stderr,
- "ERROR: (%i) open cg path failed: %s\n",
- cg_fd, optarg);
- return cg_fd;
- }
- break;
- case 'r':
- rate = atoi(optarg);
- break;
- case 'v':
- options.verbose = 1;
- break;
- case 'i':
- iov_count = atoi(optarg);
- break;
- case 'l':
- length = atoi(optarg);
- break;
- case 'd':
- options.data_test = true;
- break;
- case 't':
- if (strcmp(optarg, "ping") == 0) {
- test = PING_PONG;
- } else if (strcmp(optarg, "sendmsg") == 0) {
- test = SENDMSG;
- } else if (strcmp(optarg, "base") == 0) {
- test = BASE;
- } else if (strcmp(optarg, "base_sendpage") == 0) {
- test = BASE_SENDPAGE;
- } else if (strcmp(optarg, "sendpage") == 0) {
- test = SENDPAGE;
- } else {
- usage(argv);
- return -1;
- }
- break;
- case 0:
- break;
- case 'h':
- default:
- usage(argv);
- return -1;
- }
- }
-
- if (!cg_fd) {
- fprintf(stderr, "%s requires cgroup option: --cgroup <path>\n",
- argv[0]);
- return -1;
- }
-
- if (setrlimit(RLIMIT_MEMLOCK, &r)) {
- perror("setrlimit(RLIMIT_MEMLOCK)");
- return 1;
- }
-
- snprintf(filename, sizeof(filename), "%s_kern.o", argv[0]);
-
- running = 1;
-
- /* catch SIGINT */
- signal(SIGINT, running_handler);
-
- if (load_bpf_file(filename)) {
- fprintf(stderr, "load_bpf_file: (%s) %s\n",
- filename, strerror(errno));
- return 1;
- }
-
- /* If base test skip BPF setup */
- if (test == BASE || test == BASE_SENDPAGE)
- goto run;
-
- /* Attach programs to sockmap */
- err = bpf_prog_attach(prog_fd[0], map_fd[0],
- BPF_SK_SKB_STREAM_PARSER, 0);
- if (err) {
- fprintf(stderr, "ERROR: bpf_prog_attach (sockmap): %d (%s)\n",
- err, strerror(errno));
- return err;
- }
-
- err = bpf_prog_attach(prog_fd[1], map_fd[0],
- BPF_SK_SKB_STREAM_VERDICT, 0);
- if (err) {
- fprintf(stderr, "ERROR: bpf_prog_attach (sockmap): %d (%s)\n",
- err, strerror(errno));
- return err;
- }
-
- /* Attach to cgroups */
- err = bpf_prog_attach(prog_fd[2], cg_fd, BPF_CGROUP_SOCK_OPS, 0);
- if (err) {
- fprintf(stderr, "ERROR: bpf_prog_attach (groups): %d (%s)\n",
- err, strerror(errno));
- return err;
- }
-
-run:
- err = sockmap_init_sockets();
- if (err) {
- fprintf(stderr, "ERROR: test socket failed: %d\n", err);
- goto out;
- }
-
- /* Attach txmsg program to sockmap */
- if (txmsg_pass)
- tx_prog_fd = prog_fd[3];
- else if (txmsg_noisy)
- tx_prog_fd = prog_fd[4];
- else if (txmsg_redir)
- tx_prog_fd = prog_fd[5];
- else if (txmsg_redir_noisy)
- tx_prog_fd = prog_fd[6];
- else if (txmsg_drop)
- tx_prog_fd = prog_fd[9];
- /* apply and cork must be last */
- else if (txmsg_apply)
- tx_prog_fd = prog_fd[7];
- else if (txmsg_cork)
- tx_prog_fd = prog_fd[8];
- else
- tx_prog_fd = 0;
-
- if (tx_prog_fd) {
- int redir_fd, i = 0;
-
- err = bpf_prog_attach(tx_prog_fd,
- map_fd[1], BPF_SK_MSG_VERDICT, 0);
- if (err) {
- fprintf(stderr,
- "ERROR: bpf_prog_attach (txmsg): %d (%s)\n",
- err, strerror(errno));
- return err;
- }
-
- err = bpf_map_update_elem(map_fd[1], &i, &c1, BPF_ANY);
- if (err) {
- fprintf(stderr,
- "ERROR: bpf_map_update_elem (txmsg): %d (%s\n",
- err, strerror(errno));
- return err;
- }
-
- if (txmsg_redir || txmsg_redir_noisy)
- redir_fd = c2;
- else
- redir_fd = c1;
-
- err = bpf_map_update_elem(map_fd[2], &i, &redir_fd, BPF_ANY);
- if (err) {
- fprintf(stderr,
- "ERROR: bpf_map_update_elem (txmsg): %d (%s\n",
- err, strerror(errno));
- return err;
- }
-
- if (txmsg_apply) {
- err = bpf_map_update_elem(map_fd[3],
- &i, &txmsg_apply, BPF_ANY);
- if (err) {
- fprintf(stderr,
- "ERROR: bpf_map_update_elem (apply_bytes): %d (%s\n",
- err, strerror(errno));
- return err;
- }
- }
-
- if (txmsg_cork) {
- err = bpf_map_update_elem(map_fd[4],
- &i, &txmsg_cork, BPF_ANY);
- if (err) {
- fprintf(stderr,
- "ERROR: bpf_map_update_elem (cork_bytes): %d (%s\n",
- err, strerror(errno));
- return err;
- }
- }
-
- if (txmsg_start) {
- err = bpf_map_update_elem(map_fd[5],
- &i, &txmsg_start, BPF_ANY);
- if (err) {
- fprintf(stderr,
- "ERROR: bpf_map_update_elem (txmsg_start): %d (%s)\n",
- err, strerror(errno));
- return err;
- }
- }
-
- if (txmsg_end) {
- i = 1;
- err = bpf_map_update_elem(map_fd[5],
- &i, &txmsg_end, BPF_ANY);
- if (err) {
- fprintf(stderr,
- "ERROR: bpf_map_update_elem (txmsg_end): %d (%s)\n",
- err, strerror(errno));
- return err;
- }
- }
-
- if (txmsg_ingress) {
- int in = BPF_F_INGRESS;
-
- i = 0;
- err = bpf_map_update_elem(map_fd[6], &i, &in, BPF_ANY);
- if (err) {
- fprintf(stderr,
- "ERROR: bpf_map_update_elem (txmsg_ingress): %d (%s)\n",
- err, strerror(errno));
- }
- i = 1;
- err = bpf_map_update_elem(map_fd[1], &i, &p1, BPF_ANY);
- if (err) {
- fprintf(stderr,
- "ERROR: bpf_map_update_elem (p1 txmsg): %d (%s)\n",
- err, strerror(errno));
- }
- err = bpf_map_update_elem(map_fd[2], &i, &p1, BPF_ANY);
- if (err) {
- fprintf(stderr,
- "ERROR: bpf_map_update_elem (p1 redir): %d (%s)\n",
- err, strerror(errno));
- }
-
- i = 2;
- err = bpf_map_update_elem(map_fd[2], &i, &p2, BPF_ANY);
- if (err) {
- fprintf(stderr,
- "ERROR: bpf_map_update_elem (p2 txmsg): %d (%s)\n",
- err, strerror(errno));
- }
- }
-
- if (txmsg_skb) {
- int skb_fd = (test == SENDMSG || test == SENDPAGE) ? p2 : p1;
- int ingress = BPF_F_INGRESS;
-
- i = 0;
- err = bpf_map_update_elem(map_fd[7], &i, &ingress, BPF_ANY);
- if (err) {
- fprintf(stderr,
- "ERROR: bpf_map_update_elem (txmsg_ingress): %d (%s)\n",
- err, strerror(errno));
- }
-
- i = 3;
- err = bpf_map_update_elem(map_fd[0], &i, &skb_fd, BPF_ANY);
- if (err) {
- fprintf(stderr,
- "ERROR: bpf_map_update_elem (c1 sockmap): %d (%s)\n",
- err, strerror(errno));
- }
- }
- }
-
- if (txmsg_drop)
- options.drop_expected = true;
-
- if (test == PING_PONG)
- err = forever_ping_pong(rate, &options);
- else if (test == SENDMSG) {
- options.base = false;
- options.sendpage = false;
- err = sendmsg_test(iov_count, length, rate, &options);
- } else if (test == SENDPAGE) {
- options.base = false;
- options.sendpage = true;
- err = sendmsg_test(iov_count, length, rate, &options);
- } else if (test == BASE) {
- options.base = true;
- options.sendpage = false;
- err = sendmsg_test(iov_count, length, rate, &options);
- } else if (test == BASE_SENDPAGE) {
- options.base = true;
- options.sendpage = true;
- err = sendmsg_test(iov_count, length, rate, &options);
- } else
- fprintf(stderr, "unknown test\n");
-out:
- bpf_prog_detach2(prog_fd[2], cg_fd, BPF_CGROUP_SOCK_OPS);
- close(s1);
- close(s2);
- close(p1);
- close(p2);
- close(c1);
- close(c2);
- close(cg_fd);
- return err;
-}
-
-void running_handler(int a)
-{
- running = 0;
-}
diff --git a/scripts/bpf_helpers_doc.py b/scripts/bpf_helpers_doc.py
new file mode 100755
index 0000000..5010a4d
--- /dev/null
+++ b/scripts/bpf_helpers_doc.py
@@ -0,0 +1,421 @@
+#!/usr/bin/python3
+# SPDX-License-Identifier: GPL-2.0-only
+#
+# Copyright (C) 2018 Netronome Systems, Inc.
+
+# In case user attempts to run with Python 2.
+from __future__ import print_function
+
+import argparse
+import re
+import sys, os
+
+class NoHelperFound(BaseException):
+ pass
+
+class ParsingError(BaseException):
+ def __init__(self, line='<line not provided>', reader=None):
+ if reader:
+ BaseException.__init__(self,
+ 'Error at file offset %d, parsing line: %s' %
+ (reader.tell(), line))
+ else:
+ BaseException.__init__(self, 'Error parsing line: %s' % line)
+
+class Helper(object):
+ """
+ An object representing the description of an eBPF helper function.
+ @proto: function prototype of the helper function
+ @desc: textual description of the helper function
+ @ret: description of the return value of the helper function
+ """
+ def __init__(self, proto='', desc='', ret=''):
+ self.proto = proto
+ self.desc = desc
+ self.ret = ret
+
+ def proto_break_down(self):
+ """
+ Break down helper function protocol into smaller chunks: return type,
+ name, distincts arguments.
+ """
+ arg_re = re.compile('((const )?(struct )?(\w+|...))( (\**)(\w+))?$')
+ res = {}
+ proto_re = re.compile('(.+) (\**)(\w+)\(((([^,]+)(, )?){1,5})\)$')
+
+ capture = proto_re.match(self.proto)
+ res['ret_type'] = capture.group(1)
+ res['ret_star'] = capture.group(2)
+ res['name'] = capture.group(3)
+ res['args'] = []
+
+ args = capture.group(4).split(', ')
+ for a in args:
+ capture = arg_re.match(a)
+ res['args'].append({
+ 'type' : capture.group(1),
+ 'star' : capture.group(6),
+ 'name' : capture.group(7)
+ })
+
+ return res
+
+class HeaderParser(object):
+ """
+ An object used to parse a file in order to extract the documentation of a
+ list of eBPF helper functions. All the helpers that can be retrieved are
+ stored as Helper object, in the self.helpers() array.
+ @filename: name of file to parse, usually include/uapi/linux/bpf.h in the
+ kernel tree
+ """
+ def __init__(self, filename):
+ self.reader = open(filename, 'r')
+ self.line = ''
+ self.helpers = []
+
+ def parse_helper(self):
+ proto = self.parse_proto()
+ desc = self.parse_desc()
+ ret = self.parse_ret()
+ return Helper(proto=proto, desc=desc, ret=ret)
+
+ def parse_proto(self):
+ # Argument can be of shape:
+ # - "void"
+ # - "type name"
+ # - "type *name"
+ # - Same as above, with "const" and/or "struct" in front of type
+ # - "..." (undefined number of arguments, for bpf_trace_printk())
+ # There is at least one term ("void"), and at most five arguments.
+ p = re.compile(' \* ?((.+) \**\w+\((((const )?(struct )?(\w+|\.\.\.)( \**\w+)?)(, )?){1,5}\))$')
+ capture = p.match(self.line)
+ if not capture:
+ raise NoHelperFound
+ self.line = self.reader.readline()
+ return capture.group(1)
+
+ def parse_desc(self):
+ p = re.compile(' \* ?(?:\t| {5,8})Description$')
+ capture = p.match(self.line)
+ if not capture:
+ # Helper can have empty description and we might be parsing another
+ # attribute: return but do not consume.
+ return ''
+ # Description can be several lines, some of them possibly empty, and it
+ # stops when another subsection title is met.
+ desc = ''
+ while True:
+ self.line = self.reader.readline()
+ if self.line == ' *\n':
+ desc += '\n'
+ else:
+ p = re.compile(' \* ?(?:\t| {5,8})(?:\t| {8})(.*)')
+ capture = p.match(self.line)
+ if capture:
+ desc += capture.group(1) + '\n'
+ else:
+ break
+ return desc
+
+ def parse_ret(self):
+ p = re.compile(' \* ?(?:\t| {5,8})Return$')
+ capture = p.match(self.line)
+ if not capture:
+ # Helper can have empty retval and we might be parsing another
+ # attribute: return but do not consume.
+ return ''
+ # Return value description can be several lines, some of them possibly
+ # empty, and it stops when another subsection title is met.
+ ret = ''
+ while True:
+ self.line = self.reader.readline()
+ if self.line == ' *\n':
+ ret += '\n'
+ else:
+ p = re.compile(' \* ?(?:\t| {5,8})(?:\t| {8})(.*)')
+ capture = p.match(self.line)
+ if capture:
+ ret += capture.group(1) + '\n'
+ else:
+ break
+ return ret
+
+ def run(self):
+ # Advance to start of helper function descriptions.
+ offset = self.reader.read().find('* Start of BPF helper function descriptions:')
+ if offset == -1:
+ raise Exception('Could not find start of eBPF helper descriptions list')
+ self.reader.seek(offset)
+ self.reader.readline()
+ self.reader.readline()
+ self.line = self.reader.readline()
+
+ while True:
+ try:
+ helper = self.parse_helper()
+ self.helpers.append(helper)
+ except NoHelperFound:
+ break
+
+ self.reader.close()
+ print('Parsed description of %d helper function(s)' % len(self.helpers),
+ file=sys.stderr)
+
+###############################################################################
+
+class Printer(object):
+ """
+ A generic class for printers. Printers should be created with an array of
+ Helper objects, and implement a way to print them in the desired fashion.
+ @helpers: array of Helper objects to print to standard output
+ """
+ def __init__(self, helpers):
+ self.helpers = helpers
+
+ def print_header(self):
+ pass
+
+ def print_footer(self):
+ pass
+
+ def print_one(self, helper):
+ pass
+
+ def print_all(self):
+ self.print_header()
+ for helper in self.helpers:
+ self.print_one(helper)
+ self.print_footer()
+
+class PrinterRST(Printer):
+ """
+ A printer for dumping collected information about helpers as a ReStructured
+ Text page compatible with the rst2man program, which can be used to
+ generate a manual page for the helpers.
+ @helpers: array of Helper objects to print to standard output
+ """
+ def print_header(self):
+ header = '''\
+.. Copyright (C) All BPF authors and contributors from 2014 to present.
+.. See git log include/uapi/linux/bpf.h in kernel tree for details.
+..
+.. %%%LICENSE_START(VERBATIM)
+.. Permission is granted to make and distribute verbatim copies of this
+.. manual provided the copyright notice and this permission notice are
+.. preserved on all copies.
+..
+.. Permission is granted to copy and distribute modified versions of this
+.. manual under the conditions for verbatim copying, provided that the
+.. entire resulting derived work is distributed under the terms of a
+.. permission notice identical to this one.
+..
+.. Since the Linux kernel and libraries are constantly changing, this
+.. manual page may be incorrect or out-of-date. The author(s) assume no
+.. responsibility for errors or omissions, or for damages resulting from
+.. the use of the information contained herein. The author(s) may not
+.. have taken the same level of care in the production of this manual,
+.. which is licensed free of charge, as they might when working
+.. professionally.
+..
+.. Formatted or processed versions of this manual, if unaccompanied by
+.. the source, must acknowledge the copyright and authors of this work.
+.. %%%LICENSE_END
+..
+.. Please do not edit this file. It was generated from the documentation
+.. located in file include/uapi/linux/bpf.h of the Linux kernel sources
+.. (helpers description), and from scripts/bpf_helpers_doc.py in the same
+.. repository (header and footer).
+
+===========
+BPF-HELPERS
+===========
+-------------------------------------------------------------------------------
+list of eBPF helper functions
+-------------------------------------------------------------------------------
+
+:Manual section: 7
+
+DESCRIPTION
+===========
+
+The extended Berkeley Packet Filter (eBPF) subsystem consists in programs
+written in a pseudo-assembly language, then attached to one of the several
+kernel hooks and run in reaction of specific events. This framework differs
+from the older, "classic" BPF (or "cBPF") in several aspects, one of them being
+the ability to call special functions (or "helpers") from within a program.
+These functions are restricted to a white-list of helpers defined in the
+kernel.
+
+These helpers are used by eBPF programs to interact with the system, or with
+the context in which they work. For instance, they can be used to print
+debugging messages, to get the time since the system was booted, to interact
+with eBPF maps, or to manipulate network packets. Since there are several eBPF
+program types, and that they do not run in the same context, each program type
+can only call a subset of those helpers.
+
+Due to eBPF conventions, a helper can not have more than five arguments.
+
+Internally, eBPF programs call directly into the compiled helper functions
+without requiring any foreign-function interface. As a result, calling helpers
+introduces no overhead, thus offering excellent performance.
+
+This document is an attempt to list and document the helpers available to eBPF
+developers. They are sorted by chronological order (the oldest helpers in the
+kernel at the top).
+
+HELPERS
+=======
+'''
+ print(header)
+
+ def print_footer(self):
+ footer = '''
+EXAMPLES
+========
+
+Example usage for most of the eBPF helpers listed in this manual page are
+available within the Linux kernel sources, at the following locations:
+
+* *samples/bpf/*
+* *tools/testing/selftests/bpf/*
+
+LICENSE
+=======
+
+eBPF programs can have an associated license, passed along with the bytecode
+instructions to the kernel when the programs are loaded. The format for that
+string is identical to the one in use for kernel modules (Dual licenses, such
+as "Dual BSD/GPL", may be used). Some helper functions are only accessible to
+programs that are compatible with the GNU Privacy License (GPL).
+
+In order to use such helpers, the eBPF program must be loaded with the correct
+license string passed (via **attr**) to the **bpf**\ () system call, and this
+generally translates into the C source code of the program containing a line
+similar to the following:
+
+::
+
+ char ____license[] __attribute__((section("license"), used)) = "GPL";
+
+IMPLEMENTATION
+==============
+
+This manual page is an effort to document the existing eBPF helper functions.
+But as of this writing, the BPF sub-system is under heavy development. New eBPF
+program or map types are added, along with new helper functions. Some helpers
+are occasionally made available for additional program types. So in spite of
+the efforts of the community, this page might not be up-to-date. If you want to
+check by yourself what helper functions exist in your kernel, or what types of
+programs they can support, here are some files among the kernel tree that you
+may be interested in:
+
+* *include/uapi/linux/bpf.h* is the main BPF header. It contains the full list
+ of all helper functions, as well as many other BPF definitions including most
+ of the flags, structs or constants used by the helpers.
+* *net/core/filter.c* contains the definition of most network-related helper
+ functions, and the list of program types from which they can be used.
+* *kernel/trace/bpf_trace.c* is the equivalent for most tracing program-related
+ helpers.
+* *kernel/bpf/verifier.c* contains the functions used to check that valid types
+ of eBPF maps are used with a given helper function.
+* *kernel/bpf/* directory contains other files in which additional helpers are
+ defined (for cgroups, sockmaps, etc.).
+
+Compatibility between helper functions and program types can generally be found
+in the files where helper functions are defined. Look for the **struct
+bpf_func_proto** objects and for functions returning them: these functions
+contain a list of helpers that a given program type can call. Note that the
+**default:** label of the **switch ... case** used to filter helpers can call
+other functions, themselves allowing access to additional helpers. The
+requirement for GPL license is also in those **struct bpf_func_proto**.
+
+Compatibility between helper functions and map types can be found in the
+**check_map_func_compatibility**\ () function in file *kernel/bpf/verifier.c*.
+
+Helper functions that invalidate the checks on **data** and **data_end**
+pointers for network processing are listed in function
+**bpf_helper_changes_pkt_data**\ () in file *net/core/filter.c*.
+
+SEE ALSO
+========
+
+**bpf**\ (2),
+**cgroups**\ (7),
+**ip**\ (8),
+**perf_event_open**\ (2),
+**sendmsg**\ (2),
+**socket**\ (7),
+**tc-bpf**\ (8)'''
+ print(footer)
+
+ def print_proto(self, helper):
+ """
+ Format function protocol with bold and italics markers. This makes RST
+ file less readable, but gives nice results in the manual page.
+ """
+ proto = helper.proto_break_down()
+
+ print('**%s %s%s(' % (proto['ret_type'],
+ proto['ret_star'].replace('*', '\\*'),
+ proto['name']),
+ end='')
+
+ comma = ''
+ for a in proto['args']:
+ one_arg = '{}{}'.format(comma, a['type'])
+ if a['name']:
+ if a['star']:
+ one_arg += ' {}**\ '.format(a['star'].replace('*', '\\*'))
+ else:
+ one_arg += '** '
+ one_arg += '*{}*\\ **'.format(a['name'])
+ comma = ', '
+ print(one_arg, end='')
+
+ print(')**')
+
+ def print_one(self, helper):
+ self.print_proto(helper)
+
+ if (helper.desc):
+ print('\tDescription')
+ # Do not strip all newline characters: formatted code at the end of
+ # a section must be followed by a blank line.
+ for line in re.sub('\n$', '', helper.desc, count=1).split('\n'):
+ print('{}{}'.format('\t\t' if line else '', line))
+
+ if (helper.ret):
+ print('\tReturn')
+ for line in helper.ret.rstrip().split('\n'):
+ print('{}{}'.format('\t\t' if line else '', line))
+
+ print('')
+
+###############################################################################
+
+# If script is launched from scripts/ from kernel tree and can access
+# ../include/uapi/linux/bpf.h, use it as a default name for the file to parse,
+# otherwise the --filename argument will be required from the command line.
+script = os.path.abspath(sys.argv[0])
+linuxRoot = os.path.dirname(os.path.dirname(script))
+bpfh = os.path.join(linuxRoot, 'include/uapi/linux/bpf.h')
+
+argParser = argparse.ArgumentParser(description="""
+Parse eBPF header file and generate documentation for eBPF helper functions.
+The RST-formatted output produced can be turned into a manual page with the
+rst2man utility.
+""")
+if (os.path.isfile(bpfh)):
+ argParser.add_argument('--filename', help='path to include/uapi/linux/bpf.h',
+ default=bpfh)
+else:
+ argParser.add_argument('--filename', help='path to include/uapi/linux/bpf.h')
+args = argParser.parse_args()
+
+# Parse file.
+headerParser = HeaderParser(args.filename)
+headerParser.run()
+
+# Print formatted output to standard output.
+printer = PrinterRST(headerParser.helpers)
+printer.print_all()
diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
index 179dd20..ae86724 100644
--- a/security/selinux/hooks.c
+++ b/security/selinux/hooks.c
@@ -1471,7 +1471,9 @@
return SECCLASS_QIPCRTR_SOCKET;
case PF_SMC:
return SECCLASS_SMC_SOCKET;
-#if PF_MAX > 44
+ case PF_XDP:
+ return SECCLASS_XDP_SOCKET;
+#if PF_MAX > 45
#error New address family defined, please update this function.
#endif
}
diff --git a/security/selinux/include/classmap.h b/security/selinux/include/classmap.h
index 7f03724..bd5fe0d 100644
--- a/security/selinux/include/classmap.h
+++ b/security/selinux/include/classmap.h
@@ -240,9 +240,11 @@
{ "manage_subnet", NULL } },
{ "bpf",
{"map_create", "map_read", "map_write", "prog_load", "prog_run"} },
+ { "xdp_socket",
+ { COMMON_SOCK_PERMS, NULL } },
{ NULL }
};
-#if PF_MAX > 44
+#if PF_MAX > 45
#error New address family defined, please update secclass_map.
#endif
diff --git a/tools/bpf/bpftool/.gitignore b/tools/bpf/bpftool/.gitignore
new file mode 100644
index 0000000..d7e678c
--- /dev/null
+++ b/tools/bpf/bpftool/.gitignore
@@ -0,0 +1,3 @@
+*.d
+bpftool
+FEATURE-DUMP.bpftool
diff --git a/tools/bpf/bpftool/Documentation/bpftool-cgroup.rst b/tools/bpf/bpftool/Documentation/bpftool-cgroup.rst
index 0e4e923..d004f63 100644
--- a/tools/bpf/bpftool/Documentation/bpftool-cgroup.rst
+++ b/tools/bpf/bpftool/Documentation/bpftool-cgroup.rst
@@ -26,7 +26,8 @@
| **bpftool** **cgroup help**
|
| *PROG* := { **id** *PROG_ID* | **pinned** *FILE* | **tag** *PROG_TAG* }
-| *ATTACH_TYPE* := { **ingress** | **egress** | **sock_create** | **sock_ops** | **device** }
+| *ATTACH_TYPE* := { **ingress** | **egress** | **sock_create** | **sock_ops** | **device** |
+| **bind4** | **bind6** | **post_bind4** | **post_bind6** | **connect4** | **connect6** }
| *ATTACH_FLAGS* := { **multi** | **override** }
DESCRIPTION
@@ -63,7 +64,13 @@
**egress** egress path of the inet socket (since 4.10);
**sock_create** opening of an inet socket (since 4.10);
**sock_ops** various socket operations (since 4.12);
- **device** device access (since 4.15).
+ **device** device access (since 4.15);
+ **bind4** call to bind(2) for an inet4 socket (since 4.17);
+ **bind6** call to bind(2) for an inet6 socket (since 4.17);
+ **post_bind4** return from bind(2) for an inet4 socket (since 4.17);
+ **post_bind6** return from bind(2) for an inet6 socket (since 4.17);
+ **connect4** call to connect(2) for an inet4 socket (since 4.17);
+ **connect6** call to connect(2) for an inet6 socket (since 4.17).
**bpftool cgroup detach** *CGROUP* *ATTACH_TYPE* *PROG*
Detach *PROG* from the cgroup *CGROUP* and attach type
diff --git a/tools/bpf/bpftool/Documentation/bpftool-map.rst b/tools/bpf/bpftool/Documentation/bpftool-map.rst
index 457e868..a6258bc 100644
--- a/tools/bpf/bpftool/Documentation/bpftool-map.rst
+++ b/tools/bpf/bpftool/Documentation/bpftool-map.rst
@@ -22,17 +22,19 @@
=============
| **bpftool** **map { show | list }** [*MAP*]
-| **bpftool** **map dump** *MAP*
-| **bpftool** **map update** *MAP* **key** *BYTES* **value** *VALUE* [*UPDATE_FLAGS*]
-| **bpftool** **map lookup** *MAP* **key** *BYTES*
-| **bpftool** **map getnext** *MAP* [**key** *BYTES*]
-| **bpftool** **map delete** *MAP* **key** *BYTES*
-| **bpftool** **map pin** *MAP* *FILE*
+| **bpftool** **map dump** *MAP*
+| **bpftool** **map update** *MAP* **key** *DATA* **value** *VALUE* [*UPDATE_FLAGS*]
+| **bpftool** **map lookup** *MAP* **key** *DATA*
+| **bpftool** **map getnext** *MAP* [**key** *DATA*]
+| **bpftool** **map delete** *MAP* **key** *DATA*
+| **bpftool** **map pin** *MAP* *FILE*
+| **bpftool** **map event_pipe** *MAP* [**cpu** *N* **index** *M*]
| **bpftool** **map help**
|
| *MAP* := { **id** *MAP_ID* | **pinned** *FILE* }
+| *DATA* := { [**hex**] *BYTES* }
| *PROG* := { **id** *PROG_ID* | **pinned** *FILE* | **tag** *PROG_TAG* }
-| *VALUE* := { *BYTES* | *MAP* | *PROG* }
+| *VALUE* := { *DATA* | *MAP* | *PROG* }
| *UPDATE_FLAGS* := { **any** | **exist** | **noexist** }
DESCRIPTION
@@ -48,20 +50,26 @@
**bpftool map dump** *MAP*
Dump all entries in a given *MAP*.
- **bpftool map update** *MAP* **key** *BYTES* **value** *VALUE* [*UPDATE_FLAGS*]
+ **bpftool map update** *MAP* **key** *DATA* **value** *VALUE* [*UPDATE_FLAGS*]
Update map entry for a given *KEY*.
*UPDATE_FLAGS* can be one of: **any** update existing entry
or add if doesn't exit; **exist** update only if entry already
exists; **noexist** update only if entry doesn't exist.
- **bpftool map lookup** *MAP* **key** *BYTES*
+ If the **hex** keyword is provided in front of the bytes
+ sequence, the bytes are parsed as hexadeximal values, even if
+ no "0x" prefix is added. If the keyword is not provided, then
+ the bytes are parsed as decimal values, unless a "0x" prefix
+ (for hexadecimal) or a "0" prefix (for octal) is provided.
+
+ **bpftool map lookup** *MAP* **key** *DATA*
Lookup **key** in the map.
- **bpftool map getnext** *MAP* [**key** *BYTES*]
+ **bpftool map getnext** *MAP* [**key** *DATA*]
Get next key. If *key* is not specified, get first key.
- **bpftool map delete** *MAP* **key** *BYTES*
+ **bpftool map delete** *MAP* **key** *DATA*
Remove entry from the map.
**bpftool map pin** *MAP* *FILE*
@@ -69,6 +77,22 @@
Note: *FILE* must be located in *bpffs* mount.
+ **bpftool** **map event_pipe** *MAP* [**cpu** *N* **index** *M*]
+ Read events from a BPF_MAP_TYPE_PERF_EVENT_ARRAY map.
+
+ Install perf rings into a perf event array map and dump
+ output of any bpf_perf_event_output() call in the kernel.
+ By default read the number of CPUs on the system and
+ install perf ring for each CPU in the corresponding index
+ in the array.
+
+ If **cpu** and **index** are specified, install perf ring
+ for given **cpu** at **index** in the array (single ring).
+
+ Note that installing a perf ring into an array will silently
+ replace any existing ring. Any other application will stop
+ receiving events if it installed its rings earlier.
+
**bpftool map help**
Print short help message.
@@ -98,7 +122,12 @@
10: hash name some_map flags 0x0
key 4B value 8B max_entries 2048 memlock 167936B
-**# bpftool map update id 10 key 13 00 07 00 value 02 00 00 00 01 02 03 04**
+The following three commands are equivalent:
+
+|
+| **# bpftool map update id 10 key hex 20 c4 b7 00 value hex 0f ff ff ab 01 02 03 4c**
+| **# bpftool map update id 10 key 0x20 0xc4 0xb7 0x00 value 0x0f 0xff 0xff 0xab 0x01 0x02 0x03 0x4c**
+| **# bpftool map update id 10 key 32 196 183 0 value 15 255 255 171 1 2 3 76**
**# bpftool map lookup id 10 key 0 1 2 3**
diff --git a/tools/bpf/bpftool/Documentation/bpftool-perf.rst b/tools/bpf/bpftool/Documentation/bpftool-perf.rst
new file mode 100644
index 0000000..e3eb0ea
--- /dev/null
+++ b/tools/bpf/bpftool/Documentation/bpftool-perf.rst
@@ -0,0 +1,81 @@
+================
+bpftool-perf
+================
+-------------------------------------------------------------------------------
+tool for inspection of perf related bpf prog attachments
+-------------------------------------------------------------------------------
+
+:Manual section: 8
+
+SYNOPSIS
+========
+
+ **bpftool** [*OPTIONS*] **perf** *COMMAND*
+
+ *OPTIONS* := { [{ **-j** | **--json** }] [{ **-p** | **--pretty** }] }
+
+ *COMMANDS* :=
+ { **show** | **list** | **help** }
+
+PERF COMMANDS
+=============
+
+| **bpftool** **perf { show | list }**
+| **bpftool** **perf help**
+
+DESCRIPTION
+===========
+ **bpftool perf { show | list }**
+ List all raw_tracepoint, tracepoint, kprobe attachment in the system.
+
+ Output will start with process id and file descriptor in that process,
+ followed by bpf program id, attachment information, and attachment point.
+ The attachment point for raw_tracepoint/tracepoint is the trace probe name.
+ The attachment point for k[ret]probe is either symbol name and offset,
+ or a kernel virtual address.
+ The attachment point for u[ret]probe is the file name and the file offset.
+
+ **bpftool perf help**
+ Print short help message.
+
+OPTIONS
+=======
+ -h, --help
+ Print short generic help message (similar to **bpftool help**).
+
+ -v, --version
+ Print version number (similar to **bpftool version**).
+
+ -j, --json
+ Generate JSON output. For commands that cannot produce JSON, this
+ option has no effect.
+
+ -p, --pretty
+ Generate human-readable JSON output. Implies **-j**.
+
+EXAMPLES
+========
+
+| **# bpftool perf**
+
+::
+
+ pid 21711 fd 5: prog_id 5 kprobe func __x64_sys_write offset 0
+ pid 21765 fd 5: prog_id 7 kretprobe func __x64_sys_nanosleep offset 0
+ pid 21767 fd 5: prog_id 8 tracepoint sys_enter_nanosleep
+ pid 21800 fd 5: prog_id 9 uprobe filename /home/yhs/a.out offset 1159
+
+|
+| **# bpftool -j perf**
+
+::
+
+ [{"pid":21711,"fd":5,"prog_id":5,"fd_type":"kprobe","func":"__x64_sys_write","offset":0}, \
+ {"pid":21765,"fd":5,"prog_id":7,"fd_type":"kretprobe","func":"__x64_sys_nanosleep","offset":0}, \
+ {"pid":21767,"fd":5,"prog_id":8,"fd_type":"tracepoint","tracepoint":"sys_enter_nanosleep"}, \
+ {"pid":21800,"fd":5,"prog_id":9,"fd_type":"uprobe","filename":"/home/yhs/a.out","offset":1159}]
+
+
+SEE ALSO
+========
+ **bpftool**\ (8), **bpftool-prog**\ (8), **bpftool-map**\ (8)
diff --git a/tools/bpf/bpftool/Documentation/bpftool-prog.rst b/tools/bpf/bpftool/Documentation/bpftool-prog.rst
index 67ca6c6..43d34a5 100644
--- a/tools/bpf/bpftool/Documentation/bpftool-prog.rst
+++ b/tools/bpf/bpftool/Documentation/bpftool-prog.rst
@@ -95,7 +95,7 @@
**# bpftool prog show**
::
- 10: xdp name some_prog tag 005a3d2123620c8b
+ 10: xdp name some_prog tag 005a3d2123620c8b gpl
loaded_at Sep 29/20:11 uid 0
xlated 528B jited 370B memlock 4096B map_ids 10
@@ -108,6 +108,7 @@
"id": 10,
"type": "xdp",
"tag": "005a3d2123620c8b",
+ "gpl_compatible": true,
"loaded_at": "Sep 29/20:11",
"uid": 0,
"bytes_xlated": 528,
diff --git a/tools/bpf/bpftool/Documentation/bpftool.rst b/tools/bpf/bpftool/Documentation/bpftool.rst
index 20689a3..b6f5d56 100644
--- a/tools/bpf/bpftool/Documentation/bpftool.rst
+++ b/tools/bpf/bpftool/Documentation/bpftool.rst
@@ -16,20 +16,22 @@
**bpftool** **version**
- *OBJECT* := { **map** | **program** | **cgroup** }
+ *OBJECT* := { **map** | **program** | **cgroup** | **perf** }
*OPTIONS* := { { **-V** | **--version** } | { **-h** | **--help** }
| { **-j** | **--json** } [{ **-p** | **--pretty** }] }
*MAP-COMMANDS* :=
{ **show** | **list** | **dump** | **update** | **lookup** | **getnext** | **delete**
- | **pin** | **help** }
+ | **pin** | **event_pipe** | **help** }
*PROG-COMMANDS* := { **show** | **list** | **dump jited** | **dump xlated** | **pin**
| **load** | **help** }
*CGROUP-COMMANDS* := { **show** | **list** | **attach** | **detach** | **help** }
+ *PERF-COMMANDS* := { **show** | **list** | **help** }
+
DESCRIPTION
===========
*bpftool* allows for inspection and simple modification of BPF objects
@@ -56,3 +58,4 @@
SEE ALSO
========
**bpftool-map**\ (8), **bpftool-prog**\ (8), **bpftool-cgroup**\ (8)
+ **bpftool-perf**\ (8)
diff --git a/tools/bpf/bpftool/Makefile b/tools/bpf/bpftool/Makefile
index 4e69782..892dbf0 100644
--- a/tools/bpf/bpftool/Makefile
+++ b/tools/bpf/bpftool/Makefile
@@ -39,7 +39,12 @@
CFLAGS += -O2
CFLAGS += -W -Wall -Wextra -Wno-unused-parameter -Wshadow -Wno-missing-field-initializers
-CFLAGS += -DPACKAGE='"bpftool"' -D__EXPORTED_HEADERS__ -I$(srctree)/tools/include/uapi -I$(srctree)/tools/include -I$(srctree)/tools/lib/bpf -I$(srctree)/kernel/bpf/
+CFLAGS += -DPACKAGE='"bpftool"' -D__EXPORTED_HEADERS__ \
+ -I$(srctree)/kernel/bpf/ \
+ -I$(srctree)/tools/include \
+ -I$(srctree)/tools/include/uapi \
+ -I$(srctree)/tools/lib/bpf \
+ -I$(srctree)/tools/perf
CFLAGS += -DBPFTOOL_VERSION='"$(BPFTOOL_VERSION)"'
LIBS = -lelf -lbfd -lopcodes $(LIBBPF)
diff --git a/tools/bpf/bpftool/bash-completion/bpftool b/tools/bpf/bpftool/bash-completion/bpftool
index 490811b..7bc198d 100644
--- a/tools/bpf/bpftool/bash-completion/bpftool
+++ b/tools/bpf/bpftool/bash-completion/bpftool
@@ -1,6 +1,6 @@
# bpftool(8) bash completion -*- shell-script -*-
#
-# Copyright (C) 2017 Netronome Systems, Inc.
+# Copyright (C) 2017-2018 Netronome Systems, Inc.
#
# This software is dual licensed under the GNU General License
# Version 2, June 1991 as shown in the file COPYING in the top-level
@@ -79,6 +79,14 @@
command sed -n 's/.*"id": \(.*\),$/\1/p' )" -- "$cur" ) )
}
+_bpftool_get_perf_map_ids()
+{
+ COMPREPLY+=( $( compgen -W "$( bpftool -jp map 2>&1 | \
+ command grep -C2 perf_event_array | \
+ command sed -n 's/.*"id": \(.*\),$/\1/p' )" -- "$cur" ) )
+}
+
+
_bpftool_get_prog_ids()
{
COMPREPLY+=( $( compgen -W "$( bpftool -jp prog 2>&1 | \
@@ -147,7 +155,7 @@
# Deal with simplest keywords
case $prev in
- help|key|opcodes|visual)
+ help|hex|opcodes|visual)
return 0
;;
tag)
@@ -283,7 +291,7 @@
return 0
;;
key)
- return 0
+ COMPREPLY+=( $( compgen -W 'hex' -- "$cur" ) )
;;
*)
_bpftool_once_attr 'key'
@@ -302,7 +310,7 @@
return 0
;;
key)
- return 0
+ COMPREPLY+=( $( compgen -W 'hex' -- "$cur" ) )
;;
value)
# We can have bytes, or references to a prog or a
@@ -321,6 +329,8 @@
return 0
;;
*)
+ COMPREPLY+=( $( compgen -W 'hex' \
+ -- "$cur" ) )
return 0
;;
esac
@@ -357,10 +367,34 @@
fi
return 0
;;
+ event_pipe)
+ case $prev in
+ $command)
+ COMPREPLY=( $( compgen -W "$MAP_TYPE" -- "$cur" ) )
+ return 0
+ ;;
+ id)
+ _bpftool_get_perf_map_ids
+ return 0
+ ;;
+ cpu)
+ return 0
+ ;;
+ index)
+ return 0
+ ;;
+ *)
+ _bpftool_once_attr 'cpu'
+ _bpftool_once_attr 'index'
+ return 0
+ ;;
+ esac
+ ;;
*)
[[ $prev == $object ]] && \
COMPREPLY=( $( compgen -W 'delete dump getnext help \
- lookup pin show list update' -- "$cur" ) )
+ lookup pin event_pipe show list update' -- \
+ "$cur" ) )
;;
esac
;;
@@ -372,7 +406,8 @@
;;
attach|detach)
local ATTACH_TYPES='ingress egress sock_create sock_ops \
- device'
+ device bind4 bind6 post_bind4 post_bind6 connect4 \
+ connect6'
local ATTACH_FLAGS='multi override'
local PROG_TYPE='id pinned tag'
case $prev in
@@ -380,7 +415,8 @@
_filedir
return 0
;;
- ingress|egress|sock_create|sock_ops|device)
+ ingress|egress|sock_create|sock_ops|device|bind4|bind6|\
+ post_bind4|post_bind6|connect4|connect6)
COMPREPLY=( $( compgen -W "$PROG_TYPE" -- \
"$cur" ) )
return 0
@@ -412,6 +448,15 @@
;;
esac
;;
+ perf)
+ case $command in
+ *)
+ [[ $prev == $object ]] && \
+ COMPREPLY=( $( compgen -W 'help \
+ show list' -- "$cur" ) )
+ ;;
+ esac
+ ;;
esac
} &&
complete -F _bpftool bpftool
diff --git a/tools/bpf/bpftool/cgroup.c b/tools/bpf/bpftool/cgroup.c
index cae32a6..1351bd6 100644
--- a/tools/bpf/bpftool/cgroup.c
+++ b/tools/bpf/bpftool/cgroup.c
@@ -16,8 +16,11 @@
#define HELP_SPEC_ATTACH_FLAGS \
"ATTACH_FLAGS := { multi | override }"
-#define HELP_SPEC_ATTACH_TYPES \
- "ATTACH_TYPE := { ingress | egress | sock_create | sock_ops | device }"
+#define HELP_SPEC_ATTACH_TYPES \
+ " ATTACH_TYPE := { ingress | egress | sock_create |\n" \
+ " sock_ops | device | bind4 | bind6 |\n" \
+ " post_bind4 | post_bind6 | connect4 |\n" \
+ " connect6 }"
static const char * const attach_type_strings[] = {
[BPF_CGROUP_INET_INGRESS] = "ingress",
@@ -25,6 +28,12 @@
[BPF_CGROUP_INET_SOCK_CREATE] = "sock_create",
[BPF_CGROUP_SOCK_OPS] = "sock_ops",
[BPF_CGROUP_DEVICE] = "device",
+ [BPF_CGROUP_INET4_BIND] = "bind4",
+ [BPF_CGROUP_INET6_BIND] = "bind6",
+ [BPF_CGROUP_INET4_CONNECT] = "connect4",
+ [BPF_CGROUP_INET6_CONNECT] = "connect6",
+ [BPF_CGROUP_INET4_POST_BIND] = "post_bind4",
+ [BPF_CGROUP_INET6_POST_BIND] = "post_bind6",
[__MAX_BPF_ATTACH_TYPE] = NULL,
};
@@ -282,7 +291,7 @@
" %s %s detach CGROUP ATTACH_TYPE PROG\n"
" %s %s help\n"
"\n"
- " " HELP_SPEC_ATTACH_TYPES "\n"
+ HELP_SPEC_ATTACH_TYPES "\n"
" " HELP_SPEC_ATTACH_FLAGS "\n"
" " HELP_SPEC_PROGRAM "\n"
" " HELP_SPEC_OPTIONS "\n"
diff --git a/tools/bpf/bpftool/common.c b/tools/bpf/bpftool/common.c
index 4659952..32f9e39 100644
--- a/tools/bpf/bpftool/common.c
+++ b/tools/bpf/bpftool/common.c
@@ -1,5 +1,5 @@
/*
- * Copyright (C) 2017 Netronome Systems, Inc.
+ * Copyright (C) 2017-2018 Netronome Systems, Inc.
*
* This software is dual licensed under the GNU General License Version 2,
* June 1991 as shown in the file COPYING in the top-level directory of this
@@ -33,6 +33,7 @@
/* Author: Jakub Kicinski <kubakici@wp.pl> */
+#include <ctype.h>
#include <errno.h>
#include <fcntl.h>
#include <fts.h>
@@ -330,6 +331,16 @@
return NULL;
}
+void print_data_json(uint8_t *data, size_t len)
+{
+ unsigned int i;
+
+ jsonw_start_array(json_wtr);
+ for (i = 0; i < len; i++)
+ jsonw_printf(json_wtr, "%d", data[i]);
+ jsonw_end_array(json_wtr);
+}
+
void print_hex_data_json(uint8_t *data, size_t len)
{
unsigned int i;
@@ -420,6 +431,70 @@
}
}
+unsigned int get_page_size(void)
+{
+ static int result;
+
+ if (!result)
+ result = getpagesize();
+ return result;
+}
+
+unsigned int get_possible_cpus(void)
+{
+ static unsigned int result;
+ char buf[128];
+ long int n;
+ char *ptr;
+ int fd;
+
+ if (result)
+ return result;
+
+ fd = open("/sys/devices/system/cpu/possible", O_RDONLY);
+ if (fd < 0) {
+ p_err("can't open sysfs possible cpus");
+ exit(-1);
+ }
+
+ n = read(fd, buf, sizeof(buf));
+ if (n < 2) {
+ p_err("can't read sysfs possible cpus");
+ exit(-1);
+ }
+ close(fd);
+
+ if (n == sizeof(buf)) {
+ p_err("read sysfs possible cpus overflow");
+ exit(-1);
+ }
+
+ ptr = buf;
+ n = 0;
+ while (*ptr && *ptr != '\n') {
+ unsigned int a, b;
+
+ if (sscanf(ptr, "%u-%u", &a, &b) == 2) {
+ n += b - a + 1;
+
+ ptr = strchr(ptr, '-') + 1;
+ } else if (sscanf(ptr, "%u", &a) == 1) {
+ n++;
+ } else {
+ assert(0);
+ }
+
+ while (isdigit(*ptr))
+ ptr++;
+ if (*ptr == ',')
+ ptr++;
+ }
+
+ result = n;
+
+ return result;
+}
+
static char *
ifindex_to_name_ns(__u32 ifindex, __u32 ns_dev, __u32 ns_ino, char *buf)
{
diff --git a/tools/bpf/bpftool/main.c b/tools/bpf/bpftool/main.c
index 1ec852d..eea7f14 100644
--- a/tools/bpf/bpftool/main.c
+++ b/tools/bpf/bpftool/main.c
@@ -87,7 +87,7 @@
" %s batch file FILE\n"
" %s version\n"
"\n"
- " OBJECT := { prog | map | cgroup }\n"
+ " OBJECT := { prog | map | cgroup | perf }\n"
" " HELP_SPEC_OPTIONS "\n"
"",
bin_name, bin_name, bin_name);
@@ -216,6 +216,7 @@
{ "prog", do_prog },
{ "map", do_map },
{ "cgroup", do_cgroup },
+ { "perf", do_perf },
{ "version", do_version },
{ 0 }
};
diff --git a/tools/bpf/bpftool/main.h b/tools/bpf/bpftool/main.h
index b8e9584..63fdb31 100644
--- a/tools/bpf/bpftool/main.h
+++ b/tools/bpf/bpftool/main.h
@@ -1,5 +1,5 @@
/*
- * Copyright (C) 2017 Netronome Systems, Inc.
+ * Copyright (C) 2017-2018 Netronome Systems, Inc.
*
* This software is dual licensed under the GNU General License Version 2,
* June 1991 as shown in the file COPYING in the top-level directory of this
@@ -117,14 +117,20 @@
int do_prog(int argc, char **arg);
int do_map(int argc, char **arg);
+int do_event_pipe(int argc, char **argv);
int do_cgroup(int argc, char **arg);
+int do_perf(int argc, char **arg);
int prog_parse_fd(int *argc, char ***argv);
+int map_parse_fd_and_info(int *argc, char ***argv, void *info, __u32 *info_len);
void disasm_print_insn(unsigned char *image, ssize_t len, int opcodes,
const char *arch);
+void print_data_json(uint8_t *data, size_t len);
void print_hex_data_json(uint8_t *data, size_t len);
+unsigned int get_page_size(void);
+unsigned int get_possible_cpus(void);
const char *ifindex_to_bfd_name_ns(__u32 ifindex, __u64 ns_dev, __u64 ns_ino);
#endif
diff --git a/tools/bpf/bpftool/map.c b/tools/bpf/bpftool/map.c
index f509c86..097b1a5 100644
--- a/tools/bpf/bpftool/map.c
+++ b/tools/bpf/bpftool/map.c
@@ -1,5 +1,5 @@
/*
- * Copyright (C) 2017 Netronome Systems, Inc.
+ * Copyright (C) 2017-2018 Netronome Systems, Inc.
*
* This software is dual licensed under the GNU General License Version 2,
* June 1991 as shown in the file COPYING in the top-level directory of this
@@ -34,7 +34,6 @@
/* Author: Jakub Kicinski <kubakici@wp.pl> */
#include <assert.h>
-#include <ctype.h>
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
@@ -67,63 +66,9 @@
[BPF_MAP_TYPE_DEVMAP] = "devmap",
[BPF_MAP_TYPE_SOCKMAP] = "sockmap",
[BPF_MAP_TYPE_CPUMAP] = "cpumap",
+ [BPF_MAP_TYPE_SOCKHASH] = "sockhash",
};
-static unsigned int get_possible_cpus(void)
-{
- static unsigned int result;
- char buf[128];
- long int n;
- char *ptr;
- int fd;
-
- if (result)
- return result;
-
- fd = open("/sys/devices/system/cpu/possible", O_RDONLY);
- if (fd < 0) {
- p_err("can't open sysfs possible cpus");
- exit(-1);
- }
-
- n = read(fd, buf, sizeof(buf));
- if (n < 2) {
- p_err("can't read sysfs possible cpus");
- exit(-1);
- }
- close(fd);
-
- if (n == sizeof(buf)) {
- p_err("read sysfs possible cpus overflow");
- exit(-1);
- }
-
- ptr = buf;
- n = 0;
- while (*ptr && *ptr != '\n') {
- unsigned int a, b;
-
- if (sscanf(ptr, "%u-%u", &a, &b) == 2) {
- n += b - a + 1;
-
- ptr = strchr(ptr, '-') + 1;
- } else if (sscanf(ptr, "%u", &a) == 1) {
- n++;
- } else {
- assert(0);
- }
-
- while (isdigit(*ptr))
- ptr++;
- if (*ptr == ',')
- ptr++;
- }
-
- result = n;
-
- return result;
-}
-
static bool map_is_per_cpu(__u32 type)
{
return type == BPF_MAP_TYPE_PERCPU_HASH ||
@@ -186,8 +131,7 @@
return -1;
}
-static int
-map_parse_fd_and_info(int *argc, char ***argv, void *info, __u32 *info_len)
+int map_parse_fd_and_info(int *argc, char ***argv, void *info, __u32 *info_len)
{
int err;
int fd;
@@ -283,11 +227,16 @@
static char **parse_bytes(char **argv, const char *name, unsigned char *val,
unsigned int n)
{
- unsigned int i = 0;
+ unsigned int i = 0, base = 0;
char *endptr;
+ if (is_prefix(*argv, "hex")) {
+ base = 16;
+ argv++;
+ }
+
while (i < n && argv[i]) {
- val[i] = strtoul(argv[i], &endptr, 0);
+ val[i] = strtoul(argv[i], &endptr, base);
if (*endptr) {
p_err("error parsing byte: %s", argv[i]);
return NULL;
@@ -868,23 +817,25 @@
fprintf(stderr,
"Usage: %s %s { show | list } [MAP]\n"
- " %s %s dump MAP\n"
- " %s %s update MAP key BYTES value VALUE [UPDATE_FLAGS]\n"
- " %s %s lookup MAP key BYTES\n"
- " %s %s getnext MAP [key BYTES]\n"
- " %s %s delete MAP key BYTES\n"
- " %s %s pin MAP FILE\n"
+ " %s %s dump MAP\n"
+ " %s %s update MAP key DATA value VALUE [UPDATE_FLAGS]\n"
+ " %s %s lookup MAP key DATA\n"
+ " %s %s getnext MAP [key DATA]\n"
+ " %s %s delete MAP key DATA\n"
+ " %s %s pin MAP FILE\n"
+ " %s %s event_pipe MAP [cpu N index M]\n"
" %s %s help\n"
"\n"
" MAP := { id MAP_ID | pinned FILE }\n"
+ " DATA := { [hex] BYTES }\n"
" " HELP_SPEC_PROGRAM "\n"
- " VALUE := { BYTES | MAP | PROG }\n"
+ " VALUE := { DATA | MAP | PROG }\n"
" UPDATE_FLAGS := { any | exist | noexist }\n"
" " HELP_SPEC_OPTIONS "\n"
"",
bin_name, argv[-2], bin_name, argv[-2], bin_name, argv[-2],
bin_name, argv[-2], bin_name, argv[-2], bin_name, argv[-2],
- bin_name, argv[-2], bin_name, argv[-2]);
+ bin_name, argv[-2], bin_name, argv[-2], bin_name, argv[-2]);
return 0;
}
@@ -899,6 +850,7 @@
{ "getnext", do_getnext },
{ "delete", do_delete },
{ "pin", do_pin },
+ { "event_pipe", do_event_pipe },
{ 0 }
};
diff --git a/tools/bpf/bpftool/map_perf_ring.c b/tools/bpf/bpftool/map_perf_ring.c
new file mode 100644
index 0000000..1832100
--- /dev/null
+++ b/tools/bpf/bpftool/map_perf_ring.c
@@ -0,0 +1,306 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/* Copyright (C) 2018 Netronome Systems, Inc. */
+/* This program is free software; you can redistribute it and/or
+ * modify it under the terms of version 2 of the GNU General Public
+ * License as published by the Free Software Foundation.
+ */
+#include <errno.h>
+#include <fcntl.h>
+#include <libbpf.h>
+#include <poll.h>
+#include <signal.h>
+#include <stdbool.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <time.h>
+#include <unistd.h>
+#include <linux/bpf.h>
+#include <linux/perf_event.h>
+#include <sys/ioctl.h>
+#include <sys/mman.h>
+#include <sys/syscall.h>
+
+#include <bpf.h>
+#include <perf-sys.h>
+
+#include "main.h"
+
+#define MMAP_PAGE_CNT 16
+
+static bool stop;
+
+struct event_ring_info {
+ int fd;
+ int key;
+ unsigned int cpu;
+ void *mem;
+};
+
+struct perf_event_sample {
+ struct perf_event_header header;
+ u64 time;
+ __u32 size;
+ unsigned char data[];
+};
+
+static void int_exit(int signo)
+{
+ fprintf(stderr, "Stopping...\n");
+ stop = true;
+}
+
+static enum bpf_perf_event_ret print_bpf_output(void *event, void *priv)
+{
+ struct event_ring_info *ring = priv;
+ struct perf_event_sample *e = event;
+ struct {
+ struct perf_event_header header;
+ __u64 id;
+ __u64 lost;
+ } *lost = event;
+
+ if (json_output) {
+ jsonw_start_object(json_wtr);
+ jsonw_name(json_wtr, "type");
+ jsonw_uint(json_wtr, e->header.type);
+ jsonw_name(json_wtr, "cpu");
+ jsonw_uint(json_wtr, ring->cpu);
+ jsonw_name(json_wtr, "index");
+ jsonw_uint(json_wtr, ring->key);
+ if (e->header.type == PERF_RECORD_SAMPLE) {
+ jsonw_name(json_wtr, "timestamp");
+ jsonw_uint(json_wtr, e->time);
+ jsonw_name(json_wtr, "data");
+ print_data_json(e->data, e->size);
+ } else if (e->header.type == PERF_RECORD_LOST) {
+ jsonw_name(json_wtr, "lost");
+ jsonw_start_object(json_wtr);
+ jsonw_name(json_wtr, "id");
+ jsonw_uint(json_wtr, lost->id);
+ jsonw_name(json_wtr, "count");
+ jsonw_uint(json_wtr, lost->lost);
+ jsonw_end_object(json_wtr);
+ }
+ jsonw_end_object(json_wtr);
+ } else {
+ if (e->header.type == PERF_RECORD_SAMPLE) {
+ printf("== @%lld.%09lld CPU: %d index: %d =====\n",
+ e->time / 1000000000ULL, e->time % 1000000000ULL,
+ ring->cpu, ring->key);
+ fprint_hex(stdout, e->data, e->size, " ");
+ printf("\n");
+ } else if (e->header.type == PERF_RECORD_LOST) {
+ printf("lost %lld events\n", lost->lost);
+ } else {
+ printf("unknown event type=%d size=%d\n",
+ e->header.type, e->header.size);
+ }
+ }
+
+ return LIBBPF_PERF_EVENT_CONT;
+}
+
+static void
+perf_event_read(struct event_ring_info *ring, void **buf, size_t *buf_len)
+{
+ enum bpf_perf_event_ret ret;
+
+ ret = bpf_perf_event_read_simple(ring->mem,
+ MMAP_PAGE_CNT * get_page_size(),
+ get_page_size(), buf, buf_len,
+ print_bpf_output, ring);
+ if (ret != LIBBPF_PERF_EVENT_CONT) {
+ fprintf(stderr, "perf read loop failed with %d\n", ret);
+ stop = true;
+ }
+}
+
+static int perf_mmap_size(void)
+{
+ return get_page_size() * (MMAP_PAGE_CNT + 1);
+}
+
+static void *perf_event_mmap(int fd)
+{
+ int mmap_size = perf_mmap_size();
+ void *base;
+
+ base = mmap(NULL, mmap_size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
+ if (base == MAP_FAILED) {
+ p_err("event mmap failed: %s\n", strerror(errno));
+ return NULL;
+ }
+
+ return base;
+}
+
+static void perf_event_unmap(void *mem)
+{
+ if (munmap(mem, perf_mmap_size()))
+ fprintf(stderr, "Can't unmap ring memory!\n");
+}
+
+static int bpf_perf_event_open(int map_fd, int key, int cpu)
+{
+ struct perf_event_attr attr = {
+ .sample_type = PERF_SAMPLE_RAW | PERF_SAMPLE_TIME,
+ .type = PERF_TYPE_SOFTWARE,
+ .config = PERF_COUNT_SW_BPF_OUTPUT,
+ };
+ int pmu_fd;
+
+ pmu_fd = sys_perf_event_open(&attr, -1, cpu, -1, 0);
+ if (pmu_fd < 0) {
+ p_err("failed to open perf event %d for CPU %d", key, cpu);
+ return -1;
+ }
+
+ if (bpf_map_update_elem(map_fd, &key, &pmu_fd, BPF_ANY)) {
+ p_err("failed to update map for event %d for CPU %d", key, cpu);
+ goto err_close;
+ }
+ if (ioctl(pmu_fd, PERF_EVENT_IOC_ENABLE, 0)) {
+ p_err("failed to enable event %d for CPU %d", key, cpu);
+ goto err_close;
+ }
+
+ return pmu_fd;
+
+err_close:
+ close(pmu_fd);
+ return -1;
+}
+
+int do_event_pipe(int argc, char **argv)
+{
+ int i, nfds, map_fd, index = -1, cpu = -1;
+ struct bpf_map_info map_info = {};
+ struct event_ring_info *rings;
+ size_t tmp_buf_sz = 0;
+ void *tmp_buf = NULL;
+ struct pollfd *pfds;
+ __u32 map_info_len;
+ bool do_all = true;
+
+ map_info_len = sizeof(map_info);
+ map_fd = map_parse_fd_and_info(&argc, &argv, &map_info, &map_info_len);
+ if (map_fd < 0)
+ return -1;
+
+ if (map_info.type != BPF_MAP_TYPE_PERF_EVENT_ARRAY) {
+ p_err("map is not a perf event array");
+ goto err_close_map;
+ }
+
+ while (argc) {
+ if (argc < 2)
+ BAD_ARG();
+
+ if (is_prefix(*argv, "cpu")) {
+ char *endptr;
+
+ NEXT_ARG();
+ cpu = strtoul(*argv, &endptr, 0);
+ if (*endptr) {
+ p_err("can't parse %s as CPU ID", **argv);
+ goto err_close_map;
+ }
+
+ NEXT_ARG();
+ } else if (is_prefix(*argv, "index")) {
+ char *endptr;
+
+ NEXT_ARG();
+ index = strtoul(*argv, &endptr, 0);
+ if (*endptr) {
+ p_err("can't parse %s as index", **argv);
+ goto err_close_map;
+ }
+
+ NEXT_ARG();
+ } else {
+ BAD_ARG();
+ }
+
+ do_all = false;
+ }
+
+ if (!do_all) {
+ if (index == -1 || cpu == -1) {
+ p_err("cpu and index must be specified together");
+ goto err_close_map;
+ }
+
+ nfds = 1;
+ } else {
+ nfds = min(get_possible_cpus(), map_info.max_entries);
+ cpu = 0;
+ index = 0;
+ }
+
+ rings = calloc(nfds, sizeof(rings[0]));
+ if (!rings)
+ goto err_close_map;
+
+ pfds = calloc(nfds, sizeof(pfds[0]));
+ if (!pfds)
+ goto err_free_rings;
+
+ for (i = 0; i < nfds; i++) {
+ rings[i].cpu = cpu + i;
+ rings[i].key = index + i;
+
+ rings[i].fd = bpf_perf_event_open(map_fd, rings[i].key,
+ rings[i].cpu);
+ if (rings[i].fd < 0)
+ goto err_close_fds_prev;
+
+ rings[i].mem = perf_event_mmap(rings[i].fd);
+ if (!rings[i].mem)
+ goto err_close_fds_current;
+
+ pfds[i].fd = rings[i].fd;
+ pfds[i].events = POLLIN;
+ }
+
+ signal(SIGINT, int_exit);
+ signal(SIGHUP, int_exit);
+ signal(SIGTERM, int_exit);
+
+ if (json_output)
+ jsonw_start_array(json_wtr);
+
+ while (!stop) {
+ poll(pfds, nfds, 200);
+ for (i = 0; i < nfds; i++)
+ perf_event_read(&rings[i], &tmp_buf, &tmp_buf_sz);
+ }
+ free(tmp_buf);
+
+ if (json_output)
+ jsonw_end_array(json_wtr);
+
+ for (i = 0; i < nfds; i++) {
+ perf_event_unmap(rings[i].mem);
+ close(rings[i].fd);
+ }
+ free(pfds);
+ free(rings);
+ close(map_fd);
+
+ return 0;
+
+err_close_fds_prev:
+ while (i--) {
+ perf_event_unmap(rings[i].mem);
+err_close_fds_current:
+ close(rings[i].fd);
+ }
+ free(pfds);
+err_free_rings:
+ free(rings);
+err_close_map:
+ close(map_fd);
+ return -1;
+}
diff --git a/tools/bpf/bpftool/perf.c b/tools/bpf/bpftool/perf.c
new file mode 100644
index 0000000..ac6b1a1
--- /dev/null
+++ b/tools/bpf/bpftool/perf.c
@@ -0,0 +1,246 @@
+// SPDX-License-Identifier: GPL-2.0+
+// Copyright (C) 2018 Facebook
+// Author: Yonghong Song <yhs@fb.com>
+
+#define _GNU_SOURCE
+#include <ctype.h>
+#include <errno.h>
+#include <fcntl.h>
+#include <stdlib.h>
+#include <string.h>
+#include <sys/stat.h>
+#include <sys/types.h>
+#include <unistd.h>
+#include <ftw.h>
+
+#include <bpf.h>
+
+#include "main.h"
+
+/* 0: undecided, 1: supported, 2: not supported */
+static int perf_query_supported;
+static bool has_perf_query_support(void)
+{
+ __u64 probe_offset, probe_addr;
+ __u32 len, prog_id, fd_type;
+ char buf[256];
+ int fd;
+
+ if (perf_query_supported)
+ goto out;
+
+ fd = open(bin_name, O_RDONLY);
+ if (fd < 0) {
+ p_err("perf_query_support: %s", strerror(errno));
+ goto out;
+ }
+
+ /* the following query will fail as no bpf attachment,
+ * the expected errno is ENOTSUPP
+ */
+ errno = 0;
+ len = sizeof(buf);
+ bpf_task_fd_query(getpid(), fd, 0, buf, &len, &prog_id,
+ &fd_type, &probe_offset, &probe_addr);
+
+ if (errno == 524 /* ENOTSUPP */) {
+ perf_query_supported = 1;
+ goto close_fd;
+ }
+
+ perf_query_supported = 2;
+ p_err("perf_query_support: %s", strerror(errno));
+ fprintf(stderr,
+ "HINT: non root or kernel doesn't support TASK_FD_QUERY\n");
+
+close_fd:
+ close(fd);
+out:
+ return perf_query_supported == 1;
+}
+
+static void print_perf_json(int pid, int fd, __u32 prog_id, __u32 fd_type,
+ char *buf, __u64 probe_offset, __u64 probe_addr)
+{
+ jsonw_start_object(json_wtr);
+ jsonw_int_field(json_wtr, "pid", pid);
+ jsonw_int_field(json_wtr, "fd", fd);
+ jsonw_uint_field(json_wtr, "prog_id", prog_id);
+ switch (fd_type) {
+ case BPF_FD_TYPE_RAW_TRACEPOINT:
+ jsonw_string_field(json_wtr, "fd_type", "raw_tracepoint");
+ jsonw_string_field(json_wtr, "tracepoint", buf);
+ break;
+ case BPF_FD_TYPE_TRACEPOINT:
+ jsonw_string_field(json_wtr, "fd_type", "tracepoint");
+ jsonw_string_field(json_wtr, "tracepoint", buf);
+ break;
+ case BPF_FD_TYPE_KPROBE:
+ jsonw_string_field(json_wtr, "fd_type", "kprobe");
+ if (buf[0] != '\0') {
+ jsonw_string_field(json_wtr, "func", buf);
+ jsonw_lluint_field(json_wtr, "offset", probe_offset);
+ } else {
+ jsonw_lluint_field(json_wtr, "addr", probe_addr);
+ }
+ break;
+ case BPF_FD_TYPE_KRETPROBE:
+ jsonw_string_field(json_wtr, "fd_type", "kretprobe");
+ if (buf[0] != '\0') {
+ jsonw_string_field(json_wtr, "func", buf);
+ jsonw_lluint_field(json_wtr, "offset", probe_offset);
+ } else {
+ jsonw_lluint_field(json_wtr, "addr", probe_addr);
+ }
+ break;
+ case BPF_FD_TYPE_UPROBE:
+ jsonw_string_field(json_wtr, "fd_type", "uprobe");
+ jsonw_string_field(json_wtr, "filename", buf);
+ jsonw_lluint_field(json_wtr, "offset", probe_offset);
+ break;
+ case BPF_FD_TYPE_URETPROBE:
+ jsonw_string_field(json_wtr, "fd_type", "uretprobe");
+ jsonw_string_field(json_wtr, "filename", buf);
+ jsonw_lluint_field(json_wtr, "offset", probe_offset);
+ break;
+ }
+ jsonw_end_object(json_wtr);
+}
+
+static void print_perf_plain(int pid, int fd, __u32 prog_id, __u32 fd_type,
+ char *buf, __u64 probe_offset, __u64 probe_addr)
+{
+ printf("pid %d fd %d: prog_id %u ", pid, fd, prog_id);
+ switch (fd_type) {
+ case BPF_FD_TYPE_RAW_TRACEPOINT:
+ printf("raw_tracepoint %s\n", buf);
+ break;
+ case BPF_FD_TYPE_TRACEPOINT:
+ printf("tracepoint %s\n", buf);
+ break;
+ case BPF_FD_TYPE_KPROBE:
+ if (buf[0] != '\0')
+ printf("kprobe func %s offset %llu\n", buf,
+ probe_offset);
+ else
+ printf("kprobe addr %llu\n", probe_addr);
+ break;
+ case BPF_FD_TYPE_KRETPROBE:
+ if (buf[0] != '\0')
+ printf("kretprobe func %s offset %llu\n", buf,
+ probe_offset);
+ else
+ printf("kretprobe addr %llu\n", probe_addr);
+ break;
+ case BPF_FD_TYPE_UPROBE:
+ printf("uprobe filename %s offset %llu\n", buf, probe_offset);
+ break;
+ case BPF_FD_TYPE_URETPROBE:
+ printf("uretprobe filename %s offset %llu\n", buf,
+ probe_offset);
+ break;
+ }
+}
+
+static int show_proc(const char *fpath, const struct stat *sb,
+ int tflag, struct FTW *ftwbuf)
+{
+ __u64 probe_offset, probe_addr;
+ __u32 len, prog_id, fd_type;
+ int err, pid = 0, fd = 0;
+ const char *pch;
+ char buf[4096];
+
+ /* prefix always /proc */
+ pch = fpath + 5;
+ if (*pch == '\0')
+ return 0;
+
+ /* pid should be all numbers */
+ pch++;
+ while (isdigit(*pch)) {
+ pid = pid * 10 + *pch - '0';
+ pch++;
+ }
+ if (*pch == '\0')
+ return 0;
+ if (*pch != '/')
+ return FTW_SKIP_SUBTREE;
+
+ /* check /proc/<pid>/fd directory */
+ pch++;
+ if (strncmp(pch, "fd", 2))
+ return FTW_SKIP_SUBTREE;
+ pch += 2;
+ if (*pch == '\0')
+ return 0;
+ if (*pch != '/')
+ return FTW_SKIP_SUBTREE;
+
+ /* check /proc/<pid>/fd/<fd_num> */
+ pch++;
+ while (isdigit(*pch)) {
+ fd = fd * 10 + *pch - '0';
+ pch++;
+ }
+ if (*pch != '\0')
+ return FTW_SKIP_SUBTREE;
+
+ /* query (pid, fd) for potential perf events */
+ len = sizeof(buf);
+ err = bpf_task_fd_query(pid, fd, 0, buf, &len, &prog_id, &fd_type,
+ &probe_offset, &probe_addr);
+ if (err < 0)
+ return 0;
+
+ if (json_output)
+ print_perf_json(pid, fd, prog_id, fd_type, buf, probe_offset,
+ probe_addr);
+ else
+ print_perf_plain(pid, fd, prog_id, fd_type, buf, probe_offset,
+ probe_addr);
+
+ return 0;
+}
+
+static int do_show(int argc, char **argv)
+{
+ int flags = FTW_ACTIONRETVAL | FTW_PHYS;
+ int err = 0, nopenfd = 16;
+
+ if (!has_perf_query_support())
+ return -1;
+
+ if (json_output)
+ jsonw_start_array(json_wtr);
+ if (nftw("/proc", show_proc, nopenfd, flags) == -1) {
+ p_err("%s", strerror(errno));
+ err = -1;
+ }
+ if (json_output)
+ jsonw_end_array(json_wtr);
+
+ return err;
+}
+
+static int do_help(int argc, char **argv)
+{
+ fprintf(stderr,
+ "Usage: %s %s { show | list | help }\n"
+ "",
+ bin_name, argv[-2]);
+
+ return 0;
+}
+
+static const struct cmd cmds[] = {
+ { "show", do_show },
+ { "list", do_show },
+ { "help", do_help },
+ { 0 }
+};
+
+int do_perf(int argc, char **argv)
+{
+ return cmd_select(cmds, argc, argv, do_help);
+}
diff --git a/tools/bpf/bpftool/prog.c b/tools/bpf/bpftool/prog.c
index f7a8108..39b88e7 100644
--- a/tools/bpf/bpftool/prog.c
+++ b/tools/bpf/bpftool/prog.c
@@ -68,6 +68,9 @@
[BPF_PROG_TYPE_SOCK_OPS] = "sock_ops",
[BPF_PROG_TYPE_SK_SKB] = "sk_skb",
[BPF_PROG_TYPE_CGROUP_DEVICE] = "cgroup_device",
+ [BPF_PROG_TYPE_SK_MSG] = "sk_msg",
+ [BPF_PROG_TYPE_RAW_TRACEPOINT] = "raw_tracepoint",
+ [BPF_PROG_TYPE_CGROUP_SOCK_ADDR] = "cgroup_sock_addr",
};
static void print_boot_time(__u64 nsecs, char *buf, unsigned int size)
@@ -93,7 +96,10 @@
return;
}
- strftime(buf, size, "%b %d/%H:%M", &load_tm);
+ if (json_output)
+ strftime(buf, size, "%s", &load_tm);
+ else
+ strftime(buf, size, "%FT%T%z", &load_tm);
}
static int prog_fd_by_tag(unsigned char *tag)
@@ -232,6 +238,8 @@
info->tag[0], info->tag[1], info->tag[2], info->tag[3],
info->tag[4], info->tag[5], info->tag[6], info->tag[7]);
+ jsonw_bool_field(json_wtr, "gpl_compatible", info->gpl_compatible);
+
print_dev_json(info->ifindex, info->netns_dev, info->netns_ino);
if (info->load_time) {
@@ -240,7 +248,8 @@
print_boot_time(info->load_time, buf, sizeof(buf));
/* Piggy back on load_time, since 0 uid is a valid one */
- jsonw_string_field(json_wtr, "loaded_at", buf);
+ jsonw_name(json_wtr, "loaded_at");
+ jsonw_printf(json_wtr, "%s", buf);
jsonw_uint_field(json_wtr, "uid", info->created_by_uid);
}
@@ -292,6 +301,7 @@
printf("tag ");
fprint_hex(stdout, info->tag, BPF_TAG_SIZE, "");
print_dev_plain(info->ifindex, info->netns_dev, info->netns_ino);
+ printf("%s", info->gpl_compatible ? " gpl" : "");
printf("\n");
if (info->load_time) {
@@ -410,7 +420,11 @@
static int do_dump(int argc, char **argv)
{
+ unsigned long *func_ksyms = NULL;
struct bpf_prog_info info = {};
+ unsigned int *func_lens = NULL;
+ unsigned int nr_func_ksyms;
+ unsigned int nr_func_lens;
struct dump_data dd = {};
__u32 len = sizeof(info);
unsigned int buf_size;
@@ -486,10 +500,34 @@
return -1;
}
+ nr_func_ksyms = info.nr_jited_ksyms;
+ if (nr_func_ksyms) {
+ func_ksyms = malloc(nr_func_ksyms * sizeof(__u64));
+ if (!func_ksyms) {
+ p_err("mem alloc failed");
+ close(fd);
+ goto err_free;
+ }
+ }
+
+ nr_func_lens = info.nr_jited_func_lens;
+ if (nr_func_lens) {
+ func_lens = malloc(nr_func_lens * sizeof(__u32));
+ if (!func_lens) {
+ p_err("mem alloc failed");
+ close(fd);
+ goto err_free;
+ }
+ }
+
memset(&info, 0, sizeof(info));
*member_ptr = ptr_to_u64(buf);
*member_len = buf_size;
+ info.jited_ksyms = ptr_to_u64(func_ksyms);
+ info.nr_jited_ksyms = nr_func_ksyms;
+ info.jited_func_lens = ptr_to_u64(func_lens);
+ info.nr_jited_func_lens = nr_func_lens;
err = bpf_obj_get_info_by_fd(fd, &info, &len);
close(fd);
@@ -503,6 +541,16 @@
goto err_free;
}
+ if (info.nr_jited_ksyms > nr_func_ksyms) {
+ p_err("too many addresses returned");
+ goto err_free;
+ }
+
+ if (info.nr_jited_func_lens > nr_func_lens) {
+ p_err("too many values returned");
+ goto err_free;
+ }
+
if ((member_len == &info.jited_prog_len &&
info.jited_prog_insns == 0) ||
(member_len == &info.xlated_prog_len &&
@@ -540,7 +588,57 @@
goto err_free;
}
- disasm_print_insn(buf, *member_len, opcodes, name);
+ if (info.nr_jited_func_lens && info.jited_func_lens) {
+ struct kernel_sym *sym = NULL;
+ char sym_name[SYM_MAX_NAME];
+ unsigned char *img = buf;
+ __u64 *ksyms = NULL;
+ __u32 *lens;
+ __u32 i;
+
+ if (info.nr_jited_ksyms) {
+ kernel_syms_load(&dd);
+ ksyms = (__u64 *) info.jited_ksyms;
+ }
+
+ if (json_output)
+ jsonw_start_array(json_wtr);
+
+ lens = (__u32 *) info.jited_func_lens;
+ for (i = 0; i < info.nr_jited_func_lens; i++) {
+ if (ksyms) {
+ sym = kernel_syms_search(&dd, ksyms[i]);
+ if (sym)
+ sprintf(sym_name, "%s", sym->name);
+ else
+ sprintf(sym_name, "0x%016llx", ksyms[i]);
+ } else {
+ strcpy(sym_name, "unknown");
+ }
+
+ if (json_output) {
+ jsonw_start_object(json_wtr);
+ jsonw_name(json_wtr, "name");
+ jsonw_string(json_wtr, sym_name);
+ jsonw_name(json_wtr, "insns");
+ } else {
+ printf("%s:\n", sym_name);
+ }
+
+ disasm_print_insn(img, lens[i], opcodes, name);
+ img += lens[i];
+
+ if (json_output)
+ jsonw_end_object(json_wtr);
+ else
+ printf("\n");
+ }
+
+ if (json_output)
+ jsonw_end_array(json_wtr);
+ } else {
+ disasm_print_insn(buf, *member_len, opcodes, name);
+ }
} else if (visual) {
if (json_output)
jsonw_null(json_wtr);
@@ -548,6 +646,9 @@
dump_xlated_cfg(buf, *member_len);
} else {
kernel_syms_load(&dd);
+ dd.nr_jited_ksyms = info.nr_jited_ksyms;
+ dd.jited_ksyms = (__u64 *) info.jited_ksyms;
+
if (json_output)
dump_xlated_json(&dd, buf, *member_len, opcodes);
else
@@ -556,10 +657,14 @@
}
free(buf);
+ free(func_ksyms);
+ free(func_lens);
return 0;
err_free:
free(buf);
+ free(func_ksyms);
+ free(func_lens);
return -1;
}
diff --git a/tools/bpf/bpftool/xlated_dumper.c b/tools/bpf/bpftool/xlated_dumper.c
index 7a3173b..b97f1da 100644
--- a/tools/bpf/bpftool/xlated_dumper.c
+++ b/tools/bpf/bpftool/xlated_dumper.c
@@ -102,8 +102,8 @@
free(dd->sym_mapping);
}
-static struct kernel_sym *kernel_syms_search(struct dump_data *dd,
- unsigned long key)
+struct kernel_sym *kernel_syms_search(struct dump_data *dd,
+ unsigned long key)
{
struct kernel_sym sym = {
.address = key,
@@ -174,7 +174,11 @@
unsigned long address,
const struct bpf_insn *insn)
{
- if (sym)
+ if (!dd->nr_jited_ksyms)
+ /* Do not show address for interpreted programs */
+ snprintf(dd->scratch_buff, sizeof(dd->scratch_buff),
+ "%+d", insn->off);
+ else if (sym)
snprintf(dd->scratch_buff, sizeof(dd->scratch_buff),
"%+d#%s", insn->off, sym->name);
else
@@ -203,6 +207,10 @@
unsigned long address = dd->address_call_base + insn->imm;
struct kernel_sym *sym;
+ if (insn->src_reg == BPF_PSEUDO_CALL &&
+ (__u32) insn->imm < dd->nr_jited_ksyms)
+ address = dd->jited_ksyms[insn->imm];
+
sym = kernel_syms_search(dd, address);
if (insn->src_reg == BPF_PSEUDO_CALL)
return print_call_pcrel(dd, sym, address, insn);
diff --git a/tools/bpf/bpftool/xlated_dumper.h b/tools/bpf/bpftool/xlated_dumper.h
index b34affa..33d86e2 100644
--- a/tools/bpf/bpftool/xlated_dumper.h
+++ b/tools/bpf/bpftool/xlated_dumper.h
@@ -49,11 +49,14 @@
unsigned long address_call_base;
struct kernel_sym *sym_mapping;
__u32 sym_count;
+ __u64 *jited_ksyms;
+ __u32 nr_jited_ksyms;
char scratch_buff[SYM_MAX_NAME + 8];
};
void kernel_syms_load(struct dump_data *dd);
void kernel_syms_destroy(struct dump_data *dd);
+struct kernel_sym *kernel_syms_search(struct dump_data *dd, unsigned long key);
void dump_xlated_json(struct dump_data *dd, void *buf, unsigned int len,
bool opcodes);
void dump_xlated_plain(struct dump_data *dd, void *buf, unsigned int len,
diff --git a/tools/include/uapi/asm/bitsperlong.h b/tools/include/uapi/asm/bitsperlong.h
new file mode 100644
index 0000000..8dd6aef
--- /dev/null
+++ b/tools/include/uapi/asm/bitsperlong.h
@@ -0,0 +1,18 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#if defined(__i386__) || defined(__x86_64__)
+#include "../../arch/x86/include/uapi/asm/bitsperlong.h"
+#elif defined(__aarch64__)
+#include "../../arch/arm64/include/uapi/asm/bitsperlong.h"
+#elif defined(__powerpc__)
+#include "../../arch/powerpc/include/uapi/asm/bitsperlong.h"
+#elif defined(__s390__)
+#include "../../arch/s390/include/uapi/asm/bitsperlong.h"
+#elif defined(__sparc__)
+#include "../../arch/sparc/include/uapi/asm/bitsperlong.h"
+#elif defined(__mips__)
+#include "../../arch/mips/include/uapi/asm/bitsperlong.h"
+#elif defined(__ia64__)
+#include "../../arch/ia64/include/uapi/asm/bitsperlong.h"
+#else
+#include <asm-generic/bitsperlong.h>
+#endif
diff --git a/tools/include/uapi/asm/errno.h b/tools/include/uapi/asm/errno.h
new file mode 100644
index 0000000..ce3c594
--- /dev/null
+++ b/tools/include/uapi/asm/errno.h
@@ -0,0 +1,18 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#if defined(__i386__) || defined(__x86_64__)
+#include "../../arch/x86/include/uapi/asm/errno.h"
+#elif defined(__powerpc__)
+#include "../../arch/powerpc/include/uapi/asm/errno.h"
+#elif defined(__sparc__)
+#include "../../arch/sparc/include/uapi/asm/errno.h"
+#elif defined(__alpha__)
+#include "../../arch/alpha/include/uapi/asm/errno.h"
+#elif defined(__mips__)
+#include "../../arch/mips/include/uapi/asm/errno.h"
+#elif defined(__ia64__)
+#include "../../arch/ia64/include/uapi/asm/errno.h"
+#elif defined(__xtensa__)
+#include "../../arch/xtensa/include/uapi/asm/errno.h"
+#else
+#include <asm-generic/errno.h>
+#endif
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index c5ec897..9b8c6e3 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -95,6 +95,9 @@
BPF_OBJ_GET_INFO_BY_FD,
BPF_PROG_QUERY,
BPF_RAW_TRACEPOINT_OPEN,
+ BPF_BTF_LOAD,
+ BPF_BTF_GET_FD_BY_ID,
+ BPF_TASK_FD_QUERY,
};
enum bpf_map_type {
@@ -115,6 +118,8 @@
BPF_MAP_TYPE_DEVMAP,
BPF_MAP_TYPE_SOCKMAP,
BPF_MAP_TYPE_CPUMAP,
+ BPF_MAP_TYPE_XSKMAP,
+ BPF_MAP_TYPE_SOCKHASH,
};
enum bpf_prog_type {
@@ -137,6 +142,7 @@
BPF_PROG_TYPE_SK_MSG,
BPF_PROG_TYPE_RAW_TRACEPOINT,
BPF_PROG_TYPE_CGROUP_SOCK_ADDR,
+ BPF_PROG_TYPE_LWT_SEG6LOCAL,
};
enum bpf_attach_type {
@@ -279,6 +285,9 @@
*/
char map_name[BPF_OBJ_NAME_LEN];
__u32 map_ifindex; /* ifindex of netdev to create on */
+ __u32 btf_fd; /* fd pointing to a BTF type data */
+ __u32 btf_key_type_id; /* BTF type_id of the key */
+ __u32 btf_value_type_id; /* BTF type_id of the value */
};
struct { /* anonymous struct used by BPF_MAP_*_ELEM commands */
@@ -339,6 +348,7 @@
__u32 start_id;
__u32 prog_id;
__u32 map_id;
+ __u32 btf_id;
};
__u32 next_id;
__u32 open_flags;
@@ -363,398 +373,1637 @@
__u64 name;
__u32 prog_fd;
} raw_tracepoint;
+
+ struct { /* anonymous struct for BPF_BTF_LOAD */
+ __aligned_u64 btf;
+ __aligned_u64 btf_log_buf;
+ __u32 btf_size;
+ __u32 btf_log_size;
+ __u32 btf_log_level;
+ };
+
+ struct {
+ __u32 pid; /* input: pid */
+ __u32 fd; /* input: fd */
+ __u32 flags; /* input: flags */
+ __u32 buf_len; /* input/output: buf len */
+ __aligned_u64 buf; /* input/output:
+ * tp_name for tracepoint
+ * symbol for kprobe
+ * filename for uprobe
+ */
+ __u32 prog_id; /* output: prod_id */
+ __u32 fd_type; /* output: BPF_FD_TYPE_* */
+ __u64 probe_offset; /* output: probe_offset */
+ __u64 probe_addr; /* output: probe_addr */
+ } task_fd_query;
} __attribute__((aligned(8)));
-/* BPF helper function descriptions:
+/* The description below is an attempt at providing documentation to eBPF
+ * developers about the multiple available eBPF helper functions. It can be
+ * parsed and used to produce a manual page. The workflow is the following,
+ * and requires the rst2man utility:
*
- * void *bpf_map_lookup_elem(&map, &key)
- * Return: Map value or NULL
+ * $ ./scripts/bpf_helpers_doc.py \
+ * --filename include/uapi/linux/bpf.h > /tmp/bpf-helpers.rst
+ * $ rst2man /tmp/bpf-helpers.rst > /tmp/bpf-helpers.7
+ * $ man /tmp/bpf-helpers.7
*
- * int bpf_map_update_elem(&map, &key, &value, flags)
- * Return: 0 on success or negative error
+ * Note that in order to produce this external documentation, some RST
+ * formatting is used in the descriptions to get "bold" and "italics" in
+ * manual pages. Also note that the few trailing white spaces are
+ * intentional, removing them would break paragraphs for rst2man.
*
- * int bpf_map_delete_elem(&map, &key)
- * Return: 0 on success or negative error
+ * Start of BPF helper function descriptions:
*
- * int bpf_probe_read(void *dst, int size, void *src)
- * Return: 0 on success or negative error
+ * void *bpf_map_lookup_elem(struct bpf_map *map, const void *key)
+ * Description
+ * Perform a lookup in *map* for an entry associated to *key*.
+ * Return
+ * Map value associated to *key*, or **NULL** if no entry was
+ * found.
+ *
+ * int bpf_map_update_elem(struct bpf_map *map, const void *key, const void *value, u64 flags)
+ * Description
+ * Add or update the value of the entry associated to *key* in
+ * *map* with *value*. *flags* is one of:
+ *
+ * **BPF_NOEXIST**
+ * The entry for *key* must not exist in the map.
+ * **BPF_EXIST**
+ * The entry for *key* must already exist in the map.
+ * **BPF_ANY**
+ * No condition on the existence of the entry for *key*.
+ *
+ * Flag value **BPF_NOEXIST** cannot be used for maps of types
+ * **BPF_MAP_TYPE_ARRAY** or **BPF_MAP_TYPE_PERCPU_ARRAY** (all
+ * elements always exist), the helper would return an error.
+ * Return
+ * 0 on success, or a negative error in case of failure.
+ *
+ * int bpf_map_delete_elem(struct bpf_map *map, const void *key)
+ * Description
+ * Delete entry with *key* from *map*.
+ * Return
+ * 0 on success, or a negative error in case of failure.
+ *
+ * int bpf_probe_read(void *dst, u32 size, const void *src)
+ * Description
+ * For tracing programs, safely attempt to read *size* bytes from
+ * address *src* and store the data in *dst*.
+ * Return
+ * 0 on success, or a negative error in case of failure.
*
* u64 bpf_ktime_get_ns(void)
- * Return: current ktime
+ * Description
+ * Return the time elapsed since system boot, in nanoseconds.
+ * Return
+ * Current *ktime*.
*
- * int bpf_trace_printk(const char *fmt, int fmt_size, ...)
- * Return: length of buffer written or negative error
+ * int bpf_trace_printk(const char *fmt, u32 fmt_size, ...)
+ * Description
+ * This helper is a "printk()-like" facility for debugging. It
+ * prints a message defined by format *fmt* (of size *fmt_size*)
+ * to file *\/sys/kernel/debug/tracing/trace* from DebugFS, if
+ * available. It can take up to three additional **u64**
+ * arguments (as an eBPF helpers, the total number of arguments is
+ * limited to five).
*
- * u32 bpf_prandom_u32(void)
- * Return: random value
+ * Each time the helper is called, it appends a line to the trace.
+ * The format of the trace is customizable, and the exact output
+ * one will get depends on the options set in
+ * *\/sys/kernel/debug/tracing/trace_options* (see also the
+ * *README* file under the same directory). However, it usually
+ * defaults to something like:
*
- * u32 bpf_raw_smp_processor_id(void)
- * Return: SMP processor ID
+ * ::
*
- * int bpf_skb_store_bytes(skb, offset, from, len, flags)
- * store bytes into packet
- * @skb: pointer to skb
- * @offset: offset within packet from skb->mac_header
- * @from: pointer where to copy bytes from
- * @len: number of bytes to store into packet
- * @flags: bit 0 - if true, recompute skb->csum
- * other bits - reserved
- * Return: 0 on success or negative error
+ * telnet-470 [001] .N.. 419421.045894: 0x00000001: <formatted msg>
*
- * int bpf_l3_csum_replace(skb, offset, from, to, flags)
- * recompute IP checksum
- * @skb: pointer to skb
- * @offset: offset within packet where IP checksum is located
- * @from: old value of header field
- * @to: new value of header field
- * @flags: bits 0-3 - size of header field
- * other bits - reserved
- * Return: 0 on success or negative error
+ * In the above:
*
- * int bpf_l4_csum_replace(skb, offset, from, to, flags)
- * recompute TCP/UDP checksum
- * @skb: pointer to skb
- * @offset: offset within packet where TCP/UDP checksum is located
- * @from: old value of header field
- * @to: new value of header field
- * @flags: bits 0-3 - size of header field
- * bit 4 - is pseudo header
- * other bits - reserved
- * Return: 0 on success or negative error
+ * * ``telnet`` is the name of the current task.
+ * * ``470`` is the PID of the current task.
+ * * ``001`` is the CPU number on which the task is
+ * running.
+ * * In ``.N..``, each character refers to a set of
+ * options (whether irqs are enabled, scheduling
+ * options, whether hard/softirqs are running, level of
+ * preempt_disabled respectively). **N** means that
+ * **TIF_NEED_RESCHED** and **PREEMPT_NEED_RESCHED**
+ * are set.
+ * * ``419421.045894`` is a timestamp.
+ * * ``0x00000001`` is a fake value used by BPF for the
+ * instruction pointer register.
+ * * ``<formatted msg>`` is the message formatted with
+ * *fmt*.
*
- * int bpf_tail_call(ctx, prog_array_map, index)
- * jump into another BPF program
- * @ctx: context pointer passed to next program
- * @prog_array_map: pointer to map which type is BPF_MAP_TYPE_PROG_ARRAY
- * @index: 32-bit index inside array that selects specific program to run
- * Return: 0 on success or negative error
+ * The conversion specifiers supported by *fmt* are similar, but
+ * more limited than for printk(). They are **%d**, **%i**,
+ * **%u**, **%x**, **%ld**, **%li**, **%lu**, **%lx**, **%lld**,
+ * **%lli**, **%llu**, **%llx**, **%p**, **%s**. No modifier (size
+ * of field, padding with zeroes, etc.) is available, and the
+ * helper will return **-EINVAL** (but print nothing) if it
+ * encounters an unknown specifier.
*
- * int bpf_clone_redirect(skb, ifindex, flags)
- * redirect to another netdev
- * @skb: pointer to skb
- * @ifindex: ifindex of the net device
- * @flags: bit 0 - if set, redirect to ingress instead of egress
- * other bits - reserved
- * Return: 0 on success or negative error
+ * Also, note that **bpf_trace_printk**\ () is slow, and should
+ * only be used for debugging purposes. For this reason, a notice
+ * bloc (spanning several lines) is printed to kernel logs and
+ * states that the helper should not be used "for production use"
+ * the first time this helper is used (or more precisely, when
+ * **trace_printk**\ () buffers are allocated). For passing values
+ * to user space, perf events should be preferred.
+ * Return
+ * The number of bytes written to the buffer, or a negative error
+ * in case of failure.
+ *
+ * u32 bpf_get_prandom_u32(void)
+ * Description
+ * Get a pseudo-random number.
+ *
+ * From a security point of view, this helper uses its own
+ * pseudo-random internal state, and cannot be used to infer the
+ * seed of other random functions in the kernel. However, it is
+ * essential to note that the generator used by the helper is not
+ * cryptographically secure.
+ * Return
+ * A random 32-bit unsigned value.
+ *
+ * u32 bpf_get_smp_processor_id(void)
+ * Description
+ * Get the SMP (symmetric multiprocessing) processor id. Note that
+ * all programs run with preemption disabled, which means that the
+ * SMP processor id is stable during all the execution of the
+ * program.
+ * Return
+ * The SMP id of the processor running the program.
+ *
+ * int bpf_skb_store_bytes(struct sk_buff *skb, u32 offset, const void *from, u32 len, u64 flags)
+ * Description
+ * Store *len* bytes from address *from* into the packet
+ * associated to *skb*, at *offset*. *flags* are a combination of
+ * **BPF_F_RECOMPUTE_CSUM** (automatically recompute the
+ * checksum for the packet after storing the bytes) and
+ * **BPF_F_INVALIDATE_HASH** (set *skb*\ **->hash**, *skb*\
+ * **->swhash** and *skb*\ **->l4hash** to 0).
+ *
+ * A call to this helper is susceptible to change the underlaying
+ * packet buffer. Therefore, at load time, all checks on pointers
+ * previously done by the verifier are invalidated and must be
+ * performed again, if the helper is used in combination with
+ * direct packet access.
+ * Return
+ * 0 on success, or a negative error in case of failure.
+ *
+ * int bpf_l3_csum_replace(struct sk_buff *skb, u32 offset, u64 from, u64 to, u64 size)
+ * Description
+ * Recompute the layer 3 (e.g. IP) checksum for the packet
+ * associated to *skb*. Computation is incremental, so the helper
+ * must know the former value of the header field that was
+ * modified (*from*), the new value of this field (*to*), and the
+ * number of bytes (2 or 4) for this field, stored in *size*.
+ * Alternatively, it is possible to store the difference between
+ * the previous and the new values of the header field in *to*, by
+ * setting *from* and *size* to 0. For both methods, *offset*
+ * indicates the location of the IP checksum within the packet.
+ *
+ * This helper works in combination with **bpf_csum_diff**\ (),
+ * which does not update the checksum in-place, but offers more
+ * flexibility and can handle sizes larger than 2 or 4 for the
+ * checksum to update.
+ *
+ * A call to this helper is susceptible to change the underlaying
+ * packet buffer. Therefore, at load time, all checks on pointers
+ * previously done by the verifier are invalidated and must be
+ * performed again, if the helper is used in combination with
+ * direct packet access.
+ * Return
+ * 0 on success, or a negative error in case of failure.
+ *
+ * int bpf_l4_csum_replace(struct sk_buff *skb, u32 offset, u64 from, u64 to, u64 flags)
+ * Description
+ * Recompute the layer 4 (e.g. TCP, UDP or ICMP) checksum for the
+ * packet associated to *skb*. Computation is incremental, so the
+ * helper must know the former value of the header field that was
+ * modified (*from*), the new value of this field (*to*), and the
+ * number of bytes (2 or 4) for this field, stored on the lowest
+ * four bits of *flags*. Alternatively, it is possible to store
+ * the difference between the previous and the new values of the
+ * header field in *to*, by setting *from* and the four lowest
+ * bits of *flags* to 0. For both methods, *offset* indicates the
+ * location of the IP checksum within the packet. In addition to
+ * the size of the field, *flags* can be added (bitwise OR) actual
+ * flags. With **BPF_F_MARK_MANGLED_0**, a null checksum is left
+ * untouched (unless **BPF_F_MARK_ENFORCE** is added as well), and
+ * for updates resulting in a null checksum the value is set to
+ * **CSUM_MANGLED_0** instead. Flag **BPF_F_PSEUDO_HDR** indicates
+ * the checksum is to be computed against a pseudo-header.
+ *
+ * This helper works in combination with **bpf_csum_diff**\ (),
+ * which does not update the checksum in-place, but offers more
+ * flexibility and can handle sizes larger than 2 or 4 for the
+ * checksum to update.
+ *
+ * A call to this helper is susceptible to change the underlaying
+ * packet buffer. Therefore, at load time, all checks on pointers
+ * previously done by the verifier are invalidated and must be
+ * performed again, if the helper is used in combination with
+ * direct packet access.
+ * Return
+ * 0 on success, or a negative error in case of failure.
+ *
+ * int bpf_tail_call(void *ctx, struct bpf_map *prog_array_map, u32 index)
+ * Description
+ * This special helper is used to trigger a "tail call", or in
+ * other words, to jump into another eBPF program. The same stack
+ * frame is used (but values on stack and in registers for the
+ * caller are not accessible to the callee). This mechanism allows
+ * for program chaining, either for raising the maximum number of
+ * available eBPF instructions, or to execute given programs in
+ * conditional blocks. For security reasons, there is an upper
+ * limit to the number of successive tail calls that can be
+ * performed.
+ *
+ * Upon call of this helper, the program attempts to jump into a
+ * program referenced at index *index* in *prog_array_map*, a
+ * special map of type **BPF_MAP_TYPE_PROG_ARRAY**, and passes
+ * *ctx*, a pointer to the context.
+ *
+ * If the call succeeds, the kernel immediately runs the first
+ * instruction of the new program. This is not a function call,
+ * and it never returns to the previous program. If the call
+ * fails, then the helper has no effect, and the caller continues
+ * to run its subsequent instructions. A call can fail if the
+ * destination program for the jump does not exist (i.e. *index*
+ * is superior to the number of entries in *prog_array_map*), or
+ * if the maximum number of tail calls has been reached for this
+ * chain of programs. This limit is defined in the kernel by the
+ * macro **MAX_TAIL_CALL_CNT** (not accessible to user space),
+ * which is currently set to 32.
+ * Return
+ * 0 on success, or a negative error in case of failure.
+ *
+ * int bpf_clone_redirect(struct sk_buff *skb, u32 ifindex, u64 flags)
+ * Description
+ * Clone and redirect the packet associated to *skb* to another
+ * net device of index *ifindex*. Both ingress and egress
+ * interfaces can be used for redirection. The **BPF_F_INGRESS**
+ * value in *flags* is used to make the distinction (ingress path
+ * is selected if the flag is present, egress path otherwise).
+ * This is the only flag supported for now.
+ *
+ * In comparison with **bpf_redirect**\ () helper,
+ * **bpf_clone_redirect**\ () has the associated cost of
+ * duplicating the packet buffer, but this can be executed out of
+ * the eBPF program. Conversely, **bpf_redirect**\ () is more
+ * efficient, but it is handled through an action code where the
+ * redirection happens only after the eBPF program has returned.
+ *
+ * A call to this helper is susceptible to change the underlaying
+ * packet buffer. Therefore, at load time, all checks on pointers
+ * previously done by the verifier are invalidated and must be
+ * performed again, if the helper is used in combination with
+ * direct packet access.
+ * Return
+ * 0 on success, or a negative error in case of failure.
*
* u64 bpf_get_current_pid_tgid(void)
- * Return: current->tgid << 32 | current->pid
+ * Return
+ * A 64-bit integer containing the current tgid and pid, and
+ * created as such:
+ * *current_task*\ **->tgid << 32 \|**
+ * *current_task*\ **->pid**.
*
* u64 bpf_get_current_uid_gid(void)
- * Return: current_gid << 32 | current_uid
+ * Return
+ * A 64-bit integer containing the current GID and UID, and
+ * created as such: *current_gid* **<< 32 \|** *current_uid*.
*
- * int bpf_get_current_comm(char *buf, int size_of_buf)
- * stores current->comm into buf
- * Return: 0 on success or negative error
+ * int bpf_get_current_comm(char *buf, u32 size_of_buf)
+ * Description
+ * Copy the **comm** attribute of the current task into *buf* of
+ * *size_of_buf*. The **comm** attribute contains the name of
+ * the executable (excluding the path) for the current task. The
+ * *size_of_buf* must be strictly positive. On success, the
+ * helper makes sure that the *buf* is NUL-terminated. On failure,
+ * it is filled with zeroes.
+ * Return
+ * 0 on success, or a negative error in case of failure.
*
- * u32 bpf_get_cgroup_classid(skb)
- * retrieve a proc's classid
- * @skb: pointer to skb
- * Return: classid if != 0
+ * u32 bpf_get_cgroup_classid(struct sk_buff *skb)
+ * Description
+ * Retrieve the classid for the current task, i.e. for the net_cls
+ * cgroup to which *skb* belongs.
*
- * int bpf_skb_vlan_push(skb, vlan_proto, vlan_tci)
- * Return: 0 on success or negative error
+ * This helper can be used on TC egress path, but not on ingress.
*
- * int bpf_skb_vlan_pop(skb)
- * Return: 0 on success or negative error
+ * The net_cls cgroup provides an interface to tag network packets
+ * based on a user-provided identifier for all traffic coming from
+ * the tasks belonging to the related cgroup. See also the related
+ * kernel documentation, available from the Linux sources in file
+ * *Documentation/cgroup-v1/net_cls.txt*.
*
- * int bpf_skb_get_tunnel_key(skb, key, size, flags)
- * int bpf_skb_set_tunnel_key(skb, key, size, flags)
- * retrieve or populate tunnel metadata
- * @skb: pointer to skb
- * @key: pointer to 'struct bpf_tunnel_key'
- * @size: size of 'struct bpf_tunnel_key'
- * @flags: room for future extensions
- * Return: 0 on success or negative error
+ * The Linux kernel has two versions for cgroups: there are
+ * cgroups v1 and cgroups v2. Both are available to users, who can
+ * use a mixture of them, but note that the net_cls cgroup is for
+ * cgroup v1 only. This makes it incompatible with BPF programs
+ * run on cgroups, which is a cgroup-v2-only feature (a socket can
+ * only hold data for one version of cgroups at a time).
*
- * u64 bpf_perf_event_read(map, flags)
- * read perf event counter value
- * @map: pointer to perf_event_array map
- * @flags: index of event in the map or bitmask flags
- * Return: value of perf event counter read or error code
+ * This helper is only available is the kernel was compiled with
+ * the **CONFIG_CGROUP_NET_CLASSID** configuration option set to
+ * "**y**" or to "**m**".
+ * Return
+ * The classid, or 0 for the default unconfigured classid.
*
- * int bpf_redirect(ifindex, flags)
- * redirect to another netdev
- * @ifindex: ifindex of the net device
- * @flags:
- * cls_bpf:
- * bit 0 - if set, redirect to ingress instead of egress
- * other bits - reserved
- * xdp_bpf:
- * all bits - reserved
- * Return: cls_bpf: TC_ACT_REDIRECT on success or TC_ACT_SHOT on error
- * xdp_bfp: XDP_REDIRECT on success or XDP_ABORT on error
- * int bpf_redirect_map(map, key, flags)
- * redirect to endpoint in map
- * @map: pointer to dev map
- * @key: index in map to lookup
- * @flags: --
- * Return: XDP_REDIRECT on success or XDP_ABORT on error
+ * int bpf_skb_vlan_push(struct sk_buff *skb, __be16 vlan_proto, u16 vlan_tci)
+ * Description
+ * Push a *vlan_tci* (VLAN tag control information) of protocol
+ * *vlan_proto* to the packet associated to *skb*, then update
+ * the checksum. Note that if *vlan_proto* is different from
+ * **ETH_P_8021Q** and **ETH_P_8021AD**, it is considered to
+ * be **ETH_P_8021Q**.
*
- * u32 bpf_get_route_realm(skb)
- * retrieve a dst's tclassid
- * @skb: pointer to skb
- * Return: realm if != 0
+ * A call to this helper is susceptible to change the underlaying
+ * packet buffer. Therefore, at load time, all checks on pointers
+ * previously done by the verifier are invalidated and must be
+ * performed again, if the helper is used in combination with
+ * direct packet access.
+ * Return
+ * 0 on success, or a negative error in case of failure.
*
- * int bpf_perf_event_output(ctx, map, flags, data, size)
- * output perf raw sample
- * @ctx: struct pt_regs*
- * @map: pointer to perf_event_array map
- * @flags: index of event in the map or bitmask flags
- * @data: data on stack to be output as raw data
- * @size: size of data
- * Return: 0 on success or negative error
+ * int bpf_skb_vlan_pop(struct sk_buff *skb)
+ * Description
+ * Pop a VLAN header from the packet associated to *skb*.
*
- * int bpf_get_stackid(ctx, map, flags)
- * walk user or kernel stack and return id
- * @ctx: struct pt_regs*
- * @map: pointer to stack_trace map
- * @flags: bits 0-7 - numer of stack frames to skip
- * bit 8 - collect user stack instead of kernel
- * bit 9 - compare stacks by hash only
- * bit 10 - if two different stacks hash into the same stackid
- * discard old
- * other bits - reserved
- * Return: >= 0 stackid on success or negative error
+ * A call to this helper is susceptible to change the underlaying
+ * packet buffer. Therefore, at load time, all checks on pointers
+ * previously done by the verifier are invalidated and must be
+ * performed again, if the helper is used in combination with
+ * direct packet access.
+ * Return
+ * 0 on success, or a negative error in case of failure.
*
- * s64 bpf_csum_diff(from, from_size, to, to_size, seed)
- * calculate csum diff
- * @from: raw from buffer
- * @from_size: length of from buffer
- * @to: raw to buffer
- * @to_size: length of to buffer
- * @seed: optional seed
- * Return: csum result or negative error code
+ * int bpf_skb_get_tunnel_key(struct sk_buff *skb, struct bpf_tunnel_key *key, u32 size, u64 flags)
+ * Description
+ * Get tunnel metadata. This helper takes a pointer *key* to an
+ * empty **struct bpf_tunnel_key** of **size**, that will be
+ * filled with tunnel metadata for the packet associated to *skb*.
+ * The *flags* can be set to **BPF_F_TUNINFO_IPV6**, which
+ * indicates that the tunnel is based on IPv6 protocol instead of
+ * IPv4.
*
- * int bpf_skb_get_tunnel_opt(skb, opt, size)
- * retrieve tunnel options metadata
- * @skb: pointer to skb
- * @opt: pointer to raw tunnel option data
- * @size: size of @opt
- * Return: option size
+ * The **struct bpf_tunnel_key** is an object that generalizes the
+ * principal parameters used by various tunneling protocols into a
+ * single struct. This way, it can be used to easily make a
+ * decision based on the contents of the encapsulation header,
+ * "summarized" in this struct. In particular, it holds the IP
+ * address of the remote end (IPv4 or IPv6, depending on the case)
+ * in *key*\ **->remote_ipv4** or *key*\ **->remote_ipv6**. Also,
+ * this struct exposes the *key*\ **->tunnel_id**, which is
+ * generally mapped to a VNI (Virtual Network Identifier), making
+ * it programmable together with the **bpf_skb_set_tunnel_key**\
+ * () helper.
*
- * int bpf_skb_set_tunnel_opt(skb, opt, size)
- * populate tunnel options metadata
- * @skb: pointer to skb
- * @opt: pointer to raw tunnel option data
- * @size: size of @opt
- * Return: 0 on success or negative error
+ * Let's imagine that the following code is part of a program
+ * attached to the TC ingress interface, on one end of a GRE
+ * tunnel, and is supposed to filter out all messages coming from
+ * remote ends with IPv4 address other than 10.0.0.1:
*
- * int bpf_skb_change_proto(skb, proto, flags)
- * Change protocol of the skb. Currently supported is v4 -> v6,
- * v6 -> v4 transitions. The helper will also resize the skb. eBPF
- * program is expected to fill the new headers via skb_store_bytes
- * and lX_csum_replace.
- * @skb: pointer to skb
- * @proto: new skb->protocol type
- * @flags: reserved
- * Return: 0 on success or negative error
+ * ::
*
- * int bpf_skb_change_type(skb, type)
- * Change packet type of skb.
- * @skb: pointer to skb
- * @type: new skb->pkt_type type
- * Return: 0 on success or negative error
+ * int ret;
+ * struct bpf_tunnel_key key = {};
+ *
+ * ret = bpf_skb_get_tunnel_key(skb, &key, sizeof(key), 0);
+ * if (ret < 0)
+ * return TC_ACT_SHOT; // drop packet
+ *
+ * if (key.remote_ipv4 != 0x0a000001)
+ * return TC_ACT_SHOT; // drop packet
+ *
+ * return TC_ACT_OK; // accept packet
*
- * int bpf_skb_under_cgroup(skb, map, index)
- * Check cgroup2 membership of skb
- * @skb: pointer to skb
- * @map: pointer to bpf_map in BPF_MAP_TYPE_CGROUP_ARRAY type
- * @index: index of the cgroup in the bpf_map
- * Return:
- * == 0 skb failed the cgroup2 descendant test
- * == 1 skb succeeded the cgroup2 descendant test
- * < 0 error
+ * This interface can also be used with all encapsulation devices
+ * that can operate in "collect metadata" mode: instead of having
+ * one network device per specific configuration, the "collect
+ * metadata" mode only requires a single device where the
+ * configuration can be extracted from this helper.
*
- * u32 bpf_get_hash_recalc(skb)
- * Retrieve and possibly recalculate skb->hash.
- * @skb: pointer to skb
- * Return: hash
+ * This can be used together with various tunnels such as VXLan,
+ * Geneve, GRE or IP in IP (IPIP).
+ * Return
+ * 0 on success, or a negative error in case of failure.
+ *
+ * int bpf_skb_set_tunnel_key(struct sk_buff *skb, struct bpf_tunnel_key *key, u32 size, u64 flags)
+ * Description
+ * Populate tunnel metadata for packet associated to *skb.* The
+ * tunnel metadata is set to the contents of *key*, of *size*. The
+ * *flags* can be set to a combination of the following values:
+ *
+ * **BPF_F_TUNINFO_IPV6**
+ * Indicate that the tunnel is based on IPv6 protocol
+ * instead of IPv4.
+ * **BPF_F_ZERO_CSUM_TX**
+ * For IPv4 packets, add a flag to tunnel metadata
+ * indicating that checksum computation should be skipped
+ * and checksum set to zeroes.
+ * **BPF_F_DONT_FRAGMENT**
+ * Add a flag to tunnel metadata indicating that the
+ * packet should not be fragmented.
+ * **BPF_F_SEQ_NUMBER**
+ * Add a flag to tunnel metadata indicating that a
+ * sequence number should be added to tunnel header before
+ * sending the packet. This flag was added for GRE
+ * encapsulation, but might be used with other protocols
+ * as well in the future.
+ *
+ * Here is a typical usage on the transmit path:
+ *
+ * ::
+ *
+ * struct bpf_tunnel_key key;
+ * populate key ...
+ * bpf_skb_set_tunnel_key(skb, &key, sizeof(key), 0);
+ * bpf_clone_redirect(skb, vxlan_dev_ifindex, 0);
+ *
+ * See also the description of the **bpf_skb_get_tunnel_key**\ ()
+ * helper for additional information.
+ * Return
+ * 0 on success, or a negative error in case of failure.
+ *
+ * u64 bpf_perf_event_read(struct bpf_map *map, u64 flags)
+ * Description
+ * Read the value of a perf event counter. This helper relies on a
+ * *map* of type **BPF_MAP_TYPE_PERF_EVENT_ARRAY**. The nature of
+ * the perf event counter is selected when *map* is updated with
+ * perf event file descriptors. The *map* is an array whose size
+ * is the number of available CPUs, and each cell contains a value
+ * relative to one CPU. The value to retrieve is indicated by
+ * *flags*, that contains the index of the CPU to look up, masked
+ * with **BPF_F_INDEX_MASK**. Alternatively, *flags* can be set to
+ * **BPF_F_CURRENT_CPU** to indicate that the value for the
+ * current CPU should be retrieved.
+ *
+ * Note that before Linux 4.13, only hardware perf event can be
+ * retrieved.
+ *
+ * Also, be aware that the newer helper
+ * **bpf_perf_event_read_value**\ () is recommended over
+ * **bpf_perf_event_read**\ () in general. The latter has some ABI
+ * quirks where error and counter value are used as a return code
+ * (which is wrong to do since ranges may overlap). This issue is
+ * fixed with **bpf_perf_event_read_value**\ (), which at the same
+ * time provides more features over the **bpf_perf_event_read**\
+ * () interface. Please refer to the description of
+ * **bpf_perf_event_read_value**\ () for details.
+ * Return
+ * The value of the perf event counter read from the map, or a
+ * negative error code in case of failure.
+ *
+ * int bpf_redirect(u32 ifindex, u64 flags)
+ * Description
+ * Redirect the packet to another net device of index *ifindex*.
+ * This helper is somewhat similar to **bpf_clone_redirect**\
+ * (), except that the packet is not cloned, which provides
+ * increased performance.
+ *
+ * Except for XDP, both ingress and egress interfaces can be used
+ * for redirection. The **BPF_F_INGRESS** value in *flags* is used
+ * to make the distinction (ingress path is selected if the flag
+ * is present, egress path otherwise). Currently, XDP only
+ * supports redirection to the egress interface, and accepts no
+ * flag at all.
+ *
+ * The same effect can be attained with the more generic
+ * **bpf_redirect_map**\ (), which requires specific maps to be
+ * used but offers better performance.
+ * Return
+ * For XDP, the helper returns **XDP_REDIRECT** on success or
+ * **XDP_ABORTED** on error. For other program types, the values
+ * are **TC_ACT_REDIRECT** on success or **TC_ACT_SHOT** on
+ * error.
+ *
+ * u32 bpf_get_route_realm(struct sk_buff *skb)
+ * Description
+ * Retrieve the realm or the route, that is to say the
+ * **tclassid** field of the destination for the *skb*. The
+ * indentifier retrieved is a user-provided tag, similar to the
+ * one used with the net_cls cgroup (see description for
+ * **bpf_get_cgroup_classid**\ () helper), but here this tag is
+ * held by a route (a destination entry), not by a task.
+ *
+ * Retrieving this identifier works with the clsact TC egress hook
+ * (see also **tc-bpf(8)**), or alternatively on conventional
+ * classful egress qdiscs, but not on TC ingress path. In case of
+ * clsact TC egress hook, this has the advantage that, internally,
+ * the destination entry has not been dropped yet in the transmit
+ * path. Therefore, the destination entry does not need to be
+ * artificially held via **netif_keep_dst**\ () for a classful
+ * qdisc until the *skb* is freed.
+ *
+ * This helper is available only if the kernel was compiled with
+ * **CONFIG_IP_ROUTE_CLASSID** configuration option.
+ * Return
+ * The realm of the route for the packet associated to *skb*, or 0
+ * if none was found.
+ *
+ * int bpf_perf_event_output(struct pt_reg *ctx, struct bpf_map *map, u64 flags, void *data, u64 size)
+ * Description
+ * Write raw *data* blob into a special BPF perf event held by
+ * *map* of type **BPF_MAP_TYPE_PERF_EVENT_ARRAY**. This perf
+ * event must have the following attributes: **PERF_SAMPLE_RAW**
+ * as **sample_type**, **PERF_TYPE_SOFTWARE** as **type**, and
+ * **PERF_COUNT_SW_BPF_OUTPUT** as **config**.
+ *
+ * The *flags* are used to indicate the index in *map* for which
+ * the value must be put, masked with **BPF_F_INDEX_MASK**.
+ * Alternatively, *flags* can be set to **BPF_F_CURRENT_CPU**
+ * to indicate that the index of the current CPU core should be
+ * used.
+ *
+ * The value to write, of *size*, is passed through eBPF stack and
+ * pointed by *data*.
+ *
+ * The context of the program *ctx* needs also be passed to the
+ * helper.
+ *
+ * On user space, a program willing to read the values needs to
+ * call **perf_event_open**\ () on the perf event (either for
+ * one or for all CPUs) and to store the file descriptor into the
+ * *map*. This must be done before the eBPF program can send data
+ * into it. An example is available in file
+ * *samples/bpf/trace_output_user.c* in the Linux kernel source
+ * tree (the eBPF program counterpart is in
+ * *samples/bpf/trace_output_kern.c*).
+ *
+ * **bpf_perf_event_output**\ () achieves better performance
+ * than **bpf_trace_printk**\ () for sharing data with user
+ * space, and is much better suitable for streaming data from eBPF
+ * programs.
+ *
+ * Note that this helper is not restricted to tracing use cases
+ * and can be used with programs attached to TC or XDP as well,
+ * where it allows for passing data to user space listeners. Data
+ * can be:
+ *
+ * * Only custom structs,
+ * * Only the packet payload, or
+ * * A combination of both.
+ * Return
+ * 0 on success, or a negative error in case of failure.
+ *
+ * int bpf_skb_load_bytes(const struct sk_buff *skb, u32 offset, void *to, u32 len)
+ * Description
+ * This helper was provided as an easy way to load data from a
+ * packet. It can be used to load *len* bytes from *offset* from
+ * the packet associated to *skb*, into the buffer pointed by
+ * *to*.
+ *
+ * Since Linux 4.7, usage of this helper has mostly been replaced
+ * by "direct packet access", enabling packet data to be
+ * manipulated with *skb*\ **->data** and *skb*\ **->data_end**
+ * pointing respectively to the first byte of packet data and to
+ * the byte after the last byte of packet data. However, it
+ * remains useful if one wishes to read large quantities of data
+ * at once from a packet into the eBPF stack.
+ * Return
+ * 0 on success, or a negative error in case of failure.
+ *
+ * int bpf_get_stackid(struct pt_reg *ctx, struct bpf_map *map, u64 flags)
+ * Description
+ * Walk a user or a kernel stack and return its id. To achieve
+ * this, the helper needs *ctx*, which is a pointer to the context
+ * on which the tracing program is executed, and a pointer to a
+ * *map* of type **BPF_MAP_TYPE_STACK_TRACE**.
+ *
+ * The last argument, *flags*, holds the number of stack frames to
+ * skip (from 0 to 255), masked with
+ * **BPF_F_SKIP_FIELD_MASK**. The next bits can be used to set
+ * a combination of the following flags:
+ *
+ * **BPF_F_USER_STACK**
+ * Collect a user space stack instead of a kernel stack.
+ * **BPF_F_FAST_STACK_CMP**
+ * Compare stacks by hash only.
+ * **BPF_F_REUSE_STACKID**
+ * If two different stacks hash into the same *stackid*,
+ * discard the old one.
+ *
+ * The stack id retrieved is a 32 bit long integer handle which
+ * can be further combined with other data (including other stack
+ * ids) and used as a key into maps. This can be useful for
+ * generating a variety of graphs (such as flame graphs or off-cpu
+ * graphs).
+ *
+ * For walking a stack, this helper is an improvement over
+ * **bpf_probe_read**\ (), which can be used with unrolled loops
+ * but is not efficient and consumes a lot of eBPF instructions.
+ * Instead, **bpf_get_stackid**\ () can collect up to
+ * **PERF_MAX_STACK_DEPTH** both kernel and user frames. Note that
+ * this limit can be controlled with the **sysctl** program, and
+ * that it should be manually increased in order to profile long
+ * user stacks (such as stacks for Java programs). To do so, use:
+ *
+ * ::
+ *
+ * # sysctl kernel.perf_event_max_stack=<new value>
+ *
+ * Return
+ * The positive or null stack id on success, or a negative error
+ * in case of failure.
+ *
+ * s64 bpf_csum_diff(__be32 *from, u32 from_size, __be32 *to, u32 to_size, __wsum seed)
+ * Description
+ * Compute a checksum difference, from the raw buffer pointed by
+ * *from*, of length *from_size* (that must be a multiple of 4),
+ * towards the raw buffer pointed by *to*, of size *to_size*
+ * (same remark). An optional *seed* can be added to the value
+ * (this can be cascaded, the seed may come from a previous call
+ * to the helper).
+ *
+ * This is flexible enough to be used in several ways:
+ *
+ * * With *from_size* == 0, *to_size* > 0 and *seed* set to
+ * checksum, it can be used when pushing new data.
+ * * With *from_size* > 0, *to_size* == 0 and *seed* set to
+ * checksum, it can be used when removing data from a packet.
+ * * With *from_size* > 0, *to_size* > 0 and *seed* set to 0, it
+ * can be used to compute a diff. Note that *from_size* and
+ * *to_size* do not need to be equal.
+ *
+ * This helper can be used in combination with
+ * **bpf_l3_csum_replace**\ () and **bpf_l4_csum_replace**\ (), to
+ * which one can feed in the difference computed with
+ * **bpf_csum_diff**\ ().
+ * Return
+ * The checksum result, or a negative error code in case of
+ * failure.
+ *
+ * int bpf_skb_get_tunnel_opt(struct sk_buff *skb, u8 *opt, u32 size)
+ * Description
+ * Retrieve tunnel options metadata for the packet associated to
+ * *skb*, and store the raw tunnel option data to the buffer *opt*
+ * of *size*.
+ *
+ * This helper can be used with encapsulation devices that can
+ * operate in "collect metadata" mode (please refer to the related
+ * note in the description of **bpf_skb_get_tunnel_key**\ () for
+ * more details). A particular example where this can be used is
+ * in combination with the Geneve encapsulation protocol, where it
+ * allows for pushing (with **bpf_skb_get_tunnel_opt**\ () helper)
+ * and retrieving arbitrary TLVs (Type-Length-Value headers) from
+ * the eBPF program. This allows for full customization of these
+ * headers.
+ * Return
+ * The size of the option data retrieved.
+ *
+ * int bpf_skb_set_tunnel_opt(struct sk_buff *skb, u8 *opt, u32 size)
+ * Description
+ * Set tunnel options metadata for the packet associated to *skb*
+ * to the option data contained in the raw buffer *opt* of *size*.
+ *
+ * See also the description of the **bpf_skb_get_tunnel_opt**\ ()
+ * helper for additional information.
+ * Return
+ * 0 on success, or a negative error in case of failure.
+ *
+ * int bpf_skb_change_proto(struct sk_buff *skb, __be16 proto, u64 flags)
+ * Description
+ * Change the protocol of the *skb* to *proto*. Currently
+ * supported are transition from IPv4 to IPv6, and from IPv6 to
+ * IPv4. The helper takes care of the groundwork for the
+ * transition, including resizing the socket buffer. The eBPF
+ * program is expected to fill the new headers, if any, via
+ * **skb_store_bytes**\ () and to recompute the checksums with
+ * **bpf_l3_csum_replace**\ () and **bpf_l4_csum_replace**\
+ * (). The main case for this helper is to perform NAT64
+ * operations out of an eBPF program.
+ *
+ * Internally, the GSO type is marked as dodgy so that headers are
+ * checked and segments are recalculated by the GSO/GRO engine.
+ * The size for GSO target is adapted as well.
+ *
+ * All values for *flags* are reserved for future usage, and must
+ * be left at zero.
+ *
+ * A call to this helper is susceptible to change the underlaying
+ * packet buffer. Therefore, at load time, all checks on pointers
+ * previously done by the verifier are invalidated and must be
+ * performed again, if the helper is used in combination with
+ * direct packet access.
+ * Return
+ * 0 on success, or a negative error in case of failure.
+ *
+ * int bpf_skb_change_type(struct sk_buff *skb, u32 type)
+ * Description
+ * Change the packet type for the packet associated to *skb*. This
+ * comes down to setting *skb*\ **->pkt_type** to *type*, except
+ * the eBPF program does not have a write access to *skb*\
+ * **->pkt_type** beside this helper. Using a helper here allows
+ * for graceful handling of errors.
+ *
+ * The major use case is to change incoming *skb*s to
+ * **PACKET_HOST** in a programmatic way instead of having to
+ * recirculate via **redirect**\ (..., **BPF_F_INGRESS**), for
+ * example.
+ *
+ * Note that *type* only allows certain values. At this time, they
+ * are:
+ *
+ * **PACKET_HOST**
+ * Packet is for us.
+ * **PACKET_BROADCAST**
+ * Send packet to all.
+ * **PACKET_MULTICAST**
+ * Send packet to group.
+ * **PACKET_OTHERHOST**
+ * Send packet to someone else.
+ * Return
+ * 0 on success, or a negative error in case of failure.
+ *
+ * int bpf_skb_under_cgroup(struct sk_buff *skb, struct bpf_map *map, u32 index)
+ * Description
+ * Check whether *skb* is a descendant of the cgroup2 held by
+ * *map* of type **BPF_MAP_TYPE_CGROUP_ARRAY**, at *index*.
+ * Return
+ * The return value depends on the result of the test, and can be:
+ *
+ * * 0, if the *skb* failed the cgroup2 descendant test.
+ * * 1, if the *skb* succeeded the cgroup2 descendant test.
+ * * A negative error code, if an error occurred.
+ *
+ * u32 bpf_get_hash_recalc(struct sk_buff *skb)
+ * Description
+ * Retrieve the hash of the packet, *skb*\ **->hash**. If it is
+ * not set, in particular if the hash was cleared due to mangling,
+ * recompute this hash. Later accesses to the hash can be done
+ * directly with *skb*\ **->hash**.
+ *
+ * Calling **bpf_set_hash_invalid**\ (), changing a packet
+ * prototype with **bpf_skb_change_proto**\ (), or calling
+ * **bpf_skb_store_bytes**\ () with the
+ * **BPF_F_INVALIDATE_HASH** are actions susceptible to clear
+ * the hash and to trigger a new computation for the next call to
+ * **bpf_get_hash_recalc**\ ().
+ * Return
+ * The 32-bit hash.
*
* u64 bpf_get_current_task(void)
- * Returns current task_struct
- * Return: current
+ * Return
+ * A pointer to the current task struct.
*
- * int bpf_probe_write_user(void *dst, void *src, int len)
- * safely attempt to write to a location
- * @dst: destination address in userspace
- * @src: source address on stack
- * @len: number of bytes to copy
- * Return: 0 on success or negative error
+ * int bpf_probe_write_user(void *dst, const void *src, u32 len)
+ * Description
+ * Attempt in a safe way to write *len* bytes from the buffer
+ * *src* to *dst* in memory. It only works for threads that are in
+ * user context, and *dst* must be a valid user space address.
*
- * int bpf_current_task_under_cgroup(map, index)
- * Check cgroup2 membership of current task
- * @map: pointer to bpf_map in BPF_MAP_TYPE_CGROUP_ARRAY type
- * @index: index of the cgroup in the bpf_map
- * Return:
- * == 0 current failed the cgroup2 descendant test
- * == 1 current succeeded the cgroup2 descendant test
- * < 0 error
+ * This helper should not be used to implement any kind of
+ * security mechanism because of TOC-TOU attacks, but rather to
+ * debug, divert, and manipulate execution of semi-cooperative
+ * processes.
*
- * int bpf_skb_change_tail(skb, len, flags)
- * The helper will resize the skb to the given new size, to be used f.e.
- * with control messages.
- * @skb: pointer to skb
- * @len: new skb length
- * @flags: reserved
- * Return: 0 on success or negative error
+ * Keep in mind that this feature is meant for experiments, and it
+ * has a risk of crashing the system and running programs.
+ * Therefore, when an eBPF program using this helper is attached,
+ * a warning including PID and process name is printed to kernel
+ * logs.
+ * Return
+ * 0 on success, or a negative error in case of failure.
*
- * int bpf_skb_pull_data(skb, len)
- * The helper will pull in non-linear data in case the skb is non-linear
- * and not all of len are part of the linear section. Only needed for
- * read/write with direct packet access.
- * @skb: pointer to skb
- * @len: len to make read/writeable
- * Return: 0 on success or negative error
+ * int bpf_current_task_under_cgroup(struct bpf_map *map, u32 index)
+ * Description
+ * Check whether the probe is being run is the context of a given
+ * subset of the cgroup2 hierarchy. The cgroup2 to test is held by
+ * *map* of type **BPF_MAP_TYPE_CGROUP_ARRAY**, at *index*.
+ * Return
+ * The return value depends on the result of the test, and can be:
*
- * s64 bpf_csum_update(skb, csum)
- * Adds csum into skb->csum in case of CHECKSUM_COMPLETE.
- * @skb: pointer to skb
- * @csum: csum to add
- * Return: csum on success or negative error
+ * * 0, if the *skb* task belongs to the cgroup2.
+ * * 1, if the *skb* task does not belong to the cgroup2.
+ * * A negative error code, if an error occurred.
*
- * void bpf_set_hash_invalid(skb)
- * Invalidate current skb->hash.
- * @skb: pointer to skb
+ * int bpf_skb_change_tail(struct sk_buff *skb, u32 len, u64 flags)
+ * Description
+ * Resize (trim or grow) the packet associated to *skb* to the
+ * new *len*. The *flags* are reserved for future usage, and must
+ * be left at zero.
*
- * int bpf_get_numa_node_id()
- * Return: Id of current NUMA node.
+ * The basic idea is that the helper performs the needed work to
+ * change the size of the packet, then the eBPF program rewrites
+ * the rest via helpers like **bpf_skb_store_bytes**\ (),
+ * **bpf_l3_csum_replace**\ (), **bpf_l3_csum_replace**\ ()
+ * and others. This helper is a slow path utility intended for
+ * replies with control messages. And because it is targeted for
+ * slow path, the helper itself can afford to be slow: it
+ * implicitly linearizes, unclones and drops offloads from the
+ * *skb*.
*
- * int bpf_skb_change_head()
- * Grows headroom of skb and adjusts MAC header offset accordingly.
- * Will extends/reallocae as required automatically.
- * May change skb data pointer and will thus invalidate any check
- * performed for direct packet access.
- * @skb: pointer to skb
- * @len: length of header to be pushed in front
- * @flags: Flags (unused for now)
- * Return: 0 on success or negative error
+ * A call to this helper is susceptible to change the underlaying
+ * packet buffer. Therefore, at load time, all checks on pointers
+ * previously done by the verifier are invalidated and must be
+ * performed again, if the helper is used in combination with
+ * direct packet access.
+ * Return
+ * 0 on success, or a negative error in case of failure.
*
- * int bpf_xdp_adjust_head(xdp_md, delta)
- * Adjust the xdp_md.data by delta
- * @xdp_md: pointer to xdp_md
- * @delta: An positive/negative integer to be added to xdp_md.data
- * Return: 0 on success or negative on error
+ * int bpf_skb_pull_data(struct sk_buff *skb, u32 len)
+ * Description
+ * Pull in non-linear data in case the *skb* is non-linear and not
+ * all of *len* are part of the linear section. Make *len* bytes
+ * from *skb* readable and writable. If a zero value is passed for
+ * *len*, then the whole length of the *skb* is pulled.
+ *
+ * This helper is only needed for reading and writing with direct
+ * packet access.
+ *
+ * For direct packet access, testing that offsets to access
+ * are within packet boundaries (test on *skb*\ **->data_end**) is
+ * susceptible to fail if offsets are invalid, or if the requested
+ * data is in non-linear parts of the *skb*. On failure the
+ * program can just bail out, or in the case of a non-linear
+ * buffer, use a helper to make the data available. The
+ * **bpf_skb_load_bytes**\ () helper is a first solution to access
+ * the data. Another one consists in using **bpf_skb_pull_data**
+ * to pull in once the non-linear parts, then retesting and
+ * eventually access the data.
+ *
+ * At the same time, this also makes sure the *skb* is uncloned,
+ * which is a necessary condition for direct write. As this needs
+ * to be an invariant for the write part only, the verifier
+ * detects writes and adds a prologue that is calling
+ * **bpf_skb_pull_data()** to effectively unclone the *skb* from
+ * the very beginning in case it is indeed cloned.
+ *
+ * A call to this helper is susceptible to change the underlaying
+ * packet buffer. Therefore, at load time, all checks on pointers
+ * previously done by the verifier are invalidated and must be
+ * performed again, if the helper is used in combination with
+ * direct packet access.
+ * Return
+ * 0 on success, or a negative error in case of failure.
+ *
+ * s64 bpf_csum_update(struct sk_buff *skb, __wsum csum)
+ * Description
+ * Add the checksum *csum* into *skb*\ **->csum** in case the
+ * driver has supplied a checksum for the entire packet into that
+ * field. Return an error otherwise. This helper is intended to be
+ * used in combination with **bpf_csum_diff**\ (), in particular
+ * when the checksum needs to be updated after data has been
+ * written into the packet through direct packet access.
+ * Return
+ * The checksum on success, or a negative error code in case of
+ * failure.
+ *
+ * void bpf_set_hash_invalid(struct sk_buff *skb)
+ * Description
+ * Invalidate the current *skb*\ **->hash**. It can be used after
+ * mangling on headers through direct packet access, in order to
+ * indicate that the hash is outdated and to trigger a
+ * recalculation the next time the kernel tries to access this
+ * hash or when the **bpf_get_hash_recalc**\ () helper is called.
+ *
+ * int bpf_get_numa_node_id(void)
+ * Description
+ * Return the id of the current NUMA node. The primary use case
+ * for this helper is the selection of sockets for the local NUMA
+ * node, when the program is attached to sockets using the
+ * **SO_ATTACH_REUSEPORT_EBPF** option (see also **socket(7)**),
+ * but the helper is also available to other eBPF program types,
+ * similarly to **bpf_get_smp_processor_id**\ ().
+ * Return
+ * The id of current NUMA node.
+ *
+ * int bpf_skb_change_head(struct sk_buff *skb, u32 len, u64 flags)
+ * Description
+ * Grows headroom of packet associated to *skb* and adjusts the
+ * offset of the MAC header accordingly, adding *len* bytes of
+ * space. It automatically extends and reallocates memory as
+ * required.
+ *
+ * This helper can be used on a layer 3 *skb* to push a MAC header
+ * for redirection into a layer 2 device.
+ *
+ * All values for *flags* are reserved for future usage, and must
+ * be left at zero.
+ *
+ * A call to this helper is susceptible to change the underlaying
+ * packet buffer. Therefore, at load time, all checks on pointers
+ * previously done by the verifier are invalidated and must be
+ * performed again, if the helper is used in combination with
+ * direct packet access.
+ * Return
+ * 0 on success, or a negative error in case of failure.
+ *
+ * int bpf_xdp_adjust_head(struct xdp_buff *xdp_md, int delta)
+ * Description
+ * Adjust (move) *xdp_md*\ **->data** by *delta* bytes. Note that
+ * it is possible to use a negative value for *delta*. This helper
+ * can be used to prepare the packet for pushing or popping
+ * headers.
+ *
+ * A call to this helper is susceptible to change the underlaying
+ * packet buffer. Therefore, at load time, all checks on pointers
+ * previously done by the verifier are invalidated and must be
+ * performed again, if the helper is used in combination with
+ * direct packet access.
+ * Return
+ * 0 on success, or a negative error in case of failure.
*
* int bpf_probe_read_str(void *dst, int size, const void *unsafe_ptr)
- * Copy a NUL terminated string from unsafe address. In case the string
- * length is smaller than size, the target is not padded with further NUL
- * bytes. In case the string length is larger than size, just count-1
- * bytes are copied and the last byte is set to NUL.
- * @dst: destination address
- * @size: maximum number of bytes to copy, including the trailing NUL
- * @unsafe_ptr: unsafe address
- * Return:
- * > 0 length of the string including the trailing NUL on success
- * < 0 error
+ * Description
+ * Copy a NUL terminated string from an unsafe address
+ * *unsafe_ptr* to *dst*. The *size* should include the
+ * terminating NUL byte. In case the string length is smaller than
+ * *size*, the target is not padded with further NUL bytes. If the
+ * string length is larger than *size*, just *size*-1 bytes are
+ * copied and the last byte is set to NUL.
*
- * u64 bpf_get_socket_cookie(skb)
- * Get the cookie for the socket stored inside sk_buff.
- * @skb: pointer to skb
- * Return: 8 Bytes non-decreasing number on success or 0 if the socket
- * field is missing inside sk_buff
+ * On success, the length of the copied string is returned. This
+ * makes this helper useful in tracing programs for reading
+ * strings, and more importantly to get its length at runtime. See
+ * the following snippet:
*
- * u32 bpf_get_socket_uid(skb)
- * Get the owner uid of the socket stored inside sk_buff.
- * @skb: pointer to skb
- * Return: uid of the socket owner on success or overflowuid if failed.
+ * ::
*
- * u32 bpf_set_hash(skb, hash)
- * Set full skb->hash.
- * @skb: pointer to skb
- * @hash: hash to set
+ * SEC("kprobe/sys_open")
+ * void bpf_sys_open(struct pt_regs *ctx)
+ * {
+ * char buf[PATHLEN]; // PATHLEN is defined to 256
+ * int res = bpf_probe_read_str(buf, sizeof(buf),
+ * ctx->di);
*
- * int bpf_setsockopt(bpf_socket, level, optname, optval, optlen)
- * Calls setsockopt. Not all opts are available, only those with
- * integer optvals plus TCP_CONGESTION.
- * Supported levels: SOL_SOCKET and IPPROTO_TCP
- * @bpf_socket: pointer to bpf_socket
- * @level: SOL_SOCKET or IPPROTO_TCP
- * @optname: option name
- * @optval: pointer to option value
- * @optlen: length of optval in bytes
- * Return: 0 or negative error
+ * // Consume buf, for example push it to
+ * // userspace via bpf_perf_event_output(); we
+ * // can use res (the string length) as event
+ * // size, after checking its boundaries.
+ * }
*
- * int bpf_getsockopt(bpf_socket, level, optname, optval, optlen)
- * Calls getsockopt. Not all opts are available.
- * Supported levels: IPPROTO_TCP
- * @bpf_socket: pointer to bpf_socket
- * @level: IPPROTO_TCP
- * @optname: option name
- * @optval: pointer to option value
- * @optlen: length of optval in bytes
- * Return: 0 or negative error
+ * In comparison, using **bpf_probe_read()** helper here instead
+ * to read the string would require to estimate the length at
+ * compile time, and would often result in copying more memory
+ * than necessary.
*
- * int bpf_sock_ops_cb_flags_set(bpf_sock_ops, flags)
- * Set callback flags for sock_ops
- * @bpf_sock_ops: pointer to bpf_sock_ops_kern struct
- * @flags: flags value
- * Return: 0 for no error
- * -EINVAL if there is no full tcp socket
- * bits in flags that are not supported by current kernel
+ * Another useful use case is when parsing individual process
+ * arguments or individual environment variables navigating
+ * *current*\ **->mm->arg_start** and *current*\
+ * **->mm->env_start**: using this helper and the return value,
+ * one can quickly iterate at the right offset of the memory area.
+ * Return
+ * On success, the strictly positive length of the string,
+ * including the trailing NUL character. On error, a negative
+ * value.
*
- * int bpf_skb_adjust_room(skb, len_diff, mode, flags)
- * Grow or shrink room in sk_buff.
- * @skb: pointer to skb
- * @len_diff: (signed) amount of room to grow/shrink
- * @mode: operation mode (enum bpf_adj_room_mode)
- * @flags: reserved for future use
- * Return: 0 on success or negative error code
+ * u64 bpf_get_socket_cookie(struct sk_buff *skb)
+ * Description
+ * If the **struct sk_buff** pointed by *skb* has a known socket,
+ * retrieve the cookie (generated by the kernel) of this socket.
+ * If no cookie has been set yet, generate a new cookie. Once
+ * generated, the socket cookie remains stable for the life of the
+ * socket. This helper can be useful for monitoring per socket
+ * networking traffic statistics as it provides a unique socket
+ * identifier per namespace.
+ * Return
+ * A 8-byte long non-decreasing number on success, or 0 if the
+ * socket field is missing inside *skb*.
*
- * int bpf_sk_redirect_map(map, key, flags)
- * Redirect skb to a sock in map using key as a lookup key for the
- * sock in map.
- * @map: pointer to sockmap
- * @key: key to lookup sock in map
- * @flags: reserved for future use
- * Return: SK_PASS
+ * u32 bpf_get_socket_uid(struct sk_buff *skb)
+ * Return
+ * The owner UID of the socket associated to *skb*. If the socket
+ * is **NULL**, or if it is not a full socket (i.e. if it is a
+ * time-wait or a request socket instead), **overflowuid** value
+ * is returned (note that **overflowuid** might also be the actual
+ * UID value for the socket).
*
- * int bpf_sock_map_update(skops, map, key, flags)
- * @skops: pointer to bpf_sock_ops
- * @map: pointer to sockmap to update
- * @key: key to insert/update sock in map
- * @flags: same flags as map update elem
+ * u32 bpf_set_hash(struct sk_buff *skb, u32 hash)
+ * Description
+ * Set the full hash for *skb* (set the field *skb*\ **->hash**)
+ * to value *hash*.
+ * Return
+ * 0
*
- * int bpf_xdp_adjust_meta(xdp_md, delta)
- * Adjust the xdp_md.data_meta by delta
- * @xdp_md: pointer to xdp_md
- * @delta: An positive/negative integer to be added to xdp_md.data_meta
- * Return: 0 on success or negative on error
+ * int bpf_setsockopt(struct bpf_sock_ops *bpf_socket, int level, int optname, char *optval, int optlen)
+ * Description
+ * Emulate a call to **setsockopt()** on the socket associated to
+ * *bpf_socket*, which must be a full socket. The *level* at
+ * which the option resides and the name *optname* of the option
+ * must be specified, see **setsockopt(2)** for more information.
+ * The option value of length *optlen* is pointed by *optval*.
*
- * int bpf_perf_event_read_value(map, flags, buf, buf_size)
- * read perf event counter value and perf event enabled/running time
- * @map: pointer to perf_event_array map
- * @flags: index of event in the map or bitmask flags
- * @buf: buf to fill
- * @buf_size: size of the buf
- * Return: 0 on success or negative error code
+ * This helper actually implements a subset of **setsockopt()**.
+ * It supports the following *level*\ s:
*
- * int bpf_perf_prog_read_value(ctx, buf, buf_size)
- * read perf prog attached perf event counter and enabled/running time
- * @ctx: pointer to ctx
- * @buf: buf to fill
- * @buf_size: size of the buf
- * Return : 0 on success or negative error code
+ * * **SOL_SOCKET**, which supports the following *optname*\ s:
+ * **SO_RCVBUF**, **SO_SNDBUF**, **SO_MAX_PACING_RATE**,
+ * **SO_PRIORITY**, **SO_RCVLOWAT**, **SO_MARK**.
+ * * **IPPROTO_TCP**, which supports the following *optname*\ s:
+ * **TCP_CONGESTION**, **TCP_BPF_IW**,
+ * **TCP_BPF_SNDCWND_CLAMP**.
+ * * **IPPROTO_IP**, which supports *optname* **IP_TOS**.
+ * * **IPPROTO_IPV6**, which supports *optname* **IPV6_TCLASS**.
+ * Return
+ * 0 on success, or a negative error in case of failure.
*
- * int bpf_override_return(pt_regs, rc)
- * @pt_regs: pointer to struct pt_regs
- * @rc: the return value to set
+ * int bpf_skb_adjust_room(struct sk_buff *skb, u32 len_diff, u32 mode, u64 flags)
+ * Description
+ * Grow or shrink the room for data in the packet associated to
+ * *skb* by *len_diff*, and according to the selected *mode*.
*
- * int bpf_msg_redirect_map(map, key, flags)
- * Redirect msg to a sock in map using key as a lookup key for the
- * sock in map.
- * @map: pointer to sockmap
- * @key: key to lookup sock in map
- * @flags: reserved for future use
- * Return: SK_PASS
+ * There is a single supported mode at this time:
*
- * int bpf_bind(ctx, addr, addr_len)
- * Bind socket to address. Only binding to IP is supported, no port can be
- * set in addr.
- * @ctx: pointer to context of type bpf_sock_addr
- * @addr: pointer to struct sockaddr to bind socket to
- * @addr_len: length of sockaddr structure
- * Return: 0 on success or negative error code
+ * * **BPF_ADJ_ROOM_NET**: Adjust room at the network layer
+ * (room space is added or removed below the layer 3 header).
+ *
+ * All values for *flags* are reserved for future usage, and must
+ * be left at zero.
+ *
+ * A call to this helper is susceptible to change the underlaying
+ * packet buffer. Therefore, at load time, all checks on pointers
+ * previously done by the verifier are invalidated and must be
+ * performed again, if the helper is used in combination with
+ * direct packet access.
+ * Return
+ * 0 on success, or a negative error in case of failure.
+ *
+ * int bpf_redirect_map(struct bpf_map *map, u32 key, u64 flags)
+ * Description
+ * Redirect the packet to the endpoint referenced by *map* at
+ * index *key*. Depending on its type, this *map* can contain
+ * references to net devices (for forwarding packets through other
+ * ports), or to CPUs (for redirecting XDP frames to another CPU;
+ * but this is only implemented for native XDP (with driver
+ * support) as of this writing).
+ *
+ * All values for *flags* are reserved for future usage, and must
+ * be left at zero.
+ *
+ * When used to redirect packets to net devices, this helper
+ * provides a high performance increase over **bpf_redirect**\ ().
+ * This is due to various implementation details of the underlying
+ * mechanisms, one of which is the fact that **bpf_redirect_map**\
+ * () tries to send packet as a "bulk" to the device.
+ * Return
+ * **XDP_REDIRECT** on success, or **XDP_ABORTED** on error.
+ *
+ * int bpf_sk_redirect_map(struct bpf_map *map, u32 key, u64 flags)
+ * Description
+ * Redirect the packet to the socket referenced by *map* (of type
+ * **BPF_MAP_TYPE_SOCKMAP**) at index *key*. Both ingress and
+ * egress interfaces can be used for redirection. The
+ * **BPF_F_INGRESS** value in *flags* is used to make the
+ * distinction (ingress path is selected if the flag is present,
+ * egress path otherwise). This is the only flag supported for now.
+ * Return
+ * **SK_PASS** on success, or **SK_DROP** on error.
+ *
+ * int bpf_sock_map_update(struct bpf_sock_ops *skops, struct bpf_map *map, void *key, u64 flags)
+ * Description
+ * Add an entry to, or update a *map* referencing sockets. The
+ * *skops* is used as a new value for the entry associated to
+ * *key*. *flags* is one of:
+ *
+ * **BPF_NOEXIST**
+ * The entry for *key* must not exist in the map.
+ * **BPF_EXIST**
+ * The entry for *key* must already exist in the map.
+ * **BPF_ANY**
+ * No condition on the existence of the entry for *key*.
+ *
+ * If the *map* has eBPF programs (parser and verdict), those will
+ * be inherited by the socket being added. If the socket is
+ * already attached to eBPF programs, this results in an error.
+ * Return
+ * 0 on success, or a negative error in case of failure.
+ *
+ * int bpf_xdp_adjust_meta(struct xdp_buff *xdp_md, int delta)
+ * Description
+ * Adjust the address pointed by *xdp_md*\ **->data_meta** by
+ * *delta* (which can be positive or negative). Note that this
+ * operation modifies the address stored in *xdp_md*\ **->data**,
+ * so the latter must be loaded only after the helper has been
+ * called.
+ *
+ * The use of *xdp_md*\ **->data_meta** is optional and programs
+ * are not required to use it. The rationale is that when the
+ * packet is processed with XDP (e.g. as DoS filter), it is
+ * possible to push further meta data along with it before passing
+ * to the stack, and to give the guarantee that an ingress eBPF
+ * program attached as a TC classifier on the same device can pick
+ * this up for further post-processing. Since TC works with socket
+ * buffers, it remains possible to set from XDP the **mark** or
+ * **priority** pointers, or other pointers for the socket buffer.
+ * Having this scratch space generic and programmable allows for
+ * more flexibility as the user is free to store whatever meta
+ * data they need.
+ *
+ * A call to this helper is susceptible to change the underlaying
+ * packet buffer. Therefore, at load time, all checks on pointers
+ * previously done by the verifier are invalidated and must be
+ * performed again, if the helper is used in combination with
+ * direct packet access.
+ * Return
+ * 0 on success, or a negative error in case of failure.
+ *
+ * int bpf_perf_event_read_value(struct bpf_map *map, u64 flags, struct bpf_perf_event_value *buf, u32 buf_size)
+ * Description
+ * Read the value of a perf event counter, and store it into *buf*
+ * of size *buf_size*. This helper relies on a *map* of type
+ * **BPF_MAP_TYPE_PERF_EVENT_ARRAY**. The nature of the perf event
+ * counter is selected when *map* is updated with perf event file
+ * descriptors. The *map* is an array whose size is the number of
+ * available CPUs, and each cell contains a value relative to one
+ * CPU. The value to retrieve is indicated by *flags*, that
+ * contains the index of the CPU to look up, masked with
+ * **BPF_F_INDEX_MASK**. Alternatively, *flags* can be set to
+ * **BPF_F_CURRENT_CPU** to indicate that the value for the
+ * current CPU should be retrieved.
+ *
+ * This helper behaves in a way close to
+ * **bpf_perf_event_read**\ () helper, save that instead of
+ * just returning the value observed, it fills the *buf*
+ * structure. This allows for additional data to be retrieved: in
+ * particular, the enabled and running times (in *buf*\
+ * **->enabled** and *buf*\ **->running**, respectively) are
+ * copied. In general, **bpf_perf_event_read_value**\ () is
+ * recommended over **bpf_perf_event_read**\ (), which has some
+ * ABI issues and provides fewer functionalities.
+ *
+ * These values are interesting, because hardware PMU (Performance
+ * Monitoring Unit) counters are limited resources. When there are
+ * more PMU based perf events opened than available counters,
+ * kernel will multiplex these events so each event gets certain
+ * percentage (but not all) of the PMU time. In case that
+ * multiplexing happens, the number of samples or counter value
+ * will not reflect the case compared to when no multiplexing
+ * occurs. This makes comparison between different runs difficult.
+ * Typically, the counter value should be normalized before
+ * comparing to other experiments. The usual normalization is done
+ * as follows.
+ *
+ * ::
+ *
+ * normalized_counter = counter * t_enabled / t_running
+ *
+ * Where t_enabled is the time enabled for event and t_running is
+ * the time running for event since last normalization. The
+ * enabled and running times are accumulated since the perf event
+ * open. To achieve scaling factor between two invocations of an
+ * eBPF program, users can can use CPU id as the key (which is
+ * typical for perf array usage model) to remember the previous
+ * value and do the calculation inside the eBPF program.
+ * Return
+ * 0 on success, or a negative error in case of failure.
+ *
+ * int bpf_perf_prog_read_value(struct bpf_perf_event_data *ctx, struct bpf_perf_event_value *buf, u32 buf_size)
+ * Description
+ * For en eBPF program attached to a perf event, retrieve the
+ * value of the event counter associated to *ctx* and store it in
+ * the structure pointed by *buf* and of size *buf_size*. Enabled
+ * and running times are also stored in the structure (see
+ * description of helper **bpf_perf_event_read_value**\ () for
+ * more details).
+ * Return
+ * 0 on success, or a negative error in case of failure.
+ *
+ * int bpf_getsockopt(struct bpf_sock_ops *bpf_socket, int level, int optname, char *optval, int optlen)
+ * Description
+ * Emulate a call to **getsockopt()** on the socket associated to
+ * *bpf_socket*, which must be a full socket. The *level* at
+ * which the option resides and the name *optname* of the option
+ * must be specified, see **getsockopt(2)** for more information.
+ * The retrieved value is stored in the structure pointed by
+ * *opval* and of length *optlen*.
+ *
+ * This helper actually implements a subset of **getsockopt()**.
+ * It supports the following *level*\ s:
+ *
+ * * **IPPROTO_TCP**, which supports *optname*
+ * **TCP_CONGESTION**.
+ * * **IPPROTO_IP**, which supports *optname* **IP_TOS**.
+ * * **IPPROTO_IPV6**, which supports *optname* **IPV6_TCLASS**.
+ * Return
+ * 0 on success, or a negative error in case of failure.
+ *
+ * int bpf_override_return(struct pt_reg *regs, u64 rc)
+ * Description
+ * Used for error injection, this helper uses kprobes to override
+ * the return value of the probed function, and to set it to *rc*.
+ * The first argument is the context *regs* on which the kprobe
+ * works.
+ *
+ * This helper works by setting setting the PC (program counter)
+ * to an override function which is run in place of the original
+ * probed function. This means the probed function is not run at
+ * all. The replacement function just returns with the required
+ * value.
+ *
+ * This helper has security implications, and thus is subject to
+ * restrictions. It is only available if the kernel was compiled
+ * with the **CONFIG_BPF_KPROBE_OVERRIDE** configuration
+ * option, and in this case it only works on functions tagged with
+ * **ALLOW_ERROR_INJECTION** in the kernel code.
+ *
+ * Also, the helper is only available for the architectures having
+ * the CONFIG_FUNCTION_ERROR_INJECTION option. As of this writing,
+ * x86 architecture is the only one to support this feature.
+ * Return
+ * 0
+ *
+ * int bpf_sock_ops_cb_flags_set(struct bpf_sock_ops *bpf_sock, int argval)
+ * Description
+ * Attempt to set the value of the **bpf_sock_ops_cb_flags** field
+ * for the full TCP socket associated to *bpf_sock_ops* to
+ * *argval*.
+ *
+ * The primary use of this field is to determine if there should
+ * be calls to eBPF programs of type
+ * **BPF_PROG_TYPE_SOCK_OPS** at various points in the TCP
+ * code. A program of the same type can change its value, per
+ * connection and as necessary, when the connection is
+ * established. This field is directly accessible for reading, but
+ * this helper must be used for updates in order to return an
+ * error if an eBPF program tries to set a callback that is not
+ * supported in the current kernel.
+ *
+ * The supported callback values that *argval* can combine are:
+ *
+ * * **BPF_SOCK_OPS_RTO_CB_FLAG** (retransmission time out)
+ * * **BPF_SOCK_OPS_RETRANS_CB_FLAG** (retransmission)
+ * * **BPF_SOCK_OPS_STATE_CB_FLAG** (TCP state change)
+ *
+ * Here are some examples of where one could call such eBPF
+ * program:
+ *
+ * * When RTO fires.
+ * * When a packet is retransmitted.
+ * * When the connection terminates.
+ * * When a packet is sent.
+ * * When a packet is received.
+ * Return
+ * Code **-EINVAL** if the socket is not a full TCP socket;
+ * otherwise, a positive number containing the bits that could not
+ * be set is returned (which comes down to 0 if all bits were set
+ * as required).
+ *
+ * int bpf_msg_redirect_map(struct sk_msg_buff *msg, struct bpf_map *map, u32 key, u64 flags)
+ * Description
+ * This helper is used in programs implementing policies at the
+ * socket level. If the message *msg* is allowed to pass (i.e. if
+ * the verdict eBPF program returns **SK_PASS**), redirect it to
+ * the socket referenced by *map* (of type
+ * **BPF_MAP_TYPE_SOCKMAP**) at index *key*. Both ingress and
+ * egress interfaces can be used for redirection. The
+ * **BPF_F_INGRESS** value in *flags* is used to make the
+ * distinction (ingress path is selected if the flag is present,
+ * egress path otherwise). This is the only flag supported for now.
+ * Return
+ * **SK_PASS** on success, or **SK_DROP** on error.
+ *
+ * int bpf_msg_apply_bytes(struct sk_msg_buff *msg, u32 bytes)
+ * Description
+ * For socket policies, apply the verdict of the eBPF program to
+ * the next *bytes* (number of bytes) of message *msg*.
+ *
+ * For example, this helper can be used in the following cases:
+ *
+ * * A single **sendmsg**\ () or **sendfile**\ () system call
+ * contains multiple logical messages that the eBPF program is
+ * supposed to read and for which it should apply a verdict.
+ * * An eBPF program only cares to read the first *bytes* of a
+ * *msg*. If the message has a large payload, then setting up
+ * and calling the eBPF program repeatedly for all bytes, even
+ * though the verdict is already known, would create unnecessary
+ * overhead.
+ *
+ * When called from within an eBPF program, the helper sets a
+ * counter internal to the BPF infrastructure, that is used to
+ * apply the last verdict to the next *bytes*. If *bytes* is
+ * smaller than the current data being processed from a
+ * **sendmsg**\ () or **sendfile**\ () system call, the first
+ * *bytes* will be sent and the eBPF program will be re-run with
+ * the pointer for start of data pointing to byte number *bytes*
+ * **+ 1**. If *bytes* is larger than the current data being
+ * processed, then the eBPF verdict will be applied to multiple
+ * **sendmsg**\ () or **sendfile**\ () calls until *bytes* are
+ * consumed.
+ *
+ * Note that if a socket closes with the internal counter holding
+ * a non-zero value, this is not a problem because data is not
+ * being buffered for *bytes* and is sent as it is received.
+ * Return
+ * 0
+ *
+ * int bpf_msg_cork_bytes(struct sk_msg_buff *msg, u32 bytes)
+ * Description
+ * For socket policies, prevent the execution of the verdict eBPF
+ * program for message *msg* until *bytes* (byte number) have been
+ * accumulated.
+ *
+ * This can be used when one needs a specific number of bytes
+ * before a verdict can be assigned, even if the data spans
+ * multiple **sendmsg**\ () or **sendfile**\ () calls. The extreme
+ * case would be a user calling **sendmsg**\ () repeatedly with
+ * 1-byte long message segments. Obviously, this is bad for
+ * performance, but it is still valid. If the eBPF program needs
+ * *bytes* bytes to validate a header, this helper can be used to
+ * prevent the eBPF program to be called again until *bytes* have
+ * been accumulated.
+ * Return
+ * 0
+ *
+ * int bpf_msg_pull_data(struct sk_msg_buff *msg, u32 start, u32 end, u64 flags)
+ * Description
+ * For socket policies, pull in non-linear data from user space
+ * for *msg* and set pointers *msg*\ **->data** and *msg*\
+ * **->data_end** to *start* and *end* bytes offsets into *msg*,
+ * respectively.
+ *
+ * If a program of type **BPF_PROG_TYPE_SK_MSG** is run on a
+ * *msg* it can only parse data that the (**data**, **data_end**)
+ * pointers have already consumed. For **sendmsg**\ () hooks this
+ * is likely the first scatterlist element. But for calls relying
+ * on the **sendpage** handler (e.g. **sendfile**\ ()) this will
+ * be the range (**0**, **0**) because the data is shared with
+ * user space and by default the objective is to avoid allowing
+ * user space to modify data while (or after) eBPF verdict is
+ * being decided. This helper can be used to pull in data and to
+ * set the start and end pointer to given values. Data will be
+ * copied if necessary (i.e. if data was not linear and if start
+ * and end pointers do not point to the same chunk).
+ *
+ * A call to this helper is susceptible to change the underlaying
+ * packet buffer. Therefore, at load time, all checks on pointers
+ * previously done by the verifier are invalidated and must be
+ * performed again, if the helper is used in combination with
+ * direct packet access.
+ *
+ * All values for *flags* are reserved for future usage, and must
+ * be left at zero.
+ * Return
+ * 0 on success, or a negative error in case of failure.
+ *
+ * int bpf_bind(struct bpf_sock_addr *ctx, struct sockaddr *addr, int addr_len)
+ * Description
+ * Bind the socket associated to *ctx* to the address pointed by
+ * *addr*, of length *addr_len*. This allows for making outgoing
+ * connection from the desired IP address, which can be useful for
+ * example when all processes inside a cgroup should use one
+ * single IP address on a host that has multiple IP configured.
+ *
+ * This helper works for IPv4 and IPv6, TCP and UDP sockets. The
+ * domain (*addr*\ **->sa_family**) must be **AF_INET** (or
+ * **AF_INET6**). Looking for a free port to bind to can be
+ * expensive, therefore binding to port is not permitted by the
+ * helper: *addr*\ **->sin_port** (or **sin6_port**, respectively)
+ * must be set to zero.
+ * Return
+ * 0 on success, or a negative error in case of failure.
+ *
+ * int bpf_xdp_adjust_tail(struct xdp_buff *xdp_md, int delta)
+ * Description
+ * Adjust (move) *xdp_md*\ **->data_end** by *delta* bytes. It is
+ * only possible to shrink the packet as of this writing,
+ * therefore *delta* must be a negative integer.
+ *
+ * A call to this helper is susceptible to change the underlaying
+ * packet buffer. Therefore, at load time, all checks on pointers
+ * previously done by the verifier are invalidated and must be
+ * performed again, if the helper is used in combination with
+ * direct packet access.
+ * Return
+ * 0 on success, or a negative error in case of failure.
+ *
+ * int bpf_skb_get_xfrm_state(struct sk_buff *skb, u32 index, struct bpf_xfrm_state *xfrm_state, u32 size, u64 flags)
+ * Description
+ * Retrieve the XFRM state (IP transform framework, see also
+ * **ip-xfrm(8)**) at *index* in XFRM "security path" for *skb*.
+ *
+ * The retrieved value is stored in the **struct bpf_xfrm_state**
+ * pointed by *xfrm_state* and of length *size*.
+ *
+ * All values for *flags* are reserved for future usage, and must
+ * be left at zero.
+ *
+ * This helper is available only if the kernel was compiled with
+ * **CONFIG_XFRM** configuration option.
+ * Return
+ * 0 on success, or a negative error in case of failure.
+ *
+ * int bpf_get_stack(struct pt_regs *regs, void *buf, u32 size, u64 flags)
+ * Description
+ * Return a user or a kernel stack in bpf program provided buffer.
+ * To achieve this, the helper needs *ctx*, which is a pointer
+ * to the context on which the tracing program is executed.
+ * To store the stacktrace, the bpf program provides *buf* with
+ * a nonnegative *size*.
+ *
+ * The last argument, *flags*, holds the number of stack frames to
+ * skip (from 0 to 255), masked with
+ * **BPF_F_SKIP_FIELD_MASK**. The next bits can be used to set
+ * the following flags:
+ *
+ * **BPF_F_USER_STACK**
+ * Collect a user space stack instead of a kernel stack.
+ * **BPF_F_USER_BUILD_ID**
+ * Collect buildid+offset instead of ips for user stack,
+ * only valid if **BPF_F_USER_STACK** is also specified.
+ *
+ * **bpf_get_stack**\ () can collect up to
+ * **PERF_MAX_STACK_DEPTH** both kernel and user frames, subject
+ * to sufficient large buffer size. Note that
+ * this limit can be controlled with the **sysctl** program, and
+ * that it should be manually increased in order to profile long
+ * user stacks (such as stacks for Java programs). To do so, use:
+ *
+ * ::
+ *
+ * # sysctl kernel.perf_event_max_stack=<new value>
+ *
+ * Return
+ * a non-negative value equal to or less than size on success, or
+ * a negative error in case of failure.
+ *
+ * int skb_load_bytes_relative(const struct sk_buff *skb, u32 offset, void *to, u32 len, u32 start_header)
+ * Description
+ * This helper is similar to **bpf_skb_load_bytes**\ () in that
+ * it provides an easy way to load *len* bytes from *offset*
+ * from the packet associated to *skb*, into the buffer pointed
+ * by *to*. The difference to **bpf_skb_load_bytes**\ () is that
+ * a fifth argument *start_header* exists in order to select a
+ * base offset to start from. *start_header* can be one of:
+ *
+ * **BPF_HDR_START_MAC**
+ * Base offset to load data from is *skb*'s mac header.
+ * **BPF_HDR_START_NET**
+ * Base offset to load data from is *skb*'s network header.
+ *
+ * In general, "direct packet access" is the preferred method to
+ * access packet data, however, this helper is in particular useful
+ * in socket filters where *skb*\ **->data** does not always point
+ * to the start of the mac header and where "direct packet access"
+ * is not available.
+ *
+ * Return
+ * 0 on success, or a negative error in case of failure.
+ *
+ * int bpf_fib_lookup(void *ctx, struct bpf_fib_lookup *params, int plen, u32 flags)
+ * Description
+ * Do FIB lookup in kernel tables using parameters in *params*.
+ * If lookup is successful and result shows packet is to be
+ * forwarded, the neighbor tables are searched for the nexthop.
+ * If successful (ie., FIB lookup shows forwarding and nexthop
+ * is resolved), the nexthop address is returned in ipv4_dst,
+ * ipv6_dst or mpls_out based on family, smac is set to mac
+ * address of egress device, dmac is set to nexthop mac address,
+ * rt_metric is set to metric from route.
+ *
+ * *plen* argument is the size of the passed in struct.
+ * *flags* argument can be one or more BPF_FIB_LOOKUP_ flags:
+ *
+ * **BPF_FIB_LOOKUP_DIRECT** means do a direct table lookup vs
+ * full lookup using FIB rules
+ * **BPF_FIB_LOOKUP_OUTPUT** means do lookup from an egress
+ * perspective (default is ingress)
+ *
+ * *ctx* is either **struct xdp_md** for XDP programs or
+ * **struct sk_buff** tc cls_act programs.
+ *
+ * Return
+ * Egress device index on success, 0 if packet needs to continue
+ * up the stack for further processing or a negative error in case
+ * of failure.
+ *
+ * int bpf_sock_hash_update(struct bpf_sock_ops_kern *skops, struct bpf_map *map, void *key, u64 flags)
+ * Description
+ * Add an entry to, or update a sockhash *map* referencing sockets.
+ * The *skops* is used as a new value for the entry associated to
+ * *key*. *flags* is one of:
+ *
+ * **BPF_NOEXIST**
+ * The entry for *key* must not exist in the map.
+ * **BPF_EXIST**
+ * The entry for *key* must already exist in the map.
+ * **BPF_ANY**
+ * No condition on the existence of the entry for *key*.
+ *
+ * If the *map* has eBPF programs (parser and verdict), those will
+ * be inherited by the socket being added. If the socket is
+ * already attached to eBPF programs, this results in an error.
+ * Return
+ * 0 on success, or a negative error in case of failure.
+ *
+ * int bpf_msg_redirect_hash(struct sk_msg_buff *msg, struct bpf_map *map, void *key, u64 flags)
+ * Description
+ * This helper is used in programs implementing policies at the
+ * socket level. If the message *msg* is allowed to pass (i.e. if
+ * the verdict eBPF program returns **SK_PASS**), redirect it to
+ * the socket referenced by *map* (of type
+ * **BPF_MAP_TYPE_SOCKHASH**) using hash *key*. Both ingress and
+ * egress interfaces can be used for redirection. The
+ * **BPF_F_INGRESS** value in *flags* is used to make the
+ * distinction (ingress path is selected if the flag is present,
+ * egress path otherwise). This is the only flag supported for now.
+ * Return
+ * **SK_PASS** on success, or **SK_DROP** on error.
+ *
+ * int bpf_sk_redirect_hash(struct sk_buff *skb, struct bpf_map *map, void *key, u64 flags)
+ * Description
+ * This helper is used in programs implementing policies at the
+ * skb socket level. If the sk_buff *skb* is allowed to pass (i.e.
+ * if the verdeict eBPF program returns **SK_PASS**), redirect it
+ * to the socket referenced by *map* (of type
+ * **BPF_MAP_TYPE_SOCKHASH**) using hash *key*. Both ingress and
+ * egress interfaces can be used for redirection. The
+ * **BPF_F_INGRESS** value in *flags* is used to make the
+ * distinction (ingress path is selected if the flag is present,
+ * egress otherwise). This is the only flag supported for now.
+ * Return
+ * **SK_PASS** on success, or **SK_DROP** on error.
+ *
+ * int bpf_lwt_push_encap(struct sk_buff *skb, u32 type, void *hdr, u32 len)
+ * Description
+ * Encapsulate the packet associated to *skb* within a Layer 3
+ * protocol header. This header is provided in the buffer at
+ * address *hdr*, with *len* its size in bytes. *type* indicates
+ * the protocol of the header and can be one of:
+ *
+ * **BPF_LWT_ENCAP_SEG6**
+ * IPv6 encapsulation with Segment Routing Header
+ * (**struct ipv6_sr_hdr**). *hdr* only contains the SRH,
+ * the IPv6 header is computed by the kernel.
+ * **BPF_LWT_ENCAP_SEG6_INLINE**
+ * Only works if *skb* contains an IPv6 packet. Insert a
+ * Segment Routing Header (**struct ipv6_sr_hdr**) inside
+ * the IPv6 header.
+ *
+ * A call to this helper is susceptible to change the underlaying
+ * packet buffer. Therefore, at load time, all checks on pointers
+ * previously done by the verifier are invalidated and must be
+ * performed again, if the helper is used in combination with
+ * direct packet access.
+ * Return
+ * 0 on success, or a negative error in case of failure.
+ *
+ * int bpf_lwt_seg6_store_bytes(struct sk_buff *skb, u32 offset, const void *from, u32 len)
+ * Description
+ * Store *len* bytes from address *from* into the packet
+ * associated to *skb*, at *offset*. Only the flags, tag and TLVs
+ * inside the outermost IPv6 Segment Routing Header can be
+ * modified through this helper.
+ *
+ * A call to this helper is susceptible to change the underlaying
+ * packet buffer. Therefore, at load time, all checks on pointers
+ * previously done by the verifier are invalidated and must be
+ * performed again, if the helper is used in combination with
+ * direct packet access.
+ * Return
+ * 0 on success, or a negative error in case of failure.
+ *
+ * int bpf_lwt_seg6_adjust_srh(struct sk_buff *skb, u32 offset, s32 delta)
+ * Description
+ * Adjust the size allocated to TLVs in the outermost IPv6
+ * Segment Routing Header contained in the packet associated to
+ * *skb*, at position *offset* by *delta* bytes. Only offsets
+ * after the segments are accepted. *delta* can be as well
+ * positive (growing) as negative (shrinking).
+ *
+ * A call to this helper is susceptible to change the underlaying
+ * packet buffer. Therefore, at load time, all checks on pointers
+ * previously done by the verifier are invalidated and must be
+ * performed again, if the helper is used in combination with
+ * direct packet access.
+ * Return
+ * 0 on success, or a negative error in case of failure.
+ *
+ * int bpf_lwt_seg6_action(struct sk_buff *skb, u32 action, void *param, u32 param_len)
+ * Description
+ * Apply an IPv6 Segment Routing action of type *action* to the
+ * packet associated to *skb*. Each action takes a parameter
+ * contained at address *param*, and of length *param_len* bytes.
+ * *action* can be one of:
+ *
+ * **SEG6_LOCAL_ACTION_END_X**
+ * End.X action: Endpoint with Layer-3 cross-connect.
+ * Type of *param*: **struct in6_addr**.
+ * **SEG6_LOCAL_ACTION_END_T**
+ * End.T action: Endpoint with specific IPv6 table lookup.
+ * Type of *param*: **int**.
+ * **SEG6_LOCAL_ACTION_END_B6**
+ * End.B6 action: Endpoint bound to an SRv6 policy.
+ * Type of param: **struct ipv6_sr_hdr**.
+ * **SEG6_LOCAL_ACTION_END_B6_ENCAP**
+ * End.B6.Encap action: Endpoint bound to an SRv6
+ * encapsulation policy.
+ * Type of param: **struct ipv6_sr_hdr**.
+ *
+ * A call to this helper is susceptible to change the underlaying
+ * packet buffer. Therefore, at load time, all checks on pointers
+ * previously done by the verifier are invalidated and must be
+ * performed again, if the helper is used in combination with
+ * direct packet access.
+ * Return
+ * 0 on success, or a negative error in case of failure.
*/
#define __BPF_FUNC_MAPPER(FN) \
FN(unspec), \
@@ -821,7 +2070,19 @@
FN(msg_apply_bytes), \
FN(msg_cork_bytes), \
FN(msg_pull_data), \
- FN(bind),
+ FN(bind), \
+ FN(xdp_adjust_tail), \
+ FN(skb_get_xfrm_state), \
+ FN(get_stack), \
+ FN(skb_load_bytes_relative), \
+ FN(fib_lookup), \
+ FN(sock_hash_update), \
+ FN(msg_redirect_hash), \
+ FN(sk_redirect_hash), \
+ FN(lwt_push_encap), \
+ FN(lwt_seg6_store_bytes), \
+ FN(lwt_seg6_adjust_srh), \
+ FN(lwt_seg6_action),
/* integer value in 'imm' field of BPF_CALL instruction selects which helper
* function eBPF program intends to call
@@ -855,11 +2116,14 @@
/* BPF_FUNC_skb_set_tunnel_key and BPF_FUNC_skb_get_tunnel_key flags. */
#define BPF_F_TUNINFO_IPV6 (1ULL << 0)
-/* BPF_FUNC_get_stackid flags. */
+/* flags for both BPF_FUNC_get_stackid and BPF_FUNC_get_stack. */
#define BPF_F_SKIP_FIELD_MASK 0xffULL
#define BPF_F_USER_STACK (1ULL << 8)
+/* flags used by BPF_FUNC_get_stackid only. */
#define BPF_F_FAST_STACK_CMP (1ULL << 9)
#define BPF_F_REUSE_STACKID (1ULL << 10)
+/* flags used by BPF_FUNC_get_stack only. */
+#define BPF_F_USER_BUILD_ID (1ULL << 11)
/* BPF_FUNC_skb_set_tunnel_key flags. */
#define BPF_F_ZERO_CSUM_TX (1ULL << 1)
@@ -879,6 +2143,18 @@
BPF_ADJ_ROOM_NET,
};
+/* Mode for BPF_FUNC_skb_load_bytes_relative helper. */
+enum bpf_hdr_start_off {
+ BPF_HDR_START_MAC,
+ BPF_HDR_START_NET,
+};
+
+/* Encapsulation type for BPF_FUNC_lwt_push_encap helper. */
+enum bpf_lwt_encap_mode {
+ BPF_LWT_ENCAP_SEG6,
+ BPF_LWT_ENCAP_SEG6_INLINE
+};
+
/* user accessible mirror of in-kernel sk_buff.
* new fields can only be added to the end of this structure
*/
@@ -927,6 +2203,19 @@
__u32 tunnel_label;
};
+/* user accessible mirror of in-kernel xfrm_state.
+ * new fields can only be added to the end of this structure
+ */
+struct bpf_xfrm_state {
+ __u32 reqid;
+ __u32 spi; /* Stored in network byte order */
+ __u16 family;
+ union {
+ __u32 remote_ipv4; /* Stored in network byte order */
+ __u32 remote_ipv6[4]; /* Stored in network byte order */
+ };
+};
+
/* Generic BPF return codes which all BPF program types may support.
* The values are binary compatible with their TC_ACT_* counter-part to
* provide backwards compatibility with existing SCHED_CLS and SCHED_ACT
@@ -999,6 +2288,14 @@
struct sk_msg_md {
void *data;
void *data_end;
+
+ __u32 family;
+ __u32 remote_ip4; /* Stored in network byte order */
+ __u32 local_ip4; /* Stored in network byte order */
+ __u32 remote_ip6[4]; /* Stored in network byte order */
+ __u32 local_ip6[4]; /* Stored in network byte order */
+ __u32 remote_port; /* Stored in network byte order */
+ __u32 local_port; /* stored in host byte order */
};
#define BPF_TAG_SIZE 8
@@ -1017,8 +2314,13 @@
__aligned_u64 map_ids;
char name[BPF_OBJ_NAME_LEN];
__u32 ifindex;
+ __u32 gpl_compatible:1;
__u64 netns_dev;
__u64 netns_ino;
+ __u32 nr_jited_ksyms;
+ __u32 nr_jited_func_lens;
+ __aligned_u64 jited_ksyms;
+ __aligned_u64 jited_func_lens;
} __attribute__((aligned(8)));
struct bpf_map_info {
@@ -1032,6 +2334,15 @@
__u32 ifindex;
__u64 netns_dev;
__u64 netns_ino;
+ __u32 btf_id;
+ __u32 btf_key_type_id;
+ __u32 btf_value_type_id;
+} __attribute__((aligned(8)));
+
+struct bpf_btf_info {
+ __aligned_u64 btf;
+ __u32 btf_size;
+ __u32 id;
} __attribute__((aligned(8)));
/* User bpf_sock_addr struct to access socket fields and sockaddr struct passed
@@ -1212,4 +2523,64 @@
__u64 args[0];
};
+/* DIRECT: Skip the FIB rules and go to FIB table associated with device
+ * OUTPUT: Do lookup from egress perspective; default is ingress
+ */
+#define BPF_FIB_LOOKUP_DIRECT BIT(0)
+#define BPF_FIB_LOOKUP_OUTPUT BIT(1)
+
+struct bpf_fib_lookup {
+ /* input */
+ __u8 family; /* network family, AF_INET, AF_INET6, AF_MPLS */
+
+ /* set if lookup is to consider L4 data - e.g., FIB rules */
+ __u8 l4_protocol;
+ __be16 sport;
+ __be16 dport;
+
+ /* total length of packet from network header - used for MTU check */
+ __u16 tot_len;
+ __u32 ifindex; /* L3 device index for lookup */
+
+ union {
+ /* inputs to lookup */
+ __u8 tos; /* AF_INET */
+ __be32 flowlabel; /* AF_INET6 */
+
+ /* output: metric of fib result */
+ __u32 rt_metric;
+ };
+
+ union {
+ __be32 mpls_in;
+ __be32 ipv4_src;
+ __u32 ipv6_src[4]; /* in6_addr; network order */
+ };
+
+ /* input to bpf_fib_lookup, *dst is destination address.
+ * output: bpf_fib_lookup sets to gateway address
+ */
+ union {
+ /* return for MPLS lookups */
+ __be32 mpls_out[4]; /* support up to 4 labels */
+ __be32 ipv4_dst;
+ __u32 ipv6_dst[4]; /* in6_addr; network order */
+ };
+
+ /* output */
+ __be16 h_vlan_proto;
+ __be16 h_vlan_TCI;
+ __u8 smac[6]; /* ETH_ALEN */
+ __u8 dmac[6]; /* ETH_ALEN */
+};
+
+enum bpf_task_fd_type {
+ BPF_FD_TYPE_RAW_TRACEPOINT, /* tp name */
+ BPF_FD_TYPE_TRACEPOINT, /* tp name */
+ BPF_FD_TYPE_KPROBE, /* (symbol + offset) or addr */
+ BPF_FD_TYPE_KRETPROBE, /* (symbol + offset) or addr */
+ BPF_FD_TYPE_UPROBE, /* filename + offset */
+ BPF_FD_TYPE_URETPROBE, /* filename + offset */
+};
+
#endif /* _UAPI__LINUX_BPF_H__ */
diff --git a/tools/include/uapi/linux/btf.h b/tools/include/uapi/linux/btf.h
new file mode 100644
index 0000000..0b5ddbe
--- /dev/null
+++ b/tools/include/uapi/linux/btf.h
@@ -0,0 +1,113 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+/* Copyright (c) 2018 Facebook */
+#ifndef _UAPI__LINUX_BTF_H__
+#define _UAPI__LINUX_BTF_H__
+
+#include <linux/types.h>
+
+#define BTF_MAGIC 0xeB9F
+#define BTF_VERSION 1
+
+struct btf_header {
+ __u16 magic;
+ __u8 version;
+ __u8 flags;
+ __u32 hdr_len;
+
+ /* All offsets are in bytes relative to the end of this header */
+ __u32 type_off; /* offset of type section */
+ __u32 type_len; /* length of type section */
+ __u32 str_off; /* offset of string section */
+ __u32 str_len; /* length of string section */
+};
+
+/* Max # of type identifier */
+#define BTF_MAX_TYPE 0x0000ffff
+/* Max offset into the string section */
+#define BTF_MAX_NAME_OFFSET 0x0000ffff
+/* Max # of struct/union/enum members or func args */
+#define BTF_MAX_VLEN 0xffff
+
+struct btf_type {
+ __u32 name_off;
+ /* "info" bits arrangement
+ * bits 0-15: vlen (e.g. # of struct's members)
+ * bits 16-23: unused
+ * bits 24-27: kind (e.g. int, ptr, array...etc)
+ * bits 28-31: unused
+ */
+ __u32 info;
+ /* "size" is used by INT, ENUM, STRUCT and UNION.
+ * "size" tells the size of the type it is describing.
+ *
+ * "type" is used by PTR, TYPEDEF, VOLATILE, CONST and RESTRICT.
+ * "type" is a type_id referring to another type.
+ */
+ union {
+ __u32 size;
+ __u32 type;
+ };
+};
+
+#define BTF_INFO_KIND(info) (((info) >> 24) & 0x0f)
+#define BTF_INFO_VLEN(info) ((info) & 0xffff)
+
+#define BTF_KIND_UNKN 0 /* Unknown */
+#define BTF_KIND_INT 1 /* Integer */
+#define BTF_KIND_PTR 2 /* Pointer */
+#define BTF_KIND_ARRAY 3 /* Array */
+#define BTF_KIND_STRUCT 4 /* Struct */
+#define BTF_KIND_UNION 5 /* Union */
+#define BTF_KIND_ENUM 6 /* Enumeration */
+#define BTF_KIND_FWD 7 /* Forward */
+#define BTF_KIND_TYPEDEF 8 /* Typedef */
+#define BTF_KIND_VOLATILE 9 /* Volatile */
+#define BTF_KIND_CONST 10 /* Const */
+#define BTF_KIND_RESTRICT 11 /* Restrict */
+#define BTF_KIND_MAX 11
+#define NR_BTF_KINDS 12
+
+/* For some specific BTF_KIND, "struct btf_type" is immediately
+ * followed by extra data.
+ */
+
+/* BTF_KIND_INT is followed by a u32 and the following
+ * is the 32 bits arrangement:
+ */
+#define BTF_INT_ENCODING(VAL) (((VAL) & 0x0f000000) >> 24)
+#define BTF_INT_OFFSET(VAL) (((VAL & 0x00ff0000)) >> 16)
+#define BTF_INT_BITS(VAL) ((VAL) & 0x0000ffff)
+
+/* Attributes stored in the BTF_INT_ENCODING */
+#define BTF_INT_SIGNED (1 << 0)
+#define BTF_INT_CHAR (1 << 1)
+#define BTF_INT_BOOL (1 << 2)
+
+/* BTF_KIND_ENUM is followed by multiple "struct btf_enum".
+ * The exact number of btf_enum is stored in the vlen (of the
+ * info in "struct btf_type").
+ */
+struct btf_enum {
+ __u32 name_off;
+ __s32 val;
+};
+
+/* BTF_KIND_ARRAY is followed by one "struct btf_array" */
+struct btf_array {
+ __u32 type;
+ __u32 index_type;
+ __u32 nelems;
+};
+
+/* BTF_KIND_STRUCT and BTF_KIND_UNION are followed
+ * by multiple "struct btf_member". The exact number
+ * of btf_member is stored in the vlen (of the info in
+ * "struct btf_type").
+ */
+struct btf_member {
+ __u32 name_off;
+ __u32 type;
+ __u32 offset; /* offset in bits */
+};
+
+#endif /* _UAPI__LINUX_BTF_H__ */
diff --git a/tools/include/uapi/linux/erspan.h b/tools/include/uapi/linux/erspan.h
new file mode 100644
index 0000000..8415730
--- /dev/null
+++ b/tools/include/uapi/linux/erspan.h
@@ -0,0 +1,52 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+/*
+ * ERSPAN Tunnel Metadata
+ *
+ * Copyright (c) 2018 VMware
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2
+ * as published by the Free Software Foundation.
+ *
+ * Userspace API for metadata mode ERSPAN tunnel
+ */
+#ifndef _UAPI_ERSPAN_H
+#define _UAPI_ERSPAN_H
+
+#include <linux/types.h> /* For __beXX in userspace */
+#include <asm/byteorder.h>
+
+/* ERSPAN version 2 metadata header */
+struct erspan_md2 {
+ __be32 timestamp;
+ __be16 sgt; /* security group tag */
+#if defined(__LITTLE_ENDIAN_BITFIELD)
+ __u8 hwid_upper:2,
+ ft:5,
+ p:1;
+ __u8 o:1,
+ gra:2,
+ dir:1,
+ hwid:4;
+#elif defined(__BIG_ENDIAN_BITFIELD)
+ __u8 p:1,
+ ft:5,
+ hwid_upper:2;
+ __u8 hwid:4,
+ dir:1,
+ gra:2,
+ o:1;
+#else
+#error "Please fix <asm/byteorder.h>"
+#endif
+};
+
+struct erspan_metadata {
+ int version;
+ union {
+ __be32 index; /* Version 1 (type II)*/
+ struct erspan_md2 md2; /* Version 2 (type III) */
+ } u;
+};
+
+#endif /* _UAPI_ERSPAN_H */
diff --git a/tools/lib/bpf/Build b/tools/lib/bpf/Build
index 64c679d..6070e65 100644
--- a/tools/lib/bpf/Build
+++ b/tools/lib/bpf/Build
@@ -1 +1 @@
-libbpf-y := libbpf.o bpf.o nlattr.o
+libbpf-y := libbpf.o bpf.o nlattr.o btf.o
diff --git a/tools/lib/bpf/Makefile b/tools/lib/bpf/Makefile
index e6d5f8d..f3fab4a 100644
--- a/tools/lib/bpf/Makefile
+++ b/tools/lib/bpf/Makefile
@@ -69,7 +69,7 @@
FEATURE_TESTS = libelf libelf-getphdrnum libelf-mmap bpf
FEATURE_DISPLAY = libelf bpf
-INCLUDES = -I. -I$(srctree)/tools/include -I$(srctree)/tools/arch/$(ARCH)/include/uapi -I$(srctree)/tools/include/uapi
+INCLUDES = -I. -I$(srctree)/tools/include -I$(srctree)/tools/arch/$(ARCH)/include/uapi -I$(srctree)/tools/include/uapi -I$(srctree)/tools/perf
FEATURE_CHECK_CFLAGS-bpf = $(INCLUDES)
check_feat := 1
diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c
index acbb3f8..9ddc89d 100644
--- a/tools/lib/bpf/bpf.c
+++ b/tools/lib/bpf/bpf.c
@@ -73,43 +73,77 @@
return syscall(__NR_bpf, cmd, attr, size);
}
-int bpf_create_map_node(enum bpf_map_type map_type, const char *name,
- int key_size, int value_size, int max_entries,
- __u32 map_flags, int node)
+int bpf_create_map_xattr(const struct bpf_create_map_attr *create_attr)
{
- __u32 name_len = name ? strlen(name) : 0;
+ __u32 name_len = create_attr->name ? strlen(create_attr->name) : 0;
union bpf_attr attr;
memset(&attr, '\0', sizeof(attr));
- attr.map_type = map_type;
- attr.key_size = key_size;
- attr.value_size = value_size;
- attr.max_entries = max_entries;
- attr.map_flags = map_flags;
- memcpy(attr.map_name, name, min(name_len, BPF_OBJ_NAME_LEN - 1));
-
- if (node >= 0) {
- attr.map_flags |= BPF_F_NUMA_NODE;
- attr.numa_node = node;
- }
+ attr.map_type = create_attr->map_type;
+ attr.key_size = create_attr->key_size;
+ attr.value_size = create_attr->value_size;
+ attr.max_entries = create_attr->max_entries;
+ attr.map_flags = create_attr->map_flags;
+ memcpy(attr.map_name, create_attr->name,
+ min(name_len, BPF_OBJ_NAME_LEN - 1));
+ attr.numa_node = create_attr->numa_node;
+ attr.btf_fd = create_attr->btf_fd;
+ attr.btf_key_type_id = create_attr->btf_key_type_id;
+ attr.btf_value_type_id = create_attr->btf_value_type_id;
+ attr.map_ifindex = create_attr->map_ifindex;
return sys_bpf(BPF_MAP_CREATE, &attr, sizeof(attr));
}
+int bpf_create_map_node(enum bpf_map_type map_type, const char *name,
+ int key_size, int value_size, int max_entries,
+ __u32 map_flags, int node)
+{
+ struct bpf_create_map_attr map_attr = {};
+
+ map_attr.name = name;
+ map_attr.map_type = map_type;
+ map_attr.map_flags = map_flags;
+ map_attr.key_size = key_size;
+ map_attr.value_size = value_size;
+ map_attr.max_entries = max_entries;
+ if (node >= 0) {
+ map_attr.numa_node = node;
+ map_attr.map_flags |= BPF_F_NUMA_NODE;
+ }
+
+ return bpf_create_map_xattr(&map_attr);
+}
+
int bpf_create_map(enum bpf_map_type map_type, int key_size,
int value_size, int max_entries, __u32 map_flags)
{
- return bpf_create_map_node(map_type, NULL, key_size, value_size,
- max_entries, map_flags, -1);
+ struct bpf_create_map_attr map_attr = {};
+
+ map_attr.map_type = map_type;
+ map_attr.map_flags = map_flags;
+ map_attr.key_size = key_size;
+ map_attr.value_size = value_size;
+ map_attr.max_entries = max_entries;
+
+ return bpf_create_map_xattr(&map_attr);
}
int bpf_create_map_name(enum bpf_map_type map_type, const char *name,
int key_size, int value_size, int max_entries,
__u32 map_flags)
{
- return bpf_create_map_node(map_type, name, key_size, value_size,
- max_entries, map_flags, -1);
+ struct bpf_create_map_attr map_attr = {};
+
+ map_attr.name = name;
+ map_attr.map_type = map_type;
+ map_attr.map_flags = map_flags;
+ map_attr.key_size = key_size;
+ map_attr.value_size = value_size;
+ map_attr.max_entries = max_entries;
+
+ return bpf_create_map_xattr(&map_attr);
}
int bpf_create_map_in_map_node(enum bpf_map_type map_type, const char *name,
@@ -168,6 +202,7 @@
attr.log_size = 0;
attr.log_level = 0;
attr.kern_version = load_attr->kern_version;
+ attr.prog_ifindex = load_attr->prog_ifindex;
memcpy(attr.prog_name, load_attr->name,
min(name_len, BPF_OBJ_NAME_LEN - 1));
@@ -425,6 +460,16 @@
return sys_bpf(BPF_MAP_GET_FD_BY_ID, &attr, sizeof(attr));
}
+int bpf_btf_get_fd_by_id(__u32 id)
+{
+ union bpf_attr attr;
+
+ bzero(&attr, sizeof(attr));
+ attr.btf_id = id;
+
+ return sys_bpf(BPF_BTF_GET_FD_BY_ID, &attr, sizeof(attr));
+}
+
int bpf_obj_get_info_by_fd(int prog_fd, void *info, __u32 *info_len)
{
union bpf_attr attr;
@@ -573,3 +618,51 @@
close(sock);
return ret;
}
+
+int bpf_load_btf(void *btf, __u32 btf_size, char *log_buf, __u32 log_buf_size,
+ bool do_log)
+{
+ union bpf_attr attr = {};
+ int fd;
+
+ attr.btf = ptr_to_u64(btf);
+ attr.btf_size = btf_size;
+
+retry:
+ if (do_log && log_buf && log_buf_size) {
+ attr.btf_log_level = 1;
+ attr.btf_log_size = log_buf_size;
+ attr.btf_log_buf = ptr_to_u64(log_buf);
+ }
+
+ fd = sys_bpf(BPF_BTF_LOAD, &attr, sizeof(attr));
+ if (fd == -1 && !do_log && log_buf && log_buf_size) {
+ do_log = true;
+ goto retry;
+ }
+
+ return fd;
+}
+
+int bpf_task_fd_query(int pid, int fd, __u32 flags, char *buf, __u32 *buf_len,
+ __u32 *prog_id, __u32 *fd_type, __u64 *probe_offset,
+ __u64 *probe_addr)
+{
+ union bpf_attr attr = {};
+ int err;
+
+ attr.task_fd_query.pid = pid;
+ attr.task_fd_query.fd = fd;
+ attr.task_fd_query.flags = flags;
+ attr.task_fd_query.buf = ptr_to_u64(buf);
+ attr.task_fd_query.buf_len = *buf_len;
+
+ err = sys_bpf(BPF_TASK_FD_QUERY, &attr, sizeof(attr));
+ *buf_len = attr.task_fd_query.buf_len;
+ *prog_id = attr.task_fd_query.prog_id;
+ *fd_type = attr.task_fd_query.fd_type;
+ *probe_offset = attr.task_fd_query.probe_offset;
+ *probe_addr = attr.task_fd_query.probe_addr;
+
+ return err;
+}
diff --git a/tools/lib/bpf/bpf.h b/tools/lib/bpf/bpf.h
index 39f6a0d..0639a30 100644
--- a/tools/lib/bpf/bpf.h
+++ b/tools/lib/bpf/bpf.h
@@ -24,8 +24,24 @@
#define __BPF_BPF_H
#include <linux/bpf.h>
+#include <stdbool.h>
#include <stddef.h>
+struct bpf_create_map_attr {
+ const char *name;
+ enum bpf_map_type map_type;
+ __u32 map_flags;
+ __u32 key_size;
+ __u32 value_size;
+ __u32 max_entries;
+ __u32 numa_node;
+ __u32 btf_fd;
+ __u32 btf_key_type_id;
+ __u32 btf_value_type_id;
+ __u32 map_ifindex;
+};
+
+int bpf_create_map_xattr(const struct bpf_create_map_attr *create_attr);
int bpf_create_map_node(enum bpf_map_type map_type, const char *name,
int key_size, int value_size, int max_entries,
__u32 map_flags, int node);
@@ -49,6 +65,7 @@
size_t insns_cnt;
const char *license;
__u32 kern_version;
+ __u32 prog_ifindex;
};
/* Recommend log buffer size */
@@ -83,8 +100,14 @@
int bpf_map_get_next_id(__u32 start_id, __u32 *next_id);
int bpf_prog_get_fd_by_id(__u32 id);
int bpf_map_get_fd_by_id(__u32 id);
+int bpf_btf_get_fd_by_id(__u32 id);
int bpf_obj_get_info_by_fd(int prog_fd, void *info, __u32 *info_len);
int bpf_prog_query(int target_fd, enum bpf_attach_type type, __u32 query_flags,
__u32 *attach_flags, __u32 *prog_ids, __u32 *prog_cnt);
int bpf_raw_tracepoint_open(const char *name, int prog_fd);
+int bpf_load_btf(void *btf, __u32 btf_size, char *log_buf, __u32 log_buf_size,
+ bool do_log);
+int bpf_task_fd_query(int pid, int fd, __u32 flags, char *buf, __u32 *buf_len,
+ __u32 *prog_id, __u32 *fd_type, __u64 *probe_offset,
+ __u64 *probe_addr);
#endif
diff --git a/tools/lib/bpf/btf.c b/tools/lib/bpf/btf.c
new file mode 100644
index 0000000..8c54a4b
--- /dev/null
+++ b/tools/lib/bpf/btf.c
@@ -0,0 +1,373 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/* Copyright (c) 2018 Facebook */
+
+#include <stdlib.h>
+#include <stdint.h>
+#include <string.h>
+#include <unistd.h>
+#include <errno.h>
+#include <linux/err.h>
+#include <linux/btf.h>
+#include "btf.h"
+#include "bpf.h"
+
+#define elog(fmt, ...) { if (err_log) err_log(fmt, ##__VA_ARGS__); }
+#define max(a, b) ((a) > (b) ? (a) : (b))
+#define min(a, b) ((a) < (b) ? (a) : (b))
+
+#define BTF_MAX_NR_TYPES 65535
+
+static struct btf_type btf_void;
+
+struct btf {
+ union {
+ struct btf_header *hdr;
+ void *data;
+ };
+ struct btf_type **types;
+ const char *strings;
+ void *nohdr_data;
+ uint32_t nr_types;
+ uint32_t types_size;
+ uint32_t data_size;
+ int fd;
+};
+
+static const char *btf_name_by_offset(const struct btf *btf, uint32_t offset)
+{
+ if (offset < btf->hdr->str_len)
+ return &btf->strings[offset];
+ else
+ return NULL;
+}
+
+static int btf_add_type(struct btf *btf, struct btf_type *t)
+{
+ if (btf->types_size - btf->nr_types < 2) {
+ struct btf_type **new_types;
+ u32 expand_by, new_size;
+
+ if (btf->types_size == BTF_MAX_NR_TYPES)
+ return -E2BIG;
+
+ expand_by = max(btf->types_size >> 2, 16);
+ new_size = min(BTF_MAX_NR_TYPES, btf->types_size + expand_by);
+
+ new_types = realloc(btf->types, sizeof(*new_types) * new_size);
+ if (!new_types)
+ return -ENOMEM;
+
+ if (btf->nr_types == 0)
+ new_types[0] = &btf_void;
+
+ btf->types = new_types;
+ btf->types_size = new_size;
+ }
+
+ btf->types[++(btf->nr_types)] = t;
+
+ return 0;
+}
+
+static int btf_parse_hdr(struct btf *btf, btf_print_fn_t err_log)
+{
+ const struct btf_header *hdr = btf->hdr;
+ u32 meta_left;
+
+ if (btf->data_size < sizeof(struct btf_header)) {
+ elog("BTF header not found\n");
+ return -EINVAL;
+ }
+
+ if (hdr->magic != BTF_MAGIC) {
+ elog("Invalid BTF magic:%x\n", hdr->magic);
+ return -EINVAL;
+ }
+
+ if (hdr->version != BTF_VERSION) {
+ elog("Unsupported BTF version:%u\n", hdr->version);
+ return -ENOTSUP;
+ }
+
+ if (hdr->flags) {
+ elog("Unsupported BTF flags:%x\n", hdr->flags);
+ return -ENOTSUP;
+ }
+
+ meta_left = btf->data_size - sizeof(*hdr);
+ if (!meta_left) {
+ elog("BTF has no data\n");
+ return -EINVAL;
+ }
+
+ if (meta_left < hdr->type_off) {
+ elog("Invalid BTF type section offset:%u\n", hdr->type_off);
+ return -EINVAL;
+ }
+
+ if (meta_left < hdr->str_off) {
+ elog("Invalid BTF string section offset:%u\n", hdr->str_off);
+ return -EINVAL;
+ }
+
+ if (hdr->type_off >= hdr->str_off) {
+ elog("BTF type section offset >= string section offset. No type?\n");
+ return -EINVAL;
+ }
+
+ if (hdr->type_off & 0x02) {
+ elog("BTF type section is not aligned to 4 bytes\n");
+ return -EINVAL;
+ }
+
+ btf->nohdr_data = btf->hdr + 1;
+
+ return 0;
+}
+
+static int btf_parse_str_sec(struct btf *btf, btf_print_fn_t err_log)
+{
+ const struct btf_header *hdr = btf->hdr;
+ const char *start = btf->nohdr_data + hdr->str_off;
+ const char *end = start + btf->hdr->str_len;
+
+ if (!hdr->str_len || hdr->str_len - 1 > BTF_MAX_NAME_OFFSET ||
+ start[0] || end[-1]) {
+ elog("Invalid BTF string section\n");
+ return -EINVAL;
+ }
+
+ btf->strings = start;
+
+ return 0;
+}
+
+static int btf_parse_type_sec(struct btf *btf, btf_print_fn_t err_log)
+{
+ struct btf_header *hdr = btf->hdr;
+ void *nohdr_data = btf->nohdr_data;
+ void *next_type = nohdr_data + hdr->type_off;
+ void *end_type = nohdr_data + hdr->str_off;
+
+ while (next_type < end_type) {
+ struct btf_type *t = next_type;
+ uint16_t vlen = BTF_INFO_VLEN(t->info);
+ int err;
+
+ next_type += sizeof(*t);
+ switch (BTF_INFO_KIND(t->info)) {
+ case BTF_KIND_INT:
+ next_type += sizeof(int);
+ break;
+ case BTF_KIND_ARRAY:
+ next_type += sizeof(struct btf_array);
+ break;
+ case BTF_KIND_STRUCT:
+ case BTF_KIND_UNION:
+ next_type += vlen * sizeof(struct btf_member);
+ break;
+ case BTF_KIND_ENUM:
+ next_type += vlen * sizeof(struct btf_enum);
+ break;
+ case BTF_KIND_TYPEDEF:
+ case BTF_KIND_PTR:
+ case BTF_KIND_FWD:
+ case BTF_KIND_VOLATILE:
+ case BTF_KIND_CONST:
+ case BTF_KIND_RESTRICT:
+ break;
+ default:
+ elog("Unsupported BTF_KIND:%u\n",
+ BTF_INFO_KIND(t->info));
+ return -EINVAL;
+ }
+
+ err = btf_add_type(btf, t);
+ if (err)
+ return err;
+ }
+
+ return 0;
+}
+
+static const struct btf_type *btf_type_by_id(const struct btf *btf,
+ uint32_t type_id)
+{
+ if (type_id > btf->nr_types)
+ return NULL;
+
+ return btf->types[type_id];
+}
+
+static bool btf_type_is_void(const struct btf_type *t)
+{
+ return t == &btf_void || BTF_INFO_KIND(t->info) == BTF_KIND_FWD;
+}
+
+static bool btf_type_is_void_or_null(const struct btf_type *t)
+{
+ return !t || btf_type_is_void(t);
+}
+
+static int64_t btf_type_size(const struct btf_type *t)
+{
+ switch (BTF_INFO_KIND(t->info)) {
+ case BTF_KIND_INT:
+ case BTF_KIND_STRUCT:
+ case BTF_KIND_UNION:
+ case BTF_KIND_ENUM:
+ return t->size;
+ case BTF_KIND_PTR:
+ return sizeof(void *);
+ default:
+ return -EINVAL;
+ }
+}
+
+#define MAX_RESOLVE_DEPTH 32
+
+int64_t btf__resolve_size(const struct btf *btf, uint32_t type_id)
+{
+ const struct btf_array *array;
+ const struct btf_type *t;
+ uint32_t nelems = 1;
+ int64_t size = -1;
+ int i;
+
+ t = btf_type_by_id(btf, type_id);
+ for (i = 0; i < MAX_RESOLVE_DEPTH && !btf_type_is_void_or_null(t);
+ i++) {
+ size = btf_type_size(t);
+ if (size >= 0)
+ break;
+
+ switch (BTF_INFO_KIND(t->info)) {
+ case BTF_KIND_TYPEDEF:
+ case BTF_KIND_VOLATILE:
+ case BTF_KIND_CONST:
+ case BTF_KIND_RESTRICT:
+ type_id = t->type;
+ break;
+ case BTF_KIND_ARRAY:
+ array = (const struct btf_array *)(t + 1);
+ if (nelems && array->nelems > UINT32_MAX / nelems)
+ return -E2BIG;
+ nelems *= array->nelems;
+ type_id = array->type;
+ break;
+ default:
+ return -EINVAL;
+ }
+
+ t = btf_type_by_id(btf, type_id);
+ }
+
+ if (size < 0)
+ return -EINVAL;
+
+ if (nelems && size > UINT32_MAX / nelems)
+ return -E2BIG;
+
+ return nelems * size;
+}
+
+int32_t btf__find_by_name(const struct btf *btf, const char *type_name)
+{
+ uint32_t i;
+
+ if (!strcmp(type_name, "void"))
+ return 0;
+
+ for (i = 1; i <= btf->nr_types; i++) {
+ const struct btf_type *t = btf->types[i];
+ const char *name = btf_name_by_offset(btf, t->name_off);
+
+ if (name && !strcmp(type_name, name))
+ return i;
+ }
+
+ return -ENOENT;
+}
+
+void btf__free(struct btf *btf)
+{
+ if (!btf)
+ return;
+
+ if (btf->fd != -1)
+ close(btf->fd);
+
+ free(btf->data);
+ free(btf->types);
+ free(btf);
+}
+
+struct btf *btf__new(uint8_t *data, uint32_t size,
+ btf_print_fn_t err_log)
+{
+ uint32_t log_buf_size = 0;
+ char *log_buf = NULL;
+ struct btf *btf;
+ int err;
+
+ btf = calloc(1, sizeof(struct btf));
+ if (!btf)
+ return ERR_PTR(-ENOMEM);
+
+ btf->fd = -1;
+
+ if (err_log) {
+ log_buf = malloc(BPF_LOG_BUF_SIZE);
+ if (!log_buf) {
+ err = -ENOMEM;
+ goto done;
+ }
+ *log_buf = 0;
+ log_buf_size = BPF_LOG_BUF_SIZE;
+ }
+
+ btf->data = malloc(size);
+ if (!btf->data) {
+ err = -ENOMEM;
+ goto done;
+ }
+
+ memcpy(btf->data, data, size);
+ btf->data_size = size;
+
+ btf->fd = bpf_load_btf(btf->data, btf->data_size,
+ log_buf, log_buf_size, false);
+
+ if (btf->fd == -1) {
+ err = -errno;
+ elog("Error loading BTF: %s(%d)\n", strerror(errno), errno);
+ if (log_buf && *log_buf)
+ elog("%s\n", log_buf);
+ goto done;
+ }
+
+ err = btf_parse_hdr(btf, err_log);
+ if (err)
+ goto done;
+
+ err = btf_parse_str_sec(btf, err_log);
+ if (err)
+ goto done;
+
+ err = btf_parse_type_sec(btf, err_log);
+
+done:
+ free(log_buf);
+
+ if (err) {
+ btf__free(btf);
+ return ERR_PTR(err);
+ }
+
+ return btf;
+}
+
+int btf__fd(const struct btf *btf)
+{
+ return btf->fd;
+}
diff --git a/tools/lib/bpf/btf.h b/tools/lib/bpf/btf.h
new file mode 100644
index 0000000..74bb344
--- /dev/null
+++ b/tools/lib/bpf/btf.h
@@ -0,0 +1,22 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/* Copyright (c) 2018 Facebook */
+
+#ifndef __BPF_BTF_H
+#define __BPF_BTF_H
+
+#include <stdint.h>
+
+#define BTF_ELF_SEC ".BTF"
+
+struct btf;
+
+typedef int (*btf_print_fn_t)(const char *, ...)
+ __attribute__((format(printf, 1, 2)));
+
+void btf__free(struct btf *btf);
+struct btf *btf__new(uint8_t *data, uint32_t size, btf_print_fn_t err_log);
+int32_t btf__find_by_name(const struct btf *btf, const char *type_name);
+int64_t btf__resolve_size(const struct btf *btf, uint32_t type_id);
+int btf__fd(const struct btf *btf);
+
+#endif
diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index 0f9f06d..d20411e 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -31,6 +31,7 @@
#include <unistd.h>
#include <fcntl.h>
#include <errno.h>
+#include <perf-sys.h>
#include <asm/unistd.h>
#include <linux/err.h>
#include <linux/kernel.h>
@@ -45,6 +46,7 @@
#include "libbpf.h"
#include "bpf.h"
+#include "btf.h"
#ifndef EM_BPF
#define EM_BPF 247
@@ -176,6 +178,7 @@
/* Index in elf obj file, for relocation use. */
int idx;
char *name;
+ int prog_ifindex;
char *section_name;
struct bpf_insn *insns;
size_t insns_cnt, main_prog_cnt;
@@ -211,7 +214,10 @@
int fd;
char *name;
size_t offset;
+ int map_ifindex;
struct bpf_map_def def;
+ uint32_t btf_key_type_id;
+ uint32_t btf_value_type_id;
void *priv;
bpf_map_clear_priv_t clear_priv;
};
@@ -256,6 +262,8 @@
*/
struct list_head list;
+ struct btf *btf;
+
void *priv;
bpf_object_clear_priv_t clear_priv;
@@ -819,7 +827,15 @@
data->d_size);
else if (strcmp(name, "maps") == 0)
obj->efile.maps_shndx = idx;
- else if (sh.sh_type == SHT_SYMTAB) {
+ else if (strcmp(name, BTF_ELF_SEC) == 0) {
+ obj->btf = btf__new(data->d_buf, data->d_size,
+ __pr_debug);
+ if (IS_ERR(obj->btf)) {
+ pr_warning("Error loading ELF section %s: %ld. Ignored and continue.\n",
+ BTF_ELF_SEC, PTR_ERR(obj->btf));
+ obj->btf = NULL;
+ }
+ } else if (sh.sh_type == SHT_SYMTAB) {
if (obj->efile.symbols) {
pr_warning("bpf: multiple SYMTAB in %s\n",
obj->path);
@@ -996,33 +1012,127 @@
return 0;
}
+static int bpf_map_find_btf_info(struct bpf_map *map, const struct btf *btf)
+{
+ struct bpf_map_def *def = &map->def;
+ const size_t max_name = 256;
+ int64_t key_size, value_size;
+ int32_t key_id, value_id;
+ char name[max_name];
+
+ /* Find key type by name from BTF */
+ if (snprintf(name, max_name, "%s_key", map->name) == max_name) {
+ pr_warning("map:%s length of BTF key_type:%s_key is too long\n",
+ map->name, map->name);
+ return -EINVAL;
+ }
+
+ key_id = btf__find_by_name(btf, name);
+ if (key_id < 0) {
+ pr_debug("map:%s key_type:%s cannot be found in BTF\n",
+ map->name, name);
+ return key_id;
+ }
+
+ key_size = btf__resolve_size(btf, key_id);
+ if (key_size < 0) {
+ pr_warning("map:%s key_type:%s cannot get the BTF type_size\n",
+ map->name, name);
+ return key_size;
+ }
+
+ if (def->key_size != key_size) {
+ pr_warning("map:%s key_type:%s has BTF type_size:%u != key_size:%u\n",
+ map->name, name, (unsigned int)key_size, def->key_size);
+ return -EINVAL;
+ }
+
+ /* Find value type from BTF */
+ if (snprintf(name, max_name, "%s_value", map->name) == max_name) {
+ pr_warning("map:%s length of BTF value_type:%s_value is too long\n",
+ map->name, map->name);
+ return -EINVAL;
+ }
+
+ value_id = btf__find_by_name(btf, name);
+ if (value_id < 0) {
+ pr_debug("map:%s value_type:%s cannot be found in BTF\n",
+ map->name, name);
+ return value_id;
+ }
+
+ value_size = btf__resolve_size(btf, value_id);
+ if (value_size < 0) {
+ pr_warning("map:%s value_type:%s cannot get the BTF type_size\n",
+ map->name, name);
+ return value_size;
+ }
+
+ if (def->value_size != value_size) {
+ pr_warning("map:%s value_type:%s has BTF type_size:%u != value_size:%u\n",
+ map->name, name, (unsigned int)value_size, def->value_size);
+ return -EINVAL;
+ }
+
+ map->btf_key_type_id = key_id;
+ map->btf_value_type_id = value_id;
+
+ return 0;
+}
+
static int
bpf_object__create_maps(struct bpf_object *obj)
{
+ struct bpf_create_map_attr create_attr = {};
unsigned int i;
+ int err;
for (i = 0; i < obj->nr_maps; i++) {
- struct bpf_map_def *def = &obj->maps[i].def;
- int *pfd = &obj->maps[i].fd;
+ struct bpf_map *map = &obj->maps[i];
+ struct bpf_map_def *def = &map->def;
+ int *pfd = &map->fd;
- *pfd = bpf_create_map_name(def->type,
- obj->maps[i].name,
- def->key_size,
- def->value_size,
- def->max_entries,
- def->map_flags);
+ create_attr.name = map->name;
+ create_attr.map_ifindex = map->map_ifindex;
+ create_attr.map_type = def->type;
+ create_attr.map_flags = def->map_flags;
+ create_attr.key_size = def->key_size;
+ create_attr.value_size = def->value_size;
+ create_attr.max_entries = def->max_entries;
+ create_attr.btf_fd = 0;
+ create_attr.btf_key_type_id = 0;
+ create_attr.btf_value_type_id = 0;
+
+ if (obj->btf && !bpf_map_find_btf_info(map, obj->btf)) {
+ create_attr.btf_fd = btf__fd(obj->btf);
+ create_attr.btf_key_type_id = map->btf_key_type_id;
+ create_attr.btf_value_type_id = map->btf_value_type_id;
+ }
+
+ *pfd = bpf_create_map_xattr(&create_attr);
+ if (*pfd < 0 && create_attr.btf_key_type_id) {
+ pr_warning("Error in bpf_create_map_xattr(%s):%s(%d). Retrying without BTF.\n",
+ map->name, strerror(errno), errno);
+ create_attr.btf_fd = 0;
+ create_attr.btf_key_type_id = 0;
+ create_attr.btf_value_type_id = 0;
+ map->btf_key_type_id = 0;
+ map->btf_value_type_id = 0;
+ *pfd = bpf_create_map_xattr(&create_attr);
+ }
+
if (*pfd < 0) {
size_t j;
- int err = *pfd;
+ err = *pfd;
pr_warning("failed to create map (name: '%s'): %s\n",
- obj->maps[i].name,
+ map->name,
strerror(errno));
for (j = 0; j < i; j++)
zclose(obj->maps[j].fd);
return err;
}
- pr_debug("create map %s: fd=%d\n", obj->maps[i].name, *pfd);
+ pr_debug("create map %s: fd=%d\n", map->name, *pfd);
}
return 0;
@@ -1166,7 +1276,7 @@
static int
load_program(enum bpf_prog_type type, enum bpf_attach_type expected_attach_type,
const char *name, struct bpf_insn *insns, int insns_cnt,
- char *license, u32 kern_version, int *pfd)
+ char *license, u32 kern_version, int *pfd, int prog_ifindex)
{
struct bpf_load_program_attr load_attr;
char *log_buf;
@@ -1180,6 +1290,7 @@
load_attr.insns_cnt = insns_cnt;
load_attr.license = license;
load_attr.kern_version = kern_version;
+ load_attr.prog_ifindex = prog_ifindex;
if (!load_attr.insns || !load_attr.insns_cnt)
return -EINVAL;
@@ -1261,7 +1372,8 @@
}
err = load_program(prog->type, prog->expected_attach_type,
prog->name, prog->insns, prog->insns_cnt,
- license, kern_version, &fd);
+ license, kern_version, &fd,
+ prog->prog_ifindex);
if (!err)
prog->instances.fds[0] = fd;
goto out;
@@ -1292,7 +1404,8 @@
err = load_program(prog->type, prog->expected_attach_type,
prog->name, result.new_insn_ptr,
result.new_insn_cnt,
- license, kern_version, &fd);
+ license, kern_version, &fd,
+ prog->prog_ifindex);
if (err) {
pr_warning("Loading the %dth instance of program '%s' failed\n",
@@ -1331,9 +1444,38 @@
return 0;
}
-static int bpf_object__validate(struct bpf_object *obj)
+static bool bpf_prog_type__needs_kver(enum bpf_prog_type type)
{
- if (obj->kern_version == 0) {
+ switch (type) {
+ case BPF_PROG_TYPE_SOCKET_FILTER:
+ case BPF_PROG_TYPE_SCHED_CLS:
+ case BPF_PROG_TYPE_SCHED_ACT:
+ case BPF_PROG_TYPE_XDP:
+ case BPF_PROG_TYPE_CGROUP_SKB:
+ case BPF_PROG_TYPE_CGROUP_SOCK:
+ case BPF_PROG_TYPE_LWT_IN:
+ case BPF_PROG_TYPE_LWT_OUT:
+ case BPF_PROG_TYPE_LWT_XMIT:
+ case BPF_PROG_TYPE_LWT_SEG6LOCAL:
+ case BPF_PROG_TYPE_SOCK_OPS:
+ case BPF_PROG_TYPE_SK_SKB:
+ case BPF_PROG_TYPE_CGROUP_DEVICE:
+ case BPF_PROG_TYPE_SK_MSG:
+ case BPF_PROG_TYPE_CGROUP_SOCK_ADDR:
+ return false;
+ case BPF_PROG_TYPE_UNSPEC:
+ case BPF_PROG_TYPE_KPROBE:
+ case BPF_PROG_TYPE_TRACEPOINT:
+ case BPF_PROG_TYPE_PERF_EVENT:
+ case BPF_PROG_TYPE_RAW_TRACEPOINT:
+ default:
+ return true;
+ }
+}
+
+static int bpf_object__validate(struct bpf_object *obj, bool needs_kver)
+{
+ if (needs_kver && obj->kern_version == 0) {
pr_warning("%s doesn't provide kernel version\n",
obj->path);
return -LIBBPF_ERRNO__KVERSION;
@@ -1342,7 +1484,8 @@
}
static struct bpf_object *
-__bpf_object__open(const char *path, void *obj_buf, size_t obj_buf_sz)
+__bpf_object__open(const char *path, void *obj_buf, size_t obj_buf_sz,
+ bool needs_kver)
{
struct bpf_object *obj;
int err;
@@ -1360,7 +1503,7 @@
CHECK_ERR(bpf_object__check_endianness(obj), err, out);
CHECK_ERR(bpf_object__elf_collect(obj), err, out);
CHECK_ERR(bpf_object__collect_reloc(obj), err, out);
- CHECK_ERR(bpf_object__validate(obj), err, out);
+ CHECK_ERR(bpf_object__validate(obj, needs_kver), err, out);
bpf_object__elf_finish(obj);
return obj;
@@ -1377,7 +1520,7 @@
pr_debug("loading %s\n", path);
- return __bpf_object__open(path, NULL, 0);
+ return __bpf_object__open(path, NULL, 0, true);
}
struct bpf_object *bpf_object__open_buffer(void *obj_buf,
@@ -1400,7 +1543,7 @@
pr_debug("loading object '%s' from buffer\n",
name);
- return __bpf_object__open(name, obj_buf, obj_buf_sz);
+ return __bpf_object__open(name, obj_buf, obj_buf_sz, true);
}
int bpf_object__unload(struct bpf_object *obj)
@@ -1641,6 +1784,7 @@
bpf_object__elf_finish(obj);
bpf_object__unload(obj);
+ btf__free(obj->btf);
for (i = 0; i < obj->nr_maps; i++) {
zfree(&obj->maps[i].name);
@@ -1692,6 +1836,11 @@
return obj ? obj->kern_version : 0;
}
+int bpf_object__btf_fd(const struct bpf_object *obj)
+{
+ return obj->btf ? btf__fd(obj->btf) : -1;
+}
+
int bpf_object__set_priv(struct bpf_object *obj, void *priv,
bpf_object_clear_priv_t clear_priv)
{
@@ -1845,11 +1994,12 @@
BPF_PROG_TYPE_FNS(sched_cls, BPF_PROG_TYPE_SCHED_CLS);
BPF_PROG_TYPE_FNS(sched_act, BPF_PROG_TYPE_SCHED_ACT);
BPF_PROG_TYPE_FNS(tracepoint, BPF_PROG_TYPE_TRACEPOINT);
+BPF_PROG_TYPE_FNS(raw_tracepoint, BPF_PROG_TYPE_RAW_TRACEPOINT);
BPF_PROG_TYPE_FNS(xdp, BPF_PROG_TYPE_XDP);
BPF_PROG_TYPE_FNS(perf_event, BPF_PROG_TYPE_PERF_EVENT);
-static void bpf_program__set_expected_attach_type(struct bpf_program *prog,
- enum bpf_attach_type type)
+void bpf_program__set_expected_attach_type(struct bpf_program *prog,
+ enum bpf_attach_type type)
{
prog->expected_attach_type = type;
}
@@ -1859,6 +2009,9 @@
#define BPF_PROG_SEC(string, ptype) BPF_PROG_SEC_FULL(string, ptype, 0)
+#define BPF_S_PROG_SEC(string, ptype) \
+ BPF_PROG_SEC_FULL(string, BPF_PROG_TYPE_CGROUP_SOCK, ptype)
+
#define BPF_SA_PROG_SEC(string, ptype) \
BPF_PROG_SEC_FULL(string, BPF_PROG_TYPE_CGROUP_SOCK_ADDR, ptype)
@@ -1874,6 +2027,7 @@
BPF_PROG_SEC("classifier", BPF_PROG_TYPE_SCHED_CLS),
BPF_PROG_SEC("action", BPF_PROG_TYPE_SCHED_ACT),
BPF_PROG_SEC("tracepoint/", BPF_PROG_TYPE_TRACEPOINT),
+ BPF_PROG_SEC("raw_tracepoint/", BPF_PROG_TYPE_RAW_TRACEPOINT),
BPF_PROG_SEC("xdp", BPF_PROG_TYPE_XDP),
BPF_PROG_SEC("perf_event", BPF_PROG_TYPE_PERF_EVENT),
BPF_PROG_SEC("cgroup/skb", BPF_PROG_TYPE_CGROUP_SKB),
@@ -1889,10 +2043,13 @@
BPF_SA_PROG_SEC("cgroup/bind6", BPF_CGROUP_INET6_BIND),
BPF_SA_PROG_SEC("cgroup/connect4", BPF_CGROUP_INET4_CONNECT),
BPF_SA_PROG_SEC("cgroup/connect6", BPF_CGROUP_INET6_CONNECT),
+ BPF_S_PROG_SEC("cgroup/post_bind4", BPF_CGROUP_INET4_POST_BIND),
+ BPF_S_PROG_SEC("cgroup/post_bind6", BPF_CGROUP_INET6_POST_BIND),
};
#undef BPF_PROG_SEC
#undef BPF_PROG_SEC_FULL
+#undef BPF_S_PROG_SEC
#undef BPF_SA_PROG_SEC
static int bpf_program__identify_section(struct bpf_program *prog)
@@ -1929,6 +2086,16 @@
return map ? map->name : NULL;
}
+uint32_t bpf_map__btf_key_type_id(const struct bpf_map *map)
+{
+ return map ? map->btf_key_type_id : 0;
+}
+
+uint32_t bpf_map__btf_value_type_id(const struct bpf_map *map)
+{
+ return map ? map->btf_value_type_id : 0;
+}
+
int bpf_map__set_priv(struct bpf_map *map, void *priv,
bpf_map_clear_priv_t clear_priv)
{
@@ -2028,13 +2195,17 @@
enum bpf_attach_type expected_attach_type;
enum bpf_prog_type prog_type;
struct bpf_object *obj;
+ struct bpf_map *map;
int section_idx;
int err;
if (!attr)
return -EINVAL;
+ if (!attr->file)
+ return -EINVAL;
- obj = bpf_object__open(attr->file);
+ obj = __bpf_object__open(attr->file, NULL, 0,
+ bpf_prog_type__needs_kver(attr->prog_type));
if (IS_ERR_OR_NULL(obj))
return -ENOENT;
@@ -2044,6 +2215,7 @@
* section name.
*/
prog_type = attr->prog_type;
+ prog->prog_ifindex = attr->ifindex;
expected_attach_type = attr->expected_attach_type;
if (prog_type == BPF_PROG_TYPE_UNSPEC) {
section_idx = bpf_program__identify_section(prog);
@@ -2064,6 +2236,10 @@
first_prog = prog;
}
+ bpf_map__for_each(map, obj) {
+ map->map_ifindex = attr->ifindex;
+ }
+
if (!first_prog) {
pr_warning("object file doesn't contain bpf program\n");
bpf_object__close(obj);
@@ -2080,3 +2256,63 @@
*prog_fd = bpf_program__fd(first_prog);
return 0;
}
+
+enum bpf_perf_event_ret
+bpf_perf_event_read_simple(void *mem, unsigned long size,
+ unsigned long page_size, void **buf, size_t *buf_len,
+ bpf_perf_event_print_t fn, void *priv)
+{
+ volatile struct perf_event_mmap_page *header = mem;
+ __u64 data_tail = header->data_tail;
+ __u64 data_head = header->data_head;
+ void *base, *begin, *end;
+ int ret;
+
+ asm volatile("" ::: "memory"); /* in real code it should be smp_rmb() */
+ if (data_head == data_tail)
+ return LIBBPF_PERF_EVENT_CONT;
+
+ base = ((char *)header) + page_size;
+
+ begin = base + data_tail % size;
+ end = base + data_head % size;
+
+ while (begin != end) {
+ struct perf_event_header *ehdr;
+
+ ehdr = begin;
+ if (begin + ehdr->size > base + size) {
+ long len = base + size - begin;
+
+ if (*buf_len < ehdr->size) {
+ free(*buf);
+ *buf = malloc(ehdr->size);
+ if (!*buf) {
+ ret = LIBBPF_PERF_EVENT_ERROR;
+ break;
+ }
+ *buf_len = ehdr->size;
+ }
+
+ memcpy(*buf, begin, len);
+ memcpy(*buf + len, base, ehdr->size - len);
+ ehdr = (void *)*buf;
+ begin = base + ehdr->size - len;
+ } else if (begin + ehdr->size == base + size) {
+ begin = base;
+ } else {
+ begin += ehdr->size;
+ }
+
+ ret = fn(ehdr, priv);
+ if (ret != LIBBPF_PERF_EVENT_CONT)
+ break;
+
+ data_tail += ehdr->size;
+ }
+
+ __sync_synchronize(); /* smp_mb() */
+ header->data_tail = data_tail;
+
+ return ret;
+}
diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h
index a3a62a5..0997653 100644
--- a/tools/lib/bpf/libbpf.h
+++ b/tools/lib/bpf/libbpf.h
@@ -52,8 +52,8 @@
int libbpf_strerror(int err, char *buf, size_t size);
/*
- * In include/linux/compiler-gcc.h, __printf is defined. However
- * it should be better if libbpf.h doesn't depend on Linux header file.
+ * __printf is defined in include/linux/compiler-gcc.h. However,
+ * it would be better if libbpf.h didn't depend on Linux header files.
* So instead of __printf, here we use gcc attribute directly.
*/
typedef int (*libbpf_print_fn_t)(const char *, ...)
@@ -78,6 +78,7 @@
int bpf_object__unload(struct bpf_object *obj);
const char *bpf_object__name(struct bpf_object *obj);
unsigned int bpf_object__kversion(struct bpf_object *obj);
+int bpf_object__btf_fd(const struct bpf_object *obj);
struct bpf_object *bpf_object__next(struct bpf_object *prev);
#define bpf_object__for_each_safe(pos, tmp) \
@@ -91,7 +92,7 @@
bpf_object_clear_priv_t clear_priv);
void *bpf_object__priv(struct bpf_object *prog);
-/* Accessors of bpf_program. */
+/* Accessors of bpf_program */
struct bpf_program;
struct bpf_program *bpf_program__next(struct bpf_program *prog,
struct bpf_object *obj);
@@ -120,28 +121,28 @@
/*
* Libbpf allows callers to adjust BPF programs before being loaded
- * into kernel. One program in an object file can be transform into
- * multiple variants to be attached to different code.
+ * into kernel. One program in an object file can be transformed into
+ * multiple variants to be attached to different hooks.
*
* bpf_program_prep_t, bpf_program__set_prep and bpf_program__nth_fd
- * are APIs for this propose.
+ * form an API for this purpose.
*
* - bpf_program_prep_t:
- * It defines 'preprocessor', which is a caller defined function
+ * Defines a 'preprocessor', which is a caller defined function
* passed to libbpf through bpf_program__set_prep(), and will be
* called before program is loaded. The processor should adjust
- * the program one time for each instances according to the number
+ * the program one time for each instance according to the instance id
* passed to it.
*
* - bpf_program__set_prep:
- * Attachs a preprocessor to a BPF program. The number of instances
- * whould be created is also passed through this function.
+ * Attaches a preprocessor to a BPF program. The number of instances
+ * that should be created is also passed through this function.
*
* - bpf_program__nth_fd:
- * After the program is loaded, get resuling fds from bpf program for
- * each instances.
+ * After the program is loaded, get resulting FD of a given instance
+ * of the BPF program.
*
- * If bpf_program__set_prep() is not used, the program whould be loaded
+ * If bpf_program__set_prep() is not used, the program would be loaded
* without adjustment during bpf_object__load(). The program has only
* one instance. In this case bpf_program__fd(prog) is equal to
* bpf_program__nth_fd(prog, 0).
@@ -155,7 +156,7 @@
struct bpf_insn *new_insn_ptr;
int new_insn_cnt;
- /* If not NULL, result fd is set to it */
+ /* If not NULL, result FD is written to it. */
int *pfd;
};
@@ -168,8 +169,8 @@
* - res: Output parameter, result of transformation.
*
* Return value:
- * - Zero: pre-processing success.
- * - Non-zero: pre-processing, stop loading.
+ * - Zero: pre-processing success.
+ * - Non-zero: pre-processing error, stop loading.
*/
typedef int (*bpf_program_prep_t)(struct bpf_program *prog, int n,
struct bpf_insn *insns, int insns_cnt,
@@ -181,19 +182,23 @@
int bpf_program__nth_fd(struct bpf_program *prog, int n);
/*
- * Adjust type of bpf program. Default is kprobe.
+ * Adjust type of BPF program. Default is kprobe.
*/
int bpf_program__set_socket_filter(struct bpf_program *prog);
int bpf_program__set_tracepoint(struct bpf_program *prog);
+int bpf_program__set_raw_tracepoint(struct bpf_program *prog);
int bpf_program__set_kprobe(struct bpf_program *prog);
int bpf_program__set_sched_cls(struct bpf_program *prog);
int bpf_program__set_sched_act(struct bpf_program *prog);
int bpf_program__set_xdp(struct bpf_program *prog);
int bpf_program__set_perf_event(struct bpf_program *prog);
void bpf_program__set_type(struct bpf_program *prog, enum bpf_prog_type type);
+void bpf_program__set_expected_attach_type(struct bpf_program *prog,
+ enum bpf_attach_type type);
bool bpf_program__is_socket_filter(struct bpf_program *prog);
bool bpf_program__is_tracepoint(struct bpf_program *prog);
+bool bpf_program__is_raw_tracepoint(struct bpf_program *prog);
bool bpf_program__is_kprobe(struct bpf_program *prog);
bool bpf_program__is_sched_cls(struct bpf_program *prog);
bool bpf_program__is_sched_act(struct bpf_program *prog);
@@ -201,10 +206,10 @@
bool bpf_program__is_perf_event(struct bpf_program *prog);
/*
- * We don't need __attribute__((packed)) now since it is
- * unnecessary for 'bpf_map_def' because they are all aligned.
- * In addition, using it will trigger -Wpacked warning message,
- * and will be treated as an error due to -Werror.
+ * No need for __attribute__((packed)), all members of 'bpf_map_def'
+ * are all aligned. In addition, using __attribute__((packed))
+ * would trigger a -Wpacked warning message, and lead to an error
+ * if -Werror is set.
*/
struct bpf_map_def {
unsigned int type;
@@ -215,8 +220,8 @@
};
/*
- * There is another 'struct bpf_map' in include/linux/map.h. However,
- * it is not a uapi header so no need to consider name clash.
+ * The 'struct bpf_map' in include/linux/bpf.h is internal to the kernel,
+ * so no need to worry about a name clash.
*/
struct bpf_map;
struct bpf_map *
@@ -224,7 +229,7 @@
/*
* Get bpf_map through the offset of corresponding struct bpf_map_def
- * in the bpf object file.
+ * in the BPF object file.
*/
struct bpf_map *
bpf_object__find_map_by_offset(struct bpf_object *obj, size_t offset);
@@ -239,6 +244,8 @@
int bpf_map__fd(struct bpf_map *map);
const struct bpf_map_def *bpf_map__def(struct bpf_map *map);
const char *bpf_map__name(struct bpf_map *map);
+uint32_t bpf_map__btf_key_type_id(const struct bpf_map *map);
+uint32_t bpf_map__btf_value_type_id(const struct bpf_map *map);
typedef void (*bpf_map_clear_priv_t)(struct bpf_map *, void *);
int bpf_map__set_priv(struct bpf_map *map, void *priv,
@@ -252,6 +259,7 @@
const char *file;
enum bpf_prog_type prog_type;
enum bpf_attach_type expected_attach_type;
+ int ifindex;
};
int bpf_prog_load_xattr(const struct bpf_prog_load_attr *attr,
@@ -260,4 +268,17 @@
struct bpf_object **pobj, int *prog_fd);
int bpf_set_link_xdp_fd(int ifindex, int fd, __u32 flags);
+
+enum bpf_perf_event_ret {
+ LIBBPF_PERF_EVENT_DONE = 0,
+ LIBBPF_PERF_EVENT_ERROR = -1,
+ LIBBPF_PERF_EVENT_CONT = -2,
+};
+
+typedef enum bpf_perf_event_ret (*bpf_perf_event_print_t)(void *event,
+ void *priv);
+int bpf_perf_event_read_simple(void *mem, unsigned long size,
+ unsigned long page_size,
+ void **buf, size_t *buf_len,
+ bpf_perf_event_print_t fn, void *priv);
#endif
diff --git a/tools/testing/selftests/bpf/.gitignore b/tools/testing/selftests/bpf/.gitignore
index 5e1ab2f..adc8e54 100644
--- a/tools/testing/selftests/bpf/.gitignore
+++ b/tools/testing/selftests/bpf/.gitignore
@@ -15,3 +15,5 @@
test_sock
test_sock_addr
urandom_read
+test_btf
+test_sockmap
diff --git a/tools/testing/selftests/bpf/Makefile b/tools/testing/selftests/bpf/Makefile
index 0a315dd..8504444 100644
--- a/tools/testing/selftests/bpf/Makefile
+++ b/tools/testing/selftests/bpf/Makefile
@@ -10,7 +10,7 @@
GENFLAGS := -DHAVE_GENHDR
endif
-CFLAGS += -Wall -O2 -I$(APIDIR) -I$(LIBDIR) -I$(GENDIR) $(GENFLAGS) -I../../../include
+CFLAGS += -Wall -O2 -I$(APIDIR) -I$(LIBDIR) -I$(BPFDIR) -I$(GENDIR) $(GENFLAGS) -I../../../include
LDLIBS += -lcap -lelf -lrt -lpthread
TEST_CUSTOM_PROGS = $(OUTPUT)/urandom_read
@@ -19,19 +19,22 @@
$(TEST_CUSTOM_PROGS): urandom_read
urandom_read: urandom_read.c
- $(CC) -o $(TEST_CUSTOM_PROGS) -static $<
+ $(CC) -o $(TEST_CUSTOM_PROGS) -static $< -Wl,--build-id
# Order correspond to 'make run_tests' order
TEST_GEN_PROGS = test_verifier test_tag test_maps test_lru_map test_lpm_map test_progs \
test_align test_verifier_log test_dev_cgroup test_tcpbpf_user \
- test_sock test_sock_addr
+ test_sock test_btf test_sockmap
TEST_GEN_FILES = test_pkt_access.o test_xdp.o test_l4lb.o test_tcp_estats.o test_obj_id.o \
test_pkt_md_access.o test_xdp_redirect.o test_xdp_meta.o sockmap_parse_prog.o \
sockmap_verdict_prog.o dev_cgroup.o sample_ret0.o test_tracepoint.o \
test_l4lb_noinline.o test_xdp_noinline.o test_stacktrace_map.o \
sample_map_ret0.o test_tcpbpf_kern.o test_stacktrace_build_id.o \
- sockmap_tcp_msg_prog.o connect4_prog.o connect6_prog.o
+ sockmap_tcp_msg_prog.o connect4_prog.o connect6_prog.o test_adjust_tail.o \
+ test_btf_haskv.o test_btf_nokv.o test_sockmap_kern.o test_tunnel_kern.o \
+ test_get_stack_rawtp.o test_sockmap_kern.o test_sockhash_kern.o \
+ test_lwt_seg6local.o
# Order correspond to 'make run_tests' order
TEST_PROGS := test_kmod.sh \
@@ -39,10 +42,12 @@
test_xdp_redirect.sh \
test_xdp_meta.sh \
test_offload.py \
- test_sock_addr.sh
+ test_sock_addr.sh \
+ test_tunnel.sh \
+ test_lwt_seg6local.sh
# Compile but not part of 'make run_tests'
-TEST_GEN_PROGS_EXTENDED = test_libbpf_open
+TEST_GEN_PROGS_EXTENDED = test_libbpf_open test_sock_addr
include ../lib.mk
@@ -55,6 +60,8 @@
$(OUTPUT)/test_dev_cgroup: cgroup_helpers.c
$(OUTPUT)/test_sock: cgroup_helpers.c
$(OUTPUT)/test_sock_addr: cgroup_helpers.c
+$(OUTPUT)/test_sockmap: cgroup_helpers.c
+$(OUTPUT)/test_progs: trace_helpers.c
.PHONY: force
@@ -66,6 +73,8 @@
CLANG ?= clang
LLC ?= llc
+LLVM_OBJCOPY ?= llvm-objcopy
+BTF_PAHOLE ?= pahole
PROBE := $(shell $(LLC) -march=bpf -mcpu=probe -filetype=null /dev/null 2>&1)
@@ -77,15 +86,42 @@
CPU ?= generic
endif
+# Get Clang's default includes on this system, as opposed to those seen by
+# '-target bpf'. This fixes "missing" files on some architectures/distros,
+# such as asm/byteorder.h, asm/socket.h, asm/sockios.h, sys/cdefs.h etc.
+#
+# Use '-idirafter': Don't interfere with include mechanics except where the
+# build would have failed anyways.
+CLANG_SYS_INCLUDES := $(shell $(CLANG) -v -E - </dev/null 2>&1 \
+ | sed -n '/<...> search starts here:/,/End of search list./{ s| \(/.*\)|-idirafter \1|p }')
+
CLANG_FLAGS = -I. -I./include/uapi -I../../../include/uapi \
+ $(CLANG_SYS_INCLUDES) \
-Wno-compare-distinct-pointer-types
$(OUTPUT)/test_l4lb_noinline.o: CLANG_FLAGS += -fno-inline
$(OUTPUT)/test_xdp_noinline.o: CLANG_FLAGS += -fno-inline
+BTF_LLC_PROBE := $(shell $(LLC) -march=bpf -mattr=help 2>&1 | grep dwarfris)
+BTF_PAHOLE_PROBE := $(shell $(BTF_PAHOLE) --help 2>&1 | grep BTF)
+BTF_OBJCOPY_PROBE := $(shell $(LLVM_OBJCOPY) --version 2>&1 | grep LLVM)
+
+ifneq ($(BTF_LLC_PROBE),)
+ifneq ($(BTF_PAHOLE_PROBE),)
+ifneq ($(BTF_OBJCOPY_PROBE),)
+ CLANG_FLAGS += -g
+ LLC_FLAGS += -mattr=dwarfris
+ DWARF2BTF = y
+endif
+endif
+endif
+
$(OUTPUT)/%.o: %.c
$(CLANG) $(CLANG_FLAGS) \
-O2 -target bpf -emit-llvm -c $< -o - | \
- $(LLC) -march=bpf -mcpu=$(CPU) -filetype=obj -o $@
+ $(LLC) -march=bpf -mcpu=$(CPU) $(LLC_FLAGS) -filetype=obj -o $@
+ifeq ($(DWARF2BTF),y)
+ $(BTF_PAHOLE) -J $@
+endif
EXTRA_CLEAN := $(TEST_CUSTOM_PROGS)
diff --git a/tools/testing/selftests/bpf/bpf_helpers.h b/tools/testing/selftests/bpf/bpf_helpers.h
index d8223d9..334d3e8 100644
--- a/tools/testing/selftests/bpf/bpf_helpers.h
+++ b/tools/testing/selftests/bpf/bpf_helpers.h
@@ -75,9 +75,14 @@
(void *) BPF_FUNC_sock_ops_cb_flags_set;
static int (*bpf_sk_redirect_map)(void *ctx, void *map, int key, int flags) =
(void *) BPF_FUNC_sk_redirect_map;
+static int (*bpf_sk_redirect_hash)(void *ctx, void *map, void *key, int flags) =
+ (void *) BPF_FUNC_sk_redirect_hash;
static int (*bpf_sock_map_update)(void *map, void *key, void *value,
unsigned long long flags) =
(void *) BPF_FUNC_sock_map_update;
+static int (*bpf_sock_hash_update)(void *map, void *key, void *value,
+ unsigned long long flags) =
+ (void *) BPF_FUNC_sock_hash_update;
static int (*bpf_perf_event_read_value)(void *map, unsigned long long flags,
void *buf, unsigned int buf_size) =
(void *) BPF_FUNC_perf_event_read_value;
@@ -88,6 +93,9 @@
(void *) BPF_FUNC_override_return;
static int (*bpf_msg_redirect_map)(void *ctx, void *map, int key, int flags) =
(void *) BPF_FUNC_msg_redirect_map;
+static int (*bpf_msg_redirect_hash)(void *ctx,
+ void *map, void *key, int flags) =
+ (void *) BPF_FUNC_msg_redirect_hash;
static int (*bpf_msg_apply_bytes)(void *ctx, int len) =
(void *) BPF_FUNC_msg_apply_bytes;
static int (*bpf_msg_cork_bytes)(void *ctx, int len) =
@@ -96,6 +104,28 @@
(void *) BPF_FUNC_msg_pull_data;
static int (*bpf_bind)(void *ctx, void *addr, int addr_len) =
(void *) BPF_FUNC_bind;
+static int (*bpf_xdp_adjust_tail)(void *ctx, int offset) =
+ (void *) BPF_FUNC_xdp_adjust_tail;
+static int (*bpf_skb_get_xfrm_state)(void *ctx, int index, void *state,
+ int size, int flags) =
+ (void *) BPF_FUNC_skb_get_xfrm_state;
+static int (*bpf_get_stack)(void *ctx, void *buf, int size, int flags) =
+ (void *) BPF_FUNC_get_stack;
+static int (*bpf_fib_lookup)(void *ctx, struct bpf_fib_lookup *params,
+ int plen, __u32 flags) =
+ (void *) BPF_FUNC_fib_lookup;
+static int (*bpf_lwt_push_encap)(void *ctx, unsigned int type, void *hdr,
+ unsigned int len) =
+ (void *) BPF_FUNC_lwt_push_encap;
+static int (*bpf_lwt_seg6_store_bytes)(void *ctx, unsigned int offset,
+ void *from, unsigned int len) =
+ (void *) BPF_FUNC_lwt_seg6_store_bytes;
+static int (*bpf_lwt_seg6_action)(void *ctx, unsigned int action, void *param,
+ unsigned int param_len) =
+ (void *) BPF_FUNC_lwt_seg6_action;
+static int (*bpf_lwt_seg6_adjust_srh)(void *ctx, unsigned int offset,
+ unsigned int len) =
+ (void *) BPF_FUNC_lwt_seg6_adjust_srh;
/* llvm builtin functions that eBPF C program may use to
* emit BPF_LD_ABS and BPF_LD_IND instructions
@@ -129,6 +159,8 @@
(void *) BPF_FUNC_l3_csum_replace;
static int (*bpf_l4_csum_replace)(void *ctx, int off, int from, int to, int flags) =
(void *) BPF_FUNC_l4_csum_replace;
+static int (*bpf_csum_diff)(void *from, int from_size, void *to, int to_size, int seed) =
+ (void *) BPF_FUNC_csum_diff;
static int (*bpf_skb_under_cgroup)(void *ctx, void *map, int index) =
(void *) BPF_FUNC_skb_under_cgroup;
static int (*bpf_skb_change_head)(void *, int len, int flags) =
diff --git a/tools/testing/selftests/bpf/bpf_rand.h b/tools/testing/selftests/bpf/bpf_rand.h
new file mode 100644
index 0000000..59bf3e1
--- /dev/null
+++ b/tools/testing/selftests/bpf/bpf_rand.h
@@ -0,0 +1,80 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef __BPF_RAND__
+#define __BPF_RAND__
+
+#include <stdint.h>
+#include <stdlib.h>
+#include <time.h>
+
+static inline uint64_t bpf_rand_mask(uint64_t mask)
+{
+ return (((uint64_t)(uint32_t)rand()) |
+ ((uint64_t)(uint32_t)rand() << 32)) & mask;
+}
+
+#define bpf_rand_ux(x, m) \
+static inline uint64_t bpf_rand_u##x(int shift) \
+{ \
+ return bpf_rand_mask((m)) << shift; \
+}
+
+bpf_rand_ux( 8, 0xffULL)
+bpf_rand_ux(16, 0xffffULL)
+bpf_rand_ux(24, 0xffffffULL)
+bpf_rand_ux(32, 0xffffffffULL)
+bpf_rand_ux(40, 0xffffffffffULL)
+bpf_rand_ux(48, 0xffffffffffffULL)
+bpf_rand_ux(56, 0xffffffffffffffULL)
+bpf_rand_ux(64, 0xffffffffffffffffULL)
+
+static inline void bpf_semi_rand_init(void)
+{
+ srand(time(NULL));
+}
+
+static inline uint64_t bpf_semi_rand_get(void)
+{
+ switch (rand() % 39) {
+ case 0: return 0x000000ff00000000ULL | bpf_rand_u8(0);
+ case 1: return 0xffffffff00000000ULL | bpf_rand_u16(0);
+ case 2: return 0x00000000ffff0000ULL | bpf_rand_u16(0);
+ case 3: return 0x8000000000000000ULL | bpf_rand_u32(0);
+ case 4: return 0x00000000f0000000ULL | bpf_rand_u32(0);
+ case 5: return 0x0000000100000000ULL | bpf_rand_u24(0);
+ case 6: return 0x800ff00000000000ULL | bpf_rand_u32(0);
+ case 7: return 0x7fffffff00000000ULL | bpf_rand_u32(0);
+ case 8: return 0xffffffffffffff00ULL ^ bpf_rand_u32(24);
+ case 9: return 0xffffffffffffff00ULL | bpf_rand_u8(0);
+ case 10: return 0x0000000010000000ULL | bpf_rand_u32(0);
+ case 11: return 0xf000000000000000ULL | bpf_rand_u8(0);
+ case 12: return 0x0000f00000000000ULL | bpf_rand_u8(8);
+ case 13: return 0x000000000f000000ULL | bpf_rand_u8(16);
+ case 14: return 0x0000000000000f00ULL | bpf_rand_u8(32);
+ case 15: return 0x00fff00000000f00ULL | bpf_rand_u8(48);
+ case 16: return 0x00007fffffffffffULL ^ bpf_rand_u32(1);
+ case 17: return 0xffff800000000000ULL | bpf_rand_u8(4);
+ case 18: return 0xffff800000000000ULL | bpf_rand_u8(20);
+ case 19: return (0xffffffc000000000ULL + 0x80000ULL) | bpf_rand_u32(0);
+ case 20: return (0xffffffc000000000ULL - 0x04000000ULL) | bpf_rand_u32(0);
+ case 21: return 0x0000000000000000ULL | bpf_rand_u8(55) | bpf_rand_u32(20);
+ case 22: return 0xffffffffffffffffULL ^ bpf_rand_u8(3) ^ bpf_rand_u32(40);
+ case 23: return 0x0000000000000000ULL | bpf_rand_u8(bpf_rand_u8(0) % 64);
+ case 24: return 0x0000000000000000ULL | bpf_rand_u16(bpf_rand_u8(0) % 64);
+ case 25: return 0xffffffffffffffffULL ^ bpf_rand_u8(bpf_rand_u8(0) % 64);
+ case 26: return 0xffffffffffffffffULL ^ bpf_rand_u40(bpf_rand_u8(0) % 64);
+ case 27: return 0x0000800000000000ULL;
+ case 28: return 0x8000000000000000ULL;
+ case 29: return 0x0000000000000000ULL;
+ case 30: return 0xffffffffffffffffULL;
+ case 31: return bpf_rand_u16(bpf_rand_u8(0) % 64);
+ case 32: return bpf_rand_u24(bpf_rand_u8(0) % 64);
+ case 33: return bpf_rand_u32(bpf_rand_u8(0) % 64);
+ case 34: return bpf_rand_u40(bpf_rand_u8(0) % 64);
+ case 35: return bpf_rand_u48(bpf_rand_u8(0) % 64);
+ case 36: return bpf_rand_u56(bpf_rand_u8(0) % 64);
+ case 37: return bpf_rand_u64(bpf_rand_u8(0) % 64);
+ default: return bpf_rand_u64(0);
+ }
+}
+
+#endif /* __BPF_RAND__ */
diff --git a/tools/testing/selftests/bpf/test_adjust_tail.c b/tools/testing/selftests/bpf/test_adjust_tail.c
new file mode 100644
index 0000000..4cd5e86
--- /dev/null
+++ b/tools/testing/selftests/bpf/test_adjust_tail.c
@@ -0,0 +1,30 @@
+/* SPDX-License-Identifier: GPL-2.0
+ * Copyright (c) 2018 Facebook
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of version 2 of the GNU General Public
+ * License as published by the Free Software Foundation.
+ */
+#include <linux/bpf.h>
+#include <linux/if_ether.h>
+#include "bpf_helpers.h"
+
+int _version SEC("version") = 1;
+
+SEC("xdp_adjust_tail")
+int _xdp_adjust_tail(struct xdp_md *xdp)
+{
+ void *data_end = (void *)(long)xdp->data_end;
+ void *data = (void *)(long)xdp->data;
+ int offset = 0;
+
+ if (data_end - data == 54)
+ offset = 256;
+ else
+ offset = 20;
+ if (bpf_xdp_adjust_tail(xdp, 0 - offset))
+ return XDP_DROP;
+ return XDP_TX;
+}
+
+char _license[] SEC("license") = "GPL";
diff --git a/tools/testing/selftests/bpf/test_btf.c b/tools/testing/selftests/bpf/test_btf.c
new file mode 100644
index 0000000..35064df
--- /dev/null
+++ b/tools/testing/selftests/bpf/test_btf.c
@@ -0,0 +1,2270 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/* Copyright (c) 2018 Facebook */
+
+#include <linux/bpf.h>
+#include <linux/btf.h>
+#include <linux/err.h>
+#include <bpf/bpf.h>
+#include <sys/resource.h>
+#include <libelf.h>
+#include <gelf.h>
+#include <string.h>
+#include <stdlib.h>
+#include <stdio.h>
+#include <stdarg.h>
+#include <unistd.h>
+#include <fcntl.h>
+#include <errno.h>
+#include <bpf/libbpf.h>
+#include <bpf/btf.h>
+
+#include "bpf_rlimit.h"
+
+static uint32_t pass_cnt;
+static uint32_t error_cnt;
+static uint32_t skip_cnt;
+
+#define CHECK(condition, format...) ({ \
+ int __ret = !!(condition); \
+ if (__ret) { \
+ fprintf(stderr, "%s:%d:FAIL ", __func__, __LINE__); \
+ fprintf(stderr, format); \
+ } \
+ __ret; \
+})
+
+static int count_result(int err)
+{
+ if (err)
+ error_cnt++;
+ else
+ pass_cnt++;
+
+ fprintf(stderr, "\n");
+ return err;
+}
+
+#define min(a, b) ((a) < (b) ? (a) : (b))
+#define __printf(a, b) __attribute__((format(printf, a, b)))
+
+__printf(1, 2)
+static int __base_pr(const char *format, ...)
+{
+ va_list args;
+ int err;
+
+ va_start(args, format);
+ err = vfprintf(stderr, format, args);
+ va_end(args);
+ return err;
+}
+
+#define BTF_INFO_ENC(kind, root, vlen) \
+ ((!!(root) << 31) | ((kind) << 24) | ((vlen) & BTF_MAX_VLEN))
+
+#define BTF_TYPE_ENC(name, info, size_or_type) \
+ (name), (info), (size_or_type)
+
+#define BTF_INT_ENC(encoding, bits_offset, nr_bits) \
+ ((encoding) << 24 | (bits_offset) << 16 | (nr_bits))
+#define BTF_TYPE_INT_ENC(name, encoding, bits_offset, bits, sz) \
+ BTF_TYPE_ENC(name, BTF_INFO_ENC(BTF_KIND_INT, 0, 0), sz), \
+ BTF_INT_ENC(encoding, bits_offset, bits)
+
+#define BTF_ARRAY_ENC(type, index_type, nr_elems) \
+ (type), (index_type), (nr_elems)
+#define BTF_TYPE_ARRAY_ENC(type, index_type, nr_elems) \
+ BTF_TYPE_ENC(0, BTF_INFO_ENC(BTF_KIND_ARRAY, 0, 0), 0), \
+ BTF_ARRAY_ENC(type, index_type, nr_elems)
+
+#define BTF_MEMBER_ENC(name, type, bits_offset) \
+ (name), (type), (bits_offset)
+#define BTF_ENUM_ENC(name, val) (name), (val)
+
+#define BTF_TYPEDEF_ENC(name, type) \
+ BTF_TYPE_ENC(name, BTF_INFO_ENC(BTF_KIND_TYPEDEF, 0, 0), type)
+
+#define BTF_PTR_ENC(name, type) \
+ BTF_TYPE_ENC(name, BTF_INFO_ENC(BTF_KIND_PTR, 0, 0), type)
+
+#define BTF_END_RAW 0xdeadbeef
+#define NAME_TBD 0xdeadb33f
+
+#define MAX_NR_RAW_TYPES 1024
+#define BTF_LOG_BUF_SIZE 65535
+
+#ifndef ARRAY_SIZE
+# define ARRAY_SIZE(x) (sizeof(x) / sizeof((x)[0]))
+#endif
+
+static struct args {
+ unsigned int raw_test_num;
+ unsigned int file_test_num;
+ unsigned int get_info_test_num;
+ bool raw_test;
+ bool file_test;
+ bool get_info_test;
+ bool pprint_test;
+ bool always_log;
+} args;
+
+static char btf_log_buf[BTF_LOG_BUF_SIZE];
+
+static struct btf_header hdr_tmpl = {
+ .magic = BTF_MAGIC,
+ .version = BTF_VERSION,
+ .hdr_len = sizeof(struct btf_header),
+};
+
+struct btf_raw_test {
+ const char *descr;
+ const char *str_sec;
+ const char *map_name;
+ const char *err_str;
+ __u32 raw_types[MAX_NR_RAW_TYPES];
+ __u32 str_sec_size;
+ enum bpf_map_type map_type;
+ __u32 key_size;
+ __u32 value_size;
+ __u32 key_type_id;
+ __u32 value_type_id;
+ __u32 max_entries;
+ bool btf_load_err;
+ bool map_create_err;
+ int hdr_len_delta;
+ int type_off_delta;
+ int str_off_delta;
+ int str_len_delta;
+};
+
+static struct btf_raw_test raw_tests[] = {
+/* enum E {
+ * E0,
+ * E1,
+ * };
+ *
+ * struct A {
+ * unsigned long long m;
+ * int n;
+ * char o;
+ * [3 bytes hole]
+ * int p[8];
+ * int q[4][8];
+ * enum E r;
+ * };
+ */
+{
+ .descr = "struct test #1",
+ .raw_types = {
+ /* int */
+ BTF_TYPE_INT_ENC(0, BTF_INT_SIGNED, 0, 32, 4), /* [1] */
+ /* unsigned long long */
+ BTF_TYPE_INT_ENC(0, 0, 0, 64, 8), /* [2] */
+ /* char */
+ BTF_TYPE_INT_ENC(0, BTF_INT_SIGNED, 0, 8, 1), /* [3] */
+ /* int[8] */
+ BTF_TYPE_ARRAY_ENC(1, 1, 8), /* [4] */
+ /* struct A { */ /* [5] */
+ BTF_TYPE_ENC(NAME_TBD, BTF_INFO_ENC(BTF_KIND_STRUCT, 0, 6), 180),
+ BTF_MEMBER_ENC(NAME_TBD, 2, 0), /* unsigned long long m;*/
+ BTF_MEMBER_ENC(NAME_TBD, 1, 64),/* int n; */
+ BTF_MEMBER_ENC(NAME_TBD, 3, 96),/* char o; */
+ BTF_MEMBER_ENC(NAME_TBD, 4, 128),/* int p[8] */
+ BTF_MEMBER_ENC(NAME_TBD, 6, 384),/* int q[4][8] */
+ BTF_MEMBER_ENC(NAME_TBD, 7, 1408), /* enum E r */
+ /* } */
+ /* int[4][8] */
+ BTF_TYPE_ARRAY_ENC(4, 1, 4), /* [6] */
+ /* enum E */ /* [7] */
+ BTF_TYPE_ENC(NAME_TBD, BTF_INFO_ENC(BTF_KIND_ENUM, 0, 2), sizeof(int)),
+ BTF_ENUM_ENC(NAME_TBD, 0),
+ BTF_ENUM_ENC(NAME_TBD, 1),
+ BTF_END_RAW,
+ },
+ .str_sec = "\0A\0m\0n\0o\0p\0q\0r\0E\0E0\0E1",
+ .str_sec_size = sizeof("\0A\0m\0n\0o\0p\0q\0r\0E\0E0\0E1"),
+ .map_type = BPF_MAP_TYPE_ARRAY,
+ .map_name = "struct_test1_map",
+ .key_size = sizeof(int),
+ .value_size = 180,
+ .key_type_id = 1,
+ .value_type_id = 5,
+ .max_entries = 4,
+},
+
+/* typedef struct b Struct_B;
+ *
+ * struct A {
+ * int m;
+ * struct b n[4];
+ * const Struct_B o[4];
+ * };
+ *
+ * struct B {
+ * int m;
+ * int n;
+ * };
+ */
+{
+ .descr = "struct test #2",
+ .raw_types = {
+ /* int */ /* [1] */
+ BTF_TYPE_INT_ENC(0, BTF_INT_SIGNED, 0, 32, 4),
+ /* struct b [4] */ /* [2] */
+ BTF_TYPE_ARRAY_ENC(4, 1, 4),
+
+ /* struct A { */ /* [3] */
+ BTF_TYPE_ENC(NAME_TBD, BTF_INFO_ENC(BTF_KIND_STRUCT, 0, 3), 68),
+ BTF_MEMBER_ENC(NAME_TBD, 1, 0), /* int m; */
+ BTF_MEMBER_ENC(NAME_TBD, 2, 32),/* struct B n[4] */
+ BTF_MEMBER_ENC(NAME_TBD, 8, 288),/* const Struct_B o[4];*/
+ /* } */
+
+ /* struct B { */ /* [4] */
+ BTF_TYPE_ENC(NAME_TBD, BTF_INFO_ENC(BTF_KIND_STRUCT, 0, 2), 8),
+ BTF_MEMBER_ENC(NAME_TBD, 1, 0), /* int m; */
+ BTF_MEMBER_ENC(NAME_TBD, 1, 32),/* int n; */
+ /* } */
+
+ /* const int */ /* [5] */
+ BTF_TYPE_ENC(0, BTF_INFO_ENC(BTF_KIND_CONST, 0, 0), 1),
+ /* typedef struct b Struct_B */ /* [6] */
+ BTF_TYPE_ENC(NAME_TBD, BTF_INFO_ENC(BTF_KIND_TYPEDEF, 0, 0), 4),
+ /* const Struct_B */ /* [7] */
+ BTF_TYPE_ENC(0, BTF_INFO_ENC(BTF_KIND_CONST, 0, 0), 6),
+ /* const Struct_B [4] */ /* [8] */
+ BTF_TYPE_ARRAY_ENC(7, 1, 4),
+ BTF_END_RAW,
+ },
+ .str_sec = "\0A\0m\0n\0o\0B\0m\0n\0Struct_B",
+ .str_sec_size = sizeof("\0A\0m\0n\0o\0B\0m\0n\0Struct_B"),
+ .map_type = BPF_MAP_TYPE_ARRAY,
+ .map_name = "struct_test2_map",
+ .key_size = sizeof(int),
+ .value_size = 68,
+ .key_type_id = 1,
+ .value_type_id = 3,
+ .max_entries = 4,
+},
+
+/* Test member exceeds the size of struct.
+ *
+ * struct A {
+ * int m;
+ * int n;
+ * };
+ */
+{
+ .descr = "size check test #1",
+ .raw_types = {
+ /* int */ /* [1] */
+ BTF_TYPE_INT_ENC(0, BTF_INT_SIGNED, 0, 32, 4),
+ /* struct A { */ /* [2] */
+ BTF_TYPE_ENC(NAME_TBD, BTF_INFO_ENC(BTF_KIND_STRUCT, 0, 2), sizeof(int) * 2 - 1),
+ BTF_MEMBER_ENC(NAME_TBD, 1, 0), /* int m; */
+ BTF_MEMBER_ENC(NAME_TBD, 1, 32),/* int n; */
+ /* } */
+ BTF_END_RAW,
+ },
+ .str_sec = "\0A\0m\0n",
+ .str_sec_size = sizeof("\0A\0m\0n"),
+ .map_type = BPF_MAP_TYPE_ARRAY,
+ .map_name = "size_check1_map",
+ .key_size = sizeof(int),
+ .value_size = 1,
+ .key_type_id = 1,
+ .value_type_id = 2,
+ .max_entries = 4,
+ .btf_load_err = true,
+ .err_str = "Member exceeds struct_size",
+},
+
+/* Test member exeeds the size of struct
+ *
+ * struct A {
+ * int m;
+ * int n[2];
+ * };
+ */
+{
+ .descr = "size check test #2",
+ .raw_types = {
+ /* int */ /* [1] */
+ BTF_TYPE_INT_ENC(0, BTF_INT_SIGNED, 0, 32, sizeof(int)),
+ /* int[2] */ /* [2] */
+ BTF_TYPE_ARRAY_ENC(1, 1, 2),
+ /* struct A { */ /* [3] */
+ BTF_TYPE_ENC(NAME_TBD, BTF_INFO_ENC(BTF_KIND_STRUCT, 0, 2), sizeof(int) * 3 - 1),
+ BTF_MEMBER_ENC(NAME_TBD, 1, 0), /* int m; */
+ BTF_MEMBER_ENC(NAME_TBD, 2, 32),/* int n[2]; */
+ /* } */
+ BTF_END_RAW,
+ },
+ .str_sec = "\0A\0m\0n",
+ .str_sec_size = sizeof("\0A\0m\0n"),
+ .map_type = BPF_MAP_TYPE_ARRAY,
+ .map_name = "size_check2_map",
+ .key_size = sizeof(int),
+ .value_size = 1,
+ .key_type_id = 1,
+ .value_type_id = 3,
+ .max_entries = 4,
+ .btf_load_err = true,
+ .err_str = "Member exceeds struct_size",
+},
+
+/* Test member exeeds the size of struct
+ *
+ * struct A {
+ * int m;
+ * void *n;
+ * };
+ */
+{
+ .descr = "size check test #3",
+ .raw_types = {
+ /* int */ /* [1] */
+ BTF_TYPE_INT_ENC(0, BTF_INT_SIGNED, 0, 32, sizeof(int)),
+ /* void* */ /* [2] */
+ BTF_TYPE_ENC(0, BTF_INFO_ENC(BTF_KIND_PTR, 0, 0), 0),
+ /* struct A { */ /* [3] */
+ BTF_TYPE_ENC(NAME_TBD, BTF_INFO_ENC(BTF_KIND_STRUCT, 0, 2), sizeof(int) + sizeof(void *) - 1),
+ BTF_MEMBER_ENC(NAME_TBD, 1, 0), /* int m; */
+ BTF_MEMBER_ENC(NAME_TBD, 2, 32),/* void *n; */
+ /* } */
+ BTF_END_RAW,
+ },
+ .str_sec = "\0A\0m\0n",
+ .str_sec_size = sizeof("\0A\0m\0n"),
+ .map_type = BPF_MAP_TYPE_ARRAY,
+ .map_name = "size_check3_map",
+ .key_size = sizeof(int),
+ .value_size = 1,
+ .key_type_id = 1,
+ .value_type_id = 3,
+ .max_entries = 4,
+ .btf_load_err = true,
+ .err_str = "Member exceeds struct_size",
+},
+
+/* Test member exceeds the size of struct
+ *
+ * enum E {
+ * E0,
+ * E1,
+ * };
+ *
+ * struct A {
+ * int m;
+ * enum E n;
+ * };
+ */
+{
+ .descr = "size check test #4",
+ .raw_types = {
+ /* int */ /* [1] */
+ BTF_TYPE_INT_ENC(0, BTF_INT_SIGNED, 0, 32, sizeof(int)),
+ /* enum E { */ /* [2] */
+ BTF_TYPE_ENC(NAME_TBD, BTF_INFO_ENC(BTF_KIND_ENUM, 0, 2), sizeof(int)),
+ BTF_ENUM_ENC(NAME_TBD, 0),
+ BTF_ENUM_ENC(NAME_TBD, 1),
+ /* } */
+ /* struct A { */ /* [3] */
+ BTF_TYPE_ENC(NAME_TBD, BTF_INFO_ENC(BTF_KIND_STRUCT, 0, 2), sizeof(int) * 2 - 1),
+ BTF_MEMBER_ENC(NAME_TBD, 1, 0), /* int m; */
+ BTF_MEMBER_ENC(NAME_TBD, 2, 32),/* enum E n; */
+ /* } */
+ BTF_END_RAW,
+ },
+ .str_sec = "\0E\0E0\0E1\0A\0m\0n",
+ .str_sec_size = sizeof("\0E\0E0\0E1\0A\0m\0n"),
+ .map_type = BPF_MAP_TYPE_ARRAY,
+ .map_name = "size_check4_map",
+ .key_size = sizeof(int),
+ .value_size = 1,
+ .key_type_id = 1,
+ .value_type_id = 3,
+ .max_entries = 4,
+ .btf_load_err = true,
+ .err_str = "Member exceeds struct_size",
+},
+
+/* typedef const void * const_void_ptr;
+ * struct A {
+ * const_void_ptr m;
+ * };
+ */
+{
+ .descr = "void test #1",
+ .raw_types = {
+ /* int */ /* [1] */
+ BTF_TYPE_INT_ENC(0, BTF_INT_SIGNED, 0, 32, 4),
+ /* const void */ /* [2] */
+ BTF_TYPE_ENC(0, BTF_INFO_ENC(BTF_KIND_CONST, 0, 0), 0),
+ /* const void* */ /* [3] */
+ BTF_TYPE_ENC(0, BTF_INFO_ENC(BTF_KIND_PTR, 0, 0), 2),
+ /* typedef const void * const_void_ptr */
+ BTF_TYPE_ENC(NAME_TBD, BTF_INFO_ENC(BTF_KIND_PTR, 0, 0), 3),
+ /* struct A { */ /* [4] */
+ BTF_TYPE_ENC(NAME_TBD, BTF_INFO_ENC(BTF_KIND_STRUCT, 0, 1), sizeof(void *)),
+ /* const_void_ptr m; */
+ BTF_MEMBER_ENC(NAME_TBD, 3, 0),
+ /* } */
+ BTF_END_RAW,
+ },
+ .str_sec = "\0const_void_ptr\0A\0m",
+ .str_sec_size = sizeof("\0const_void_ptr\0A\0m"),
+ .map_type = BPF_MAP_TYPE_ARRAY,
+ .map_name = "void_test1_map",
+ .key_size = sizeof(int),
+ .value_size = sizeof(void *),
+ .key_type_id = 1,
+ .value_type_id = 4,
+ .max_entries = 4,
+},
+
+/* struct A {
+ * const void m;
+ * };
+ */
+{
+ .descr = "void test #2",
+ .raw_types = {
+ /* int */ /* [1] */
+ BTF_TYPE_INT_ENC(0, BTF_INT_SIGNED, 0, 32, 4),
+ /* const void */ /* [2] */
+ BTF_TYPE_ENC(0, BTF_INFO_ENC(BTF_KIND_CONST, 0, 0), 0),
+ /* struct A { */ /* [3] */
+ BTF_TYPE_ENC(NAME_TBD, BTF_INFO_ENC(BTF_KIND_STRUCT, 0, 1), 8),
+ /* const void m; */
+ BTF_MEMBER_ENC(NAME_TBD, 2, 0),
+ /* } */
+ BTF_END_RAW,
+ },
+ .str_sec = "\0A\0m",
+ .str_sec_size = sizeof("\0A\0m"),
+ .map_type = BPF_MAP_TYPE_ARRAY,
+ .map_name = "void_test2_map",
+ .key_size = sizeof(int),
+ .value_size = sizeof(void *),
+ .key_type_id = 1,
+ .value_type_id = 3,
+ .max_entries = 4,
+ .btf_load_err = true,
+ .err_str = "Invalid member",
+},
+
+/* typedef const void * const_void_ptr;
+ * const_void_ptr[4]
+ */
+{
+ .descr = "void test #3",
+ .raw_types = {
+ /* int */ /* [1] */
+ BTF_TYPE_INT_ENC(0, BTF_INT_SIGNED, 0, 32, 4),
+ /* const void */ /* [2] */
+ BTF_TYPE_ENC(0, BTF_INFO_ENC(BTF_KIND_CONST, 0, 0), 0),
+ /* const void* */ /* [3] */
+ BTF_TYPE_ENC(0, BTF_INFO_ENC(BTF_KIND_PTR, 0, 0), 2),
+ /* typedef const void * const_void_ptr */ /* [4] */
+ BTF_TYPE_ENC(NAME_TBD, BTF_INFO_ENC(BTF_KIND_PTR, 0, 0), 3),
+ /* const_void_ptr[4] */ /* [5] */
+ BTF_TYPE_ARRAY_ENC(3, 1, 4),
+ BTF_END_RAW,
+ },
+ .str_sec = "\0const_void_ptr",
+ .str_sec_size = sizeof("\0const_void_ptr"),
+ .map_type = BPF_MAP_TYPE_ARRAY,
+ .map_name = "void_test3_map",
+ .key_size = sizeof(int),
+ .value_size = sizeof(void *) * 4,
+ .key_type_id = 1,
+ .value_type_id = 4,
+ .max_entries = 4,
+},
+
+/* const void[4] */
+{
+ .descr = "void test #4",
+ .raw_types = {
+ /* int */ /* [1] */
+ BTF_TYPE_INT_ENC(0, BTF_INT_SIGNED, 0, 32, 4),
+ /* const void */ /* [2] */
+ BTF_TYPE_ENC(0, BTF_INFO_ENC(BTF_KIND_CONST, 0, 0), 0),
+ /* const void[4] */ /* [3] */
+ BTF_TYPE_ARRAY_ENC(2, 1, 4),
+ BTF_END_RAW,
+ },
+ .str_sec = "\0A\0m",
+ .str_sec_size = sizeof("\0A\0m"),
+ .map_type = BPF_MAP_TYPE_ARRAY,
+ .map_name = "void_test4_map",
+ .key_size = sizeof(int),
+ .value_size = sizeof(void *) * 4,
+ .key_type_id = 1,
+ .value_type_id = 3,
+ .max_entries = 4,
+ .btf_load_err = true,
+ .err_str = "Invalid elem",
+},
+
+/* Array_A <------------------+
+ * elem_type == Array_B |
+ * | |
+ * | |
+ * Array_B <-------- + |
+ * elem_type == Array A --+
+ */
+{
+ .descr = "loop test #1",
+ .raw_types = {
+ /* int */ /* [1] */
+ BTF_TYPE_INT_ENC(0, BTF_INT_SIGNED, 0, 32, 4),
+ /* Array_A */ /* [2] */
+ BTF_TYPE_ARRAY_ENC(3, 1, 8),
+ /* Array_B */ /* [3] */
+ BTF_TYPE_ARRAY_ENC(2, 1, 8),
+ BTF_END_RAW,
+ },
+ .str_sec = "",
+ .str_sec_size = sizeof(""),
+ .map_type = BPF_MAP_TYPE_ARRAY,
+ .map_name = "loop_test1_map",
+ .key_size = sizeof(int),
+ .value_size = sizeof(sizeof(int) * 8),
+ .key_type_id = 1,
+ .value_type_id = 2,
+ .max_entries = 4,
+ .btf_load_err = true,
+ .err_str = "Loop detected",
+},
+
+/* typedef is _before_ the BTF type of Array_A and Array_B
+ *
+ * typedef Array_B int_array;
+ *
+ * Array_A <------------------+
+ * elem_type == int_array |
+ * | |
+ * | |
+ * Array_B <-------- + |
+ * elem_type == Array_A --+
+ */
+{
+ .descr = "loop test #2",
+ .raw_types = {
+ /* int */
+ BTF_TYPE_INT_ENC(0, BTF_INT_SIGNED, 0, 32, 4), /* [1] */
+ /* typedef Array_B int_array */
+ BTF_TYPEDEF_ENC(1, 4), /* [2] */
+ /* Array_A */
+ BTF_TYPE_ARRAY_ENC(2, 1, 8), /* [3] */
+ /* Array_B */
+ BTF_TYPE_ARRAY_ENC(3, 1, 8), /* [4] */
+ BTF_END_RAW,
+ },
+ .str_sec = "\0int_array\0",
+ .str_sec_size = sizeof("\0int_array"),
+ .map_type = BPF_MAP_TYPE_ARRAY,
+ .map_name = "loop_test2_map",
+ .key_size = sizeof(int),
+ .value_size = sizeof(sizeof(int) * 8),
+ .key_type_id = 1,
+ .value_type_id = 2,
+ .max_entries = 4,
+ .btf_load_err = true,
+ .err_str = "Loop detected",
+},
+
+/* Array_A <------------------+
+ * elem_type == Array_B |
+ * | |
+ * | |
+ * Array_B <-------- + |
+ * elem_type == Array_A --+
+ */
+{
+ .descr = "loop test #3",
+ .raw_types = {
+ /* int */ /* [1] */
+ BTF_TYPE_INT_ENC(0, BTF_INT_SIGNED, 0, 32, 4),
+ /* Array_A */ /* [2] */
+ BTF_TYPE_ARRAY_ENC(3, 1, 8),
+ /* Array_B */ /* [3] */
+ BTF_TYPE_ARRAY_ENC(2, 1, 8),
+ BTF_END_RAW,
+ },
+ .str_sec = "",
+ .str_sec_size = sizeof(""),
+ .map_type = BPF_MAP_TYPE_ARRAY,
+ .map_name = "loop_test3_map",
+ .key_size = sizeof(int),
+ .value_size = sizeof(sizeof(int) * 8),
+ .key_type_id = 1,
+ .value_type_id = 2,
+ .max_entries = 4,
+ .btf_load_err = true,
+ .err_str = "Loop detected",
+},
+
+/* typedef is _between_ the BTF type of Array_A and Array_B
+ *
+ * typedef Array_B int_array;
+ *
+ * Array_A <------------------+
+ * elem_type == int_array |
+ * | |
+ * | |
+ * Array_B <-------- + |
+ * elem_type == Array_A --+
+ */
+{
+ .descr = "loop test #4",
+ .raw_types = {
+ /* int */ /* [1] */
+ BTF_TYPE_INT_ENC(0, BTF_INT_SIGNED, 0, 32, 4),
+ /* Array_A */ /* [2] */
+ BTF_TYPE_ARRAY_ENC(3, 1, 8),
+ /* typedef Array_B int_array */ /* [3] */
+ BTF_TYPEDEF_ENC(NAME_TBD, 4),
+ /* Array_B */ /* [4] */
+ BTF_TYPE_ARRAY_ENC(2, 1, 8),
+ BTF_END_RAW,
+ },
+ .str_sec = "\0int_array\0",
+ .str_sec_size = sizeof("\0int_array"),
+ .map_type = BPF_MAP_TYPE_ARRAY,
+ .map_name = "loop_test4_map",
+ .key_size = sizeof(int),
+ .value_size = sizeof(sizeof(int) * 8),
+ .key_type_id = 1,
+ .value_type_id = 2,
+ .max_entries = 4,
+ .btf_load_err = true,
+ .err_str = "Loop detected",
+},
+
+/* typedef struct B Struct_B
+ *
+ * struct A {
+ * int x;
+ * Struct_B y;
+ * };
+ *
+ * struct B {
+ * int x;
+ * struct A y;
+ * };
+ */
+{
+ .descr = "loop test #5",
+ .raw_types = {
+ /* int */
+ BTF_TYPE_INT_ENC(0, BTF_INT_SIGNED, 0, 32, 4), /* [1] */
+ /* struct A */ /* [2] */
+ BTF_TYPE_ENC(NAME_TBD, BTF_INFO_ENC(BTF_KIND_STRUCT, 0, 2), 8),
+ BTF_MEMBER_ENC(NAME_TBD, 1, 0), /* int x; */
+ BTF_MEMBER_ENC(NAME_TBD, 3, 32),/* Struct_B y; */
+ /* typedef struct B Struct_B */
+ BTF_TYPEDEF_ENC(NAME_TBD, 4), /* [3] */
+ /* struct B */ /* [4] */
+ BTF_TYPE_ENC(NAME_TBD, BTF_INFO_ENC(BTF_KIND_STRUCT, 0, 2), 8),
+ BTF_MEMBER_ENC(NAME_TBD, 1, 0), /* int x; */
+ BTF_MEMBER_ENC(NAME_TBD, 2, 32),/* struct A y; */
+ BTF_END_RAW,
+ },
+ .str_sec = "\0A\0x\0y\0Struct_B\0B\0x\0y",
+ .str_sec_size = sizeof("\0A\0x\0y\0Struct_B\0B\0x\0y"),
+ .map_type = BPF_MAP_TYPE_ARRAY,
+ .map_name = "loop_test5_map",
+ .key_size = sizeof(int),
+ .value_size = 8,
+ .key_type_id = 1,
+ .value_type_id = 2,
+ .max_entries = 4,
+ .btf_load_err = true,
+ .err_str = "Loop detected",
+},
+
+/* struct A {
+ * int x;
+ * struct A array_a[4];
+ * };
+ */
+{
+ .descr = "loop test #6",
+ .raw_types = {
+ /* int */
+ BTF_TYPE_INT_ENC(0, BTF_INT_SIGNED, 0, 32, 4), /* [1] */
+ BTF_TYPE_ARRAY_ENC(3, 1, 4), /* [2] */
+ /* struct A */ /* [3] */
+ BTF_TYPE_ENC(NAME_TBD, BTF_INFO_ENC(BTF_KIND_STRUCT, 0, 2), 8),
+ BTF_MEMBER_ENC(NAME_TBD, 1, 0), /* int x; */
+ BTF_MEMBER_ENC(NAME_TBD, 2, 32),/* struct A array_a[4]; */
+ BTF_END_RAW,
+ },
+ .str_sec = "\0A\0x\0y",
+ .str_sec_size = sizeof("\0A\0x\0y"),
+ .map_type = BPF_MAP_TYPE_ARRAY,
+ .map_name = "loop_test6_map",
+ .key_size = sizeof(int),
+ .value_size = 8,
+ .key_type_id = 1,
+ .value_type_id = 2,
+ .max_entries = 4,
+ .btf_load_err = true,
+ .err_str = "Loop detected",
+},
+
+{
+ .descr = "loop test #7",
+ .raw_types = {
+ /* int */ /* [1] */
+ BTF_TYPE_INT_ENC(0, BTF_INT_SIGNED, 0, 32, 4),
+ /* struct A { */ /* [2] */
+ BTF_TYPE_ENC(NAME_TBD, BTF_INFO_ENC(BTF_KIND_STRUCT, 0, 1), sizeof(void *)),
+ /* const void *m; */
+ BTF_MEMBER_ENC(NAME_TBD, 3, 0),
+ /* CONST type_id=3 */ /* [3] */
+ BTF_TYPE_ENC(0, BTF_INFO_ENC(BTF_KIND_CONST, 0, 0), 4),
+ /* PTR type_id=2 */ /* [4] */
+ BTF_TYPE_ENC(0, BTF_INFO_ENC(BTF_KIND_PTR, 0, 0), 3),
+ BTF_END_RAW,
+ },
+ .str_sec = "\0A\0m",
+ .str_sec_size = sizeof("\0A\0m"),
+ .map_type = BPF_MAP_TYPE_ARRAY,
+ .map_name = "loop_test7_map",
+ .key_size = sizeof(int),
+ .value_size = sizeof(void *),
+ .key_type_id = 1,
+ .value_type_id = 2,
+ .max_entries = 4,
+ .btf_load_err = true,
+ .err_str = "Loop detected",
+},
+
+{
+ .descr = "loop test #8",
+ .raw_types = {
+ /* int */ /* [1] */
+ BTF_TYPE_INT_ENC(0, BTF_INT_SIGNED, 0, 32, 4),
+ /* struct A { */ /* [2] */
+ BTF_TYPE_ENC(NAME_TBD, BTF_INFO_ENC(BTF_KIND_STRUCT, 0, 1), sizeof(void *)),
+ /* const void *m; */
+ BTF_MEMBER_ENC(NAME_TBD, 4, 0),
+ /* struct B { */ /* [3] */
+ BTF_TYPE_ENC(NAME_TBD, BTF_INFO_ENC(BTF_KIND_STRUCT, 0, 1), sizeof(void *)),
+ /* const void *n; */
+ BTF_MEMBER_ENC(NAME_TBD, 6, 0),
+ /* CONST type_id=5 */ /* [4] */
+ BTF_TYPE_ENC(0, BTF_INFO_ENC(BTF_KIND_CONST, 0, 0), 5),
+ /* PTR type_id=6 */ /* [5] */
+ BTF_TYPE_ENC(0, BTF_INFO_ENC(BTF_KIND_PTR, 0, 0), 6),
+ /* CONST type_id=7 */ /* [6] */
+ BTF_TYPE_ENC(0, BTF_INFO_ENC(BTF_KIND_CONST, 0, 0), 7),
+ /* PTR type_id=4 */ /* [7] */
+ BTF_TYPE_ENC(0, BTF_INFO_ENC(BTF_KIND_PTR, 0, 0), 4),
+ BTF_END_RAW,
+ },
+ .str_sec = "\0A\0m\0B\0n",
+ .str_sec_size = sizeof("\0A\0m\0B\0n"),
+ .map_type = BPF_MAP_TYPE_ARRAY,
+ .map_name = "loop_test8_map",
+ .key_size = sizeof(int),
+ .value_size = sizeof(void *),
+ .key_type_id = 1,
+ .value_type_id = 2,
+ .max_entries = 4,
+ .btf_load_err = true,
+ .err_str = "Loop detected",
+},
+
+{
+ .descr = "string section does not end with null",
+ .raw_types = {
+ /* int */ /* [1] */
+ BTF_TYPE_INT_ENC(NAME_TBD, BTF_INT_SIGNED, 0, 32, 4),
+ BTF_END_RAW,
+ },
+ .str_sec = "\0int",
+ .str_sec_size = sizeof("\0int") - 1,
+ .map_type = BPF_MAP_TYPE_ARRAY,
+ .map_name = "hdr_test_map",
+ .key_size = sizeof(int),
+ .value_size = sizeof(int),
+ .key_type_id = 1,
+ .value_type_id = 1,
+ .max_entries = 4,
+ .btf_load_err = true,
+ .err_str = "Invalid string section",
+},
+
+{
+ .descr = "empty string section",
+ .raw_types = {
+ /* int */ /* [1] */
+ BTF_TYPE_INT_ENC(0, BTF_INT_SIGNED, 0, 32, 4),
+ BTF_END_RAW,
+ },
+ .str_sec = "",
+ .str_sec_size = 0,
+ .map_type = BPF_MAP_TYPE_ARRAY,
+ .map_name = "hdr_test_map",
+ .key_size = sizeof(int),
+ .value_size = sizeof(int),
+ .key_type_id = 1,
+ .value_type_id = 1,
+ .max_entries = 4,
+ .btf_load_err = true,
+ .err_str = "Invalid string section",
+},
+
+{
+ .descr = "empty type section",
+ .raw_types = {
+ BTF_END_RAW,
+ },
+ .str_sec = "\0int",
+ .str_sec_size = sizeof("\0int"),
+ .map_type = BPF_MAP_TYPE_ARRAY,
+ .map_name = "hdr_test_map",
+ .key_size = sizeof(int),
+ .value_size = sizeof(int),
+ .key_type_id = 1,
+ .value_type_id = 1,
+ .max_entries = 4,
+ .btf_load_err = true,
+ .err_str = "No type found",
+},
+
+{
+ .descr = "btf_header test. Longer hdr_len",
+ .raw_types = {
+ /* int */ /* [1] */
+ BTF_TYPE_INT_ENC(NAME_TBD, BTF_INT_SIGNED, 0, 32, 4),
+ BTF_END_RAW,
+ },
+ .str_sec = "\0int",
+ .str_sec_size = sizeof("\0int"),
+ .map_type = BPF_MAP_TYPE_ARRAY,
+ .map_name = "hdr_test_map",
+ .key_size = sizeof(int),
+ .value_size = sizeof(int),
+ .key_type_id = 1,
+ .value_type_id = 1,
+ .max_entries = 4,
+ .btf_load_err = true,
+ .hdr_len_delta = 4,
+ .err_str = "Unsupported btf_header",
+},
+
+{
+ .descr = "btf_header test. Gap between hdr and type",
+ .raw_types = {
+ /* int */ /* [1] */
+ BTF_TYPE_INT_ENC(NAME_TBD, BTF_INT_SIGNED, 0, 32, 4),
+ BTF_END_RAW,
+ },
+ .str_sec = "\0int",
+ .str_sec_size = sizeof("\0int"),
+ .map_type = BPF_MAP_TYPE_ARRAY,
+ .map_name = "hdr_test_map",
+ .key_size = sizeof(int),
+ .value_size = sizeof(int),
+ .key_type_id = 1,
+ .value_type_id = 1,
+ .max_entries = 4,
+ .btf_load_err = true,
+ .type_off_delta = 4,
+ .err_str = "Unsupported section found",
+},
+
+{
+ .descr = "btf_header test. Gap between type and str",
+ .raw_types = {
+ /* int */ /* [1] */
+ BTF_TYPE_INT_ENC(NAME_TBD, BTF_INT_SIGNED, 0, 32, 4),
+ BTF_END_RAW,
+ },
+ .str_sec = "\0int",
+ .str_sec_size = sizeof("\0int"),
+ .map_type = BPF_MAP_TYPE_ARRAY,
+ .map_name = "hdr_test_map",
+ .key_size = sizeof(int),
+ .value_size = sizeof(int),
+ .key_type_id = 1,
+ .value_type_id = 1,
+ .max_entries = 4,
+ .btf_load_err = true,
+ .str_off_delta = 4,
+ .err_str = "Unsupported section found",
+},
+
+{
+ .descr = "btf_header test. Overlap between type and str",
+ .raw_types = {
+ /* int */ /* [1] */
+ BTF_TYPE_INT_ENC(NAME_TBD, BTF_INT_SIGNED, 0, 32, 4),
+ BTF_END_RAW,
+ },
+ .str_sec = "\0int",
+ .str_sec_size = sizeof("\0int"),
+ .map_type = BPF_MAP_TYPE_ARRAY,
+ .map_name = "hdr_test_map",
+ .key_size = sizeof(int),
+ .value_size = sizeof(int),
+ .key_type_id = 1,
+ .value_type_id = 1,
+ .max_entries = 4,
+ .btf_load_err = true,
+ .str_off_delta = -4,
+ .err_str = "Section overlap found",
+},
+
+{
+ .descr = "btf_header test. Larger BTF size",
+ .raw_types = {
+ /* int */ /* [1] */
+ BTF_TYPE_INT_ENC(NAME_TBD, BTF_INT_SIGNED, 0, 32, 4),
+ BTF_END_RAW,
+ },
+ .str_sec = "\0int",
+ .str_sec_size = sizeof("\0int"),
+ .map_type = BPF_MAP_TYPE_ARRAY,
+ .map_name = "hdr_test_map",
+ .key_size = sizeof(int),
+ .value_size = sizeof(int),
+ .key_type_id = 1,
+ .value_type_id = 1,
+ .max_entries = 4,
+ .btf_load_err = true,
+ .str_len_delta = -4,
+ .err_str = "Unsupported section found",
+},
+
+{
+ .descr = "btf_header test. Smaller BTF size",
+ .raw_types = {
+ /* int */ /* [1] */
+ BTF_TYPE_INT_ENC(NAME_TBD, BTF_INT_SIGNED, 0, 32, 4),
+ BTF_END_RAW,
+ },
+ .str_sec = "\0int",
+ .str_sec_size = sizeof("\0int"),
+ .map_type = BPF_MAP_TYPE_ARRAY,
+ .map_name = "hdr_test_map",
+ .key_size = sizeof(int),
+ .value_size = sizeof(int),
+ .key_type_id = 1,
+ .value_type_id = 1,
+ .max_entries = 4,
+ .btf_load_err = true,
+ .str_len_delta = 4,
+ .err_str = "Total section length too long",
+},
+
+{
+ .descr = "array test. index_type/elem_type \"int\"",
+ .raw_types = {
+ /* int */ /* [1] */
+ BTF_TYPE_INT_ENC(0, BTF_INT_SIGNED, 0, 32, 4),
+ /* int[16] */ /* [2] */
+ BTF_TYPE_ARRAY_ENC(1, 1, 16),
+ BTF_END_RAW,
+ },
+ .str_sec = "",
+ .str_sec_size = sizeof(""),
+ .map_type = BPF_MAP_TYPE_ARRAY,
+ .map_name = "array_test_map",
+ .key_size = sizeof(int),
+ .value_size = sizeof(int),
+ .key_type_id = 1,
+ .value_type_id = 1,
+ .max_entries = 4,
+},
+
+{
+ .descr = "array test. index_type/elem_type \"const int\"",
+ .raw_types = {
+ /* int */ /* [1] */
+ BTF_TYPE_INT_ENC(0, BTF_INT_SIGNED, 0, 32, 4),
+ /* int[16] */ /* [2] */
+ BTF_TYPE_ARRAY_ENC(3, 3, 16),
+ /* CONST type_id=1 */ /* [3] */
+ BTF_TYPE_ENC(0, BTF_INFO_ENC(BTF_KIND_CONST, 0, 0), 1),
+ BTF_END_RAW,
+ },
+ .str_sec = "",
+ .str_sec_size = sizeof(""),
+ .map_type = BPF_MAP_TYPE_ARRAY,
+ .map_name = "array_test_map",
+ .key_size = sizeof(int),
+ .value_size = sizeof(int),
+ .key_type_id = 1,
+ .value_type_id = 1,
+ .max_entries = 4,
+},
+
+{
+ .descr = "array test. index_type \"const int:31\"",
+ .raw_types = {
+ /* int */ /* [1] */
+ BTF_TYPE_INT_ENC(0, BTF_INT_SIGNED, 0, 32, 4),
+ /* int:31 */ /* [2] */
+ BTF_TYPE_INT_ENC(0, BTF_INT_SIGNED, 0, 31, 4),
+ /* int[16] */ /* [3] */
+ BTF_TYPE_ARRAY_ENC(1, 4, 16),
+ /* CONST type_id=2 */ /* [4] */
+ BTF_TYPE_ENC(0, BTF_INFO_ENC(BTF_KIND_CONST, 0, 0), 2),
+ BTF_END_RAW,
+ },
+ .str_sec = "",
+ .str_sec_size = sizeof(""),
+ .map_type = BPF_MAP_TYPE_ARRAY,
+ .map_name = "array_test_map",
+ .key_size = sizeof(int),
+ .value_size = sizeof(int),
+ .key_type_id = 1,
+ .value_type_id = 1,
+ .max_entries = 4,
+ .btf_load_err = true,
+ .err_str = "Invalid index",
+},
+
+{
+ .descr = "array test. elem_type \"const int:31\"",
+ .raw_types = {
+ /* int */ /* [1] */
+ BTF_TYPE_INT_ENC(0, BTF_INT_SIGNED, 0, 32, 4),
+ /* int:31 */ /* [2] */
+ BTF_TYPE_INT_ENC(0, BTF_INT_SIGNED, 0, 31, 4),
+ /* int[16] */ /* [3] */
+ BTF_TYPE_ARRAY_ENC(4, 1, 16),
+ /* CONST type_id=2 */ /* [4] */
+ BTF_TYPE_ENC(0, BTF_INFO_ENC(BTF_KIND_CONST, 0, 0), 2),
+ BTF_END_RAW,
+ },
+ .str_sec = "",
+ .str_sec_size = sizeof(""),
+ .map_type = BPF_MAP_TYPE_ARRAY,
+ .map_name = "array_test_map",
+ .key_size = sizeof(int),
+ .value_size = sizeof(int),
+ .key_type_id = 1,
+ .value_type_id = 1,
+ .max_entries = 4,
+ .btf_load_err = true,
+ .err_str = "Invalid array of int",
+},
+
+{
+ .descr = "array test. index_type \"void\"",
+ .raw_types = {
+ /* int */ /* [1] */
+ BTF_TYPE_INT_ENC(0, BTF_INT_SIGNED, 0, 32, 4),
+ /* int[16] */ /* [2] */
+ BTF_TYPE_ARRAY_ENC(1, 0, 16),
+ BTF_END_RAW,
+ },
+ .str_sec = "",
+ .str_sec_size = sizeof(""),
+ .map_type = BPF_MAP_TYPE_ARRAY,
+ .map_name = "array_test_map",
+ .key_size = sizeof(int),
+ .value_size = sizeof(int),
+ .key_type_id = 1,
+ .value_type_id = 1,
+ .max_entries = 4,
+ .btf_load_err = true,
+ .err_str = "Invalid index",
+},
+
+{
+ .descr = "array test. index_type \"const void\"",
+ .raw_types = {
+ /* int */ /* [1] */
+ BTF_TYPE_INT_ENC(0, BTF_INT_SIGNED, 0, 32, 4),
+ /* int[16] */ /* [2] */
+ BTF_TYPE_ARRAY_ENC(1, 3, 16),
+ /* CONST type_id=0 (void) */ /* [3] */
+ BTF_TYPE_ENC(0, BTF_INFO_ENC(BTF_KIND_CONST, 0, 0), 0),
+ BTF_END_RAW,
+ },
+ .str_sec = "",
+ .str_sec_size = sizeof(""),
+ .map_type = BPF_MAP_TYPE_ARRAY,
+ .map_name = "array_test_map",
+ .key_size = sizeof(int),
+ .value_size = sizeof(int),
+ .key_type_id = 1,
+ .value_type_id = 1,
+ .max_entries = 4,
+ .btf_load_err = true,
+ .err_str = "Invalid index",
+},
+
+{
+ .descr = "array test. elem_type \"const void\"",
+ .raw_types = {
+ /* int */ /* [1] */
+ BTF_TYPE_INT_ENC(0, BTF_INT_SIGNED, 0, 32, 4),
+ /* int[16] */ /* [2] */
+ BTF_TYPE_ARRAY_ENC(3, 1, 16),
+ /* CONST type_id=0 (void) */ /* [3] */
+ BTF_TYPE_ENC(0, BTF_INFO_ENC(BTF_KIND_CONST, 0, 0), 0),
+ BTF_END_RAW,
+ },
+ .str_sec = "",
+ .str_sec_size = sizeof(""),
+ .map_type = BPF_MAP_TYPE_ARRAY,
+ .map_name = "array_test_map",
+ .key_size = sizeof(int),
+ .value_size = sizeof(int),
+ .key_type_id = 1,
+ .value_type_id = 1,
+ .max_entries = 4,
+ .btf_load_err = true,
+ .err_str = "Invalid elem",
+},
+
+{
+ .descr = "array test. elem_type \"const void *\"",
+ .raw_types = {
+ /* int */ /* [1] */
+ BTF_TYPE_INT_ENC(0, BTF_INT_SIGNED, 0, 32, 4),
+ /* const void *[16] */ /* [2] */
+ BTF_TYPE_ARRAY_ENC(3, 1, 16),
+ /* CONST type_id=4 */ /* [3] */
+ BTF_TYPE_ENC(0, BTF_INFO_ENC(BTF_KIND_CONST, 0, 0), 4),
+ /* void* */ /* [4] */
+ BTF_TYPE_ENC(0, BTF_INFO_ENC(BTF_KIND_PTR, 0, 0), 0),
+ BTF_END_RAW,
+ },
+ .str_sec = "",
+ .str_sec_size = sizeof(""),
+ .map_type = BPF_MAP_TYPE_ARRAY,
+ .map_name = "array_test_map",
+ .key_size = sizeof(int),
+ .value_size = sizeof(int),
+ .key_type_id = 1,
+ .value_type_id = 1,
+ .max_entries = 4,
+},
+
+{
+ .descr = "array test. index_type \"const void *\"",
+ .raw_types = {
+ /* int */ /* [1] */
+ BTF_TYPE_INT_ENC(0, BTF_INT_SIGNED, 0, 32, 4),
+ /* const void *[16] */ /* [2] */
+ BTF_TYPE_ARRAY_ENC(3, 3, 16),
+ /* CONST type_id=4 */ /* [3] */
+ BTF_TYPE_ENC(0, BTF_INFO_ENC(BTF_KIND_CONST, 0, 0), 4),
+ /* void* */ /* [4] */
+ BTF_TYPE_ENC(0, BTF_INFO_ENC(BTF_KIND_PTR, 0, 0), 0),
+ BTF_END_RAW,
+ },
+ .str_sec = "",
+ .str_sec_size = sizeof(""),
+ .map_type = BPF_MAP_TYPE_ARRAY,
+ .map_name = "array_test_map",
+ .key_size = sizeof(int),
+ .value_size = sizeof(int),
+ .key_type_id = 1,
+ .value_type_id = 1,
+ .max_entries = 4,
+ .btf_load_err = true,
+ .err_str = "Invalid index",
+},
+
+{
+ .descr = "int test. invalid int_data",
+ .raw_types = {
+ BTF_TYPE_ENC(0, BTF_INFO_ENC(BTF_KIND_INT, 0, 0), 4),
+ 0x10000000,
+ BTF_END_RAW,
+ },
+ .str_sec = "",
+ .str_sec_size = sizeof(""),
+ .map_type = BPF_MAP_TYPE_ARRAY,
+ .map_name = "array_test_map",
+ .key_size = sizeof(int),
+ .value_size = sizeof(int),
+ .key_type_id = 1,
+ .value_type_id = 1,
+ .max_entries = 4,
+ .btf_load_err = true,
+ .err_str = "Invalid int_data",
+},
+
+{
+ .descr = "invalid BTF_INFO",
+ .raw_types = {
+ /* int */ /* [1] */
+ BTF_TYPE_INT_ENC(0, BTF_INT_SIGNED, 0, 32, 4),
+ BTF_TYPE_ENC(0, 0x10000000, 4),
+ BTF_END_RAW,
+ },
+ .str_sec = "",
+ .str_sec_size = sizeof(""),
+ .map_type = BPF_MAP_TYPE_ARRAY,
+ .map_name = "array_test_map",
+ .key_size = sizeof(int),
+ .value_size = sizeof(int),
+ .key_type_id = 1,
+ .value_type_id = 1,
+ .max_entries = 4,
+ .btf_load_err = true,
+ .err_str = "Invalid btf_info",
+},
+
+}; /* struct btf_raw_test raw_tests[] */
+
+static const char *get_next_str(const char *start, const char *end)
+{
+ return start < end - 1 ? start + 1 : NULL;
+}
+
+static int get_type_sec_size(const __u32 *raw_types)
+{
+ int i;
+
+ for (i = MAX_NR_RAW_TYPES - 1;
+ i >= 0 && raw_types[i] != BTF_END_RAW;
+ i--)
+ ;
+
+ return i < 0 ? i : i * sizeof(raw_types[0]);
+}
+
+static void *btf_raw_create(const struct btf_header *hdr,
+ const __u32 *raw_types,
+ const char *str,
+ unsigned int str_sec_size,
+ unsigned int *btf_size)
+{
+ const char *next_str = str, *end_str = str + str_sec_size;
+ unsigned int size_needed, offset;
+ struct btf_header *ret_hdr;
+ int i, type_sec_size;
+ uint32_t *ret_types;
+ void *raw_btf;
+
+ type_sec_size = get_type_sec_size(raw_types);
+ if (CHECK(type_sec_size < 0, "Cannot get nr_raw_types"))
+ return NULL;
+
+ size_needed = sizeof(*hdr) + type_sec_size + str_sec_size;
+ raw_btf = malloc(size_needed);
+ if (CHECK(!raw_btf, "Cannot allocate memory for raw_btf"))
+ return NULL;
+
+ /* Copy header */
+ memcpy(raw_btf, hdr, sizeof(*hdr));
+ offset = sizeof(*hdr);
+
+ /* Copy type section */
+ ret_types = raw_btf + offset;
+ for (i = 0; i < type_sec_size / sizeof(raw_types[0]); i++) {
+ if (raw_types[i] == NAME_TBD) {
+ next_str = get_next_str(next_str, end_str);
+ if (CHECK(!next_str, "Error in getting next_str")) {
+ free(raw_btf);
+ return NULL;
+ }
+ ret_types[i] = next_str - str;
+ next_str += strlen(next_str);
+ } else {
+ ret_types[i] = raw_types[i];
+ }
+ }
+ offset += type_sec_size;
+
+ /* Copy string section */
+ memcpy(raw_btf + offset, str, str_sec_size);
+
+ ret_hdr = (struct btf_header *)raw_btf;
+ ret_hdr->type_len = type_sec_size;
+ ret_hdr->str_off = type_sec_size;
+ ret_hdr->str_len = str_sec_size;
+
+ *btf_size = size_needed;
+
+ return raw_btf;
+}
+
+static int do_test_raw(unsigned int test_num)
+{
+ struct btf_raw_test *test = &raw_tests[test_num - 1];
+ struct bpf_create_map_attr create_attr = {};
+ int map_fd = -1, btf_fd = -1;
+ unsigned int raw_btf_size;
+ struct btf_header *hdr;
+ void *raw_btf;
+ int err;
+
+ fprintf(stderr, "BTF raw test[%u] (%s): ", test_num, test->descr);
+ raw_btf = btf_raw_create(&hdr_tmpl,
+ test->raw_types,
+ test->str_sec,
+ test->str_sec_size,
+ &raw_btf_size);
+
+ if (!raw_btf)
+ return -1;
+
+ hdr = raw_btf;
+
+ hdr->hdr_len = (int)hdr->hdr_len + test->hdr_len_delta;
+ hdr->type_off = (int)hdr->type_off + test->type_off_delta;
+ hdr->str_off = (int)hdr->str_off + test->str_off_delta;
+ hdr->str_len = (int)hdr->str_len + test->str_len_delta;
+
+ *btf_log_buf = '\0';
+ btf_fd = bpf_load_btf(raw_btf, raw_btf_size,
+ btf_log_buf, BTF_LOG_BUF_SIZE,
+ args.always_log);
+ free(raw_btf);
+
+ err = ((btf_fd == -1) != test->btf_load_err);
+ if (CHECK(err, "btf_fd:%d test->btf_load_err:%u",
+ btf_fd, test->btf_load_err) ||
+ CHECK(test->err_str && !strstr(btf_log_buf, test->err_str),
+ "expected err_str:%s", test->err_str)) {
+ err = -1;
+ goto done;
+ }
+
+ if (err || btf_fd == -1)
+ goto done;
+
+ create_attr.name = test->map_name;
+ create_attr.map_type = test->map_type;
+ create_attr.key_size = test->key_size;
+ create_attr.value_size = test->value_size;
+ create_attr.max_entries = test->max_entries;
+ create_attr.btf_fd = btf_fd;
+ create_attr.btf_key_type_id = test->key_type_id;
+ create_attr.btf_value_type_id = test->value_type_id;
+
+ map_fd = bpf_create_map_xattr(&create_attr);
+
+ err = ((map_fd == -1) != test->map_create_err);
+ CHECK(err, "map_fd:%d test->map_create_err:%u",
+ map_fd, test->map_create_err);
+
+done:
+ if (!err)
+ fprintf(stderr, "OK");
+
+ if (*btf_log_buf && (err || args.always_log))
+ fprintf(stderr, "\n%s", btf_log_buf);
+
+ if (btf_fd != -1)
+ close(btf_fd);
+ if (map_fd != -1)
+ close(map_fd);
+
+ return err;
+}
+
+static int test_raw(void)
+{
+ unsigned int i;
+ int err = 0;
+
+ if (args.raw_test_num)
+ return count_result(do_test_raw(args.raw_test_num));
+
+ for (i = 1; i <= ARRAY_SIZE(raw_tests); i++)
+ err |= count_result(do_test_raw(i));
+
+ return err;
+}
+
+struct btf_get_info_test {
+ const char *descr;
+ const char *str_sec;
+ __u32 raw_types[MAX_NR_RAW_TYPES];
+ __u32 str_sec_size;
+ int btf_size_delta;
+ int (*special_test)(unsigned int test_num);
+};
+
+static int test_big_btf_info(unsigned int test_num);
+static int test_btf_id(unsigned int test_num);
+
+const struct btf_get_info_test get_info_tests[] = {
+{
+ .descr = "== raw_btf_size+1",
+ .raw_types = {
+ /* int */ /* [1] */
+ BTF_TYPE_INT_ENC(0, BTF_INT_SIGNED, 0, 32, 4),
+ BTF_END_RAW,
+ },
+ .str_sec = "",
+ .str_sec_size = sizeof(""),
+ .btf_size_delta = 1,
+},
+{
+ .descr = "== raw_btf_size-3",
+ .raw_types = {
+ /* int */ /* [1] */
+ BTF_TYPE_INT_ENC(0, BTF_INT_SIGNED, 0, 32, 4),
+ BTF_END_RAW,
+ },
+ .str_sec = "",
+ .str_sec_size = sizeof(""),
+ .btf_size_delta = -3,
+},
+{
+ .descr = "Large bpf_btf_info",
+ .raw_types = {
+ /* int */ /* [1] */
+ BTF_TYPE_INT_ENC(0, BTF_INT_SIGNED, 0, 32, 4),
+ BTF_END_RAW,
+ },
+ .str_sec = "",
+ .str_sec_size = sizeof(""),
+ .special_test = test_big_btf_info,
+},
+{
+ .descr = "BTF ID",
+ .raw_types = {
+ /* int */ /* [1] */
+ BTF_TYPE_INT_ENC(0, BTF_INT_SIGNED, 0, 32, 4),
+ /* unsigned int */ /* [2] */
+ BTF_TYPE_INT_ENC(0, 0, 0, 32, 4),
+ BTF_END_RAW,
+ },
+ .str_sec = "",
+ .str_sec_size = sizeof(""),
+ .special_test = test_btf_id,
+},
+};
+
+static inline __u64 ptr_to_u64(const void *ptr)
+{
+ return (__u64)(unsigned long)ptr;
+}
+
+static int test_big_btf_info(unsigned int test_num)
+{
+ const struct btf_get_info_test *test = &get_info_tests[test_num - 1];
+ uint8_t *raw_btf = NULL, *user_btf = NULL;
+ unsigned int raw_btf_size;
+ struct {
+ struct bpf_btf_info info;
+ uint64_t garbage;
+ } info_garbage;
+ struct bpf_btf_info *info;
+ int btf_fd = -1, err;
+ uint32_t info_len;
+
+ raw_btf = btf_raw_create(&hdr_tmpl,
+ test->raw_types,
+ test->str_sec,
+ test->str_sec_size,
+ &raw_btf_size);
+
+ if (!raw_btf)
+ return -1;
+
+ *btf_log_buf = '\0';
+
+ user_btf = malloc(raw_btf_size);
+ if (CHECK(!user_btf, "!user_btf")) {
+ err = -1;
+ goto done;
+ }
+
+ btf_fd = bpf_load_btf(raw_btf, raw_btf_size,
+ btf_log_buf, BTF_LOG_BUF_SIZE,
+ args.always_log);
+ if (CHECK(btf_fd == -1, "errno:%d", errno)) {
+ err = -1;
+ goto done;
+ }
+
+ /*
+ * GET_INFO should error out if the userspace info
+ * has non zero tailing bytes.
+ */
+ info = &info_garbage.info;
+ memset(info, 0, sizeof(*info));
+ info_garbage.garbage = 0xdeadbeef;
+ info_len = sizeof(info_garbage);
+ info->btf = ptr_to_u64(user_btf);
+ info->btf_size = raw_btf_size;
+
+ err = bpf_obj_get_info_by_fd(btf_fd, info, &info_len);
+ if (CHECK(!err, "!err")) {
+ err = -1;
+ goto done;
+ }
+
+ /*
+ * GET_INFO should succeed even info_len is larger than
+ * the kernel supported as long as tailing bytes are zero.
+ * The kernel supported info len should also be returned
+ * to userspace.
+ */
+ info_garbage.garbage = 0;
+ err = bpf_obj_get_info_by_fd(btf_fd, info, &info_len);
+ if (CHECK(err || info_len != sizeof(*info),
+ "err:%d errno:%d info_len:%u sizeof(*info):%lu",
+ err, errno, info_len, sizeof(*info))) {
+ err = -1;
+ goto done;
+ }
+
+ fprintf(stderr, "OK");
+
+done:
+ if (*btf_log_buf && (err || args.always_log))
+ fprintf(stderr, "\n%s", btf_log_buf);
+
+ free(raw_btf);
+ free(user_btf);
+
+ if (btf_fd != -1)
+ close(btf_fd);
+
+ return err;
+}
+
+static int test_btf_id(unsigned int test_num)
+{
+ const struct btf_get_info_test *test = &get_info_tests[test_num - 1];
+ struct bpf_create_map_attr create_attr = {};
+ uint8_t *raw_btf = NULL, *user_btf[2] = {};
+ int btf_fd[2] = {-1, -1}, map_fd = -1;
+ struct bpf_map_info map_info = {};
+ struct bpf_btf_info info[2] = {};
+ unsigned int raw_btf_size;
+ uint32_t info_len;
+ int err, i, ret;
+
+ raw_btf = btf_raw_create(&hdr_tmpl,
+ test->raw_types,
+ test->str_sec,
+ test->str_sec_size,
+ &raw_btf_size);
+
+ if (!raw_btf)
+ return -1;
+
+ *btf_log_buf = '\0';
+
+ for (i = 0; i < 2; i++) {
+ user_btf[i] = malloc(raw_btf_size);
+ if (CHECK(!user_btf[i], "!user_btf[%d]", i)) {
+ err = -1;
+ goto done;
+ }
+ info[i].btf = ptr_to_u64(user_btf[i]);
+ info[i].btf_size = raw_btf_size;
+ }
+
+ btf_fd[0] = bpf_load_btf(raw_btf, raw_btf_size,
+ btf_log_buf, BTF_LOG_BUF_SIZE,
+ args.always_log);
+ if (CHECK(btf_fd[0] == -1, "errno:%d", errno)) {
+ err = -1;
+ goto done;
+ }
+
+ /* Test BPF_OBJ_GET_INFO_BY_ID on btf_id */
+ info_len = sizeof(info[0]);
+ err = bpf_obj_get_info_by_fd(btf_fd[0], &info[0], &info_len);
+ if (CHECK(err, "errno:%d", errno)) {
+ err = -1;
+ goto done;
+ }
+
+ btf_fd[1] = bpf_btf_get_fd_by_id(info[0].id);
+ if (CHECK(btf_fd[1] == -1, "errno:%d", errno)) {
+ err = -1;
+ goto done;
+ }
+
+ ret = 0;
+ err = bpf_obj_get_info_by_fd(btf_fd[1], &info[1], &info_len);
+ if (CHECK(err || info[0].id != info[1].id ||
+ info[0].btf_size != info[1].btf_size ||
+ (ret = memcmp(user_btf[0], user_btf[1], info[0].btf_size)),
+ "err:%d errno:%d id0:%u id1:%u btf_size0:%u btf_size1:%u memcmp:%d",
+ err, errno, info[0].id, info[1].id,
+ info[0].btf_size, info[1].btf_size, ret)) {
+ err = -1;
+ goto done;
+ }
+
+ /* Test btf members in struct bpf_map_info */
+ create_attr.name = "test_btf_id";
+ create_attr.map_type = BPF_MAP_TYPE_ARRAY;
+ create_attr.key_size = sizeof(int);
+ create_attr.value_size = sizeof(unsigned int);
+ create_attr.max_entries = 4;
+ create_attr.btf_fd = btf_fd[0];
+ create_attr.btf_key_type_id = 1;
+ create_attr.btf_value_type_id = 2;
+
+ map_fd = bpf_create_map_xattr(&create_attr);
+ if (CHECK(map_fd == -1, "errno:%d", errno)) {
+ err = -1;
+ goto done;
+ }
+
+ info_len = sizeof(map_info);
+ err = bpf_obj_get_info_by_fd(map_fd, &map_info, &info_len);
+ if (CHECK(err || map_info.btf_id != info[0].id ||
+ map_info.btf_key_type_id != 1 || map_info.btf_value_type_id != 2,
+ "err:%d errno:%d info.id:%u btf_id:%u btf_key_type_id:%u btf_value_type_id:%u",
+ err, errno, info[0].id, map_info.btf_id, map_info.btf_key_type_id,
+ map_info.btf_value_type_id)) {
+ err = -1;
+ goto done;
+ }
+
+ for (i = 0; i < 2; i++) {
+ close(btf_fd[i]);
+ btf_fd[i] = -1;
+ }
+
+ /* Test BTF ID is removed from the kernel */
+ btf_fd[0] = bpf_btf_get_fd_by_id(map_info.btf_id);
+ if (CHECK(btf_fd[0] == -1, "errno:%d", errno)) {
+ err = -1;
+ goto done;
+ }
+ close(btf_fd[0]);
+ btf_fd[0] = -1;
+
+ /* The map holds the last ref to BTF and its btf_id */
+ close(map_fd);
+ map_fd = -1;
+ btf_fd[0] = bpf_btf_get_fd_by_id(map_info.btf_id);
+ if (CHECK(btf_fd[0] != -1, "BTF lingers")) {
+ err = -1;
+ goto done;
+ }
+
+ fprintf(stderr, "OK");
+
+done:
+ if (*btf_log_buf && (err || args.always_log))
+ fprintf(stderr, "\n%s", btf_log_buf);
+
+ free(raw_btf);
+ if (map_fd != -1)
+ close(map_fd);
+ for (i = 0; i < 2; i++) {
+ free(user_btf[i]);
+ if (btf_fd[i] != -1)
+ close(btf_fd[i]);
+ }
+
+ return err;
+}
+
+static int do_test_get_info(unsigned int test_num)
+{
+ const struct btf_get_info_test *test = &get_info_tests[test_num - 1];
+ unsigned int raw_btf_size, user_btf_size, expected_nbytes;
+ uint8_t *raw_btf = NULL, *user_btf = NULL;
+ struct bpf_btf_info info = {};
+ int btf_fd = -1, err, ret;
+ uint32_t info_len;
+
+ fprintf(stderr, "BTF GET_INFO test[%u] (%s): ",
+ test_num, test->descr);
+
+ if (test->special_test)
+ return test->special_test(test_num);
+
+ raw_btf = btf_raw_create(&hdr_tmpl,
+ test->raw_types,
+ test->str_sec,
+ test->str_sec_size,
+ &raw_btf_size);
+
+ if (!raw_btf)
+ return -1;
+
+ *btf_log_buf = '\0';
+
+ user_btf = malloc(raw_btf_size);
+ if (CHECK(!user_btf, "!user_btf")) {
+ err = -1;
+ goto done;
+ }
+
+ btf_fd = bpf_load_btf(raw_btf, raw_btf_size,
+ btf_log_buf, BTF_LOG_BUF_SIZE,
+ args.always_log);
+ if (CHECK(btf_fd == -1, "errno:%d", errno)) {
+ err = -1;
+ goto done;
+ }
+
+ user_btf_size = (int)raw_btf_size + test->btf_size_delta;
+ expected_nbytes = min(raw_btf_size, user_btf_size);
+ if (raw_btf_size > expected_nbytes)
+ memset(user_btf + expected_nbytes, 0xff,
+ raw_btf_size - expected_nbytes);
+
+ info_len = sizeof(info);
+ info.btf = ptr_to_u64(user_btf);
+ info.btf_size = user_btf_size;
+
+ ret = 0;
+ err = bpf_obj_get_info_by_fd(btf_fd, &info, &info_len);
+ if (CHECK(err || !info.id || info_len != sizeof(info) ||
+ info.btf_size != raw_btf_size ||
+ (ret = memcmp(raw_btf, user_btf, expected_nbytes)),
+ "err:%d errno:%d info.id:%u info_len:%u sizeof(info):%lu raw_btf_size:%u info.btf_size:%u expected_nbytes:%u memcmp:%d",
+ err, errno, info.id, info_len, sizeof(info),
+ raw_btf_size, info.btf_size, expected_nbytes, ret)) {
+ err = -1;
+ goto done;
+ }
+
+ while (expected_nbytes < raw_btf_size) {
+ fprintf(stderr, "%u...", expected_nbytes);
+ if (CHECK(user_btf[expected_nbytes++] != 0xff,
+ "user_btf[%u]:%x != 0xff", expected_nbytes - 1,
+ user_btf[expected_nbytes - 1])) {
+ err = -1;
+ goto done;
+ }
+ }
+
+ fprintf(stderr, "OK");
+
+done:
+ if (*btf_log_buf && (err || args.always_log))
+ fprintf(stderr, "\n%s", btf_log_buf);
+
+ free(raw_btf);
+ free(user_btf);
+
+ if (btf_fd != -1)
+ close(btf_fd);
+
+ return err;
+}
+
+static int test_get_info(void)
+{
+ unsigned int i;
+ int err = 0;
+
+ if (args.get_info_test_num)
+ return count_result(do_test_get_info(args.get_info_test_num));
+
+ for (i = 1; i <= ARRAY_SIZE(get_info_tests); i++)
+ err |= count_result(do_test_get_info(i));
+
+ return err;
+}
+
+struct btf_file_test {
+ const char *file;
+ bool btf_kv_notfound;
+};
+
+static struct btf_file_test file_tests[] = {
+{
+ .file = "test_btf_haskv.o",
+},
+{
+ .file = "test_btf_nokv.o",
+ .btf_kv_notfound = true,
+},
+};
+
+static int file_has_btf_elf(const char *fn)
+{
+ Elf_Scn *scn = NULL;
+ GElf_Ehdr ehdr;
+ int elf_fd;
+ Elf *elf;
+ int ret;
+
+ if (CHECK(elf_version(EV_CURRENT) == EV_NONE,
+ "elf_version(EV_CURRENT) == EV_NONE"))
+ return -1;
+
+ elf_fd = open(fn, O_RDONLY);
+ if (CHECK(elf_fd == -1, "open(%s): errno:%d", fn, errno))
+ return -1;
+
+ elf = elf_begin(elf_fd, ELF_C_READ, NULL);
+ if (CHECK(!elf, "elf_begin(%s): %s", fn, elf_errmsg(elf_errno()))) {
+ ret = -1;
+ goto done;
+ }
+
+ if (CHECK(!gelf_getehdr(elf, &ehdr), "!gelf_getehdr(%s)", fn)) {
+ ret = -1;
+ goto done;
+ }
+
+ while ((scn = elf_nextscn(elf, scn))) {
+ const char *sh_name;
+ GElf_Shdr sh;
+
+ if (CHECK(gelf_getshdr(scn, &sh) != &sh,
+ "file:%s gelf_getshdr != &sh", fn)) {
+ ret = -1;
+ goto done;
+ }
+
+ sh_name = elf_strptr(elf, ehdr.e_shstrndx, sh.sh_name);
+ if (!strcmp(sh_name, BTF_ELF_SEC)) {
+ ret = 1;
+ goto done;
+ }
+ }
+
+ ret = 0;
+
+done:
+ close(elf_fd);
+ elf_end(elf);
+ return ret;
+}
+
+static int do_test_file(unsigned int test_num)
+{
+ const struct btf_file_test *test = &file_tests[test_num - 1];
+ struct bpf_object *obj = NULL;
+ struct bpf_program *prog;
+ struct bpf_map *map;
+ int err;
+
+ fprintf(stderr, "BTF libbpf test[%u] (%s): ", test_num,
+ test->file);
+
+ err = file_has_btf_elf(test->file);
+ if (err == -1)
+ return err;
+
+ if (err == 0) {
+ fprintf(stderr, "SKIP. No ELF %s found", BTF_ELF_SEC);
+ skip_cnt++;
+ return 0;
+ }
+
+ obj = bpf_object__open(test->file);
+ if (CHECK(IS_ERR(obj), "obj: %ld", PTR_ERR(obj)))
+ return PTR_ERR(obj);
+
+ err = bpf_object__btf_fd(obj);
+ if (CHECK(err == -1, "bpf_object__btf_fd: -1"))
+ goto done;
+
+ prog = bpf_program__next(NULL, obj);
+ if (CHECK(!prog, "Cannot find bpf_prog")) {
+ err = -1;
+ goto done;
+ }
+
+ bpf_program__set_type(prog, BPF_PROG_TYPE_TRACEPOINT);
+ err = bpf_object__load(obj);
+ if (CHECK(err < 0, "bpf_object__load: %d", err))
+ goto done;
+
+ map = bpf_object__find_map_by_name(obj, "btf_map");
+ if (CHECK(!map, "btf_map not found")) {
+ err = -1;
+ goto done;
+ }
+
+ err = (bpf_map__btf_key_type_id(map) == 0 || bpf_map__btf_value_type_id(map) == 0)
+ != test->btf_kv_notfound;
+ if (CHECK(err, "btf_key_type_id:%u btf_value_type_id:%u test->btf_kv_notfound:%u",
+ bpf_map__btf_key_type_id(map), bpf_map__btf_value_type_id(map),
+ test->btf_kv_notfound))
+ goto done;
+
+ fprintf(stderr, "OK");
+
+done:
+ bpf_object__close(obj);
+ return err;
+}
+
+static int test_file(void)
+{
+ unsigned int i;
+ int err = 0;
+
+ if (args.file_test_num)
+ return count_result(do_test_file(args.file_test_num));
+
+ for (i = 1; i <= ARRAY_SIZE(file_tests); i++)
+ err |= count_result(do_test_file(i));
+
+ return err;
+}
+
+const char *pprint_enum_str[] = {
+ "ENUM_ZERO",
+ "ENUM_ONE",
+ "ENUM_TWO",
+ "ENUM_THREE",
+};
+
+struct pprint_mapv {
+ uint32_t ui32;
+ uint16_t ui16;
+ /* 2 bytes hole */
+ int32_t si32;
+ uint32_t unused_bits2a:2,
+ bits28:28,
+ unused_bits2b:2;
+ union {
+ uint64_t ui64;
+ uint8_t ui8a[8];
+ };
+ enum {
+ ENUM_ZERO,
+ ENUM_ONE,
+ ENUM_TWO,
+ ENUM_THREE,
+ } aenum;
+};
+
+static struct btf_raw_test pprint_test = {
+ .descr = "BTF pretty print test #1",
+ .raw_types = {
+ /* unsighed char */ /* [1] */
+ BTF_TYPE_INT_ENC(NAME_TBD, 0, 0, 8, 1),
+ /* unsigned short */ /* [2] */
+ BTF_TYPE_INT_ENC(NAME_TBD, 0, 0, 16, 2),
+ /* unsigned int */ /* [3] */
+ BTF_TYPE_INT_ENC(NAME_TBD, 0, 0, 32, 4),
+ /* int */ /* [4] */
+ BTF_TYPE_INT_ENC(NAME_TBD, BTF_INT_SIGNED, 0, 32, 4),
+ /* unsigned long long */ /* [5] */
+ BTF_TYPE_INT_ENC(NAME_TBD, 0, 0, 64, 8),
+ /* 2 bits */ /* [6] */
+ BTF_TYPE_INT_ENC(0, 0, 0, 2, 2),
+ /* 28 bits */ /* [7] */
+ BTF_TYPE_INT_ENC(0, 0, 0, 28, 4),
+ /* uint8_t[8] */ /* [8] */
+ BTF_TYPE_ARRAY_ENC(9, 1, 8),
+ /* typedef unsigned char uint8_t */ /* [9] */
+ BTF_TYPEDEF_ENC(NAME_TBD, 1),
+ /* typedef unsigned short uint16_t */ /* [10] */
+ BTF_TYPEDEF_ENC(NAME_TBD, 2),
+ /* typedef unsigned int uint32_t */ /* [11] */
+ BTF_TYPEDEF_ENC(NAME_TBD, 3),
+ /* typedef int int32_t */ /* [12] */
+ BTF_TYPEDEF_ENC(NAME_TBD, 4),
+ /* typedef unsigned long long uint64_t *//* [13] */
+ BTF_TYPEDEF_ENC(NAME_TBD, 5),
+ /* union (anon) */ /* [14] */
+ BTF_TYPE_ENC(0, BTF_INFO_ENC(BTF_KIND_UNION, 0, 2), 8),
+ BTF_MEMBER_ENC(NAME_TBD, 13, 0),/* uint64_t ui64; */
+ BTF_MEMBER_ENC(NAME_TBD, 8, 0), /* uint8_t ui8a[8]; */
+ /* enum (anon) */ /* [15] */
+ BTF_TYPE_ENC(0, BTF_INFO_ENC(BTF_KIND_ENUM, 0, 4), 4),
+ BTF_ENUM_ENC(NAME_TBD, 0),
+ BTF_ENUM_ENC(NAME_TBD, 1),
+ BTF_ENUM_ENC(NAME_TBD, 2),
+ BTF_ENUM_ENC(NAME_TBD, 3),
+ /* struct pprint_mapv */ /* [16] */
+ BTF_TYPE_ENC(NAME_TBD, BTF_INFO_ENC(BTF_KIND_STRUCT, 0, 8), 28),
+ BTF_MEMBER_ENC(NAME_TBD, 11, 0), /* uint32_t ui32 */
+ BTF_MEMBER_ENC(NAME_TBD, 10, 32), /* uint16_t ui16 */
+ BTF_MEMBER_ENC(NAME_TBD, 12, 64), /* int32_t si32 */
+ BTF_MEMBER_ENC(NAME_TBD, 6, 96), /* unused_bits2a */
+ BTF_MEMBER_ENC(NAME_TBD, 7, 98), /* bits28 */
+ BTF_MEMBER_ENC(NAME_TBD, 6, 126), /* unused_bits2b */
+ BTF_MEMBER_ENC(0, 14, 128), /* union (anon) */
+ BTF_MEMBER_ENC(NAME_TBD, 15, 192), /* aenum */
+ BTF_END_RAW,
+ },
+ .str_sec = "\0unsigned char\0unsigned short\0unsigned int\0int\0unsigned long long\0uint8_t\0uint16_t\0uint32_t\0int32_t\0uint64_t\0ui64\0ui8a\0ENUM_ZERO\0ENUM_ONE\0ENUM_TWO\0ENUM_THREE\0pprint_mapv\0ui32\0ui16\0si32\0unused_bits2a\0bits28\0unused_bits2b\0aenum",
+ .str_sec_size = sizeof("\0unsigned char\0unsigned short\0unsigned int\0int\0unsigned long long\0uint8_t\0uint16_t\0uint32_t\0int32_t\0uint64_t\0ui64\0ui8a\0ENUM_ZERO\0ENUM_ONE\0ENUM_TWO\0ENUM_THREE\0pprint_mapv\0ui32\0ui16\0si32\0unused_bits2a\0bits28\0unused_bits2b\0aenum"),
+ .map_type = BPF_MAP_TYPE_ARRAY,
+ .map_name = "pprint_test",
+ .key_size = sizeof(unsigned int),
+ .value_size = sizeof(struct pprint_mapv),
+ .key_type_id = 3, /* unsigned int */
+ .value_type_id = 16, /* struct pprint_mapv */
+ .max_entries = 128 * 1024,
+};
+
+static void set_pprint_mapv(struct pprint_mapv *v, uint32_t i)
+{
+ v->ui32 = i;
+ v->si32 = -i;
+ v->unused_bits2a = 3;
+ v->bits28 = i;
+ v->unused_bits2b = 3;
+ v->ui64 = i;
+ v->aenum = i & 0x03;
+}
+
+static int test_pprint(void)
+{
+ const struct btf_raw_test *test = &pprint_test;
+ struct bpf_create_map_attr create_attr = {};
+ int map_fd = -1, btf_fd = -1;
+ struct pprint_mapv mapv = {};
+ unsigned int raw_btf_size;
+ char expected_line[255];
+ FILE *pin_file = NULL;
+ char pin_path[255];
+ size_t line_len = 0;
+ char *line = NULL;
+ unsigned int key;
+ uint8_t *raw_btf;
+ ssize_t nread;
+ int err, ret;
+
+ fprintf(stderr, "%s......", test->descr);
+ raw_btf = btf_raw_create(&hdr_tmpl, test->raw_types,
+ test->str_sec, test->str_sec_size,
+ &raw_btf_size);
+
+ if (!raw_btf)
+ return -1;
+
+ *btf_log_buf = '\0';
+ btf_fd = bpf_load_btf(raw_btf, raw_btf_size,
+ btf_log_buf, BTF_LOG_BUF_SIZE,
+ args.always_log);
+ free(raw_btf);
+
+ if (CHECK(btf_fd == -1, "errno:%d", errno)) {
+ err = -1;
+ goto done;
+ }
+
+ create_attr.name = test->map_name;
+ create_attr.map_type = test->map_type;
+ create_attr.key_size = test->key_size;
+ create_attr.value_size = test->value_size;
+ create_attr.max_entries = test->max_entries;
+ create_attr.btf_fd = btf_fd;
+ create_attr.btf_key_type_id = test->key_type_id;
+ create_attr.btf_value_type_id = test->value_type_id;
+
+ map_fd = bpf_create_map_xattr(&create_attr);
+ if (CHECK(map_fd == -1, "errno:%d", errno)) {
+ err = -1;
+ goto done;
+ }
+
+ ret = snprintf(pin_path, sizeof(pin_path), "%s/%s",
+ "/sys/fs/bpf", test->map_name);
+
+ if (CHECK(ret == sizeof(pin_path), "pin_path %s/%s is too long",
+ "/sys/fs/bpf", test->map_name)) {
+ err = -1;
+ goto done;
+ }
+
+ err = bpf_obj_pin(map_fd, pin_path);
+ if (CHECK(err, "bpf_obj_pin(%s): errno:%d.", pin_path, errno))
+ goto done;
+
+ for (key = 0; key < test->max_entries; key++) {
+ set_pprint_mapv(&mapv, key);
+ bpf_map_update_elem(map_fd, &key, &mapv, 0);
+ }
+
+ pin_file = fopen(pin_path, "r");
+ if (CHECK(!pin_file, "fopen(%s): errno:%d", pin_path, errno)) {
+ err = -1;
+ goto done;
+ }
+
+ /* Skip lines start with '#' */
+ while ((nread = getline(&line, &line_len, pin_file)) > 0 &&
+ *line == '#')
+ ;
+
+ if (CHECK(nread <= 0, "Unexpected EOF")) {
+ err = -1;
+ goto done;
+ }
+
+ key = 0;
+ do {
+ ssize_t nexpected_line;
+
+ set_pprint_mapv(&mapv, key);
+ nexpected_line = snprintf(expected_line, sizeof(expected_line),
+ "%u: {%u,0,%d,0x%x,0x%x,0x%x,{%lu|[%u,%u,%u,%u,%u,%u,%u,%u]},%s}\n",
+ key,
+ mapv.ui32, mapv.si32,
+ mapv.unused_bits2a, mapv.bits28, mapv.unused_bits2b,
+ mapv.ui64,
+ mapv.ui8a[0], mapv.ui8a[1], mapv.ui8a[2], mapv.ui8a[3],
+ mapv.ui8a[4], mapv.ui8a[5], mapv.ui8a[6], mapv.ui8a[7],
+ pprint_enum_str[mapv.aenum]);
+
+ if (CHECK(nexpected_line == sizeof(expected_line),
+ "expected_line is too long")) {
+ err = -1;
+ goto done;
+ }
+
+ if (strcmp(expected_line, line)) {
+ err = -1;
+ fprintf(stderr, "unexpected pprint output\n");
+ fprintf(stderr, "expected: %s", expected_line);
+ fprintf(stderr, " read: %s", line);
+ goto done;
+ }
+
+ nread = getline(&line, &line_len, pin_file);
+ } while (++key < test->max_entries && nread > 0);
+
+ if (CHECK(key < test->max_entries,
+ "Unexpected EOF. key:%u test->max_entries:%u",
+ key, test->max_entries)) {
+ err = -1;
+ goto done;
+ }
+
+ if (CHECK(nread > 0, "Unexpected extra pprint output: %s", line)) {
+ err = -1;
+ goto done;
+ }
+
+ err = 0;
+
+done:
+ if (!err)
+ fprintf(stderr, "OK");
+ if (*btf_log_buf && (err || args.always_log))
+ fprintf(stderr, "\n%s", btf_log_buf);
+ if (btf_fd != -1)
+ close(btf_fd);
+ if (map_fd != -1)
+ close(map_fd);
+ if (pin_file)
+ fclose(pin_file);
+ unlink(pin_path);
+ free(line);
+
+ return err;
+}
+
+static void usage(const char *cmd)
+{
+ fprintf(stderr, "Usage: %s [-l] [[-r test_num (1 - %zu)] | [-g test_num (1 - %zu)] | [-f test_num (1 - %zu)] | [-p]]\n",
+ cmd, ARRAY_SIZE(raw_tests), ARRAY_SIZE(get_info_tests),
+ ARRAY_SIZE(file_tests));
+}
+
+static int parse_args(int argc, char **argv)
+{
+ const char *optstr = "lpf:r:g:";
+ int opt;
+
+ while ((opt = getopt(argc, argv, optstr)) != -1) {
+ switch (opt) {
+ case 'l':
+ args.always_log = true;
+ break;
+ case 'f':
+ args.file_test_num = atoi(optarg);
+ args.file_test = true;
+ break;
+ case 'r':
+ args.raw_test_num = atoi(optarg);
+ args.raw_test = true;
+ break;
+ case 'g':
+ args.get_info_test_num = atoi(optarg);
+ args.get_info_test = true;
+ break;
+ case 'p':
+ args.pprint_test = true;
+ break;
+ case 'h':
+ usage(argv[0]);
+ exit(0);
+ default:
+ usage(argv[0]);
+ return -1;
+ }
+ }
+
+ if (args.raw_test_num &&
+ (args.raw_test_num < 1 ||
+ args.raw_test_num > ARRAY_SIZE(raw_tests))) {
+ fprintf(stderr, "BTF raw test number must be [1 - %zu]\n",
+ ARRAY_SIZE(raw_tests));
+ return -1;
+ }
+
+ if (args.file_test_num &&
+ (args.file_test_num < 1 ||
+ args.file_test_num > ARRAY_SIZE(file_tests))) {
+ fprintf(stderr, "BTF file test number must be [1 - %zu]\n",
+ ARRAY_SIZE(file_tests));
+ return -1;
+ }
+
+ if (args.get_info_test_num &&
+ (args.get_info_test_num < 1 ||
+ args.get_info_test_num > ARRAY_SIZE(get_info_tests))) {
+ fprintf(stderr, "BTF get info test number must be [1 - %zu]\n",
+ ARRAY_SIZE(get_info_tests));
+ return -1;
+ }
+
+ return 0;
+}
+
+static void print_summary(void)
+{
+ fprintf(stderr, "PASS:%u SKIP:%u FAIL:%u\n",
+ pass_cnt - skip_cnt, skip_cnt, error_cnt);
+}
+
+int main(int argc, char **argv)
+{
+ int err = 0;
+
+ err = parse_args(argc, argv);
+ if (err)
+ return err;
+
+ if (args.always_log)
+ libbpf_set_print(__base_pr, __base_pr, __base_pr);
+
+ if (args.raw_test)
+ err |= test_raw();
+
+ if (args.get_info_test)
+ err |= test_get_info();
+
+ if (args.file_test)
+ err |= test_file();
+
+ if (args.pprint_test)
+ err |= count_result(test_pprint());
+
+ if (args.raw_test || args.get_info_test || args.file_test ||
+ args.pprint_test)
+ goto done;
+
+ err |= test_raw();
+ err |= test_get_info();
+ err |= test_file();
+
+done:
+ print_summary();
+ return err;
+}
diff --git a/tools/testing/selftests/bpf/test_btf_haskv.c b/tools/testing/selftests/bpf/test_btf_haskv.c
new file mode 100644
index 0000000..8c7ca09
--- /dev/null
+++ b/tools/testing/selftests/bpf/test_btf_haskv.c
@@ -0,0 +1,48 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/* Copyright (c) 2018 Facebook */
+#include <linux/bpf.h>
+#include "bpf_helpers.h"
+
+int _version SEC("version") = 1;
+
+struct ipv_counts {
+ unsigned int v4;
+ unsigned int v6;
+};
+
+typedef int btf_map_key;
+typedef struct ipv_counts btf_map_value;
+btf_map_key dumm_key;
+btf_map_value dummy_value;
+
+struct bpf_map_def SEC("maps") btf_map = {
+ .type = BPF_MAP_TYPE_ARRAY,
+ .key_size = sizeof(int),
+ .value_size = sizeof(struct ipv_counts),
+ .max_entries = 4,
+};
+
+struct dummy_tracepoint_args {
+ unsigned long long pad;
+ struct sock *sock;
+};
+
+SEC("dummy_tracepoint")
+int _dummy_tracepoint(struct dummy_tracepoint_args *arg)
+{
+ struct ipv_counts *counts;
+ int key = 0;
+
+ if (!arg->sock)
+ return 0;
+
+ counts = bpf_map_lookup_elem(&btf_map, &key);
+ if (!counts)
+ return 0;
+
+ counts->v6++;
+
+ return 0;
+}
+
+char _license[] SEC("license") = "GPL";
diff --git a/tools/testing/selftests/bpf/test_btf_nokv.c b/tools/testing/selftests/bpf/test_btf_nokv.c
new file mode 100644
index 0000000..0ed8e08
--- /dev/null
+++ b/tools/testing/selftests/bpf/test_btf_nokv.c
@@ -0,0 +1,43 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/* Copyright (c) 2018 Facebook */
+#include <linux/bpf.h>
+#include "bpf_helpers.h"
+
+int _version SEC("version") = 1;
+
+struct ipv_counts {
+ unsigned int v4;
+ unsigned int v6;
+};
+
+struct bpf_map_def SEC("maps") btf_map = {
+ .type = BPF_MAP_TYPE_ARRAY,
+ .key_size = sizeof(int),
+ .value_size = sizeof(struct ipv_counts),
+ .max_entries = 4,
+};
+
+struct dummy_tracepoint_args {
+ unsigned long long pad;
+ struct sock *sock;
+};
+
+SEC("dummy_tracepoint")
+int _dummy_tracepoint(struct dummy_tracepoint_args *arg)
+{
+ struct ipv_counts *counts;
+ int key = 0;
+
+ if (!arg->sock)
+ return 0;
+
+ counts = bpf_map_lookup_elem(&btf_map, &key);
+ if (!counts)
+ return 0;
+
+ counts->v6++;
+
+ return 0;
+}
+
+char _license[] SEC("license") = "GPL";
diff --git a/tools/testing/selftests/bpf/test_get_stack_rawtp.c b/tools/testing/selftests/bpf/test_get_stack_rawtp.c
new file mode 100644
index 0000000..f6d9f23
--- /dev/null
+++ b/tools/testing/selftests/bpf/test_get_stack_rawtp.c
@@ -0,0 +1,102 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include <linux/bpf.h>
+#include "bpf_helpers.h"
+
+/* Permit pretty deep stack traces */
+#define MAX_STACK_RAWTP 100
+struct stack_trace_t {
+ int pid;
+ int kern_stack_size;
+ int user_stack_size;
+ int user_stack_buildid_size;
+ __u64 kern_stack[MAX_STACK_RAWTP];
+ __u64 user_stack[MAX_STACK_RAWTP];
+ struct bpf_stack_build_id user_stack_buildid[MAX_STACK_RAWTP];
+};
+
+struct bpf_map_def SEC("maps") perfmap = {
+ .type = BPF_MAP_TYPE_PERF_EVENT_ARRAY,
+ .key_size = sizeof(int),
+ .value_size = sizeof(__u32),
+ .max_entries = 2,
+};
+
+struct bpf_map_def SEC("maps") stackdata_map = {
+ .type = BPF_MAP_TYPE_PERCPU_ARRAY,
+ .key_size = sizeof(__u32),
+ .value_size = sizeof(struct stack_trace_t),
+ .max_entries = 1,
+};
+
+/* Allocate per-cpu space twice the needed. For the code below
+ * usize = bpf_get_stack(ctx, raw_data, max_len, BPF_F_USER_STACK);
+ * if (usize < 0)
+ * return 0;
+ * ksize = bpf_get_stack(ctx, raw_data + usize, max_len - usize, 0);
+ *
+ * If we have value_size = MAX_STACK_RAWTP * sizeof(__u64),
+ * verifier will complain that access "raw_data + usize"
+ * with size "max_len - usize" may be out of bound.
+ * The maximum "raw_data + usize" is "raw_data + max_len"
+ * and the maximum "max_len - usize" is "max_len", verifier
+ * concludes that the maximum buffer access range is
+ * "raw_data[0...max_len * 2 - 1]" and hence reject the program.
+ *
+ * Doubling the to-be-used max buffer size can fix this verifier
+ * issue and avoid complicated C programming massaging.
+ * This is an acceptable workaround since there is one entry here.
+ */
+struct bpf_map_def SEC("maps") rawdata_map = {
+ .type = BPF_MAP_TYPE_PERCPU_ARRAY,
+ .key_size = sizeof(__u32),
+ .value_size = MAX_STACK_RAWTP * sizeof(__u64) * 2,
+ .max_entries = 1,
+};
+
+SEC("tracepoint/raw_syscalls/sys_enter")
+int bpf_prog1(void *ctx)
+{
+ int max_len, max_buildid_len, usize, ksize, total_size;
+ struct stack_trace_t *data;
+ void *raw_data;
+ __u32 key = 0;
+
+ data = bpf_map_lookup_elem(&stackdata_map, &key);
+ if (!data)
+ return 0;
+
+ max_len = MAX_STACK_RAWTP * sizeof(__u64);
+ max_buildid_len = MAX_STACK_RAWTP * sizeof(struct bpf_stack_build_id);
+ data->pid = bpf_get_current_pid_tgid();
+ data->kern_stack_size = bpf_get_stack(ctx, data->kern_stack,
+ max_len, 0);
+ data->user_stack_size = bpf_get_stack(ctx, data->user_stack, max_len,
+ BPF_F_USER_STACK);
+ data->user_stack_buildid_size = bpf_get_stack(
+ ctx, data->user_stack_buildid, max_buildid_len,
+ BPF_F_USER_STACK | BPF_F_USER_BUILD_ID);
+ bpf_perf_event_output(ctx, &perfmap, 0, data, sizeof(*data));
+
+ /* write both kernel and user stacks to the same buffer */
+ raw_data = bpf_map_lookup_elem(&rawdata_map, &key);
+ if (!raw_data)
+ return 0;
+
+ usize = bpf_get_stack(ctx, raw_data, max_len, BPF_F_USER_STACK);
+ if (usize < 0)
+ return 0;
+
+ ksize = bpf_get_stack(ctx, raw_data + usize, max_len - usize, 0);
+ if (ksize < 0)
+ return 0;
+
+ total_size = usize + ksize;
+ if (total_size > 0 && total_size <= max_len)
+ bpf_perf_event_output(ctx, &perfmap, 0, raw_data, total_size);
+
+ return 0;
+}
+
+char _license[] SEC("license") = "GPL";
+__u32 _version SEC("version") = 1; /* ignored by tracepoints, required by libbpf.a */
diff --git a/tools/testing/selftests/bpf/test_lwt_seg6local.c b/tools/testing/selftests/bpf/test_lwt_seg6local.c
new file mode 100644
index 0000000..0575751
--- /dev/null
+++ b/tools/testing/selftests/bpf/test_lwt_seg6local.c
@@ -0,0 +1,437 @@
+#include <stddef.h>
+#include <inttypes.h>
+#include <errno.h>
+#include <linux/seg6_local.h>
+#include <linux/bpf.h>
+#include "bpf_helpers.h"
+#include "bpf_endian.h"
+
+#define bpf_printk(fmt, ...) \
+({ \
+ char ____fmt[] = fmt; \
+ bpf_trace_printk(____fmt, sizeof(____fmt), \
+ ##__VA_ARGS__); \
+})
+
+/* Packet parsing state machine helpers. */
+#define cursor_advance(_cursor, _len) \
+ ({ void *_tmp = _cursor; _cursor += _len; _tmp; })
+
+#define SR6_FLAG_ALERT (1 << 4)
+
+#define htonll(x) ((bpf_htonl(1)) == 1 ? (x) : ((uint64_t)bpf_htonl((x) & \
+ 0xFFFFFFFF) << 32) | bpf_htonl((x) >> 32))
+#define ntohll(x) ((bpf_ntohl(1)) == 1 ? (x) : ((uint64_t)bpf_ntohl((x) & \
+ 0xFFFFFFFF) << 32) | bpf_ntohl((x) >> 32))
+#define BPF_PACKET_HEADER __attribute__((packed))
+
+struct ip6_t {
+ unsigned int ver:4;
+ unsigned int priority:8;
+ unsigned int flow_label:20;
+ unsigned short payload_len;
+ unsigned char next_header;
+ unsigned char hop_limit;
+ unsigned long long src_hi;
+ unsigned long long src_lo;
+ unsigned long long dst_hi;
+ unsigned long long dst_lo;
+} BPF_PACKET_HEADER;
+
+struct ip6_addr_t {
+ unsigned long long hi;
+ unsigned long long lo;
+} BPF_PACKET_HEADER;
+
+struct ip6_srh_t {
+ unsigned char nexthdr;
+ unsigned char hdrlen;
+ unsigned char type;
+ unsigned char segments_left;
+ unsigned char first_segment;
+ unsigned char flags;
+ unsigned short tag;
+
+ struct ip6_addr_t segments[0];
+} BPF_PACKET_HEADER;
+
+struct sr6_tlv_t {
+ unsigned char type;
+ unsigned char len;
+ unsigned char value[0];
+} BPF_PACKET_HEADER;
+
+__attribute__((always_inline)) struct ip6_srh_t *get_srh(struct __sk_buff *skb)
+{
+ void *cursor, *data_end;
+ struct ip6_srh_t *srh;
+ struct ip6_t *ip;
+ uint8_t *ipver;
+
+ data_end = (void *)(long)skb->data_end;
+ cursor = (void *)(long)skb->data;
+ ipver = (uint8_t *)cursor;
+
+ if ((void *)ipver + sizeof(*ipver) > data_end)
+ return NULL;
+
+ if ((*ipver >> 4) != 6)
+ return NULL;
+
+ ip = cursor_advance(cursor, sizeof(*ip));
+ if ((void *)ip + sizeof(*ip) > data_end)
+ return NULL;
+
+ if (ip->next_header != 43)
+ return NULL;
+
+ srh = cursor_advance(cursor, sizeof(*srh));
+ if ((void *)srh + sizeof(*srh) > data_end)
+ return NULL;
+
+ if (srh->type != 4)
+ return NULL;
+
+ return srh;
+}
+
+__attribute__((always_inline))
+int update_tlv_pad(struct __sk_buff *skb, uint32_t new_pad,
+ uint32_t old_pad, uint32_t pad_off)
+{
+ int err;
+
+ if (new_pad != old_pad) {
+ err = bpf_lwt_seg6_adjust_srh(skb, pad_off,
+ (int) new_pad - (int) old_pad);
+ if (err)
+ return err;
+ }
+
+ if (new_pad > 0) {
+ char pad_tlv_buf[16] = {0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
+ 0, 0, 0};
+ struct sr6_tlv_t *pad_tlv = (struct sr6_tlv_t *) pad_tlv_buf;
+
+ pad_tlv->type = SR6_TLV_PADDING;
+ pad_tlv->len = new_pad - 2;
+
+ err = bpf_lwt_seg6_store_bytes(skb, pad_off,
+ (void *)pad_tlv_buf, new_pad);
+ if (err)
+ return err;
+ }
+
+ return 0;
+}
+
+__attribute__((always_inline))
+int is_valid_tlv_boundary(struct __sk_buff *skb, struct ip6_srh_t *srh,
+ uint32_t *tlv_off, uint32_t *pad_size,
+ uint32_t *pad_off)
+{
+ uint32_t srh_off, cur_off;
+ int offset_valid = 0;
+ int err;
+
+ srh_off = (char *)srh - (char *)(long)skb->data;
+ // cur_off = end of segments, start of possible TLVs
+ cur_off = srh_off + sizeof(*srh) +
+ sizeof(struct ip6_addr_t) * (srh->first_segment + 1);
+
+ *pad_off = 0;
+
+ // we can only go as far as ~10 TLVs due to the BPF max stack size
+ #pragma clang loop unroll(full)
+ for (int i = 0; i < 10; i++) {
+ struct sr6_tlv_t tlv;
+
+ if (cur_off == *tlv_off)
+ offset_valid = 1;
+
+ if (cur_off >= srh_off + ((srh->hdrlen + 1) << 3))
+ break;
+
+ err = bpf_skb_load_bytes(skb, cur_off, &tlv, sizeof(tlv));
+ if (err)
+ return err;
+
+ if (tlv.type == SR6_TLV_PADDING) {
+ *pad_size = tlv.len + sizeof(tlv);
+ *pad_off = cur_off;
+
+ if (*tlv_off == srh_off) {
+ *tlv_off = cur_off;
+ offset_valid = 1;
+ }
+ break;
+
+ } else if (tlv.type == SR6_TLV_HMAC) {
+ break;
+ }
+
+ cur_off += sizeof(tlv) + tlv.len;
+ } // we reached the padding or HMAC TLVs, or the end of the SRH
+
+ if (*pad_off == 0)
+ *pad_off = cur_off;
+
+ if (*tlv_off == -1)
+ *tlv_off = cur_off;
+ else if (!offset_valid)
+ return -EINVAL;
+
+ return 0;
+}
+
+__attribute__((always_inline))
+int add_tlv(struct __sk_buff *skb, struct ip6_srh_t *srh, uint32_t tlv_off,
+ struct sr6_tlv_t *itlv, uint8_t tlv_size)
+{
+ uint32_t srh_off = (char *)srh - (char *)(long)skb->data;
+ uint8_t len_remaining, new_pad;
+ uint32_t pad_off = 0;
+ uint32_t pad_size = 0;
+ uint32_t partial_srh_len;
+ int err;
+
+ if (tlv_off != -1)
+ tlv_off += srh_off;
+
+ if (itlv->type == SR6_TLV_PADDING || itlv->type == SR6_TLV_HMAC)
+ return -EINVAL;
+
+ err = is_valid_tlv_boundary(skb, srh, &tlv_off, &pad_size, &pad_off);
+ if (err)
+ return err;
+
+ err = bpf_lwt_seg6_adjust_srh(skb, tlv_off, sizeof(*itlv) + itlv->len);
+ if (err)
+ return err;
+
+ err = bpf_lwt_seg6_store_bytes(skb, tlv_off, (void *)itlv, tlv_size);
+ if (err)
+ return err;
+
+ // the following can't be moved inside update_tlv_pad because the
+ // bpf verifier has some issues with it
+ pad_off += sizeof(*itlv) + itlv->len;
+ partial_srh_len = pad_off - srh_off;
+ len_remaining = partial_srh_len % 8;
+ new_pad = 8 - len_remaining;
+
+ if (new_pad == 1) // cannot pad for 1 byte only
+ new_pad = 9;
+ else if (new_pad == 8)
+ new_pad = 0;
+
+ return update_tlv_pad(skb, new_pad, pad_size, pad_off);
+}
+
+__attribute__((always_inline))
+int delete_tlv(struct __sk_buff *skb, struct ip6_srh_t *srh,
+ uint32_t tlv_off)
+{
+ uint32_t srh_off = (char *)srh - (char *)(long)skb->data;
+ uint8_t len_remaining, new_pad;
+ uint32_t partial_srh_len;
+ uint32_t pad_off = 0;
+ uint32_t pad_size = 0;
+ struct sr6_tlv_t tlv;
+ int err;
+
+ tlv_off += srh_off;
+
+ err = is_valid_tlv_boundary(skb, srh, &tlv_off, &pad_size, &pad_off);
+ if (err)
+ return err;
+
+ err = bpf_skb_load_bytes(skb, tlv_off, &tlv, sizeof(tlv));
+ if (err)
+ return err;
+
+ err = bpf_lwt_seg6_adjust_srh(skb, tlv_off, -(sizeof(tlv) + tlv.len));
+ if (err)
+ return err;
+
+ pad_off -= sizeof(tlv) + tlv.len;
+ partial_srh_len = pad_off - srh_off;
+ len_remaining = partial_srh_len % 8;
+ new_pad = 8 - len_remaining;
+ if (new_pad == 1) // cannot pad for 1 byte only
+ new_pad = 9;
+ else if (new_pad == 8)
+ new_pad = 0;
+
+ return update_tlv_pad(skb, new_pad, pad_size, pad_off);
+}
+
+__attribute__((always_inline))
+int has_egr_tlv(struct __sk_buff *skb, struct ip6_srh_t *srh)
+{
+ int tlv_offset = sizeof(struct ip6_t) + sizeof(struct ip6_srh_t) +
+ ((srh->first_segment + 1) << 4);
+ struct sr6_tlv_t tlv;
+
+ if (bpf_skb_load_bytes(skb, tlv_offset, &tlv, sizeof(struct sr6_tlv_t)))
+ return 0;
+
+ if (tlv.type == SR6_TLV_EGRESS && tlv.len == 18) {
+ struct ip6_addr_t egr_addr;
+
+ if (bpf_skb_load_bytes(skb, tlv_offset + 4, &egr_addr, 16))
+ return 0;
+
+ // check if egress TLV value is correct
+ if (ntohll(egr_addr.hi) == 0xfd00000000000000 &&
+ ntohll(egr_addr.lo) == 0x4)
+ return 1;
+ }
+
+ return 0;
+}
+
+// This function will push a SRH with segments fd00::1, fd00::2, fd00::3,
+// fd00::4
+SEC("encap_srh")
+int __encap_srh(struct __sk_buff *skb)
+{
+ unsigned long long hi = 0xfd00000000000000;
+ struct ip6_addr_t *seg;
+ struct ip6_srh_t *srh;
+ char srh_buf[72]; // room for 4 segments
+ int err;
+
+ srh = (struct ip6_srh_t *)srh_buf;
+ srh->nexthdr = 0;
+ srh->hdrlen = 8;
+ srh->type = 4;
+ srh->segments_left = 3;
+ srh->first_segment = 3;
+ srh->flags = 0;
+ srh->tag = 0;
+
+ seg = (struct ip6_addr_t *)((char *)srh + sizeof(*srh));
+
+ #pragma clang loop unroll(full)
+ for (unsigned long long lo = 0; lo < 4; lo++) {
+ seg->lo = htonll(4 - lo);
+ seg->hi = htonll(hi);
+ seg = (struct ip6_addr_t *)((char *)seg + sizeof(*seg));
+ }
+
+ err = bpf_lwt_push_encap(skb, 0, (void *)srh, sizeof(srh_buf));
+ if (err)
+ return BPF_DROP;
+
+ return BPF_REDIRECT;
+}
+
+// Add an Egress TLV fc00::4, add the flag A,
+// and apply End.X action to fc42::1
+SEC("add_egr_x")
+int __add_egr_x(struct __sk_buff *skb)
+{
+ unsigned long long hi = 0xfc42000000000000;
+ unsigned long long lo = 0x1;
+ struct ip6_srh_t *srh = get_srh(skb);
+ uint8_t new_flags = SR6_FLAG_ALERT;
+ struct ip6_addr_t addr;
+ int err, offset;
+
+ if (srh == NULL)
+ return BPF_DROP;
+
+ uint8_t tlv[20] = {2, 18, 0, 0, 0xfd, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0,
+ 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x4};
+
+ err = add_tlv(skb, srh, (srh->hdrlen+1) << 3,
+ (struct sr6_tlv_t *)&tlv, 20);
+ if (err)
+ return BPF_DROP;
+
+ offset = sizeof(struct ip6_t) + offsetof(struct ip6_srh_t, flags);
+ err = bpf_lwt_seg6_store_bytes(skb, offset,
+ (void *)&new_flags, sizeof(new_flags));
+ if (err)
+ return BPF_DROP;
+
+ addr.lo = htonll(lo);
+ addr.hi = htonll(hi);
+ err = bpf_lwt_seg6_action(skb, SEG6_LOCAL_ACTION_END_X,
+ (void *)&addr, sizeof(addr));
+ if (err)
+ return BPF_DROP;
+ return BPF_REDIRECT;
+}
+
+// Pop the Egress TLV, reset the flags, change the tag 2442 and finally do a
+// simple End action
+SEC("pop_egr")
+int __pop_egr(struct __sk_buff *skb)
+{
+ struct ip6_srh_t *srh = get_srh(skb);
+ uint16_t new_tag = bpf_htons(2442);
+ uint8_t new_flags = 0;
+ int err, offset;
+
+ if (srh == NULL)
+ return BPF_DROP;
+
+ if (srh->flags != SR6_FLAG_ALERT)
+ return BPF_DROP;
+
+ if (srh->hdrlen != 11) // 4 segments + Egress TLV + Padding TLV
+ return BPF_DROP;
+
+ if (!has_egr_tlv(skb, srh))
+ return BPF_DROP;
+
+ err = delete_tlv(skb, srh, 8 + (srh->first_segment + 1) * 16);
+ if (err)
+ return BPF_DROP;
+
+ offset = sizeof(struct ip6_t) + offsetof(struct ip6_srh_t, flags);
+ if (bpf_lwt_seg6_store_bytes(skb, offset, (void *)&new_flags,
+ sizeof(new_flags)))
+ return BPF_DROP;
+
+ offset = sizeof(struct ip6_t) + offsetof(struct ip6_srh_t, tag);
+ if (bpf_lwt_seg6_store_bytes(skb, offset, (void *)&new_tag,
+ sizeof(new_tag)))
+ return BPF_DROP;
+
+ return BPF_OK;
+}
+
+// Inspect if the Egress TLV and flag have been removed, if the tag is correct,
+// then apply a End.T action to reach the last segment
+SEC("inspect_t")
+int __inspect_t(struct __sk_buff *skb)
+{
+ struct ip6_srh_t *srh = get_srh(skb);
+ int table = 117;
+ int err;
+
+ if (srh == NULL)
+ return BPF_DROP;
+
+ if (srh->flags != 0)
+ return BPF_DROP;
+
+ if (srh->tag != bpf_htons(2442))
+ return BPF_DROP;
+
+ if (srh->hdrlen != 8) // 4 segments
+ return BPF_DROP;
+
+ err = bpf_lwt_seg6_action(skb, SEG6_LOCAL_ACTION_END_T,
+ (void *)&table, sizeof(table));
+
+ if (err)
+ return BPF_DROP;
+
+ return BPF_REDIRECT;
+}
+
+char __license[] SEC("license") = "GPL";
diff --git a/tools/testing/selftests/bpf/test_lwt_seg6local.sh b/tools/testing/selftests/bpf/test_lwt_seg6local.sh
new file mode 100755
index 0000000..1c77994
--- /dev/null
+++ b/tools/testing/selftests/bpf/test_lwt_seg6local.sh
@@ -0,0 +1,140 @@
+#!/bin/bash
+# Connects 6 network namespaces through veths.
+# Each NS may have different IPv6 global scope addresses :
+# NS1 ---- NS2 ---- NS3 ---- NS4 ---- NS5 ---- NS6
+# fb00::1 fd00::1 fd00::2 fd00::3 fb00::6
+# fc42::1 fd00::4
+#
+# All IPv6 packets going to fb00::/16 through NS2 will be encapsulated in a
+# IPv6 header with a Segment Routing Header, with segments :
+# fd00::1 -> fd00::2 -> fd00::3 -> fd00::4
+#
+# 3 fd00::/16 IPv6 addresses are binded to seg6local End.BPF actions :
+# - fd00::1 : add a TLV, change the flags and apply a End.X action to fc42::1
+# - fd00::2 : remove the TLV, change the flags, add a tag
+# - fd00::3 : apply an End.T action to fd00::4, through routing table 117
+#
+# fd00::4 is a simple Segment Routing node decapsulating the inner IPv6 packet.
+# Each End.BPF action will validate the operations applied on the SRH by the
+# previous BPF program in the chain, otherwise the packet is dropped.
+#
+# An UDP datagram is sent from fb00::1 to fb00::6. The test succeeds if this
+# datagram can be read on NS6 when binding to fb00::6.
+
+TMP_FILE="/tmp/selftest_lwt_seg6local.txt"
+
+cleanup()
+{
+ if [ "$?" = "0" ]; then
+ echo "selftests: test_lwt_seg6local [PASS]";
+ else
+ echo "selftests: test_lwt_seg6local [FAILED]";
+ fi
+
+ set +e
+ ip netns del ns1 2> /dev/null
+ ip netns del ns2 2> /dev/null
+ ip netns del ns3 2> /dev/null
+ ip netns del ns4 2> /dev/null
+ ip netns del ns5 2> /dev/null
+ ip netns del ns6 2> /dev/null
+ rm -f $TMP_FILE
+}
+
+set -e
+
+ip netns add ns1
+ip netns add ns2
+ip netns add ns3
+ip netns add ns4
+ip netns add ns5
+ip netns add ns6
+
+trap cleanup 0 2 3 6 9
+
+ip link add veth1 type veth peer name veth2
+ip link add veth3 type veth peer name veth4
+ip link add veth5 type veth peer name veth6
+ip link add veth7 type veth peer name veth8
+ip link add veth9 type veth peer name veth10
+
+ip link set veth1 netns ns1
+ip link set veth2 netns ns2
+ip link set veth3 netns ns2
+ip link set veth4 netns ns3
+ip link set veth5 netns ns3
+ip link set veth6 netns ns4
+ip link set veth7 netns ns4
+ip link set veth8 netns ns5
+ip link set veth9 netns ns5
+ip link set veth10 netns ns6
+
+ip netns exec ns1 ip link set dev veth1 up
+ip netns exec ns2 ip link set dev veth2 up
+ip netns exec ns2 ip link set dev veth3 up
+ip netns exec ns3 ip link set dev veth4 up
+ip netns exec ns3 ip link set dev veth5 up
+ip netns exec ns4 ip link set dev veth6 up
+ip netns exec ns4 ip link set dev veth7 up
+ip netns exec ns5 ip link set dev veth8 up
+ip netns exec ns5 ip link set dev veth9 up
+ip netns exec ns6 ip link set dev veth10 up
+ip netns exec ns6 ip link set dev lo up
+
+# All link scope addresses and routes required between veths
+ip netns exec ns1 ip -6 addr add fb00::12/16 dev veth1 scope link
+ip netns exec ns1 ip -6 route add fb00::21 dev veth1 scope link
+ip netns exec ns2 ip -6 addr add fb00::21/16 dev veth2 scope link
+ip netns exec ns2 ip -6 addr add fb00::34/16 dev veth3 scope link
+ip netns exec ns2 ip -6 route add fb00::43 dev veth3 scope link
+ip netns exec ns3 ip -6 route add fb00::65 dev veth5 scope link
+ip netns exec ns3 ip -6 addr add fb00::43/16 dev veth4 scope link
+ip netns exec ns3 ip -6 addr add fb00::56/16 dev veth5 scope link
+ip netns exec ns4 ip -6 addr add fb00::65/16 dev veth6 scope link
+ip netns exec ns4 ip -6 addr add fb00::78/16 dev veth7 scope link
+ip netns exec ns4 ip -6 route add fb00::87 dev veth7 scope link
+ip netns exec ns5 ip -6 addr add fb00::87/16 dev veth8 scope link
+ip netns exec ns5 ip -6 addr add fb00::910/16 dev veth9 scope link
+ip netns exec ns5 ip -6 route add fb00::109 dev veth9 scope link
+ip netns exec ns5 ip -6 route add fb00::109 table 117 dev veth9 scope link
+ip netns exec ns6 ip -6 addr add fb00::109/16 dev veth10 scope link
+
+ip netns exec ns1 ip -6 addr add fb00::1/16 dev lo
+ip netns exec ns1 ip -6 route add fb00::6 dev veth1 via fb00::21
+
+ip netns exec ns2 ip -6 route add fb00::6 encap bpf in obj test_lwt_seg6local.o sec encap_srh dev veth2
+ip netns exec ns2 ip -6 route add fd00::1 dev veth3 via fb00::43 scope link
+
+ip netns exec ns3 ip -6 route add fc42::1 dev veth5 via fb00::65
+ip netns exec ns3 ip -6 route add fd00::1 encap seg6local action End.BPF obj test_lwt_seg6local.o sec add_egr_x dev veth4
+
+ip netns exec ns4 ip -6 route add fd00::2 encap seg6local action End.BPF obj test_lwt_seg6local.o sec pop_egr dev veth6
+ip netns exec ns4 ip -6 addr add fc42::1 dev lo
+ip netns exec ns4 ip -6 route add fd00::3 dev veth7 via fb00::87
+
+ip netns exec ns5 ip -6 route add fd00::4 table 117 dev veth9 via fb00::109
+ip netns exec ns5 ip -6 route add fd00::3 encap seg6local action End.BPF obj test_lwt_seg6local.o sec inspect_t dev veth8
+
+ip netns exec ns6 ip -6 addr add fb00::6/16 dev lo
+ip netns exec ns6 ip -6 addr add fd00::4/16 dev lo
+
+ip netns exec ns1 sysctl net.ipv6.conf.all.forwarding=1 > /dev/null
+ip netns exec ns2 sysctl net.ipv6.conf.all.forwarding=1 > /dev/null
+ip netns exec ns3 sysctl net.ipv6.conf.all.forwarding=1 > /dev/null
+ip netns exec ns4 sysctl net.ipv6.conf.all.forwarding=1 > /dev/null
+ip netns exec ns5 sysctl net.ipv6.conf.all.forwarding=1 > /dev/null
+
+ip netns exec ns6 sysctl net.ipv6.conf.all.seg6_enabled=1 > /dev/null
+ip netns exec ns6 sysctl net.ipv6.conf.lo.seg6_enabled=1 > /dev/null
+ip netns exec ns6 sysctl net.ipv6.conf.veth10.seg6_enabled=1 > /dev/null
+
+ip netns exec ns6 nc -l -6 -u -d 7330 > $TMP_FILE &
+ip netns exec ns1 bash -c "echo 'foobar' | nc -w0 -6 -u -p 2121 -s fb00::1 fb00::6 7330"
+sleep 5 # wait enough time to ensure the UDP datagram arrived to the last segment
+kill -INT $!
+
+if [[ $(< $TMP_FILE) != "foobar" ]]; then
+ exit 1
+fi
+
+exit 0
diff --git a/tools/testing/selftests/bpf/test_progs.c b/tools/testing/selftests/bpf/test_progs.c
index 4123d0a..0ef6820 100644
--- a/tools/testing/selftests/bpf/test_progs.c
+++ b/tools/testing/selftests/bpf/test_progs.c
@@ -38,8 +38,10 @@
#include "bpf_util.h"
#include "bpf_endian.h"
#include "bpf_rlimit.h"
+#include "trace_helpers.h"
static int error_cnt, pass_cnt;
+static bool jit_enabled;
#define MAGIC_BYTES 123
@@ -166,6 +168,37 @@
bpf_object__close(obj);
}
+static void test_xdp_adjust_tail(void)
+{
+ const char *file = "./test_adjust_tail.o";
+ struct bpf_object *obj;
+ char buf[128];
+ __u32 duration, retval, size;
+ int err, prog_fd;
+
+ err = bpf_prog_load(file, BPF_PROG_TYPE_XDP, &obj, &prog_fd);
+ if (err) {
+ error_cnt++;
+ return;
+ }
+
+ err = bpf_prog_test_run(prog_fd, 1, &pkt_v4, sizeof(pkt_v4),
+ buf, &size, &retval, &duration);
+
+ CHECK(err || errno || retval != XDP_DROP,
+ "ipv4", "err %d errno %d retval %d size %d\n",
+ err, errno, retval, size);
+
+ err = bpf_prog_test_run(prog_fd, 1, &pkt_v6, sizeof(pkt_v6),
+ buf, &size, &retval, &duration);
+ CHECK(err || errno || retval != XDP_TX || size != 54,
+ "ipv6", "err %d errno %d retval %d size %d\n",
+ err, errno, retval, size);
+ bpf_object__close(obj);
+}
+
+
+
#define MAGIC_VAL 0x1234
#define NUM_ITER 100000
#define VIP_NUM 5
@@ -360,13 +393,30 @@
return (__u64) (unsigned long) ptr;
}
+static bool is_jit_enabled(void)
+{
+ const char *jit_sysctl = "/proc/sys/net/core/bpf_jit_enable";
+ bool enabled = false;
+ int sysctl_fd;
+
+ sysctl_fd = open(jit_sysctl, 0, O_RDONLY);
+ if (sysctl_fd != -1) {
+ char tmpc;
+
+ if (read(sysctl_fd, &tmpc, sizeof(tmpc)) == 1)
+ enabled = (tmpc != '0');
+ close(sysctl_fd);
+ }
+
+ return enabled;
+}
+
static void test_bpf_obj_id(void)
{
const __u64 array_magic_value = 0xfaceb00c;
const __u32 array_key = 0;
const int nr_iters = 2;
const char *file = "./test_obj_id.o";
- const char *jit_sysctl = "/proc/sys/net/core/bpf_jit_enable";
const char *expected_prog_name = "test_obj_id";
const char *expected_map_name = "test_map_id";
const __u64 nsec_per_sec = 1000000000;
@@ -383,20 +433,11 @@
char jited_insns[128], xlated_insns[128], zeros[128];
__u32 i, next_id, info_len, nr_id_found, duration = 0;
struct timespec real_time_ts, boot_time_ts;
- int sysctl_fd, jit_enabled = 0, err = 0;
+ int err = 0;
__u64 array_value;
uid_t my_uid = getuid();
time_t now, load_time;
- sysctl_fd = open(jit_sysctl, 0, O_RDONLY);
- if (sysctl_fd != -1) {
- char tmpc;
-
- if (read(sysctl_fd, &tmpc, sizeof(tmpc)) == 1)
- jit_enabled = (tmpc != '0');
- close(sysctl_fd);
- }
-
err = bpf_prog_get_fd_by_id(0);
CHECK(err >= 0 || errno != ENOENT,
"get-fd-by-notexist-prog-id", "err %d errno %d\n", err, errno);
@@ -865,11 +906,47 @@
return 0;
}
+static int compare_stack_ips(int smap_fd, int amap_fd, int stack_trace_len)
+{
+ __u32 key, next_key, *cur_key_p, *next_key_p;
+ char *val_buf1, *val_buf2;
+ int i, err = 0;
+
+ val_buf1 = malloc(stack_trace_len);
+ val_buf2 = malloc(stack_trace_len);
+ cur_key_p = NULL;
+ next_key_p = &key;
+ while (bpf_map_get_next_key(smap_fd, cur_key_p, next_key_p) == 0) {
+ err = bpf_map_lookup_elem(smap_fd, next_key_p, val_buf1);
+ if (err)
+ goto out;
+ err = bpf_map_lookup_elem(amap_fd, next_key_p, val_buf2);
+ if (err)
+ goto out;
+ for (i = 0; i < stack_trace_len; i++) {
+ if (val_buf1[i] != val_buf2[i]) {
+ err = -1;
+ goto out;
+ }
+ }
+ key = *next_key_p;
+ cur_key_p = &key;
+ next_key_p = &next_key;
+ }
+ if (errno != ENOENT)
+ err = -1;
+
+out:
+ free(val_buf1);
+ free(val_buf2);
+ return err;
+}
+
static void test_stacktrace_map()
{
- int control_map_fd, stackid_hmap_fd, stackmap_fd;
+ int control_map_fd, stackid_hmap_fd, stackmap_fd, stack_amap_fd;
const char *file = "./test_stacktrace_map.o";
- int bytes, efd, err, pmu_fd, prog_fd;
+ int bytes, efd, err, pmu_fd, prog_fd, stack_trace_len;
struct perf_event_attr attr = {};
__u32 key, val, duration = 0;
struct bpf_object *obj;
@@ -925,6 +1002,10 @@
if (stackmap_fd < 0)
goto disable_pmu;
+ stack_amap_fd = bpf_find_map(__func__, obj, "stack_amap");
+ if (stack_amap_fd < 0)
+ goto disable_pmu;
+
/* give some time for bpf program run */
sleep(1);
@@ -946,6 +1027,12 @@
"err %d errno %d\n", err, errno))
goto disable_pmu_noerr;
+ stack_trace_len = PERF_MAX_STACK_DEPTH * sizeof(__u64);
+ err = compare_stack_ips(stackmap_fd, stack_amap_fd, stack_trace_len);
+ if (CHECK(err, "compare_stack_ips stackmap vs. stack_amap",
+ "err %d errno %d\n", err, errno))
+ goto disable_pmu_noerr;
+
goto disable_pmu_noerr;
disable_pmu:
error_cnt++;
@@ -1039,9 +1126,9 @@
static void test_stacktrace_build_id(void)
{
- int control_map_fd, stackid_hmap_fd, stackmap_fd;
+ int control_map_fd, stackid_hmap_fd, stackmap_fd, stack_amap_fd;
const char *file = "./test_stacktrace_build_id.o";
- int bytes, efd, err, pmu_fd, prog_fd;
+ int bytes, efd, err, pmu_fd, prog_fd, stack_trace_len;
struct perf_event_attr attr = {};
__u32 key, previous_key, val, duration = 0;
struct bpf_object *obj;
@@ -1106,6 +1193,11 @@
err, errno))
goto disable_pmu;
+ stack_amap_fd = bpf_find_map(__func__, obj, "stack_amap");
+ if (CHECK(stack_amap_fd < 0, "bpf_find_map stack_amap",
+ "err %d errno %d\n", err, errno))
+ goto disable_pmu;
+
assert(system("dd if=/dev/urandom of=/dev/zero count=4 2> /dev/null")
== 0);
assert(system("./urandom_read") == 0);
@@ -1157,8 +1249,15 @@
previous_key = key;
} while (bpf_map_get_next_key(stackmap_fd, &previous_key, &key) == 0);
- CHECK(build_id_matches < 1, "build id match",
- "Didn't find expected build ID from the map\n");
+ if (CHECK(build_id_matches < 1, "build id match",
+ "Didn't find expected build ID from the map\n"))
+ goto disable_pmu;
+
+ stack_trace_len = PERF_MAX_STACK_DEPTH
+ * sizeof(struct bpf_stack_build_id);
+ err = compare_stack_ips(stackmap_fd, stack_amap_fd, stack_trace_len);
+ CHECK(err, "compare_stack_ips stackmap vs. stack_amap",
+ "err %d errno %d\n", err, errno);
disable_pmu:
ioctl(pmu_fd, PERF_EVENT_IOC_DISABLE);
@@ -1173,10 +1272,439 @@
return;
}
+static void test_stacktrace_build_id_nmi(void)
+{
+ int control_map_fd, stackid_hmap_fd, stackmap_fd, stack_amap_fd;
+ const char *file = "./test_stacktrace_build_id.o";
+ int err, pmu_fd, prog_fd;
+ struct perf_event_attr attr = {
+ .sample_freq = 5000,
+ .freq = 1,
+ .type = PERF_TYPE_HARDWARE,
+ .config = PERF_COUNT_HW_CPU_CYCLES,
+ };
+ __u32 key, previous_key, val, duration = 0;
+ struct bpf_object *obj;
+ char buf[256];
+ int i, j;
+ struct bpf_stack_build_id id_offs[PERF_MAX_STACK_DEPTH];
+ int build_id_matches = 0;
+
+ err = bpf_prog_load(file, BPF_PROG_TYPE_PERF_EVENT, &obj, &prog_fd);
+ if (CHECK(err, "prog_load", "err %d errno %d\n", err, errno))
+ return;
+
+ pmu_fd = syscall(__NR_perf_event_open, &attr, -1 /* pid */,
+ 0 /* cpu 0 */, -1 /* group id */,
+ 0 /* flags */);
+ if (CHECK(pmu_fd < 0, "perf_event_open",
+ "err %d errno %d. Does the test host support PERF_COUNT_HW_CPU_CYCLES?\n",
+ pmu_fd, errno))
+ goto close_prog;
+
+ err = ioctl(pmu_fd, PERF_EVENT_IOC_ENABLE, 0);
+ if (CHECK(err, "perf_event_ioc_enable", "err %d errno %d\n",
+ err, errno))
+ goto close_pmu;
+
+ err = ioctl(pmu_fd, PERF_EVENT_IOC_SET_BPF, prog_fd);
+ if (CHECK(err, "perf_event_ioc_set_bpf", "err %d errno %d\n",
+ err, errno))
+ goto disable_pmu;
+
+ /* find map fds */
+ control_map_fd = bpf_find_map(__func__, obj, "control_map");
+ if (CHECK(control_map_fd < 0, "bpf_find_map control_map",
+ "err %d errno %d\n", err, errno))
+ goto disable_pmu;
+
+ stackid_hmap_fd = bpf_find_map(__func__, obj, "stackid_hmap");
+ if (CHECK(stackid_hmap_fd < 0, "bpf_find_map stackid_hmap",
+ "err %d errno %d\n", err, errno))
+ goto disable_pmu;
+
+ stackmap_fd = bpf_find_map(__func__, obj, "stackmap");
+ if (CHECK(stackmap_fd < 0, "bpf_find_map stackmap", "err %d errno %d\n",
+ err, errno))
+ goto disable_pmu;
+
+ stack_amap_fd = bpf_find_map(__func__, obj, "stack_amap");
+ if (CHECK(stack_amap_fd < 0, "bpf_find_map stack_amap",
+ "err %d errno %d\n", err, errno))
+ goto disable_pmu;
+
+ assert(system("dd if=/dev/urandom of=/dev/zero count=4 2> /dev/null")
+ == 0);
+ assert(system("taskset 0x1 ./urandom_read 100000") == 0);
+ /* disable stack trace collection */
+ key = 0;
+ val = 1;
+ bpf_map_update_elem(control_map_fd, &key, &val, 0);
+
+ /* for every element in stackid_hmap, we can find a corresponding one
+ * in stackmap, and vise versa.
+ */
+ err = compare_map_keys(stackid_hmap_fd, stackmap_fd);
+ if (CHECK(err, "compare_map_keys stackid_hmap vs. stackmap",
+ "err %d errno %d\n", err, errno))
+ goto disable_pmu;
+
+ err = compare_map_keys(stackmap_fd, stackid_hmap_fd);
+ if (CHECK(err, "compare_map_keys stackmap vs. stackid_hmap",
+ "err %d errno %d\n", err, errno))
+ goto disable_pmu;
+
+ err = extract_build_id(buf, 256);
+
+ if (CHECK(err, "get build_id with readelf",
+ "err %d errno %d\n", err, errno))
+ goto disable_pmu;
+
+ err = bpf_map_get_next_key(stackmap_fd, NULL, &key);
+ if (CHECK(err, "get_next_key from stackmap",
+ "err %d, errno %d\n", err, errno))
+ goto disable_pmu;
+
+ do {
+ char build_id[64];
+
+ err = bpf_map_lookup_elem(stackmap_fd, &key, id_offs);
+ if (CHECK(err, "lookup_elem from stackmap",
+ "err %d, errno %d\n", err, errno))
+ goto disable_pmu;
+ for (i = 0; i < PERF_MAX_STACK_DEPTH; ++i)
+ if (id_offs[i].status == BPF_STACK_BUILD_ID_VALID &&
+ id_offs[i].offset != 0) {
+ for (j = 0; j < 20; ++j)
+ sprintf(build_id + 2 * j, "%02x",
+ id_offs[i].build_id[j] & 0xff);
+ if (strstr(buf, build_id) != NULL)
+ build_id_matches = 1;
+ }
+ previous_key = key;
+ } while (bpf_map_get_next_key(stackmap_fd, &previous_key, &key) == 0);
+
+ if (CHECK(build_id_matches < 1, "build id match",
+ "Didn't find expected build ID from the map\n"))
+ goto disable_pmu;
+
+ /*
+ * We intentionally skip compare_stack_ips(). This is because we
+ * only support one in_nmi() ips-to-build_id translation per cpu
+ * at any time, thus stack_amap here will always fallback to
+ * BPF_STACK_BUILD_ID_IP;
+ */
+
+disable_pmu:
+ ioctl(pmu_fd, PERF_EVENT_IOC_DISABLE);
+
+close_pmu:
+ close(pmu_fd);
+
+close_prog:
+ bpf_object__close(obj);
+}
+
+#define MAX_CNT_RAWTP 10ull
+#define MAX_STACK_RAWTP 100
+struct get_stack_trace_t {
+ int pid;
+ int kern_stack_size;
+ int user_stack_size;
+ int user_stack_buildid_size;
+ __u64 kern_stack[MAX_STACK_RAWTP];
+ __u64 user_stack[MAX_STACK_RAWTP];
+ struct bpf_stack_build_id user_stack_buildid[MAX_STACK_RAWTP];
+};
+
+static int get_stack_print_output(void *data, int size)
+{
+ bool good_kern_stack = false, good_user_stack = false;
+ const char *nonjit_func = "___bpf_prog_run";
+ struct get_stack_trace_t *e = data;
+ int i, num_stack;
+ static __u64 cnt;
+ struct ksym *ks;
+
+ cnt++;
+
+ if (size < sizeof(struct get_stack_trace_t)) {
+ __u64 *raw_data = data;
+ bool found = false;
+
+ num_stack = size / sizeof(__u64);
+ /* If jit is enabled, we do not have a good way to
+ * verify the sanity of the kernel stack. So we
+ * just assume it is good if the stack is not empty.
+ * This could be improved in the future.
+ */
+ if (jit_enabled) {
+ found = num_stack > 0;
+ } else {
+ for (i = 0; i < num_stack; i++) {
+ ks = ksym_search(raw_data[i]);
+ if (strcmp(ks->name, nonjit_func) == 0) {
+ found = true;
+ break;
+ }
+ }
+ }
+ if (found) {
+ good_kern_stack = true;
+ good_user_stack = true;
+ }
+ } else {
+ num_stack = e->kern_stack_size / sizeof(__u64);
+ if (jit_enabled) {
+ good_kern_stack = num_stack > 0;
+ } else {
+ for (i = 0; i < num_stack; i++) {
+ ks = ksym_search(e->kern_stack[i]);
+ if (strcmp(ks->name, nonjit_func) == 0) {
+ good_kern_stack = true;
+ break;
+ }
+ }
+ }
+ if (e->user_stack_size > 0 && e->user_stack_buildid_size > 0)
+ good_user_stack = true;
+ }
+ if (!good_kern_stack || !good_user_stack)
+ return LIBBPF_PERF_EVENT_ERROR;
+
+ if (cnt == MAX_CNT_RAWTP)
+ return LIBBPF_PERF_EVENT_DONE;
+
+ return LIBBPF_PERF_EVENT_CONT;
+}
+
+static void test_get_stack_raw_tp(void)
+{
+ const char *file = "./test_get_stack_rawtp.o";
+ int i, efd, err, prog_fd, pmu_fd, perfmap_fd;
+ struct perf_event_attr attr = {};
+ struct timespec tv = {0, 10};
+ __u32 key = 0, duration = 0;
+ struct bpf_object *obj;
+
+ err = bpf_prog_load(file, BPF_PROG_TYPE_RAW_TRACEPOINT, &obj, &prog_fd);
+ if (CHECK(err, "prog_load raw tp", "err %d errno %d\n", err, errno))
+ return;
+
+ efd = bpf_raw_tracepoint_open("sys_enter", prog_fd);
+ if (CHECK(efd < 0, "raw_tp_open", "err %d errno %d\n", efd, errno))
+ goto close_prog;
+
+ perfmap_fd = bpf_find_map(__func__, obj, "perfmap");
+ if (CHECK(perfmap_fd < 0, "bpf_find_map", "err %d errno %d\n",
+ perfmap_fd, errno))
+ goto close_prog;
+
+ err = load_kallsyms();
+ if (CHECK(err < 0, "load_kallsyms", "err %d errno %d\n", err, errno))
+ goto close_prog;
+
+ attr.sample_type = PERF_SAMPLE_RAW;
+ attr.type = PERF_TYPE_SOFTWARE;
+ attr.config = PERF_COUNT_SW_BPF_OUTPUT;
+ pmu_fd = syscall(__NR_perf_event_open, &attr, getpid()/*pid*/, -1/*cpu*/,
+ -1/*group_fd*/, 0);
+ if (CHECK(pmu_fd < 0, "perf_event_open", "err %d errno %d\n", pmu_fd,
+ errno))
+ goto close_prog;
+
+ err = bpf_map_update_elem(perfmap_fd, &key, &pmu_fd, BPF_ANY);
+ if (CHECK(err < 0, "bpf_map_update_elem", "err %d errno %d\n", err,
+ errno))
+ goto close_prog;
+
+ err = ioctl(pmu_fd, PERF_EVENT_IOC_ENABLE, 0);
+ if (CHECK(err < 0, "ioctl PERF_EVENT_IOC_ENABLE", "err %d errno %d\n",
+ err, errno))
+ goto close_prog;
+
+ err = perf_event_mmap(pmu_fd);
+ if (CHECK(err < 0, "perf_event_mmap", "err %d errno %d\n", err, errno))
+ goto close_prog;
+
+ /* trigger some syscall action */
+ for (i = 0; i < MAX_CNT_RAWTP; i++)
+ nanosleep(&tv, NULL);
+
+ err = perf_event_poller(pmu_fd, get_stack_print_output);
+ if (CHECK(err < 0, "perf_event_poller", "err %d errno %d\n", err, errno))
+ goto close_prog;
+
+ goto close_prog_noerr;
+close_prog:
+ error_cnt++;
+close_prog_noerr:
+ bpf_object__close(obj);
+}
+
+static void test_task_fd_query_rawtp(void)
+{
+ const char *file = "./test_get_stack_rawtp.o";
+ __u64 probe_offset, probe_addr;
+ __u32 len, prog_id, fd_type;
+ struct bpf_object *obj;
+ int efd, err, prog_fd;
+ __u32 duration = 0;
+ char buf[256];
+
+ err = bpf_prog_load(file, BPF_PROG_TYPE_RAW_TRACEPOINT, &obj, &prog_fd);
+ if (CHECK(err, "prog_load raw tp", "err %d errno %d\n", err, errno))
+ return;
+
+ efd = bpf_raw_tracepoint_open("sys_enter", prog_fd);
+ if (CHECK(efd < 0, "raw_tp_open", "err %d errno %d\n", efd, errno))
+ goto close_prog;
+
+ /* query (getpid(), efd) */
+ len = sizeof(buf);
+ err = bpf_task_fd_query(getpid(), efd, 0, buf, &len, &prog_id,
+ &fd_type, &probe_offset, &probe_addr);
+ if (CHECK(err < 0, "bpf_task_fd_query", "err %d errno %d\n", err,
+ errno))
+ goto close_prog;
+
+ err = fd_type == BPF_FD_TYPE_RAW_TRACEPOINT &&
+ strcmp(buf, "sys_enter") == 0;
+ if (CHECK(!err, "check_results", "fd_type %d tp_name %s\n",
+ fd_type, buf))
+ goto close_prog;
+
+ /* test zero len */
+ len = 0;
+ err = bpf_task_fd_query(getpid(), efd, 0, buf, &len, &prog_id,
+ &fd_type, &probe_offset, &probe_addr);
+ if (CHECK(err < 0, "bpf_task_fd_query (len = 0)", "err %d errno %d\n",
+ err, errno))
+ goto close_prog;
+ err = fd_type == BPF_FD_TYPE_RAW_TRACEPOINT &&
+ len == strlen("sys_enter");
+ if (CHECK(!err, "check_results", "fd_type %d len %u\n", fd_type, len))
+ goto close_prog;
+
+ /* test empty buffer */
+ len = sizeof(buf);
+ err = bpf_task_fd_query(getpid(), efd, 0, 0, &len, &prog_id,
+ &fd_type, &probe_offset, &probe_addr);
+ if (CHECK(err < 0, "bpf_task_fd_query (buf = 0)", "err %d errno %d\n",
+ err, errno))
+ goto close_prog;
+ err = fd_type == BPF_FD_TYPE_RAW_TRACEPOINT &&
+ len == strlen("sys_enter");
+ if (CHECK(!err, "check_results", "fd_type %d len %u\n", fd_type, len))
+ goto close_prog;
+
+ /* test smaller buffer */
+ len = 3;
+ err = bpf_task_fd_query(getpid(), efd, 0, buf, &len, &prog_id,
+ &fd_type, &probe_offset, &probe_addr);
+ if (CHECK(err >= 0 || errno != ENOSPC, "bpf_task_fd_query (len = 3)",
+ "err %d errno %d\n", err, errno))
+ goto close_prog;
+ err = fd_type == BPF_FD_TYPE_RAW_TRACEPOINT &&
+ len == strlen("sys_enter") &&
+ strcmp(buf, "sy") == 0;
+ if (CHECK(!err, "check_results", "fd_type %d len %u\n", fd_type, len))
+ goto close_prog;
+
+ goto close_prog_noerr;
+close_prog:
+ error_cnt++;
+close_prog_noerr:
+ bpf_object__close(obj);
+}
+
+static void test_task_fd_query_tp_core(const char *probe_name,
+ const char *tp_name)
+{
+ const char *file = "./test_tracepoint.o";
+ int err, bytes, efd, prog_fd, pmu_fd;
+ struct perf_event_attr attr = {};
+ __u64 probe_offset, probe_addr;
+ __u32 len, prog_id, fd_type;
+ struct bpf_object *obj;
+ __u32 duration = 0;
+ char buf[256];
+
+ err = bpf_prog_load(file, BPF_PROG_TYPE_TRACEPOINT, &obj, &prog_fd);
+ if (CHECK(err, "bpf_prog_load", "err %d errno %d\n", err, errno))
+ goto close_prog;
+
+ snprintf(buf, sizeof(buf),
+ "/sys/kernel/debug/tracing/events/%s/id", probe_name);
+ efd = open(buf, O_RDONLY, 0);
+ if (CHECK(efd < 0, "open", "err %d errno %d\n", efd, errno))
+ goto close_prog;
+ bytes = read(efd, buf, sizeof(buf));
+ close(efd);
+ if (CHECK(bytes <= 0 || bytes >= sizeof(buf), "read",
+ "bytes %d errno %d\n", bytes, errno))
+ goto close_prog;
+
+ attr.config = strtol(buf, NULL, 0);
+ attr.type = PERF_TYPE_TRACEPOINT;
+ attr.sample_type = PERF_SAMPLE_RAW;
+ attr.sample_period = 1;
+ attr.wakeup_events = 1;
+ pmu_fd = syscall(__NR_perf_event_open, &attr, -1 /* pid */,
+ 0 /* cpu 0 */, -1 /* group id */,
+ 0 /* flags */);
+ if (CHECK(err, "perf_event_open", "err %d errno %d\n", err, errno))
+ goto close_pmu;
+
+ err = ioctl(pmu_fd, PERF_EVENT_IOC_ENABLE, 0);
+ if (CHECK(err, "perf_event_ioc_enable", "err %d errno %d\n", err,
+ errno))
+ goto close_pmu;
+
+ err = ioctl(pmu_fd, PERF_EVENT_IOC_SET_BPF, prog_fd);
+ if (CHECK(err, "perf_event_ioc_set_bpf", "err %d errno %d\n", err,
+ errno))
+ goto close_pmu;
+
+ /* query (getpid(), pmu_fd) */
+ len = sizeof(buf);
+ err = bpf_task_fd_query(getpid(), pmu_fd, 0, buf, &len, &prog_id,
+ &fd_type, &probe_offset, &probe_addr);
+ if (CHECK(err < 0, "bpf_task_fd_query", "err %d errno %d\n", err,
+ errno))
+ goto close_pmu;
+
+ err = (fd_type == BPF_FD_TYPE_TRACEPOINT) && !strcmp(buf, tp_name);
+ if (CHECK(!err, "check_results", "fd_type %d tp_name %s\n",
+ fd_type, buf))
+ goto close_pmu;
+
+ close(pmu_fd);
+ goto close_prog_noerr;
+
+close_pmu:
+ close(pmu_fd);
+close_prog:
+ error_cnt++;
+close_prog_noerr:
+ bpf_object__close(obj);
+}
+
+static void test_task_fd_query_tp(void)
+{
+ test_task_fd_query_tp_core("sched/sched_switch",
+ "sched_switch");
+ test_task_fd_query_tp_core("syscalls/sys_enter_read",
+ "sys_enter_read");
+}
+
int main(void)
{
+ jit_enabled = is_jit_enabled();
+
test_pkt_access();
test_xdp();
+ test_xdp_adjust_tail();
test_l4lb_all();
test_xdp_noinline();
test_tcp_estats();
@@ -1186,7 +1714,11 @@
test_tp_attach_query();
test_stacktrace_map();
test_stacktrace_build_id();
+ test_stacktrace_build_id_nmi();
test_stacktrace_map_raw_tp();
+ test_get_stack_raw_tp();
+ test_task_fd_query_rawtp();
+ test_task_fd_query_tp();
printf("Summary: %d PASSED, %d FAILED\n", pass_cnt, error_cnt);
return error_cnt ? EXIT_FAILURE : EXIT_SUCCESS;
diff --git a/tools/testing/selftests/bpf/test_sockhash_kern.c b/tools/testing/selftests/bpf/test_sockhash_kern.c
new file mode 100644
index 0000000..e675591
--- /dev/null
+++ b/tools/testing/selftests/bpf/test_sockhash_kern.c
@@ -0,0 +1,5 @@
+// SPDX-License-Identifier: GPL-2.0
+// Copyright (c) 2018 Covalent IO, Inc. http://covalent.io
+#undef SOCKMAP
+#define TEST_MAP_TYPE BPF_MAP_TYPE_SOCKHASH
+#include "./test_sockmap_kern.h"
diff --git a/tools/testing/selftests/bpf/test_sockmap.c b/tools/testing/selftests/bpf/test_sockmap.c
new file mode 100644
index 0000000..eb17fae
--- /dev/null
+++ b/tools/testing/selftests/bpf/test_sockmap.c
@@ -0,0 +1,1477 @@
+// SPDX-License-Identifier: GPL-2.0
+// Copyright (c) 2017-2018 Covalent IO, Inc. http://covalent.io
+#include <stdio.h>
+#include <stdlib.h>
+#include <sys/socket.h>
+#include <sys/ioctl.h>
+#include <sys/select.h>
+#include <netinet/in.h>
+#include <arpa/inet.h>
+#include <unistd.h>
+#include <string.h>
+#include <errno.h>
+#include <sys/ioctl.h>
+#include <stdbool.h>
+#include <signal.h>
+#include <fcntl.h>
+#include <sys/wait.h>
+#include <time.h>
+#include <sched.h>
+
+#include <sys/time.h>
+#include <sys/resource.h>
+#include <sys/types.h>
+#include <sys/sendfile.h>
+
+#include <linux/netlink.h>
+#include <linux/socket.h>
+#include <linux/sock_diag.h>
+#include <linux/bpf.h>
+#include <linux/if_link.h>
+#include <assert.h>
+#include <libgen.h>
+
+#include <getopt.h>
+
+#include <bpf/bpf.h>
+#include <bpf/libbpf.h>
+
+#include "bpf_util.h"
+#include "bpf_rlimit.h"
+#include "cgroup_helpers.h"
+
+int running;
+static void running_handler(int a);
+
+/* randomly selected ports for testing on lo */
+#define S1_PORT 10000
+#define S2_PORT 10001
+
+#define BPF_SOCKMAP_FILENAME "test_sockmap_kern.o"
+#define BPF_SOCKHASH_FILENAME "test_sockhash_kern.o"
+#define CG_PATH "/sockmap"
+
+/* global sockets */
+int s1, s2, c1, c2, p1, p2;
+int test_cnt;
+int passed;
+int failed;
+int map_fd[8];
+struct bpf_map *maps[8];
+int prog_fd[11];
+
+int txmsg_pass;
+int txmsg_noisy;
+int txmsg_redir;
+int txmsg_redir_noisy;
+int txmsg_drop;
+int txmsg_apply;
+int txmsg_cork;
+int txmsg_start;
+int txmsg_end;
+int txmsg_ingress;
+int txmsg_skb;
+
+static const struct option long_options[] = {
+ {"help", no_argument, NULL, 'h' },
+ {"cgroup", required_argument, NULL, 'c' },
+ {"rate", required_argument, NULL, 'r' },
+ {"verbose", no_argument, NULL, 'v' },
+ {"iov_count", required_argument, NULL, 'i' },
+ {"length", required_argument, NULL, 'l' },
+ {"test", required_argument, NULL, 't' },
+ {"data_test", no_argument, NULL, 'd' },
+ {"txmsg", no_argument, &txmsg_pass, 1 },
+ {"txmsg_noisy", no_argument, &txmsg_noisy, 1 },
+ {"txmsg_redir", no_argument, &txmsg_redir, 1 },
+ {"txmsg_redir_noisy", no_argument, &txmsg_redir_noisy, 1},
+ {"txmsg_drop", no_argument, &txmsg_drop, 1 },
+ {"txmsg_apply", required_argument, NULL, 'a'},
+ {"txmsg_cork", required_argument, NULL, 'k'},
+ {"txmsg_start", required_argument, NULL, 's'},
+ {"txmsg_end", required_argument, NULL, 'e'},
+ {"txmsg_ingress", no_argument, &txmsg_ingress, 1 },
+ {"txmsg_skb", no_argument, &txmsg_skb, 1 },
+ {0, 0, NULL, 0 }
+};
+
+static void usage(char *argv[])
+{
+ int i;
+
+ printf(" Usage: %s --cgroup <cgroup_path>\n", argv[0]);
+ printf(" options:\n");
+ for (i = 0; long_options[i].name != 0; i++) {
+ printf(" --%-12s", long_options[i].name);
+ if (long_options[i].flag != NULL)
+ printf(" flag (internal value:%d)\n",
+ *long_options[i].flag);
+ else
+ printf(" -%c\n", long_options[i].val);
+ }
+ printf("\n");
+}
+
+static int sockmap_init_sockets(int verbose)
+{
+ int i, err, one = 1;
+ struct sockaddr_in addr;
+ int *fds[4] = {&s1, &s2, &c1, &c2};
+
+ s1 = s2 = p1 = p2 = c1 = c2 = 0;
+
+ /* Init sockets */
+ for (i = 0; i < 4; i++) {
+ *fds[i] = socket(AF_INET, SOCK_STREAM, 0);
+ if (*fds[i] < 0) {
+ perror("socket s1 failed()");
+ return errno;
+ }
+ }
+
+ /* Allow reuse */
+ for (i = 0; i < 2; i++) {
+ err = setsockopt(*fds[i], SOL_SOCKET, SO_REUSEADDR,
+ (char *)&one, sizeof(one));
+ if (err) {
+ perror("setsockopt failed()");
+ return errno;
+ }
+ }
+
+ /* Non-blocking sockets */
+ for (i = 0; i < 2; i++) {
+ err = ioctl(*fds[i], FIONBIO, (char *)&one);
+ if (err < 0) {
+ perror("ioctl s1 failed()");
+ return errno;
+ }
+ }
+
+ /* Bind server sockets */
+ memset(&addr, 0, sizeof(struct sockaddr_in));
+ addr.sin_family = AF_INET;
+ addr.sin_addr.s_addr = inet_addr("127.0.0.1");
+
+ addr.sin_port = htons(S1_PORT);
+ err = bind(s1, (struct sockaddr *)&addr, sizeof(addr));
+ if (err < 0) {
+ perror("bind s1 failed()\n");
+ return errno;
+ }
+
+ addr.sin_port = htons(S2_PORT);
+ err = bind(s2, (struct sockaddr *)&addr, sizeof(addr));
+ if (err < 0) {
+ perror("bind s2 failed()\n");
+ return errno;
+ }
+
+ /* Listen server sockets */
+ addr.sin_port = htons(S1_PORT);
+ err = listen(s1, 32);
+ if (err < 0) {
+ perror("listen s1 failed()\n");
+ return errno;
+ }
+
+ addr.sin_port = htons(S2_PORT);
+ err = listen(s2, 32);
+ if (err < 0) {
+ perror("listen s1 failed()\n");
+ return errno;
+ }
+
+ /* Initiate Connect */
+ addr.sin_port = htons(S1_PORT);
+ err = connect(c1, (struct sockaddr *)&addr, sizeof(addr));
+ if (err < 0 && errno != EINPROGRESS) {
+ perror("connect c1 failed()\n");
+ return errno;
+ }
+
+ addr.sin_port = htons(S2_PORT);
+ err = connect(c2, (struct sockaddr *)&addr, sizeof(addr));
+ if (err < 0 && errno != EINPROGRESS) {
+ perror("connect c2 failed()\n");
+ return errno;
+ } else if (err < 0) {
+ err = 0;
+ }
+
+ /* Accept Connecrtions */
+ p1 = accept(s1, NULL, NULL);
+ if (p1 < 0) {
+ perror("accept s1 failed()\n");
+ return errno;
+ }
+
+ p2 = accept(s2, NULL, NULL);
+ if (p2 < 0) {
+ perror("accept s1 failed()\n");
+ return errno;
+ }
+
+ if (verbose) {
+ printf("connected sockets: c1 <-> p1, c2 <-> p2\n");
+ printf("cgroups binding: c1(%i) <-> s1(%i) - - - c2(%i) <-> s2(%i)\n",
+ c1, s1, c2, s2);
+ }
+ return 0;
+}
+
+struct msg_stats {
+ size_t bytes_sent;
+ size_t bytes_recvd;
+ struct timespec start;
+ struct timespec end;
+};
+
+struct sockmap_options {
+ int verbose;
+ bool base;
+ bool sendpage;
+ bool data_test;
+ bool drop_expected;
+ int iov_count;
+ int iov_length;
+ int rate;
+};
+
+static int msg_loop_sendpage(int fd, int iov_length, int cnt,
+ struct msg_stats *s,
+ struct sockmap_options *opt)
+{
+ bool drop = opt->drop_expected;
+ unsigned char k = 0;
+ FILE *file;
+ int i, fp;
+
+ file = fopen(".sendpage_tst.tmp", "w+");
+ for (i = 0; i < iov_length * cnt; i++, k++)
+ fwrite(&k, sizeof(char), 1, file);
+ fflush(file);
+ fseek(file, 0, SEEK_SET);
+ fclose(file);
+
+ fp = open(".sendpage_tst.tmp", O_RDONLY);
+ clock_gettime(CLOCK_MONOTONIC, &s->start);
+ for (i = 0; i < cnt; i++) {
+ int sent = sendfile(fd, fp, NULL, iov_length);
+
+ if (!drop && sent < 0) {
+ perror("send loop error:");
+ close(fp);
+ return sent;
+ } else if (drop && sent >= 0) {
+ printf("sendpage loop error expected: %i\n", sent);
+ close(fp);
+ return -EIO;
+ }
+
+ if (sent > 0)
+ s->bytes_sent += sent;
+ }
+ clock_gettime(CLOCK_MONOTONIC, &s->end);
+ close(fp);
+ return 0;
+}
+
+static int msg_loop(int fd, int iov_count, int iov_length, int cnt,
+ struct msg_stats *s, bool tx,
+ struct sockmap_options *opt)
+{
+ struct msghdr msg = {0};
+ int err, i, flags = MSG_NOSIGNAL;
+ struct iovec *iov;
+ unsigned char k;
+ bool data_test = opt->data_test;
+ bool drop = opt->drop_expected;
+
+ iov = calloc(iov_count, sizeof(struct iovec));
+ if (!iov)
+ return errno;
+
+ k = 0;
+ for (i = 0; i < iov_count; i++) {
+ unsigned char *d = calloc(iov_length, sizeof(char));
+
+ if (!d) {
+ fprintf(stderr, "iov_count %i/%i OOM\n", i, iov_count);
+ goto out_errno;
+ }
+ iov[i].iov_base = d;
+ iov[i].iov_len = iov_length;
+
+ if (data_test && tx) {
+ int j;
+
+ for (j = 0; j < iov_length; j++)
+ d[j] = k++;
+ }
+ }
+
+ msg.msg_iov = iov;
+ msg.msg_iovlen = iov_count;
+ k = 0;
+
+ if (tx) {
+ clock_gettime(CLOCK_MONOTONIC, &s->start);
+ for (i = 0; i < cnt; i++) {
+ int sent = sendmsg(fd, &msg, flags);
+
+ if (!drop && sent < 0) {
+ perror("send loop error:");
+ goto out_errno;
+ } else if (drop && sent >= 0) {
+ printf("send loop error expected: %i\n", sent);
+ errno = -EIO;
+ goto out_errno;
+ }
+ if (sent > 0)
+ s->bytes_sent += sent;
+ }
+ clock_gettime(CLOCK_MONOTONIC, &s->end);
+ } else {
+ int slct, recv, max_fd = fd;
+ int fd_flags = O_NONBLOCK;
+ struct timeval timeout;
+ float total_bytes;
+ fd_set w;
+
+ fcntl(fd, fd_flags);
+ total_bytes = (float)iov_count * (float)iov_length * (float)cnt;
+ err = clock_gettime(CLOCK_MONOTONIC, &s->start);
+ if (err < 0)
+ perror("recv start time: ");
+ while (s->bytes_recvd < total_bytes) {
+ timeout.tv_sec = 0;
+ timeout.tv_usec = 10;
+
+ /* FD sets */
+ FD_ZERO(&w);
+ FD_SET(fd, &w);
+
+ slct = select(max_fd + 1, &w, NULL, NULL, &timeout);
+ if (slct == -1) {
+ perror("select()");
+ clock_gettime(CLOCK_MONOTONIC, &s->end);
+ goto out_errno;
+ } else if (!slct) {
+ if (opt->verbose)
+ fprintf(stderr, "unexpected timeout\n");
+ errno = -EIO;
+ clock_gettime(CLOCK_MONOTONIC, &s->end);
+ goto out_errno;
+ }
+
+ recv = recvmsg(fd, &msg, flags);
+ if (recv < 0) {
+ if (errno != EWOULDBLOCK) {
+ clock_gettime(CLOCK_MONOTONIC, &s->end);
+ perror("recv failed()\n");
+ goto out_errno;
+ }
+ }
+
+ s->bytes_recvd += recv;
+
+ if (data_test) {
+ int j;
+
+ for (i = 0; i < msg.msg_iovlen; i++) {
+ unsigned char *d = iov[i].iov_base;
+
+ for (j = 0;
+ j < iov[i].iov_len && recv; j++) {
+ if (d[j] != k++) {
+ errno = -EIO;
+ fprintf(stderr,
+ "detected data corruption @iov[%i]:%i %02x != %02x, %02x ?= %02x\n",
+ i, j, d[j], k - 1, d[j+1], k + 1);
+ goto out_errno;
+ }
+ recv--;
+ }
+ }
+ }
+ }
+ clock_gettime(CLOCK_MONOTONIC, &s->end);
+ }
+
+ for (i = 0; i < iov_count; i++)
+ free(iov[i].iov_base);
+ free(iov);
+ return 0;
+out_errno:
+ for (i = 0; i < iov_count; i++)
+ free(iov[i].iov_base);
+ free(iov);
+ return errno;
+}
+
+static float giga = 1000000000;
+
+static inline float sentBps(struct msg_stats s)
+{
+ return s.bytes_sent / (s.end.tv_sec - s.start.tv_sec);
+}
+
+static inline float recvdBps(struct msg_stats s)
+{
+ return s.bytes_recvd / (s.end.tv_sec - s.start.tv_sec);
+}
+
+static int sendmsg_test(struct sockmap_options *opt)
+{
+ float sent_Bps = 0, recvd_Bps = 0;
+ int rx_fd, txpid, rxpid, err = 0;
+ struct msg_stats s = {0};
+ int iov_count = opt->iov_count;
+ int iov_buf = opt->iov_length;
+ int cnt = opt->rate;
+ int status;
+
+ errno = 0;
+
+ if (opt->base)
+ rx_fd = p1;
+ else
+ rx_fd = p2;
+
+ rxpid = fork();
+ if (rxpid == 0) {
+ if (opt->drop_expected)
+ exit(1);
+
+ if (opt->sendpage)
+ iov_count = 1;
+ err = msg_loop(rx_fd, iov_count, iov_buf,
+ cnt, &s, false, opt);
+ if (err && opt->verbose)
+ fprintf(stderr,
+ "msg_loop_rx: iov_count %i iov_buf %i cnt %i err %i\n",
+ iov_count, iov_buf, cnt, err);
+ shutdown(p2, SHUT_RDWR);
+ shutdown(p1, SHUT_RDWR);
+ if (s.end.tv_sec - s.start.tv_sec) {
+ sent_Bps = sentBps(s);
+ recvd_Bps = recvdBps(s);
+ }
+ if (opt->verbose)
+ fprintf(stdout,
+ "rx_sendmsg: TX: %zuB %fB/s %fGB/s RX: %zuB %fB/s %fGB/s\n",
+ s.bytes_sent, sent_Bps, sent_Bps/giga,
+ s.bytes_recvd, recvd_Bps, recvd_Bps/giga);
+ exit(1);
+ } else if (rxpid == -1) {
+ perror("msg_loop_rx: ");
+ return errno;
+ }
+
+ txpid = fork();
+ if (txpid == 0) {
+ if (opt->sendpage)
+ err = msg_loop_sendpage(c1, iov_buf, cnt, &s, opt);
+ else
+ err = msg_loop(c1, iov_count, iov_buf,
+ cnt, &s, true, opt);
+
+ if (err)
+ fprintf(stderr,
+ "msg_loop_tx: iov_count %i iov_buf %i cnt %i err %i\n",
+ iov_count, iov_buf, cnt, err);
+ shutdown(c1, SHUT_RDWR);
+ if (s.end.tv_sec - s.start.tv_sec) {
+ sent_Bps = sentBps(s);
+ recvd_Bps = recvdBps(s);
+ }
+ if (opt->verbose)
+ fprintf(stdout,
+ "tx_sendmsg: TX: %zuB %fB/s %f GB/s RX: %zuB %fB/s %fGB/s\n",
+ s.bytes_sent, sent_Bps, sent_Bps/giga,
+ s.bytes_recvd, recvd_Bps, recvd_Bps/giga);
+ exit(1);
+ } else if (txpid == -1) {
+ perror("msg_loop_tx: ");
+ return errno;
+ }
+
+ assert(waitpid(rxpid, &status, 0) == rxpid);
+ assert(waitpid(txpid, &status, 0) == txpid);
+ return err;
+}
+
+static int forever_ping_pong(int rate, struct sockmap_options *opt)
+{
+ struct timeval timeout;
+ char buf[1024] = {0};
+ int sc;
+
+ timeout.tv_sec = 10;
+ timeout.tv_usec = 0;
+
+ /* Ping/Pong data from client to server */
+ sc = send(c1, buf, sizeof(buf), 0);
+ if (sc < 0) {
+ perror("send failed()\n");
+ return sc;
+ }
+
+ do {
+ int s, rc, i, max_fd = p2;
+ fd_set w;
+
+ /* FD sets */
+ FD_ZERO(&w);
+ FD_SET(c1, &w);
+ FD_SET(c2, &w);
+ FD_SET(p1, &w);
+ FD_SET(p2, &w);
+
+ s = select(max_fd + 1, &w, NULL, NULL, &timeout);
+ if (s == -1) {
+ perror("select()");
+ break;
+ } else if (!s) {
+ fprintf(stderr, "unexpected timeout\n");
+ break;
+ }
+
+ for (i = 0; i <= max_fd && s > 0; ++i) {
+ if (!FD_ISSET(i, &w))
+ continue;
+
+ s--;
+
+ rc = recv(i, buf, sizeof(buf), 0);
+ if (rc < 0) {
+ if (errno != EWOULDBLOCK) {
+ perror("recv failed()\n");
+ return rc;
+ }
+ }
+
+ if (rc == 0) {
+ close(i);
+ break;
+ }
+
+ sc = send(i, buf, rc, 0);
+ if (sc < 0) {
+ perror("send failed()\n");
+ return sc;
+ }
+ }
+
+ if (rate)
+ sleep(rate);
+
+ if (opt->verbose) {
+ printf(".");
+ fflush(stdout);
+
+ }
+ } while (running);
+
+ return 0;
+}
+
+enum {
+ PING_PONG,
+ SENDMSG,
+ BASE,
+ BASE_SENDPAGE,
+ SENDPAGE,
+};
+
+static int run_options(struct sockmap_options *options, int cg_fd, int test)
+{
+ int i, key, next_key, err, tx_prog_fd = -1, zero = 0;
+
+ /* If base test skip BPF setup */
+ if (test == BASE || test == BASE_SENDPAGE)
+ goto run;
+
+ /* Attach programs to sockmap */
+ err = bpf_prog_attach(prog_fd[0], map_fd[0],
+ BPF_SK_SKB_STREAM_PARSER, 0);
+ if (err) {
+ fprintf(stderr,
+ "ERROR: bpf_prog_attach (sockmap %i->%i): %d (%s)\n",
+ prog_fd[0], map_fd[0], err, strerror(errno));
+ return err;
+ }
+
+ err = bpf_prog_attach(prog_fd[1], map_fd[0],
+ BPF_SK_SKB_STREAM_VERDICT, 0);
+ if (err) {
+ fprintf(stderr, "ERROR: bpf_prog_attach (sockmap): %d (%s)\n",
+ err, strerror(errno));
+ return err;
+ }
+
+ /* Attach to cgroups */
+ err = bpf_prog_attach(prog_fd[2], cg_fd, BPF_CGROUP_SOCK_OPS, 0);
+ if (err) {
+ fprintf(stderr, "ERROR: bpf_prog_attach (groups): %d (%s)\n",
+ err, strerror(errno));
+ return err;
+ }
+
+run:
+ err = sockmap_init_sockets(options->verbose);
+ if (err) {
+ fprintf(stderr, "ERROR: test socket failed: %d\n", err);
+ goto out;
+ }
+
+ /* Attach txmsg program to sockmap */
+ if (txmsg_pass)
+ tx_prog_fd = prog_fd[3];
+ else if (txmsg_noisy)
+ tx_prog_fd = prog_fd[4];
+ else if (txmsg_redir)
+ tx_prog_fd = prog_fd[5];
+ else if (txmsg_redir_noisy)
+ tx_prog_fd = prog_fd[6];
+ else if (txmsg_drop)
+ tx_prog_fd = prog_fd[9];
+ /* apply and cork must be last */
+ else if (txmsg_apply)
+ tx_prog_fd = prog_fd[7];
+ else if (txmsg_cork)
+ tx_prog_fd = prog_fd[8];
+ else
+ tx_prog_fd = 0;
+
+ if (tx_prog_fd) {
+ int redir_fd, i = 0;
+
+ err = bpf_prog_attach(tx_prog_fd,
+ map_fd[1], BPF_SK_MSG_VERDICT, 0);
+ if (err) {
+ fprintf(stderr,
+ "ERROR: bpf_prog_attach (txmsg): %d (%s)\n",
+ err, strerror(errno));
+ goto out;
+ }
+
+ err = bpf_map_update_elem(map_fd[1], &i, &c1, BPF_ANY);
+ if (err) {
+ fprintf(stderr,
+ "ERROR: bpf_map_update_elem (txmsg): %d (%s\n",
+ err, strerror(errno));
+ goto out;
+ }
+
+ if (txmsg_redir || txmsg_redir_noisy)
+ redir_fd = c2;
+ else
+ redir_fd = c1;
+
+ err = bpf_map_update_elem(map_fd[2], &i, &redir_fd, BPF_ANY);
+ if (err) {
+ fprintf(stderr,
+ "ERROR: bpf_map_update_elem (txmsg): %d (%s\n",
+ err, strerror(errno));
+ goto out;
+ }
+
+ if (txmsg_apply) {
+ err = bpf_map_update_elem(map_fd[3],
+ &i, &txmsg_apply, BPF_ANY);
+ if (err) {
+ fprintf(stderr,
+ "ERROR: bpf_map_update_elem (apply_bytes): %d (%s\n",
+ err, strerror(errno));
+ goto out;
+ }
+ }
+
+ if (txmsg_cork) {
+ err = bpf_map_update_elem(map_fd[4],
+ &i, &txmsg_cork, BPF_ANY);
+ if (err) {
+ fprintf(stderr,
+ "ERROR: bpf_map_update_elem (cork_bytes): %d (%s\n",
+ err, strerror(errno));
+ goto out;
+ }
+ }
+
+ if (txmsg_start) {
+ err = bpf_map_update_elem(map_fd[5],
+ &i, &txmsg_start, BPF_ANY);
+ if (err) {
+ fprintf(stderr,
+ "ERROR: bpf_map_update_elem (txmsg_start): %d (%s)\n",
+ err, strerror(errno));
+ goto out;
+ }
+ }
+
+ if (txmsg_end) {
+ i = 1;
+ err = bpf_map_update_elem(map_fd[5],
+ &i, &txmsg_end, BPF_ANY);
+ if (err) {
+ fprintf(stderr,
+ "ERROR: bpf_map_update_elem (txmsg_end): %d (%s)\n",
+ err, strerror(errno));
+ goto out;
+ }
+ }
+
+ if (txmsg_ingress) {
+ int in = BPF_F_INGRESS;
+
+ i = 0;
+ err = bpf_map_update_elem(map_fd[6], &i, &in, BPF_ANY);
+ if (err) {
+ fprintf(stderr,
+ "ERROR: bpf_map_update_elem (txmsg_ingress): %d (%s)\n",
+ err, strerror(errno));
+ }
+ i = 1;
+ err = bpf_map_update_elem(map_fd[1], &i, &p1, BPF_ANY);
+ if (err) {
+ fprintf(stderr,
+ "ERROR: bpf_map_update_elem (p1 txmsg): %d (%s)\n",
+ err, strerror(errno));
+ }
+ err = bpf_map_update_elem(map_fd[2], &i, &p1, BPF_ANY);
+ if (err) {
+ fprintf(stderr,
+ "ERROR: bpf_map_update_elem (p1 redir): %d (%s)\n",
+ err, strerror(errno));
+ }
+
+ i = 2;
+ err = bpf_map_update_elem(map_fd[2], &i, &p2, BPF_ANY);
+ if (err) {
+ fprintf(stderr,
+ "ERROR: bpf_map_update_elem (p2 txmsg): %d (%s)\n",
+ err, strerror(errno));
+ }
+ }
+
+ if (txmsg_skb) {
+ int skb_fd = (test == SENDMSG || test == SENDPAGE) ?
+ p2 : p1;
+ int ingress = BPF_F_INGRESS;
+
+ i = 0;
+ err = bpf_map_update_elem(map_fd[7],
+ &i, &ingress, BPF_ANY);
+ if (err) {
+ fprintf(stderr,
+ "ERROR: bpf_map_update_elem (txmsg_ingress): %d (%s)\n",
+ err, strerror(errno));
+ }
+
+ i = 3;
+ err = bpf_map_update_elem(map_fd[0],
+ &i, &skb_fd, BPF_ANY);
+ if (err) {
+ fprintf(stderr,
+ "ERROR: bpf_map_update_elem (c1 sockmap): %d (%s)\n",
+ err, strerror(errno));
+ }
+ }
+ }
+
+ if (txmsg_drop)
+ options->drop_expected = true;
+
+ if (test == PING_PONG)
+ err = forever_ping_pong(options->rate, options);
+ else if (test == SENDMSG) {
+ options->base = false;
+ options->sendpage = false;
+ err = sendmsg_test(options);
+ } else if (test == SENDPAGE) {
+ options->base = false;
+ options->sendpage = true;
+ err = sendmsg_test(options);
+ } else if (test == BASE) {
+ options->base = true;
+ options->sendpage = false;
+ err = sendmsg_test(options);
+ } else if (test == BASE_SENDPAGE) {
+ options->base = true;
+ options->sendpage = true;
+ err = sendmsg_test(options);
+ } else
+ fprintf(stderr, "unknown test\n");
+out:
+ /* Detatch and zero all the maps */
+ bpf_prog_detach2(prog_fd[2], cg_fd, BPF_CGROUP_SOCK_OPS);
+ bpf_prog_detach2(prog_fd[0], map_fd[0], BPF_SK_SKB_STREAM_PARSER);
+ bpf_prog_detach2(prog_fd[1], map_fd[0], BPF_SK_SKB_STREAM_VERDICT);
+ if (tx_prog_fd >= 0)
+ bpf_prog_detach2(tx_prog_fd, map_fd[1], BPF_SK_MSG_VERDICT);
+
+ for (i = 0; i < 8; i++) {
+ key = next_key = 0;
+ bpf_map_update_elem(map_fd[i], &key, &zero, BPF_ANY);
+ while (bpf_map_get_next_key(map_fd[i], &key, &next_key) == 0) {
+ bpf_map_update_elem(map_fd[i], &key, &zero, BPF_ANY);
+ key = next_key;
+ }
+ }
+
+ close(s1);
+ close(s2);
+ close(p1);
+ close(p2);
+ close(c1);
+ close(c2);
+ return err;
+}
+
+static char *test_to_str(int test)
+{
+ switch (test) {
+ case SENDMSG:
+ return "sendmsg";
+ case SENDPAGE:
+ return "sendpage";
+ }
+ return "unknown";
+}
+
+#define OPTSTRING 60
+static void test_options(char *options)
+{
+ memset(options, 0, OPTSTRING);
+
+ if (txmsg_pass)
+ strncat(options, "pass,", OPTSTRING);
+ if (txmsg_noisy)
+ strncat(options, "pass_noisy,", OPTSTRING);
+ if (txmsg_redir)
+ strncat(options, "redir,", OPTSTRING);
+ if (txmsg_redir_noisy)
+ strncat(options, "redir_noisy,", OPTSTRING);
+ if (txmsg_drop)
+ strncat(options, "drop,", OPTSTRING);
+ if (txmsg_apply)
+ strncat(options, "apply,", OPTSTRING);
+ if (txmsg_cork)
+ strncat(options, "cork,", OPTSTRING);
+ if (txmsg_start)
+ strncat(options, "start,", OPTSTRING);
+ if (txmsg_end)
+ strncat(options, "end,", OPTSTRING);
+ if (txmsg_ingress)
+ strncat(options, "ingress,", OPTSTRING);
+ if (txmsg_skb)
+ strncat(options, "skb,", OPTSTRING);
+}
+
+static int __test_exec(int cgrp, int test, struct sockmap_options *opt)
+{
+ char *options = calloc(60, sizeof(char));
+ int err;
+
+ if (test == SENDPAGE)
+ opt->sendpage = true;
+ else
+ opt->sendpage = false;
+
+ if (txmsg_drop)
+ opt->drop_expected = true;
+ else
+ opt->drop_expected = false;
+
+ test_options(options);
+
+ fprintf(stdout,
+ "[TEST %i]: (%i, %i, %i, %s, %s): ",
+ test_cnt, opt->rate, opt->iov_count, opt->iov_length,
+ test_to_str(test), options);
+ fflush(stdout);
+ err = run_options(opt, cgrp, test);
+ fprintf(stdout, "%s\n", !err ? "PASS" : "FAILED");
+ test_cnt++;
+ !err ? passed++ : failed++;
+ free(options);
+ return err;
+}
+
+static int test_exec(int cgrp, struct sockmap_options *opt)
+{
+ int err = __test_exec(cgrp, SENDMSG, opt);
+
+ if (err)
+ goto out;
+
+ err = __test_exec(cgrp, SENDPAGE, opt);
+out:
+ return err;
+}
+
+static int test_loop(int cgrp)
+{
+ struct sockmap_options opt;
+
+ int err, i, l, r;
+
+ opt.verbose = 0;
+ opt.base = false;
+ opt.sendpage = false;
+ opt.data_test = false;
+ opt.drop_expected = false;
+ opt.iov_count = 0;
+ opt.iov_length = 0;
+ opt.rate = 0;
+
+ r = 1;
+ for (i = 1; i < 100; i += 33) {
+ for (l = 1; l < 100; l += 33) {
+ opt.rate = r;
+ opt.iov_count = i;
+ opt.iov_length = l;
+ err = test_exec(cgrp, &opt);
+ if (err)
+ goto out;
+ }
+ }
+ sched_yield();
+out:
+ return err;
+}
+
+static int test_txmsg(int cgrp)
+{
+ int err;
+
+ txmsg_pass = txmsg_noisy = txmsg_redir_noisy = txmsg_drop = 0;
+ txmsg_apply = txmsg_cork = 0;
+ txmsg_ingress = txmsg_skb = 0;
+
+ txmsg_pass = 1;
+ err = test_loop(cgrp);
+ txmsg_pass = 0;
+ if (err)
+ goto out;
+
+ txmsg_redir = 1;
+ err = test_loop(cgrp);
+ txmsg_redir = 0;
+ if (err)
+ goto out;
+
+ txmsg_drop = 1;
+ err = test_loop(cgrp);
+ txmsg_drop = 0;
+ if (err)
+ goto out;
+
+ txmsg_redir = 1;
+ txmsg_ingress = 1;
+ err = test_loop(cgrp);
+ txmsg_redir = 0;
+ txmsg_ingress = 0;
+ if (err)
+ goto out;
+out:
+ txmsg_pass = 0;
+ txmsg_redir = 0;
+ txmsg_drop = 0;
+ return err;
+}
+
+static int test_send(struct sockmap_options *opt, int cgrp)
+{
+ int err;
+
+ opt->iov_length = 1;
+ opt->iov_count = 1;
+ opt->rate = 1;
+ err = test_exec(cgrp, opt);
+ if (err)
+ goto out;
+
+ opt->iov_length = 1;
+ opt->iov_count = 1024;
+ opt->rate = 1;
+ err = test_exec(cgrp, opt);
+ if (err)
+ goto out;
+
+ opt->iov_length = 1024;
+ opt->iov_count = 1;
+ opt->rate = 1;
+ err = test_exec(cgrp, opt);
+ if (err)
+ goto out;
+
+ opt->iov_length = 1;
+ opt->iov_count = 1;
+ opt->rate = 1024;
+ err = test_exec(cgrp, opt);
+ if (err)
+ goto out;
+
+ opt->iov_length = 256;
+ opt->iov_count = 1024;
+ opt->rate = 10;
+ err = test_exec(cgrp, opt);
+ if (err)
+ goto out;
+
+ opt->rate = 100;
+ opt->iov_count = 1;
+ opt->iov_length = 5;
+ err = test_exec(cgrp, opt);
+ if (err)
+ goto out;
+out:
+ sched_yield();
+ return err;
+}
+
+static int test_mixed(int cgrp)
+{
+ struct sockmap_options opt = {0};
+ int err;
+
+ txmsg_pass = txmsg_noisy = txmsg_redir_noisy = txmsg_drop = 0;
+ txmsg_apply = txmsg_cork = 0;
+ txmsg_start = txmsg_end = 0;
+ /* Test small and large iov_count values with pass/redir/apply/cork */
+ txmsg_pass = 1;
+ txmsg_redir = 0;
+ txmsg_apply = 1;
+ txmsg_cork = 0;
+ err = test_send(&opt, cgrp);
+ if (err)
+ goto out;
+
+ txmsg_pass = 1;
+ txmsg_redir = 0;
+ txmsg_apply = 0;
+ txmsg_cork = 1;
+ err = test_send(&opt, cgrp);
+ if (err)
+ goto out;
+
+ txmsg_pass = 1;
+ txmsg_redir = 0;
+ txmsg_apply = 1;
+ txmsg_cork = 1;
+ err = test_send(&opt, cgrp);
+ if (err)
+ goto out;
+
+ txmsg_pass = 1;
+ txmsg_redir = 0;
+ txmsg_apply = 1024;
+ txmsg_cork = 0;
+ err = test_send(&opt, cgrp);
+ if (err)
+ goto out;
+
+ txmsg_pass = 1;
+ txmsg_redir = 0;
+ txmsg_apply = 0;
+ txmsg_cork = 1024;
+ err = test_send(&opt, cgrp);
+ if (err)
+ goto out;
+
+ txmsg_pass = 1;
+ txmsg_redir = 0;
+ txmsg_apply = 1024;
+ txmsg_cork = 1024;
+ err = test_send(&opt, cgrp);
+ if (err)
+ goto out;
+
+ txmsg_pass = 1;
+ txmsg_redir = 0;
+ txmsg_cork = 4096;
+ txmsg_apply = 4096;
+ err = test_send(&opt, cgrp);
+ if (err)
+ goto out;
+
+ txmsg_pass = 0;
+ txmsg_redir = 1;
+ txmsg_apply = 1;
+ txmsg_cork = 0;
+ err = test_send(&opt, cgrp);
+ if (err)
+ goto out;
+
+ txmsg_pass = 0;
+ txmsg_redir = 1;
+ txmsg_apply = 0;
+ txmsg_cork = 1;
+ err = test_send(&opt, cgrp);
+ if (err)
+ goto out;
+
+ txmsg_pass = 0;
+ txmsg_redir = 1;
+ txmsg_apply = 1024;
+ txmsg_cork = 0;
+ err = test_send(&opt, cgrp);
+ if (err)
+ goto out;
+
+ txmsg_pass = 0;
+ txmsg_redir = 1;
+ txmsg_apply = 0;
+ txmsg_cork = 1024;
+ err = test_send(&opt, cgrp);
+ if (err)
+ goto out;
+
+ txmsg_pass = 0;
+ txmsg_redir = 1;
+ txmsg_apply = 1024;
+ txmsg_cork = 1024;
+ err = test_send(&opt, cgrp);
+ if (err)
+ goto out;
+
+ txmsg_pass = 0;
+ txmsg_redir = 1;
+ txmsg_cork = 4096;
+ txmsg_apply = 4096;
+ err = test_send(&opt, cgrp);
+ if (err)
+ goto out;
+out:
+ return err;
+}
+
+static int test_start_end(int cgrp)
+{
+ struct sockmap_options opt = {0};
+ int err, i;
+
+ /* Test basic start/end with lots of iov_count and iov_lengths */
+ txmsg_start = 1;
+ txmsg_end = 2;
+ err = test_txmsg(cgrp);
+ if (err)
+ goto out;
+
+ /* Test start/end with cork */
+ opt.rate = 16;
+ opt.iov_count = 1;
+ opt.iov_length = 100;
+ txmsg_cork = 1600;
+
+ for (i = 99; i <= 1600; i += 500) {
+ txmsg_start = 0;
+ txmsg_end = i;
+ err = test_exec(cgrp, &opt);
+ if (err)
+ goto out;
+ }
+
+ /* Test start/end with cork but pull data in middle */
+ for (i = 199; i <= 1600; i += 500) {
+ txmsg_start = 100;
+ txmsg_end = i;
+ err = test_exec(cgrp, &opt);
+ if (err)
+ goto out;
+ }
+
+ /* Test start/end with cork pulling last sg entry */
+ txmsg_start = 1500;
+ txmsg_end = 1600;
+ err = test_exec(cgrp, &opt);
+ if (err)
+ goto out;
+
+ /* Test start/end pull of single byte in last page */
+ txmsg_start = 1111;
+ txmsg_end = 1112;
+ err = test_exec(cgrp, &opt);
+ if (err)
+ goto out;
+
+ /* Test start/end with end < start */
+ txmsg_start = 1111;
+ txmsg_end = 0;
+ err = test_exec(cgrp, &opt);
+ if (err)
+ goto out;
+
+ /* Test start/end with end > data */
+ txmsg_start = 0;
+ txmsg_end = 1601;
+ err = test_exec(cgrp, &opt);
+ if (err)
+ goto out;
+
+ /* Test start/end with start > data */
+ txmsg_start = 1601;
+ txmsg_end = 1600;
+ err = test_exec(cgrp, &opt);
+
+out:
+ txmsg_start = 0;
+ txmsg_end = 0;
+ sched_yield();
+ return err;
+}
+
+char *map_names[] = {
+ "sock_map",
+ "sock_map_txmsg",
+ "sock_map_redir",
+ "sock_apply_bytes",
+ "sock_cork_bytes",
+ "sock_pull_bytes",
+ "sock_redir_flags",
+ "sock_skb_opts",
+};
+
+int prog_attach_type[] = {
+ BPF_SK_SKB_STREAM_PARSER,
+ BPF_SK_SKB_STREAM_VERDICT,
+ BPF_CGROUP_SOCK_OPS,
+ BPF_SK_MSG_VERDICT,
+ BPF_SK_MSG_VERDICT,
+ BPF_SK_MSG_VERDICT,
+ BPF_SK_MSG_VERDICT,
+ BPF_SK_MSG_VERDICT,
+ BPF_SK_MSG_VERDICT,
+ BPF_SK_MSG_VERDICT,
+};
+
+int prog_type[] = {
+ BPF_PROG_TYPE_SK_SKB,
+ BPF_PROG_TYPE_SK_SKB,
+ BPF_PROG_TYPE_SOCK_OPS,
+ BPF_PROG_TYPE_SK_MSG,
+ BPF_PROG_TYPE_SK_MSG,
+ BPF_PROG_TYPE_SK_MSG,
+ BPF_PROG_TYPE_SK_MSG,
+ BPF_PROG_TYPE_SK_MSG,
+ BPF_PROG_TYPE_SK_MSG,
+ BPF_PROG_TYPE_SK_MSG,
+};
+
+static int populate_progs(char *bpf_file)
+{
+ struct bpf_program *prog;
+ struct bpf_object *obj;
+ int i = 0;
+ long err;
+
+ obj = bpf_object__open(bpf_file);
+ err = libbpf_get_error(obj);
+ if (err) {
+ char err_buf[256];
+
+ libbpf_strerror(err, err_buf, sizeof(err_buf));
+ printf("Unable to load eBPF objects in file '%s' : %s\n",
+ bpf_file, err_buf);
+ return -1;
+ }
+
+ bpf_object__for_each_program(prog, obj) {
+ bpf_program__set_type(prog, prog_type[i]);
+ bpf_program__set_expected_attach_type(prog,
+ prog_attach_type[i]);
+ i++;
+ }
+
+ i = bpf_object__load(obj);
+ i = 0;
+ bpf_object__for_each_program(prog, obj) {
+ prog_fd[i] = bpf_program__fd(prog);
+ i++;
+ }
+
+ for (i = 0; i < sizeof(map_fd)/sizeof(int); i++) {
+ maps[i] = bpf_object__find_map_by_name(obj, map_names[i]);
+ map_fd[i] = bpf_map__fd(maps[i]);
+ if (map_fd[i] < 0) {
+ fprintf(stderr, "load_bpf_file: (%i) %s\n",
+ map_fd[i], strerror(errno));
+ return -1;
+ }
+ }
+
+ return 0;
+}
+
+static int __test_suite(char *bpf_file)
+{
+ int cg_fd, err;
+
+ err = populate_progs(bpf_file);
+ if (err < 0) {
+ fprintf(stderr, "ERROR: (%i) load bpf failed\n", err);
+ return err;
+ }
+
+ if (setup_cgroup_environment()) {
+ fprintf(stderr, "ERROR: cgroup env failed\n");
+ return -EINVAL;
+ }
+
+ cg_fd = create_and_get_cgroup(CG_PATH);
+ if (cg_fd < 0) {
+ fprintf(stderr,
+ "ERROR: (%i) open cg path failed: %s\n",
+ cg_fd, optarg);
+ return cg_fd;
+ }
+
+ /* Tests basic commands and APIs with range of iov values */
+ txmsg_start = txmsg_end = 0;
+ err = test_txmsg(cg_fd);
+ if (err)
+ goto out;
+
+ /* Tests interesting combinations of APIs used together */
+ err = test_mixed(cg_fd);
+ if (err)
+ goto out;
+
+ /* Tests pull_data API using start/end API */
+ err = test_start_end(cg_fd);
+ if (err)
+ goto out;
+
+out:
+ printf("Summary: %i PASSED %i FAILED\n", passed, failed);
+ cleanup_cgroup_environment();
+ close(cg_fd);
+ return err;
+}
+
+static int test_suite(void)
+{
+ int err;
+
+ err = __test_suite(BPF_SOCKMAP_FILENAME);
+ if (err)
+ goto out;
+ err = __test_suite(BPF_SOCKHASH_FILENAME);
+out:
+ return err;
+}
+
+int main(int argc, char **argv)
+{
+ struct rlimit r = {10 * 1024 * 1024, RLIM_INFINITY};
+ int iov_count = 1, length = 1024, rate = 1;
+ struct sockmap_options options = {0};
+ int opt, longindex, err, cg_fd = 0;
+ char *bpf_file = BPF_SOCKMAP_FILENAME;
+ int test = PING_PONG;
+
+ if (setrlimit(RLIMIT_MEMLOCK, &r)) {
+ perror("setrlimit(RLIMIT_MEMLOCK)");
+ return 1;
+ }
+
+ if (argc < 2)
+ return test_suite();
+
+ while ((opt = getopt_long(argc, argv, ":dhvc:r:i:l:t:",
+ long_options, &longindex)) != -1) {
+ switch (opt) {
+ case 's':
+ txmsg_start = atoi(optarg);
+ break;
+ case 'e':
+ txmsg_end = atoi(optarg);
+ break;
+ case 'a':
+ txmsg_apply = atoi(optarg);
+ break;
+ case 'k':
+ txmsg_cork = atoi(optarg);
+ break;
+ case 'c':
+ cg_fd = open(optarg, O_DIRECTORY, O_RDONLY);
+ if (cg_fd < 0) {
+ fprintf(stderr,
+ "ERROR: (%i) open cg path failed: %s\n",
+ cg_fd, optarg);
+ return cg_fd;
+ }
+ break;
+ case 'r':
+ rate = atoi(optarg);
+ break;
+ case 'v':
+ options.verbose = 1;
+ break;
+ case 'i':
+ iov_count = atoi(optarg);
+ break;
+ case 'l':
+ length = atoi(optarg);
+ break;
+ case 'd':
+ options.data_test = true;
+ break;
+ case 't':
+ if (strcmp(optarg, "ping") == 0) {
+ test = PING_PONG;
+ } else if (strcmp(optarg, "sendmsg") == 0) {
+ test = SENDMSG;
+ } else if (strcmp(optarg, "base") == 0) {
+ test = BASE;
+ } else if (strcmp(optarg, "base_sendpage") == 0) {
+ test = BASE_SENDPAGE;
+ } else if (strcmp(optarg, "sendpage") == 0) {
+ test = SENDPAGE;
+ } else {
+ usage(argv);
+ return -1;
+ }
+ break;
+ case 0:
+ break;
+ case 'h':
+ default:
+ usage(argv);
+ return -1;
+ }
+ }
+
+ if (!cg_fd) {
+ fprintf(stderr, "%s requires cgroup option: --cgroup <path>\n",
+ argv[0]);
+ return -1;
+ }
+
+ err = populate_progs(bpf_file);
+ if (err) {
+ fprintf(stderr, "populate program: (%s) %s\n",
+ bpf_file, strerror(errno));
+ return 1;
+ }
+ running = 1;
+
+ /* catch SIGINT */
+ signal(SIGINT, running_handler);
+
+ options.iov_count = iov_count;
+ options.iov_length = length;
+ options.rate = rate;
+
+ err = run_options(&options, cg_fd, test);
+ close(cg_fd);
+ return err;
+}
+
+void running_handler(int a)
+{
+ running = 0;
+}
diff --git a/tools/testing/selftests/bpf/test_sockmap_kern.c b/tools/testing/selftests/bpf/test_sockmap_kern.c
new file mode 100644
index 0000000..677b2ed
--- /dev/null
+++ b/tools/testing/selftests/bpf/test_sockmap_kern.c
@@ -0,0 +1,5 @@
+// SPDX-License-Identifier: GPL-2.0
+// Copyright (c) 2018 Covalent IO, Inc. http://covalent.io
+#define SOCKMAP
+#define TEST_MAP_TYPE BPF_MAP_TYPE_SOCKMAP
+#include "./test_sockmap_kern.h"
diff --git a/samples/sockmap/sockmap_kern.c b/tools/testing/selftests/bpf/test_sockmap_kern.h
similarity index 87%
rename from samples/sockmap/sockmap_kern.c
rename to tools/testing/selftests/bpf/test_sockmap_kern.h
index 9ff8bc5..8e8e417 100644
--- a/samples/sockmap/sockmap_kern.c
+++ b/tools/testing/selftests/bpf/test_sockmap_kern.h
@@ -1,20 +1,19 @@
-/* Copyright (c) 2017 Covalent IO, Inc. http://covalent.io
- *
- * This program is free software; you can redistribute it and/or
- * modify it under the terms of version 2 of the GNU General Public
- * License as published by the Free Software Foundation.
- *
- * This program is distributed in the hope that it will be useful, but
- * WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
- * General Public License for more details.
- */
-#include <uapi/linux/bpf.h>
-#include <uapi/linux/if_ether.h>
-#include <uapi/linux/if_packet.h>
-#include <uapi/linux/ip.h>
-#include "../../tools/testing/selftests/bpf/bpf_helpers.h"
-#include "../../tools/testing/selftests/bpf/bpf_endian.h"
+/* SPDX-License-Identifier: GPL-2.0 */
+/* Copyright (c) 2017-2018 Covalent IO, Inc. http://covalent.io */
+#include <stddef.h>
+#include <string.h>
+#include <linux/bpf.h>
+#include <linux/if_ether.h>
+#include <linux/if_packet.h>
+#include <linux/ip.h>
+#include <linux/ipv6.h>
+#include <linux/in.h>
+#include <linux/udp.h>
+#include <linux/tcp.h>
+#include <linux/pkt_cls.h>
+#include <sys/socket.h>
+#include "bpf_helpers.h"
+#include "bpf_endian.h"
/* Sockmap sample program connects a client and a backend together
* using cgroups.
@@ -37,21 +36,21 @@
})
struct bpf_map_def SEC("maps") sock_map = {
- .type = BPF_MAP_TYPE_SOCKMAP,
+ .type = TEST_MAP_TYPE,
.key_size = sizeof(int),
.value_size = sizeof(int),
.max_entries = 20,
};
struct bpf_map_def SEC("maps") sock_map_txmsg = {
- .type = BPF_MAP_TYPE_SOCKMAP,
+ .type = TEST_MAP_TYPE,
.key_size = sizeof(int),
.value_size = sizeof(int),
.max_entries = 20,
};
struct bpf_map_def SEC("maps") sock_map_redir = {
- .type = BPF_MAP_TYPE_SOCKMAP,
+ .type = TEST_MAP_TYPE,
.key_size = sizeof(int),
.value_size = sizeof(int),
.max_entries = 20,
@@ -120,7 +119,12 @@
bpf_printk("sk_skb2: redirect(%iB) flags=%i\n",
len, flags);
+#ifdef SOCKMAP
return bpf_sk_redirect_map(skb, &sock_map, ret, flags);
+#else
+ return bpf_sk_redirect_hash(skb, &sock_map, &ret, flags);
+#endif
+
}
SEC("sockops")
@@ -139,8 +143,13 @@
if (lport == 10000) {
ret = 1;
+#ifdef SOCKMAP
err = bpf_sock_map_update(skops, &sock_map, &ret,
BPF_NOEXIST);
+#else
+ err = bpf_sock_hash_update(skops, &sock_map, &ret,
+ BPF_NOEXIST);
+#endif
bpf_printk("passive(%i -> %i) map ctx update err: %d\n",
lport, bpf_ntohl(rport), err);
}
@@ -151,8 +160,13 @@
if (bpf_ntohl(rport) == 10001) {
ret = 10;
+#ifdef SOCKMAP
err = bpf_sock_map_update(skops, &sock_map, &ret,
BPF_NOEXIST);
+#else
+ err = bpf_sock_hash_update(skops, &sock_map, &ret,
+ BPF_NOEXIST);
+#endif
bpf_printk("active(%i -> %i) map ctx update err: %d\n",
lport, bpf_ntohl(rport), err);
}
@@ -238,7 +252,11 @@
key = 2;
flags = *f;
}
+#ifdef SOCKMAP
return bpf_msg_redirect_map(msg, &sock_map_redir, key, flags);
+#else
+ return bpf_msg_redirect_hash(msg, &sock_map_redir, &key, flags);
+#endif
}
SEC("sk_msg4")
@@ -277,7 +295,11 @@
}
bpf_printk("sk_msg3: redirect(%iB) flags=%i err=%i\n",
len1, flags, err1 ? err1 : err2);
+#ifdef SOCKMAP
err = bpf_msg_redirect_map(msg, &sock_map_redir, key, flags);
+#else
+ err = bpf_msg_redirect_hash(msg, &sock_map_redir, &key, flags);
+#endif
bpf_printk("sk_msg3: err %i\n", err);
return err;
}
@@ -337,5 +359,5 @@
return SK_DROP;
}
-
+int _version SEC("version") = 1;
char _license[] SEC("license") = "GPL";
diff --git a/tools/testing/selftests/bpf/test_stacktrace_build_id.c b/tools/testing/selftests/bpf/test_stacktrace_build_id.c
index b755bd7..d86c281 100644
--- a/tools/testing/selftests/bpf/test_stacktrace_build_id.c
+++ b/tools/testing/selftests/bpf/test_stacktrace_build_id.c
@@ -19,7 +19,7 @@
.type = BPF_MAP_TYPE_HASH,
.key_size = sizeof(__u32),
.value_size = sizeof(__u32),
- .max_entries = 10000,
+ .max_entries = 16384,
};
struct bpf_map_def SEC("maps") stackmap = {
@@ -31,6 +31,14 @@
.map_flags = BPF_F_STACK_BUILD_ID,
};
+struct bpf_map_def SEC("maps") stack_amap = {
+ .type = BPF_MAP_TYPE_ARRAY,
+ .key_size = sizeof(__u32),
+ .value_size = sizeof(struct bpf_stack_build_id)
+ * PERF_MAX_STACK_DEPTH,
+ .max_entries = 128,
+};
+
/* taken from /sys/kernel/debug/tracing/events/random/urandom_read/format */
struct random_urandom_args {
unsigned long long pad;
@@ -42,7 +50,10 @@
SEC("tracepoint/random/urandom_read")
int oncpu(struct random_urandom_args *args)
{
+ __u32 max_len = sizeof(struct bpf_stack_build_id)
+ * PERF_MAX_STACK_DEPTH;
__u32 key = 0, val = 0, *value_p;
+ void *stack_p;
value_p = bpf_map_lookup_elem(&control_map, &key);
if (value_p && *value_p)
@@ -50,8 +61,13 @@
/* The size of stackmap and stackid_hmap should be the same */
key = bpf_get_stackid(args, &stackmap, BPF_F_USER_STACK);
- if ((int)key >= 0)
+ if ((int)key >= 0) {
bpf_map_update_elem(&stackid_hmap, &key, &val, 0);
+ stack_p = bpf_map_lookup_elem(&stack_amap, &key);
+ if (stack_p)
+ bpf_get_stack(args, stack_p, max_len,
+ BPF_F_USER_STACK | BPF_F_USER_BUILD_ID);
+ }
return 0;
}
diff --git a/tools/testing/selftests/bpf/test_stacktrace_map.c b/tools/testing/selftests/bpf/test_stacktrace_map.c
index 76d85c5d..af111af 100644
--- a/tools/testing/selftests/bpf/test_stacktrace_map.c
+++ b/tools/testing/selftests/bpf/test_stacktrace_map.c
@@ -19,14 +19,21 @@
.type = BPF_MAP_TYPE_HASH,
.key_size = sizeof(__u32),
.value_size = sizeof(__u32),
- .max_entries = 10000,
+ .max_entries = 16384,
};
struct bpf_map_def SEC("maps") stackmap = {
.type = BPF_MAP_TYPE_STACK_TRACE,
.key_size = sizeof(__u32),
.value_size = sizeof(__u64) * PERF_MAX_STACK_DEPTH,
- .max_entries = 10000,
+ .max_entries = 16384,
+};
+
+struct bpf_map_def SEC("maps") stack_amap = {
+ .type = BPF_MAP_TYPE_ARRAY,
+ .key_size = sizeof(__u32),
+ .value_size = sizeof(__u64) * PERF_MAX_STACK_DEPTH,
+ .max_entries = 16384,
};
/* taken from /sys/kernel/debug/tracing/events/sched/sched_switch/format */
@@ -44,7 +51,9 @@
SEC("tracepoint/sched/sched_switch")
int oncpu(struct sched_switch_args *ctx)
{
+ __u32 max_len = PERF_MAX_STACK_DEPTH * sizeof(__u64);
__u32 key = 0, val = 0, *value_p;
+ void *stack_p;
value_p = bpf_map_lookup_elem(&control_map, &key);
if (value_p && *value_p)
@@ -52,8 +61,12 @@
/* The size of stackmap and stackid_hmap should be the same */
key = bpf_get_stackid(ctx, &stackmap, 0);
- if ((int)key >= 0)
+ if ((int)key >= 0) {
bpf_map_update_elem(&stackid_hmap, &key, &val, 0);
+ stack_p = bpf_map_lookup_elem(&stack_amap, &key);
+ if (stack_p)
+ bpf_get_stack(ctx, stack_p, max_len, 0);
+ }
return 0;
}
diff --git a/tools/testing/selftests/bpf/test_tunnel.sh b/tools/testing/selftests/bpf/test_tunnel.sh
new file mode 100755
index 0000000..aeb2901
--- /dev/null
+++ b/tools/testing/selftests/bpf/test_tunnel.sh
@@ -0,0 +1,729 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+
+# End-to-end eBPF tunnel test suite
+# The script tests BPF network tunnel implementation.
+#
+# Topology:
+# ---------
+# root namespace | at_ns0 namespace
+# |
+# ----------- | -----------
+# | tnl dev | | | tnl dev | (overlay network)
+# ----------- | -----------
+# metadata-mode | native-mode
+# with bpf |
+# |
+# ---------- | ----------
+# | veth1 | --------- | veth0 | (underlay network)
+# ---------- peer ----------
+#
+#
+# Device Configuration
+# --------------------
+# Root namespace with metadata-mode tunnel + BPF
+# Device names and addresses:
+# veth1 IP: 172.16.1.200, IPv6: 00::22 (underlay)
+# tunnel dev <type>11, ex: gre11, IPv4: 10.1.1.200 (overlay)
+#
+# Namespace at_ns0 with native tunnel
+# Device names and addresses:
+# veth0 IPv4: 172.16.1.100, IPv6: 00::11 (underlay)
+# tunnel dev <type>00, ex: gre00, IPv4: 10.1.1.100 (overlay)
+#
+#
+# End-to-end ping packet flow
+# ---------------------------
+# Most of the tests start by namespace creation, device configuration,
+# then ping the underlay and overlay network. When doing 'ping 10.1.1.100'
+# from root namespace, the following operations happen:
+# 1) Route lookup shows 10.1.1.100/24 belongs to tnl dev, fwd to tnl dev.
+# 2) Tnl device's egress BPF program is triggered and set the tunnel metadata,
+# with remote_ip=172.16.1.200 and others.
+# 3) Outer tunnel header is prepended and route the packet to veth1's egress
+# 4) veth0's ingress queue receive the tunneled packet at namespace at_ns0
+# 5) Tunnel protocol handler, ex: vxlan_rcv, decap the packet
+# 6) Forward the packet to the overlay tnl dev
+
+PING_ARG="-c 3 -w 10 -q"
+ret=0
+GREEN='\033[0;92m'
+RED='\033[0;31m'
+NC='\033[0m' # No Color
+
+config_device()
+{
+ ip netns add at_ns0
+ ip link add veth0 type veth peer name veth1
+ ip link set veth0 netns at_ns0
+ ip netns exec at_ns0 ip addr add 172.16.1.100/24 dev veth0
+ ip netns exec at_ns0 ip link set dev veth0 up
+ ip link set dev veth1 up mtu 1500
+ ip addr add dev veth1 172.16.1.200/24
+}
+
+add_gre_tunnel()
+{
+ # at_ns0 namespace
+ ip netns exec at_ns0 \
+ ip link add dev $DEV_NS type $TYPE seq key 2 \
+ local 172.16.1.100 remote 172.16.1.200
+ ip netns exec at_ns0 ip link set dev $DEV_NS up
+ ip netns exec at_ns0 ip addr add dev $DEV_NS 10.1.1.100/24
+
+ # root namespace
+ ip link add dev $DEV type $TYPE key 2 external
+ ip link set dev $DEV up
+ ip addr add dev $DEV 10.1.1.200/24
+}
+
+add_ip6gretap_tunnel()
+{
+
+ # assign ipv6 address
+ ip netns exec at_ns0 ip addr add ::11/96 dev veth0
+ ip netns exec at_ns0 ip link set dev veth0 up
+ ip addr add dev veth1 ::22/96
+ ip link set dev veth1 up
+
+ # at_ns0 namespace
+ ip netns exec at_ns0 \
+ ip link add dev $DEV_NS type $TYPE seq flowlabel 0xbcdef key 2 \
+ local ::11 remote ::22
+
+ ip netns exec at_ns0 ip addr add dev $DEV_NS 10.1.1.100/24
+ ip netns exec at_ns0 ip addr add dev $DEV_NS fc80::100/96
+ ip netns exec at_ns0 ip link set dev $DEV_NS up
+
+ # root namespace
+ ip link add dev $DEV type $TYPE external
+ ip addr add dev $DEV 10.1.1.200/24
+ ip addr add dev $DEV fc80::200/24
+ ip link set dev $DEV up
+}
+
+add_erspan_tunnel()
+{
+ # at_ns0 namespace
+ if [ "$1" == "v1" ]; then
+ ip netns exec at_ns0 \
+ ip link add dev $DEV_NS type $TYPE seq key 2 \
+ local 172.16.1.100 remote 172.16.1.200 \
+ erspan_ver 1 erspan 123
+ else
+ ip netns exec at_ns0 \
+ ip link add dev $DEV_NS type $TYPE seq key 2 \
+ local 172.16.1.100 remote 172.16.1.200 \
+ erspan_ver 2 erspan_dir egress erspan_hwid 3
+ fi
+ ip netns exec at_ns0 ip link set dev $DEV_NS up
+ ip netns exec at_ns0 ip addr add dev $DEV_NS 10.1.1.100/24
+
+ # root namespace
+ ip link add dev $DEV type $TYPE external
+ ip link set dev $DEV up
+ ip addr add dev $DEV 10.1.1.200/24
+}
+
+add_ip6erspan_tunnel()
+{
+
+ # assign ipv6 address
+ ip netns exec at_ns0 ip addr add ::11/96 dev veth0
+ ip netns exec at_ns0 ip link set dev veth0 up
+ ip addr add dev veth1 ::22/96
+ ip link set dev veth1 up
+
+ # at_ns0 namespace
+ if [ "$1" == "v1" ]; then
+ ip netns exec at_ns0 \
+ ip link add dev $DEV_NS type $TYPE seq key 2 \
+ local ::11 remote ::22 \
+ erspan_ver 1 erspan 123
+ else
+ ip netns exec at_ns0 \
+ ip link add dev $DEV_NS type $TYPE seq key 2 \
+ local ::11 remote ::22 \
+ erspan_ver 2 erspan_dir egress erspan_hwid 7
+ fi
+ ip netns exec at_ns0 ip addr add dev $DEV_NS 10.1.1.100/24
+ ip netns exec at_ns0 ip link set dev $DEV_NS up
+
+ # root namespace
+ ip link add dev $DEV type $TYPE external
+ ip addr add dev $DEV 10.1.1.200/24
+ ip link set dev $DEV up
+}
+
+add_vxlan_tunnel()
+{
+ # Set static ARP entry here because iptables set-mark works
+ # on L3 packet, as a result not applying to ARP packets,
+ # causing errors at get_tunnel_{key/opt}.
+
+ # at_ns0 namespace
+ ip netns exec at_ns0 \
+ ip link add dev $DEV_NS type $TYPE \
+ id 2 dstport 4789 gbp remote 172.16.1.200
+ ip netns exec at_ns0 \
+ ip link set dev $DEV_NS address 52:54:00:d9:01:00 up
+ ip netns exec at_ns0 ip addr add dev $DEV_NS 10.1.1.100/24
+ ip netns exec at_ns0 arp -s 10.1.1.200 52:54:00:d9:02:00
+ ip netns exec at_ns0 iptables -A OUTPUT -j MARK --set-mark 0x800FF
+
+ # root namespace
+ ip link add dev $DEV type $TYPE external gbp dstport 4789
+ ip link set dev $DEV address 52:54:00:d9:02:00 up
+ ip addr add dev $DEV 10.1.1.200/24
+ arp -s 10.1.1.100 52:54:00:d9:01:00
+}
+
+add_ip6vxlan_tunnel()
+{
+ #ip netns exec at_ns0 ip -4 addr del 172.16.1.100 dev veth0
+ ip netns exec at_ns0 ip -6 addr add ::11/96 dev veth0
+ ip netns exec at_ns0 ip link set dev veth0 up
+ #ip -4 addr del 172.16.1.200 dev veth1
+ ip -6 addr add dev veth1 ::22/96
+ ip link set dev veth1 up
+
+ # at_ns0 namespace
+ ip netns exec at_ns0 \
+ ip link add dev $DEV_NS type $TYPE id 22 dstport 4789 \
+ local ::11 remote ::22
+ ip netns exec at_ns0 ip addr add dev $DEV_NS 10.1.1.100/24
+ ip netns exec at_ns0 ip link set dev $DEV_NS up
+
+ # root namespace
+ ip link add dev $DEV type $TYPE external dstport 4789
+ ip addr add dev $DEV 10.1.1.200/24
+ ip link set dev $DEV up
+}
+
+add_geneve_tunnel()
+{
+ # at_ns0 namespace
+ ip netns exec at_ns0 \
+ ip link add dev $DEV_NS type $TYPE \
+ id 2 dstport 6081 remote 172.16.1.200
+ ip netns exec at_ns0 ip link set dev $DEV_NS up
+ ip netns exec at_ns0 ip addr add dev $DEV_NS 10.1.1.100/24
+
+ # root namespace
+ ip link add dev $DEV type $TYPE dstport 6081 external
+ ip link set dev $DEV up
+ ip addr add dev $DEV 10.1.1.200/24
+}
+
+add_ip6geneve_tunnel()
+{
+ ip netns exec at_ns0 ip addr add ::11/96 dev veth0
+ ip netns exec at_ns0 ip link set dev veth0 up
+ ip addr add dev veth1 ::22/96
+ ip link set dev veth1 up
+
+ # at_ns0 namespace
+ ip netns exec at_ns0 \
+ ip link add dev $DEV_NS type $TYPE id 22 \
+ remote ::22 # geneve has no local option
+ ip netns exec at_ns0 ip addr add dev $DEV_NS 10.1.1.100/24
+ ip netns exec at_ns0 ip link set dev $DEV_NS up
+
+ # root namespace
+ ip link add dev $DEV type $TYPE external
+ ip addr add dev $DEV 10.1.1.200/24
+ ip link set dev $DEV up
+}
+
+add_ipip_tunnel()
+{
+ # at_ns0 namespace
+ ip netns exec at_ns0 \
+ ip link add dev $DEV_NS type $TYPE \
+ local 172.16.1.100 remote 172.16.1.200
+ ip netns exec at_ns0 ip link set dev $DEV_NS up
+ ip netns exec at_ns0 ip addr add dev $DEV_NS 10.1.1.100/24
+
+ # root namespace
+ ip link add dev $DEV type $TYPE external
+ ip link set dev $DEV up
+ ip addr add dev $DEV 10.1.1.200/24
+}
+
+add_ipip6tnl_tunnel()
+{
+ ip netns exec at_ns0 ip addr add ::11/96 dev veth0
+ ip netns exec at_ns0 ip link set dev veth0 up
+ ip addr add dev veth1 ::22/96
+ ip link set dev veth1 up
+
+ # at_ns0 namespace
+ ip netns exec at_ns0 \
+ ip link add dev $DEV_NS type $TYPE \
+ local ::11 remote ::22
+ ip netns exec at_ns0 ip addr add dev $DEV_NS 10.1.1.100/24
+ ip netns exec at_ns0 ip link set dev $DEV_NS up
+
+ # root namespace
+ ip link add dev $DEV type $TYPE external
+ ip addr add dev $DEV 10.1.1.200/24
+ ip link set dev $DEV up
+}
+
+test_gre()
+{
+ TYPE=gretap
+ DEV_NS=gretap00
+ DEV=gretap11
+ ret=0
+
+ check $TYPE
+ config_device
+ add_gre_tunnel
+ attach_bpf $DEV gre_set_tunnel gre_get_tunnel
+ ping $PING_ARG 10.1.1.100
+ check_err $?
+ ip netns exec at_ns0 ping $PING_ARG 10.1.1.200
+ check_err $?
+ cleanup
+
+ if [ $ret -ne 0 ]; then
+ echo -e ${RED}"FAIL: $TYPE"${NC}
+ return 1
+ fi
+ echo -e ${GREEN}"PASS: $TYPE"${NC}
+}
+
+test_ip6gre()
+{
+ TYPE=ip6gre
+ DEV_NS=ip6gre00
+ DEV=ip6gre11
+ ret=0
+
+ check $TYPE
+ config_device
+ # reuse the ip6gretap function
+ add_ip6gretap_tunnel
+ attach_bpf $DEV ip6gretap_set_tunnel ip6gretap_get_tunnel
+ # underlay
+ ping6 $PING_ARG ::11
+ # overlay: ipv4 over ipv6
+ ip netns exec at_ns0 ping $PING_ARG 10.1.1.200
+ ping $PING_ARG 10.1.1.100
+ check_err $?
+ # overlay: ipv6 over ipv6
+ ip netns exec at_ns0 ping6 $PING_ARG fc80::200
+ check_err $?
+ cleanup
+
+ if [ $ret -ne 0 ]; then
+ echo -e ${RED}"FAIL: $TYPE"${NC}
+ return 1
+ fi
+ echo -e ${GREEN}"PASS: $TYPE"${NC}
+}
+
+test_ip6gretap()
+{
+ TYPE=ip6gretap
+ DEV_NS=ip6gretap00
+ DEV=ip6gretap11
+ ret=0
+
+ check $TYPE
+ config_device
+ add_ip6gretap_tunnel
+ attach_bpf $DEV ip6gretap_set_tunnel ip6gretap_get_tunnel
+ # underlay
+ ping6 $PING_ARG ::11
+ # overlay: ipv4 over ipv6
+ ip netns exec at_ns0 ping $PING_ARG 10.1.1.200
+ ping $PING_ARG 10.1.1.100
+ check_err $?
+ # overlay: ipv6 over ipv6
+ ip netns exec at_ns0 ping6 $PING_ARG fc80::200
+ check_err $?
+ cleanup
+
+ if [ $ret -ne 0 ]; then
+ echo -e ${RED}"FAIL: $TYPE"${NC}
+ return 1
+ fi
+ echo -e ${GREEN}"PASS: $TYPE"${NC}
+}
+
+test_erspan()
+{
+ TYPE=erspan
+ DEV_NS=erspan00
+ DEV=erspan11
+ ret=0
+
+ check $TYPE
+ config_device
+ add_erspan_tunnel $1
+ attach_bpf $DEV erspan_set_tunnel erspan_get_tunnel
+ ping $PING_ARG 10.1.1.100
+ check_err $?
+ ip netns exec at_ns0 ping $PING_ARG 10.1.1.200
+ check_err $?
+ cleanup
+
+ if [ $ret -ne 0 ]; then
+ echo -e ${RED}"FAIL: $TYPE"${NC}
+ return 1
+ fi
+ echo -e ${GREEN}"PASS: $TYPE"${NC}
+}
+
+test_ip6erspan()
+{
+ TYPE=ip6erspan
+ DEV_NS=ip6erspan00
+ DEV=ip6erspan11
+ ret=0
+
+ check $TYPE
+ config_device
+ add_ip6erspan_tunnel $1
+ attach_bpf $DEV ip4ip6erspan_set_tunnel ip4ip6erspan_get_tunnel
+ ping6 $PING_ARG ::11
+ ip netns exec at_ns0 ping $PING_ARG 10.1.1.200
+ check_err $?
+ cleanup
+
+ if [ $ret -ne 0 ]; then
+ echo -e ${RED}"FAIL: $TYPE"${NC}
+ return 1
+ fi
+ echo -e ${GREEN}"PASS: $TYPE"${NC}
+}
+
+test_vxlan()
+{
+ TYPE=vxlan
+ DEV_NS=vxlan00
+ DEV=vxlan11
+ ret=0
+
+ check $TYPE
+ config_device
+ add_vxlan_tunnel
+ attach_bpf $DEV vxlan_set_tunnel vxlan_get_tunnel
+ ping $PING_ARG 10.1.1.100
+ check_err $?
+ ip netns exec at_ns0 ping $PING_ARG 10.1.1.200
+ check_err $?
+ cleanup
+
+ if [ $ret -ne 0 ]; then
+ echo -e ${RED}"FAIL: $TYPE"${NC}
+ return 1
+ fi
+ echo -e ${GREEN}"PASS: $TYPE"${NC}
+}
+
+test_ip6vxlan()
+{
+ TYPE=vxlan
+ DEV_NS=ip6vxlan00
+ DEV=ip6vxlan11
+ ret=0
+
+ check $TYPE
+ config_device
+ add_ip6vxlan_tunnel
+ ip link set dev veth1 mtu 1500
+ attach_bpf $DEV ip6vxlan_set_tunnel ip6vxlan_get_tunnel
+ # underlay
+ ping6 $PING_ARG ::11
+ # ip4 over ip6
+ ping $PING_ARG 10.1.1.100
+ check_err $?
+ ip netns exec at_ns0 ping $PING_ARG 10.1.1.200
+ check_err $?
+ cleanup
+
+ if [ $ret -ne 0 ]; then
+ echo -e ${RED}"FAIL: ip6$TYPE"${NC}
+ return 1
+ fi
+ echo -e ${GREEN}"PASS: ip6$TYPE"${NC}
+}
+
+test_geneve()
+{
+ TYPE=geneve
+ DEV_NS=geneve00
+ DEV=geneve11
+ ret=0
+
+ check $TYPE
+ config_device
+ add_geneve_tunnel
+ attach_bpf $DEV geneve_set_tunnel geneve_get_tunnel
+ ping $PING_ARG 10.1.1.100
+ check_err $?
+ ip netns exec at_ns0 ping $PING_ARG 10.1.1.200
+ check_err $?
+ cleanup
+
+ if [ $ret -ne 0 ]; then
+ echo -e ${RED}"FAIL: $TYPE"${NC}
+ return 1
+ fi
+ echo -e ${GREEN}"PASS: $TYPE"${NC}
+}
+
+test_ip6geneve()
+{
+ TYPE=geneve
+ DEV_NS=ip6geneve00
+ DEV=ip6geneve11
+ ret=0
+
+ check $TYPE
+ config_device
+ add_ip6geneve_tunnel
+ attach_bpf $DEV ip6geneve_set_tunnel ip6geneve_get_tunnel
+ ping $PING_ARG 10.1.1.100
+ check_err $?
+ ip netns exec at_ns0 ping $PING_ARG 10.1.1.200
+ check_err $?
+ cleanup
+
+ if [ $ret -ne 0 ]; then
+ echo -e ${RED}"FAIL: ip6$TYPE"${NC}
+ return 1
+ fi
+ echo -e ${GREEN}"PASS: ip6$TYPE"${NC}
+}
+
+test_ipip()
+{
+ TYPE=ipip
+ DEV_NS=ipip00
+ DEV=ipip11
+ ret=0
+
+ check $TYPE
+ config_device
+ add_ipip_tunnel
+ ip link set dev veth1 mtu 1500
+ attach_bpf $DEV ipip_set_tunnel ipip_get_tunnel
+ ping $PING_ARG 10.1.1.100
+ check_err $?
+ ip netns exec at_ns0 ping $PING_ARG 10.1.1.200
+ check_err $?
+ cleanup
+
+ if [ $ret -ne 0 ]; then
+ echo -e ${RED}"FAIL: $TYPE"${NC}
+ return 1
+ fi
+ echo -e ${GREEN}"PASS: $TYPE"${NC}
+}
+
+test_ipip6()
+{
+ TYPE=ip6tnl
+ DEV_NS=ipip6tnl00
+ DEV=ipip6tnl11
+ ret=0
+
+ check $TYPE
+ config_device
+ add_ipip6tnl_tunnel
+ ip link set dev veth1 mtu 1500
+ attach_bpf $DEV ipip6_set_tunnel ipip6_get_tunnel
+ # underlay
+ ping6 $PING_ARG ::11
+ # ip4 over ip6
+ ping $PING_ARG 10.1.1.100
+ check_err $?
+ ip netns exec at_ns0 ping $PING_ARG 10.1.1.200
+ check_err $?
+ cleanup
+
+ if [ $ret -ne 0 ]; then
+ echo -e ${RED}"FAIL: $TYPE"${NC}
+ return 1
+ fi
+ echo -e ${GREEN}"PASS: $TYPE"${NC}
+}
+
+setup_xfrm_tunnel()
+{
+ auth=0x$(printf '1%.0s' {1..40})
+ enc=0x$(printf '2%.0s' {1..32})
+ spi_in_to_out=0x1
+ spi_out_to_in=0x2
+ # at_ns0 namespace
+ # at_ns0 -> root
+ ip netns exec at_ns0 \
+ ip xfrm state add src 172.16.1.100 dst 172.16.1.200 proto esp \
+ spi $spi_in_to_out reqid 1 mode tunnel \
+ auth-trunc 'hmac(sha1)' $auth 96 enc 'cbc(aes)' $enc
+ ip netns exec at_ns0 \
+ ip xfrm policy add src 10.1.1.100/32 dst 10.1.1.200/32 dir out \
+ tmpl src 172.16.1.100 dst 172.16.1.200 proto esp reqid 1 \
+ mode tunnel
+ # root -> at_ns0
+ ip netns exec at_ns0 \
+ ip xfrm state add src 172.16.1.200 dst 172.16.1.100 proto esp \
+ spi $spi_out_to_in reqid 2 mode tunnel \
+ auth-trunc 'hmac(sha1)' $auth 96 enc 'cbc(aes)' $enc
+ ip netns exec at_ns0 \
+ ip xfrm policy add src 10.1.1.200/32 dst 10.1.1.100/32 dir in \
+ tmpl src 172.16.1.200 dst 172.16.1.100 proto esp reqid 2 \
+ mode tunnel
+ # address & route
+ ip netns exec at_ns0 \
+ ip addr add dev veth0 10.1.1.100/32
+ ip netns exec at_ns0 \
+ ip route add 10.1.1.200 dev veth0 via 172.16.1.200 \
+ src 10.1.1.100
+
+ # root namespace
+ # at_ns0 -> root
+ ip xfrm state add src 172.16.1.100 dst 172.16.1.200 proto esp \
+ spi $spi_in_to_out reqid 1 mode tunnel \
+ auth-trunc 'hmac(sha1)' $auth 96 enc 'cbc(aes)' $enc
+ ip xfrm policy add src 10.1.1.100/32 dst 10.1.1.200/32 dir in \
+ tmpl src 172.16.1.100 dst 172.16.1.200 proto esp reqid 1 \
+ mode tunnel
+ # root -> at_ns0
+ ip xfrm state add src 172.16.1.200 dst 172.16.1.100 proto esp \
+ spi $spi_out_to_in reqid 2 mode tunnel \
+ auth-trunc 'hmac(sha1)' $auth 96 enc 'cbc(aes)' $enc
+ ip xfrm policy add src 10.1.1.200/32 dst 10.1.1.100/32 dir out \
+ tmpl src 172.16.1.200 dst 172.16.1.100 proto esp reqid 2 \
+ mode tunnel
+ # address & route
+ ip addr add dev veth1 10.1.1.200/32
+ ip route add 10.1.1.100 dev veth1 via 172.16.1.100 src 10.1.1.200
+}
+
+test_xfrm_tunnel()
+{
+ config_device
+ #tcpdump -nei veth1 ip &
+ output=$(mktemp)
+ cat /sys/kernel/debug/tracing/trace_pipe | tee $output &
+ setup_xfrm_tunnel
+ tc qdisc add dev veth1 clsact
+ tc filter add dev veth1 proto ip ingress bpf da obj test_tunnel_kern.o \
+ sec xfrm_get_state
+ ip netns exec at_ns0 ping $PING_ARG 10.1.1.200
+ sleep 1
+ grep "reqid 1" $output
+ check_err $?
+ grep "spi 0x1" $output
+ check_err $?
+ grep "remote ip 0xac100164" $output
+ check_err $?
+ cleanup
+
+ if [ $ret -ne 0 ]; then
+ echo -e ${RED}"FAIL: xfrm tunnel"${NC}
+ return 1
+ fi
+ echo -e ${GREEN}"PASS: xfrm tunnel"${NC}
+}
+
+attach_bpf()
+{
+ DEV=$1
+ SET=$2
+ GET=$3
+ tc qdisc add dev $DEV clsact
+ tc filter add dev $DEV egress bpf da obj test_tunnel_kern.o sec $SET
+ tc filter add dev $DEV ingress bpf da obj test_tunnel_kern.o sec $GET
+}
+
+cleanup()
+{
+ ip netns delete at_ns0 2> /dev/null
+ ip link del veth1 2> /dev/null
+ ip link del ipip11 2> /dev/null
+ ip link del ipip6tnl11 2> /dev/null
+ ip link del gretap11 2> /dev/null
+ ip link del ip6gre11 2> /dev/null
+ ip link del ip6gretap11 2> /dev/null
+ ip link del vxlan11 2> /dev/null
+ ip link del ip6vxlan11 2> /dev/null
+ ip link del geneve11 2> /dev/null
+ ip link del ip6geneve11 2> /dev/null
+ ip link del erspan11 2> /dev/null
+ ip link del ip6erspan11 2> /dev/null
+}
+
+cleanup_exit()
+{
+ echo "CATCH SIGKILL or SIGINT, cleanup and exit"
+ cleanup
+ exit 0
+}
+
+check()
+{
+ ip link help $1 2>&1 | grep -q "^Usage:"
+ if [ $? -ne 0 ];then
+ echo "SKIP $1: iproute2 not support"
+ cleanup
+ return 1
+ fi
+}
+
+enable_debug()
+{
+ echo 'file ip_gre.c +p' > /sys/kernel/debug/dynamic_debug/control
+ echo 'file ip6_gre.c +p' > /sys/kernel/debug/dynamic_debug/control
+ echo 'file vxlan.c +p' > /sys/kernel/debug/dynamic_debug/control
+ echo 'file geneve.c +p' > /sys/kernel/debug/dynamic_debug/control
+ echo 'file ipip.c +p' > /sys/kernel/debug/dynamic_debug/control
+}
+
+check_err()
+{
+ if [ $ret -eq 0 ]; then
+ ret=$1
+ fi
+}
+
+bpf_tunnel_test()
+{
+ echo "Testing GRE tunnel..."
+ test_gre
+ echo "Testing IP6GRE tunnel..."
+ test_ip6gre
+ echo "Testing IP6GRETAP tunnel..."
+ test_ip6gretap
+ echo "Testing ERSPAN tunnel..."
+ test_erspan v2
+ echo "Testing IP6ERSPAN tunnel..."
+ test_ip6erspan v2
+ echo "Testing VXLAN tunnel..."
+ test_vxlan
+ echo "Testing IP6VXLAN tunnel..."
+ test_ip6vxlan
+ echo "Testing GENEVE tunnel..."
+ test_geneve
+ echo "Testing IP6GENEVE tunnel..."
+ test_ip6geneve
+ echo "Testing IPIP tunnel..."
+ test_ipip
+ echo "Testing IPIP6 tunnel..."
+ test_ipip6
+ echo "Testing IPSec tunnel..."
+ test_xfrm_tunnel
+}
+
+trap cleanup 0 3 6
+trap cleanup_exit 2 9
+
+cleanup
+bpf_tunnel_test
+
+exit 0
diff --git a/samples/bpf/tcbpf2_kern.c b/tools/testing/selftests/bpf/test_tunnel_kern.c
similarity index 67%
rename from samples/bpf/tcbpf2_kern.c
rename to tools/testing/selftests/bpf/test_tunnel_kern.c
index 9a8db7bd..504df69 100644
--- a/samples/bpf/tcbpf2_kern.c
+++ b/tools/testing/selftests/bpf/test_tunnel_kern.c
@@ -1,3 +1,4 @@
+// SPDX-License-Identifier: GPL-2.0
/* Copyright (c) 2016 VMware
* Copyright (c) 2016 Facebook
*
@@ -5,39 +6,41 @@
* modify it under the terms of version 2 of the GNU General Public
* License as published by the Free Software Foundation.
*/
-#define KBUILD_MODNAME "foo"
-#include <uapi/linux/bpf.h>
-#include <uapi/linux/if_ether.h>
-#include <uapi/linux/if_packet.h>
-#include <uapi/linux/ip.h>
-#include <uapi/linux/ipv6.h>
-#include <uapi/linux/in.h>
-#include <uapi/linux/tcp.h>
-#include <uapi/linux/filter.h>
-#include <uapi/linux/pkt_cls.h>
-#include <uapi/linux/erspan.h>
-#include <net/ipv6.h>
+#include <stddef.h>
+#include <string.h>
+#include <arpa/inet.h>
+#include <linux/bpf.h>
+#include <linux/if_ether.h>
+#include <linux/if_packet.h>
+#include <linux/ip.h>
+#include <linux/ipv6.h>
+#include <linux/types.h>
+#include <linux/tcp.h>
+#include <linux/socket.h>
+#include <linux/pkt_cls.h>
+#include <linux/erspan.h>
#include "bpf_helpers.h"
#include "bpf_endian.h"
-#define _htonl __builtin_bswap32
#define ERROR(ret) do {\
char fmt[] = "ERROR line:%d ret:%d\n";\
bpf_trace_printk(fmt, sizeof(fmt), __LINE__, ret); \
- } while(0)
+ } while (0)
+
+int _version SEC("version") = 1;
struct geneve_opt {
__be16 opt_class;
- u8 type;
- u8 length:5;
- u8 r3:1;
- u8 r2:1;
- u8 r1:1;
- u8 opt_data[8]; /* hard-coded to 8 byte */
+ __u8 type;
+ __u8 length:5;
+ __u8 r3:1;
+ __u8 r2:1;
+ __u8 r1:1;
+ __u8 opt_data[8]; /* hard-coded to 8 byte */
};
struct vxlan_metadata {
- u32 gbp;
+ __u32 gbp;
};
SEC("gre_set_tunnel")
@@ -86,7 +89,7 @@
int ret;
__builtin_memset(&key, 0x0, sizeof(key));
- key.remote_ipv6[3] = _htonl(0x11); /* ::11 */
+ key.remote_ipv6[3] = bpf_htonl(0x11); /* ::11 */
key.tunnel_id = 2;
key.tunnel_tos = 0;
key.tunnel_ttl = 64;
@@ -136,7 +139,8 @@
key.tunnel_tos = 0;
key.tunnel_ttl = 64;
- ret = bpf_skb_set_tunnel_key(skb, &key, sizeof(key), BPF_F_ZERO_CSUM_TX);
+ ret = bpf_skb_set_tunnel_key(skb, &key, sizeof(key),
+ BPF_F_ZERO_CSUM_TX);
if (ret < 0) {
ERROR(ret);
return TC_ACT_SHOT;
@@ -147,8 +151,8 @@
md.version = 1;
md.u.index = bpf_htonl(123);
#else
- u8 direction = 1;
- u8 hwid = 7;
+ __u8 direction = 1;
+ __u8 hwid = 7;
md.version = 2;
md.u.md2.dir = direction;
@@ -171,7 +175,7 @@
char fmt[] = "key %d remote ip 0x%x erspan version %d\n";
struct bpf_tunnel_key key;
struct erspan_metadata md;
- u32 index;
+ __u32 index;
int ret;
ret = bpf_skb_get_tunnel_key(skb, &key, sizeof(key), 0);
@@ -214,7 +218,7 @@
int ret;
__builtin_memset(&key, 0x0, sizeof(key));
- key.remote_ipv6[3] = _htonl(0x11);
+ key.remote_ipv6[3] = bpf_htonl(0x11);
key.tunnel_id = 2;
key.tunnel_tos = 0;
key.tunnel_ttl = 64;
@@ -229,11 +233,11 @@
__builtin_memset(&md, 0, sizeof(md));
#ifdef ERSPAN_V1
- md.u.index = htonl(123);
+ md.u.index = bpf_htonl(123);
md.version = 1;
#else
- u8 direction = 0;
- u8 hwid = 17;
+ __u8 direction = 0;
+ __u8 hwid = 17;
md.version = 2;
md.u.md2.dir = direction;
@@ -256,10 +260,11 @@
char fmt[] = "ip6erspan get key %d remote ip6 ::%x erspan version %d\n";
struct bpf_tunnel_key key;
struct erspan_metadata md;
- u32 index;
+ __u32 index;
int ret;
- ret = bpf_skb_get_tunnel_key(skb, &key, sizeof(key), BPF_F_TUNINFO_IPV6);
+ ret = bpf_skb_get_tunnel_key(skb, &key, sizeof(key),
+ BPF_F_TUNINFO_IPV6);
if (ret < 0) {
ERROR(ret);
return TC_ACT_SHOT;
@@ -304,7 +309,8 @@
key.tunnel_tos = 0;
key.tunnel_ttl = 64;
- ret = bpf_skb_set_tunnel_key(skb, &key, sizeof(key), BPF_F_ZERO_CSUM_TX);
+ ret = bpf_skb_set_tunnel_key(skb, &key, sizeof(key),
+ BPF_F_ZERO_CSUM_TX);
if (ret < 0) {
ERROR(ret);
return TC_ACT_SHOT;
@@ -346,6 +352,48 @@
return TC_ACT_OK;
}
+SEC("ip6vxlan_set_tunnel")
+int _ip6vxlan_set_tunnel(struct __sk_buff *skb)
+{
+ struct bpf_tunnel_key key;
+ int ret;
+
+ __builtin_memset(&key, 0x0, sizeof(key));
+ key.remote_ipv6[3] = bpf_htonl(0x11); /* ::11 */
+ key.tunnel_id = 22;
+ key.tunnel_tos = 0;
+ key.tunnel_ttl = 64;
+
+ ret = bpf_skb_set_tunnel_key(skb, &key, sizeof(key),
+ BPF_F_TUNINFO_IPV6);
+ if (ret < 0) {
+ ERROR(ret);
+ return TC_ACT_SHOT;
+ }
+
+ return TC_ACT_OK;
+}
+
+SEC("ip6vxlan_get_tunnel")
+int _ip6vxlan_get_tunnel(struct __sk_buff *skb)
+{
+ char fmt[] = "key %d remote ip6 ::%x label %x\n";
+ struct bpf_tunnel_key key;
+ int ret;
+
+ ret = bpf_skb_get_tunnel_key(skb, &key, sizeof(key),
+ BPF_F_TUNINFO_IPV6);
+ if (ret < 0) {
+ ERROR(ret);
+ return TC_ACT_SHOT;
+ }
+
+ bpf_trace_printk(fmt, sizeof(fmt),
+ key.tunnel_id, key.remote_ipv6[3], key.tunnel_label);
+
+ return TC_ACT_OK;
+}
+
SEC("geneve_set_tunnel")
int _geneve_set_tunnel(struct __sk_buff *skb)
{
@@ -360,15 +408,16 @@
key.tunnel_ttl = 64;
__builtin_memset(&gopt, 0x0, sizeof(gopt));
- gopt.opt_class = 0x102; /* Open Virtual Networking (OVN) */
+ gopt.opt_class = bpf_htons(0x102); /* Open Virtual Networking (OVN) */
gopt.type = 0x08;
gopt.r1 = 0;
gopt.r2 = 0;
gopt.r3 = 0;
gopt.length = 2; /* 4-byte multiple */
- *(int *) &gopt.opt_data = 0xdeadbeef;
+ *(int *) &gopt.opt_data = bpf_htonl(0xdeadbeef);
- ret = bpf_skb_set_tunnel_key(skb, &key, sizeof(key), BPF_F_ZERO_CSUM_TX);
+ ret = bpf_skb_set_tunnel_key(skb, &key, sizeof(key),
+ BPF_F_ZERO_CSUM_TX);
if (ret < 0) {
ERROR(ret);
return TC_ACT_SHOT;
@@ -408,6 +457,71 @@
return TC_ACT_OK;
}
+SEC("ip6geneve_set_tunnel")
+int _ip6geneve_set_tunnel(struct __sk_buff *skb)
+{
+ struct bpf_tunnel_key key;
+ struct geneve_opt gopt;
+ int ret;
+
+ __builtin_memset(&key, 0x0, sizeof(key));
+ key.remote_ipv6[3] = bpf_htonl(0x11); /* ::11 */
+ key.tunnel_id = 22;
+ key.tunnel_tos = 0;
+ key.tunnel_ttl = 64;
+
+ ret = bpf_skb_set_tunnel_key(skb, &key, sizeof(key),
+ BPF_F_TUNINFO_IPV6);
+ if (ret < 0) {
+ ERROR(ret);
+ return TC_ACT_SHOT;
+ }
+
+ __builtin_memset(&gopt, 0x0, sizeof(gopt));
+ gopt.opt_class = bpf_htons(0x102); /* Open Virtual Networking (OVN) */
+ gopt.type = 0x08;
+ gopt.r1 = 0;
+ gopt.r2 = 0;
+ gopt.r3 = 0;
+ gopt.length = 2; /* 4-byte multiple */
+ *(int *) &gopt.opt_data = bpf_htonl(0xfeedbeef);
+
+ ret = bpf_skb_set_tunnel_opt(skb, &gopt, sizeof(gopt));
+ if (ret < 0) {
+ ERROR(ret);
+ return TC_ACT_SHOT;
+ }
+
+ return TC_ACT_OK;
+}
+
+SEC("ip6geneve_get_tunnel")
+int _ip6geneve_get_tunnel(struct __sk_buff *skb)
+{
+ char fmt[] = "key %d remote ip 0x%x geneve class 0x%x\n";
+ struct bpf_tunnel_key key;
+ struct geneve_opt gopt;
+ int ret;
+
+ ret = bpf_skb_get_tunnel_key(skb, &key, sizeof(key),
+ BPF_F_TUNINFO_IPV6);
+ if (ret < 0) {
+ ERROR(ret);
+ return TC_ACT_SHOT;
+ }
+
+ ret = bpf_skb_get_tunnel_opt(skb, &gopt, sizeof(gopt));
+ if (ret < 0) {
+ ERROR(ret);
+ return TC_ACT_SHOT;
+ }
+
+ bpf_trace_printk(fmt, sizeof(fmt),
+ key.tunnel_id, key.remote_ipv4, gopt.opt_class);
+
+ return TC_ACT_OK;
+}
+
SEC("ipip_set_tunnel")
int _ipip_set_tunnel(struct __sk_buff *skb)
{
@@ -431,9 +545,9 @@
if (iph->protocol != IPPROTO_TCP || iph->ihl != 5)
return TC_ACT_SHOT;
- if (tcp->dest == htons(5200))
+ if (tcp->dest == bpf_htons(5200))
key.remote_ipv4 = 0xac100164; /* 172.16.1.100 */
- else if (tcp->dest == htons(5201))
+ else if (tcp->dest == bpf_htons(5201))
key.remote_ipv4 = 0xac100165; /* 172.16.1.101 */
else
return TC_ACT_SHOT;
@@ -481,28 +595,12 @@
return TC_ACT_SHOT;
}
- key.remote_ipv6[0] = _htonl(0x2401db00);
+ __builtin_memset(&key, 0x0, sizeof(key));
+ key.remote_ipv6[3] = bpf_htonl(0x11); /* ::11 */
key.tunnel_ttl = 64;
- if (iph->protocol == IPPROTO_ICMP) {
- key.remote_ipv6[3] = _htonl(1);
- } else {
- if (iph->protocol != IPPROTO_TCP || iph->ihl != 5) {
- ERROR(iph->protocol);
- return TC_ACT_SHOT;
- }
-
- if (tcp->dest == htons(5200)) {
- key.remote_ipv6[3] = _htonl(1);
- } else if (tcp->dest == htons(5201)) {
- key.remote_ipv6[3] = _htonl(2);
- } else {
- ERROR(tcp->dest);
- return TC_ACT_SHOT;
- }
- }
-
- ret = bpf_skb_set_tunnel_key(skb, &key, sizeof(key), BPF_F_TUNINFO_IPV6);
+ ret = bpf_skb_set_tunnel_key(skb, &key, sizeof(key),
+ BPF_F_TUNINFO_IPV6);
if (ret < 0) {
ERROR(ret);
return TC_ACT_SHOT;
@@ -518,14 +616,15 @@
struct bpf_tunnel_key key;
char fmt[] = "remote ip6 %x::%x\n";
- ret = bpf_skb_get_tunnel_key(skb, &key, sizeof(key), BPF_F_TUNINFO_IPV6);
+ ret = bpf_skb_get_tunnel_key(skb, &key, sizeof(key),
+ BPF_F_TUNINFO_IPV6);
if (ret < 0) {
ERROR(ret);
return TC_ACT_SHOT;
}
- bpf_trace_printk(fmt, sizeof(fmt), _htonl(key.remote_ipv6[0]),
- _htonl(key.remote_ipv6[3]));
+ bpf_trace_printk(fmt, sizeof(fmt), bpf_htonl(key.remote_ipv6[0]),
+ bpf_htonl(key.remote_ipv6[3]));
return TC_ACT_OK;
}
@@ -545,28 +644,29 @@
return TC_ACT_SHOT;
}
- key.remote_ipv6[0] = _htonl(0x2401db00);
+ key.remote_ipv6[0] = bpf_htonl(0x2401db00);
key.tunnel_ttl = 64;
- if (iph->nexthdr == NEXTHDR_ICMP) {
- key.remote_ipv6[3] = _htonl(1);
+ if (iph->nexthdr == 58 /* NEXTHDR_ICMP */) {
+ key.remote_ipv6[3] = bpf_htonl(1);
} else {
- if (iph->nexthdr != NEXTHDR_TCP) {
+ if (iph->nexthdr != 6 /* NEXTHDR_TCP */) {
ERROR(iph->nexthdr);
return TC_ACT_SHOT;
}
- if (tcp->dest == htons(5200)) {
- key.remote_ipv6[3] = _htonl(1);
- } else if (tcp->dest == htons(5201)) {
- key.remote_ipv6[3] = _htonl(2);
+ if (tcp->dest == bpf_htons(5200)) {
+ key.remote_ipv6[3] = bpf_htonl(1);
+ } else if (tcp->dest == bpf_htons(5201)) {
+ key.remote_ipv6[3] = bpf_htonl(2);
} else {
ERROR(tcp->dest);
return TC_ACT_SHOT;
}
}
- ret = bpf_skb_set_tunnel_key(skb, &key, sizeof(key), BPF_F_TUNINFO_IPV6);
+ ret = bpf_skb_set_tunnel_key(skb, &key, sizeof(key),
+ BPF_F_TUNINFO_IPV6);
if (ret < 0) {
ERROR(ret);
return TC_ACT_SHOT;
@@ -582,14 +682,31 @@
struct bpf_tunnel_key key;
char fmt[] = "remote ip6 %x::%x\n";
- ret = bpf_skb_get_tunnel_key(skb, &key, sizeof(key), BPF_F_TUNINFO_IPV6);
+ ret = bpf_skb_get_tunnel_key(skb, &key, sizeof(key),
+ BPF_F_TUNINFO_IPV6);
if (ret < 0) {
ERROR(ret);
return TC_ACT_SHOT;
}
- bpf_trace_printk(fmt, sizeof(fmt), _htonl(key.remote_ipv6[0]),
- _htonl(key.remote_ipv6[3]));
+ bpf_trace_printk(fmt, sizeof(fmt), bpf_htonl(key.remote_ipv6[0]),
+ bpf_htonl(key.remote_ipv6[3]));
+ return TC_ACT_OK;
+}
+
+SEC("xfrm_get_state")
+int _xfrm_get_state(struct __sk_buff *skb)
+{
+ struct bpf_xfrm_state x;
+ char fmt[] = "reqid %d spi 0x%x remote ip 0x%x\n";
+ int ret;
+
+ ret = bpf_skb_get_xfrm_state(skb, 0, &x, sizeof(x), 0);
+ if (ret < 0)
+ return TC_ACT_OK;
+
+ bpf_trace_printk(fmt, sizeof(fmt), x.reqid, bpf_ntohl(x.spi),
+ bpf_ntohl(x.remote_ipv4));
return TC_ACT_OK;
}
diff --git a/tools/testing/selftests/bpf/test_verifier.c b/tools/testing/selftests/bpf/test_verifier.c
index fd7de7e..4b4f015 100644
--- a/tools/testing/selftests/bpf/test_verifier.c
+++ b/tools/testing/selftests/bpf/test_verifier.c
@@ -41,13 +41,14 @@
# endif
#endif
#include "bpf_rlimit.h"
+#include "bpf_rand.h"
#include "../../../include/linux/filter.h"
#ifndef ARRAY_SIZE
# define ARRAY_SIZE(x) (sizeof(x) / sizeof((x)[0]))
#endif
-#define MAX_INSNS 512
+#define MAX_INSNS BPF_MAXINSNS
#define MAX_FIXUPS 8
#define MAX_NR_MAPS 4
#define POINTER_VALUE 0xcafe4all
@@ -64,6 +65,7 @@
struct bpf_insn insns[MAX_INSNS];
int fixup_map1[MAX_FIXUPS];
int fixup_map2[MAX_FIXUPS];
+ int fixup_map3[MAX_FIXUPS];
int fixup_prog[MAX_FIXUPS];
int fixup_map_in_map[MAX_FIXUPS];
const char *errstr;
@@ -76,6 +78,8 @@
} result, result_unpriv;
enum bpf_prog_type prog_type;
uint8_t flags;
+ __u8 data[TEST_DATA_LEN];
+ void (*fill_helper)(struct bpf_test *self);
};
/* Note we want this to be 64 bit aligned so that the end of our array is
@@ -88,6 +92,91 @@
int foo[MAX_ENTRIES];
};
+struct other_val {
+ long long foo;
+ long long bar;
+};
+
+static void bpf_fill_ld_abs_vlan_push_pop(struct bpf_test *self)
+{
+ /* test: {skb->data[0], vlan_push} x 68 + {skb->data[0], vlan_pop} x 68 */
+#define PUSH_CNT 51
+ unsigned int len = BPF_MAXINSNS;
+ struct bpf_insn *insn = self->insns;
+ int i = 0, j, k = 0;
+
+ insn[i++] = BPF_MOV64_REG(BPF_REG_6, BPF_REG_1);
+loop:
+ for (j = 0; j < PUSH_CNT; j++) {
+ insn[i++] = BPF_LD_ABS(BPF_B, 0);
+ insn[i] = BPF_JMP_IMM(BPF_JNE, BPF_REG_0, 0x34, len - i - 2);
+ i++;
+ insn[i++] = BPF_MOV64_REG(BPF_REG_1, BPF_REG_6);
+ insn[i++] = BPF_MOV64_IMM(BPF_REG_2, 1);
+ insn[i++] = BPF_MOV64_IMM(BPF_REG_3, 2);
+ insn[i++] = BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0,
+ BPF_FUNC_skb_vlan_push),
+ insn[i] = BPF_JMP_IMM(BPF_JNE, BPF_REG_0, 0, len - i - 2);
+ i++;
+ }
+
+ for (j = 0; j < PUSH_CNT; j++) {
+ insn[i++] = BPF_LD_ABS(BPF_B, 0);
+ insn[i] = BPF_JMP_IMM(BPF_JNE, BPF_REG_0, 0x34, len - i - 2);
+ i++;
+ insn[i++] = BPF_MOV64_REG(BPF_REG_1, BPF_REG_6);
+ insn[i++] = BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0,
+ BPF_FUNC_skb_vlan_pop),
+ insn[i] = BPF_JMP_IMM(BPF_JNE, BPF_REG_0, 0, len - i - 2);
+ i++;
+ }
+ if (++k < 5)
+ goto loop;
+
+ for (; i < len - 1; i++)
+ insn[i] = BPF_ALU32_IMM(BPF_MOV, BPF_REG_0, 0xbef);
+ insn[len - 1] = BPF_EXIT_INSN();
+}
+
+static void bpf_fill_jump_around_ld_abs(struct bpf_test *self)
+{
+ struct bpf_insn *insn = self->insns;
+ unsigned int len = BPF_MAXINSNS;
+ int i = 0;
+
+ insn[i++] = BPF_MOV64_REG(BPF_REG_6, BPF_REG_1);
+ insn[i++] = BPF_LD_ABS(BPF_B, 0);
+ insn[i] = BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 10, len - i - 2);
+ i++;
+ while (i < len - 1)
+ insn[i++] = BPF_LD_ABS(BPF_B, 1);
+ insn[i] = BPF_EXIT_INSN();
+}
+
+static void bpf_fill_rand_ld_dw(struct bpf_test *self)
+{
+ struct bpf_insn *insn = self->insns;
+ uint64_t res = 0;
+ int i = 0;
+
+ insn[i++] = BPF_MOV32_IMM(BPF_REG_0, 0);
+ while (i < self->retval) {
+ uint64_t val = bpf_semi_rand_get();
+ struct bpf_insn tmp[2] = { BPF_LD_IMM64(BPF_REG_1, val) };
+
+ res ^= val;
+ insn[i++] = tmp[0];
+ insn[i++] = tmp[1];
+ insn[i++] = BPF_ALU64_REG(BPF_XOR, BPF_REG_0, BPF_REG_1);
+ }
+ insn[i++] = BPF_MOV64_REG(BPF_REG_1, BPF_REG_0);
+ insn[i++] = BPF_ALU64_IMM(BPF_RSH, BPF_REG_1, 32);
+ insn[i++] = BPF_ALU64_REG(BPF_XOR, BPF_REG_0, BPF_REG_1);
+ insn[i] = BPF_EXIT_INSN();
+ res ^= (res >> 32);
+ self->retval = (uint32_t)res;
+}
+
static struct bpf_test tests[] = {
{
"add+sub+mul",
@@ -1597,6 +1686,121 @@
.prog_type = BPF_PROG_TYPE_SK_SKB,
},
{
+ "valid access family in SK_MSG",
+ .insns = {
+ BPF_LDX_MEM(BPF_W, BPF_REG_0, BPF_REG_1,
+ offsetof(struct sk_msg_md, family)),
+ BPF_EXIT_INSN(),
+ },
+ .result = ACCEPT,
+ .prog_type = BPF_PROG_TYPE_SK_MSG,
+ },
+ {
+ "valid access remote_ip4 in SK_MSG",
+ .insns = {
+ BPF_LDX_MEM(BPF_W, BPF_REG_0, BPF_REG_1,
+ offsetof(struct sk_msg_md, remote_ip4)),
+ BPF_EXIT_INSN(),
+ },
+ .result = ACCEPT,
+ .prog_type = BPF_PROG_TYPE_SK_MSG,
+ },
+ {
+ "valid access local_ip4 in SK_MSG",
+ .insns = {
+ BPF_LDX_MEM(BPF_W, BPF_REG_0, BPF_REG_1,
+ offsetof(struct sk_msg_md, local_ip4)),
+ BPF_EXIT_INSN(),
+ },
+ .result = ACCEPT,
+ .prog_type = BPF_PROG_TYPE_SK_MSG,
+ },
+ {
+ "valid access remote_port in SK_MSG",
+ .insns = {
+ BPF_LDX_MEM(BPF_W, BPF_REG_0, BPF_REG_1,
+ offsetof(struct sk_msg_md, remote_port)),
+ BPF_EXIT_INSN(),
+ },
+ .result = ACCEPT,
+ .prog_type = BPF_PROG_TYPE_SK_MSG,
+ },
+ {
+ "valid access local_port in SK_MSG",
+ .insns = {
+ BPF_LDX_MEM(BPF_W, BPF_REG_0, BPF_REG_1,
+ offsetof(struct sk_msg_md, local_port)),
+ BPF_EXIT_INSN(),
+ },
+ .result = ACCEPT,
+ .prog_type = BPF_PROG_TYPE_SK_MSG,
+ },
+ {
+ "valid access remote_ip6 in SK_MSG",
+ .insns = {
+ BPF_LDX_MEM(BPF_W, BPF_REG_0, BPF_REG_1,
+ offsetof(struct sk_msg_md, remote_ip6[0])),
+ BPF_LDX_MEM(BPF_W, BPF_REG_0, BPF_REG_1,
+ offsetof(struct sk_msg_md, remote_ip6[1])),
+ BPF_LDX_MEM(BPF_W, BPF_REG_0, BPF_REG_1,
+ offsetof(struct sk_msg_md, remote_ip6[2])),
+ BPF_LDX_MEM(BPF_W, BPF_REG_0, BPF_REG_1,
+ offsetof(struct sk_msg_md, remote_ip6[3])),
+ BPF_EXIT_INSN(),
+ },
+ .result = ACCEPT,
+ .prog_type = BPF_PROG_TYPE_SK_SKB,
+ },
+ {
+ "valid access local_ip6 in SK_MSG",
+ .insns = {
+ BPF_LDX_MEM(BPF_W, BPF_REG_0, BPF_REG_1,
+ offsetof(struct sk_msg_md, local_ip6[0])),
+ BPF_LDX_MEM(BPF_W, BPF_REG_0, BPF_REG_1,
+ offsetof(struct sk_msg_md, local_ip6[1])),
+ BPF_LDX_MEM(BPF_W, BPF_REG_0, BPF_REG_1,
+ offsetof(struct sk_msg_md, local_ip6[2])),
+ BPF_LDX_MEM(BPF_W, BPF_REG_0, BPF_REG_1,
+ offsetof(struct sk_msg_md, local_ip6[3])),
+ BPF_EXIT_INSN(),
+ },
+ .result = ACCEPT,
+ .prog_type = BPF_PROG_TYPE_SK_SKB,
+ },
+ {
+ "invalid 64B read of family in SK_MSG",
+ .insns = {
+ BPF_LDX_MEM(BPF_DW, BPF_REG_2, BPF_REG_1,
+ offsetof(struct sk_msg_md, family)),
+ BPF_EXIT_INSN(),
+ },
+ .errstr = "invalid bpf_context access",
+ .result = REJECT,
+ .prog_type = BPF_PROG_TYPE_SK_MSG,
+ },
+ {
+ "invalid read past end of SK_MSG",
+ .insns = {
+ BPF_LDX_MEM(BPF_W, BPF_REG_2, BPF_REG_1,
+ offsetof(struct sk_msg_md, local_port) + 4),
+ BPF_EXIT_INSN(),
+ },
+ .errstr = "R0 !read_ok",
+ .result = REJECT,
+ .prog_type = BPF_PROG_TYPE_SK_MSG,
+ },
+ {
+ "invalid read offset in SK_MSG",
+ .insns = {
+ BPF_LDX_MEM(BPF_W, BPF_REG_2, BPF_REG_1,
+ offsetof(struct sk_msg_md, family) + 1),
+ BPF_EXIT_INSN(),
+ },
+ .errstr = "invalid bpf_context access",
+ .result = REJECT,
+ .prog_type = BPF_PROG_TYPE_SK_MSG,
+ },
+ {
"direct packet read for SK_MSG",
.insns = {
BPF_LDX_MEM(BPF_DW, BPF_REG_2, BPF_REG_1,
@@ -5594,6 +5798,257 @@
.prog_type = BPF_PROG_TYPE_TRACEPOINT,
},
{
+ "map lookup helper access to map",
+ .insns = {
+ BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
+ BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
+ BPF_ST_MEM(BPF_DW, BPF_REG_2, 0, 0),
+ BPF_LD_MAP_FD(BPF_REG_1, 0),
+ BPF_EMIT_CALL(BPF_FUNC_map_lookup_elem),
+ BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 4),
+ BPF_MOV64_REG(BPF_REG_2, BPF_REG_0),
+ BPF_LD_MAP_FD(BPF_REG_1, 0),
+ BPF_EMIT_CALL(BPF_FUNC_map_lookup_elem),
+ BPF_EXIT_INSN(),
+ },
+ .fixup_map3 = { 3, 8 },
+ .result = ACCEPT,
+ .prog_type = BPF_PROG_TYPE_TRACEPOINT,
+ },
+ {
+ "map update helper access to map",
+ .insns = {
+ BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
+ BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
+ BPF_ST_MEM(BPF_DW, BPF_REG_2, 0, 0),
+ BPF_LD_MAP_FD(BPF_REG_1, 0),
+ BPF_EMIT_CALL(BPF_FUNC_map_lookup_elem),
+ BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 6),
+ BPF_MOV64_IMM(BPF_REG_4, 0),
+ BPF_MOV64_REG(BPF_REG_3, BPF_REG_0),
+ BPF_MOV64_REG(BPF_REG_2, BPF_REG_0),
+ BPF_LD_MAP_FD(BPF_REG_1, 0),
+ BPF_EMIT_CALL(BPF_FUNC_map_update_elem),
+ BPF_EXIT_INSN(),
+ },
+ .fixup_map3 = { 3, 10 },
+ .result = ACCEPT,
+ .prog_type = BPF_PROG_TYPE_TRACEPOINT,
+ },
+ {
+ "map update helper access to map: wrong size",
+ .insns = {
+ BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
+ BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
+ BPF_ST_MEM(BPF_DW, BPF_REG_2, 0, 0),
+ BPF_LD_MAP_FD(BPF_REG_1, 0),
+ BPF_EMIT_CALL(BPF_FUNC_map_lookup_elem),
+ BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 6),
+ BPF_MOV64_IMM(BPF_REG_4, 0),
+ BPF_MOV64_REG(BPF_REG_3, BPF_REG_0),
+ BPF_MOV64_REG(BPF_REG_2, BPF_REG_0),
+ BPF_LD_MAP_FD(BPF_REG_1, 0),
+ BPF_EMIT_CALL(BPF_FUNC_map_update_elem),
+ BPF_EXIT_INSN(),
+ },
+ .fixup_map1 = { 3 },
+ .fixup_map3 = { 10 },
+ .result = REJECT,
+ .errstr = "invalid access to map value, value_size=8 off=0 size=16",
+ .prog_type = BPF_PROG_TYPE_TRACEPOINT,
+ },
+ {
+ "map helper access to adjusted map (via const imm)",
+ .insns = {
+ BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
+ BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
+ BPF_ST_MEM(BPF_DW, BPF_REG_2, 0, 0),
+ BPF_LD_MAP_FD(BPF_REG_1, 0),
+ BPF_EMIT_CALL(BPF_FUNC_map_lookup_elem),
+ BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 5),
+ BPF_MOV64_REG(BPF_REG_2, BPF_REG_0),
+ BPF_ALU64_IMM(BPF_ADD, BPF_REG_2,
+ offsetof(struct other_val, bar)),
+ BPF_LD_MAP_FD(BPF_REG_1, 0),
+ BPF_EMIT_CALL(BPF_FUNC_map_lookup_elem),
+ BPF_EXIT_INSN(),
+ },
+ .fixup_map3 = { 3, 9 },
+ .result = ACCEPT,
+ .prog_type = BPF_PROG_TYPE_TRACEPOINT,
+ },
+ {
+ "map helper access to adjusted map (via const imm): out-of-bound 1",
+ .insns = {
+ BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
+ BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
+ BPF_ST_MEM(BPF_DW, BPF_REG_2, 0, 0),
+ BPF_LD_MAP_FD(BPF_REG_1, 0),
+ BPF_EMIT_CALL(BPF_FUNC_map_lookup_elem),
+ BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 5),
+ BPF_MOV64_REG(BPF_REG_2, BPF_REG_0),
+ BPF_ALU64_IMM(BPF_ADD, BPF_REG_2,
+ sizeof(struct other_val) - 4),
+ BPF_LD_MAP_FD(BPF_REG_1, 0),
+ BPF_EMIT_CALL(BPF_FUNC_map_lookup_elem),
+ BPF_EXIT_INSN(),
+ },
+ .fixup_map3 = { 3, 9 },
+ .result = REJECT,
+ .errstr = "invalid access to map value, value_size=16 off=12 size=8",
+ .prog_type = BPF_PROG_TYPE_TRACEPOINT,
+ },
+ {
+ "map helper access to adjusted map (via const imm): out-of-bound 2",
+ .insns = {
+ BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
+ BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
+ BPF_ST_MEM(BPF_DW, BPF_REG_2, 0, 0),
+ BPF_LD_MAP_FD(BPF_REG_1, 0),
+ BPF_EMIT_CALL(BPF_FUNC_map_lookup_elem),
+ BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 5),
+ BPF_MOV64_REG(BPF_REG_2, BPF_REG_0),
+ BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -4),
+ BPF_LD_MAP_FD(BPF_REG_1, 0),
+ BPF_EMIT_CALL(BPF_FUNC_map_lookup_elem),
+ BPF_EXIT_INSN(),
+ },
+ .fixup_map3 = { 3, 9 },
+ .result = REJECT,
+ .errstr = "invalid access to map value, value_size=16 off=-4 size=8",
+ .prog_type = BPF_PROG_TYPE_TRACEPOINT,
+ },
+ {
+ "map helper access to adjusted map (via const reg)",
+ .insns = {
+ BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
+ BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
+ BPF_ST_MEM(BPF_DW, BPF_REG_2, 0, 0),
+ BPF_LD_MAP_FD(BPF_REG_1, 0),
+ BPF_EMIT_CALL(BPF_FUNC_map_lookup_elem),
+ BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 6),
+ BPF_MOV64_REG(BPF_REG_2, BPF_REG_0),
+ BPF_MOV64_IMM(BPF_REG_3,
+ offsetof(struct other_val, bar)),
+ BPF_ALU64_REG(BPF_ADD, BPF_REG_2, BPF_REG_3),
+ BPF_LD_MAP_FD(BPF_REG_1, 0),
+ BPF_EMIT_CALL(BPF_FUNC_map_lookup_elem),
+ BPF_EXIT_INSN(),
+ },
+ .fixup_map3 = { 3, 10 },
+ .result = ACCEPT,
+ .prog_type = BPF_PROG_TYPE_TRACEPOINT,
+ },
+ {
+ "map helper access to adjusted map (via const reg): out-of-bound 1",
+ .insns = {
+ BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
+ BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
+ BPF_ST_MEM(BPF_DW, BPF_REG_2, 0, 0),
+ BPF_LD_MAP_FD(BPF_REG_1, 0),
+ BPF_EMIT_CALL(BPF_FUNC_map_lookup_elem),
+ BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 6),
+ BPF_MOV64_REG(BPF_REG_2, BPF_REG_0),
+ BPF_MOV64_IMM(BPF_REG_3,
+ sizeof(struct other_val) - 4),
+ BPF_ALU64_REG(BPF_ADD, BPF_REG_2, BPF_REG_3),
+ BPF_LD_MAP_FD(BPF_REG_1, 0),
+ BPF_EMIT_CALL(BPF_FUNC_map_lookup_elem),
+ BPF_EXIT_INSN(),
+ },
+ .fixup_map3 = { 3, 10 },
+ .result = REJECT,
+ .errstr = "invalid access to map value, value_size=16 off=12 size=8",
+ .prog_type = BPF_PROG_TYPE_TRACEPOINT,
+ },
+ {
+ "map helper access to adjusted map (via const reg): out-of-bound 2",
+ .insns = {
+ BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
+ BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
+ BPF_ST_MEM(BPF_DW, BPF_REG_2, 0, 0),
+ BPF_LD_MAP_FD(BPF_REG_1, 0),
+ BPF_EMIT_CALL(BPF_FUNC_map_lookup_elem),
+ BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 6),
+ BPF_MOV64_REG(BPF_REG_2, BPF_REG_0),
+ BPF_MOV64_IMM(BPF_REG_3, -4),
+ BPF_ALU64_REG(BPF_ADD, BPF_REG_2, BPF_REG_3),
+ BPF_LD_MAP_FD(BPF_REG_1, 0),
+ BPF_EMIT_CALL(BPF_FUNC_map_lookup_elem),
+ BPF_EXIT_INSN(),
+ },
+ .fixup_map3 = { 3, 10 },
+ .result = REJECT,
+ .errstr = "invalid access to map value, value_size=16 off=-4 size=8",
+ .prog_type = BPF_PROG_TYPE_TRACEPOINT,
+ },
+ {
+ "map helper access to adjusted map (via variable)",
+ .insns = {
+ BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
+ BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
+ BPF_ST_MEM(BPF_DW, BPF_REG_2, 0, 0),
+ BPF_LD_MAP_FD(BPF_REG_1, 0),
+ BPF_EMIT_CALL(BPF_FUNC_map_lookup_elem),
+ BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 7),
+ BPF_MOV64_REG(BPF_REG_2, BPF_REG_0),
+ BPF_LDX_MEM(BPF_W, BPF_REG_3, BPF_REG_0, 0),
+ BPF_JMP_IMM(BPF_JGT, BPF_REG_3,
+ offsetof(struct other_val, bar), 4),
+ BPF_ALU64_REG(BPF_ADD, BPF_REG_2, BPF_REG_3),
+ BPF_LD_MAP_FD(BPF_REG_1, 0),
+ BPF_EMIT_CALL(BPF_FUNC_map_lookup_elem),
+ BPF_EXIT_INSN(),
+ },
+ .fixup_map3 = { 3, 11 },
+ .result = ACCEPT,
+ .prog_type = BPF_PROG_TYPE_TRACEPOINT,
+ },
+ {
+ "map helper access to adjusted map (via variable): no max check",
+ .insns = {
+ BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
+ BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
+ BPF_ST_MEM(BPF_DW, BPF_REG_2, 0, 0),
+ BPF_LD_MAP_FD(BPF_REG_1, 0),
+ BPF_EMIT_CALL(BPF_FUNC_map_lookup_elem),
+ BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 6),
+ BPF_MOV64_REG(BPF_REG_2, BPF_REG_0),
+ BPF_LDX_MEM(BPF_W, BPF_REG_3, BPF_REG_0, 0),
+ BPF_ALU64_REG(BPF_ADD, BPF_REG_2, BPF_REG_3),
+ BPF_LD_MAP_FD(BPF_REG_1, 0),
+ BPF_EMIT_CALL(BPF_FUNC_map_lookup_elem),
+ BPF_EXIT_INSN(),
+ },
+ .fixup_map3 = { 3, 10 },
+ .result = REJECT,
+ .errstr = "R2 unbounded memory access, make sure to bounds check any array access into a map",
+ .prog_type = BPF_PROG_TYPE_TRACEPOINT,
+ },
+ {
+ "map helper access to adjusted map (via variable): wrong max check",
+ .insns = {
+ BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
+ BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
+ BPF_ST_MEM(BPF_DW, BPF_REG_2, 0, 0),
+ BPF_LD_MAP_FD(BPF_REG_1, 0),
+ BPF_EMIT_CALL(BPF_FUNC_map_lookup_elem),
+ BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 7),
+ BPF_MOV64_REG(BPF_REG_2, BPF_REG_0),
+ BPF_LDX_MEM(BPF_W, BPF_REG_3, BPF_REG_0, 0),
+ BPF_JMP_IMM(BPF_JGT, BPF_REG_3,
+ offsetof(struct other_val, bar) + 1, 4),
+ BPF_ALU64_REG(BPF_ADD, BPF_REG_2, BPF_REG_3),
+ BPF_LD_MAP_FD(BPF_REG_1, 0),
+ BPF_EMIT_CALL(BPF_FUNC_map_lookup_elem),
+ BPF_EXIT_INSN(),
+ },
+ .fixup_map3 = { 3, 11 },
+ .result = REJECT,
+ .errstr = "invalid access to map value, value_size=16 off=9 size=8",
+ .prog_type = BPF_PROG_TYPE_TRACEPOINT,
+ },
+ {
"map element value is preserved across register spilling",
.insns = {
BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
@@ -11423,6 +11878,278 @@
.errstr = "BPF_XADD stores into R2 packet",
.prog_type = BPF_PROG_TYPE_XDP,
},
+ {
+ "bpf_get_stack return R0 within range",
+ .insns = {
+ BPF_MOV64_REG(BPF_REG_6, BPF_REG_1),
+ BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0),
+ BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
+ BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
+ BPF_LD_MAP_FD(BPF_REG_1, 0),
+ BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0,
+ BPF_FUNC_map_lookup_elem),
+ BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 28),
+ BPF_MOV64_REG(BPF_REG_7, BPF_REG_0),
+ BPF_MOV64_IMM(BPF_REG_9, sizeof(struct test_val)),
+ BPF_MOV64_REG(BPF_REG_1, BPF_REG_6),
+ BPF_MOV64_REG(BPF_REG_2, BPF_REG_7),
+ BPF_MOV64_IMM(BPF_REG_3, sizeof(struct test_val)),
+ BPF_MOV64_IMM(BPF_REG_4, 256),
+ BPF_EMIT_CALL(BPF_FUNC_get_stack),
+ BPF_MOV64_IMM(BPF_REG_1, 0),
+ BPF_MOV64_REG(BPF_REG_8, BPF_REG_0),
+ BPF_ALU64_IMM(BPF_LSH, BPF_REG_8, 32),
+ BPF_ALU64_IMM(BPF_ARSH, BPF_REG_8, 32),
+ BPF_JMP_REG(BPF_JSLT, BPF_REG_1, BPF_REG_8, 16),
+ BPF_ALU64_REG(BPF_SUB, BPF_REG_9, BPF_REG_8),
+ BPF_MOV64_REG(BPF_REG_2, BPF_REG_7),
+ BPF_ALU64_REG(BPF_ADD, BPF_REG_2, BPF_REG_8),
+ BPF_MOV64_REG(BPF_REG_1, BPF_REG_9),
+ BPF_ALU64_IMM(BPF_LSH, BPF_REG_1, 32),
+ BPF_ALU64_IMM(BPF_ARSH, BPF_REG_1, 32),
+ BPF_MOV64_REG(BPF_REG_3, BPF_REG_2),
+ BPF_ALU64_REG(BPF_ADD, BPF_REG_3, BPF_REG_1),
+ BPF_MOV64_REG(BPF_REG_1, BPF_REG_7),
+ BPF_MOV64_IMM(BPF_REG_5, sizeof(struct test_val)),
+ BPF_ALU64_REG(BPF_ADD, BPF_REG_1, BPF_REG_5),
+ BPF_JMP_REG(BPF_JGE, BPF_REG_3, BPF_REG_1, 4),
+ BPF_MOV64_REG(BPF_REG_1, BPF_REG_6),
+ BPF_MOV64_REG(BPF_REG_3, BPF_REG_9),
+ BPF_MOV64_IMM(BPF_REG_4, 0),
+ BPF_EMIT_CALL(BPF_FUNC_get_stack),
+ BPF_EXIT_INSN(),
+ },
+ .fixup_map2 = { 4 },
+ .result = ACCEPT,
+ .prog_type = BPF_PROG_TYPE_TRACEPOINT,
+ },
+ {
+ "ld_abs: invalid op 1",
+ .insns = {
+ BPF_MOV64_REG(BPF_REG_6, BPF_REG_1),
+ BPF_LD_ABS(BPF_DW, 0),
+ BPF_EXIT_INSN(),
+ },
+ .prog_type = BPF_PROG_TYPE_SCHED_CLS,
+ .result = REJECT,
+ .errstr = "unknown opcode",
+ },
+ {
+ "ld_abs: invalid op 2",
+ .insns = {
+ BPF_MOV32_IMM(BPF_REG_0, 256),
+ BPF_MOV64_REG(BPF_REG_6, BPF_REG_1),
+ BPF_LD_IND(BPF_DW, BPF_REG_0, 0),
+ BPF_EXIT_INSN(),
+ },
+ .prog_type = BPF_PROG_TYPE_SCHED_CLS,
+ .result = REJECT,
+ .errstr = "unknown opcode",
+ },
+ {
+ "ld_abs: nmap reduced",
+ .insns = {
+ BPF_MOV64_REG(BPF_REG_6, BPF_REG_1),
+ BPF_LD_ABS(BPF_H, 12),
+ BPF_JMP_IMM(BPF_JNE, BPF_REG_0, 0x806, 28),
+ BPF_LD_ABS(BPF_H, 12),
+ BPF_JMP_IMM(BPF_JNE, BPF_REG_0, 0x806, 26),
+ BPF_MOV32_IMM(BPF_REG_0, 18),
+ BPF_STX_MEM(BPF_W, BPF_REG_10, BPF_REG_0, -64),
+ BPF_LDX_MEM(BPF_W, BPF_REG_7, BPF_REG_10, -64),
+ BPF_LD_IND(BPF_W, BPF_REG_7, 14),
+ BPF_STX_MEM(BPF_W, BPF_REG_10, BPF_REG_0, -60),
+ BPF_MOV32_IMM(BPF_REG_0, 280971478),
+ BPF_STX_MEM(BPF_W, BPF_REG_10, BPF_REG_0, -56),
+ BPF_LDX_MEM(BPF_W, BPF_REG_7, BPF_REG_10, -56),
+ BPF_LDX_MEM(BPF_W, BPF_REG_0, BPF_REG_10, -60),
+ BPF_ALU32_REG(BPF_SUB, BPF_REG_0, BPF_REG_7),
+ BPF_JMP_IMM(BPF_JNE, BPF_REG_0, 0, 15),
+ BPF_LD_ABS(BPF_H, 12),
+ BPF_JMP_IMM(BPF_JNE, BPF_REG_0, 0x806, 13),
+ BPF_MOV32_IMM(BPF_REG_0, 22),
+ BPF_STX_MEM(BPF_W, BPF_REG_10, BPF_REG_0, -56),
+ BPF_LDX_MEM(BPF_W, BPF_REG_7, BPF_REG_10, -56),
+ BPF_LD_IND(BPF_H, BPF_REG_7, 14),
+ BPF_STX_MEM(BPF_W, BPF_REG_10, BPF_REG_0, -52),
+ BPF_MOV32_IMM(BPF_REG_0, 17366),
+ BPF_STX_MEM(BPF_W, BPF_REG_10, BPF_REG_0, -48),
+ BPF_LDX_MEM(BPF_W, BPF_REG_7, BPF_REG_10, -48),
+ BPF_LDX_MEM(BPF_W, BPF_REG_0, BPF_REG_10, -52),
+ BPF_ALU32_REG(BPF_SUB, BPF_REG_0, BPF_REG_7),
+ BPF_JMP_IMM(BPF_JNE, BPF_REG_0, 0, 2),
+ BPF_MOV32_IMM(BPF_REG_0, 256),
+ BPF_EXIT_INSN(),
+ BPF_MOV32_IMM(BPF_REG_0, 0),
+ BPF_EXIT_INSN(),
+ },
+ .data = {
+ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0x08, 0x06, 0,
+ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
+ 0x10, 0xbf, 0x48, 0xd6, 0x43, 0xd6,
+ },
+ .prog_type = BPF_PROG_TYPE_SCHED_CLS,
+ .result = ACCEPT,
+ .retval = 256,
+ },
+ {
+ "ld_abs: div + abs, test 1",
+ .insns = {
+ BPF_ALU64_REG(BPF_MOV, BPF_REG_6, BPF_REG_1),
+ BPF_LD_ABS(BPF_B, 3),
+ BPF_ALU64_IMM(BPF_MOV, BPF_REG_2, 2),
+ BPF_ALU32_REG(BPF_DIV, BPF_REG_0, BPF_REG_2),
+ BPF_ALU64_REG(BPF_MOV, BPF_REG_8, BPF_REG_0),
+ BPF_LD_ABS(BPF_B, 4),
+ BPF_ALU64_REG(BPF_ADD, BPF_REG_8, BPF_REG_0),
+ BPF_LD_IND(BPF_B, BPF_REG_8, -70),
+ BPF_EXIT_INSN(),
+ },
+ .data = {
+ 10, 20, 30, 40, 50,
+ },
+ .prog_type = BPF_PROG_TYPE_SCHED_CLS,
+ .result = ACCEPT,
+ .retval = 10,
+ },
+ {
+ "ld_abs: div + abs, test 2",
+ .insns = {
+ BPF_ALU64_REG(BPF_MOV, BPF_REG_6, BPF_REG_1),
+ BPF_LD_ABS(BPF_B, 3),
+ BPF_ALU64_IMM(BPF_MOV, BPF_REG_2, 2),
+ BPF_ALU32_REG(BPF_DIV, BPF_REG_0, BPF_REG_2),
+ BPF_ALU64_REG(BPF_MOV, BPF_REG_8, BPF_REG_0),
+ BPF_LD_ABS(BPF_B, 128),
+ BPF_ALU64_REG(BPF_ADD, BPF_REG_8, BPF_REG_0),
+ BPF_LD_IND(BPF_B, BPF_REG_8, -70),
+ BPF_EXIT_INSN(),
+ },
+ .data = {
+ 10, 20, 30, 40, 50,
+ },
+ .prog_type = BPF_PROG_TYPE_SCHED_CLS,
+ .result = ACCEPT,
+ .retval = 0,
+ },
+ {
+ "ld_abs: div + abs, test 3",
+ .insns = {
+ BPF_ALU64_REG(BPF_MOV, BPF_REG_6, BPF_REG_1),
+ BPF_ALU64_IMM(BPF_MOV, BPF_REG_7, 0),
+ BPF_LD_ABS(BPF_B, 3),
+ BPF_ALU32_REG(BPF_DIV, BPF_REG_0, BPF_REG_7),
+ BPF_EXIT_INSN(),
+ },
+ .data = {
+ 10, 20, 30, 40, 50,
+ },
+ .prog_type = BPF_PROG_TYPE_SCHED_CLS,
+ .result = ACCEPT,
+ .retval = 0,
+ },
+ {
+ "ld_abs: div + abs, test 4",
+ .insns = {
+ BPF_ALU64_REG(BPF_MOV, BPF_REG_6, BPF_REG_1),
+ BPF_ALU64_IMM(BPF_MOV, BPF_REG_7, 0),
+ BPF_LD_ABS(BPF_B, 256),
+ BPF_ALU32_REG(BPF_DIV, BPF_REG_0, BPF_REG_7),
+ BPF_EXIT_INSN(),
+ },
+ .data = {
+ 10, 20, 30, 40, 50,
+ },
+ .prog_type = BPF_PROG_TYPE_SCHED_CLS,
+ .result = ACCEPT,
+ .retval = 0,
+ },
+ {
+ "ld_abs: vlan + abs, test 1",
+ .insns = { },
+ .data = {
+ 0x34,
+ },
+ .fill_helper = bpf_fill_ld_abs_vlan_push_pop,
+ .prog_type = BPF_PROG_TYPE_SCHED_CLS,
+ .result = ACCEPT,
+ .retval = 0xbef,
+ },
+ {
+ "ld_abs: vlan + abs, test 2",
+ .insns = {
+ BPF_MOV64_REG(BPF_REG_6, BPF_REG_1),
+ BPF_LD_ABS(BPF_B, 0),
+ BPF_LD_ABS(BPF_H, 0),
+ BPF_LD_ABS(BPF_W, 0),
+ BPF_MOV64_REG(BPF_REG_7, BPF_REG_6),
+ BPF_MOV64_IMM(BPF_REG_6, 0),
+ BPF_MOV64_REG(BPF_REG_1, BPF_REG_7),
+ BPF_MOV64_IMM(BPF_REG_2, 1),
+ BPF_MOV64_IMM(BPF_REG_3, 2),
+ BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0,
+ BPF_FUNC_skb_vlan_push),
+ BPF_MOV64_REG(BPF_REG_6, BPF_REG_7),
+ BPF_LD_ABS(BPF_B, 0),
+ BPF_LD_ABS(BPF_H, 0),
+ BPF_LD_ABS(BPF_W, 0),
+ BPF_MOV64_IMM(BPF_REG_0, 42),
+ BPF_EXIT_INSN(),
+ },
+ .data = {
+ 0x34,
+ },
+ .prog_type = BPF_PROG_TYPE_SCHED_CLS,
+ .result = ACCEPT,
+ .retval = 42,
+ },
+ {
+ "ld_abs: jump around ld_abs",
+ .insns = { },
+ .data = {
+ 10, 11,
+ },
+ .fill_helper = bpf_fill_jump_around_ld_abs,
+ .prog_type = BPF_PROG_TYPE_SCHED_CLS,
+ .result = ACCEPT,
+ .retval = 10,
+ },
+ {
+ "ld_dw: xor semi-random 64 bit imms, test 1",
+ .insns = { },
+ .data = { },
+ .fill_helper = bpf_fill_rand_ld_dw,
+ .prog_type = BPF_PROG_TYPE_SCHED_CLS,
+ .result = ACCEPT,
+ .retval = 4090,
+ },
+ {
+ "ld_dw: xor semi-random 64 bit imms, test 2",
+ .insns = { },
+ .data = { },
+ .fill_helper = bpf_fill_rand_ld_dw,
+ .prog_type = BPF_PROG_TYPE_SCHED_CLS,
+ .result = ACCEPT,
+ .retval = 2047,
+ },
+ {
+ "ld_dw: xor semi-random 64 bit imms, test 3",
+ .insns = { },
+ .data = { },
+ .fill_helper = bpf_fill_rand_ld_dw,
+ .prog_type = BPF_PROG_TYPE_SCHED_CLS,
+ .result = ACCEPT,
+ .retval = 511,
+ },
+ {
+ "ld_dw: xor semi-random 64 bit imms, test 4",
+ .insns = { },
+ .data = { },
+ .fill_helper = bpf_fill_rand_ld_dw,
+ .prog_type = BPF_PROG_TYPE_SCHED_CLS,
+ .result = ACCEPT,
+ .retval = 5,
+ },
};
static int probe_filter_length(const struct bpf_insn *fp)
@@ -11526,16 +12253,20 @@
return outer_map_fd;
}
-static char bpf_vlog[32768];
+static char bpf_vlog[UINT_MAX >> 8];
static void do_test_fixup(struct bpf_test *test, struct bpf_insn *prog,
int *map_fds)
{
int *fixup_map1 = test->fixup_map1;
int *fixup_map2 = test->fixup_map2;
+ int *fixup_map3 = test->fixup_map3;
int *fixup_prog = test->fixup_prog;
int *fixup_map_in_map = test->fixup_map_in_map;
+ if (test->fill_helper)
+ test->fill_helper(test);
+
/* Allocating HTs with 1 elem is fine here, since we only test
* for verifier and not do a runtime lookup, so the only thing
* that really matters is value size in this case.
@@ -11556,6 +12287,14 @@
} while (*fixup_map2);
}
+ if (*fixup_map3) {
+ map_fds[1] = create_map(sizeof(struct other_val), 1);
+ do {
+ prog[*fixup_map3].imm = map_fds[1];
+ fixup_map3++;
+ } while (*fixup_map3);
+ }
+
if (*fixup_prog) {
map_fds[2] = create_prog_array();
do {
@@ -11577,10 +12316,8 @@
int *passes, int *errors)
{
int fd_prog, expected_ret, reject_from_alignment;
+ int prog_len, prog_type = test->prog_type;
struct bpf_insn *prog = test->insns;
- int prog_len = probe_filter_length(prog);
- char data_in[TEST_DATA_LEN] = {};
- int prog_type = test->prog_type;
int map_fds[MAX_NR_MAPS];
const char *expected_err;
uint32_t retval;
@@ -11590,6 +12327,7 @@
map_fds[i] = -1;
do_test_fixup(test, prog, map_fds);
+ prog_len = probe_filter_length(prog);
fd_prog = bpf_verify_program(prog_type ? : BPF_PROG_TYPE_SOCKET_FILTER,
prog, prog_len, test->flags & F_LOAD_WITH_STRICT_ALIGNMENT,
@@ -11629,8 +12367,9 @@
}
if (fd_prog >= 0) {
- err = bpf_prog_test_run(fd_prog, 1, data_in, sizeof(data_in),
- NULL, NULL, &retval, NULL);
+ err = bpf_prog_test_run(fd_prog, 1, test->data,
+ sizeof(test->data), NULL, NULL,
+ &retval, NULL);
if (err && errno != 524/*ENOTSUPP*/ && errno != EPERM) {
printf("Unexpected bpf_prog_test_run error\n");
goto fail_log;
@@ -11788,5 +12527,6 @@
return EXIT_FAILURE;
}
+ bpf_semi_rand_init();
return do_test(unpriv, from, to);
}
diff --git a/tools/testing/selftests/bpf/trace_helpers.c b/tools/testing/selftests/bpf/trace_helpers.c
new file mode 100644
index 0000000..3868dcb
--- /dev/null
+++ b/tools/testing/selftests/bpf/trace_helpers.c
@@ -0,0 +1,165 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <assert.h>
+#include <errno.h>
+#include <poll.h>
+#include <unistd.h>
+#include <linux/perf_event.h>
+#include <sys/mman.h>
+#include "trace_helpers.h"
+
+#define MAX_SYMS 300000
+static struct ksym syms[MAX_SYMS];
+static int sym_cnt;
+
+static int ksym_cmp(const void *p1, const void *p2)
+{
+ return ((struct ksym *)p1)->addr - ((struct ksym *)p2)->addr;
+}
+
+int load_kallsyms(void)
+{
+ FILE *f = fopen("/proc/kallsyms", "r");
+ char func[256], buf[256];
+ char symbol;
+ void *addr;
+ int i = 0;
+
+ if (!f)
+ return -ENOENT;
+
+ while (!feof(f)) {
+ if (!fgets(buf, sizeof(buf), f))
+ break;
+ if (sscanf(buf, "%p %c %s", &addr, &symbol, func) != 3)
+ break;
+ if (!addr)
+ continue;
+ syms[i].addr = (long) addr;
+ syms[i].name = strdup(func);
+ i++;
+ }
+ sym_cnt = i;
+ qsort(syms, sym_cnt, sizeof(struct ksym), ksym_cmp);
+ return 0;
+}
+
+struct ksym *ksym_search(long key)
+{
+ int start = 0, end = sym_cnt;
+ int result;
+
+ while (start < end) {
+ size_t mid = start + (end - start) / 2;
+
+ result = key - syms[mid].addr;
+ if (result < 0)
+ end = mid;
+ else if (result > 0)
+ start = mid + 1;
+ else
+ return &syms[mid];
+ }
+
+ if (start >= 1 && syms[start - 1].addr < key &&
+ key < syms[start].addr)
+ /* valid ksym */
+ return &syms[start - 1];
+
+ /* out of range. return _stext */
+ return &syms[0];
+}
+
+long ksym_get_addr(const char *name)
+{
+ int i;
+
+ for (i = 0; i < sym_cnt; i++) {
+ if (strcmp(syms[i].name, name) == 0)
+ return syms[i].addr;
+ }
+
+ return 0;
+}
+
+static int page_size;
+static int page_cnt = 8;
+static struct perf_event_mmap_page *header;
+
+int perf_event_mmap(int fd)
+{
+ void *base;
+ int mmap_size;
+
+ page_size = getpagesize();
+ mmap_size = page_size * (page_cnt + 1);
+
+ base = mmap(NULL, mmap_size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
+ if (base == MAP_FAILED) {
+ printf("mmap err\n");
+ return -1;
+ }
+
+ header = base;
+ return 0;
+}
+
+static int perf_event_poll(int fd)
+{
+ struct pollfd pfd = { .fd = fd, .events = POLLIN };
+
+ return poll(&pfd, 1, 1000);
+}
+
+struct perf_event_sample {
+ struct perf_event_header header;
+ __u32 size;
+ char data[];
+};
+
+static enum bpf_perf_event_ret bpf_perf_event_print(void *event, void *priv)
+{
+ struct perf_event_sample *e = event;
+ perf_event_print_fn fn = priv;
+ int ret;
+
+ if (e->header.type == PERF_RECORD_SAMPLE) {
+ ret = fn(e->data, e->size);
+ if (ret != LIBBPF_PERF_EVENT_CONT)
+ return ret;
+ } else if (e->header.type == PERF_RECORD_LOST) {
+ struct {
+ struct perf_event_header header;
+ __u64 id;
+ __u64 lost;
+ } *lost = (void *) e;
+ printf("lost %lld events\n", lost->lost);
+ } else {
+ printf("unknown event type=%d size=%d\n",
+ e->header.type, e->header.size);
+ }
+
+ return LIBBPF_PERF_EVENT_CONT;
+}
+
+int perf_event_poller(int fd, perf_event_print_fn output_fn)
+{
+ enum bpf_perf_event_ret ret;
+ void *buf = NULL;
+ size_t len = 0;
+
+ for (;;) {
+ perf_event_poll(fd);
+ ret = bpf_perf_event_read_simple(header, page_cnt * page_size,
+ page_size, &buf, &len,
+ bpf_perf_event_print,
+ output_fn);
+ if (ret != LIBBPF_PERF_EVENT_CONT)
+ break;
+ }
+ free(buf);
+
+ return ret;
+}
diff --git a/tools/testing/selftests/bpf/trace_helpers.h b/tools/testing/selftests/bpf/trace_helpers.h
new file mode 100644
index 0000000..3b4bcf7
--- /dev/null
+++ b/tools/testing/selftests/bpf/trace_helpers.h
@@ -0,0 +1,21 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef __TRACE_HELPER_H
+#define __TRACE_HELPER_H
+
+#include <libbpf.h>
+
+struct ksym {
+ long addr;
+ char *name;
+};
+
+int load_kallsyms(void);
+struct ksym *ksym_search(long key);
+long ksym_get_addr(const char *name);
+
+typedef enum bpf_perf_event_ret (*perf_event_print_fn)(void *data, int size);
+
+int perf_event_mmap(int fd);
+/* return LIBBPF_PERF_EVENT_DONE or LIBBPF_PERF_EVENT_ERROR */
+int perf_event_poller(int fd, perf_event_print_fn output_fn);
+#endif
diff --git a/tools/testing/selftests/bpf/urandom_read.c b/tools/testing/selftests/bpf/urandom_read.c
index 4acfdeb..9de8b7c 100644
--- a/tools/testing/selftests/bpf/urandom_read.c
+++ b/tools/testing/selftests/bpf/urandom_read.c
@@ -6,15 +6,21 @@
#include <stdlib.h>
#define BUF_SIZE 256
-int main(void)
+
+int main(int argc, char *argv[])
{
int fd = open("/dev/urandom", O_RDONLY);
int i;
char buf[BUF_SIZE];
+ int count = 4;
if (fd < 0)
return 1;
- for (i = 0; i < 4; ++i)
+
+ if (argc == 2)
+ count = atoi(argv[1]);
+
+ for (i = 0; i < count; ++i)
read(fd, buf, BUF_SIZE);
close(fd);
diff --git a/tools/testing/selftests/net/.gitignore b/tools/testing/selftests/net/.gitignore
index c612d6e..f0e6c35 100644
--- a/tools/testing/selftests/net/.gitignore
+++ b/tools/testing/selftests/net/.gitignore
@@ -7,3 +7,7 @@
reuseport_bpf_numa
reuseport_dualstack
reuseaddr_conflict
+tcp_mmap
+udpgso
+udpgso_bench_rx
+udpgso_bench_tx
diff --git a/tools/testing/selftests/net/Makefile b/tools/testing/selftests/net/Makefile
index 3ff81a4..7cb0f49 100644
--- a/tools/testing/selftests/net/Makefile
+++ b/tools/testing/selftests/net/Makefile
@@ -5,13 +5,18 @@
CFLAGS += -I../../../../usr/include/
TEST_PROGS := run_netsocktests run_afpackettests test_bpf.sh netdevice.sh rtnetlink.sh
-TEST_PROGS += fib_tests.sh fib-onlink-tests.sh pmtu.sh
+TEST_PROGS += fib_tests.sh fib-onlink-tests.sh pmtu.sh udpgso.sh
+TEST_PROGS += udpgso_bench.sh fib_rule_tests.sh
TEST_PROGS_EXTENDED := in_netns.sh
TEST_GEN_FILES = socket
TEST_GEN_FILES += psock_fanout psock_tpacket msg_zerocopy
+TEST_GEN_FILES += tcp_mmap tcp_inq
+TEST_GEN_FILES += udpgso udpgso_bench_tx udpgso_bench_rx
TEST_GEN_PROGS = reuseport_bpf reuseport_bpf_cpu reuseport_bpf_numa
TEST_GEN_PROGS += reuseport_dualstack reuseaddr_conflict
include ../lib.mk
$(OUTPUT)/reuseport_bpf_numa: LDFLAGS += -lnuma
+$(OUTPUT)/tcp_mmap: LDFLAGS += -lpthread
+$(OUTPUT)/tcp_inq: LDFLAGS += -lpthread
diff --git a/tools/testing/selftests/net/fib_rule_tests.sh b/tools/testing/selftests/net/fib_rule_tests.sh
new file mode 100755
index 0000000..d4cfb6a
--- /dev/null
+++ b/tools/testing/selftests/net/fib_rule_tests.sh
@@ -0,0 +1,248 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+
+# This test is for checking IPv4 and IPv6 FIB rules API
+
+ret=0
+
+PAUSE_ON_FAIL=${PAUSE_ON_FAIL:=no}
+IP="ip -netns testns"
+
+RTABLE=100
+GW_IP4=192.51.100.2
+SRC_IP=192.51.100.3
+GW_IP6=2001:db8:1::2
+SRC_IP6=2001:db8:1::3
+
+DEV_ADDR=192.51.100.1
+DEV=dummy0
+
+log_test()
+{
+ local rc=$1
+ local expected=$2
+ local msg="$3"
+
+ if [ ${rc} -eq ${expected} ]; then
+ nsuccess=$((nsuccess+1))
+ printf "\n TEST: %-50s [ OK ]\n" "${msg}"
+ else
+ nfail=$((nfail+1))
+ printf "\n TEST: %-50s [FAIL]\n" "${msg}"
+ if [ "${PAUSE_ON_FAIL}" = "yes" ]; then
+ echo
+ echo "hit enter to continue, 'q' to quit"
+ read a
+ [ "$a" = "q" ] && exit 1
+ fi
+ fi
+}
+
+log_section()
+{
+ echo
+ echo "######################################################################"
+ echo "TEST SECTION: $*"
+ echo "######################################################################"
+}
+
+setup()
+{
+ set -e
+ ip netns add testns
+ $IP link set dev lo up
+
+ $IP link add dummy0 type dummy
+ $IP link set dev dummy0 up
+ $IP address add 198.51.100.1/24 dev dummy0
+ $IP -6 address add 2001:db8:1::1/64 dev dummy0
+
+ set +e
+}
+
+cleanup()
+{
+ $IP link del dev dummy0 &> /dev/null
+ ip netns del testns
+}
+
+fib_check_iproute_support()
+{
+ ip rule help 2>&1 | grep -q $1
+ if [ $? -ne 0 ]; then
+ echo "SKIP: iproute2 iprule too old, missing $1 match"
+ return 1
+ fi
+
+ ip route get help 2>&1 | grep -q $2
+ if [ $? -ne 0 ]; then
+ echo "SKIP: iproute2 get route too old, missing $2 match"
+ return 1
+ fi
+
+ return 0
+}
+
+fib_rule6_del()
+{
+ $IP -6 rule del $1
+ log_test $? 0 "rule6 del $1"
+}
+
+fib_rule6_del_by_pref()
+{
+ pref=$($IP -6 rule show | grep "$1 lookup $TABLE" | cut -d ":" -f 1)
+ $IP -6 rule del pref $pref
+}
+
+fib_rule6_test_match_n_redirect()
+{
+ local match="$1"
+ local getmatch="$2"
+
+ $IP -6 rule add $match table $RTABLE
+ $IP -6 route get $GW_IP6 $getmatch | grep -q "table $RTABLE"
+ log_test $? 0 "rule6 check: $1"
+
+ fib_rule6_del_by_pref "$match"
+ log_test $? 0 "rule6 del by pref: $match"
+}
+
+fib_rule6_test()
+{
+ # setup the fib rule redirect route
+ $IP -6 route add table $RTABLE default via $GW_IP6 dev $DEV onlink
+
+ match="oif $DEV"
+ fib_rule6_test_match_n_redirect "$match" "$match" "oif redirect to table"
+
+ match="from $SRC_IP6 iif $DEV"
+ fib_rule6_test_match_n_redirect "$match" "$match" "iif redirect to table"
+
+ match="tos 0x10"
+ fib_rule6_test_match_n_redirect "$match" "$match" "tos redirect to table"
+
+ match="fwmark 0x64"
+ getmatch="mark 0x64"
+ fib_rule6_test_match_n_redirect "$match" "$getmatch" "fwmark redirect to table"
+
+ fib_check_iproute_support "uidrange" "uid"
+ if [ $? -eq 0 ]; then
+ match="uidrange 100-100"
+ getmatch="uid 100"
+ fib_rule6_test_match_n_redirect "$match" "$getmatch" "uid redirect to table"
+ fi
+
+ fib_check_iproute_support "sport" "sport"
+ if [ $? -eq 0 ]; then
+ match="sport 666 dport 777"
+ fib_rule6_test_match_n_redirect "$match" "$match" "sport and dport redirect to table"
+ fi
+
+ fib_check_iproute_support "ipproto" "ipproto"
+ if [ $? -eq 0 ]; then
+ match="ipproto tcp"
+ fib_rule6_test_match_n_redirect "$match" "$match" "ipproto match"
+ fi
+
+ fib_check_iproute_support "ipproto" "ipproto"
+ if [ $? -eq 0 ]; then
+ match="ipproto icmp"
+ fib_rule6_test_match_n_redirect "$match" "$match" "ipproto icmp match"
+ fi
+}
+
+fib_rule4_del()
+{
+ $IP rule del $1
+ log_test $? 0 "del $1"
+}
+
+fib_rule4_del_by_pref()
+{
+ pref=$($IP rule show | grep "$1 lookup $TABLE" | cut -d ":" -f 1)
+ $IP rule del pref $pref
+}
+
+fib_rule4_test_match_n_redirect()
+{
+ local match="$1"
+ local getmatch="$2"
+
+ $IP rule add $match table $RTABLE
+ $IP route get $GW_IP4 $getmatch | grep -q "table $RTABLE"
+ log_test $? 0 "rule4 check: $1"
+
+ fib_rule4_del_by_pref "$match"
+ log_test $? 0 "rule4 del by pref: $match"
+}
+
+fib_rule4_test()
+{
+ # setup the fib rule redirect route
+ $IP route add table $RTABLE default via $GW_IP4 dev $DEV onlink
+
+ match="oif $DEV"
+ fib_rule4_test_match_n_redirect "$match" "$match" "oif redirect to table"
+
+ match="from $SRC_IP iif $DEV"
+ fib_rule4_test_match_n_redirect "$match" "$match" "iif redirect to table"
+
+ match="tos 0x10"
+ fib_rule4_test_match_n_redirect "$match" "$match" "tos redirect to table"
+
+ match="fwmark 0x64"
+ getmatch="mark 0x64"
+ fib_rule4_test_match_n_redirect "$match" "$getmatch" "fwmark redirect to table"
+
+ fib_check_iproute_support "uidrange" "uid"
+ if [ $? -eq 0 ]; then
+ match="uidrange 100-100"
+ getmatch="uid 100"
+ fib_rule4_test_match_n_redirect "$match" "$getmatch" "uid redirect to table"
+ fi
+
+ fib_check_iproute_support "sport" "sport"
+ if [ $? -eq 0 ]; then
+ match="sport 666 dport 777"
+ fib_rule4_test_match_n_redirect "$match" "$match" "sport and dport redirect to table"
+ fi
+
+ fib_check_iproute_support "ipproto" "ipproto"
+ if [ $? -eq 0 ]; then
+ match="ipproto tcp"
+ fib_rule4_test_match_n_redirect "$match" "$match" "ipproto tcp match"
+ fi
+
+ fib_check_iproute_support "ipproto" "ipproto"
+ if [ $? -eq 0 ]; then
+ match="ipproto icmp"
+ fib_rule4_test_match_n_redirect "$match" "$match" "ipproto icmp match"
+ fi
+}
+
+run_fibrule_tests()
+{
+ log_section "IPv4 fib rule"
+ fib_rule4_test
+ log_section "IPv6 fib rule"
+ fib_rule6_test
+}
+
+if [ "$(id -u)" -ne 0 ];then
+ echo "SKIP: Need root privileges"
+ exit 0
+fi
+
+if [ ! -x "$(command -v ip)" ]; then
+ echo "SKIP: Could not run test without ip tool"
+ exit 0
+fi
+
+# start clean
+cleanup &> /dev/null
+setup
+run_fibrule_tests
+cleanup
+
+exit $ret
diff --git a/tools/testing/selftests/net/fib_tests.sh b/tools/testing/selftests/net/fib_tests.sh
index 9164e60..e7d76fb 100755
--- a/tools/testing/selftests/net/fib_tests.sh
+++ b/tools/testing/selftests/net/fib_tests.sh
@@ -6,8 +6,10 @@
ret=0
-VERBOSE=${VERBOSE:=0}
-PAUSE_ON_FAIL=${PAUSE_ON_FAIL:=no}
+TESTS="unregister down carrier nexthop ipv6_rt ipv4_rt"
+VERBOSE=0
+PAUSE_ON_FAIL=no
+PAUSE=no
IP="ip -netns testns"
log_test()
@@ -18,8 +20,10 @@
if [ ${rc} -eq ${expected} ]; then
printf " TEST: %-60s [ OK ]\n" "${msg}"
+ nsuccess=$((nsuccess+1))
else
ret=1
+ nfail=$((nfail+1))
printf " TEST: %-60s [FAIL]\n" "${msg}"
if [ "${PAUSE_ON_FAIL}" = "yes" ]; then
echo
@@ -28,6 +32,13 @@
[ "$a" = "q" ] && exit 1
fi
fi
+
+ if [ "${PAUSE}" = "yes" ]; then
+ echo
+ echo "hit enter to continue, 'q' to quit"
+ read a
+ [ "$a" = "q" ] && exit 1
+ fi
}
setup()
@@ -563,20 +574,649 @@
}
################################################################################
-#
+# Tests on route add and replace
-fib_test()
+run_cmd()
{
- if [ -n "$TEST" ]; then
- eval $TEST
- else
- fib_unreg_test
- fib_down_test
- fib_carrier_test
- fib_nexthop_test
+ local cmd="$1"
+ local out
+ local stderr="2>/dev/null"
+
+ if [ "$VERBOSE" = "1" ]; then
+ printf " COMMAND: $cmd\n"
+ stderr=
+ fi
+
+ out=$(eval $cmd $stderr)
+ rc=$?
+ if [ "$VERBOSE" = "1" -a -n "$out" ]; then
+ echo " $out"
+ fi
+
+ [ "$VERBOSE" = "1" ] && echo
+
+ return $rc
+}
+
+# add route for a prefix, flushing any existing routes first
+# expected to be the first step of a test
+add_route6()
+{
+ local pfx="$1"
+ local nh="$2"
+ local out
+
+ if [ "$VERBOSE" = "1" ]; then
+ echo
+ echo " ##################################################"
+ echo
+ fi
+
+ run_cmd "$IP -6 ro flush ${pfx}"
+ [ $? -ne 0 ] && exit 1
+
+ out=$($IP -6 ro ls match ${pfx})
+ if [ -n "$out" ]; then
+ echo "Failed to flush routes for prefix used for tests."
+ exit 1
+ fi
+
+ run_cmd "$IP -6 ro add ${pfx} ${nh}"
+ if [ $? -ne 0 ]; then
+ echo "Failed to add initial route for test."
+ exit 1
fi
}
+# add initial route - used in replace route tests
+add_initial_route6()
+{
+ add_route6 "2001:db8:104::/64" "$1"
+}
+
+check_route6()
+{
+ local pfx="2001:db8:104::/64"
+ local expected="$1"
+ local out
+ local rc=0
+
+ out=$($IP -6 ro ls match ${pfx} | sed -e 's/ pref medium//')
+ if [ -z "${out}" ]; then
+ if [ "$VERBOSE" = "1" ]; then
+ printf "\nNo route entry found\n"
+ printf "Expected:\n"
+ printf " ${expected}\n"
+ fi
+ return 1
+ fi
+
+ # tricky way to convert output to 1-line without ip's
+ # messy '\'; this drops all extra white space
+ out=$(echo ${out})
+ if [ "${out}" != "${expected}" ]; then
+ rc=1
+ if [ "${VERBOSE}" = "1" ]; then
+ printf " Unexpected route entry. Have:\n"
+ printf " ${out}\n"
+ printf " Expected:\n"
+ printf " ${expected}\n\n"
+ fi
+ fi
+
+ return $rc
+}
+
+route_cleanup()
+{
+ $IP li del red 2>/dev/null
+ $IP li del dummy1 2>/dev/null
+ $IP li del veth1 2>/dev/null
+ $IP li del veth3 2>/dev/null
+
+ cleanup &> /dev/null
+}
+
+route_setup()
+{
+ route_cleanup
+ setup
+
+ [ "${VERBOSE}" = "1" ] && set -x
+ set -e
+
+ $IP li add red up type vrf table 101
+ $IP li add veth1 type veth peer name veth2
+ $IP li add veth3 type veth peer name veth4
+
+ $IP li set veth1 up
+ $IP li set veth3 up
+ $IP li set veth2 vrf red up
+ $IP li set veth4 vrf red up
+ $IP li add dummy1 type dummy
+ $IP li set dummy1 vrf red up
+
+ $IP -6 addr add 2001:db8:101::1/64 dev veth1
+ $IP -6 addr add 2001:db8:101::2/64 dev veth2
+ $IP -6 addr add 2001:db8:103::1/64 dev veth3
+ $IP -6 addr add 2001:db8:103::2/64 dev veth4
+ $IP -6 addr add 2001:db8:104::1/64 dev dummy1
+
+ $IP addr add 172.16.101.1/24 dev veth1
+ $IP addr add 172.16.101.2/24 dev veth2
+ $IP addr add 172.16.103.1/24 dev veth3
+ $IP addr add 172.16.103.2/24 dev veth4
+ $IP addr add 172.16.104.1/24 dev dummy1
+
+ set +ex
+}
+
+# assumption is that basic add of a single path route works
+# otherwise just adding an address on an interface is broken
+ipv6_rt_add()
+{
+ local rc
+
+ echo
+ echo "IPv6 route add / append tests"
+
+ # route add same prefix - fails with EEXISTS b/c ip adds NLM_F_EXCL
+ add_route6 "2001:db8:104::/64" "via 2001:db8:101::2"
+ run_cmd "$IP -6 ro add 2001:db8:104::/64 via 2001:db8:103::2"
+ log_test $? 2 "Attempt to add duplicate route - gw"
+
+ # route add same prefix - fails with EEXISTS b/c ip adds NLM_F_EXCL
+ add_route6 "2001:db8:104::/64" "via 2001:db8:101::2"
+ run_cmd "$IP -6 ro add 2001:db8:104::/64 dev veth3"
+ log_test $? 2 "Attempt to add duplicate route - dev only"
+
+ # route add same prefix - fails with EEXISTS b/c ip adds NLM_F_EXCL
+ add_route6 "2001:db8:104::/64" "via 2001:db8:101::2"
+ run_cmd "$IP -6 ro add unreachable 2001:db8:104::/64"
+ log_test $? 2 "Attempt to add duplicate route - reject route"
+
+ # iproute2 prepend only sets NLM_F_CREATE
+ # - adds a new route; does NOT convert existing route to ECMP
+ add_route6 "2001:db8:104::/64" "via 2001:db8:101::2"
+ run_cmd "$IP -6 ro prepend 2001:db8:104::/64 via 2001:db8:103::2"
+ check_route6 "2001:db8:104::/64 via 2001:db8:101::2 dev veth1 metric 1024 2001:db8:104::/64 via 2001:db8:103::2 dev veth3 metric 1024"
+ log_test $? 0 "Add new route for existing prefix (w/o NLM_F_EXCL)"
+
+ # route append with same prefix adds a new route
+ # - iproute2 sets NLM_F_CREATE | NLM_F_APPEND
+ add_route6 "2001:db8:104::/64" "via 2001:db8:101::2"
+ run_cmd "$IP -6 ro append 2001:db8:104::/64 via 2001:db8:103::2"
+ check_route6 "2001:db8:104::/64 metric 1024 nexthop via 2001:db8:101::2 dev veth1 weight 1 nexthop via 2001:db8:103::2 dev veth3 weight 1"
+ log_test $? 0 "Append nexthop to existing route - gw"
+
+ add_route6 "2001:db8:104::/64" "via 2001:db8:101::2"
+ run_cmd "$IP -6 ro append 2001:db8:104::/64 dev veth3"
+ check_route6 "2001:db8:104::/64 metric 1024 nexthop via 2001:db8:101::2 dev veth1 weight 1 nexthop dev veth3 weight 1"
+ log_test $? 0 "Append nexthop to existing route - dev only"
+
+ # multipath route can not have a nexthop that is a reject route
+ add_route6 "2001:db8:104::/64" "via 2001:db8:101::2"
+ run_cmd "$IP -6 ro append unreachable 2001:db8:104::/64"
+ log_test $? 2 "Append nexthop to existing route - reject route"
+
+ # reject route can not be converted to multipath route
+ run_cmd "$IP -6 ro flush 2001:db8:104::/64"
+ run_cmd "$IP -6 ro add unreachable 2001:db8:104::/64"
+ run_cmd "$IP -6 ro append 2001:db8:104::/64 via 2001:db8:103::2"
+ log_test $? 2 "Append nexthop to existing reject route - gw"
+
+ run_cmd "$IP -6 ro flush 2001:db8:104::/64"
+ run_cmd "$IP -6 ro add unreachable 2001:db8:104::/64"
+ run_cmd "$IP -6 ro append 2001:db8:104::/64 dev veth3"
+ log_test $? 2 "Append nexthop to existing reject route - dev only"
+
+ # insert mpath directly
+ add_route6 "2001:db8:104::/64" "nexthop via 2001:db8:101::2 nexthop via 2001:db8:103::2"
+ check_route6 "2001:db8:104::/64 metric 1024 nexthop via 2001:db8:101::2 dev veth1 weight 1 nexthop via 2001:db8:103::2 dev veth3 weight 1"
+ log_test $? 0 "Add multipath route"
+
+ add_route6 "2001:db8:104::/64" "nexthop via 2001:db8:101::2 nexthop via 2001:db8:103::2"
+ run_cmd "$IP -6 ro add 2001:db8:104::/64 nexthop via 2001:db8:101::2 nexthop via 2001:db8:103::2"
+ log_test $? 2 "Attempt to add duplicate multipath route"
+
+ # insert of a second route without append but different metric
+ add_route6 "2001:db8:104::/64" "via 2001:db8:101::2"
+ run_cmd "$IP -6 ro add 2001:db8:104::/64 via 2001:db8:103::2 metric 512"
+ rc=$?
+ if [ $rc -eq 0 ]; then
+ run_cmd "$IP -6 ro add 2001:db8:104::/64 via 2001:db8:103::3 metric 256"
+ rc=$?
+ fi
+ log_test $rc 0 "Route add with different metrics"
+
+ run_cmd "$IP -6 ro del 2001:db8:104::/64 metric 512"
+ rc=$?
+ if [ $rc -eq 0 ]; then
+ check_route6 "2001:db8:104::/64 via 2001:db8:103::3 dev veth3 metric 256 2001:db8:104::/64 via 2001:db8:101::2 dev veth1 metric 1024"
+ rc=$?
+ fi
+ log_test $rc 0 "Route delete with metric"
+}
+
+ipv6_rt_replace_single()
+{
+ # single path with single path
+ #
+ add_initial_route6 "via 2001:db8:101::2"
+ run_cmd "$IP -6 ro replace 2001:db8:104::/64 via 2001:db8:103::2"
+ check_route6 "2001:db8:104::/64 via 2001:db8:103::2 dev veth3 metric 1024"
+ log_test $? 0 "Single path with single path"
+
+ # single path with multipath
+ #
+ add_initial_route6 "nexthop via 2001:db8:101::2"
+ run_cmd "$IP -6 ro replace 2001:db8:104::/64 nexthop via 2001:db8:101::3 nexthop via 2001:db8:103::2"
+ check_route6 "2001:db8:104::/64 metric 1024 nexthop via 2001:db8:101::3 dev veth1 weight 1 nexthop via 2001:db8:103::2 dev veth3 weight 1"
+ log_test $? 0 "Single path with multipath"
+
+ # single path with reject
+ #
+ add_initial_route6 "nexthop via 2001:db8:101::2"
+ run_cmd "$IP -6 ro replace unreachable 2001:db8:104::/64"
+ check_route6 "unreachable 2001:db8:104::/64 dev lo metric 1024"
+ log_test $? 0 "Single path with reject route"
+
+ # single path with single path using MULTIPATH attribute
+ #
+ add_initial_route6 "via 2001:db8:101::2"
+ run_cmd "$IP -6 ro replace 2001:db8:104::/64 nexthop via 2001:db8:103::2"
+ check_route6 "2001:db8:104::/64 via 2001:db8:103::2 dev veth3 metric 1024"
+ log_test $? 0 "Single path with single path via multipath attribute"
+
+ # route replace fails - invalid nexthop
+ add_initial_route6 "via 2001:db8:101::2"
+ run_cmd "$IP -6 ro replace 2001:db8:104::/64 via 2001:db8:104::2"
+ if [ $? -eq 0 ]; then
+ # previous command is expected to fail so if it returns 0
+ # that means the test failed.
+ log_test 0 1 "Invalid nexthop"
+ else
+ check_route6 "2001:db8:104::/64 via 2001:db8:101::2 dev veth1 metric 1024"
+ log_test $? 0 "Invalid nexthop"
+ fi
+
+ # replace non-existent route
+ # - note use of change versus replace since ip adds NLM_F_CREATE
+ # for replace
+ add_initial_route6 "via 2001:db8:101::2"
+ run_cmd "$IP -6 ro change 2001:db8:105::/64 via 2001:db8:101::2"
+ log_test $? 2 "Single path - replace of non-existent route"
+}
+
+ipv6_rt_replace_mpath()
+{
+ # multipath with multipath
+ add_initial_route6 "nexthop via 2001:db8:101::2 nexthop via 2001:db8:103::2"
+ run_cmd "$IP -6 ro replace 2001:db8:104::/64 nexthop via 2001:db8:101::3 nexthop via 2001:db8:103::3"
+ check_route6 "2001:db8:104::/64 metric 1024 nexthop via 2001:db8:101::3 dev veth1 weight 1 nexthop via 2001:db8:103::3 dev veth3 weight 1"
+ log_test $? 0 "Multipath with multipath"
+
+ # multipath with single
+ add_initial_route6 "nexthop via 2001:db8:101::2 nexthop via 2001:db8:103::2"
+ run_cmd "$IP -6 ro replace 2001:db8:104::/64 via 2001:db8:101::3"
+ check_route6 "2001:db8:104::/64 via 2001:db8:101::3 dev veth1 metric 1024"
+ log_test $? 0 "Multipath with single path"
+
+ # multipath with single
+ add_initial_route6 "nexthop via 2001:db8:101::2 nexthop via 2001:db8:103::2"
+ run_cmd "$IP -6 ro replace 2001:db8:104::/64 nexthop via 2001:db8:101::3"
+ check_route6 "2001:db8:104::/64 via 2001:db8:101::3 dev veth1 metric 1024"
+ log_test $? 0 "Multipath with single path via multipath attribute"
+
+ # multipath with reject
+ add_initial_route6 "nexthop via 2001:db8:101::2 nexthop via 2001:db8:103::2"
+ run_cmd "$IP -6 ro replace unreachable 2001:db8:104::/64"
+ check_route6 "unreachable 2001:db8:104::/64 dev lo metric 1024"
+ log_test $? 0 "Multipath with reject route"
+
+ # route replace fails - invalid nexthop 1
+ add_initial_route6 "nexthop via 2001:db8:101::2 nexthop via 2001:db8:103::2"
+ run_cmd "$IP -6 ro replace 2001:db8:104::/64 nexthop via 2001:db8:111::3 nexthop via 2001:db8:103::3"
+ check_route6 "2001:db8:104::/64 metric 1024 nexthop via 2001:db8:101::2 dev veth1 weight 1 nexthop via 2001:db8:103::2 dev veth3 weight 1"
+ log_test $? 0 "Multipath - invalid first nexthop"
+
+ # route replace fails - invalid nexthop 2
+ add_initial_route6 "nexthop via 2001:db8:101::2 nexthop via 2001:db8:103::2"
+ run_cmd "$IP -6 ro replace 2001:db8:104::/64 nexthop via 2001:db8:101::3 nexthop via 2001:db8:113::3"
+ check_route6 "2001:db8:104::/64 metric 1024 nexthop via 2001:db8:101::2 dev veth1 weight 1 nexthop via 2001:db8:103::2 dev veth3 weight 1"
+ log_test $? 0 "Multipath - invalid second nexthop"
+
+ # multipath non-existent route
+ add_initial_route6 "nexthop via 2001:db8:101::2 nexthop via 2001:db8:103::2"
+ run_cmd "$IP -6 ro change 2001:db8:105::/64 nexthop via 2001:db8:101::3 nexthop via 2001:db8:103::3"
+ log_test $? 2 "Multipath - replace of non-existent route"
+}
+
+ipv6_rt_replace()
+{
+ echo
+ echo "IPv6 route replace tests"
+
+ ipv6_rt_replace_single
+ ipv6_rt_replace_mpath
+}
+
+ipv6_route_test()
+{
+ route_setup
+
+ ipv6_rt_add
+ ipv6_rt_replace
+
+ route_cleanup
+}
+
+# add route for a prefix, flushing any existing routes first
+# expected to be the first step of a test
+add_route()
+{
+ local pfx="$1"
+ local nh="$2"
+ local out
+
+ if [ "$VERBOSE" = "1" ]; then
+ echo
+ echo " ##################################################"
+ echo
+ fi
+
+ run_cmd "$IP ro flush ${pfx}"
+ [ $? -ne 0 ] && exit 1
+
+ out=$($IP ro ls match ${pfx})
+ if [ -n "$out" ]; then
+ echo "Failed to flush routes for prefix used for tests."
+ exit 1
+ fi
+
+ run_cmd "$IP ro add ${pfx} ${nh}"
+ if [ $? -ne 0 ]; then
+ echo "Failed to add initial route for test."
+ exit 1
+ fi
+}
+
+# add initial route - used in replace route tests
+add_initial_route()
+{
+ add_route "172.16.104.0/24" "$1"
+}
+
+check_route()
+{
+ local pfx="172.16.104.0/24"
+ local expected="$1"
+ local out
+ local rc=0
+
+ out=$($IP ro ls match ${pfx})
+ if [ -z "${out}" ]; then
+ if [ "$VERBOSE" = "1" ]; then
+ printf "\nNo route entry found\n"
+ printf "Expected:\n"
+ printf " ${expected}\n"
+ fi
+ return 1
+ fi
+
+ # tricky way to convert output to 1-line without ip's
+ # messy '\'; this drops all extra white space
+ out=$(echo ${out})
+ if [ "${out}" != "${expected}" ]; then
+ rc=1
+ if [ "${VERBOSE}" = "1" ]; then
+ printf " Unexpected route entry. Have:\n"
+ printf " ${out}\n"
+ printf " Expected:\n"
+ printf " ${expected}\n\n"
+ fi
+ fi
+
+ return $rc
+}
+
+# assumption is that basic add of a single path route works
+# otherwise just adding an address on an interface is broken
+ipv4_rt_add()
+{
+ local rc
+
+ echo
+ echo "IPv4 route add / append tests"
+
+ # route add same prefix - fails with EEXISTS b/c ip adds NLM_F_EXCL
+ add_route "172.16.104.0/24" "via 172.16.101.2"
+ run_cmd "$IP ro add 172.16.104.0/24 via 172.16.103.2"
+ log_test $? 2 "Attempt to add duplicate route - gw"
+
+ # route add same prefix - fails with EEXISTS b/c ip adds NLM_F_EXCL
+ add_route "172.16.104.0/24" "via 172.16.101.2"
+ run_cmd "$IP ro add 172.16.104.0/24 dev veth3"
+ log_test $? 2 "Attempt to add duplicate route - dev only"
+
+ # route add same prefix - fails with EEXISTS b/c ip adds NLM_F_EXCL
+ add_route "172.16.104.0/24" "via 172.16.101.2"
+ run_cmd "$IP ro add unreachable 172.16.104.0/24"
+ log_test $? 2 "Attempt to add duplicate route - reject route"
+
+ # iproute2 prepend only sets NLM_F_CREATE
+ # - adds a new route; does NOT convert existing route to ECMP
+ add_route "172.16.104.0/24" "via 172.16.101.2"
+ run_cmd "$IP ro prepend 172.16.104.0/24 via 172.16.103.2"
+ check_route "172.16.104.0/24 via 172.16.103.2 dev veth3 172.16.104.0/24 via 172.16.101.2 dev veth1"
+ log_test $? 0 "Add new nexthop for existing prefix"
+
+ # route append with same prefix adds a new route
+ # - iproute2 sets NLM_F_CREATE | NLM_F_APPEND
+ add_route "172.16.104.0/24" "via 172.16.101.2"
+ run_cmd "$IP ro append 172.16.104.0/24 via 172.16.103.2"
+ check_route "172.16.104.0/24 via 172.16.101.2 dev veth1 172.16.104.0/24 via 172.16.103.2 dev veth3"
+ log_test $? 0 "Append nexthop to existing route - gw"
+
+ add_route "172.16.104.0/24" "via 172.16.101.2"
+ run_cmd "$IP ro append 172.16.104.0/24 dev veth3"
+ check_route "172.16.104.0/24 via 172.16.101.2 dev veth1 172.16.104.0/24 dev veth3 scope link"
+ log_test $? 0 "Append nexthop to existing route - dev only"
+
+ add_route "172.16.104.0/24" "via 172.16.101.2"
+ run_cmd "$IP ro append unreachable 172.16.104.0/24"
+ check_route "172.16.104.0/24 via 172.16.101.2 dev veth1 unreachable 172.16.104.0/24"
+ log_test $? 0 "Append nexthop to existing route - reject route"
+
+ run_cmd "$IP ro flush 172.16.104.0/24"
+ run_cmd "$IP ro add unreachable 172.16.104.0/24"
+ run_cmd "$IP ro append 172.16.104.0/24 via 172.16.103.2"
+ check_route "unreachable 172.16.104.0/24 172.16.104.0/24 via 172.16.103.2 dev veth3"
+ log_test $? 0 "Append nexthop to existing reject route - gw"
+
+ run_cmd "$IP ro flush 172.16.104.0/24"
+ run_cmd "$IP ro add unreachable 172.16.104.0/24"
+ run_cmd "$IP ro append 172.16.104.0/24 dev veth3"
+ check_route "unreachable 172.16.104.0/24 172.16.104.0/24 dev veth3 scope link"
+ log_test $? 0 "Append nexthop to existing reject route - dev only"
+
+ # insert mpath directly
+ add_route "172.16.104.0/24" "nexthop via 172.16.101.2 nexthop via 172.16.103.2"
+ check_route "172.16.104.0/24 nexthop via 172.16.101.2 dev veth1 weight 1 nexthop via 172.16.103.2 dev veth3 weight 1"
+ log_test $? 0 "add multipath route"
+
+ add_route "172.16.104.0/24" "nexthop via 172.16.101.2 nexthop via 172.16.103.2"
+ run_cmd "$IP ro add 172.16.104.0/24 nexthop via 172.16.101.2 nexthop via 172.16.103.2"
+ log_test $? 2 "Attempt to add duplicate multipath route"
+
+ # insert of a second route without append but different metric
+ add_route "172.16.104.0/24" "via 172.16.101.2"
+ run_cmd "$IP ro add 172.16.104.0/24 via 172.16.103.2 metric 512"
+ rc=$?
+ if [ $rc -eq 0 ]; then
+ run_cmd "$IP ro add 172.16.104.0/24 via 172.16.103.3 metric 256"
+ rc=$?
+ fi
+ log_test $rc 0 "Route add with different metrics"
+
+ run_cmd "$IP ro del 172.16.104.0/24 metric 512"
+ rc=$?
+ if [ $rc -eq 0 ]; then
+ check_route "172.16.104.0/24 via 172.16.101.2 dev veth1 172.16.104.0/24 via 172.16.103.3 dev veth3 metric 256"
+ rc=$?
+ fi
+ log_test $rc 0 "Route delete with metric"
+}
+
+ipv4_rt_replace_single()
+{
+ # single path with single path
+ #
+ add_initial_route "via 172.16.101.2"
+ run_cmd "$IP ro replace 172.16.104.0/24 via 172.16.103.2"
+ check_route "172.16.104.0/24 via 172.16.103.2 dev veth3"
+ log_test $? 0 "Single path with single path"
+
+ # single path with multipath
+ #
+ add_initial_route "nexthop via 172.16.101.2"
+ run_cmd "$IP ro replace 172.16.104.0/24 nexthop via 172.16.101.3 nexthop via 172.16.103.2"
+ check_route "172.16.104.0/24 nexthop via 172.16.101.3 dev veth1 weight 1 nexthop via 172.16.103.2 dev veth3 weight 1"
+ log_test $? 0 "Single path with multipath"
+
+ # single path with reject
+ #
+ add_initial_route "nexthop via 172.16.101.2"
+ run_cmd "$IP ro replace unreachable 172.16.104.0/24"
+ check_route "unreachable 172.16.104.0/24"
+ log_test $? 0 "Single path with reject route"
+
+ # single path with single path using MULTIPATH attribute
+ #
+ add_initial_route "via 172.16.101.2"
+ run_cmd "$IP ro replace 172.16.104.0/24 nexthop via 172.16.103.2"
+ check_route "172.16.104.0/24 via 172.16.103.2 dev veth3"
+ log_test $? 0 "Single path with single path via multipath attribute"
+
+ # route replace fails - invalid nexthop
+ add_initial_route "via 172.16.101.2"
+ run_cmd "$IP ro replace 172.16.104.0/24 via 2001:db8:104::2"
+ if [ $? -eq 0 ]; then
+ # previous command is expected to fail so if it returns 0
+ # that means the test failed.
+ log_test 0 1 "Invalid nexthop"
+ else
+ check_route "172.16.104.0/24 via 172.16.101.2 dev veth1"
+ log_test $? 0 "Invalid nexthop"
+ fi
+
+ # replace non-existent route
+ # - note use of change versus replace since ip adds NLM_F_CREATE
+ # for replace
+ add_initial_route "via 172.16.101.2"
+ run_cmd "$IP ro change 172.16.105.0/24 via 172.16.101.2"
+ log_test $? 2 "Single path - replace of non-existent route"
+}
+
+ipv4_rt_replace_mpath()
+{
+ # multipath with multipath
+ add_initial_route "nexthop via 172.16.101.2 nexthop via 172.16.103.2"
+ run_cmd "$IP ro replace 172.16.104.0/24 nexthop via 172.16.101.3 nexthop via 172.16.103.3"
+ check_route "172.16.104.0/24 nexthop via 172.16.101.3 dev veth1 weight 1 nexthop via 172.16.103.3 dev veth3 weight 1"
+ log_test $? 0 "Multipath with multipath"
+
+ # multipath with single
+ add_initial_route "nexthop via 172.16.101.2 nexthop via 172.16.103.2"
+ run_cmd "$IP ro replace 172.16.104.0/24 via 172.16.101.3"
+ check_route "172.16.104.0/24 via 172.16.101.3 dev veth1"
+ log_test $? 0 "Multipath with single path"
+
+ # multipath with single
+ add_initial_route "nexthop via 172.16.101.2 nexthop via 172.16.103.2"
+ run_cmd "$IP ro replace 172.16.104.0/24 nexthop via 172.16.101.3"
+ check_route "172.16.104.0/24 via 172.16.101.3 dev veth1"
+ log_test $? 0 "Multipath with single path via multipath attribute"
+
+ # multipath with reject
+ add_initial_route "nexthop via 172.16.101.2 nexthop via 172.16.103.2"
+ run_cmd "$IP ro replace unreachable 172.16.104.0/24"
+ check_route "unreachable 172.16.104.0/24"
+ log_test $? 0 "Multipath with reject route"
+
+ # route replace fails - invalid nexthop 1
+ add_initial_route "nexthop via 172.16.101.2 nexthop via 172.16.103.2"
+ run_cmd "$IP ro replace 172.16.104.0/24 nexthop via 172.16.111.3 nexthop via 172.16.103.3"
+ check_route "172.16.104.0/24 nexthop via 172.16.101.2 dev veth1 weight 1 nexthop via 172.16.103.2 dev veth3 weight 1"
+ log_test $? 0 "Multipath - invalid first nexthop"
+
+ # route replace fails - invalid nexthop 2
+ add_initial_route "nexthop via 172.16.101.2 nexthop via 172.16.103.2"
+ run_cmd "$IP ro replace 172.16.104.0/24 nexthop via 172.16.101.3 nexthop via 172.16.113.3"
+ check_route "172.16.104.0/24 nexthop via 172.16.101.2 dev veth1 weight 1 nexthop via 172.16.103.2 dev veth3 weight 1"
+ log_test $? 0 "Multipath - invalid second nexthop"
+
+ # multipath non-existent route
+ add_initial_route "nexthop via 172.16.101.2 nexthop via 172.16.103.2"
+ run_cmd "$IP ro change 172.16.105.0/24 nexthop via 172.16.101.3 nexthop via 172.16.103.3"
+ log_test $? 2 "Multipath - replace of non-existent route"
+}
+
+ipv4_rt_replace()
+{
+ echo
+ echo "IPv4 route replace tests"
+
+ ipv4_rt_replace_single
+ ipv4_rt_replace_mpath
+}
+
+ipv4_route_test()
+{
+ route_setup
+
+ ipv4_rt_add
+ ipv4_rt_replace
+
+ route_cleanup
+}
+
+################################################################################
+# usage
+
+usage()
+{
+ cat <<EOF
+usage: ${0##*/} OPTS
+
+ -t <test> Test(s) to run (default: all)
+ (options: $TESTS)
+ -p Pause on fail
+ -P Pause after each test before cleanup
+ -v verbose mode (show commands and output)
+EOF
+}
+
+################################################################################
+# main
+
+while getopts :t:pPhv o
+do
+ case $o in
+ t) TESTS=$OPTARG;;
+ p) PAUSE_ON_FAIL=yes;;
+ P) PAUSE=yes;;
+ v) VERBOSE=$(($VERBOSE + 1));;
+ h) usage; exit 0;;
+ *) usage; exit 1;;
+ esac
+done
+
+PEER_CMD="ip netns exec ${PEER_NS}"
+
+# make sure we don't pause twice
+[ "${PAUSE}" = "yes" ] && PAUSE_ON_FAIL=no
+
if [ "$(id -u)" -ne 0 ];then
echo "SKIP: Need root privileges"
exit 0
@@ -596,6 +1236,23 @@
# start clean
cleanup &> /dev/null
-fib_test
+for t in $TESTS
+do
+ case $t in
+ fib_unreg_test|unregister) fib_unreg_test;;
+ fib_down_test|down) fib_down_test;;
+ fib_carrier_test|carrier) fib_carrier_test;;
+ fib_nexthop_test|nexthop) fib_nexthop_test;;
+ ipv6_route_test|ipv6_rt) ipv6_route_test;;
+ ipv4_route_test|ipv4_rt) ipv4_route_test;;
+
+ help) echo "Test names: $TESTS"; exit 0;;
+ esac
+done
+
+if [ "$TESTS" != "none" ]; then
+ printf "\nTests passed: %3d\n" ${nsuccess}
+ printf "Tests failed: %3d\n" ${nfail}
+fi
exit $ret
diff --git a/tools/testing/selftests/net/forwarding/bridge_vlan_aware.sh b/tools/testing/selftests/net/forwarding/bridge_vlan_aware.sh
index 75d9224..d8313d0 100755
--- a/tools/testing/selftests/net/forwarding/bridge_vlan_aware.sh
+++ b/tools/testing/selftests/net/forwarding/bridge_vlan_aware.sh
@@ -1,6 +1,7 @@
#!/bin/bash
# SPDX-License-Identifier: GPL-2.0
+ALL_TESTS="ping_ipv4 ping_ipv6 learning flooding"
NUM_NETIFS=4
CHECK_TC="yes"
source lib.sh
@@ -75,14 +76,31 @@
vrf_cleanup
}
+ping_ipv4()
+{
+ ping_test $h1 192.0.2.2
+}
+
+ping_ipv6()
+{
+ ping6_test $h1 2001:db8:1::2
+}
+
+learning()
+{
+ learning_test "br0" $swp1 $h1 $h2
+}
+
+flooding()
+{
+ flood_test $swp2 $h1 $h2
+}
+
trap cleanup EXIT
setup_prepare
setup_wait
-ping_test $h1 192.0.2.2
-ping6_test $h1 2001:db8:1::2
-learning_test "br0" $swp1 $h1 $h2
-flood_test $swp2 $h1 $h2
+tests_run
exit $EXIT_STATUS
diff --git a/tools/testing/selftests/net/forwarding/bridge_vlan_unaware.sh b/tools/testing/selftests/net/forwarding/bridge_vlan_unaware.sh
index 1cddf06..c15c6c8 100755
--- a/tools/testing/selftests/net/forwarding/bridge_vlan_unaware.sh
+++ b/tools/testing/selftests/net/forwarding/bridge_vlan_unaware.sh
@@ -1,6 +1,7 @@
#!/bin/bash
# SPDX-License-Identifier: GPL-2.0
+ALL_TESTS="ping_ipv4 ping_ipv6 learning flooding"
NUM_NETIFS=4
source lib.sh
@@ -73,14 +74,31 @@
vrf_cleanup
}
+ping_ipv4()
+{
+ ping_test $h1 192.0.2.2
+}
+
+ping_ipv6()
+{
+ ping6_test $h1 2001:db8:1::2
+}
+
+learning()
+{
+ learning_test "br0" $swp1 $h1 $h2
+}
+
+flooding()
+{
+ flood_test $swp2 $h1 $h2
+}
+
trap cleanup EXIT
setup_prepare
setup_wait
-ping_test $h1 192.0.2.2
-ping6_test $h1 2001:db8:1::2
-learning_test "br0" $swp1 $h1 $h2
-flood_test $swp2 $h1 $h2
+tests_run
exit $EXIT_STATUS
diff --git a/tools/testing/selftests/net/forwarding/lib.sh b/tools/testing/selftests/net/forwarding/lib.sh
index 1ac6c62..89ba4cd 100644
--- a/tools/testing/selftests/net/forwarding/lib.sh
+++ b/tools/testing/selftests/net/forwarding/lib.sh
@@ -321,6 +321,50 @@
vrf_destroy $vrf_name
}
+tunnel_create()
+{
+ local name=$1; shift
+ local type=$1; shift
+ local local=$1; shift
+ local remote=$1; shift
+
+ ip link add name $name type $type \
+ local $local remote $remote "$@"
+ ip link set dev $name up
+}
+
+tunnel_destroy()
+{
+ local name=$1; shift
+
+ ip link del dev $name
+}
+
+vlan_create()
+{
+ local if_name=$1; shift
+ local vid=$1; shift
+ local vrf=$1; shift
+ local ips=("${@}")
+ local name=$if_name.$vid
+
+ ip link add name $name link $if_name type vlan id $vid
+ if [ "$vrf" != "" ]; then
+ ip link set dev $name master $vrf
+ fi
+ ip link set dev $name up
+ __addr_add_del $name add "${ips[@]}"
+}
+
+vlan_destroy()
+{
+ local if_name=$1; shift
+ local vid=$1; shift
+ local name=$if_name.$vid
+
+ ip link del dev $name
+}
+
master_name_get()
{
local if_name=$1
@@ -335,6 +379,15 @@
ip -j -s link show dev $if_name | jq '.[]["stats64"]["tx"]["packets"]'
}
+tc_rule_stats_get()
+{
+ local dev=$1; shift
+ local pref=$1; shift
+
+ tc -j -s filter show dev $dev ingress pref $pref |
+ jq '.[1].options.actions[].stats.packets'
+}
+
mac_get()
{
local if_name=$1
@@ -353,19 +406,33 @@
echo $((ageing_time / 100))
}
+declare -A SYSCTL_ORIG
+sysctl_set()
+{
+ local key=$1; shift
+ local value=$1; shift
+
+ SYSCTL_ORIG[$key]=$(sysctl -n $key)
+ sysctl -qw $key=$value
+}
+
+sysctl_restore()
+{
+ local key=$1; shift
+
+ sysctl -qw $key=${SYSCTL_ORIG["$key"]}
+}
+
forwarding_enable()
{
- ipv4_fwd=$(sysctl -n net.ipv4.conf.all.forwarding)
- ipv6_fwd=$(sysctl -n net.ipv6.conf.all.forwarding)
-
- sysctl -q -w net.ipv4.conf.all.forwarding=1
- sysctl -q -w net.ipv6.conf.all.forwarding=1
+ sysctl_set net.ipv4.conf.all.forwarding 1
+ sysctl_set net.ipv6.conf.all.forwarding 1
}
forwarding_restore()
{
- sysctl -q -w net.ipv6.conf.all.forwarding=$ipv6_fwd
- sysctl -q -w net.ipv4.conf.all.forwarding=$ipv4_fwd
+ sysctl_restore net.ipv6.conf.all.forwarding
+ sysctl_restore net.ipv4.conf.all.forwarding
}
tc_offload_check()
@@ -381,6 +448,92 @@
return 0
}
+trap_install()
+{
+ local dev=$1; shift
+ local direction=$1; shift
+
+ # For slow-path testing, we need to install a trap to get to
+ # slow path the packets that would otherwise be switched in HW.
+ tc filter add dev $dev $direction pref 1 flower skip_sw action trap
+}
+
+trap_uninstall()
+{
+ local dev=$1; shift
+ local direction=$1; shift
+
+ tc filter del dev $dev $direction pref 1 flower skip_sw
+}
+
+slow_path_trap_install()
+{
+ if [ "${tcflags/skip_hw}" != "$tcflags" ]; then
+ trap_install "$@"
+ fi
+}
+
+slow_path_trap_uninstall()
+{
+ if [ "${tcflags/skip_hw}" != "$tcflags" ]; then
+ trap_uninstall "$@"
+ fi
+}
+
+__icmp_capture_add_del()
+{
+ local add_del=$1; shift
+ local pref=$1; shift
+ local vsuf=$1; shift
+ local tundev=$1; shift
+ local filter=$1; shift
+
+ tc filter $add_del dev "$tundev" ingress \
+ proto ip$vsuf pref $pref \
+ flower ip_proto icmp$vsuf $filter \
+ action pass
+}
+
+icmp_capture_install()
+{
+ __icmp_capture_add_del add 100 "" "$@"
+}
+
+icmp_capture_uninstall()
+{
+ __icmp_capture_add_del del 100 "" "$@"
+}
+
+icmp6_capture_install()
+{
+ __icmp_capture_add_del add 100 v6 "$@"
+}
+
+icmp6_capture_uninstall()
+{
+ __icmp_capture_add_del del 100 v6 "$@"
+}
+
+matchall_sink_create()
+{
+ local dev=$1; shift
+
+ tc qdisc add dev $dev clsact
+ tc filter add dev $dev ingress \
+ pref 10000 \
+ matchall \
+ action drop
+}
+
+tests_run()
+{
+ local current_test
+
+ for current_test in ${TESTS:-$ALL_TESTS}; do
+ $current_test
+ done
+}
+
##############################################################################
# Tests
diff --git a/tools/testing/selftests/net/forwarding/mirror_gre.sh b/tools/testing/selftests/net/forwarding/mirror_gre.sh
new file mode 100755
index 0000000..e6fd7a1
--- /dev/null
+++ b/tools/testing/selftests/net/forwarding/mirror_gre.sh
@@ -0,0 +1,159 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+
+# This test uses standard topology for testing gretap. See
+# mirror_gre_topo_lib.sh for more details.
+#
+# Test for "tc action mirred egress mirror" when the device to mirror to is a
+# gretap or ip6gretap netdevice. Expect that the packets come out encapsulated,
+# and another gretap / ip6gretap netdevice is then capable of decapsulating the
+# traffic. Test that the payload is what is expected (ICMP ping request or
+# reply, depending on test).
+
+ALL_TESTS="
+ test_gretap
+ test_ip6gretap
+ test_gretap_mac
+ test_ip6gretap_mac
+ test_two_spans
+"
+
+NUM_NETIFS=6
+source lib.sh
+source mirror_lib.sh
+source mirror_gre_lib.sh
+source mirror_gre_topo_lib.sh
+
+setup_prepare()
+{
+ h1=${NETIFS[p1]}
+ swp1=${NETIFS[p2]}
+
+ swp2=${NETIFS[p3]}
+ h2=${NETIFS[p4]}
+
+ swp3=${NETIFS[p5]}
+ h3=${NETIFS[p6]}
+
+ vrf_prepare
+ mirror_gre_topo_create
+
+ ip address add dev $swp3 192.0.2.129/28
+ ip address add dev $h3 192.0.2.130/28
+
+ ip address add dev $swp3 2001:db8:2::1/64
+ ip address add dev $h3 2001:db8:2::2/64
+}
+
+cleanup()
+{
+ pre_cleanup
+
+ ip address del dev $h3 2001:db8:2::2/64
+ ip address del dev $swp3 2001:db8:2::1/64
+
+ ip address del dev $h3 192.0.2.130/28
+ ip address del dev $swp3 192.0.2.129/28
+
+ mirror_gre_topo_destroy
+ vrf_cleanup
+}
+
+test_span_gre_mac()
+{
+ local tundev=$1; shift
+ local direction=$1; shift
+ local prot=$1; shift
+ local what=$1; shift
+
+ local swp3mac=$(mac_get $swp3)
+ local h3mac=$(mac_get $h3)
+
+ RET=0
+
+ mirror_install $swp1 $direction $tundev "matchall $tcflags"
+ tc filter add dev $h3 ingress pref 77 prot $prot \
+ flower ip_proto 0x2f src_mac $swp3mac dst_mac $h3mac \
+ action pass
+
+ mirror_test v$h1 192.0.2.1 192.0.2.2 $h3 77 10
+
+ tc filter del dev $h3 ingress pref 77
+ mirror_uninstall $swp1 $direction
+
+ log_test "$direction $what: envelope MAC ($tcflags)"
+}
+
+test_two_spans()
+{
+ RET=0
+
+ mirror_install $swp1 ingress gt4 "matchall $tcflags"
+ mirror_install $swp1 egress gt6 "matchall $tcflags"
+ quick_test_span_gre_dir gt4 ingress
+ quick_test_span_gre_dir gt6 egress
+
+ mirror_uninstall $swp1 ingress
+ fail_test_span_gre_dir gt4 ingress
+ quick_test_span_gre_dir gt6 egress
+
+ mirror_install $swp1 ingress gt4 "matchall $tcflags"
+ mirror_uninstall $swp1 egress
+ quick_test_span_gre_dir gt4 ingress
+ fail_test_span_gre_dir gt6 egress
+
+ mirror_uninstall $swp1 ingress
+ log_test "two simultaneously configured mirrors ($tcflags)"
+}
+
+test_gretap()
+{
+ full_test_span_gre_dir gt4 ingress 8 0 "mirror to gretap"
+ full_test_span_gre_dir gt4 egress 0 8 "mirror to gretap"
+}
+
+test_ip6gretap()
+{
+ full_test_span_gre_dir gt6 ingress 8 0 "mirror to ip6gretap"
+ full_test_span_gre_dir gt6 egress 0 8 "mirror to ip6gretap"
+}
+
+test_gretap_mac()
+{
+ test_span_gre_mac gt4 ingress ip "mirror to gretap"
+ test_span_gre_mac gt4 egress ip "mirror to gretap"
+}
+
+test_ip6gretap_mac()
+{
+ test_span_gre_mac gt6 ingress ipv6 "mirror to ip6gretap"
+ test_span_gre_mac gt6 egress ipv6 "mirror to ip6gretap"
+}
+
+test_all()
+{
+ slow_path_trap_install $swp1 ingress
+ slow_path_trap_install $swp1 egress
+
+ tests_run
+
+ slow_path_trap_uninstall $swp1 egress
+ slow_path_trap_uninstall $swp1 ingress
+}
+
+trap cleanup EXIT
+
+setup_prepare
+setup_wait
+
+tcflags="skip_hw"
+test_all
+
+if ! tc_offload_check; then
+ echo "WARN: Could not test offloaded functionality"
+else
+ tcflags="skip_sw"
+ test_all
+fi
+
+exit $EXIT_STATUS
diff --git a/tools/testing/selftests/net/forwarding/mirror_gre_bound.sh b/tools/testing/selftests/net/forwarding/mirror_gre_bound.sh
new file mode 100755
index 0000000..360ca13
--- /dev/null
+++ b/tools/testing/selftests/net/forwarding/mirror_gre_bound.sh
@@ -0,0 +1,226 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+
+# +---------------------+ +---------------------+
+# | H1 | | H2 |
+# | + $h1 | | $h2 + |
+# | | 192.0.2.1/28 | | 192.0.2.2/28 | |
+# +-----|---------------+ +---------------|-----+
+# | |
+# +-----|-------------------------------------------------------------|-----+
+# | SW o--> mirror | |
+# | +---|-------------------------------------------------------------|---+ |
+# | | + $swp1 BR $swp2 + | |
+# | +---------------------------------------------------------------------+ |
+# | |
+# | +---------------------------------------------------------------------+ |
+# | | OL + gt6 (ip6gretap) + gt4 (gretap) | |
+# | | : loc=2001:db8:2::1 : loc=192.0.2.129 | |
+# | | : rem=2001:db8:2::2 : rem=192.0.2.130 | |
+# | | : ttl=100 : ttl=100 | |
+# | | : tos=inherit : tos=inherit | |
+# | +-------------------------:--|-------------------:--|-----------------+ |
+# | : | : | |
+# | +-------------------------:--|-------------------:--|-----------------+ |
+# | | UL : |,---------------------' | |
+# | | + $swp3 : || : | |
+# | | | 192.0.2.129/28 : vv : | |
+# | | | 2001:db8:2::1/64 : + ul (dummy) : | |
+# | +---|---------------------:----------------------:--------------------+ |
+# +-----|---------------------:----------------------:----------------------+
+# | : :
+# +-----|---------------------:----------------------:----------------------+
+# | H3 + $h3 + h3-gt6 (ip6gretap) + h3-gt4 (gretap) |
+# | 192.0.2.130/28 loc=2001:db8:2::2 loc=192.0.2.130 |
+# | 2001:db8:2::2/64 rem=2001:db8:2::1 rem=192.0.2.129 |
+# | ttl=100 ttl=100 |
+# | tos=inherit tos=inherit |
+# | |
+# +-------------------------------------------------------------------------+
+#
+# This tests mirroring to gretap and ip6gretap configured in an overlay /
+# underlay manner, i.e. with a bound dummy device that marks underlay VRF where
+# the encapsulated packed should be routed.
+
+ALL_TESTS="
+ test_gretap
+ test_ip6gretap
+"
+
+NUM_NETIFS=6
+source lib.sh
+source mirror_lib.sh
+source mirror_gre_lib.sh
+
+h1_create()
+{
+ simple_if_init $h1 192.0.2.1/28
+}
+
+h1_destroy()
+{
+ simple_if_fini $h1 192.0.2.1/28
+}
+
+h2_create()
+{
+ simple_if_init $h2 192.0.2.2/28
+}
+
+h2_destroy()
+{
+ simple_if_fini $h2 192.0.2.2/28
+}
+
+h3_create()
+{
+ simple_if_init $h3 192.0.2.130/28 2001:db8:2::2/64
+
+ tunnel_create h3-gt4 gretap 192.0.2.130 192.0.2.129
+ ip link set h3-gt4 vrf v$h3
+ matchall_sink_create h3-gt4
+
+ tunnel_create h3-gt6 ip6gretap 2001:db8:2::2 2001:db8:2::1
+ ip link set h3-gt6 vrf v$h3
+ matchall_sink_create h3-gt6
+}
+
+h3_destroy()
+{
+ tunnel_destroy h3-gt6
+ tunnel_destroy h3-gt4
+
+ simple_if_fini $h3 192.0.2.130/28 2001:db8:2::2/64
+}
+
+switch_create()
+{
+ # Bridge between H1 and H2.
+
+ ip link add name br1 type bridge vlan_filtering 1
+ ip link set dev br1 up
+
+ ip link set dev $swp1 master br1
+ ip link set dev $swp1 up
+
+ ip link set dev $swp2 master br1
+ ip link set dev $swp2 up
+
+ tc qdisc add dev $swp1 clsact
+
+ # Underlay.
+
+ simple_if_init $swp3 192.0.2.129/28 2001:db8:2::1/64
+
+ ip link add name ul type dummy
+ ip link set dev ul master v$swp3
+ ip link set dev ul up
+
+ # Overlay.
+
+ vrf_create vrf-ol
+ ip link set dev vrf-ol up
+
+ tunnel_create gt4 gretap 192.0.2.129 192.0.2.130 \
+ ttl 100 tos inherit dev ul
+ ip link set dev gt4 master vrf-ol
+ ip link set dev gt4 up
+
+ tunnel_create gt6 ip6gretap 2001:db8:2::1 2001:db8:2::2 \
+ ttl 100 tos inherit dev ul allow-localremote
+ ip link set dev gt6 master vrf-ol
+ ip link set dev gt6 up
+}
+
+switch_destroy()
+{
+ vrf_destroy vrf-ol
+
+ tunnel_destroy gt6
+ tunnel_destroy gt4
+
+ simple_if_fini $swp3 192.0.2.129/28 2001:db8:2::1/64
+
+ ip link del dev ul
+
+ tc qdisc del dev $swp1 clsact
+
+ ip link set dev $swp1 down
+ ip link set dev $swp2 down
+ ip link del dev br1
+}
+
+setup_prepare()
+{
+ h1=${NETIFS[p1]}
+ swp1=${NETIFS[p2]}
+
+ swp2=${NETIFS[p3]}
+ h2=${NETIFS[p4]}
+
+ swp3=${NETIFS[p5]}
+ h3=${NETIFS[p6]}
+
+ vrf_prepare
+
+ h1_create
+ h2_create
+ h3_create
+
+ switch_create
+}
+
+cleanup()
+{
+ pre_cleanup
+
+ switch_destroy
+
+ h3_destroy
+ h2_destroy
+ h1_destroy
+
+ vrf_cleanup
+}
+
+test_gretap()
+{
+ full_test_span_gre_dir gt4 ingress 8 0 "mirror to gretap w/ UL"
+ full_test_span_gre_dir gt4 egress 0 8 "mirror to gretap w/ UL"
+}
+
+test_ip6gretap()
+{
+ full_test_span_gre_dir gt6 ingress 8 0 "mirror to ip6gretap w/ UL"
+ full_test_span_gre_dir gt6 egress 0 8 "mirror to ip6gretap w/ UL"
+}
+
+test_all()
+{
+ RET=0
+
+ slow_path_trap_install $swp1 ingress
+ slow_path_trap_install $swp1 egress
+
+ tests_run
+
+ slow_path_trap_uninstall $swp1 egress
+ slow_path_trap_uninstall $swp1 ingress
+}
+
+trap cleanup EXIT
+
+setup_prepare
+setup_wait
+
+tcflags="skip_hw"
+test_all
+
+if ! tc_offload_check; then
+ echo "WARN: Could not test offloaded functionality"
+else
+ tcflags="skip_sw"
+ test_all
+fi
+
+exit $EXIT_STATUS
diff --git a/tools/testing/selftests/net/forwarding/mirror_gre_bridge_1d_vlan.sh b/tools/testing/selftests/net/forwarding/mirror_gre_bridge_1d_vlan.sh
new file mode 100755
index 0000000..3d47afc
--- /dev/null
+++ b/tools/testing/selftests/net/forwarding/mirror_gre_bridge_1d_vlan.sh
@@ -0,0 +1,109 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+
+# This test uses standard topology for testing gretap. See
+# mirror_gre_topo_lib.sh for more details.
+#
+# Test for "tc action mirred egress mirror" when the underlay route points at a
+# bridge device without vlan filtering (802.1d). The device attached to that
+# bridge is a VLAN.
+
+ALL_TESTS="
+ test_gretap
+ test_ip6gretap
+"
+
+NUM_NETIFS=6
+source lib.sh
+source mirror_lib.sh
+source mirror_gre_lib.sh
+source mirror_gre_topo_lib.sh
+
+setup_prepare()
+{
+ h1=${NETIFS[p1]}
+ swp1=${NETIFS[p2]}
+
+ swp2=${NETIFS[p3]}
+ h2=${NETIFS[p4]}
+
+ swp3=${NETIFS[p5]}
+ h3=${NETIFS[p6]}
+
+ vrf_prepare
+ mirror_gre_topo_create
+
+ ip link add name br2 type bridge vlan_filtering 0
+ ip link set dev br2 up
+
+ vlan_create $swp3 555
+
+ ip link set dev $swp3.555 master br2
+ ip route add 192.0.2.130/32 dev br2
+ ip -6 route add 2001:db8:2::2/128 dev br2
+
+ ip address add dev br2 192.0.2.129/32
+ ip address add dev br2 2001:db8:2::1/128
+
+ vlan_create $h3 555 v$h3 192.0.2.130/28 2001:db8:2::2/64
+}
+
+cleanup()
+{
+ pre_cleanup
+
+ vlan_destroy $h3 555
+ ip link del dev br2
+ vlan_destroy $swp3 555
+
+ mirror_gre_topo_destroy
+ vrf_cleanup
+}
+
+test_vlan_match()
+{
+ local tundev=$1; shift
+ local vlan_match=$1; shift
+ local what=$1; shift
+
+ full_test_span_gre_dir_vlan $tundev ingress "$vlan_match" 8 0 "$what"
+ full_test_span_gre_dir_vlan $tundev egress "$vlan_match" 0 8 "$what"
+}
+
+test_gretap()
+{
+ test_vlan_match gt4 'vlan_id 555 vlan_ethtype ip' "mirror to gretap"
+}
+
+test_ip6gretap()
+{
+ test_vlan_match gt6 'vlan_id 555 vlan_ethtype ipv6' "mirror to ip6gretap"
+}
+
+test_all()
+{
+ slow_path_trap_install $swp1 ingress
+ slow_path_trap_install $swp1 egress
+
+ tests_run
+
+ slow_path_trap_uninstall $swp1 egress
+ slow_path_trap_uninstall $swp1 ingress
+}
+
+trap cleanup EXIT
+
+setup_prepare
+setup_wait
+
+tcflags="skip_hw"
+test_all
+
+if ! tc_offload_check; then
+ echo "WARN: Could not test offloaded functionality"
+else
+ tcflags="skip_sw"
+ test_all
+fi
+
+exit $EXIT_STATUS
diff --git a/tools/testing/selftests/net/forwarding/mirror_gre_changes.sh b/tools/testing/selftests/net/forwarding/mirror_gre_changes.sh
new file mode 100755
index 0000000..aa29d46
--- /dev/null
+++ b/tools/testing/selftests/net/forwarding/mirror_gre_changes.sh
@@ -0,0 +1,278 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+
+# This test uses standard topology for testing gretap. See
+# mirror_gre_topo_lib.sh for more details.
+#
+# Test how mirrors to gretap and ip6gretap react to changes to relevant
+# configuration.
+
+ALL_TESTS="
+ test_ttl
+ test_tun_up
+ test_egress_up
+ test_remote_ip
+ test_tun_del
+ test_route_del
+"
+
+NUM_NETIFS=6
+source lib.sh
+source mirror_lib.sh
+source mirror_gre_lib.sh
+source mirror_gre_topo_lib.sh
+
+setup_prepare()
+{
+ h1=${NETIFS[p1]}
+ swp1=${NETIFS[p2]}
+
+ swp2=${NETIFS[p3]}
+ h2=${NETIFS[p4]}
+
+ swp3=${NETIFS[p5]}
+ h3=${NETIFS[p6]}
+
+ vrf_prepare
+ mirror_gre_topo_create
+
+ # This test downs $swp3, which deletes the configured IPv6 address
+ # unless this sysctl is set.
+ sysctl_set net.ipv6.conf.$swp3.keep_addr_on_down 1
+
+ ip address add dev $swp3 192.0.2.129/28
+ ip address add dev $h3 192.0.2.130/28
+
+ ip address add dev $swp3 2001:db8:2::1/64
+ ip address add dev $h3 2001:db8:2::2/64
+}
+
+cleanup()
+{
+ pre_cleanup
+
+ ip address del dev $h3 2001:db8:2::2/64
+ ip address del dev $swp3 2001:db8:2::1/64
+
+ ip address del dev $h3 192.0.2.130/28
+ ip address del dev $swp3 192.0.2.129/28
+
+ sysctl_restore net.ipv6.conf.$swp3.keep_addr_on_down
+
+ mirror_gre_topo_destroy
+ vrf_cleanup
+}
+
+test_span_gre_ttl()
+{
+ local tundev=$1; shift
+ local type=$1; shift
+ local prot=$1; shift
+ local what=$1; shift
+
+ RET=0
+
+ mirror_install $swp1 ingress $tundev "matchall $tcflags"
+ tc filter add dev $h3 ingress pref 77 prot $prot \
+ flower ip_ttl 50 action pass
+
+ mirror_test v$h1 192.0.2.1 192.0.2.2 $h3 77 0
+
+ ip link set dev $tundev type $type ttl 50
+ mirror_test v$h1 192.0.2.1 192.0.2.2 $h3 77 10
+
+ ip link set dev $tundev type $type ttl 100
+ tc filter del dev $h3 ingress pref 77
+ mirror_uninstall $swp1 ingress
+
+ log_test "$what: TTL change ($tcflags)"
+}
+
+test_span_gre_tun_up()
+{
+ local tundev=$1; shift
+ local what=$1; shift
+
+ RET=0
+
+ ip link set dev $tundev down
+ mirror_install $swp1 ingress $tundev "matchall $tcflags"
+ fail_test_span_gre_dir $tundev ingress
+
+ ip link set dev $tundev up
+
+ quick_test_span_gre_dir $tundev ingress
+ mirror_uninstall $swp1 ingress
+
+ log_test "$what: tunnel down/up ($tcflags)"
+}
+
+test_span_gre_egress_up()
+{
+ local tundev=$1; shift
+ local remote_ip=$1; shift
+ local what=$1; shift
+
+ RET=0
+
+ ip link set dev $swp3 down
+ mirror_install $swp1 ingress $tundev "matchall $tcflags"
+ fail_test_span_gre_dir $tundev ingress
+
+ # After setting the device up, wait for neighbor to get resolved so that
+ # we can expect mirroring to work.
+ ip link set dev $swp3 up
+ while true; do
+ ip neigh sh dev $swp3 $remote_ip nud reachable |
+ grep -q ^
+ if [[ $? -ne 0 ]]; then
+ sleep 1
+ else
+ break
+ fi
+ done
+
+ quick_test_span_gre_dir $tundev ingress
+ mirror_uninstall $swp1 ingress
+
+ log_test "$what: egress down/up ($tcflags)"
+}
+
+test_span_gre_remote_ip()
+{
+ local tundev=$1; shift
+ local type=$1; shift
+ local correct_ip=$1; shift
+ local wrong_ip=$1; shift
+ local what=$1; shift
+
+ RET=0
+
+ ip link set dev $tundev type $type remote $wrong_ip
+ mirror_install $swp1 ingress $tundev "matchall $tcflags"
+ fail_test_span_gre_dir $tundev ingress
+
+ ip link set dev $tundev type $type remote $correct_ip
+ quick_test_span_gre_dir $tundev ingress
+ mirror_uninstall $swp1 ingress
+
+ log_test "$what: remote address change ($tcflags)"
+}
+
+test_span_gre_tun_del()
+{
+ local tundev=$1; shift
+ local type=$1; shift
+ local flags=$1; shift
+ local local_ip=$1; shift
+ local remote_ip=$1; shift
+ local what=$1; shift
+
+ RET=0
+
+ mirror_install $swp1 ingress $tundev "matchall $tcflags"
+ quick_test_span_gre_dir $tundev ingress
+ ip link del dev $tundev
+ fail_test_span_gre_dir $tundev ingress
+
+ tunnel_create $tundev $type $local_ip $remote_ip \
+ ttl 100 tos inherit $flags
+
+ # Recreating the tunnel doesn't reestablish mirroring, so reinstall it
+ # and verify it works for the follow-up tests.
+ mirror_uninstall $swp1 ingress
+ mirror_install $swp1 ingress $tundev "matchall $tcflags"
+ quick_test_span_gre_dir $tundev ingress
+ mirror_uninstall $swp1 ingress
+
+ log_test "$what: tunnel deleted ($tcflags)"
+}
+
+test_span_gre_route_del()
+{
+ local tundev=$1; shift
+ local edev=$1; shift
+ local route=$1; shift
+ local what=$1; shift
+
+ RET=0
+
+ mirror_install $swp1 ingress $tundev "matchall $tcflags"
+ quick_test_span_gre_dir $tundev ingress
+
+ ip route del $route dev $edev
+ fail_test_span_gre_dir $tundev ingress
+
+ ip route add $route dev $edev
+ quick_test_span_gre_dir $tundev ingress
+
+ mirror_uninstall $swp1 ingress
+
+ log_test "$what: underlay route removal ($tcflags)"
+}
+
+test_ttl()
+{
+ test_span_gre_ttl gt4 gretap ip "mirror to gretap"
+ test_span_gre_ttl gt6 ip6gretap ipv6 "mirror to ip6gretap"
+}
+
+test_tun_up()
+{
+ test_span_gre_tun_up gt4 "mirror to gretap"
+ test_span_gre_tun_up gt6 "mirror to ip6gretap"
+}
+
+test_egress_up()
+{
+ test_span_gre_egress_up gt4 192.0.2.130 "mirror to gretap"
+ test_span_gre_egress_up gt6 2001:db8:2::2 "mirror to ip6gretap"
+}
+
+test_remote_ip()
+{
+ test_span_gre_remote_ip gt4 gretap 192.0.2.130 192.0.2.132 "mirror to gretap"
+ test_span_gre_remote_ip gt6 ip6gretap 2001:db8:2::2 2001:db8:2::4 "mirror to ip6gretap"
+}
+
+test_tun_del()
+{
+ test_span_gre_tun_del gt4 gretap "" \
+ 192.0.2.129 192.0.2.130 "mirror to gretap"
+ test_span_gre_tun_del gt6 ip6gretap allow-localremote \
+ 2001:db8:2::1 2001:db8:2::2 "mirror to ip6gretap"
+}
+
+test_route_del()
+{
+ test_span_gre_route_del gt4 $swp3 192.0.2.128/28 "mirror to gretap"
+ test_span_gre_route_del gt6 $swp3 2001:db8:2::/64 "mirror to ip6gretap"
+}
+
+test_all()
+{
+ slow_path_trap_install $swp1 ingress
+ slow_path_trap_install $swp1 egress
+
+ tests_run
+
+ slow_path_trap_uninstall $swp1 egress
+ slow_path_trap_uninstall $swp1 ingress
+}
+
+trap cleanup EXIT
+
+setup_prepare
+setup_wait
+
+tcflags="skip_hw"
+test_all
+
+if ! tc_offload_check; then
+ echo "WARN: Could not test offloaded functionality"
+else
+ tcflags="skip_sw"
+ test_all
+fi
+
+exit $EXIT_STATUS
diff --git a/tools/testing/selftests/net/forwarding/mirror_gre_flower.sh b/tools/testing/selftests/net/forwarding/mirror_gre_flower.sh
new file mode 100755
index 0000000..12914f4
--- /dev/null
+++ b/tools/testing/selftests/net/forwarding/mirror_gre_flower.sh
@@ -0,0 +1,137 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+
+# This test uses standard topology for testing gretap. See
+# mirror_gre_topo_lib.sh for more details.
+#
+# This tests flower-triggered mirroring to gretap and ip6gretap netdevices. The
+# interfaces on H1 and H2 have two addresses each. Flower match on one of the
+# addresses is configured with mirror action. It is expected that when pinging
+# this address, mirroring takes place, whereas when pinging the other one,
+# there's no mirroring.
+
+ALL_TESTS="
+ test_gretap
+ test_ip6gretap
+"
+
+NUM_NETIFS=6
+source lib.sh
+source mirror_lib.sh
+source mirror_gre_lib.sh
+source mirror_gre_topo_lib.sh
+
+setup_prepare()
+{
+ h1=${NETIFS[p1]}
+ swp1=${NETIFS[p2]}
+
+ swp2=${NETIFS[p3]}
+ h2=${NETIFS[p4]}
+
+ swp3=${NETIFS[p5]}
+ h3=${NETIFS[p6]}
+
+ vrf_prepare
+ mirror_gre_topo_create
+
+ ip address add dev $swp3 192.0.2.129/28
+ ip address add dev $h3 192.0.2.130/28
+
+ ip address add dev $swp3 2001:db8:2::1/64
+ ip address add dev $h3 2001:db8:2::2/64
+
+ ip address add dev $h1 192.0.2.3/28
+ ip address add dev $h2 192.0.2.4/28
+}
+
+cleanup()
+{
+ pre_cleanup
+
+ ip address del dev $h2 192.0.2.4/28
+ ip address del dev $h1 192.0.2.3/28
+
+ ip address del dev $h3 2001:db8:2::2/64
+ ip address del dev $swp3 2001:db8:2::1/64
+
+ ip address del dev $h3 192.0.2.130/28
+ ip address del dev $swp3 192.0.2.129/28
+
+ mirror_gre_topo_destroy
+ vrf_cleanup
+}
+
+test_span_gre_dir_acl()
+{
+ test_span_gre_dir_ips "$@" 192.0.2.3 192.0.2.4
+}
+
+fail_test_span_gre_dir_acl()
+{
+ fail_test_span_gre_dir_ips "$@" 192.0.2.3 192.0.2.4
+}
+
+full_test_span_gre_dir_acl()
+{
+ local tundev=$1; shift
+ local direction=$1; shift
+ local forward_type=$1; shift
+ local backward_type=$1; shift
+ local match_dip=$1; shift
+ local what=$1; shift
+
+ mirror_install $swp1 $direction $tundev \
+ "protocol ip flower $tcflags dst_ip $match_dip"
+ fail_test_span_gre_dir $tundev $direction
+ test_span_gre_dir_acl "$tundev" "$direction" \
+ "$forward_type" "$backward_type"
+ mirror_uninstall $swp1 $direction
+
+ # Test lack of mirroring after ACL mirror is uninstalled.
+ fail_test_span_gre_dir_acl "$tundev" "$direction"
+
+ log_test "$direction $what ($tcflags)"
+}
+
+test_gretap()
+{
+ full_test_span_gre_dir_acl gt4 ingress 8 0 192.0.2.4 "ACL mirror to gretap"
+ full_test_span_gre_dir_acl gt4 egress 0 8 192.0.2.3 "ACL mirror to gretap"
+}
+
+test_ip6gretap()
+{
+ full_test_span_gre_dir_acl gt6 ingress 8 0 192.0.2.4 "ACL mirror to ip6gretap"
+ full_test_span_gre_dir_acl gt6 egress 0 8 192.0.2.3 "ACL mirror to ip6gretap"
+}
+
+test_all()
+{
+ RET=0
+
+ slow_path_trap_install $swp1 ingress
+ slow_path_trap_install $swp1 egress
+
+ tests_run
+
+ slow_path_trap_uninstall $swp1 egress
+ slow_path_trap_uninstall $swp1 ingress
+}
+
+trap cleanup EXIT
+
+setup_prepare
+setup_wait
+
+tcflags="skip_hw"
+test_all
+
+if ! tc_offload_check; then
+ echo "WARN: Could not test offloaded functionality"
+else
+ tcflags="skip_sw"
+ test_all
+fi
+
+exit $EXIT_STATUS
diff --git a/tools/testing/selftests/net/forwarding/mirror_gre_lib.sh b/tools/testing/selftests/net/forwarding/mirror_gre_lib.sh
new file mode 100644
index 0000000..92ef6dd
--- /dev/null
+++ b/tools/testing/selftests/net/forwarding/mirror_gre_lib.sh
@@ -0,0 +1,98 @@
+# SPDX-License-Identifier: GPL-2.0
+
+source mirror_lib.sh
+
+quick_test_span_gre_dir_ips()
+{
+ local tundev=$1; shift
+
+ do_test_span_dir_ips 10 h3-$tundev "$@"
+}
+
+fail_test_span_gre_dir_ips()
+{
+ local tundev=$1; shift
+
+ do_test_span_dir_ips 0 h3-$tundev "$@"
+}
+
+test_span_gre_dir_ips()
+{
+ local tundev=$1; shift
+
+ test_span_dir_ips h3-$tundev "$@"
+}
+
+full_test_span_gre_dir_ips()
+{
+ local tundev=$1; shift
+ local direction=$1; shift
+ local forward_type=$1; shift
+ local backward_type=$1; shift
+ local what=$1; shift
+ local ip1=$1; shift
+ local ip2=$1; shift
+
+ RET=0
+
+ mirror_install $swp1 $direction $tundev "matchall $tcflags"
+ test_span_dir_ips "h3-$tundev" "$direction" "$forward_type" \
+ "$backward_type" "$ip1" "$ip2"
+ mirror_uninstall $swp1 $direction
+
+ log_test "$direction $what ($tcflags)"
+}
+
+full_test_span_gre_dir_vlan_ips()
+{
+ local tundev=$1; shift
+ local direction=$1; shift
+ local vlan_match=$1; shift
+ local forward_type=$1; shift
+ local backward_type=$1; shift
+ local what=$1; shift
+ local ip1=$1; shift
+ local ip2=$1; shift
+
+ RET=0
+
+ mirror_install $swp1 $direction $tundev "matchall $tcflags"
+
+ test_span_dir_ips "h3-$tundev" "$direction" "$forward_type" \
+ "$backward_type" "$ip1" "$ip2"
+
+ tc filter add dev $h3 ingress pref 77 prot 802.1q \
+ flower $vlan_match ip_proto 0x2f \
+ action pass
+ mirror_test v$h1 $ip1 $ip2 $h3 77 10
+ tc filter del dev $h3 ingress pref 77
+
+ mirror_uninstall $swp1 $direction
+
+ log_test "$direction $what ($tcflags)"
+}
+
+quick_test_span_gre_dir()
+{
+ quick_test_span_gre_dir_ips "$@" 192.0.2.1 192.0.2.2
+}
+
+fail_test_span_gre_dir()
+{
+ fail_test_span_gre_dir_ips "$@" 192.0.2.1 192.0.2.2
+}
+
+test_span_gre_dir()
+{
+ test_span_gre_dir_ips "$@" 192.0.2.1 192.0.2.2
+}
+
+full_test_span_gre_dir()
+{
+ full_test_span_gre_dir_ips "$@" 192.0.2.1 192.0.2.2
+}
+
+full_test_span_gre_dir_vlan()
+{
+ full_test_span_gre_dir_vlan_ips "$@" 192.0.2.1 192.0.2.2
+}
diff --git a/tools/testing/selftests/net/forwarding/mirror_gre_neigh.sh b/tools/testing/selftests/net/forwarding/mirror_gre_neigh.sh
new file mode 100755
index 0000000..fc0508e
--- /dev/null
+++ b/tools/testing/selftests/net/forwarding/mirror_gre_neigh.sh
@@ -0,0 +1,115 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+
+# This test uses standard topology for testing gretap. See
+# mirror_gre_topo_lib.sh for more details.
+#
+# Test for mirroring to gretap and ip6gretap, such that the neighbor entry for
+# the tunnel remote address has invalid address at the time that the mirroring
+# is set up. Later on, the neighbor is deleted and it is expected to be
+# reinitialized using the usual ARP process, and the mirroring offload updated.
+
+ALL_TESTS="
+ test_gretap
+ test_ip6gretap
+"
+
+NUM_NETIFS=6
+source lib.sh
+source mirror_lib.sh
+source mirror_gre_lib.sh
+source mirror_gre_topo_lib.sh
+
+setup_prepare()
+{
+ h1=${NETIFS[p1]}
+ swp1=${NETIFS[p2]}
+
+ swp2=${NETIFS[p3]}
+ h2=${NETIFS[p4]}
+
+ swp3=${NETIFS[p5]}
+ h3=${NETIFS[p6]}
+
+ vrf_prepare
+ mirror_gre_topo_create
+
+ ip address add dev $swp3 192.0.2.129/28
+ ip address add dev $h3 192.0.2.130/28
+
+ ip address add dev $swp3 2001:db8:2::1/64
+ ip address add dev $h3 2001:db8:2::2/64
+}
+
+cleanup()
+{
+ pre_cleanup
+
+ ip address del dev $h3 2001:db8:2::2/64
+ ip address del dev $swp3 2001:db8:2::1/64
+
+ ip address del dev $h3 192.0.2.130/28
+ ip address del dev $swp3 192.0.2.129/28
+
+ mirror_gre_topo_destroy
+ vrf_cleanup
+}
+
+test_span_gre_neigh()
+{
+ local addr=$1; shift
+ local tundev=$1; shift
+ local direction=$1; shift
+ local what=$1; shift
+
+ RET=0
+
+ ip neigh replace dev $swp3 $addr lladdr 00:11:22:33:44:55
+ mirror_install $swp1 $direction $tundev "matchall $tcflags"
+ fail_test_span_gre_dir $tundev ingress
+ ip neigh del dev $swp3 $addr
+ quick_test_span_gre_dir $tundev ingress
+ mirror_uninstall $swp1 $direction
+
+ log_test "$direction $what: neighbor change ($tcflags)"
+}
+
+test_gretap()
+{
+ test_span_gre_neigh 192.0.2.130 gt4 ingress "mirror to gretap"
+ test_span_gre_neigh 192.0.2.130 gt4 egress "mirror to gretap"
+}
+
+test_ip6gretap()
+{
+ test_span_gre_neigh 2001:db8:2::2 gt6 ingress "mirror to ip6gretap"
+ test_span_gre_neigh 2001:db8:2::2 gt6 egress "mirror to ip6gretap"
+}
+
+test_all()
+{
+ slow_path_trap_install $swp1 ingress
+ slow_path_trap_install $swp1 egress
+
+ tests_run
+
+ slow_path_trap_uninstall $swp1 egress
+ slow_path_trap_uninstall $swp1 ingress
+}
+
+trap cleanup EXIT
+
+setup_prepare
+setup_wait
+
+tcflags="skip_hw"
+test_all
+
+if ! tc_offload_check; then
+ echo "WARN: Could not test offloaded functionality"
+else
+ tcflags="skip_sw"
+ test_all
+fi
+
+exit $EXIT_STATUS
diff --git a/tools/testing/selftests/net/forwarding/mirror_gre_nh.sh b/tools/testing/selftests/net/forwarding/mirror_gre_nh.sh
new file mode 100755
index 0000000..8fa681e
--- /dev/null
+++ b/tools/testing/selftests/net/forwarding/mirror_gre_nh.sh
@@ -0,0 +1,127 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+
+# This test uses standard topology for testing gretap. See
+# mirror_gre_topo_lib.sh for more details.
+#
+# Test that gretap and ip6gretap mirroring works when the other tunnel endpoint
+# is reachable through a next-hop route (as opposed to directly-attached route).
+
+ALL_TESTS="
+ test_gretap
+ test_ip6gretap
+"
+
+NUM_NETIFS=6
+source lib.sh
+source mirror_lib.sh
+source mirror_gre_lib.sh
+source mirror_gre_topo_lib.sh
+
+setup_prepare()
+{
+ h1=${NETIFS[p1]}
+ swp1=${NETIFS[p2]}
+
+ swp2=${NETIFS[p3]}
+ h2=${NETIFS[p4]}
+
+ swp3=${NETIFS[p5]}
+ h3=${NETIFS[p6]}
+
+ sysctl_set net.ipv4.conf.all.rp_filter 0
+ sysctl_set net.ipv4.conf.$h3.rp_filter 0
+
+ vrf_prepare
+ mirror_gre_topo_create
+
+ ip address add dev $swp3 192.0.2.161/28
+ ip address add dev $h3 192.0.2.162/28
+ ip address add dev gt4 192.0.2.129/32
+ ip address add dev h3-gt4 192.0.2.130/32
+
+ # IPv6 route can't be added after address. Such routes are rejected due
+ # to the gateway address having been configured on the local system. It
+ # works the other way around though.
+ ip address add dev $swp3 2001:db8:4::1/64
+ ip -6 route add 2001:db8:2::2/128 via 2001:db8:4::2
+ ip address add dev $h3 2001:db8:4::2/64
+ ip address add dev gt6 2001:db8:2::1
+ ip address add dev h3-gt6 2001:db8:2::2
+}
+
+cleanup()
+{
+ pre_cleanup
+
+ ip -6 route del 2001:db8:2::2/128 via 2001:db8:4::2
+ ip address del dev $h3 2001:db8:4::2/64
+ ip address del dev $swp3 2001:db8:4::1/64
+
+ ip address del dev $h3 192.0.2.162/28
+ ip address del dev $swp3 192.0.2.161/28
+
+ mirror_gre_topo_destroy
+ vrf_cleanup
+
+ sysctl_restore net.ipv4.conf.$h3.rp_filter
+ sysctl_restore net.ipv4.conf.all.rp_filter
+}
+
+test_gretap()
+{
+ RET=0
+ mirror_install $swp1 ingress gt4 "matchall $tcflags"
+
+ # For IPv4, test that there's no mirroring without the route directing
+ # the traffic to tunnel remote address. Then add it and test that
+ # mirroring starts. For IPv6 we can't test this due to the limitation
+ # that routes for locally-specified IPv6 addresses can't be added.
+ fail_test_span_gre_dir gt4 ingress
+
+ ip route add 192.0.2.130/32 via 192.0.2.162
+ quick_test_span_gre_dir gt4 ingress
+ ip route del 192.0.2.130/32 via 192.0.2.162
+
+ mirror_uninstall $swp1 ingress
+ log_test "mirror to gre with next-hop remote ($tcflags)"
+}
+
+test_ip6gretap()
+{
+ RET=0
+
+ mirror_install $swp1 ingress gt6 "matchall $tcflags"
+ quick_test_span_gre_dir gt6 ingress
+ mirror_uninstall $swp1 ingress
+
+ log_test "mirror to ip6gre with next-hop remote ($tcflags)"
+}
+
+test_all()
+{
+ slow_path_trap_install $swp1 ingress
+ slow_path_trap_install $swp1 egress
+
+ tests_run
+
+ slow_path_trap_uninstall $swp1 egress
+ slow_path_trap_uninstall $swp1 ingress
+}
+
+trap cleanup EXIT
+
+setup_prepare
+setup_wait
+
+tcflags="skip_hw"
+test_all
+
+if ! tc_offload_check; then
+ echo "WARN: Could not test offloaded functionality"
+else
+ tcflags="skip_sw"
+ test_all
+fi
+
+exit $EXIT_STATUS
diff --git a/tools/testing/selftests/net/forwarding/mirror_gre_topo_lib.sh b/tools/testing/selftests/net/forwarding/mirror_gre_topo_lib.sh
new file mode 100644
index 0000000..2534195
--- /dev/null
+++ b/tools/testing/selftests/net/forwarding/mirror_gre_topo_lib.sh
@@ -0,0 +1,94 @@
+# SPDX-License-Identifier: GPL-2.0
+
+# This is the standard topology for testing mirroring to gretap and ip6gretap
+# netdevices. The tests that use it tweak it in one way or another--importantly,
+# $swp3 and $h3 need to have addresses set up.
+#
+# +---------------------+ +---------------------+
+# | H1 | | H2 |
+# | + $h1 | | $h2 + |
+# | | 192.0.2.1/28 | | 192.0.2.2/28 | |
+# +-----|---------------+ +---------------|-----+
+# | |
+# +-----|-------------------------------------------------------------|-----+
+# | SW o--> mirror | |
+# | +---|-------------------------------------------------------------|---+ |
+# | | + $swp1 BR $swp2 + | |
+# | +---------------------------------------------------------------------+ |
+# | |
+# | + $swp3 + gt6 (ip6gretap) + gt4 (gretap) |
+# | | : loc=2001:db8:2::1 : loc=192.0.2.129 |
+# | | : rem=2001:db8:2::2 : rem=192.0.2.130 |
+# | | : ttl=100 : ttl=100 |
+# | | : tos=inherit : tos=inherit |
+# | | : : |
+# +-----|---------------------:----------------------:----------------------+
+# | : :
+# +-----|---------------------:----------------------:----------------------+
+# | H3 + $h3 + h3-gt6 (ip6gretap) + h3-gt4 (gretap) |
+# | loc=2001:db8:2::2 loc=192.0.2.130 |
+# | rem=2001:db8:2::1 rem=192.0.2.129 |
+# | ttl=100 ttl=100 |
+# | tos=inherit tos=inherit |
+# | |
+# +-------------------------------------------------------------------------+
+
+source mirror_topo_lib.sh
+
+mirror_gre_topo_h3_create()
+{
+ mirror_topo_h3_create
+
+ tunnel_create h3-gt4 gretap 192.0.2.130 192.0.2.129
+ ip link set h3-gt4 vrf v$h3
+ matchall_sink_create h3-gt4
+
+ tunnel_create h3-gt6 ip6gretap 2001:db8:2::2 2001:db8:2::1
+ ip link set h3-gt6 vrf v$h3
+ matchall_sink_create h3-gt6
+}
+
+mirror_gre_topo_h3_destroy()
+{
+ tunnel_destroy h3-gt6
+ tunnel_destroy h3-gt4
+
+ mirror_topo_h3_destroy
+}
+
+mirror_gre_topo_switch_create()
+{
+ mirror_topo_switch_create
+
+ tunnel_create gt4 gretap 192.0.2.129 192.0.2.130 \
+ ttl 100 tos inherit
+
+ tunnel_create gt6 ip6gretap 2001:db8:2::1 2001:db8:2::2 \
+ ttl 100 tos inherit allow-localremote
+}
+
+mirror_gre_topo_switch_destroy()
+{
+ tunnel_destroy gt6
+ tunnel_destroy gt4
+
+ mirror_topo_switch_destroy
+}
+
+mirror_gre_topo_create()
+{
+ mirror_topo_h1_create
+ mirror_topo_h2_create
+ mirror_gre_topo_h3_create
+
+ mirror_gre_topo_switch_create
+}
+
+mirror_gre_topo_destroy()
+{
+ mirror_gre_topo_switch_destroy
+
+ mirror_gre_topo_h3_destroy
+ mirror_topo_h2_destroy
+ mirror_topo_h1_destroy
+}
diff --git a/tools/testing/selftests/net/forwarding/mirror_gre_vlan.sh b/tools/testing/selftests/net/forwarding/mirror_gre_vlan.sh
new file mode 100755
index 0000000..88cecdb
--- /dev/null
+++ b/tools/testing/selftests/net/forwarding/mirror_gre_vlan.sh
@@ -0,0 +1,92 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+
+# This test uses standard topology for testing gretap. See
+# mirror_gre_topo_lib.sh for more details.
+#
+# Test for "tc action mirred egress mirror" that mirrors to a gretap netdevice
+# whose underlay route points at a vlan device.
+
+ALL_TESTS="
+ test_gretap
+"
+
+NUM_NETIFS=6
+source lib.sh
+source mirror_lib.sh
+source mirror_gre_lib.sh
+source mirror_gre_topo_lib.sh
+
+setup_prepare()
+{
+ h1=${NETIFS[p1]}
+ swp1=${NETIFS[p2]}
+
+ swp2=${NETIFS[p3]}
+ h2=${NETIFS[p4]}
+
+ swp3=${NETIFS[p5]}
+ h3=${NETIFS[p6]}
+
+ vrf_prepare
+ mirror_gre_topo_create
+
+ ip link add name $swp3.555 link $swp3 type vlan id 555
+ ip address add dev $swp3.555 192.0.2.129/32
+ ip address add dev $swp3.555 2001:db8:2::1/128
+ ip link set dev $swp3.555 up
+
+ ip route add 192.0.2.130/32 dev $swp3.555
+ ip -6 route add 2001:db8:2::2/128 dev $swp3.555
+
+ ip link add name $h3.555 link $h3 type vlan id 555
+ ip link set dev $h3.555 master v$h3
+ ip address add dev $h3.555 192.0.2.130/28
+ ip address add dev $h3.555 2001:db8:2::2/64
+ ip link set dev $h3.555 up
+}
+
+cleanup()
+{
+ pre_cleanup
+
+ ip link del dev $h3.555
+ ip link del dev $swp3.555
+
+ mirror_gre_topo_destroy
+ vrf_cleanup
+}
+
+test_gretap()
+{
+ full_test_span_gre_dir gt4 ingress 8 0 "mirror to gretap"
+ full_test_span_gre_dir gt4 egress 0 8 "mirror to gretap"
+}
+
+test_all()
+{
+ slow_path_trap_install $swp1 ingress
+ slow_path_trap_install $swp1 egress
+
+ tests_run
+
+ slow_path_trap_uninstall $swp1 egress
+ slow_path_trap_uninstall $swp1 ingress
+}
+
+trap cleanup EXIT
+
+setup_prepare
+setup_wait
+
+tcflags="skip_hw"
+test_all
+
+if ! tc_offload_check; then
+ echo "WARN: Could not test offloaded functionality"
+else
+ tcflags="skip_sw"
+ test_all
+fi
+
+exit $EXIT_STATUS
diff --git a/tools/testing/selftests/net/forwarding/mirror_gre_vlan_bridge_1q.sh b/tools/testing/selftests/net/forwarding/mirror_gre_vlan_bridge_1q.sh
new file mode 100755
index 0000000..01ec28a
--- /dev/null
+++ b/tools/testing/selftests/net/forwarding/mirror_gre_vlan_bridge_1q.sh
@@ -0,0 +1,140 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+
+# This test uses standard topology for testing gretap. See
+# mirror_gre_topo_lib.sh for more details.
+#
+# Test for "tc action mirred egress mirror" when the underlay route points at a
+# vlan device on top of a bridge device with vlan filtering (802.1q).
+
+ALL_TESTS="
+ test_gretap
+ test_ip6gretap
+ test_gretap_forbidden
+ test_ip6gretap_forbidden
+"
+
+NUM_NETIFS=6
+source lib.sh
+source mirror_lib.sh
+source mirror_gre_lib.sh
+source mirror_gre_topo_lib.sh
+
+setup_prepare()
+{
+ h1=${NETIFS[p1]}
+ swp1=${NETIFS[p2]}
+
+ swp2=${NETIFS[p3]}
+ h2=${NETIFS[p4]}
+
+ swp3=${NETIFS[p5]}
+ h3=${NETIFS[p6]}
+
+ vrf_prepare
+ mirror_gre_topo_create
+
+ vlan_create br1 555 "" 192.0.2.129/32 2001:db8:2::1/128
+ bridge vlan add dev br1 vid 555 self
+ ip route rep 192.0.2.130/32 dev br1.555
+ ip -6 route rep 2001:db8:2::2/128 dev br1.555
+
+ vlan_create $h3 555 v$h3 192.0.2.130/28 2001:db8:2::2/64
+
+ ip link set dev $swp3 master br1
+ bridge vlan add dev $swp3 vid 555
+}
+
+cleanup()
+{
+ pre_cleanup
+
+ ip link set dev $swp3 nomaster
+ vlan_destroy $h3 555
+ vlan_destroy br1 555
+
+ mirror_gre_topo_destroy
+ vrf_cleanup
+}
+
+test_vlan_match()
+{
+ local tundev=$1; shift
+ local vlan_match=$1; shift
+ local what=$1; shift
+
+ full_test_span_gre_dir_vlan $tundev ingress "$vlan_match" 8 0 "$what"
+ full_test_span_gre_dir_vlan $tundev egress "$vlan_match" 0 8 "$what"
+}
+
+test_gretap()
+{
+ test_vlan_match gt4 'vlan_id 555 vlan_ethtype ip' "mirror to gretap"
+}
+
+test_ip6gretap()
+{
+ test_vlan_match gt6 'vlan_id 555 vlan_ethtype ipv6' "mirror to ip6gretap"
+}
+
+test_span_gre_forbidden()
+{
+ local tundev=$1; shift
+ local what=$1; shift
+
+ RET=0
+
+ # Run the pass-test first, to prime neighbor table.
+ mirror_install $swp1 ingress $tundev "matchall $tcflags"
+ quick_test_span_gre_dir $tundev ingress
+
+ # Now forbid the VLAN at the bridge and see it fail.
+ bridge vlan del dev br1 vid 555 self
+ sleep 1
+
+ fail_test_span_gre_dir $tundev ingress
+ mirror_uninstall $swp1 ingress
+
+ bridge vlan add dev br1 vid 555 self
+ sleep 1
+
+ log_test "$what: vlan forbidden at a bridge ($tcflags)"
+}
+
+test_gretap_forbidden()
+{
+ test_span_gre_forbidden gt4 "mirror to gretap"
+}
+
+test_ip6gretap_forbidden()
+{
+ test_span_gre_forbidden gt4 "mirror to ip6gretap"
+}
+
+test_all()
+{
+ slow_path_trap_install $swp1 ingress
+ slow_path_trap_install $swp1 egress
+
+ tests_run
+
+ slow_path_trap_uninstall $swp1 egress
+ slow_path_trap_uninstall $swp1 ingress
+}
+
+trap cleanup EXIT
+
+setup_prepare
+setup_wait
+
+tcflags="skip_hw"
+test_all
+
+if ! tc_offload_check; then
+ echo "WARN: Could not test offloaded functionality"
+else
+ tcflags="skip_sw"
+ test_all
+fi
+
+exit $EXIT_STATUS
diff --git a/tools/testing/selftests/net/forwarding/mirror_lib.sh b/tools/testing/selftests/net/forwarding/mirror_lib.sh
new file mode 100644
index 0000000..04cbc38
--- /dev/null
+++ b/tools/testing/selftests/net/forwarding/mirror_lib.sh
@@ -0,0 +1,94 @@
+# SPDX-License-Identifier: GPL-2.0
+
+mirror_install()
+{
+ local from_dev=$1; shift
+ local direction=$1; shift
+ local to_dev=$1; shift
+ local filter=$1; shift
+
+ tc filter add dev $from_dev $direction \
+ pref 1000 $filter \
+ action mirred egress mirror dev $to_dev
+}
+
+mirror_uninstall()
+{
+ local from_dev=$1; shift
+ local direction=$1; shift
+
+ tc filter del dev $swp1 $direction pref 1000
+}
+
+mirror_test()
+{
+ local vrf_name=$1; shift
+ local sip=$1; shift
+ local dip=$1; shift
+ local dev=$1; shift
+ local pref=$1; shift
+ local expect=$1; shift
+
+ local t0=$(tc_rule_stats_get $dev $pref)
+ ip vrf exec $vrf_name \
+ ${PING} ${sip:+-I $sip} $dip -c 10 -i 0.1 -w 2 &> /dev/null
+ local t1=$(tc_rule_stats_get $dev $pref)
+ local delta=$((t1 - t0))
+ # Tolerate a couple stray extra packets.
+ ((expect <= delta && delta <= expect + 2))
+ check_err $? "Expected to capture $expect packets, got $delta."
+}
+
+do_test_span_dir_ips()
+{
+ local expect=$1; shift
+ local dev=$1; shift
+ local direction=$1; shift
+ local ip1=$1; shift
+ local ip2=$1; shift
+
+ icmp_capture_install $dev
+ mirror_test v$h1 $ip1 $ip2 $dev 100 $expect
+ mirror_test v$h2 $ip2 $ip1 $dev 100 $expect
+ icmp_capture_uninstall $dev
+}
+
+quick_test_span_dir_ips()
+{
+ do_test_span_dir_ips 10 "$@"
+}
+
+fail_test_span_dir_ips()
+{
+ do_test_span_dir_ips 0 "$@"
+}
+
+test_span_dir_ips()
+{
+ local dev=$1; shift
+ local direction=$1; shift
+ local forward_type=$1; shift
+ local backward_type=$1; shift
+ local ip1=$1; shift
+ local ip2=$1; shift
+
+ quick_test_span_dir_ips "$dev" "$direction" "$ip1" "$ip2"
+
+ icmp_capture_install $dev "type $forward_type"
+ mirror_test v$h1 $ip1 $ip2 $dev 100 10
+ icmp_capture_uninstall $dev
+
+ icmp_capture_install $dev "type $backward_type"
+ mirror_test v$h2 $ip2 $ip1 $dev 100 10
+ icmp_capture_uninstall $dev
+}
+
+fail_test_span_dir()
+{
+ fail_test_span_dir_ips "$@" 192.0.2.1 192.0.2.2
+}
+
+test_span_dir()
+{
+ test_span_dir_ips "$@" 192.0.2.1 192.0.2.2
+}
diff --git a/tools/testing/selftests/net/forwarding/mirror_topo_lib.sh b/tools/testing/selftests/net/forwarding/mirror_topo_lib.sh
new file mode 100644
index 0000000..04979e5
--- /dev/null
+++ b/tools/testing/selftests/net/forwarding/mirror_topo_lib.sh
@@ -0,0 +1,101 @@
+# SPDX-License-Identifier: GPL-2.0
+
+# This is the standard topology for testing mirroring. The tests that use it
+# tweak it in one way or another--typically add more devices to the topology.
+#
+# +---------------------+ +---------------------+
+# | H1 | | H2 |
+# | + $h1 | | $h2 + |
+# | | 192.0.2.1/28 | | 192.0.2.2/28 | |
+# +-----|---------------+ +---------------|-----+
+# | |
+# +-----|-------------------------------------------------------------|-----+
+# | SW o--> mirror | |
+# | +---|-------------------------------------------------------------|---+ |
+# | | + $swp1 BR $swp2 + | |
+# | +---------------------------------------------------------------------+ |
+# | |
+# | + $swp3 |
+# +-----|-------------------------------------------------------------------+
+# |
+# +-----|-------------------------------------------------------------------+
+# | H3 + $h3 |
+# | |
+# +-------------------------------------------------------------------------+
+
+mirror_topo_h1_create()
+{
+ simple_if_init $h1 192.0.2.1/28
+}
+
+mirror_topo_h1_destroy()
+{
+ simple_if_fini $h1 192.0.2.1/28
+}
+
+mirror_topo_h2_create()
+{
+ simple_if_init $h2 192.0.2.2/28
+}
+
+mirror_topo_h2_destroy()
+{
+ simple_if_fini $h2 192.0.2.2/28
+}
+
+mirror_topo_h3_create()
+{
+ simple_if_init $h3
+ tc qdisc add dev $h3 clsact
+}
+
+mirror_topo_h3_destroy()
+{
+ tc qdisc del dev $h3 clsact
+ simple_if_fini $h3
+}
+
+mirror_topo_switch_create()
+{
+ ip link set dev $swp3 up
+
+ ip link add name br1 type bridge vlan_filtering 1
+ ip link set dev br1 up
+
+ ip link set dev $swp1 master br1
+ ip link set dev $swp1 up
+
+ ip link set dev $swp2 master br1
+ ip link set dev $swp2 up
+
+ tc qdisc add dev $swp1 clsact
+}
+
+mirror_topo_switch_destroy()
+{
+ tc qdisc del dev $swp1 clsact
+
+ ip link set dev $swp1 down
+ ip link set dev $swp2 down
+ ip link del dev br1
+
+ ip link set dev $swp3 down
+}
+
+mirror_topo_create()
+{
+ mirror_topo_h1_create
+ mirror_topo_h2_create
+ mirror_topo_h3_create
+
+ mirror_topo_switch_create
+}
+
+mirror_topo_destroy()
+{
+ mirror_topo_switch_destroy
+
+ mirror_topo_h3_destroy
+ mirror_topo_h2_destroy
+ mirror_topo_h1_destroy
+}
diff --git a/tools/testing/selftests/net/forwarding/mirror_vlan.sh b/tools/testing/selftests/net/forwarding/mirror_vlan.sh
new file mode 100755
index 0000000..1e10520
--- /dev/null
+++ b/tools/testing/selftests/net/forwarding/mirror_vlan.sh
@@ -0,0 +1,169 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+
+# This test uses standard topology for testing mirroring. See mirror_topo_lib.sh
+# for more details.
+#
+# Test for "tc action mirred egress mirror" that mirrors to a vlan device.
+
+ALL_TESTS="
+ test_vlan
+ test_tagged_vlan
+"
+
+NUM_NETIFS=6
+source lib.sh
+source mirror_lib.sh
+source mirror_topo_lib.sh
+
+setup_prepare()
+{
+ h1=${NETIFS[p1]}
+ swp1=${NETIFS[p2]}
+
+ swp2=${NETIFS[p3]}
+ h2=${NETIFS[p4]}
+
+ swp3=${NETIFS[p5]}
+ h3=${NETIFS[p6]}
+
+ vrf_prepare
+ mirror_topo_create
+
+ vlan_create $swp3 555
+
+ vlan_create $h3 555 v$h3
+ matchall_sink_create $h3.555
+
+ vlan_create $h1 111 v$h1 192.0.2.17/28
+ bridge vlan add dev $swp1 vid 111
+
+ vlan_create $h2 111 v$h2 192.0.2.18/28
+ bridge vlan add dev $swp2 vid 111
+}
+
+cleanup()
+{
+ pre_cleanup
+
+ vlan_destroy $h2 111
+ vlan_destroy $h1 111
+ vlan_destroy $h3 555
+ vlan_destroy $swp3 555
+
+ mirror_topo_destroy
+ vrf_cleanup
+}
+
+test_vlan_dir()
+{
+ local direction=$1; shift
+ local forward_type=$1; shift
+ local backward_type=$1; shift
+
+ RET=0
+
+ mirror_install $swp1 $direction $swp3.555 "matchall $tcflags"
+ test_span_dir "$h3.555" "$direction" "$forward_type" "$backward_type"
+ mirror_uninstall $swp1 $direction
+
+ log_test "$direction mirror to vlan ($tcflags)"
+}
+
+test_vlan()
+{
+ test_vlan_dir ingress 8 0
+ test_vlan_dir egress 0 8
+}
+
+vlan_capture_add_del()
+{
+ local add_del=$1; shift
+ local pref=$1; shift
+ local dev=$1; shift
+ local filter=$1; shift
+
+ tc filter $add_del dev "$dev" ingress \
+ proto 802.1q pref $pref \
+ flower $filter \
+ action pass
+}
+
+vlan_capture_install()
+{
+ vlan_capture_add_del add 100 "$@"
+}
+
+vlan_capture_uninstall()
+{
+ vlan_capture_add_del del 100 "$@"
+}
+
+do_test_span_vlan_dir_ips()
+{
+ local expect=$1; shift
+ local dev=$1; shift
+ local vid=$1; shift
+ local direction=$1; shift
+ local ip1=$1; shift
+ local ip2=$1; shift
+
+ vlan_capture_install $dev "vlan_id $vid"
+ mirror_test v$h1 $ip1 $ip2 $dev 100 $expect
+ mirror_test v$h2 $ip2 $ip1 $dev 100 $expect
+ vlan_capture_uninstall $dev
+}
+
+test_tagged_vlan_dir()
+{
+ local direction=$1; shift
+ local forward_type=$1; shift
+ local backward_type=$1; shift
+
+ RET=0
+
+ mirror_install $swp1 $direction $swp3.555 "matchall $tcflags"
+ do_test_span_vlan_dir_ips 10 "$h3.555" 111 "$direction" \
+ 192.0.2.17 192.0.2.18
+ do_test_span_vlan_dir_ips 0 "$h3.555" 555 "$direction" \
+ 192.0.2.17 192.0.2.18
+ mirror_uninstall $swp1 $direction
+
+ log_test "$direction mirror to vlan ($tcflags)"
+}
+
+test_tagged_vlan()
+{
+ test_tagged_vlan_dir ingress 8 0
+ test_tagged_vlan_dir egress 0 8
+}
+
+test_all()
+{
+ slow_path_trap_install $swp1 ingress
+ slow_path_trap_install $swp1 egress
+ trap_install $h3 ingress
+
+ tests_run
+
+ trap_install $h3 ingress
+ slow_path_trap_uninstall $swp1 egress
+ slow_path_trap_uninstall $swp1 ingress
+}
+
+trap cleanup EXIT
+
+setup_prepare
+setup_wait
+
+tcflags="skip_hw"
+test_all
+
+if ! tc_offload_check; then
+ echo "WARN: Could not test offloaded functionality"
+else
+ tcflags="skip_sw"
+ test_all
+fi
+
+exit $EXIT_STATUS
diff --git a/tools/testing/selftests/net/forwarding/router.sh b/tools/testing/selftests/net/forwarding/router.sh
index cc6a14a..a75cb51 100755
--- a/tools/testing/selftests/net/forwarding/router.sh
+++ b/tools/testing/selftests/net/forwarding/router.sh
@@ -1,6 +1,7 @@
#!/bin/bash
# SPDX-License-Identifier: GPL-2.0
+ALL_TESTS="ping_ipv4 ping_ipv6"
NUM_NETIFS=4
source lib.sh
@@ -114,12 +115,21 @@
vrf_cleanup
}
+ping_ipv4()
+{
+ ping_test $h1 198.51.100.2
+}
+
+ping_ipv6()
+{
+ ping6_test $h1 2001:db8:2::2
+}
+
trap cleanup EXIT
setup_prepare
setup_wait
-ping_test $h1 198.51.100.2
-ping6_test $h1 2001:db8:2::2
+tests_run
exit $EXIT_STATUS
diff --git a/tools/testing/selftests/net/forwarding/router_multipath.sh b/tools/testing/selftests/net/forwarding/router_multipath.sh
index 3bc3510..8b6d0fb6 100755
--- a/tools/testing/selftests/net/forwarding/router_multipath.sh
+++ b/tools/testing/selftests/net/forwarding/router_multipath.sh
@@ -1,6 +1,7 @@
#!/bin/bash
# SPDX-License-Identifier: GPL-2.0
+ALL_TESTS="ping_ipv4 ping_ipv6 multipath_test"
NUM_NETIFS=8
source lib.sh
@@ -191,7 +192,7 @@
diff=$(echo $weights_ratio - $packets_ratio | bc -l)
diff=${diff#-}
- test "$(echo "$diff / $weights_ratio > 0.1" | bc -l)" -eq 0
+ test "$(echo "$diff / $weights_ratio > 0.15" | bc -l)" -eq 0
check_err $? "Too large discrepancy between expected and measured ratios"
log_test "$desc"
log_info "Expected ratio $weights_ratio Measured ratio $packets_ratio"
@@ -204,13 +205,11 @@
local weight_rp13=$3
local t0_rp12 t0_rp13 t1_rp12 t1_rp13
local packets_rp12 packets_rp13
- local hash_policy
# Transmit multiple flows from h1 to h2 and make sure they are
# distributed between both multipath links (rp12 and rp13)
# according to the configured weights.
- hash_policy=$(sysctl -n net.ipv4.fib_multipath_hash_policy)
- sysctl -q -w net.ipv4.fib_multipath_hash_policy=1
+ sysctl_set net.ipv4.fib_multipath_hash_policy 1
ip route replace 198.51.100.0/24 vrf vrf-r1 \
nexthop via 169.254.2.22 dev $rp12 weight $weight_rp12 \
nexthop via 169.254.3.23 dev $rp13 weight $weight_rp13
@@ -232,7 +231,7 @@
ip route replace 198.51.100.0/24 vrf vrf-r1 \
nexthop via 169.254.2.22 dev $rp12 \
nexthop via 169.254.3.23 dev $rp13
- sysctl -q -w net.ipv4.fib_multipath_hash_policy=$hash_policy
+ sysctl_restore net.ipv4.fib_multipath_hash_policy
}
multipath6_l4_test()
@@ -242,13 +241,11 @@
local weight_rp13=$3
local t0_rp12 t0_rp13 t1_rp12 t1_rp13
local packets_rp12 packets_rp13
- local hash_policy
# Transmit multiple flows from h1 to h2 and make sure they are
# distributed between both multipath links (rp12 and rp13)
# according to the configured weights.
- hash_policy=$(sysctl -n net.ipv6.fib_multipath_hash_policy)
- sysctl -q -w net.ipv6.fib_multipath_hash_policy=1
+ sysctl_set net.ipv6.fib_multipath_hash_policy 1
ip route replace 2001:db8:2::/64 vrf vrf-r1 \
nexthop via fe80:2::22 dev $rp12 weight $weight_rp12 \
@@ -271,7 +268,7 @@
nexthop via fe80:2::22 dev $rp12 \
nexthop via fe80:3::23 dev $rp13
- sysctl -q -w net.ipv6.fib_multipath_hash_policy=$hash_policy
+ sysctl_restore net.ipv6.fib_multipath_hash_policy
}
multipath6_test()
@@ -364,13 +361,21 @@
vrf_cleanup
}
+ping_ipv4()
+{
+ ping_test $h1 198.51.100.2
+}
+
+ping_ipv6()
+{
+ ping6_test $h1 2001:db8:2::2
+}
+
trap cleanup EXIT
setup_prepare
setup_wait
-ping_test $h1 198.51.100.2
-ping6_test $h1 2001:db8:2::2
-multipath_test
+tests_run
exit $EXIT_STATUS
diff --git a/tools/testing/selftests/net/forwarding/tc_actions.sh b/tools/testing/selftests/net/forwarding/tc_actions.sh
index 3a6385e..813d02d 100755
--- a/tools/testing/selftests/net/forwarding/tc_actions.sh
+++ b/tools/testing/selftests/net/forwarding/tc_actions.sh
@@ -1,6 +1,8 @@
#!/bin/bash
# SPDX-License-Identifier: GPL-2.0
+ALL_TESTS="gact_drop_and_ok_test mirred_egress_redirect_test \
+ mirred_egress_mirror_test gact_trap_test"
NUM_NETIFS=4
source tc_common.sh
source lib.sh
@@ -111,6 +113,10 @@
{
RET=0
+ if [[ "$tcflags" != "skip_sw" ]]; then
+ return 0;
+ fi
+
tc filter add dev $swp1 ingress protocol ip pref 1 handle 101 flower \
skip_hw dst_ip 192.0.2.2 action drop
tc filter add dev $swp1 ingress protocol ip pref 3 handle 103 flower \
@@ -179,24 +185,29 @@
ip link set $swp1 address $swp1origmac
}
+mirred_egress_redirect_test()
+{
+ mirred_egress_test "redirect"
+}
+
+mirred_egress_mirror_test()
+{
+ mirred_egress_test "mirror"
+}
+
trap cleanup EXIT
setup_prepare
setup_wait
-gact_drop_and_ok_test
-mirred_egress_test "redirect"
-mirred_egress_test "mirror"
+tests_run
tc_offload_check
if [[ $? -ne 0 ]]; then
log_info "Could not test offloaded functionality"
else
tcflags="skip_sw"
- gact_drop_and_ok_test
- mirred_egress_test "redirect"
- mirred_egress_test "mirror"
- gact_trap_test
+ tests_run
fi
exit $EXIT_STATUS
diff --git a/tools/testing/selftests/net/forwarding/tc_chains.sh b/tools/testing/selftests/net/forwarding/tc_chains.sh
index 2fd1522..d2c783e 100755
--- a/tools/testing/selftests/net/forwarding/tc_chains.sh
+++ b/tools/testing/selftests/net/forwarding/tc_chains.sh
@@ -1,6 +1,7 @@
#!/bin/bash
# SPDX-License-Identifier: GPL-2.0
+ALL_TESTS="unreachable_chain_test gact_goto_chain_test"
NUM_NETIFS=2
source tc_common.sh
source lib.sh
@@ -107,16 +108,14 @@
setup_prepare
setup_wait
-unreachable_chain_test
-gact_goto_chain_test
+tests_run
tc_offload_check
if [[ $? -ne 0 ]]; then
log_info "Could not test offloaded functionality"
else
tcflags="skip_sw"
- unreachable_chain_test
- gact_goto_chain_test
+ tests_run
fi
exit $EXIT_STATUS
diff --git a/tools/testing/selftests/net/forwarding/tc_flower.sh b/tools/testing/selftests/net/forwarding/tc_flower.sh
index 032b882..20d1077 100755
--- a/tools/testing/selftests/net/forwarding/tc_flower.sh
+++ b/tools/testing/selftests/net/forwarding/tc_flower.sh
@@ -1,6 +1,8 @@
#!/bin/bash
# SPDX-License-Identifier: GPL-2.0
+ALL_TESTS="match_dst_mac_test match_src_mac_test match_dst_ip_test \
+ match_src_ip_test match_ip_flags_test"
NUM_NETIFS=2
source tc_common.sh
source lib.sh
@@ -149,6 +151,74 @@
log_test "src_ip match ($tcflags)"
}
+match_ip_flags_test()
+{
+ RET=0
+
+ tc filter add dev $h2 ingress protocol ip pref 1 handle 101 flower \
+ $tcflags ip_flags frag action continue
+ tc filter add dev $h2 ingress protocol ip pref 2 handle 102 flower \
+ $tcflags ip_flags firstfrag action continue
+ tc filter add dev $h2 ingress protocol ip pref 3 handle 103 flower \
+ $tcflags ip_flags nofirstfrag action continue
+ tc filter add dev $h2 ingress protocol ip pref 4 handle 104 flower \
+ $tcflags ip_flags nofrag action drop
+
+ $MZ $h1 -c 1 -p 1000 -a $h1mac -b $h2mac -A 192.0.2.1 -B 192.0.2.2 \
+ -t ip "frag=0" -q
+
+ tc_check_packets "dev $h2 ingress" 101 1
+ check_fail $? "Matched on wrong frag filter (nofrag)"
+
+ tc_check_packets "dev $h2 ingress" 102 1
+ check_fail $? "Matched on wrong firstfrag filter (nofrag)"
+
+ tc_check_packets "dev $h2 ingress" 103 1
+ check_err $? "Did not match on nofirstfrag filter (nofrag) "
+
+ tc_check_packets "dev $h2 ingress" 104 1
+ check_err $? "Did not match on nofrag filter (nofrag)"
+
+ $MZ $h1 -c 1 -p 1000 -a $h1mac -b $h2mac -A 192.0.2.1 -B 192.0.2.2 \
+ -t ip "frag=0,mf" -q
+
+ tc_check_packets "dev $h2 ingress" 101 1
+ check_err $? "Did not match on frag filter (1stfrag)"
+
+ tc_check_packets "dev $h2 ingress" 102 1
+ check_err $? "Did not match fistfrag filter (1stfrag)"
+
+ tc_check_packets "dev $h2 ingress" 103 1
+ check_err $? "Matched on wrong nofirstfrag filter (1stfrag)"
+
+ tc_check_packets "dev $h2 ingress" 104 1
+ check_err $? "Match on wrong nofrag filter (1stfrag)"
+
+ $MZ $h1 -c 1 -p 1000 -a $h1mac -b $h2mac -A 192.0.2.1 -B 192.0.2.2 \
+ -t ip "frag=256,mf" -q
+ $MZ $h1 -c 1 -p 1000 -a $h1mac -b $h2mac -A 192.0.2.1 -B 192.0.2.2 \
+ -t ip "frag=256" -q
+
+ tc_check_packets "dev $h2 ingress" 101 3
+ check_err $? "Did not match on frag filter (no1stfrag)"
+
+ tc_check_packets "dev $h2 ingress" 102 1
+ check_err $? "Matched on wrong firstfrag filter (no1stfrag)"
+
+ tc_check_packets "dev $h2 ingress" 103 3
+ check_err $? "Did not match on nofirstfrag filter (no1stfrag)"
+
+ tc_check_packets "dev $h2 ingress" 104 1
+ check_err $? "Matched on nofrag filter (no1stfrag)"
+
+ tc filter del dev $h2 ingress protocol ip pref 1 handle 101 flower
+ tc filter del dev $h2 ingress protocol ip pref 2 handle 102 flower
+ tc filter del dev $h2 ingress protocol ip pref 3 handle 103 flower
+ tc filter del dev $h2 ingress protocol ip pref 4 handle 104 flower
+
+ log_test "ip_flags match ($tcflags)"
+}
+
setup_prepare()
{
h1=${NETIFS[p1]}
@@ -177,20 +247,14 @@
setup_prepare
setup_wait
-match_dst_mac_test
-match_src_mac_test
-match_dst_ip_test
-match_src_ip_test
+tests_run
tc_offload_check
if [[ $? -ne 0 ]]; then
log_info "Could not test offloaded functionality"
else
tcflags="skip_sw"
- match_dst_mac_test
- match_src_mac_test
- match_dst_ip_test
- match_src_ip_test
+ tests_run
fi
exit $EXIT_STATUS
diff --git a/tools/testing/selftests/net/forwarding/tc_shblocks.sh b/tools/testing/selftests/net/forwarding/tc_shblocks.sh
index 077b980..b5b9172 100755
--- a/tools/testing/selftests/net/forwarding/tc_shblocks.sh
+++ b/tools/testing/selftests/net/forwarding/tc_shblocks.sh
@@ -1,6 +1,7 @@
#!/bin/bash
# SPDX-License-Identifier: GPL-2.0
+ALL_TESTS="shared_block_test"
NUM_NETIFS=4
source tc_common.sh
source lib.sh
@@ -109,14 +110,14 @@
setup_prepare
setup_wait
-shared_block_test
+tests_run
tc_offload_check
if [[ $? -ne 0 ]]; then
log_info "Could not test offloaded functionality"
else
tcflags="skip_sw"
- shared_block_test
+ tests_run
fi
exit $EXIT_STATUS
diff --git a/tools/testing/selftests/net/pmtu.sh b/tools/testing/selftests/net/pmtu.sh
index 1e42878..7651fd4 100755
--- a/tools/testing/selftests/net/pmtu.sh
+++ b/tools/testing/selftests/net/pmtu.sh
@@ -368,7 +368,7 @@
fail=0
- min=1280
+ min=68 # vti6 can carry IPv4 packets too
max=$((65535 - 40))
# Check invalid values first
for v in $((min - 1)) $((max + 1)); do
@@ -384,7 +384,7 @@
done
# Now check valid values
- for v in 1280 1300 $((65535 - 40)); do
+ for v in 68 1280 1300 $((65535 - 40)); do
${ns_a} ip link add vti6_a mtu ${v} type vti6 local ${veth6_a_addr} remote ${veth6_b_addr} key 10
mtu="$(link_get_mtu "${ns_a}" vti6_a)"
${ns_a} ip link del vti6_a
diff --git a/tools/testing/selftests/net/rtnetlink.sh b/tools/testing/selftests/net/rtnetlink.sh
index e6f4852..760faef 100755
--- a/tools/testing/selftests/net/rtnetlink.sh
+++ b/tools/testing/selftests/net/rtnetlink.sh
@@ -502,6 +502,108 @@
echo "PASS: macsec"
}
+#-------------------------------------------------------------------
+# Example commands
+# ip x s add proto esp src 14.0.0.52 dst 14.0.0.70 \
+# spi 0x07 mode transport reqid 0x07 replay-window 32 \
+# aead 'rfc4106(gcm(aes))' 1234567890123456dcba 128 \
+# sel src 14.0.0.52/24 dst 14.0.0.70/24
+# ip x p add dir out src 14.0.0.52/24 dst 14.0.0.70/24 \
+# tmpl proto esp src 14.0.0.52 dst 14.0.0.70 \
+# spi 0x07 mode transport reqid 0x07
+#
+# Subcommands not tested
+# ip x s update
+# ip x s allocspi
+# ip x s deleteall
+# ip x p update
+# ip x p deleteall
+# ip x p set
+#-------------------------------------------------------------------
+kci_test_ipsec()
+{
+ srcip="14.0.0.52"
+ dstip="14.0.0.70"
+ algo="aead rfc4106(gcm(aes)) 0x3132333435363738393031323334353664636261 128"
+
+ # flush to be sure there's nothing configured
+ ip x s flush ; ip x p flush
+ check_err $?
+
+ # start the monitor in the background
+ tmpfile=`mktemp ipsectestXXX`
+ ip x m > $tmpfile &
+ mpid=$!
+ sleep 0.2
+
+ ipsecid="proto esp src $srcip dst $dstip spi 0x07"
+ ip x s add $ipsecid \
+ mode transport reqid 0x07 replay-window 32 \
+ $algo sel src $srcip/24 dst $dstip/24
+ check_err $?
+
+ lines=`ip x s list | grep $srcip | grep $dstip | wc -l`
+ test $lines -eq 2
+ check_err $?
+
+ ip x s count | grep -q "SAD count 1"
+ check_err $?
+
+ lines=`ip x s get $ipsecid | grep $srcip | grep $dstip | wc -l`
+ test $lines -eq 2
+ check_err $?
+
+ ip x s delete $ipsecid
+ check_err $?
+
+ lines=`ip x s list | wc -l`
+ test $lines -eq 0
+ check_err $?
+
+ ipsecsel="dir out src $srcip/24 dst $dstip/24"
+ ip x p add $ipsecsel \
+ tmpl proto esp src $srcip dst $dstip \
+ spi 0x07 mode transport reqid 0x07
+ check_err $?
+
+ lines=`ip x p list | grep $srcip | grep $dstip | wc -l`
+ test $lines -eq 2
+ check_err $?
+
+ ip x p count | grep -q "SPD IN 0 OUT 1 FWD 0"
+ check_err $?
+
+ lines=`ip x p get $ipsecsel | grep $srcip | grep $dstip | wc -l`
+ test $lines -eq 2
+ check_err $?
+
+ ip x p delete $ipsecsel
+ check_err $?
+
+ lines=`ip x p list | wc -l`
+ test $lines -eq 0
+ check_err $?
+
+ # check the monitor results
+ kill $mpid
+ lines=`wc -l $tmpfile | cut "-d " -f1`
+ test $lines -eq 20
+ check_err $?
+ rm -rf $tmpfile
+
+ # clean up any leftovers
+ ip x s flush
+ check_err $?
+ ip x p flush
+ check_err $?
+
+ if [ $ret -ne 0 ]; then
+ echo "FAIL: ipsec"
+ return 1
+ fi
+ echo "PASS: ipsec"
+}
+
kci_test_gretap()
{
testns="testns"
@@ -755,6 +857,7 @@
kci_test_vrf
kci_test_encap
kci_test_macsec
+ kci_test_ipsec
kci_del_dummy
}
diff --git a/tools/testing/selftests/net/tcp_inq.c b/tools/testing/selftests/net/tcp_inq.c
new file mode 100644
index 0000000..d044b29
--- /dev/null
+++ b/tools/testing/selftests/net/tcp_inq.c
@@ -0,0 +1,189 @@
+/*
+ * Copyright 2018 Google Inc.
+ * Author: Soheil Hassas Yeganeh (soheil@google.com)
+ *
+ * Simple example on how to use TCP_INQ and TCP_CM_INQ.
+ *
+ * License (GPLv2):
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE. * See the GNU General Public License for
+ * more details.
+ */
+#define _GNU_SOURCE
+
+#include <error.h>
+#include <netinet/in.h>
+#include <netinet/tcp.h>
+#include <pthread.h>
+#include <stdio.h>
+#include <errno.h>
+#include <stdlib.h>
+#include <string.h>
+#include <sys/socket.h>
+#include <unistd.h>
+
+#ifndef TCP_INQ
+#define TCP_INQ 36
+#endif
+
+#ifndef TCP_CM_INQ
+#define TCP_CM_INQ TCP_INQ
+#endif
+
+#define BUF_SIZE 8192
+#define CMSG_SIZE 32
+
+static int family = AF_INET6;
+static socklen_t addr_len = sizeof(struct sockaddr_in6);
+static int port = 4974;
+
+static void setup_loopback_addr(int family, struct sockaddr_storage *sockaddr)
+{
+ struct sockaddr_in6 *addr6 = (void *) sockaddr;
+ struct sockaddr_in *addr4 = (void *) sockaddr;
+
+ switch (family) {
+ case PF_INET:
+ memset(addr4, 0, sizeof(*addr4));
+ addr4->sin_family = AF_INET;
+ addr4->sin_addr.s_addr = htonl(INADDR_LOOPBACK);
+ addr4->sin_port = htons(port);
+ break;
+ case PF_INET6:
+ memset(addr6, 0, sizeof(*addr6));
+ addr6->sin6_family = AF_INET6;
+ addr6->sin6_addr = in6addr_loopback;
+ addr6->sin6_port = htons(port);
+ break;
+ default:
+ error(1, 0, "illegal family");
+ }
+}
+
+void *start_server(void *arg)
+{
+ int server_fd = (int)(unsigned long)arg;
+ struct sockaddr_in addr;
+ socklen_t addrlen = sizeof(addr);
+ char *buf;
+ int fd;
+ int r;
+
+ buf = malloc(BUF_SIZE);
+
+ for (;;) {
+ fd = accept(server_fd, (struct sockaddr *)&addr, &addrlen);
+ if (fd == -1) {
+ perror("accept");
+ break;
+ }
+ do {
+ r = send(fd, buf, BUF_SIZE, 0);
+ } while (r < 0 && errno == EINTR);
+ if (r < 0)
+ perror("send");
+ if (r != BUF_SIZE)
+ fprintf(stderr, "can only send %d bytes\n", r);
+ /* TCP_INQ can overestimate in-queue by one byte if we send
+ * the FIN packet. Sleep for 1 second, so that the client
+ * likely invoked recvmsg().
+ */
+ sleep(1);
+ close(fd);
+ }
+
+ free(buf);
+ close(server_fd);
+ pthread_exit(0);
+}
+
+int main(int argc, char *argv[])
+{
+ struct sockaddr_storage listen_addr, addr;
+ int c, one = 1, inq = -1;
+ pthread_t server_thread;
+ char cmsgbuf[CMSG_SIZE];
+ struct iovec iov[1];
+ struct cmsghdr *cm;
+ struct msghdr msg;
+ int server_fd, fd;
+ char *buf;
+
+ while ((c = getopt(argc, argv, "46p:")) != -1) {
+ switch (c) {
+ case '4':
+ family = PF_INET;
+ addr_len = sizeof(struct sockaddr_in);
+ break;
+ case '6':
+ family = PF_INET6;
+ addr_len = sizeof(struct sockaddr_in6);
+ break;
+ case 'p':
+ port = atoi(optarg);
+ break;
+ }
+ }
+
+ server_fd = socket(family, SOCK_STREAM, 0);
+ if (server_fd < 0)
+ error(1, errno, "server socket");
+ setup_loopback_addr(family, &listen_addr);
+ if (setsockopt(server_fd, SOL_SOCKET, SO_REUSEADDR,
+ &one, sizeof(one)) != 0)
+ error(1, errno, "setsockopt(SO_REUSEADDR)");
+ if (bind(server_fd, (const struct sockaddr *)&listen_addr,
+ addr_len) == -1)
+ error(1, errno, "bind");
+ if (listen(server_fd, 128) == -1)
+ error(1, errno, "listen");
+ if (pthread_create(&server_thread, NULL, start_server,
+ (void *)(unsigned long)server_fd) != 0)
+ error(1, errno, "pthread_create");
+
+ fd = socket(family, SOCK_STREAM, 0);
+ if (fd < 0)
+ error(1, errno, "client socket");
+ setup_loopback_addr(family, &addr);
+ if (connect(fd, (const struct sockaddr *)&addr, addr_len) == -1)
+ error(1, errno, "connect");
+ if (setsockopt(fd, SOL_TCP, TCP_INQ, &one, sizeof(one)) != 0)
+ error(1, errno, "setsockopt(TCP_INQ)");
+
+ msg.msg_name = NULL;
+ msg.msg_namelen = 0;
+ msg.msg_iov = iov;
+ msg.msg_iovlen = 1;
+ msg.msg_control = cmsgbuf;
+ msg.msg_controllen = sizeof(cmsgbuf);
+ msg.msg_flags = 0;
+
+ buf = malloc(BUF_SIZE);
+ iov[0].iov_base = buf;
+ iov[0].iov_len = BUF_SIZE / 2;
+
+ if (recvmsg(fd, &msg, 0) != iov[0].iov_len)
+ error(1, errno, "recvmsg");
+ if (msg.msg_flags & MSG_CTRUNC)
+ error(1, 0, "control message is truncated");
+
+ for (cm = CMSG_FIRSTHDR(&msg); cm; cm = CMSG_NXTHDR(&msg, cm))
+ if (cm->cmsg_level == SOL_TCP && cm->cmsg_type == TCP_CM_INQ)
+ inq = *((int *) CMSG_DATA(cm));
+
+ if (inq != BUF_SIZE - iov[0].iov_len) {
+ fprintf(stderr, "unexpected inq: %d\n", inq);
+ exit(1);
+ }
+
+ printf("PASSED\n");
+ free(buf);
+ close(fd);
+ return 0;
+}
diff --git a/tools/testing/selftests/net/tcp_mmap.c b/tools/testing/selftests/net/tcp_mmap.c
new file mode 100644
index 0000000..77f7627
--- /dev/null
+++ b/tools/testing/selftests/net/tcp_mmap.c
@@ -0,0 +1,447 @@
+/*
+ * Copyright 2018 Google Inc.
+ * Author: Eric Dumazet (edumazet@google.com)
+ *
+ * Reference program demonstrating tcp mmap() usage,
+ * and SO_RCVLOWAT hints for receiver.
+ *
+ * Note : NIC with header split is needed to use mmap() on TCP :
+ * Each incoming frame must be a multiple of PAGE_SIZE bytes of TCP payload.
+ *
+ * How to use on loopback interface :
+ *
+ * ifconfig lo mtu 61512 # 15*4096 + 40 (ipv6 header) + 32 (TCP with TS option header)
+ * tcp_mmap -s -z &
+ * tcp_mmap -H ::1 -z
+ *
+ * Or leave default lo mtu, but use -M option to set TCP_MAXSEG option to (4096 + 12)
+ * (4096 : page size on x86, 12: TCP TS option length)
+ * tcp_mmap -s -z -M $((4096+12)) &
+ * tcp_mmap -H ::1 -z -M $((4096+12))
+ *
+ * Note: -z option on sender uses MSG_ZEROCOPY, which forces a copy when packets go through loopback interface.
+ * We might use sendfile() instead, but really this test program is about mmap(), for receivers ;)
+ *
+ * $ ./tcp_mmap -s & # Without mmap()
+ * $ for i in {1..4}; do ./tcp_mmap -H ::1 -z ; done
+ * received 32768 MB (0 % mmap'ed) in 14.1157 s, 19.4732 Gbit
+ * cpu usage user:0.057 sys:7.815, 240.234 usec per MB, 65531 c-switches
+ * received 32768 MB (0 % mmap'ed) in 14.6833 s, 18.7204 Gbit
+ * cpu usage user:0.043 sys:8.103, 248.596 usec per MB, 65524 c-switches
+ * received 32768 MB (0 % mmap'ed) in 11.143 s, 24.6682 Gbit
+ * cpu usage user:0.044 sys:6.576, 202.026 usec per MB, 65519 c-switches
+ * received 32768 MB (0 % mmap'ed) in 14.9056 s, 18.4413 Gbit
+ * cpu usage user:0.036 sys:8.193, 251.129 usec per MB, 65530 c-switches
+ * $ kill %1 # kill tcp_mmap server
+ *
+ * $ ./tcp_mmap -s -z & # With mmap()
+ * $ for i in {1..4}; do ./tcp_mmap -H ::1 -z ; done
+ * received 32768 MB (99.9939 % mmap'ed) in 6.73792 s, 40.7956 Gbit
+ * cpu usage user:0.045 sys:2.827, 87.6465 usec per MB, 65532 c-switches
+ * received 32768 MB (99.9939 % mmap'ed) in 7.26732 s, 37.8238 Gbit
+ * cpu usage user:0.037 sys:3.087, 95.3369 usec per MB, 65532 c-switches
+ * received 32768 MB (99.9939 % mmap'ed) in 7.61661 s, 36.0893 Gbit
+ * cpu usage user:0.046 sys:3.559, 110.016 usec per MB, 65529 c-switches
+ * received 32768 MB (99.9939 % mmap'ed) in 7.43764 s, 36.9577 Gbit
+ * cpu usage user:0.035 sys:3.467, 106.873 usec per MB, 65530 c-switches
+ *
+ * License (GPLv2):
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE. * See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; if not, write to the Free Software Foundation, Inc.,
+ * 51 Franklin St - Fifth Floor, Boston, MA 02110-1301 USA.
+ */
+#define _GNU_SOURCE
+#include <pthread.h>
+#include <sys/types.h>
+#include <fcntl.h>
+#include <error.h>
+#include <sys/socket.h>
+#include <sys/mman.h>
+#include <sys/resource.h>
+#include <unistd.h>
+#include <string.h>
+#include <stdlib.h>
+#include <stdio.h>
+#include <errno.h>
+#include <time.h>
+#include <sys/time.h>
+#include <netinet/in.h>
+#include <arpa/inet.h>
+#include <poll.h>
+#include <linux/tcp.h>
+#include <assert.h>
+
+#ifndef MSG_ZEROCOPY
+#define MSG_ZEROCOPY 0x4000000
+#endif
+
+#define FILE_SZ (1UL << 35)
+static int cfg_family = AF_INET6;
+static socklen_t cfg_alen = sizeof(struct sockaddr_in6);
+static int cfg_port = 8787;
+
+static int rcvbuf; /* Default: autotuning. Can be set with -r <integer> option */
+static int sndbuf; /* Default: autotuning. Can be set with -w <integer> option */
+static int zflg; /* zero copy option. (MSG_ZEROCOPY for sender, mmap() for receiver */
+static int xflg; /* hash received data (simple xor) (-h option) */
+static int keepflag; /* -k option: receiver shall keep all received file in memory (no munmap() calls) */
+
+static int chunk_size = 512*1024;
+
+unsigned long htotal;
+
+static inline void prefetch(const void *x)
+{
+#if defined(__x86_64__)
+ asm volatile("prefetcht0 %P0" : : "m" (*(const char *)x));
+#endif
+}
+
+void hash_zone(void *zone, unsigned int length)
+{
+ unsigned long temp = htotal;
+
+ while (length >= 8*sizeof(long)) {
+ prefetch(zone + 384);
+ temp ^= *(unsigned long *)zone;
+ temp ^= *(unsigned long *)(zone + sizeof(long));
+ temp ^= *(unsigned long *)(zone + 2*sizeof(long));
+ temp ^= *(unsigned long *)(zone + 3*sizeof(long));
+ temp ^= *(unsigned long *)(zone + 4*sizeof(long));
+ temp ^= *(unsigned long *)(zone + 5*sizeof(long));
+ temp ^= *(unsigned long *)(zone + 6*sizeof(long));
+ temp ^= *(unsigned long *)(zone + 7*sizeof(long));
+ zone += 8*sizeof(long);
+ length -= 8*sizeof(long);
+ }
+ while (length >= 1) {
+ temp ^= *(unsigned char *)zone;
+ zone += 1;
+ length--;
+ }
+ htotal = temp;
+}
+
+void *child_thread(void *arg)
+{
+ unsigned long total_mmap = 0, total = 0;
+ struct tcp_zerocopy_receive zc;
+ unsigned long delta_usec;
+ int flags = MAP_SHARED;
+ struct timeval t0, t1;
+ char *buffer = NULL;
+ void *addr = NULL;
+ double throughput;
+ struct rusage ru;
+ int lu, fd;
+
+ fd = (int)(unsigned long)arg;
+
+ gettimeofday(&t0, NULL);
+
+ fcntl(fd, F_SETFL, O_NDELAY);
+ buffer = malloc(chunk_size);
+ if (!buffer) {
+ perror("malloc");
+ goto error;
+ }
+ if (zflg) {
+ addr = mmap(NULL, chunk_size, PROT_READ, flags, fd, 0);
+ if (addr == (void *)-1)
+ zflg = 0;
+ }
+ while (1) {
+ struct pollfd pfd = { .fd = fd, .events = POLLIN, };
+ int sub;
+
+ poll(&pfd, 1, 10000);
+ if (zflg) {
+ socklen_t zc_len = sizeof(zc);
+ int res;
+
+ zc.address = (__u64)addr;
+ zc.length = chunk_size;
+ zc.recv_skip_hint = 0;
+ res = getsockopt(fd, IPPROTO_TCP, TCP_ZEROCOPY_RECEIVE,
+ &zc, &zc_len);
+ if (res == -1)
+ break;
+
+ if (zc.length) {
+ assert(zc.length <= chunk_size);
+ total_mmap += zc.length;
+ if (xflg)
+ hash_zone(addr, zc.length);
+ total += zc.length;
+ }
+ if (zc.recv_skip_hint) {
+ assert(zc.recv_skip_hint <= chunk_size);
+ lu = read(fd, buffer, zc.recv_skip_hint);
+ if (lu > 0) {
+ if (xflg)
+ hash_zone(buffer, lu);
+ total += lu;
+ }
+ }
+ continue;
+ }
+ sub = 0;
+ while (sub < chunk_size) {
+ lu = read(fd, buffer + sub, chunk_size - sub);
+ if (lu == 0)
+ goto end;
+ if (lu < 0)
+ break;
+ if (xflg)
+ hash_zone(buffer + sub, lu);
+ total += lu;
+ sub += lu;
+ }
+ }
+end:
+ gettimeofday(&t1, NULL);
+ delta_usec = (t1.tv_sec - t0.tv_sec) * 1000000 + t1.tv_usec - t0.tv_usec;
+
+ throughput = 0;
+ if (delta_usec)
+ throughput = total * 8.0 / (double)delta_usec / 1000.0;
+ getrusage(RUSAGE_THREAD, &ru);
+ if (total > 1024*1024) {
+ unsigned long total_usec;
+ unsigned long mb = total >> 20;
+ total_usec = 1000000*ru.ru_utime.tv_sec + ru.ru_utime.tv_usec +
+ 1000000*ru.ru_stime.tv_sec + ru.ru_stime.tv_usec;
+ printf("received %lg MB (%lg %% mmap'ed) in %lg s, %lg Gbit\n"
+ " cpu usage user:%lg sys:%lg, %lg usec per MB, %lu c-switches\n",
+ total / (1024.0 * 1024.0),
+ 100.0*total_mmap/total,
+ (double)delta_usec / 1000000.0,
+ throughput,
+ (double)ru.ru_utime.tv_sec + (double)ru.ru_utime.tv_usec / 1000000.0,
+ (double)ru.ru_stime.tv_sec + (double)ru.ru_stime.tv_usec / 1000000.0,
+ (double)total_usec/mb,
+ ru.ru_nvcsw);
+ }
+error:
+ free(buffer);
+ close(fd);
+ if (zflg)
+ munmap(addr, chunk_size);
+ pthread_exit(0);
+}
+
+static void apply_rcvsnd_buf(int fd)
+{
+ if (rcvbuf && setsockopt(fd, SOL_SOCKET,
+ SO_RCVBUF, &rcvbuf, sizeof(rcvbuf)) == -1) {
+ perror("setsockopt SO_RCVBUF");
+ }
+
+ if (sndbuf && setsockopt(fd, SOL_SOCKET,
+ SO_SNDBUF, &sndbuf, sizeof(sndbuf)) == -1) {
+ perror("setsockopt SO_SNDBUF");
+ }
+}
+
+
+static void setup_sockaddr(int domain, const char *str_addr,
+ struct sockaddr_storage *sockaddr)
+{
+ struct sockaddr_in6 *addr6 = (void *) sockaddr;
+ struct sockaddr_in *addr4 = (void *) sockaddr;
+
+ switch (domain) {
+ case PF_INET:
+ memset(addr4, 0, sizeof(*addr4));
+ addr4->sin_family = AF_INET;
+ addr4->sin_port = htons(cfg_port);
+ if (str_addr &&
+ inet_pton(AF_INET, str_addr, &(addr4->sin_addr)) != 1)
+ error(1, 0, "ipv4 parse error: %s", str_addr);
+ break;
+ case PF_INET6:
+ memset(addr6, 0, sizeof(*addr6));
+ addr6->sin6_family = AF_INET6;
+ addr6->sin6_port = htons(cfg_port);
+ if (str_addr &&
+ inet_pton(AF_INET6, str_addr, &(addr6->sin6_addr)) != 1)
+ error(1, 0, "ipv6 parse error: %s", str_addr);
+ break;
+ default:
+ error(1, 0, "illegal domain");
+ }
+}
+
+static void do_accept(int fdlisten)
+{
+ if (setsockopt(fdlisten, SOL_SOCKET, SO_RCVLOWAT,
+ &chunk_size, sizeof(chunk_size)) == -1) {
+ perror("setsockopt SO_RCVLOWAT");
+ }
+
+ apply_rcvsnd_buf(fdlisten);
+
+ while (1) {
+ struct sockaddr_in addr;
+ socklen_t addrlen = sizeof(addr);
+ pthread_t th;
+ int fd, res;
+
+ fd = accept(fdlisten, (struct sockaddr *)&addr, &addrlen);
+ if (fd == -1) {
+ perror("accept");
+ continue;
+ }
+ res = pthread_create(&th, NULL, child_thread,
+ (void *)(unsigned long)fd);
+ if (res) {
+ errno = res;
+ perror("pthread_create");
+ close(fd);
+ }
+ }
+}
+
+int main(int argc, char *argv[])
+{
+ struct sockaddr_storage listenaddr, addr;
+ unsigned int max_pacing_rate = 0;
+ unsigned long total = 0;
+ char *host = NULL;
+ int fd, c, on = 1;
+ char *buffer;
+ int sflg = 0;
+ int mss = 0;
+
+ while ((c = getopt(argc, argv, "46p:svr:w:H:zxkP:M:")) != -1) {
+ switch (c) {
+ case '4':
+ cfg_family = PF_INET;
+ cfg_alen = sizeof(struct sockaddr_in);
+ break;
+ case '6':
+ cfg_family = PF_INET6;
+ cfg_alen = sizeof(struct sockaddr_in6);
+ break;
+ case 'p':
+ cfg_port = atoi(optarg);
+ break;
+ case 'H':
+ host = optarg;
+ break;
+ case 's': /* server : listen for incoming connections */
+ sflg++;
+ break;
+ case 'r':
+ rcvbuf = atoi(optarg);
+ break;
+ case 'w':
+ sndbuf = atoi(optarg);
+ break;
+ case 'z':
+ zflg = 1;
+ break;
+ case 'M':
+ mss = atoi(optarg);
+ break;
+ case 'x':
+ xflg = 1;
+ break;
+ case 'k':
+ keepflag = 1;
+ break;
+ case 'P':
+ max_pacing_rate = atoi(optarg) ;
+ break;
+ default:
+ exit(1);
+ }
+ }
+ if (sflg) {
+ int fdlisten = socket(cfg_family, SOCK_STREAM, 0);
+
+ if (fdlisten == -1) {
+ perror("socket");
+ exit(1);
+ }
+ apply_rcvsnd_buf(fdlisten);
+ setsockopt(fdlisten, SOL_SOCKET, SO_REUSEADDR, &on, sizeof(on));
+
+ setup_sockaddr(cfg_family, host, &listenaddr);
+
+ if (mss &&
+ setsockopt(fdlisten, IPPROTO_TCP, TCP_MAXSEG,
+ &mss, sizeof(mss)) == -1) {
+ perror("setsockopt TCP_MAXSEG");
+ exit(1);
+ }
+ if (bind(fdlisten, (const struct sockaddr *)&listenaddr, cfg_alen) == -1) {
+ perror("bind");
+ exit(1);
+ }
+ if (listen(fdlisten, 128) == -1) {
+ perror("listen");
+ exit(1);
+ }
+ do_accept(fdlisten);
+ }
+ buffer = mmap(NULL, chunk_size, PROT_READ | PROT_WRITE,
+ MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
+ if (buffer == (char *)-1) {
+ perror("mmap");
+ exit(1);
+ }
+
+ fd = socket(AF_INET6, SOCK_STREAM, 0);
+ if (fd == -1) {
+ perror("socket");
+ exit(1);
+ }
+ apply_rcvsnd_buf(fd);
+
+ setup_sockaddr(cfg_family, host, &addr);
+
+ if (mss &&
+ setsockopt(fd, IPPROTO_TCP, TCP_MAXSEG, &mss, sizeof(mss)) == -1) {
+ perror("setsockopt TCP_MAXSEG");
+ exit(1);
+ }
+ if (connect(fd, (const struct sockaddr *)&addr, cfg_alen) == -1) {
+ perror("connect");
+ exit(1);
+ }
+ if (max_pacing_rate &&
+ setsockopt(fd, SOL_SOCKET, SO_MAX_PACING_RATE,
+ &max_pacing_rate, sizeof(max_pacing_rate)) == -1)
+ perror("setsockopt SO_MAX_PACING_RATE");
+
+ if (zflg && setsockopt(fd, SOL_SOCKET, SO_ZEROCOPY,
+ &on, sizeof(on)) == -1) {
+ perror("setsockopt SO_ZEROCOPY, (-z option disabled)");
+ zflg = 0;
+ }
+ while (total < FILE_SZ) {
+ long wr = FILE_SZ - total;
+
+ if (wr > chunk_size)
+ wr = chunk_size;
+ /* Note : we just want to fill the pipe with 0 bytes */
+ wr = send(fd, buffer, wr, zflg ? MSG_ZEROCOPY : 0);
+ if (wr <= 0)
+ break;
+ total += wr;
+ }
+ close(fd);
+ munmap(buffer, chunk_size);
+ return 0;
+}
diff --git a/tools/testing/selftests/net/udpgso.c b/tools/testing/selftests/net/udpgso.c
new file mode 100644
index 0000000..48a0592
--- /dev/null
+++ b/tools/testing/selftests/net/udpgso.c
@@ -0,0 +1,620 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#define _GNU_SOURCE
+
+#include <stddef.h>
+#include <arpa/inet.h>
+#include <error.h>
+#include <errno.h>
+#include <net/if.h>
+#include <linux/in.h>
+#include <linux/netlink.h>
+#include <linux/rtnetlink.h>
+#include <netinet/if_ether.h>
+#include <netinet/ip.h>
+#include <netinet/ip6.h>
+#include <netinet/udp.h>
+#include <stdbool.h>
+#include <stdlib.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <sys/ioctl.h>
+#include <sys/socket.h>
+#include <sys/stat.h>
+#include <sys/time.h>
+#include <sys/types.h>
+#include <unistd.h>
+
+#ifndef ETH_MAX_MTU
+#define ETH_MAX_MTU 0xFFFFU
+#endif
+
+#ifndef UDP_SEGMENT
+#define UDP_SEGMENT 103
+#endif
+
+#define CONST_MTU_TEST 1500
+
+#define CONST_HDRLEN_V4 (sizeof(struct iphdr) + sizeof(struct udphdr))
+#define CONST_HDRLEN_V6 (sizeof(struct ip6_hdr) + sizeof(struct udphdr))
+
+#define CONST_MSS_V4 (CONST_MTU_TEST - CONST_HDRLEN_V4)
+#define CONST_MSS_V6 (CONST_MTU_TEST - CONST_HDRLEN_V6)
+
+#define CONST_MAX_SEGS_V4 (ETH_MAX_MTU / CONST_MSS_V4)
+#define CONST_MAX_SEGS_V6 (ETH_MAX_MTU / CONST_MSS_V6)
+
+static bool cfg_do_ipv4;
+static bool cfg_do_ipv6;
+static bool cfg_do_connected;
+static bool cfg_do_connectionless;
+static bool cfg_do_msgmore;
+static bool cfg_do_setsockopt;
+static int cfg_specific_test_id = -1;
+
+static const char cfg_ifname[] = "lo";
+static unsigned short cfg_port = 9000;
+
+static char buf[ETH_MAX_MTU];
+
+struct testcase {
+ int tlen; /* send() buffer size, may exceed mss */
+ bool tfail; /* send() call is expected to fail */
+ int gso_len; /* mss after applying gso */
+ int r_num_mss; /* recv(): number of calls of full mss */
+ int r_len_last; /* recv(): size of last non-mss dgram, if any */
+};
+
+const struct in6_addr addr6 = IN6ADDR_LOOPBACK_INIT;
+const struct in_addr addr4 = { .s_addr = __constant_htonl(INADDR_LOOPBACK + 2) };
+
+struct testcase testcases_v4[] = {
+ {
+ /* no GSO: send a single byte */
+ .tlen = 1,
+ .r_len_last = 1,
+ },
+ {
+ /* no GSO: send a single MSS */
+ .tlen = CONST_MSS_V4,
+ .r_num_mss = 1,
+ },
+ {
+ /* no GSO: send a single MSS + 1B: fail */
+ .tlen = CONST_MSS_V4 + 1,
+ .tfail = true,
+ },
+ {
+ /* send a single MSS: will fail with GSO, because the segment
+ * logic in udp4_ufo_fragment demands a gso skb to be > MTU
+ */
+ .tlen = CONST_MSS_V4,
+ .gso_len = CONST_MSS_V4,
+ .tfail = true,
+ .r_num_mss = 1,
+ },
+ {
+ /* send a single MSS + 1B */
+ .tlen = CONST_MSS_V4 + 1,
+ .gso_len = CONST_MSS_V4,
+ .r_num_mss = 1,
+ .r_len_last = 1,
+ },
+ {
+ /* send exactly 2 MSS */
+ .tlen = CONST_MSS_V4 * 2,
+ .gso_len = CONST_MSS_V4,
+ .r_num_mss = 2,
+ },
+ {
+ /* send 2 MSS + 1B */
+ .tlen = (CONST_MSS_V4 * 2) + 1,
+ .gso_len = CONST_MSS_V4,
+ .r_num_mss = 2,
+ .r_len_last = 1,
+ },
+ {
+ /* send MAX segs */
+ .tlen = (ETH_MAX_MTU / CONST_MSS_V4) * CONST_MSS_V4,
+ .gso_len = CONST_MSS_V4,
+ .r_num_mss = (ETH_MAX_MTU / CONST_MSS_V4),
+ },
+
+ {
+ /* send MAX bytes */
+ .tlen = ETH_MAX_MTU - CONST_HDRLEN_V4,
+ .gso_len = CONST_MSS_V4,
+ .r_num_mss = CONST_MAX_SEGS_V4,
+ .r_len_last = ETH_MAX_MTU - CONST_HDRLEN_V4 -
+ (CONST_MAX_SEGS_V4 * CONST_MSS_V4),
+ },
+ {
+ /* send MAX + 1: fail */
+ .tlen = ETH_MAX_MTU - CONST_HDRLEN_V4 + 1,
+ .gso_len = CONST_MSS_V4,
+ .tfail = true,
+ },
+ {
+ /* EOL */
+ }
+};
+
+#ifndef IP6_MAX_MTU
+#define IP6_MAX_MTU (ETH_MAX_MTU + sizeof(struct ip6_hdr))
+#endif
+
+struct testcase testcases_v6[] = {
+ {
+ /* no GSO: send a single byte */
+ .tlen = 1,
+ .r_len_last = 1,
+ },
+ {
+ /* no GSO: send a single MSS */
+ .tlen = CONST_MSS_V6,
+ .r_num_mss = 1,
+ },
+ {
+ /* no GSO: send a single MSS + 1B: fail */
+ .tlen = CONST_MSS_V6 + 1,
+ .tfail = true,
+ },
+ {
+ /* send a single MSS: will fail with GSO, because the segment
+ * logic in udp4_ufo_fragment demands a gso skb to be > MTU
+ */
+ .tlen = CONST_MSS_V6,
+ .gso_len = CONST_MSS_V6,
+ .tfail = true,
+ .r_num_mss = 1,
+ },
+ {
+ /* send a single MSS + 1B */
+ .tlen = CONST_MSS_V6 + 1,
+ .gso_len = CONST_MSS_V6,
+ .r_num_mss = 1,
+ .r_len_last = 1,
+ },
+ {
+ /* send exactly 2 MSS */
+ .tlen = CONST_MSS_V6 * 2,
+ .gso_len = CONST_MSS_V6,
+ .r_num_mss = 2,
+ },
+ {
+ /* send 2 MSS + 1B */
+ .tlen = (CONST_MSS_V6 * 2) + 1,
+ .gso_len = CONST_MSS_V6,
+ .r_num_mss = 2,
+ .r_len_last = 1,
+ },
+ {
+ /* send MAX segs */
+ .tlen = (IP6_MAX_MTU / CONST_MSS_V6) * CONST_MSS_V6,
+ .gso_len = CONST_MSS_V6,
+ .r_num_mss = (IP6_MAX_MTU / CONST_MSS_V6),
+ },
+
+ {
+ /* send MAX bytes */
+ .tlen = IP6_MAX_MTU - CONST_HDRLEN_V6,
+ .gso_len = CONST_MSS_V6,
+ .r_num_mss = CONST_MAX_SEGS_V6,
+ .r_len_last = IP6_MAX_MTU - CONST_HDRLEN_V6 -
+ (CONST_MAX_SEGS_V6 * CONST_MSS_V6),
+ },
+ {
+ /* send MAX + 1: fail */
+ .tlen = IP6_MAX_MTU - CONST_HDRLEN_V6 + 1,
+ .gso_len = CONST_MSS_V6,
+ .tfail = true,
+ },
+ {
+ /* EOL */
+ }
+};
+
+static unsigned int get_device_mtu(int fd, const char *ifname)
+{
+ struct ifreq ifr;
+
+ memset(&ifr, 0, sizeof(ifr));
+
+ strcpy(ifr.ifr_name, ifname);
+
+ if (ioctl(fd, SIOCGIFMTU, &ifr))
+ error(1, errno, "ioctl get mtu");
+
+ return ifr.ifr_mtu;
+}
+
+static void __set_device_mtu(int fd, const char *ifname, unsigned int mtu)
+{
+ struct ifreq ifr;
+
+ memset(&ifr, 0, sizeof(ifr));
+
+ ifr.ifr_mtu = mtu;
+ strcpy(ifr.ifr_name, ifname);
+
+ if (ioctl(fd, SIOCSIFMTU, &ifr))
+ error(1, errno, "ioctl set mtu");
+}
+
+static void set_device_mtu(int fd, int mtu)
+{
+ int val;
+
+ val = get_device_mtu(fd, cfg_ifname);
+ fprintf(stderr, "device mtu (orig): %u\n", val);
+
+ __set_device_mtu(fd, cfg_ifname, mtu);
+ val = get_device_mtu(fd, cfg_ifname);
+ if (val != mtu)
+ error(1, 0, "unable to set device mtu to %u\n", val);
+
+ fprintf(stderr, "device mtu (test): %u\n", val);
+}
+
+static void set_pmtu_discover(int fd, bool is_ipv4)
+{
+ int level, name, val;
+
+ if (is_ipv4) {
+ level = SOL_IP;
+ name = IP_MTU_DISCOVER;
+ val = IP_PMTUDISC_DO;
+ } else {
+ level = SOL_IPV6;
+ name = IPV6_MTU_DISCOVER;
+ val = IPV6_PMTUDISC_DO;
+ }
+
+ if (setsockopt(fd, level, name, &val, sizeof(val)))
+ error(1, errno, "setsockopt path mtu");
+}
+
+static unsigned int get_path_mtu(int fd, bool is_ipv4)
+{
+ socklen_t vallen;
+ unsigned int mtu;
+ int ret;
+
+ vallen = sizeof(mtu);
+ if (is_ipv4)
+ ret = getsockopt(fd, SOL_IP, IP_MTU, &mtu, &vallen);
+ else
+ ret = getsockopt(fd, SOL_IPV6, IPV6_MTU, &mtu, &vallen);
+
+ if (ret)
+ error(1, errno, "getsockopt mtu");
+
+
+ fprintf(stderr, "path mtu (read): %u\n", mtu);
+ return mtu;
+}
+
+/* very wordy version of system("ip route add dev lo mtu 1500 127.0.0.3/32") */
+static void set_route_mtu(int mtu, bool is_ipv4)
+{
+ struct sockaddr_nl nladdr = { .nl_family = AF_NETLINK };
+ struct nlmsghdr *nh;
+ struct rtattr *rta;
+ struct rtmsg *rt;
+ char data[NLMSG_ALIGN(sizeof(*nh)) +
+ NLMSG_ALIGN(sizeof(*rt)) +
+ NLMSG_ALIGN(RTA_LENGTH(sizeof(addr6))) +
+ NLMSG_ALIGN(RTA_LENGTH(sizeof(int))) +
+ NLMSG_ALIGN(RTA_LENGTH(0) + RTA_LENGTH(sizeof(int)))];
+ int fd, ret, alen, off = 0;
+
+ alen = is_ipv4 ? sizeof(addr4) : sizeof(addr6);
+
+ fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_ROUTE);
+ if (fd == -1)
+ error(1, errno, "socket netlink");
+
+ memset(data, 0, sizeof(data));
+
+ nh = (void *)data;
+ nh->nlmsg_type = RTM_NEWROUTE;
+ nh->nlmsg_flags = NLM_F_REQUEST | NLM_F_CREATE;
+ off += NLMSG_ALIGN(sizeof(*nh));
+
+ rt = (void *)(data + off);
+ rt->rtm_family = is_ipv4 ? AF_INET : AF_INET6;
+ rt->rtm_table = RT_TABLE_MAIN;
+ rt->rtm_dst_len = alen << 3;
+ rt->rtm_protocol = RTPROT_BOOT;
+ rt->rtm_scope = RT_SCOPE_UNIVERSE;
+ rt->rtm_type = RTN_UNICAST;
+ off += NLMSG_ALIGN(sizeof(*rt));
+
+ rta = (void *)(data + off);
+ rta->rta_type = RTA_DST;
+ rta->rta_len = RTA_LENGTH(alen);
+ if (is_ipv4)
+ memcpy(RTA_DATA(rta), &addr4, alen);
+ else
+ memcpy(RTA_DATA(rta), &addr6, alen);
+ off += NLMSG_ALIGN(rta->rta_len);
+
+ rta = (void *)(data + off);
+ rta->rta_type = RTA_OIF;
+ rta->rta_len = RTA_LENGTH(sizeof(int));
+ *((int *)(RTA_DATA(rta))) = 1; //if_nametoindex("lo");
+ off += NLMSG_ALIGN(rta->rta_len);
+
+ /* MTU is a subtype in a metrics type */
+ rta = (void *)(data + off);
+ rta->rta_type = RTA_METRICS;
+ rta->rta_len = RTA_LENGTH(0) + RTA_LENGTH(sizeof(int));
+ off += NLMSG_ALIGN(rta->rta_len);
+
+ /* now fill MTU subtype. Note that it fits within above rta_len */
+ rta = (void *)(((char *) rta) + RTA_LENGTH(0));
+ rta->rta_type = RTAX_MTU;
+ rta->rta_len = RTA_LENGTH(sizeof(int));
+ *((int *)(RTA_DATA(rta))) = mtu;
+
+ nh->nlmsg_len = off;
+
+ ret = sendto(fd, data, off, 0, (void *)&nladdr, sizeof(nladdr));
+ if (ret != off)
+ error(1, errno, "send netlink: %uB != %uB\n", ret, off);
+
+ if (close(fd))
+ error(1, errno, "close netlink");
+
+ fprintf(stderr, "route mtu (test): %u\n", mtu);
+}
+
+static bool __send_one(int fd, struct msghdr *msg, int flags)
+{
+ int ret;
+
+ ret = sendmsg(fd, msg, flags);
+ if (ret == -1 && (errno == EMSGSIZE || errno == ENOMEM))
+ return false;
+ if (ret == -1)
+ error(1, errno, "sendmsg");
+ if (ret != msg->msg_iov->iov_len)
+ error(1, 0, "sendto: %d != %lu", ret, msg->msg_iov->iov_len);
+ if (msg->msg_flags)
+ error(1, 0, "sendmsg: return flags 0x%x\n", msg->msg_flags);
+
+ return true;
+}
+
+static bool send_one(int fd, int len, int gso_len,
+ struct sockaddr *addr, socklen_t alen)
+{
+ char control[CMSG_SPACE(sizeof(uint16_t))] = {0};
+ struct msghdr msg = {0};
+ struct iovec iov = {0};
+ struct cmsghdr *cm;
+
+ iov.iov_base = buf;
+ iov.iov_len = len;
+
+ msg.msg_iov = &iov;
+ msg.msg_iovlen = 1;
+
+ msg.msg_name = addr;
+ msg.msg_namelen = alen;
+
+ if (gso_len && !cfg_do_setsockopt) {
+ msg.msg_control = control;
+ msg.msg_controllen = sizeof(control);
+
+ cm = CMSG_FIRSTHDR(&msg);
+ cm->cmsg_level = SOL_UDP;
+ cm->cmsg_type = UDP_SEGMENT;
+ cm->cmsg_len = CMSG_LEN(sizeof(uint16_t));
+ *((uint16_t *) CMSG_DATA(cm)) = gso_len;
+ }
+
+ /* If MSG_MORE, send 1 byte followed by remainder */
+ if (cfg_do_msgmore && len > 1) {
+ iov.iov_len = 1;
+ if (!__send_one(fd, &msg, MSG_MORE))
+ error(1, 0, "send 1B failed");
+
+ iov.iov_base++;
+ iov.iov_len = len - 1;
+ }
+
+ return __send_one(fd, &msg, 0);
+}
+
+static int recv_one(int fd, int flags)
+{
+ int ret;
+
+ ret = recv(fd, buf, sizeof(buf), flags);
+ if (ret == -1 && errno == EAGAIN && (flags & MSG_DONTWAIT))
+ return 0;
+ if (ret == -1)
+ error(1, errno, "recv");
+
+ return ret;
+}
+
+static void run_one(struct testcase *test, int fdt, int fdr,
+ struct sockaddr *addr, socklen_t alen)
+{
+ int i, ret, val, mss;
+ bool sent;
+
+ fprintf(stderr, "ipv%d tx:%d gso:%d %s\n",
+ addr->sa_family == AF_INET ? 4 : 6,
+ test->tlen, test->gso_len,
+ test->tfail ? "(fail)" : "");
+
+ val = test->gso_len;
+ if (cfg_do_setsockopt) {
+ if (setsockopt(fdt, SOL_UDP, UDP_SEGMENT, &val, sizeof(val)))
+ error(1, errno, "setsockopt udp segment");
+ }
+
+ sent = send_one(fdt, test->tlen, test->gso_len, addr, alen);
+ if (sent && test->tfail)
+ error(1, 0, "send succeeded while expecting failure");
+ if (!sent && !test->tfail)
+ error(1, 0, "send failed while expecting success");
+ if (!sent)
+ return;
+
+ mss = addr->sa_family == AF_INET ? CONST_MSS_V4 : CONST_MSS_V6;
+
+ /* Recv all full MSS datagrams */
+ for (i = 0; i < test->r_num_mss; i++) {
+ ret = recv_one(fdr, 0);
+ if (ret != mss)
+ error(1, 0, "recv.%d: %d != %d", i, ret, mss);
+ }
+
+ /* Recv the non-full last datagram, if tlen was not a multiple of mss */
+ if (test->r_len_last) {
+ ret = recv_one(fdr, 0);
+ if (ret != test->r_len_last)
+ error(1, 0, "recv.%d: %d != %d (last)",
+ i, ret, test->r_len_last);
+ }
+
+ /* Verify received all data */
+ ret = recv_one(fdr, MSG_DONTWAIT);
+ if (ret)
+ error(1, 0, "recv: unexpected datagram");
+}
+
+static void run_all(int fdt, int fdr, struct sockaddr *addr, socklen_t alen)
+{
+ struct testcase *tests, *test;
+
+ tests = addr->sa_family == AF_INET ? testcases_v4 : testcases_v6;
+
+ for (test = tests; test->tlen; test++) {
+ /* if a specific test is given, then skip all others */
+ if (cfg_specific_test_id == -1 ||
+ cfg_specific_test_id == test - tests)
+ run_one(test, fdt, fdr, addr, alen);
+ }
+}
+
+static void run_test(struct sockaddr *addr, socklen_t alen)
+{
+ struct timeval tv = { .tv_usec = 100 * 1000 };
+ int fdr, fdt, val;
+
+ fdr = socket(addr->sa_family, SOCK_DGRAM, 0);
+ if (fdr == -1)
+ error(1, errno, "socket r");
+
+ if (bind(fdr, addr, alen))
+ error(1, errno, "bind");
+
+ /* Have tests fail quickly instead of hang */
+ if (setsockopt(fdr, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv)))
+ error(1, errno, "setsockopt rcv timeout");
+
+ fdt = socket(addr->sa_family, SOCK_DGRAM, 0);
+ if (fdt == -1)
+ error(1, errno, "socket t");
+
+ /* Do not fragment these datagrams: only succeed if GSO works */
+ set_pmtu_discover(fdt, addr->sa_family == AF_INET);
+
+ if (cfg_do_connectionless) {
+ set_device_mtu(fdt, CONST_MTU_TEST);
+ run_all(fdt, fdr, addr, alen);
+ }
+
+ if (cfg_do_connected) {
+ set_device_mtu(fdt, CONST_MTU_TEST + 100);
+ set_route_mtu(CONST_MTU_TEST, addr->sa_family == AF_INET);
+
+ if (connect(fdt, addr, alen))
+ error(1, errno, "connect");
+
+ val = get_path_mtu(fdt, addr->sa_family == AF_INET);
+ if (val != CONST_MTU_TEST)
+ error(1, 0, "bad path mtu %u\n", val);
+
+ run_all(fdt, fdr, addr, 0 /* use connected addr */);
+ }
+
+ if (close(fdt))
+ error(1, errno, "close t");
+ if (close(fdr))
+ error(1, errno, "close r");
+}
+
+static void run_test_v4(void)
+{
+ struct sockaddr_in addr = {0};
+
+ addr.sin_family = AF_INET;
+ addr.sin_port = htons(cfg_port);
+ addr.sin_addr = addr4;
+
+ run_test((void *)&addr, sizeof(addr));
+}
+
+static void run_test_v6(void)
+{
+ struct sockaddr_in6 addr = {0};
+
+ addr.sin6_family = AF_INET6;
+ addr.sin6_port = htons(cfg_port);
+ addr.sin6_addr = addr6;
+
+ run_test((void *)&addr, sizeof(addr));
+}
+
+static void parse_opts(int argc, char **argv)
+{
+ int c;
+
+ while ((c = getopt(argc, argv, "46cCmst:")) != -1) {
+ switch (c) {
+ case '4':
+ cfg_do_ipv4 = true;
+ break;
+ case '6':
+ cfg_do_ipv6 = true;
+ break;
+ case 'c':
+ cfg_do_connected = true;
+ break;
+ case 'C':
+ cfg_do_connectionless = true;
+ break;
+ case 'm':
+ cfg_do_msgmore = true;
+ break;
+ case 's':
+ cfg_do_setsockopt = true;
+ break;
+ case 't':
+ cfg_specific_test_id = strtoul(optarg, NULL, 0);
+ break;
+ default:
+ error(1, 0, "%s: parse error", argv[0]);
+ }
+ }
+}
+
+int main(int argc, char **argv)
+{
+ parse_opts(argc, argv);
+
+ if (cfg_do_ipv4)
+ run_test_v4();
+ if (cfg_do_ipv6)
+ run_test_v6();
+
+ fprintf(stderr, "OK\n");
+ return 0;
+}
diff --git a/tools/testing/selftests/net/udpgso.sh b/tools/testing/selftests/net/udpgso.sh
new file mode 100755
index 0000000..fec24f5
--- /dev/null
+++ b/tools/testing/selftests/net/udpgso.sh
@@ -0,0 +1,29 @@
+#!/bin/sh
+# SPDX-License-Identifier: GPL-2.0
+#
+# Run a series of udpgso regression tests
+
+echo "ipv4 cmsg"
+./in_netns.sh ./udpgso -4 -C
+
+echo "ipv4 setsockopt"
+./in_netns.sh ./udpgso -4 -C -s
+
+echo "ipv6 cmsg"
+./in_netns.sh ./udpgso -6 -C
+
+echo "ipv6 setsockopt"
+./in_netns.sh ./udpgso -6 -C -s
+
+echo "ipv4 connected"
+./in_netns.sh ./udpgso -4 -c
+
+# blocked on 2nd loopback address
+# echo "ipv6 connected"
+# ./in_netns.sh ./udpgso -6 -c
+
+echo "ipv4 msg_more"
+./in_netns.sh ./udpgso -4 -C -m
+
+echo "ipv6 msg_more"
+./in_netns.sh ./udpgso -6 -C -m
diff --git a/tools/testing/selftests/net/udpgso_bench.sh b/tools/testing/selftests/net/udpgso_bench.sh
new file mode 100755
index 0000000..792fa4d
--- /dev/null
+++ b/tools/testing/selftests/net/udpgso_bench.sh
@@ -0,0 +1,74 @@
+#!/bin/sh
+# SPDX-License-Identifier: GPL-2.0
+#
+# Run a series of udpgso benchmarks
+
+wake_children() {
+ local -r jobs="$(jobs -p)"
+
+ if [[ "${jobs}" != "" ]]; then
+ kill -1 ${jobs} 2>/dev/null
+ fi
+}
+trap wake_children EXIT
+
+run_one() {
+ local -r args=$@
+
+ ./udpgso_bench_rx &
+ ./udpgso_bench_rx -t &
+
+ ./udpgso_bench_tx ${args}
+}
+
+run_in_netns() {
+ local -r args=$@
+
+ ./in_netns.sh $0 __subprocess ${args}
+}
+
+run_udp() {
+ local -r args=$@
+
+ echo "udp"
+ run_in_netns ${args}
+
+ echo "udp gso"
+ run_in_netns ${args} -S
+
+ echo "udp gso zerocopy"
+ run_in_netns ${args} -S -z
+}
+
+run_tcp() {
+ local -r args=$@
+
+ echo "tcp"
+ run_in_netns ${args} -t
+
+ echo "tcp zerocopy"
+ run_in_netns ${args} -t -z
+}
+
+run_all() {
+ local -r core_args="-l 4"
+ local -r ipv4_args="${core_args} -4 -D 127.0.0.1"
+ local -r ipv6_args="${core_args} -6 -D ::1"
+
+ echo "ipv4"
+ run_tcp "${ipv4_args}"
+ run_udp "${ipv4_args}"
+
+ echo "ipv6"
+ run_tcp "${ipv4_args}"
+ run_udp "${ipv6_args}"
+}
+
+if [[ $# -eq 0 ]]; then
+ run_all
+elif [[ $1 == "__subprocess" ]]; then
+ shift
+ run_one $@
+else
+ run_in_netns $@
+fi
diff --git a/tools/testing/selftests/net/udpgso_bench_rx.c b/tools/testing/selftests/net/udpgso_bench_rx.c
new file mode 100644
index 0000000..727cf67
--- /dev/null
+++ b/tools/testing/selftests/net/udpgso_bench_rx.c
@@ -0,0 +1,265 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#define _GNU_SOURCE
+
+#include <arpa/inet.h>
+#include <error.h>
+#include <errno.h>
+#include <limits.h>
+#include <linux/errqueue.h>
+#include <linux/if_packet.h>
+#include <linux/socket.h>
+#include <linux/sockios.h>
+#include <net/ethernet.h>
+#include <net/if.h>
+#include <netinet/ip.h>
+#include <netinet/ip6.h>
+#include <netinet/tcp.h>
+#include <netinet/udp.h>
+#include <poll.h>
+#include <sched.h>
+#include <stdbool.h>
+#include <stdio.h>
+#include <stdint.h>
+#include <stdlib.h>
+#include <string.h>
+#include <sys/ioctl.h>
+#include <sys/socket.h>
+#include <sys/stat.h>
+#include <sys/time.h>
+#include <sys/types.h>
+#include <sys/wait.h>
+#include <unistd.h>
+
+static int cfg_port = 8000;
+static bool cfg_tcp;
+static bool cfg_verify;
+
+static bool interrupted;
+static unsigned long packets, bytes;
+
+static void sigint_handler(int signum)
+{
+ if (signum == SIGINT)
+ interrupted = true;
+}
+
+static unsigned long gettimeofday_ms(void)
+{
+ struct timeval tv;
+
+ gettimeofday(&tv, NULL);
+ return (tv.tv_sec * 1000) + (tv.tv_usec / 1000);
+}
+
+static void do_poll(int fd)
+{
+ struct pollfd pfd;
+ int ret;
+
+ pfd.events = POLLIN;
+ pfd.revents = 0;
+ pfd.fd = fd;
+
+ do {
+ ret = poll(&pfd, 1, 10);
+ if (ret == -1)
+ error(1, errno, "poll");
+ if (ret == 0)
+ continue;
+ if (pfd.revents != POLLIN)
+ error(1, errno, "poll: 0x%x expected 0x%x\n",
+ pfd.revents, POLLIN);
+ } while (!ret && !interrupted);
+}
+
+static int do_socket(bool do_tcp)
+{
+ struct sockaddr_in6 addr = {0};
+ int fd, val;
+
+ fd = socket(PF_INET6, cfg_tcp ? SOCK_STREAM : SOCK_DGRAM, 0);
+ if (fd == -1)
+ error(1, errno, "socket");
+
+ val = 1 << 21;
+ if (setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &val, sizeof(val)))
+ error(1, errno, "setsockopt rcvbuf");
+ val = 1;
+ if (setsockopt(fd, SOL_SOCKET, SO_REUSEPORT, &val, sizeof(val)))
+ error(1, errno, "setsockopt reuseport");
+
+ addr.sin6_family = PF_INET6;
+ addr.sin6_port = htons(cfg_port);
+ addr.sin6_addr = in6addr_any;
+ if (bind(fd, (void *) &addr, sizeof(addr)))
+ error(1, errno, "bind");
+
+ if (do_tcp) {
+ int accept_fd = fd;
+
+ if (listen(accept_fd, 1))
+ error(1, errno, "listen");
+
+ do_poll(accept_fd);
+
+ fd = accept(accept_fd, NULL, NULL);
+ if (fd == -1)
+ error(1, errno, "accept");
+ if (close(accept_fd))
+ error(1, errno, "close accept fd");
+ }
+
+ return fd;
+}
+
+/* Flush all outstanding bytes for the tcp receive queue */
+static void do_flush_tcp(int fd)
+{
+ int ret;
+
+ while (true) {
+ /* MSG_TRUNC flushes up to len bytes */
+ ret = recv(fd, NULL, 1 << 21, MSG_TRUNC | MSG_DONTWAIT);
+ if (ret == -1 && errno == EAGAIN)
+ return;
+ if (ret == -1)
+ error(1, errno, "flush");
+ if (ret == 0) {
+ /* client detached */
+ exit(0);
+ }
+
+ packets++;
+ bytes += ret;
+ }
+
+}
+
+static char sanitized_char(char val)
+{
+ return (val >= 'a' && val <= 'z') ? val : '.';
+}
+
+static void do_verify_udp(const char *data, int len)
+{
+ char cur = data[0];
+ int i;
+
+ /* verify contents */
+ if (cur < 'a' || cur > 'z')
+ error(1, 0, "data initial byte out of range");
+
+ for (i = 1; i < len; i++) {
+ if (cur == 'z')
+ cur = 'a';
+ else
+ cur++;
+
+ if (data[i] != cur)
+ error(1, 0, "data[%d]: len %d, %c(%hhu) != %c(%hhu)\n",
+ i, len,
+ sanitized_char(data[i]), data[i],
+ sanitized_char(cur), cur);
+ }
+}
+
+/* Flush all outstanding datagrams. Verify first few bytes of each. */
+static void do_flush_udp(int fd)
+{
+ static char rbuf[ETH_DATA_LEN];
+ int ret, len, budget = 256;
+
+ len = cfg_verify ? sizeof(rbuf) : 0;
+ while (budget--) {
+ /* MSG_TRUNC will make return value full datagram length */
+ ret = recv(fd, rbuf, len, MSG_TRUNC | MSG_DONTWAIT);
+ if (ret == -1 && errno == EAGAIN)
+ return;
+ if (ret == -1)
+ error(1, errno, "recv");
+ if (len) {
+ if (ret == 0)
+ error(1, errno, "recv: 0 byte datagram\n");
+
+ do_verify_udp(rbuf, ret);
+ }
+
+ packets++;
+ bytes += ret;
+ }
+}
+
+static void usage(const char *filepath)
+{
+ error(1, 0, "Usage: %s [-tv] [-p port]", filepath);
+}
+
+static void parse_opts(int argc, char **argv)
+{
+ int c;
+
+ while ((c = getopt(argc, argv, "ptv")) != -1) {
+ switch (c) {
+ case 'p':
+ cfg_port = htons(strtoul(optarg, NULL, 0));
+ break;
+ case 't':
+ cfg_tcp = true;
+ break;
+ case 'v':
+ cfg_verify = true;
+ break;
+ }
+ }
+
+ if (optind != argc)
+ usage(argv[0]);
+
+ if (cfg_tcp && cfg_verify)
+ error(1, 0, "TODO: implement verify mode for tcp");
+}
+
+static void do_recv(void)
+{
+ unsigned long tnow, treport;
+ int fd;
+
+ fd = do_socket(cfg_tcp);
+
+ treport = gettimeofday_ms() + 1000;
+ do {
+ do_poll(fd);
+
+ if (cfg_tcp)
+ do_flush_tcp(fd);
+ else
+ do_flush_udp(fd);
+
+ tnow = gettimeofday_ms();
+ if (tnow > treport) {
+ if (packets)
+ fprintf(stderr,
+ "%s rx: %6lu MB/s %8lu calls/s\n",
+ cfg_tcp ? "tcp" : "udp",
+ bytes >> 20, packets);
+ bytes = packets = 0;
+ treport = tnow + 1000;
+ }
+
+ } while (!interrupted);
+
+ if (close(fd))
+ error(1, errno, "close");
+}
+
+int main(int argc, char **argv)
+{
+ parse_opts(argc, argv);
+
+ signal(SIGINT, sigint_handler);
+
+ do_recv();
+
+ return 0;
+}
diff --git a/tools/testing/selftests/net/udpgso_bench_tx.c b/tools/testing/selftests/net/udpgso_bench_tx.c
new file mode 100644
index 0000000..e821564
--- /dev/null
+++ b/tools/testing/selftests/net/udpgso_bench_tx.c
@@ -0,0 +1,420 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#define _GNU_SOURCE
+
+#include <arpa/inet.h>
+#include <errno.h>
+#include <error.h>
+#include <netinet/if_ether.h>
+#include <netinet/in.h>
+#include <netinet/ip.h>
+#include <netinet/ip6.h>
+#include <netinet/udp.h>
+#include <poll.h>
+#include <sched.h>
+#include <signal.h>
+#include <stdbool.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <sys/socket.h>
+#include <sys/time.h>
+#include <sys/types.h>
+#include <unistd.h>
+
+#ifndef ETH_MAX_MTU
+#define ETH_MAX_MTU 0xFFFFU
+#endif
+
+#ifndef UDP_SEGMENT
+#define UDP_SEGMENT 103
+#endif
+
+#ifndef SO_ZEROCOPY
+#define SO_ZEROCOPY 60
+#endif
+
+#ifndef MSG_ZEROCOPY
+#define MSG_ZEROCOPY 0x4000000
+#endif
+
+#define NUM_PKT 100
+
+static bool cfg_cache_trash;
+static int cfg_cpu = -1;
+static int cfg_connected = true;
+static int cfg_family = PF_UNSPEC;
+static uint16_t cfg_mss;
+static int cfg_payload_len = (1472 * 42);
+static int cfg_port = 8000;
+static int cfg_runtime_ms = -1;
+static bool cfg_segment;
+static bool cfg_sendmmsg;
+static bool cfg_tcp;
+static bool cfg_zerocopy;
+
+static socklen_t cfg_alen;
+static struct sockaddr_storage cfg_dst_addr;
+
+static bool interrupted;
+static char buf[NUM_PKT][ETH_MAX_MTU];
+
+static void sigint_handler(int signum)
+{
+ if (signum == SIGINT)
+ interrupted = true;
+}
+
+static unsigned long gettimeofday_ms(void)
+{
+ struct timeval tv;
+
+ gettimeofday(&tv, NULL);
+ return (tv.tv_sec * 1000) + (tv.tv_usec / 1000);
+}
+
+static int set_cpu(int cpu)
+{
+ cpu_set_t mask;
+
+ CPU_ZERO(&mask);
+ CPU_SET(cpu, &mask);
+ if (sched_setaffinity(0, sizeof(mask), &mask))
+ error(1, 0, "setaffinity %d", cpu);
+
+ return 0;
+}
+
+static void setup_sockaddr(int domain, const char *str_addr, void *sockaddr)
+{
+ struct sockaddr_in6 *addr6 = (void *) sockaddr;
+ struct sockaddr_in *addr4 = (void *) sockaddr;
+
+ switch (domain) {
+ case PF_INET:
+ addr4->sin_family = AF_INET;
+ addr4->sin_port = htons(cfg_port);
+ if (inet_pton(AF_INET, str_addr, &(addr4->sin_addr)) != 1)
+ error(1, 0, "ipv4 parse error: %s", str_addr);
+ break;
+ case PF_INET6:
+ addr6->sin6_family = AF_INET6;
+ addr6->sin6_port = htons(cfg_port);
+ if (inet_pton(AF_INET6, str_addr, &(addr6->sin6_addr)) != 1)
+ error(1, 0, "ipv6 parse error: %s", str_addr);
+ break;
+ default:
+ error(1, 0, "illegal domain");
+ }
+}
+
+static void flush_zerocopy(int fd)
+{
+ struct msghdr msg = {0}; /* flush */
+ int ret;
+
+ while (1) {
+ ret = recvmsg(fd, &msg, MSG_ERRQUEUE);
+ if (ret == -1 && errno == EAGAIN)
+ break;
+ if (ret == -1)
+ error(1, errno, "errqueue");
+ if (msg.msg_flags != (MSG_ERRQUEUE | MSG_CTRUNC))
+ error(1, 0, "errqueue: flags 0x%x\n", msg.msg_flags);
+ msg.msg_flags = 0;
+ }
+}
+
+static int send_tcp(int fd, char *data)
+{
+ int ret, done = 0, count = 0;
+
+ while (done < cfg_payload_len) {
+ ret = send(fd, data + done, cfg_payload_len - done,
+ cfg_zerocopy ? MSG_ZEROCOPY : 0);
+ if (ret == -1)
+ error(1, errno, "write");
+
+ done += ret;
+ count++;
+ }
+
+ return count;
+}
+
+static int send_udp(int fd, char *data)
+{
+ int ret, total_len, len, count = 0;
+
+ total_len = cfg_payload_len;
+
+ while (total_len) {
+ len = total_len < cfg_mss ? total_len : cfg_mss;
+
+ ret = sendto(fd, data, len, cfg_zerocopy ? MSG_ZEROCOPY : 0,
+ cfg_connected ? NULL : (void *)&cfg_dst_addr,
+ cfg_connected ? 0 : cfg_alen);
+ if (ret == -1)
+ error(1, errno, "write");
+ if (ret != len)
+ error(1, errno, "write: %uB != %uB\n", ret, len);
+
+ total_len -= len;
+ count++;
+ }
+
+ return count;
+}
+
+static int send_udp_sendmmsg(int fd, char *data)
+{
+ const int max_nr_msg = ETH_MAX_MTU / ETH_DATA_LEN;
+ struct mmsghdr mmsgs[max_nr_msg];
+ struct iovec iov[max_nr_msg];
+ unsigned int off = 0, left;
+ int i = 0, ret;
+
+ memset(mmsgs, 0, sizeof(mmsgs));
+
+ left = cfg_payload_len;
+ while (left) {
+ if (i == max_nr_msg)
+ error(1, 0, "sendmmsg: exceeds max_nr_msg");
+
+ iov[i].iov_base = data + off;
+ iov[i].iov_len = cfg_mss < left ? cfg_mss : left;
+
+ mmsgs[i].msg_hdr.msg_iov = iov + i;
+ mmsgs[i].msg_hdr.msg_iovlen = 1;
+
+ off += iov[i].iov_len;
+ left -= iov[i].iov_len;
+ i++;
+ }
+
+ ret = sendmmsg(fd, mmsgs, i, cfg_zerocopy ? MSG_ZEROCOPY : 0);
+ if (ret == -1)
+ error(1, errno, "sendmmsg");
+
+ return ret;
+}
+
+static void send_udp_segment_cmsg(struct cmsghdr *cm)
+{
+ uint16_t *valp;
+
+ cm->cmsg_level = SOL_UDP;
+ cm->cmsg_type = UDP_SEGMENT;
+ cm->cmsg_len = CMSG_LEN(sizeof(cfg_mss));
+ valp = (void *)CMSG_DATA(cm);
+ *valp = cfg_mss;
+}
+
+static int send_udp_segment(int fd, char *data)
+{
+ char control[CMSG_SPACE(sizeof(cfg_mss))] = {0};
+ struct msghdr msg = {0};
+ struct iovec iov = {0};
+ int ret;
+
+ iov.iov_base = data;
+ iov.iov_len = cfg_payload_len;
+
+ msg.msg_iov = &iov;
+ msg.msg_iovlen = 1;
+
+ msg.msg_control = control;
+ msg.msg_controllen = sizeof(control);
+ send_udp_segment_cmsg(CMSG_FIRSTHDR(&msg));
+
+ msg.msg_name = (void *)&cfg_dst_addr;
+ msg.msg_namelen = cfg_alen;
+
+ ret = sendmsg(fd, &msg, cfg_zerocopy ? MSG_ZEROCOPY : 0);
+ if (ret == -1)
+ error(1, errno, "sendmsg");
+ if (ret != iov.iov_len)
+ error(1, 0, "sendmsg: %u != %lu\n", ret, iov.iov_len);
+
+ return 1;
+}
+
+static void usage(const char *filepath)
+{
+ error(1, 0, "Usage: %s [-46cmStuz] [-C cpu] [-D dst ip] [-l secs] [-p port] [-s sendsize]",
+ filepath);
+}
+
+static void parse_opts(int argc, char **argv)
+{
+ int max_len, hdrlen;
+ int c;
+
+ while ((c = getopt(argc, argv, "46cC:D:l:mp:s:Stuz")) != -1) {
+ switch (c) {
+ case '4':
+ if (cfg_family != PF_UNSPEC)
+ error(1, 0, "Pass one of -4 or -6");
+ cfg_family = PF_INET;
+ cfg_alen = sizeof(struct sockaddr_in);
+ break;
+ case '6':
+ if (cfg_family != PF_UNSPEC)
+ error(1, 0, "Pass one of -4 or -6");
+ cfg_family = PF_INET6;
+ cfg_alen = sizeof(struct sockaddr_in6);
+ break;
+ case 'c':
+ cfg_cache_trash = true;
+ break;
+ case 'C':
+ cfg_cpu = strtol(optarg, NULL, 0);
+ break;
+ case 'D':
+ setup_sockaddr(cfg_family, optarg, &cfg_dst_addr);
+ break;
+ case 'l':
+ cfg_runtime_ms = strtoul(optarg, NULL, 10) * 1000;
+ break;
+ case 'm':
+ cfg_sendmmsg = true;
+ break;
+ case 'p':
+ cfg_port = strtoul(optarg, NULL, 0);
+ break;
+ case 's':
+ cfg_payload_len = strtoul(optarg, NULL, 0);
+ break;
+ case 'S':
+ cfg_segment = true;
+ break;
+ case 't':
+ cfg_tcp = true;
+ break;
+ case 'u':
+ cfg_connected = false;
+ break;
+ case 'z':
+ cfg_zerocopy = true;
+ break;
+ }
+ }
+
+ if (optind != argc)
+ usage(argv[0]);
+
+ if (cfg_family == PF_UNSPEC)
+ error(1, 0, "must pass one of -4 or -6");
+ if (cfg_tcp && !cfg_connected)
+ error(1, 0, "connectionless tcp makes no sense");
+ if (cfg_segment && cfg_sendmmsg)
+ error(1, 0, "cannot combine segment offload and sendmmsg");
+
+ if (cfg_family == PF_INET)
+ hdrlen = sizeof(struct iphdr) + sizeof(struct udphdr);
+ else
+ hdrlen = sizeof(struct ip6_hdr) + sizeof(struct udphdr);
+
+ cfg_mss = ETH_DATA_LEN - hdrlen;
+ max_len = ETH_MAX_MTU - hdrlen;
+
+ if (cfg_payload_len > max_len)
+ error(1, 0, "payload length %u exceeds max %u",
+ cfg_payload_len, max_len);
+}
+
+static void set_pmtu_discover(int fd, bool is_ipv4)
+{
+ int level, name, val;
+
+ if (is_ipv4) {
+ level = SOL_IP;
+ name = IP_MTU_DISCOVER;
+ val = IP_PMTUDISC_DO;
+ } else {
+ level = SOL_IPV6;
+ name = IPV6_MTU_DISCOVER;
+ val = IPV6_PMTUDISC_DO;
+ }
+
+ if (setsockopt(fd, level, name, &val, sizeof(val)))
+ error(1, errno, "setsockopt path mtu");
+}
+
+int main(int argc, char **argv)
+{
+ unsigned long num_msgs, num_sends;
+ unsigned long tnow, treport, tstop;
+ int fd, i, val;
+
+ parse_opts(argc, argv);
+
+ if (cfg_cpu > 0)
+ set_cpu(cfg_cpu);
+
+ for (i = 0; i < sizeof(buf[0]); i++)
+ buf[0][i] = 'a' + (i % 26);
+ for (i = 1; i < NUM_PKT; i++)
+ memcpy(buf[i], buf[0], sizeof(buf[0]));
+
+ signal(SIGINT, sigint_handler);
+
+ fd = socket(cfg_family, cfg_tcp ? SOCK_STREAM : SOCK_DGRAM, 0);
+ if (fd == -1)
+ error(1, errno, "socket");
+
+ if (cfg_zerocopy) {
+ val = 1;
+ if (setsockopt(fd, SOL_SOCKET, SO_ZEROCOPY, &val, sizeof(val)))
+ error(1, errno, "setsockopt zerocopy");
+ }
+
+ if (cfg_connected &&
+ connect(fd, (void *)&cfg_dst_addr, cfg_alen))
+ error(1, errno, "connect");
+
+ if (cfg_segment)
+ set_pmtu_discover(fd, cfg_family == PF_INET);
+
+ num_msgs = num_sends = 0;
+ tnow = gettimeofday_ms();
+ tstop = tnow + cfg_runtime_ms;
+ treport = tnow + 1000;
+
+ i = 0;
+ do {
+ if (cfg_tcp)
+ num_sends += send_tcp(fd, buf[i]);
+ else if (cfg_segment)
+ num_sends += send_udp_segment(fd, buf[i]);
+ else if (cfg_sendmmsg)
+ num_sends += send_udp_sendmmsg(fd, buf[i]);
+ else
+ num_sends += send_udp(fd, buf[i]);
+ num_msgs++;
+
+ if (cfg_zerocopy && ((num_msgs & 0xF) == 0))
+ flush_zerocopy(fd);
+
+ tnow = gettimeofday_ms();
+ if (tnow > treport) {
+ fprintf(stderr,
+ "%s tx: %6lu MB/s %8lu calls/s %6lu msg/s\n",
+ cfg_tcp ? "tcp" : "udp",
+ (num_msgs * cfg_payload_len) >> 20,
+ num_sends, num_msgs);
+ num_msgs = num_sends = 0;
+ treport = tnow + 1000;
+ }
+
+ /* cold cache when writing buffer */
+ if (cfg_cache_trash)
+ i = ++i < NUM_PKT ? i : 0;
+
+ } while (!interrupted && (cfg_runtime_ms == -1 || tnow < tstop));
+
+ if (close(fd))
+ error(1, errno, "close");
+
+ return 0;
+}
diff --git a/tools/testing/selftests/tc-testing/tc-tests/actions/csum.json b/tools/testing/selftests/tc-testing/tc-tests/actions/csum.json
index 93cf8fe..3a2f51f 100644
--- a/tools/testing/selftests/tc-testing/tc-tests/actions/csum.json
+++ b/tools/testing/selftests/tc-testing/tc-tests/actions/csum.json
@@ -398,13 +398,83 @@
255
]
],
- "cmdUnderTest": "for i in `seq 1 32`; do cmd=\"action csum tcp continue index $i \"; args=\"$args$cmd\"; done && $TC actions add $args",
- "expExitCode": "255",
+ "cmdUnderTest": "bash -c \"for i in \\`seq 1 32\\`; do cmd=\\\"action csum tcp continue index \\$i \\\"; args=\"\\$args\\$cmd\"; done && $TC actions add \\$args\"",
+ "expExitCode": "0",
"verifyCmd": "$TC actions ls action csum",
"matchPattern": "^[ \t]+index [0-9]* ref",
"matchCount": "32",
"teardown": [
"$TC actions flush action csum"
]
+ },
+ {
+ "id": "b4e9",
+ "name": "Delete batch of 32 csum actions",
+ "category": [
+ "actions",
+ "csum"
+ ],
+ "setup": [
+ [
+ "$TC actions flush action csum",
+ 0,
+ 1,
+ 255
+ ],
+ "bash -c \"for i in \\`seq 1 32\\`; do cmd=\\\"action csum tcp continue index \\$i \\\"; args=\"\\$args\\$cmd\"; done && $TC actions add \\$args\""
+ ],
+ "cmdUnderTest": "bash -c \"for i in \\`seq 1 32\\`; do cmd=\\\"action csum index \\$i \\\"; args=\"\\$args\\$cmd\"; done && $TC actions del \\$args\"",
+ "expExitCode": "0",
+ "verifyCmd": "$TC actions list action csum",
+ "matchPattern": "^[ \t]+index [0-9]+ ref",
+ "matchCount": "0",
+ "teardown": []
+ },
+ {
+ "id": "0015",
+ "name": "Add batch of 32 csum tcp actions with large cookies",
+ "category": [
+ "actions",
+ "csum"
+ ],
+ "setup": [
+ [
+ "$TC actions flush action csum",
+ 0,
+ 1,
+ 255
+ ]
+ ],
+ "cmdUnderTest": "bash -c \"for i in \\`seq 1 32\\`; do cmd=\\\"action csum tcp continue index \\$i cookie aaabbbcccdddeee \\\"; args=\"\\$args\\$cmd\"; done && $TC actions add \\$args\"",
+ "expExitCode": "0",
+ "verifyCmd": "$TC actions ls action csum",
+ "matchPattern": "^[ \t]+index [0-9]* ref",
+ "matchCount": "32",
+ "teardown": [
+ "$TC actions flush action csum"
+ ]
+ },
+ {
+ "id": "989e",
+ "name": "Delete batch of 32 csum actions with large cookies",
+ "category": [
+ "actions",
+ "csum"
+ ],
+ "setup": [
+ [
+ "$TC actions flush action csum",
+ 0,
+ 1,
+ 255
+ ],
+ "bash -c \"for i in \\`seq 1 32\\`; do cmd=\\\"action csum tcp continue index \\$i cookie aaabbbcccdddeee \\\"; args=\"\\$args\\$cmd\"; done && $TC actions add \\$args\""
+ ],
+ "cmdUnderTest": "bash -c \"for i in \\`seq 1 32\\`; do cmd=\\\"action csum index \\$i \\\"; args=\"\\$args\\$cmd\"; done && $TC actions del \\$args\"",
+ "expExitCode": "0",
+ "verifyCmd": "$TC actions list action csum",
+ "matchPattern": "^[ \t]+index [0-9]+ ref",
+ "matchCount": "0",
+ "teardown": []
}
]
diff --git a/tools/testing/selftests/tc-testing/tc-tests/actions/ife.json b/tools/testing/selftests/tc-testing/tc-tests/actions/ife.json
index 9f34f07..de97e4f 100644
--- a/tools/testing/selftests/tc-testing/tc-tests/actions/ife.json
+++ b/tools/testing/selftests/tc-testing/tc-tests/actions/ife.json
@@ -1,7 +1,7 @@
[
{
- "id": "a568",
- "name": "Add action with ife type",
+ "id": "7682",
+ "name": "Create valid ife encode action with mark and pass control",
"category": [
"actions",
"ife"
@@ -12,21 +12,20 @@
0,
1,
255
- ],
- "$TC actions add action ife encode type 0xDEAD index 1"
+ ]
],
- "cmdUnderTest": "$TC actions get action ife index 1",
+ "cmdUnderTest": "$TC actions add action ife encode allow mark pass index 2",
"expExitCode": "0",
- "verifyCmd": "$TC actions get action ife index 1",
- "matchPattern": "type 0xDEAD",
+ "verifyCmd": "$TC actions get action ife index 2",
+ "matchPattern": "action order [0-9]*: ife encode action pass.*type 0xED3E.*allow mark.*index 2",
"matchCount": "1",
"teardown": [
"$TC actions flush action ife"
]
},
{
- "id": "b983",
- "name": "Add action without ife type",
+ "id": "ef47",
+ "name": "Create valid ife encode action with mark and pipe control",
"category": [
"actions",
"ife"
@@ -37,16 +36,1029 @@
0,
1,
255
- ],
- "$TC actions add action ife encode index 1"
+ ]
],
- "cmdUnderTest": "$TC actions get action ife index 1",
+ "cmdUnderTest": "$TC actions add action ife encode use mark 10 pipe index 2",
"expExitCode": "0",
- "verifyCmd": "$TC actions get action ife index 1",
- "matchPattern": "type 0xED3E",
+ "verifyCmd": "$TC actions get action ife index 2",
+ "matchPattern": "action order [0-9]*: ife encode action pipe.*type 0xED3E.*use mark.*index 2",
"matchCount": "1",
"teardown": [
"$TC actions flush action ife"
]
+ },
+ {
+ "id": "df43",
+ "name": "Create valid ife encode action with mark and continue control",
+ "category": [
+ "actions",
+ "ife"
+ ],
+ "setup": [
+ [
+ "$TC actions flush action ife",
+ 0,
+ 1,
+ 255
+ ]
+ ],
+ "cmdUnderTest": "$TC actions add action ife encode allow mark continue index 2",
+ "expExitCode": "0",
+ "verifyCmd": "$TC actions get action ife index 2",
+ "matchPattern": "action order [0-9]*: ife encode action continue.*type 0xED3E.*allow mark.*index 2",
+ "matchCount": "1",
+ "teardown": [
+ "$TC actions flush action ife"
+ ]
+ },
+ {
+ "id": "e4cf",
+ "name": "Create valid ife encode action with mark and drop control",
+ "category": [
+ "actions",
+ "ife"
+ ],
+ "setup": [
+ [
+ "$TC actions flush action ife",
+ 0,
+ 1,
+ 255
+ ]
+ ],
+ "cmdUnderTest": "$TC actions add action ife encode use mark 789 drop index 2",
+ "expExitCode": "0",
+ "verifyCmd": "$TC actions get action ife index 2",
+ "matchPattern": "action order [0-9]*: ife encode action drop.*type 0xED3E.*use mark 789.*index 2",
+ "matchCount": "1",
+ "teardown": [
+ "$TC actions flush action ife"
+ ]
+ },
+ {
+ "id": "ccba",
+ "name": "Create valid ife encode action with mark and reclassify control",
+ "category": [
+ "actions",
+ "ife"
+ ],
+ "setup": [
+ [
+ "$TC actions flush action ife",
+ 0,
+ 1,
+ 255
+ ]
+ ],
+ "cmdUnderTest": "$TC actions add action ife encode use mark 656768 reclassify index 2",
+ "expExitCode": "0",
+ "verifyCmd": "$TC actions get action ife index 2",
+ "matchPattern": "action order [0-9]*: ife encode action reclassify.*type 0xED3E.*use mark 656768.*index 2",
+ "matchCount": "1",
+ "teardown": [
+ "$TC actions flush action ife"
+ ]
+ },
+ {
+ "id": "a1cf",
+ "name": "Create valid ife encode action with mark and jump control",
+ "category": [
+ "actions",
+ "ife"
+ ],
+ "setup": [
+ [
+ "$TC actions flush action ife",
+ 0,
+ 1,
+ 255
+ ]
+ ],
+ "cmdUnderTest": "$TC actions add action ife encode use mark 65 jump 1 index 2",
+ "expExitCode": "0",
+ "verifyCmd": "$TC actions get action ife index 2",
+ "matchPattern": "action order [0-9]*: ife encode action jump 1.*type 0xED3E.*use mark 65.*index 2",
+ "matchCount": "1",
+ "teardown": [
+ "$TC actions flush action ife"
+ ]
+ },
+ {
+ "id": "cb3d",
+ "name": "Create valid ife encode action with mark value at 32-bit maximum",
+ "category": [
+ "actions",
+ "ife"
+ ],
+ "setup": [
+ [
+ "$TC actions flush action ife",
+ 0,
+ 1,
+ 255
+ ]
+ ],
+ "cmdUnderTest": "$TC actions add action ife encode use mark 4294967295 reclassify index 90",
+ "expExitCode": "0",
+ "verifyCmd": "$TC actions get action ife index 90",
+ "matchPattern": "action order [0-9]*: ife encode action reclassify.*type 0xED3E.*use mark 4294967295.*index 90",
+ "matchCount": "1",
+ "teardown": [
+ "$TC actions flush action ife"
+ ]
+ },
+ {
+ "id": "1efb",
+ "name": "Create ife encode action with mark value exceeding 32-bit maximum",
+ "category": [
+ "actions",
+ "ife"
+ ],
+ "setup": [
+ [
+ "$TC actions flush action ife",
+ 0,
+ 1,
+ 255
+ ]
+ ],
+ "cmdUnderTest": "$TC actions add action ife encode use mark 4294967295999 pipe index 90",
+ "expExitCode": "255",
+ "verifyCmd": "$TC actions get action ife index 90",
+ "matchPattern": "action order [0-9]*: ife encode action pipe.*type 0xED3E.*use mark 4294967295999.*index 90",
+ "matchCount": "0",
+ "teardown": []
+ },
+ {
+ "id": "95ed",
+ "name": "Create valid ife encode action with prio and pass control",
+ "category": [
+ "actions",
+ "ife"
+ ],
+ "setup": [
+ [
+ "$TC actions flush action ife",
+ 0,
+ 1,
+ 255
+ ]
+ ],
+ "cmdUnderTest": "$TC actions add action ife encode allow prio pass index 9",
+ "expExitCode": "0",
+ "verifyCmd": "$TC actions get action ife index 9",
+ "matchPattern": "action order [0-9]*: ife encode action pass.*type 0xED3E.*allow prio.*index 9",
+ "matchCount": "1",
+ "teardown": [
+ "$TC actions flush action ife"
+ ]
+ },
+ {
+ "id": "aa17",
+ "name": "Create valid ife encode action with prio and pipe control",
+ "category": [
+ "actions",
+ "ife"
+ ],
+ "setup": [
+ [
+ "$TC actions flush action ife",
+ 0,
+ 1,
+ 255
+ ]
+ ],
+ "cmdUnderTest": "$TC actions add action ife encode use prio 7 pipe index 9",
+ "expExitCode": "0",
+ "verifyCmd": "$TC actions get action ife index 9",
+ "matchPattern": "action order [0-9]*: ife encode action pipe.*type 0xED3E.*use prio 7.*index 9",
+ "matchCount": "1",
+ "teardown": [
+ "$TC actions flush action ife"
+ ]
+ },
+ {
+ "id": "74c7",
+ "name": "Create valid ife encode action with prio and continue control",
+ "category": [
+ "actions",
+ "ife"
+ ],
+ "setup": [
+ [
+ "$TC actions flush action ife",
+ 0,
+ 1,
+ 255
+ ]
+ ],
+ "cmdUnderTest": "$TC actions add action ife encode use prio 3 continue index 9",
+ "expExitCode": "0",
+ "verifyCmd": "$TC actions get action ife index 9",
+ "matchPattern": "action order [0-9]*: ife encode action continue.*type 0xED3E.*use prio 3.*index 9",
+ "matchCount": "1",
+ "teardown": [
+ "$TC actions flush action ife"
+ ]
+ },
+ {
+ "id": "7a97",
+ "name": "Create valid ife encode action with prio and drop control",
+ "category": [
+ "actions",
+ "ife"
+ ],
+ "setup": [
+ [
+ "$TC actions flush action ife",
+ 0,
+ 1,
+ 255
+ ]
+ ],
+ "cmdUnderTest": "$TC actions add action ife encode allow prio drop index 9",
+ "expExitCode": "0",
+ "verifyCmd": "$TC actions get action ife index 9",
+ "matchPattern": "action order [0-9]*: ife encode action drop.*type 0xED3E.*allow prio.*index 9",
+ "matchCount": "1",
+ "teardown": [
+ "$TC actions flush action ife"
+ ]
+ },
+ {
+ "id": "f66b",
+ "name": "Create valid ife encode action with prio and reclassify control",
+ "category": [
+ "actions",
+ "ife"
+ ],
+ "setup": [
+ [
+ "$TC actions flush action ife",
+ 0,
+ 1,
+ 255
+ ]
+ ],
+ "cmdUnderTest": "$TC actions add action ife encode use prio 998877 reclassify index 9",
+ "expExitCode": "0",
+ "verifyCmd": "$TC actions get action ife index 9",
+ "matchPattern": "action order [0-9]*: ife encode action reclassify.*type 0xED3E.*use prio 998877.*index 9",
+ "matchCount": "1",
+ "teardown": [
+ "$TC actions flush action ife"
+ ]
+ },
+ {
+ "id": "3056",
+ "name": "Create valid ife encode action with prio and jump control",
+ "category": [
+ "actions",
+ "ife"
+ ],
+ "setup": [
+ [
+ "$TC actions flush action ife",
+ 0,
+ 1,
+ 255
+ ]
+ ],
+ "cmdUnderTest": "$TC actions add action ife encode use prio 998877 jump 10 index 9",
+ "expExitCode": "0",
+ "verifyCmd": "$TC actions get action ife index 9",
+ "matchPattern": "action order [0-9]*: ife encode action jump 10.*type 0xED3E.*use prio 998877.*index 9",
+ "matchCount": "1",
+ "teardown": [
+ "$TC actions flush action ife"
+ ]
+ },
+ {
+ "id": "7dd3",
+ "name": "Create valid ife encode action with prio value at 32-bit maximum",
+ "category": [
+ "actions",
+ "ife"
+ ],
+ "setup": [
+ [
+ "$TC actions flush action ife",
+ 0,
+ 1,
+ 255
+ ]
+ ],
+ "cmdUnderTest": "$TC actions add action ife encode use prio 4294967295 reclassify index 99",
+ "expExitCode": "0",
+ "verifyCmd": "$TC actions get action ife index 99",
+ "matchPattern": "action order [0-9]*: ife encode action reclassify.*type 0xED3E.*use prio 4294967295.*index 99",
+ "matchCount": "1",
+ "teardown": [
+ "$TC actions flush action ife"
+ ]
+ },
+ {
+ "id": "2ca1",
+ "name": "Create ife encode action with prio value exceeding 32-bit maximum",
+ "category": [
+ "actions",
+ "ife"
+ ],
+ "setup": [
+ [
+ "$TC actions flush action ife",
+ 0,
+ 1,
+ 255
+ ]
+ ],
+ "cmdUnderTest": "$TC actions add action ife encode use prio 4294967298 pipe index 99",
+ "expExitCode": "255",
+ "verifyCmd": "$TC actions get action ife index 99",
+ "matchPattern": "action order [0-9]*: ife encode action pipe.*type 0xED3E.*use prio 4294967298.*index 99",
+ "matchCount": "0",
+ "teardown": []
+ },
+ {
+ "id": "05bb",
+ "name": "Create valid ife encode action with tcindex and pass control",
+ "category": [
+ "actions",
+ "ife"
+ ],
+ "setup": [
+ [
+ "$TC actions flush action ife",
+ 0,
+ 1,
+ 255
+ ]
+ ],
+ "cmdUnderTest": "$TC actions add action ife encode allow tcindex pass index 1",
+ "expExitCode": "0",
+ "verifyCmd": "$TC actions get action ife index 1",
+ "matchPattern": "action order [0-9]*: ife encode action pass.*type 0xED3E.*allow tcindex.*index 1",
+ "matchCount": "1",
+ "teardown": [
+ "$TC actions flush action ife"
+ ]
+ },
+ {
+ "id": "ce65",
+ "name": "Create valid ife encode action with tcindex and pipe control",
+ "category": [
+ "actions",
+ "ife"
+ ],
+ "setup": [
+ [
+ "$TC actions flush action ife",
+ 0,
+ 1,
+ 255
+ ]
+ ],
+ "cmdUnderTest": "$TC actions add action ife encode use tcindex 111 pipe index 1",
+ "expExitCode": "0",
+ "verifyCmd": "$TC actions get action ife index 1",
+ "matchPattern": "action order [0-9]*: ife encode action pipe.*type 0xED3E.*use tcindex 111.*index 1",
+ "matchCount": "1",
+ "teardown": [
+ "$TC actions flush action ife"
+ ]
+ },
+ {
+ "id": "09cd",
+ "name": "Create valid ife encode action with tcindex and continue control",
+ "category": [
+ "actions",
+ "ife"
+ ],
+ "setup": [
+ [
+ "$TC actions flush action ife",
+ 0,
+ 1,
+ 255
+ ]
+ ],
+ "cmdUnderTest": "$TC actions add action ife encode use tcindex 1 continue index 1",
+ "expExitCode": "0",
+ "verifyCmd": "$TC actions get action ife index 1",
+ "matchPattern": "action order [0-9]*: ife encode action continue.*type 0xED3E.*use tcindex 1.*index 1",
+ "matchCount": "1",
+ "teardown": [
+ "$TC actions flush action ife"
+ ]
+ },
+ {
+ "id": "8eb5",
+ "name": "Create valid ife encode action with tcindex and continue control",
+ "category": [
+ "actions",
+ "ife"
+ ],
+ "setup": [
+ [
+ "$TC actions flush action ife",
+ 0,
+ 1,
+ 255
+ ]
+ ],
+ "cmdUnderTest": "$TC actions add action ife encode use tcindex 1 continue index 1",
+ "expExitCode": "0",
+ "verifyCmd": "$TC actions get action ife index 1",
+ "matchPattern": "action order [0-9]*: ife encode action continue.*type 0xED3E.*use tcindex 1.*index 1",
+ "matchCount": "1",
+ "teardown": [
+ "$TC actions flush action ife"
+ ]
+ },
+ {
+ "id": "451a",
+ "name": "Create valid ife encode action with tcindex and drop control",
+ "category": [
+ "actions",
+ "ife"
+ ],
+ "setup": [
+ [
+ "$TC actions flush action ife",
+ 0,
+ 1,
+ 255
+ ]
+ ],
+ "cmdUnderTest": "$TC actions add action ife encode allow tcindex drop index 77",
+ "expExitCode": "0",
+ "verifyCmd": "$TC actions get action ife index 77",
+ "matchPattern": "action order [0-9]*: ife encode action drop.*type 0xED3E.*allow tcindex.*index 77",
+ "matchCount": "1",
+ "teardown": [
+ "$TC actions flush action ife"
+ ]
+ },
+ {
+ "id": "d76c",
+ "name": "Create valid ife encode action with tcindex and reclassify control",
+ "category": [
+ "actions",
+ "ife"
+ ],
+ "setup": [
+ [
+ "$TC actions flush action ife",
+ 0,
+ 1,
+ 255
+ ]
+ ],
+ "cmdUnderTest": "$TC actions add action ife encode allow tcindex reclassify index 77",
+ "expExitCode": "0",
+ "verifyCmd": "$TC actions get action ife index 77",
+ "matchPattern": "action order [0-9]*: ife encode action reclassify.*type 0xED3E.*allow tcindex.*index 77",
+ "matchCount": "1",
+ "teardown": [
+ "$TC actions flush action ife"
+ ]
+ },
+ {
+ "id": "e731",
+ "name": "Create valid ife encode action with tcindex and jump control",
+ "category": [
+ "actions",
+ "ife"
+ ],
+ "setup": [
+ [
+ "$TC actions flush action ife",
+ 0,
+ 1,
+ 255
+ ]
+ ],
+ "cmdUnderTest": "$TC actions add action ife encode allow tcindex jump 999 index 77",
+ "expExitCode": "0",
+ "verifyCmd": "$TC actions get action ife index 77",
+ "matchPattern": "action order [0-9]*: ife encode action jump 999.*type 0xED3E.*allow tcindex.*index 77",
+ "matchCount": "1",
+ "teardown": [
+ "$TC actions flush action ife"
+ ]
+ },
+ {
+ "id": "b7b8",
+ "name": "Create valid ife encode action with tcindex value at 16-bit maximum",
+ "category": [
+ "actions",
+ "ife"
+ ],
+ "setup": [
+ [
+ "$TC actions flush action ife",
+ 0,
+ 1,
+ 255
+ ]
+ ],
+ "cmdUnderTest": "$TC actions add action ife encode use tcindex 65535 pass index 1",
+ "expExitCode": "0",
+ "verifyCmd": "$TC actions get action ife index 1",
+ "matchPattern": "action order [0-9]*: ife encode action pass.*type 0xED3E.*use tcindex 65535.*index 1",
+ "matchCount": "1",
+ "teardown": [
+ "$TC actions flush action skbedit"
+ ]
+ },
+ {
+ "id": "d0d8",
+ "name": "Create ife encode action with tcindex value exceeding 16-bit maximum",
+ "category": [
+ "actions",
+ "ife"
+ ],
+ "setup": [
+ [
+ "$TC actions flush action ife",
+ 0,
+ 1,
+ 255
+ ]
+ ],
+ "cmdUnderTest": "$TC actions add action ife encode use tcindex 65539 pipe index 1",
+ "expExitCode": "255",
+ "verifyCmd": "$TC actions get action ife index 1",
+ "matchPattern": "action order [0-9]*: ife encode action pipe.*type 0xED3E.*use tcindex 65539.*index 1",
+ "matchCount": "0",
+ "teardown": []
+ },
+ {
+ "id": "2a9c",
+ "name": "Create valid ife encode action with mac src parameter",
+ "category": [
+ "actions",
+ "ife"
+ ],
+ "setup": [
+ [
+ "$TC actions flush action ife",
+ 0,
+ 1,
+ 255
+ ]
+ ],
+ "cmdUnderTest": "$TC actions add action ife encode allow mark src 00:11:22:33:44:55 pipe index 1",
+ "expExitCode": "0",
+ "verifyCmd": "$TC actions get action ife index 1",
+ "matchPattern": "action order [0-9]*: ife encode action pipe.*type 0xED3E.*allow mark src 00:11:22:33:44:55.*index 1",
+ "matchCount": "1",
+ "teardown": [
+ "$TC actions flush action ife"
+ ]
+ },
+ {
+ "id": "cf5c",
+ "name": "Create valid ife encode action with mac dst parameter",
+ "category": [
+ "actions",
+ "ife"
+ ],
+ "setup": [
+ [
+ "$TC actions flush action ife",
+ 0,
+ 1,
+ 255
+ ]
+ ],
+ "cmdUnderTest": "$TC actions add action ife encode use prio 9876 dst 00:11:22:33:44:55 reclassify index 1",
+ "expExitCode": "0",
+ "verifyCmd": "$TC actions get action ife index 1",
+ "matchPattern": "action order [0-9]*: ife encode action reclassify.*type 0xED3E.*use prio 9876 dst 00:11:22:33:44:55.*index 1",
+ "matchCount": "1",
+ "teardown": [
+ "$TC actions flush action ife"
+ ]
+ },
+ {
+ "id": "2353",
+ "name": "Create valid ife encode action with mac src and mac dst parameters",
+ "category": [
+ "actions",
+ "ife"
+ ],
+ "setup": [
+ [
+ "$TC actions flush action ife",
+ 0,
+ 1,
+ 255
+ ]
+ ],
+ "cmdUnderTest": "$TC actions add action ife encode allow tcindex src 00:aa:bb:cc:dd:ee dst 00:11:22:33:44:55 pass index 11",
+ "expExitCode": "0",
+ "verifyCmd": "$TC actions get action ife index 11",
+ "matchPattern": "action order [0-9]*: ife encode action pass.*type 0xED3E.*allow tcindex dst 00:11:22:33:44:55 src 00:aa:bb:cc:dd:ee .*index 11",
+ "matchCount": "1",
+ "teardown": [
+ "$TC actions flush action ife"
+ ]
+ },
+ {
+ "id": "552c",
+ "name": "Create valid ife encode action with mark and type parameters",
+ "category": [
+ "actions",
+ "ife"
+ ],
+ "setup": [
+ [
+ "$TC actions flush action ife",
+ 0,
+ 1,
+ 255
+ ]
+ ],
+ "cmdUnderTest": "$TC actions add action ife encode use mark 7 type 0xfefe pass index 1",
+ "expExitCode": "0",
+ "verifyCmd": "$TC actions get action ife index 1",
+ "matchPattern": "action order [0-9]*: ife encode action pass.*type 0xFEFE.*use mark 7.*index 1",
+ "matchCount": "1",
+ "teardown": [
+ "$TC actions flush action ife"
+ ]
+ },
+ {
+ "id": "0421",
+ "name": "Create valid ife encode action with prio and type parameters",
+ "category": [
+ "actions",
+ "ife"
+ ],
+ "setup": [
+ [
+ "$TC actions flush action ife",
+ 0,
+ 1,
+ 255
+ ]
+ ],
+ "cmdUnderTest": "$TC actions add action ife encode use prio 444 type 0xabba pipe index 21",
+ "expExitCode": "0",
+ "verifyCmd": "$TC actions get action ife index 21",
+ "matchPattern": "action order [0-9]*: ife encode action pipe.*type 0xABBA.*use prio 444.*index 21",
+ "matchCount": "1",
+ "teardown": [
+ "$TC actions flush action ife"
+ ]
+ },
+ {
+ "id": "4017",
+ "name": "Create valid ife encode action with tcindex and type parameters",
+ "category": [
+ "actions",
+ "ife"
+ ],
+ "setup": [
+ [
+ "$TC actions flush action ife",
+ 0,
+ 1,
+ 255
+ ]
+ ],
+ "cmdUnderTest": "$TC actions add action ife encode use tcindex 5000 type 0xabcd reclassify index 21",
+ "expExitCode": "0",
+ "verifyCmd": "$TC actions get action ife index 21",
+ "matchPattern": "action order [0-9]*: ife encode action reclassify.*type 0xABCD.*use tcindex 5000.*index 21",
+ "matchCount": "1",
+ "teardown": [
+ "$TC actions flush action ife"
+ ]
+ },
+ {
+ "id": "fac3",
+ "name": "Create valid ife encode action with index at 32-bit maximnum",
+ "category": [
+ "actions",
+ "ife"
+ ],
+ "setup": [
+ [
+ "$TC actions flush action ife",
+ 0,
+ 1,
+ 255
+ ]
+ ],
+ "cmdUnderTest": "$TC actions add action ife encode allow mark pass index 4294967295",
+ "expExitCode": "0",
+ "verifyCmd": "$TC actions get action ife index 4294967295",
+ "matchPattern": "action order [0-9]*: ife encode action pass.*type 0xED3E.*allow mark.*index 4294967295",
+ "matchCount": "1",
+ "teardown": [
+ "$TC actions flush action ife"
+ ]
+ },
+ {
+ "id": "7c25",
+ "name": "Create valid ife decode action with pass control",
+ "category": [
+ "actions",
+ "ife"
+ ],
+ "setup": [
+ [
+ "$TC actions flush action ife",
+ 0,
+ 1,
+ 255
+ ]
+ ],
+ "cmdUnderTest": "$TC actions add action ife decode pass index 1",
+ "expExitCode": "0",
+ "verifyCmd": "$TC actions get action ife index 1",
+ "matchPattern": "action order [0-9]*: ife decode action pass.*type 0x0.*allow mark allow tcindex allow prio.*index 1",
+ "matchCount": "1",
+ "teardown": [
+ "$TC actions flush action ife"
+ ]
+ },
+ {
+ "id": "dccb",
+ "name": "Create valid ife decode action with pipe control",
+ "category": [
+ "actions",
+ "ife"
+ ],
+ "setup": [
+ [
+ "$TC actions flush action ife",
+ 0,
+ 1,
+ 255
+ ]
+ ],
+ "cmdUnderTest": "$TC actions add action ife decode pipe index 1",
+ "expExitCode": "0",
+ "verifyCmd": "$TC actions get action ife index 1",
+ "matchPattern": "action order [0-9]*: ife decode action pipe.*type 0x0.*allow mark allow tcindex allow prio.*index 1",
+ "matchCount": "1",
+ "teardown": [
+ "$TC actions flush action ife"
+ ]
+ },
+ {
+ "id": "7bb9",
+ "name": "Create valid ife decode action with continue control",
+ "category": [
+ "actions",
+ "ife"
+ ],
+ "setup": [
+ [
+ "$TC actions flush action ife",
+ 0,
+ 1,
+ 255
+ ]
+ ],
+ "cmdUnderTest": "$TC actions add action ife decode continue index 1",
+ "expExitCode": "0",
+ "verifyCmd": "$TC actions get action ife index 1",
+ "matchPattern": "action order [0-9]*: ife decode action continue.*type 0x0.*allow mark allow tcindex allow prio.*index 1",
+ "matchCount": "1",
+ "teardown": [
+ "$TC actions flush action ife"
+ ]
+ },
+ {
+ "id": "d9ad",
+ "name": "Create valid ife decode action with drop control",
+ "category": [
+ "actions",
+ "ife"
+ ],
+ "setup": [
+ [
+ "$TC actions flush action ife",
+ 0,
+ 1,
+ 255
+ ]
+ ],
+ "cmdUnderTest": "$TC actions add action ife decode drop index 1",
+ "expExitCode": "0",
+ "verifyCmd": "$TC actions get action ife index 1",
+ "matchPattern": "action order [0-9]*: ife decode action drop.*type 0x0.*allow mark allow tcindex allow prio.*index 1",
+ "matchCount": "1",
+ "teardown": [
+ "$TC actions flush action ife"
+ ]
+ },
+ {
+ "id": "219f",
+ "name": "Create valid ife decode action with reclassify control",
+ "category": [
+ "actions",
+ "ife"
+ ],
+ "setup": [
+ [
+ "$TC actions flush action ife",
+ 0,
+ 1,
+ 255
+ ]
+ ],
+ "cmdUnderTest": "$TC actions add action ife decode reclassify index 1",
+ "expExitCode": "0",
+ "verifyCmd": "$TC actions get action ife index 1",
+ "matchPattern": "action order [0-9]*: ife decode action reclassify.*type 0x0.*allow mark allow tcindex allow prio.*index 1",
+ "matchCount": "1",
+ "teardown": [
+ "$TC actions flush action ife"
+ ]
+ },
+ {
+ "id": "8f44",
+ "name": "Create valid ife decode action with jump control",
+ "category": [
+ "actions",
+ "ife"
+ ],
+ "setup": [
+ [
+ "$TC actions flush action ife",
+ 0,
+ 1,
+ 255
+ ]
+ ],
+ "cmdUnderTest": "$TC actions add action ife decode jump 10 index 1",
+ "expExitCode": "0",
+ "verifyCmd": "$TC actions get action ife index 1",
+ "matchPattern": "action order [0-9]*: ife decode action jump 10.*type 0x0.*allow mark allow tcindex allow prio.*index 1",
+ "matchCount": "1",
+ "teardown": [
+ "$TC actions flush action ife"
+ ]
+ },
+ {
+ "id": "56cf",
+ "name": "Create ife encode action with index exceeding 32-bit maximum",
+ "category": [
+ "actions",
+ "ife"
+ ],
+ "setup": [
+ [
+ "$TC actions flush action ife",
+ 0,
+ 1,
+ 255
+ ]
+ ],
+ "cmdUnderTest": "$TC actions add action ife encode allow mark pass index 4294967295999",
+ "expExitCode": "255",
+ "verifyCmd": "$TC actions get action ife index 4294967295999",
+ "matchPattern": "action order [0-9]*: ife encode action pass.*type 0xED3E.*allow mark.*index 4294967295999",
+ "matchCount": "0",
+ "teardown": []
+ },
+ {
+ "id": "ee94",
+ "name": "Create ife encode action with invalid control",
+ "category": [
+ "actions",
+ "ife"
+ ],
+ "setup": [
+ [
+ "$TC actions flush action ife",
+ 0,
+ 1,
+ 255
+ ]
+ ],
+ "cmdUnderTest": "$TC actions add action ife encode allow mark kuka index 4",
+ "expExitCode": "255",
+ "verifyCmd": "$TC actions get action ife index 4",
+ "matchPattern": "action order [0-9]*: ife encode action kuka.*type 0xED3E.*allow mark.*index 4",
+ "matchCount": "0",
+ "teardown": []
+ },
+ {
+ "id": "b330",
+ "name": "Create ife encode action with cookie",
+ "category": [
+ "actions",
+ "ife"
+ ],
+ "setup": [
+ [
+ "$TC actions flush action ife",
+ 0,
+ 1,
+ 255
+ ]
+ ],
+ "cmdUnderTest": "$TC actions add action ife encode allow prio pipe index 4 cookie aabbccddeeff112233445566778800a1",
+ "expExitCode": "0",
+ "verifyCmd": "$TC actions get action ife index 4",
+ "matchPattern": "action order [0-9]*: ife encode action pipe.*type 0xED3E.*allow prio.*index 4.*cookie aabbccddeeff112233445566778800a1",
+ "matchCount": "1",
+ "teardown": [
+ "$TC actions flush action ife"
+ ]
+ },
+ {
+ "id": "bbc0",
+ "name": "Create ife encode action with invalid argument",
+ "category": [
+ "actions",
+ "ife"
+ ],
+ "setup": [
+ [
+ "$TC actions flush action ife",
+ 0,
+ 1,
+ 255
+ ]
+ ],
+ "cmdUnderTest": "$TC actions add action ife encode allow foo pipe index 4",
+ "expExitCode": "255",
+ "verifyCmd": "$TC actions get action ife index 4",
+ "matchPattern": "action order [0-9]*: ife encode action pipe.*type 0xED3E.*allow foo.*index 4",
+ "matchCount": "0",
+ "teardown": []
+ },
+ {
+ "id": "d54a",
+ "name": "Create ife encode action with invalid type argument",
+ "category": [
+ "actions",
+ "ife"
+ ],
+ "setup": [
+ [
+ "$TC actions flush action ife",
+ 0,
+ 1,
+ 255
+ ]
+ ],
+ "cmdUnderTest": "$TC actions add action ife encode allow prio type 70000 pipe index 4",
+ "expExitCode": "255",
+ "verifyCmd": "$TC actions get action ife index 4",
+ "matchPattern": "action order [0-9]*: ife encode action pipe.*type 0x11170.*allow prio.*index 4",
+ "matchCount": "0",
+ "teardown": []
+ },
+ {
+ "id": "7ee0",
+ "name": "Create ife encode action with invalid mac src argument",
+ "category": [
+ "actions",
+ "ife"
+ ],
+ "setup": [
+ [
+ "$TC actions flush action ife",
+ 0,
+ 1,
+ 255
+ ]
+ ],
+ "cmdUnderTest": "$TC actions add action ife encode allow prio src 00:11:22:33:44:pp pipe index 4",
+ "expExitCode": "255",
+ "verifyCmd": "$TC actions get action ife index 4",
+ "matchPattern": "action order [0-9]*: ife encode action pipe.*allow prio.*index 4",
+ "matchCount": "0",
+ "teardown": []
+ },
+ {
+ "id": "0a7d",
+ "name": "Create ife encode action with invalid mac dst argument",
+ "category": [
+ "actions",
+ "ife"
+ ],
+ "setup": [
+ [
+ "$TC actions flush action ife",
+ 0,
+ 1,
+ 255
+ ]
+ ],
+ "cmdUnderTest": "$TC actions add action ife encode allow prio dst 00.111-22:33:44:aa pipe index 4",
+ "expExitCode": "255",
+ "verifyCmd": "$TC actions get action ife index 4",
+ "matchPattern": "action order [0-9]*: ife encode action pipe.*allow prio.*index 4",
+ "matchCount": "0",
+ "teardown": []
}
]
diff --git a/tools/testing/selftests/tc-testing/tc-tests/actions/mirred.json b/tools/testing/selftests/tc-testing/tc-tests/actions/mirred.json
index 443c9b3..6e4edfa 100644
--- a/tools/testing/selftests/tc-testing/tc-tests/actions/mirred.json
+++ b/tools/testing/selftests/tc-testing/tc-tests/actions/mirred.json
@@ -340,7 +340,7 @@
},
{
"id": "8b69",
- "name": "Add mirred mirror action with maximum index",
+ "name": "Add mirred mirror action with index at 32-bit maximum",
"category": [
"actions",
"mirred"
@@ -363,6 +363,28 @@
]
},
{
+ "id": "3f66",
+ "name": "Add mirred mirror action with index exceeding 32-bit maximum",
+ "category": [
+ "actions",
+ "mirred"
+ ],
+ "setup": [
+ [
+ "$TC actions flush action mirred",
+ 0,
+ 1,
+ 255
+ ]
+ ],
+ "cmdUnderTest": "$TC actions add action mirred ingress mirror dev lo pipe index 429496729555",
+ "expExitCode": "255",
+ "verifyCmd": "$TC actions get action mirred index 429496729555",
+ "matchPattern": "action order [0-9]*: mirred \\(Ingress Mirror to device lo\\) pipe.*index 429496729555",
+ "matchCount": "0",
+ "teardown": []
+ },
+ {
"id": "a70e",
"name": "Delete mirred mirror action",
"category": [
diff --git a/tools/testing/selftests/tc-testing/tc-tests/actions/police.json b/tools/testing/selftests/tc-testing/tc-tests/actions/police.json
index 38d85a1..f03763d 100644
--- a/tools/testing/selftests/tc-testing/tc-tests/actions/police.json
+++ b/tools/testing/selftests/tc-testing/tc-tests/actions/police.json
@@ -401,11 +401,11 @@
],
"cmdUnderTest": "$TC actions add action police rate 10mbit burst 10k index 4294967295",
"expExitCode": "0",
- "verifyCmd": "$TC actions get action mirred index 4294967295",
+ "verifyCmd": "$TC actions get action police index 4294967295",
"matchPattern": "action order [0-9]*: police 0xffffffff rate 10Mbit burst 10Kb mtu 2Kb",
"matchCount": "1",
"teardown": [
- "$TC actions flush action mirred"
+ "$TC actions flush action police"
]
},
{
diff --git a/tools/testing/selftests/tc-testing/tc-tests/actions/sample.json b/tools/testing/selftests/tc-testing/tc-tests/actions/sample.json
new file mode 100644
index 0000000..3aca33c
--- /dev/null
+++ b/tools/testing/selftests/tc-testing/tc-tests/actions/sample.json
@@ -0,0 +1,588 @@
+[
+ {
+ "id": "9784",
+ "name": "Add valid sample action with mandatory arguments",
+ "category": [
+ "actions",
+ "sample"
+ ],
+ "setup": [
+ [
+ "$TC actions flush action sample",
+ 0,
+ 1,
+ 255
+ ]
+ ],
+ "cmdUnderTest": "$TC actions add action sample rate 10 group 1 index 2",
+ "expExitCode": "0",
+ "verifyCmd": "$TC actions get action sample index 2",
+ "matchPattern": "action order [0-9]+: sample rate 1/10 group 1.*index 2 ref",
+ "matchCount": "1",
+ "teardown": [
+ "$TC actions flush action sample"
+ ]
+ },
+ {
+ "id": "5c91",
+ "name": "Add valid sample action with mandatory arguments and continue control action",
+ "category": [
+ "actions",
+ "sample"
+ ],
+ "setup": [
+ [
+ "$TC actions flush action sample",
+ 0,
+ 1,
+ 255
+ ]
+ ],
+ "cmdUnderTest": "$TC actions add action sample rate 700 group 2 continue index 2",
+ "expExitCode": "0",
+ "verifyCmd": "$TC actions get action sample index 2",
+ "matchPattern": "action order [0-9]+: sample rate 1/700 group 2 continue.*index 2 ref",
+ "matchCount": "1",
+ "teardown": [
+ "$TC actions flush action sample"
+ ]
+ },
+ {
+ "id": "334b",
+ "name": "Add valid sample action with mandatory arguments and drop control action",
+ "category": [
+ "actions",
+ "sample"
+ ],
+ "setup": [
+ [
+ "$TC actions flush action sample",
+ 0,
+ 1,
+ 255
+ ]
+ ],
+ "cmdUnderTest": "$TC actions add action sample rate 10000 group 11 drop index 22",
+ "expExitCode": "0",
+ "verifyCmd": "$TC actions list action sample",
+ "matchPattern": "action order [0-9]+: sample rate 1/10000 group 11 drop.*index 22 ref",
+ "matchCount": "1",
+ "teardown": [
+ "$TC actions flush action sample"
+ ]
+ },
+ {
+ "id": "da69",
+ "name": "Add valid sample action with mandatory arguments and reclassify control action",
+ "category": [
+ "actions",
+ "sample"
+ ],
+ "setup": [
+ [
+ "$TC actions flush action sample",
+ 0,
+ 1,
+ 255
+ ]
+ ],
+ "cmdUnderTest": "$TC actions add action sample rate 20000 group 72 reclassify index 100",
+ "expExitCode": "0",
+ "verifyCmd": "$TC actions list action sample",
+ "matchPattern": "action order [0-9]+: sample rate 1/20000 group 72 reclassify.*index 100 ref",
+ "matchCount": "1",
+ "teardown": [
+ "$TC actions flush action sample"
+ ]
+ },
+ {
+ "id": "13ce",
+ "name": "Add valid sample action with mandatory arguments and pipe control action",
+ "category": [
+ "actions",
+ "sample"
+ ],
+ "setup": [
+ [
+ "$TC actions flush action sample",
+ 0,
+ 1,
+ 255
+ ]
+ ],
+ "cmdUnderTest": "$TC actions add action sample rate 20 group 2 pipe index 100",
+ "expExitCode": "0",
+ "verifyCmd": "$TC actions list action sample",
+ "matchPattern": "action order [0-9]+: sample rate 1/20 group 2 pipe.*index 100 ref",
+ "matchCount": "1",
+ "teardown": [
+ "$TC actions flush action sample"
+ ]
+ },
+ {
+ "id": "1886",
+ "name": "Add valid sample action with mandatory arguments and jump control action",
+ "category": [
+ "actions",
+ "sample"
+ ],
+ "setup": [
+ [
+ "$TC actions flush action sample",
+ 0,
+ 1,
+ 255
+ ]
+ ],
+ "cmdUnderTest": "$TC actions add action sample rate 700 group 25 jump 4 index 200",
+ "expExitCode": "0",
+ "verifyCmd": "$TC actions get action sample index 200",
+ "matchPattern": "action order [0-9]+: sample rate 1/700 group 25 jump 4.*index 200 ref",
+ "matchCount": "1",
+ "teardown": [
+ "$TC actions flush action sample"
+ ]
+ },
+ {
+ "id": "b6d4",
+ "name": "Add sample action with mandatory arguments and invalid control action",
+ "category": [
+ "actions",
+ "sample"
+ ],
+ "setup": [
+ [
+ "$TC actions flush action sample",
+ 0,
+ 1,
+ 255
+ ]
+ ],
+ "cmdUnderTest": "$TC actions add action sample rate 200000 group 52 foo index 1",
+ "expExitCode": "255",
+ "verifyCmd": "$TC actions list action sample",
+ "matchPattern": "action order [0-9]+: sample rate 1/200000 group 52 foo.*index 1 ref",
+ "matchCount": "0",
+ "teardown": []
+ },
+ {
+ "id": "a874",
+ "name": "Add invalid sample action without mandatory arguments",
+ "category": [
+ "actions",
+ "sample"
+ ],
+ "setup": [
+ [
+ "$TC actions flush action sample",
+ 0,
+ 1,
+ 255
+ ]
+ ],
+ "cmdUnderTest": "$TC actions add action sample index 1",
+ "expExitCode": "255",
+ "verifyCmd": "$TC actions list action sample",
+ "matchPattern": "action order [0-9]+: sample.*index 1 ref",
+ "matchCount": "0",
+ "teardown": []
+ },
+ {
+ "id": "ac01",
+ "name": "Add invalid sample action without mandatory argument rate",
+ "category": [
+ "actions",
+ "sample"
+ ],
+ "setup": [
+ [
+ "$TC actions flush action sample",
+ 0,
+ 1,
+ 255
+ ]
+ ],
+ "cmdUnderTest": "$TC actions add action sample group 10 index 1",
+ "expExitCode": "255",
+ "verifyCmd": "$TC actions list action sample",
+ "matchPattern": "action order [0-9]+: sample.*group 10.*index 1 ref",
+ "matchCount": "0",
+ "teardown": []
+ },
+ {
+ "id": "4203",
+ "name": "Add invalid sample action without mandatory argument group",
+ "category": [
+ "actions",
+ "sample"
+ ],
+ "setup": [
+ [
+ "$TC actions flush action sample",
+ 0,
+ 1,
+ 255
+ ]
+ ],
+ "cmdUnderTest": "$TC actions add action sample rate 100 index 10",
+ "expExitCode": "255",
+ "verifyCmd": "$TC actions get action sample index 10",
+ "matchPattern": "action order [0-9]+: sample rate 1/100.*index 10 ref",
+ "matchCount": "0",
+ "teardown": []
+ },
+ {
+ "id": "14a7",
+ "name": "Add invalid sample action without mandatory argument group",
+ "category": [
+ "actions",
+ "sample"
+ ],
+ "setup": [
+ [
+ "$TC actions flush action sample",
+ 0,
+ 1,
+ 255
+ ]
+ ],
+ "cmdUnderTest": "$TC actions add action sample rate 100 index 10",
+ "expExitCode": "255",
+ "verifyCmd": "$TC actions get action sample index 10",
+ "matchPattern": "action order [0-9]+: sample rate 1/100.*index 10 ref",
+ "matchCount": "0",
+ "teardown": []
+ },
+ {
+ "id": "8f2e",
+ "name": "Add valid sample action with trunc argument",
+ "category": [
+ "actions",
+ "sample"
+ ],
+ "setup": [
+ [
+ "$TC actions flush action sample",
+ 0,
+ 1,
+ 255
+ ]
+ ],
+ "cmdUnderTest": "$TC actions add action sample rate 1024 group 4 trunc 1024 index 10",
+ "expExitCode": "0",
+ "verifyCmd": "$TC actions get action sample index 10",
+ "matchPattern": "action order [0-9]+: sample rate 1/1024 group 4 trunc_size 1024 pipe.*index 10 ref",
+ "matchCount": "1",
+ "teardown": [
+ "$TC actions flush action sample"
+ ]
+ },
+ {
+ "id": "45f8",
+ "name": "Add sample action with maximum rate argument",
+ "category": [
+ "actions",
+ "sample"
+ ],
+ "setup": [
+ [
+ "$TC actions flush action sample",
+ 0,
+ 1,
+ 255
+ ]
+ ],
+ "cmdUnderTest": "$TC actions add action sample rate 4294967295 group 4 index 10",
+ "expExitCode": "0",
+ "verifyCmd": "$TC actions get action sample index 10",
+ "matchPattern": "action order [0-9]+: sample rate 1/4294967295 group 4 pipe.*index 10 ref",
+ "matchCount": "1",
+ "teardown": [
+ "$TC actions flush action sample"
+ ]
+ },
+ {
+ "id": "ad0c",
+ "name": "Add sample action with maximum trunc argument",
+ "category": [
+ "actions",
+ "sample"
+ ],
+ "setup": [
+ [
+ "$TC actions flush action sample",
+ 0,
+ 1,
+ 255
+ ]
+ ],
+ "cmdUnderTest": "$TC actions add action sample rate 16000 group 4 trunc 4294967295 index 10",
+ "expExitCode": "0",
+ "verifyCmd": "$TC actions get action sample index 10",
+ "matchPattern": "action order [0-9]+: sample rate 1/16000 group 4 trunc_size 4294967295 pipe.*index 10 ref",
+ "matchCount": "1",
+ "teardown": [
+ "$TC actions flush action sample"
+ ]
+ },
+ {
+ "id": "83a9",
+ "name": "Add sample action with maximum group argument",
+ "category": [
+ "actions",
+ "sample"
+ ],
+ "setup": [
+ [
+ "$TC actions flush action sample",
+ 0,
+ 1,
+ 255
+ ]
+ ],
+ "cmdUnderTest": "$TC actions add action sample rate 4294 group 4294967295 index 1",
+ "expExitCode": "0",
+ "verifyCmd": "$TC actions get action sample index 1",
+ "matchPattern": "action order [0-9]+: sample rate 1/4294 group 4294967295 pipe.*index 1 ref",
+ "matchCount": "1",
+ "teardown": [
+ "$TC actions flush action sample"
+ ]
+ },
+ {
+ "id": "ed27",
+ "name": "Add sample action with invalid rate argument",
+ "category": [
+ "actions",
+ "sample"
+ ],
+ "setup": [
+ [
+ "$TC actions flush action sample",
+ 0,
+ 1,
+ 255
+ ]
+ ],
+ "cmdUnderTest": "$TC actions add action sample rate 4294967296 group 4 index 10",
+ "expExitCode": "255",
+ "verifyCmd": "$TC actions get action sample index 10",
+ "matchPattern": "action order [0-9]+: sample rate 1/4294967296 group 4 pipe.*index 10 ref",
+ "matchCount": "0",
+ "teardown": []
+ },
+ {
+ "id": "2eae",
+ "name": "Add sample action with invalid group argument",
+ "category": [
+ "actions",
+ "sample"
+ ],
+ "setup": [
+ [
+ "$TC actions flush action sample",
+ 0,
+ 1,
+ 255
+ ]
+ ],
+ "cmdUnderTest": "$TC actions add action sample rate 4098 group 5294967299 continue index 1",
+ "expExitCode": "255",
+ "verifyCmd": "$TC actions get action sample index 1",
+ "matchPattern": "action order [0-9]+: sample rate 1/4098 group 5294967299 continue.*index 1 ref",
+ "matchCount": "0",
+ "teardown": []
+ },
+ {
+ "id": "6ff3",
+ "name": "Add sample action with invalid trunc size",
+ "category": [
+ "actions",
+ "sample"
+ ],
+ "setup": [
+ [
+ "$TC actions flush action sample",
+ 0,
+ 1,
+ 255
+ ]
+ ],
+ "cmdUnderTest": "$TC actions add action sample rate 1024 group 4 trunc 112233445566 index 11",
+ "expExitCode": "255",
+ "verifyCmd": "$TC actions get action sample index 11",
+ "matchPattern": "action order [0-9]+: sample rate 1/1024 group 4 trunc_size 112233445566.*index 11 ref",
+ "matchCount": "0",
+ "teardown": []
+ },
+ {
+ "id": "2b2a",
+ "name": "Add sample action with invalid index",
+ "category": [
+ "actions",
+ "sample"
+ ],
+ "setup": [
+ [
+ "$TC actions flush action sample",
+ 0,
+ 1,
+ 255
+ ]
+ ],
+ "cmdUnderTest": "$TC actions add action sample rate 1024 group 4 index 5294967299",
+ "expExitCode": "255",
+ "verifyCmd": "$TC actions get action sample index 5294967299",
+ "matchPattern": "action order [0-9]+: sample rate 1/1024 group 4 pipe.*index 5294967299 ref",
+ "matchCount": "0",
+ "teardown": []
+ },
+ {
+ "id": "dee2",
+ "name": "Add sample action with maximum allowed index",
+ "category": [
+ "actions",
+ "sample"
+ ],
+ "setup": [
+ [
+ "$TC actions flush action sample",
+ 0,
+ 1,
+ 255
+ ]
+ ],
+ "cmdUnderTest": "$TC actions add action sample rate 1024 group 4 index 4294967295",
+ "expExitCode": "0",
+ "verifyCmd": "$TC actions get action sample index 4294967295",
+ "matchPattern": "action order [0-9]+: sample rate 1/1024 group 4 pipe.*index 4294967295 ref",
+ "matchCount": "1",
+ "teardown": [
+ "$TC actions flush action sample"
+ ]
+ },
+ {
+ "id": "560e",
+ "name": "Add sample action with cookie",
+ "category": [
+ "actions",
+ "sample"
+ ],
+ "setup": [
+ [
+ "$TC actions flush action sample",
+ 0,
+ 1,
+ 255
+ ]
+ ],
+ "cmdUnderTest": "$TC actions add action sample rate 1024 group 4 index 45 cookie aabbccdd",
+ "expExitCode": "0",
+ "verifyCmd": "$TC actions get action sample index 45",
+ "matchPattern": "action order [0-9]+: sample rate 1/1024 group 4 pipe.*index 45.*cookie aabbccdd",
+ "matchCount": "1",
+ "teardown": [
+ "$TC actions flush action sample"
+ ]
+ },
+ {
+ "id": "704a",
+ "name": "Replace existing sample action with new rate argument",
+ "category": [
+ "actions",
+ "sample"
+ ],
+ "setup": [
+ [
+ "$TC actions flush action sample",
+ 0,
+ 1,
+ 255
+ ],
+ "$TC actions add action sample rate 1024 group 4 index 4"
+ ],
+ "cmdUnderTest": "$TC actions replace action sample rate 2048 group 4 index 4",
+ "expExitCode": "0",
+ "verifyCmd": "$TC actions list action sample",
+ "matchPattern": "action order [0-9]+: sample rate 1/2048 group 4 pipe.*index 4",
+ "matchCount": "1",
+ "teardown": [
+ "$TC actions flush action sample"
+ ]
+ },
+ {
+ "id": "60eb",
+ "name": "Replace existing sample action with new group argument",
+ "category": [
+ "actions",
+ "sample"
+ ],
+ "setup": [
+ [
+ "$TC actions flush action sample",
+ 0,
+ 1,
+ 255
+ ],
+ "$TC actions add action sample rate 1024 group 4 index 4"
+ ],
+ "cmdUnderTest": "$TC actions replace action sample rate 1024 group 7 index 4",
+ "expExitCode": "0",
+ "verifyCmd": "$TC actions list action sample",
+ "matchPattern": "action order [0-9]+: sample rate 1/1024 group 7 pipe.*index 4",
+ "matchCount": "1",
+ "teardown": [
+ "$TC actions flush action sample"
+ ]
+ },
+ {
+ "id": "2cce",
+ "name": "Replace existing sample action with new trunc argument",
+ "category": [
+ "actions",
+ "sample"
+ ],
+ "setup": [
+ [
+ "$TC actions flush action sample",
+ 0,
+ 1,
+ 255
+ ],
+ "$TC actions add action sample rate 1024 group 4 trunc 48 index 4"
+ ],
+ "cmdUnderTest": "$TC actions replace action sample rate 1024 group 7 trunc 64 index 4",
+ "expExitCode": "0",
+ "verifyCmd": "$TC actions list action sample",
+ "matchPattern": "action order [0-9]+: sample rate 1/1024 group 7 trunc_size 64 pipe.*index 4",
+ "matchCount": "1",
+ "teardown": [
+ "$TC actions flush action sample"
+ ]
+ },
+ {
+ "id": "59d1",
+ "name": "Replace existing sample action with new control argument",
+ "category": [
+ "actions",
+ "sample"
+ ],
+ "setup": [
+ [
+ "$TC actions flush action sample",
+ 0,
+ 1,
+ 255
+ ],
+ "$TC actions add action sample rate 1024 group 4 reclassify index 4"
+ ],
+ "cmdUnderTest": "$TC actions replace action sample rate 1024 group 7 pipe index 4",
+ "expExitCode": "0",
+ "verifyCmd": "$TC actions list action sample",
+ "matchPattern": "action order [0-9]+: sample rate 1/1024 group 7 pipe.*index 4",
+ "matchCount": "1",
+ "teardown": [
+ "$TC actions flush action sample"
+ ]
+ }
+]
diff --git a/tools/testing/selftests/tc-testing/tc-tests/actions/vlan.json b/tools/testing/selftests/tc-testing/tc-tests/actions/vlan.json
index 4510ddf..69ea09e 100644
--- a/tools/testing/selftests/tc-testing/tc-tests/actions/vlan.json
+++ b/tools/testing/selftests/tc-testing/tc-tests/actions/vlan.json
@@ -1,7 +1,7 @@
[
{
"id": "6f5a",
- "name": "Add vlan pop action",
+ "name": "Add vlan pop action with pipe opcode",
"category": [
"actions",
"vlan"
@@ -14,18 +14,18 @@
255
]
],
- "cmdUnderTest": "$TC actions add action vlan pop index 8",
+ "cmdUnderTest": "$TC actions add action vlan pop pipe index 8",
"expExitCode": "0",
"verifyCmd": "$TC actions list action vlan",
- "matchPattern": "action order [0-9]+: vlan.*pop.*index 8 ref",
+ "matchPattern": "action order [0-9]+: vlan.*pop.*pipe.*index 8 ref",
"matchCount": "1",
"teardown": [
"$TC actions flush action vlan"
]
},
{
- "id": "ee6f",
- "name": "Add vlan pop action with large index",
+ "id": "df35",
+ "name": "Add vlan pop action with pass opcode",
"category": [
"actions",
"vlan"
@@ -38,10 +38,82 @@
255
]
],
- "cmdUnderTest": "$TC actions add action vlan pop index 4294967295",
+ "cmdUnderTest": "$TC actions add action vlan pop pass index 8",
"expExitCode": "0",
- "verifyCmd": "$TC actions list action vlan",
- "matchPattern": "action order [0-9]+: vlan.*pop.*index 4294967295 ref",
+ "verifyCmd": "$TC actions get action vlan index 8",
+ "matchPattern": "action order [0-9]+: vlan.*pop.*pass.*index 8 ref",
+ "matchCount": "1",
+ "teardown": [
+ "$TC actions flush action vlan"
+ ]
+ },
+ {
+ "id": "b0d4",
+ "name": "Add vlan pop action with drop opcode",
+ "category": [
+ "actions",
+ "vlan"
+ ],
+ "setup": [
+ [
+ "$TC actions flush action vlan",
+ 0,
+ 1,
+ 255
+ ]
+ ],
+ "cmdUnderTest": "$TC actions add action vlan pop drop index 8",
+ "expExitCode": "0",
+ "verifyCmd": "$TC actions get action vlan index 8",
+ "matchPattern": "action order [0-9]+: vlan.*pop.*drop.*index 8 ref",
+ "matchCount": "1",
+ "teardown": [
+ "$TC actions flush action vlan"
+ ]
+ },
+ {
+ "id": "95ee",
+ "name": "Add vlan pop action with reclassify opcode",
+ "category": [
+ "actions",
+ "vlan"
+ ],
+ "setup": [
+ [
+ "$TC actions flush action vlan",
+ 0,
+ 1,
+ 255
+ ]
+ ],
+ "cmdUnderTest": "$TC actions add action vlan pop reclassify index 8",
+ "expExitCode": "0",
+ "verifyCmd": "$TC actions get action vlan index 8",
+ "matchPattern": "action order [0-9]+: vlan.*pop.*reclassify.*index 8 ref",
+ "matchCount": "1",
+ "teardown": [
+ "$TC actions flush action vlan"
+ ]
+ },
+ {
+ "id": "0283",
+ "name": "Add vlan pop action with continue opcode",
+ "category": [
+ "actions",
+ "vlan"
+ ],
+ "setup": [
+ [
+ "$TC actions flush action vlan",
+ 0,
+ 1,
+ 255
+ ]
+ ],
+ "cmdUnderTest": "$TC actions add action vlan pop continue index 8",
+ "expExitCode": "0",
+ "verifyCmd": "$TC actions get action vlan index 8",
+ "matchPattern": "action order [0-9]+: vlan.*pop.*continue.*index 8 ref",
"matchCount": "1",
"teardown": [
"$TC actions flush action vlan"
@@ -96,6 +168,74 @@
]
},
{
+ "id": "a178",
+ "name": "Add vlan pop action with invalid opcode",
+ "category": [
+ "actions",
+ "vlan"
+ ],
+ "setup": [
+ [
+ "$TC actions flush action vlan",
+ 0,
+ 1,
+ 255
+ ]
+ ],
+ "cmdUnderTest": "$TC actions add action vlan pop foo index 8",
+ "expExitCode": "255",
+ "verifyCmd": "$TC actions list action vlan",
+ "matchPattern": "action order [0-9]+: vlan.*pop.*foo.*index 8 ref",
+ "matchCount": "0",
+ "teardown": []
+ },
+ {
+ "id": "ee6f",
+ "name": "Add vlan pop action with index at 32-bit maximum",
+ "category": [
+ "actions",
+ "vlan"
+ ],
+ "setup": [
+ [
+ "$TC actions flush action vlan",
+ 0,
+ 1,
+ 255
+ ]
+ ],
+ "cmdUnderTest": "$TC actions add action vlan pop index 4294967295",
+ "expExitCode": "0",
+ "verifyCmd": "$TC actions list action vlan",
+ "matchPattern": "action order [0-9]+: vlan.*pop.*index 4294967295 ref",
+ "matchCount": "1",
+ "teardown": [
+ "$TC actions flush action vlan"
+ ]
+ },
+ {
+ "id": "0dfa",
+ "name": "Add vlan pop action with index exceeding 32-bit maximum",
+ "category": [
+ "actions",
+ "vlan"
+ ],
+ "setup": [
+ [
+ "$TC actions flush action vlan",
+ 0,
+ 1,
+ 255
+ ]
+ ],
+ "cmdUnderTest": "$TC actions add action vlan pop reclassify index 429496729599",
+ "expExitCode": "255",
+ "verifyCmd": "$TC actions get action vlan index 429496729599",
+ "matchPattern": "action order [0-9]+: vlan.*pop.reclassify.*index 429496729599",
+ "matchCount": "0",
+ "teardown": []
+ },
+ {
"id": "2b91",
"name": "Add vlan invalid action",
"category": [
@@ -115,13 +255,11 @@
"verifyCmd": "$TC actions list action vlan",
"matchPattern": "action order [0-9]+: vlan.*bad_mode",
"matchCount": "0",
- "teardown": [
- "$TC actions flush action vlan"
- ]
+ "teardown": []
},
{
"id": "57fc",
- "name": "Add vlan action with invalid protocol type",
+ "name": "Add vlan push action with invalid protocol type",
"category": [
"actions",
"vlan"
@@ -139,9 +277,7 @@
"verifyCmd": "$TC actions list action vlan",
"matchPattern": "action order [0-9]+: vlan.*push",
"matchCount": "0",
- "teardown": [
- "$TC actions flush action vlan"
- ]
+ "teardown": []
},
{
"id": "3989",
@@ -216,6 +352,30 @@
]
},
{
+ "id": "1f4b",
+ "name": "Add vlan push action with maximum 12-bit vlan ID",
+ "category": [
+ "actions",
+ "vlan"
+ ],
+ "setup": [
+ [
+ "$TC actions flush action vlan",
+ 0,
+ 1,
+ 255
+ ]
+ ],
+ "cmdUnderTest": "$TC actions add action vlan push id 4094 index 1",
+ "expExitCode": "0",
+ "verifyCmd": "$TC actions get action vlan index 1",
+ "matchPattern": "action order [0-9]+: vlan.*push id 4094.*protocol 802.1Q.*priority 0.*index 1 ref",
+ "matchCount": "1",
+ "teardown": [
+ "$TC actions flush action vlan"
+ ]
+ },
+ {
"id": "1f7b",
"name": "Add vlan push action with invalid vlan ID",
"category": [
@@ -240,6 +400,30 @@
]
},
{
+ "id": "fe40",
+ "name": "Add vlan push action with maximum 3-bit IEEE 802.1p priority",
+ "category": [
+ "actions",
+ "vlan"
+ ],
+ "setup": [
+ [
+ "$TC actions flush action vlan",
+ 0,
+ 1,
+ 255
+ ]
+ ],
+ "cmdUnderTest": "$TC actions add action vlan push id 4 priority 7 reclassify index 1",
+ "expExitCode": "0",
+ "verifyCmd": "$TC actions get action vlan index 1",
+ "matchPattern": "action order [0-9]+: vlan.*push id 4.*protocol 802.1Q.*priority 7.*reclassify.*index 1 ref",
+ "matchCount": "1",
+ "teardown": [
+ "$TC actions flush action vlan"
+ ]
+ },
+ {
"id": "5d02",
"name": "Add vlan push action with invalid IEEE 802.1p priority",
"category": [
@@ -259,9 +443,7 @@
"verifyCmd": "$TC actions list action vlan",
"matchPattern": "action order [0-9]+: vlan.*push id 5.*index 1 ref",
"matchCount": "0",
- "teardown": [
- "$TC actions flush action vlan"
- ]
+ "teardown": []
},
{
"id": "6812",
@@ -312,6 +494,106 @@
]
},
{
+ "id": "3deb",
+ "name": "Replace existing vlan push action with new ID",
+ "category": [
+ "actions",
+ "vlan"
+ ],
+ "setup": [
+ [
+ "$TC actions flush action vlan",
+ 0,
+ 1,
+ 255
+ ],
+ "$TC actions add action vlan push id 500 pipe index 12"
+ ],
+ "cmdUnderTest": "$TC actions replace action vlan push id 700 pipe index 12",
+ "expExitCode": "0",
+ "verifyCmd": "$TC actions get action vlan index 12",
+ "matchPattern": "action order [0-9]+: vlan.*push id 700 protocol 802.1Q priority 0 pipe.*index 12 ref",
+ "matchCount": "1",
+ "teardown": [
+ "$TC actions flush action vlan"
+ ]
+ },
+ {
+ "id": "9e76",
+ "name": "Replace existing vlan push action with new protocol",
+ "category": [
+ "actions",
+ "vlan"
+ ],
+ "setup": [
+ [
+ "$TC actions flush action vlan",
+ 0,
+ 1,
+ 255
+ ],
+ "$TC actions add action vlan push id 1 protocol 802.1Q pipe index 1"
+ ],
+ "cmdUnderTest": "$TC actions replace action vlan push id 1 protocol 802.1ad pipe index 1",
+ "expExitCode": "0",
+ "verifyCmd": "$TC actions get action vlan index 1",
+ "matchPattern": "action order [0-9]+: vlan.*push id 1 protocol 802.1ad priority 0 pipe.*index 1 ref",
+ "matchCount": "1",
+ "teardown": [
+ "$TC actions flush action vlan"
+ ]
+ },
+ {
+ "id": "ede4",
+ "name": "Replace existing vlan push action with new priority",
+ "category": [
+ "actions",
+ "vlan"
+ ],
+ "setup": [
+ [
+ "$TC actions flush action vlan",
+ 0,
+ 1,
+ 255
+ ],
+ "$TC actions add action vlan push id 1 protocol 802.1Q priority 3 reclassify index 1"
+ ],
+ "cmdUnderTest": "$TC actions replace action vlan push id 1 priority 4 reclassify index 1",
+ "expExitCode": "0",
+ "verifyCmd": "$TC actions get action vlan index 1",
+ "matchPattern": "action order [0-9]+: vlan.*push id 1 protocol 802.1Q priority 4 reclassify.*index 1 ref",
+ "matchCount": "1",
+ "teardown": [
+ "$TC actions flush action vlan"
+ ]
+ },
+ {
+ "id": "d413",
+ "name": "Replace existing vlan pop action with new cookie",
+ "category": [
+ "actions",
+ "vlan"
+ ],
+ "setup": [
+ [
+ "$TC actions flush action vlan",
+ 0,
+ 1,
+ 255
+ ],
+ "$TC actions add action vlan pop continue index 1 cookie 22334455"
+ ],
+ "cmdUnderTest": "$TC actions replace action vlan pop continue index 1 cookie a1b1c2d1",
+ "expExitCode": "0",
+ "verifyCmd": "$TC actions get action vlan index 1",
+ "matchPattern": "action order [0-9]+: vlan.*pop continue.*index 1 ref.*cookie a1b1c2d1",
+ "matchCount": "1",
+ "teardown": [
+ "$TC actions flush action vlan"
+ ]
+ },
+ {
"id": "83a4",
"name": "Delete vlan pop action",
"category": [
@@ -385,7 +667,7 @@
},
{
"id": "1d78",
- "name": "Add vlan action with cookie",
+ "name": "Add vlan push action with cookie",
"category": [
"actions",
"vlan"
diff --git a/tools/testing/selftests/uevent/Makefile b/tools/testing/selftests/uevent/Makefile
new file mode 100644
index 0000000..f7baa9a
--- /dev/null
+++ b/tools/testing/selftests/uevent/Makefile
@@ -0,0 +1,17 @@
+# SPDX-License-Identifier: GPL-2.0
+all:
+
+include ../lib.mk
+
+.PHONY: all clean
+
+BINARIES := uevent_filtering
+CFLAGS += -Wl,-no-as-needed -Wall
+
+uevent_filtering: uevent_filtering.c ../kselftest.h ../kselftest_harness.h
+ $(CC) $(CFLAGS) $< -o $@
+
+TEST_PROGS += $(BINARIES)
+EXTRA_CLEAN := $(BINARIES)
+
+all: $(BINARIES)
diff --git a/tools/testing/selftests/uevent/config b/tools/testing/selftests/uevent/config
new file mode 100644
index 0000000..1038f45
--- /dev/null
+++ b/tools/testing/selftests/uevent/config
@@ -0,0 +1,2 @@
+CONFIG_USER_NS=y
+CONFIG_NET=y
diff --git a/tools/testing/selftests/uevent/uevent_filtering.c b/tools/testing/selftests/uevent/uevent_filtering.c
new file mode 100644
index 0000000..f83391a
--- /dev/null
+++ b/tools/testing/selftests/uevent/uevent_filtering.c
@@ -0,0 +1,486 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#define _GNU_SOURCE
+#include <errno.h>
+#include <fcntl.h>
+#include <linux/netlink.h>
+#include <signal.h>
+#include <stdbool.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <sys/prctl.h>
+#include <sys/socket.h>
+#include <sched.h>
+#include <sys/eventfd.h>
+#include <sys/stat.h>
+#include <sys/syscall.h>
+#include <sys/types.h>
+#include <sys/wait.h>
+#include <unistd.h>
+
+#include "../kselftest.h"
+#include "../kselftest_harness.h"
+
+#define __DEV_FULL "/sys/devices/virtual/mem/full/uevent"
+#define __UEVENT_BUFFER_SIZE (2048 * 2)
+#define __UEVENT_HEADER "add@/devices/virtual/mem/full"
+#define __UEVENT_HEADER_LEN sizeof("add@/devices/virtual/mem/full")
+#define __UEVENT_LISTEN_ALL -1
+
+ssize_t read_nointr(int fd, void *buf, size_t count)
+{
+ ssize_t ret;
+
+again:
+ ret = read(fd, buf, count);
+ if (ret < 0 && errno == EINTR)
+ goto again;
+
+ return ret;
+}
+
+ssize_t write_nointr(int fd, const void *buf, size_t count)
+{
+ ssize_t ret;
+
+again:
+ ret = write(fd, buf, count);
+ if (ret < 0 && errno == EINTR)
+ goto again;
+
+ return ret;
+}
+
+int wait_for_pid(pid_t pid)
+{
+ int status, ret;
+
+again:
+ ret = waitpid(pid, &status, 0);
+ if (ret == -1) {
+ if (errno == EINTR)
+ goto again;
+
+ return -1;
+ }
+
+ if (ret != pid)
+ goto again;
+
+ if (!WIFEXITED(status) || WEXITSTATUS(status) != 0)
+ return -1;
+
+ return 0;
+}
+
+static int uevent_listener(unsigned long post_flags, bool expect_uevent,
+ int sync_fd)
+{
+ int sk_fd, ret;
+ socklen_t sk_addr_len;
+ int fret = -1, rcv_buf_sz = __UEVENT_BUFFER_SIZE;
+ uint64_t sync_add = 1;
+ struct sockaddr_nl sk_addr = { 0 }, rcv_addr = { 0 };
+ char buf[__UEVENT_BUFFER_SIZE] = { 0 };
+ struct iovec iov = { buf, __UEVENT_BUFFER_SIZE };
+ char control[CMSG_SPACE(sizeof(struct ucred))];
+ struct msghdr hdr = {
+ &rcv_addr, sizeof(rcv_addr), &iov, 1,
+ control, sizeof(control), 0,
+ };
+
+ sk_fd = socket(AF_NETLINK, SOCK_RAW | SOCK_CLOEXEC,
+ NETLINK_KOBJECT_UEVENT);
+ if (sk_fd < 0) {
+ fprintf(stderr, "%s - Failed to open uevent socket\n", strerror(errno));
+ return -1;
+ }
+
+ ret = setsockopt(sk_fd, SOL_SOCKET, SO_RCVBUF, &rcv_buf_sz,
+ sizeof(rcv_buf_sz));
+ if (ret < 0) {
+ fprintf(stderr, "%s - Failed to set socket options\n", strerror(errno));
+ goto on_error;
+ }
+
+ sk_addr.nl_family = AF_NETLINK;
+ sk_addr.nl_groups = __UEVENT_LISTEN_ALL;
+
+ sk_addr_len = sizeof(sk_addr);
+ ret = bind(sk_fd, (struct sockaddr *)&sk_addr, sk_addr_len);
+ if (ret < 0) {
+ fprintf(stderr, "%s - Failed to bind socket\n", strerror(errno));
+ goto on_error;
+ }
+
+ ret = getsockname(sk_fd, (struct sockaddr *)&sk_addr, &sk_addr_len);
+ if (ret < 0) {
+ fprintf(stderr, "%s - Failed to retrieve socket name\n", strerror(errno));
+ goto on_error;
+ }
+
+ if ((size_t)sk_addr_len != sizeof(sk_addr)) {
+ fprintf(stderr, "Invalid socket address size\n");
+ goto on_error;
+ }
+
+ if (post_flags & CLONE_NEWUSER) {
+ ret = unshare(CLONE_NEWUSER);
+ if (ret < 0) {
+ fprintf(stderr,
+ "%s - Failed to unshare user namespace\n",
+ strerror(errno));
+ goto on_error;
+ }
+ }
+
+ if (post_flags & CLONE_NEWNET) {
+ ret = unshare(CLONE_NEWNET);
+ if (ret < 0) {
+ fprintf(stderr,
+ "%s - Failed to unshare network namespace\n",
+ strerror(errno));
+ goto on_error;
+ }
+ }
+
+ ret = write_nointr(sync_fd, &sync_add, sizeof(sync_add));
+ close(sync_fd);
+ if (ret != sizeof(sync_add)) {
+ fprintf(stderr, "Failed to synchronize with parent process\n");
+ goto on_error;
+ }
+
+ fret = 0;
+ for (;;) {
+ ssize_t r;
+
+ r = recvmsg(sk_fd, &hdr, 0);
+ if (r <= 0) {
+ fprintf(stderr, "%s - Failed to receive uevent\n", strerror(errno));
+ ret = -1;
+ break;
+ }
+
+ /* ignore libudev messages */
+ if (memcmp(buf, "libudev", 8) == 0)
+ continue;
+
+ /* ignore uevents we didn't trigger */
+ if (memcmp(buf, __UEVENT_HEADER, __UEVENT_HEADER_LEN) != 0)
+ continue;
+
+ if (!expect_uevent) {
+ fprintf(stderr, "Received unexpected uevent:\n");
+ ret = -1;
+ }
+
+ if (TH_LOG_ENABLED) {
+ /* If logging is enabled dump the received uevent. */
+ (void)write_nointr(STDERR_FILENO, buf, r);
+ (void)write_nointr(STDERR_FILENO, "\n", 1);
+ }
+
+ break;
+ }
+
+on_error:
+ close(sk_fd);
+
+ return fret;
+}
+
+int trigger_uevent(unsigned int times)
+{
+ int fd, ret;
+ unsigned int i;
+
+ fd = open(__DEV_FULL, O_RDWR | O_CLOEXEC);
+ if (fd < 0) {
+ if (errno != ENOENT)
+ return -EINVAL;
+
+ return -1;
+ }
+
+ for (i = 0; i < times; i++) {
+ ret = write_nointr(fd, "add\n", sizeof("add\n") - 1);
+ if (ret < 0) {
+ fprintf(stderr, "Failed to trigger uevent\n");
+ break;
+ }
+ }
+ close(fd);
+
+ return ret;
+}
+
+int set_death_signal(void)
+{
+ int ret;
+ pid_t ppid;
+
+ ret = prctl(PR_SET_PDEATHSIG, SIGKILL, 0, 0, 0);
+
+ /* Check whether we have been orphaned. */
+ ppid = getppid();
+ if (ppid == 1) {
+ pid_t self;
+
+ self = getpid();
+ ret = kill(self, SIGKILL);
+ }
+
+ if (ret < 0)
+ return -1;
+
+ return 0;
+}
+
+static int do_test(unsigned long pre_flags, unsigned long post_flags,
+ bool expect_uevent, int sync_fd)
+{
+ int ret;
+ uint64_t wait_val;
+ pid_t pid;
+ sigset_t mask;
+ sigset_t orig_mask;
+ struct timespec timeout;
+
+ sigemptyset(&mask);
+ sigaddset(&mask, SIGCHLD);
+
+ ret = sigprocmask(SIG_BLOCK, &mask, &orig_mask);
+ if (ret < 0) {
+ fprintf(stderr, "%s- Failed to block SIGCHLD\n", strerror(errno));
+ return -1;
+ }
+
+ pid = fork();
+ if (pid < 0) {
+ fprintf(stderr, "%s - Failed to fork() new process\n", strerror(errno));
+ return -1;
+ }
+
+ if (pid == 0) {
+ /* Make sure that we go away when our parent dies. */
+ ret = set_death_signal();
+ if (ret < 0) {
+ fprintf(stderr, "Failed to set PR_SET_PDEATHSIG to SIGKILL\n");
+ _exit(EXIT_FAILURE);
+ }
+
+ if (pre_flags & CLONE_NEWUSER) {
+ ret = unshare(CLONE_NEWUSER);
+ if (ret < 0) {
+ fprintf(stderr,
+ "%s - Failed to unshare user namespace\n",
+ strerror(errno));
+ _exit(EXIT_FAILURE);
+ }
+ }
+
+ if (pre_flags & CLONE_NEWNET) {
+ ret = unshare(CLONE_NEWNET);
+ if (ret < 0) {
+ fprintf(stderr,
+ "%s - Failed to unshare network namespace\n",
+ strerror(errno));
+ _exit(EXIT_FAILURE);
+ }
+ }
+
+ if (uevent_listener(post_flags, expect_uevent, sync_fd) < 0)
+ _exit(EXIT_FAILURE);
+
+ _exit(EXIT_SUCCESS);
+ }
+
+ ret = read_nointr(sync_fd, &wait_val, sizeof(wait_val));
+ if (ret != sizeof(wait_val)) {
+ fprintf(stderr, "Failed to synchronize with child process\n");
+ _exit(EXIT_FAILURE);
+ }
+
+ /* Trigger 10 uevents to account for the case where the kernel might
+ * drop some.
+ */
+ ret = trigger_uevent(10);
+ if (ret < 0)
+ fprintf(stderr, "Failed triggering uevents\n");
+
+ /* Wait for 2 seconds before considering this failed. This should be
+ * plenty of time for the kernel to deliver the uevent even under heavy
+ * load.
+ */
+ timeout.tv_sec = 2;
+ timeout.tv_nsec = 0;
+
+again:
+ ret = sigtimedwait(&mask, NULL, &timeout);
+ if (ret < 0) {
+ if (errno == EINTR)
+ goto again;
+
+ if (!expect_uevent)
+ ret = kill(pid, SIGTERM); /* success */
+ else
+ ret = kill(pid, SIGUSR1); /* error */
+ if (ret < 0)
+ return -1;
+ }
+
+ ret = wait_for_pid(pid);
+ if (ret < 0)
+ return -1;
+
+ return ret;
+}
+
+static void signal_handler(int sig)
+{
+ if (sig == SIGTERM)
+ _exit(EXIT_SUCCESS);
+
+ _exit(EXIT_FAILURE);
+}
+
+TEST(uevent_filtering)
+{
+ int ret, sync_fd;
+ struct sigaction act;
+
+ if (geteuid()) {
+ TH_LOG("Uevent filtering tests require root privileges. Skipping test");
+ _exit(KSFT_SKIP);
+ }
+
+ ret = access(__DEV_FULL, F_OK);
+ EXPECT_EQ(0, ret) {
+ if (errno == ENOENT) {
+ TH_LOG(__DEV_FULL " does not exist. Skipping test");
+ _exit(KSFT_SKIP);
+ }
+
+ _exit(KSFT_FAIL);
+ }
+
+ act.sa_handler = signal_handler;
+ act.sa_flags = 0;
+ sigemptyset(&act.sa_mask);
+
+ ret = sigaction(SIGTERM, &act, NULL);
+ ASSERT_EQ(0, ret);
+
+ sync_fd = eventfd(0, EFD_CLOEXEC);
+ ASSERT_GE(sync_fd, 0);
+
+ /*
+ * Setup:
+ * - Open uevent listening socket in initial network namespace owned by
+ * initial user namespace.
+ * - Trigger uevent in initial network namespace owned by initial user
+ * namespace.
+ * Expected Result:
+ * - uevent listening socket receives uevent
+ */
+ ret = do_test(0, 0, true, sync_fd);
+ ASSERT_EQ(0, ret) {
+ goto do_cleanup;
+ }
+
+ /*
+ * Setup:
+ * - Open uevent listening socket in non-initial network namespace
+ * owned by initial user namespace.
+ * - Trigger uevent in initial network namespace owned by initial user
+ * namespace.
+ * Expected Result:
+ * - uevent listening socket receives uevent
+ */
+ ret = do_test(CLONE_NEWNET, 0, true, sync_fd);
+ ASSERT_EQ(0, ret) {
+ goto do_cleanup;
+ }
+
+ /*
+ * Setup:
+ * - unshare user namespace
+ * - Open uevent listening socket in initial network namespace
+ * owned by initial user namespace.
+ * - Trigger uevent in initial network namespace owned by initial user
+ * namespace.
+ * Expected Result:
+ * - uevent listening socket receives uevent
+ */
+ ret = do_test(CLONE_NEWUSER, 0, true, sync_fd);
+ ASSERT_EQ(0, ret) {
+ goto do_cleanup;
+ }
+
+ /*
+ * Setup:
+ * - Open uevent listening socket in non-initial network namespace
+ * owned by non-initial user namespace.
+ * - Trigger uevent in initial network namespace owned by initial user
+ * namespace.
+ * Expected Result:
+ * - uevent listening socket receives no uevent
+ */
+ ret = do_test(CLONE_NEWUSER | CLONE_NEWNET, 0, false, sync_fd);
+ ASSERT_EQ(0, ret) {
+ goto do_cleanup;
+ }
+
+ /*
+ * Setup:
+ * - Open uevent listening socket in initial network namespace
+ * owned by initial user namespace.
+ * - unshare network namespace
+ * - Trigger uevent in initial network namespace owned by initial user
+ * namespace.
+ * Expected Result:
+ * - uevent listening socket receives uevent
+ */
+ ret = do_test(0, CLONE_NEWNET, true, sync_fd);
+ ASSERT_EQ(0, ret) {
+ goto do_cleanup;
+ }
+
+ /*
+ * Setup:
+ * - Open uevent listening socket in initial network namespace
+ * owned by initial user namespace.
+ * - unshare user namespace
+ * - Trigger uevent in initial network namespace owned by initial user
+ * namespace.
+ * Expected Result:
+ * - uevent listening socket receives uevent
+ */
+ ret = do_test(0, CLONE_NEWUSER, true, sync_fd);
+ ASSERT_EQ(0, ret) {
+ goto do_cleanup;
+ }
+
+ /*
+ * Setup:
+ * - Open uevent listening socket in initial network namespace
+ * owned by initial user namespace.
+ * - unshare user namespace
+ * - unshare network namespace
+ * - Trigger uevent in initial network namespace owned by initial user
+ * namespace.
+ * Expected Result:
+ * - uevent listening socket receives uevent
+ */
+ ret = do_test(0, CLONE_NEWUSER | CLONE_NEWNET, true, sync_fd);
+ ASSERT_EQ(0, ret) {
+ goto do_cleanup;
+ }
+
+do_cleanup:
+ close(sync_fd);
+}
+
+TEST_HARNESS_MAIN